Impl. live streaming system - RTMP: Introduction & Basic Handshake
Introduction
The RTMP protocol consists of these three parts:
- Handshake
  - Check reliable stream transport
- Chunk Stream
  - Multiplexing
  - Packetizing
- Message
  - Video/Audio
  - RPC
RTMP is a message multiplexing protocol, not just a media service.
Adobe’s Real Time Messaging Protocol (RTMP) provides a bidirectional message multiplex service over a reliable stream transport, such as TCP [RFC0793], intended to carry parallel streams of video, audio, and data messages, with associated timing information, between a pair of communicating peers.
The handshake stage checks the protocol version and performs a ping-pong echo. (Per the specification, the echo can also be used to estimate bandwidth and latency, but that is not useful in practice.)
Either peer can use the time and time2 fields together with the current timestamp as a quick estimate of the bandwidth and/or latency of the connection, but this is unlikely to be useful.
Chunk Stream provides multiplexing: it identifies each stream and controls the connection (announcing the chunk size, peer bandwidth, and acknowledgements).
Messages carried over the Chunk Stream are video data, audio data, and AMF-encoded RPC. The details of RTMP’s audio and video messages are shared with FLV’s AdobeMuxPacket.
Handshake
Sequence
The handshake begins with the client sending the C0 and C1 chunks. The client MUST wait until S1 has been received before sending C2. The client MUST wait until S2 has been received before sending any other data. The server MUST wait until C0 has been received before sending S0 and S1, and MAY wait until after C1 as well. The server MUST wait until C1 has been received before sending S2. The server MUST wait until C2 has been received before sending any other data.
The sequence diagram below follows only the constraints marked MUST.
sequenceDiagram
Client->>Server: C0
Server->>Client: S0/S1
Client->>Server: C1
Server->>Client: S2
Client->>Server: C2
In short:
- S2/C2 can only be sent after C1/S1 has been received.
- The server MUST receive C0 before sending S0 and S1.
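The MUST rules for the server side can be written down as a small check. This is only a sketch: the names `Received`, `ServerSend`, and `server_may_send` are made up for illustration and are not taken from any RTMP library.

```rust
/// Handshake packets the server has received so far (hypothetical helper type).
#[derive(Debug, Default, Clone, Copy)]
pub struct Received {
    pub c0: bool,
    pub c1: bool,
    pub c2: bool,
}

/// What the server wants to send next (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerSend {
    S0,
    S1,
    S2,
    Data,
}

/// Returns true when the MUST rules quoted above allow the server to send `what`.
pub fn server_may_send(received: Received, what: ServerSend) -> bool {
    match what {
        // S0 and S1 need C0; waiting for C1 as well is only a MAY.
        ServerSend::S0 | ServerSend::S1 => received.c0,
        // S2 needs C1.
        ServerSend::S2 => received.c1,
        // Any other data needs C2.
        ServerSend::Data => received.c2,
    }
}
```

The client side follows the same pattern with S1 gating C2 and S2 gating any other data.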
C0/S0
(def boxes-per-row 1)
(draw-column-headers)
(draw-box "version" {:span 1})
- Version
  - SHOULD be 3 (which means version 1.0)
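A minimal sketch of reading C0/S0. The function name `parse_c0` and the choice to reject every version other than 3 are my assumptions for illustration; a real server may prefer to be more tolerant.

```rust
/// RTMP version byte expected in C0/S0.
pub const RTMP_VERSION: u8 = 3;

/// Parses the one-byte C0/S0 packet (hypothetical helper).
/// Returns the version on success, or a description of what was wrong.
pub fn parse_c0(packet: &[u8]) -> Result<u8, String> {
    match packet {
        // Exactly one byte and it is the version we speak.
        [version] if *version == RTMP_VERSION => Ok(*version),
        // Exactly one byte, but some other version.
        [version] => Err(format!("unsupported RTMP version {}", version)),
        // Wrong length.
        _ => Err(format!("C0/S0 must be exactly 1 byte, got {}", packet.len())),
    }
}
```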
C1/S1
(def boxes-per-row 8)
(draw-column-headers)
(draw-box "time" {:span 4})
(draw-box 0 {:span 4})
(draw-gap "random bytes (1528 bytes)")
(draw-bottom)
- Time
  - may be 0
  - To synchronize multiple chunkstreams: other chunkstream’s timestamp
  - timestamps are in milliseconds relative to an unspecified epoch
This may be 0, or some arbitrary value. To synchronize multiple chunkstreams, the endpoint may wish to send the current value of the other chunkstream’s timestamp. Timestamps in RTMP are given as an integer number of milliseconds relative to an unspecified epoch.
- Random bytes
  - SHOULD send something sufficiently random.
  - no need for cryptographically-secure randomness, or even dynamic values
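The C1/S1 layout above (4 bytes time, 4 bytes zero, 1528 random bytes) can be assembled as below. This is a sketch: `build_c1` and the constant names are hypothetical, the random bytes are supplied by the caller, and the time field is written big-endian on the assumption that RTMP integer fields use network byte order.

```rust
pub const RANDOM_LEN: usize = 1528;
pub const PACKET_LEN: usize = 4 + 4 + RANDOM_LEN; // 1536 bytes

/// Builds a C1 (or S1) packet from a timestamp and caller-provided random bytes.
pub fn build_c1(time_ms: u32, random: &[u8; RANDOM_LEN]) -> [u8; PACKET_LEN] {
    let mut packet = [0u8; PACKET_LEN];
    // Bytes 0..4: time, assumed big-endian (network byte order).
    packet[0..4].copy_from_slice(&time_ms.to_be_bytes());
    // Bytes 4..8: the zero field, already 0 from the initializer.
    // Bytes 8..1536: random bytes.
    packet[8..].copy_from_slice(random);
    packet
}
```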
Random Number Generator Benchmark
This stage needs random bytes.
The specification does not require dynamic values, but I will use dynamic values anyway, generated by wyrand, a well-known non-cryptographically-secure random number generator from wyhash.
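// wyrand: advance the seed by a fixed odd constant, then fold the 128-bit product.
pub fn wyrand(seed: &mut u64) -> u64 {
    *seed = seed.wrapping_add(0xa0761d6478bd642f);
    let r = u128::from(*seed) * u128::from(*seed ^ 0xe7037ed1a0b428db);
    // XOR the high and low 64-bit halves of the product.
    ((r >> 64) ^ r) as u64
}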
code from this repository
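As a sketch of how wyrand output can fill the handshake's random field: 1528 is exactly 191 × 8, so the buffer can be written eight bytes at a time with no remainder. The helper name `random_bytes` is hypothetical, and little-endian is an arbitrary choice here since the bytes only need to look random.

```rust
/// wyrand, as shown above.
pub fn wyrand(seed: &mut u64) -> u64 {
    *seed = seed.wrapping_add(0xa0761d6478bd642f);
    let r = u128::from(*seed) * u128::from(*seed ^ 0xe7037ed1a0b428db);
    ((r >> 64) ^ r) as u64
}

/// Fills a 1528-byte array with wyrand output, 8 bytes per call.
pub fn random_bytes(seed: &mut u64) -> [u8; 1528] {
    let mut out = [0u8; 1528];
    // 1528 = 191 * 8, so chunks_exact_mut leaves no unfilled tail.
    for chunk in out.chunks_exact_mut(8) {
        chunk.copy_from_slice(&wyrand(seed).to_le_bytes());
    }
    out
}
```

The same seed always yields the same bytes, which is acceptable here because the handshake does not need cryptographically-secure randomness.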
The RTMP handshake needs 1528 random bytes. I benchmarked wyrand against getrandom (which reads from OS entropy sources), plus some variants of how the values are assigned (e.g. no multiplication in the loop).
- MUST NOT: do not use OS entropy sources as the RNG here; doing so can be exposed to side-channel attacks.
- Default: direct access to a `[u8; 1528]` array.
- wyrand(with offset): replaces the array indexing `[8 * i + 0]` with `[offset + 0]`; the compiler commonly applies this optimization automatically.
- Insertion methods like `.put_u64(n)` come from `bytes::BufMut`.
- Scenario with `.put_slice(&[u8])`: generate the random numbers into a `[u8; 1528]`, then insert that into the collection. (getrandom only supports `&mut [u8]` as its destination, and directly accessing the collection’s memory is unsafe, so that is not considered here.)
Tag | Measured |
---|---|
getrandom | 6.3M iterations |
wyrand | 9.5M iterations |
wyrand(with offset) | 9.5M iterations |
getrandom(with bytes, .put_slice(&[u8]) ) | 5.4M iterations |
wyrand(with bytes, .put_slice(&[u8]) ) | 4.8M iterations |
wyrand(with bytes, .put_u64(n) ) | 3.4M iterations |
getrandom(with vec, .put_slice(&[u8]) ) | 6.0M iterations |
wyrand(with vec, .put_slice(.to_ne_bytes()) ) | 8.1M iterations |
wyrand(with vec, .put_slice(.to_be_bytes()) ) | 7.9M iterations |
wyrand(with vec, .put_slice(.to_le_bytes()) ) | 8.3M iterations |
wyrand(with vec, .put_u64(n) ) | 7.7M iterations |
wyrand(with vec, .extend(.to_ne_bytes()) ) | 4.1M iterations |
wyrand(with vec, .extend(.to_be_bytes()) ) | 4.1M iterations |
wyrand(with vec, .extend(.to_le_bytes()) ) | 4.1M iterations |
- The offset variant makes no significant difference.
- Using Bytes with `[u8]` array insertion is not a good way to provide random bytes.
  - Bytes is slower than `std::vec::Vec`.
- Using `std::vec::Vec` with `.put_slice()` gives acceptable performance.
- wyrand with `std::vec::Vec` and `.put_slice()` is the best score among the collection-based variants.
  - The `u64` is cast into `&[u8]` by `to_ne_bytes()`, `to_be_bytes()`, or `to_le_bytes()`.
  - `.to_be_bytes()` was the worst in every trial.
  - `to_ne_bytes()` and `to_le_bytes()` are similar, but in my test environment (an AMD Ryzen 7 4800HS laptop configured without throttling) `to_le_bytes()` is better.
    - My guess was that `ne` (native endian) has a little overhead for determining which order to use, but that choice is made at compile time, so this is more likely measurement noise.
C2/S2
(def boxes-per-row 8)
(draw-column-headers)
(draw-box "time" {:span 4})
(draw-box "time2" {:span 4})
(draw-gap "random bytes (1528 bytes)")
(draw-bottom)
- Time: the timestamp sent by the peer in S1 (for C2) or C1 (for S2).
- Time2: the timestamp at which the previous packet (S1 or C1) sent by the peer was read.
- Random bytes
  - Echo C1/S1’s random bytes
- In short, C2/S2 echoes the peer’s Time and Random bytes.
- Time and Time2 are used to estimate RTT (Round Trip Time).
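Building and checking the echo can be sketched as follows. The function names are hypothetical, the fields are assumed to be big-endian, and `elapsed_since_sent_ms` is only a rough upper bound on RTT because it also includes the time the peer held the packet.

```rust
pub const PACKET_LEN: usize = 1536;

/// Builds C2 from S1 (or S2 from C1): keep the peer's time and random bytes,
/// and overwrite bytes 4..8 with time2, the moment we read the peer's packet.
pub fn build_echo(peer_packet: &[u8; PACKET_LEN], read_at_ms: u32) -> [u8; PACKET_LEN] {
    let mut echo = *peer_packet; // copies time (0..4) and random bytes (8..)
    echo[4..8].copy_from_slice(&read_at_ms.to_be_bytes());
    echo
}

/// Checks that a received C2/S2 echoes the time and random bytes we sent in C1/S1.
pub fn is_valid_echo(sent: &[u8; PACKET_LEN], echo: &[u8; PACKET_LEN]) -> bool {
    sent[0..4] == echo[0..4] && sent[8..] == echo[8..]
}

/// Milliseconds since we sent C1/S1, read from the time field echoed back to us.
/// wrapping_sub keeps the result sane if the 32-bit timestamp wrapped around.
pub fn elapsed_since_sent_ms(echo: &[u8; PACKET_LEN], now_ms: u32) -> u32 {
    let sent_ms = u32::from_be_bytes([echo[0], echo[1], echo[2], echo[3]]);
    now_ms.wrapping_sub(sent_ms)
}
```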
Discussion
- The well-known RTMP implementation is librtmp, but it differs from the RTMP 1.0 specification.
- There are many undocumented behaviors.
- In librtmp, the time2 response is dead code, and the player version is checked via C1’s zero field (which the specification defines as `MUST be all 0s.`).
The implemented code is pushed to my GitHub.