Thoughts about latency
Problem
- Current design uses RTMP for ingest and fMP4 for the rest of the pipeline
- RTMP and MPEG-TS pass data frame by frame
- fMP4 passes data GoP (Group of Pictures) by GoP
- So an additional delay (one key-frame interval) occurs at each reconstruction of the media container
- If the key-frame interval is 2 seconds
- Streamer-to-viewer delay will look like below
- Capture frame: $\frac{1}{FPS}\text{ sec.}$
- Encoding delay
- Network delay
- Waiting for GoP: $2\text{ sec.}$
- Passing into transcoder
- Waiting for GoP: $2\text{ sec.}$
- Passing into viewer
- So, total delay should be more than $4 + \frac{1}{FPS}\text{ sec.}$
- This is why major services like YouTube Live or Twitch say their (real-time) low-latency settings can achieve a 2~5 sec. delay.
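The lower bound above can be sketched numerically (a minimal illustration; the function name is mine, and encoding/network delays are deliberately ignored):

```python
def min_total_delay(fps: float, keyframe_interval: float) -> float:
    """Lower bound on streamer-to-viewer delay: one frame of capture time,
    plus one GoP wait before the transcoder and one before the viewer.
    Encoding and network delays are ignored here."""
    capture = 1 / fps
    gop_waits = 2 * keyframe_interval  # transcoder wait + viewer wait
    return capture + gop_waits

# 30 FPS with a 2-second key-frame interval
min_total_delay(30, 2.0)  # ≈ 4.033 sec.
```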
Self-evident delays in live streaming
1. Capture frame
2. Encoding
3. Network latency
4. Transcoding
5. Packing media container
6. Edge deploy
7. Network latency
8. Reconstructing the media container in the viewer's environment
9. Decoding
10. Play
Figure out optimizable points
- Reducing the elapsed time across steps 1 to 10 is important.
- The network delay in steps 3 and 7 cannot be controlled.
- Step 1 is fixed: $\frac{1}{FPS}\text{ sec.}$
- Step 2 is related to the streamer's hardware spec. Cannot control.
- Step 4 is the Ingest-to-Transcoder delay plus the encoding delay
- Step 5 is related to container specification, overhead, and compatibility
- Step 6 is the delay of copying into the CDN (passing the media container to the content delivery server)
- Steps 8~10 are related to the end user's device and experience.
- Nobody likes a hot device and louder fan noise at higher RPM.
I think there are two optimizable points.
- Ingest to Transcoder
- Ingest(and Transcoder) to Edge deploy
At Ingest to Transcoder
- The best performance is a one-to-one allocation within the same processor (reduces I/O delay)
- But this increases cost (Ingest holds many connections at the same time)
- Pass each GoP immediately once its first frame is determined to be a keyframe
- The current RTMP code cannot access the payload until it is fully received.
- So, we need to implement a previewable stage or a signalling method that can access the RTMP message header + 1 byte (the video data's control byte)
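For reference, that control byte is the first byte of the FLV/RTMP VideoData payload: the high nibble is the frame type (1 = keyframe) and the low nibble is the codec ID (7 = AVC/H.264). A minimal sketch of the one-byte peek:

```python
def is_video_keyframe(control_byte: int) -> bool:
    """First byte of an RTMP/FLV VideoData payload:
    high nibble = frame type (1 = keyframe, 2 = inter frame),
    low nibble = codec ID (7 = AVC/H.264)."""
    frame_type = (control_byte >> 4) & 0x0F
    return frame_type == 1

is_video_keyframe(0x17)  # True  (AVC keyframe)
is_video_keyframe(0x27)  # False (AVC inter frame)
```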
- So, pool transcoders and queue GoPs.
- For example, 1 Ingest instance can handle a maximum of 1000 connections
- A Transcoder pool instance can handle fewer connections than Ingest (like 100? maybe even less, due to memory requirements) at once
- When Ingest receives a keyframe, it requests a transcoder from the pool
- Then, when each GoP is done, the transcoder is freed
- If the connected transcoder pool reaches a threshold, connect an additional Transcoder pool instance
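The pooling and queuing logic above could be sketched like this (class and method names are hypothetical; a real implementation would dispatch asynchronously):

```python
from collections import deque

class TranscoderPool:
    """Sketch: a fixed pool of transcoders; GoPs queue up when none is free."""

    def __init__(self, size: int):
        self.free = deque(f"transcoder-{i}" for i in range(size))
        self.pending = deque()  # GoPs waiting for a free transcoder

    def on_keyframe(self, gop):
        """Ingest saw a keyframe: try to grab a transcoder for the new GoP."""
        if self.free:
            return self.free.popleft()
        self.pending.append(gop)  # queue until a transcoder frees up
        return None

    def on_gop_done(self, transcoder):
        """A GoP finished: reuse the transcoder for a queued GoP, or free it."""
        if self.pending:
            return transcoder, self.pending.popleft()
        self.free.append(transcoder)
        return None

pool = TranscoderPool(size=2)
pool.on_keyframe("gop-1")         # 'transcoder-0'
pool.on_keyframe("gop-2")         # 'transcoder-1'
pool.on_keyframe("gop-3")         # None: pool exhausted, GoP queued
pool.on_gop_done("transcoder-0")  # ('transcoder-0', 'gop-3')
```

When `on_keyframe` returns None past some threshold, that is the signal to connect an additional pool instance.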
At Ingest(and Transcoder) to Edge deploy
- fMP4's demerit for live streaming is that each box carries a length and requires a data base offset (this is why fMP4 requires the moof, mdat ordering and the default-base-is-moof flag).
- So, a fragment cannot be sent until the GoP is built completely.
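This buffering requirement can be illustrated with a deliberately simplified fragment builder (real moof/trun boxes contain many more fields; this only shows the data dependency):

```python
import struct

def build_fragment(samples: list[bytes]) -> bytes:
    """Simplified fMP4 fragment: moof must record every sample size (trun),
    so no byte can be emitted until ALL samples of the GoP are collected."""
    # trun needs the size of every sample -> requires the complete GoP
    sizes = b"".join(struct.pack(">I", len(s)) for s in samples)
    moof = b"moof" + sizes               # stand-in for a real moof box
    mdat = b"mdat" + b"".join(samples)   # media data follows the moof
    return moof + mdat
```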
- To achieve the lowest latency, we may need to use MPEG2-TS,
- because it can pass data frame by frame.
- Cons and unknowns
- In browsers, most libraries transmux into fMP4, so this is no merit there. If anything, it costs more (increased network outbound due to overhead, about 10%)
- On mobile, iOS and Android support MPEG2-TS natively. If the media player supports decoding before the full GoP is ready, it may achieve better latency than fMP4.
- Pros
- If the transcoder follows the H.264 baseline profile for WebRTC,
- the edge deploy can pass each frame over SRTP under WebRTC
- Considering live-streaming performance only, it seems this can achieve lower latency.
- But it requires development and maintenance cost
- In the browser, we need to handle both fMP4 over HLS and the WebRTC track.
- In the backend, we need to manage both the HLS manifest generator and the WebRTC-related components.
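Putting the comparison together: a rough model of how long the edge must wait before it can start forwarding media, per transport (the function and its naming are my illustration of the trade-off above):

```python
def edge_forward_wait(transport: str, fps: float, keyframe_interval: float) -> float:
    """Rough model: fMP4 must wait for a full GoP before the fragment exists;
    MPEG2-TS and WebRTC/SRTP can forward as soon as a frame arrives."""
    if transport == "fmp4":
        return keyframe_interval  # whole GoP must be built first
    if transport in ("mpeg2-ts", "webrtc"):
        return 1 / fps            # one frame
    raise ValueError(f"unknown transport: {transport}")

edge_forward_wait("fmp4", 30, 2.0)    # 2.0 sec.
edge_forward_wait("webrtc", 30, 2.0)  # ≈ 0.033 sec.
```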