Problem

  • The current design uses RTMP for ingest and fMP4 for the rest of the pipeline.
  • RTMP and MPEG-TS pass data frame by frame.
  • fMP4 passes data GoP (Group of Pictures) by GoP.
  • So an additional delay (one key-frame interval) occurs at each reconstruction of the media container.
  • If the key-frame interval is 2 seconds:
    • The streamer-to-viewer delay will look like this:
      • Capture frame: $\frac{1}{FPS}\text{sec.}$
      • Encoding delay
      • Network delay
      • Waiting for a GoP: $2\text{sec.}$
      • Passing into the transcoder
      • Waiting for a GoP: $2\text{sec.}$
      • Passing to the viewer
      • So the total delay should be more than $4 + \frac{1}{FPS} \text{sec.}$
  • This is why major services like YouTube Live and Twitch state that their (real-time) low-latency setting can achieve a 2–5 sec. delay.
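The delay floor above can be sketched numerically. The FPS and key-frame interval values below are illustrative assumptions; the encoding and network terms vary per setup and are left as placeholders:

```python
# Minimal sketch of the delay budget above.
# fps and keyframe_interval are illustrative; encoding and network
# delays vary per setup and default to zero here.

def min_total_delay(fps: float, keyframe_interval: float,
                    encoding: float = 0.0, network: float = 0.0) -> float:
    """Lower bound on streamer-to-viewer delay in seconds."""
    capture = 1.0 / fps                  # one frame time to capture
    gop_waits = 2 * keyframe_interval    # wait for a full GoP twice:
                                         # before transcoding, before delivery
    return capture + encoding + network + gop_waits

# With a 2-second key-frame interval at 30 FPS, the floor is just over 4 s:
print(round(min_total_delay(fps=30, keyframe_interval=2.0), 3))  # → 4.033
```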

Self-evident delay on live streaming

  1. Capture frame
  2. Encoding
  3. Network latency
  4. Transcoding
  5. Packing media container
  6. Edge deploy
  7. Network latency
  8. Reconstruct media container in the viewer’s environment
  9. Decoding
  10. Play

Figure out optimizable points

  • Reducing the elapsed time from 1 to 10 is what matters.
  • Network delay at 3 and 7 is out of our control.
  • 1 is fixed: $\frac{1}{FPS}\text{sec.}$
  • 2 depends on the streamer’s HW spec. Out of our control.
  • 4 is the ingest-to-transcoder delay plus the encoding delay.
  • 5 depends on the container specification, its overhead, and compatibility.
  • 6 is the delay of copying into the CDN (passing the media container to the content delivery server).
  • 8–10 depend on the end user’s device and experience.
    • Nobody likes a hot device and louder fan noise.

I think there are two optimizable points.

  • Ingest to Transcoder
  • Ingest(and Transcoder) to Edge deploy

At Ingest to Transcoder

  • The best performance comes from allocating them one-to-one on the same processor (reducing I/O delay).

  • But this increases cost (the ingest holds many connections at the same time).

  • Pass each GoP on immediately once its first frame is determined to be a keyframe.

    • The current RTMP code cannot access the payload until it has been fully received.
    • So we need a previewable stage or a signalling method that can access the RTMP message header + 1 byte (the video data’s control byte).
  • So: pool transcoders and queue GoPs.

    • For example, 1 ingest instance can handle a maximum of 1000 connections.
    • A transcoder-pool instance can handle fewer connections than the ingest (around 100? maybe even less, due to memory requirements) at once.
    • When the ingest receives a keyframe, it requests a transcoder from the pool.
      • Then, when the GoP is done, it frees the transcoder.
    • If a connected transcoder pool reaches its threshold, connect an additional transcoder-pool instance.
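The keyframe check described above only needs the first payload byte of an RTMP video message: per the FLV tag format, the high nibble of that byte is the frame type, and 1 means keyframe. A minimal sketch, with the pool interaction simplified to a counter (the function and class names are mine, not from the original code):

```python
# Sketch: detect a keyframe from the first byte of an RTMP/FLV video
# payload and drive a toy transcoder pool. Names are illustrative.

KEYFRAME = 1  # FLV VideoTagHeader frame type 1 = keyframe

def is_keyframe(first_payload_byte: int) -> bool:
    """High nibble of the first video-data byte is the frame type."""
    return (first_payload_byte >> 4) & 0x0F == KEYFRAME

class TranscoderPool:
    """Toy pool: acquire a slot on each keyframe, release when the GoP is done."""
    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.in_use = 0

    def acquire(self) -> bool:
        if self.in_use < self.capacity:
            self.in_use += 1
            return True
        return False  # threshold reached: connect an additional pool instance

    def release(self) -> None:
        self.in_use -= 1

# 0x17 = keyframe (1) + AVC codec id (7); 0x27 = inter frame + AVC
assert is_keyframe(0x17) and not is_keyframe(0x27)
```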

At Ingest(and Transcoder) to Edge deploy

  • fMP4’s demerit for live streaming is that each box carries a length and requires a data base offset (this is why fMP4 requires the moof, mdat order and the default-base-is-moof flag).

  • So we cannot send anything until the GoP is built completely.

  • To achieve the lowest latency, we may need to use MPEG2-TS,

    • because it can pass data frame by frame.
    • Cons. and unknowns
      • In the browser, most libraries transmux into fMP4, so this is no merit there. If anything, it costs more (increased network outbound due to MPEG2-TS overhead of about 10%).
      • On mobile, iOS and Android support MPEG2-TS natively. If the media player can start decoding before the full GoP is ready, it may achieve better latency than fMP4.
    • Pros.
      • If the transcoder follows the H.264 baseline profile for WebRTC:
        • The edge deploy can pass each frame over SRTP under WebRTC.
      • Considering live-streaming performance alone, it seems it can achieve lower latency.
        • But it requires development and maintenance cost:
          • In the browser, we need to handle both fMP4 over HLS and the WebRTC track.
          • In the backend, we need to manage both the HLS manifest generator and the WebRTC-related components.
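The moof/mdat constraint above can be shown structurally: an fMP4 fragment’s trun box must list the size of every sample in the fragment, so not a single byte of the fragment can be sent until the whole GoP is buffered, while MPEG2-TS can flush each frame into 188-byte packets as it arrives. A minimal sketch, not a real muxer (the box layout is heavily simplified):

```python
# Illustrative only: contrasts fMP4's buffer-the-GoP requirement with
# MPEG2-TS's frame-by-frame output. Not a real muxer.

def fmp4_fragment(gop_frames: list) -> bytes:
    """The trun box needs every sample size up front, so the whole GoP
    must be buffered before any byte of the fragment can be sent."""
    sample_sizes = [len(f) for f in gop_frames]        # requires full GoP
    moof = b"moof" + bytes(str(sample_sizes), "ascii")  # simplified header
    mdat = b"mdat" + b"".join(gop_frames)
    return moof + mdat                                  # sendable only now

def ts_packets(frame: bytes):
    """MPEG2-TS wraps each frame into 188-byte packets immediately;
    nothing waits for the rest of the GoP."""
    payload_per_packet = 184  # 188 bytes minus a 4-byte TS header
    for i in range(0, len(frame), payload_per_packet):
        chunk = frame[i:i + payload_per_packet]
        yield (b"\x47\x00\x00\x00" + chunk).ljust(188, b"\xff")
```

The fixed 4-byte header per 188-byte packet, plus PES headers, PAT/PMT tables, and stuffing, is where the roughly 10% MPEG2-TS overhead mentioned above comes from.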

References