What is Fragmented MP4

MP4 is based on the ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12).

At minimum, an ISOBMFF file consists of three boxes: ftyp, mdat, and moov.

[Figure: minimal ISOBMFF layout: ftyp | mdat | moov]
  • ftyp declares the file type and its compatible brands.
  • mdat holds the actual media data.
  • moov holds the metadata and the sample information (data offsets, sizes, and other information required for decoding).
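To make the box layout concrete, here is a minimal Python sketch that walks the top-level boxes of a file. It assumes the whole file is already in memory and reads only box headers: a 32-bit big-endian size and a 4-character type, plus the 64-bit largesize case (size 1) and the "runs to the end" case (size 0). `iter_boxes` and `make_box` are hypothetical helpers, and the payloads are dummies rather than valid box contents.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit 'largesize' follows the type
            if offset + 16 > end:
                raise ValueError(f"truncated largesize at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the enclosing range
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"bad box size {size} at offset {offset}")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (used here only to fake a file)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Dummy payloads: enough to show the layout, not valid box contents.
ftyp = make_box("ftyp", b"isom" + struct.pack(">I", 0x200) + b"isomiso2mp41")
mdat = make_box("mdat", bytes(16))
moov = make_box("moov", make_box("mvhd", bytes(100)))

top_level = [box_type for box_type, _, _ in iter_boxes(ftyp + mdat + moov)]
print(top_level)  # ['ftyp', 'mdat', 'moov']
```

Passing a box's own offset range as `start`/`end` lets the same walker descend into container boxes such as moov.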

The key point of fragmented MP4 is where the sample information is stored.

In fragmented MP4, moov does not contain any sample information.

Sample information is stored in moof boxes instead.
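As a sketch of what sample information inside a moof looks like on disk: each moof holds a traf, and the traf holds trun (Track Run) boxes whose per-sample fields (duration, size, flags, composition time offset) are each switched on by a flag bit. Fields left out of trun fall back to defaults in tfhd or trex. The parser below assumes the caller has already extracted the trun body (the bytes after its 8-byte header); `parse_trun` is a hypothetical helper and the sample values are made up.

```python
import struct
from typing import Dict, List, Optional

# tr_flags bits of the TrackRunBox ('trun'), ISO/IEC 14496-12
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
SAMPLE_DURATION_PRESENT = 0x000100
SAMPLE_SIZE_PRESENT = 0x000200
SAMPLE_FLAGS_PRESENT = 0x000400
SAMPLE_CTO_PRESENT = 0x000800  # composition time offset


def parse_trun(body: bytes) -> Dict[str, object]:
    """Parse a 'trun' body (the bytes after its 8-byte box header)."""
    pos = 0

    def read(fmt: str) -> int:
        nonlocal pos
        (value,) = struct.unpack_from(fmt, body, pos)
        pos += struct.calcsize(fmt)
        return value

    version_and_flags = read(">I")
    version, flags = version_and_flags >> 24, version_and_flags & 0xFFFFFF
    sample_count = read(">I")
    data_offset: Optional[int] = read(">i") if flags & DATA_OFFSET_PRESENT else None
    first_sample_flags = read(">I") if flags & FIRST_SAMPLE_FLAGS_PRESENT else None

    samples: List[Dict[str, int]] = []
    for _ in range(sample_count):
        sample: Dict[str, int] = {}
        if flags & SAMPLE_DURATION_PRESENT:
            sample["duration"] = read(">I")
        if flags & SAMPLE_SIZE_PRESENT:
            sample["size"] = read(">I")
        if flags & SAMPLE_FLAGS_PRESENT:
            sample["flags"] = read(">I")
        if flags & SAMPLE_CTO_PRESENT:
            # version 1 stores a signed offset, version 0 an unsigned one
            sample["cto"] = read(">i" if version == 1 else ">I")
        samples.append(sample)
    return {"data_offset": data_offset,
            "first_sample_flags": first_sample_flags,
            "samples": samples}


# Hypothetical run: two samples with explicit duration and size, version 0.
flags = DATA_OFFSET_PRESENT | SAMPLE_DURATION_PRESENT | SAMPLE_SIZE_PRESENT
body = struct.pack(">IIi", flags, 2, 120) + struct.pack(">IIII", 3000, 1200, 3000, 800)
run = parse_trun(body)
print(run)
```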

Structure of Fragmented MP4

The recommended structure of fragmented MP4 is as follows.

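  • Boxes SHOULD appear in the following order:
    • ftyp
    • moov
    • pairs of moof and mdat
    • mfra (optional)
  • Each moof SHOULD contain at most one traf.
  • mfra records the random access points (such as the first random-accessible sample of each fragment) for both the video and audio tracks.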
[Figure: fragmented MP4 layout: ftyp | moov | moof | mdat | ...]
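The recommendations above can be checked mechanically. This sketch assumes the file has already been reduced to a list of (top-level box type, child box types) pairs; `check_fragmented_layout` is a hypothetical helper. It is deliberately strict, so other top-level boxes that real files may carry (for example free or sidx) are reported as problems, and it accepts an initialization-only file with zero moof/mdat pairs.

```python
import re
from typing import List, Tuple

Box = Tuple[str, List[str]]  # (box type, child box types); hypothetical input shape

# ftyp, moov, any number of moof+mdat pairs, then an optional mfra
RECOMMENDED_ORDER = re.compile(r"ftyp moov( moof mdat)*( mfra)?")


def check_fragmented_layout(boxes: List[Box]) -> List[str]:
    """Return a list of problems; an empty list means the layout follows the rules."""
    problems = []
    top = " ".join(box_type for box_type, _ in boxes)
    if RECOMMENDED_ORDER.fullmatch(top) is None:
        problems.append(f"unexpected top-level order: {top}")
    for index, (box_type, children) in enumerate(boxes):
        traf_count = children.count("traf")
        if box_type == "moof" and traf_count > 1:
            problems.append(f"box #{index} (moof) has {traf_count} traf boxes")
    return problems


good = [("ftyp", []), ("moov", ["mvhd", "trak", "mvex"]),
        ("moof", ["mfhd", "traf"]), ("mdat", []),
        ("moof", ["mfhd", "traf"]), ("mdat", []),
        ("mfra", ["tfra", "mfro"])]
bad = [("ftyp", []), ("moov", []), ("mdat", []), ("moof", ["mfhd", "traf", "traf"])]
print(check_fragmented_layout(good))
print(check_fragmented_layout(bad))
```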

The ftyp and moov pair is commonly called the initialization segment. Its moov contains only the metadata needed to decode the media, without any sample information, so it is commonly called an empty moov.

Sample information is packaged into moof instead.

A moof and mdat pair is commonly called a chunk segment.
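A packager that writes the initialization segment and each chunk segment as separate files could split the top-level boxes roughly as below. This is a hypothetical sketch: `split_segments` expects (box type, raw bytes) pairs in the recommended order, and it simply skips mfra because that box indexes the whole file rather than belonging to one segment.

```python
from typing import List, Tuple


def split_segments(boxes: List[Tuple[str, bytes]]) -> Tuple[bytes, List[bytes]]:
    """Return (initialization segment bytes, list of chunk segment bytes)."""
    init = b""
    chunks: List[bytes] = []
    for box_type, raw in boxes:
        if box_type in ("ftyp", "moov"):
            if chunks:
                raise ValueError(f"{box_type} after the first moof")
            init += raw              # ftyp + empty moov form the init segment
        elif box_type == "moof":
            chunks.append(raw)       # each moof opens a new chunk segment
        elif box_type == "mdat":
            if not chunks:
                raise ValueError("mdat before any moof")
            chunks[-1] += raw        # ...and its mdat completes it
        elif box_type == "mfra":
            continue                 # random access index, not part of a segment
        else:
            raise ValueError(f"unexpected top-level box {box_type!r}")
    return init, chunks


# Placeholder bytes stand in for real serialized boxes.
init, chunks = split_segments([("ftyp", b"F"), ("moov", b"M"),
                               ("moof", b"a"), ("mdat", b"b"),
                               ("moof", b"c"), ("mdat", b"d")])
print(init, chunks)
```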

Typically, each chunk segment contains at least one GoP (Group of Pictures; you can think of it as one keyframe followed by inter frames). This is the latency bottleneck in live streaming.

If the keyframe interval is 2 seconds, each chunk segment is 2 seconds long. The moof serves as the base offset for the sample data (configured by the default_base_is_moof flag in tfhd), and it carries the sample information for the whole chunk. So the moof cannot be written, and the chunk cannot be flushed, until every sample in the GoP has arrived. As a result, latency is always 2 + α seconds, where the 2 seconds is the server receiving the whole GoP and α is the rest of the pipeline: encoding (producer), writing to the player (server), and decoding (client).
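Why the moof must wait for the whole chunk can be seen by computing sample positions. When default-base-is-moof (tf_flags bit 0x020000 in tfhd) is set, each sample's file position is the moof's start plus trun's data_offset plus the sizes of the earlier samples, and data_offset itself depends on the moof size, which grows with the number of trun entries. The moof layout in `moof_size` (one traf, a tfhd carrying only track_ID, a version-1 tfdt, and a trun with data_offset plus duration and size per sample) is an assumption for illustration, and both helpers are hypothetical.

```python
from typing import List

DEFAULT_BASE_IS_MOOF = 0x020000  # tf_flags bit in the TrackFragmentHeaderBox ('tfhd')


def moof_size(sample_count: int) -> int:
    """Size in bytes of the assumed moof layout described above."""
    moof_header, mfhd, traf_header, tfhd, tfdt = 8, 16, 8, 16, 20
    trun = 20 + 8 * sample_count  # header fields + (duration, size) per sample
    return moof_header + mfhd + traf_header + tfhd + tfdt + trun


def sample_positions(moof_offset: int, tfhd_flags: int, data_offset: int,
                     sample_sizes: List[int]) -> List[int]:
    """Absolute file offsets of each sample when the moof is the base offset."""
    if not tfhd_flags & DEFAULT_BASE_IS_MOOF:
        raise ValueError("this sketch only handles default-base-is-moof")
    positions = []
    position = moof_offset + data_offset
    for size in sample_sizes:
        positions.append(position)
        position += size
    return positions


# Writer side: data_offset points past the whole moof and the 8-byte mdat header,
# so it (and every trun entry) is known only once the chunk's last sample exists.
sizes = [100, 200, 300]                 # hypothetical sample sizes in bytes
data_offset = moof_size(len(sizes)) + 8
print(sample_positions(1000, DEFAULT_BASE_IS_MOOF, data_offset, sizes))
```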

Low-Latency HLS and Low-Latency DASH address this by splitting each GoP into smaller pieces.

For example, with a 2-second keyframe interval a chunk segment would normally be 2 seconds long, but the server can split the GoP into four 500 ms chunks. Latency then drops to 0.5 + α seconds (α again being encoding, writing to the player, and decoding), because the server only has to receive 0.5 seconds of media before it can write and send a chunk.
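The arithmetic behind both examples fits in a few lines. The delays for encoding, delivery, and decoding are placeholder values, not measurements; only the chunk durations come from the text, and `PipelineDelays` and `end_to_end_latency` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PipelineDelays:
    """Placeholder per-stage delays in seconds (not measured values)."""
    encode: float = 0.1   # producer
    deliver: float = 0.2  # server writing to the player
    decode: float = 0.05  # client


def end_to_end_latency(chunk_duration: float, delays: PipelineDelays) -> float:
    # The server must hold a whole chunk before it can write its moof and send it,
    # so the chunk duration adds directly to the other stages (the "α" part).
    return delays.encode + chunk_duration + delays.deliver + delays.decode


delays = PipelineDelays()
full_gop = end_to_end_latency(2.0, delays)     # one chunk per 2 s GoP
quarter_gop = end_to_end_latency(0.5, delays)  # GoP split into four 500 ms chunks
print(f"{full_gop:.2f} s vs {quarter_gop:.2f} s")
```

Whatever values α takes, they cancel in the difference, so the saving is the change in chunk duration alone.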

This reduces latency by 1.5 seconds (2 - 0.5). However, every chunk carries its own moof, so check the moof overhead: smaller chunks achieve lower latency but require more network traffic and storage.
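To get a feel for the moof overhead, the sketch below counts container bytes per second for a single video track. It assumes a hypothetical moof layout (one traf; a tfhd carrying only track_ID; a version-1 tfdt; a trun with data_offset plus duration and size per sample), which totals 88 bytes plus 8 bytes per sample, an 8-byte mdat header per chunk, and a placeholder frame rate of 30 fps.

```python
FIXED_MOOF_BYTES = 88   # moof, mfhd, traf, tfhd, tfdt and trun fixed fields (assumed layout)
PER_SAMPLE_BYTES = 8    # one trun entry: sample_duration + sample_size
MDAT_HEADER_BYTES = 8   # 32-bit size + 'mdat'


def container_overhead_per_second(chunk_duration: float, fps: float) -> float:
    """Bytes per second spent on moof and mdat headers for one video track."""
    chunks_per_second = 1.0 / chunk_duration
    per_chunk = FIXED_MOOF_BYTES + MDAT_HEADER_BYTES
    # Per-sample trun entries cost the same however the samples are chunked;
    # only the fixed per-chunk part grows as chunks get shorter.
    return chunks_per_second * per_chunk + fps * PER_SAMPLE_BYTES


for duration in (2.0, 1.0, 0.5, 0.25):
    print(f"{duration:>5} s chunks: {container_overhead_per_second(duration, 30):.0f} B/s")
```

This counts only box bytes; it does not model per-request or per-object costs on the network and storage side.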

Reference