Fragmented MP4
What is Fragmented MP4
MP4 is built on ISOBMFF (ISO Base Media File Format).
A minimal ISOBMFF file consists of three boxes: ftyp, mdat, and moov.
(def boxes-per-row 4)
(draw-column-headers)
(draw-box "ftyp" {:span 4})
(draw-gap "mdat")
(draw-box "moov" {:span 4})
ftyp holds compatibility information (the brands the file conforms to).
mdat holds the actual media data.
moov holds metadata and sample information, such as data offsets, sizes, and other information required for decoding.
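To make the box layout concrete, here is a minimal sketch of walking the top-level boxes of a byte buffer. It assumes the standard box header (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning "until the end"). The names `iter_boxes` and `make_box` are illustrative helpers, not part of any library, and the sample bytes are synthetic rather than a playable file.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"bad box size at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# Synthetic data in the ftyp / mdat / moov shape described above.
sample = (
    make_box("ftyp", b"isom" + b"\x00\x00\x02\x00" + b"isomiso2")
    + make_box("mdat", b"\x00" * 16)
    + make_box("moov")
)
top_level = [box_type for box_type, _, _ in iter_boxes(sample)]
print(top_level)
```

The same function can be pointed at the payload range of a container box such as moov to list its children.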
What makes an MP4 fragmented is where the sample information lives.
In fragmented MP4, moov does not hold any sample information.
Instead, sample information is stored in moof boxes.
Structure of Fragmented MP4
The recommendations for Fragmented MP4 are below.
- Boxes SHOULD appear in the following order:
  - ftyp
  - moov
  - pairs of moof and mdat
  - mfra (optional)
- Each moof SHOULD contain at most one traf.
- mfra records the first (randomly accessible) samples for both video and audio.
(def row-header-fn {})
(def boxes-per-row 4)
(draw-box "ftyp" {:span 4})
(draw-box "moov" {:span 4})
(draw-box "moof" {:span 4})
(draw-gap "mdat")
(draw-gap "...")
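The recommended ordering can be checked mechanically once the top-level box types are known. The sketch below is deliberately strict: it accepts only ftyp, moov, repeated (moof, mdat) pairs, and an optional trailing mfra. The function name `check_layout` is hypothetical, and real-world files may carry additional boxes that this check would flag.

```python
from typing import List


def check_layout(box_types: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the order matches."""
    problems = []
    if box_types[:2] != ["ftyp", "moov"]:
        problems.append("file should start with ftyp, moov")
    rest = box_types[2:]
    # mfra is optional and, when present, comes last.
    if rest and rest[-1] == "mfra":
        rest = rest[:-1]
    pairs = list(zip(rest[0::2], rest[1::2]))
    if len(rest) % 2 != 0 or any(pair != ("moof", "mdat") for pair in pairs):
        problems.append("after moov, expected repeated (moof, mdat) pairs")
    return problems


fragmented = ["ftyp", "moov", "moof", "mdat", "moof", "mdat", "mfra"]
plain = ["ftyp", "mdat", "moov"]
print(check_layout(fragmented))
print(check_layout(plain))
```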
The pair of ftyp and moov is commonly called the initialization segment.
This moov holds only the metadata needed to decode the media, without any sample information, so it is commonly called an empty moov.
Sample information is packaged into moof.
A pair of moof and mdat is commonly called a chunk segment.
Typically, each chunk segment contains at least one GoP (Group of Pictures; think of it as one keyframe plus the inter frames that follow it). This is the bottleneck in live streaming.
If the keyframe interval is 2 seconds, then each chunk segment’s duration is 2 seconds.
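The moof box is the base offset for the sample data (as configured by the default_base_is_moof flag in tfhd), and moof also holds the sample information. A chunk therefore cannot be written out until all of its samples are known, that is, until it is flushed so the next chunk can start.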
So latency is always 2 seconds plus some overhead: (producer) encoding + (server) receiving the 2-second GoP + writing to the player + (client) decoding.
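As an illustration of the tfhd flag mentioned above, the following sketch decodes the version, the 24-bit flags field, and the track ID from a tfhd payload. The flag values are the ones I understand ISOBMFF to define for tfhd (default-base-is-moof being 0x020000); `parse_tfhd_header` is a hypothetical helper, and the optional fields that follow the track ID are not parsed here.

```python
import struct

# tfhd flag bits and their names.
TFHD_FLAGS = {
    0x000001: "base-data-offset-present",
    0x000002: "sample-description-index-present",
    0x000008: "default-sample-duration-present",
    0x000010: "default-sample-size-present",
    0x000020: "default-sample-flags-present",
    0x010000: "duration-is-empty",
    0x020000: "default-base-is-moof",
}


def parse_tfhd_header(payload: bytes):
    """payload is the tfhd box content after the 8-byte size/type header."""
    version_and_flags, track_id = struct.unpack_from(">II", payload, 0)
    version = version_and_flags >> 24        # top 8 bits
    flags = version_and_flags & 0xFFFFFF     # low 24 bits
    names = [name for bit, name in TFHD_FLAGS.items() if flags & bit]
    return version, track_id, names


# Synthetic tfhd payload: version 0, only default-base-is-moof set, track 1.
payload = struct.pack(">II", 0x020000, 1)
version, track_id, names = parse_tfhd_header(payload)
print(version, track_id, names)
```

When default-base-is-moof is set, data offsets in the track run are relative to the start of the enclosing moof, which is why the writer has to finish the moof before it can emit the chunk.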
Low-Latency HLS and Low-Latency DASH address this by splitting each GoP into smaller pieces.
For example, with a 2-second keyframe interval, a whole-GoP chunk segment lasts 2 seconds, but the server can split each GoP into 4 chunks of 500 ms each. Latency then becomes 0.5 seconds plus some overhead: (producer) encoding + (server) receiving 0.5 seconds of media + writing to the player + (client) decoding.
Latency is reduced by about 1.5 seconds.
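The arithmetic behind this comparison can be written out as a small sketch. It models only the time the server must wait to collect one chunk before it can write it; encoding, transfer, and decoding time (the "overhead" part) are left out. The function name `chunk_wait_seconds` is illustrative.

```python
def chunk_wait_seconds(keyframe_interval_s: float, chunks_per_gop: int = 1) -> float:
    """Time the server waits to collect one chunk before it can be written."""
    return keyframe_interval_s / chunks_per_gop


whole_gop = chunk_wait_seconds(2.0)        # one chunk per 2-second GoP
split_gop = chunk_wait_seconds(2.0, 4)     # GoP split into 4 chunks
saved = whole_gop - split_gop
print(whole_gop, split_gop, saved)
```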
However, you need to account for the moof overhead.
Smaller chunks achieve lower latency, but require more network traffic and storage.
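A rough way to reason about this trade-off is to treat each chunk as carrying a fixed number of container bytes (the moof and the mdat header) regardless of its duration. In the sketch below, the 100-byte figure is a placeholder assumption rather than a measured value, and per-sample entries and HTTP request overhead are ignored.

```python
def container_overhead_bytes_per_second(chunk_duration_s: float,
                                        fixed_bytes_per_chunk: int = 100) -> float:
    """Fixed per-chunk bytes spread over time; the default size is assumed."""
    chunks_per_second = 1.0 / chunk_duration_s
    return fixed_bytes_per_chunk * chunks_per_second


whole_gop = container_overhead_bytes_per_second(2.0)    # one chunk per 2 s
quarter_gop = container_overhead_bytes_per_second(0.5)  # four chunks per 2 s
print(whole_gop, quarter_gop, quarter_gop / whole_gop)
```

Under this model the fixed overhead scales inversely with chunk duration, so splitting a GoP into 4 chunks quadruples it. Measuring actual moof sizes from your own packager gives a more reliable number than the placeholder.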