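Standard MP4 has one big moov box (metadata for the whole file) plus mdat (the media data). The moov sits at the start only if the file was written or remuxed for "fast start"; otherwise it often ends up at the end. Fragmented MP4 (fMP4) splits the media into many small fragments, each with its own metadata + data. That change is what makes MP4 practical for adaptive bitrate streaming.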

MP4 box structure

MP4 is a tree of 'boxes' (atoms). Each box starts with a 4-byte big-endian size, then a 4-character type, then the payload, which may itself contain more boxes. Top level: ftyp (file type), moov (metadata), mdat (audio/video samples). Standard MP4 typically puts all samples in one mdat.
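
To make the box layout concrete, here is a minimal sketch of a box walker. It assumes the ISO base media header layout (4-byte big-endian size followed by a 4-character type, with size 1 meaning a 64-bit size follows and size 0 meaning "runs to the end"). The names iter_boxes and make_box are made up for this example, and the sample bytes are synthetic, not a playable file.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = end - offset
        if size < header or offset + size > end:
            break  # truncated or malformed box; stop walking
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a tiny non-fragmented file.
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00isom")
    + make_box(b"moov", make_box(b"mvhd", b"\x00" * 4))
    + make_box(b"mdat", b"\x01\x02\x03")
)

top_level = [box_type for box_type, _, _ in iter_boxes(sample)]
print(top_level)  # ['ftyp', 'moov', 'mdat']

# Container boxes are walked the same way, restricted to their payload range.
for box_type, start, stop in iter_boxes(sample):
    if box_type == "moov":
        print([child for child, _, _ in iter_boxes(sample, start, stop)])  # ['mvhd']
```

The same function handles nesting because a container's payload is just another run of boxes; only the start and end offsets change.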

fMP4 = many fragments

Each fragment: moof (fragment metadata) + mdat (its samples). Append fragments to grow the file. The player can start playing as soon as it has the init segment + one fragment; it never needs a moov that indexes every sample in the file.
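
As a rough illustration of that layout, the sketch below splits a byte string into an init part and a list of fragments by looking for top-level moof boxes. It is a simplification under stated assumptions: 32-bit box sizes only, and everything from one moof up to the next is treated as one fragment (real files may also carry boxes such as styp or sidx). The function names are hypothetical and the demo data is synthetic.

```python
import struct
from typing import List, Tuple


def top_level_boxes(data: bytes) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) for each top-level box (32-bit sizes only)."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            break  # unsupported or truncated box; stop here
        boxes.append((box_type.decode("latin-1"), offset, offset + size))
        offset += size
    return boxes


def split_fragments(data: bytes) -> Tuple[bytes, List[bytes]]:
    """Split into (init bytes, [fragment bytes]); no moof means no fragments."""
    boxes = top_level_boxes(data)
    moof_starts = [start for box_type, start, _ in boxes if box_type == "moof"]
    if not moof_starts:
        return data, []
    # Each fragment runs from its moof to the next moof (or the last parsed box).
    bounds = moof_starts + [boxes[-1][2]]
    fragments = [data[a:b] for a, b in zip(bounds, bounds[1:])]
    return data[:moof_starts[0]], fragments


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


init_part = box(b"ftyp", b"iso5") + box(b"moov")
frag1 = box(b"moof") + box(b"mdat", b"\x01\x02")
frag2 = box(b"moof") + box(b"mdat", b"\x03")

init, fragments = split_fragments(init_part + frag1 + frag2)
print(len(fragments))                      # 2
print(init + fragments[0] == init_part + frag1)  # True
```

The last line mirrors the playback rule: the init part plus any single fragment is the smallest unit a player needs.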

Why streaming needs this

HLS/DASH players fetch one segment at a time. Standard MP4 requires the whole moov (which can be a megabyte or more for a long video) before any byte of media can play. fMP4's per-fragment moof lets each segment be decoded with nothing more than the small init segment.

Init segment + media segments

init.mp4              <- ftyp + moov (codec info, NO samples)
seg-001.m4s           <- moof + mdat (2-6 seconds of media)
seg-002.m4s
...

Client: fetch init.mp4 once. Then fetch segments in order.
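
The fetch order can be sketched in a few lines. This assumes the file names from the listing above and a caller-supplied fetch function; in a real player the names come from the HLS playlist or DASH manifest rather than a fixed pattern, so fetch_order and stream_bytes are illustrative only.

```python
from typing import Callable, Dict, Iterator, List


def fetch_order(segment_count: int) -> List[str]:
    """Init segment first, then media segments in ascending order."""
    return ["init.mp4"] + [f"seg-{i:03d}.m4s" for i in range(1, segment_count + 1)]


def stream_bytes(fetch: Callable[[str], bytes], segment_count: int) -> Iterator[bytes]:
    """Yield the bytes a decoder would be fed, in order."""
    for name in fetch_order(segment_count):
        yield fetch(name)


# Fake in-memory "server" standing in for HTTP requests.
store: Dict[str, bytes] = {
    "init.mp4": b"INIT",
    "seg-001.m4s": b"A",
    "seg-002.m4s": b"B",
}

print(fetch_order(2))                                # ['init.mp4', 'seg-001.m4s', 'seg-002.m4s']
print(b"".join(stream_bytes(store.__getitem__, 2)))  # b'INITAB'
```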

Practical packaging

Tools: Shaka Packager (Google, open source, production-grade). Bento4 (mp4fragment + mp4dash). FFmpeg (-f dash or -f hls -hls_segment_type fmp4). Pick Shaka for serious deployments: it is built for DRM (encryption) workflows and packaging multiple codecs, and it copes well with edge cases.
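
For the FFmpeg route, the sketch below builds one plausible command line for HLS with fMP4 segments. The input and playlist names are placeholders, the segment length is an arbitrary choice, and the function name is made up; the flags are options of FFmpeg's hls muxer, but check them against your FFmpeg version before relying on them. The code only assembles the arguments and does not run FFmpeg.

```python
import shlex
from typing import List


def ffmpeg_hls_fmp4_cmd(src: str, playlist: str = "stream.m3u8",
                        segment_seconds: int = 4) -> List[str]:
    """Return an argument list that remuxes src into HLS with fMP4 segments."""
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",                           # remux only, no re-encode
        "-f", "hls",
        "-hls_segment_type", "fmp4",            # .m4s media segments instead of .ts
        "-hls_time", str(segment_seconds),      # target segment duration
        "-hls_playlist_type", "vod",
        "-hls_fmp4_init_filename", "init.mp4",  # ftyp + moov
        "-hls_segment_filename", "seg-%03d.m4s",
        playlist,
    ]


cmd = ffmpeg_hls_fmp4_cmd("input.mp4")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With -c copy, FFmpeg can only cut at existing keyframes, so real segment durations may differ from the target. Segment numbering also starts at 0 by default, so the first file would be seg-000.m4s unless a start number is set.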

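fMP4 = ftyp + moov + N×(moof+mdat). One set of packaged segments can serve both HLS and DASH. Smooth Streaming is also built on fragmented MP4, but it uses its own variant of the format, so it usually needs separate packaging.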