Introduction

I chose live streaming as the first project to post about, because it can expand widely both today and in the future.

This article documents the decision process for selecting protocols for both the streamer and the viewer.

  • Define live streaming as a real-time (low-latency) video/audio delivery service.
  • live streaming + low-latency bi-directional communication => game streaming service

In this series, I want to cover the whole flow shown below, from scratch as much as possible.

```mermaid
flowchart LR
    p(Streamer)
    subgraph Service Provider
        i(Ingest)
        t(Transcoder)
        e(Edge)
    end
    s(Viewer)
    p --> i
    i --> t
    t --> e
    e --> s
```

Select Protocols

TL;DR

I will use RTMP for ingest and HLS with CMAF for delivery.

First, think as each stakeholder (streamer, viewer, and service provider).

The important point on the streamer side is that the protocol is easy to use and well documented (for better user acquisition).

Let’s think about the end-users first.

Service Provider => Viewer

The most important points to consider are compatibility, low-latency, and operation cost.

These days, live video streaming services use HLS, DASH, or CMAF.

Some services use WebRTC to provide low latency, but it costs more than common CDN-based delivery. (WebRTC-based CDNs exist, but I don’t know of a production-ready global service.) It also requires much more implementation work than the others. (This series implements everything from scratch, with as few dependencies as possible.)

Most legacy systems in production are based on HLS, due to iOS: HLS is the only format natively supported by the Safari browser.

CMAF was created by Apple and Microsoft, so it is natively supported on iOS.

The primary difference between them is the media container.

|           | HLS       | HLS (CMAF) | DASH              | CMAF                   |
|-----------|-----------|------------|-------------------|------------------------|
| Container | MPEG-2 TS | fMP4       | ISO_BMFF (MPEG-4) | CMAF (fMP4-compatible) |

In my view, the most valuable point of CMAF is that it requires less storage and computing power from the service provider.

  • Supporting both MPEG-2 TS and fMP4 requires storing two copies of the original media.
  • Then, what about using only MPEG-2 TS?
    • The MPEG-2 TS packet size is fixed at 188 bytes, so streams are fragmented into packets with per-packet overhead. (This means more storage is required.)
    • ISO_BMFF boxes are variable-length: each box header carries a length value, so NALUs are not fragmented the way they are in MPEG-2 TS. (I haven’t read the whole ISO_BMFF specification, so I’m not sure they are never fragmented.)
    • Related slide from BITMOVIN
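The structural difference above can be sketched in a few lines of Python: TS is consumed as fixed 188-byte packets, while ISO_BMFF is walked box by box using the length prefix in each box header. This is a minimal sketch, not a full demuxer; the hand-built `sample` bytes are an assumption for illustration, and the special box sizes 0 and 1 are not handled.

```python
import struct

TS_PACKET_SIZE = 188   # MPEG-2 TS packets have a fixed size
TS_SYNC_BYTE = 0x47

def iter_ts_packets(data: bytes):
    """Split a transport stream into its fixed 188-byte packets."""
    for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[off:off + TS_PACKET_SIZE]
        if pkt[0] != TS_SYNC_BYTE:
            raise ValueError("lost TS sync")
        yield pkt

def iter_bmff_boxes(data: bytes):
    """Walk top-level ISO_BMFF boxes: each starts with a
    4-byte big-endian size followed by a 4-byte type."""
    off = 0
    while off + 8 <= len(data):
        size, = struct.unpack_from(">I", data, off)
        if size < 8:   # sizes 0/1 (to-end / 64-bit size) not handled in this sketch
            break
        box_type = data[off + 4:off + 8].decode("ascii")
        yield box_type, size
        off += size

# A hand-built fMP4-style byte string: a 16-byte 'ftyp' box, then an empty 8-byte 'moov' box.
sample = (
    struct.pack(">I", 16) + b"ftyp" + b"iso5" + struct.pack(">I", 0)
    + struct.pack(">I", 8) + b"moov"
)
print(list(iter_bmff_boxes(sample)))  # [('ftyp', 16), ('moov', 8)]
```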

So CMAF looks best for VOD, since it requires less storage, computing power, and bandwidth. But we also need to check the streamer side, because this series is about live streaming.

Streamer => Service Provider

TL;DR

Use RTMP

There exist many real-time video/audio transport protocols, because of the different requirements behind them. The ones widely mentioned in the live-streaming industry are RTMP, WebRTC, and SRT. So, let's check them now.

|                      | RTMP | WebRTC | SRT       |
|----------------------|------|--------|-----------|
| Protocol             | RTMP | SRTP   | SRT       |
| Transport            | TCP  | UDP    | UDP       |
| Latency (approx., s) | 5    | <1     | 1         |
| Media container      | FLV  | (none) | MPEG-2 TS |

At first, I will drop SRT from the list.

  • According to the SRT specification, there is no standard way to determine a unique streamer id.
  • As I understand it (from reading the specification partially and the user guides of services that support SRT):
    • At the handshake, the peers share a unique SRT Socket ID, which identifies the stream when the SRT header is parsed.
    • But the problem is: how do we identify the streamer?
      • RTMP: use the stream key.
      • WebRTC: the developer can handle it during signaling.
    • IP-based? (commonly used in services)
      • How would that support mobile environments?
        • A Wi-Fi change means an IP change; if the connection is lost by accident, the registered IP needs to be updated.
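The stream-key approach mentioned for RTMP can be sketched as a tiny URL parser. The `rtmp://host/app/stream_key` path layout is a common convention (it matches the "Server" + "Stream Key" fields in OBS-style tools), not something mandated by the RTMP specification, and the hostname and key below are made up.

```python
from urllib.parse import urlparse

def parse_rtmp_target(url: str) -> tuple[str, str]:
    """Split an RTMP publish URL into (app, stream_key).
    Layout rtmp://host/app/stream_key is a convention, not part of the spec."""
    parsed = urlparse(url)
    if parsed.scheme != "rtmp":
        raise ValueError("not an RTMP URL")
    app, _, stream_key = parsed.path.lstrip("/").partition("/")
    if not stream_key:
        raise ValueError("missing stream key")
    return app, stream_key

# Hypothetical ingest endpoint and key:
print(parse_rtmp_target("rtmp://ingest.example.com/live/s3cret-key"))
# ('live', 's3cret-key')
```

The service can then look the stream key up in its database to identify the streamer, independent of the client's IP address.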

Next, I will drop WebRTC.

  • WebRTC is a good stack for providing a low-latency service.
  • But it is not implemented in most streamers’ environments, such as XSplit, OBS Studio, or OBS fork projects.
  • It is implemented in web browsers, which are not compatible with broadcasting workflows.

What about other options?

  • YouTube supports an HLS uplink stream.
    • Based on HTTPS POST/PUT
    • Supported by OBS
    • Higher latency than RTMP

So, I think RTMP is the best choice.

Let’s check the Service Provider => Viewer side again.

RTMP uses FLV as its media container. (The RTMP audio/video message payloads share the same structure as FLV tag data.)
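To make the container concrete, here is a minimal sketch of parsing an 11-byte FLV tag header (type, data size, 24-bit timestamp plus extended byte, stream id), as laid out in Adobe's FLV file format specification. The sample header bytes are hand-built for illustration.

```python
def parse_flv_tag_header(buf: bytes):
    """Parse an 11-byte FLV tag header:
    type (1 byte), data size (3), timestamp (3 + 1 extended), stream id (3)."""
    tag_type = buf[0] & 0x1F                 # 8 = audio, 9 = video, 18 = script data
    data_size = int.from_bytes(buf[1:4], "big")
    # the extended timestamp byte supplies the high 8 bits
    timestamp = int.from_bytes(buf[4:7], "big") | (buf[7] << 24)
    stream_id = int.from_bytes(buf[8:11], "big")  # always 0 per the spec
    return tag_type, data_size, timestamp, stream_id

# Hand-built header: video tag, 1024-byte body, timestamp 40 ms, stream id 0.
header = bytes([9]) + (1024).to_bytes(3, "big") + (40).to_bytes(3, "big") + bytes([0]) + bytes(3)
print(parse_flv_tag_header(header))  # (9, 1024, 40, 0)
```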

If the ingest protocol used MPEG-2 TS like SRT, then using the same container for HLS would be a good choice for code reusability. Simply relaying FLV to the viewer would give lower latency (less remuxing overhead), but it is not compatible: for example, how would it play in iOS Safari? And FLV only supports legacy codecs.

Results

  • Streamers publish their stream over RTMP.
  • Viewers receive HLS with CMAF segments.

So we need to implement RTMP, FLV, ISO_BMFF, CMAF, and HLS.
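On the delivery side, the HLS-with-CMAF choice boils down to a media playlist that references fMP4 segments via `EXT-X-MAP` (which, per RFC 8216, requires playlist version 6 or higher in a normal media playlist). A minimal sketch, where the segment and init file names are hypothetical conventions:

```python
def cmaf_media_playlist(segment_count: int, target_duration: int = 4) -> str:
    """Render a minimal HLS media playlist referencing CMAF (fMP4) segments.
    File names (init.mp4, seg_N.m4s) are assumed conventions, not fixed by the spec."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",            # EXT-X-MAP in a media playlist needs version >= 6
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        '#EXT-X-MAP:URI="init.mp4"',   # CMAF initialization segment (ftyp + moov)
    ]
    for i in range(segment_count):
        lines.append(f"#EXTINF:{target_duration:.1f},")
        lines.append(f"seg_{i}.m4s")   # CMAF media segment (moof + mdat)
    return "\n".join(lines) + "\n"

print(cmaf_media_playlist(2))
```

A live packager would regenerate this playlist as each new segment is finished, while VOD playback would add `#EXT-X-ENDLIST`.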

In the next post, I will sketch a minimal, rough system design.

Discussion

  • Reading WOWZA’s article, I have some questions:
    • It claims WebRTC is faster than RTSP/RTP.
      • As far as I know, WebRTC uses SRTP, a secured version of RTP, as its video/audio track transport protocol.
      • Plain RTP should have less overhead, since SRTP adds encryption processing.
      • I think the difference may come from different segment lengths.
    • It claims UDP-based protocols have low latency.

References