sclaggett

A Peek Inside Adobe's Real Time Messaging Protocol (RTMP)

The need came up recently to set up a server application capable of streaming H.264-encoded video to Adobe Flash clients. The requirements of this particular project called for the ability to modify the source code of whatever server technology was chosen, narrowing the options down to open-source solutions. An excellent library that's been around for a while is FluorineFx, an open-source .NET toolkit that mimics the behavior of Adobe’s own Flash Media Server. FluorineFx comes with a host of sample programs that made it easy to create an application that streams an H.264-encoded video file to a simple Flex client.

There was just one small problem: the Flash player didn’t actually show the video or play the audio. The same video file and a nearly identical client would play just fine when connecting to Flash Media Server, indicating that FluorineFx itself was likely the source of the problem. One of the sample FLV video files that came with Flash Media Server played fine through both servers, further isolating the issue to FluorineFx serving up an H.264-encoded MP4 video file. The Flex client raised no error messages when it failed to interpret and display the H.264 stream, but this is not unusual: the Flash runtime generally fails silently when fed a media stream it doesn't like.

The interactions between the Flash client and both FluorineFx and Flash Media Server were therefore studied in an attempt to track down the source of the problem and find a solution.

RTMP on the wire

Direct analysis of the messages that the client and server were sending one another was made ridiculously simple thanks to Wireshark, an open-source network protocol analyzer. Both client and server were configured to use Adobe’s Real Time Messaging Protocol (RTMP). With this protocol the client opens a socket connection to the server on port 1935, and RTMP messages are then passed in both directions over that connection. A typical packet capture in Wireshark is shown in Figure 1:

Figure 1: Screenshot of Wireshark that shows a typical RTMP packet capture.

Each packet consists of four layers as shown in Figure 2:

  1. Ethernet II: Contains the source and destination MAC addresses.
  2. Internet Protocol (IP): Contains the source and destination IP addresses.
  3. Transmission Control Protocol (TCP): Contains information required to reliably fragment a stream of bytes produced by one program and transmit them to another program where the stream can be reconstructed.
  4. Real Time Messaging Protocol (RTMP): Adobe’s protocol for transmitting audio, video and application data.

Figure 2: The four layers of a typical RTMP packet.
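
To make the layering concrete, the following Python sketch strips the first three layers from a raw captured frame and returns the bytes that belong to the RTMP layer. It is only an illustration under stated assumptions: the frame is an untagged Ethernet II frame carrying IPv4 and TCP, and the name rtmp_payload is hypothetical rather than part of Wireshark or FluorineFx.

```python
import struct

ETHERNET_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IP_PROTO_TCP = 6


def rtmp_payload(frame: bytes) -> bytes:
    """Return the TCP payload of an untagged Ethernet II / IPv4 / TCP frame.

    Assumption: no VLAN tag and no IPv6; anything else raises ValueError.
    """
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")

    ip_start = ETHERNET_HEADER_LEN
    # The low nibble of the first IP byte is the header length in 32-bit words
    ip_header_len = (frame[ip_start] & 0x0F) * 4
    (ip_total_len,) = struct.unpack_from("!H", frame, ip_start + 2)
    if frame[ip_start + 9] != IP_PROTO_TCP:
        raise ValueError("not a TCP segment")

    tcp_start = ip_start + ip_header_len
    # The high nibble of TCP byte 12 is the data offset, also in 32-bit words
    tcp_header_len = (frame[tcp_start + 12] >> 4) * 4

    # Use the IP total length so Ethernet padding is not mistaken for payload
    return frame[tcp_start + tcp_header_len : ip_start + ip_total_len]
```

Because TCP delivers a byte stream, a single RTMP message may span several such payloads, so a real tool would reassemble the stream before interpreting it, as Wireshark does.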

RTMP messages have a predictable format whose interpretation was deduced using a combination of Wireshark’s output, the FluorineFx source code and a number of hours spent staring at frame captures. The format of an RTMP message is shown in Figure 3 and consists of the following:

  1. Channel and caller source flag (1 byte): Contains two pieces of information:
    • Bit 6: Caller source flag that indicates whether a caller source value follows the function ID. If this bit is clear then the caller source is present, otherwise it is absent.
    • Bits 5-0: Channel on which this message is being sent. The Flash player uses this value to multiplex messages on a single socket connection.
  2. Timestamp (3 bytes): Message timestamp. The FluorineFx code indicates that this value is in milliseconds.
  3. Body size (3 bytes): Size of the message body that follows the header fields.
  4. Function ID (1 byte): Indicates the purpose of this message and aids in interpreting the contents of the message. The message shown in Figure 3 has a function ID of 0x14 (Invoke).
  5. Caller source (3 bytes): Purpose of this field is not clear, but it is identified as the caller source by Wireshark.
  6. Message body (body size bytes): Contents of the message.

Figure 3: Format of an RTMP message.
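
The field layout listed above can be turned into a small parser. The Python sketch below follows the layout as deduced in this article, not an official specification. Several details are assumptions made for illustration: the big-endian byte order of the timestamp and body size, the names RtmpHeader and parse_header, returning the caller source as raw bytes, and ignoring bit 7 of the first byte, which the list above does not describe.

```python
from typing import NamedTuple, Optional


class RtmpHeader(NamedTuple):
    channel: int                    # bits 5-0 of the first byte
    has_caller_source: bool         # True when bit 6 of the first byte is clear
    timestamp_ms: int               # 3-byte timestamp
    body_size: int                  # 3-byte size of the message body
    function_id: int                # 1-byte function ID, e.g. 0x14 (Invoke)
    caller_source: Optional[bytes]  # 3 raw bytes, or None when absent
    header_len: int                 # offset at which the message body starts


def parse_header(data: bytes) -> RtmpHeader:
    """Parse one message header using the layout described in this article."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for the fixed header fields")

    first = data[0]
    has_caller_source = (first & 0x40) == 0  # bit 6 clear -> field is present
    channel = first & 0x3F                   # bit 7 is ignored (assumption)

    # Assumption: multi-byte fields are big-endian
    timestamp_ms = int.from_bytes(data[1:4], "big")
    body_size = int.from_bytes(data[4:7], "big")
    function_id = data[7]

    caller_source = None
    header_len = 8
    if has_caller_source:
        if len(data) < 11:
            raise ValueError("header truncated before the caller source")
        caller_source = data[8:11]
        header_len = 11

    return RtmpHeader(channel, has_caller_source, timestamp_ms, body_size,
                      function_id, caller_source, header_len)
```

Given the first bytes of a captured message, parse_header(data).function_id identifies the message type and header_len gives the offset at which the message body begins.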

All of the observed RTMP packets fit the above structure except for those involved in the initial handshaking and one unknown message sent from the Flash client to the server.

Connection and Streaming

Table 1 shows the first 26 messages sent between the client and Flash Media Server when the H.264 file was being streamed properly.

Table 1: RTMP messages sent during video streaming.
Number  Direction         Function                Description
1       Client to Server  Handshake1              Initial client handshake
2       Server to Client  Handshake2              Server response handshake
3       Client to Server  Handshake3              Final client handshake
4       Client to Server  0x14 (Invoke)           Client invokes connect() on server
5       Server to Client  0x5 (Server Bandwidth)  Bandwidth negotiation
6       Server to Client  0x6 (Client Bandwidth)  Bandwidth negotiation
7       Server to Client  0x1 (Chunk size)        Server chunk size (4096 bytes for Flash Media Server)
8       Server to Client  0x14 (Invoke)           Server invokes _result("NetConnection.Connect.Success") on client
9       Client to Server  0x5 (Server Bandwidth)  Bandwidth negotiation
10      Server to Client  0x14 (Invoke)           Server invokes onBWDone() on client
11      Client to Server  0x11 (FlexInvoke)       Client invokes createStream() on server
12      Server to Client  0x14 (Invoke)           Server invokes _result() on client
13      Client to Server  0x4 (Ping)              Client pings server
14      Client to Server  0x11 (FlexInvoke)       Client invokes play() on server with file name
15      Server to Client  0x1 (Chunk size)        Server chunk size (4096 bytes for Flash Media Server)
16      Server to Client  0x4 (Ping)              Server pings client
17      Client to Server  Unknown                 Unknown client to server message
18      Server to Client  0x4 (Ping)              Server pings client
19      Server to Client  0x14 (Invoke)           Server invokes onStatus("NetStream.Play.Reset") on client
20      Server to Client  0x14 (Invoke)           Server invokes onStatus("NetStream.Play.Start") on client
21      Server to Client  0x12 (Notify)           Server sends client a "|RtmpSampleAccess" notification
22      Server to Client  0x8 (Audio)             Empty audio message
23      Server to Client  0x12 (Notify)           Server sends client an onStatus("NetStream.Data.Start") notification
24      Server to Client  0x12 (Notify)           Server sends client an "onMetaData" notification
25      Server to Client  0x9 (Video)             Chunk of video data
26      Server to Client  0x8 (Audio)             Chunk of audio data
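
When comparing captures from two servers it helps to label each message by its function ID. The Python sketch below collects only the function IDs that appear in Table 1; the names are those shown in the table, and FUNCTION_NAMES and describe_function are hypothetical names introduced here for illustration. The list is not claimed to be complete.

```python
# Function IDs observed in Table 1, with the names shown there
FUNCTION_NAMES = {
    0x01: "Chunk size",
    0x04: "Ping",
    0x05: "Server Bandwidth",
    0x06: "Client Bandwidth",
    0x08: "Audio",
    0x09: "Video",
    0x11: "FlexInvoke",
    0x12: "Notify",
    0x14: "Invoke",
}


def describe_function(function_id: int) -> str:
    """Format a function ID the way Table 1 does, e.g. '0x14 (Invoke)'."""
    name = FUNCTION_NAMES.get(function_id, "Unknown")
    return f"0x{function_id:x} ({name})"
```

Running describe_function over the function IDs of two captures and comparing the resulting sequences is a quick way to spot where the servers start to behave differently.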

The messages in Table 1 can be divided into two basic steps:

  1. Connect: The first ten messages correspond to the execution of the following chunk of code in the Flex client:

// Create a new connection
_netConnection = new NetConnection();
_netConnection.addEventListener(NetStatusEvent.NET_STATUS, OnNetConnectionNetStatus);
var client:Object = new Object();
client.onBWDone = function():void
{
    trace("OnBWDone()");
};
_netConnection.client = client;
_netConnection.connect("rtmp://192.168.0.201/vod");

  2. Play: The eleventh message and beyond correspond to the execution of the following chunk of code in the Flex client:

// Create a new stream for video playback
_netStream = new NetStream(_netConnection);
_netStream.bufferTime = 0;

// Get status information from the stream object
_netStream.addEventListener(NetStatusEvent.NET_STATUS, OnNetStreamNetStatus);
var client:Object = new Object();
client.onMetaData = function(metadata:Object):void
{
    trace("onMetaData");
};
_netStream.client = client;

// Subscribe to the named stream
_netStream.play("mp4:simpsons.mp4");

// Attach the player to the stream
_videoRemote.attachNetStream(_netStream);

Solution

A number of differences were found in the messages exchanged between the client and server when FluorineFx and Flash Media Server were compared. Many of these differences were corrected in the FluorineFx source in an attempt to determine which of them was preventing the player from properly interpreting the stream, all without success. Ultimately the problem was solved by switching from FluorineFx to rtmpd, aka crtmpserver. This application had initially been passed over because the website was a bit lacking, but in practice it only took about five minutes to acquire the source, compile it and use it to successfully stream the same H.264 file. Advantages of rtmpd include that it is written in C++, is largely platform-independent and appears to be quite efficient. Hopefully the FluorineFx community will hunt down and fix the bug that prevented the H.264 file from streaming successfully via RTMP.