RTP timestamp in data packets vs RTCP SR packets

I'm trying to understand the difference between the RTP timestamp as it occurs in RTP data packets vs as it is used in RTCP Sender Report (SR) packets.
For the RTP timestamp in data packets I have established that:
They are not based on wall-clock time but represent more of a counter
They typically have a random offset chosen at the beginning of a session
For common narrowband audio codecs they typically increment by 160 per packet (20 ms of audio at an 8000 Hz sampling rate: 8000 samples/s * 0.020 s = 160 samples), and the timestamp keeps advancing across silent periods even when no packets are actually sent
For the RTP timestamp in RTCP sender report packets, I originally thought it was just a snapshot of the current RTP timestamp of the data packets, and that in conjunction with the NTP timestamp (which is typically wall-clock time) it could be used to calculate the wall-clock time of further incoming RTP packets. That is also what I understood from this analysis on the subject.
However, a sentence in RFC 3550 Section 6.4.1 makes me question that assumption:
Note that in most cases this timestamp will not be equal to the RTP timestamp in any adjacent data packet.
This ruins my assumption, because I assumed that the SR packet would contain an RTP timestamp that is found in a data packet that has just been sent by the same source. Unfortunately, the next sentences are pretty much meaningless to me (maybe this is an English language barrier, but they sound like non-helpful nonsense to me):
Rather, it MUST be calculated from the corresponding NTP timestamp using the relationship between the RTP timestamp counter and real time as maintained by periodically checking the wallclock time at a sampling instant.
Could you clarify for me how the RTP timestamp in an RTCP SR packet can be calculated?

The process of sending RTCP report packets is separate from sending the related RTP packet stream(s). By that I mean that they usually won't be sent at the same moment, so a sent RTP packet and an RTCP report packet will typically contain different RTP timestamp values.
As you know, a relationship exists between the NTP (wallclock) time and the RTP timestamp. Since RTCP report packets contain both an NTP timestamp and an RTP timestamp, these packets can be used to learn how the two values relate on the sender's side. Any RTP packet received from the same sender contains its own (typically different) RTP timestamp. Using the relationship learned from the received RTCP packets, that RTP timestamp can be used to calculate the wallclock time of the moment the RTP packet was sent.
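To make the relationship concrete, here is a minimal Python sketch of both directions: how a sender can derive the RTP timestamp for an SR from its wallclock, and how a receiver can map a data packet's RTP timestamp back to the sender's wallclock using the last SR. The function names, the 8000 Hz clock rate and the reference-pair bookkeeping are my own illustrative assumptions, not something prescribed by RFC 3550.

CLOCK_RATE = 8000  # assumed RTP clock rate for a narrowband audio codec

# Sender side: pick the RTP timestamp to put into an SR.
# The sender keeps one reference pair (wallclock_ref, rtp_ts_ref) sampled at some
# packetization instant and extrapolates from it to "now".
def rtp_timestamp_for_sr(wallclock_now, wallclock_ref, rtp_ts_ref):
    elapsed = wallclock_now - wallclock_ref          # seconds of real time
    return (rtp_ts_ref + int(round(elapsed * CLOCK_RATE))) & 0xFFFFFFFF

# Receiver side: map the RTP timestamp of a data packet to the sender's wallclock,
# using the (NTP, RTP) pair taken from the most recent SR.
def wallclock_of_rtp_packet(rtp_ts, sr_ntp_seconds, sr_rtp_ts):
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF        # difference modulo 2**32
    if delta >= 0x80000000:                          # packet is older than the SR
        delta -= 0x100000000
    return sr_ntp_seconds + delta / CLOCK_RATE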
The answer to this stackoverflow question might also help you.

Related

NTP Timestamp in RTCP Sender Report is Incorrect

I'm using Amazon's sample code to upload an RTSP stream from an IP camera to Kinesis Video Streams. The code is found here: https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp/blob/master/samples/kvs_gstreamer_sample.cpp
I would like to get the NTP time from the camera for each frame. My understanding is that the first step in doing this is reading the RTCP sender report to get the mapping between the camera's RTP and NTP times.
To do that, I've added a callback on receiving RTCP packets like so:
g_signal_connect_after(session, "on-receiving-rtcp", G_CALLBACK(on_rtcp_callback), data);
Then, in my callback function after getting the SR packet from the buffer, I try to get the two timestamps:
gst_rtcp_packet_sr_get_sender_info (packet, ssrc, ntptime, rtptime, packet_count, octet_count);
When comparing the 'ntptime' and 'rtptime' variables I get here with what I see in Wireshark, the RTP times match perfectly. However, the NTP time I get in my C++ code is very wrong: it shows a time from about a month ago, while the Wireshark packet shows an NTP time that appears correct.
Is there some setting causing gstreamer to overwrite the NTP time in my sender report packets, and if so, how do I disable that setting?
It turns out the NTP time provided by gst_rtcp_packet_sr_get_sender_info is not in any format I've seen before. To convert it to a meaningful timestamp you have to use gst_rtcp_ntp_to_unix, which then gives you a Unix time that actually makes sense.
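For reference, the 64-bit NTP timestamp in a sender report is a fixed-point value: the upper 32 bits are seconds since 1 January 1900 and the lower 32 bits are the fraction of a second. A rough Python sketch of the kind of conversion involved (the names are mine; GStreamer's gst_rtcp_ntp_to_unix itself returns nanoseconds) could look like this:

NTP_TO_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def ntp64_to_unix_seconds(ntp64):
    # Upper 32 bits: whole seconds since 1900; lower 32 bits: fractional second.
    seconds = ntp64 >> 32
    fraction = ntp64 & 0xFFFFFFFF
    return (seconds - NTP_TO_UNIX_OFFSET) + fraction / 2**32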

[rtp/rtcp server] How to prepare a stored media file for streaming?

I'm currently trying to understand the RTP/RTCP protocol (RFC 3550).
I know that in the common case the audio and video are streamed separately.
But if I want to stream a stored media file (such as an *.mp4) from the server,
how does the server get the individual tracks out of that media file?
RTP is all about carrying the real-time data; how you break it up and put it into an RTP packet payload (called "packetizing") is up to the implementer, but let's look at a common use case of how you'd actually do this.
If you wanted to send your existing recorded MP4 file through an RTP stream you'd first break it into smaller chunks to be sent down the wire at regular intervals packed inside RTP packets.
Let's say you've got a 10 second MP4 file and you decide your packetization interval is 1 second: we'd split it into 10x 1 second long chunks of data we can put into our RTP payloads. (In practice you could use FFmpeg or something similar to split the MP4 into 1 second chunks.)
Then we form our RTP header. We set the payload type to something custom, as there's no payload type for MP4 data assigned by IANA. We'd assign a starting sequence number, a synchronization source identifier (SSRC) and a timestamp, and then we'd fill the payload with the first 1 second of data.
1 second after that we'd increment the sequence number by 1, advance the timestamp by 1 second's worth of clock units, put the next 1 second of data into the payload and send the next RTP packet.
We'd then repeat this 8 more times until we've sent 10 RTP packets containing our 10x 1 second MP4 payloads.
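If you want to see what that loop looks like in code, here is a rough Python sketch that packs the fixed 12-byte RTP header by hand and advances the sequence number and timestamp per chunk. The payload type 96, the 90 kHz clock rate and the SSRC are arbitrary assumptions for this example, not values taken from any spec or from the library mentioned below.

import struct

CLOCK_RATE = 90000    # assumed RTP clock rate for this example
PAYLOAD_TYPE = 96     # dynamic payload type, since there is no IANA type for raw MP4 chunks
SSRC = 0x12345678     # example synchronization source identifier

def build_rtp_packet(seq, timestamp, payload, marker=False):
    # Pack the fixed 12-byte RTP header: V=2, no padding, no extension, no CSRCs.
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | PAYLOAD_TYPE
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, SSRC)
    return header + payload

chunks = [b"..." for _ in range(10)]   # placeholder for the real 1 second MP4 chunks
seq, ts = 0, 0
for chunk in chunks:
    packet = build_rtp_packet(seq, ts, chunk)
    # sock.sendto(packet, (dest_ip, dest_port))   # actual sending omitted in this sketch
    seq += 1
    ts += CLOCK_RATE                              # advance by 1 second of clock units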
If you actually wanted to go about implementing this, I wrote this simple Python library for creating RTP packets.
To learn more about RTP there's obviously RFC 3550; for a really in-depth look there's a great book by Colin Perkins called "RTP: Audio and Video for the Internet", and I've written a bit about all the RTP header fields and their meaning.
In practice, if you want to get a pre-recorded MP4 file from point A to point B, there are better protocols for it than RTP. RTP is focused on the real-time transfer of media, as in live streaming, not on transferring existing pre-recorded media files; FTP, HTTP or even some of the peer-to-peer protocols would be better suited to transferring this.

RTP: SSRC collision detection in unicast sessions

From RFC 3550:
If a receiver discovers that two other sources are colliding, it MAY keep the packets from one and discard the packets from the other when this can be detected by different source transport addresses or CNAMEs. The two sources are expected to resolve the collision so that the situation doesn't last.
In a unicast configuration with one receiver and two senders that only communicate with the receiver, how can SSRC collisions be detected by the senders?
One guess is that the receiver should periodically send all known CNAMEs to all known participants (the senders). Is that true? But in that case, how would the senders associate a received CNAME with a transport address?
Update:
As answered below, there are two separate RTP sessions with separate SSRC spaces, so no collision detection is needed.
The distinguishing feature of an RTP session is that each maintains a full, separate space of SSRC identifiers
And:
The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC (also defined below) or in RTCP.
And there is even an example for the situation I have described:
For example, consider a three-party conference implemented using unicast UDP with each participant receiving from the other two on separate port pairs. If each participant sends RTCP feedback about data received from one other participant only back to that participant, then the conference is composed of three separate point-to-point RTP sessions.
As far as I understand, this rule is applicable only to multicasting and/or packet loops. With the setup you describe (two senders unicasting to one receiver), the senders don't know about each other and have no means of detecting the collision. It is the receiver's task to deal with this issue. If the receiver is a media processor, it will likely act as an end party: reformat the stream and resend the needed content under its own SSRC.
A Goodbye can be sent with a Reason set to the appropriate value.
See http://www.ietf.org/rfc/rfc3550.txt # 6.6 BYE: Goodbye RTCP Packet
By convention I have seen the value "ssrc" used to indicate that the SSRC is changing.
Additionally, if an RTCP packet is received with a new SSRC, the SSRC of the RTP packets should probably change as well; this would be handled when verifying the sequence number: if the SSRC has changed but the sequence number is still valid, then the new SSRC will be used.
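For illustration, a BYE packet with a reason string is small enough to build by hand. This Python sketch follows the layout given in RFC 3550 section 6.6, with the SSRC and the reason text as example inputs:

import struct

def build_rtcp_bye(ssrc, reason=b"ssrc"):
    # Optional reason: one length byte plus the text, padded to a 32-bit boundary.
    reason_part = b""
    if reason:
        reason_part = struct.pack("!B", len(reason)) + reason
        reason_part += b"\x00" * ((-len(reason_part)) % 4)
    # The length field counts 32-bit words minus one for the whole packet.
    length_words = (4 + 4 + len(reason_part)) // 4 - 1
    header = struct.pack("!BBH", (2 << 6) | 1, 203, length_words)  # V=2, SC=1, PT=203 (BYE)
    return header + struct.pack("!I", ssrc) + reason_part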

Which method to send/receive data properly in a network game (UDP, but why not TCP)

I have a C++ application with a GUI that runs (on PC 1) just like a network game, and receives data packets from another computer (2) via WiFi (ad-hoc, so it's quite reliable) at fairly regular intervals (like 40 ms), once per loop of program (2). I use send/read.
Here is the problem:
- Packets are not always fully sent (but apparently you can simply keep send()ing the remaining data until all of it is sent, and that works well)
- More importantly, packets pile up in the socket during (1)'s loop until the read() occurs, and then there is no way to distinguish packets in the big stream of data, or to know whether you were already in the middle of a packet.
I tried to fix this with ID headers (you find an ID as the first bytes and you know the length of the packet), but I often get lost (unknown ID: we are not at the beginning of a packet) and am forced to discard all the remaining data.
So my question is:
Why do packets stack? (generally I have 400B of data whereas my packets are <100B long and fps (1) and (2) are not very different)
How can I have a more reliable way to receive actual packets, say, 80% of packets (discarding packet loss, it's not a question of UDP/TCP)?
Would a separate thread for receiving packets work? (on (1), the server)
How do real-time network games do that (including multiple client management)?
Thanks in advance.
(Sorry I do not have the code here, but I tried to be as clear as I could)
Well:
1) UDP transfers MESSAGES, but is unreliable.
2) TCP transfers BYTE STREAMS, and is reliable.
UDP cannot reliably transfer messages. Anything more reliable requires a protocol on top of UDP.
TCP cannot transfer messages unless they are one byte long. Anything more complex requires a protocol on top of TCP.
Why do packets stack? (generally I have 400B of data whereas my packets are <100B long and fps (1) and (2) are not very different)
Because the time to send packets across the net varies, it typically does not make sense to send packets at a high rate, so most networking libraries (e.g. RakNet) will queue up packets and do a send every 10 ms.
In the case of TCP, there is Nagle's algorithm, which is a more principled way of doing the same thing. You can turn Nagle's algorithm off by setting the TCP_NODELAY socket option.
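Disabling it is a one-line socket option; in Python, for example:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # disable Nagle's algorithm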
How can I have a more reliable way to receive actual packets, say, 80% of packets (discarding packet loss, it's not a question of UDP/TCP)?
If you use TCP, you will receive all of the packets and in the right order. The penalty for using TCP is if a packet is dropped, the packets after it wait until that packet can be resent before they are processed. This results in a noticeable delay, so any games that use TCP have sophisticated prediction techniques to hide this delay and other techniques to smoothly "catch up" once the missing packet arrives.
If you use UDP, you can implement a layer on top that gives you reliability without ordering (if the order of the packets doesn't matter) by sending a counter with each packet and having the receiver repeatedly notify the sender of gaps in the counts. You can also implement ordering by doing something similar. Of course, if you enforce both, then you are essentially recreating your own TCP layer. See http://www.jenkinssoftware.com/raknet/manual/reliabilitytypes.html for more details.
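A bare-bones sketch of that counter idea (how the gap report is actually fed back to the sender is left out, and the function name is made up):

def find_gaps(received_counters):
    # Report counters that have not arrived yet, up to the highest one seen,
    # so the receiver can ask the sender to resend them.
    if not received_counters:
        return []
    highest = max(received_counters)
    return [c for c in range(highest) if c not in received_counters]

print(find_gaps({0, 1, 2, 4, 6}))   # -> [3, 5]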
What you describe is what would happen if you are using TCP without a protocol on top of it to structure your transmitted data. Your idea of using an ID header and a packet length is one such protocol. If you send a 4-byte ID followed by a 4-byte length followed by X bytes, then the receiver knows that it has to read 4 bytes followed by 4 bytes followed by X bytes to receive a complete packet. It doesn't get much simpler than that. The fact that you are still having problems reading packets with such a simple protocol suggests that your underlying socket reading code is flawed to begin with. Without seeing your actual code, it is difficult to tell you what you are doing wrong.
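The key detail is that a TCP read() can return any slice of the byte stream, so the receiver has to loop until it has exactly the number of bytes the header promised. A Python sketch of such a framed read, using the 4-byte ID plus 4-byte length layout described above (the names are mine):

import struct

def recv_exact(sock, n):
    # recv() on TCP may return fewer bytes than requested, so loop until n bytes arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_message(sock):
    # One framed message: 4-byte ID, 4-byte length, then that many payload bytes.
    msg_id, length = struct.unpack("!II", recv_exact(sock, 8))
    return msg_id, recv_exact(sock, length)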

Marker Bit In RTP for Voice Samples for codec like AMR and G729

I want to know the significance of the marker bit in RTP for voice packets, and whether there is any RFC that specifies it.
I know that for video packets the marker bit means the last packet of the same image, i.e. it is the last packet with the PTS timestamp corresponding to that image, but for voice packets with a codec such as AMR-NB, G.711 a-law or G.729, the marker bit is usually false in each RTP packet.
So, does the meaning of the marker bit change in this case of RTP packets?
If you analyse Wireshark traces for any audio codec, say AMR, you will make the following observations:
For voice packets, the marker bit indicates the beginning of a talkspurt. Beginnings of talkspurts are good opportunities to adjust the playout delay at the receiver to compensate for differences between the sender and receiver clock rates as well as changes in the network delay jitter. Packets during a talkspurt need to be played out continuously, while listeners generally are not sensitive to slight variations in the durations of a pause.
The marker bit is a hint; the beginning of a talkspurt can also be computed by comparing the difference in timestamps and sequence numbers between two packets, assuming the timestamp clock rate is known.
Packets may arrive out of order, so that the packet with the marker bit is received after the second packet in the talkspurt. As long as the playout delay is longer than this reordering, the receiver can still perform delay adaptation. If not, it simply has to wait for the next talkspurt.
Source: http://www.cs.columbia.edu/~hgs/rtp/faq.html#marker
The same thing can be read here too.
http://msdn.microsoft.com/en-us/library/dd944715(v=office.12).aspx
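The "compute it from timestamps and sequence numbers" remark in the quote above boils down to checking whether the timestamp advanced by more than the packets that were actually sent can account for. A small sketch of that heuristic, assuming an 8000 Hz clock and 160 samples per packet (both assumptions of this example):

CLOCK_RATE = 8000
SAMPLES_PER_PACKET = 160   # 20 ms of audio at 8000 Hz

def starts_talkspurt(prev_seq, prev_ts, seq, ts, marker):
    # A new talkspurt begins if the marker bit is set, or if the timestamp jumped
    # further than the sequence-number gap alone would explain (i.e. time passed
    # during which nothing was transmitted).
    if marker:
        return True
    seq_delta = (seq - prev_seq) & 0xFFFF
    ts_delta = (ts - prev_ts) & 0xFFFFFFFF
    return ts_delta > seq_delta * SAMPLES_PER_PACKET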
As per RFC 3550:
marker (M): 1 bit
The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream. A profile MAY define additional marker bits or specify that there is no marker bit by changing the number of bits in the payload type field.
My understanding is that for voice, the data required for a single frame (usually 20 ms) is not so big that it has to be split across more than one RTP packet.
So, for voice packets the marker bit means the start of a new talkspurt, and the timestamp should be taken from there.
When you look at video packets (like H.261, H.263, ...), a single frame requires multiple RTP packets. In that case the marker bit represents the end of a single frame, and after receiving it you can start parsing the whole frame.
The marker bit is also used for DTMF in the RFC 2833 case, where a single event is represented by multiple RTP packets.