Marker bit in RTP for voice samples for codecs like AMR and G.729

I want to know the significance of the marker bit in RTP for voice packets, and whether there is any RFC that specifies it.
I know that for video packets the marker bit means the last packet of the same image, i.e. the last packet with the PTS timestamp corresponding to that image. But for voice packets, with a codec such as AMR-NB, G.711 a-law or G.729, the marker bit is usually false in each RTP packet.
So, does the meaning of the marker bit change in this case?
Regards
Nitin

For audio codecs, if you analyse a Wireshark trace for any codec, say AMR, you will make the following observations.
For voice packets, the marker bit indicates the beginning of a talkspurt. Beginnings of talkspurts are good opportunities to adjust the playout delay at the receiver to compensate for differences between the sender and receiver clock rates as well as changes in the network delay jitter. Packets during a talkspurt need to be played out continuously, while listeners generally are not sensitive to slight variations in the durations of a pause.
The marker bit is a hint; the beginning of a talkspurt can also be computed by comparing the difference in timestamps and sequence numbers between two packets, assuming the timestamp clock rate is known.
Packets may arrive out of order, so that the packet with the marker bit is received after the second packet in the talkspurt. As long as the playout delay is longer than this reordering, the receiver can still perform delay adaptation. If not, it simply has to wait for the next talkspurt.
Source: http://www.cs.columbia.edu/~hgs/rtp/faq.html#marker
The same thing can be read here too.
http://msdn.microsoft.com/en-us/library/dd944715(v=office.12).aspx
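Here is a minimal sketch of the heuristic the FAQ describes (my own illustration, not from the FAQ, assuming an 8000 Hz clock and 20 ms packets): a new talkspurt is signalled either by the marker bit or by a timestamp jump larger than the sequence-number gap accounts for.

    CLOCK_RATE = 8000          # samples per second (AMR-NB, G.711, G.729 narrow-band)
    SAMPLES_PER_PACKET = 160   # 20 ms of audio at 8000 Hz

    def is_talkspurt_start(prev, cur):
        """prev and cur are (seq, timestamp, marker) tuples for consecutive packets."""
        if prev is None:
            return True                            # first packet of the stream
        seq_prev, ts_prev, _ = prev
        seq_cur, ts_cur, marker = cur
        if marker:
            return True                            # sender flagged a new talkspurt
        # Without the marker hint: the timestamp jumped by more than the
        # sequence-number gap accounts for, so packets were suppressed (silence).
        seq_delta = (seq_cur - seq_prev) & 0xFFFF
        ts_delta = (ts_cur - ts_prev) & 0xFFFFFFFF
        return ts_delta > seq_delta * SAMPLES_PER_PACKET

    # Example: a silence gap between seq 10 and seq 11 (timestamp jumps by 480, not 160)
    print(is_talkspurt_start((10, 16000, False), (11, 16480, False)))   # True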

As per RFC 3550:
marker (M): 1 bit
The interpretation of the marker is defined by a profile. It is
intended to allow significant events such as frame boundaries to
be marked in the packet stream. A profile MAY define additional
marker bits or specify that there is no marker bit by changing the
number of bits in the payload type field.
My understanding is that for voice, the data required for a single frame (usually 20 ms) is not so big that it needs to be split across more than one RTP packet.
So for a voice packet the marker bit means the start of a new talkspurt, and the timestamp should be taken from there.
When you look into video (like H.261, H.263, ...), a single frame requires multiple RTP packets. In that case the marker bit represents the end of a single frame, and after receiving it you can start parsing the whole frame.
This is also used for DTMF in the RFC 2833 case, where a single event is represented by multiple RTP packets.
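For reference, here is a small hypothetical helper (not taken from any particular library) that pulls the marker bit and payload type out of the 12-byte fixed RTP header defined in RFC 3550:

    import struct

    def parse_rtp_header(data: bytes):
        if len(data) < 12:
            raise ValueError("RTP packet shorter than the 12-byte fixed header")
        b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
        return {
            "version": b0 >> 6,
            "marker": bool(b1 & 0x80),   # M bit: talkspurt start (audio) or frame end (video)
            "payload_type": b1 & 0x7F,
            "sequence_number": seq,
            "timestamp": timestamp,
            "ssrc": ssrc,
        }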

Related

RTP timestamp in data packets vs RTCP SR packets

I'm trying to understand the difference between the RTP timestamp as it occurs in RTP data packets vs as it is used in RTCP Sender Report (SR) packets.
For the RTP timestamp in data packets I have established that:
They are not based on wall-clock time but represent more of a counter
They typically have a random offset chosen at the beginning of a session
For simple standard audio codecs they typically increment by 160 per packet (roughly speaking, 1000 ms / 20 ms = 50 packets per second, and 50 × 160 samples = 8000 samples per second, i.e. the 8 kHz clock rate), and the increments continue across silent periods even when no packets are sent
For the RTP timestamp in RTCP sender report packets, I originally thought they were just a snapshot of the current RTP timestamp of the data packets, and that in conjunction with the NTP timestamp (which is typically wall-clock) they could be used to calculate the wall-clock time of further incoming RTP packets, which is also what I understood from this analysis on the subject.
However, a sentence in RFC 3550, Section 6.4.1 makes me wonder about that assumption:
Note that in most cases this timestamp will not be equal to the RTP timestamp in any adjacent data packet.
This ruins my assumption, because I assumed that the SR packet would contain an RTP timestamp that is found in a data packet that has just been sent by the same source. Unfortunately the next sentences are pretty much meaningless to me (maybe this is a language barrier, but they sound like non-helpful nonsense to me):
Rather, it MUST be
calculated from the corresponding NTP timestamp using the
relationship between the RTP timestamp counter and real time as
maintained by periodically checking the wallclock time at a
sampling instant.
Could you clarify for me how the RTP timestamp in an RTCP SR packet can be calculated?
The process of sending RTCP report packets is separate from sending the related RTP packet stream(s). By that I mean that they usually won't be sent at the same time. Therefore a sent RTP packet and an RTCP report packet will typically contain different RTP timestamp values.
As you know a relationship exists between the NTP (wallclock) time and the RTP timestamp. Since RTCP report packets contain both an NTP timestamp and an RTP timestamp these packets can be used to learn how these values relate at the side of the sender of the RTCP packet. Any RTP packets received from the same sender will contain their own (typically different) RTP timestamp. Using the relationship learned from the received RTCP packets this RTP timestamp can be used to calculate the wallclock time of the moment the RTP packet was sent.
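As a rough sketch of that mapping (my own illustration, assuming an 8000 Hz audio clock and an SR whose NTP timestamp has already been converted to seconds), the receiver-side calculation could look like this:

    CLOCK_RATE = 8000  # RTP clock rate negotiated for the stream

    def rtp_to_wallclock(rtp_ts, sr_ntp_seconds, sr_rtp_ts):
        """Estimate the sender wallclock time (in seconds) of an RTP data packet,
        using the NTP/RTP timestamp pair from the last RTCP Sender Report."""
        delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
        if delta > 0x7FFFFFFF:          # treat the 32-bit difference as signed,
            delta -= 0x100000000        # so packets sent just before the SR also work
        return sr_ntp_seconds + delta / CLOCK_RATE

    # Example: the SR said NTP 1000.0 s corresponds to RTP 160000; a packet with
    # RTP timestamp 160800 was therefore sent about 0.1 s after that instant.
    print(rtp_to_wallclock(160800, 1000.0, 160000))   # 1000.1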
The answer to this stackoverflow question might also help you.

[rtp/rtcp server] How to prepare a stored media file for streaming?

I'm trying to understand the RTP/RTCP protocol (RFC 3550).
I know that in the common case audio and video are streamed separately.
But if I want to stream a stored media file (such as *.mp4) from the server,
how does the server get those tracks from that media file?
RTP is all about carrying the real-time data; how you break it up and put it into an RTP packet payload (called "packetizing") is up to the implementer, but let's look at a common use case of how you'd actually do this.
If you wanted to send your existing recorded MP4 file through an RTP stream you'd first break it into smaller chunks to be sent down the wire at regular intervals packed inside RTP packets.
Let's say you've got a 10 second MP4 file and you decide your packetization timer is 1 second: we'd split it into ten 1 second long chunks of data we can put into our RTP payloads. (In practice you could use FFmpeg or something similar to split the MP4 into 1 second chunks.)
Then we form our RTP header: we set the payload type to something custom, as there's no payload type for MP4 data assigned by IANA. We'd assign a starting sequence number, a synchronization source identifier and a timestamp, and then we'd fill the payload with the first 1 second of data.
1 second after that we'd increment the sequence number by 1, add 1 second to the timestamp, put the next 1 second of data into the payload and send the next RTP packet.
We'd then repeat this 8 more times until we've sent 10 RTP packets containing our 10x 1 second MP4 payloads.
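A bare sketch of that loop might look like this (the dynamic payload type 96, the 90 kHz clock and the SSRC value are arbitrary choices for the illustration, not anything mandated for MP4 data):

    import struct

    PAYLOAD_TYPE = 96      # dynamic payload type; IANA assigns none for raw MP4 chunks
    CLOCK_RATE = 90000     # ticks per second chosen for this example
    SSRC = 0x1234ABCD      # made-up synchronization source identifier

    def build_rtp_packet(seq, timestamp, payload, marker=False):
        b0 = 2 << 6                                  # version 2, no padding, extension or CSRCs
        b1 = (0x80 if marker else 0) | PAYLOAD_TYPE  # marker bit + payload type
        header = struct.pack("!BBHII", b0, b1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, SSRC)
        return header + payload

    # Usage (needs the socket and time modules, plus `chunks`, the ten 1-second slices):
    # sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # for i, chunk in enumerate(chunks):
    #     sock.sendto(build_rtp_packet(1000 + i, i * CLOCK_RATE, chunk), ("192.0.2.10", 5004))
    #     time.sleep(1.0)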
If you actually wanted to go about implementing this, I wrote a simple Python library for creating RTP packets.
To learn more about RTP there's obviously RFC 3550, for a really in depth look at RTP there's a great book by Colin Perkins called "RTP: Audio and Video for the Internet" and I've written a bit about all the RTP headers and their meaning.
In practice, if you want to get a pre-recorded MP4 file from point A to point B, there are better protocols for it than RTP. RTP is focused on the real-time transfer of media, as in live streaming, not on transferring existing pre-recorded media files; FTP, HTTP or even some of the peer-to-peer protocols would be better suited to transferring this.

Efficiently send a stream of UDP packets

I know how to open a UDP socket in C++, and I also know how to send packets through it. When I send a packet I correctly receive it on the other end, and everything works fine.
EDIT: I also built a fully working acknowledgement system: packets are numbered, checksummed and acknowledged, so at any time I know how many of the packets that I sent, say, during the last second were actually received by the other endpoint. The data I am sending will be readable only when ALL the packets have been received, so I really don't care about packet ordering: I just need them all to arrive. They could arrive in any order and it would still be fine, since having them sequentially ordered would be useless anyway.
Now, I have to transfer a big big chunk of data (say 1 GB) and I'd need it to be transferred as fast as possible. So I split the data in say 512 bytes chunks and send them through the UDP socket.
Now, since UDP is connectionless it obviously doesn't provide any speed or transfer efficiency diagnostics. So if I just try to send a ton of packets through my socket, my socket will just accept them, then they will be sent all at once, and my router will send the first couple and then start dropping them. So this is NOT the most efficient way to get this done.
What I did then was to make a cycle:
Sleep for a while
Send a bunch of packets
Sleep again and so on
I tried to do some calibration and achieved pretty good transfer rates. However, I have a thread that is continuously sending packets in small bunches, and I have nothing but an experimental idea of what the interval should be and what the size of each bunch should be. In principle, I can imagine that sleeping for a really small amount of time and then sending just one packet at a time would be the best solution for the router, but it is completely unfeasible in terms of CPU performance (I would probably need to busy-wait, since the time between two consecutive packets would be really small).
So is there any other solution? Any widely accepted solution? I assume that my router has a buffer or something like that, so that it can accept SOME packets all at once, and then it needs some time to process them. How big is that buffer?
I am not an expert in this so any explanation would be great.
Please note, however, that for technical reasons there is no way at all I can use TCP.
As mentioned in some other comments, what you're describing is a flow control system. The wikipedia article has a good overview of various ways of doing this:
http://en.wikipedia.org/wiki/Flow_control_%28data%29
The solution that you have in place (sleeping for a hard-coded period between packet groups) will work in principle, but in order to get reasonable performance in a real-world system you need to be able to react to changes in the network. This means implementing some kind of feedback where you automatically adjust both the outgoing data rate and the packet size in response to network characteristics, such as throughput and packet loss.
One simple way of doing this is to use the number of re-transmitted packets as an input into your flow control system. The basic idea would be that when you have a lot of re-transmitted packets, you would reduce the packet size, reduce the data rate, or both. If you have very few re-transmitted packets, you would increase packet size & data rate until you see an increase in re-transmitted packets.
That's something of a gross oversimplification, but I think you get the idea.
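A very rough sketch of such a feedback loop (my own illustration, with made-up thresholds and placeholder callables for the send routine, the data source and the retransmission statistics): raise the rate additively while retransmissions stay low, and back off multiplicatively when they climb.

    import time

    PACKET_SIZE = 512               # bytes per datagram, as in the question
    MIN_RATE, MAX_RATE = 50, 20000  # packets per second

    def send_with_rate_control(send_packet, next_payload, retransmit_ratio):
        """send_packet, next_payload and retransmit_ratio are supplied by the
        caller: the UDP send routine, the data source, and a callable returning
        the recent fraction of packets that needed retransmission."""
        rate = 200.0                        # starting guess, packets per second
        while True:
            payload = next_payload(PACKET_SIZE)
            if payload is None:             # no more data to send
                break
            send_packet(payload)
            if retransmit_ratio() > 0.02:
                rate = max(MIN_RATE, rate * 0.7)   # back off hard when loss rises
            else:
                rate = min(MAX_RATE, rate + 10)    # probe gently for more bandwidth
            time.sleep(1.0 / rate)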

Which method to send/receive data properly in a network game (UDP, but why not TCP)

I have a C++ application with a GUI that runs (on PC 1) just like a network game, and receives data packets from another computer (2) via WiFi (ad-hoc, so it's quite reliable) at fairly regular intervals (like 40 ms), once per loop of program (2). I use send/read.
Here is the problem:
- Packets are not always fully sent (but apparently you can simply keep send()ing the remaining data until all of it is sent, and that works well)
- More importantly, packets are stacked in the socket during (1)'s loop until the read() occurs, and then there is no way to distinguish packets in the big stream of data, or know if you were already in the middle of a packet.
I tried to fix this with ID headers (you find an ID as the first bytes and you know the length of the packet), but I often get lost (unknown ID: we are not at the beginning of a packet) and am forced to ignore all the remaining data.
So my question is:
Why do packets stack? (generally I have 400B of data whereas my packets are <100B long and fps (1) and (2) are not very different)
How can I have a more reliable way to receive actual packets, say, 80% of packets (discarding packet loss, it's not a question of UDP/TCP)?
Would a separate thread for receiving packets work? (on (1), the server)
How do real-time network games do that (including multiple client management)?
Thanks in advance.
(Sorry I do not have the code here, but I tried to be as clear as I could)
Well:
1) UDP transfers MESSAGES, but is unreliable.
2) TCP transfers BYTE STREAMS, and is reliable.
UDP cannot reliably transfer messages. Anything more reliable requires a protocol on top of UDP.
TCP cannot transfer messages unless they are one byte long. Anything more complex requires a protocol on top of TCP.
Why do packets stack? (generally I have 400B of data whereas my packets are <100B long and fps (1) and (2) are not very different)
Because the time to send packets across the net varies, it typically does not make sense to send packets at a high rate, so most networking libraries (e.g. RakNet) will queue up packets and do a send every 10 ms.
In the case of TCP, there is Nagle's algorithm, which is a more principled way of doing the same thing. You can turn Nagle's algorithm off by setting the TCP_NODELAY socket option.
How can I have a more reliable way to receive actual packets, say, 80% of packets (discarding packet loss, it's not a question of UDP/TCP)?
If you use TCP, you will receive all of the packets and in the right order. The penalty for using TCP is if a packet is dropped, the packets after it wait until that packet can be resent before they are processed. This results in a noticeable delay, so any games that use TCP have sophisticated prediction techniques to hide this delay and other techniques to smoothly "catch up" once the missing packet arrives.
If you use UDP, you can implement a layer on top that gives you reliability without ordering (if the order of the packets doesn't matter) by sending a counter with each packet and having the receiver repeatedly notify the sender of gaps in the counts. You can also implement ordering by doing something similar. Of course, if you enforce both, then you are recreating your own TCP layer. See http://www.jenkinssoftware.com/raknet/manual/reliabilitytypes.html for more details.
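As a bare sketch of that counter-plus-gap-report idea (my own illustration, not RakNet's implementation; send_nack is a hypothetical callback that ships the report back to the sender):

    class GapTracker:
        """Receiver-side bookkeeping: remember which sequence numbers arrived
        and report the ones still missing below the highest seen."""
        def __init__(self):
            self.highest = -1
            self.received = set()

        def on_packet(self, seq):
            self.received.add(seq)
            self.highest = max(self.highest, seq)

        def missing(self):
            return sorted(set(range(self.highest + 1)) - self.received)

    # Receiver loop (hypothetical):
    # tracker = GapTracker()
    # tracker.on_packet(seq_from_header)           # for every incoming datagram
    # every ~100 ms: send_nack(tracker.missing())  # sender resends what is listed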
What you describe is what would happen if you are using TCP without a protocol on top of it to structure your transmitted data. Your idea of using an ID header and a packet length is one such protocol. If you send a 4-byte ID followed by a 4-byte length followed by X bytes of data, then the receiver knows that it has to read 4 bytes followed by 4 bytes followed by X bytes to receive a complete packet. It doesn't get much simpler than that. The fact that you are still having problems reading packets with such a simple protocol suggests that your underlying socket reading code is flawed to begin with. Without seeing your actual code, it is difficult to tell you what you are doing wrong.
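For illustration, a minimal receive loop for that ID + length framing over a TCP stream might look like this (assuming both fields are 4-byte big-endian integers, which is just a convention chosen for the sketch):

    import struct

    def recv_exact(sock, n):
        """Keep reading until exactly n bytes have arrived."""
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed mid-message")
            buf += chunk
        return buf

    def recv_message(sock):
        msg_id, length = struct.unpack("!II", recv_exact(sock, 8))
        payload = recv_exact(sock, length)
        return msg_id, payload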

Search for i-frame in RTP Packet

I am implementing RTSP in C# using an Axis IP camera. Everything is working fine, but when I try to display the video, the first few frames contain lots of green patches. I suspect the issue is that I am not sending the I-frame to the client first.
Hence, I want to know the algorithm required to detect an I-frame in an RTP packet.
When initiating an RTSP session, the server normally starts the RTP stream with configuration data followed by the first I-frame.
It is conceivable that your Axis camera is set to "always multicast"; in this case the RTSP communication leads to an SDP description which tells the client all the necessary network and streaming details for receiving the multicast stream.
Since the multicast stream is always present, you most probably receive some P- or B-frames first (depending on the GOP size).
You can detect these P/B-frames in your RTP client the same way you were detecting the I-frames, as suggested by Ralf, by identifying them via the NAL unit type. Simply skip all frames in the RTP client until you receive the first I-frame.
Now you can forward all following frames to the decoder.
Or you have to change your camera settings!
jens.
PS: don't forget that you have fragmentation in your RTP stream; that means that besides the RTP header there is some fragmentation information. Before identifying a frame you have to reassemble it.
It depends on the video media type. If you take H.264, for instance, you would look at the NAL unit header to check the NAL unit type.
The green patches can indeed be caused by not having received an I-frame first.
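As a sketch of that check for H.264 payloads (assuming RFC 6184 packetization and that the 12-byte RTP header has already been stripped; FU-A fragments carry the real type in the FU header, which is why reassembly matters before deciding whether you have an IDR frame):

    def is_idr_frame(payload: bytes) -> bool:
        """Return True if this RTP payload starts an IDR (I-) frame."""
        if not payload:
            return False
        nal_type = payload[0] & 0x1F
        if nal_type == 5:                           # single NAL unit: IDR slice
            return True
        if nal_type == 28 and len(payload) > 1:     # FU-A fragmentation unit
            fu_header = payload[1]
            start_of_fragment = bool(fu_header & 0x80)
            return start_of_fragment and (fu_header & 0x1F) == 5
        return False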