How to debug packet loss? - c++

I wrote a C++ application (running on Linux) that serves an RTP stream of about 400 kbps. To most destinations this works fine, but some destinations experience packet loss. The problematic destinations seem to have a slower connection in common, but it should still be plenty fast enough for the stream I'm sending.
Since these destinations are able to receive similar RTP streams for other applications without packet loss, my application might be at fault.
I already verified a few things:
- in a tcpdump, I see all RTP packets going out on the sending machine
- there is a UDP send buffer in place (I tried sizes between 64KB and 300KB)
- the RTP packets mostly stay below 1400 bytes to avoid fragmentation
What can a sending application do to minimize the possibility of packet loss, and what would be the best way to debug such a situation?

Don't send out packets in big bursty chunks.
The packet loss is usually caused by slow routers with limited packet buffer sizes. The slow router might be able to handle 1 Mbps just fine if it has time to send out say, 10 packets before receiving another 10, but if the 100 Mbps sender side sends it a big chunk of 50 packets it has no choice but to drop 40 of them.
Try spreading out the sending so that you write only what is necessary to write in each time period. If you need to send five packets per second, send one packet every 200 ms rather than all five in a burst at the start of each second, as in the sketch below.
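A minimal pacing sketch (the send function, the 200 ms interval, and the bounded loop are illustrative placeholders, not taken from the question):

#include <chrono>
#include <cstdio>
#include <thread>

// Stand-in for the application's real RTP send call.
static void send_one_rtp_packet() { std::puts("packet sent"); }

int main() {
    using clock = std::chrono::steady_clock;
    const auto interval = std::chrono::milliseconds(200); // 5 packets/second
    auto next = clock::now();
    for (int i = 0; i < 25; ++i) {        // bounded loop for the demo
        send_one_rtp_packet();
        next += interval;                 // absolute deadlines avoid drift
        std::this_thread::sleep_until(next);
    }
}

Using sleep_until with an absolute deadline, rather than sleep_for with a relative one, keeps the average rate steady even when a send call occasionally takes longer.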

netstat has several useful options for debugging the situation.
The first is netstat -su (dump UDP statistics):
dima@linux-z8mw:/media> netstat -su
IcmpMsg:
InType3: 679
InType4: 20
InType11: 548
OutType3: 100
Udp:
12945 packets received
88 packets to unknown port received.
0 packet receive errors
13139 packets sent
RcvbufErrors: 0
SndbufErrors: 0
UdpLite:
InDatagrams: 0
NoPorts: 0
InErrors: 0
OutDatagrams: 0
RcvbufErrors: 0
SndbufErrors: 0
IpExt:
InNoRoutes: 0
InTruncatedPkts: 0
InMcastPkts: 3877
OutMcastPkts: 3881
InBcastPkts: 0
OutBcastPkts: 0
InOctets: 7172779304
OutOctets: 785498393
InMcastOctets: 525749
OutMcastOctets: 525909
InBcastOctets: 0
OutBcastOctets: 0
Notice "RcvbufErrors" and "SndbufErrors"
Another option is to monitor the receive and send UDP buffers of the process:
dima@linux-z8mw:/media> netstat -ua
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 *:bootpc                *:*
udp        0      0 *:40134                 *:*
udp        0      0 *:737                   *:*
udp        0      0 *:mdns                  *:*
Here you need to look at the Recv-Q and Send-Q columns of the connection you're interested in. If the values are high and don't drop to zero, then the process cannot handle the load.
You can use these commands on sending and on receiving machine.
You can also use mtr, which combines traceroute and ping: it pings each hop in the route. This may detect a slow hop in your route. Run it on both machines to check connectivity to the other one.
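For example (the hostname is a placeholder; -u asks mtr to probe with UDP, which is closer to your RTP traffic than the default ICMP):

mtr --report -u receiver.example.com

--report runs a fixed number of probe cycles and prints per-hop loss and latency statistics instead of the interactive display.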

RTP typically uses UDP, which is inherently lossy. Packets could be lost anywhere between sender and receiver, so local debug will show you nothing useful.
Obvious things to do:
a: Reduce the overall data rate.
b: Reduce the 'peak' data rate by sending small packets more often, rather than one huge chunk every few seconds; i.e., reduce your UDP send buffer, maybe even to just 1400 bytes.
c: See if you can switch to a TCP variant of RTP.
If all else fails, Wireshark is your friend. It will give you a true picture of how much data is being sent by your app, and when.

You should try reducing the rate at which you send packets. A slow connection can mean all sorts of things, and sending packets (small or large) at a high rate won't help.

This may not be the answer you want, but if I had packet loss problems I'd try to switch my application to use TCP, and have most worries of packet loss taken off my mind.

Related

Understanding the TCP packet size limit with UDP packet size limit & what it means at boost::asio level of programming

I am using boost::asio to do UDP as well as TCP communication in my Client app & Server applications. I found that I am only able to transmit data of size 65535 bytes using UDP as it seems to be the max packet size in UDP.
Is the same max packet size limit of 65535 bytes also there in TCP? But I am able to send chunks larger than the max packet size using boost::asio::write in TCP and read them all fine on the client app. I see that I don't have to bother about the max packet size in TCP, but in UDP I have to ensure each socket.send_to is done with a buffer smaller than the max packet size.
How does this work? Is this because TCP is stream-based and takes care of creating packets at the lower layer?
Is there some way I can increase the max packet size in UDP ?
Is it possible that some of the bytes of a UDP packet I sent from the server side could be missing when I read it on the client side? If yes, is there at least a way to detect the loss on the client side of UDP?
TCP takes care of transmission control (that's actually what the T and C stand for in TCP). You usually don't bother with the amount of data you send to a TCP socket, because it manages on its own how much data to send in each packet. A TCP segment is ultimately limited by the 65535-byte IP packet size, but you usually don't think about it, because TCP is rather complex and can do a lot of things.
UDP, however, lacks any control mechanism and is kept as simple as possible, so you need to decide how much data to send with each packet. The maximum size is again 65535 bytes, because there are only two bytes in the UDP header to specify the length of a message. Another thing to consider when deciding on a UDP packet size is that lower-level protocols have their own limits too (~64K for IP, ~1500 bytes for Ethernet).
You can't increase the maximum size of a UDP packet, and you generally don't want to, because large UDP packets can be dropped without any notice. Other answers on SO suggest using 512-8K packets for datagrams over the internet.
It is possible to receive a UDP datagram with damaged bytes (not "missing" ones, though). But each packet is covered by a checksum, so the client will know if the datagram has been damaged in transit. Detecting datagrams that are lost entirely requires your own mechanism, such as a sequence number; see the sketch below.
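A minimal sketch of such application-level loss detection, assuming in-order delivery for simplicity (all names are illustrative):

#include <cstdint>
#include <cstdio>

struct DatagramHeader {
    uint32_t seq;   // sender increments this by one per datagram
};

// Client side: call with each arriving header; expected starts at 0.
void on_datagram(const DatagramHeader& h, uint32_t& expected) {
    if (h.seq != expected)
        std::printf("lost %u datagram(s)\n", h.seq - expected);
    expected = h.seq + 1;
}

A real implementation would also cope with reordering and duplicates, e.g. by tracking a window of recently seen sequence numbers.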
The problem is not so much related to UDP and TCP as it is to IP.
UDP and TCP are transport protocols, which do not define a maximum packet (or segment) size.
IP is a network protocol. An IP packet can contain at most 65535 (2^16 - 1) bytes, since two bytes are used to encode the packet size.
Large IP packets are divided into fragments. If one of the fragments is lost or corrupted, the entire IP packet is lost. The size of the fragments depends on the link layer protocol, usually Ethernet. For Ethernet the usual maximum is 1500 bytes, or more if jumbo frames are allowed.
So, if you transmit UDP packets larger than 1500 bytes, each may be divided into several fragments. This is normally fine if there are no losses on the network. However, if there are losses, the impact grows with the number of dependent fragments. For example, consider a network with 1% loss: a UDP packet of 65535 bytes will most likely be divided into about 44 fragments, so the probability of the whole packet being received is (1 - 0.01)^44 ≈ 64%...
This is also why many TCP implementations and UDP-based applications use packets of at most 1500 bytes.
Capturing and inspecting corrupted packets is a nontrivial task; look at libraries like libpcap.

dropped frames over UDP

this is my first "question", I hope I do it right :)
I am experimenting with network programming and in particular I want to broadcast data from one machine to some other >10 devices using UDP, over a wireless network. The data comes in packets of about 300 bytes, and at about 30 frames per second, i.e., one every ~33ms.
My implementation is based on the qt example: http://qt-project.org/doc/qt-4.8/network-broadcastreceiver.html
I am testing the application with just one client and experiencing quite a few dropped frames, and I'm not really sure why. Everything works fine if I use Ethernet cables. I hope someone here can help me find the reason.
I can spot dropped frames because the packets contain a timestamp: After I receive one datagram, I can check for the difference between its timestamp and the last one received, if this is greater than e.g. 50ms, it means that I lost one packet on the way.
This happens quite often, even though I have a dedicated Wi-Fi network (not connected to the internet and with just 3 machines connected to a router I just bought). Most of the time I drop one or two packets, which would not be a problem, but sometimes the difference between the timestamps suggests that >30 packets are lost, which is not good for what I am trying to achieve.
When I ping from one machine to the other, I get these values:
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.244/91.405/508.959/119.074 ms
pretty bad for a new router, in a dedicated network with just 3 clients, isn't it? The router is advertised as a very fast Wi-Fi router, with three times faster performance than 802.11n routers.
Compare it with the values I get from an older router, sitting in the same room, with some 10 machines connected to it, during office hour:
39 packets transmitted, 39 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.458/47.297/142.201/37.186 ms
Perhaps the router is defective?
One thing I cannot explain is that, if I ping while running my UDP client/server application, the statistics improve:
55 packets transmitted, 55 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.164/6.174/197.962/26.181 ms
I was wondering if anyone had tips on what to test, hints on how to achieve a "reliable" UDP connection between these machines over wi-fi. By reliable I mean that I would be ok dropping 2 consecutive packets, but not more.
Thanks.
Edit
It seems that the router (?) sends the packets in bursts. I am measuring the time that passes between receiving two datagrams on the client, and this value is about 3 ms within a sequence of ~10 packets, and then around 300 ms before the next packet. I think my issue at the client is more related to this inconsistency in the intervals between frames than to the dropped frames. I probably just need a queue and a playout delay of >300 ms with respect to the server, roughly like the sketch below.
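A minimal sketch of such a playout queue (the 400 ms delay and all names are illustrative; for brevity the delay is applied from arrival time rather than the sender timestamp):

#include <chrono>
#include <queue>

struct Frame {
    std::chrono::steady_clock::time_point release_at;
    // ... ~300 bytes of payload ...
};

class PlayoutQueue {
    std::queue<Frame> q_;
    static constexpr std::chrono::milliseconds delay{400}; // > worst burst gap
public:
    void push(Frame f) {
        f.release_at = std::chrono::steady_clock::now() + delay;
        q_.push(f);
    }
    // Returns true and fills 'out' once the head frame's time has come.
    bool pop_if_due(Frame& out) {
        if (q_.empty() || q_.front().release_at > std::chrono::steady_clock::now())
            return false;
        out = q_.front();
        q_.pop();
        return true;
    }
};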
The first and easiest way to tackle any network-related problem is to capture the packets in Wireshark.
Also check whether the packets are really being sent out from the broadcasting machine.
And, based on your description, if packets are transmitted fine over Ethernet cables but not over Wi-Fi, the issue could also be with the UDP port.

C++ Reading UDP packets [duplicate]

I have a Java app on Linux which opens a UDP socket and waits for messages.
After a couple of hours under heavy load, there is packet loss, i.e. the packets are received by the kernel but not by my app (we see the lost packets in a sniffer, we see UDP packets lost in netstat, and we don't see those packets in our app logs).
We tried enlarging the socket buffers, but this didn't help - we started losing packets later than before, but that's it.
For debugging, I want to know how full the OS UDP buffer is at any given moment. I googled, but didn't find anything. Can you help me?
P.S. Guys, I'm aware that UDP is unreliable. However, my computer receives all the UDP messages, while my app is unable to consume some of them. I want to optimize my app to the max; that's the reason for this question. Thanks.
UDP is a perfectly viable protocol. It is the same old case of the right tool for the right job!
If you have a program that waits for a UDP datagram, and then goes off to process it before returning to wait for another, then your elapsed processing time per datagram always needs to be shorter than the worst-case gap between arriving datagrams. If it is not, the UDP socket receive queue will begin to fill.
This can be tolerated for short bursts. The queue does exactly what it is supposed to do – queue datagrams until you are ready. But if the average arrival rate regularly causes a backlog in the queue, it is time to redesign your program. There are two main choices here: reduce the elapsed processing time via crafty programming techniques, and/or multi-thread your program. Load balancing across multiple instances of your program may also be employed.
As mentioned, on Linux you can examine the proc filesystem to get status about what UDP is up to. For example, if I cat the /proc/net/udp node, I get something like this:
$ cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
40: 00000000:0202 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 3466 2 ffff88013abc8340 0
67: 00000000:231D 00000000:0000 07 00000000:0001E4C8 00:00000000 00000000 1006 0 16940862 2 ffff88013abc9040 2237
122: 00000000:30D4 00000000:0000 07 00000000:00000000 00:00000000 00000000 1006 0 912865 2 ffff88013abc8d00 0
From this, I can see that a socket owned by user id 1006, is listening on port 0x231D (8989) and that the receive queue is at about 128KB. As 128KB is the max size on my system, this tells me my program is woefully weak at keeping up with the arriving datagrams. There have been 2237 drops so far, meaning the UDP layer cannot put any more datagrams into the socket queue, and must drop them.
You could watch your program's behaviour over time e.g. using:
watch -d 'cat /proc/net/udp|grep 00000000:231D'
Note also that the netstat command does about the same thing: netstat -c --udp -an
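If you want the same numbers from inside a program, you can parse /proc/net/udp yourself. A rough sketch (the port is an example; the field layout is assumed to match the dump above, and error handling is omitted):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    const std::string want_port = "231D";   // 8989 in hex, example value
    std::ifstream f("/proc/net/udp");
    std::string line;
    std::getline(f, line);                  // skip the header row
    while (std::getline(f, line)) {
        std::istringstream ss(line);
        std::string sl, local, rem, st, queues;
        ss >> sl >> local >> rem >> st >> queues;  // queues = "tx:rx" in hex
        if (local.size() >= 4 && local.substr(local.size() - 4) == want_port)
            std::cout << "tx_queue:rx_queue = " << queues << "\n";
    }
}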
My solution for my weenie program will be to multi-thread.
Cheers!
Linux provides the files /proc/net/udp and /proc/net/udp6, which lists all open UDP sockets (for IPv4 and IPv6, respectively). In both of them, the columns tx_queue and rx_queue show the outgoing and incoming queues in bytes.
If everything is working as expected, you usually will not see any value different from zero in those two columns: as soon as your application generates packets they are sent through the network, and as soon as packets arrive from the network your application wakes up and receives them (the recv call immediately returns). You may see rx_queue go up if your application has the socket open but is not invoking recv to receive the data, or if it is not processing such data fast enough.
rx_queue will tell you the queue length at any given instant, but it will not tell you how full the queue has been, i.e. the highwater mark. There is no way to constantly monitor this value, and no way to get it programmatically (see How do I get amount of queued data for UDP socket?).
The only way I can imagine monitoring the queue length is to move the queue into your own program. In other words, start two threads: one reads the socket as fast as it can and dumps the datagrams into your queue; the other is your program pulling from this queue and processing the packets. This of course assumes you can ensure each thread is on a separate CPU. Now you can monitor the length of your own queue and keep track of the high-water mark, roughly as in the sketch below.
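A compact sketch of that two-thread arrangement (recv_datagram and process are stubs standing in for the real socket read and packet handling):

#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

static std::vector<char> recv_datagram() { return std::vector<char>(512); } // stub
static void process(const std::vector<char>&) {}                            // stub

static std::queue<std::vector<char>> q;
static std::mutex m;
static std::condition_variable cv;
static std::size_t highwater = 0;   // worst backlog observed so far

static void reader() {              // drains the socket as fast as possible
    for (;;) {
        auto d = recv_datagram();
        std::lock_guard<std::mutex> lk(m);
        q.push(std::move(d));
        highwater = std::max(highwater, q.size());
        cv.notify_one();
    }
}

static void worker() {              // consumes at the application's own pace
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !q.empty(); });
        auto d = std::move(q.front());
        q.pop();
        lk.unlock();
        process(d);
    }
}

int main() {
    std::thread r(reader), w(worker);
    r.join();
    w.join();
}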
The process is simple:
If desired, pause the application process.
Open the UDP socket. You can snag it from the running process using /proc/<PID>/fd if necessary. Or you can add this code to the application itself and send it a signal -- it will already have the socket open, of course.
Call recvmsg in a tight loop as quickly as possible.
Count how many packets/bytes you got.
This will discard any datagrams currently buffered, but if that breaks your application, your application was already broken.
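In code, steps 3 and 4 might look like this (a sketch using recv with MSG_DONTWAIT instead of a blocking recvmsg; sock is assumed to be the already-open UDP socket descriptor):

#include <cstdio>
#include <sys/socket.h>
#include <sys/types.h>

void drain_and_count(int sock) {
    char buf[65536];
    long packets = 0, bytes = 0;
    for (;;) {
        ssize_t n = recv(sock, buf, sizeof buf, MSG_DONTWAIT);
        if (n < 0) break;        // EAGAIN/EWOULDBLOCK: the queue is empty
        ++packets;
        bytes += n;
    }
    std::printf("drained %ld packets, %ld bytes\n", packets, bytes);
}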

UDP packets are dropped when their size is less than 12 bytes on a certain PC. How do I figure out the reason?

I'm stuck on a problem I've never heard of before.
I'm making an online game which uses UDP packets for a certain character action. After I developed the UDP module, it seemed to work fine. Most of our team members have no problem, but one man, who is my boss, told me something was wrong with that module.
I investigated the problem, and finally found that on his PC, if the UDP packet size is less than 12 bytes, the packet never gets delivered to the other host.
The following is some additional information:
- UDP packets of 1-11 bytes are dropped; packets of 12 bytes and over are OK.
- O/S: Microsoft Windows Vista Business
- NIC: Attansic L1 Gigabit Ethernet 10/100/1000Base-T Controller
- WSASendTo returns TRUE.
- Loopback UDP packets work fine.
What do you think of this problem, and what do you think causes it?
What should my next step be to find the cause?
P.S. I don't want padding that brings the length of all packets up to 12 bytes.
Just to get one of the non-obvious answers in: maybe UDP checksum offload is broken on that card, i.e. the packets are sent, but dropped by the receiver?
You can check for this by looking at the received packets using Wireshark.
If you have already checked the firewall, antivirus, network firewall, and network intrusion systems, read this.
For a UDP packet: ethernet_header (14 bytes) + IPv4_header (20 bytes min) + UDP_header (8 bytes) = 42 bytes.
The minimum Ethernet frame is 64 bytes including the 4-byte FCS, so the network driver pads the 42-byte frame with zeros to bring it to 60 bytes before sending it out; that is the minimum length for a frame carrying a UDP packet.
Theoretically you can send a packet with 0 data bytes, though I haven't tried it.
As for your issue, it must be an OS or driver problem. Check your network driver's manual or check with the manufacturer, because this isn't supposed to happen.
REF: http://www.freesoft.org/CIE/Course/Section4/8.htm
REF: http://en.wikipedia.org/wiki/User_Datagram_Protocol
Run Wireshark on his PC AND on the destination PC.
Does the log show the udp packet leaving his machine? Does it show it arriving on the destination PC?
What kind of router hardware or switches are between his PC and the destination? Can you remove them and link the two with a crossover cable? (Or replace the destination with a laptop and link that to his PC with a crossover cable?)
Have you removed, or at least listed, all anti-virus and firewall products on his machine, and anything that installs a Winsock LSP?
Do ALL packets of 12 bytes or less get dropped, or just some? Can you generate packets with random content to see whether it's something in the content, rather than just the size, that's causing the issue?
Assuming your problem is with sending from his PC: First, run a packet sniffer on the problematic PC to see if it arrives at the NIC. If it makes it there, there may be a problem in the NIC or NIC driver.
Next, check for any running firewall software. Try disabling it and see what happens.
If that doesn't work, clear out any Winsock Layered Service Providers with netsh winsock reset.
If that doesn't work, I'm stumped :)
Finally, you're probably going to find other customers with the same problem; you might want to implement the workaround anyway. Try sending a few small-size UDP packets on connect, and if they consistently fail to go through, enable a padding workaround, as sketched below. For hosts where the probe packets make it through, you don't need to pad them out.
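A sketch of that probe-then-pad idea (send_probe and probe_acked are hypothetical hooks your protocol would need to provide; the counts are arbitrary):

bool send_probe(int payload_bytes);   // hypothetical: sends an n-byte datagram
bool probe_acked();                   // hypothetical: true if the peer echoed it

// Returns true if tiny datagrams survive the path to this peer.
bool small_packets_work() {
    int acked = 0;
    for (int i = 0; i < 5; ++i) {
        send_probe(4);                // well under the 12-byte threshold
        if (probe_acked())
            ++acked;
    }
    return acked > 0;                 // any success => padding not needed
}
// Later: bool pad_small_packets = !small_packets_work();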
Pure conjecture: RTP, which is a very common packet to send on UDP, defines a 12 byte header. I wonder if some layer of network software is assuming that anything smaller is a malformed RTP packet and throwing it away?

What should I know about UDP programming?

I don't mean how to connect to a socket. What should I know about UDP programming?
Do I need to worry about bad data in my socket?
Should I assume that if I send 200 bytes I may get 120 and 60 bytes separately?
Should I worry about another connection sending me bad data on the same port?
If data doesn't arrive, typically how long may I not see data for (250 ms? 1 second? 1.75 sec?)
What do I really need to know?
"i should assume if i send 200bytes i
may get 120 and 60bytes separately?"
When you're sending UDP datagrams your read size will equal your write size. This is because UDP is a datagram protocol, vs TCP's stream protocol. However, you can only write data up to the size of the MTU before the packet could be fragmented or dropped by a router. For general internet use, the safe MTU is 576 bytes including headers.
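A quick loopback demo of that read-size-equals-write-size behaviour (Linux sockets API; port 9999 is arbitrary and error handling is omitted):

#include <arpa/inet.h>
#include <cstdio>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9999);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(rx, (sockaddr*)&addr, sizeof addr);

    char out[200];
    std::memset(out, 'x', sizeof out);
    sendto(tx, out, sizeof out, 0, (sockaddr*)&addr, sizeof addr);

    char in[65536];
    ssize_t n = recv(rx, in, sizeof in, 0);  // one call returns one datagram
    std::printf("received %zd bytes in a single read\n", n);  // prints 200
    close(rx);
    close(tx);
}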
"i should worry about another
connection sending me bad data on the
same port?"
You don't have a connection, you have a port. You will receive any data sent to that port, regardless of where it's from. It's up to you to determine if it's from the right address.
If data doesn't arrive, typically how long may I not see data for (250 ms? 1 second? 1.75 sec?)
Data can be lost forever, data can be delayed, and data can arrive out of order. If any of those things bother you, use TCP. Writing a reliable protocol on top of UDP is a very non trivial task and there is no reason to do so for almost all applications.
Should I worry about another connection sending me bad data on the same port?
Yes, you should worry about it. Any application can send data to your open UDP port at any time. One of the big uses of UDP is many-to-one style communications, where you multiplex communications with several peers on a single port, using the address passed back by recvfrom to differentiate between peers.
However, if you want to avoid this and only accept packets from a single peer, you can actually call connect on your UDP socket. This causes the IP stack to reject packets coming from any host:port combo (socket) other than the one you want to talk to.
A second advantage of calling connect on your UDP socket is that in many OS's it gives a significant speed / latency improvement. When you call sendto on an unconnected UDP socket the OS actually temporarily connects the socket, sends your data and then disconnects the socket adding significant overhead.
A third advantage of using connected UDP sockets is it allows you to receive ICMP error messages back to your application, such as routing or host unknown due to a crash. If the UDP socket isn't connected the OS won't know where to deliver ICMP error messages from the network to and will silently discard them, potentially leading to your app hanging while waiting for a response from a crashed host ( or waiting for your select to time out ).
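For example, a connected UDP socket might be set up like this (Linux sockets API; the address in the usage line is a placeholder, error handling omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int make_connected_udp(const char* ip, unsigned short port) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in peer{};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    inet_pton(AF_INET, ip, &peer.sin_addr);
    connect(s, (sockaddr*)&peer, sizeof peer); // fixes the remote host:port
    return s;  // plain send()/recv() now work, and ICMP errors surface as errno
}
// Usage: int s = make_connected_udp("192.0.2.7", 5000);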
Your packet may not get there.
Your packet may get there twice or even more often.
Your packets may not be in order.
You have a size limitation on your packets imposed by the underlying network layers. The packet size may be quite small (possibly 576 bytes).
None of this says "don't use UDP". However you should be aware of all the above and think about what recovery options you may want to take.
Fragmentation and reassembly happen at the IP level, so you need not worry about that (Wikipedia). (This means that you won't receive split or truncated packets.)
UDP packets have a checksum for the data and the header, so receiving bogus data is unlikely, but possible. Lost or duplicate packets are also possible. You should check your data in any case anyway.
There's no congestion control, so you may wish to consider that, if you plan on clogging the tubes with a lot of UDP packets.
UDP is a connectionless protocol. Data sent over UDP may reach the receiver, but may also be lost in transit. UDP is ideal for things like broadcasting and streaming audio or video (i.e. a dropped packet is not a showstopper in those situations). So if you need to ensure your data gets to the other side, stick with TCP.
UDP has less overhead than TCP and is therefore faster. (TCP needs to build a connection first and also checks data packets for data corruption which takes time.)
Fragmented UDP packets (i.e. packets bigger than about half a KB) will probably be dropped by routers, so split your data into small chunks before sending it over. (In some cases, the OS can take care of that.) Note that it is always a whole packet that makes it, or not; partial packets aren't delivered.
Latency over long distances can be quite big. If you want to do retransmission of data, I would go with something like 5 to 10 times the average latency over the current connection. (You can measure the latency by sending and receiving a few packets.)
Hope this helps.
I won't follow suit with the other people who answered this, they all seem to push you toward TCP, and that's not for gaming at all, except maybe for login/chat info. Let's go in order:
Do I need to worry about bad data in my socket?
Yes. Even though UDP carries a simple checksum that routers and endpoints can check, it is not 100% effective. You can add your own checksum scheme, but most of the time UDP is used when reliability is already not an issue, so data that doesn't conform should just be dropped.
Should I assume that if I send 200 bytes I may get 120 and 60 bytes separately?
No, UDP is a direct datagram write and read. However, if the data is too large, some routers will truncate it and you lose part of the data permanently. Some have said roughly 576 bytes with header; I personally wouldn't use more than 256 bytes (a nice round log2 number).
Should I worry about another connection sending me bad data on the same port?
UDP listens for any data from any computer on a port, so in this sense, yes. Also note that UDP is primitive, and raw sockets can be used to fake the sender, so you should use some sort of "key" so the listener can verify the sender against their IP.
If data doesn't arrive, typically how long may I not see data for (250 ms? 1 second? 1.75 sec?)
Data sent over UDP is usually disposable, so if you don't receive data it can easily be ignored... However, sometimes you want "semi-reliable" delivery without the 'ordered reliable' behaviour TCP provides; 1 second is a good estimate for declaring a drop. You can number your packets on a rotation and write your own ACK communication: when a packet is received, the receiver records the number and sends back a bitfield letting the sender know which packets it received (see the sketch after the link below). You can read this unfinished document for more information (although unfinished, it still yields valuable info):
http://gafferongames.com/networking-for-game-programmers/
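A rough sketch of that numbered-packet/ACK-bitfield scheme (sizes and names are illustrative, loosely following the linked article; sequence wrap-around and gaps larger than 32 are ignored for brevity):

#include <cstdint>

struct PacketHeader {
    uint16_t seq;       // this packet's sequence number
    uint16_t ack;       // newest sequence number seen from the peer
    uint32_t ack_bits;  // bit n set => packet (ack - 1 - n) was also seen
};

// Receiver bookkeeping: fold each arriving seq into ack/ack_bits,
// which are then echoed back in the next outgoing header.
void on_receive(uint16_t seq, uint16_t& ack, uint32_t& ack_bits) {
    if (seq > ack) {
        ack_bits = (ack_bits << (seq - ack)) | (1u << (seq - ack - 1));
        ack = seq;
    } else if (seq < ack) {
        ack_bits |= 1u << (ack - seq - 1);
    }
}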
The big thing to know when attempting to use UDP is:
Your packets might not all make it over the line, which means possible data loss.
If you're working on an application where 100% of the data needs to arrive reliably to provide functionality, use TCP. If you're working on an application where some loss is allowable (streaming media, etc.), then go for UDP, but don't expect everything to get from one end of the pipe to the other intact.
One way to look at the difference between applications appropriate for UDP vs. TCP is that TCP is good when data delivery is "better late than never", UDP is good when data delivery is "better never than late".
Another aspect is that the stateless, best-effort nature of most UDP-based applications can make scalability a bit easier to achieve. Also note that UDP can be multicast while TCP can't.
In addition to don.neufeld's recommendation to use TCP.
For most applications TCP is easier to implement. If you need to maintain packet boundaries in a TCP stream, a good way is to transmit a two-byte header before the data to delimit the messages. The header should contain the message length. At the receiving end, read two bytes and evaluate the value, then wait until you have received that many bytes. You then have a complete message and are ready to receive the next 2-byte header (see the sketch below).
This gives you some of the benefit of UDP without the hassle of lost data, out-of-order packet arrival etc.
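The receive side of that 2-byte framing might look like this (a minimal sketch; a big-endian length prefix is assumed, and error handling is reduced to a bool):

#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Loop until exactly n bytes have been read, or fail.
static bool read_exact(int fd, void* buf, size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t r = recv(fd, p, n, 0);
        if (r <= 0) return false;    // error or connection closed
        p += r;
        n -= static_cast<size_t>(r);
    }
    return true;
}

// Read one length-prefixed message from a connected TCP socket.
bool read_message(int fd, std::string& out) {
    uint8_t hdr[2];
    if (!read_exact(fd, hdr, 2)) return false;
    const uint16_t len = uint16_t(hdr[0]) << 8 | hdr[1];  // big-endian length
    out.resize(len);
    return len == 0 || read_exact(fd, &out[0], len);
}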
And don't assume that if you send a packet it got there.
If there is a packet size limitation imposed by some router along the way, your UDP packets could be silently truncated to that size.
Two things:
1) You may or may not received what was sent
2) Whatever you receive may not be in the same order it was sent.