How to determine how much of the socket receive buffer is filled? - c++

In my application, I am trying to send and receive datagrams to different servers as quickly as possible. I noticed that increasing the sending rate is greatly lowering the percentage of responses I receive for a given # of sends. I believe one cause could be recvfrom()'s queue is filling faster than I can call the function to read the packets, reaching the end and dropping a a lot of packets. Is there a way I can check how full the receive buffer is for a socket?
This is on Windows, not Linux, and I would prefer a programmatic approach.

Related

Efficiently send a stream of UDP packets

I know how to open an UDP socket in C++, and I also know how to send packets through that. When I send a packet I correctly receive it on the other end, and everything works fine.
EDIT: I also built a fully working acknowledgement system: packets are numbered, checksummed and acknowledged, so at any time I know how many of the packets that I sent, say, during the last second were actually received from the other endpoint. Now, the data I am sending will be readable only when ALL the packets are received, so that I really don't care about packet ordering: I just need them all to arrive, so that they could arrive in random sequences and it still would be ok since having them sequentially ordered would still be useless.
Now, I have to transfer a big big chunk of data (say 1 GB) and I'd need it to be transferred as fast as possible. So I split the data in say 512 bytes chunks and send them through the UDP socket.
Now, since UDP is connectionless it obviously doesn't provide any speed or transfer efficiency diagnostics. So if I just try to send a ton of packets through my socket, my socket will just accept them, then they will be sent all at once, and my router will send the first couple and then start dropping them. So this is NOT the most efficient way to get this done.
What I did then was making a cycle:
Sleep for a while
Send a bunch of packets
Sleep again and so on
I tried to do some calibration and I achieved pretty good transfer rates, however I have a thread that is continuously sending packets in small bunches, but I have nothing but an experimental idea on what the interval should be and what the size of the bunch should be. In principle, I can imagine that sleeping for a really small amount of time, then sending just one packet at a time would be the best solution for the router, however it is completely unfeasible in terms of CPU performance (I probably would need to busy wait since the time between two consecutive packets would be really small).
So is there any other solution? Any widely accepted solution? I assume that my router has a buffer or something like that, so that it can accept SOME packets all at once, and then it needs some time to process them. How big is that buffer?
I am not an expert in this so any explanation would be great.
Please note, however, that for technical reasons there is no way at all I can use TCP.
As mentioned in some other comments, what you're describing is a flow control system. The wikipedia article has a good overview of various ways of doing this:
http://en.wikipedia.org/wiki/Flow_control_%28data%29
The solution that you have in place (sleeping for a hard-coded period between packet groups) will work in principle, but in order to get reasonable performance in a real-world system you need to be able to react to changes in the network. This means implementing some kind of feedback where you automatically adjust both the outgoing data rate and packet size in response to to network characteristics, such as throughput and packetloss.
One simple way of doing this is to use the number of re-transmitted packets as an input into your flow control system. The basic idea would be that when you have a lot of re-transmitted packets, you would reduce the packet size, reduce the data rate, or both. If you have very few re-transmitted packets, you would increase packet size & data rate until you see an increase in re-transmitted packets.
That's something of a gross oversimplification, but I think you get the idea.

Limitations on sending through UDP sockets

I have a big 1GB file, which I am trying to send to another node. After the sender sends 200 packets (before sending the complete file) the code jumps out. Saying "Sendto no send space available". What can be the problem and how to take care of it.
Apart from this, we need maximum throughput in this transfer. So what send buffer size we should use to be efficient?
What is the maximum MTU which we can use to transfer the file without fragmentation?
Thanks
Ritu
Thank you for the answers. Actually, our project specifies to use UDP and then some additional code to take care of lost packets.
Now I am able to send the complete file, using blocking UDP sockets.
I am running the whole setup on an emulab like environment, called deter. I have set link loss to 0 but still my some packets are getting lost. What could be the possible reason behind that? Even if I add delay (assuming receiver drops the packet when its buffer is full) after sending every packet..still this packet losts persists.
It's possible to use UDP for high speed data transfer, but you have to make sure not to send() the data out faster than your network card can pump it onto the wire. In practice that means either using blocking I/O, or blocking on select() and only sending the next packet when select() indicates that the socket is ready-for-write. (ideally you'd also not send the data faster than the receiving machine can receive it, but that's less of an issue these days since modern CPU speeds are generally much faster than modern network I/O speeds)
Once you have that logic working properly, the size of your send-buffer isn't terribly important. (i.e. your send buffer will never be large enough to hold a 1GB file anyway, so making sure your program doesn't overflow the send buffer is the key issue whether the send buffer is large or small) The size of the receive-buffer on the receiver is important though... best to make that as large as possible, so the receiving computer won't drop packets if the receiving process gets held off of the CPU by another program.
Regarding MTU, if you want to avoid packet fragmentation (and assuming your packets are traveling over Ethernet), then you shouldn't place more than 1468 bytes into each UDP packet (or 1452 bytes if you're using IPv6). (Calculated by subtracting the size of the necessary IP and UDP headers from Ethernet's 1500-byte frame size)
Also agree with #jonfen. No UDP for high speed file transfer.
UDP incur less protocol overhead. However, at the maximum transfer rate, transmit errors are inevitable (such as packet loss). So one must incorporate TCP like error correction scheme. End result is lower than TCP performance.

Using Tcp, why do large blocks of data get transmitted with a lower bandwidth then small blocks of data?

Using 2 PC's with Windows XP, 64kB Tcp Window size, connected with a crossover cable
Using Qt 4.5.3, QTcpServer and QTcpSocket
Sending 2000 messages of 40kB takes 2 seconds (40MB/s)
Sending 1 message of 80MB takes 80 seconds (1MB/s)
Anyone has an explanation for this? I would expect the larger message to go faster, since the lower layers can then fill the Tcp packets more efficiently.
This is hard to comment on without seeing your code.
How are you timing this on the sending side? When do you know you're done?
How does the client read the data, does it read into fixed sized buffers and throw the data away or does it somehow know (from the framing) that the "message" is 80MB and try and build up the "message" into a single data buffer to pass up to the application layer?
It's unlikely to be the underlying Windows sockets code that's making this work poorly.
TCP, from the application side, is stream-based which means there are no packets, just a sequence of bytes. The kernel may collect multiple writes to the connection before sending it out and the receiving side may make any amount of the received data available to each "read" call.
TCP, on the IP side, is packets. Since standard Ethernet has an MTU (maximum transfer unit) of 1500 bytes and both TCP and IP have 20-byte headers, each packet transferred over Ethernet will pass 1460 bytes (or less) of the TCP stream to the other side. 40KB or 80MB writes from the application will make no difference here.
How long it appears to take data to transfer will depend on how and where you measure it. Writing 40KB will likely return immediately since that amount of data will simply get dropped in TCP's "send window" inside the kernel. An 80MB write will block waiting for it all to get transferred (well, all but the last 64KB which will fit, pending, in the window).
TCP transfer speed is also affected by the receiver. It has a "receive window" that contains everything received from the peer but not fetched by the application. The amount of space available in this window is passed to the sender with every return ACK so if it's not being emptied quickly enough by the receiving application, the sender will eventually pause. WireShark may provide some insight here.
In the end, both methods should transfer in the same amount of time since an application can easily fill the outgoing window faster than TCP can transfer it no matter how that data is chunked.
I can't speak for the operation of QT, however.
Bug in Qt 4.5.3
..................................

Receive many datagrams from several hosts using winsock

I am developing an application that distributes rendering across several devices (a university project).
Each frame consists of several blocks (16x16 pixels), and each device is "assigned" a number of blocks to be rendered. These blocks, when rendered, are compressed and serialized into a buffer until the max size of this is reached, at which point it will be sent.
My problem is on the receiving end, which needs to recieve several datagrams per frame from several devices. Currently I call recv for each packet, but this requires a context switch for each packet. It would be better to receive many packets with one call. A packet identify which client it is from, so the addresses of these are irrelevant.
I have looked at WSARecv and WSARecvFrom but neither seems to be able to receive several packets from several hosts.
Thanks in advance :)
EDIT
Using a separate thread to fetch packets from the OS layer does not seem to improve the situation. In fact the solution proposed by Amardeep has a performance that is lower than the previous scheme.
Fetching several packets would be nice to try out too, anyone knows how to do that?
That is probably not the cause of any perceived performance issue you are trying to address. Without seeing your code, I'll take a guess that what you're doing is:
1. recv a packet
2. Process a packet
3. repeat
What you should be doing is:
1. use one thread to recv packets with a pool of buffers and shove the received buffer pointers into a queue.
2. use another slightly lower priority thread to pull items from the queue and process them.
This would greatly improve your system's ability to digest the data and miss fewer datagrams.

How to cope with high frequency data?

I have a C++ application which receives stock data and forward to another application via socket (acting as a server).
Actually the WSASend function returns with error code 10055 after small seconds and I found that is the error message
"No buffer space available. An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full".
The problem arises only when I run the application after the market hours as we are receiving the whole day data (approximately 130 MB) in few minutes (I assume this is relatively large)
I am doing so as a robustness test.
I tried to increase the sending buffer SO_SNDBUF using setsockopt function but the same problem still there.
How can I solve this problem? is this related to receiver buffer?
Sending details:
For each complete message I call the send method which uses overlapped sockets
EDIT:
Can someone give general guidelines to handle high frequency data in C++?
The flow-control of TCP will cause the internal send-buffer to fill up if the receiver is not processing their end of the socket fast enough. It would seem from the error message that you are sending data without regard for how quickly the Winsock stack can process it. It would be helpful if you could state exactly how you are sending the data? Are you waiting for all the data to arrive and then sending one big block, or sending piecemeal?
Are you sending via a non-blocking or overlapped socket? In either case, after each send you should probably wait for a notification that the socket is in a state where it can send more data, either because select()/WaitForMultipleObjects() indicates it can (for non-blocking sockets), or the overlapped I/O completes, signalling that the data has been successfully copied to the sockets internal send buffers.
You can overlap sends, i.e. queue up more than one buffer at a time - that's what overlapped I/O is for - but you need to pay careful regard to the memory implications of locking large numbers of pages and potentially exhausting the non-paged pool.
Nick's answer pretty much hits the nail on the head; you're most likely exhausting the 'locked pages limit' by starting too many overlapped sends at once. Ideally you need to buffer your data in your own memory buffers and only have a set number of overlapped sends pending at any one time. I talk about how my IOCP framework allows you to deal with this kind of situation here http://www.lenholgate.com/blog/2008/07/write-completion-flow-control.html and the related TCP receive window flow control issues here http://www.lenholgate.com/blog/2008/06/data-distribution-servers.html and here http://www.serverframework.com/asynchronousevents/2011/06/tcp-flow-control-and-asynchronous-writes.html.
My preferred solution is to allow a configurable number of pending overlapped sends at any one time and once this limit is exceeded to start buffering data and then using the completion of the pending overlapped sends to drive the sending of the buffered data. This allows you to strictly control the amount of non-paged pool and the amount of 'locked pages' used and makes it possible to have lots of connections sending as fast as possible yet still control the resources that they use.