I'm writing a program on Linux in C++ using non-blocking sockets with epoll, waiting for EPOLLOUT in order to do send() for some data.
My question is: I've read that in non-blocking mode the data is copied to the kernel's buffer, so a send() call may return immediately indicating that all the data has been sent, when in reality it was only copied to the kernel's buffer.
How do I know when the data was actually sent and received by the remote peer, so I can measure the real transfer rate?
Whether in non-blocking mode or not, send() will return as soon as the data is copied into the kernel buffer. The difference between blocking and non-blocking mode shows up when the buffer is full. In the full-buffer case, blocking mode will suspend the current thread until the write can take place, while non-blocking mode will return immediately with EAGAIN or EWOULDBLOCK.
In a TCP connection, the kernel buffer is normally comparable to the window size, so as soon as too much data remains unacknowledged, further sends block. This means that the sender is aware of how fast the remote end is receiving data.
With UDP it is a bit more complex because there are no acknowledgements. Here only the receiving end is capable of measuring the true speed, since sent data may be lost en route.
In both the TCP and UDP cases, the kernel will not attempt to send data that the link layer is unable to process. The link layer can also throttle the data if the network is congested.
Getting back to your case, when using non-blocking sockets, you can measure the network speed provided you handle the EAGAIN or EWOULDBLOCK errors correctly. This is certainly true for TCP where you send more data than the current window size (probably 64K or so) and you can get an idea of the link layer speed with UDP sockets as well.
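As a minimal sketch of that EAGAIN handling for a connected non-blocking TCP socket (the helper name send_some is made up for illustration, and error handling is trimmed):

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>

// Hand as much of buf as possible to the kernel without blocking.
// Returns the number of bytes the kernel accepted, or -1 on a real error.
ssize_t send_some(int fd, const char* buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = send(fd, buf + total, len - total, 0);
        if (n > 0) {
            total += static_cast<size_t>(n);   // kernel accepted n bytes
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                             // buffer full: wait for EPOLLOUT
        } else if (n == -1 && errno == EINTR) {
            continue;                          // interrupted, just retry
        } else {
            return -1;                         // real error
        }
    }
    return static_cast<ssize_t>(total);
}
```

Timing how often you hit the EAGAIN branch (versus how many bytes get accepted between hits) is what gives you the rough throughput estimate described above.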
You can get the current amount of data in the kernel's socket buffers using an ioctl. This would allow you to check what's actually been sent. I'm not sure it matters that much, though; unless you have massive buffers and only a tiny amount of data to send, it's probably not of interest.
Investigate the TIOCOUTQ/TIOCINQ ioctl on your socket fd.
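On Linux the socket-specific spellings are SIOCOUTQ/SIOCINQ (declared in linux/sockios.h); a minimal sketch:

```cpp
#include <sys/ioctl.h>
#include <linux/sockios.h>   // SIOCOUTQ / SIOCINQ on Linux

// Returns the number of bytes still queued in the socket's send buffer,
// or -1 on error. For TCP this counts data handed to the kernel that the
// peer has not yet acknowledged, so it shrinks as ACKs come back.
int bytes_pending_in_send_queue(int fd)
{
    int unsent = 0;
    if (ioctl(fd, SIOCOUTQ, &unsent) == -1)
        return -1;
    return unsent;
}
```

Subtracting this from the number of bytes you have passed to send() gives an estimate of how much has actually left the machine.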
My question is: I've read that in non-blocking mode the data is copied to the kernel's buffer
That happens in all modes, not just non-blocking mode. I suggest you review your reading matter.
so a send() call may return immediately indicating that all the data has been sent, when in reality it was only copied to the kernel's buffer.
Again that is true in all modes.
How do I know when the data was actually sent and received by the remote peer, so I can measure the real transfer rate?
When you've sent all the data, shut down the socket for output, then either set blocking mode and read, or keep selecting for 'readable'; in either case, read the EOS that should result. That functions as a peer acknowledgement of the close. Then stop the timer.
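A rough sketch of that close-and-wait handshake (the blocking variant; it assumes fd has been switched back to blocking mode and that the timer was started before the first send()):

```cpp
#include <sys/socket.h>
#include <chrono>
#include <cstddef>

// After the last send(): announce "no more data", then wait for the peer
// to close its side. Reading EOF (recv() == 0) tells us the peer has seen
// everything we sent, so the timer can be stopped.
void finish_and_time(int fd, std::chrono::steady_clock::time_point start,
                     size_t total_bytes)
{
    shutdown(fd, SHUT_WR);                    // send FIN, reading still allowed

    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        // discard any remaining data from the peer
    }
    // n == 0 means EOF: the peer acknowledged our close by closing its side.

    auto elapsed = std::chrono::steady_clock::now() - start;
    double secs  = std::chrono::duration<double>(elapsed).count();
    double rate  = total_bytes / secs;        // bytes per second, end to end
    (void)rate;
}
```

With a non-blocking socket you would instead keep waiting for EPOLLIN and reading until recv() returns 0.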
send() merely puts data into the kernel's buffer and then exits, letting the kernel perform the actual transmission in the background, so all you can really do is measure the speed at which the kernel is accepting your outgoing data. You can't really measure the actual transmission speed unless the peer sends an acknowledgement for every buffer received (and there is no way to detect when TCP's own ACKs are received). But using the fact that send() can block when too much data is still in flight can help you figure out how fast your code is passing outgoing data to send().
send() tells you how many bytes were accepted, so it is very easy to calculate an approximate acceptance speed: divide the number of bytes accepted by the time elapsed since the previous call to send(). So when you call send() to send X bytes and get Y bytes returned, record the time as time1; call send() again to send X bytes and get Y bytes returned, record the time as time2; you will see that your code is sending data at roughly Y / (time2 - time1 in ms) bytes per millisecond, which you can then convert to B/KB/MB/GB per ms/sec/min/hr as needed. Over the lifetime of the data transfer, that gives you a fairly good idea of your app's general transmission speed.
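A small sketch of that bookkeeping (the SendRateMeter name and the running-total approach are just for illustration):

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <chrono>
#include <cstddef>

// Wraps send() and keeps a running total so the acceptance rate can be
// computed at any moment: bytes the kernel has accepted / elapsed time.
struct SendRateMeter {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    size_t accepted = 0;

    ssize_t send_measured(int fd, const char* buf, size_t len) {
        ssize_t n = send(fd, buf, len, 0);
        if (n > 0)
            accepted += static_cast<size_t>(n);   // count only what was accepted
        return n;
    }

    double bytes_per_second() const {
        double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        return secs > 0 ? accepted / secs : 0.0;
    }
};
```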
Related
From time to time I see network related code in legacy source code and elsewhere modifying the receive buffer size for sockets (using setsockopt with the SO_RCVBUF option). On my Windows 10 system the default buffer size for sockets seems to be 64kB. The legacy code I am working on now (written 10+ years ago) sets the receive buffer size to 256kB for each socket.
Some questions related to this:
Is there any reason at all to modify receive buffer sizes when sockets are monitored and read continuously, e.g. using select?
If not, was there some motivation for this 10+ years ago?
Are there any examples, use cases or applications, where modification of receive buffer sizes (or even send buffer sizes) for sockets are needed?
Typically receive-buffer sizes are modified to be larger because the code's author is trying to reduce the likelihood of the condition where the socket's receive-buffer becomes full and therefore the OS has to drop some incoming packets because it has no place to put the data. In a TCP-based application, that condition will cause the stream to temporarily stall until the dropped packets are successfully resent; in a UDP-based application, that condition will cause incoming UDP packets to be silently dropped.
Whether or not doing that is necessary depends on two factors: how quickly data is expected to fill up the socket's receive-buffer, and how quickly the application can drain the socket's receive-buffer via calls to recv(). If the application is reliably able to drain the buffer faster than the data is received, then the default buffer size is fine; OTOH if you see that it is not always able to do so, then a larger receive-buffer-size may help it handle sudden bursts of incoming data more gracefully.
Is there any reason at all to modify receive buffer sizes when sockets are monitored and read continuously, e.g. using select?
There could be, if the incoming data rate is high (e.g. megabytes per second, or even just occasional bursts of data at that rate), or if the thread is doing something between select()/recv() calls that might keep it busy for a significant period of time -- e.g. if the thread ever needs to write to disk, disk-write calls might take several hundred milliseconds in some cases, potentially allowing the socket's receive buffer to fill during that period.
For very high-bandwidth applications, even a very short pause (e.g. due to the thread being kicked off of the CPU for a few quanta, so that another thread can run for a quantum or two) might be enough to allow the buffer to fill up. It depends a lot on the application's use-case, and of course on the speed of the CPU hardware relative to the network.
As for when to start messing with receive-buffer-sizes: don't do it unless you notice that your application is dropping enough incoming packets that it is noticeably limiting your app's network performance. There's no sense allocating more RAM than you need to.
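If you do decide to enlarge it, the call itself is simple; it is worth reading the size back afterwards, since Linux doubles the requested value for bookkeeping and caps it at net.core.rmem_max (a minimal sketch):

```cpp
#include <sys/socket.h>
#include <cstdio>

// Request a 256 KB receive buffer, then read back what the kernel actually
// granted (on Linux the value is doubled and limited by net.core.rmem_max).
void grow_recv_buffer(int fd)
{
    int requested = 256 * 1024;
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

    int actual = 0;
    socklen_t len = sizeof(actual);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len);
    std::printf("receive buffer is now %d bytes\n", actual);
}
```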
For TCP, the RECVBUF buffer is the maximum number of unread bytes that the kernel can hold. In TCP the window size reflects the maximum number of unacknowledged bytes the sender can safely send. The sender will receive an ACK which will include a new window which depends on the free space in the RECVBUF.
When the RECVBUF is full, the sender will stop sending data. This mechanism means the sender will not be able to send more data than the receiving application can receive.
A small RECVBUF works well on low-latency networks, but on high-bandwidth, high-latency networks ACKs may take too long to reach the sender; once the sender has run out of window, it cannot make use of the full bandwidth.
Increasing the RECVBUF size increases the window, which means the sender can send more data while waiting for an ACK; this then allows the sender to make use of the entire bandwidth. It does mean that things are less responsive.
Shrinking the RECVBUF makes the sender more responsive and aware of the receiver not consuming the data, so it can back off a lot quicker.
The same logic applies for the SENDBUF as well.
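As a rough sizing rule (numbers purely illustrative): the buffer needs to cover the bandwidth-delay product of the path. On a 100 Mbit/s link with a 50 ms round-trip time, 100,000,000 / 8 × 0.050 ≈ 625 KB can be in flight at once, so a RECVBUF (or SENDBUF) much smaller than that keeps the sender from ever filling the pipe.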
I read somewhere that every TCP connection has its own 125 kB output and input buffer. What happens if this buffer is full, and I still continue sending data on Linux?
According to http://www.kernel.org/doc/man-pages/online/pages/man2/send.2.html the packets are just silently dropped, without notifying me. What can I do to stop this from happening? Is there any way to find out if at least some of my data has been sent correctly, so that I can continue at a later point in time?
Short answer is this. "send" calls on a TCP socket will just block until the TCP sliding window (or internal queue buffers) opens up as a result of the remote endpoint receiving and consuming data. It's not much different than trying to write bytes to a file faster than the disk can save it.
If your socket is configured for non-blocking mode, send will return EWOULDBLOCK or EAGAIN, until data can be sent. Standard poll, select, and epoll calls will work as expected so you know when to "send" again.
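A minimal sketch of the epoll variant (it assumes fd is already connected, non-blocking, and was previously added to the epoll instance; error handling trimmed):

```cpp
#include <sys/epoll.h>

// Block until the kernel can accept more outgoing data on fd; the caller
// then retries send(). EPOLLOUT fires once there is room in the send buffer.
void wait_until_writable(int epfd, int fd)
{
    epoll_event ev{};
    ev.events  = EPOLLOUT;
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);   // fd was added with EPOLL_CTL_ADD earlier

    epoll_event out{};
    epoll_wait(epfd, &out, 1, -1);             // wait for writability (or an error event)
}
```

In practice you would switch back to waiting on EPOLLIN (or drop EPOLLOUT) once you have nothing left to send, so the event loop doesn't spin.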
I don't know that the "packets are dropped". I think that what is more likely is that the calls that the program makes to write() will either block or return a failure.
I have a big 1 GB file which I am trying to send to another node. After the sender sends 200 packets (before sending the complete file) the code bails out, saying "Sendto: no send space available". What could be the problem and how do I take care of it?
Apart from this, we need maximum throughput in this transfer. So what send buffer size should we use to be efficient?
What is the maximum MTU which we can use to transfer the file without fragmentation?
Thanks
Ritu
Thank you for the answers. Actually, our project specifies to use UDP and then some additional code to take care of lost packets.
Now I am able to send the complete file, using blocking UDP sockets.
I am running the whole setup on an emulab-like environment, called DETER. I have set link loss to 0 but still some of my packets are getting lost. What could be the possible reason behind that? Even if I add a delay after sending every packet (assuming the receiver drops packets when its buffer is full), the packet loss still persists.
It's possible to use UDP for high speed data transfer, but you have to make sure not to send() the data out faster than your network card can pump it onto the wire. In practice that means either using blocking I/O, or blocking on select() and only sending the next packet when select() indicates that the socket is ready-for-write. (ideally you'd also not send the data faster than the receiving machine can receive it, but that's less of an issue these days since modern CPU speeds are generally much faster than modern network I/O speeds)
Once you have that logic working properly, the size of your send-buffer isn't terribly important. (i.e. your send buffer will never be large enough to hold a 1GB file anyway, so making sure your program doesn't overflow the send buffer is the key issue whether the send buffer is large or small) The size of the receive-buffer on the receiver is important though... best to make that as large as possible, so the receiving computer won't drop packets if the receiving process gets held off of the CPU by another program.
Regarding MTU, if you want to avoid packet fragmentation (and assuming your packets are traveling over Ethernet), then you shouldn't place more than 1472 bytes into each UDP packet (or 1452 bytes if you're using IPv6). (Calculated by subtracting the size of the necessary IP and UDP headers from Ethernet's 1500-byte frame size.)
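A sketch of that pacing loop (the 1472-byte chunk size assumes IPv4 over Ethernet; the function name send_paced is made up for illustration):

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <cstddef>

// Send a large buffer as 1472-byte UDP datagrams, but only when select()
// reports the socket as ready-for-write, so we never outrun the local NIC.
void send_paced(int fd, const sockaddr* dest, socklen_t destlen,
                const char* data, size_t len)
{
    const size_t kChunk = 1472;                  // 1500 - 20 (IPv4) - 8 (UDP)
    size_t off = 0;
    while (off < len) {
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, nullptr, &wfds, nullptr, nullptr) <= 0)
            break;                               // error; bail out in this sketch

        size_t n = (len - off < kChunk) ? len - off : kChunk;
        if (sendto(fd, data + off, n, 0, dest, destlen) < 0)
            break;
        off += n;
    }
}
```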
Also agree with #jonfen. No UDP for high speed file transfer.
UDP incurs less protocol overhead. However, at the maximum transfer rate, transmission errors are inevitable (such as packet loss), so one must incorporate a TCP-like error-correction scheme. The end result is lower-than-TCP performance.
I am working on network programming using epoll. I was wondering about a situation where the server doesn't receive all the data the client sent. For example, if a client sent 100 bytes and somehow the server only received 94 bytes, how do I handle this case?
Thanks in advance..
epoll signals readiness; it does not give any guarantees about the amount of data. EPOLLIN only gives you the guarantee that the next read operation on the descriptor will not block and will read at least 1 byte.
As one normally sets descriptors to non-blocking for a variety of (partly OS-specific) reasons, the usual idiom is to read until EAGAIN is returned. If that is less data than was expected (for example, if you have a message with a header that says "my size is 100 bytes"), then you wait for the next EPOLLIN (or EPOLLHUP) and repeat (or abort).
For TCP, receiving less data than expected is an absolutely normal condition. Repeat.
With UDP, unless you supply a too-small buffer (this will discard the remainder!), this will not happen. Never, ever. UDP delivers an entire datagram at a time or nothing; there are no partial deliveries. If IP fragmentation occurs, the fragments are reassembled into one datagram and a whole datagram is delivered. If a fragment was lost, UDP delivers nothing.
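The read-until-EAGAIN idiom looks roughly like this (a sketch for a non-blocking TCP socket; the idea of accumulating into a pending buffer until the expected message size, e.g. 100 bytes, has arrived is the caller's job):

```cpp
#include <sys/socket.h>
#include <cerrno>
#include <vector>

// Called whenever epoll reports EPOLLIN on fd. Drains the socket until
// EAGAIN and appends everything to 'pending'; the caller processes a
// message only once the expected number of bytes has accumulated.
// Returns false if the peer closed the connection or a real error occurred.
bool drain_socket(int fd, std::vector<char>& pending)
{
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            pending.insert(pending.end(), buf, buf + n);
        } else if (n == 0) {
            return false;                         // peer closed the connection
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return true;                          // drained for now; wait for next EPOLLIN
        } else if (errno == EINTR) {
            continue;                             // interrupted, retry
        } else {
            return false;                         // real error
        }
    }
}
```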
I have a C++ application which receives stock data and forwards it to another application via a socket (acting as a server).
Actually, the WSASend function returns with error code 10055 after a few seconds, and I found that this is the error message:
"No buffer space available. An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full".
The problem arises only when I run the application after market hours, as we are receiving the whole day's data (approximately 130 MB) in a few minutes (I assume this is relatively large).
I am doing so as a robustness test.
I tried to increase the send buffer SO_SNDBUF using the setsockopt function, but the same problem is still there.
How can I solve this problem? Is it related to the receive buffer?
Sending details:
For each complete message I call the send method which uses overlapped sockets
EDIT:
Can someone give general guidelines to handle high frequency data in C++?
The flow control of TCP will cause the internal send buffer to fill up if the receiver is not processing their end of the socket fast enough. It would seem from the error message that you are sending data without regard for how quickly the Winsock stack can process it. It would be helpful if you could state exactly how you are sending the data. Are you waiting for all the data to arrive and then sending one big block, or sending piecemeal?
Are you sending via a non-blocking or overlapped socket? In either case, after each send you should probably wait for a notification that the socket is in a state where it can send more data, either because select()/WaitForMultipleObjects() indicates it can (for non-blocking sockets), or because the overlapped I/O completes, signalling that the data has been successfully copied to the socket's internal send buffers.
You can overlap sends, i.e. queue up more than one buffer at a time - that's what overlapped I/O is for - but you need to pay careful regard to the memory implications of locking large numbers of pages and potentially exhausting the non-paged pool.
Nick's answer pretty much hits the nail on the head; you're most likely exhausting the 'locked pages limit' by starting too many overlapped sends at once. Ideally you need to buffer your data in your own memory buffers and only have a set number of overlapped sends pending at any one time. I talk about how my IOCP framework allows you to deal with this kind of situation here http://www.lenholgate.com/blog/2008/07/write-completion-flow-control.html and the related TCP receive window flow control issues here http://www.lenholgate.com/blog/2008/06/data-distribution-servers.html and here http://www.serverframework.com/asynchronousevents/2011/06/tcp-flow-control-and-asynchronous-writes.html.
My preferred solution is to allow a configurable number of pending overlapped sends at any one time and once this limit is exceeded to start buffering data and then using the completion of the pending overlapped sends to drive the sending of the buffered data. This allows you to strictly control the amount of non-paged pool and the amount of 'locked pages' used and makes it possible to have lots of connections sending as fast as possible yet still control the resources that they use.
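A very simplified sketch of that idea (class and member names are made up for illustration; the IOCP loop that retrieves completions and the full per-operation lifetime handling are omitted):

```cpp
#include <winsock2.h>
#include <windows.h>
#include <deque>
#include <vector>

// Cap the number of outstanding overlapped WSASend operations; anything
// beyond the cap is buffered and issued as completions come back, which
// bounds the locked-pages / non-paged-pool usage described above.
struct SendContext {
    OVERLAPPED ov{};             // first member: the OVERLAPPED* returned by the
                                 // completion port can be cast back to SendContext*
    std::vector<char> payload;   // must stay alive until the send completes
};

class PacedSender {
public:
    explicit PacedSender(SOCKET s) : sock_(s) {}

    void Send(std::vector<char> data) {
        if (pending_ < kMaxPending) Issue(std::move(data));
        else backlog_.push_back(std::move(data));      // over the limit: hold it
    }

    // Call from your IOCP completion handling for one finished send.
    void OnSendComplete(SendContext* ctx) {
        delete ctx;
        --pending_;
        if (!backlog_.empty()) {                       // feed buffered data back in
            Issue(std::move(backlog_.front()));
            backlog_.pop_front();
        }
    }

private:
    void Issue(std::vector<char> data) {
        auto* ctx = new SendContext;
        ctx->payload = std::move(data);
        WSABUF buf{ static_cast<ULONG>(ctx->payload.size()), ctx->payload.data() };
        int rc = WSASend(sock_, &buf, 1, nullptr, 0, &ctx->ov, nullptr);
        if (rc == 0 || WSAGetLastError() == WSA_IO_PENDING)
            ++pending_;                                // a completion packet will arrive
        else
            delete ctx;                                // immediate failure
    }

    static const int kMaxPending = 16;                 // illustrative tuning knob
    SOCKET sock_;
    int pending_ = 0;
    std::deque<std::vector<char>> backlog_;
};
```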