C++ socket programming: max size of TCP/IP socket buffer?

I am using C++ TCP/IP sockets. According to my requirements, my client has to connect to a server and read the messages it sends (nothing new there, I know), but in my application I have to wait for some time (typically 1-2 hours) before I actually start reading messages (through recv() or read()), while the server keeps on sending them.
I want to know whether there is a limit on the capacity of the buffer that holds those messages while they are unread, and whose physical memory is used to buffer them: the sender's or the receiver's?

TCP data is buffered at both sender and receiver. The size of the receiver's socket receive buffer determines how much data can be in flight without acknowledgement, and the size of the sender's send buffer determines how much data can be sent before the sender blocks or gets EAGAIN/EWOULDBLOCK, depending on blocking/non-blocking mode. You can set these socket buffers as large as you like, subject to OS limits (the effective TCP window tops out at 1 GB with RFC 1323 window scaling), but if you set the client receive buffer higher than 64 KB (2^16-1 bytes) you must do so before connecting the socket, so that window scaling can be negotiated in the connect handshake and the extra window bits can come into play. [The server receive buffer isn't relevant here, but if you set it >= 64 KB you need to set it on the listening socket, from which it will be inherited by accepted sockets, again so the handshake can negotiate window scaling.]
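As a minimal client-side sketch of that ordering (the address and port are hypothetical, error handling mostly omitted): the receive buffer is set before connect() so that the handshake can negotiate window scaling.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        // Set the receive buffer BEFORE connect() so the SYN can carry a
        // window-scale option large enough to cover it.
        int rcvbuf = 4 * 1024 * 1024;  // 4 MB, an arbitrary "large" choice
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8080);                      // hypothetical port
        inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);  // hypothetical server

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0)
            perror("connect");

        close(fd);
    }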
However, I agree entirely with Martin James that this is a silly requirement. It wastes a thread, a thread stack, a socket, a large socket send buffer, an FD, and all the other associated resources at the server for two hours, and possibly affects other threads and therefore other clients. It also falsely gives the server the impression that two hours' worth of data has been received, when it has really only been transmitted to the receive buffer, which may lead to unknown complications in recovery situations: for example, the server may be unable to reconstruct data it has sent so far ahead. You would be better off not connecting until you are ready to start receiving the data, or else reading and spooling the data to yourself at the client for processing later.

Related

C++: How to measure real upload rate on non-blocking sockets

I'm writing a program on Linux in C++ using non-blocking sockets with epoll, waiting for EPOLLOUT in order to send() some data.
My question is: I've read that in non-blocking mode the data is copied to the kernel's buffer, so a send() call may return immediately, indicating that all the data has been sent, when in reality it was only copied to the kernel's buffer.
How do I know when the data was actually sent and received by the remote peer, so that I can compute the real transfer rate?
Whether in non-blocking mode or not, send() will return as soon as the data is copied into the kernel buffer. The difference between blocking and non-blocking mode shows when the buffer is full: blocking mode suspends the current thread until space frees up and the write can take place, while non-blocking mode returns immediately with EAGAIN or EWOULDBLOCK.
In a TCP connection, the kernel buffer is normally sized to match the window, so as soon as too much data remains unacknowledged, sending blocks. This means the sender is aware of how fast the remote end is receiving data.
With UDP it is a bit more complex because there are no acknowledgements. Here only the receiving end can measure the true speed, since sent data may be lost en route.
In both the TCP and UDP cases, the kernel will not attempt to send data that the link layer is unable to process. The link layer can also throttle the data if the network is congested.
Getting back to your case: when using non-blocking sockets, you can measure the network speed provided you handle the EAGAIN or EWOULDBLOCK errors correctly. This is certainly true for TCP, where you send more data than the current window size (probably 64K or so), and you can get an idea of the link-layer speed with UDP sockets as well.
You can get the current amount of data in the kernel's socket buffers using an ioctl, which would let you check what has actually been sent. I'm not sure it matters much, though: unless you have massive buffers and a tiny amount of data to send, it's probably not of interest.
Investigate the TIOCOUTQ/TIOCINQ ioctls on your socket fd.
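A Linux-only sketch of that query, assuming a connected TCP socket fd (these constants are also aliased as SIOCOUTQ/SIOCINQ in linux/sockios.h):

    #include <sys/ioctl.h>  // TIOCOUTQ/TIOCINQ on Linux

    // Query the kernel's per-socket queue levels for a connected TCP socket.
    void query_queues(int fd, int& unsent, int& unread) {
        unsent = unread = 0;
        ioctl(fd, TIOCOUTQ, &unsent);  // bytes still in the send queue
                                       // (including sent-but-unacknowledged)
        ioctl(fd, TIOCINQ, &unread);   // bytes received but not yet read
    }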
My question is: I've read that in non-blocking mode the data is copied to the kernel's buffer
That happens in all modes, not just non-blocking mode. I suggest you review your reading matter.
so a send() call may return immediately, indicating that all the data has been sent, when in reality it was only copied to the kernel's buffer.
Again, that is true in all modes.
How do I know when the data was actually sent and received by the remote peer, so that I can compute the real transfer rate?
When you've sent all the data, shut down the socket for output, then either set blocking mode and read, or keep selecting for 'readable'; in either case, read the EOS that should result. That functions as a peer acknowledgement of the close. Then stop the timer.
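A rough sketch of that timing technique, under the stated assumption that the peer closes its end once it has consumed everything (blocking mode, error handling trimmed):

    #include <sys/socket.h>
    #include <chrono>

    // After sending everything, half-close and wait for the peer's EOS,
    // then stop the clock.
    double time_until_peer_done(int fd, std::chrono::steady_clock::time_point start) {
        shutdown(fd, SHUT_WR);              // half-close: "no more data from me"
        char buf[4096];
        while (recv(fd, buf, sizeof(buf), 0) > 0)
            ;                               // drain until recv() returns 0 (EOS)
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();             // seconds from start to peer close
    }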
send() merely puts data into the kernel's buffer and then returns, letting the kernel perform the actual transmission in the background, so all you can really measure is the speed at which the kernel is accepting your outgoing data. You can't really measure the actual transmission speed unless the peer sends an acknowledgement for every buffer received (and there is no way to detect when TCP's own ACKs are received). But using the fact that send() can block when too much data is still in flight can help you figure out how fast your code is passing outgoing data to send().
send() tells you how many bytes were accepted, so it is easy to calculate an approximate acceptance speed: divide the number of bytes accepted by the time elapsed since the previous call. For example, call send() with X bytes and get Y bytes accepted, recording the time as time1; call send() again with X bytes and get Y bytes accepted, recording the time as time2. Your code is then handing over data at roughly Y / (time2 - time1) bytes per millisecond (with times in ms), which you can convert to B/KB/MB/GB per ms/sec/min/hr as needed. Over the lifetime of the transfer, that gives you a fairly good idea of your app's general transmission speed.
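A minimal sketch of that bookkeeping for a blocking socket (a non-blocking one would additionally need EAGAIN/EWOULDBLOCK handling); the fd and data are assumed to come from the caller:

    #include <sys/socket.h>
    #include <chrono>
    #include <cstdio>

    // Measure how fast the kernel accepts outgoing data on a connected socket.
    void send_and_measure(int fd, const char* data, size_t len) {
        using clock = std::chrono::steady_clock;
        size_t total = 0;
        auto t0 = clock::now();
        while (total < len) {
            ssize_t n = send(fd, data + total, len - total, 0);
            if (n <= 0) break;          // real code: handle errors properly
            total += static_cast<size_t>(n);
        }
        double secs = std::chrono::duration<double>(clock::now() - t0).count();
        if (secs > 0)
            std::printf("accepted %zu bytes at %.0f B/s\n", total, total / secs);
    }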

TCP sockets: Where does incoming data go after ACK (leaves TCP read buffer) but before read()/recv()?

If I have a TCP connection that transfers data at 200 KB/sec but I only read()/recv() from the socket once a second, where are those 200 KB of data stored in the meanwhile?
As far as I know, data leaves the TCP socket's read buffer after an ACK is sent to the sender, and the buffer is too small anyway to hold 200 KB of data. So where does the data wait until it can be read()/recv()'d by my client?
Thanks!!
The following answer claims data leaves the TCP read buffer as soon as it is ACK'ed, before being read()/recv()d:
https://stackoverflow.com/a/12934115/2378033
"The size of the receiver's socket receive buffer determines how much data can be in flight without acknowledgement"
Could it be that my assumption is wrong and the data gets ACK'd only after it is read()/recv()d by the userspace program?
data leaves the TCP socket's read buffer after an ACK is sent to the sender
No. It leaves the receive buffer when you read it, via recv(), recvfrom(), read(), etc.
The following answer claims data leaves the TCP read buffer as soon as it is ACK'ed
Fiddlesticks. I wrote it, and it positively and absolutely doesn't 'claim' any such thing.
You are thinking of the send buffer. Data is removed from the sender's send buffer when it is ACKed by the receiver. That's because the sender now knows it has arrived and doesn't need it for any more resends.
Could it be that my assumption is wrong and the data gets ACK'd only after it is read()/recv()d by the userspace program?
Yes, your assumption is wrong, and so is this alternative speculation. The data gets ACK'd on arrival, and removed by read()/recv().
When data is correctly received it enters the TCP read buffer and is subject to acknowledgement immediately. That doesn't mean that the acknowledgement is sent immediately, as it will be more efficient to combine the acknowledgement with a window size update, or with data being sent over the connection in the other direction, or acknowledgement of more data.
For example, suppose you are sending one byte at a time, corresponding to a user's typing, and the other side has a receive buffer of 50000 bytes. It tells you that the window size is 50000 bytes, meaning that you can send that many bytes of data without receiving anything further, and every byte of data you send closes the window by one byte.
Now the receiver could send a packet acknowledging the single byte as soon as it was correctly received and entered the TCP receive buffer, with a window size of 49999 bytes, because that is how much space is left in the receive buffer. The acknowledgement would allow you to remove the byte from your send buffer, since you now know that the byte was received correctly and will not need to be resent. Then, when the application read it from the TCP receive buffer using read() or recv(), that would make space in the buffer for one additional byte of data, so the receiver could send another packet updating the TCP window size by one byte to allow you to once again send 50000 bytes, rather than 49999. Then the application might echo the character or send some other response to the data, causing a third packet to be sent.
Fortunately, a well-designed TCP implementation will not do that, as it would create a lot of overhead. It will ideally send a single packet containing any data going in the other direction as well as any acknowledgement and window-size update as part of the same packet. It might then appear that the acknowledgement is sent when the application reads the data and it leaves the receive buffer, but that may simply be the event that triggered the sending of the packet. However, TCP will not always delay an acknowledgement and will not delay it indefinitely; after a short timeout with no other activity it will send any delayed acknowledgement.
As for the size of the receive buffer, which holds the received data not yet read by the application: that can be controlled using setsockopt() with the SO_RCVBUF option. The default varies by OS, memory size, and other parameters. For example, a fast connection with high latency (e.g. satellite) may warrant larger buffers, although that increases memory use. There is also a send buffer (SO_SNDBUF), which holds data that has either not yet been transmitted or has been transmitted but not yet acknowledged.
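A small illustrative sketch of tuning both buffers (the sizes are arbitrary); note that Linux doubles the value you pass to SO_RCVBUF/SO_SNDBUF for bookkeeping overhead, which getsockopt() then reports:

    #include <sys/socket.h>
    #include <cstdio>

    void tune_buffers(int fd) {
        int rcv = 1 * 1024 * 1024;   // 1 MB receive buffer (illustrative)
        int snd = 512 * 1024;        // 512 KB send buffer (illustrative)
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv));
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd));

        // Read back what the kernel actually granted (Linux reports 2x).
        socklen_t len = sizeof(rcv);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, &len);
        len = sizeof(snd);
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, &len);
        std::printf("effective rcvbuf=%d sndbuf=%d\n", rcv, snd);
    }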
Your OS will buffer a certain amount of incoming TCP data. For example, on Solaris this defaults to 56K but can reasonably be configured up to several MB if heavy bursts are expected. Linux appears to default to much smaller values, but you can see instructions on this web page for increasing those defaults: http://www.cyberciti.biz/faq/linux-tcp-tuning/

Increasing TCP Window Size

I have some doubts about increasing the TCP window size in my application. In my C++ application, we send data packets of around 1 KB from client to server using a blocking TCP/IP socket. Recently I came across the concept of TCP window size, so I tried increasing the value to 64K using setsockopt() for both SO_SNDBUF and SO_RCVBUF. After increasing this value, I see some performance improvement on a WAN connection but not on a LAN connection.
My understanding of TCP window size is as follows:
The client sends data packets to the server. Upon filling the window, it waits for an ACK from the server for the first packet in the window. On a WAN connection, the ACK is delayed by the round-trip latency of around 100 ms, so increasing the TCP window size compensates for that ACK wait time and thereby improves performance.
I want to understand how the performance improves in my application.
In my application, even though the TCP window size (both send and receive buffers) is increased using setsockopt() at the socket level, we still keep the same packet size of 1 KB (i.e. the number of bytes we send from client to server in a single socket send). We have also disabled the Nagle algorithm (the built-in option that consolidates small packets into a larger one to avoid frequent socket calls).
My doubts are as follows:
1. Since I am using a blocking socket, each 1 KB send should block if an ACK doesn't come from the server. So how does the performance improve after increasing the TCP window size on the WAN connection alone? If I have misunderstood the concept of TCP window size, please correct me.
2. For sending 64K of data, I believe I still need to call the socket send function 64 times (since I am sending 1 KB per send through a blocking socket) even though I increased my TCP window size to 64K. Please confirm this.
3. What is the maximum limit of the TCP window size with window scaling enabled per RFC 1323?
I am not so good in my English. If you couldn't understand any of the above, please let me know.
First of all, there is a big misconception evident from your question: that the TCP window size is what is controlled by SO_SNDBUF and SO_RCVBUF. This is not true.
What is the TCP window size?
In a nutshell, the TCP window size determines how much follow-up data (packets) your network stack is willing to put on the wire before receiving acknowledgement for the earliest packet that has not been acknowledged yet.
The TCP stack has to live with and account for the fact that once a packet has been determined to be lost or mangled during transmission, every packet sent, from that one onwards, has to be re-sent since packets may only be acknowledged in order by the receiver. Therefore, allowing too many unacknowledged packets to exist at the same time consumes the connection's bandwidth speculatively: there is no guarantee that the bandwidth used will actually produce anything useful.
On the other hand, not allowing multiple unacknowledged packets at the same time would simply kill the bandwidth of connections that have a high bandwidth-delay product. Therefore, the TCP stack has to strike a balance between using up bandwidth for no benefit and not driving the pipe aggressively enough (and thus allowing some of its capacity to go unused).
The TCP window size determines where this balance is struck.
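As a worked example of that balance: a path with 100 Mbit/s of bandwidth and a 100 ms round-trip time has a bandwidth-delay product of 100 Mbit/s x 0.1 s = 10 Mbit, which is about 1.25 MB, so the window must allow roughly 1.25 MB of unacknowledged data in flight to keep such a pipe full.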
What do SO_SNDBUF and SO_RCVBUF do?
They control the amount of buffer space that the network stack has reserved for servicing your socket. These buffers serve to accumulate outgoing data that the stack has not yet been able to put on the wire and data that has been received from the wire but not yet read by your application respectively.
If one of these buffers is full you won't be able to send or receive more data until some space is freed. Note that these buffers only affect how the network stack handles data on the "near" side of the network interface (before they have been sent or after they have arrived), while the TCP window affects how the stack manages data on the "far" side of the interface (i.e. on the wire).
Answers to your questions
1. No. If that were the case, you would incur a round-trip delay for each packet sent, which would totally destroy the bandwidth of connections with high latency.
2. Yes, but that has nothing to do with either the TCP window size or the size of the buffers allocated to that socket.
3. According to all sources I have been able to find (example), scaling allows the window to reach a maximum size of 1 GB.
Since I am using a blocking socket, each 1 KB send should block if an ACK doesn't come from the server.
Wrong. Sending in TCP is asynchronous. send() just transfers the data to the socket send buffer and returns. It only blocks while the socket send buffer is full.
So how does the performance improve after increasing the TCP window size on the WAN connection alone?
Because you were wrong about it blocking until it got an ACK.
For sending 64K of data, I believe I still need to call the socket send function 64 times
Why? You could just call it once with the 64K data buffer.
(since I am sending 1 KB per send through a blocking socket)
Why? Or is this a repetition of your misconception under (1)?
even though I increased my TCP window size to 64K. Please confirm this.
No. You can send it all at once. No loop required.
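In practice a blocking send() usually accepts the whole buffer in one call, as this answer says, but POSIX does permit a short write (e.g. if interrupted by a signal), so defensive code often wraps it in a loop. A small sketch of such a helper, offered as a hedge rather than a requirement:

    #include <sys/socket.h>
    #include <cerrno>

    // Send the whole buffer on a blocking socket, retrying on short writes.
    bool send_all(int fd, const char* data, size_t len) {
        size_t total = 0;
        while (total < len) {
            ssize_t n = send(fd, data + total, len - total, 0);
            if (n < 0) {
                if (errno == EINTR) continue;  // interrupted: retry
                return false;                  // real error
            }
            total += static_cast<size_t>(n);
        }
        return true;  // e.g. send_all(fd, buf, 64 * 1024) in a single call
    }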
What is the maximum limit of the TCP window size with window scaling enabled per RFC 1323?
Much bigger than you will ever need.

Send buffer empty of Socket in Linux?

Is there a way to check whether the send buffer of a TCP connection is completely empty?
I haven't found anything so far, and I just want to make sure my server doesn't close a connection while data is still being transmitted to a certain client.
I'm using poll to check if I'm able to send data on a non-blocking socket, but that doesn't tell me whether EVERYTHING in the buffer has been sent, does it?
In Linux, you can query a socket's send queue with ioctl(sd, SIOCOUTQ, &bytes). See man ioctl for details.
The information is not completely reliable, in the sense that the data may already have been received by the remote host, since the buffer cannot be emptied until an ACK is received. You probably should not use it to add another level of flow control on top of TCP.
If the remote host actually closes the connection (or half-closes it), then the socket becomes unwritable, regardless of how much data might have been in the buffer. You can detect this condition by writing 0 bytes to the socket.
The more difficult (and often more likely) condition is the remote host becoming unreachable, because of network issues or because it crashes. In that case, data piles up in the send buffer, but that can also happen because the remote host's receive buffer is full (perhaps because the process reading it doesn't have enough resources to process its input). In the case of network routing issues, you might get a router notification (an ICMP error), which should make the socket unwritable; unfortunately, many network errors just result in black holes.
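A Linux-only sketch of draining the send queue before closing, using the SIOCOUTQ query described above; the retry bound and polling interval are arbitrary choices, and (per the caveat above) an empty queue only means the data was acknowledged, not processed:

    #include <sys/ioctl.h>
    #include <linux/sockios.h>  // SIOCOUTQ
    #include <unistd.h>

    // Wait (bounded) until the kernel reports the send queue empty, then close.
    void drain_and_close(int fd, int max_tries = 50) {
        for (int i = 0; i < max_tries; ++i) {
            int pending = 0;
            if (ioctl(fd, SIOCOUTQ, &pending) != 0 || pending == 0)
                break;                 // empty (or query failed): stop waiting
            usleep(100 * 1000);        // 100 ms between polls (illustrative)
        }
        close(fd);
    }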

UDP Server Socket Buffer Overflow

I am writing a C++ application on Linux. My application has a UDP server which sends data to clients on certain events. The UDP server also receives feedback/acknowledgements back from the clients.
To implement this I used a single UDP socket (e.g. int fdSocket) to send and receive data from all the clients. I bound this socket to port 8080 and set it to non-blocking mode.
I created two threads. In one thread I wait for an event to happen; if an event occurs, I use fdSocket to send data to all the clients (in a for loop).
In the other thread I use fdSocket to receive data from the clients (recvfrom()). This thread is scheduled to run every 4 seconds (i.e. every 4 seconds it calls recvfrom() to retrieve data from the socket buffer; since the socket is non-blocking, recvfrom() returns immediately if no UDP data is available, and then I sleep for 4 seconds).
The UDP feedback/acknowledgement from all the clients has a fixed payload of 20 bytes.
Now I have two questions related to this implementation:
1. Is it correct to use the same socket for sending/receiving UDP data with multiple clients?
2. How do I find the maximum number of UDP feedback/acknowledgement packets my application can handle without socket buffer overflow? (Since I am reading every 4 seconds, if I receive a lot of packets within those 4 seconds I might lose some, i.e. I need to find the rate in packets/sec I can handle safely.)
I tried to get the Linux socket buffer size for my socket (fdSocket) using the call getsockopt(fdSocket, SOL_SOCKET, SO_RCVBUF, (void *)&n, &m);. From this I discovered that my socket buffer size is 110592. But I am not clear on what data is stored in this buffer: only the UDP payload, the entire UDP packet, or even the entire Ethernet frame? I referred to this link to get some idea but got confused.
Currently my code is a little bit dirty; I will clean it up and post it here soon.
The following are the links I have referred before posting this question.
Linux Networking
UDP SentTo and Recvfrom Max Buffer Size
UDP Socket Buffer Overflow Detection
UDP broadcast and unicast through the same socket?
Sending from the same UDP socket in multiple threads
How to flush Input Buffer of an UDP Socket in C?
How to find the socket buffer size of linux
How do I get amount of queued data for UDP socket?
Having the socket read at a fixed interval of four seconds definitely sets you up for losing packets. The conventional, tried-and-true approach to non-blocking I/O is the de-multiplexer system calls select(2)/poll(2)/epoll(7). See if you can use these to capture/react to your other events as well.
On the other hand, since you are already using threads, you can just do a blocking recv(2) without that four-second sleep, as sketched below.
Read Stevens for an explanation of SO_RCVBUF.
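A minimal sketch of that dedicated receiver thread (the socket would need to be switched back to blocking mode for this; error handling trimmed; the 20-byte payload matches the question):

    #include <sys/socket.h>
    #include <netinet/in.h>

    // Dedicated receiver loop: blocks in recvfrom() instead of sleeping,
    // so packets are drained as fast as they arrive.
    void receive_loop(int fdSocket) {
        char payload[20];                       // fixed 20-byte feedback payload
        for (;;) {
            sockaddr_in from{};
            socklen_t fromlen = sizeof(from);
            ssize_t n = recvfrom(fdSocket, payload, sizeof(payload), 0,
                                 reinterpret_cast<sockaddr*>(&from), &fromlen);
            if (n < 0) break;                   // real code: check errno
            // ... process the acknowledgement from 'from' ...
        }
    }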
You can see the maximum allowed receive-buffer size with:
sysctl net.core.rmem_max
and raise it with:
sysctl -w net.core.rmem_max=8388608
You can also set the buffer size at run time (not exceeding the maximum above) by using setsockopt() to change SO_RCVBUF. You can see the buffer levels by looking at /proc/net/udp.
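A short sketch of that run-time call on the question's fdSocket, using the same 8 MB figure as the sysctl example:

    #include <sys/socket.h>

    // Request a large receive buffer at run time; the kernel caps the
    // effective size at net.core.rmem_max (see the sysctl above).
    void grow_rcvbuf(int fdSocket) {
        int bufsize = 8388608;  // 8 MB, matching the sysctl example
        setsockopt(fdSocket, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
    }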
The buffer is used to store the UDP header and application data; the rest belongs to lower layers.
Q: Is it correct to use the same socket for sending/receiving UDP data with multiple clients?
A: Yes, it is correct.
Q: How do I find the maximum number of UDP feedback/acknowledgement packets my application can handle without socket buffer overflow (since I am reading every 4 seconds, if I receive a lot of packets within those 4 seconds I might lose some, i.e. I need to find the rate in packets/sec I can handle safely)?
A: The bottleneck might be the network bandwidth, the CPU, or memory. You could simply run a test, using a client that sends ACKs to the server with consecutive sequence numbers, and verify whether there is packet loss at the server.