I am unable to find the specification of what it means for a TCP WSASend call to complete. Does the completion of a WSASend operation require that an ACK be received?
This question is relevant for slower networks with a 200 ms to 2 s ping (round-trip) time. Will it take 200 ms to 2 s for the WSASend completion callback to be invoked (or whatever completion mechanism is used)? Or will Windows wait for an ACK only for some packets, and consider the WSASend operation complete much sooner for all the others?
The exact behavior makes a big difference with regard to buffer life cycle management and in turn has a significant impact on performance (locking, allocation/deallocation, and reference counting).
WSASend does not guarantee the following:
That the data was sent (it might have been buffered)
That it was received (it might have been lost)
That the receiving application processed it (the sender cannot ever know this by principle)
It does not require a round-trip. In fact, with Nagle's algorithm enabled, small amounts of data may be held back (for up to around 200 ms when it interacts with the peer's delayed ACKs) in the hope that the application will send more. WSASend must return quickly so that Nagling has a chance to work.
If you require confirmation, change the application protocol so that you get a confirmation back. No other way to do it.
To clarify, even without Nagling (TCP_NODELAY), your send operation does not wait for an ACK. The data will be sent out to the network, but the remote side may delay its ACK, and TCP has no way to say "please ACK this data immediately". Data being sent does not mean it will ever be received: a second after the data was pushed out, the network could drop it into a black hole.
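A minimal sketch of such an application-level confirmation on the sender side, assuming a hypothetical protocol in which every message carries a sequence number and the peer echoes that number back once it has processed the message. `PendingAcks` is an illustrative name, not a Winsock API; the point is that a message is only known to have arrived when its confirmation comes back, not when WSASend completes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical sender-side bookkeeping for application-level confirmations.
// A send completion says nothing about delivery, so each message is kept
// here until the peer explicitly acknowledges its sequence number.
class PendingAcks {
public:
    // Assign the next sequence number and remember the payload.
    std::uint64_t track(std::string payload) {
        std::uint64_t seq = next_seq_++;
        pending_.emplace(seq, std::move(payload));
        return seq;
    }

    // Called when the peer's confirmation for `seq` arrives.
    // Returns false for unknown or duplicate acknowledgements.
    bool acknowledge(std::uint64_t seq) {
        return pending_.erase(seq) == 1;
    }

    // Messages not yet confirmed: candidates for resending, or for
    // reporting as undelivered if the connection drops.
    std::size_t unconfirmed() const { return pending_.size(); }

private:
    std::uint64_t next_seq_ = 1;
    std::map<std::uint64_t, std::string> pending_;
};
```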
It's not documented. It will likely be different depending on whether you have turned off send buffering. However, you always need to pay attention to the potential time that it will take to get a WSASend() completion, especially if you're using asynchronous calls. See this article of mine for details.
You get a WSASend() completion when the TCP stack has finished with your buffer. If you have NOT turned off send buffering by setting SO_SNDBUF to zero, then it likely means you will get a completion once the stack copies your data into its buffers. If you HAVE turned off send buffering, then it likely means that you will get a completion once you get an ACK (simply because the stack would need your buffer for any potential retransmissions). However, it's not documented.
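Because the buffer must stay valid until that completion arrives, whenever that turns out to be, one common pattern is to let a per-operation context own a reference-counted buffer and drop that reference only in the completion handler. The sketch below uses hypothetical names (`SendContext`, `begin_send`, `on_send_complete` are not Winsock APIs) and leaves the actual WSASend()/OVERLAPPED plumbing as comments so it stays portable.

```cpp
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<char>;

// Hypothetical per-operation context. In real IOCP code this struct would
// also embed the OVERLAPPED passed to WSASend() (omitted here).
struct SendContext {
    std::shared_ptr<Buffer> buffer;  // keeps the data alive while the stack may still use it
};

// Start a send: the context takes its own reference to the buffer.
SendContext* begin_send(std::shared_ptr<Buffer> buf) {
    SendContext* ctx = new SendContext{std::move(buf)};
    // Hypothetical: the real code would call WSASend() here with a WSABUF
    // pointing at ctx->buffer->data() and the context's OVERLAPPED.
    return ctx;
}

// Completion handler: the stack is done with the buffer, so release our reference.
// The buffer itself is freed only if no one else still holds it.
void on_send_complete(SendContext* ctx) {
    delete ctx;
}
```

The reference count means the same buffer can be handed to several pending sends (for example, a broadcast to many sockets) without any of them freeing it early.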
I'm aware that the ::send within a Linux TCP server can limit the sending of the payload such that ::send needs to be called multiple times until the entire payload is sent.
e.g. the payload is 1024 bytes
sent_bytes = ::send(fd, ...) returns only 256 bytes, so ::send needs to be called again.
Is there any way to know, before sending, exactly how many bytes will be accepted? That is, whether the socket will take the entire message, or whether the message will be split, and into how many bytes?
Example Case
Two messages are sent to the same socket by different threads at the same time, on the same TCP client, via ::send(). When messages are large, multiple calls to ::send() are required because not all the bytes are sent by the initial call, so I use the loop solution until all the bytes are sent. The loop is protected by a mutex, so it can be seen as thread safe, and each thread has to perform its sending after the other. But my worry is that, because TCP is a stream, the client will receive fragments of each message. I was thinking that by adding framing to each message I could rebuild the message on the client side, if I knew how many bytes are sent at a time.
Although the calls to ::send() are done sequentially, is there any chance that the byte stream is still mixed?
Effectively, could this happen:
Server Side
Message 1: "CiaoCiao"
Message 2: "HelloThere"
Client Side
Received Message: "CiaoHelloCiaoThere"
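For the framing idea in the question, a common choice is a length prefix. The sketch below assumes a 4-byte big-endian length followed by the payload (an illustrative format, not anything TCP or ::send() defines). Note what it does and does not solve: the receiver no longer needs to know how many bytes each ::send() or recv() moved, but framing cannot untangle bytes from two messages that were interleaved on the sending side; that must still be prevented by sending each framed message whole.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Length-prefixed framing: 4-byte big-endian length, then the payload.
std::string frame(const std::string& payload) {
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Receiver side: feed whatever recv() returned, in any chunk sizes,
// then pull out complete messages.
class Deframer {
public:
    void feed(const char* data, std::size_t len) { buf_.append(data, len); }

    // Returns the next complete message, or std::nullopt if more bytes are needed.
    std::optional<std::string> next() {
        if (buf_.size() < 4) return std::nullopt;
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i)
            n = (n << 8) | static_cast<unsigned char>(buf_[i]);
        std::size_t total = 4 + static_cast<std::size_t>(n);
        if (buf_.size() < total) return std::nullopt;
        std::string msg = buf_.substr(4, n);
        buf_.erase(0, total);
        return msg;
    }

private:
    std::string buf_;
};
```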
Although the calls to ::send() are done sequentially, is there any chance that the byte stream is still mixed?
Of course. Not only is there a chance of that, it is practically a certainty: sooner or later it is going to happen. Guaranteed.
sent to the same socket by different threads
It will be necessary to handle the synchronization at this level, by employing a mutex that each thread locks before sending its message and unlocking it only after the entire message is sent.
It goes without saying that this leaves open the possibility that a blocked/hung socket will leave a single thread holding this mutex for an excessive amount of time, until the socket times out and your execution thread ends up dealing with a failed send() or write(), in whatever fashion it already does now (you are, of course, checking the return value from send()/write() and handling the error conditions appropriately).
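A minimal POSIX sketch of that locking scheme, assuming one writer object per socket (`LockedSocketWriter` is a hypothetical name). The lock is taken once per message and held across every partial ::send(), so another thread's bytes can never land in the middle of it. `MSG_NOSIGNAL` is Linux-specific and turns a closed peer into an error return instead of a SIGPIPE; the hung-socket caveat above applies, since a stalled ::send() stalls every thread waiting on the lock.

```cpp
#include <cerrno>
#include <cstddef>
#include <mutex>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

class LockedSocketWriter {
public:
    explicit LockedSocketWriter(int fd) : fd_(fd) {}

    // Sends the whole message or reports failure; never interleaves with other threads.
    bool send_message(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);  // held until the last byte is accepted
        std::size_t sent = 0;
        while (sent < msg.size()) {
            ssize_t n = ::send(fd_, msg.data() + sent, msg.size() - sent, MSG_NOSIGNAL);
            if (n < 0) {
                if (errno == EINTR) continue;  // interrupted by a signal: retry
                return false;                  // real error: inspect errno, close the socket
            }
            sent += static_cast<std::size_t>(n);  // partial send: loop for the rest
        }
        return true;
    }

private:
    int fd_;
    std::mutex mutex_;
};
```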
There is no single, cookie-cutter, paint-by-numbers solution to this that works in every situation, in every program that needs to do something like this. Each solution needs to be tailored to each program's unique requirements and purpose. Just one possibility would be a dedicated execution thread that handles all socket input/output, with all your other execution threads sending their messages to the socket thread instead of writing to the socket directly. This would avoid having all execution threads wedged by a hung socket, at the expense of growing memory usage to hold all the unsent data.
But that's just one possible approach. The number of possible alternative solutions has no limit. You will need to figure out which logic/algorithm-based solution will work best for your specific program. There is no operating system/kernel-level indication that will give you any kind of guarantee as to the amount of data a send() or write() call on a socket will accept.
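A sketch of the dedicated-thread variant, under the assumption that each message is posted whole. `SocketSenderThread` and its `write_message` callback are hypothetical names; the callback stands in for the blocking write-everything loop on the real socket. Producers only touch the queue, so a hung socket stalls just this one thread, while the queue (memory) keeps growing.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class SocketSenderThread {
public:
    using WriteFn = std::function<void(const std::string&)>;

    explicit SocketSenderThread(WriteFn write_message)
        : write_message_(std::move(write_message)), worker_([this] { run(); }) {}

    ~SocketSenderThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains whatever is still queued, then exits
    }

    // Called from any thread; never touches the socket directly.
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stopping and nothing left to send
            std::string msg = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();               // producers keep posting while we write
            write_message_(msg);         // whole messages, in posting order
            lock.lock();
        }
    }

    WriteFn write_message_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stopping_ = false;
    std::thread worker_;  // last member: starts only after everything above is initialised
};
```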
Scenario is rather simple... not allowed to use "sendto()" so using "send()" instead...
Under winsock2.2, normal operation on a brand new i7 machine running Windows 7 Professional...
Using SOCK_DGRAM socket, Client and Server console applications connect over localhost (127.0.0.1) to test things ...
Have to use packets of constant size...
Client socket uses connect(), Server socket uses bind()...
Client sends N packets using a series of BLOCKING send() calls. Server only uses an ioctlsocket call with FIONREAD, running in a while loop to constantly printf() the number of bytes waiting to be received...
PACKETS GET LOST UNLESS I PUT SLEEP() WITH A CONSIDERABLE AMOUNT OF TIME... What I mean is that the number of bytes on the receiving socket differs between runs if I do not use SLEEP()...
Have played with changing buffer sizes, situation did not change much, except now there is no overflow, but the problem with the delay remains the same ...
I have seen many discussions about the issue between send() and recv(), but in this scenario, recv() is not even involved...
Thoughts anyone?
(P.S. The constraints under which I am programming are required for reasons beyond my control, so no WSA, .NET, MFC, STL, BOOST, QT or other stuff)
It is NOT an issue of buffer overflow, for three reasons:
(1) Both incoming and outgoing buffers are set, and checked to be, significantly larger than ALL of the information being sent.
(2) There is no recv(), only checking of the incoming buffer via the ioctl() call; recv() is called long after, upon user input.
(3) When a Sleep() of >40 ms is added between send()s, the whole thing works, i.e. if there was an overflow, no amount of Sleep() would have helped (again, see point (2)).
PACKETS GET LOST UNLESS I PUT SLEEP() WITH A CONSIDERABLE AMOUNT OF TIME... What I mean is that the number of bytes on the receiving socket differs between runs if I do not use SLEEP()...
This is expected behavior; as others have said in the comments, UDP packets can and do get dropped for any reason. In the context of localhost-only communication, however, the reason is usually that a fixed-size packet buffer somewhere is full and can't hold the incoming UDP packet. Note that UDP has no concept of flow control, so if your receiving program can't keep up with your sending program, packet loss is definitely going to occur as soon as the buffers get full.
As for what to do about it, the insert-a-call-to-sleep() solution isn't particularly good because you have no good way of knowing what the "right" sleep duration ought to be. (Too short a sleep() and you'll still drop packets; too long a sleep() and you're transferring data more slowly than you otherwise could; and of course the "best" value will likely vary from one computer to the next, or from one moment to the next, in non-obvious ways.)
One thing you could do is switch to a different transport protocol such as TCP, or (since you're only communicating within localhost), a simple pipe or socketpair. These protocols have the lossless FIFO semantics that you are looking for, so they might be the right tool for the job.
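On a POSIX system the socketpair route could look roughly like this (POSIX-only; Winsock has no socketpair(), so on Windows a loopback TCP connection would play the same role). The sketch assumes a hypothetical fixed packet size of 64 bytes. A connected AF_UNIX stream pair never drops data: when the buffer is full, a blocking send() waits instead of losing bytes, and because every packet has the same size the receiver just reads in fixed-size chunks.

```cpp
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

constexpr std::size_t kPacketSize = 64;  // hypothetical constant packet size

// Create a connected, lossless, in-order local channel (fds[0] <-> fds[1]).
bool make_local_channel(int fds[2]) {
    return ::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0;
}

// Read exactly one fixed-size packet; a stream may hand it over in pieces.
// Returns false on EOF or error.
bool read_packet(int fd, char* out) {
    std::size_t got = 0;
    while (got < kPacketSize) {
        ssize_t n = ::recv(fd, out + got, kPacketSize - got, 0);
        if (n <= 0) return false;
        got += static_cast<std::size_t>(n);
    }
    return true;
}
```

The sender side is simply the existing series of blocking send() calls, with no sleeps needed.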
Assuming you are required to use UDP, however, UDP packet loss will be a fact of life for you, but there are some things you can do to reduce packet loss:
send() in blocking mode, or, if using non-blocking send(), be sure to wait until select() reports the UDP socket as ready-for-write before calling send(). (I know you said you send() in blocking mode; I'm just including this for completeness.)
Make your SO_RCVBUF setting as large as possible on the receiving UDP socket(s). The larger the buffer, the lower the chance of it filling up to capacity.
In the receiving program, be sure that the thread that calls recv() does nothing else that would ever hold it off from getting back to the next recv() call. In particular, no blocking operations (even printf() is a blocking operation that can slow your thread down, especially under Windows, where the DOS prompt is infamous for slow scrolling under load).
Run your receiver's network recv() loop in a separate thread that does nothing else but call recv() and place the received data into a FIFO queue (or other shared data structure) somewhere. Then another thread can do the less time-critical work of examining and parsing the data in the FIFO, without fear of causing a dropped packet.
Run the UDP-receive thread at the highest priority you can convince the OS to let you run at. The fewer other tasks that can hold off the UDP-receive thread, the fewer opportunities for packets to get dropped during those hold-off periods.
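A POSIX sketch of two of the tips above: requesting a large SO_RCVBUF (and reading back what was actually granted) and a receive thread that does nothing but recv() and enqueue. All function and class names are hypothetical. The OS may cap or adjust the buffer request (Linux, for example, caps it at net.core.rmem_max and reports back double the value it applied); on Winsock the option value is passed as a `const char*`, so a cast would be needed there. `max_packets` exists only so the example terminates; a real receiver would loop until shutdown.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <sys/socket.h>
#include <sys/types.h>

// Ask for a large receive buffer; return the size actually granted, or -1 on error.
int set_large_rcvbuf(int fd, int requested_bytes) {
    if (::setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested_bytes, sizeof requested_bytes) != 0)
        return -1;
    int granted = 0;
    socklen_t len = sizeof granted;
    if (::getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0)
        return -1;
    return granted;
}

// Thread-safe FIFO between the receive thread and the slower parsing thread.
class PacketQueue {
public:
    void push(std::string pkt) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(pkt));
        }
        cv_.notify_one();
    }
    std::string pop() {  // blocks until a packet is available
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::string pkt = std::move(queue_.front());
        queue_.pop_front();
        return pkt;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
};

// Receive-thread body: recv(), enqueue, straight back to recv().
// No printf(), parsing or other blocking work happens here.
void receive_loop(int fd, PacketQueue& q, std::size_t max_packets) {
    char buf[65536];  // large enough for any UDP datagram
    for (std::size_t i = 0; i < max_packets; ++i) {
        ssize_t n = ::recv(fd, buf, sizeof buf, 0);
        if (n < 0) return;  // a real program would inspect errno here
        q.push(std::string(buf, static_cast<std::size_t>(n)));
    }
}

std::thread start_receiver(int fd, PacketQueue& q, std::size_t max_packets) {
    return std::thread(receive_loop, fd, std::ref(q), max_packets);
}
```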
Just keep in mind that no matter how clever you are at reducing the chances for UDP packet loss, UDP packet loss will still happen. So regardless you need to come up with a design that allows your programs to still function in a reasonably useful manner even when packets are lost. This could be done by implementing some kind of automatic-resend mechanism, or (depending on what you are trying to accomplish) by designing the protocol such that packet loss can simply be ignored.
I have found out that WSASend() may not send all of the data, for example if I asked it to send 800 bytes, it may only send 600 bytes.
Now my question is: are the situations where this can happen so rare that I should not bother handling this kind of event, or do I have to handle it? Can't I, for example, just show an error message to the user that not all data has been sent and abort the connection instead of trying to recover?
Note: I am using IOCP.
When sending using overlapped I/O and IOCP it's unlikely that you'll ever see a partial send. It's possibly more likely that you'll see a send with an error where you wanted to send a multiple of the system page size and only a smaller multiple was sent, but even that's fairly unlikely (and I don't have unit tests that force that condition as I view it as theoretical).
When sending via overlapped I/O over a TCP connection while your peer is receiving and processing data more slowly than you are sending it, the more likely situation you'll encounter is TCP flow control kicking in and your WSASend() calls taking longer and longer to complete.
It's really unlikely that you'll actually see an error either from a WSASend() call or a subsequent GetQueuedCompletionStatus() call. Things will just keep working until they don't...
It can happen any time the receiver is slower than the sender. You must handle it by rescheduling the remaining data to be written. Don't treat it as an error.
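A portable sketch of "reschedule the remaining data", assuming a hypothetical `issue` callback that stands in for whatever actually posts the overlapped WSASend() with a pointer and length (it is not a Winsock function). The completion handler advances an offset by the number of bytes transferred and posts the remainder. Re-posting like this is only safe when this is the only send outstanding on the socket; otherwise a later send may already have put its bytes on the wire.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

class PartialSendTracker {
public:
    using IssueFn = std::function<void(const char*, std::size_t)>;

    PartialSendTracker(std::string data, IssueFn issue)
        : data_(std::move(data)), issue_(std::move(issue)) {}

    // Post the first send for the whole buffer.
    void start() { issue_(data_.data(), data_.size()); }

    // Call from the completion handler with the number of bytes transferred.
    // Returns true once the whole buffer has been sent.
    bool on_complete(std::size_t bytes_transferred) {
        offset_ += bytes_transferred;
        if (offset_ >= data_.size()) return true;
        // Short completion: not an error, just post what is left.
        issue_(data_.data() + offset_, data_.size() - offset_);
        return false;
    }

private:
    std::string data_;   // must outlive every send posted from it
    IssueFn issue_;
    std::size_t offset_ = 0;
};
```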
As many of you know, the original send() may not write to the wire all of the bytes you ask it to. You can easily use a pointer and a loop to make sure all your data is sent.
However, I don't see how this works with WSASend() and completion ports. It returns immediately and you have no control over how much was sent (except through lpLength, which you can access in the routine). How does this get solved?
Do you have to call WSASend() in the routine multiple times in order to get all the data out? Doesn't this seem like a great disadvantage, especially if you want your data out in a particular order and multiple threads access the routines?
When you call WSASend with a socket that is associated with an IOCP and an OVERLAPPED structure you effectively pass off your data to the network stack to send. The network stack will give you a "completion" once the data buffer that you used is no longer required by the network stack. At that point you are free to reuse or release the memory used for your data buffer.
Note that the data is unlikely to have reached the peer at the point the completion is generated and the generation of the completion means nothing more than the network stack has taken ownership of the contents of the buffer.
This is different to how send operates. With send in blocking mode, the call to send will block until the network stack has used all of the data that you have supplied. For calls to send in non-blocking mode, the network stack takes as much data as it can from your buffer and then returns to you with details of how much it used; this means that possibly only some of your data has been used. With WSASend, generally, all of your data is used before you are notified.
It's possible for an overlapped WSASend to fail due to resource limits or network errors. It's unusual to get a failure which indicates that some data has been sent but not all. Usually it's all sent OK or none sent at all. However, it IS possible to get a completion with an error which indicates that some data has been used but not all. How you proceed from this point depends on the error (temporary resource limit or hard network fault) and how many other WSASends you have pending on that socket (zero or non-zero). You can only try and send the rest of the data if you have a temporary resource error and no other outstanding WSASend calls for this socket; and this is made more complicated by the fact that you don't know when the temporary resource limit situation will pass... If you ever have a temporary resource limit induced partial send and you DO have other WSASend calls pending, then you should probably abort the connection, as you may have garbled your data stream by sending part of the buffer from this WSASend call and then all (or part) of a subsequent WSASend call.
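The decision rule in that paragraph is small enough to write down directly. In this sketch the two enums are hypothetical classifications (mapping real Winsock error codes onto them is left to the application); it only encodes "retry the remainder if and only if the error is a temporary resource limit and no other WSASend is pending on this socket".

```cpp
#include <cstddef>

// Hypothetical classification of the error reported with a partial completion.
enum class SendFailure { TemporaryResourceLimit, NetworkFault };

enum class Action { RetryRemainder, AbortConnection };

Action on_partial_send(SendFailure failure, std::size_t other_pending_sends) {
    if (failure == SendFailure::TemporaryResourceLimit && other_pending_sends == 0)
        return Action::RetryRemainder;  // still unknown *when* the limit will clear
    // A hard fault, or later sends may already have been interleaved with this one:
    // the byte stream can no longer be trusted.
    return Action::AbortConnection;
}
```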
Note that it's a) useful and b) efficient to have multiple WSASend calls outstanding on a socket. It's the only way to keep the connection fully utilised. You should, however, be aware of the memory and resource usage implications of having multiple overlapped WSASend calls pending at one time (see here), as effectively you are handing control of the lifetime of your buffers (and thus the amount of memory and resources that your code uses) to the peer due to TCP flow control issues. See SIO_IDEAL_SEND_BACKLOG_QUERY and SIO_IDEAL_SEND_BACKLOG_CHANGE if you want to get really clever...
WSASend() on a completion port does not notify you until all of the requested data has been accepted by the socket, or until an error occurs, whichever happens first. It keeps working in the background until all of the data has been accepted (or errored). Until it notifies you, that buffer has to remain valid in memory, but your code is free to move on to do other things while WSASend() is busy. There is no notification when the data is actually transmitted to the peer. If you need that, then you have to implement an ACK in your data protocol so the peer can notify you when it receives the data.
First, regarding send: two different things may happen, depending on how the socket is configured.
If socket is in so-called blocking mode (the default) - the call to send will block the calling thread, until all the input buffer is consumed by the underlying network driver. (Note that this doesn't mean that the data has already arrived at the peer).
If the socket is switched to non-blocking mode, the call to send takes as much of the input as the underlying driver can consume immediately (possibly less than you asked for) and fails only if it cannot take any of it. In that case WSAGetLastError returns WSAEWOULDBLOCK, and the application should wait until it can retry the send. Instead of calling send in a loop, the application should get a notification from the system about the socket state change. Functions such as WSAEventSelect or WSAAsyncSelect may be used for this (as well as legacy select).
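The same pattern in POSIX terms, as a sketch: poll() stands in for WSAEventSelect/select, and EAGAIN/EWOULDBLOCK for WSAEWOULDBLOCK. The function names are hypothetical. A non-blocking send() may take only part of the buffer; when it would block, the loop waits for the socket to become writable instead of spinning on send(). `MSG_NOSIGNAL` is Linux-specific.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

bool set_nonblocking(int fd) {
    int flags = ::fcntl(fd, F_GETFL, 0);
    return flags >= 0 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

bool send_all_nonblocking(int fd, const char* data, std::size_t len) {
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = ::send(fd, data + sent, len - sent, MSG_NOSIGNAL);
        if (n >= 0) {
            sent += static_cast<std::size_t>(n);  // possibly a partial send
            continue;
        }
        if (errno == EINTR) continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK) return false;  // real error
        // Would block: sleep until the kernel has buffer space again.
        pollfd pfd{fd, POLLOUT, 0};
        if (::poll(&pfd, 1, -1) < 0 && errno != EINTR) return false;
    }
    return true;
}
```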
Now, with I/O completion ports and WSASend the story is somewhat different. When the socket is associated with the completion port - it's automatically transferred to a non-blocking mode.
If the call to WSASend can't be completed immediately (i.e. the network driver can't consume all the input), WSASend returns SOCKET_ERROR and WSAGetLastError returns WSA_IO_PENDING. This actually means that an asynchronous operation has started but has not finished yet.
That is, you should not call WSASend repeatedly, because the send operation is already in progress. When it finishes (either successfully or not) you'll get a notification on the I/O completion port; meanwhile the calling thread is free to do other things.
I have a question concerning boost::asio::ip::tcp::socket and the associated write functions. From reading the Wikipedia article on TCP, I understand that TCP includes acknowledgement messages as well as checksums. Unfortunately, I can't find any information on this in the boost::asio reference. As far as I understand, boost::asio uses the OS implementation of TCP, which should contain both features.
My question is: what do the functions boost::asio::write or boost::asio::async_write guarantee when called with a boost::asio::ip::tcp::socket? That is, what does it mean if the function returns, or the callback function is called, without error? I can imagine several possibilities:
1. Basically nothing: it only means that the program told the OS to send the data, but nothing more.
2. Data is underway, meaning that the OS acknowledged that it has sent the data.
3. Data has arrived, meaning that an acknowledgement message from the other side was received.
4. Data has arrived and is not corrupted: same as 3, plus the checksum adds up.
If it is not 4. is there a way to enforce this using boost::asio (I mean within boost::asio, not implementing it yourself)?
It is #1, which is the way it should be. There are no guarantees that the data will ever be sent.
You think you want #4, but you really don't. The fact that the remote peer's network stack received the correct data is probably irrelevant to your application. You really want to know whether the data was received and processed correctly, which is beyond the scope of TCP, but easy enough to implement on top of TCP. (I recommend reading up on the OSI Model for an introduction to what TCP can be expected to do. Basically, you want to ensure that your data gets to the right application, or perhaps more, and TCP only ensures that it gets as far as the computer that the application is running on.)
To do what you want, send an in-band acknowledgement over the TCP link. You can also put a SHA-2 or other hash of the data in the acknowledgement. You can also wait to send the acknowledgement until the data has been processed, e.g. until it has been written to disk and fsync() has been called.
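A minimal sketch of the "hash in the acknowledgement" idea. To stay self-contained it uses 64-bit FNV-1a as a stand-in for SHA-2; FNV-1a is not cryptographic, so it only detects accidental mismatches. The ack format (a single 64-bit hash) and the function names are hypothetical; how the value travels back over the TCP link is left to your own framing.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a: a simple, non-cryptographic stand-in for SHA-2 in this sketch.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Receiver: compute this only after the message has really been processed
// (e.g. written to disk and fsync()ed), then send it back in-band.
std::uint64_t make_ack(const std::string& processed_payload) {
    return fnv1a64(processed_payload);
}

// Sender: treat the message as delivered only if the echoed hash matches.
bool ack_matches(const std::string& sent_payload, std::uint64_t ack) {
    return fnv1a64(sent_payload) == ack;
}
```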
Locally detected errors will be reported. Connection error will also be reported.
If you are using TCP, a TCP ACK failure will be reported, but possibly only at a later read or write call (once the OS is notified of the failure).
So you can't be sure that a write you issue is actually received. No write error means only that the OS currently knows of no errors on the TCP connection you are using, and that it has buffered your data internally to transmit it to the TCP peer.