Calling WSASend() in completion port?


Many of you know the original "send()" will not necessarily write to the wire the number of bytes you ask it to. You can easily use a pointer and a loop to make sure your data is all sent.
However, I don't see how WSASend() and completion ports work in this case. It returns immediately and you have no control over how much was sent (except through an lpLength, which you have access to in the routine). How does this get solved?
Do you have to call WSASend() in the routine multiple times in order to get all the data out? Doesn't this seem like a great disadvantage, especially if you want your data out in a particular order and multiple threads access the routines?

When you call WSASend with a socket that is associated with an IOCP and an OVERLAPPED structure you effectively pass off your data to the network stack to send. The network stack will give you a "completion" once the data buffer that you used is no longer required by the network stack. At that point you are free to reuse or release the memory used for your data buffer.
Note that the data is unlikely to have reached the peer at the point the completion is generated and the generation of the completion means nothing more than the network stack has taken ownership of the contents of the buffer.
This is different to how send operates. With send in blocking mode the call to send will block until the network stack has used all of the data that you have supplied. For calls to send in non-blocking mode the network stack takes as much data as it can from your buffer and then returns to you with details of how much it used; this means that possibly only some of your data has been used. With WSASend, generally, all of your data is used before you are notified.
It's possible for an overlapped WSASend to fail due to resource limits or network errors. It's unusual to get a failure which indicates that some data has been sent but not all. Usually it's all sent OK or none sent at all. However it IS possible to get a completion with an error which indicates that some data has been used but not all. How you proceed from this point depends on the error (temporary resource limit or hard network fault) and how many other WSASends you have pending on that socket (zero or non-zero). You can only try to send the rest of the data if you have a temporary resource error and no other outstanding WSASend calls for this socket; and this is made more complicated by the fact that you don't know when the temporary resource limit situation will pass... If you ever have a temporary resource limit induced partial send and you DO have other WSASend calls pending then you should probably abort the connection, as you may have garbled your data stream by sending part of the buffer from this WSASend call and then all (or part) of a subsequent WSASend call.
Note that it's a) useful and b) efficient to have multiple WSASend calls outstanding on a socket. It's the only way to keep the connection fully utilised. You should, however, be aware of the memory and resource usage implications of having multiple overlapped WSASend calls pending at one time (see here), as effectively you are handing control of the lifetime of your buffers (and thus the amount of memory and resources that your code uses) to the peer due to TCP flow control. See SIO_IDEAL_SEND_BACKLOG_QUERY and SIO_IDEAL_SEND_BACKLOG_CHANGE if you want to get really clever...

WSASend() on a completion port does not notify you until all of the requested data has been accepted by the socket, or until an error occurs, whichever happens first. It keeps working in the background until all of the data has been accepted (or errored). Until it notifies you, that buffer has to remain active in memory, but your code is free to move on to do other things while WSASend() is busy. There is no notification when the data is actually transmitted to the peer. If you need that, then you have to implement an ACK in your data protocol so the peer can notify you when it receives the data.

First, regarding send. Two different things may happen, depending on how the socket is configured.
If the socket is in so-called blocking mode (the default), the call to send will block the calling thread until the whole input buffer is consumed by the underlying network driver. (Note that this doesn't mean that the data has already arrived at the peer.)
If the socket is switched to non-blocking mode, the call to send never blocks: the driver consumes as much of the input as it can and send returns that count, which may be less than requested. If nothing can be consumed at all, send fails and WSAGetLastError returns WSAEWOULDBLOCK. In that case the application should wait until it may retry. Instead of calling send in a loop, the application should get a notification from the system about the socket state change. Functions such as WSAEventSelect or WSAAsyncSelect may be used for this (as well as legacy select).
Now, with I/O completion ports and WSASend the story is somewhat different. An overlapped WSASend on a socket associated with a completion port never blocks the calling thread, whatever the socket's blocking mode.
If the call to WSASend can't be completed immediately (i.e. the network driver can't consume all the input), WSASend returns SOCKET_ERROR and WSAGetLastError returns WSA_IO_PENDING. This actually means that an asynchronous operation has started but not finished yet.
That is, you should not call WSASend repeatedly for the same data, because the send operation is already in progress. When it's finished (either successfully or not) you'll get the notification on the I/O completion port, but meanwhile the calling thread is free to do other things.
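As a rough sketch of the check described above: the helper name classify_wsasend and the SendStart enum are made up for illustration, and the non-Windows branch only defines stand-in constants so the snippet compiles elsewhere. In real code you would pass the return value of WSASend and the value of WSAGetLastError() read immediately after the call.

```cpp
#include <cassert>

#ifdef _WIN32
#include <winsock2.h>
#else
// Stand-ins so the sketch compiles off Windows; the values match the Windows headers.
constexpr int SOCKET_ERROR = -1;
constexpr int WSA_IO_PENDING = 997;  // same value as ERROR_IO_PENDING
#endif

// Hypothetical names, not part of Winsock.
enum class SendStart { CompletedInline, Pending, Failed };

// rc:        return value of an overlapped WSASend call
// lastError: WSAGetLastError() read right after the call (only meaningful
//            when rc == SOCKET_ERROR)
inline SendStart classify_wsasend(int rc, int lastError) {
    assert(rc == 0 || rc == SOCKET_ERROR);  // the only documented return values
    if (rc == 0) {
        // Finished immediately. By default a completion packet is still
        // queued to the completion port for this operation.
        return SendStart::CompletedInline;
    }
    if (lastError == WSA_IO_PENDING) {
        // Not an error: the operation was started and will complete later.
        return SendStart::Pending;
    }
    // A real failure: no completion will arrive for this operation.
    return SendStart::Failed;
}
```

Only the Failed case means the buffer and OVERLAPPED can be released straight away; in the other two cases they have to stay alive until the completion is dequeued.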

Related

Any way to know how many bytes will be sent on TCP before sending?

I'm aware that the ::send within a Linux TCP server can limit the sending of the payload such that ::send needs to be called multiple times until the entire payload is sent.
i.e. Payload is 1024 bytes
sent_bytes = ::send(fd, ...) where sent_bytes is only 256 bytes so this needs to be called again.
Is there any way to know exactly how many bytes can be sent before sending? If the socket will allow for the entire message, or that the message will be fragmented and by how much?
Example Case
2 messages are sent to the same socket by different threads at the same time, to the same TCP client, via ::send(). In some cases where messages are large, multiple calls to ::send() are required, as not all the bytes are sent by the initial call. Thus, I go with the loop solution until all the bytes are sent. The loop is mutexed, so it can be seen as thread safe: each thread has to perform its sending after the other. But my worry is that because TCP is a stream, the client will receive fragments of each message, and I was thinking that by adding framing to each message I could rebuild the message on the client side, if I knew how many bytes are sent at a time.
Although the call to ::send() is done sequentially, is there any chance that the byte stream is still mixed?
Effectively, could this happen:
Server Side
Message 1: "CiaoCiao"
Message 2: "HelloThere"
Client Side
Received Message: "CiaoHelloCiaoThere"

"Although the call to ::send() is done sequentially, is there any chance that the byte stream is still mixed?"
Of course. Not only is there a chance of that, it is pretty much going to be a certainty, at one point or another. Guaranteed.
"sent to the same socket by different threads"
It will be necessary to handle the synchronization at this level, by employing a mutex that each thread locks before sending its message and unlocking it only after the entire message is sent.
It goes without saying that this leaves open the possibility that a blocked or hung socket will cause a single thread to hold this mutex for an excessive amount of time, until the socket times out and your execution thread ends up dealing with a failed send() or write(), in whatever fashion it is already doing now (you are, of course, checking the return value from send/write, and handling the exception conditions appropriately).
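A minimal sketch of that mutex approach, assuming a write_some callable that stands in for ::send(fd, ...) and returns the number of bytes it accepted (the class name LockedSender and the callable are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

class LockedSender {
public:
    // Hypothetical stand-in for ::send(fd, ...): returns the number of bytes
    // accepted, or a value <= 0 on error.
    using WriteSome = std::function<long(const char*, std::size_t)>;

    explicit LockedSender(WriteSome write_some) : write_some_(std::move(write_some)) {
        assert(write_some_);  // an empty callable would throw when invoked
    }

    // Sends the whole message while holding the mutex, so partial writes from
    // different threads cannot interleave. Returns false on a failed write.
    bool send_message(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t sent = 0;
        while (sent < msg.size()) {
            const long n = write_some_(msg.data() + sent, msg.size() - sent);
            if (n <= 0) return false;
            sent += static_cast<std::size_t>(n);
        }
        return true;
    }

private:
    std::mutex mutex_;
    WriteSome write_some_;
};
```

The lock covers the whole loop rather than each individual write; locking per write would still let two messages interleave between iterations.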
There is no single, cookie-cutter, paint-by-numbers solution to this that works in every situation, in every program that needs to do something like this. Each eventual solution needs to be tailored to each program's unique requirements and purpose. Just one possibility would be a dedicated execution thread that handles all socket input/output, with all your other execution threads sending their messages to the socket thread instead of writing to the socket directly. This would avoid having all execution threads wedged by a hung socket, at the expense of the growing memory that holds all unsent data.
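The dedicated-thread idea could look roughly like the following. This is only a sketch: SenderThread and its send callback are hypothetical names, the callback stands in for whatever loop writes a complete message to the socket, and the queue is unbounded, which is exactly the memory cost mentioned above.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class SenderThread {
public:
    // Hypothetical hook: writes one complete message to the socket.
    using SendFn = std::function<void(const std::string&)>;

    explicit SenderThread(SendFn send)
        : send_(std::move(send)), worker_([this] { run(); }) {
        assert(send_);
    }

    // Drains whatever is still queued, then stops the worker.
    ~SenderThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();
    }

    // Called from any thread; never touches the socket itself.
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            // Blocks without polling until there is work or a stop request.
            ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stop requested and nothing left
            std::string msg = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();  // do the (possibly slow) send outside the lock
            send_(msg);
        }
    }

    SendFn send_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

Because only the worker calls send_, messages reach the socket one at a time in the order they were posted.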
But that's just one possible approach. The number of possible, alternative solutions has no limit. You will need to figure out which logic/algorithm based solution will work best for your specific program. There is no operating system or kernel level indication that will give you any kind of guarantee as to the amount of data a send() or write() call on a socket will accept.
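On the framing idea from the question: the receiver does not need to know how many bytes each send() accepted, only where each message ends. One common convention (an assumption here, nothing in the question prescribes it) is a 4-byte big-endian length prefix. A sketch, with encode_frame and FrameDecoder as made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prefixes the payload with its length as a 4-byte big-endian integer.
std::string encode_frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<char>((n >> shift) & 0xFF));
    frame += payload;
    return frame;
}

class FrameDecoder {
public:
    // Feed whatever recv() returned, in any chunking; returns every message
    // that has become complete as a result.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::vector<std::string> out;
        while (buffer_.size() >= 4) {
            std::uint32_t n = 0;
            for (int i = 0; i < 4; ++i)
                n = (n << 8) | static_cast<unsigned char>(buffer_[i]);
            if (buffer_.size() - 4 < n) break;  // body not fully received yet
            out.push_back(buffer_.substr(4, n));
            buffer_.erase(0, 4 + static_cast<std::size_t>(n));
        }
        return out;
    }

private:
    std::string buffer_;  // bytes received but not yet part of a full message
};
```

Framing only recovers message boundaries. It does not undo interleaving, so each frame still has to be written to the socket as one uninterrupted unit.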

What happens in boost::asio when TCP TX buffer fills up?

I'm trying to get to grips with boost asio but I'm having trouble understanding some of the behavior behind the asynchronous interface.
I have a simple setup with a client and a server.
The client calls async_write regularly with a fixed amount of data
The server polls for data regularly
What happens when the server stops polling for data ?
I guess the various buffers would fill up in the server OS and it would stop sending ACKs ?
Regardless of what happens it seems that the client can happily continue to send several gigabytes of data without receiving any error callback (doesn't receive any success either of course).
I assume the client OS stops accepting packets at one point since they can't be TX'ed ?
Does this means that boost::asio buffers data internally ?
If it does, can I use socket.cancel() to drop packets in case I don't want to wait for delivery ? (I need to make sure ASIO forgets about my packets so I can reuse old buffers for new packets)

asio doesn't buffer internally. And you will always get signaled if you can't transfer more data to the remote.
E.g. if you use synchronous writes in asio they will block until the data could be sent (or at least be copied into the kernel send buffers). If you use async writes the callback/acknowledgement will only be called once it could be sent. If you use nonblocking writes you get EAGAIN/WOULD_BLOCK errors. If you use multiple async_write's in parallel - well - you shouldn't do that, its behavior is undefined according to the asio docs:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
Guarantee in your application that you always only perform a single async write operation and once that finishes write the next piece of data. If you need to write data in between you would need to buffer that inside your application.
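The application-side buffering could be sketched like this. It deliberately does not use Boost: start_write is a hypothetical hook that, in a real program, would call boost::asio::async_write on the message and arrange for on_write_complete() to run from the completion handler. The sketch also assumes all calls happen on one thread (or one strand), so it has no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

class WriteQueue {
public:
    // Hypothetical hook that starts exactly one asynchronous write.
    using StartWrite = std::function<void(const std::string&)>;

    explicit WriteQueue(StartWrite start_write) : start_write_(std::move(start_write)) {}

    void send(std::string msg) {
        pending_.push_back(std::move(msg));
        // Only start a write if none is in flight; otherwise the message waits.
        if (pending_.size() == 1) start_write_(pending_.front());
    }

    // To be called once per finished write.
    void on_write_complete() {
        assert(!pending_.empty());
        pending_.pop_front();  // the in-flight message is no longer needed
        if (!pending_.empty()) start_write_(pending_.front());
    }

    std::size_t queued() const { return pending_.size(); }

private:
    StartWrite start_write_;
    // The front element is the message currently being written; it stays in
    // the deque so its storage outlives the asynchronous operation.
    std::deque<std::string> pending_;
};
```

A deque is used because push_back does not invalidate references to existing elements, so the buffer handed to the in-flight write stays valid while new messages are queued behind it.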

Should I make simultaneous WSASend() calls?

I know that in order to call WSASend() simultaneously, I need to provide each call with its own WSAOVERLAPPED and WSABUF instances. But this means that I have to keep track of these instances for each call, which will complicate things.
I think it would be a better idea if I create a thread that only makes WSASend() calls, not simultaneously but sequentially. This thread will wait on a queue that will hold WSASend() requests (each request will contain the socket handle and the string I want to send). When I eventually call WSASend() I will block the thread until I receive a wake up signal from the thread that waits on the completion port telling me that the WSASend() has been completed, and then I go on to fetch the next request.
If this is a good idea, then how should I implement the queue and how to make a blocking fetch call on it (instead of using polling)?

The WSABUF can be stack based as it is the responsibility of WSASend() to duplicate it before returning. The OVERLAPPED and the data buffer itself must live until the IOCP completion for the operation is extracted and processed.
I've always used an 'extended' OVERLAPPED structure which incorporates the data buffer, the overlapped structure AND the WSABUF. I then use a reference counting system to ensure that the 'per operation data' exists until nobody needs it any more (that is I take a reference before the API call initiates the operation and I release a reference when the operation is completed after removal of the completion from the IOCP - note that references aren't 100% necessary here but they make it easier to then pass the resulting data buffer off to other parts of the code).
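To make the 'extended' OVERLAPPED idea concrete, here is one possible shape for it. This is a sketch under assumptions, not the layout the answer's author actually uses: the name PerOperationData, the fixed 4096-byte buffer and the hand-rolled reference count are all made up, and the non-Windows branch only defines stand-in types so the snippet compiles elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

#ifdef _WIN32
#include <winsock2.h>
#else
// Stand-ins so the sketch compiles off Windows; not the real layouts.
struct OVERLAPPED { void* reserved[4]; };
struct WSABUF { unsigned long len; char* buf; };
#endif

struct PerOperationData {
    OVERLAPPED overlapped;   // must stay the first member (see from_overlapped)
    WSABUF wsabuf;           // describes `data` for the WSASend call
    char data[4096];         // the bytes being sent
    std::atomic<long> refs;  // lifetime: creator + pending I/O + anyone else

    PerOperationData(const char* bytes, std::size_t len)
        : overlapped(), wsabuf(), data(), refs(1) {
        assert(len <= sizeof(data));
        std::memcpy(data, bytes, len);
        wsabuf.buf = data;
        wsabuf.len = static_cast<unsigned long>(len);
    }

    void add_ref() { refs.fetch_add(1); }

    // Drops one reference; returns true if this call destroyed the object.
    bool release() {
        if (refs.fetch_sub(1) == 1) {
            delete this;
            return true;
        }
        return false;
    }

    // Recovers the full structure from the OVERLAPPED pointer that the
    // completion port hands back.
    static PerOperationData* from_overlapped(OVERLAPPED* ov) {
        return reinterpret_cast<PerOperationData*>(ov);
    }
};

// The cast above relies on the first member sharing the object's address.
static_assert(std::is_standard_layout<PerOperationData>::value,
              "OVERLAPPED must be at offset 0");
```

The intended use is to take a reference before calling WSASend with &op->overlapped and &op->wsabuf, and to release it after the completion has been dequeued and from_overlapped has turned the pointer back into the per-operation data.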
It is MOST optimal for a TCP connection to have the TCP "window size" of data in transit at any one time and to have some more data pending so that the window is always kept full and you are always sending at the maximum that the connection can take. To achieve this with overlapped I/O it's usually best to have many WSASend() calls pending. However, you don't want to have too many pending (see here) and the easiest way to achieve this is to track the number of bytes that you have pending, queue bytes for later transmission and send from your transmission queue when existing sends complete...
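The byte-tracking scheme in the last sentence might be sketched as follows. SendThrottle and its issue callback are hypothetical: the callback stands in for copying the data into a per-operation buffer and calling WSASend, and on_send_complete would be called from the completion handling code. Locking is left out, so this assumes a single thread or an external lock per connection.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

class SendThrottle {
public:
    // Hypothetical hook: starts one overlapped send for the given bytes.
    using IssueSend = std::function<void(const std::string&)>;

    SendThrottle(std::size_t max_pending_bytes, IssueSend issue)
        : max_pending_bytes_(max_pending_bytes), issue_(std::move(issue)) {}

    // Queues data and issues it right away if the byte budget allows.
    void send(std::string data) {
        queue_.push_back(std::move(data));
        pump();
    }

    // Call when a send of `bytes` bytes has completed.
    void on_send_complete(std::size_t bytes) {
        assert(bytes <= pending_bytes_);
        pending_bytes_ -= bytes;
        pump();
    }

    std::size_t pending_bytes() const { return pending_bytes_; }
    std::size_t queued() const { return queue_.size(); }

private:
    void pump() {
        // Issue strictly in FIFO order so the byte stream keeps its order.
        // With nothing pending, one send is always allowed, so a buffer
        // larger than the limit cannot stall the queue forever.
        while (!queue_.empty() &&
               (pending_bytes_ == 0 ||
                pending_bytes_ + queue_.front().size() <= max_pending_bytes_)) {
            std::string data = std::move(queue_.front());
            queue_.pop_front();
            pending_bytes_ += data.size();
            issue_(data);  // expected to copy the bytes before returning
        }
    }

    std::size_t max_pending_bytes_;
    IssueSend issue_;
    std::size_t pending_bytes_ = 0;
    std::deque<std::string> queue_;
};
```

The limit itself is a tuning choice; the SIO_IDEAL_SEND_BACKLOG_QUERY value mentioned in the first thread is one candidate for it.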

Should I handle the fact that WSASend() may not send all data?

I have found out that WSASend() may not send all of the data, for example if I asked it to send 800 bytes, it may only send 600 bytes.
Now my question is: are the situations where this can happen so extremely rare that I should not bother handling this kind of event? Or do I have to handle it? Can't I, for example, just show an error message to the user that not all data has been sent and abort the connection instead of trying to recover?
Note: I am using IOCP.

When sending using overlapped I/O and IOCP it's unlikely that you'll ever see a partial send. It's possibly more likely that you'll see a send with an error where you wanted to send a multiple of the system page size and only a smaller multiple was sent, but even that's fairly unlikely (and I don't have unit tests that force that condition as I view it as theoretical).
When sending via overlapped I/O over a TCP connection, the situation you are more likely to encounter is a peer that receives and processes data more slowly than you send it, that is, TCP flow control kicking in and your WSASend() calls taking longer and longer to complete.
It's really unlikely that you'll actually see an error either from a WSASend() call or a subsequent GetQueuedCompletionStatus() call. Things will just keep working until they don't...

It can happen any time the receiver is slower than the sender. You must handle it by rescheduling the remaining data to be written. Don't treat it as an error.

TCP WSASend Completion Criteria

I am unable to find the specification of what it means that a TCP WSASend call completes. Does the completion of a WSASend operation require that an ACK response be received?
This question is relevant for slower networks with a 200ms - 2s ping timeout. Will it take 200ms - 2s for the WSASend completion callback to be invoked (or whatever completion mechanism is used)? Or perhaps only on some packets will Windows wait for an ACK and consider the WSASend operation complete much faster for all other packets?
The exact behavior makes a big difference with regard to buffer life cycle management and in turn has a significant impact on performance (locking, allocation/deallocation, and reference counting).

WSASend does not guarantee the following:
That the data was sent (it might have been buffered)
That it was received (it might have been lost)
That the receiving application processed it (the sender cannot ever know this by principle)
It does not require a round-trip. In fact, with nagling enabled small amounts of data may be held back for a while (up to roughly 200ms when combined with delayed ACKs) in the hope that the application will send more. WSASend must return quickly so that nagling has a chance to work.
If you require confirmation, change the application protocol so that you get a confirmation back. No other way to do it.
To clarify, even without nagling (TCP_NODELAY) you do not get an ACK for your send operation. It will be sent out to the network but the remote side does not know that it should ACK. TCP has no way to say "please ACK this data immediately". Data being sent does not mean it will ever be received. The network could drop a second after the data was pushed out to a black hole.
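What "a confirmation in the application protocol" means on the sending side can be sketched like this. The scheme is an assumption for illustration (one sequence number per message, one ACK per sequence number), and AckTracker is a made-up name; the wire format and the retransmission policy are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

class AckTracker {
public:
    // Registers an outgoing message and returns the sequence number that
    // should travel with it on the wire.
    std::uint32_t track(std::string payload) {
        const std::uint32_t seq = next_seq_++;
        const bool inserted = unacked_.emplace(seq, std::move(payload)).second;
        assert(inserted);  // would only fail if the sequence number wrapped
        (void)inserted;
        return seq;
    }

    // Called when the peer's application-level ACK for `seq` arrives.
    // Returns false for an unknown or duplicate ACK.
    bool acknowledge(std::uint32_t seq) { return unacked_.erase(seq) == 1; }

    // Messages handed to the stack but not yet confirmed by the peer.
    std::size_t outstanding() const { return unacked_.size(); }

private:
    std::uint32_t next_seq_ = 1;
    std::map<std::uint32_t, std::string> unacked_;
};
```

A WSASend completion says nothing about these entries; only the peer's reply removes one.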

It's not documented. It will likely be different depending on whether you have turned off send buffering. However, you always need to pay attention to the potential time that it will take to get a WSASend() completion, especially if you're using asynchronous calls. See this article of mine for details.
You get a WSASend() completion when the TCP stack has finished with your buffer. If you have NOT turned off send buffering by setting SO_SNDBUF to zero then it likely means you will get a completion once the stack copies your data into its buffers. If you HAVE turned off send buffering then it likely means that you will get a completion once you get an ACK (simply because the stack should need your buffer for any potential retransmissions). However, it's not documented.
