Should I make simultaneous WSASend() calls?

Should I make simultaneous WSASend() calls? - c++

I know that in order to call WSASend() simultaneously, I need to provide for each call a unique WSAOVERLAPPED and WSABUF instances. But this means that I have to keep track of these instances for each call, which will complicate things.
I think it would be a better idea if I create a thread that only make WSASend() calls not simultaneously but rather sequentially. This thread will wait on a queue that will hold WSASend() requests (each request will contain the socket handle and the string I want to send). When I eventually call WSASend() I will block the thread until I receive a wake up signal from the thread that waits on the completion port telling me that the WSASend() has been completed, and then I go on to fetch the next request.
If this is a good idea, then how should I implement the queue and how to make a blocking fetch call on it (instead of using polling)?

The WSABUF can be stack based as it is the responsibility of WSASend() to duplicate it before returning. The OVERLAPPED and the data buffer itself must live until the IOCP completion for the operation is extracted and processed.
I've always used an 'extended' OVERLAPPED structure which incorporates the data buffer, the overlapped structure AND the WSABUF. I then use a reference counting system to ensure that the 'per operation data' exists until nobody needs it any more (that is I take a reference before the API call initiates the operation and I release a reference when the operation is completed after removal of the completion from the IOCP - note that references aren't 100% necessary here but they make it easier to then pass the resulting data buffer off to other parts of the code).
It is MOST optimal for a TCP connection to have the TCP "window size" of data in transit at any one time and to have some more data pending so that the window is always kept full and you are always sending at the maximum that the connection can take. To achieve this with overlapped I/O it's usually best to have many WSASend() calls pending. However, you don't want to have too many pending (see here) and the easiest way to achieve this is to track the number of bytes that you have pending, queue bytes for later transmission and send from your transmission queue when existing sends complete...

Related

What should I do if boost::beast write_some doesn't write everything?

I am sending data on a boost::beast::websocket
I would like to send the data synchronously, so am trying to decide if I should use write or write_some.
From this SO answer (which is about asio rather than beast specifically, but I assume(!) the same rules apply?) I understand that write will block until the entire message is confirmed sent, whereas write_some may return early, and will return the number of bytes sent which may not be all the bytes which were requested be sent.
In my particular use-case I am using a single thread, and the write is done from within this thread's context (ie: from inside a callback issued after entering io_context.run())
Since I don't want to block the caller for some indeterminate amount of time, I want to avoid using write if there is a more elegant solution.
So if I then turn to async_write, I am uncertain what I should do if the number of bytes is less than the number of bytes I requested be sent?
How I would normally handle this with standard tcp sockets is use non-blocking mode, and when I get back EWOULDBLOCK, enqueue the data and carry on. When the socket becomes writeable again, only then complete the write (much akin to an asio async_write). Since non-blocking is not supported in beast, I'm wondering what the analogous approach is?
Presumably I need to perform some additional write operation to ensure the rest of the bytes are sent in due course?
The beast docs say
Callers are responsible for synchronizing operations on the socket
using an implicit or explicit strand, as per the Asio documentation.
The websocket stream asynchronous interface supports one of each of
the following operations to be active at the same time:
async_read or async_read_some
async_write or async_write_some
async_ping or async_pong
async_close
Is it ok to start an async write of the remaining bytes, so long as I ensure that a new synchronous write/write_some isn't started before the outstanding async write has completed?
If I cannot start an async write to complete the send of the remaining bytes, how is one supposed to handle a synchronous write_some which doesn't completely send all bytes?
As to why I don't just use async_write always, I have additional slow processing to do after the attempt to write, such as logging etc. Since I am using a single thread, and the call to async_write happens within that thread, the write will only occur after I return control to the event loop.
So what I'd like to do is attempt to write synchronously (which will work in 90% of the cases) so the data is sent, and then perform my slow tasks which would otherwise delay the write. In the 10% of cases where a sync write doesn't complete immediately, then an alternative async_write operation should be employed - but only in the fallback situation.
Possibly related: I see that write_some has a flag fin, which should be set to true if this is the last part of the message.
I am only ever attempting to write complete messages, so should I always use true for this?

Any way to know how many bytes will be sent on TCP before sending?

I'm aware that the ::send within a Linux TCP server can limit the sending of the payload such that ::send needs to be called multiple times until the entire payload is sent.
i.e. Payload is 1024 bytes
sent_bytes = ::send(fd, ...) where sent_bytes is only 256 bytes so this needs to be called again.
Is there any way to know exactly how many bytes can be sent before sending? If the socket will allow for the entire message, or that the message will be fragmented and by how much?
Example Case
2 messages are sent to the same socket by different threads at the same time on the same tcp client via ::send(). In some cases where messages are large multiple calls to ::send() are required as not all the bytes are sent at the initial call. Thus, go with the loop solution until all the bytes are sent. The loop is mutexed so can be seen as thread safe, so each thread has to perform the sending after the other. But, my worry is that beacuse Tcp is a stream the client will receive fragments of each message and I was thinking that adding framing to each message I could rebuild the message on the client side, if I knew how many bytes are sent at a time.
Although the call to ::send() is done sequentially, is the any chance that the byte stream is still mixed?
Effectively, could this happen:
Server Side
Message 1: "CiaoCiao"
Message 2: "HelloThere"
Client Side
Received Message: "CiaoHelloCiaoThere"

Although the call to ::send() is done sequentially, is the any chance that
the byte stream is still mixed?
Of course. Not only there's a chance of that, it is pretty much going to be a certainty, at one point or another. It's going to happen at one point. Guaranteed.
sent to the same socket by different threads
It will be necessary to handle the synchronization at this level, by employing a mutex that each thread locks before sending its message and unlocking it only after the entire message is sent.
It goes without sending that this leaves open a possibility that a blocked/hung socket will result in a single thread locking this mutex for an excessive amount of time, until the socket times out and your execution thread ends up dealing with a failed send() or write(), in whatever fashion it is already doing now (you are, of course, checking the return value from send/write, and handling the exception conditions appropriately).
There is no single, cookie-cutter, paint-by-numbers, solution to this that works in every situation, in every program, that needs to do something like this. Each eventual solution needs to be tailored based on each program's unique requirements and purpose. Just one possibility would be a dedicated execution thread that handles all socket input/output, and all your other execution threads sending their messages to the socket thread, instead of writing to the socket directly. This would avoid having all execution thread wedged by a hung socket, at expense of grown memory, that's holding all unsent data.
But that's just one possible approach. The number of possible, alternative solutions has no limit. You will need to figure out which logic/algorithm based solution will work best for your specific program. There is no operating system/kernel level indication that will give you any kind of a guarantee as to the amount of a send() or write() call on a socket will accept.

boost::asio internal queue capacity

I was attempting to understand Boost Asio implementation and limitations. As I understand from here - https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/core/basics.html
When you do an async_receive_from call on a socket, the following things happen
The socket forwards the request to the I/O execution context.
The I/O execution context signals to the operating system that it should start an asynchronous connect.
The operating system indicates that the connect operation has completed by placing the result on a queue, ready to be picked up by the I/O execution context.
When using an io_context as the I/O execution context, your program must make a call to io_context::run() (or to one of the similar io_context member functions) in order for the result to be retrieved. A call to io_context::run() blocks while there are unfinished asynchronous operations, so you would typically call it as soon as you have started your first asynchronous operation.
Assuming I have very high throughput of data coming in, what I'm trying to understand is
Is there a possibility of data loss in step 2 above where IO execution context signals OS to perform the async receive operation? Can the OS get somehow overwhelmed with the volume of asynchronous reads?
In step 3 above, OS puts completed reads in a queue. What is the capacity of this queue? Can this queue overflow if for example, there was a burst of network traffic and all the threads running io_context::run() are occupied, hence read data keeps accumulating in the queue? Is this queue bounded or unbounded?
The ASIO code is open-source, but I'm fairly new to C++ and am finding it a little difficult to understand the code. Appreciate any help on these questions. Thanks!

There's no buffering in ASIO whatsoever; ASIO is a thin wrapper around native OS select/epoll/kqueue/IOCP (depending on OS) as well as non-blocking send/recv calls.
Your question can thus be re-phased as "what happens when I don't call recv fast enough?". As it turns out, that question has already been asked before, see What happens if one doesn't call POSIX's recv “fast enough”?.
Anyway, to answer the specific questions:
1. Is there a possibility of data loss in step 2 above where IO execution context signals OS to perform the async receive operation? Can the OS get somehow overwhelmed with the volume of asynchronous reads?
The OS can't get overwhelmed with async receive calls because you can have at most 1 active async receive and send per socket, and the number of sockets is limited.
2. ... What is the capacity of this queue? Can this queue overflow if for example, there was a burst of network traffic and all the threads running io_context::run() are occupied, hence read data keeps accumulating in the queue? Is this queue bounded or unbounded?
The queueing characteristics of a TCP stream are determined by the TCP receive buffer and TCP receive window. These are configurable in most modern OSes, and can even by dynamic. The receive buffer is bounded, and if you don't receive fast enough, TCP has built-in mechanisms to signal the sending side to slow down/retransmit (a.k.a. TCP Flow Control).
Similarly UDP has a receive buffer. When that one gets full, new incoming packets are dropped.

What happens in boost::asio when TCP TX buffer fills up?

I'm trying to get to grips with boost asio but I'm having trouble understanding some of the behavior behind the asynchronous interface.
I have a simple setup with a client and a server.
The client calls async_write regularly with a fixed amount of data
The server polls for data regularly
What happens when the server stops polling for data ?
I guess the various buffers would fill up in the server OS and it would stop sending ACKs ?
Regardless of what happens it seems that the client can happily continue to send several gigabytes of data without receiving any error callback (doesn't receive any success either of course).
I assume the client OS stops accepting packets at one point since they can't be TX'ed ?
Does this means that boost::asio buffers data internally ?
If it does, can I use socket.cancel() to drop packets in case I don't want to wait for delivery ? (I need to make sure ASIO forgets about my packets so I can reuse old buffers for new packets)

asio doesn't buffer internally. And you will always get signaled if you can't transfer more data to the remote.
E.g. if you use synchronous writes in asio they will block until the data could be sent (or at least be copied into the kernel send buffers). If you use async writes the callback/acknowledgement will only be called once it could be sent. If you use nonblocking writes you get EAGAIN/WOULD_BLOCK errors. If you use multiple async_write's in parallel - well - you shouldn't do that, it's behavior is undefined according to the asio docs:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
Guarantee in your application that you always only perform a single async write operation and once that finishes write the next piece of data. If you need to write data in between you would need to buffer that inside your application.

Calling WSASend() in completion port?

Many of you know the original "send()" will not write to the wire the amount of bytes you ask it to. Easily you can use a pointer and a loop to make sure your data is all sent.
However, I don't see how in WSASend() and completion ports work in this case. It returns immediately and you have no control over how much was sent (except in a lpLength which you have access in the routine). How does this get solved?
Do you have to call WSASend() in the routine multiple times in order the get all the data out? Doesn't this seem like a great disadvantage, especially if you want your data out in a particular order and multiple threads access the routines?

When you call WSASend with a socket that is associated with an IOCP and an OVERLAPPED structure you effectively pass off your data to the network stack to send. The network stack will give you a "completion" once the data buffer that you used is no longer required by the network stack. At that point you are free to reuse or release the memory used for your data buffer.
Note that the data is unlikely to have reached the peer at the point the completion is generated and the generation of the completion means nothing more than the network stack has taken ownership of the contents of the buffer.
This is different to how send operates. With send in blocking mode the call to send will block until the network stack has used all of the data that you have supplied. For calls to send in non-blocking mode the network stack takes as much data as it can from your buffer and then returns to you with details of how much it used; this means that some of your data has been used. With WSASend, generally, all of your data is used before you are notified.
It's possible for an overlapped WSASend to fail due to resource limits or network errors. It's unusual to get a failure which indicates that some data has been send but not all. Usually it's all sent OK or none sent at all. However it IS possible to get a completion with an error which indicates that some data has been used but not all. How you proceed from this point depends on the error (temporary resource limit or hard network fault) and how many other WSASends you have pending on that socket (zero or non-zero). You can only try and send the rest of the data if you have a temporary resource error and no other outstanding WSASend calls for this socket; and this is made more complicated by the fact that you don't know when the temporary resource limit situation will pass... If you ever have a temporary resource limit induced partial send and you DO have other WSASend calls pending then you should probably abort the connection as you may have garbled your data stream by sending part of the buffer from this WSASend call and then all (or part) of a subsequent WSASend call.
Note that it's a) useful and b) efficient to have multiple WSASend calls outstanding on a socket. It's the only way to keep the connection fully utilised. You should, however, be aware of the memory and resource usage implications of having multiple overlapped WSASend calls pending at one time (see here) as effectively you are handing control of the lifetime of your buffers (and thus the amount of memory and resources that your code uses) to the peer due to TCP flow control issues). See SIO_IDEAL_SEND_BACKLOG_QUERY and SIO_IDEAL_SEND_BACKLOG_CHANGE if you want to get really clever...

WSASend() on a completion port does not notify you until all of the requested data has been accepted by the socket, or until an error occurs, whichever happens first. It keeps working in the background until all of the data has been accepted (or errored). Until it notifies you, that buffer has to remain active in memory, but your code is free to move on to do other things while WSASend() is busy. There is no notification when the data is actually transmitted to the peer. IF you need that, then you have to implement an ACK in your data protocol so the peer can notify you when it receives the data.

First regarding send. Actually there may happen 2 different things, depending on how the socket is configured.
If socket is in so-called blocking mode (the default) - the call to send will block the calling thread, until all the input buffer is consumed by the underlying network driver. (Note that this doesn't mean that the data has already arrived at the peer).
If the socket is transferred to a non-blocking mode - the call to send will fail if the underlying driver may not consume all the input immediately. The GetLastError returns WSAEWOULDBLOCK in such a case. The application should wait until it may retry to send. Instead of calling send in a loop the application should get the notification from the system about the socket state change. Functions such as WSAEventSelect or WSAAsyncSelect may be used for this (as well as legacy select).
Now, with I/O completion ports and WSASend the story is somewhat different. When the socket is associated with the completion port - it's automatically transferred to a non-blocking mode.
If the call to WSASend can't be completed immediately (i.e. the network driver can't consume all the input) - the WSASend returns an error and GetLastError returns STATUS_PENDING. This actually means that an asynchronous operation has started but not finished yet**.
That is, you should not call WSASend repeatedly, because the send operation is already in the progress. When it's finished (either successfully or not) you'll get the notification on the I/O completion port, but meanwhile the calling thread is free to do other things.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js