Win32 Registered Socket I/O: cancelling pending receive operations? - c++

I've recently begun to implement a UDP socket receiver with Registered I/O on Win32. I've stumbled upon the following issue: I don't see any way to cancel pending RIOReceive()/RIOReceiveEx() operations without closing the socket.
To summarize the basic situation:
During normal operation I want to queue quite a few RIOReceive()/RIOReceiveEx() operations in the request queue to ensure that I get the best possible performance when it comes to receiving the UDP packets.
However, at some point I may want to stop what I'm doing. If UDP packets are still arriving at that point, fine, I can just wait until all pending requests have been processed. Unfortunately, if the sender has also stopped sending UDP packets, I still have the pending receive operations.
That in and of itself is not a problem, because I can just keep going once operations start again.
However, if I want to reconfigure the buffers in between, I run into an issue: the documentation states that it's an error to deregister a buffer with RIO while it's still in use, and as long as receive operations are pending, the buffers officially remain in use, so I can't do that.
What I've tried so far related to cancellation of these requests:
CancelIo() on the socket (no effect)
CancelSynchronousIo(GetCurrentThread()) (no effect)
shutdown(s, SD_RECEIVE) (success, but no effect, the socket even receives packets afterwards -- though shutdown probably wouldn't have been helpful anyway)
WSAIoctl(s, SIO_FLUSH, ...) because the docs of RIOReceiveEx() mentioned it, but that just gives me WSAEOPNOTSUPP on UDP sockets (probably only useful for TCP and probably also only useful for sending, not receiving)
Just for fun I tried to set setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, ...) with 1ms as the timeout -- and that doesn't appear to have any effect on RIO regardless of whether I call it before or after I queue the RIOReceive()/RIOReceiveEx() calls
Only closing the socket will successfully cancel the I/O.
I've also thought about doing RIOCloseCompletionQueue(), but there I wouldn't even know how to proceed afterwards, since there's no way of reassigning a completion queue to a request queue, as far as I can see, and you can only ever create a single request queue for a socket. (If there was something like RIOCloseRequestQueue() and that did cancel the pending requests, I'd be happy, but the docs only mention that closesocket() will free resources associated with the request queue.)
So what I'm left with is the following:
Either I have to write my logic so that the buffers that are being used are always fixed once the socket is opened, because I can't really ever change them in practice due to requests that could still be pending.
Or I have to close the socket and reopen it every time I want to change something here. But that is a race condition, because I'd have to bind the socket again, and I'd really like to avoid that if possible.
I've tested sending UDP packets to my own socket from a newly created different socket until all of the requests have been 'eaten up' -- and while that works in principle, I really don't like it, because if any kind of firewall rule decides to not allow this, the code would deadlock instantly.
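A minimal sketch of that self-send workaround, written against ordinary BSD sockets on a POSIX system rather than RIO, so it only illustrates the idea. The helper name send_wakeup_datagrams and the one-byte payload are assumptions made for this example; with RIO, each datagram would be expected to complete one pending RIOReceive().

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: send `count` one-byte datagrams to the address that
// `bound_fd` is bound to, so that `count` pending receives can complete.
// Returns the number of datagrams handed to the stack, or -1 on setup failure.
int send_wakeup_datagrams(int bound_fd, int count)
{
    assert(count >= 0);

    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(bound_fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0)
        return -1;
    if (addr.sin_family != AF_INET)
        return -1;  // this sketch handles IPv4 only

    // A socket bound to INADDR_ANY cannot be addressed as 0.0.0.0; use loopback.
    if (addr.sin_addr.s_addr == htonl(INADDR_ANY))
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (tx < 0)
        return -1;

    int sent = 0;
    const char payload = 0;
    for (int i = 0; i < count; ++i) {
        if (sendto(tx, &payload, 1, 0,
                   reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 1)
            ++sent;
    }
    close(tx);
    return sent;
}
```

The caller would pass the number of receives that are still outstanding. The firewall concern stays: if the datagrams are dropped, the pending receives never complete, so any wait for them should have a timeout.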
On Linux io_uring I can just cancel existing operations, or even exit the uring, and once that's done, I'm sure that there are no receive operations still active, but the socket is still there and accessible. (And on Linux it's nice that the socket still behaves like a normal socket, on Windows if I create the socket with the WSA_FLAG_REGISTERED_IO flag, I can't use it outside of RIO except for operations such as bind().)
Am I missing something here or is this simply not possible with Registered I/O?

Related

What common programming mistakes can cause stuck CLOSE_WAIT in epoll edge triggered mode?

I'm wondering what common programming situations or bugs might cause a server process of mine to leave connections in CLOSE_WAIT without actually closing the socket.
What I want to do is trigger this situation so that I can fix it. In a normal development environment I've not been able to trigger it, but the same code used on a live server occasionally produces them, so that after many days we have hundreds of them.
Googling for close_wait shows that it actually seems to be a very common problem, even in mature and supposedly well-written services like nginx.
CLOSE_WAIT is basically the state where the remote end has shut down the socket but the local application has not yet invoked close() on it. This usually happens when you are not expecting to read data from the socket and thus aren't watching it for readability.
Many applications for convenience sake will always monitor a socket for readability to detect a close.
A scenario to try out is this:
Peer sends 2k of data and immediately closes the connection
Your socket is then registered with epoll and gets a notification for readability
Your application only reads 1k of data
You stop monitoring the socket for readability
(I'm not sure if edge-triggered epoll will end up delivering the shutdown event as a separate event).
See also:
(from man epoll_ctl)
EPOLLRDHUP (since Linux 2.6.17)
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
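A sketch of how the scenario above is usually avoided in edge-triggered mode on Linux: register for EPOLLRDHUP as well as EPOLLIN, and on every notification read until recv() reports either EAGAIN or end of stream. The helper names drain_socket and watch_edge_triggered and the DrainResult enum are made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

enum class DrainResult { WouldBlock, PeerClosed, Error };

// Hypothetical helper: read everything currently available on `fd`.
// Adds the number of bytes read to *total and reports why reading stopped.
DrainResult drain_socket(int fd, std::size_t* total)
{
    assert(total != nullptr);
    char buf[1024];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0) {
            *total += static_cast<std::size_t>(n);
            continue;  // keep going: edge-triggered mode will not notify again
        }
        if (n == 0)
            return DrainResult::PeerClosed;  // caller should close(fd) now
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DrainResult::WouldBlock;  // fully drained, wait for next event
        return DrainResult::Error;
    }
}

// Hypothetical helper: register `fd` for edge-triggered readability and
// peer-shutdown notifications.
bool watch_edge_triggered(int epfd, int fd)
{
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLRDHUP | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}
```

If drain_socket() returns PeerClosed and the application closes the descriptor at that point, the connection does not linger in CLOSE_WAIT.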

Receiving data from already closed socket?

Suppose I have a server application - the connection is over TCP, using UNIX sockets.
The connection is asynchronous - in other words, clients' and servers' sockets are non-blocking.
Suppose the following situation: in some conditions, the server may decide to send some data to a connected client and immediately close the connection: using shutdown with SHUT_RDWR.
So, my question is - is it guaranteed, that when the client call recv, it will receive the (sent by the server) data?
Or, to receive the data, recv must be called before the server's shutdown? If so, what should I do (or, to be more precise, how should I do this), to make sure, that the data is received by the client?
You can control this behavior with "setsockopt(SO_LINGER)":
man setsockopt
SO_LINGER
Waits to complete the close function if data is present. When this option is enabled and there is unsent data present when the close function is called, the calling application is blocked during the close function until the data is transmitted or the connection has timed out. If this option is disabled, the close function returns without blocking the caller.
This option has meaning only for stream sockets.
See also:
man read
Beej's Guide to Network Programming
There's no guarantee you will receive any data, let alone this data, but the data pending when the socket is closed is subject to the same guarantees as all the other data: if it arrives it will arrive in order and undamaged and subject to TCP's best efforts.
NB 'Asynchronous' and 'non-blocking' are two different things, not two terms for the same thing.
Once you have successfully written the data to the socket, it is in the kernel's buffer, where it will stay until it has been sent and acknowledged. Shutdown doesn't cause the buffered data to get lost. Closing the socket doesn't cause the buffered data to get lost. Not even the death of the sending process would cause the buffered data to get lost.
You can observe the size of the buffer with netstat. The Send-Q column is how much data the kernel still wants to transmit.
After the client has acknowledged everything, the port disappears from the server. This may happen before the client has read the data, in which case it will be in Recv-Q on the client. Basically you have nothing to worry about. After a successful write to a TCP socket, every component is trying as hard as it can to make sure that your data gets to the destination unharmed regardless of what happens to the sending socket and/or process.
Well, maybe one thing to worry about: If the client tries to send anything after the server has done its shutdown, it could get a SIGPIPE and die before it has read all the available data from the socket.
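The behaviour described above can be observed locally with a small sketch. It uses an AF_UNIX stream socketpair as a stand-in for the TCP connection in the question, which is an assumption of this example, and the helper names send_and_close and read_until_eof are made up. MSG_NOSIGNAL is used so that a send to a vanished peer yields an error instead of SIGPIPE (Linux).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: send `msg`, then shut down and close the sending end
// immediately, as the server in the question does.
bool send_and_close(int fd, const std::string& msg)
{
    assert(fd >= 0);
    ssize_t n = send(fd, msg.data(), msg.size(), MSG_NOSIGNAL);
    bool ok = (n == static_cast<ssize_t>(msg.size()));
    shutdown(fd, SHUT_RDWR);
    close(fd);
    return ok;
}

// Hypothetical helper: read until recv() returns 0 (orderly shutdown by the
// peer) or fails, and return everything that was received.
std::string read_until_eof(int fd)
{
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0)
        out.append(buf, static_cast<std::size_t>(n));
    return out;
}
```

Calling read_until_eof() on the other end after send_and_close() has already returned still yields the message, followed by end of stream.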

Handling POSIX socket read() errors

Currently I am implementing a simple client-server program with just the basic functionalities of read/write.
However, I noticed that if, for example, my server calls write() to reply to my client and my client does not have a corresponding read() call, my server program will just hang there.
Currently I am thinking of using a simple timer to define a timeout count, and then to disconnect the client after a certain count, but I am wondering if there is a more elegant/or standard way of handling such errors?
There are two general approaches to prevent server blocking and to handle multiple clients by a single server instance:
use POSIX threads to handle each client's connection. If one thread blocks because of erroneous client, other threads will still continue to run. If the remote client has just disappeared (crashed, network down, etc.), then sooner or later the TCP stack will signal a timeout and the blocked write operation will fail with error.
use non-blocking I/O together with a polling mechanism, e.g. select(2) or poll(2). It is considerably harder to program using polling calls, though. Network sockets are made non-blocking using fcntl(2), and in cases where a normal write(2) or read(2) on the socket would block, an EAGAIN error is returned instead. You can use select(2) or poll(2) to wait for something to happen on the socket with an adjustable timeout period. For example, waiting for the socket to become writable means that you will be notified when there is enough socket send buffer space, e.g. because previously written data was flushed to the client machine's TCP stack.
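The second approach could look roughly like the following sketch for the write side. The helper names set_nonblocking and write_with_timeout are assumptions of this example, and MSG_NOSIGNAL is Linux-specific. Note that the timeout applies to each wait for writability, not to the call as a whole.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Switch a descriptor to non-blocking mode with fcntl(2).
bool set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Hypothetical helper: try to write `len` bytes to a non-blocking socket.
// Gives up when the socket stays unwritable for `timeout_ms` milliseconds.
// Returns the number of bytes written (possibly fewer than `len`), or -1 on error.
ssize_t write_with_timeout(int fd, const char* data, std::size_t len, int timeout_ms)
{
    assert(data != nullptr);
    std::size_t done = 0;
    while (done < len) {
        ssize_t n = send(fd, data + done, len - done, MSG_NOSIGNAL);
        if (n > 0) {
            done += static_cast<std::size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;  // real error, e.g. EPIPE or ECONNRESET

        // Send buffer is full: wait until the peer has drained some of it.
        pollfd p{fd, POLLOUT, 0};
        int r = poll(&p, 1, timeout_ms);
        if (r == 0)
            break;      // peer is not reading; let the caller decide what to do
        if (r < 0 && errno != EINTR)
            return -1;
    }
    return static_cast<ssize_t>(done);
}
```

A short return value tells the server that the client has stopped reading, at which point it can disconnect that client instead of hanging.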
If the client side isn't going to read from the socket anymore, it should close down the socket with close. And if you don't want to do that because the client still might want to write to the socket, then you should at least close the read half with shutdown(fd, SHUT_RD).
This will set it up so the server gets an EPIPE on the write call.
If you don't control the clients... if random clients you didn't write can connect, the server should handle clients actively attempting to be malicious. One way for a client to be malicious is to attempt to force your server to hang. You should use a combination of non-blocking sockets and the timeout mechanism you describe to keep this from happening.
In general you should write the protocols for how the server and client communicate so that neither the server nor the client is trying to write to the socket when the other side isn't going to be reading. This doesn't mean you have to synchronize them tightly or anything. But, for example, HTTP is defined in such a way that it's quite clear to either side whether or not the other side is really expecting them to write anything at any given point in the protocol.

Ensuring data is being read with async_read

I am currently testing my network application in very low bandwidth environments. I currently have code that attempts to ensure that the connection is good by making sure I am still receiving information.
Traditionally I have done this by recording the timestamp in my ReadHandler function so that each time it gets called I know I have received data on the socket. With very low bandwidths this isn't sufficient because my ReadHandler is not getting called frequently enough.
I was toying around with the idea of writing my own completion condition function (right now I am using transfer_at_least(1)), thinking it would get called more frequently and I could record my timestamp there, but I was wondering if there wasn't some other, more standard way to go about this.
We had a similar issue in production: some of our connections may be idle for days, but we must detect if the remote is dead ASAP.
We solved it by enabling TCP keep-alive (the SO_KEEPALIVE socket option):
boost::asio::socket_base::keep_alive option(true);
mSocketTCP.set_option(option);
This had to be accompanied by a new startup script that writes sensible values to /proc/sys/net/ipv4/tcp_keepalive_*, which have very long timeouts by default (on Linux).
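As an alternative to changing the system-wide values, Linux also allows the keep-alive timing to be overridden per socket. The sketch below is not what the answer above describes; it is an assumed variant using the Linux-specific TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options, and the helper name enable_keepalive is made up.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper (Linux): enable keep-alive on one TCP socket and set
//   idle_s     - seconds of idleness before the first probe is sent
//   interval_s - seconds between unanswered probes
//   probes     - unanswered probes before the connection is dropped
bool enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    assert(idle_s > 0 && interval_s > 0 && probes > 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == 0;
}
```

With Boost.Asio the descriptor would come from the socket's native_handle(), for example enable_keepalive(mSocketTCP.native_handle(), 30, 5, 3).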
You can use the read_some method to get partial reads, and deal with the bookkeeping. This is more efficient than transfer_at_least(1), but you still have to keep track of what is going on.
However, a cleaner approach is just to use a concurrent deadline_timer. If the timer goes off before you are finished, then the operation is taking too long and you cancel whatever is going on. If not, just stop the timer and continue. Something like:
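// io_service is assumed to be the application's existing boost::asio::io_service
boost::asio::deadline_timer t(io_service);
t.expires_from_now(boost::posix_time::seconds(20));
t.async_wait(boost::bind(&Class::timed_out, this, boost::asio::placeholders::error));
// Do stuff.
if (t.cancel() == 0) {
    // No pending wait was cancelled, so the timer already went off: abort
}

// And the timeout method
void Class::timed_out(const boost::system::error_code& error)
{
    // operation_aborted means cancel() was called before the timer expired
    if (error == boost::asio::error::operation_aborted) return;
    // Deal with the timeout, close the socket, etc.
}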
I don't know how to handle high network latency from within the application. Can you be sure whether it's network latency, or whether the peer server or peer application is busy and reacting slowly? Does it matter whether the network, the server, or the application is at fault?
Even if you can measure the network latency and find that it's high, what are you going to do?
You cannot improve the situation.
Consider another critical case, which is a subset of what you're trying to handle: the network is down (e.g. you disconnect the cable from your machine). Since it is a subset of your problem, you want to handle it too.
Let's examine the effect of the network going down on an active TCP connection. How can you discover whether your active TCP connection is still alive? Calling send() will succeed, but that merely says that the message was queued in the TCP outgoing queue in the kernel. The TCP stack will try to send it, but since no TCP ACK comes back, the TCP stack on your side will try to resend it again and again. You can see your message in the netstat output (Send-Q column).
I'm aware of the following ways to deal with it:
One standard way is TCP keep-alive, as proposed by @Cubby.
Another way is to implement your own Keep Alive mechanism: send a Keep Alive req message, and the peer is obligated to send back a Keep Alive ack message.
If you don't receive the ack message within a predefined timeout, try to send the Keep Alive req N more times (e.g. N=2). If there is still no success, close the socket and open it again. If the peer server is not available, you will not be able to open the connection, since the TCP three-way handshake requires the peer to respond.
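The retry logic of such an application-level keep-alive can be sketched as a small state machine that is independent of the socket code. The class name KeepAliveTracker, its Action enum and the tick-driven design are assumptions of this example; the caller is expected to send the actual req message and to close and reopen the socket when told to.

```cpp
#include <cassert>
#include <chrono>

class KeepAliveTracker {
public:
    enum class Action { None, SendRequest, Reconnect };
    using Clock = std::chrono::steady_clock;

    KeepAliveTracker(Clock::duration timeout, int max_retries, Clock::time_point now)
        : timeout_(timeout), max_retries_(max_retries), last_event_(now)
    {
        assert(max_retries >= 0);
    }

    // Call whenever a Keep Alive ack arrives from the peer.
    void on_ack(Clock::time_point now)
    {
        last_event_ = now;
        retries_ = 0;
        waiting_ = false;
    }

    // Call periodically; the result tells the caller what to do next.
    Action on_tick(Clock::time_point now)
    {
        if (now - last_event_ < timeout_)
            return Action::None;          // nothing is overdue yet
        last_event_ = now;
        if (!waiting_) {
            waiting_ = true;              // first request after an idle period
            retries_ = 0;
            return Action::SendRequest;
        }
        if (retries_ < max_retries_) {
            ++retries_;                   // previous request went unanswered
            return Action::SendRequest;
        }
        return Action::Reconnect;         // N retries failed: close and reopen
    }

private:
    Clock::duration timeout_;
    int max_retries_;
    Clock::time_point last_event_;
    int retries_ = 0;
    bool waiting_ = false;
};
```

With a timeout of 10 seconds and N=2, a silent peer leads to one request and two retries, and the fourth overdue tick returns Reconnect.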

some OVERLAPS using WSASend not returning in a timely manner using GetQueuedCompletionStatus?

Background: I'm using CreateIoCompletionPort, WSASend/Recv, and GetQueuedCompletionStatus to do overlapped socket io on my server. For flow control, when sending to the client, I only allow several WSASend() to be called when all pending OVERLAPs have popped off the IOCP.
Problem: Recently, there have been occasions when the OVERLAPs do not get returned to the IOCP. The thread calling GetQueuedCompletionStatus does not get them and they remain in my local pending queue. I've verified that the client DOES receive the data off the socket and that the socket is connected. No errors were returned when the WSASend() calls were made. The OVERLAPs simply "never" come back without an external stimulus like the following:
Disconnecting the socket from the client or server, immediately allows the GetQueuedCompletionStatus thread to retrieve the OVERLAPs
Making additional calls to WSASend(), sometimes several are needed, before all the OVERLAPs suddenly pop off the queue.
Question: Has anyone seen this type of behavior? Any ideas on what is causing this?
Thanks,
Geoffrey
WSASend() can fail to complete in a timely manner if the TCP window is full. In this case the stack can't send any more data so your WSASend() waits and your completion doesn't occur until the TCP stack CAN send more data.
If you happen to have a protocol between your client and server that has no flow control built into the protocol itself AND you aren't doing any flow control yourself based on write completions and are just sending data as fast as your server can send then you may get to a point where either the network or your client can't keep up and TCP flow control kicks in (when the TCP window gets full). If you continue to just fire off data asynchronously with additional calls to WSASend() then eventually you'll chew your way through all of the non-paged memory on the machine and at that point all bets are off (chances are high that a driver may cause the box to bluescreen).
So, in summary, completions from overlapped socket writes can and will sometimes take longer to come back than you may expect. In your example, I expect that the completions that you get when you close the socket are all failures?
I talk about this some more on my blog; here: http://www.lenholgate.com/blog/2008/07/write-completion-flow-control.html and here: http://www.serverframework.com/asynchronousevents/2011/06/tcp-flow-control-and-asynchronous-writes.html
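Flow control based on write completions, as mentioned in the answer, can be sketched independently of the Winsock calls. The class name SendThrottle and its interface are assumptions of this example: submit() would wrap the decision to call WSASend(), and on_completion() would be called when GetQueuedCompletionStatus() returns a send completion.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

class SendThrottle {
public:
    explicit SendThrottle(std::size_t max_in_flight) : max_(max_in_flight)
    {
        assert(max_in_flight > 0);
    }

    // Returns true if the caller should issue the send for `payload` now.
    // Returns false if the limit is reached; the payload is then copied
    // into an internal queue and sent later.
    bool submit(const std::string& payload)
    {
        if (in_flight_ < max_) {
            ++in_flight_;
            return true;
        }
        queued_.push_back(payload);
        return false;
    }

    // Call when one send has completed. If a payload was waiting, it is
    // moved into *next and true is returned: the caller should send it now.
    bool on_completion(std::string* next)
    {
        assert(in_flight_ > 0 && next != nullptr);
        if (queued_.empty()) {
            --in_flight_;
            return false;
        }
        *next = std::move(queued_.front());
        queued_.pop_front();
        return true;  // in-flight count unchanged: one finished, one started
    }

    std::size_t in_flight() const { return in_flight_; }
    std::size_t queued() const { return queued_.size(); }

private:
    std::size_t max_;
    std::size_t in_flight_ = 0;
    std::deque<std::string> queued_;
};
```

This bounds the number of outstanding overlapped sends per connection. The internal queue is still unbounded in this sketch, so a real server would also cap it or disconnect clients that fall too far behind.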