I am using Apache Thrift in C++ on Windows, and I would like to ask for your help with cancelling a blocking read operation that is in progress. The read operation (for example, TProtocol::readByte) blocks until data is received. When I close the transport from another thread, I get a failed assertion about a null pointer.
Is there any other way to cancel a blocked read operation?
Assuming you are running on Windows (according to the tags on your question): You can cancel a blocking socket operation with WSACancelBlockingCall (although this operation is deprecated, it should still work). Your socket will then return the error code WSAEINTR (Interrupted function call) instead of WSAETIMEDOUT.
In Thrift, you can use TSocket::getSocketFD() or TPipe::getPipeHandle() to get the corresponding handle for canceling the current operation.
If you're using blocking mode, the only option for aborting the read operation is to set a timeout on the TSocket before reading from it.
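For example, a minimal sketch (assuming the Thrift C++ TSocket API; the 5000 ms timeout and the shutdown-flag handling are illustrative, not from the original code):

    #include <thrift/transport/TSocket.h>
    #include <thrift/transport/TTransportException.h>
    #include <cstdint>

    using apache::thrift::transport::TSocket;
    using apache::thrift::transport::TTransportException;

    // Set a receive timeout so a blocked read returns periodically instead
    // of waiting forever; the reading thread can then check a shutdown flag.
    void readWithTimeout(TSocket& socket, volatile bool& shuttingDown) {
        socket.setRecvTimeout(5000);  // milliseconds
        uint8_t buf[256];
        while (!shuttingDown) {
            try {
                uint32_t n = socket.read(buf, sizeof(buf));
                // ... process n bytes ...
                (void)n;
            } catch (const TTransportException& e) {
                if (e.getType() != TTransportException::TIMED_OUT)
                    throw;  // a real transport error, not just the timeout
                // Timed out: loop around and re-check shuttingDown.
            }
        }
    }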
I am currently working on a server application in C++. My main inspirations are these examples:
Windows SDK IOCP Example
The I/O Completion Port IPv4/IPv6 Server Program Example
My app is closely modeled on these (socketobj, packageobj, ...).
In general, my app runs without issues. The only thing that still causes me trouble is half-open connections.
My strategy for this is: I check every connected client periodically and increment an "idle counter"; whenever a completion occurs, I reset it. If the idle counter gets too high, I set a boolean to prevent other threads from posting operations, and then call closesocket().
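In code, that sweep might look roughly like this (a sketch only; Connection, onSweepTimer, and onCompletion are hypothetical names, not the actual app's types):

    #include <winsock2.h>
    #include <atomic>
    #include <vector>

    struct Connection {
        SOCKET            socket;
        std::atomic<int>  idleTicks{0};
        std::atomic<bool> closing{false};
    };

    // Runs once per sweep period.
    void onSweepTimer(std::vector<Connection*>& clients, int limit) {
        for (Connection* c : clients) {
            if (!c->closing && ++c->idleTicks > limit) {
                c->closing = true;       // stop other threads posting new I/O
                closesocket(c->socket);  // pending operations should now complete
            }
        }
    }

    // Called whenever a completion packet arrives for this connection.
    void onCompletion(Connection* c) {
        c->idleTicks = 0;
    }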
My assumption was that once the socket is closed, the pending operations will complete (maybe not instantly, but after some time). This is also the behavior the MSDN documentation describes (hints, second paragraph). I need this because only after all operations have completed can I free the resources.
Long story short: this is not the case for me. I did some tests with my test client app and some cout and breakpoint debugging, and discovered that pending operations for closed sockets do not complete (even after waiting 10 minutes). I also tried a shutdown() call before the closesocket(); both returned no error.
What am I doing wrong? Does this happen to anyone else? Is the MSDN documentation wrong? What are the alternatives?
I am currently thinking of the "linger" functionality, or of cancelling every operation explicitly with the CancelIoEx() function.
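For the CancelIoEx() route, the call itself is straightforward (a sketch; passing a null OVERLAPPED cancels every outstanding operation on the handle):

    #include <winsock2.h>
    #include <windows.h>

    void cancelPendingIo(SOCKET s) {
        if (!CancelIoEx(reinterpret_cast<HANDLE>(s), nullptr)) {
            DWORD err = GetLastError();
            if (err == ERROR_NOT_FOUND) {
                // Nothing was pending, or the operations had already
                // completed and their packets are queued on the IOCP.
            }
        }
        // Cancelled operations still complete through the IOCP with
        // ERROR_OPERATION_ABORTED; free the per-IO structures there.
    }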
Edit: (thank you for your responses)
Yesterday evening I added a linked list to every socketobj to hold the per-IO objects of the pending operations. With this I tried the CancelIoEx() function. The function returned 0, and GetLastError() returned ERROR_NOT_FOUND for most of the operations.
Is it safe to just free the per-IO object in this case?
I also discovered that this happens more often when I run my server app and the client app on the same machine. From time to time the server is then unable to complete write operations. I suspect this happens because the client-side receive buffer gets too full. (The client side does not stop receiving data!)
Code snippet follows as soon as possible.
The 'linger' setting can be used to reset the connection, but that way you will (a) lose data and (b) deliver a reset to the peer, which may terrify it.
If you're thinking of a positive linger timeout, it doesn't really help.
Shutdown for read should terminate read operations, but shutdown for write only gets queued after pending writes, so it doesn't help at all.
If pending writes are the problem and they are not completing, they will have to be cancelled.
I am using boost::asio to transfer data back and forth between client and server. I have a reader thread on the client side to read data received on the socket. Please note that I am using boost::asio::read on the client side and boost::asio::write on the server side,
not async_read or async_write. Everything works great.
However, when I close my application, 2 out of 10 times the app does not tear down or close cleanly; it hangs while closing down. The issue is the following:
My close function gets called when the destructors run during my app's shutdown. Following is the code of the Close function:
    socket.cancel();
    socket.close();
    boost::system::error_code ec;
    socket.shutdown(boost::asio::ip::tcp::socket::shutdown_both, ec);
The problem is that the boost::asio::read call does not return when it gets no data and keeps waiting for it. This should be fine as long as I can cancel it. I am trying socket.cancel() to cancel all read operations while exiting.
However, it doesn't seem to work. I read in some forums that socket.cancel() only cancels async_read operations. Is that so? Then what is the way to cancel a boost::asio::read operation when my app needs to exit?
That's the nature of blocking IO.
Indeed socket.cancel() (or even io_service::stop()) will not work on synchronous operations.
The only way to interrupt this is to use socket-level timeouts (which Asio doesn't expose) or to use asynchronous signals (e.g. pressing Ctrl-C in a terminal sends the foreground process a SIGINT).
I've previously created a poor-man's wrapper if you insist on running single operations with a timeout:
boost::asio + std::future - Access violation after closing socket
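The idea from that wrapper, roughly (a sketch assuming Boost.Asio's std::future support and an io_service being run by another thread; the 5-second timeout is illustrative): replace the blocking read with async_read plus use_future, and cancel on timeout, since asynchronous operations are cancelable:

    #include <boost/asio.hpp>
    #include <boost/asio/use_future.hpp>
    #include <chrono>
    #include <future>

    // Precondition: io_service.run() is being pumped by another thread.
    std::size_t read_with_timeout(boost::asio::ip::tcp::socket& socket,
                                  char* data, std::size_t size) {
        std::future<std::size_t> fut = boost::asio::async_read(
            socket, boost::asio::buffer(data, size), boost::asio::use_future);

        if (fut.wait_for(std::chrono::seconds(5)) == std::future_status::timeout)
            socket.cancel();  // pending *asynchronous* reads are cancelable

        return fut.get();     // throws operation_aborted if we cancelled
    }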
    boost::system::error_code _error_code;
    client_socket_->shutdown(boost::asio::ip::tcp::socket::shutdown_both, _error_code);
The above code helps me close a sync read immediately,
and the sync read will return with error code boost::asio::error::eof.
I wonder why your code socket.shutdown(boost::asio::ip::tcp::socket::shutdown_both, ec); did not work.
Maybe you should try again.
The error is due to the call to socket.close() before the call to socket.shutdown(). If you close a socket while there is a pending synchronous read(), you will occasionally get that error. It is really due to an expected data race in the underlying asio socket code.
Try removing the socket.close() call. Assuming your socket is wrapped in some kind of shared_ptr, you can let the socket destructor close the underlying socket.
You will still want to call socket.cancel() and socket.shutdown() explicitly in your use case in order to cancel outstanding operations.
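Put together, the close-down path might look like this (a sketch of the order suggested above, not the original Close function):

    void Close(boost::asio::ip::tcp::socket& socket) {
        boost::system::error_code ec;
        // Shut down first: this unblocks a pending synchronous read, which
        // then returns boost::asio::error::eof.
        socket.shutdown(boost::asio::ip::tcp::socket::shutdown_both, ec);
        socket.cancel(ec);  // aborts queued asynchronous operations, if any
        // No socket.close() here: let the destructor close the socket and
        // avoid racing with a still-pending synchronous read.
    }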
In IOCP, when starting an IO operation such as WSARecv(), a completion packet will be sent to the completion port when the IO operation completes.
What I want to know is which IO operations cause completion packets to be sent to the completion port when using sockets. For example, I know that WSASend(), WSARecv(), AcceptEx(), and PostQueuedCompletionStatus() cause completion packets to be sent. Are there other IO operations that do?
A completion will be queued to the IOCP associated with a socket only if an API call that can generate completions is called in a way that requests a completion to be queued. So you will know which API calls can generate completions by the fact that you've read the documentation and you're passing an OVERLAPPED structure to them.
Thus you don't really need to know the answer to your question, as you will never get a completion that you do not expect: you must have called an appropriate API with appropriate parameters for a completion to be generated.
You can then differentiate between the APIs that caused the completion to be generated by adding some form of identifying "per-operation data" to the OVERLAPPED, either by making an 'extended overlapped structure' or by using the event handle as opaque data. Either way you get a chance to send some context from the API call site to the IOCP completion handling site. This context is of your own design and can tell you what initiated the completion.
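An "extended overlapped structure" might look like this (a sketch with hypothetical names; recover the context from the LPOVERLAPPED with CONTAINING_RECORD):

    #include <winsock2.h>
    #include <windows.h>

    enum class OpType { Recv, Send, Accept };

    struct PerIoData {
        OVERLAPPED overlapped;  // passed to WSARecv()/WSASend()/AcceptEx()
        OpType     operation;   // which API call started this I/O
        WSABUF     wsaBuf;
        char       buffer[4096];
    };

    // In the completion loop, map the LPOVERLAPPED back to its context:
    void handleCompletion(LPOVERLAPPED pov) {
        PerIoData* io = CONTAINING_RECORD(pov, PerIoData, overlapped);
        switch (io->operation) {
            case OpType::Recv:   /* ... */ break;
            case OpType::Send:   /* ... */ break;
            case OpType::Accept: /* ... */ break;
        }
    }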
Then you get to use the return value from the GetQueuedCompletionStatus() call to determine if the completion is a success or failure and you can then access the error code for failures using WSAGetLastError() (though see this answer for more detail on an additional hoop that you could jump through to get more accurate error codes).
This then lets you determine which of the events listed in EJP's answer you have.
The actual set of functions that can generate a completion for socket operations can change with changes in the OS. The easiest way to determine what these are for the operating system you're targeting is either to read the MSDN docs or to search the SDK headers for lpOverlapped... As you'll see from the current VS2013 headers, there are quite a few that relate to sockets: AcceptEx(), ConnectEx(), DisconnectEx(), TransmitFile(), the HTTP.sys API, the RIO API, etc.
You're missing the point. What causes completion packets to be sent is events, not API calls. There are basically only a few TCP events:
inbound connection
outbound connection complete
data
write finished
timeout
end of stream, and
error.
Copied from the site:
Supported I/O Functions
The following functions can be used to start I/O operations that complete by using I/O completion ports. You must pass the function an instance of the OVERLAPPED structure and a file handle previously associated with an I/O completion port (by a call to CreateIoCompletionPort) to enable the I/O completion port mechanism:
ConnectNamedPipe
DeviceIoControl
LockFileEx
ReadDirectoryChangesW
ReadFile
TransactNamedPipe
WaitCommEvent
WriteFile
WSASendMsg
WSASendTo
WSASend
WSARecvFrom
WSARecvMsg
WSARecv
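For completeness, the association step the quoted text refers to could look like this (a sketch; error handling omitted, function name hypothetical):

    #include <winsock2.h>
    #include <windows.h>

    // Create a port, then associate an overlapped socket with it; afterwards,
    // calls such as WSARecv()/WSASend() on 's' that pass an OVERLAPPED queue
    // their completion packets to 'iocp'.
    HANDLE createPortForSocket(SOCKET s, ULONG_PTR completionKey) {
        HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
        CreateIoCompletionPort(reinterpret_cast<HANDLE>(s), iocp, completionKey, 0);
        return iocp;
    }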
In a larger server app I have one thread running a basic OpenSSL server using BIO in blocking mode, because that seemed the simplest way. My code accepts a single type of request from a phone (Android or iOS; I'm not writing that code) and returns a hex string wrapped in basic HTML (describing part of my server state). I've gone with SSL and a pseudo-HTTPS server because that makes things easier for the phone developer. If there's anything in the request that the server doesn't understand, I return a 404. This all works.
The problem : When my server shuts down this thread doesn't exit because of the blocking BIO_do_accept call.
I have tried BIO_get_fd() and setsockopt() to put a timeout on the underlying socket, but it still blocks. Somewhat worryingly, SSL_state() stays at "before/accept initialization", but looping on that obviously won't work.
I assume other people have server code like this, and those servers can shut down gracefully. How do they do that? Is there some way for another thread to break that block and get the accept call to return with an error? Or do I have to drop the idea of blocking calls and grind through the apparently awful non-blocking version?
When my server shuts down this thread doesn't exit because of the blocking BIO_do_accept call.
To stop the blocking, close the associated socket. It will return immediately.
Perform the shutdown from your signal handler.
Don't do anything else in the signal handler with respect to OpenSSL because it is not async-signal safe. Let the main thread cleanup once your worker thread has returned. See, for example, libcrypto Thread Safety.
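A sketch of that arrangement (hypothetical names; assumes a POSIX environment where close() is async-signal-safe):

    #include <openssl/bio.h>
    #include <unistd.h>

    static int g_accept_fd = -1;

    // Call once after the accept BIO is set up, from the worker thread.
    void remember_accept_fd(BIO* accept_bio) {
        BIO_get_fd(accept_bio, &g_accept_fd);
    }

    // Safe to call from a signal handler: close() is async-signal-safe and no
    // OpenSSL state is touched. The blocked BIO_do_accept() then returns an
    // error, letting the worker thread exit and the main thread clean up.
    void request_shutdown(int /*signum*/) {
        if (g_accept_fd != -1) {
            close(g_accept_fd);
            g_accept_fd = -1;
        }
    }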
I would like to know whether the following scenario is real:
select() (RD) on a non-blocking TCP socket says that the socket is ready;
a following recv() returns EWOULDBLOCK despite the call to select().
For recv() you would get EAGAIN rather than EWOULDBLOCK, and yes, it is possible. Since you have just checked with select(), one of two things happened:
Something else (another thread) has drained the input buffer between select() and recv().
A receive timeout was set on the socket and it expired without data being received.
It's possible, but only in a situation where you have multiple threads/processes trying to read from the same socket.
On Linux it's even documented that this can happen, as I read it.
See this question:
Spurious readiness notification for Select System call
I am aware of an error in a popular desktop operating system where O_NONBLOCK TCP sockets, particularly those running over the loopback interface, can sometimes return EAGAIN from recv() after select() reports the socket is ready for reading. In my case, this happens after the other side half-closes the sending stream.
For more details, see the source code for t_nx.ml in the NX library of my OCaml Network Application Environment distribution. (link)
Though my application is single-threaded, I noticed that the described behavior is not uncommon in RHEL5, both with TCP and UDP sockets set to O_NONBLOCK (the only socket option that is set): select() reports that the socket is ready, but the following recv() returns EAGAIN.
Yes, it's real. Here's one way it can happen:
A future modification to the TCP protocol adds the ability for one side to "revoke" information it sent provided it hasn't been received yet by the other side's application layer. This feature is negotiated on the connection. The other side sends you some data, you get a select hit. Before you can call recv, the other side "revokes" the data using this new extension. Your read gets a "would block" error because no data is available to be read.
The select function is a status-reporting function that does not come with future guarantees. Assuming that a hit on select now assures that a subsequent operation won't block is as invalid as using any other status-reporting function this way. It's as bad as using access to try to ensure a subsequent operation won't fail due to incorrect permissions or using statfs to try to ensure a subsequent write won't fail due to a full disk.
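The practical consequence: treat readiness as a hint and keep the retry in the loop, along these lines (a POSIX sketch; the socket is assumed to be non-blocking):

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cerrno>

    ssize_t robust_recv(int fd, char* buf, size_t len) {
        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            if (select(fd + 1, &rfds, nullptr, nullptr, nullptr) < 0) {
                if (errno == EINTR) continue;  // interrupted by a signal
                return -1;
            }
            ssize_t n = recv(fd, buf, len, 0);
            if (n >= 0) return n;  // data, or 0 for orderly shutdown
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                continue;          // spurious readiness: go back to select()
            if (errno != EINTR) return -1;
        }
    }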
It is possible in a multithreaded environment where two threads are reading from the socket. Is this a multithreaded application?
If you do not call any other syscall between select() and recv() on this socket, then recv() will never return EAGAIN or EWOULDBLOCK.
I don't know what they mean by a receive timeout; however, the POSIX standard does not mention one here, so you can safely call recv().