The client does some ssl::stream<tcp_socket>::async_read_some()/ssl::stream<tcp_socket>::async_write() calls and at some point needs to exit, i.e. it needs to shut down the connection.
Calling ssl::stream<tcp_socket>::lowest_layer().close() works, but (as expected) the server (an openssl s_server -state ... command) reports an error when the connection is closed.
Looking at the API, the right way seems to be to call ssl::stream<tcp_socket>::async_shutdown().
Now there are basically two situations where a shutdown is needed:
1) The client is in the async_read_some() callback and reacts to a 'quit' command from the server. Calling async_shutdown() from there yields a 'short read' error in the shutdown callback.
This is surprising, but after googling around this seems to be normal behaviour; one apparently has to check whether it is a real error or not, like this:
// const boost::system::error_code &ec
if (ec.category() == asio::error::get_ssl_category() &&
ec.value() == ERR_PACK(ERR_LIB_SSL, 0, SSL_R_SHORT_READ)) {
// -> not a real error, just a normal TLS shutdown
}
The TLS server seems to be happy, though - it reports:
DONE
shutting down SSL
CONNECTION CLOSED
2) An async_read_some() is pending, but the user decides to exit the client (e.g. via a command from stdin). When async_shutdown() is called from that context, the following happens:
the async_read_some() callback is executed with a 'short read' error code - kind of expected now
the async_shutdown() callback is executed with a 'decryption failed or bad record mac' error code - this is unexpected
The server side does not report an error.
Thus my question: how do I properly shut down a TLS client with Boost.Asio?
One way to resolve the 'decryption failed or bad record mac' error code from the 2nd context is:
a) from inside the stdin handler call:
ssl::stream<tcp_socket>::lowest_layer().shutdown(tcp::socket::shutdown_receive)
b) this results in the async_read_some() callback getting executed with a 'short read' 'error' code
c) in that callback under that 'error' condition async_shutdown() is called:
// const boost::system::error_code &ec
if (ec.category() == asio::error::get_ssl_category() &&
ec.value() == ERR_PACK(ERR_LIB_SSL, 0, SSL_R_SHORT_READ)) {
// -> not a real error:
do_ssl_async_shutdown();
}
d) the async_shutdown() callback is executed with a 'short read' error code, from where we finally call:
ssl::stream<tcp_socket>::lowest_layer().close()
These steps result in a connection shutdown without any weird error messages on the client or server side.
For example, when using openssl s_server -state ... as the server, it reports on shutdown:
SSL3 alert read:warning:close notify
DONE
shutting down SSL
CONNECTION CLOSED
ACCEPT
(the last line is because the command accepts new connections)
Alternative
Instead of lowest_layer().shutdown(tcp::socket::shutdown_receive) we can also call
ssl::stream<tcp_socket>::lowest_layer().cancel()
to initiate a proper shutdown. It has the same effect, i.e. it causes the pending async_read_some() callback to run (but with the operation_aborted error code). Thus, one can call async_shutdown() from there:
if (ec == asio::error::operation_aborted) {
cout << "(not really an error)\n";
do_async_ssl_shutdown();
}
Related
I have inherited two applications, one Test Harness (a client) running on a Windows 7 PC and one server application running on a Windows 10 PC. I am attempting to communicate between the two using TCP/IP sockets. The Client sends requests (for data in the form of XML) to the Server and the Server then sends the requested data (also XML) back to the client.
The set up is as shown below:
Client                                   Server
--------------------                     --------------------
|                  |   Sends Requests    |                  |
|  Client Socket   | ------------------> |  Server Socket   |
|                  | <------------------ |                  |
|                  |     Sends Data      |                  |
--------------------                     --------------------
This process always works on an initial connection (i.e. freshly launched client and server applications). The client has the ability to disconnect from the server, which triggers cleanup of sockets. Upon reconnection, I usually (though not always) receive the following error:
"Receive() - The socket is marked as nonblocking and the receive operation would block"
This error is displayed at the client and the socket in question is an asynchronous, non-blocking socket.
The line which causes this SOCKET_ERROR is:
numBytesReceived = theSocket->Receive(theReceiveBuffer, 10000);
where:
- numBytesReceived is an integer (int)
- theSocket is a pointer to a class called CClientSocket, which is a specialisation of CAsyncSocket, part of the MFC C++ library. This defines the socket object which is embedded within the client. It is an asynchronous, non-blocking socket.
- Receive() is a virtual function within the CAsyncSocket class
- theReceiveBuffer is a char array (10000 elements)
When the line described above executes, SOCKET_ERROR is returned from the function and calling theSocket->GetLastError() returns WSAEWOULDBLOCK.
SocketTools highlights that
When a non-blocking (asynchronous) socket attempts to perform an operation that cannot be performed immediately, error 10035 will be returned. This error is not fatal, and should be considered advisory by the application. This error code corresponds to the Windows Sockets error WSAEWOULDBLOCK.
When reading data from a non-blocking socket, this error will be returned if there is no more data available to be read at that time. In this case, the application should wait for the OnRead event to fire which indicates that more data has become available to read. The IsReadable property can be used to determine if there is data that can be read from the socket.
When writing data to a non-blocking socket, this error will be returned if the local socket buffers are filled while waiting for the remote host to read some of the data. When buffer space becomes available, the OnWrite event will fire which indicates that more data can be written. The IsWritable property can be used to determine if data can be written to the socket.
It is important to note that the application will not know how much data can be sent in a single write operation, so it is possible that if the client attempts to send too much data too quickly, this error may be returned multiple times. If this error occurs frequently when sending data it may indicate high network latency or the inability for the remote host to read the data fast enough.
I am consistently getting this error and failing to receive anything on the socket.
Using Wireshark, the following communications occur, with the source, destination and TCP bit flags presented here:
Event: Connect Test Harness to Server via TCP/IP
Client --> Server: SYN
Server --> Client: SYN, ACK
Client --> Server: ACK
This appears to be correct and represents the Three-Way Handshake of connecting.
SocketSniff confirms that a Socket is closed on the client side. It was not possible to get SocketSniff to work with the Windows 10 Server application.
Event: Send a Request for Data from the Test Harness
Client --> Server: PSH, ACK
Server --> Client: PSH, ACK
Client --> Server: ACK
Both the request data and the response data are confirmed to have been exchanged successfully.
Event: Disconnect Test Harness from Server
Client --> Server: FIN, ACK
Server --> Client: ACK
Server --> Client: FIN, ACK
Client --> Server: ACK
This appears to be correct and represents the Four-Way handshake of connection closure.
SocketSniff confirms that a Socket is closed on the client side. It was not possible to get SocketSniff to work with the Windows 10 Server application.
Event: Reconnect Test Harness to Server via TCP/IP
Client --> Server: SYN
Server --> Client: SYN, ACK
Client --> Server: ACK
This appears to be correct and represents the Three-Way Handshake of connecting.
SocketSniff confirms that a new Socket is opened on the client side. It was not possible to get SocketSniff to work with the Windows 10 Server application.
Event: Send a Request for Data from the Test Harness
Client --> Server: PSH, ACK
Server --> Client: ACK
We see no data being pushed (PSH) back to the client, yet we do see an acknowledgement.
Has anyone got any ideas what may be going on here? I understand it would be difficult for you to diagnose without seeing the source code, however I was hoping others may have had experience with this error and could point me down the specific route to investigate.
More Info:
The Server initialises a listening thread and binds to 0.0.0.0:49720. The 'WSAStartup()', 'bind()' and 'listen()' functions all return '0', indicating success. This thread persists throughout the lifetime of the server application.
The Server initialises two threads, a read and a write thread. The read thread is responsible for reading request data off its socket and is initialised as follows with a class called Connection:
HANDLE theConnectionReadThread
= CreateThread(NULL, // Security Attributes
0, // Default Stacksize
Connection::connectionReadThreadHandler, // Callback
(LPVOID)this, // Parameter to pass to thread
CREATE_SUSPENDED, // Don't start yet
NULL); // Don't Save Thread ID
The write thread is initialised in a similar way.
In each case, the CreateThread() function returns a suitable HANDLE, e.g.
theConnectionReadThread = 00000570
theConnectionWriteThread = 00000574
The threads actually get started within the following function:
void Connection::startThreads()
{
ResumeThread(theConnectionReadThread);
ResumeThread(theConnectionWriteThread);
}
And this function is called from within another class called ConnectionManager which manages all the possible connections to the server. In this case, I am only concerned with a single connection, for simplicity.
Adding text output to the server application reveals that I can successfully connect/disconnect the client and server several times before the faulty behaviour is observed. For example, within the connectionReadThreadHandler() and connectionWriteThreadHandler() functions, I output text to a log file as soon as they execute.
When correct behaviour is observed, the following lines are output to the log file:
Connection::ResumeThread(theConnectionReadThread) returned 1
Connection::ResumeThread(theConnectionWriteThread) returned 1
ConnectionReadThreadHandler() Beginning
ConnectionWriteThreadHandler() Beginning
When faulty behaviour is observed, the following lines are output to the log file:
Connection::ResumeThread(theConnectionReadThread) returned 1
Connection::ResumeThread(theConnectionWriteThread) returned 1
The callback functions do not appear to be invoked.
It is at this point that the error is displayed on the client indicating that:
"Receive() - The socket is marked as nonblocking and the receive operation would block"
On the Client side, I've got a class called CClientDoc, which contains the client side socket code. It first initialises theSocket which is the socket object which is embedded within a client:
private:
CClientSocket* theSocket = new CClientSocket;
When a connection is initialised between client and server, this class calls a function called CreateSocket() part of which is included below, along with ancillary functions which it calls:
void CClientDoc::CreateSocket()
{
AfxSocketInit();
int lastError;
theSocket->Init(this);
if (theSocket->Create()) // Calls CAsyncSocket::Create() (part of afxsock.h)
{
theErrorMessage = "Socket Creation Successful"; // this is a CString
theSocket->SetSocketStatus(WAITING);
}
else
{
// We don't fall in here
}
}
void CClientSocket::Init(CClientDoc* pDoc)
{
pClient = pDoc; // pClient is a pointer to a CClientDoc
}
void CClientSocket::SetSocketStatus(SOCKET_STATUS sock_stat)
{
theSocketStatus = sock_stat; // theSocketStatus is a private member of CClientSocket of type SOCKET_STATUS
}
Immediately after CreateSocket(), SetupSocket() is called which is also provided here:
void CClientDoc::SetupSocket()
{
theSocket->AsyncSelect(); // Function within afxsock.h
}
Upon disconnection of the client from the server, the following handler runs:
void CClientDoc::OnClienDisconnect()
{
theSocket->ShutDown(2); // Inline function within afxsock.inl
delete theSocket;
theSocket = new CClientSocket;
CreateSocket();
SetupSocket();
}
So we delete the current socket and then create a new one, ready for use, which appears to work as expected.
The error is being written on the Client within the DoReceive() function. This function calls the socket to attempt to read in a message.
void CClientDoc::DoReceive()
{
int lastError;
switch (numBytesReceived = theSocket->Receive(theReceiveBuffer, 10000))
{
case 0:
// We don't fall in here
break;
case SOCKET_ERROR: // We come in here when the faulty behaviour occurs
// Parentheses needed: without them lastError receives the result of the == comparison
if ((lastError = theSocket->GetLastError()) == WSAEWOULDBLOCK)
{
theErrorMessage = "Receive() - The socket is marked as nonblocking and the receive operation would block";
}
else
{
// We don't fall in here
}
break;
default:
// When connection works, we come in here
break;
}
}
Hopefully the addition of some of the code proves insightful. I should be able to add a bit more if needed.
Thanks
The WSAEWOULDBLOCK error DOES NOT mean the socket is marked as blocking. It means the socket is marked as non-blocking and there is NO DATA TO READ at that time.
WSAEWOULDBLOCK means the socket WOULD HAVE blocked the calling thread waiting for data if the socket HAD BEEN marked as blocking.
To know when a non-blocking socket has data waiting to be read, use Winsock's select() function, or the CClientSocket::AsyncSelect() method (inherited from CAsyncSocket) to request FD_READ notifications, which MFC delivers by calling the socket's OnReceive() method, or an equivalent mechanism. Don't try to read until there is something to read.
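The same rule in POSIX terms, as a minimal sketch: EAGAIN/EWOULDBLOCK play the role of WSAEWOULDBLOCK and poll() stands in for select()/AsyncSelect(). The names set_nonblocking and recv_when_ready are hypothetical, and this is not a drop-in for the MFC client, where the read would instead live in the OnReceive() override.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: put an fd into non-blocking mode.
bool set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Reads from a non-blocking socket. EAGAIN/EWOULDBLOCK means "no data yet",
// not failure: wait for readability with poll(), then call recv() again.
// Returns bytes read, 0 if the peer closed the connection, or -1 on a real
// error or when nothing arrives within timeout_ms (errno is then ETIMEDOUT).
ssize_t recv_when_ready(int fd, char* buf, std::size_t len, int timeout_ms)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) return n;                       // data, or orderly close
        if (errno == EINTR) continue;               // interrupted: just retry
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                              // genuine socket error

        // Nothing to read right now: sleep in poll() until data arrives.
        pollfd pfd{fd, POLLIN, 0};
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) { errno = ETIMEDOUT; return -1; }
        if (r < 0 && errno != EINTR) return -1;
    }
}
```

For simplicity the timeout restarts after every wake-up; a stricter version would track one overall deadline.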
In your analysis, you see the client sending data to the server, but the server is not sending data to the client. So you clearly have a logic bug in your code somewhere, you need to find and fix it. Either the client is not terminating its request correctly, or the server is not receiving/processing/replying to it correctly. But since you did not show your actual code, we can't tell you what is actually wrong with it.
I'm trying to implement OpenSSL into my application, which uses raw C sockets, and the only issue I'm having is the SSL_accept / SSL_connect part of the code, which starts the key exchange phase but does not seem to complete it on the server side.
I've had a look at countless websites and Q&A's here on StackOverflow to get myself through the OpenSSL API since this is basically the first time I'm attempting to implement SSL into an application but the only thing I could not find yet was how to properly manage failed handshakes.
Basically, running process A, which serves as a server, will listen for incoming connections. Once I run process B, which acts as a client, it successfully connects to process A, but SSL_accept (on the server) fails: it returns -1 and SSL_get_error() reports 2, i.e. SSL_ERROR_WANT_READ.
According to the question "openssl handshake failed", the problem is "easily" worked around by calling SSL_accept within a loop until it finally returns 1 (it successfully connects and completes the handshake). However, I do not believe that this is the proper way of doing things, as it looks like a dirty trick. The reason I believe it is a dirty trick is that I tried to run a small application I found on https://www.cs.utah.edu/~swalton/listings/articles/ (ssl_client and ssl_server) and magically, everything works just fine. There are no multiple calls to SSL_accept and the handshake is completed right away.
Here's some code where I'm accepting the SSL connection on the server:
if (SSL_accept(conn.ssl) == -1)
{
fprintf(stderr, "Connection failed.\n");
fprintf(stderr, "SSL State: %s [%d]\n", SSL_state_string_long(conn.ssl), SSL_state(conn.ssl));
ERR_print_errors_fp(stderr);
PrintSSLError(conn.ssl, -1, "SSL_accept");
return -1;
}
else
{
fprintf(stderr, "Connection accepted.\n");
fprintf(stderr, "Server -> Client handshake completed");
}
This is the output of PrintSSLError:
SSL State: SSLv3 read client hello B [8465]
[DEBUG] SSL_accept : Failed with return -1
[DEBUG] SSL_get_error() returned : 2
[DEBUG] Error string : error:00000002:lib(0):func(0):system lib
[DEBUG] ERR_get_error() returned : 0
[DEBUG] errno returned : Resource temporarily unavailable
And here's the client side snippet which connects to the server:
if (SSL_connect(conn.ssl) == -1)
{
fprintf(stderr, "Connection failed.\n");
ERR_print_errors_fp(stderr);
PrintSSLError(conn.ssl, -1, "SSL_connect");
return -1;
}
else
{
fprintf(stderr, "Connection established.\n");
fprintf(stderr, "Client -> Server handshake completed");
PrintSSLInfo(conn.ssl);
}
The connection is successfully established client-side (SSL_connect does not return -1) and PrintSSLInfo outputs:
Connection established.
Cipher: DHE-RSA-AES256-GCM-SHA384
SSL State: SSL negotiation finished successfully [3]
And this is how I wrap the C Socket into SSL:
SSLConnection conn;
conn.fd = fd;
conn.ctx = sslContext;
conn.ssl = SSL_new(conn.ctx);
SSL_set_fd(conn.ssl, conn.fd);
The code snippet here resides within a function that takes a file-descriptor of the accepted incoming connection on the raw socket and the SSL Context to use.
To initialize the SSL Contexts I use TLSv1_2_server_method() and TLSv1_2_client_method(). Yes, I know that this will prevent clients from connecting if they do not support TLS 1.2 but this is exactly what I want. Whoever connects to my application will have to do it through my client anyway.
Either way, what am I doing wrong? I'd like to avoid loops in the authentication phase to avoid possible hang ups/slow downs of the application due to unexpected infinite loops since OpenSSL does not specify how many attempts it might take.
The workaround that worked, but that I'd like to avoid, is this:
while ((accept = SSL_accept(conn.ssl)) != 1)
And inside the while loop I check for the return code stored inside accept.
Things I've tried to workaround the SSL_ERROR_WANT_READ error:
Added usleep(50) inside the while loop (still takes several cycles to complete)
Added SSL_do_handshake(conn.ssl) after SSL_connect and SSL_accept (didn't change anything on the end-result)
Had a look at the code shown on roxlu.com (search on Google for "Using OpenSSL with memory BIOs - Roxlu") to guide me through the handshaking phase, but since I'm new to this, and I don't directly use BIOs in my code but simply wrap my native C sockets into SSL, it was kind of confusing. I'm also unable to re-write the networking part of the application as it would be too much work for me right now.
I've done some tests with the openssl command-line as well to troubleshoot the issue but it gives no error. The handshake appears to be successful as no errors such as:
24069864:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:s3_pkt.c:656
appear. Here's the whole output of the command
openssl s_client -connect IP:Port -tls1_2 -prexit -msg
http://pastebin.com/9u1bfuf4
Things to note:
1. I'm using the latest OpenSSL version 1.0.2h
2. Application runs on a Unix system
3. Using self-signed certificates to encrypt the network traffic
Thanks everyone who's going to help me out.
Edit:
I forgot to mention that the sockets are in non-blocking mode, since the application serves multiple clients at once. On the client side, however, they are in blocking mode.
Edit2:
Leaving this here for future reference: jmarshall.com/stuff/handling-nbio-errors-in-openssl.html
You have clarified that the socket in question is non-blocking.
Well, that's your answer. Obviously, when the socket is in non-blocking mode, the handshake cannot be completed immediately. The handshake involves an exchange of protocol packets between the client and the server, with each one having to wait to receive the response from its peer. This works fine when the socket is in its default blocking mode. The library simply read()s and write()s, which blocks and waits until the message gets successfully read or written. This obviously can't happen when the socket is in non-blocking mode. Either the read() or write() immediately succeeds, or it fails if there's nothing to read or if the socket's output buffer is full.
The manual pages for SSL_accept() and SSL_connect() explain the procedure you must implement to execute the SSL handshake when the underlying socket is in non-blocking mode. Rather than repeating the whole thing here, you should read the manual pages yourself. The capsule summary is to use SSL_get_error() to determine whether the handshake actually failed or the library wants to read from or write to the socket; in the latter case call poll() or select() accordingly, then call SSL_accept() or SSL_connect() again.
Any other approach, like sprinkling silly sleep() calls, here and there, will result in an unreliable house of cards, that will fail randomly.
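The procedure from the manual pages can be kept separate from OpenSSL itself. The following is a minimal sketch assuming POSIX poll(); Step and drive_handshake are hypothetical names, and the caller supplies a function that performs one SSL_accept()/SSL_connect() attempt and translates the result via SSL_get_error() as listed in the comment. Unlike the busy while loop from the question, it sleeps in poll() until the socket is ready and gives up after a timeout, which also addresses the worry about unbounded loops.

```cpp
#include <cerrno>
#include <functional>
#include <poll.h>

// Outcome of one non-blocking handshake attempt. With OpenSSL the caller maps:
//   SSL_accept()/SSL_connect() returned 1           -> Step::Done
//   SSL_get_error(ssl, ret) == SSL_ERROR_WANT_READ  -> Step::WantRead
//   SSL_get_error(ssl, ret) == SSL_ERROR_WANT_WRITE -> Step::WantWrite
//   any other SSL_get_error() result                -> Step::Failed
enum class Step { Done, WantRead, WantWrite, Failed };

// Hypothetical driver: repeats `attempt` until the handshake completes, but
// between attempts it sleeps in poll() until the socket is ready in the
// direction the TLS library asked for, instead of spinning or calling sleep().
// Returns false on a real failure or if the peer is silent for timeout_ms.
bool drive_handshake(int fd, const std::function<Step()>& attempt, int timeout_ms)
{
    for (;;) {
        const Step s = attempt();
        if (s == Step::Done) return true;
        if (s == Step::Failed) return false;

        pollfd pfd{fd, static_cast<short>(s == Step::WantRead ? POLLIN : POLLOUT), 0};
        const int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) return false;                   // handshake stalled: give up
        if (r < 0 && errno != EINTR) return false;  // poll() itself failed
        // Socket is ready (or poll was interrupted): try the handshake again.
    }
}
```

On the server from the question, attempt would wrap SSL_accept(conn.ssl) and fd would be conn.fd. In a server that multiplexes many clients, the same Step result would more naturally be fed into the application's existing event loop than block in a per-connection poll().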
In my C++ application I use OpenSSL to connect to a server using a non-blocking BIO. I am developing for Mac OS X and iOS.
The first call to SSL_shutdown() returns 0, which means I have to call SSL_shutdown() again:
The following return values can occur:
0 The shutdown is not yet finished. Call SSL_shutdown() for a second time, if a bidirectional shutdown shall be performed. The output of SSL_get_error may be misleading, as an erroneous SSL_ERROR_SYSCALL may be flagged even though no error occurred.
<0
The shutdown was not successful because a fatal error occurred either at the protocol level or a connection failure occurred. It can also occur if action is need to continue the operation for non-blocking BIOs. Call SSL_get_error with the return value ret to find out the reason.
https://www.openssl.org/docs/ssl/SSL_shutdown.html
So far so good. The problem occurs on the second call to SSL_shutdown(). This returns -1, which means an error has occurred (see above). If I then check with SSL_get_error(), I get SSL_ERROR_SYSCALL, which in turn is supposed to mean a system error has occurred. But here is the catch: if I check errno, it is 0 ("Undefined error: 0"). What I have read so far about the issue is that it could mean the server just "hung up", but to be honest this does not satisfy me.
Here is my implementation of the shutdown:
int result = 0;
int shutdownResult;
while ((shutdownResult = SSL_shutdown(sslHandle)) != 1) { //close connection 1 means everything is shut down ok
if (shutdownResult == 0) { //we are supposed to call shutdown again
continue;
} else if (SSL_get_error(sslHandle, shutdownResult) == SSL_ERROR_WANT_READ) {
[...] //omitted want read code, in this case the application never reaches this point
} else if (SSL_get_error(sslHandle, shutdownResult) == SSL_ERROR_WANT_WRITE) {
[...] //omitted want write code, in this case the application never reaches this point
} else {
logError("Error in ssl shutdown, ssl error: " + std::to_string(SSL_get_error(sslHandle, shutdownResult)) + ", system error: " + std::string(strerror(errno))); //something went wrong
break;
}
}
When run the application logs:
ERROR:: Error in ssl shutdown, ssl error: 5, system error: Undefined error: 0
So is here just the server shutting down the connection or is there a more critical issue? Am I just missing something really obvious?
A full SSL shutdown consists of two parts:
sending the 'close notify' alert to the peer
receiving the 'close notify' alert from the peer
The first SSL_shutdown returned 0 which means that it did send the 'close notify' to the peer but did not receive anything back yet. The second call of SSL_shutdown fails because the peer did not do a proper SSL shutdown and send a 'close notify' back, but instead just closed the underlying TCP connection.
This behavior is actually very common and you can usually just ignore the error. It does not matter much if the underlying TCP connection should be closed anyway. But a proper SSL shutdown is usually needed when you want to continue in plain text on the same TCP connection, as needed for the CCC command in FTPS connections (but even there various implementations fail to handle this case properly).
After developing a sample client server application which can exchange some data, I'm trying to implement the retry mechanism into it. Currently my application is following below protocol:
Client connects to server (non-blocking mode) with a 3 sec timeout and 2 retries.
Start sending data from client with fixed length. Send has some error checking whether it is sending the complete data or not.
Receive response (timeout: 3secs) from server and verify that. If incorrect response received, re-send the data and wait for response. Repeat this for two times if failed.
For the above, the implementation consists roughly of the following code sections:
connect() and select() for opening connection
select() and send() for data send
select() and recv() for data receiving
Now I'm making the retries based on the return values of the socket functions, and if send() or recv() fails I'm retrying the same methods. But I'm not calling connect() again.
I tested this by restarting the server in the middle of the data transfer; as a result the client fails to communicate with the server and quits after several retries. I believe this happens because there is no connect() call in the retry methods.
Any suggestions?
Example code for receiving socket data
bool CTCPCommunication::ReceiveSocketData(char* pchBuff, int iBuffLen)
{
bool bReturn = true;
//check whether the socket is ready to receive
fd_set stRead;
FD_ZERO(&stRead);
FD_SET(m_hSocket, &stRead);
int iRet = select(0, &stRead, NULL, NULL, &m_stTimeout);
//if socket is not ready this line will be hit after 3 sec timeout and go to the end
//if it is ready control will go inside the read loop and reads data until data ends or
//socket error is getting triggered continuously for more than 3 secs.
if ((iRet > 0) && (FD_ISSET(m_hSocket, &stRead)))
{
DWORD dwStartTime = GetTickCount();
DWORD dwCurrentTime = 0;
while ((iBuffLen-1) > 0)
{
int iRcvLen = recv(m_hSocket, pchBuff, iBuffLen-1, 0);
dwCurrentTime = GetTickCount();
//receive failed due to socket error
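if (iRcvLen == SOCKET_ERROR)
{
if((dwCurrentTime - dwStartTime) >= SOCK_TIMEOUT_SECONDS * 1000)
{
WRITELOG("Call to socket API 'recv' failed after 3 secs continuous retries, error: %d", WSAGetLastError());
bReturn = false;
break;
}
// Not timed out yet (e.g. WSAEWOULDBLOCK): retry without touching the buffer,
// since iRcvLen is SOCKET_ERROR (-1) here, not a byte count
continue;
}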
//connection closed by remote host
else if (iRcvLen == 0)
{
WRITELOG("recv() returned zero - time to do something: %d", WSAGetLastError());
break;
}
pchBuff += iRcvLen;
iBuffLen -= iRcvLen;
}
}
else
{
WRITELOG("Call to API 'select' failed inside 'ReceiveSocketData', error: %d", WSAGetLastError());
bReturn = false;
}
return bReturn;
}
Currently my application is following below protocol:
Client connects to server (non blocking mode) with 3 secs timeout and with 2 retries.
You can't retry a connection. You have to close the socket whose connect attempt failed, create a new socket, and call connect() again.
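A minimal POSIX sketch of that advice (on Winsock, closesocket() replaces close()). connect_with_retries is a hypothetical name, and a plain blocking connect() is used for brevity where the question uses a non-blocking connect() plus select() with a 3-second timeout.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper. A socket whose connect() failed is never reused:
// each attempt closes it and starts over with a brand-new socket.
// Returns a connected fd, or -1 if all `attempts` fail.
int connect_with_retries(const sockaddr_in& addr, int attempts)
{
    for (int i = 0; i < attempts; ++i) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;                     // cannot even create a socket
        // The question's non-blocking connect() + select() timeout would go here.
        if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof addr) == 0)
            return fd;                             // connected
        close(fd);                                 // failed socket is discarded
        // A real client would usually back off here before the next attempt.
    }
    return -1;
}
```

The same applies mid-transfer: once send() or recv() reports that the connection is gone, the retry path has to go back through socket() and connect(), not just repeat the failed call.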
Start sending data from client with fixed length. Send has some error checking whether it is sending the complete data or not.
This isn't necessary in blocking mode: the POSIX standard guarantees that a blocking-mode send() will send all the data, or fail with an error.
Receive response (timeout: 3secs) from server and verify that. If incorrect response received, re-send the data and wait for response. Repeat this for two times if failed.
This is a bad idea. Most probably all the data will arrive, including all the retries, or none of it. You need to make sure that your transactions are idempotent if you use this technique. You also need to pay close attention to the actual timeout period. 3 seconds is not adequate in general. A starting point is double the expected service time.
For the above, the implementation consists roughly of the following code sections:
connect() and select() for opening connection
select() and send() for data send
select() and recv() for data receiving
You don't need the select() in blocking mode. You can just set a read timeout with SO_RCVTIMEO.
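For example, on POSIX systems the receive timeout is a timeval; Winsock's setsockopt() takes SO_RCVTIMEO as a DWORD of milliseconds instead, and the timed-out recv() fails with WSAETIMEDOUT. A minimal sketch in which set_recv_timeout and is_timeout are hypothetical names:

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Hypothetical helper: make a blocking recv() give up after `ms` milliseconds.
bool set_recv_timeout(int fd, int ms)
{
    timeval tv{};
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0;
}

// True if a recv() result means the SO_RCVTIMEO deadline expired
// (POSIX reports this as EAGAIN/EWOULDBLOCK on a blocking socket).
bool is_timeout(ssize_t n)
{
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

With this in place the receive path is just recv() followed by an is_timeout() check, with no separate select() call.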
Now I'm making the retries based on the return values of the socket functions, and if send() or recv() fails I'm retrying the same methods. But I'm not calling connect() again.
I tested this by restarting the server in the middle of the data transfer; as a result the client fails to communicate with the server and quits after several retries. I believe this happens because there is no connect() call in the retry methods.
If that was true you would get an error that said so.
Single-threaded application.
It happens not every time, only after 1.5 hours of high load.
1) tcp::socket::async_connect()
2) tcp::socket::close() (called from a deadline_timer handler)
3) async_connect_handler receives a success error_code (about one time in a million), even though the socket was closed by (2). 99.999% of the time it receives errno=125 (ECANCELED).
Is it possible that the socket implementation or Boost.Asio somehow does this:
async_connect
async success posted to io_service
close by timer
async success handled by me, not affected by close
For now I have worked around it by saving state in my own variables and ignoring the connect success.
Linux 2.6 (fedora).
Boost 1.46.0
PS: Of course it may be a bug on my part, but apart from this it runs smoothly for days.
As Igor mentions in the comments, the completion handler is already queued.
This scenario is the result of a separation in time between when an operation executes and when a handler is invoked. The documentation for io_service::run(), io_service::run_one(), io_service::poll(), and io_service::poll_one() is careful to mention handlers, not operations. In the scenario, the socket::async_connect() operation and deadline_timer::async_wait() operation complete in the same event loop iteration. This results in both handlers being added to the io_service for deferred invocation, in an unspecified order.
Consider the following snippet that accentuates the scenario:
void handle_wait(const boost::system::error_code& error)
{
if (error) return;
socket_.close();
}
timer_.expires_from_now(boost::posix_time::seconds(30));
timer_.async_wait(&handle_wait);
socket_.async_connect(endpoint_, handle_connect);
boost::this_thread::sleep(boost::posix_time::seconds(60));
io_service_.run_one();
When io_service_.run_one() is invoked, both socket::async_connect() and deadline_timer::async_wait() operations may have completed, causing handle_wait and handle_connect to be ready for invocation from within the io_service in an unspecified order. To properly handle this unspecified order, additional logic needs to occur within handle_wait() and handle_connect() to query the current state and determine whether the other handler has been invoked, rather than depending solely on the status (error_code) of the operation.
The easiest way to determine whether the other handler has been invoked is:
In handle_connect(), check if the socket is still open via is_open(). If the socket is still open, then handle_wait() has not been invoked. A clean way to indicate to handle_wait() that handle_connect() has run is to update the expiry time.
In handle_wait(), check if the expiry time has passed. If this is true, then handle_connect() has not run, so close the socket.
The resulting handlers could look like the following:
void handle_wait(const boost::system::error_code& error)
{
// On error, return early.
if (error) return;
// If the timer expires in the future, then the connect handler must have
// run first.
if (timer_.expires_at() > deadline_timer::traits_type::now()) return;
// Timeout has occurred, so close the socket.
socket_.close();
}
void handle_connect(const boost::system::error_code& error)
{
// The async_connect() function automatically opens the socket at the start
// of the asynchronous operation. If the socket is closed at this time then
// the timeout handler must have run first.
if (!socket_.is_open()) return;
// On error, return early.
if (error) return;
// Otherwise, a connection has been established. Update the timer state
// so that the timeout handler does not close the socket.
timer_.expires_at(boost::posix_time::pos_infin);
}
Boost.Asio provides some examples for handling timeouts.
I accepted twsansbury's answer and just want to add some more info.
About shutdown():
void async_recv_handler( boost::system::error_code ec_recv, std::size_t count )
{
if ( !m_socket.is_open() )
return; // first check: don't trust ec_recv alone
if ( ec_recv )
{
// oops, we have error
// log
// close
return;
}
// seems that we are just fine, no error in ec_recv, we can gracefully shutdown the connection
// but shutdown may fail! this check is working for me
boost::system::error_code ec_shutdown;
// second check: still not trusting ec_recv
m_socket.shutdown( boost::asio::ip::tcp::socket::shutdown_both, ec_shutdown );
if ( !ec_shutdown )
return;
// this error code is expected
if ( ec_shutdown == boost::asio::error::not_connected )
return;
// other error codes are unexpected for me
// log << ec_shutdown.message()
throw boost::system::system_error(ec_shutdown);
}