Set timeout for boost socket.connect - c++

I am using boost::asio::connect on a tcp::socket. When all goes fine, the connect returns immediately, but on a poor network it times out after a long wait of about 15 seconds. I cannot afford to wait that long and want to reduce the timeout, but I have not come across any solution so far.
I have seen solutions where async_wait is used together with a deadline_timer, but those examples cover receive/send operations, not connect.
Can anyone help me with sample code for boost::asio::connect(socket, endpoints)? The requirement is that it should time out after 5 seconds instead of 15.

Have you taken a look at the following example? It contains sample code for an async_connect with a timeout.
A connect-with-timeout method could be implemented using the following code:
void connect(const std::string& host, const std::string& service,
             boost::posix_time::time_duration timeout)
{
  // Resolve the host name and service to a list of endpoints.
  tcp::resolver::query query(host, service);
  tcp::resolver::iterator iter = tcp::resolver(io_service_).resolve(query);

  // Set a deadline for the asynchronous operation. As a host name may
  // resolve to multiple endpoints, this function uses the composed operation
  // async_connect. The deadline applies to the entire operation, rather than
  // individual connection attempts.
  deadline_.expires_from_now(timeout);

  // Set up the variable that receives the result of the asynchronous
  // operation. The error code is set to would_block to signal that the
  // operation is incomplete. Asio guarantees that its asynchronous
  // operations will never fail with would_block, so any other value in
  // ec indicates completion.
  boost::system::error_code ec = boost::asio::error::would_block;

  // Start the asynchronous operation itself. The boost::lambda function
  // object is used as a callback and will update the ec variable when the
  // operation completes. The blocking_udp_client.cpp example shows how you
  // can use boost::bind rather than boost::lambda.
  boost::asio::async_connect(socket_, iter, var(ec) = _1);

  // Block until the asynchronous operation has completed.
  do io_service_.run_one(); while (ec == boost::asio::error::would_block);

  // Determine whether a connection was successfully established. The
  // deadline actor may have had a chance to run and close our socket, even
  // though the connect operation notionally succeeded. Therefore we must
  // check whether the socket is still open before deciding if we succeeded
  // or failed.
  if (ec || !socket_.is_open())
    throw boost::system::system_error(
        ec ? ec : boost::asio::error::operation_aborted);
}
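The snippet above relies on a deadline_ timer member plus a "deadline actor" that closes the socket when the deadline passes; that is what unblocks the run_one() loop. A minimal sketch of that actor, following the Boost blocking_tcp_client example (the class name client and the members deadline_ and socket_ are assumed to match the snippet above):

void check_deadline()
{
  // If the deadline has passed, close the socket so that the pending
  // async_connect completes with boost::asio::error::operation_aborted.
  if (deadline_.expires_at() <= boost::asio::deadline_timer::traits_type::now())
  {
    boost::system::error_code ignored_ec;
    socket_.close(ignored_ec);

    // Set the deadline to infinity so the actor takes no further action
    // until a new deadline is set.
    deadline_.expires_at(boost::posix_time::pos_infin);
  }

  // Reschedule the actor so it keeps watching the deadline.
  deadline_.async_wait(boost::bind(&client::check_deadline, this));
}

With connect(host, service, boost::posix_time::seconds(5)), the call throws after roughly 5 seconds on an unreachable host instead of waiting for the system default.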

Understand the usage of timeout in beast::tcp_stream?

Reference:
https://www.boost.org/doc/libs/1_78_0/libs/beast/example/websocket/client/async/websocket_client_async.cpp
https://www.boost.org/doc/libs/1_78_0/libs/beast/doc/html/beast/using_io/timeouts.html
https://www.boost.org/doc/libs/1_78_0/libs/beast/doc/html/beast/ref/boost__beast__tcp_stream.html
void on_resolve(beast::error_code ec, tcp::resolver::results_type results)
{
    if(ec) return fail(ec, "resolve");

    // Set the timeout for the operation
    beast::get_lowest_layer(ws_).expires_after(std::chrono::seconds(30));

    // Make the connection on the IP address we get from a lookup
    beast::get_lowest_layer(ws_).async_connect(
        results,
        beast::bind_front_handler(&session::on_connect, shared_from_this()));
}

void on_connect(beast::error_code ec, tcp::resolver::results_type::endpoint_type ep)
{
    if(ec) return fail(ec, "connect");

    // Turn off the timeout on the tcp_stream, because
    // the websocket stream has its own timeout system.
    // beast::get_lowest_layer(ws_).expires_never(); // Note: do NOT call this line for this question!!!
    ...

    host_ += ':' + std::to_string(ep.port());

    // Perform the websocket handshake
    ws_.async_handshake(host_, "/",
        beast::bind_front_handler(&session::on_handshake, shared_from_this()));
}
Question 1>
Will the timeout of beast::tcp_stream continue to run after a previous asynchronous operation finishes on time?
For example, in the code above the timeout expires after 30 seconds. If async_connect doesn't finish within 30 seconds, session::on_connect will receive error::timeout as the value of ec. Now assume async_connect takes 10 seconds: can I assume that async_handshake then has to finish within the remaining 20 (i.e. 30-10) seconds, otherwise error::timeout will be passed to session::on_handshake? I infer this from the comment inside the on_connect function ("Turn off the timeout on the tcp_stream"). In other words, a timeout is only turned off once the specified expiration period elapses or when it is disabled with expires_never. Is my understanding correct?
Question 2> I also want to know what a good pattern would be for handling timeouts in both the initiating (async_calling) and the callback (async_callback) functions.
When we call an asynchronous operation:

void func_async_calling()
{
    // Step 1> set some timeout here (i.e. XXXX seconds)
    beast::get_lowest_layer(ws_).expires_after(std::chrono::seconds(XXXX));
    // Step 2> start the asynchronous operation
    ws_.async_operation(..., func_async_callback, ...);
    // Step 3>
    beast::get_lowest_layer(ws_).expires_never();
}

When we define an async_callback handler for an asynchronous operation:

void func_async_callback()
{
    // Step 1> Either disable the timeout for the next logical operation:
    beast::get_lowest_layer(ws_).expires_never();
    // or enable a new timeout:
    beast::get_lowest_layer(ws_).expires_after(std::chrono::seconds(YYYY));
    // Step 2> call another asynchronous function
    // Step 3>
    beast::get_lowest_layer(ws_).expires_never();
}
Does this make sense?
Thank you
Question 1
Yes that's correct. The linked page has the confirmation:
// The timer is still running. If we don't want the next
// operation to time out 30 seconds relative to the previous
// call to `expires_after`, we need to turn it off before
// starting another asynchronous operation.
stream.expires_never();
Question 2
That looks fine. The only subtleties I can think of are:
Often, because of thread safety, the initiation as well as the completion happen on the same (implicit) strand. If that's the case, then in your completion-handler example the expires_never() would be redundant.
If the completion handler is not on the same strand, you want to actively avoid touching the expiry, because that would be a data race.
An alternative pattern is to set the expiry only once for a lengthier episode (e.g. a multi-message conversation between client and server); a sketch follows below. Obviously, in this pattern nobody would touch the expiry after the initial setting. This seems pretty obvious, but I thought I'd mention it before someone casts the first pattern in stone, never to think about it again.
Always do what you need and prefer simple code. I think your basic understanding of the feature is right. (No wonder, this documentation is a piece of art.)
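A minimal sketch of that alternative pattern, assuming the same ws_, host_ and session members as in the question's code (the 60-second budget and the function name are illustrative):

void start_conversation()
{
    // One deadline covers the whole logical episode: the handshake plus the
    // request/response exchange that follows. No handler touches the expiry.
    beast::get_lowest_layer(ws_).expires_after(std::chrono::seconds(60));

    ws_.async_handshake(host_, "/",
        beast::bind_front_handler(&session::on_handshake, shared_from_this()));

    // If the entire conversation has not finished within 60 seconds, whichever
    // operation is pending at that moment completes with a timeout error.
}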

How to recover from network interruption using boost::asio

I am writing a server that accepts data from a device and processes it. Everything works fine unless there is an interruption in the network (i.e., if I unplug the Ethernet cable, then reconnect it). I'm using read_until() because the protocol that the device uses terminates the packet with a specific sequence of bytes. When the data stream is interrupted, read_until() blocks, as expected. However when the stream starts up again, it remains blocked. If I look at the data stream with Wireshark, the device continues transmitting and each packet is being ACK'ed by the network stack. But if I look at bytes_readable it is always 0. How can I detect the interruption and how to re-establish a connection to the data stream? Below is a code snippet and thanks in advance for any help you can offer. [Go easy on me, this is my first Stack Overflow question....and yes I did try to search for an answer.]
using boost::asio::ip::tcp;
boost::asio::io_service IOservice;
tcp::acceptor acceptor(IOservice, tcp::endpoint(tcp::v4(), listenPort));
tcp::socket socket(IOservice);
acceptor.accept(socket);
for (;;)
{
    len = boost::asio::read_until(socket, sbuf, end);
    // Process sbuf
    // etc.
}
Remember, the client initiates a connection, so the only thing you need to achieve is to re-create the socket and start accepting again. I will keep the format of your snippet but I hope your real code is properly encapsulated.
using SocketType = boost::asio::ip::tcp::socket;

std::unique_ptr<SocketType> CreateSocketAndAccept(
    boost::asio::io_service& io_service,
    boost::asio::ip::tcp::acceptor& acceptor) {
  auto socket = std::make_unique<boost::asio::ip::tcp::socket>(io_service);
  boost::system::error_code ec;
  acceptor.accept(*socket.get(), ec);
  if (ec) {
    //TODO: Add handler.
  }
  return socket;
}
...
auto socket = CreateSocketAndAccept(IOservice, acceptor);
for (;;) {
  boost::system::error_code ec;
  auto len = boost::asio::read_until(*socket.get(), sbuf, end, ec);
  if (ec)  // you could be more picky here of course,
           // e.g. check against connection_reset, connection_aborted
    socket = CreateSocketAndAccept(IOservice, acceptor);
  ...
}
Footnote: Should go without saying, socket needs to stay in scope.
Edit: Based on the comments below.
The listening socket itself does not know whether a client is silent or whether it got cut off. All operations, especially synchronous ones, should impose a time limit on completion. Consider setting SO_RCVTIMEO or SO_KEEPALIVE (per socket or system-wide; for more info see "How to use SO_KEEPALIVE option properly to detect that the client at the other end is down?").
Another option is to go async and implement a full-fledged "shared" socket server (the Boost example page is a great start).
Either way, you might run into data consistency issues and be forced to deal with them, e.g. when the client detects an interrupted connection, it would resend the data (or something more complex using higher-level protocols).
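For the keep-alive suggestion above, a minimal sketch using Asio's portable socket option (the probe intervals themselves are OS-level settings and not shown here):

// Enable SO_KEEPALIVE on the accepted socket via Boost.Asio.
// `sock` stands for the accepted tcp::socket, i.e. *socket in the snippet above.
boost::asio::ip::tcp::socket& sock = *socket;
sock.set_option(boost::asio::socket_base::keep_alive(true));
// SO_RCVTIMEO is the other option mentioned; see the later question on
// SO_RCVTIMEO/SO_SNDTIMEO for why it rarely has the desired effect with Asio.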
If you want to stay synchronous, the way I've seen things handled is to destroy the socket when you detect an interruption. The blocking call should throw an exception that you can catch and then start accepting connections again.
for (;;)
{
    try {
        len = boost::asio::read_until(socket, sbuf, end);
        // Process sbuf
        // etc.
    }
    catch (const boost::system::system_error& e) {
        // clean up. Start accepting new connections.
    }
}
As Tom mentions in his answer, there is no difference between inactivity and ungraceful disconnection, so you need an external mechanism to detect this.
If you're expecting continuous data transfer, maybe a timeout per connection on the server side is enough. A simple ping could also work: after accepting a connection, ping your client every X seconds and declare the connection dead if it doesn't answer.
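A minimal sketch of such a per-connection timeout, assuming an asynchronous design where an idle timer is re-armed on every successful read (idle_timer_, socket_, sbuf_ and end_ are illustrative member names, and all handlers are assumed to run on one thread or strand):

void start_read()
{
    // Re-arm the idle timer; this implicitly cancels the previous wait,
    // whose handler then completes with operation_aborted and does nothing.
    idle_timer_.expires_from_now(boost::posix_time::seconds(30));
    idle_timer_.async_wait([this](const boost::system::error_code& ec)
    {
        if (!ec)
            socket_.close();   // peer silent too long; pending read gets an error
    });

    boost::asio::async_read_until(socket_, sbuf_, end_,
        [this](const boost::system::error_code& ec, std::size_t /*len*/)
        {
            if (ec) return;    // closed by the timer or by the peer
            // Process sbuf_ ...
            start_read();      // next read, fresh 30-second idle window
        });
}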

Is it expected for poll() to take 40ms to return even though data will be available sooner?

I created a proxy server to handle CQL orders from website clients. The proxy listens for incoming connections and each connection is given a thread. The thread loops as long as the socket exists and dies on HUP. You may also stop the proxy, which will stop the threads by sending an event (See eventfd()) to each thread.
By itself, this already allows me to save a good 100ms because the proxy is local and connecting to a local service is much faster than a service on a remote computer... (even if the computer is local.)
However, I send orders and once in a while the proxy sees no incoming data (i.e. it calls read() on the socket, which is set up as NONBLOCK, and gets -1 in return with errno == EAGAIN). When that happens, I call poll() to wait for additional data, the HUP, or a hit on the eventfd meaning I have to quit (i.e. 2 fds, the socket and the eventfd).
Somehow, more often than not, when I hit the poll() function call, it adds an extra 40ms to the time it takes for a message to go round trip. Although one would think this only happens on larger messages, it happens when I receive an order, which is less than 100 bytes! So size should not be the culprit. I also changed the code to make sure I send the entire order from the client to the proxy in one write() and to avoid the poll() if at all possible (i.e. I call read() first, and poll() only if nothing is available).
Note that I have no timeout in this case because there is nothing to check other than the incoming orders and the eventfd. So I would imagine that the timeout won't be a problem.
The code base is really big, but the client/server comes down to something like this (the sizes in the original are fully dynamic):
// Client
...
connect(socket);
...
write(socket, order, sizeof(order));
read(socket, result, sizeof(result));
// repeat for other orders, as required by client...

// Server
...
socket = accept(); // happens for each client
...
pthread_create(runner);
...

// Server thread (runner)
...
for(;;)
{
    int r(0);
    for(;;)
    {
        r += read(socket, order, sizeof(order));
        if(r >= sizeof(order))
        {
            break;
        }
        // wait for more data if not enough was received yet
        poll(..."socket" + "eventfd"...); // <-- this will often take 40ms
        if(eventfd_happened)
        {
            // quit thread
            return;
        }
    }
    ...
    [work on order]
    ...
    write(socket, result, sizeof(result));
}
Note 1: I see the problem even when I have a single client, so having multiple clients is not in itself the cause.
Note 2: The client really uses BIO_connect(), BIO_read() and BIO_write() [from OpenSSL], but I doubt that would be a problem. I do not use any kind of encryption.
I don't see why you're using non-blocking I/O given you have a dedicated thread per socket. Just block in read(). Use SO_RCVTIMEO if you need an overall read timeout.
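A minimal sketch of that suggestion for the server thread above, using the plain POSIX descriptor (no NONBLOCK flag, no poll()):

// Blocking read with an overall receive timeout on the descriptor.
// Assumes <sys/socket.h> and <errno.h>; `socket` is the connected fd.
struct timeval tv;
tv.tv_sec = 1;     // fail the read if nothing arrives within 1 second
tv.tv_usec = 0;
setsockopt(socket, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

ssize_t n = read(socket, order, sizeof(order));   // blocks for up to 1 second
if(n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
{
    // timed out: no data arrived within the SO_RCVTIMEO window
}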

SO_RCVTIMEO and SO_SNDTIMEO not affecting Boost.Asio operations

Below is my code
boost::asio::io_service io;
boost::asio::ip::tcp::acceptor::reuse_address option(true);
boost::asio::ip::tcp::acceptor accept(io);
boost::asio::ip::tcp::resolver resolver(io);
boost::asio::ip::tcp::resolver::query query("0.0.0.0", "8080");
boost::asio::ip::tcp::endpoint endpoint = *resolver.resolve(query);
accept.open(endpoint.protocol());
accept.set_option(option);
accept.bind(endpoint);
accept.listen(30);
boost::asio::ip::tcp::socket ps(io);
accept.accept(ps);
struct timeval tv;
tv.tv_sec = 1;
tv.tv_usec = 0;
//setsockopt(ps.native(), SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
setsockopt(ps.native(), SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
char buf[1024];
ps.async_receive(boost::asio::buffer(buf, 1024), boost::bind(fun));
io.run();
When I connect with Telnet but do not send any data, the connection is not dropped when the timeout elapses. What do I need to do to make the setsockopt call take effect?
Thanks!
I have changed SO_RCVTIMEO to SO_SNDTIMEO; it still does not time out within the specified time.
Using SO_RCVTIMEO and SO_SNDTIMEO socket options with Boost.Asio will rarely produce the desired behavior. Consider using either of the following two patterns:
Composed Operation With async_wait()
One can compose an asynchronous read operation with a timeout by combining a Boost.Asio timer and an async_wait() operation with an async_receive() operation. This approach is demonstrated in the Boost.Asio timeout examples, something similar to:
// Start a timeout for the read.
boost::asio::deadline_timer timer(io_service);
timer.expires_from_now(boost::posix_time::seconds(1));
timer.async_wait(
    [&socket, &timer](const boost::system::error_code& error)
    {
        // On error, such as cancellation, return early.
        if (error) return;

        // Timer has expired, but the read operation's completion handler
        // may have already run, setting the expiration to be in the future.
        if (timer.expires_at() > boost::asio::deadline_timer::traits_type::now())
        {
            return;
        }

        // The read operation's completion handler has not run.
        boost::system::error_code ignored_ec;
        socket.close(ignored_ec);
    });

// Start the read operation.
socket.async_receive(buffer,
    [&socket, &timer](const boost::system::error_code& error,
                      std::size_t bytes_transferred)
    {
        // Update timeout state to indicate the handler has run. This
        // will cancel any pending timeouts.
        timer.expires_at(boost::posix_time::pos_infin);

        // On error, such as cancellation, return early.
        if (error) return;

        // At this point, the read was successful and buffer is populated.
        // However, if the timeout occurred and its completion handler ran first,
        // then the socket is closed (!socket.is_open()).
    });
Be aware that it is possible for both asynchronous operations to complete in the same iteration, making both completion handlers ready to run with success. This is why both completion handlers need to update and check state. See this answer for more details on how to manage state.
Use std::future
Boost.Asio provides support for C++11 futures. When boost::asio::use_future is provided as the completion handler to an asynchronous operation, the initiating function will return a std::future that will be fulfilled once the operation completes. As std::future supports timed waits, one can leverage it for timing out an operation. Do note that as the calling thread will be blocked waiting for the future, at least one other thread must be processing the io_service to allow the async_receive() operation to progress and fulfill the promise:
// Use an asynchronous operation so that it can be cancelled on timeout.
std::future<std::size_t> read_result = socket.async_receive(
    buffer, boost::asio::use_future);

// If timeout occurs, then cancel the read operation.
if (read_result.wait_for(std::chrono::seconds(1)) ==
    std::future_status::timeout)
{
    socket.cancel();
}
// Otherwise, the operation completed (with success or error).
else
{
    // If the operation failed, then read_result.get() will throw a
    // boost::system::system_error.
    auto bytes_transferred = read_result.get();
    // process buffer
}
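As noted above, at least one other thread must be running the io_service for the future to become ready. A minimal sketch of that setup (names are illustrative):

// Run the io_service on a worker thread so the async_receive above can
// make progress while this thread blocks in read_result.wait_for().
boost::asio::io_service io_service;
boost::asio::io_service::work work(io_service);   // keeps run() from returning early
std::thread io_thread([&io_service]() { io_service.run(); });

// ... initiate socket.async_receive(buffer, boost::asio::use_future) and
// wait on the returned future as shown above ...

io_service.stop();   // or destroy `work` and let run() drain naturally
io_thread.join();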
Why SO_RCVTIMEO Will Not Work
System Behavior
The SO_RCVTIMEO documentation notes that the option only affects system calls that perform socket I/O, such as read() and recvmsg(). It does not affect event demultiplexers, such as select() and poll(), that only watch the file descriptors to determine when I/O can occur without blocking. Furthermore, when a timeout does occur, the I/O call fails returning -1 and sets errno to EAGAIN or EWOULDBLOCK.
Specify the receiving or sending timeouts until reporting an error. [...] if no data has been transferred and the timeout has been reached then -1 is returned with errno set to EAGAIN or EWOULDBLOCK [...] Timeouts only have effect for system calls that perform socket I/O (e.g., read(), recvmsg(), [...]; timeouts have no effect for select(), poll(), epoll_wait(), and so on.
When the underlying file descriptor is set to non-blocking, system calls performing socket I/O will return immediately with EAGAIN or EWOULDBLOCK if resources are not immediately available. For a non-blocking socket, SO_RCVTIMEO will not have any effect, as the call will return immediately with success or failure. Thus, for SO_RCVTIMEO to affect system I/O calls, the socket must be blocking.
Boost.Asio Behavior
First, asynchronous I/O operations in Boost.Asio will use an event demultiplexer, such as select() or poll(). Hence, SO_RCVTIMEO will not affect asynchronous operations.
Next, Boost.Asio's sockets have the concept of two non-blocking modes (both of which default to false):
native_non_blocking() mode that roughly corresponds to the file descriptor's non-blocking state. This mode affects system I/O calls. For example, if one invokes socket.native_non_blocking(true), then recv(socket.native_handle(), ...) may fail with errno set to EAGAIN or EWOULDBLOCK. Anytime an asynchronous operation is initiated on a socket, Boost.Asio will enable this mode.
non_blocking() mode that affects Boost.Asio's synchronous socket operations. When set to true, Boost.Asio will set the underlying file descriptor to be non-blocking and synchronous Boost.Asio socket operations can fail with boost::asio::error::would_block (or the equivalent system error). When set to false, Boost.Asio will block, even if the underlying file descriptor is non-blocking, by polling the file descriptor and re-attempting system I/O operations if EAGAIN or EWOULDBLOCK are returned.
The behavior of non_blocking() prevents SO_RCVTIMEO from producing the desired behavior. Assume socket.receive() is invoked and data is neither available nor received:
If non_blocking() is false, the system I/O call will timeout per SO_RCVTIMEO. However, Boost.Asio will then immediately block polling on the file descriptor to be readable, which is not affected by SO_RCVTIMEO. The final result is the caller blocked in socket.receive() until either data has been received or failure, such as the remote peer closing the connection.
If non_blocking() is true, then the underlying file descriptor is also non-blocking. Hence, the system I/O call will ignore SO_RCVTIMEO, immediately return with EAGAIN or EWOULDBLOCK, causing socket.receive() to fail with boost::asio::error::would_block.
Ideally, for SO_RCVTIMEO to function with Boost.Asio, one needs native_non_blocking() set to false so that SO_RCVTIMEO can take effect, but also needs non_blocking() set to true to prevent polling on the descriptor. However, Boost.Asio does not support this combination:
socket::native_non_blocking(bool mode)
If the mode is false, but the current value of non_blocking() is true, this function fails with boost::asio::error::invalid_argument, as the combination does not make sense.
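A small illustration of these two modes on a connected socket (this is a hedged sketch, not code from the question):

// Default: non_blocking() == false. Asio emulates blocking by polling the
// descriptor after EAGAIN, so a SO_RCVTIMEO set on the descriptor is
// effectively swallowed by socket.receive().

// With non_blocking() == true, synchronous Asio calls return immediately:
socket.non_blocking(true);
boost::system::error_code ec;
socket.receive(boost::asio::buffer(buf), 0, ec);
if (ec == boost::asio::error::would_block)
{
    // no data right now; SO_RCVTIMEO never gets a chance to apply
}

// The combination that would make SO_RCVTIMEO useful is rejected:
boost::system::error_code ec2;
socket.native_non_blocking(false, ec2);  // fails with invalid_argument while
                                         // non_blocking() is true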
Since you are receiving data, you may want to set SO_RCVTIMEO instead of SO_SNDTIMEO, although mixing Boost and system calls may not produce the expected results.
For reference:
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received. The default for this option is zero, which indicates that a receive operation shall not time out. This option takes a timeval structure. Note that not all implementations allow this option to be set.
This option, however, only has an effect on read operations, not on other low-level functions that may wait on the socket in an asynchronous implementation (e.g. select and epoll), and it seems that it does not affect asynchronous Asio operations either.
I found an example code from boost that may work for your case here.
An oversimplified example (to be compiled as C++11):
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <iostream>

void myclose(boost::asio::ip::tcp::socket& ps) { ps.close(); }

int main()
{
    boost::asio::io_service io;
    boost::asio::ip::tcp::acceptor::reuse_address option(true);
    boost::asio::ip::tcp::acceptor accept(io);
    boost::asio::ip::tcp::resolver resolver(io);
    boost::asio::ip::tcp::resolver::query query("0.0.0.0", "8080");
    boost::asio::ip::tcp::endpoint endpoint = *resolver.resolve(query);
    accept.open(endpoint.protocol());
    accept.set_option(option);
    accept.bind(endpoint);
    accept.listen(30);
    boost::asio::ip::tcp::socket ps(io);
    accept.accept(ps);

    char buf[1024];
    boost::asio::deadline_timer timer(io, boost::posix_time::seconds(1));
    timer.async_wait(boost::bind(myclose, boost::ref(ps)));

    ps.async_receive(boost::asio::buffer(buf, 1024),
        [](const boost::system::error_code& error,
           std::size_t bytes_transferred)
        {
            std::cout << bytes_transferred << std::endl;
        });

    io.run();
    return 0;
}

boost asio async_connect success after close

Single-threaded application.
It does not happen every time, only after about 1.5 hours of high load.

1. tcp::socket::async_connect
2. tcp::socket::close (by deadline_timer)
3. The async_connect handler receives a success error_code (one time in a million), even though the socket was closed by (2). 99.999% of the time it gives errno=125 (ECANCELED).

Is it possible that the socket implementation or Boost.Asio somehow does this:

1. async_connect
2. async success posted to io_service
3. close by timer
4. async success handled by me, not affected by close

Right now this is solved by saving state in my variables and ignoring the success.
Linux 2.6 (Fedora).
Boost 1.46.0
PS: of course it is possibly a bug on my part... but it runs smoothly for days otherwise.
As Igor mentions in the comments, the completion handler is already queued.
This scenario is the result of a separation in time between when an operation executes and when a handler is invoked. The documentation for io_service::run(), io_service::run_one(), io_service::poll(), and io_service::poll_one() specifically mentions handlers, not operations. In the scenario, the socket::async_connect() operation and the deadline_timer::async_wait() operation complete in the same event loop iteration. This results in both handlers being added to the io_service for deferred invocation, in an unspecified order.
Consider the following snippet that accentuates the scenario:
void handle_wait(const boost::system::error_code& error)
{
    if (error) return;
    socket_.close();
}

timer_.expires_from_now(boost::posix_time::seconds(30));
timer_.async_wait(&handle_wait);
socket_.async_connect(endpoint_, handle_connect);
boost::this_thread::sleep(boost::posix_time::seconds(60));
io_service_.run_one();
When io_service_.run_one() is invoked, both the socket::async_connect() and deadline_timer::async_wait() operations may have completed, causing handle_wait and handle_connect to be ready for invocation from within the io_service in an unspecified order. To properly handle this unspecified order, additional logic needs to occur within handle_wait() and handle_connect() to query the current state and determine whether the other handler has been invoked, rather than depending solely on the status (error_code) of the operation.
The easiest way to determine whether the other handler has been invoked is:
In handle_connect(), check if the socket is still open via is_open(). If the socket is still open, then handle_wait() has not been invoked. A clean way to indicate to handle_wait() that handle_connect() has run is to update the expiry time.
In handle_wait(), check if the expiry time has passed. If it has, then handle_connect() has not run, so close the socket.
The resulting handlers could look like the following:
void handle_wait(const boost::system::error_code& error)
{
    // On error, return early.
    if (error) return;

    // If the timer expires in the future, then the connect handler must have
    // run first.
    if (timer_.expires_at() > deadline_timer::traits_type::now()) return;

    // Timeout has occurred, so close the socket.
    socket_.close();
}

void handle_connect(const boost::system::error_code& error)
{
    // The async_connect() function automatically opens the socket at the start
    // of the asynchronous operation. If the socket is closed at this time then
    // the timeout handler must have run first.
    if (!socket_.is_open()) return;

    // On error, return early.
    if (error) return;

    // Otherwise, a connection has been established. Update the timer state
    // so that the timeout handler does not close the socket.
    timer_.expires_at(boost::posix_time::pos_infin);
}
Boost.Asio provides some examples for handling timeouts.
I accept twsansbury's answer, just want to add some more info.
About shutdown():
void async_recv_handler( boost::system::error_code ec_recv, std::size_t count )
{
    if ( !m_socket.is_open() )
        return; // first time, don't trust ec_recv

    if ( ec_recv )
    {
        // oops, we have an error
        // log
        // close
        return;
    }

    // seems that we are just fine, no error in ec_recv, we can gracefully shutdown the connection
    // but shutdown may fail! this check is working for me
    boost::system::error_code ec_shutdown;
    // second time, not trusting ec_recv
    // (t is the shutdown type, e.g. boost::asio::ip::tcp::socket::shutdown_both)
    m_socket.shutdown( t, ec_shutdown );

    if ( !ec_shutdown )
        return;

    // this error code is expected
    if ( ec_shutdown == boost::asio::error::not_connected )
        return;

    // other error codes are unexpected for me
    // log << ec_shutdown.message()
    throw boost::system::system_error(ec_shutdown);
}