Do any boost::asio async calls automatically time out? - c++

I have a client and server using boost::asio asynchronously. I want to add some timeouts to close the connection and potentially retry if something goes wrong.
My initial thought was that any time I call an async_ function I should also start a deadline_timer to expire after I expect the async operation to complete. Now I'm wondering if that is strictly necessary in every case.
For example:
async_resolve presumably uses the system's resolver which has timeouts built into it (e.g. RES_TIMEOUT in resolv.h possibly overridden by configuration in /etc/resolv.conf). By adding my own timer, I may conflict with how the user wants his resolver to work.
For async_connect, the connect(2) syscall has some sort of timeout built into it
etc.
So which (if any) async_ calls are guaranteed to call their handlers within a "reasonable" time frame? And if an operation can (or does) time out, will the handler be passed the basic_errors::timed_out error or something else?

So I did some testing. Based on my results, it's clear that they depend on the underlying OS implementation. For reference, I tested this with a stock Fedora kernel: 2.6.35.10-74.fc14.x86_64.
The bottom line is that async_resolve() looks to be the only case where you might be able to get away without setting a deadline_timer. It's practically required in every other case for reasonable behavior.
async_resolve()
A call to async_resolve() resulted in 4 queries 5 seconds apart. The handler was called 20 seconds after the request with the error boost::asio::error::host_not_found.
My resolver defaults to a timeout of 5 seconds with 2 attempts (resolv.h), so it appears to send twice the number of queries configured. The behavior is modifiable by setting options timeout and options attempts in /etc/resolv.conf. In every case the number of queries sent was double whatever attempts was set to and the handler was called with the host_not_found error afterwards.
For the test, the single configured nameserver was black-hole routed.
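For illustration only, the kind of /etc/resolv.conf override I mean looks like this (the nameserver address is a placeholder):
options timeout:1 attempts:1
nameserver 192.0.2.1
# per the results above, attempts:1 should still produce two queries
# before the handler is called with host_not_found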
async_connect()
Calling async_connect() with a black-hole-routed destination resulted in the handler being called with the error boost::asio::error::timed_out after ~189 seconds.
The stack sent the initial SYN and 5 retries. The first retry was sent after 3 seconds, with the retry timeout doubling each time (3+6+12+24+48+96=189). The number of retries can be changed:
% sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 5
The default of 5 is chosen to comply with RFC 1122 (4.2.3.5):
[The retransmission timers] for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course.
3 minutes = 180 seconds, though the RFC doesn't appear to specify an upper bound. There's nothing stopping an implementation from retrying forever.
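If you don't want your connect handler to wait out that retry cycle, this is where a deadline_timer earns its keep. A minimal sketch, assuming a simple single-connection class (the names here are my own illustration, not code from the test):
#include <boost/asio.hpp>
#include <boost/bind.hpp>

class connector
{
public:
    connector(boost::asio::io_service& io)
        : socket_(io), timer_(io) {}

    void start(const boost::asio::ip::tcp::endpoint& ep)
    {
        // Arm the timeout first, then start the connect.
        timer_.expires_from_now(boost::posix_time::seconds(10));
        timer_.async_wait(boost::bind(&connector::on_timeout, this,
                                      boost::asio::placeholders::error));
        socket_.async_connect(ep, boost::bind(&connector::on_connect, this,
                                              boost::asio::placeholders::error));
    }

private:
    void on_timeout(const boost::system::error_code& ec)
    {
        if (ec == boost::asio::error::operation_aborted) return; // connect finished first
        socket_.close(); // forces the pending async_connect to complete right away
    }

    void on_connect(const boost::system::error_code& ec)
    {
        timer_.cancel(); // stop the timeout if it hasn't fired yet
        // ec is operation_aborted here if the timer closed the socket first.
    }

    boost::asio::ip::tcp::socket socket_;
    boost::asio::deadline_timer timer_;
};
Closing the socket from the timer handler makes the pending async_connect complete with operation_aborted almost immediately, instead of waiting the ~189 seconds for timed_out.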
async_write()
As long as the socket's send buffer wasn't full, this handler was always called right away.
My test established a TCP connection and set a timer to call async_write() a minute later. During the minute where the connection was established but prior to the async_write() call, I tried all sorts of mayhem:
Setting a downstream router to black-hole subsequent traffic to the destination.
Clearing the session in a downstream firewall so it would reply with spoofed RSTs from the destination.
Unplugging my Ethernet cable.
Running /etc/init.d/network stop.
No matter what I did, the next async_write() would immediately call its handler to report success.
In the case where the firewall spoofed the RST, the connection was closed immediately, but I had no way of knowing that until I attempted the next operation (which would immediately report boost::asio::error::connection_reset). In the other cases, the connection would remain open and not report errors to me until it eventually timed out 17-18 minutes later.
The worst case for async_write() is if the host is retransmitting and the send buffer is full. If the buffer is full, async_write() won't call its handler until the retransmissions time out. Linux defaults to 15 retransmissions:
% sysctl net.ipv4.tcp_retries2
net.ipv4.tcp_retries2 = 15
The time between the retransmissions increases after each (and is based on many factors such as the estimated round-trip time of the specific connection) but is clamped at 2 minutes. So with the default 15 retransmissions and worst-case 2-minute timeout, the upper bound is 30 minutes for the async_write() handler to be called. When it is called, error is set to boost::asio::error::timed_out.
async_read()
This should never call its handler as long as the connection is established and no data is received. I haven't had time to test it.

Those two calls MAY have timeouts that get propagated up to your handlers, but you might be surprised at how long it takes before either of them times out. (I know I have let a connection just sit and try to connect on a single connect call for over 10 minutes with boost::asio before killing the process.) Also, the async_read and async_write calls do not have timeouts associated with them, so if you wish to have timeouts on your reads and writes, you will still need a deadline_timer.

Related

closesocket() not completing pending operations of IOCP

I am currently working on a server application in C++. My main inspirations are these examples:
Windows SDK IOCP Example
The I/O Completion Port IPv4/IPv6 Server Program Example
My app is very similar to these (socketobj, packageobj, ...).
In general, my app runs without issues. The only thing that still causes me trouble is half-open connections.
My strategy for this is: I check every connected client periodically and increment an "idle counter". Whenever a completion occurs, I reset this counter. If the idle counter gets too high, I set a boolean to prevent other threads from posting operations, and then call closesocket().
My assumption was that once the socket is closed, the pending operations would complete (maybe not instantly, but after a while). This is also the behavior the MSDN documentation describes (hints, second paragraph). I need this because only after all operations have completed can I free the resources.
Long story short: this is not the case for me. I did some tests with my testclient app and some cout and breakpoint debugging, and discovered that pending operations for closed sockets are not completing (even after waiting 10 min). I also already tried with a shutdown() call before the closesocket(), and both returned no error.
What am I doing wrong? Does this happen to anyone else? Is the MSDN documentation wrong? What are the alternatives?
I am currently thinking of using the "linger" functionality, or of cancelling every operation explicitly with the CancelIoEx() function.
Edit: (thank you for your responses)
Yesterday evening I added a linked list to every socketobj to hold the per-I/O objects of the pending operations. With this I tried the CancelIoEx() function. The function returned 0 and GetLastError() returned ERROR_NOT_FOUND for most of the operations.
Is it then safe to just free the per-I/O object in this case?
I also discovered that this happens more often when I run my server app and the client app on the same machine. From time to time the server is then not able to complete write operations. I assume this is happening because the client-side receive buffer gets too full. (The client side does not stop receiving data!)
Code snippet to follow as soon as possible.
The "linger" setting can be used to reset the connection, but that way you will (a) lose data and (b) deliver a reset to the peer, which may terrify it.
If you're thinking of a positive linger timeout, it doesn't really help.
Shutdown for read should terminate read operations, but shutdown for write only gets queued after pending writes so it doesn't help at all.
If pending writes are the problem, and not completing, they will have to be cancelled.
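For what it's worth, here is a hedged sketch of cancelling everything pending on a socket before tearing it down (my own illustration; cancel_pending_io() is a made-up helper name, and CancelIoEx requires Vista or later):
#include <winsock2.h>
#include <windows.h>

void cancel_pending_io(SOCKET sock)
{
    // Passing NULL as the OVERLAPPED pointer cancels every outstanding
    // overlapped operation on the handle, regardless of which thread issued it.
    if (!CancelIoEx(reinterpret_cast<HANDLE>(sock), NULL))
    {
        // ERROR_NOT_FOUND means nothing was pending (or everything had already
        // completed), so there is nothing left to wait for on this socket.
        DWORD err = GetLastError();
        (void)err;
    }
    // Operations that were cancelled still complete through the completion
    // port: GetQueuedCompletionStatus() returns FALSE with the per-I/O
    // OVERLAPPED and GetLastError() == ERROR_OPERATION_ABORTED. Free the
    // per-I/O object in that completion, not here.
}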

Multiple Timers in C++ / MySQL

I've got a service system that gets requests from another system. A request contains information that is stored in the service system's MySQL database. Once a request is received, the server should start a timer that will send a FAIL message to the sender if the timeout elapses.
The problem is, it is a dynamic system that can get multiple requests from the same, or various sources. If a request is received from a source with a timeout limit of 5 minutes, and another request comes from the same source after only 2 minutes, it should be able to handle both. Thus, a timer needs to be enabled for every incoming message. The service is a web-service that is programmed in C++ with the information being stored in a MySQL database.
Any ideas how I could do this?
A way I've seen this often done: Use a SINGLE timer, and keep a priority queue (sorted by target time) of every timeout. In this way, you always know the amount of time you need to wait until the next timeout, and you don't have the overhead associated with managing hundreds of timers simultaneously.
Say at time 0 you get a request with a timeout of 100.
Queue: [100]
You set your timer to fire in 100 seconds.
Then at time 10 you get a new request with a timeout of 50.
Queue: [60, 100]
You cancel your timer and set it to fire in 50 seconds.
When it fires, it handles the timeout, removes 60 from the queue, sees that the next time is 100, and sets the timer to fire in 40 seconds. Say you get another request with a timeout of 100, at time 80.
Queue: [100, 180]
In this case, since the head of the queue (100) doesn't change, you don't need to reset the timer. Hopefully this explanation makes the algorithm pretty clear.
Of course, each entry in the queue will need some link to the request associated with the timeout, but I imagine that should be simple.
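Here is a minimal sketch of that single-timer queue (my own illustration; the types and names are made up, and arming the one real timer is left to whatever mechanism you use):
#include <queue>
#include <vector>
#include <ctime>

struct Timeout
{
    std::time_t expires_at;   // absolute expiry time of this request
    long request_id;          // link back to the request that set it
};

struct Later
{
    bool operator()(const Timeout& a, const Timeout& b) const
    { return a.expires_at > b.expires_at; }   // min-heap on expiry time
};

class TimeoutQueue
{
public:
    // Returns true if the single timer must be re-armed because the new
    // entry became the earliest one.
    bool add(std::time_t expires_at, long request_id)
    {
        bool rearm = queue_.empty() || expires_at < queue_.top().expires_at;
        Timeout t = { expires_at, request_id };
        queue_.push(t);
        return rearm;
    }

    // Call when the timer fires: pops everything already due so the caller
    // can send FAIL for those requests, then returns the next expiry time
    // (or 0 if the queue is empty).
    std::time_t expire(std::time_t now, std::vector<long>& expired)
    {
        while (!queue_.empty() && queue_.top().expires_at <= now)
        {
            expired.push_back(queue_.top().request_id);
            queue_.pop();
        }
        return queue_.empty() ? 0 : queue_.top().expires_at;
    }

private:
    std::priority_queue<Timeout, std::vector<Timeout>, Later> queue_;
};
Whenever add() reports that the earliest expiry changed, or expire() hands back a new earliest time, you re-arm your single timer for that time minus the current time.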
Note however that this all may be unnecessary, depending on the mechanism you use for your timers. For example, if you're on Windows, you can use CreateTimerQueue, which I imagine uses this same (or very similar) logic internally.

Default timeout of curl call using libcurl in c++

My C++ application (A) makes a curl call to another machine to start another application (B). When A makes the curl call, it waits until B finishes its job. So I just want to ask: what is the default timeout for application A, or is it disabled by default, i.e. an infinite timeout?
From http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
CURLOPT_CONNECTTIMEOUT
Pass a long. It should contain the maximum time in seconds that you allow the connection to the server to take. This only limits the connection phase, once it has connected, this option is of no more use. Set to zero to switch to the default built-in connection timeout - 300 seconds. See also the CURLOPT_TIMEOUT option.
CURLOPT_TIMEOUT
Pass a long as parameter containing the maximum time in seconds that you allow the libcurl transfer operation to take. Normally, name lookups can take a considerable time and limiting operations to less than a few minutes risk aborting perfectly normal operations. This option will cause curl to use the SIGALRM to enable time-outing system calls.
In unix-like systems, this might cause signals to be used unless CURLOPT_NOSIGNAL is set.
Default timeout is 0 (zero) which means it never times out.
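So by default only the connection phase is capped (at 300 seconds); the transfer itself never times out. If you want A to give up eventually, set the limits explicitly. A minimal sketch (the URL and values are placeholders):
#include <curl/curl.h>

int main()
{
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/start-b");
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 30L);  // connection phase only
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 600L);        // the whole transfer
    curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);         // avoid SIGALRM in threaded code

    // Returns CURLE_OPERATION_TIMEDOUT if either limit is hit.
    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return res == CURLE_OK ? 0 : 1;
}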

epoll performance for smaller timeout values

I have a single-threaded server process that watches a few (around 100) sockets via epoll in a loop. My question is how to decide the optimum value for the epoll_wait timeout. Since this is a single-threaded process, everything is triggered off epoll_wait; if there is no activity on the sockets, the program remains idle. My guess is that if I give too small a timeout, which causes too many epoll_wait calls, there is no harm: even though my process is doing too many epoll_wait calls, it would be sitting idle otherwise. But there is another consideration: I run many other processes on this (8-core) box, something like 100 other processes which are clients of this process. I am wondering how the timeout value impacts CPU context switching, i.e. if I give too small a timeout, resulting in many epoll_wait calls, will my server process be put into a wait state many more times than when I give a larger timeout value, resulting in fewer epoll_wait calls?
Any thoughts/ideas?
Thanks
I believe there is no good reason to make your process wake up if it has nothing to do. Simply set the timeout to the point at which you first need to do something. For example, if your server has a semantic of disconnecting a client after N seconds of inactivity, set the epoll timeout to the time remaining until the first client would have to be disconnected, assuming no activity. In other words, set it to:
min{expire_time(client); for each client} - current_time
Or, if that's negative, you can disconnect at least one client immediately. In general, this works not only for disconnecting clients; you can abstract the above into "software timers" within your application.
I'm failing to see this compromise you've mentioned. If you use a timeout any smaller than you have to, you'll wake up before you have to, then, presumably, go back to sleep because you have nothing to do. What good does that do? On the other hand, you must not use a timeout any larger than what you have to - because that would make your program not respect the disconnect timeout policy.
If your program is not waiting for any time-based event (like disconnecting clients), just give epoll_wait() a timeout value of -1, making it wait forever.
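A minimal sketch of that calculation (my own illustration; next_expiry comes from whatever software-timer bookkeeping you keep):
#include <sys/epoll.h>
#include <time.h>

int wait_for_events(int epfd, struct epoll_event* events, int maxevents,
                    time_t next_expiry /* 0 if no timed event is pending */)
{
    int timeout_ms = -1;                       // no timed event: block forever
    if (next_expiry != 0)
    {
        time_t now = time(NULL);
        timeout_ms = next_expiry > now ? (int)(next_expiry - now) * 1000
                                       : 0;    // already overdue: return at once
    }
    // A return value of 0 means the timeout elapsed: disconnect idle clients now.
    return epoll_wait(epfd, events, maxevents, timeout_ms);
}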
UPDATE: If you're worried about this process being given less CPU when other processes are active, just give it a lower nice value (scheduler priority). On the other hand, if you're worried that your server process will be swapped out to disk in favour of other processes when it's idle, it is possible to avoid swapping it out (or you can just lower /proc/sys/vm/swappiness, affecting all processes).

Ensuring data is being read with async_read

I am currently testing my network application in very low bandwidth environments. I currently have code that attempts to ensure that the connection is good by making sure I am still receiving information.
Traditionally I have done this by recording the timestamp in my ReadHandler function so that each time it gets called I know I have received data on the socket. With very low bandwidths this isn't sufficient because my ReadHandler is not getting called frequently enough.
I was toying around with the idea of writing my own completion condition function (right now I am using transfer_at_least(1)), thinking it would get called more frequently and I could record my timestamp there, but I was wondering if there isn't some other, more standard way to go about this.
We had a similar issue in production: some of our connections may be idle for days, but we must detect if the remote is dead ASAP.
We solved it by enabling the TCP_KEEPALIVE option:
boost::asio::socket_base::keep_alive option(true);
mSocketTCP.set_option(option);
which had to be accompanied by a new startup script that writes sensible values to /proc/sys/net/ipv4/tcp_keepalive_*, which have very long timeouts by default (on Linux).
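As an alternative to the system-wide /proc settings, on Linux the keepalive timers can also be tuned per socket via setsockopt() on the native descriptor, in addition to enabling keep_alive as above. A hedged sketch (the values are examples; use native_handle() instead of native() on newer Boost versions):
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int fd = mSocketTCP.native();      // underlying socket descriptor
int idle = 60;                     // seconds of idleness before the first probe
int interval = 10;                 // seconds between probes
int count = 3;                     // failed probes before the connection is declared dead
setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,     sizeof(idle));
setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &count,    sizeof(count));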
You can use the read_some method to get partial reads and handle the bookkeeping yourself. This is more efficient than transfer_at_least(1), but you still have to keep track of what is going on.
However, a cleaner approach is just to use a concurrent deadline_timer. If the timer goes off before you are finished, then the operation is taking too long and you cancel whatever is going on. If not, just stop the timer and continue. Something like:
boost::asio::deadline_timer t(io_service); // the timer needs an io_service to run on
t.expires_from_now(boost::posix_time::seconds(20));
t.async_wait(boost::bind(&Class::timed_out, this, _1));
// Do stuff.
if (!t.cancel()) {
    // Timer went off, abort
}
// And the timeout method
void Class::timed_out(boost::system::error_code const& error)
{
    if (error == boost::asio::error::operation_aborted) return;
    // Deal with the timeout, close the socket, etc.
}
I don't know how to handle network latency from within the application. Can you even be sure whether it's network latency, or whether the peer server or peer application is busy and reacting slowly? Does it matter whether the network, the server, or the application is at fault?
Even if you can measure the network latency and find that it's large, what are you going to do?
You cannot improve the situation.
Consider another critical case, which is a subset of what you're trying to handle: the network is down (e.g. you disconnect the cable from your machine). Since it's a subset of your problem, you want to handle it too.
Let's examine the effect of the network going down on an active TCP connection. How can you discover whether your active TCP connection is still alive? Calling send() will succeed, but that merely means the message was queued in the kernel's outgoing TCP queue. The TCP stack will try to send it, but since no TCP ACK comes back, the TCP stack on your side will retry sending it again and again. You can see your message in the netstat output (Send-Q column).
I'm aware of the following ways to deal with it:
One standard way is TCP keepalive, as proposed by @Cubby.
Another way is to implement your own keep-alive mechanism: send a Keep Alive req message, and the peer is obligated to send back a Keep Alive ack message.
If you don't receive an ack message within a predefined timeout, try sending the Keep Alive req N more times (e.g. N=2). If there is still no success, close the socket and open it again. If the peer server is not available, you won't be able to open a connection, since the TCP 3-way handshake requires the peer to respond.
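A rough sketch of that request/ack loop (my own illustration; the helper functions are made-up placeholders, not a real API):
// Made-up placeholders for the transport layer; the post doesn't define these.
void send_keepalive_req();
bool wait_for_ack(int timeout_seconds);

// Returns true if the peer answered a Keep Alive req within the allowed attempts.
bool connection_alive(int attempts, int timeout_seconds)
{
    for (int i = 0; i < attempts; ++i)
    {
        send_keepalive_req();
        if (wait_for_ack(timeout_seconds))
            return true;            // got the ack, connection is alive
    }
    return false;                   // no ack after N tries: close and reconnect
}

// Usage: if (!connection_alive(3, 5)) { /* close the socket and reopen it */ }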