I'm using boost::asio to do some very basic UDP packet collection. The io_service object is instantiated in a worker thread, and io_service.run() is called from inside that thread. My problem is getting io_service.run() to return when I am done collecting packets.
I'm not clear on what methods of io_service can be called from other threads when it comes time to stop my worker thread. I have a reference to the io_service object, and from a different thread I make this call:
ios.dispatch( boost::bind( &udp_server::handle_kill, this ) );
In my udp_server class, the handler for that function cancels the pending work from a single boost::asio::ip::udp::socket and a single boost::asio::deadline_timer object. Both have pending async work to do. At that point I call ios.stop():
void udp_server::handle_kill()
{
    m_socket.cancel();
    m_timer.cancel();
    m_ios.stop();
}
With no work pending, I expect at this point that my call to ios.run() should return - but this does not happen.
So why does it not return? The most likely explanation to me is that I shouldn't be calling io_service::dispatch() from another thread. But the dispatch() method kind of seems like it was built to do just that - dispatch a function call in the thread that io_service::run() is working in. And it seems to do just that.
So this leaves me with a few related questions:
Am I using io_service::dispatch() correctly?
If all tasks are canceled, is there any reason that io_service::run() should not return?
socket::udp::cancel() doesn't seem to be the right way to close a socket and abort all of its pending work. What is the right way?
asio is behaving pretty well for me, but I need to get a better understanding of this bit of architecture.
More data
socket::udp::cancel() is apparently an unsupported operation on an open socket under Win32 - so this operation fails by throwing an exception - which does in fact cause an exit from io_service::run(), but definitely not the desired exit.
socket::udp::close() doesn't seem to cancel the pending async_receive_from() task, so calling it instead of socket::udp::cancel() seems to leave the thread somewhere inside io_service::run().
Invoking io_service::stop() from another thread is safe; this is well described in the documentation:
Thread Safety
Distinct objects: Safe.
Shared objects: Safe, with the exception that calling reset() while there are unfinished run(), run_one(), poll() or poll_one() calls results in undefined behaviour.
As the comments to your question indicate, you really need to boil this down to a reproducible example.
Related
I am looking at the Boost Asio Blocking TCP Client timeout example with a special interest in how connection timeouts are implemented. How do we know from the documentation that the callback handler and subsequent checks don't introduce a race condition?
The Asynchronous connection command
boost::asio::async_connect(socket_, iter, var(ec) = _1);
executes var(ec) = _1, which is the handler that sets the error code once the connect completes. Alternatively, a full and explicit lambda could be used here.
At the same time, the check_deadline function appears to be called via the deadline_ member. The timeout appears to be enforced by having the deadline handler forcibly close the socket, whereupon we assume that the blocking statement
do io_service_.run_one(); while (ec == boost::asio::error::would_block);
would return. At first I thought that the error code must be atomic, but that doesn't appear to be the case. Instead, this page appears to indicate that the strand model will work whenever the calls to the socket/context come from the same thread.
So we assume that each callback for the deadline (which is in Asio) and the handler for the async_connect routine will not run concurrently. Pages such as this in the documentation hint that handlers will only execute during run() calls, which prevents the check while (ec == boost::asio::error::would_block) from being executed while a handler is in the middle of changing its value.
How do I know this explicitly? What in the documentation that tells me explicitly that no handlers will ever execute outside these routines? If true, the page on the proactor design pattern must infer this, but never explicitly where the "Initiator" leads to the "Completion Handler".
The closest I've found is the documentation for io_context, which says:
Synchronous operations on I/O objects implicitly run the io_context
object for an individual operation. The io_context functions run(),
run_one(), run_for(), run_until(), poll() or poll_one() must be called
for the io_context to perform asynchronous operations on behalf of a
C++ program. Notification that an asynchronous operation has completed
is delivered by invocation of the associated handler. Handlers are
invoked only by a thread that is currently calling any overload of
run(), run_one(), run_for(), run_until(), poll() or poll_one() for the
io_context.
This implies that if I have one thread running the run_one() command, then its control path will wait until a handler is available and eventually wind its way through a handler, whereupon it will return and check the ec value.
Is this correct and is "Handlers are invoked only by a thread that is currently calling any overload of run(), run_one(), run_for(), run_until(), poll() or poll_one() for the io_context." the best statement to find for understanding how the code will always function? Is there any other exposition?
The Asio library is gearing up to be standardized as NetworkingTS. This part is indeed the deal:
Handlers are invoked only by a thread that is currently calling any overload of run(), run_one(), run_for(), run_until(), poll() or poll_one() for the io_context
You are correct in concluding that the whole example is 100% single-threaded¹. There cannot be a race.
I personally feel the best resource is the Threads and Boost.Asio page:
By only calling io_context::run() from a single thread, the user's code can avoid the development complexity associated with synchronisation. For example, a library user can implement scalable servers that are single-threaded (from the user's point of view).
It also reiterates the truth from earlier:
[...] the following guarantee:
Asynchronous completion handlers will only be called from threads that are currently calling io_context::run().
¹ except potential internal service threads depending on platform/extensions, as the threads page details
1. io_service::run() is called by thread A. Is it safe to call async_write from thread B?
2. io_service::run() is called by thread A. Are async operations executed by thread A, or is thread A only guaranteed to call handlers, while behind the scenes there could be additional threads that execute the operations?
3. io_service::run() is called by thread A. Some thread calls async_read and async_write using the same buffer. Is it safe to assume that the buffer will be accessed by at most one operation at a time? Or is it so that only handlers are called serially, but behind the scenes reads and writes can occur simultaneously?
4. The documentation says "The program must ensure that the stream performs no other read operations (such as async_read, the stream's async_read_some function, or any other composed operations that perform reads) until this operation completes." Is it correct to interpret this as "You must not perform more than one read operation on a socket at a time, but you may perform 10 read operations on 10 distinct sockets"?
5. Having a socket that indefinitely accepts data, is it a good idea to call async_read and call it again from async_read's handler?
6. Does io_service::stop() stop all pending async operations, or does it simply stop accepting new ones and execute the pending ones?
Yes, provided the io_service is tied to whatever is calling async_write. However, it should be noted that it is safe to call async_write from thread B even if run() is not being called: the operation gets queued in the io_service and waits until one of the run()-ing calls picks it up.
The callbacks posted to the io_service will run on thread A. Other async operations (such as timer operations) can happen on other threads. What is guaranteed to run on A and what runs on its own thread is defined by the specific object being used, not by the io_service.
Nope. Yup-ish. Depends on the class calling io_service.
Yes.
Yes, in fact this is super common, as it both ensures that only 1 async_read call is running at a time for a given socket and that there is always "work" for the io_service.
It usually finishes the last callback, then stops accepting new ones and stops processing pending ones. Strictly speaking, it still accepts new ones, but requires reset() to be called before any further callbacks are invoked.
io_service is a message queue (basically), while a socket that posts its messages to the io_service is something else entirely.
1: Yes
4: Yes, it's okay to perform distinct operations on distinct sockets.
5: Yes, if you check the examples that's how they do it.
6: Considering the reference manual says
All invocations of its run() or run_one() member functions should return as soon as possible.
I would say it might do either.
For number 2 and 6, the source is available so the best way to answer those question is by downloading and reading it.
I am trying to implement async_connect() with a timeout.
async_connect_with_timeout(socket_type & s,
std::function<void(BoostAndCustomError const & error)> const & connect_handler,
time_type timeout);
When operation completes connect_handler(error) is called with error indicating operation result (including timeout).
I was hoping to use code from timeouts example 1.51. The biggest difference is that I am using multiple worker threads performing io_service.run().
What changes are necessary to keep the example code working?
My issues are:
When calling :
Start() {
    socket_.async_connect(HandleConnect);
    deadline_.async_wait(HandleTimeout);
}
HandleConnect() can complete in another thread even before async_wait() is issued (unlikely but possible). Do I have to strand-wrap Start(), HandleConnect(), and HandleTimeout()?
What if HandleConnect() is called first without error, but deadline_timer.cancel() or deadline_timer.expires_from_now() fails because HandleTimeout() has already "been queued for invocation in the near future"? The example code appears to let HandleTimeout() close the socket. Such behavior (the timer closing the connection after we have happily started operations following the connect) can easily lead to serious headaches.
What if HandleTimeout() and socket.close() are called first? Is it possible for HandleConnect() to already be "queued" without an error? The documentation says: "Any asynchronous send, receive or connect operations will be cancelled immediately, and will complete with the boost::asio::error::operation_aborted error". What does "immediately" mean in a multithreaded environment?
You should wrap each handler with a strand if you want to prevent their parallel execution in different threads. I guess some completion handlers would access socket_ or the timer, so you'll definitely have to wrap Start() with a strand as well. But wouldn't it be much simpler to use the io_service-per-CPU model, i.e. to base your application on an io_service pool? IMHO, you'll get much less headache.
Yes, it's possible. Why is it a headache? The socket gets closed because of a "false timeout", and you start re-connection (or whatever) procedure just as if it were closed due to a network failure.
Yes, it's also possible, but again, it shouldn't cause any problem for correctly designed program: if in HandleConnect you try to issue some operation on a closed socket, you'll get the appropriate error. Anyway, when you attempt to send/receive data you don't really know the current socket/network status.
I'm new in boost programming, and I've been looking for a reason to use the io_service::work, but I can't figure it out; in some of my tests I removed it and works fine.
The io_service::run() will run operations as long as there are asynchronous operations to perform. If, at any time, there are no asynchronous operations pending (or handlers being invoked), the run() call will return.
However, there are some designs that would prefer that the run() call not exit until all work is done AND the io_service has explicitly been instructed that it's okay to exit. That's what io_service::work is used for. By creating the work object (I usually do it on the heap and a shared_ptr), the io_service considers itself to always have something pending, and therefore the run() method will not return. Once I want the service to be able to exit (usually during shutdown), I will destroy the work object.
io_service::work is the base class of all work that can be posted to an instance of io_service. For example, when you are working with a socket and start an asynchronous read, you are actually adding work to the io_service. So you normally never use work directly, but there is one exception to this:
io_service::run will return as soon as there is no more work to do. Consider an application that has some producer and consumer threads: producers occasionally produce work and post it to the consumer threads with io_service::post, but if all work has finished, io_service::run will return and your consumer thread may stop. So you need an arbitrary work item to keep the io_service busy; in this case you may use io_service::work directly.
I have some code, roughly:
pthread_t thread_timeout;
pthread_create(&thread_timeout, NULL, handleTimeOut, NULL);

void* handleTimeOut(void*)
{
    /*...*/
    pthread_cancel(thread_timeout);  /* cancelling from within the same thread */
    /*...*/
}
But as I noticed in pthread's manual, cancellation is meant to be requested by another thread. I have tried to use the pthread_exit() function instead, but the thread hangs again...
How should the thread termination be handled correctly? Will the thread terminate successfully if the function handleTimeOut() simply returns, without calling any special pthread functions?
Killing a thread without its cooperation is a recipe for problems. The right solution is one that allows an external thread to request that the thread clean up and terminate; the thread periodically examines this state, and once termination has been requested, it follows through with the request. Such a request can be made through anything that all the threads share.
If a thread wants to finish, it can either call pthread_exit() or it can return from the initial thread function. These are equivalent.
I don't see any reason why a thread couldn't call pthread_cancel() on itself, but this would be highly unusual.