Does boost asio strand run all handlers on the same thread?

Does boost asio strand run all handlers on the same thread? - c++

The boost asio documentation talks about executors but I can't see if that implies the same thread.
The reason I'm curious about this is that the purpose of a strand seems to be to allow the developer not to have to worry about multithreading issues. If that is the case then I see two options for a strand, assuming more than one thread in the io_service/context:
Run all handlers on the same thread for the lifetime of the strand
Use different threads but
Use some mechanism to make sure one handler runs after the other
If run on a different thread make sure you use a memory fence so the second handler sees all updates from the previous handler
(I strongly suspect it is impossible to do 2.1 without doing 2.2)
The problem with 2. is that it hits performance because it requires fences, but I don't see anywhere that explicitly says a strand always uses the same thread.

No.
All that is guaranteed is that
all handlers are invoked on a thread that is invoking run[_one] or poll_[one] on the execution context. This is is true for handlers on any executor
strand executors add the guarantee that handlers are only invoked sequentially (non-concurrently) and in the order they were posted (see e.g. https://www.boost.org/doc/libs/1_80_0/doc/html/boost_asio/reference/io_context__strand.html#boost_asio.reference.io_context__strand.order_of_handler_invocation)
So yes, depending on situation there may be overhead. However, in some situations there may be optimizations (e.g. dispatch from the local strand, or running on a context with a concurrency hint).

Related

Cancelling arbitary jobs running in a thread_pool

Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be sued to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions using boost::this_thread::interrupt_point, but can wrap them in lambdas/other constructs if that helps. I feel like this is a rock and hard place situation, so alternate suggestions are welcome, but they need to be minimally intrusive in existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that existing 2 answers are based o limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints" since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some which download from "internet" already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be lambda binding a functor, in which case the function needs to makes sure it's data structures aren't corrupted (which is the case as usually, the state is a 1 or 2 atomic<inbuilt-type>). Usually the internal state is queried from an external db (similar architecture like cookie + web-server, and the tab/browser can be closed anytime)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.

Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better
yet, is there a safe alternative for on-demand cancelling opaque
function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (that is is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to
perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.

Cancelling a non cooperative, not designed to be cancelled component is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the ressources owned by the components should be managed externally (the system knows which component uses what resources)
all accesses should be indirect
the modifications of shared ressources should be safe and reversible until completion
That would allow the system to clean up resource, stop operations, cancel incomplete changes...
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifications of shared objects; after cancellation, such modifications that would be partial, non effective, incoherent if stopped in the middle of an operation.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non cooperative threads is not doable even with very large scale changes to thread synchronization objects. Not even by a complete redesign of the thread primitives.
You would have to make thread almost like full processes to be able to do that; but it wouldn't be called a thread then!

io_context concurrency hint (BOOST_ASIO_CONCURRENCY_HINT_UNSAFE_IO)

While checkig the documentation of boost::asio in verion 1.66.0, I noticed that the io_context constructor provides a concurrency_hint parameter. After reading the documentation, I'm unsure if I can use BOOST_ASIO_CONCURRENCY_HINT_UNSAFE_IO.
I have following situation:
I have a single io_context to do the IO. ioc.run() is executed from a single thread.
In this thread, some network IO using async calls are executed.
Other threads call boost::asio::dispatch(ioc, ...) to execute code on the IO thread.
I'm trying to figure out what concurrency hint value is safe to use in the situation as described above:
Using no concurrency hint is ok (eg. BOOST_ASIO_CONCURRENCY_HINT_SAFE), but slower than with hints.
Using 1 is ok.
Using BOOST_ASIO_CONCURRENCY_HINT_UNSAFE is not ok because it doesn't allow async calls.
What is unclear to me is BOOST_ASIO_CONCURRENCY_HINT_UNSAFE_IO. Documentation says:
This special concurrency hint disables locking in the reactor I/O. This hint has the following restrictions:
— Care must be taken to ensure that run functions on the io_context, and all operations on the context's associated I/O objects (such as sockets and timers), occur in only one thread at a time.
I wonder if it's safe do a boost::asio::dispatch from another thread when using this concurrency hint.

Since boost::asio::dispatch¹ ends up calling io_context::dispatch I would conclude that it's not ok to use BOOST_ASIO_CONCURRENCY_HINT_UNSAFE_IO if you call it from another thread:
— Care must be taken to ensure that run functions on the io_context, and all operations on the context's associated I/O objects (such as sockets and timers), occur in only one thread at a time.
¹ same for post/defer

Boost ASIO, SSL: How do strands help the implementation?

TLDR: Strands serialise resources shared across completion handlers: how does that prevent the ssl::stream implementation from concurrent access of the SSL context (used internally) for concurrent read/write requests (stream::ssl is not full duplex)? Remember, strands only serialise the completion handler invocation or the original queueing of the read/write requests. [Thanks to sehe for helping me express this better]
I've spent most of a day reading about ASIO, SSL and strands; mostly on stackoverflow (which has some VERY detailed and well expressed explanations, e.g. Why do I need strand per connection when using boost::asio?), and the Boost documentation; but one point remains unclear.
Obviously strands can serialise invocation of callbacks within the same strand, and so also serialise access to resources shared by those strands.
But it seems to me that the problem with boost::asio::ssl::stream isn't in the completion handler callbacks because it's not the callbacks that are operating concurrently on the SSL context, but the ssl::stream implementation that is.
I can't be confident that use of strands in calling async_read_some and async_write_some, or that use of strands for the completion handler, will prevent the io engine from operating on the SSL context at the same time in different threads.
Clearly strand use while calling async_read_some or async_write_some will mean that the read and write can't be queued at the same instant, but I don't see how that prevents the internal implementation from performing the read and write operations at the same time on different threads if the encapsulated tcp::socket becomes ready for read and write at the same time.
Comments at the end of the last answer to this question boost asio - SSL async_read and async_write from one thread claim that concurrent writes to ssl::stream could segfault rather than merely interleave, suggesting that the implementation is not taking the necessary locks to guard against concurrent access.
Unless the actual delayed socket write is bound to the thread/strand that queued it (which I can't see being true, or it would undermine the usefulness of worker threads), how can I be confident that it is possible to queue a read and a write on the same ssl::stream, or what that way could be?
Perhaps the async_write_some processes all of the data with the SSL context immediately, to produce encrypted data, and then becomes a plain socket write, and so then can't conflict with a read completion handler on the same strand, but it doesn't mean that it can't conflict with the internal implementations socket-read-and-decrypt before the completion handler gets queued on the strand. Never mind transparent SSL session re-negotiation that might happen...
I note from: Why do I need strand per connection when using boost::asio? "Composed operations are unique in that intermediate calls to the stream are invoked within the handler's strand, if one is present, instead of the strand in which the composed operation is initiated." but I'm not sure if what I am refering to are "intermediate calls to the stream". Does it mean: "any subsequent processing within that stream implementation"? I suspect not
And finally, for why-oh-why, why doesn't the ssl::stream implementation use a futex or other lock that is cheap when there is no conflict? If the strand rules (implicit or explicit) were followed, then the cost would be almost non-existent, but it would provide safety otherwise. I ask because I've just transitioned the propaganda of Sutter, Stroustrup and the rest, that C++ makes everything better and safer, to ssl::stream where it seems easy to follow certain spells but almost impossible to know if your code is actually safe.

The answer is that the boost ssl::stream implementation uses strands internally for SSL operations.
For example, the async_read_some() function creates an instance of openssl_operation and then calls strand_.post(boost::bind(&openssl_operation::start, op)).
[http://www.boost.org/doc/libs/1_57_0/boost/asio/ssl/old/detail/openssl_stream_service.hpp]
It seems reasonable to assume that all necessary internal ssl operations are performed on this internal strand, thus serialising access to the SSL context.

Q. but I'm not sure if what I am refering to are "intermediate calls to the stream". Does it mean: "any subsequent processing within that stream implementation"? I suspect not
The docs spell it out:
This operation is implemented in terms of zero or more calls to the stream's async_read_some function, and is known as a composed operation. The program must ensure that the stream performs no other read operations (such as async_read, the stream's async_read_some function, or any other composed operations that perform reads) until this operation completes. doc
And finally, for why-oh-why, why doesn't the ssl::stream implementation use a futex or other lock that is cheap when there is no conflict?
You can't hold a futex across async operations because any thread may execute completion handlers. So, you'd still need the strand here, making the futex redundant.
Comments at the end of the last answer to this question boost asio - SSL async_read and async_write from one thread claim that concurrent writes to ssl::stream could segfault rather than merely interleave, suggesting that the implementation is not taking the necessary locks to guard against concurrent access.
See previous entry. Don't forget about multiple service threads. Data races are Undefined Behaviour
TL;DR
Long story short: async programming is different. It is different for good reasons. You will have to adapt your thinking to it though.
Strands help the implementation by abstracting sequential execution over the async scheduler.
This makes it so that you don't have to know what the scheduling is, how many service threads are running etc.

Boost ASIO - What is async

I've been doing a lot of reading, but I just cannot wrap my head around the difference between synchronous and asynchronous calls in Boost ASIO: what they are, how they work, and why to pick one over the other.
My model is a server which accepts connections and appends the new connection to a list. A different thread loops over the list and sends each registered connection data as it becomes available. Each write operation should be safe. It should have a timeout so that it cannot hang, it should not allocate arbitrarily large amounts of memory, or in general cause the main application to crash.
Confusion:
How does accept_async differ from regular accept? Is a new thread allocated for each connection accepted? From examples I've seen it looks like after a connection is accepted, a request handler is called. This request handler must tell the acceptor to prepare to accept again. Nothing about this seems asynchronous. If the requset handler hangs then the acceptor blocks.
In the boost mailing list the OP was told to use async_write with a timer instead of regular write. In this configureation I don't see any asynchronous behaviour or why they would be recommended. From the Boost docs async_write seems more dangerous than write because the user must not call async_write again before the first one completes.

Asynchronous calls return immediately.
That's the important bit.
Now how do you control "the next thing" that happens when the asynchronous operation has completed? You got it, you supply the completion handler.
The strength of asynchrony is so you can have an IO operation (or similar) run "in the background" without necessarily incurring any thread switch or synchronization overhead. This way you can handle many asynchronous control flows at the same time, on a single thread.
Indeed asynchronous operations can be more complicated and require more thought (e.g. about lifetime of references used in the completion handler). However, when you need it, you need it.

Boost.Asio basic overview from the official site explains it well:
http://www.boost.org/doc/libs/1_61_0/doc/html/boost_asio/overview/core/basics.html
The io_service object is what handles the multiple operations.
Calls to io_service.run() should be made carefully (that could explain the "dangerous async_write")

boost strand vs single thread

Since the strand will not be executed concurrently, what is the difference in performance between strand and single thread? Moreover, a lock is not necessary to protect the share data in the handler of post function, right?
suppose an application performance several jobs, below is some sample code.
strand.post(boost::bind(&onJob, this, job1));
void onJob(tJobType oType)
{
if (oType == job1)
// do something
else if(oType == job2)
// do something
}
Edit: I try to measure the latency from post and calling onJob is quite high. I would like to know if there is any way to reduce it

A strand will typically perform better than a single thread. This is because a strand gives the scheduler and the program logic more flexibility. However, the differences are typically not significant (except in the special case I discuss below).
For example, consider the case where something happens that requires service. With a strand, there can be more than one thread that could perform the service, and whichever of those threads gets scheduled first will do the job. With a thread, that very thread must get scheduled for the job to start.
Suppose, for example, a timer fires that creates some new work to be done by the strand. If the timer thread then calls into the strand's dispatch routine, the timer thread can do the work with no context switch. If you had a dedicated thread rather than a strand, then the timer thread could not do the work, and a context switch would be needed before the work created by the timer routine could even begin.
Note that if you just have one thread that executes the strand, you don't get these benefits. (But, IMO, that's a dumb way to do things if you care about performance at this fine a level.)
For some applications, carefully breaking your program into strands can significantly reduce the amount of lock operations required. Objects that are only accessed in a single strand need not be locked. But you can still get a lot of the advantages of multi-threading. (One big disadvantage though -- if any of your code ever blocks, it will stall the entire strand. So you either have to not mind if a strand stalls or make sure none of your code for a critical strand ever blocks.)
In this case, you can have three strands, A, B, and C, and a single thread can do some work for strand A, some for strand B, and some for strand C with no context switches (and with the data hot in the cache). Using a thread for each task would require two context switches to do the same job, and each task would likely not find the data in cache. If you constantly "hand things" from strand to strand, strands can significantly outperform dedicated threads.
As to your second question, a lock is not needed unless data is being accessed in one thread while it could possibly be being modified in another thread. If all accesses to an object are through a single strand, locks are not needed because a strand can only execute in one thread at a time. Typically, strands will access some data that is only accessed by that strand and some that is shared with other threads or strands.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js