Boost::Asio - wake up a thread when there are handlers to run - c++

The common way to process Asio handlers is to have a thread (or several threads) either polling io_service (i.e. calling io_service::poll()) regularly to run the handlers or using io_service::run(), which blocks the thread until there's work to do, in which case the thread will run the required handlers and either return or go to sleep again.
However, I want to make a system where a thread is not only responsible for running Asio handlers, but also needs to sync up with another thread using a condition variable. Basically, I want the thread to do all of these:
Wake up when there are Asio handlers that need to be processed (i.e. if I call io_service::poll(), one or more handlers will be processed).
Wake up when there is non-Asio work to be done, indicated by my condition variable.
Sleep otherwise.
In other words, I need a way for Asio to signal me that there are handlers ready to execute, without having to busy-wait or continuously poll. Ideally, Asio will somehow signal a thread when work is available, and that thread will in turn wake up my main worker thread, which will process Asio handlers. That worker thread will also be occasionally woken up by yet another thread, and will process other, non-Asio related work.
Is this even feasible, or should I reconsider how I am designing my system?

Related

Behavior of boost::asio::io_service thread pool during uneven load

I am having a hard time finding out exactly how a thread pool built with boost::asio::io_service behaves.
The documentation says:
Multiple threads may call the run() function to set up a pool of
threads from which the io_service may execute handlers. All threads
that are waiting in the pool are equivalent and the io_service may
choose any one of them to invoke a handler.
I would imagine that when a thread executing run() takes a handler, it executes it and then goes back to waiting for the next handler. When executing a handler, a thread is not considered waiting, and hence no new handlers to execute are assigned to it. Is that correct? Or does io_service assign work to threads, without considering whether these are busy or not?
I am asking because one project that we use (OSRM) handles incoming HTTP requests with a boost::asio::io_service based thread pool, and I noticed that a long-running request sometimes blocks other, fast requests, even though more threads and cores are available.
When executing a handler, a thread is not considered waiting, and hence no new handlers to execute are assigned to it. Is that correct?
Yes. It's a pull model queue.
A notable "apparent" exception is when strands are used: handlers wrapped in a strand do synchronize with other handlers running on that same strand.
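To make the "pull model" concrete, here is a minimal sketch using only the standard library. It is not Asio's implementation; PullPool and short_tasks_done_during_long_task are made-up names for illustration. Threads take a task only while they are waiting on the queue, so a thread that is busy with a long task never gets the short ones assigned to it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: idle threads pull tasks from one shared queue.
class PullPool {
public:
    explicit PullPool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~PullPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // while this runs, the thread is not waiting and pulls nothing
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Counts the short tasks that finish while one long task is still running.
int short_tasks_done_during_long_task() {
    std::mutex m;
    std::condition_variable cv;
    bool long_started = false, release_long = false;
    int short_done = 0, seen = 0;
    {
        PullPool pool(2);
        pool.post([&] {  // the "long request": runs until released
            std::unique_lock<std::mutex> lock(m);
            long_started = true;
            cv.notify_all();
            cv.wait(lock, [&] { return release_long; });
        });
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return long_started; });
        }
        for (int i = 0; i < 3; ++i)
            pool.post([&] {
                std::lock_guard<std::mutex> lock(m);
                ++short_done;
                cv.notify_all();
            });
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return short_done == 3; });
            seen = short_done;
            release_long = true;  // only now let the long task finish
        }
        cv.notify_all();
    }  // the pool destructor joins both threads
    return seen;
}
```

With two threads, the three short tasks all complete on the free thread while the other one is still stuck in the long task. If fast requests stall in a real pool, it is worth checking whether every thread is busy or whether the handlers share a strand.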

What's the purpose of SignalObjectAndWait, given that SetEvent and WaitForSingleObject already exist?

I just realized there is SignalObjectAndWait API function for Windows platform. But there is already SetEvent and WaitForSingleObject. You can use them together to achieve the same goal as SignalObjectAndWait.
Based on the MSDN, SignalObjectAndWait is more efficient than separate calls to SetEvent and WaitForSingleObject. It also states:
A thread can use the SignalObjectAndWait function to ensure that a worker thread is in a wait state before signaling an object.
I don't fully understand this sentence, but it seems that efficiency is not the only reason why we need SignalObjectAndWait. Can anybody provide a scenario where SetEvent + WaitForSingleObject fails to provide the functionality that SignalObjectAndWait offers?
My understanding is that this single function is more efficient in that it avoids the following scenario.
The SignalObjectAndWait function provides a more efficient way to signal one object and then wait on another compared to separate function calls such as SetEvent followed by WaitForSingleObject.
When you call SetEvent and another (especially higher-priority) thread is waiting on this event, the thread scheduler may take control away from the signaling thread. When that thread gets control back, the only thing it does is make the following WaitForSingleObject call, so a context switch is wasted on such a tiny thing.
Using SignalObjectAndWait you hint the kernel by saying "hey, I will be waiting for another event anyway, so if it makes any difference for you, don't bounce back and forth with excessive context switches".
The purpose, as MSDN explains is to ensure that the thread is in a Wait State BEFORE the event is signalled. If you call WaitForSingleObject, the thread is in a waitstate, but you can't call that BEFORE calling SetEvent, since that will cause SetEvent to happen only AFTER the wait has finished - which is pointless if nothing else is calling SetEvent.
As you know, Microsoft gives the following example of why we might ever need SignalObjectAndWait if we already have separate SetEvent and WaitForSingleObject (quoting the Microsoft example):
A thread can use the SignalObjectAndWait function to ensure that a worker thread is in a wait state before signaling an object. For example, a thread and a worker thread may use handles to event objects to synchronize their work. The thread executes code such as the following:
    dwRet = WaitForSingleObject(hEventWorkerDone, INFINITE);
    if (WAIT_OBJECT_0 == dwRet)
        SetEvent(hEventMoreWorkToDo);
The worker thread executes code such as the following:
    dwRet = SignalObjectAndWait(hEventWorkerDone,
                                hEventMoreWorkToDo,
                                INFINITE,
                                FALSE);
This algorithm flow is flawed and should never be used. We do not need such a perplexing mechanism, where the threads keep notifying each other until they end up in a race condition. Microsoft itself creates the race condition in this example. The worker thread should just wait for an event and take tasks from a list, while the thread that generates tasks should just add tasks to this list and signal the event. So we need just one event, not two as in the Microsoft example above. The list has to be protected by a critical section. The thread that generates tasks should not wait for the worker thread to complete them. If some tasks need to notify somebody on completion, the tasks should send those notifications themselves. In other words, it is the task that notifies the thread on completion; the thread does not specifically wait for the job thread to finish processing all the tasks.
Such a flawed design, as in the Microsoft example, creates the need for monsters like the atomic SignalObjectAndWait and the atomic PulseEvent, functions that ultimately lead to doom.
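The single-event design described above (one "work available" signal, a protected list, and tasks that report their own completion) can be sketched roughly as follows. This is a portable analog, not Win32 code: std::mutex stands in for the critical section, std::condition_variable for the event, and std::promise for the completion notification. JobWorker and run_job_and_get_result are names made up for the illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single worker: waits on one signal, takes jobs from one list.
class JobWorker {
public:
    JobWorker() : thread_([this] { loop(); }) {}
    ~JobWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        wake_.notify_one();
        thread_.join();
    }
    // Producer side: add the job, signal, and return without waiting.
    void submit(std::function<void()> job) {
        assert(job != nullptr);
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                wake_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, nothing left
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // the job itself notifies whoever cares about the result
        }
    }
    std::mutex m_;                 // plays the role of the critical section
    std::condition_variable wake_; // plays the role of the single event
    std::deque<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread thread_;           // declared last so the members above exist first
};

// The job reports completion through a promise; the producer never waits
// on the worker thread as such, only on the result it is interested in.
int run_job_and_get_result() {
    std::promise<int> result;
    std::future<int> fut = result.get_future();
    JobWorker worker;
    worker.submit([&result] { result.set_value(6 * 7); });
    return fut.get();
}
```

Because the predicate is re-checked under the lock, a signal sent before the worker starts waiting is not lost, which is the problem the two-event handshake was trying to solve.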
Here is an algorithm that achieves the goal set in your question. It uses just plain and simple events and the simple functions SetEvent and WaitForSingleObject; no other functions are needed.
Create one common auto-reset event for all job threads to signal that there is a task (tasks) available; and also create per-thread auto-reset events, one event for each job thread.
Multiple job threads, once they have finished running all the jobs, all wait for this common auto-reset “task available” event using WaitForMultipleObjects, which waits on two events: the common event and the thread's own event.
The scheduler thread puts new (pending) jobs on the list.
Access to the jobs list has to be protected by EnterCriticalSection/LeaveCriticalSection, so that no one ever accesses this list any other way.
Each of the job threads, after completing one job, before starting to wait for the auto-reset “task available” event and its own event, checks the pending jobs list. If the list is not empty, it takes one job from the list (removing it from the list) and executes it.
There has to be another list protected by a critical section: the list of waiting job threads.
Before each job thread starts waiting, i.e. before it calls WaitForMultipleObjects, it adds itself to the “waiting” list. On exit from the wait, it removes itself from this waiting list.
When the scheduler thread puts new (pending) jobs on the jobs list, it first enters the critical section of the jobs list and then that of the threads list, so it holds two critical sections at the same time. The job threads, however, must never hold both critical sections at the same time.
If there is just one job pending, the scheduler sets the common auto-reset event to the signaled state (call SetEvent) -- it doesn’t matter which of the sleeping job threads will pick up the job.
If there are two or more jobs pending, it does not signal the common event, but counts how many threads are waiting. If there are at least as many threads waiting as there are jobs, it signals the own event of as many threads as there are jobs, and leaves the remaining threads sleeping.
If there are more jobs than waiting threads, it signals the own event of each waiting thread.
After the scheduler thread has signaled all the events, it leaves the critical sections - first of the thread list, and then of the jobs list.
After the scheduler thread has signaled all the events needed for the particular case, it goes to sleep itself, i.e. calls WaitForSingleObject with its own sleep event (that is also an auto-reset event that should be signaled whenever a new job appears).
Since the job threads will not go to sleep until the whole jobs list is depleted, the scheduler thread is not needed in the meantime. It is needed only later, when a new job appears, not when a job thread finishes a job.
Important: this scheme is based purely on auto-reset events. You won’t ever need to call ResetEvent. All the functions that are needed are: SetEvent and WaitForMultipleObjects (or WaitForSingleObject). No atomic event operation is needed.
Please note: when I wrote that a thread sleeps, it doesn't call "Sleep" API call - it will never be needed, it just is in the "wait" state as a result of calling WaitForMultipleObjects (or WaitForSingleObject).
As you know, auto-reset events and the SetEvent and WaitForMultipleObjects functions are very reliable. They have existed since NT 3.1. You can always architect program logic that relies solely on these simple functions, so you never need complex and unreliable functions that presume atomic operations, like PulseEvent or SignalObjectAndWait. By the way, SignalObjectAndWait only appeared in Windows NT 4.0, while SetEvent and WaitForMultipleObjects have existed since the initial version of Win32, NT 3.1.

platform independent inter thread communication

I have a process that receives multiple jobs; for each one it picks a thread from a thread pool and assigns the job to it, and that thread may in turn spawn another set of threads from its own thread pool. When a STOP request for a job comes to the main process, it should be forwarded to the thread handling that job, and all the threads associated with the job should clean themselves up and exit. My question is how to notify the worker threads about the "STOP".
A global variable can be used, and worker threads can poll it frequently, but there are a lot of functions that a worker can be doing, and adding checks everywhere could work.
Is there a clean approach, such as some kind of messaging layer? By the way, the code is C++.
The Boost.Thread library is a wrapper around pthreads that's also portable to Windows. The boost::thread class has an interrupt() method that'll interrupt the thread at the next interruption point.
Boost.Thread also has a thread_group class which provides a collection of related threads. thread_group also has an interrupt() method that invokes interrupt() on each thread in the thread group.
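If Boost.Thread is not an option, the same cooperative idea can be approximated with the standard library alone. The sketch below is not boost::thread::interrupt(). It swaps in a per-job atomic flag (hypothetical StopFlag) that every thread working on the job shares and checks between units of work, which avoids the single global variable from the question. All names here are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical per-job stop flag; all threads working on one job share it.
struct StopFlag {
    std::atomic<bool> requested{false};
    void request_stop() { requested.store(true); }
    bool stop_requested() const { return requested.load(); }
};

// Starts n workers for one job, asks them to stop, and returns how many
// of them reached their cleanup step before exiting.
int stop_job_with_workers(int n) {
    assert(n > 0);
    auto stop = std::make_shared<StopFlag>();
    std::atomic<int> cleaned_up{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([stop, &cleaned_up] {
            while (!stop->stop_requested()) {
                // One unit of work, then back to the check (the analog of
                // an interruption point).
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            ++cleaned_up;  // stands in for per-thread cleanup
        });
    }
    stop->request_stop();  // the forwarded STOP for this job
    for (auto& t : workers) t.join();
    return cleaned_up.load();
}
```

A worker that spawns its own sub-pool would pass the same shared flag down, so one request_stop() reaches every thread of that job and leaves other jobs alone.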

interrupting boost::asio::async_receive_from from another thread

I am reading multicast input using async_receive_from. So the idea is that when I detect a gap, I will notify another helper thread to request/get the gap filling messages. While this is in the works the main thread will continue to receive and queue any incoming messages. This part I can implement. The other thread can use WaitForSingleObject and I can pass it the details through shared memory and notify an event to wake it up.
But once it completes its task, how do I get the helper thread to interrupt the async_receive_from in the initiating thread? And when it comes out of the read, how does it know who interrupted it, so that it knows what to do next?
Why are you using shared memory between threads?
That aside, the mechanism you should use for executing something in the context of the io_service which is managing the socket is post(). You can post any arbitrary event to the io_service, and it will execute in that context. Quite easy really... Because you are calling async_receive_from, it's not blocking, i.e. the io_service can dispatch other events, which is why the post will work.

Is a signal sent with kill to a parent thread guaranteed to be processed before the next statement?

Okay, so if I'm running in a child thread on Linux (using pthreads if that matters), and I run the following command
kill(getpid(), someSignal);
it will send the given signal to the parent of the current thread.
My question: Is it guaranteed that the parent will then immediately get the CPU and process the signal (killing the app if it's a SIGKILL or doing whatever else if it's some other signal) before the statement following kill() is run? Or is it possible - even probable - that whatever command follows kill() will run before the signal is processed by the parent thread?
No, it's not guaranteed.
In general, you cannot make any assumptions about the timing of events happening in separate threads (or processes) unless you use an explicit synchronization mechanism (for example, a pthread_mutex or a semaphore).
This is especially true on multi-CPU (or multi-core) systems, where multiple threads can be running literally simultaneously on separate CPUs, but even on a single-CPU system there are no guarantees.
Signals get delivered asynchronously, so you can't expect the thread handling them to handle them immediately; moreover, it will have to do some work to handle it.
And if a sigprocmask() call had masked the signal in all threads, the signal will only be acted upon after it is unmasked.
Signals don't go to any particular thread unless you have used pthread_sigmask to block them in the threads you don't want to get them (the behavior of sigprocmask is unspecified in a multithreaded process). Most multithreaded programs do this, as having process-level signals delivered to arbitrary threads is usually not what you want.
Signals sent to a process (thread group) may generally get delivered to any thread, and you generally have no guarantee that the handler will have finished before the kill call returns.
If you run
kill(getpid(), someSignal);
in a multithreaded process, then you can only be sure that your signal handler will run before kill returns in one very specific scenario: all threads except the calling thread have someSignal blocked (in which case the signal handler will run in the thread calling kill).
See
http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html :
If the value of pid causes sig to be generated for the sending
process, and if sig is not blocked for the calling thread and if no
other thread has sig unblocked or is waiting in a sigwait() function
for sig, either sig or at least one pending unblocked signal shall be
delivered to the sending thread before kill() returns.
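The specific scenario from the quote can be set up like this. It is a sketch under assumptions: SIGUSR1 stands in for someSignal, the caller is the only thread with it unblocked, and no other code in the process has changed its disposition. The function and handler names are made up.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

volatile sig_atomic_t g_handled = 0;
extern "C" void on_sigusr1(int) { g_handled = 1; }

// Returns true if the handler had already run when kill() returned.
bool handler_ran_before_kill_returned() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, nullptr);
    assert(rc == 0);
    (void)rc;

    sigset_t usr1;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);

    // Block first: threads created from now on inherit the blocked mask.
    pthread_sigmask(SIG_BLOCK, &usr1, nullptr);
    std::atomic<bool> done{false};
    std::thread other([&done] {
        while (!done.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    });

    // Unblock only in the calling thread, so it is the sole eligible target.
    pthread_sigmask(SIG_UNBLOCK, &usr1, nullptr);
    g_handled = 0;
    kill(getpid(), SIGUSR1);
    bool ran = (g_handled == 1);  // checked right after kill() returns

    done.store(true);
    other.join();
    return ran;
}
```

Under the POSIX wording quoted above this should return true. If the other thread also had SIGUSR1 unblocked, the signal could be delivered there instead and nothing could be assumed about the timing.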