Asynchronous Completion Handling

I have this situation:
void foo::bar()
{
RequestsManager->SendRequest(someRequest, this, &foo::someCallback);
}
where RequestsManager works asynchronously:
SendRequest puts the request in a queue and returns to the caller
Another thread takes the requests from the queue and processes them
When a request has been processed, the callback is called
Is it possible to have foo::someCallback called in the same thread as SendRequest? If not, how can I avoid the following "callback limitation": callbacks should not perform time-consuming operations, to avoid blocking the requests manager.

No. Calls and callbacks cannot change thread context; you have to issue some signal to communicate between threads.
Typically, 'someCallback' would either signal an event on which the thread that originated the 'SendRequest' call is waiting (synchronous call), or push the SendRequest (and so, presumably, the results from its processing) onto a queue that the originating thread will eventually pop (asynchronous). It just depends on how the originator wishes to be signaled.
Async example: the callback might PostMessage/Dispatcher.BeginInvoke the completed SendRequest to a GUI thread for display of the results.
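The asynchronous variant can be sketched in standard C++ as follows. This is only an illustration under assumptions: CompletionQueue and round_trip are hypothetical names, the "request manager" is reduced to a single worker thread, and the result is a plain string. The point it shows is that the callback merely enqueues the result, while the thread that issued the request pops it and does any heavy work itself.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical thread-safe queue of completed results.
class CompletionQueue {
public:
    // Called from the request manager's thread. It only enqueues and
    // notifies, so the manager is never blocked for long.
    void push(std::string result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            results_.push(std::move(result));
        }
        ready_.notify_one();
    }

    // Called from the thread that issued the request; blocks until a
    // result is available.
    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !results_.empty(); });
        std::string result = std::move(results_.front());
        results_.pop();
        return result;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> results_;
};

// Simulates one round trip: the worker "processes" the request and its
// callback body is just a push; the caller pops on its own thread.
std::string round_trip(const std::string& request) {
    CompletionQueue completed;
    std::thread worker([&completed, request] {
        completed.push("done: " + request);  // the callback body
    });
    std::string result = completed.pop();    // time-consuming work would follow here
    worker.join();
    return result;
}
```

In a GUI program the pop would normally be replaced by the toolkit's own message posting, as the answer notes.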

I can see a few ways to achieve it:
A) Implement a strategy similar to signal handling
When request processing is over, RequestManager puts the callback invocation on a waiting list. The next time SendRequest is called, right before returning it checks whether there are any pending callbacks for the calling thread and executes them. This is a relatively simple approach with minimal requirements on the client. Choose it if latency is not a concern. RequestManager can also expose an API to explicitly check for pending callbacks.
B) Suspend the callback-target thread and execute the callback in a third thread
This gives you a truly asynchronous solution, with all its caveats. It will look as if the target thread's execution was interrupted and jumped into an interrupt handler. Before the callback returns, the target thread needs to be resumed. You won't be able to access thread-local storage or the original thread's stack from inside the callback.

It depends on the definition of "time-consuming operations".
The classic way to do this is:
when the request is processed, the RequestManager should execute that &foo::someCallback
to avoid blocking the request manager, you may just raise a flag inside this callback
check that flag periodically inside the thread that called RequestsManager->SendRequest
This flag can simply be a std::atomic<bool> member of class foo (a plain volatile bool does not guarantee safe access from two threads in standard C++).
If you want to make sure that the calling thread (foo's) learns immediately that the request has been processed, you need additional synchronization.
Implement (or use an existing) blocking pipe, or use signals/events, between these threads. The idea is:
foo's thread executes SendRequest
foo starts sleeping on some select (for example)
RequestManager executes the request and:
calls &foo::someCallback
"wakes" foo's thread (by writing something to the file descriptor that foo sleeps on using select)
foo wakes up
checks the flag to see whether the request has been processed
does what it needs to do
clears the flag
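The flag-plus-wake-up steps above might look like this in portable C++. It is a sketch under assumptions, not the answerer's code: a std::condition_variable stands in for the pipe and select, Foo and run_once are hypothetical names, and the flag is a std::atomic<bool> because volatile alone does not synchronize threads in standard C++.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Foo {
    std::atomic<bool> done{false};  // the flag raised by the callback
    std::mutex mutex;
    std::condition_variable wake;   // stands in for the pipe/select wake-up
    int handled = 0;                // touched only by foo's own thread

    // Runs on the request manager's thread: raise the flag, wake foo.
    void someCallback() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            done.store(true);
        }
        wake.notify_one();
    }

    // Runs on foo's thread: sleep until the flag is set, handle the
    // completion, then clear the flag for the next request.
    void waitAndHandle() {
        std::unique_lock<std::mutex> lock(mutex);
        wake.wait(lock, [this] { return done.load(); });
        ++handled;          // "does what it needs to do"
        done.store(false);  // clears the flag
    }
};

// One request/completion cycle with a stand-in manager thread.
int run_once() {
    Foo foo;
    std::thread manager([&foo] { foo.someCallback(); });
    foo.waitAndHandle();
    manager.join();
    return foo.handled;
}
```

Because the flag is atomic, foo's thread could also poll it without taking the mutex if it prefers periodic checks over sleeping.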

Related

Thread safety of curl_multi_remove_handle

It seems like some sources recommend using curl_multi_remove_handle to "invalidate" a curl handle and cause curl_multi_wait to return early. This seems not to be covered under the thread safety guarantee (if done from another thread), or am I wrong (the threads safety guarantees are basically just reentrancy guarantees)?
What is the recommended way signal curl_multi_wait to return early? Is it really required to do it via timeouts? (Under Linux, I would use an eventfd in the epoll set to effectively have the case "wait on these sockets OR this event fd OR the given timeout".) It seems I could use custom curl_waitfd structures, but this would require platform specific setup for dummy sockets.

You must not call curl_multi_remove_handle from thread B if curl_multi_wait for that handle is running in thread A. That will just cause tears and misery.
You can opt to, for example:
use sufficiently short timeouts for curl_multi_wait() so that you don't need to abort it
add a private socket/file descriptor to send data on to abort when you want to
return error from the progress callback (or another callback) for the transfer(s) you need to stop - by setting a flag that they all check (global, or global like)
rework your app logic so that you can consider the transfer to be "dead" without it having stopped yet, and let libcurl run its course and close it later, so you don't have to care much about it finishing a bit after you decided you can ignore it.
curl_multi_poll()
After I first wrote this answer, we introduced curl_multi_poll in libcurl. This function is very similar to curl_multi_wait but also allows it to pre-emptively return with the use of curl_multi_wakeup, thus offering applications a few more alternative approaches.

Unfortunately, curl_multi is not what people these days would deem "thread safe". Yes, you can use a CURLM handle in two different threads, as long as they don't access it at the same time. But hey, this is true for almost any data structure in C or C++.
So, if you have one thread running an event loop with curl_multi_wait(), you cannot use a second thread to add new jobs via curl_multi_add_handle() or remove jobs via curl_multi_remove_handle(). Well, it will work most of the time, but especially under high load you will start getting data corruption and segfaults due to the concurrent access to libcurl's internal data structures.
There are two ways around this problem, but both require a bit of coding:
Use the newer curl_multi_poll() interface, which (unlike curl_multi_wait()) is externally interruptible via curl_multi_wakeup(). Yes, curl_multi_wakeup() is the ONLY function on CURLM handles that is safe to call concurrently from another thread (or even multiple threads). To add new requests to the event loop or remove requests from it, you need a request queue and a mutex that protects access to that queue. Then, to add a new job, you would do:
(thread 1 is running curl_multi_poll() in an endless loop)
thread 2 acquires said mutex
thread 2 posts an "add easy handle request" into the request queue
thread 2 releases said mutex again
thread 2 calls curl_multi_wakeup()
thread 1 acquires the mutex after curl_multi_poll() returns
thread 1 then processes the "add easy handle request" in the job list and performs curl_multi_add_handle()
thread 1 then releases the mutex again
thread 1 does all other necessary work (in particular call curl_multi_perform() and pass finished transfers to the application etc.)
thread 1 calls curl_multi_poll() again
To remove a job, you would use the same procedure: just let thread 2 post a "remove easy handle request" instead of an "add easy handle request" to the request queue, and then let thread 1 call curl_multi_remove_handle() instead of curl_multi_add_handle().
In this solution, ALL calls to the CURLM handle are performed from thread 1, with the sole exception of curl_multi_wakeup(), which is used by other threads to signal thread 1 of new work waiting in the request queue.
Or use the curl_multi_socket_action() interface, where you have to provide two callbacks to libcurl, with which it reports file descriptors to watch and a timeout to your application. You then have to call epoll() or a similar OS function yourself to wait for activity (or timeout) in the event loop thread. Then add a mutex again to serialize access to the CURLM handle: your event loop thread should lock that mutex just before it calls curl_multi_socket_action() (or any other function on the CURLM handle) and unlock it immediately after. As curl_multi_socket_action() (unlike curl_multi_poll()) does not sleep, that mutex will be locked only for brief intervals. So other threads can easily lock that mutex themselves, too, and call curl_multi_add_handle() or curl_multi_remove_handle() as needed. Be aware, though, that those intervening additions or removals of handles can modify the active FD set, and that you may need some synchronisation with the event loop thread to notify it of the modified epoll() set.
The first solution is likely easier to implement. You should be able to find libcurl wrappers for both variants on GitHub, but be sure to test them thoroughly before using them in any critical application.
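The first approach (request queue, mutex, wake-up) can be modeled without libcurl to show the threading discipline. This sketch makes no libcurl calls at all: Command, LoopModel and demo are hypothetical names, handles are plain ints instead of CURL* easy handles, and a condition variable stands in for the curl_multi_poll() / curl_multi_wakeup() pair. The comments mark where the real calls would go.

```cpp
#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A queued request for the loop thread.
struct Command {
    enum Kind { Add, Remove, Stop } kind;
    int handle;
};

class LoopModel {
public:
    // Thread 2: lock, enqueue, unlock, then wake the loop.
    void post(Command cmd) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(cmd);
        }
        wakeup_.notify_one();  // stands in for curl_multi_wakeup()
    }

    // Thread 1: the only code that touches active_, which stands in
    // for the CURLM handle.
    void run() {
        for (;;) {
            std::queue<Command> batch;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Stands in for curl_multi_poll() returning after a wake-up.
                wakeup_.wait(lock, [this] { return !pending_.empty(); });
                std::swap(batch, pending_);
            }
            for (; !batch.empty(); batch.pop()) {
                const Command& cmd = batch.front();
                if (cmd.kind == Command::Stop) return;
                if (cmd.kind == Command::Add) {
                    active_.push_back(cmd.handle);  // curl_multi_add_handle() here
                } else {
                    active_.erase(std::remove(active_.begin(), active_.end(), cmd.handle),
                                  active_.end());   // curl_multi_remove_handle() here
                }
            }
            // curl_multi_perform() and completion handling would go here.
        }
    }

    // Safe to read only after run() has returned.
    const std::vector<int>& active() const { return active_; }

private:
    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<Command> pending_;
    std::vector<int> active_;
};

std::vector<int> demo() {
    LoopModel loop;
    std::thread thread1([&loop] { loop.run(); });
    loop.post({Command::Add, 1});
    loop.post({Command::Add, 2});
    loop.post({Command::Remove, 1});
    loop.post({Command::Stop, 0});
    thread1.join();
    return loop.active();
}
```

The mutex guards only the command queue, so it is held briefly and never while the loop thread is sleeping.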

What does inside a strand mean?

I'm currently trying to get my hands on boost::asio strands. Doing so, I keep reading about "invoking strand post/dispatch inside or outside a strand". Somehow I can't figure out how inside a strand differs from through a strand, and therefore can't grasp the concept of invoking a strand function outside the strand at all.
Probably there is just a small piece missing in my puzzle. Can somebody please give an example how calls to a strand can be inside or outside it?
What I think I've understood so far is that posting something through a strand would be
m_strand.post(myfunctor);
or
m_strand.wrap(myfunctor);
io_svc.post(myfunctor);
Is the latter considered a call to dispatch outside the strand (as opposed to the other being a call to post inside it)? Is there some relation between the strand's "inside realm" and the threads the strand operates on?
If being inside a strand simply meant to invoke a strand's function, then the strand class's documentation would be pointless. It states that strand::post can be invoked outside the strand... That's precisely the part I don't understand.

I also had some trouble understanding this concept, but it became clear once I started working with libdispatch. That helped me map things to asio better.
Now let's see how to make some sense of a strand. Think of a strand as a serial queue of handlers that need to be executed.
Now, where do these handlers get executed? Within the worker threads.
Where do these worker threads come from? From the io_service object you passed when creating the strand.
Something like:
asio::strand s(io_serv_obj);
Now, as you probably know, io_service::run can be called by a single thread or by multiple threads. The threads calling the run method of io_serv_obj are the worker threads for that strand in our case. So it could be either single-threaded or multithreaded.
Coming back to strands, when you post a handler, that handler is always enqueued in the serial queue which we talked about. The worker threads will pick up the handler from the queue one after the other.
Now, when you do a dispatch, asio does some optimization for you:
It checks whether you are calling it from inside one of the worker threads or from some other thread (maybe one belonging to some other io_service instance). Calling it outside the current execution context of the strand is what is meant by calling it outside the strand. In that case, dispatch will just enqueue the handler, like post, when there are other handlers waiting in the queue, or will call it directly when it can guarantee that it will not run concurrently with any other handler from that queue that may be running in one of the worker threads at that moment.
UPDATE:
As noted in the comments section, inside means called within another handler. For example: I posted a handler A, and inside that handler I dispatch another handler. Now, as explained in the next paragraph, if there are no other handlers waiting in the strand's serial queue, the dispatched handler will be called synchronously. If this condition is not met, the dispatch is treated as being called from outside.
Now, if you call dispatch from outside of the strand, i.e. not within the current execution context, asio checks its call stack to see whether any other handler from its serial queue is running. If not, it calls the handler directly and synchronously. So there is no cost of enqueueing the handler (I think no extra allocation is done either, but I'm not sure).
Let's look at the documentation now:
s.dispatch(a) happens-before s.post(b), where the former is performed
outside the strand
This means that if dispatch was called from outside the current run OR there are other handlers already enqueued, then it needs to enqueue the handler; it cannot call it synchronously. Since it's a serial queue, a will get executed before b.
Had there been another call s.dispatch(c) enqueued before a and b (in the mentioned order), then c would get executed before a and b, but in no way could b get executed before a.
Hope this clears your doubt.

For a given strand object s, running outside s implies that s.running_in_this_thread() returns false. This returns true if the calling thread is executing a handler that was submitted to the strand via post(), dispatch(), or wrap(). Otherwise, it returns false:
io_service.post(handler); // handler will run outside of strand
strand.post(handler); // handler will run inside of strand
strand.dispatch(handler); // handler will run inside of strand
io_service.post(strand.wrap(handler)); // handler will run inside of strand
Given:
a strand object s
a function object f1 that is added to strand s via s.post(), or s.dispatch() when s.running_in_this_thread() == false
a function object f2 that is added to strand s via s.post(), or s.dispatch() when s.running_in_this_thread() == false
then the strand provides a guarantee of ordering and non-concurrency, such that f1 and f2 will not be invoked concurrently. Furthermore, if the addition of f1 happens before the addition of f2, then f1 will be invoked before f2.
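To make "inside" and "outside" concrete, here is a toy, single-threaded model of a strand. It is not Asio's implementation and uses no Boost headers: ToyStrand and trace are hypothetical names, run() stands in for the worker threads calling io_service::run(), and the model leaves out the case where dispatch from outside runs a handler immediately on an idle strand.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

class ToyStrand {
public:
    // True only while one of this strand's handlers is executing.
    bool running_in_this_thread() const { return inside_; }

    // post(): always enqueue, never run inline.
    void post(std::function<void()> h) { queue_.push_back(std::move(h)); }

    // dispatch(): run inline when called from inside the strand,
    // otherwise enqueue like post().
    void dispatch(std::function<void()> h) {
        if (inside_) h();
        else queue_.push_back(std::move(h));
    }

    // Drains the queue one handler at a time, in FIFO order.
    void run() {
        while (!queue_.empty()) {
            std::function<void()> h = std::move(queue_.front());
            queue_.pop_front();
            inside_ = true;
            h();
            inside_ = false;
        }
    }

private:
    std::deque<std::function<void()>> queue_;
    bool inside_ = false;
};

// Records the order in which handlers run.
std::string trace() {
    ToyStrand s;
    std::string log;
    s.post([&] {
        log += "A";
        s.post([&] { log += "p"; });      // inside the strand: queued behind B
        s.dispatch([&] { log += "d"; });  // inside the strand: runs immediately
    });
    s.dispatch([&] { log += "B"; });      // outside the strand: queued after A
    s.run();
    return log;
}
```

In this model trace() yields "AdBp": the dispatch made inside handler A runs at once, while everything submitted from outside, or via post, keeps its FIFO order.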

Does SleepEx guarantee that all pending completion callbacks get called before timeout?

I have a C++ program that uses overlapped IO for network communication. The main thread has a loop that calls SleepEx(5, true);. There are also two TCP sockets. I assume that the completion callbacks are called during the alertable wait. Assume also that by the time SleepEx gets called both of my TCP connections have received some data. Now the question is what happens if the first completion callback takes longer than 5ms? Does the SleepEx return after calling the first callback or does it also call the second callback? In other words does the SleepEx guarantee to call ALL of the scheduled completion callbacks? This is not clear because the documentation says it will return when at least one of the events meet...

Your code must not assume that both APCs will be called before SleepEx() returns. Conversely, it must not assume that a pending APC will not be called simply because the specified wait period has expired.
The only behaviour that you can rely upon is that if one or more APCs are pending, at least one will be executed.
Generally speaking, best practice is to wait for APCs in a loop that does nothing else, using an infinite timeout in the wait. If you need to do something periodically, you can use a waitable timer to generate an APC periodically.
Alternatively, you can use WaitForSingleObjectEx() or WaitForMultipleObjectsEx() to detect when a waitable timer or other synchronization object is triggered, while still handling APCs.
However, if you must perform some periodic action that cannot be handled in an APC or be triggered by a synchronization object, you can use nested loops: the inner loop does nothing but call the wait repeatedly (with a timeout period reduced by however long the loop has already been running) and the outer loop performs the periodic action.
If you must perform some periodic action that cannot be delayed by pending APCs, you will need to do it in a separate thread. Note that because Windows is not a real-time OS, you will still not be able to guarantee that any given action will take place within any particular timeframe, although you can reduce the risk by increasing the thread priority.
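The nested-loop approach could be sketched like this. It is a sketch under assumptions: remaining_ms, alertable_wait_until and run_periodic are hypothetical helper names, the stop flag and the action callback are illustrative, and the Windows-specific part is guarded so that the timing helper also compiles elsewhere. SleepEx is the only Windows API call used.

```cpp
#include <atomic>
#include <chrono>
#ifdef _WIN32
#include <windows.h>
#endif

using Clock = std::chrono::steady_clock;

// Milliseconds left until `deadline`, clamped at zero. Truncates toward
// zero, so the wait can end up to one millisecond early.
unsigned long remaining_ms(Clock::time_point deadline, Clock::time_point now) {
    if (now >= deadline) return 0;
    return static_cast<unsigned long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now).count());
}

#ifdef _WIN32
// Inner loop: keep waiting alertably until the period is over, however
// many times SleepEx returns early because APCs were executed.
void alertable_wait_until(Clock::time_point deadline) {
    for (;;) {
        const unsigned long ms = remaining_ms(deadline, Clock::now());
        if (ms == 0) return;
        SleepEx(ms, TRUE);  // the timeout shrinks on every pass
    }
}

// Outer loop: perform the periodic action, then wait out the rest of
// the period while still servicing APCs.
void run_periodic(std::chrono::milliseconds period, void (*action)(),
                  const std::atomic<bool>& stop) {
    while (!stop.load()) {
        const Clock::time_point deadline = Clock::now() + period;
        action();
        alertable_wait_until(deadline);
    }
}
#endif
```

Computing the deadline once per period, rather than passing a fixed timeout each time, keeps a burst of completion callbacks from stretching the interval between periodic actions.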

Is locking necessary when using moveToThread

I searched this site and the Qt documentation, but could not find any direct answer to the following question:
Let's say I have a worker class with only one slot:
void Worker::testSlot(){
//access data and do some calculation
}
Now if this slot is connected to signals from other classes running on other threads, and a queued connection is used, is it necessary to use a lock (QMutexLocker) before accessing data in the worker? I think it is not needed, since testSlot() is always executed in one thread (the thread to which the worker was moved), and thus it is synchronized. Even if two signals were emitted from different threads at the same time, there is no way to suspend the slot half-way through the first signal and start it for the second. But I am not sure about this.

You're 100% correct.
The key bit of information is that emission of a signal connected to an object in a different thread via a queued or automatic connection results in posting a QMetaCallEvent to the target object. It doesn't directly result in any calls at all.
The event loop running in the thread where the target object resides has to deliver the event to the object. You can verify that by properly overriding the event method and outputting a debug message when the event has the MetaCall type. Remember to call the base class's method in your reimplementation. Since the event loop runs synchronously, it executes the calls serially. Thus no additional means of serializing access are necessary. It doesn't matter what thread the meta call event was posted from: the thread per se is not used for the posting, and the event queue will look the same whether a number of events were posted from one thread or from multiple threads.
It is the QObject::event method that handles the QMetaCallEvent and executes the call. The call may be to a slot, an invokable method, a constructor/destructor, or a functor that is to execute in a given object's thread context.
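The serialization argument can be demonstrated with a small Qt-free model. Everything here is a stand-in and an assumption: EventLoop plays the role of the worker thread's event loop, post() plays the role of emitting a signal over a queued connection, and an empty function is used as a hypothetical quit marker. The worker-owned counter is a plain int with no mutex around it.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class EventLoop {
public:
    // Any thread: queue a call for the worker thread.
    void post(std::function<void()> call) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            calls_.push(std::move(call));
        }
        ready_.notify_one();
    }

    // Worker thread: deliver calls one at a time until the quit marker
    // (an empty function) is dequeued.
    void exec() {
        for (;;) {
            std::function<void()> call;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !calls_.empty(); });
                call = std::move(calls_.front());
                calls_.pop();
            }
            if (!call) return;
            call();  // runs outside the queue lock, always on this thread
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> calls_;
};

int count_from_many_threads() {
    EventLoop loop;
    int counter = 0;  // worker-owned data, deliberately unprotected
    std::thread worker([&loop] { loop.exec(); });
    std::vector<std::thread> emitters;
    for (int t = 0; t < 4; ++t) {
        emitters.emplace_back([&loop, &counter] {
            for (int i = 0; i < 1000; ++i)
                loop.post([&counter] { ++counter; });  // the "slot"
        });
    }
    for (auto& e : emitters) e.join();
    loop.post(nullptr);  // quit marker, queued after every slot call
    worker.join();
    return counter;
}
```

Only the queue itself is locked; the slot body never runs concurrently with itself because a single thread dequeues and executes every call.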

Threads Waiting for Event Do Not Always Catch Event Signal

I have an application wherein multiple threads wait on the same event object to signal. The problem I am seeing appears to be a type of race condition in that sometimes some threads' wait states (WaitForMultipleObjects) return as a result of the event signal and other threads' wait states apparently don't see the event signal because they don't return. These events were created using CreateEvent as manual-reset event objects.
My application handles these events such that when an event object is signaled, its "owner" thread is responsible for resetting the event object's signal state, as shown in the following code snippet. Other threads waiting on the same event do not attempt to reset its signal state.
switch ( dwObjectWaitState = ::WaitForMultipleObjects( i, pHandles, FALSE, INFINITE ) )
{
case WAIT_OBJECT_0 + BAS_MESSAGE_READY_EVT_ID:
::ResetEvent( pHandles[BAS_MESSAGE_READY_EVT_ID] );
/* handles the event */
break;
}
To put it another way, the problem I am seeing appears to be similar to what is described in the Remarks section for PulseEvent on the MSDN website:
If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called. Therefore, PulseEvent is unreliable and should not be used by new applications. Instead, use condition variables.
If this is what is happening, the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread, so that the owner thread can determine when it is safe to reset the event object's signal state.
Is there a better way to do this? Thanks.

Yes there is a better way:
[...] Instead, use condition variables.
http://msdn.microsoft.com/en-us/library/ms682052(v=vs.85).aspx
Look for WakeAllConditionVariable specifically
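As a portable illustration of the condition-variable approach, the sketch below uses std::condition_variable, whose notify_all plays the role that WakeAllConditionVariable plays in the Win32 API. BroadcastEvent and wake_all are hypothetical names. The generation counter is the important part: it records that a signal happened, so a thread that had not yet started waiting when the signal was sent still observes it, which is exactly the case a pulsed event can miss.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class BroadcastEvent {
public:
    // Snapshot of the current generation; take it before waiting.
    unsigned long generation() {
        std::lock_guard<std::mutex> lock(mutex_);
        return generation_;
    }

    // Blocks until at least one signal has happened since `seen`.
    void wait_past(unsigned long seen) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this, seen] { return generation_ != seen; });
    }

    // Wakes every waiter; nobody has to reset anything afterwards.
    void signal_all() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++generation_;
        }
        changed_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    unsigned long generation_ = 0;
};

// Starts `waiters` threads, signals once, and counts how many woke up.
int wake_all(int waiters) {
    BroadcastEvent event;
    const unsigned long seen = event.generation();
    std::atomic<int> woken{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < waiters; ++i) {
        threads.emplace_back([&event, &woken, seen] {
            event.wait_past(seen);  // returns even if the signal came first
            ++woken;
        });
    }
    event.signal_all();
    for (auto& t : threads) t.join();
    return woken.load();
}
```

With this design there is no owner thread that must decide when it is safe to reset the event, which removes the race described in the question.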

Why PulseEvent() is Unreliable and What to Do Without It
The auto-reset event is king!
PulseEvent only appeared in Windows NT 4.0; it did not exist in the original Windows NT 3.1. By contrast, reliable functions like CreateEvent, SetEvent and WaitForMultipleObjects have existed since the start of Windows NT, so consider using them.
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and the system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
EventHandle := CreateEvent(nil, FALSE, FALSE, nil);
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don't ever call ResetEvent (since the event resets automatically) or PulseEvent (since it is unreliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As for auto-reset events, we have been using them since Windows NT 3.51 and they work very well.
What to Do When Multiple Threads Are Waiting for a Single Object
Unfortunately, your case is a little bit more complicated. You have multiple threads waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no reliable way other than to create a separate event for each thread.
You wrote that "the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread". This is correct.
You also wrote that "the owner thread can determine when it is safe to reset the event object's signal state". This is impractical and unsafe. The best way is to use auto-reset events, so that they reset themselves automatically.
So, you will need as many events as there are threads. Besides that, you will need to keep a list of registered threads. Then, to notify all the threads, you call SetEvent in a loop over all the event handles. This is a very fast, reliable and cheap way. Events are much cheaper than threads, so the number of threads is the issue, not the number of events. There is virtually no limit on kernel objects: the per-process limit on kernel handles is 2^24.

Use a condition variable, as the PulseEvent description suggests. The only problem is that native condition variables on Windows were introduced in Vista, so older systems like XP don't have them. You can emulate a condition variable using other synchronization objects (http://www1.cse.wustl.edu/~schmidt/win32-cv-1.html), but I think the easiest way is to use the condition variable from the Boost library and its notify_all method to wake up all threads (http://www.boost.org/doc/libs/1_41_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref).
Another possibility (though not a very elegant one) is to create one event per thread and, wherever you currently call PulseEvent, call SetEvent for all of them. For this solution, auto-reset events would probably work better.
