Libuv: protecting the event loop from concurrent accesses (C++)

I would like to know what precautions are needed to be able to safely add callbacks to a libuv event loop from multiple threads in C++.
More details
I have some multi-threaded C++11 code that I want to modify to make use of libuv's network communication API. I do not want to create a new libuv event loop every time network communication is required (for that would use up resources). So I created a libuv loop in a separate thread (I prevent the loop from closing by registering a "keep-alive" timer). This event loop is currently passed to other threads using a singleton. Callbacks are then registered (from other threads) while the loop is running.
I am worried about concurrent accesses to the libuv event loop when registering new callbacks: when calling uv_tcp_init the loop is explicitly passed (rather, a pointer to the loop); when calling uv_tcp_connect the loop is not explicitly mentioned, but a pointer to it is stored in the uv_tcp_t struct passed. I haven't checked whether any of the above-mentioned functions actually modify the loop, but my intuition is at least one of them must do so (otherwise, libuv couldn't keep track of active handles).
My first thought was to add a mutex attribute to the singleton used to access the event loop and use it to prevent concurrent access to the event loop when calling any of the above functions:
EventLoop & loop = EventLoop::get(); // Access the singleton
{
    std::lock_guard<std::mutex> lock(loop.mutex_attribute);
    // Register callbacks, etc.
}
However, this does not protect the event loop from concurrent access between my thread (which successfully acquired the lock) and some libuv internal function (or a registered callback triggered by libuv), since the latter know nothing about the mutex in my singleton.
Should I be worried about said concurrent accesses? What steps may I take to mitigate the risks?

The solution I settled on was not to add handles directly to the libuv event loop from other threads, but rather to have other threads add handles to a queue (stored in the same singleton as the pointer to the event loop). Access to the queue is protected by a mutex.
The "keep-alive" timer then periodically empties the queue (the timer callback is aware of the mutex protecting the queue) by:
getting the first handle from the queue,
registering that handle with the libuv event loop (since we register the handle from a callback running within the libuv event loop, there shouldn't be any risk of concurrent access), and performing any other operation needed on this handle (in my case, calling uv_tcp_init and uv_tcp_connect),
repeating until the queue is empty.
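For illustration, here is a minimal sketch of that arrangement, assuming libuv 1.x. The names PendingConnect, post_connect and on_keepalive_timer are mine, not libuv's, and error handling is omitted:

#include <uv.h>
#include <mutex>
#include <queue>

struct PendingConnect {               // work item posted by other threads
    uv_tcp_t* handle;
    uv_connect_t* req;
    struct sockaddr_storage addr;
    uv_connect_cb on_connect;
};

std::mutex queue_mutex;               // protects 'pending' below
std::queue<PendingConnect> pending;   // filled by any thread

// Called from any thread: enqueue the work item; never touch the loop.
void post_connect(const PendingConnect& pc) {
    std::lock_guard<std::mutex> lock(queue_mutex);
    pending.push(pc);
}

// The "keep-alive" timer callback: runs on the loop thread, so touching
// the loop here is safe. Drains the queue one item at a time.
void on_keepalive_timer(uv_timer_t* timer) {
    uv_loop_t* loop = timer->loop;
    for (;;) {
        PendingConnect pc;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            if (pending.empty()) return;
            pc = pending.front();
            pending.pop();
        }
        uv_tcp_init(loop, pc.handle);
        uv_tcp_connect(pc.req, pc.handle,
                       reinterpret_cast<const sockaddr*>(&pc.addr),
                       pc.on_connect);
    }
}

As a design note, libuv also ships uv_async_t/uv_async_send, which the documentation describes as safe to call from other threads; a uv_async handle could replace the periodic timer so the queue is drained immediately rather than on the next timer tick.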

Related

Thread safety of curl_multi_remove_handle

It seems like some sources recommend using curl_multi_remove_handle to "invalidate" a curl handle and cause curl_multi_wait to return early. This seems not to be covered under the thread safety guarantee (if done from another thread), or am I wrong (the thread safety guarantees are basically just reentrancy guarantees)?
What is the recommended way to signal curl_multi_wait to return early? Is it really required to do it via timeouts? (Under Linux, I would use an eventfd in the epoll set to effectively have the case "wait on these sockets OR this event fd OR the given timeout".) It seems I could use custom curl_waitfd structures, but this would require platform-specific setup for dummy sockets.
You must not call curl_multi_remove_handle from thread B if curl_multi_wait for that handle is running in thread A. That will just cause tears and misery.
You can opt to, for example:
use sufficiently short timeouts for curl_multi_wait() so that you don't need to abort it
add a private socket/file descriptor to send data on to abort when you want to
return error from the progress callback (or another callback) for the transfer(s) you need to stop - by setting a flag that they all check (global, or global like)
rework your app logic so that you can consider the transfer "dead" without it having stopped yet; let libcurl run its course and close it later, and don't care much that it finishes a bit after you decided you could ignore it.
curl_multi_poll()
After I first wrote this answer, we introduced curl_multi_poll in libcurl. This function is very similar to curl_multi_wait, but it can also be made to return pre-emptively with curl_multi_wakeup, thus offering applications a few more alternative approaches.
Unfortunately, curl_multi is not what people these days would deem "thread safe". Yes, you can use a CURLM handle in two different threads, as long as they don't access it at the same time. But then, this is true for almost any data structure in C or C++.
So, if you have one thread running an event loop with curl_multi_wait(), you cannot use a second thread to add new jobs via curl_multi_add_handle() or remove jobs via curl_multi_remove_handle(). Well, it will work most of the time, but especially during high load, you will start getting data corruptions and segfaults due to the concurrent access to libcurl's internal data structures.
There are two ways around this problem, but both require a bit of coding:
Use the newer curl_multi_poll() interface, which (unlike curl_multi_wait()) is externally interruptible via curl_multi_wakeup(). Yes, curl_multi_wakeup() is the ONLY function on CURLM handles that is safe to call concurrently from another thread (or even multiple threads). To add new requests to the event loop or remove requests from it, you would need some request queue and a mutex, which secures access to that queue. Then, to add a new job, you would do:
(thread 1 is running curl_multi_poll() in an endless loop)
thread 2 acquires said mutex
thread 2 posts an "add easy handle request" into the request queue
thread 2 releases said mutex again
thread 2 calls curl_multi_wakeup()
thread 1 acquires the mutex after curl_multi_poll() returns
thread 1 then processes the "add easy handle request" in the job list and performs curl_multi_add_handle()
thread 1 then releases the mutex again
thread 1 does all other necessary work (in particular call curl_multi_perform() and pass finished transfers to the application etc.)
thread 1 calls curl_multi_poll() again
To remove a job, you would use the same procedure, just let thread 2 post a "remove easy handle request" instead of an "add easy handle request" to the request queue and then let thread 1 call curl_multi_remove_handle() instead of curl_multi_add_handle().
In this solution, ALL calls to the CURLM handle are performed from thread 1, with the sole exception of curl_multi_wakeup(), which is used by other threads to signal thread 1 of new work waiting in the request queue.
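A hedged sketch of that procedure, assuming libcurl 7.66.0 or later (which added curl_multi_poll/curl_multi_wakeup); Request, push_request and event_loop are illustrative names, and error handling is omitted:

#include <curl/curl.h>
#include <mutex>
#include <queue>

enum class Op { Add, Remove };
struct Request { Op op; CURL* easy; };

std::mutex mtx;                 // protects 'requests'
std::queue<Request> requests;   // posted by other threads
CURLM* multi = nullptr;         // owned and driven by thread 1 only

// Thread 2 (any thread): post a request, then wake up the poller.
void push_request(Op op, CURL* easy) {
    {
        std::lock_guard<std::mutex> lock(mtx);
        requests.push({op, easy});
    }
    curl_multi_wakeup(multi);   // the one CURLM call that is thread-safe
}

// Thread 1: the event loop.
void event_loop() {
    int running = 0;
    for (;;) {
        int numfds = 0;
        curl_multi_poll(multi, nullptr, 0, 1000, &numfds);
        // Apply queued add/remove requests while holding the mutex.
        {
            std::lock_guard<std::mutex> lock(mtx);
            while (!requests.empty()) {
                Request r = requests.front(); requests.pop();
                if (r.op == Op::Add)
                    curl_multi_add_handle(multi, r.easy);
                else
                    curl_multi_remove_handle(multi, r.easy);
            }
        }
        curl_multi_perform(multi, &running);
        // ... then drain curl_multi_info_read() to hand finished
        // transfers back to the application.
    }
}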
Or use the curl_multi_socket_action() interface, where you have to provide two callbacks to libcurl, with which it reports file descriptors to watch and a timeout to your application. You then have to call epoll_wait() or a similar OS function yourself to wait for activity (or timeout) in the event loop thread. Then add a mutex again to serialize access to the CURLM handle: your event loop thread should lock that mutex just before it calls curl_multi_socket_action() (or any other function on the CURLM handle) and unlock it immediately after. As curl_multi_socket_action() (unlike curl_multi_poll()) does not sleep, that mutex will be locked only for brief intervals. So other threads can then easily lock that mutex directly for themselves, too, and call curl_multi_add_handle() or curl_multi_remove_handle() as needed. Be aware, though, that those intervening additions or removals of handles can modify the active FD set, and that you may need some synchronisation with the event loop thread to notify it of the modified epoll set.
The first solution is likely easier to implement. You should be able to find libcurl wrappers for both variants on GitHub, but be sure to test them intensively before using them in any critical application.

Does std::queue have an event mechanism (signals in std::queue)?

Is there any event mechanism or predefined signal in the queue? When data or a message arrives in the queue, the queue should generate an event saying the data is ready to process, or signal another thread to do its task, instead of that thread continuously polling the queue.
POSIX message queues have a function, mq_notify(), which notifies another process or thread when data arrives in the message queue, so polling can be avoided.
Edit
If not, how can I achieve this with std::queue? I want to avoid continuous polling; it is slowing down the performance of the code.
Whenever some event occurs on the queue, it should notify the others.
std::queue is a container type, not an event mechanism. I recommend making a class around the queue that implements a message queue.
EDIT:
I recommend using a std::queue, a std::mutex, and a std::condition_variable (if you use Boost, it has the same types). Put those in your new Queue class; when pushing, you would lock the mutex, push onto the queue, unlock the mutex, and notify_one() the condition variable. That way the condition variable is notified only when something is pushed. You can do the same on pop.
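A minimal sketch of such a wrapper, using only the C++11 standard library (NotifyingQueue is an illustrative name):

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class NotifyingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }                        // unlock before notifying
        cv_.notify_one();        // wake exactly one waiting consumer
    }
    T pop() {                    // blocks until an item is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};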
There are two approaches to this. The simplest is to have an asynchronous queue, implemented using a mutex and condition variable, on which a thread blocks, waiting for another thread to push something onto the queue. This is a very common idiom for task dispatching and here are two simple implementations:
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html and
http://cxx-gtk-utils.sourceforge.net/2.2/classCgu_1_1AsyncQueueDispatch.html
By using a list rather than a deque as the queue container you can allocate new nodes outside the queue's mutex which significantly improves performance under high contention (see the source code for the second link mentioned above for an example using std::list::splice to achieve this).
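As a sketch of that trick, the push() from the wrapper above could be rewritten as follows if the backing store q_ is changed from std::queue<T> to std::list<T> (same illustrative names as before; pop() would then use front()/pop_front() on the list):

void push(T value) {
    std::list<T> node;
    node.push_back(std::move(value));   // node allocated outside the mutex
    {
        std::lock_guard<std::mutex> lock(m_);
        q_.splice(q_.end(), node);      // O(1) pointer splice under the lock
    }
    cv_.notify_one();
}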
Instead of having a designated thread block on the asynchronous queue, after a thread places an item on the queue it could instead invoke an event in the program's event loop which executes a callback that extracts the item from the queue and does something with it. Implementing this is more OS-specific, but see http://www.appinf.com/docs/poco/Poco.NotificationQueue.html and http://cxx-gtk-utils.sourceforge.net/2.2/classCgu_1_1Notifier.html for different approaches to this.

Threads Waiting for Event Do Not Always Catch Event Signal

I have an application wherein multiple threads wait on the same event object to signal. The problem I am seeing appears to be a type of race condition in that sometimes some threads' wait states (WaitForMultipleObjects) return as a result of the event signal and other threads' wait states apparently don't see the event signal because they don't return. These events were created using CreateEvent as manual-reset event objects.
My application handles these events such that when an event object is signaled, its "owner" thread is responsible for resetting the event object's signal state, as shown in the following code snippet. Other threads waiting on the same event do not attempt to reset its signal state.
switch ( dwObjectWaitState = ::WaitForMultipleObjects( i, pHandles, FALSE, INFINITE ) )
{
case WAIT_OBJECT_0 + BAS_MESSAGE_READY_EVT_ID:
    ::ResetEvent( pHandles[BAS_MESSAGE_READY_EVT_ID] );
    /* handle the event */
    break;
}
To put it another way, the problem I am seeing appears to be what is described in the Remarks section for PulseEvent on the MSDN website:
If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called. Therefore, PulseEvent is unreliable and should not be used by new applications. Instead, use condition variables.
If this is what is happening, the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread, so that the owner thread can determine when it is safe to reset the event object's signal state.
Is there a better way to do this? Thanks.
Yes there is a better way:
[...] Instead, use condition variables.
http://msdn.microsoft.com/en-us/library/ms682052(v=vs.85).aspx
Look for WakeAllConditionVariable specifically.
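A minimal sketch of that approach with the native Win32 condition variable API (Vista and later; the one-time initialization calls are shown as comments, and 'signalled' is an illustrative predicate protected by the critical section):

#include <windows.h>

CONDITION_VARIABLE cv;      // InitializeConditionVariable(&cv) once at startup
CRITICAL_SECTION cs;        // InitializeCriticalSection(&cs) once at startup
bool signalled = false;     // the predicate; guards against lost wakeups

void Waiter() {
    ::EnterCriticalSection(&cs);
    while (!signalled)      // re-check: waits can wake spuriously
        ::SleepConditionVariableCS(&cv, &cs, INFINITE);
    ::LeaveCriticalSection(&cs);
}

void WakeEveryone() {
    ::EnterCriticalSection(&cs);
    signalled = true;
    ::LeaveCriticalSection(&cs);
    ::WakeAllConditionVariable(&cv);  // releases all waiters, reliably
}

Unlike PulseEvent, a thread that was momentarily removed from the wait state still sees the predicate set to true when it re-checks, so no wakeup is lost.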
Why PulseEvent() is Unreliable and What to Do Without It
The auto-reset event is king!
PulseEvent only appeared in Windows NT 4.0; it did not exist in the original Windows NT 3.1. By contrast, the reliable functions like CreateEvent, SetEvent and WaitForMultipleObjects have existed from the start of Windows NT, so consider using them.
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and the system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
HANDLE eventHandle = ::CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, initially non-signaled
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don't ever call ResetEvent (the event resets automatically) or PulseEvent (it is unreliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As for auto-reset events, we have been using them since Windows NT 3.51 and they work very well.
What to Do when Multiple Threads Waiting for a Single Object
Unfortunately, your case is a little bit more complicated. You have multiple threads waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no reliable way other than to create a separate event for each thread.
You wrote that "the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread". This is correct.
You also wrote that "the owner thread can determine when it is safe to reset the event object's signal state" - this is impractical and unsafe. The best way is to use auto-reset events, so they reset themselves automatically.
So, you will need as many events as there are threads. Besides that, you will need to keep a list of registered threads. To notify all the threads, you call SetEvent in a loop over all the event handles. This is a very fast, reliable and cheap way. Events are much cheaper than threads, so the number of threads is the limiting factor, not the number of events. There is virtually no limit on kernel objects - the per-process limit on kernel handles is 2^24.
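A sketch of such a registry (illustrative names; the registry itself is guarded by a critical section that must be initialized once with InitializeCriticalSection):

#include <windows.h>
#include <vector>

std::vector<HANDLE> registeredEvents;   // one auto-reset event per thread
CRITICAL_SECTION registryLock;          // InitializeCriticalSection once

HANDLE RegisterWaiter() {               // called once by each waiting thread
    HANDLE e = ::CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset
    ::EnterCriticalSection(&registryLock);
    registeredEvents.push_back(e);
    ::LeaveCriticalSection(&registryLock);
    return e;                           // the thread waits on this handle
}

void NotifyAll() {                      // called by the owner thread
    ::EnterCriticalSection(&registryLock);
    for (HANDLE e : registeredEvents)
        ::SetEvent(e);                  // each waiter is released exactly once
    ::LeaveCriticalSection(&registryLock);
}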
Use a condition variable, as the PulseEvent documentation suggests. The only problem is that native condition variables on Windows were only implemented starting with Vista, so older systems like XP don't have them. But you can emulate a condition variable using other synchronization objects (http://www1.cse.wustl.edu/~schmidt/win32-cv-1.html); I think the easiest way, though, is to use the condition variable from the Boost library and its notify_all method to wake up all threads (http://www.boost.org/doc/libs/1_41_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref)
Another possibility (not very elegant) is to create one event for each thread, and where you currently call PulseEvent, call SetEvent for all of them. For this solution auto-reset events would probably work better.

Lightest synchronization primitive for worker thread queue

I am about to implement a worker thread with work item queuing, and while I was thinking about the problem, I wanted to know if I'm doing the best thing.
The thread in question will have to have some thread-local data (preinitialized at construction) and will loop on work items until some condition is met.
pseudocode:
volatile bool run = true;

int WorkerThread(param)
{
    localclassinstance c1 = new c1();
    [other initialization]
    while(true) {
        [LOCK]
        [unqueue work item]
        [UNLOCK]
        if([hasWorkItem]) {
            [process data]
            [PostMessage with pointer to data]
        }
        [Sleep]
        if(!run)
            break;
    }
    [uninitialize]
    return 0;
}
I guess I will do the locking via critical section, as the queue will be std::vector or std::queue, but maybe there is a better way.
The part with Sleep doesn't look too great, as there will be a lot of extra latency with big Sleep values, or lots of extra locking when the Sleep value is small, and that's definitely unnecessary.
But I can't think of a WaitForSingleObject-friendly primitive I could use instead of a critical section, as there might be two threads queuing work items at the same time. So an Event, which seems to be the best candidate, can lose the second work item if the Event was already set, and it doesn't guarantee mutual exclusion.
Maybe there is even a better approach with InterlockedExchange kind of functions that leads to even less serialization.
P.S.: I might need to preprocess the whole queue and drop the obsolete work items during the unqueuing stage.
There are a multitude of ways to do this.
One option is to use a semaphore for the waiting. The semaphore is signalled every time a value is pushed on the queue, so the worker thread will only block if there are no items in the queue. This will still require separate synchronization on the queue itself.
A second option is to use a manual-reset event which is set when there are items in the queue and cleared when the queue is empty. Again, you will need to do separate synchronization on the queue.
A third option is to have an invisible message-only window created on the thread, and use a special WM_USER or WM_APP message to post items to the queue, attaching the item to the message via a pointer.
Another option is to use condition variables. The native Windows condition variables only work if you're targeting Windows Vista or Windows 7, but condition variables are also available for Windows XP with Boost or an implementation of the C++0x thread library. An example queue using Boost condition variables is available on my blog: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
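As an illustration of the first option above (a semaphore for the waiting, plus separate synchronization on the queue), here is a Win32 sketch; WorkItem is a hypothetical application type, and error handling is omitted:

#include <windows.h>
#include <queue>

std::queue<WorkItem> workQueue;   // WorkItem is the application's own type
CRITICAL_SECTION queueLock;       // InitializeCriticalSection once at startup
HANDLE queueSem;                  // CreateSemaphore(NULL, 0, 0x7fffffff, NULL)

void Producer(WorkItem item) {
    ::EnterCriticalSection(&queueLock);
    workQueue.push(item);
    ::LeaveCriticalSection(&queueLock);
    ::ReleaseSemaphore(queueSem, 1, NULL);     // one permit per queued item
}

WorkItem Consumer() {
    ::WaitForSingleObject(queueSem, INFINITE); // blocks while queue is empty
    ::EnterCriticalSection(&queueLock);
    WorkItem item = workQueue.front();
    workQueue.pop();
    ::LeaveCriticalSection(&queueLock);
    return item;
}

Because the semaphore's count mirrors the number of queued items, a consumer that passes the wait is guaranteed to find an item, so no work item can be lost the way it could be with a plain event.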
It is possible to share a resource between threads without using blocking locks at all, if your scenario meets certain requirements.
You need an atomic pointer exchange primitive, such as Win32's InterlockedExchange. Most processor architectures provide some sort of atomic swap, and it's usually much less expensive than acquiring a formal lock.
You can store your queue of work items in a pointer variable that is accessible to all the threads that will be interested in it. (global var, or field of an object that all the threads have access to)
This scenario assumes that the threads involved always have something to do, and only occasionally "glance" at the shared resource. If you want a design where threads block waiting for input, use a traditional blocking event object.
Before anything begins, create your queue or work item list object and assign it to the shared pointer variable.
Now, when producers want to push something onto the queue, they "acquire" exclusive access to the queue object by swapping a null into the shared pointer variable using InterlockedExchange. If the result of the swap is null, then somebody else is currently modifying the queue object. Sleep(0) to release the rest of your thread's time slice, then loop to retry the swap until it returns non-null. Even if you end up looping a few times, this is many, many times faster than making a kernel call to acquire a mutex object. Kernel calls require hundreds of clock cycles to transition into kernel mode.
When you successfully obtain the pointer, make your modifications to the queue, then swap the queue pointer back into the shared pointer.
When consuming items from the queue, you do the same thing: swap a null into the shared pointer and loop until you get a non-null result, operate on the object in the local var, then swap it back into the shared pointer var.
This technique is a combination of atomic swap and brief spin loops. It works well in scenarios where the threads involved are not blocked and collisions are rare. Most of the time the swap will give you exclusive access to the shared object on the first try, and as long as the length of time the queue object is held exclusively by any thread is very short then no thread should have to loop more than a few times before the queue object becomes available again.
If you expect a lot of contention between threads in your scenario, or you want a design where threads spend most of their time blocked waiting for work to arrive, you may be better served by a formal mutex synchronization object.
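A sketch of the swap-and-spin technique just described, using InterlockedExchangePointer (Queue is a hypothetical application type):

#include <windows.h>

Queue* volatile sharedQueue = new Queue();  // the shared pointer variable

Queue* Acquire() {
    for (;;) {
        Queue* q = (Queue*)InterlockedExchangePointer(
                       (PVOID volatile*)&sharedQueue, NULL);
        if (q != NULL)
            return q;   // we now hold exclusive access to the queue object
        ::Sleep(0);     // someone else has it; yield the time slice and retry
    }
}

void Release(Queue* q) { // publish the queue back for other threads
    InterlockedExchangePointer((PVOID volatile*)&sharedQueue, q);
}

// Typical use, from either a producer or a consumer:
//   Queue* q = Acquire();
//   q->push(item);       // exclusive access while we hold the pointer
//   Release(q);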
The fastest locking primitive is usually a spin-lock or spin-sleep-lock. CRITICAL_SECTION is just such a (user-space) spin-sleep-lock.
(Well, aside from not using locking primitives at all of course. But that means using lock-free data-structures, and those are really really hard to get right.)
As for avoiding the Sleep: have a look at condition-variables. They're designed to be used together with a "mutex", and I think they're much easier to use correctly than Windows' EVENTs.
Boost.Thread has a nice portable implementation of both fast user-space spin-sleep-locks and condition variables:
http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref
A work-queue using Boost.Thread could look something like this:
#include <cstddef>
#include <deque>
#include <boost/noncopyable.hpp>
#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/thread/recursive_mutex.hpp>

template <class T>
class Queue : private boost::noncopyable
{
public:
    explicit Queue(size_t maxSize) : m_maxSize(maxSize) {}

    void Enqueue(T const& t)
    {
        unique_lock lock(m_mutex);
        // wait until the queue is not full
        while (m_backingStore.size() >= m_maxSize)
            m_queueNotFullCondition.wait(lock); // releases the lock temporarily
        m_backingStore.push_back(t);
        m_queueNotEmptyCondition.notify_all(); // notify waiters that the queue is not empty
    }

    T DequeueOrBlock()
    {
        unique_lock lock(m_mutex);
        // wait until the queue is not empty
        while (m_backingStore.empty())
            m_queueNotEmptyCondition.wait(lock); // releases the lock temporarily
        T t = m_backingStore.front();
        m_backingStore.pop_front();
        m_queueNotFullCondition.notify_all(); // notify waiters that the queue is not full
        return t;
    }

private:
    typedef boost::recursive_mutex mutex;
    typedef boost::unique_lock<boost::recursive_mutex> unique_lock;

    size_t const m_maxSize;
    mutable mutex m_mutex;
    boost::condition_variable_any m_queueNotEmptyCondition;
    boost::condition_variable_any m_queueNotFullCondition;
    std::deque<T> m_backingStore;
};
There are various ways to do this
For one, you could create an event called 'run' instead and use it to detect when the thread should terminate; the main thread then signals it. Instead of Sleep you would then use WaitForSingleObject with a timeout; that way you quit promptly instead of waiting out the sleep interval.
Another way is to accept messages in your loop and then invent a user-defined message that you post to the thread.
EDIT: depending on the situation it may also be wise to have yet another thread that monitors this thread to check whether it is dead or not; this can be done with the above-mentioned message queue, so that replying to a certain message within x ms would mean the thread hasn't locked up.
I'd restructure a bit:
WorkItem GetWorkItem()
{
    while(true)
    {
        WaitForSingleObject(queue.Ready);
        {
            ScopeLock lock(queue.Lock);
            if(!queue.IsEmpty())
            {
                return queue.GetItem();
            }
        }
    }
}

int WorkerThread(param)
{
    bool done = false;
    do
    {
        WorkItem work = GetWorkItem();
        if( work.IsQuitMessage() )
        {
            done = true;
        }
        else
        {
            work.Process();
        }
    } while(!done);
    return 0;
}
Points of interest:
ScopeLock is a RAII class to make critical section usage safer.
Block on event until workitem is (possibly) ready - then lock while trying to dequeue it.
Don't use a global "IsDone" flag; enqueue special quit-message WorkItems.
You can have a look at another approach here that uses C++0x atomic operations
http://www.drdobbs.com/high-performance-computing/210604448
Use a semaphore instead of an event.
Keep the signaling and synchronizing separate. Something along these lines...
// in main thread
HANDLE events[2];
events[0] = CreateEvent(...); // for shutdown
events[1] = CreateEvent(...); // for work to do
// start thread and pass the events

// in worker thread
DWORD ret;
while (true)
{
    ret = WaitForMultipleObjects(2, events, FALSE, <timeout val or INFINITE>);
    if shutdown
        return
    else if do-work
        enter crit sec
        unqueue work
        leave crit sec
        etc.
    else if timeout
        do something else that has to be done
}
Given that this question is tagged windows, I'll answer thus:
Don't create 1 worker thread. Your worker thread jobs are presumably independent, so you can process multiple jobs at once? If so:
In your main thread call CreateIOCompletionPort to create an io completion port object.
Create a pool of worker threads. The number you need to create depends on how many jobs you might want to service in parallel. Some multiple of the number of CPU cores is a good start.
Each time a job comes in, call PostQueuedCompletionStatus(), passing a pointer to the job struct as the lpOverlapped parameter.
Each worker thread calls GetQueuedCompletionStatus(), retrieves the work item from the lpOverlapped pointer, and does the job before calling GetQueuedCompletionStatus() again.
This looks heavy, but io completion ports are implemented in kernel mode and represent a queue that can be dequeued by any of the worker threads associated with it (i.e. waiting on a call to GetQueuedCompletionStatus). The io completion port knows how many of the threads processing an item are actually using a CPU vs. blocked on an IO call - and will release more worker threads from the pool to ensure that the concurrency count is met.
So, it's not lightweight, but it is very, very efficient... an io completion port can be associated with pipe and socket handles, for example, and can dequeue the results of asynchronous operations on those handles. io completion port designs can scale to handling tens of thousands of socket connections on a single server - but on the desktop side of the world they make a very convenient way of scaling processing of jobs over the 2 or 4 cores now common in desktop PCs.
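A sketch of the completion-port job queue described above (Job is a hypothetical type; error handling omitted):

#include <windows.h>

struct Job { /* application data */ };

HANDLE iocp;  // created once in the main thread:
              // iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

void PostJob(Job* job) {
    // The job pointer rides in the lpOverlapped slot, as described above.
    ::PostQueuedCompletionStatus(iocp, 0, 0, (LPOVERLAPPED)job);
}

DWORD WINAPI WorkerThread(LPVOID) {
    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;
    // Each dequeue hands the job to exactly one thread in the pool.
    while (::GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE)) {
        Job* job = (Job*)ov;
        // ... process the job, then loop back to wait for the next one
    }
    return 0;
}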

thread-safe function pointers in C++

I'm writing a network library that a user can pass a function pointer to for execution on certain network events. In order to keep the listening loop from holding up the developer's application, I pass the event handler to a thread. Unfortunately, this creates a bit of a headache for handling things in a thread-safe manner. For instance, if the developer passes a function that makes calls to their Windows::Forms application's elements, then an InvalidOperationException will be thrown.
Are there any good strategies for handling thread safety?
Function pointers cannot be thread-safe by themselves, as they merely designate a point to call. They are just pointers.
Your code always runs in the thread it was called from (via the function pointer).
What you want to achieve is that your code runs in a specific thread (maybe the UI thread).
For this you must use some kind of queue to synchronize the invocation into the main thread.
This is exactly what .NET's BeginInvoke()/Invoke() on a Form do. The queue in that case (somewhere deep inside the .NET framework) is the Windows message queue.
But you can use any other queue as long as the "correct" thread reads and executes the call requests from that queue.
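A minimal sketch of such a call queue, using std::function in place of a raw function pointer (names are illustrative):

#include <functional>
#include <mutex>
#include <queue>

std::mutex callMutex;
std::queue<std::function<void()>> callQueue;  // pending invocations

// Any thread (e.g. the network listener) posts a call request:
void BeginInvoke(std::function<void()> fn) {
    std::lock_guard<std::mutex> lock(callMutex);
    callQueue.push(std::move(fn));
}

// The "correct" thread (e.g. the UI thread) drains the queue in its own
// loop, so every handler runs on that thread:
void ProcessPendingCalls() {
    for (;;) {
        std::function<void()> fn;
        {
            std::lock_guard<std::mutex> lock(callMutex);
            if (callQueue.empty()) return;
            fn = std::move(callQueue.front());
            callQueue.pop();
        }
        fn();  // executed on the draining thread, outside the lock
    }
}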