I am building a high-performance app that needs two functions to synchronise threads:
void wake_thread(thread)
void sleep_thread(thread)
The app has a single thread (let's call it C) that may fall asleep with a call to sleep_thread. There are multiple threads that will call wake_thread. When wake_thread returns, it MUST guarantee that C is either running or will be woken. wake_thread must NEVER block.
The easy way is of course to use a synchronisation event, like this:
hEvent = CreateEvent(NULL, FALSE, TRUE, NULL);
void wake_thread(thread) {
SetEvent(hEvent);
}
And:
void sleep_thread(thread)
{
WaitForSingleObject(hEvent, INFINITE);
}
This provides the desired semantics and is free of race conditions for the scenario (there is only one thread waiting, but multiple threads that can signal). I included it here to show what I am trying to tune.
HOWEVER, I am wondering if there is a faster way under Windows for this very specific scenario. wake_thread may be called a lot, even when C is not sleeping. This causes a lot of calls to SetEvent that do nothing. Would there be a faster way to use a manual-reset event and reference counters to make sure SetEvent is only called when there is actually something to set?
Every CPU cycle counts in this scenario.
I haven't tested this (apart from making sure it compiles) but I think this should do the trick. It was, admittedly, a bit trickier than I at first thought. Note that there are some obvious optimizations you could make; I've left it in unoptimized form for clarity and to aid any debugging that may be necessary. I've also omitted error checking.
#include <intrin.h>
HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
__declspec(align(4)) volatile LONG thread_state = 2;
// 0 (00): sleeping
// 1 (01): sleeping, wake request pending
// 2 (10): awake, no additional wake request received
// 3 (11): awake, at least one additional wake request
void wake_thread(void)
{
LONG old_state;
old_state = _InterlockedOr(&thread_state, 1);
if (old_state == 0)
{
// This is the first wake request since the consumer thread
// went to sleep. Set the event.
SetEvent(hEvent);
return;
}
if (old_state == 1)
{
// The consumer thread is already in the process of being woken up.
// Any items added to the queue by this thread will be processed,
// so we don't need to do anything.
return;
}
if (old_state == 2)
{
// This is an additional wake request when the consumer thread
// is already awake. We've already changed the state accordingly,
// so we don't need to do anything else.
return;
}
if (old_state == 3)
{
// The consumer thread is already awake, and already has an
// additional wake request registered, so we don't need to do
// anything.
return;
}
BigTrouble();
}
void sleep_thread(void)
{
LONG old_state;
// Debugging only, remove this test in production code.
// The event should never be signaled at this point.
if (WaitForSingleObject(hEvent, 0) != WAIT_TIMEOUT)
{
BigTrouble();
}
old_state = _InterlockedAnd(&thread_state, 1);
if (old_state == 2)
{
// We've changed the state from "awake" to "asleep".
// Go to sleep.
WaitForSingleObject(hEvent, INFINITE);
// We've been buzzed; change the state to "awake"
// and then reset the event.
if (_InterlockedExchange(&thread_state, 2) != 1)
{
BigTrouble();
}
ResetEvent(hEvent);
return;
}
if (old_state == 3)
{
// We've changed the state from "awake with additional
// wake request" to "waking". Change it to "awake"
// and then carry on.
if (_InterlockedExchange(&thread_state, 2) != 1)
{
BigTrouble();
}
return;
}
BigTrouble();
}
Basically this uses a manual-reset event and a two-bit flag to reproduce the behaviour of an automatic-reset event. It may be clearer if you draw a state diagram. The thread safety depends on the rules about which of the functions is allowed to make which transitions, and also on when the event object is allowed to be signaled.
As an editorial: I think it is separating the synchronization code into the wake_thread and sleep_thread functions that makes things a bit awkward. It would probably be more natural, slightly more efficient, and almost certainly clearer if the synchronization code were moved into the queue implementation.
SetEvent() will introduce some latency, as it does have to make a system call (sysenter triggers the switch from user to kernel mode) for the object manager to check the state of the event and dispatch it (via a call to KeSetEvent()). I think that the time of the system call might be considered acceptable even in your circumstances, but that is speculation. Most of the latency is likely going to be introduced on the receiving side of the event. In other words, it takes more time to wake a thread from a WaitFor*Object() than it does to signal the event. The Windows scheduler tries to help get to the thread sooner by giving a priority "boost" to a thread that is having a wait return, but that boost only does so much.
In order to get around this, you should be sure that you are only waiting when it is necessary to do so. The typical method is, in your consumer, when you are signaled to go, to consume every work item that you can without waiting on the event again, and only then make your call to sleep_thread().
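As a minimal sketch of that drain-then-sleep pattern (work_item, queue_try_pop() and process_item() are hypothetical placeholders for your actual queue and work handling):
// Sketch only; the queue helpers are hypothetical placeholders.
void consumer_loop(void)
{
    work_item item;
    for (;;)
    {
        // Drain everything currently queued without touching the event.
        while (queue_try_pop(&item))
        {
            process_item(&item);
        }
        // Only pay for a (potential) kernel wait once the queue is empty.
        sleep_thread();
    }
}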
I should point out that SetEvent()/WaitFor*Object() is almost surely faster than anything short of eating 100% CPU, and even then it may be quicker as a result of the contention on whatever locking object needs to protect your shared data.
Normally, I would recommend the use of a CONDITION_VARIABLE, but I have not tested its performance compared to your technique. I have a suspicion that it may be slower, since it also has the overhead of entering a CRITICAL_SECTION object. You may have to measure the performance difference -- when in doubt, measure, measure, measure.
The only other thing that I can think to say is that MS does acknowledge that dispatching and waiting on events can be slow, especially when it is performed repeatedly. In order to get around this, they changed the CRITICAL_SECTION object to try a number of times in user mode to acquire the lock before actually waiting on the event. They call this the spin count. While I wouldn't recommend it, you may be able to do something similar.
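For reference, the spin count is exposed directly by the Win32 API; a minimal sketch (the count of 4000 is an arbitrary illustration, not a recommendation):
CRITICAL_SECTION cs;
// Spin up to 4000 times in user mode trying to acquire the lock
// before falling back to a kernel wait on the embedded event.
InitializeCriticalSectionAndSpinCount(&cs, 4000);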
For the queue itself, the arrangement could look something like this:
void consumer_thread(void)
{
while(1)
{
WaitForSingleObject(...);
// Consume all items from queue in a thread safe manner (e.g. critical section)
}
}
void produce()
{
bool queue_was_empty = ...; // in a thread safe manner determine if queue is empty
// thread safe insertion into queue ...
// These two steps should be done in a way that prevents the consumer
// from emptying the queue in between, e.g. a spin lock.
// This guarantees you will never miss the "edge"
if( queue_was_empty )
{
SetEvent(...);
}
}
The general idea is to only SetEvent on the transition from empty to non-empty. If the threads have the same priority, Windows should let the producer(s) keep running, and therefore you can minimize your number of SetEvent calls per queue insertion. I've found this arrangement (between threads of equal priority) to give the best performance (at least under Windows XP and Win7, YMMV).
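To make that edge-trigger concrete, here is a minimal untested sketch; queue_is_empty() and queue_push() are hypothetical placeholders for your thread-safe queue, and the event is signaled inside the lock so the consumer cannot empty the queue between the check and the signal:
CRITICAL_SECTION g_queueLock;   // initialized elsewhere
HANDLE g_hQueueEvent;           // the event the consumer waits on

void produce(Item* item)
{
    EnterCriticalSection(&g_queueLock);
    BOOL was_empty = queue_is_empty();   // hypothetical helper
    queue_push(item);                    // hypothetical helper
    if (was_empty)
    {
        // Signal only on the empty -> non-empty transition.
        SetEvent(g_hQueueEvent);
    }
    LeaveCriticalSection(&g_queueLock);
}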
I'm still new to multi-threading in C++ and I'm currently trying to wrap my head around "spurious wake-ups" and what's causing them. I've done some digging on condition variables, kernel signals, futex, etc., and found several culprits on why and how "spurious wake-ups" occur, but there is still something that I can't find the answer to...
Question: Will a spurious wake-up unblock all waiting/blocked threads, even the ones waiting for a completely unrelated notification? Or are there separate waiting queues for the blocked threads and therefore the threads waiting for another notification are protected?
Example: Let's say that we have 249 Spartans waiting to attack the Persians. They wait() for their leader, Leonidas (the 250th) to notify_all() when to attack. Now, on the other side of the camp there are 49 injured Spartans who are waiting for the doctor (the 50th) to notify_one() so that he could treat each one. Would a spurious wake-up unblock all waiting Spartans, including the injured ones, or would it only affect the ones waiting for battle? Are there two separate queues for the waiting threads, or just one for all?
Apologies if the example is misleading... I didn't know how else to explain it.
Causes for spurious wakeups are specific to each operating system, and so are the properties of such wakeups. In Linux, for example, a wakeup happens when a signal is delivered to a blocked thread. After executing the signal handler the thread does not block again and instead receives a special error code (usually EINTR) from the system call that it was blocked on. Since signal handling does not involve other threads, they do not get woken up.
Note that spurious wakeup does not depend on the synchronization primitive you're blocking on or the number of threads blocked on that primitive. It may also happen with non-synchronization blocking system calls like read or write. In general, you have to assume that any blocking system call may return prematurely for whatever reason, unless it is guaranteed not to by a specification like POSIX (and even then, there may be bugs and OS specifics that deviate from the specification).
Some attribute superfluous notifications to spurious wakeups because dealing with both is usually the same. They are not the same, though. Unlike spurious wakeups, superfluous notifications are actually caused by another thread and are the result of a notify operation on the condition variable or futex. It's just that the condition you check upon wakeup may turn false before the unblocked thread manages to check it.
A spurious wakeup, in the context of a condition variable, is only from the waiter's perspective. It means that the wait exited, but the condition is not true; thus the idiomatic use is:
Thing.lock()
while Thing.state != Play {
Thing.wait()
}
....
Thing.unlock()
Each iteration of this loop but one would be considered spurious. Why they occur:
[1] Many conditions are being multiplexed onto a single condition variable; sometimes this is appropriate, sometimes it is just lazy.
[2] Another waiting thread beat your thread to the condition, and changed its state before you got a chance to own it.
[3] Unrelated events, such as kill(2) handling, do this to ensure consistency after asynchronous handlers have run.
The most important thing is to verify that the desired condition has been met, and retry or abandon if not. It is a common error not to recheck the condition, which can be very difficult to diagnose.
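For reference, here is a minimal C++11 sketch of that recheck discipline; state and Play are stand-ins for whatever your real condition is. The predicate overload of wait() re-evaluates the condition after every wakeup, so spurious wakeups and superfluous notifications are absorbed identically:
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
int state = 0;        // stand-in for Thing.state
const int Play = 1;

void wait_for_play()
{
    std::unique_lock<std::mutex> lock(m);
    // Equivalent to: while (state != Play) cv.wait(lock);
    cv.wait(lock, [] { return state == Play; });
    // ... here state == Play is guaranteed, with the lock held ...
}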
As a more serious example illustrates:
int q_next(Q *q, int idx) {
    /* return the q index succeeding this, with wrap */
    if (idx + 1 == q->len) {
        return 0;
    } else {
        return idx + 1;
    }
}
void q_get(Q *q, Item *p) {
    Lock(q);
    while (q->head == q->tail) {
        Wait(q);
    }
    *p = q->data[q->tail];
    if (q_next(q, q->head) == q->tail) {
        /* q was full, now has space */
        Broadcast(q);
    }
    q->tail = q_next(q, q->tail);
    Unlock(q);
}
void q_put(Q *q, Item *p) {
    Lock(q);
    while (q_next(q, q->head) == q->tail) {
        Wait(q);
    }
    q->data[q->head] = *p;
    if (q->head == q->tail) {
        /* q was empty, data available */
        Broadcast(q);
    }
    q->head = q_next(q, q->head);
    Unlock(q);
}
This is a multi-reader, multi-writer queue. Writers wait until there is space in the queue, put the item in, and if the queue was previously empty, broadcast to indicate there is now data.
Readers wait until there is something in the queue, take the item from the queue, and if the queue was previously full, broadcast to indicate there is now space.
Note the condition variable is being used for two conditions {not full, not empty}. These are edge-triggered conditions: only the transitions from full and from empty are signaled.
q_get and q_put protect themselves from spurious wakeups caused by both [1] and [2], and you can readily instrument the code to show how often this happens.
We're programming on a proprietary embedded platform sitting atop of VxWorks 5.5. In our toolbox, we have a condition variable, that is implemented using a VxWorks binary semaphore.
Now, POSIX provides a wait function that also takes a mutex. This will unlock the mutex (so that some other task can write to the data) and wait for the other task to signal (that it is done writing the data). I believe this implements what's called a monitor, ICBWT.
We need such a wait function, but implementing it is tricky. A simple approach would do this:
bool condition::wait_for(mutex& mutex) const {
unlocker ul(mutex); // relinquish mutex
return wait(event);
} // ul's dtor grabs mutex again
However, this sports a race condition because it allows another task to preempt this one after the unlocking and before the waiting. The other task can write to the data after it was unlocked and signal the condition before this task starts to wait on the semaphore. (We have tested this; it does indeed happen and blocks the waiting task forever.)
Given that VxWorks 5.5 doesn't seem to provide an API to temporarily relinquish a semaphore while waiting for a signal, is there a way to implement this on top of the provided synchronization routines?
Note: This is a very old VxWorks version that has been compiled without POSIX support (by the vendor of the proprietary hardware, from what I understood).
This should be quite easy with native VxWorks: a message queue is what is required here. Your wait_for method can be used as is.
bool condition::wait_for(mutex& mutex) const
{
unlocker ul(mutex); // relinquish mutex
return wait(event);
} // ul's dtor grabs mutex again
but the wait(event) code would look like this:
wait(event)
{
if (msgQRecv(event->q, sigMsgBuf, sigMsgSize, timeoutTime) == OK)
{
// got it...
}
else
{
// timeout, report error or something like that....
}
}
and your signal code would look something like this:
signal(event)
{
msgQSend(event->q, sigMsg, sigMsgSize, NO_WAIT, MSG_PRI_NORMAL);
}
So if the signal gets triggered before you start waiting, then msgQRecv will return immediately with the signal when it eventually gets invoked and you can then take the mutex again in the ul dtor as stated above.
The event->q is a MSG_Q_ID that is created at event creation time with a call to msgQCreate, and the data in sigMsg is defined by you... but can be just a random byte of data, or you can come up with a more intelligent structure with information regarding who signaled or something else that may be nice to know.
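For completeness, the creation step could be sketched like this (assuming your own event struct; a queue depth of 1 should suffice, since a second msgQSend with NO_WAIT while a signal is already pending simply fails, which matches the semantics of an already-set event):
createEvent()
{
    /* A depth of 1 is enough: a second msgQSend(NO_WAIT) while a
       signal is pending simply fails, like setting an already-set event. */
    event->q = msgQCreate(1 /* maxMsgs */, sigMsgSize, MSG_Q_FIFO);
    return event;
}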
Update for multiple waiters. This is a little tricky, so there are a couple of assumptions I will make to simplify things:
The number of tasks that will be pending is known at event creation time and is constant.
There will be one task that is always responsible for indicating when it is ok to unlock the mutex, all other tasks just want notification when the event is signaled/complete.
This approach uses a counting semaphore, similar to the above with just a little extra logic:
wait(event)
{
if (semTake(event->csm, timeoutTime) == OK)
{
// got it...
}
else
{
// timeout, report error or something like that....
}
}
and your signal code would look something like this:
signal(event)
{
for (int x = 0; x < event->numberOfWaiters; x++)
{
semGive(event->csm);
}
}
The creation of the event is something like this; remember, in this example the number of waiters is constant and known at event creation time. You could make it dynamic, but the key is that every time the event is going to happen, the numberOfWaiters must be correct before the unlocker unlocks the mutex.
createEvent(numberOfWaiters)
{
event->numberOfWaiters = numberOfWaiters;
event->csm = semCCreate(SEM_Q_FIFO, 0);
return event;
}
You cannot be wishy-washy about the numberOfWaiters :D I will say it again: The numberOfWaiters must be correct before the unlocker unlocks the mutex. To make it dynamic (if that is a requirement) you could add a setNumWaiters(numOfWaiters) function, and call that in the wait_for function before the unlocker unlocks the mutex, so long as it always sets the number correctly.
Now for the last trick: as stated above, the assumption is that one task is responsible for unlocking the mutex and the rest just wait for the signal, which means that one and only one task will call the wait_for() function above, and the rest of the tasks just call the wait(event) function.
With this in mind the numberOfWaiters is computed as follows:
The number of tasks who will call wait()
plus 1 for the task that calls wait_for()
Of course you can also make this more complex if you really need to, but chances are this will work because normally 1 task triggers an event, but many tasks want to know it is complete, and that is what this provides.
But your basic flow is as follows:
init()
{
event = createEvent(3);
}
eventHandler()
{
locker l(mutex);
doEventProcessing();
signal(event);
}
taskA()
{
doOperationThatTriggersAnEvent();
wait_for(mutex);
eventComplete();
}
taskB()
{
doWhateverIWant();
// now I need to know if the event has occurred...
wait(event);
coolNowIKnowThatIsDone();
}
taskC()
{
taskCIsFun();
wait(event);
printf("event done!\n");
}
When I write the above I feel like all OO concepts are dead, but hopefully you get the idea. In reality, wait and wait_for should take the same parameter, or no parameter but rather be members of the same class that also has all the data they need to know... but nonetheless that is the overview of how it works.
Race conditions can be avoided if each waiting task waits on a separate binary semaphore.
These semaphores must be registered in a container which the signaling task uses to unblock all waiting tasks. The container must be protected by a mutex.
The wait_for() method obtains a binary semaphore, waits on it and finally deletes it.
void condition::wait_for(mutex& mutex) {
SEM_ID sem = semBCreate(SEM_Q_PRIORITY, SEM_EMPTY);
{
lock l(listeners_mutex); // assure exclusive access to listeners container
listeners.push_back(sem);
} // l's dtor unlocks listeners_mutex again
unlocker ul(mutex); // relinquish mutex
semTake(sem, WAIT_FOREVER);
{
lock l(listeners_mutex);
// remove sem from listeners
// ...
semDelete(sem);
}
} // ul's dtor grabs mutex again
The signal() method iterates over all registered semaphores and unlocks them.
void condition::signal() {
lock l(listeners_mutex);
std::for_each(listeners.begin(), listeners.end(), semGive); // unblock every waiter
}
This approach assures that wait_for() will never miss a signal. A disadvantage is the need for additional system resources.
To avoid creating and destroying semaphores for every wait_for() call, a pool could be used.
From the description, it looks like you may want to implement (or use) a semaphore - it's a standard CS algorithm with semantics similar to condvars, and there are tons of textbooks on how to implement them (https://www.google.com/search?q=semaphore+algorithm).
A random Google result which explains semaphores is at: http://www.cs.cornell.edu/courses/cs414/2007sp/lectures/08-bakery.ppt (see slide 32).
I have multiple threads processing multiple files in the background, while the program is idle.
To improve disk throughput, I use critical sections to ensure that no two threads ever use the same disk simultaneously.
The (pseudo-)code looks something like this:
void RunThread(HANDLE fileHandle)
{
// Acquire CRITICAL_SECTION for disk
CritSecLock diskLock(GetDiskLock(fileHandle));
for (...)
{
// Do some processing on file
}
}
Once the user requests a file to be processed, I need to stop all threads -- except the one which is processing the requested file. Once the file is processed, then I'd like to resume all the threads again.
Given the fact that SuspendThread is a bad idea, how do I go about stopping all threads except the one that is processing the relevant input?
What kind of threading objects/features would I need -- mutexes, semaphores, events, or something else? And how would I use them? (I'm hoping for compatibility with Windows XP.)
I recommend you go about it in a completely different fashion. If you really want only one thread for every disk (I'm not convinced this is a good idea) then you should create one thread per disk, and distribute files as you queue them for processing.
To implement priority requests for specific files I would then have a thread check a "priority slot" at several points during its normal processing (and of course in its main queue wait loop).
The difficulty here isn't priority as such; it's the fact that you want a thread to back out of a lock that it's holding, to let another thread take it. "Priority" relates to which of a set of runnable threads should be scheduled to run -- you want to make a thread runnable that isn't (because it's waiting on a lock held by another thread).
So, you want to implement (as you put it):
if (ThisThreadNeedsToSuspend()) { ReleaseDiskLock(); WaitForResume(); ReacquireDiskLock(); }
Since you're (wisely) using a scoped lock I would want to invert the logic:
while (file_is_not_finished) {
WaitUntilThisThreadCanContinue();
CritSecLock diskLock(blah);
process_part_of_the_file();
}
ReleasePriority();
...
void WaitUntilThisThreadCanContinue() {
MutexLock lock(thread_priority_mutex);
while (thread_with_priority != NOTHREAD and thread_with_priority != thisthread) {
condition_variable_wait(thread_priority_condvar);
}
}
void GiveAThreadThePriority(threadid) {
MutexLock lock(thread_priority_mutex);
thread_with_priority = threadid;
condition_variable_broadcast(thread_priority_condvar);
}
void ReleasePriority() {
MutexLock lock(thread_priority_mutex);
if (thread_with_priority == thisthread) {
thread_with_priority = NOTHREAD;
condition_variable_broadcast(thread_priority_condvar);
}
}
Read up on condition variables -- all recent OSes have them, with similar basic operations. They're also in Boost and in C++11.
If it's not possible for you to write a function process_part_of_the_file, then you can't structure it this way. Instead you need a scoped lock that can release and regain the disk lock. The easiest way to do that is to make it a mutex; then you can wait on a condvar using that same mutex. You can still use the mutex/condvar pair and the thread_with_priority object in much the same way.
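A minimal sketch of that shape, using C++11 names (file_is_not_finished and process_part_of_the_file are the same placeholders as above); the disk lock becomes the condvar's mutex, so waiting releases it atomically:
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex disk_mutex;                 // doubles as the per-disk lock
std::condition_variable priority_cv;   // same role as thread_priority_condvar
std::thread::id thread_with_priority;  // default-constructed id acts as NOTHREAD

void ProcessFile()
{
    std::unique_lock<std::mutex> disk(disk_mutex);
    while (file_is_not_finished()) {   // hypothetical, as in the text above
        // Atomically release the disk lock and sleep while another thread
        // has priority; the lock is reacquired before wait() returns.
        priority_cv.wait(disk, [] {
            return thread_with_priority == std::thread::id()
                || thread_with_priority == std::this_thread::get_id();
        });
        process_part_of_the_file();    // hypothetical unit of work
    }
}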
You choose the size of "part of the file" according to how responsive you need the system to be to a change in priority. If you need it to be extremely responsive then the scheme doesn't really work -- this is co-operative multitasking.
I'm not entirely happy with this answer, the thread with priority can be starved for a long time if there are a lot of other threads that are already waiting on the same disk lock. I'd put in more thought to avoid that. Possibly there should not be a per-disk lock, rather the whole thing should be handled under the condition variable and its associated mutex. I hope this gets you started, though.
You may ask the threads to stop gracefully. Just check some variable in a loop inside the threads and continue or terminate work depending on its value.
Some thoughts about it:
The setting and checking of this value should be done inside a critical section.
Because the critical section slows down the thread, the check should be done often enough to stop the thread quickly when needed, yet rarely enough that the thread won't be stalled by acquiring and releasing the critical section.
After each worker thread processes a file, check a condition variable associated with that thread. The condition variable could be implemented simply as a bool plus a critical section, or with the InterlockedExchange* functions. And to be honest, I usually just use an unprotected bool between threads to signal "need to exit" -- sometimes with an event handle if the worker thread could be sleeping.
After setting the condition variable for each thread, the main thread waits for each thread to exit via WaitForSingleObject.
DWORD __stdcall WorkerThread(void* pThreadData)
{
    ThreadData* pData = (ThreadData*) pThreadData;
    while (pData->GetNeedToExit() == false)
    {
        ProcessNextFile();
    }
    return 0;
}
void StopWorkerThread(HANDLE hThread, ThreadData* pData)
{
    pData->SetNeedToExit();
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
}
struct ThreadData
{
    CRITICAL_SECTION _cs;
    bool _NeedToExit;
    ThreadData() : _NeedToExit(false)
    {
        InitializeCriticalSection(&_cs);
    }
    ~ThreadData()
    {
        DeleteCriticalSection(&_cs);
    }
    void SetNeedToExit()
    {
        EnterCriticalSection(&_cs);
        _NeedToExit = true;
        LeaveCriticalSection(&_cs);
    }
    bool GetNeedToExit()
    {
        bool returnvalue;
        EnterCriticalSection(&_cs);
        returnvalue = _NeedToExit;
        LeaveCriticalSection(&_cs);
        return returnvalue;
    }
};
You can also use a pool of threads and regulate their work by using an I/O completion port.
Normally, threads from the pool would sleep awaiting I/O completion port activity.
When you have a request, the I/O completion port releases a thread and it starts to do the job.
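A minimal sketch of that arrangement; WorkItem and ProcessWorkItem are hypothetical placeholders for your own work representation:
#include <windows.h>

HANDLE g_iocp;  // = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

DWORD WINAPI PoolThread(void* unused)
{
    DWORD bytes;
    ULONG_PTR key;
    OVERLAPPED* ov;
    for (;;)
    {
        // Sleeps in the kernel until a packet is posted; no CPU used meanwhile.
        if (!GetQueuedCompletionStatus(g_iocp, &bytes, &key, &ov, INFINITE))
            continue;
        if (key == 0)
            return 0;                     // 0 used here as a shutdown sentinel
        ProcessWorkItem((WorkItem*)key);  // hypothetical handler
    }
}

void SubmitWork(WorkItem* item)
{
    // Never blocks; wakes one pool thread if any are waiting.
    PostQueuedCompletionStatus(g_iocp, 0, (ULONG_PTR)item, NULL);
}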
OK, how about this:
Two threads per disk, for high and low priority requests, each with its own input queue.
A high-priority disk task, when initially submitted, will then issue its disk requests in parallel with any low-priority task that is running. It can reset a ManualResetEvent that the low-priority thread waits on (with WaitForSingleObject) when it can, and so the low-priority thread will be blocked while the high-priority thread is performing disk ops. The high-priority thread should set the event after finishing a task.
This should limit the disk-thrashing to the interval (if any) between the submission of the high-priority task and whenever the low-priority thread can next wait on the MRE. Raising the CPU priority of the thread servicing the high-priority queue may help improve performance of the high-priority work in this interval.
Edit: by 'queue', I mean a thread-safe, blocking, producer-consumer queue (just to be clear :).
More edit: if the issuing thread needs notification of job completion, the tasks issued to the queues could contain an 'OnCompletion' event to call with the task object as a parameter. The event handler could, for example, signal an AutoResetEvent that the originating thread is waiting on, so providing synchronous notification.
I have a main program which creates a collection of N child threads to perform some calculations. Each child is going to be fully occupied on their tasks from the moment their threads are created till the moment they have finished. The main program will also create a special (N+1)th thread which has some intermittent tasks to perform. When certain conditions are met (like a global variable takes on a certain value) the special thread will perform a calculation and then go back to waiting for those conditions to be met again. It is vital that when the N+1th thread has nothing to do, it should not slow down the other processors.
Can someone suggest how to achieve this?
EDIT:
The obvious but clumsy way would be like this:
// inside one of the standard worker child threads...
if (time_for_one_of_those_intermittent_calculations_to_be_done())
{
global_flag_set = TRUE;
}
and
// inside the special (N+1)th thread
for(;;)
{
if (global_flag_set == TRUE)
{
perform_big_calculation();
global_flag_set = FALSE;
}
// sleep for a while?
}
You should check out the WaitForSingleObject and WaitForMultipleObjects functions in the Windows API.
A ready-to-use condition class for WIN32 ;)
class Condition {
private:
HANDLE m_condition;
Condition( const Condition& ) {} // non-copyable
public:
Condition() {
m_condition = CreateEvent( NULL, TRUE, FALSE, NULL );
}
~Condition() {
CloseHandle( m_condition );
}
void Wait() {
WaitForSingleObject( m_condition, INFINITE );
ResetEvent( m_condition );
}
bool Wait( DWORD ms ) {
DWORD result = WaitForSingleObject( m_condition, ms );
ResetEvent( m_condition );
return result == WAIT_OBJECT_0;
}
void Signal() {
SetEvent( m_condition );
}
};
Usage:
// inside one of the standard worker child threads...
if( time_for_one_of_those_intermittent_calculations_to_be_done() ) {
global_flag_set = TRUE;
condition.Signal();
}
// inside the special (N+1)th thread
for(;;) {
if( global_flag_set==FALSE ) {
condition.Wait(); // sends thread to sleep, until signalled
}
if (global_flag_set == TRUE) {
perform_big_calculation();
global_flag_set = FALSE;
}
}
NOTE: you have to add a lock (e.g. a critical section) around global_flag_set. Also, in most cases the flag should be replaced with a queue or at least a counter (a thread could signal multiple times while the 'special' thread is performing its calculations).
Yes. Use condition variables. If you sleep on a condition variable, the thread will be removed from the runqueue until the condition variable has been signaled.
You should use Windows synchronization events for this, so your thread is doing nothing while waiting. See MSDN for more info; I'd start with CreateEvent(), and then go to the rest of the Event-related functions here for OpenEvent(), PulseEvent(), SetEvent() and ResetEvent().
And, of course, WaitForSingleObject() or WaitForMultipleObjects(), as pointed out by mrduclaw in the comment below.
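A minimal sketch of what that looks like for the (N+1)th thread, assuming an auto-reset event (the names are illustrative):
HANDLE g_hWork;  // = CreateEvent(NULL, FALSE /* auto-reset */, FALSE, NULL);

DWORD WINAPI SpecialThread(void* unused)
{
    for (;;)
    {
        // Uses no CPU while waiting; the event resets itself on wakeup.
        WaitForSingleObject(g_hWork, INFINITE);
        perform_big_calculation();
    }
}

// In a worker thread, replace the bare flag-set with:
//   SetEvent(g_hWork);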
Lacking the more preferred options already given, I generally just yield the CPU in a loop until the desired condition is met.
Basically, you have two possibilities for your N+1th thread.
If its work is rare, the best thing to do is simply to ask it to sleep, and wake it up on demand. Rare context switches are insignificant.
If it has to work often, then you may need to spinlock it, that is, keep it in a busy-waiting state that prevents it from being rescheduled or switched out.
Each global variable should have an accompanying event for your N+1th thread. Whenever you change the value of the global variable, set the event to the signaled state. It is better to hide these variables as private properties of a singleton class and expose functions to get and set the values; the setter will do the comparison and set the events as needed. So, your N+1th thread will just run a loop of WaitForMultipleObjects with an infinite timeout.
Another global variable should be used to signal that the application as a whole is exiting, so the threads will be able to exit. You may only exit your application after your last thread has finished, so if you need to exit prematurely, you have to notify all your threads that they have to exit. Threads that are permanently running can be notified by just reading a variable periodically; those that are waiting, like the N+1th thread, should be notified by an event.
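A minimal sketch of that loop, assuming one exit event plus one event per watched variable (the recalc handlers are hypothetical):
HANDLE g_events[3];  // [0] = application exit, [1] = varA changed, [2] = varB changed

DWORD WINAPI SpecialThread(void* unused)
{
    for (;;)
    {
        DWORD r = WaitForMultipleObjects(3, g_events, FALSE, INFINITE);
        switch (r)
        {
        case WAIT_OBJECT_0:     return 0;                       // exit requested
        case WAIT_OBJECT_0 + 1: recalculate_for_varA(); break;  // hypothetical
        case WAIT_OBJECT_0 + 2: recalculate_for_varB(); break;  // hypothetical
        }
    }
}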
People have suggested to use CreateEvent (to create auto-reset events), SetEvent and WaitForMultipleObjects. I agree with them.
Other people have suggested, in addition to the above functions, using ResetEvent and PulseEvent. I do not agree with them. You don't need ResetEvent with auto-reset events; it is the function supposed to be used with manual-reset events, and the application of manual-reset events is very limited, as you will see below.
To create an auto-reset event, call the CreateEvent Win32 API function with the bManualReset parameter set to FALSE. (If it is TRUE, the function creates a manual-reset event object, which requires the ResetEvent function to set the event state to non-signaled -- this is not what you need.) If this parameter is FALSE, the function creates an auto-reset event object, and the system automatically resets the event state to non-signaled after a single waiting thread has been released, i.e. has exited from a function like WaitForMultipleObjects or WaitForSingleObject. But, as I wrote before, only one thread will be notified, not all, so you need one event for each thread that waits. Since you are going to have just one waiting thread, you will need just one event.
As for PulseEvent -- it is unreliable and should never be used; see https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
Only those threads that are in the wait state at the moment PulseEvent is called are notified by it. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released, because PulseEvent releases only those threads that are waiting at the moment it is called. You can find out more about kernel-mode Asynchronous Procedure Calls (APCs) at the following links:
- https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
- http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
- http://www.osronline.com/article.cfm?id=75
You can get more ideas about auto-reset events and manual reset events from the following article:
- https://www.codeproject.com/Articles/39040/Auto-and-Manual-Reset-Events-Revisited
As for manual-reset events, they too can be used under certain conditions and in certain cases. You can reliably use them when you need to notify multiple threads of a global state change that occurs only once, for example application exit.
You just have one waiting thread, but maybe in future you will have more waiting threads, so this information will be useful.
Auto-reset events can only be used to notify one thread (if more threads are waiting simultaneously for an auto-reset event and you set the event, just one thread will exit and reset it, and the behavior of the other threads will be undefined). From the Microsoft documentation we might assume that only one thread will exit while the others will not, but this is not very clear. However, we must take the following quote into consideration: "Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order." Source: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx
So, when you need to notify all the threads very quickly, just set the manual-reset event to the signaled state (by calling SetEvent), rather than signaling each auto-reset event for each thread. Once you have signaled the manual-reset event, do not call ResetEvent from then on. The drawback of this solution is that the threads need to have an additional event handle passed in the array of their WaitForMultipleObjects. The array size is limited to MAXIMUM_WAIT_OBJECTS, which is 64, although in practice we never came close to this limit.
At first glance, the Microsoft documentation may seem to be full of jargon, but over time you will find it very easy and friendly. Anyway, correct multi-threaded work is not an easy topic, so you have to tolerate a certain amount of jargon 😉
I have the following code:
#include <windows.h>
#include <list>

class TimeOutException
{};
template <typename T>
class MultiThreadedBuffer
{
public:
MultiThreadedBuffer()
{
InitializeCriticalSection(&m_csBuffer);
m_evtDataAvail = CreateEvent(NULL, TRUE, FALSE, NULL);
}
~MultiThreadedBuffer()
{
CloseHandle(m_evtDataAvail);
DeleteCriticalSection(&m_csBuffer);
}
void LockBuffer()
{
EnterCriticalSection(&m_csBuffer);
}
void UnlockBuffer()
{
LeaveCriticalSection(&m_csBuffer);
}
void Add(T val)
{
LockBuffer();
m_buffer.push_back(val);
SetEvent(m_evtDataAvail);
UnlockBuffer();
}
T Get(DWORD timeout)
{
T val;
if (WaitForSingleObject(m_evtDataAvail, timeout) == WAIT_OBJECT_0) {
LockBuffer();
if (!m_buffer.empty()) {
val = m_buffer.front();
m_buffer.pop_front();
}
if (m_buffer.empty()) {
ResetEvent(m_evtDataAvail);
}
UnlockBuffer();
} else {
throw TimeOutException();
}
return val;
}
bool IsDataAvail()
{
return (WaitForSingleObject(m_evtDataAvail, 0) == WAIT_OBJECT_0);
}
std::list<T> m_buffer;
CRITICAL_SECTION m_csBuffer;
HANDLE m_evtDataAvail;
};
Unit testing shows that this code works fine when used on a single thread, as long as T's default constructor and copy/assignment operators don't throw. Since I'm writing T, that is acceptable.
My problem is the Get method. If there is no data available (i.e. m_evtDataAvail is not set), then a couple of threads can block on the WaitForSingleObject call. When new data becomes available, they all fall through to the LockBuffer() call. Only one will pass and can get the data out and move on. After the UnlockBuffer(), another thread can move on through and will find that there is no data. Currently it will return the default object.
What I want to happen is for that second thread (and any others) to go back to the WaitForSingleObject call. I could add an else block that unlocks and does a goto, but that just feels evil.
That solution also adds the possibility for an endless loop since each trip back would restart the timeout. I could add some code to check the clock on entry and adjust the timeout on each trip back but then this simple Get method starts to get very complicated.
Any ideas on how to solve these problems while maintaining testability and simplicity?
Oh, for anyone wondering, the IsDataAvail function only exists for testing. It won't be used in production code. The Add and Get are the only methods that will be used in a non-testing environment.
You need to create an auto-reset event instead of a manual-reset event. This guarantees that if multiple threads are waiting on an event and the event is set, only one thread will be released; all other threads will remain in the waiting state. You can create an auto-reset event by passing FALSE as the second parameter of the CreateEvent API. Also, note that this code is not exception safe: after locking the buffer, if some statement throws an exception, your critical section will not be unlocked. Use the RAII principle to ensure that your critical section gets unlocked even in the case of exceptions.
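For illustration, a minimal RAII guard along those lines (a sketch; the LockBuffer()/UnlockBuffer() pairs would be replaced by its scope):
class BufferLock
{
    CRITICAL_SECTION& m_cs;
    BufferLock(const BufferLock&);             // non-copyable
    BufferLock& operator=(const BufferLock&);
public:
    explicit BufferLock(CRITICAL_SECTION& cs) : m_cs(cs)
    {
        EnterCriticalSection(&m_cs);
    }
    ~BufferLock()  // runs even during stack unwinding after a throw
    {
        LeaveCriticalSection(&m_cs);
    }
};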
You could use a Semaphore object instead of a generic Event object. The semaphore count should be initialized to 0 and incremented by 1 with ReleaseSemaphore each time Add is called. That way the WaitForSingleObject in Get will never release more threads to read from the buffer than there are values in the buffer.
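A sketch of Add and Get reworked that way (error handling omitted; LONG_MAX from <climits> as the maximum count is an arbitrary choice):
// In the constructor:
//   m_semDataAvail = CreateSemaphore(NULL, 0 /* initial */, LONG_MAX, NULL);

void Add(T val)
{
    LockBuffer();
    m_buffer.push_back(val);
    UnlockBuffer();
    ReleaseSemaphore(m_semDataAvail, 1, NULL);  // count now equals buffered items
}

T Get(DWORD timeout)
{
    if (WaitForSingleObject(m_semDataAvail, timeout) != WAIT_OBJECT_0)
        throw TimeOutException();
    LockBuffer();
    T val = m_buffer.front();  // never empty here: one successful wait per item
    m_buffer.pop_front();
    UnlockBuffer();
    return val;
}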
You will always have to code for the case where the event is signaled but there is no data, even WITH auto-reset events. There is a race condition from the moment WaitForSingleObject wakes until LockBuffer is called, and in that interval another thread can pop the data from the buffer. Your code must place the WaitForSingleObject in a loop, decreasing the timeout by the time already spent in each loop iteration...
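A sketch of Get with such a loop, using GetTickCount64 (available on Vista and later) to shrink the timeout on each iteration:
T Get(DWORD timeout)
{
    T val;
    ULONGLONG deadline = GetTickCount64() + timeout;
    for (;;)
    {
        ULONGLONG now = GetTickCount64();
        DWORD remaining = (now < deadline) ? (DWORD)(deadline - now) : 0;
        if (WaitForSingleObject(m_evtDataAvail, remaining) != WAIT_OBJECT_0)
            throw TimeOutException();
        LockBuffer();
        if (!m_buffer.empty())
        {
            val = m_buffer.front();
            m_buffer.pop_front();
            if (m_buffer.empty())
                ResetEvent(m_evtDataAvail);
            UnlockBuffer();
            return val;
        }
        UnlockBuffer();  // lost the race to another reader; wait again
    }
}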
As an alternative, may I interest you in more scalable and performant alternatives? Interlocked singly linked lists, the OS thread pool's QueueUserWorkItem, and idempotent processing. Add pushes an entry onto the list and submits a work item. The work item pops an entry and, if it is not NULL, processes it. You can go fancy and have extra logic for the processor to loop and keep a state marking its 'active' presence so that Add does not queue unnecessary work items, but that is not strictly required. For even higher scale and multi-core/multi-CPU load spread, I recommend using queued completion ports. The details are described in Rick Vicik's articles; I have a blog entry that links all three at once: High Performance Windows programs.