Trying to minimize checks of atomics on every iteration - c++

From a multithreading perspective, is the following correct or incorrect?
I have an app which has 2 threads: the main thread, and a worker thread.
The main thread has a MainUpdate() function that gets called in a continuous loop. As part of its job, that MainUpdate() function might call a ToggleActive() method on the worker objects running on the worker thread. That ToggleActive() method is used to turn the worker objects on/off.
The flow is something like this.
// MainThread
while(true) {
    MainUpdate(...);
}
void MainUpdate(...) {
    for(auto& obj: objectsInWorkerThread) {
        if (foo())
            obj.ToggleActive(getBool());
    }
}
// Worker thread example worker ------------------------------
struct SomeWorkerObject {
    void Execute(...) {
        if(mIsActive == false) // %%%%%%% THIS!
            return;
        Update(...);
    }
    void ToggleActive(bool active) {
        mIsActiveAtom = active; // %%%%%%% THIS!
        mIsActive = mIsActiveAtom; // %%%%%%% THIS!
    }
private:
    void Update(...) {...}
    std::atomic_bool mIsActiveAtom = true;
    volatile bool mIsActive = true;
};
I'm trying to avoid checking the atomic field on every invocation of Execute(), which gets called on every iteration of the worker thread. There are many worker objects running at any one time, so there would be many atomic field checks.
As you can see, I'm using the non-atomic field to check for activeness. The non-atomic field gets its value from the atomic field in ToggleActive().
From my tests, this seems to be working, but I have a feeling that it is incorrect.

A volatile variable only guarantees that accesses to it are not optimized away or reordered by the compiler relative to other volatile accesses; it has nothing to do with multithreaded execution. Therefore, your program does have a race condition, since ToggleActive and Execute can write and read mIsActive at the same time.
About performance, you can check whether your platform supports a lock-free atomic bool. If it does, checking the atomic value can be very fast. I remember seeing a benchmark somewhere that showed std::atomic<bool> having the same speed as volatile bool.
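The lock-free check suggested above can be done at compile time. Below is a minimal sketch, assuming C++11 or later and that the flag is a pure on/off switch that publishes no other data (so memory_order_relaxed is enough on both sides); Update() and the updates counter are hypothetical stand-ins for the real work:

```cpp
#include <atomic>
#include <cassert>

// ATOMIC_BOOL_LOCK_FREE == 2 means atomic<bool> is always lock-free here.
static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "atomic<bool> is not lock-free on this platform");

struct SomeWorkerObject {
    // Worker thread, every iteration: reads the SAME atomic the main thread writes.
    // Returns whether Update() ran.
    bool Execute() {
        if (!mIsActive.load(std::memory_order_relaxed))
            return false;
        Update();
        return true;
    }

    // Main thread.
    void ToggleActive(bool active) {
        mIsActive.store(active, std::memory_order_relaxed);
    }

    int updates = 0;  // touched only by the thread that calls Execute()

private:
    void Update() { ++updates; }  // hypothetical stand-in for the real work
    std::atomic<bool> mIsActive{true};
};
```

On x86-64 and ARM64 a relaxed load of an atomic<bool> typically compiles to an ordinary byte load, which is why it can cost about the same as reading the volatile bool.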

@hgminh is right, your code is not safe.
Synchronization is a two-way road: if one thread performs a thread-safe write, the other thread must perform a thread-safe read. If one thread uses a lock, the other thread must use the same lock.
Think about inter-thread communication as message passing (incidentally, it works exactly that way in modern CPUs). If both sides don't share a messaging channel (mIsActiveAtom), the message might not be delivered properly.
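The "shared channel" idea can be sketched with a release store paired with an acquire load. This is an illustration, not the asker's code; payload, ready and message_passing_demo() are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Both threads use the same channel (`ready`): the writer publishes with a
// release store, the reader receives with an acquire load. Once the reader
// sees ready == true, it is guaranteed to also see payload == 42.
int message_passing_demo() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the shared "messaging channel"

    std::thread writer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. send the message
    });

    int seen = 0;
    std::thread reader([&] {
        while (!ready.load(std::memory_order_acquire)) // 3. wait for the message
            std::this_thread::yield();
        seen = payload;                                // 4. now safe to read
    });

    writer.join();
    reader.join();
    return seen;
}
```

If the reader instead checked a separate plain bool copied from `ready`, step 3 would no longer synchronize with step 2, and reading payload would be a data race.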

Related

Worker Thread permanently hibernates, after executing too fast

I am trying to incorporate threads into my project, but have a problem where using merely 1 worker thread makes it "fall asleep" permanently. Perhaps I have a race condition, but I just can't spot it.
My PeriodicThreads object maintains a collection of threads. Once PeriodicThreads::exec_threads() has been invoked, the threads are notified, wake up and perform their task. Afterwards, they fall back to sleep.
The function run by each worker thread:
void PeriodicThreads::threadWork(size_t threadId){
    //not really used, but need to declare it to use condition_variable:
    std::mutex mutex;
    std::unique_lock<std::mutex> lck(mutex);
    while (true){
        // wait until told to start working on a task:
        while (_thread_shouldWork[threadId] == false){
            _threads_startSignal.wait(lck);
        }
        thread_iteration(threadId); //virtual function
        _thread_shouldWork[threadId] = false; //vector of flags
        _thread_doneSignal.notify_all();
    }//end while(true) - run until terminated externally or this whole obj is deleted
}
As you can see, each thread monitors its own entry in a vector of flags; once it sees that its flag is true, it performs the task and then resets its flag.
Here is the function that can awaken all the threads:
std::atomic_bool _threadsWorking =false;
//blocks the current thread until all worker threads have completed:
void PeriodicThreads::exec_threads(){
    if(_threadsWorking ){
        throw std::runtime_error("you requested exec_threads(), but threads haven't yet finished executing the previous task!");
    }
    _threadsWorking = true;//NOTICE: doing this after the exception check.
    //tell all threads to unpause by setting their flags to 'true'
    std::fill(_thread_shouldWork.begin(), _thread_shouldWork.end(), true);
    _threads_startSignal.notify_all();
    //wait for threads to complete:
    std::mutex mutex;
    std::unique_lock<std::mutex> lck(mutex); //lock & mutex are not really used.
    auto isContinueWaiting = [&]()->bool{
        bool threadsWorking = false;
        for (size_t i=0; i<_thread_shouldWork.size(); ++i){
            threadsWorking |= _thread_shouldWork[i];
        }
        return threadsWorking;
    };
    while (isContinueWaiting()){
        _thread_doneSignal.wait(lck);
    }
    _threadsWorking = false;//set atomic to false
}
Invoking exec_threads() works fine for several hundred or in rare cases several thousand consecutive iterations. Invocations occur from the main thread's while loop. Its worker thread processes the task, resets its flag and goes back to sleep until the next exec_threads(), and so on.
However, some time after that, the program falls into a "hibernation": it seems to pause, but doesn't crash.
During such a "hibernation", putting a breakpoint in any of the while-loops around my condition_variables never actually causes that breakpoint to trigger.
Being sneaky, I've created my own verify-thread (parallel to main) that monitors my PeriodicThreads object. As the program falls into hibernation, my verify-thread keeps reporting to the console that no threads are currently running (the _threadsWorking atomic of PeriodicThreads is permanently set to false). However, in other test runs the atomic remains true once the "hibernation issue" begins.
The strange thing is that if I force PeriodicThreads::run_thread to sleep for at least 10 microseconds before resetting its flag, things work as normal and no "hibernation" occurs. Otherwise, if we allow the thread to complete its task very quickly, it might cause this whole issue.
I've wrapped each condition_variable wait inside a while loop to guard against spurious wakeups triggering the transition, and against the situation where notify_all is called before .wait() is called.
Note that this occurs even when I have only 1 worker thread.
What could be the cause?
Edit
Abandoning these vector flags and just testing on a single atomic_bool with 1 worker thread still shows the same issue.
All shared data should be protected by a mutex. The mutex should have (at least) the same scope as the shared data.
Your _thread_shouldWork container is shared data. You can make a global array of mutexes and each one can protect its own _thread_shouldWork element. (see note below). You should also have at least as many condition variables as you have mutexes. (You can use 1 mutex with several different condition variables, but you should not use several different mutexes with 1 condition variable.)
A condition_variable should protect an actual condition (in this case, the state of an individual element of _thread_shouldWork at any given point) and the mutex is used to protect the variables that encompass that condition.
If you're just using a random local mutex (as you are in your thread code) or not using a mutex at all (in the main code), then all bets are off: it's undefined behavior, although I could see it working (by luck) most of the time. What I suspect is happening is that a worker thread is missing the signal from the main thread. It could also be that your main thread is missing the signal from a worker thread. (Thread A reads the state and enters the while loop, then Thread B changes the state and sends the notification, then Thread A goes to sleep... waiting for a notification that was already sent.)
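A minimal sketch of the fix described above, under my own simplifying assumptions: it uses one mutex for all shared state (the simplest form of the advice), replaces the vector of flags with a generation counter plus a count of unfinished workers, and treats thread_iteration() as an empty placeholder for the asker's virtual task. Because every wait re-checks its condition under the shared mutex, neither the start signal nor the done signal can be missed:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class PeriodicThreads {  // simplified stand-in named after the asker's class
public:
    explicit PeriodicThreads(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back([this] { threadWork(); });
    }
    ~PeriodicThreads() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        start_cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    // Wakes every worker once and blocks until all of them have finished.
    void exec_threads() {
        std::unique_lock<std::mutex> lk(m_);
        pending_ = threads_.size();
        ++generation_;                       // the "start" condition changes under the mutex
        start_cv_.notify_all();
        done_cv_.wait(lk, [this] { return pending_ == 0; });
    }
    int iterations() {
        std::lock_guard<std::mutex> lk(m_);
        return iterations_;
    }
private:
    void threadWork() {
        unsigned seen = 0;                   // last generation this worker handled
        std::unique_lock<std::mutex> lk(m_); // the SAME mutex the main thread uses
        for (;;) {
            start_cv_.wait(lk, [&] { return stop_ || generation_ != seen; });
            if (stop_) return;
            seen = generation_;
            lk.unlock();
            thread_iteration();              // the task runs without holding the lock
            lk.lock();
            ++iterations_;
            if (--pending_ == 0) done_cv_.notify_one();
        }
    }
    void thread_iteration() {}               // hypothetical placeholder for the real task

    std::mutex m_;
    std::condition_variable start_cv_, done_cv_;  // one mutex, two condition variables
    std::vector<std::thread> threads_;
    std::size_t pending_ = 0;
    unsigned generation_ = 0;
    bool stop_ = false;
    int iterations_ = 0;
};
```

No sleep is needed however quickly a worker finishes: if exec_threads() runs before a worker reaches its wait, the changed generation_ is seen as soon as that worker checks the predicate.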
Mutexes with local scope are a red flag!
Note: If you're using a vector, you have to watch out because adding or removing items can trigger a resize which will touch elements without grabbing the mutex first (because of course the vector doesn't know about your mutex).
You also have to watch out for false sharing when using arrays.
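A sketch of one common remedy for false sharing: padding each per-thread flag to its own cache line. The 64-byte line size is an assumption (typical for x86-64 and many ARM cores); where the standard library provides it, C++17's std::hardware_destructive_interference_size from <new> can replace the constant. PaddedFlag is a hypothetical name:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Each flag occupies a whole cache line, so one thread writing its flag does
// not keep invalidating the line that holds a neighbouring thread's flag.
struct alignas(kCacheLine) PaddedFlag {
    std::atomic<bool> value{false};
};

static_assert(sizeof(PaddedFlag) == kCacheLine, "one flag per cache line");
static_assert(alignof(PaddedFlag) == kCacheLine, "flags start on a line boundary");
```

A fixed array such as `PaddedFlag flags[8];` sidesteps the vector-resize concern from the note above; note also that std::atomic is neither copyable nor movable, so a std::vector of these cannot grow after construction anyway.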
Edit: Here's a video that @Kari found useful for explaining false sharing:
https://www.youtube.com/watch?v=dznxqe1Uk3E

How do I make a thread wait without polling?

I have a question about multithreading in C++. I have the following scenario:
void ThreadedRead(int32_t thread_num, BinReader reader) {
    while (!reader.endOfData) {
        thread_buckets[thread_num].clear();
        thread_buckets[thread_num] = reader.readnextbatch();
        thread_flags[thread_num] = THREAD_WAITING;
        while (thread_flags[thread_num] != THREAD_RUNNING) {
            // wait until awakened
            if (thread_flags[thread_num] != THREAD_RUNNING) {
                //go back to sleep
            }
        }
    }
    thread_flags[thread_num] = THREAD_FINISHED;
}
No section of the above code writes or accesses memory shared between threads. Each thread is assigned a thread_num and a unique reader object that it may use to read data.
I want the main thread to be able to notify a thread that is in the THREAD_WAITING state that its state has been changed back to THREAD_RUNNING and it needs to do some work. I don't want it to keep polling its state.
I understand condition variables and mutexes can help me, but I'm not sure how to use them, because I don't want to (or need to) acquire a lock. How can the main thread notify all waiting threads at once that they are now free to read more data?
EDIT:
Just in case anyone needs more details
1) reader reads some files
2) thread_buckets is a vector of vectors of uint16
3) thread_flags is an int vector
They have all been resized appropriately.
I realize that you wrote that you wanted to avoid condition variables and locks. On the other hand you mentioned that this was because you were not sure about how to use them. Please consider the following example to get the job done without polling:
The trick with condition variables is that a single condition_variable object together with a single mutex object will do the management for you, including the handling of the unique_lock objects in the worker threads. Since you tagged your question as C++, I assume you are talking about C++11 (or higher) multithreading (I guess that C pthreads may work similarly). Your code could be as follows:
// compile for C++11 or higher
#include <cstdint>
#include <thread>
#include <condition_variable>
#include <mutex>
// objects visible to both master and workers:
std::condition_variable cvr;
std::mutex mtx;
void ThreadedRead(int32_t thread_num, BinReader reader) {
    while (!reader.endOfData) {
        thread_buckets[thread_num].clear();
        thread_buckets[thread_num] = reader.readnextbatch();
        std::unique_lock<std::mutex> myLock(mtx);
        // This lock will be managed by the condition variable!
        thread_flags[thread_num] = THREAD_WAITING;
        while (thread_flags[thread_num] == THREAD_WAITING) {
            cvr.wait(myLock);
            // ...must be in a loop as shown because of potential spurious wake-ups
        }
    }
    // the flag is shared with the master, so write it under the same mutex
    std::lock_guard<std::mutex> finishLock(mtx);
    thread_flags[thread_num] = THREAD_FINISHED;
}
To (re-)activate the workers from a master thread:
{ // block...
    // step 1: usually make sure that there is no worker still preparing itself at the moment
    std::unique_lock<std::mutex> someLock(mtx);
    // (in your case this would not cover workers currently busy with reader.readnextbatch();
    // these would not be restarted this time...)
    // step 2: set all worker threads that should work now to THREAD_RUNNING
    for (...looping over the worker's flags...) {
        if (...corresponding worker should run now...) {
            flag = THREAD_RUNNING;
        }
    }
    // step 3: signal the workers to run now
    cvr.notify_all();
} // ...block, releasing someLock
Notice:
If you just want to trigger all sleeping workers you should control them with a single flag instead of a container of flags.
If you want to wake a single sleeping worker and it doesn't matter which one, consider the .notify_one() member function instead of .notify_all(). Note that in this case, too, a single mutex/condition_variable pair is sufficient.
The flags would be better placed in an atomic object such as a global std::atomic<int>, or, for finer control, in a std::vector<std::atomic<int>>.
A good introduction to std::condition_variable, which also inspired the suggested solution, is given on the cplusplus website.
It looks like there are a few issues. For one thing, you do not need the conditional inside of your loop:
while (thread_flags[thread_num] != THREAD_RUNNING);
will work by itself. As soon as that condition is false, the loop will exit.
If all you want is to avoid checking thread_flags in a tight loop, just put a sleep in the loop:
// needs <thread> and <chrono>; thread_flags elements should be std::atomic<int>, since with a plain int the compiler may hoist the read out of the loop
while (thread_flags[thread_num] != THREAD_RUNNING) std::this_thread::sleep_for(std::chrono::milliseconds(100));
This will cause the thread to give up the CPU, so that other threads can run while this one waits for its state to change. This will make the overhead of polling close to negligible. You can experiment with the sleep duration to find a good value; 100ms is probably on the long side.
Depending on what causes the thread state to change, you could have the thread poll that condition/value directly (still with a sleep in the loop) and not bother with states at all.
There are a lot of options here. If you look up reader threads you can probably find just what you want; having a separate reader thread is very common.

How does a mutex condition signaling loop work?

I will make a hypothetical scenario just to be clear about what I need to know.
Let's say I have a single file being updated very often.
I need to read and parse this file by several different threads.
Every time this file is rewritten, I'm going to wake a condition mutex so the other threads can do whatever they want to.
My question is:
If I have 10000 threads, will the first thread's execution block the execution of the other 9999?
Does it work in parallel or synchronously?
This post has been edited since first posted to address comments below by Jonathan Wakely, and to better distinguish between a condition_variable, a condition (which were both called condition in the first version), and how the wait function operates. Just as important, however, is an exploration of better methods from modern C++, using std::future, std::thread and std::packaged_task, with some discussion regarding buffering and reasonable thread count.
First, 10,000 threads is a lot of threads. The thread scheduler will be highly burdened on all but the very highest performance computers; typical quad core workstations under Windows would struggle. It's a sign that some kind of queued scheduling of tasks is in order, typical of servers accepting thousands of connections using perhaps 10 threads, each servicing 1,000 connections. The number of threads is not really central to the question, except that for such a volume of tasks 10,000 threads is impractical.
To handle synchronization, the mutex doesn't actually do what you're proposing, by itself. The concept you're describing is a type of event object, perhaps an auto reset event, which by itself is a higher level concept. Windows has them as part of its API, but they are fashioned on Linux (and for portable software, usually) with two primitive components, a mutex and a condition variable. Together these create the auto reset event, and other types of "waitable events" as Windows calls them. In C++ these are provided by std::mutex and std::condition_variable.
Mutexes by themselves merely provide locked control over a common resource. In that scenario we are not thinking in terms of clients and a server (or workers and an executive), but we're thinking in terms of competition among peers for a single resource which can only be accessed by one actor (thread) at a time. A mutex can block execution, but it does not release based on an external signal. Mutexes block if another thread has locked the mutex, and wait indefinitely until the owner of the lock releases it. This isn't the scenario you present in the question.
In your scenario, there are many "clients" and one "server" thread. The server is in charge of signalling that something is ready to be processed. All other threads are clients in this design (nothing about the thread itself makes them clients, we merely deem them so by the function they execute). In some discussions, clients are called worker threads.
The clients use a mutex/condition variable pair to wait for a signal. This construct usually takes the form of locking a mutex, then waiting on the condition variable using that mutex. When a thread enters wait on the condition variable, the mutex is unlocked. This is repeated for all client threads that wait for work to be done. A typical client wait example is:
std::mutex m;
std::condition_variable cv;
void client_thread()
{
    // Wait until server signals data is ready
    std::unique_lock<std::mutex> lk(m); // lock the mutex
    cv.wait(lk); // wait on cv
    // do the work
}
This is pseudo code showing the mutex/condition variable used together. std::condition_variable has two overloads of the wait function; this is the simplest one. The intent is that a thread will block, entering into an idle state until the condition_variable is signalled. It is not intended as a complete example, merely to point out that these two objects are used together.
Jonathan Wakely's comments below are based on the fact that wait does not necessarily block until a signal arrives: there is no guarantee that the call unblocked because of a signal. The documentation calls this a "spurious wakeup", which occasionally occurs for complex reasons of OS scheduling. Jonathan's point is that code using this pair must be safe to operate even if the wakeup is not because the condition_variable was signalled.
In the parlance of using condition variables, this is known as a condition (not the condition_variable). The condition is an application defined concept, usually illustrated as a boolean in the literature, and often the result of checking a bool, an integer (sometimes of atomic type) or calling a function returning a bool. Sometimes application defined notions of what constitutes a true condition are more complex, but the overall effect of the condition is to determine whether or not the thread, once awakened, should continue to process, or should simply repeat the wait.
One way to satisfy this requirement is the second version of std::condition_variable::wait. The two are declared:
void wait( std::unique_lock<std::mutex>& lock );
template< class Predicate >
void wait( std::unique_lock<std::mutex>& lock, Predicate pred );
Jonathan's point is to insist the second version be used. However, documentation describes (and the fact that there are two overloads indicates) that the Predicate is optional. The Predicate is a functor of some kind, often a lambda expression, resolving to true if the wait should unblock and false if the wait should continue waiting, and it is evaluated under lock. The Predicate is synonymous with condition in that the Predicate is one way to indicate true or false regarding whether wait should unblock.
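The relationship between the two overloads can be sketched as follows. The predicate overload behaves like the explicit loop in wait_with_loop(); both re-check the condition under the mutex, so they tolerate spurious wakeups and a notification sent before the wait starts. The names m, cv, ready and the three functions are hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool ready = false;  // the application-defined condition, protected by m

void wait_with_loop() {
    std::unique_lock<std::mutex> lk(m);
    while (!ready)   // re-check after every wakeup, spurious or not
        cv.wait(lk);
}

void wait_with_predicate() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return ready; });  // the same loop, written by the library
}

void signal_ready() {
    {
        std::lock_guard<std::mutex> lk(m);
        ready = true;  // change the condition under the mutex...
    }
    cv.notify_all();   // ...then notify
}
```

Calling signal_ready() before either waiter starts is harmless: each waiter finds ready == true and never blocks, whereas the bare cv.wait(lk) in client_thread() would sleep forever in that ordering.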
Although the Predicate is, in fact, optional, the notion that 'wait' is not perfect in blocking until a signal is received requires that if the first version is used, it is because the application is constructed such that spurious wakes have no consequence (indeed, are part of the design).
Jonathan's citation shows that the Predicate is evaluated under lock, but in generalized forms of the paradigm that's frequently not practicable. std::condition_variable must wait on a locked std::mutex, which may be protecting a variable defining the condition, but sometimes that's not possible. Sometimes the condition is more complex, external, or trivial enough that the std::mutex isn't associated with the condition.
To see how that works in the context of the proposed solution, assume there are 10 client threads waiting for a server to signal that work is to be done, and that work is scheduled in a queue as a container of virtual functors. A virtual functor might be something like:
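struct VFunc
{
    virtual ~VFunc() {}
    virtual void operator()(){}
};
template <typename T>
struct VFunctor : VFunc
{
    T fn; // something callable, possibly a std::function
    explicit VFunctor( T f ) : fn( f ) {}
    virtual void operator()(){ fn(); }
};
// stored by pointer: a deque< VFunc > would slice every VFunctor down to a VFunc
typedef std::deque< std::unique_ptr< VFunc > > Queue;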
The pseudo code above suggests a typical functor with a virtual operator(), returning void and taking no parameters, sometimes known as a "blind call". The key point in suggesting it is the fact Queue can own a collection of these without knowing what is being called, and whatever VFunctors are in Queue could refer to anything std::function might be able to call, which includes member functions of other objects, lambdas, simple functions, etc. If, however, there is only one function signature to be called, perhaps:
typedef std::deque< std::function<void(void)> > Queue;
is sufficient.
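A short sketch of such a Queue holding different kinds of callables. Counter, append_x() and drain() are hypothetical names introduced for illustration:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

typedef std::deque< std::function<void(void)> > Queue;

struct Counter { int n = 0; void bump() { ++n; } };  // hypothetical call target
void append_x(std::string& s) { s += 'x'; }          // a plain function

// Pops and calls every entry, front to back, without knowing what each wraps.
int drain(Queue& q) {
    int calls = 0;
    while (!q.empty()) {
        std::function<void(void)> f = std::move(q.front());
        q.pop_front();
        f();
        ++calls;
    }
    return calls;
}
```

For example, a lambda such as `[&c] { c.bump(); }`, a bound member function `std::bind(&Counter::bump, &c)` and a bound plain function `std::bind(append_x, std::ref(s))` can all be pushed onto the same Queue, and drain() calls each of them the same way.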
For either case, work is to be done only if there are entries in Queue.
To wait, one might use a class like:
class AutoResetEvent
{
private:
    std::mutex m;
    std::condition_variable cv;
    bool signalled;
    bool signalled_all;
    unsigned int wcount;
public:
    // initializers listed in declaration order (avoids -Wreorder warnings)
    AutoResetEvent() : signalled(false), signalled_all(false), wcount( 0 ) {}
    void SignalAll() { std::unique_lock<std::mutex> l(m);
        signalled = true;
        signalled_all = true;
        cv.notify_all();
    }
    void SignalOne() { std::unique_lock<std::mutex> l(m);
        signalled = true;
        cv.notify_one();
    }
    void Wait() { std::unique_lock<std::mutex> l(m);
        ++wcount;
        while( !signalled )
        {
            cv.wait(l);
        }
        --wcount;
        if ( signalled_all )
        { if ( wcount == 0 )
            { signalled = false;
                signalled_all = false;
            }
        }
        else { signalled = false;
        }
    }
};
This is pseudo code of a standard reset event type of waitable object, compatible with the Windows CreateEvent and WaitForSingleObject API and functioning in basically the same way.
All client threads end up at cv.wait (a timeout is possible here too, via std::condition_variable::wait_for or wait_until, much as WaitForSingleObject accepts one in the Windows API). At some point, the server signals the event with a call to Signalxxx. Your scenario suggests SignalAll().
If notify_one is called, one of the waiting threads is released, and all others remain asleep. If notify_all is called, then all threads waiting on that condition are released to do work.
The following might be an example of using AutoResetEvent:
AutoResetEvent evt; // probably not a global
void client()
{
    while( !Shutdown ) // assuming some bool to indicate shutdown
    {
        if ( IsWorkPending() ) DoWork();
        evt.Wait();
    }
}
void server()
{
    // gather data
    evt.SignalAll();
}
The use of IsWorkPending() satisfies the notion of a condition, as Jonathan Wakely indicates. Until a shutdown is indicated, this loop will process work if it's pending, and wait for a signal otherwise. Spurious wakeups have no negative effect. IsWorkPending() would check Queue.size(), possibly through an object which protects Queue with a std::mutex or some other synchronization mechanism. If work is pending, DoWork() would sequentially pop entries out of Queue until Queue is empty. Upon return, the loop would again wait for a signal.
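As a swapped-in alternative to AutoResetEvent (not the answer's code), the same client loop can be sketched with the work queue itself as the condition. The check for pending work and the wait happen under one mutex, so a Push() between the check and the wait cannot be missed. WorkQueue, Push(), Shutdown() and ClientLoop() are hypothetical names:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class WorkQueue {
public:
    void Push(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(job)); }
        cv_.notify_one();
    }
    void Shutdown() {
        { std::lock_guard<std::mutex> lk(m_); shutdown_ = true; }
        cv_.notify_all();
    }
    // Client loop: runs jobs until shutdown is requested AND the queue is empty.
    void ClientLoop() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            // "IsWorkPending() or shutdown", evaluated under the mutex
            cv_.wait(lk, [this] { return !q_.empty() || shutdown_; });
            if (q_.empty()) return;  // shutting down and nothing left to do
            std::function<void()> job = std::move(q_.front());
            q_.pop_front();
            lk.unlock();
            job();                   // DoWork() without holding the lock
            lk.lock();
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool shutdown_ = false;
};
```

Several client threads can run ClientLoop() on the same WorkQueue; each Push() wakes at most one of them, and Shutdown() lets them drain the remaining jobs before returning.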
With all of that discussed, the combination of mutex and condition_variable is related to an old style of thinking, now outdated in the era of C++11/C++14. Unless you have trouble using a compliant compiler, it would be better to investigate the use of std::promise, std::future and either std::async or std::thread with std::packaged_task. For example, using future, promise, packaged_task and thread could entirely replace the discussion above.
For example:
// a function for threads to execute
int func()
{
    int result = 0; // placeholder: do some work, store a status code here
    return result;
}
Assuming func does the work you require on the files, these typedefs apply:
typedef std::packaged_task< int() > func_task;
typedef std::future< int > f_int;
typedef std::shared_ptr< f_int > f_int_ptr;
typedef std::vector< f_int_ptr > f_int_vec;
std::future can't be copied, so it's stored using a shared_ptr for ease of use in a vector, but there are various solutions.
Next, an example of using these for 10 threads of work:
void executive_function()
{
    // a vector of future pointers
    f_int_vec future_list;
    // start some threads
    for( int n=0; n < 10; ++n )
    {
        // a packaged_task calling func
        func_task ft( &func );
        // get a future from the task as a shared_ptr
        f_int_ptr future_ptr( new f_int( ft.get_future() ) );
        // store the future for later use
        future_list.push_back( future_ptr );
        // launch a thread to call task
        std::thread( std::move( ft )).detach();
    }
    // at this point, 10 threads are running
    for( auto &d : future_list )
    {
        // for each future pointer, wait (block if required)
        // for each thread's func to return
        d->wait();
        // get the result of the func return value
        int res = d->get();
    }
}
The point here is really in the last range-for loop. The vector stores futures, which the packaged_tasks provided. Those tasks are used to launch threads, and the future is key to synchronizing the executive. Once all threads are running, each is "waited on" with a simple call to the future's wait function, after which the return value of func can be obtained. No mutexes or condition_variables involved (that we know of).
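The answer also mentions std::async. A minimal sketch of the same executive pattern with std::async, assuming func is changed to a hypothetical int func(int) so each task has some input; executive_function_async() is likewise a hypothetical name:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for func(): each task returns a status/result.
int func(int n) { return n * n; }

// Same shape as executive_function(), but std::async creates the thread and
// the future in one call. std::future is move-only, so it can be stored in a
// vector directly, without shared_ptr.
int executive_function_async() {
    std::vector<std::future<int>> futures;
    for (int n = 0; n < 10; ++n)
        futures.push_back(std::async(std::launch::async, func, n));

    int total = 0;
    for (auto& f : futures)
        total += f.get();  // get() blocks until that task has finished
    return total;
}
```

Passing std::launch::async forces each call onto a new thread; without it, the implementation may defer the call until get() is invoked.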
This brings me to the subject of processing files in parallel, no matter how you launch a number of threads. If there were a machine which could handle 10,000 threads, then if each thread were a trivial file oriented operation there would be considerable RAM resources devoted to file processing, all duplicating each other. Depending on the API chosen, there are buffers associated with each read operation.
Let's say the file was 10 Mbytes, and 10,000 threads began operating on it, where each thread used 4 Kbyte buffers for processing. Combined, that suggests there would be 40 Mbytes of buffers to process a 10 Mbyte file. It would be less wasteful to simply read the file into RAM, and offer read only access to all threads from RAM.
That notion is further complicated by the fact that multiple tasks reading from various sections of the file at different times may cause heavy thrashing on a standard hard disk (not so for flash sources), if the disk cache can't keep up. More importantly, though, 10,000 threads would all be calling system APIs to read the file, each call with considerable overhead.
If the source material is a candidate for reading entirely into RAM, the threads could be focused on RAM instead of the file, alleviating that overhead, improving performance. The threads could share read access to the contents without locks.
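A sketch of the "read once, share read-only" approach. It assumes the file has already been loaded into a std::string (for example with std::ifstream), and uses a hypothetical task, counting newlines, in place of the real parsing. Each thread reads its own slice of the shared buffer and writes only to its own slot of counts, so no locks are needed; nthreads must be at least 1:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

std::size_t count_lines_parallel(const std::string& data, unsigned nthreads) {
    std::vector<std::size_t> counts(nthreads, 0);  // one result slot per thread
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i)  // read-only access, no lock
                if (data[i] == '\n') ++counts[t];
        });
    }
    for (auto& th : pool) th.join();
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

The buffer exists once in memory regardless of the thread count, which avoids the per-thread read buffers and per-thread system calls described above.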
If the source file is too large to read entirely into RAM, it may still be best read in blocks of the source file, have threads process that portion from a shared memory resource, then move to the next block in a series.

How to avoid race conditions in a condition variable in VxWorks

We're programming on a proprietary embedded platform sitting atop of VxWorks 5.5. In our toolbox, we have a condition variable, that is implemented using a VxWorks binary semaphore.
Now, POSIX provides a wait function that also takes a mutex. This will unlock the mutex (so that some other task might write to the data) and waits for the other task to signal (it is done writing the data). I believe this implements what's called a Monitor, ICBWT.
We need such a wait function, but implementing it is tricky. A simple approach would do this:
bool condition::wait_for(mutex& mutex) const {
    unlocker ul(mutex); // relinquish mutex
    return wait(event);
} // ul's dtor grabs mutex again
However, this sports a race condition, because it allows another task to preempt this one after the unlocking and before the waiting. The other task can write to the data after it was unlocked and signal the condition before this task starts to wait for the semaphore. (We have tested this, and this indeed happens and blocks the waiting task forever.)
Given that VxWorks 5.5 doesn't seem to provide an API to temporarily relinquish a semaphore while waiting for a signal, is there a way to implement this on top of the provided synchronization routines?
Note: This is a very old VxWorks version that has been compiled without POSIX support (by the vendor of the proprietary hardware, from what I understood).
This should be quite easy with native VxWorks; a message queue is what is required here. Your wait_for method can be used as is:
bool condition::wait_for(mutex& mutex) const
{
    unlocker ul(mutex); // relinquish mutex
    return wait(event);
} // ul's dtor grabs mutex again
but the wait(event) code would look like this:
wait(event)
{
    // msgQRecv returns the number of bytes received, or ERROR (e.g. on timeout)
    if (msgQRecv(event->q, sigMsgBuf, sigMsgSize, timeoutTime) != ERROR)
    {
        // got it...
    }
    else
    {
        // timeout, report error or something like that....
    }
}
and your signal code would look something like this:
signal(event)
{
    msgQSend(event->q, sigMsg, sigMsgSize, NO_WAIT, MSG_PRI_NORMAL);
}
So if the signal gets triggered before you start waiting, then msgQRecv will return immediately with the signal when it eventually gets invoked and you can then take the mutex again in the ul dtor as stated above.
The event->q is a MSG_Q_ID that is created at event creation time with a call to msgQCreate, and the data in sigMsg is defined by you... but can be just a random byte of data, or you can come up with a more intelligent structure with information regarding who signaled or something else that may be nice to know.
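The property that makes the message queue work is that a signal sent before anyone waits is stored rather than lost. A portable sketch of that property with std:: primitives (not VxWorks code; Event, signal() and wait() are hypothetical names mirroring the answer's pseudo code):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

class Event {
public:
    // Like msgQSend: "enqueue" one pending signal, even if nobody waits yet.
    void signal() {
        { std::lock_guard<std::mutex> lk(m_); ++pending_; }
        cv_.notify_one();
    }
    // Like msgQRecv with a timeout: true if a signal was consumed,
    // false if the timeout expired with no signal pending.
    bool wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        if (!cv_.wait_for(lk, timeout, [this] { return pending_ > 0; }))
            return false;
        --pending_;  // "dequeue" the message
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned pending_ = 0;
};
```

Because the count survives until someone waits, the preemption window between the unlocker and the wait in wait_for() no longer loses the signal.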
Update for multiple waiters (this is a little tricky). There are a couple of assumptions I will make to simplify things:
The number of tasks that will be pending is known at event creation time and is constant.
There will be one task that is always responsible for indicating when it is ok to unlock the mutex, all other tasks just want notification when the event is signaled/complete.
This approach uses a counting semaphore, similar to the above with just a little extra logic:
wait(event)
{
    if (semTake(event->csm, timeoutTime) == OK)
    {
        // got it...
    }
    else
    {
        // timeout, report error or something like that....
    }
}
and your signal code would look something like this:
signal(event)
{
    for (int x = 0; x < event->numberOfWaiters; x++)
    {
        semGive(event->csm);
    }
}
The creation of the event is something like this, remember in this example the number of waiters is constant and known at event creation time. You could make it dynamic, but the key is that every time the event is going to happen the numberOfWaiters must be correct before the unlocker unlocks the mutex.
createEvent(numberOfWaiters)
{
    // (allocation of event omitted)
    event->numberOfWaiters = numberOfWaiters;
    event->csm = semCCreate(SEM_Q_FIFO, 0); // counting semaphore, initially empty
    return event;
}
You cannot be wishy-washy about the numberOfWaiters :D I will say it again: The numberOfWaiters must be correct before the unlocker unlocks the mutex. To make it dynamic (if that is a requirement) you could add a setNumWaiters(numOfWaiters) function, and call that in the wait_for function before the unlocker unlocks the mutex, so long as it always sets the number correctly.
Now for the last trick, as stated above the assumption is that one task is responsible for unlocking the mutex, the rest just wait for the signal, which means that one and only one task will call the wait_for() function above, and the rest of the tasks just call the wait(event) function.
With this in mind the numberOfWaiters is computed as follows:
The number of tasks who will call wait()
plus 1 for the task that calls wait_for()
Of course you can also make this more complex if you really need to, but chances are this will work because normally 1 task triggers an event, but many tasks want to know it is complete, and that is what this provides.
But your basic flow is as follows:
init()
{
    event = createEvent(3);
}
eventHandler()
{
    locker l(mutex);
    doEventProcessing();
    signal(event);
}
taskA()
{
    doOperationThatTriggersAnEvent();
    wait_for(mutex);
    eventComplete();
}
taskB()
{
    doWhateverIWant();
    // now I need to know if the event has occurred...
    wait(event);
    coolNowIKnowThatIsDone();
}
taskC()
{
    taskCIsFun();
    wait(event);
    printf("event done!\n");
}
When I write the above I feel like all OO concepts are dead, but hopefully you get the idea. In reality, wait and wait_for should take the same parameter, or no parameter but rather be members of the same class that also has all the data they need to know... but nonetheless, that is the overview of how it works.
Race conditions can be avoided if each waiting task waits on a separate binary semaphore.
These semaphores must be registered in a container which the signaling task uses to unblock all waiting tasks. The container must be protected by a mutex.
The wait_for() method creates a binary semaphore, registers it, releases the mutex, waits on the semaphore and finally deletes it.
void condition::wait_for(mutex& mutex) {
SEM_ID sem = semBCreate(SEM_Q_PRIORITY, SEM_EMPTY);
{
lock l(listeners_mutex); // assure exclusive access to listeners container
listeners.push_back(sem);
} // l's dtor unlocks listeners_mutex again
unlocker ul(mutex); // relinquish mutex
semTake(sem, WAIT_FOREVER);
{
lock l(listeners_mutex);
// remove sem from listeners
// ...
semDelete(sem);
}
} // ul's dtor grabs mutex again
The signal() method iterates over all registered semaphores and gives each one, unblocking its waiter.
void condition::signal() {
    lock l(listeners_mutex);
    for (SEM_ID sem : listeners)
        semGive(sem); // unblock the task waiting on this semaphore
}
This approach ensures that wait_for() never misses a signal. A disadvantage is the need for additional system resources.
To avoid creating and destroying semaphores for every wait_for() call, a pool could be used.
From the description, it looks like you may want to implement (or use) a semaphore - it's a standard CS algorithm with semantics similar to condvars, and there are tons of textbooks on how to implement them (https://www.google.com/search?q=semaphore+algorithm).
A random Google result that explains semaphores is at: http://www.cs.cornell.edu/courses/cs414/2007sp/lectures/08-bakery.ppt (see slide 32).
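For reference, a counting semaphore takes only a few lines on top of a mutex and a condition variable. This is a minimal sketch assuming C++11 or later (C++20 already ships `std::counting_semaphore`); the class name `Semaphore` and its `post`/`wait`/`try_wait` members are made up for illustration. Unlike a bare condition-variable signal, it remembers post() calls made before anyone waits, so it cannot lose a wakeup.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int initial = 0) : count_(initial) {}

    // Increment the count and wake one waiter (like semGive, or V).
    void post() {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Block until the count is positive, then decrement it (like semTake, or P).
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });  // predicate absorbs spurious wakeups
        --count_;
    }

    // Non-blocking variant: returns false instead of waiting.
    bool try_wait() {
        std::lock_guard<std::mutex> lk(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Usage: posts made before any wait are remembered.
bool runSemaphoreDemo() {
    Semaphore s;
    s.post();
    s.post();              // nobody is waiting yet; both posts are counted
    s.wait();              // returns immediately
    s.wait();              // returns immediately
    return !s.try_wait();  // count is back to 0
}
```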

Making sure threads are created and waiting before broadcasting

I have 10 threads that are supposed to be waiting for a signal.
Until now I've simply done 'sleep(3)', and that has worked fine, but is there a more reliable way to make sure that all threads have been created and are indeed waiting?
I made the following construction: in the critical region, before the wait, each thread increments a counter of how many threads are waiting. But then I need an additional mutex and condition variable to signal back to main that all threads have been created, which seems overly complex.
Am I missing some basic thread design pattern?
Edit: clarification.
A barrier won't work in this case, because I'm not interested in making my threads wait until all threads are ready. That already happens with cond_wait.
I want the main function to know when all threads are ready and waiting.
// mutex and condition variable to signal from main to threads to do work
mutex_t mutex_for_cond;
cond_t cond;
// mutex and condition variable to signal back from threads to main that threads are ready
mutex_t mutex_for_back_cond;
cond_t back_cond;
int nThreads = 0; // thread-safe by using mutex_for_cond
void *thread() {
    mutex_lock(mutex_for_cond);
    nThreads++;
    if (nThreads == 10) {
        mutex_lock(mutex_for_back_cond);
        cond_signal(back_cond);
        mutex_unlock(mutex_for_back_cond);
    }
    while (1) {
        cond_wait(cond, mutex_for_cond);
        if (spurious)
            continue;
        else
            break;
    }
    mutex_unlock(mutex_for_cond);
    // do work on non critical region data
}
int main() {
    for (int i = 0; i < 10; i++)
        create_threads;
    while (1) {
        mutex_lock(mutex_for_back_cond);
        cond_wait(back_cond, mutex_for_back_cond);
        mutex_unlock(mutex_for_back_cond);
        mutex_lock(mutex_for_cond);
        if (nThreads == 10) {
            break;
        } else {
            // spurious wakeup
            mutex_unlock(mutex_for_cond);
        }
    }
    // now all threads are waiting
    // mutex_for_cond is still locked, so broadcast
    cond_broadcast(cond);
}
Am I missing some basic thread design pattern?
Yes. For every condition, there should be a variable that is protected by the accompanying mutex. Only the change of this variable is indicated by signals on the condition variable.
You check the variable in a loop, waiting on the condition:
mutex_lock(mutex_for_back_cond);
while ( ready_threads < 10 )
cond_wait(back_cond,mutex_for_back_cond);
mutex_unlock( mutex_for_back_cond );
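A complete sketch of that pattern, assuming C++11 `std::thread`, `std::mutex` and `std::condition_variable` in place of the question's abstract `mutex_t`/`cond_t` API. For brevity one mutex protects both predicates: each worker bumps `ready_threads` and notifies main, main waits in the predicate loop until all ten are ready, then sets `go` (the variable that belongs to `cond`) and broadcasts. `go`, `work_done` and `runReadyDemo` are names introduced here for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kThreads = 10;

std::mutex m;                       // protects ready_threads, go and work_done
std::condition_variable back_cond;  // worker -> main: "one more thread is ready"
std::condition_variable cond;       // main -> workers: "go"
int ready_threads = 0;
bool go = false;
int work_done = 0;                  // stands in for real work

void worker() {
    std::unique_lock<std::mutex> lk(m);
    ++ready_threads;
    back_cond.notify_one();
    while (!go)                     // the predicate, not the signal, decides
        cond.wait(lk);
    ++work_done;
}

int runReadyDemo() {
    std::vector<std::thread> threads;
    for (int i = 0; i < kThreads; ++i)
        threads.emplace_back(worker);
    {
        std::unique_lock<std::mutex> lk(m);
        while (ready_threads < kThreads)  // survives spurious wakeups and
            back_cond.wait(lk);           // notifications sent before we waited
        // A worker can only have released m by entering cond.wait, so all
        // of them are inside cond.wait at this point.
        go = true;
        cond.notify_all();
    }
    for (auto& t : threads) t.join();
    return work_done;
}
```

The lost-wakeup problem in the question's code (the tenth thread signalling before main reaches cond_wait) disappears because main checks ready_threads before it ever waits.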
Additionally, what you are trying to build is a thread barrier. It is often provided by threading libraries, for example POSIX pthread_barrier_wait().
Sensible threading APIs have a barrier construct which does precisely this.
For example, with boost::thread, you would create a barrier like this:
boost::barrier bar(10); // a barrier for 10 threads
and then each thread would wait on the barrier:
bar.wait();
The barrier waits until the specified number of threads are waiting on it, and then releases them all at once. In other words, once all ten threads have been created and are ready, it allows them all to proceed.
That's the simple, and sane, way of doing it. Threading APIs which do not have a barrier construct require you to do it the hard way, not unlike what you're doing now.
You should associate some variable that contains the 'event state' with the condition variable. The main thread sets the event state variable appropriately just before issuing the broadcast. The threads that are interested in the event check the event state variable regardless of whether they've blocked on the condition variable or not.
With this pattern, the main thread doesn't need to know the precise state of the threads: it just sets the event when it needs to, then broadcasts the condition. Any waiting threads will be unblocked, and any threads not waiting yet will never block on the condition variable, because they'll see that the event has already occurred before waiting on the condition. Something like the following:
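#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 10

// mutex and condition variable to signal from main to threads to do work
pthread_mutex_t mutex_for_cond;
pthread_cond_t cond;
int event_occurred = 0; // protected by mutex_for_cond

void *thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex_for_cond);
    // A thread that arrives after the broadcast sees event_occurred == 1 and
    // never blocks; the loop also absorbs spurious wakeups.
    while (!event_occurred) {
        pthread_cond_wait(&cond, &mutex_for_cond);
    }
    pthread_mutex_unlock(&mutex_for_cond);
    // do work on non critical region data
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    pthread_mutex_init(&mutex_for_cond, NULL);
    pthread_cond_init(&cond, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, thread, NULL);
    // do whatever needs to be done to set up the work for the threads
    // now let the threads know they can do their work (whether or not
    // they've gotten to the "wait point" yet)
    pthread_mutex_lock(&mutex_for_cond);
    event_occurred = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex_for_cond);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}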