I implemented a simple ThreadPool that uses a std::list<Tasks> mTasks for its task list.
All the threads wait on a condition variable using the following code:
EnterCriticalSection(&mCriticalSection);
while (mTasks.size() == 0)
    SleepConditionVariableCS(&mConditionVariable, &mCriticalSection, INFINITE);
They wait until someone adds something to the list, and then one of them is woken up.
I used a while loop that checks that the task list is not empty, although the only way to be woken up is by adding a new task to the list (so it can't be empty). The reason I did this is that MSDN says:
Condition variables are subject to spurious wakeups (those not
associated with an explicit wake) and stolen wakeups (another thread
manages to run before the woken thread). Therefore, you should recheck
a predicate (typically in a while loop) after a sleep operation
returns.
But what are these spurious wakeups, and what will wake up my waiting threads?
My understanding of this topic, from when I studied it at university, was that implementing 100% safe condition variables would have been too expensive from a performance perspective.
The wikipedia page about spurious wakeups has a quote from David R. Butenhof (the author of Programming with POSIX Threads) saying:
This means that when you wait on a condition variable, the wait may
(occasionally) return when no thread specifically broadcast or
signaled that condition variable. Spurious wakeups may sound strange,
but on some multiprocessor systems, making condition wakeup completely
predictable might substantially slow all condition variable
operations. The race conditions that cause spurious wakeups should be
considered rare
Checking a condition in a while loop is a good practice that will certainly avoid this problem.
As for more detail on why this can actually happen, I'm sorry, but I cannot provide such insights.
I believe that the problem is down to synchronization between multiple processors in a multi-processor system.
In order to keep condition variables as lightweight as possible, implementations use certain threading primitives that, whilst thread safe, can cause the condition variable to register two notify() style calls when only one has actually been made. This is a rare scenario, and rather than making condition variables less efficient by handling it internally, the designers push the issue into user code, where you only have to worry about it if it might affect you.
I was trying to find out how std::condition_variable::wait is implemented in the standard library on my local machine; I can see wait_until but I cannot find wait.
My question is: how is the wait function implemented internally? How would one make a thread sleep indefinitely? Is it using some long timed sleep, or something entirely different that is OS-specific?
Thanks!
Pre-emptive multithreading is a process governed largely by the operating system. It decides which threads get timeslices and which cores they are assigned to, and so forth. As such, for most low-level threading primitives (mutexes, condition variables, etc.), the real work is done inside OS calls.
Yes, you could in theory implement something like a condition variable with nothing more than atomic accesses and timed thread suspension. However, it would perform extremely poorly. Modern OSes know when a thread is waiting on a condition variable and can wake that thread up "immediately" when the condition is satisfied, whereas your mechanism requires the waiting thread to wait until some specific time has passed.
Plus, you'd have a whole bunch of spurious wake-ups that you have to check for, thus using thread time for no reason. The OS-based implementation will have far fewer spurious wake-ups.
Using C++17, for a worker thread with a non-blocking loop in it that performs some task, I see three ways to signal the thread to exit:
A std::atomic_bool that the thread checks in a loop. If it is set to true, the thread exits. The main thread sets it to true before invoking std::thread::join().
A std::condition_variable paired with a bool. This is similar to the above, except it lets you invoke std::condition_variable::wait_for() to effectively "sleep" the thread (lowering CPU usage) while it waits for a potential exit signal. The exit is signaled by setting the bool, which is checked in the predicate (the third argument to wait_for()). The main thread locks a mutex, sets the bool to true, and invokes std::condition_variable::notify_all() before invoking std::thread::join() to signal the thread to exit.
A std::future and std::promise. The main thread holds a std::promise<void> while the worker thread holds the corresponding std::future<void>. The worker thread uses std::future::wait_for() similar to the step above. Main thread invokes std::promise::set_value() before calling std::thread::join().
My thoughts on each:
This is simple, but lacks the ability to "slow down" the worker thread loop without explicitly calling std::this_thread::sleep_for(). Seems like an "old fashioned" way of doing thread signals.
This one is comprehensive, but very complicated, because you need a condition variable plus a boolean variable.
This one seems like the best option, because it has the simplicity of #1 without the verbosity of #2. But I have no personal experience with std::future and std::promise yet, so I am not sure if it's the ideal solution. In my mind, promise & future are meant to transfer values across threads, not really be used as signals. So I'm not sure if there are efficiency concerns.
I see multiple ways of signaling a thread to exit. And sadly, my Google searching has only introduced more as I keep looking, without actually coming to a general consensus on the "modern" and/or "best" way of doing this with C++17.
I would love to see some light shed on this confusion. Is there a conclusive, definitive way of doing this? What is the general consensus? What are the pros/cons of each solution, if there is no "one size fits all"?
If you have a busy worker thread that requires a one-way notification to stop working, the best way is to just use an atomic<bool>. It is up to the worker thread whether it wants to slow down or not. The requirement to "throttle" the worker thread is completely orthogonal to thread cancellation and, in my opinion, should not be conflated with the cancellation itself. This approach, to my knowledge, has 2 drawbacks: you can't pass back the result (if any) and you can't pass back an exception (if any). But if you do not need either of those, then use atomic<bool> and don't bother with anything else. It is as modern as any; there is nothing old-fashioned about it.
condition_variable is part of the consumer/producer pattern. There is something which produces work and something that consumes what was produced. To avoid busy waiting in the consumer while there is nothing to consume, a condition_variable is a great option; it is just the perfect primitive for such tasks. But it doesn't make sense for the thread cancellation process, and you will have to use another variable anyway, because you can't rely on a condition_variable alone. It might spuriously wake up the thread. You might "set" it before the other thread starts waiting, losing the "set" completely, and so on. It just can't be used alone, so we are back to square one, but now with an atomic<bool> variable to accompany our condition_variable.
The future/promise pair is good when you need to know the result of the operation done on the other thread. So it is not a replacement of the approach with the atomic<bool> but it rather complements it. So to remove the drawbacks described in the first paragraph you add future/promise to the equation. You provide the calling side with the future extracted from the promise which lives within the thread. That promise gets set once the thread is finished:
Because exception is thrown.
Because thread has done its work and completed on its own.
Because we asked it to stop by setting the atomic<bool> variable.
So, as you can see, the future/promise pair just helps to provide some feedback to the caller; it has nothing to do with the cancellation itself.
P.S. You can always use an electric sledgehammer to crack a nut but it doesn't make the approach any more modern.
I can't say that this is conclusive or definitive, but since this is somewhat an opinion question, I'll give an answer based on a lot of trial and error solving the kind of problem you are asking about (I think).
My preferred pattern is to signal the thread to stop using atomic bool, and control the 'loop' timing with a condition variable.
We ran into the requirement for running repeating tasks on worker threads so often that we created a class that we called 'threaded_worker'. This class handles the complexities of aborting the thread, and timing the calls to the worker function.
The abort is handled via a method that sets the atomic bool 'abort' signal which tells the thread to stop calling the work function and terminate.
The loop timing can be controlled by methods that set the wait time for the condition variable. The thread can be released to continue via method that calls the notify on the condition variable.
We use the class as a base class for all kinds of objects that have some function that needs to execute on a separate thread. The class is designed to run the 'work' function once, or in a loop.
We use the bool for the abort, because it is simple and suitable to do the job. We use the condition variable for loop timing, because it has the benefit of being notified to 'short circuit' the timing. This is very useful when the threaded object is a consumer. When a producer has work for the threaded object, it can queue the work and notify that the work is available. The threaded object immediately continues, instead of waiting for the specified wait time on the condition variable.
The reason for both (the abort signal, and the condition variable) is that I see terminating the thread as one function, and timing the loop as another.
We used to time loops by putting the thread to sleep for some duration. This made it almost impossible to get predictable loop timing on Windows computers. Some computers will return from sleep(1) in about 1ms, but others will return in 15ms. Our performance was highly dependent on the specific hardware. Using condition variables we have greatly improved the timing of critical tasks. The added benefit of notifying a waiting thread when work is available is more than worth the complexity of the condition variable.
I am new to condition variables, multi-threading and mutexes and I have a fundamental question regarding it.
Quote from en.cppreference.com on wait_for - "blocks the current thread until the condition variable is woken up or after the specified timeout duration".
Why should the current thread be unblocked after a specified duration of time? The basic purpose of a condition variable is to notify whenever a "condition" occurs. Doesn't it cause overhead if the thread is woken up, say, every 500ms? Spurious wakeups are also built in as a safety mechanism in case a call to notify does not happen or fails.
I am obviously missing something basic here but not sure what it is. Any help is appreciated.
The use case for this is if you want to wait for an event to occur, but not indefinitely.
Maybe after the timeout expired, you want to notify the user that obtaining the result takes longer than expected. Maybe you want to trigger cancellation of the task providing the result.
As you have correctly pointed out, this causes additional overhead, so it only makes sense to use this instead of wait if you actually have something reasonable to do to react to an expired timeout.
Spurious wakeups are not so much a safety mechanism as an unfortunate necessity imposed by certain hardware architectures. In a perfect world (i.e., a world where you only ever call the wait functions with a predicate), no spurious wakeups would ever occur.
if the thread is woken up, say, every 500ms?
That's not how wait_for works. Let's ignore spurious wakes for the moment. The function will wake when notified or when the timer expires. Another way to look at it: wait for notification, but no longer than rel_time time.
It can be used to avoid infinite wait when for instance the notification mechanism could crash. Or when you need to take an action if the notification doesn't come soon enough.
Spurious wakeups are unwanted side effects of the realities of modern software/hardware architectures. This answer explains it pretty well.
I'm working on an application where I currently have only one condition variable and lots of wait statements with different conditions. Whenever one of the queues or some other state is changed I just call cv.notify_all(); and know that all threads waiting for this state change will be notified (and possibly other threads which are waiting for a different condition).
I'm wondering whether it makes sense to use separate condition variables for each queue or state. All my queues are usually separate, so a thread waiting for data in queue x doesn't care at about new data in queue y. So I won't have situations where I have to notify or wait for multiple condition variables (which is what most questions I've found are about).
Is there any downside, from a performance point of view, for having dozens of condition variables around? Of course the code may be more prone to errors because one has to notify the correct condition variable.
Edit: All condition variables are still going to use the same mutex.
I would recommend having one condition_variable per actual condition. In producer/consumer, for example, there would be a cv for buffer empty and another cv for buffer full.
It doesn't make sense to notify all when you could just notify one. That makes me think that your one condition_variable is in reality managing several actual conditions.
You're asking about optimization. On one hand, we want to avoid premature optimization and do the profiling when it becomes a problem. On the other hand, we want to design code that's going to scale well algorithmically from the beginning.
From a speed performance point of view, it's hard to say without knowing the situation, but the potential for significant negative impact on speed from adding cv's is very low, completely dwarfed by the cost of a wait or notify. If adding more cv's can get you out of calling notify-all, the potential for positive impact on speed is high.
Notify-all makes for simple-to-understand code. However, it shouldn't be used when performance matters. When a thread waiting on cv is awoken, it is guaranteed to hold the mutex. If you notify-all, every thread waiting on that cv will be woken up and immediately try to grab the mutex. At that point, all but one will go back to sleep (this time waiting for the mutex to be free). When the lucky thread releases the mutex, the next thread will get it and the next thing it will do is check the cv predicate. Depending on the condition, it could be very likely that the predicate evaluates to false (because the first thread already took care of it, or because you have multiple actual conditions for the one cv) and the thread immediately goes back to sleep waiting on the cv. One-by-one, every thread now waiting on the mutex will wake up, check the predicate, and probably go back to sleep on the cv. It's likely that only one actually ends up doing something.
This is horrible performance because mutex lock, mutex unlock, cv wait, and cv signal are all relatively expensive.
Even when you call notify-one, you might consider doing it after the critical section to try to prevent the one waking thread from blocking on the mutex. And if you actually want to notify-all, you can simply call notify-one and have each thread notify-one once it finishes the critical section. This creates a nice linear cascade of turn-taking versus an explosion of contention.
I understand that pthread_cond_wait() is documented to get spurious wake ups and the caller must check for that condition and that the motivation for this is to allow implementations of pthread_cond_wait() to have better performance and to force the caller to create more robust code.
However, I have not seen anyone get specific about the performance opportunity that this affords, other than mentions of race conditions that would be expensive to avoid.
Can someone please go into detail about what race conditions would arise in order to ensure there were no spurious wake ups and what hardware architectures would cause such scenarios to arise?
There is no guarantee that your thread will run immediately when signaled. It is just marked as "ready" and will run at the mercy of the system scheduler. Between the time it becomes schedulable and the time it is actually scheduled, another thread could have changed the underlying condition.
For example:
Thread A:
Wait on condition variable.
Thread B:
Update state.
Signal condition variable.
Thread C:
Reset state
Thread A:
Wake.
Check underlying state, it is unchanged.