Multiple vs single condition variable - c++

I'm working on an application where I currently have only one condition variable and lots of wait statements with different conditions. Whenever one of the queues or some other state is changed I just call cv.notify_all(); and know that all threads waiting for this state change will be notified (and possibly other threads which are waiting for a different condition).
I'm wondering whether it makes sense to use separate condition variables for each queue or state. All my queues are usually separate, so a thread waiting for data in queue x doesn't care about new data in queue y. So I won't have situations where I have to notify or wait on multiple condition variables (which is what most questions I've found are about).
Is there any downside, from a performance point of view, for having dozens of condition variables around? Of course the code may be more prone to errors because one has to notify the correct condition variable.
Edit: All condition variables are still going to use the same mutex.

I would recommend having one condition_variable per actual condition. In producer/consumer, for example, there would be a cv for buffer empty and another cv for buffer full.
It doesn't make sense to notify all when you could just notify one. That makes me think that your one condition_variable is in reality managing several actual conditions.
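As a minimal sketch of what that looks like, consider a bounded buffer guarded by one mutex with two condition variables, one per actual condition (all names here are illustrative, not taken from the question):

#include <condition_variable>
#include <mutex>
#include <queue>

class bounded_buffer {
    std::mutex m_;
    std::condition_variable not_full_;   // producers wait on this
    std::condition_variable not_empty_;  // consumers wait on this
    std::queue<int> q_;
    std::size_t capacity_ = 16;

public:
    void push(int v) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [this] { return q_.size() < capacity_; });
        q_.push(v);
        not_empty_.notify_one();  // wake exactly one consumer
    }

    int pop() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        not_full_.notify_one();  // wake exactly one producer
        return v;
    }
};

Because each cv maps to exactly one predicate, notify_one suffices and no thread is ever woken for a condition it doesn't care about.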
You're asking about optimization. On one hand, we want to avoid premature optimization and do the profiling when it becomes a problem. On the other hand, we want to design code that's going to scale well algorithmically from the beginning.
From a speed performance point of view, it's hard to say without knowing the situation, but the potential for a significant negative impact on speed from adding cv's is very low: the cost of an extra cv is completely dwarfed by the cost of a wait or notify. If adding more cv's gets you out of calling notify-all, the potential for a positive impact on speed is high.
Notify-all makes for simple-to-understand code. However, it shouldn't be used when performance matters. When a thread waiting on a cv is awoken, it is guaranteed to hold the mutex. If you notify-all, every thread waiting on that cv will be woken up and will immediately try to grab the mutex. At that point, all but one will go back to sleep (this time waiting for the mutex to be free).
When the lucky thread releases the mutex, the next thread will get it, and the next thing it will do is check the cv predicate. Depending on the condition, it could be very likely that the predicate evaluates to false (because the first thread already took care of it, or because you have multiple actual conditions for the one cv) and the thread immediately goes back to sleep waiting on the cv. One by one, every thread now waiting on the mutex will wake up, check the predicate, and probably go back to sleep on the cv. It's likely that only one actually ends up doing something.
This is horrible performance because mutex lock, mutex unlock, cv wait, and cv signal are all relatively expensive.
Even when you call notify-one, you might consider doing it after the critical section to try to prevent the one waking thread from blocking on the mutex. And if you actually want to notify-all, you can simply call notify-one and have each thread notify-one once it finishes the critical section. This creates a nice linear cascade of turn-taking versus an explosion of contention.
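A sketch of that last point, reworking push() from the bounded_buffer sketch above so the notify happens after the critical section:

    void push(int v) {
        {
            std::unique_lock<std::mutex> lk(m_);
            not_full_.wait(lk, [this] { return q_.size() < capacity_; });
            q_.push(v);
        }                         // mutex released here...
        not_empty_.notify_one();  // ...so the woken consumer doesn't
                                  // immediately block on the mutex
    }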

Related

How do I signal a std::thread to exit gracefully?

Using C++17, for a worker thread with a non-blocking loop in it that performs some task, I see three ways to signal the thread to exit:
A std::atomic_bool that the thread checks in a loop. If it is set to true, the thread exits. The main thread sets it to true before invoking std::thread::join().
A std::condition_variable with a bool. This is similar to the above, except it allows you to invoke std::condition_variable::wait_for() to effectively "sleep" the thread (to lower CPU usage) while it waits for a potential exit signal, delivered by setting the bool, which is checked by the predicate (the 3rd argument to wait_for()). The main thread would lock a mutex, change the bool to true, and invoke std::condition_variable::notify_all() before invoking std::thread::join() to signal the thread to exit.
A std::future and std::promise. The main thread holds a std::promise<void> while the worker thread holds the corresponding std::future<void>. The worker thread uses std::future::wait_for(), similar to the option above. The main thread invokes std::promise::set_value() before calling std::thread::join().
My thoughts on each:
This is simple, but lacks the ability to "slow down" the worker thread loop without explicitly calling std::this_thread::sleep_for(). Seems like an "old fashioned" way of doing thread signals.
This one is comprehensive, but very complicated, because you need a condition variable plus a boolean variable.
This one seems like the best option, because it has the simplicity of #1 without the verbosity of #2. But I have no personal experience with std::future and std::promise yet, so I am not sure if it's the ideal solution. In my mind, promise & future are meant to transfer values across threads, not really be used as signals. So I'm not sure if there are efficiency concerns.
I see multiple ways of signaling a thread to exit. And sadly, my Google searching has only introduced more as I keep looking, without actually coming to a general consensus on the "modern" and/or "best" way of doing this with C++17.
I would love to see some light shed on this confusion. Is there a conclusive, definitive way of doing this? What is the general consensus? What are the pros/cons of each solution, if there is no "one size fits all"?
If you have a busy working thread which requires one-way notification that it should stop working, the best way is to just use an atomic<bool>. It is up to the worker thread whether it wants to slow down or not. The requirement to "throttle" the worker thread is completely orthogonal to thread cancellation and, in my opinion, should not be conflated with the cancellation itself. This approach, to my knowledge, has 2 drawbacks: you can't pass back the result (if any) and you can't pass back an exception (if any). But if you do not need either of those, then use atomic<bool> and don't bother with anything else. It is as modern as anything; there is nothing old-fashioned about it.
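A minimal sketch of that approach (names are illustrative):

#include <atomic>
#include <thread>

std::atomic<bool> stop_requested{false};

void worker() {
    while (!stop_requested.load()) {
        // do one unit of work; pacing (if any) is the worker's own business
    }
}

int main() {
    std::thread t(worker);
    // ... later, from the main thread:
    stop_requested.store(true);  // one-way cancellation signal
    t.join();
}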
condition_variable is part of the consumer/producer pattern. So there is something which produces work and there is something that consumes what was produced. To avoid busy waiting by the consumer while there is nothing to consume, a condition_variable is a great option to use. It is just the perfect primitive for such tasks. But it doesn't make sense for the thread cancellation process. And you will have to use another variable anyway, because you can't rely on a condition_variable alone: it might spuriously wake up the thread, you might "set" it before the thread gets into the waiting state (losing the "set" completely), and so on. It just can't be used alone, so we are back to square one, but now with an atomic<bool> variable to accompany our condition_variable.
The future/promise pair is good when you need to know the result of the operation done on the other thread. So it is not a replacement of the approach with the atomic<bool> but it rather complements it. So to remove the drawbacks described in the first paragraph you add future/promise to the equation. You provide the calling side with the future extracted from the promise which lives within the thread. That promise gets set once the thread is finished:
Because an exception was thrown.
Because the thread has done its work and completed on its own.
Because we asked it to stop by setting the atomic<bool> variable.
So as you see, the future/promise pair just helps provide some feedback to the caller; it has nothing to do with the cancellation itself.
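A sketch of the combination, assuming the worker reports its result (or exception) through a promise while the atomic<bool> still does the actual cancellation:

#include <atomic>
#include <future>
#include <thread>

std::atomic<bool> stop{false};

void worker(std::promise<int> done) {
    try {
        int result = 0;
        while (!stop.load()) {
            ++result;  // stand-in for real work
        }
        done.set_value(result);  // finished, on its own or on request
    } catch (...) {
        done.set_exception(std::current_exception());  // pass the error back
    }
}

int main() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t(worker, std::move(p));
    stop.store(true);      // ask it to stop...
    int result = f.get();  // ...and collect the result or rethrown exception
    t.join();
}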
P.S. You can always use an electric sledgehammer to crack a nut but it doesn't make the approach any more modern.
I can't say that this is conclusive, or definitive, but since this is somewhat an opinion question, I'll give an answer that is based upon a lot of trial and error to solve the kind of problem you are asking about (I think).
My preferred pattern is to signal the thread to stop using atomic bool, and control the 'loop' timing with a condition variable.
We ran into the requirement for running repeating tasks on worker threads so often that we created a class that we called 'threaded_worker'. This class handles the complexities of aborting the thread, and timing the calls to the worker function.
The abort is handled via a method that sets the atomic bool 'abort' signal which tells the thread to stop calling the work function and terminate.
The loop timing can be controlled by methods that set the wait time for the condition variable. The thread can be released to continue via a method that calls notify on the condition variable.
We use the class as a base class for all kinds of objects that have some function that needs to execute on a separate thread. The class is designed to run the 'work' function once, or in a loop.
We use the bool for the abort, because it is simple and suitable to do the job. We use the condition variable for loop timing, because it has the benefit of being notified to 'short circuit' the timing. This is very useful when the threaded object is a consumer. When a producer has work for the threaded object, it can queue the work and notify that the work is available. The threaded object immediately continues, instead of waiting for the specified wait time on the condition variable.
The reason for both (the abort signal, and the condition variable) is that I see terminating the thread as one function, and timing the loop as another.
We used to time loops by putting the thread to sleep for some duration. This made it almost impossible to get predictable loop timing on Windows computers. Some computers will return from sleep(1) in about 1ms, but others will return in 15ms. Our performance was highly dependent on the specific hardware. Using condition variables we have greatly improved the timing of critical tasks. The added benefit of notifying a waiting thread when work is available is more than worth the complexity of the condition variable.
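A stripped-down sketch of such a class (hypothetical; the real threaded_worker presumably carries more machinery):

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class threaded_worker {
    std::mutex m_;
    std::condition_variable cv_;
    std::atomic<bool> abort_{false};
    std::chrono::milliseconds wait_time_{100};
    std::thread thread_;

    void run() {
        std::unique_lock<std::mutex> lk(m_);
        while (!abort_) {
            cv_.wait_for(lk, wait_time_);  // returns early if notified
            if (abort_) break;
            lk.unlock();
            work();                        // overridden by derived classes
            lk.lock();
        }
    }

protected:
    virtual void work() = 0;

public:
    void start() { thread_ = std::thread(&threaded_worker::run, this); }

    void notify() { cv_.notify_one(); }    // "short circuit" the wait

    void set_wait_time(std::chrono::milliseconds t) {
        std::lock_guard<std::mutex> lk(m_);
        wait_time_ = t;
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lk(m_);
            abort_ = true;                 // set under the lock so the
        }                                  // notify can't be missed
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

    // Derived classes should call stop() in their own destructor, before
    // their work() implementation is torn down.
    virtual ~threaded_worker() { stop(); }
};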

Is there a C++ design pattern that implements a mechanism or mutex that controls the amount of time a thread can own a locked resource?

I am looking for a way to guarantee that any time a thread locks a specific resource, it is forced to release that resource after a specific period of time (if it has not already released it). Envision a connection where you need to limit the amount of time any specific thread can own that connection for.
I envision this is how it could be used:
{
    std::lock_guard<std::TimeLimitedMutex> lock(this->myTimeLimitedMutex, timeout);
    try {
        // perform some operation with the resource that myTimeLimitedMutex guards.
    }
    catch (MutexTimeoutException ex) {
        // perform cleanup
    }
}
I see that there is a timed_mutex that lets the program timeout if a lock cannot be acquired. I need the timeout to occur after the lock is acquired.
There are already some situations where you get a resource that can be taken away unexpectedly. For instance, TCP sockets: once a socket connection is made, code on each side needs to handle the case where the other side drops the connection.
I am looking for a pattern that handles types of resources that normally time out on their own but, when they don't, need to be reset. This does not have to handle every type of resource.
This can't work, and it can never be made to work. It goes against the whole concept of ownership and atomic transactions. When a thread acquires the lock and performs two transactions in a row, it expects them to become atomically visible to the outside world. In this scenario, it would be very possible for a transaction to be torn: the first part of it performed, but the second part not.
What's worse is that since the lock would be forcefully removed, the part-executed transaction would become visible to the outside world before the interrupted thread has any chance to roll back.
This idea goes contrary to every school of multi-threaded thinking.
I support SergeyA's answer. Releasing a locked mutex after a timeout is a bad idea and cannot work. Mutex stands for mutual exclusion, and this is a rock-hard contract which cannot be violated.
But you can do what you want:
Problem: You want to guarantee that your threads do not hold the mutex longer than a certain time T.
Solution: Never lock the mutex for longer than time T. Instead, write your code so that the mutex is locked only for the absolutely necessary operations. It is always possible to give such a time T (modulo the uncertainties and limits imposed by a multitasking and multiuser operating system, of course).
To achieve that (examples):
Never do file I/O inside a locked section.
Never call a system call while a mutex is locked.
Avoid sorting a list while a mutex is locked (*).
Avoid doing a slow operation on each element of a list while a mutex is locked (*).
Avoid memory allocation/deallocation while a mutex is locked (*).
There are exceptions to these rules, but the general guideline is:
Make your code slightly less optimal (e.g. do some redundant copying inside the critical section) to make the critical section as short as possible. This is good multithreading programming.
(*) These are just examples of operations where it is tempting to lock the entire list, do the operations and then unlock the list. Instead, it is advisable to just take a local copy of the list and clear the original list while the mutex is locked, ideally by using the swap() operation offered by most STL containers, and then do the slow operation on the local copy outside of the critical section. This is not always possible but always worth considering. Sorting has quadratic complexity in the worst case and usually needs random access to the entire list, so it is useful to sort (a copy of) the list outside of the critical section and later check whether elements need to be added or removed. Memory allocations also have quite some complexity behind them, so massive memory allocations/deallocations should be avoided.
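For example, the swap trick looks roughly like this (a sketch; expensive_operation is a stand-in for the slow per-element work):

#include <mutex>
#include <vector>

std::mutex m;
std::vector<int> shared_items;  // guarded by m

void expensive_operation(int item) { /* slow work */ }

void process_all() {
    std::vector<int> local;
    {
        std::lock_guard<std::mutex> lk(m);
        local.swap(shared_items);   // O(1); the critical section stays tiny
    }
    for (int item : local)
        expensive_operation(item);  // slow part runs outside the lock
}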
You can't do that with only C++.
If you are using a POSIX system, it can be done.
You'll have to trigger a SIGALRM signal that's only unmasked for the thread that will time out. In the signal handler, you'll have to set a flag and use longjmp to return to the thread code.
In the thread code, setjmp returns a second time only if the signal was triggered (via longjmp), so at that point you can throw the timeout exception.
Please see this answer for how to do that.
Also, on linux, it seems you can directly throw from the signal handler (so no longjmp/setjmp here).
BTW, if I were you, I would code the opposite. Think about it: You want to tell a thread "hey, you're taking too long, so let's throw away all the (long) work you've done so far so I can make progress".
Ideally, you should have your long thread be more cooperative, doing something like "I've done A of a ABCD task, let's release the mutex so other can progress on A. Then let's check if I can take it again to do B and so on."
You probably want to be more fine-grained (have more mutexes on smaller objects, but make sure you're locking in the same order) or use RW locks (so that other threads can use the objects while you're not modifying them), etc...
Such an approach cannot be enforced because the holder of the mutex needs the opportunity to clean up anything which is left in an invalid state part way through the transaction. This can take an unknown arbitrary amount of time.
The typical approach is to release the lock when doing long tasks, and re-acquire it as needed. You have to manage this yourself, as everyone will have a slightly different approach.
The only situation I know of where this sort of thing is accepted practice is at the kernel level, especially with respect to microcontrollers (which either have no kernel, or are all kernel, depending on who you ask). You can set an interrupt which modifies the call stack, so that when it is triggered it unwinds the particular operations you are interested in.
"Condition" variables can have timeouts. This allows you to wait until a thread voluntarily releases a resource (with notify_one() or notify_all()), but the wait itself will timeout after a specified fixed amount of time.
Examples in the Boost documentation for "conditions" might make this more clear.
If you want to force a release, you have to write the code which will force it though. This could be dangerous. The code written in C++ can be doing some pretty close-to-the-metal stuff. The resource could be accessing real hardware and it could be waiting on it to finish something. It may not be physically possible to end whatever the program is stuck on.
However, if it is possible, then you can handle it in the thread in which the wait() times out.

purpose for wait_for function in condition variable - C++11

I am new to condition variables, multi-threading and mutexes and I have a fundamental question regarding it.
Quote from en.cppreference.com on wait_for - "blocks the current thread until the condition variable is woken up or after the specified timeout duration".
Why should the current thread be unblocked after a specified duration of time? The basic purpose of a condition variable is to notify whenever a "condition" occurs. Does it not cause overhead if the thread is woken up, say, every 500ms? Spurious wakeups are also built in as a safety mechanism in case a call to notify does not happen/fails, etc.
I am obviously missing something basic here but not sure what it is. Any help is appreciated.
The use case for this is if you want to wait for an event to occur, but not indefinitely.
Maybe after the timeout expired, you want to notify the user that obtaining the result takes longer than expected. Maybe you want to trigger cancellation of the task providing the result.
As you have correctly pointed out, this causes additional overhead, so it only makes sense to use this instead of wait if you actually have something reasonable to do to react to an expired timeout.
Spurious wakeups are not so much a safety mechanism as an unfortunate necessity imposed by certain hardware architectures. In a perfect world (i.e. a world where you only ever call the wait functions with a predicate), no spurious wakeups would ever occur.
if the thread is woken up, say, every 500ms?
That's not how wait_for works. Let's ignore spurious wakes for the moment. The function will wake when notified or when the timer expires. Another way to look at it: wait for notification, but no longer than rel_time time.
It can be used to avoid an infinite wait when, for instance, the notifying mechanism could crash, or when you need to take an action if the notification doesn't come soon enough.
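A sketch of that usage, assuming a data_ready flag guarded by the mutex:

#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool data_ready = false;  // guarded by m

bool wait_for_data() {
    std::unique_lock<std::mutex> lk(m);
    // The predicate overload absorbs spurious wakes; wait_for returns false
    // only if the timeout expired with the predicate still false.
    if (!cv.wait_for(lk, std::chrono::seconds(2), [] { return data_ready; })) {
        // timed out: warn the user, cancel the task, retry, ...
        return false;
    }
    return true;  // notified (and predicate true) within the deadline
}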
Spurious wakeups are unwanted side effects of the realities of modern software/hardware architectures. This answer explains it pretty well.

Bottleneck in Threads C++

So I am just trying to verify my understanding and hope that you guys will be able to clear up any misunderstandings. Essentially I have two threads which use the same lock and perform calculations while they hold it, but the interesting thing is that within the lock I will cause the thread to sleep for a short time, and this sleep time will be slightly different for either thread. Because of the way locks work, won't the faster thread be bottlenecked by the slower thread, as it will have to wait for it to complete?
For example:
Thread1() {
    lock();
    usleep(10);
    unlock();
}
-
Thread2() {
    lock();
    sleep(100);
    unlock();
}
Now because Thread2 holds onto the lock longer, this will cause a bottleneck. And just to be sure, this system should have a back and forth on who gets the lock, right?
It should be:
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
and so on, right? Thread1 should never be able to acquire the lock right after it releases it, can it?
Thread1 should never be able to acquire the lock right after it releases it, can it?
No, Thread1 could reacquire the lock right after it releases it, because Thread2 could still be suspended (sleeping, because of the scheduler).
Also, sleep only guarantees that the thread will sleep at least the wanted amount; it can, and often will, sleep longer.
In practice you would not hold a lock while calculating a value: you would take the lock, copy the values needed for the calculation, unlock, do the calculation, then take the lock again, check whether the old input values are still valid/wanted, and then store/return your calculated result.
For this purpose, the std::future and atomic data types were invented.
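A sketch of that structure (illustrative names; the squaring is a stand-in for a real calculation):

#include <mutex>

std::mutex m;
int input = 0;   // guarded by m
int output = 0;  // guarded by m

void compute_once() {
    int snapshot;
    {
        std::lock_guard<std::mutex> lk(m);
        snapshot = input;              // copy what the calculation needs
    }
    int result = snapshot * snapshot;  // slow part, done without the lock
    {
        std::lock_guard<std::mutex> lk(m);
        if (input == snapshot)         // still valid/wanted?
            output = result;
    }
}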
...this system should have a back and forth happens on who gets the lock, right?
Mostly. Most of the time it will be a back and forth, but sometimes there could be two consecutive lock/unlock cycles by Thread1. It depends on your scheduler, and every execution will probably vary.
Absolutely nothing prevents either thread from immediately reacquiring the lock after releasing it. I have no idea what you think prevents this from happening, but nothing does.
In fact, in many implementations, a thread that is already running has an advantage in acquiring a lock over threads that have to be made ready-to-run. This is a sensible optimization to minimize context switches.
If you're using a sleep as a way to simulate work and think this represents some real world issue with lock fairness, you are wrong. Threads that sleep are voluntarily yielding the remainder of their timeslice and are treated very differently from threads that exhaust their timeslice doing work. If these threads were actually doing work, eventually one thread would exhaust its timeslice.
Depending on what you are trying to achieve there are several possibilities.
If you want your threads to run in a particular order then have a look here.
There are basically 2 options:
- one is to use events where a thread signals the next one that it has done its job, so the next one can start.
- the other is to have a scheduler thread that handles the ordering with events or semaphores.
If you want your threads to run independently but want a lock mechanism where the order of attempting to take the lock is preserved, you can have a look here. The last part of that answer, which uses a queue with one condition variable per thread, seems good.
And as it was said in previous answers and comments, using sleep for scheduling is a bad idea.
Also, a lock is just a mutual exclusion mechanism and gives no guarantee on execution order.
A lock is usually intended for preventing concurrent access on a critical resource so it should just do that. The smaller the critical section is, the better.
Finally, yes, trying to order threads creates "bottlenecks". In this particular case, if all calculations are made in the locked sections, the threads won't do anything in parallel, so you can question the utility of using threads at all.
Edit:
Just one more warning: be careful, with threads it's not because it worked (was scheduled as you wanted) 10 times on your machine that it always will, especially if you change any of the context (machine, workload...). You have to be sure of it by design.

Does msleep() give cycles to other threads?

In a multi threaded app, is
while (result->Status == Result::InProgress) Sleep(50);
//process results
better than
while (result->Status == Result::InProgress);
//process results
?
By that, I'm asking: will the first method be polite to other threads while waiting for results, rather than spinning constantly? The operation I'm waiting for usually takes about 1-2 seconds and is on a different thread.
I would suggest using semaphores for such cases instead of polling. If you prefer active waiting, the sleep is a much better solution than evaluating the loop condition constantly.
It's better, but not by much.
As long as result->Status is not volatile, the compiler is allowed to reduce
while(result->Status == Result::InProgress);
to
if(result->Status == Result::InProgress) for(;;) ;
as the condition does not change inside the loop.
Calling the external (and hence implicitly volatile) function Sleep changes this, because it may modify the result structure, unless the compiler is aware that Sleep never modifies data. Thus, depending on the compiler, the version that calls Sleep is a lot less likely to go into an endless loop.
There is also no guarantee that accesses to result->Status will be atomic. For specific memory layouts and processor architectures, reading and writing this variable may consist of multiple steps, which means that the scheduler may decide to step in in the middle.
As all you are communicating at this point is a simple yes/no, and the receiving thread should also wait on a negative reply, the best way is to use the appropriate thread synchronisation primitive provided by your OS that achieves this effect. This has the advantage that your thread is woken up immediately when the condition changes, and that it uses no CPU in the meantime as the OS is aware what your thread is waiting for.
On Windows, use CreateEvent and co. to communicate using an event object; on Unix, use a pthread_cond_t object.
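A portable C++11 sketch of the same idea with std::condition_variable (hypothetical names; CreateEvent and pthread_cond_t are the platform-level equivalents):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool finished = false;  // replaces polling result->Status; guarded by m

// Worker thread, when the operation completes:
void signal_done() {
    {
        std::lock_guard<std::mutex> lk(m);
        finished = true;
    }
    cv.notify_one();  // waiter wakes immediately
}

// Waiting thread: sleeps in the kernel, consuming no CPU until notified.
void wait_done() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return finished; });
    // process results
}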
Yes, sleep and variants give up the processor. Other threads can take over. But there are better ways to wait on other threads.
Don't use the empty loop.
That depends on your OS scheduling policy too. For example, Linux has the CFS scheduler by default, and with that it will fairly distribute the processor to all tasks. But if you make this thread a real-time thread with the FIFO policy, then the code without sleep will never relinquish the processor until a higher-priority thread comes along; threads of the same or lower priority will never get scheduled until you break from the loop. If you apply SCHED_RR, then threads of the same and higher priority will get scheduled, but not lower.