I was trying to find out how std::condition_variable::wait is implemented in the standard library on my local machine. I can see wait_until, but I cannot find wait.
My question is: how is the wait function implemented internally? How would one make a thread sleep indefinitely? Does it use some long timed sleep, or something entirely different that is OS-specific?
Thanks!
Pre-emptive multithreading is governed largely by the operating system. It decides which threads get timeslices, which cores they are assigned to, and so forth. As such, for most low-level threading primitives (mutexes, condition variables, etc.), the real work is done inside OS calls.
Yes, you could in theory implement something like a condition variable with nothing more than atomic accesses and timed thread suspension. However, it would perform extremely poorly. Modern OSes know when a thread is waiting on a condition and can wake that thread up "immediately" when the condition is satisfied. Your mechanism requires that the waiting thread wait until some specific time has passed.
Plus, you'd have a whole bunch of spurious wake-ups that you have to check for, thus using thread time for no reason. The OS-based implementation will have far fewer spurious wake-ups.
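To make this concrete, here is a minimal sketch of a thread blocking indefinitely on a std::condition_variable, with no timed sleep involved. The Gate class and run_gate_demo are hypothetical names made up for this illustration. The predicate overload of wait is specified to behave like `while (!pred()) wait(lock);`, which is what absorbs spurious wake-ups.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: threads block in pass() until open() is called.
class Gate {
public:
    void pass() {
        std::unique_lock<std::mutex> lock(m_);
        // Equivalent to: while (!open_) cv_.wait(lock);
        // The thread is descheduled by the OS; no timeout is used.
        cv_.wait(lock, [this] { return open_; });
    }

    void open() {
        {
            std::lock_guard<std::mutex> lock(m_);
            open_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool open_ = false;
};

// Runs one waiter and returns the value it wrote after passing the gate.
int run_gate_demo() {
    Gate gate;
    int result = 0;
    std::thread waiter([&] {
        gate.pass();
        result = 42;
    });
    gate.open();
    waiter.join();  // join makes the waiter's write visible here
    return result;
}
```

Calling run_gate_demo() from main returns once the waiter has been released, regardless of whether open() runs before or after the waiter reaches pass().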
I have read in many places that there is some overhead associated with std::condition_variable_any. Just wondering, what is this overhead?
My guess is that, since this is a generic condition variable that can work with any type of lock, it requires a hand-rolled implementation of waiting (perhaps with another condition_variable and mutex, or a futex, or something similar), as opposed to being a thin wrapper around pthread_cond_wait() (and its equivalents on other systems). The extra overhead probably comes from that, but I'm not sure.
As a follow-up: if I were implementing something that waits on, say, a shared mutex, is this type of condition variable a bad choice because of the performance overhead? What else can I do in this situation?
pthread_cond_wait() / SleepConditionVariableSRW(), like the plain std::condition_variable::wait(), require just a single, atomic syscall for releasing the mutex, waiting for the condition variable and re-acquiring the mutex. The thread immediately goes to sleep, and another thread (ideally one which was blocked by the mutex) can take over immediately on the same core.
With std::condition_variable_any, unlocking the passed BasicLockable and starting to wait on the native event / condition is more than a single syscall: it invokes the unlock() method on the BasicLockable first and only then issues the syscall for waiting. So you have at least the overhead of the separate unlock(), plus you are more likely to trigger a less than ideal scheduling decision on the OS side. In the worst case, the unlock causes a waiting thread to continue on a different core, with all the associated overhead.
The other way around, e.g. on spurious wakes, there are also OS side scheduling optimizations possible when dealing with a native mutex (as used in std::mutex) which don't apply with a generic BasicLockable.
Both involve some bookkeeping in order to provide the notify_all() logic (it's actually one event / condition per waiting thread) as well as the guarantees about all methods being atomic, so they both come with a small overhead anyway.
The real overhead comes from how well the OS can make a good scheduling decision on the combined signal-and-wait-and-lock syscall. If the OS isn't smart about the scheduling, then it makes virtually no difference.
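For the shared-mutex case from the question, a minimal sketch could look like the following. It assumes C++17 (for std::shared_mutex); SharedState, wait_for_value, publish and run_shared_demo are hypothetical names for illustration only. It shows the usage pattern, not how any particular standard library implements condition_variable_any.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

struct SharedState {
    std::shared_mutex mutex;
    std::condition_variable_any cv;  // accepts any BasicLockable, here std::shared_lock
    bool ready = false;
    int value = 0;
};

// Reader: waits under a shared (read) lock until the writer publishes a value.
int wait_for_value(SharedState& s) {
    std::shared_lock<std::shared_mutex> lock(s.mutex);
    s.cv.wait(lock, [&s] { return s.ready; });
    return s.value;
}

// Writer: updates the state under an exclusive lock, then notifies.
void publish(SharedState& s, int v) {
    {
        std::unique_lock<std::shared_mutex> lock(s.mutex);
        s.value = v;
        s.ready = true;
    }
    s.cv.notify_all();
}

// Runs one reader against one writer and returns what the reader saw.
int run_shared_demo() {
    SharedState s;
    int seen = 0;
    std::thread reader([&] { seen = wait_for_value(s); });
    publish(s, 7);
    reader.join();
    return seen;
}
```

Whether the extra unlock()/lock() calls matter in practice is something to measure for your workload; the sketch only shows that the combination is valid.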
I was reading a bit about std::condition_variable and more particularly on how to notify a waiting thread using std::condition_variable::notify_one.
I came across a few questions I will be happy to get answers on:
What exactly happens when a thread calls notify_one (OS-wise)? I guess this is OS-specific, so for the sake of argument - I'm working in Windows.
What happens if a thread calls notify_one when there is no waiting thread? Does this call have any performance impact (CPU-cycles, power etc)?
Thanks
On Windows, std::condition_variable is likely to be implemented in terms of native Windows condition variables.
See: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682052(v=vs.85).aspx
On Unix-like systems it is normally implemented in terms of a pthreads condition variable/mutex pair (pthread_cond_t and pthread_mutex_t).
The entire operation should take place in user space so you don't pay to switch to kernel mode, but you will be working with two synchronisation primitives under the covers. This will mean that memory fences will be issued, so there is always some price to pay.
To cut a long story short, calling notify_one when you should, i.e. after changing the state of the condition and releasing the lock, is a reasonably cheap operation. Calling notify_one in a tight loop for no good reason is probably not a good idea.
What happens if a thread calls notify_one when there is no waiting thread?
Take a mutex, check whether there are threads waiting, release the mutex. end.
Does this call have any performance impact (CPU-cycles, power etc)?
Yes of course, it consumes a few cycles and requires that the CPU is operating. Doing it once in a while won't hurt. Doing it continuously in a tight loop will consume power.
I guess my question for you is, "what's the use case"? If you're adding a million items a second to a producer/consumer queue then you're going to spend a lot of time and energy notifying nonexistent consumers. If you're adding 10 a second, time spent in notify_one probably won't even show up on any performance trace.
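As a rough illustration of the producer/consumer case, here is a minimal sketch in which notify_one is called after the state change and after the lock has been released. IntQueue and run_queue_demo are hypothetical names; this is not taken from any particular library.

```cpp
#include <cassert>
#include <condition_variable>
#include <initializer_list>
#include <mutex>
#include <queue>
#include <thread>

class IntQueue {
public:
    void push(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(v);
        }                  // lock released here
        cv_.notify_one();  // semantically a no-op if no consumer is waiting
    }

    int pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        return v;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

// One consumer pops three items while the calling thread produces them.
int run_queue_demo() {
    IntQueue q;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < 3; ++i) sum += q.pop();
    });
    for (int v : {1, 2, 3}) q.push(v);
    consumer.join();
    return sum;
}
```

If the consumer happens to be busy rather than waiting when push runs, the notify_one call finds nobody to wake; the predicate in pop makes sure the item is still picked up later.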
These questions are extremely implementation-specific. Just saying you're on Windows is not enough; each standard library may have different implementations, and a debug version could have a different implementation than a release version.
The semantic effect of notify_one when no thread is waiting is a no-op. In implementation terms, at the very least the thread has to check an atomic variable to determine if any threads are waiting. So there is a bit of overhead.
The Microsoft standard library's condition_variable is implemented in terms of the concurrency runtime's condition variable which, starting from Windows Vista, is implemented in terms of the WinAPI RTL_CONDITION_VARIABLE. The implementation of that is not accessible. However, there's a reasonable chance that its implementation is based on this Microsoft research paper:
http://research.microsoft.com/pubs/64242/implementingcvs.pdf
In a multi threaded app, is
while (result->Status == Result::InProgress) Sleep(50);
//process results
better than
while (result->Status == Result::InProgress);
//process results
?
By that, I'm asking will the first method be polite to other threads while waiting for results rather than spinning constantly? The operation I'm waiting for usually takes about 1-2 seconds and is on a different thread.
I would suggest using semaphores in such a case instead of polling. If you prefer active waiting, the sleep is a much better solution than evaluating the loop condition constantly.
It's better, but not by much.
As long as result->Status is not volatile, the compiler is allowed to reduce
while(result->Status == Result::InProgress);
to
if(result->Status == Result::InProgress) for(;;) ;
as the condition does not change inside the loop.
Calling the external function Sleep changes this: unless the compiler knows that Sleep never modifies data, it must assume that the call may modify the result structure, so it has to reload result->Status on every iteration. Thus, depending on the compiler, the version that calls Sleep is a lot less likely to go into an endless loop.
There is also no guarantee that accesses to result->Status will be atomic. For specific memory layouts and processor architectures, reading and writing this variable may consist of multiple steps, which means that the scheduler may decide to step in in the middle.
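In standard C++ (C++11 and later), both problems can be addressed by making the status field a std::atomic. The sketch below is an assumption about how the asker's code might be adapted; Status, Result, wait_by_polling and run_polling_demo are hypothetical stand-ins for the names in the question, and std::this_thread::sleep_for replaces the Windows-specific Sleep.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

enum class Status { InProgress, Done };

struct Result {
    std::atomic<Status> status{Status::InProgress};
    int payload = 0;
};

// Polls politely: the atomic load cannot be hoisted out of the loop,
// and sleep_for gives the core to other threads between checks.
int wait_by_polling(Result& r) {
    while (r.status.load(std::memory_order_acquire) == Status::InProgress) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // Safe to read: the worker's release store happens-before this point.
    return r.payload;
}

// The worker fills in the payload, then publishes completion.
int run_polling_demo() {
    Result r;
    std::thread worker([&r] {
        r.payload = 99;
        r.status.store(Status::Done, std::memory_order_release);
    });
    int v = wait_by_polling(r);
    worker.join();
    return v;
}
```

This is still polling, so the waiter reacts only at the next wake-up, but it is well-defined and does not monopolise a core.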
As all you are communicating at this point is a simple yes/no, and the receiving thread should also wait on a negative reply, the best way is to use the appropriate thread synchronisation primitive provided by your OS that achieves this effect. This has the advantage that your thread is woken up immediately when the condition changes, and that it uses no CPU in the meantime as the OS is aware what your thread is waiting for.
On Windows, use CreateEvent and co. to communicate using an event object; on Unix, use a pthread_cond_t object.
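If you would rather stay portable, one possible substitute for the OS-specific event objects (my assumption, not something named above) is a std::promise / std::future pair, which carries a one-shot "done" signal plus a value. do_work and run_future_demo are hypothetical names for illustration.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical worker: computes a value and signals completion through the promise.
void do_work(std::promise<int> done) {
    done.set_value(123);
}

int run_future_demo() {
    std::promise<int> done;
    std::future<int> result = done.get_future();
    std::thread worker(do_work, std::move(done));
    int v = result.get();  // blocks without spinning until set_value is called
    worker.join();
    return v;
}
```

The waiting thread is blocked inside get() and uses no CPU until the worker calls set_value.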
Yes, sleep and variants give up the processor. Other threads can take over. But there are better ways to wait on other threads.
Don't use the empty loop.
That depends on your OS scheduling policy too. For example, Linux uses the CFS scheduler by default, which distributes the processor fairly among all tasks. But if you make this thread a real-time thread with the FIFO policy, then the code without sleep will never relinquish the processor unless a higher-priority thread comes along; threads of the same or lower priority will never get scheduled until you break out of the loop. If you apply SCHED_RR, threads of the same and higher priority will get scheduled, but not lower ones.
I have a program that should get the maximum out of my cpu.
It is multithreaded via pthreads that do their job well apart from the fact that they "only" get my cores to about 60% load which is not enough in my opinion.
I am searching for the reason and am asking myself (and hereby you) if the blocking functions mutex_lock/cond_wait are candidates?
What happens when a thread cannot proceed inside such a function?
Does pthread switch to another thread it handles, or does the thread yield its time to the system? If the latter is the case, can I change this behavior?
Regards,
Nobody
More Information
The setting is one main thread that fills the task pool and countless workers that fetch jobs from there and wait on a condition variable that is signaled via broadcast when a serialized calculation is done. They go on with the values from this calculation until they are done, deliver their mail and fetch the next job...
On a typical modern pthreads implementation, each thread is managed by the kernel not unlike a separate process. Any blocking call like pthread_mutex_lock or pthread_cond_wait (but also, say, read) will yield its time to the system. The system will then find another eligible thread to schedule, whether in your process or another process, and run it.
If your program is only taking 60% of the CPU, it is more likely blocked on I/O than on pthread operations, unless you have done something way too granular with your pthread operations.
If a thread is waiting on a mutex/condition, it doesn't use resources (well, uses just a tiny amount). Whenever the thread enters waiting state, control switches to other threads. When the mutex is released (or condition variable signalled), the thread wakes up and may acquire the mutex (if no other thread grabs it first), and continue to run. If however some other thread acquires the mutex (this can happen if several threads are waiting for it), the thread returns to sleeping state.
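The broadcast pattern described in the question can be sketched with the C++ standard library equivalents of the pthread calls (std::condition_variable::notify_all standing in for pthread_cond_broadcast). This is only an illustration under that assumption; run_broadcast_demo and its variables are hypothetical names, not the asker's code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts `workers` threads that block until a value is published, then
// each adds that value to a shared total under the mutex.
int run_broadcast_demo(int workers) {
    std::mutex m;
    std::condition_variable cv;
    bool published = false;
    int shared_value = 0;
    int total = 0;

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            // Blocked here: the kernel runs other threads in the meantime.
            cv.wait(lock, [&] { return published; });
            // Each woken worker re-acquires the mutex one at a time.
            total += shared_value;
        });
    }

    {
        std::lock_guard<std::mutex> lock(m);
        shared_value = 5;
        published = true;
    }
    cv.notify_all();  // analogous to pthread_cond_broadcast

    for (auto& t : pool) t.join();
    return total;
}
```

Note that after the broadcast the workers still pass through the mutex one by one, so work done while holding it is serialized; keeping that section short is one way to avoid idle cores.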
I need to write my own implementation of a condition variable much like pthread_cond_t.
I know I'll need to use the compiler provided primitives like __sync_val_compare_and_swap etc.
Does anyone know how I'd go about this please.
Thx
Correct implementation of condition variables is HARD. Use one of the many libraries out there instead (e.g. boost, pthreads-win32, my just::thread library)
You need to:
Keep a list of waiting threads (this might be a "virtual" list rather than an actual data structure)
Ensure that when a thread waits you atomically unlock the mutex owned by the waiting thread and add it to the list before that thread goes into a blocking OS call
Ensure that when the condition variable is notified then one of the threads waiting at that time is woken, and not one that waits later
Ensure that when the condition variable is broadcast then all of the threads waiting at that time are woken, and not any threads that wait later.
plus other issues that I can't think of just now.
The details vary with OS, as you are dependent on the OS blocking/waking primitives.
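The requirements listed above can be sketched as follows. This is a teaching sketch under stated assumptions, not a production implementation: it assumes POSIX unnamed semaphores (sem_t, e.g. on Linux) as the OS blocking primitive, uses a std::mutex to protect the waiter list, and ignores error handling and exception safety. ListCondVar and run_list_condvar_demo are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <semaphore.h>  // POSIX; assumed available (e.g. Linux)
#include <thread>
#include <vector>

class ListCondVar {
public:
    // Lock is any type with lock()/unlock(), e.g. std::unique_lock<std::mutex>.
    // Must be called with user_lock held.
    template <class Lock>
    void wait(Lock& user_lock) {
        sem_t sem;
        sem_init(&sem, 0, 0);
        {
            std::lock_guard<std::mutex> g(internal_);
            waiters_.push_back(&sem);  // registered before the user mutex is released
        }
        user_lock.unlock();
        while (sem_wait(&sem) != 0) {}  // retry if interrupted by a signal
        { std::lock_guard<std::mutex> g(internal_); }  // let the notifier finish sem_post
        sem_destroy(&sem);
        user_lock.lock();
    }

    // Wakes one thread that was already on the list at this moment.
    void notify_one() {
        std::lock_guard<std::mutex> g(internal_);
        if (!waiters_.empty()) {
            sem_post(waiters_.front());
            waiters_.pop_front();
        }
    }

    // Wakes every thread on the list now; threads that wait later are unaffected.
    void notify_all() {
        std::lock_guard<std::mutex> g(internal_);
        for (sem_t* s : waiters_) sem_post(s);
        waiters_.clear();
    }

private:
    std::mutex internal_;
    std::deque<sem_t*> waiters_;
};

// Starts `waiters` threads, releases them with notify_all, and counts them.
int run_list_condvar_demo(int waiters) {
    std::mutex m;
    ListCondVar cv;
    bool ready = false;
    int woken = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < waiters; ++i) {
        threads.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            while (!ready) cv.wait(lock);
            ++woken;
        });
    }
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;
    }
    cv.notify_all();
    for (auto& t : threads) t.join();
    return woken;
}
```

Because a waiter adds itself to the list while it still holds the user's mutex, a notifier that changed the state under that mutex will either find the waiter on the list or the waiter will see the new state before it ever waits.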
I need to write my own implementation of a condition variable much like pthread_cond_t.
Condition variables cannot be implemented using only atomic primitives like compare-and-swap.
The purpose of condition variables is to give an application a flexible mechanism for accessing the process/thread scheduler: putting a thread to sleep and waking it up.
Atomic ops are implemented by the CPU, while the process/thread scheduler is OS territory. Without some supporting system call (or emulation using existing synchronization primitives), implementing condition variables is impossible.
Edit1. The only sensible example I know of and can point you to is the implementation of the historical Linux pthread library, which can be found here - e.g. version from 1997. The implementation (found in the condvar.c file) is rather easy to read but also highlights the requirements for implementing condition variables. Spinlocks (using a test-and-set op) are used for synchronization, and POSIX signals are used to put threads to sleep and to wake them up.
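To show what the atomic half of such a design looks like, here is a minimal test-and-set spinlock in C++. It is my own sketch, not code from the library mentioned above; SpinLock and run_spinlock_demo are hypothetical names. Note that it can only spin (or politely yield); it has no way to put a thread to sleep, which is exactly the part that needs OS support.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value: true means someone else holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // a hint to the scheduler; the thread stays runnable
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Several threads increment a plain int under the spinlock.
int run_spinlock_demo(int threads, int iterations) {
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```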
It depends on your requirements. If you have no further requirements, and if your process may consume 100% of the available CPU time, then you have the rare chance to experiment with different mutex and condition variable implementations: just try it out, and learn about the details. Great thing.
But in reality, you are usually bound to an operating system, and so you are tied to the OS's threading primitives, because they represent the only kind of control over process/thread/CPU resource usage. So, in that case, you will not even have the chance to implement your own condition variables unless they are based on the primitives that the OS provides.
So... double check your environment, what do you control? What don't you control? And what makes sense?