Do threads sleep when waiting on a locked mutex?

Do threads blocked in std::mutex::lock() or on a condition variable sleep in a way that frees the core for other processes, or do I have to put these threads to sleep manually? And if they do sleep, would std::mutex::try_lock() give me a way to spin the thread without sleeping?
The reason I ask: I want unused threads in my thread pool to go through three states: spinning for 2 milliseconds, then blocked on a mutex for 250-ish milliseconds (assuming this lets them sleep and stop hogging the core), then finally being deallocated.
I want to avoid calling sleep manually if I can help it, because tuning the sleep duration would be hard. Can I safely leave that to the mutex?

That is implementation specific; the C++ standard does not speak to it directly.
In practice, mutexes may use a combination of spinning and full sleep. Sleeping and waking up is relatively expensive, so a standard library or OS implementation may spin briefly before putting the thread to sleep.
No C++ implementation on a major phone, PC or big iron is going to spin indefinitely, however. I could imagine some embedded system doing so, but I have not personally encountered one.
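To make the try_lock() part of the question concrete, here is a minimal sketch of "spin for a bounded time, then block". The helper name spin_then_lock and its spin_budget parameter are made up for illustration; they are not part of the standard library, and whether the final lock() really sleeps is still up to the implementation.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper (not a standard facility): poll try_lock() for at most
// `spin_budget`, then fall back to a blocking lock().
// Returns true if the mutex was taken while spinning, false if the blocking
// path was needed. Either way the caller owns `m` on return and must unlock it.
inline bool spin_then_lock(std::mutex& m, std::chrono::microseconds spin_budget) {
    assert(spin_budget.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + spin_budget;
    do {
        if (m.try_lock()) {
            return true;               // acquired without blocking
        }
        std::this_thread::yield();     // hint only: let another runnable thread go first
    } while (std::chrono::steady_clock::now() < deadline);
    m.lock();                          // blocking path; sleeping here is implementation-defined
    return false;
}
```

A caller would use it as `spin_then_lock(m, std::chrono::milliseconds(2));` followed by the protected work and `m.unlock()`. The spinning phase burns CPU by design, so the budget should stay small.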

Yes. Such blocked threads sleep and don't take up any CPU cycles.

Related

What will cpu do when a thread waiting for a mutex

I'm curious about what the CPU does while a thread is waiting for a mutex. I can imagine two possibilities:
1. The CPU stays on the current thread and continually checks whether the mutex has been unlocked.
2. The CPU switches to another thread (or process) for a moment, then switches back to the original thread and checks again.
Which one is right, or does the STL implement it in another way?

To understand this, you first need to understand the difference between a thread and a CPU core. A thread is an abstraction, a data structure that represents a sequence of operations to be executed. The OS assigns threads to CPU cores, and those cores then execute the operations. The OS (and also the hardware) can interrupt this execution at any time (although not in the middle of a single instruction), save the thread's state, suspend it, and assign some other thread to that core. This is known as a context switch. The OS sometimes also does this on so-called syscalls (when a program calls into OS functionality, e.g. to access the disk or the network). This matters because mutexes use syscalls under the hood.
So what happens when a thread tries to lock an already-locked mutex? First of all, no periodic checks happen. While possible, that would waste CPU cycles, and it is extremely unlikely that any serious OS does it. What actually happens is that each mutex has an associated queue. When a thread tries to lock a mutex that is already locked, the OS adds that thread to the queue and suspends it. The OS then assigns some other thread to this CPU core, if one is available.
Now, if a mutex is locked, then some thread locked it. Let's call that thread the owner. The owner is not suspended; it does some work. When it finishes, it has to unlock the mutex (which can involve a syscall as well), otherwise the pending threads will never resume. When the unlock happens, the OS looks at the associated queue and picks a thread from it (which one is an implementation detail; it will often be some kind of priority queue). The picked thread becomes the new owner of the mutex, and the OS resumes it, meaning it schedules the thread for execution. Only schedules, because all cores may be busy at that moment.
Note that this is a brief overview of the topic. There are lots of other details and optimizations in play, like futexes and ways to write thread-safe (or rather core-safe) code without mutexes. Mutexes themselves are not a hardware feature; they are implemented in the OS. But that's more or less how things work.
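One rough way to observe the "suspended, not polling" behaviour is to compare wall-clock time with the CPU time charged to the process while a thread sits in lock(). The sketch below is only an experiment under assumptions: the names WaitCost and measure_blocked_wait are invented here, and std::clock() reports process CPU time on POSIX systems but not on every platform (on Windows it tracks elapsed time, so the comparison is not meaningful there).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <mutex>
#include <thread>

struct WaitCost {
    double wall_ms;  // elapsed real time
    double cpu_ms;   // processor time charged to the whole process (POSIX semantics assumed)
};

// Hypothetical experiment: the calling thread owns `m` for `hold` while a
// second thread is blocked in lock(). If the waiter is truly suspended,
// cpu_ms should stay far below wall_ms.
inline WaitCost measure_blocked_wait(std::chrono::milliseconds hold) {
    assert(hold.count() >= 0);
    std::mutex m;
    m.lock();                                    // this thread is the owner
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    std::thread waiter([&m] {
        std::lock_guard<std::mutex> guard(m);    // blocks until the owner unlocks
    });

    std::this_thread::sleep_for(hold);           // owner sleeps while holding the lock
    m.unlock();                                  // lets the waiter proceed
    waiter.join();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {
        std::chrono::duration<double, std::milli>(wall_end - wall_start).count(),
        1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC
    };
}
```

Calling `measure_blocked_wait(std::chrono::milliseconds(200))` and printing both fields gives a quick sanity check on a given platform; it is not a benchmark.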

Typically the thread will attempt to acquire the mutex, and if it can't (e.g. because another thread already acquired it) it will inform the scheduler and the scheduler will block the waiting thread and switch to a different thread, and then (later, when the lock is released) the scheduler will unblock the waiting thread and give it CPU time again.
On single-CPU systems, this is almost required. All CPU time spent (e.g. "spinning"/polling the lock again) between finding out the lock can't be acquired and doing a task switch (to a thread that may release the lock) is a waste of CPU time that will achieve nothing (because no other thread can release the lock until a task switch occurs).
However, research on multi-CPU systems (which I vaguely remember from about 20 years ago, and which may or may not have been done by Sun for Solaris) indicates that a small amount of "spinning" (in the hope that a thread running on a different CPU releases the lock in time) can be beneficial, because it avoids the cost of one or more task switches. My intuition is that the time spent spinning before blocking should be roughly equal to the cost of a task switch (for example, if a task switch costs 123 microseconds, it'd probably be worthwhile spinning for 123 microseconds before the scheduler is told to block your thread); but this depends heavily on the scenario (e.g. how heavily contended the lock is).

Typically, the hardware thread (your "CPU") will be switched to running a different software thread by the kernel, and the original software thread will be set aside until the mutex it is waiting on becomes signaled. At that point the kernel will place it among the set of software threads that it seeks to schedule for execution on one of the hardware threads in the system.
Your option 1 applies to what is called a critical section on Microsoft's platforms and more generally a spinlock. See pthread_spin_lock().
Your option 2 is most similar to what usually happens.
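For contrast, this is roughly what "option 1" looks like when written by hand: a busy-waiting lock built on std::atomic_flag. It is a minimal sketch, not how any particular platform implements its spinlocks; SpinLock and count_with_spinlock are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal busy-waiting lock: a waiting thread never sleeps, it keeps
// re-checking the flag and occupies its core while doing so.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value; loop until we flip false -> true.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: `threads` threads each add `per_thread` increments to a
// shared counter protected by the spinlock; returns the final count.
inline long count_with_spinlock(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // works with any type having lock()/unlock()
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

A lock like this only pays off when the protected section is very short and the owner is running on another core; otherwise the waiters burn CPU that the owner could have used.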

In the Microsoft world, the Mutex is waited on with WaitForSingleObject(), which is described as
If the object's state is nonsignaled, the calling thread enters the wait state until the object is signaled or the time-out interval elapses.
Now you need to know that the "wait state" is a state where the thread is not active. We call it "blocking", which is the opposite of a busy wait where CPU time is used.
At the start of the wait, the kernel immediately gives the CPU to another thread and does not give it back to your thread until the Mutex becomes "signaled". So the thread really uses 0 CPU cycles during the wait.
When the kernel notices that the Mutex has changed, it can "wake up" the thread and might even boost its priority because it has been waiting patiently all that time.
"The CPU stays on the current thread and continually checks whether the mutex has been unlocked."
It's not the CPU that picks a thread to be executed. The thread scheduler of Windows picks the thread that gets executed.
If a Mutex could block a CPU that way, you would need only 8 or 12 Mutexes (one per core) to fully brick your system.
"The CPU switches to another thread (or process) for a moment [...]"
Almost. There will be a timer interrupt, which is handled by an interrupt service routine in the Windows kernel. At that point, the kernel can decide which thread will be executed next.
"[...] and switches back to the original thread and checks again."
No. Because the Mutex is a kernel object, the kernel already knows that there's no use in letting the thread check again until the Mutex has been signaled.

Does the use of locks in a thread slow down other threads that don't use locks?

If I have a thread that frequently uses locks, will it slow down other threads that don't use locks (e.g., because of global lock behavior in the CPU etc.)?

No. A lock is simply a library call (at most a system call) made by the thread that uses it. In C/C++ there is no interpreter overhead to worry about, no "global interpreter lock" and so on.
Waiting on a lock of course does slow things down, and you should carefully plan the design of your application to minimize the chances of a thread requesting a lock and having to wait to get it. The worst-case scenario is when you divide your application into multiple threads but they're all obliged to wait for something and are thereby effectively serialized.

When a thread acquires a lock, other threads cannot acquire that same lock until it is released by the thread that acquired it first. While the other threads are waiting for the lock to be released, they are idle, meaning they stop and wait for the lock to be released. That is why it is good practice to release locks as soon as you are done with them.
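As an illustration of "release locks as soon as you are done with them", the sketch below does the expensive part of the work outside the lock and holds the mutex only to publish a result. The function parallel_sum and its parameters are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example: each worker sums its own slice WITHOUT holding the
// mutex, then takes the lock only for the single addition to `total`.
inline long parallel_sum(const std::vector<int>& data, int workers) {
    assert(workers > 0);
    std::mutex m;
    long total = 0;
    const std::size_t n = data.size();
    const std::size_t k = static_cast<std::size_t>(workers);
    const std::size_t chunk = (n + k - 1) / k;           // slice size, rounded up
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < k; ++w) {
        pool.emplace_back([&data, &m, &total, n, chunk, w] {
            const std::size_t begin = std::min(n, w * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            const long partial = std::accumulate(first, last, 0L);  // no lock held here
            std::lock_guard<std::mutex> guard(m);                   // short critical section
            total += partial;
        });                                                         // guard released here
    }
    for (auto& t : pool) t.join();
    return total;
}
```

If the accumulate call were moved inside the guarded region, the workers would run one after another and the threads would buy nothing.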

std::mutex::lock blocking CPU usage

I want to be able to freeze and unfreeze a thread at will.
My current solution is done through callbacks and busy waiting with sleep.
This is obviously not an optimal solution.
I'm considering having the main thread lock a mutex, then have the slave thread run a function that locks and unlocks the same mutex.
My worry is the possible CPU usage if it's a true busy wait.
My question, as such, is: how does STL in C++11 specify "blocking", and if it is a busy wait, are there less CPU intensive solutions (e.g. pthreads)?

While mutexes can be used, it is not an optimal solution because mutexes should be used for resource protection (see also this q/a).
Usually, std::condition_variable instances are what you should be looking for.
It works as follows:
Create an instance of std::condition_variable and share it between your controlling thread and your controlled thread.
In the controlled thread, create a std::unique_lock on a std::mutex that is shared with the controlling thread, and pass it to one of the condition variable's wait methods.
In the controlling thread, invoke one of the notify methods on the condition variable.
Hope that helps.
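The steps above can be sketched as a small "gate" that the controlling thread closes and opens. This is one possible shape, not the only one; the names PauseGate, wait_if_paused and worker_stays_frozen_until_resumed are invented for this example, and the paused flag is an addition needed so the wait has a condition to check.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical gate: the controlling thread calls pause()/resume(), the
// controlled thread calls wait_if_paused() at points where it is safe to stop.
class PauseGate {
public:
    void pause() {
        std::lock_guard<std::mutex> guard(m_);
        paused_ = true;
    }
    void resume() {
        {
            std::lock_guard<std::mutex> guard(m_);
            paused_ = false;
        }
        cv_.notify_all();                              // wake every frozen thread
    }
    void wait_if_paused() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !paused_; });   // blocks (no busy wait) while paused
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool paused_ = false;
};

// Hypothetical demo: returns true if a worker started behind a closed gate
// did not get past it until resume() was called.
inline bool worker_stays_frozen_until_resumed() {
    PauseGate gate;
    gate.pause();
    std::atomic<bool> passed{false};
    std::thread worker([&gate, &passed] {
        gate.wait_if_paused();
        passed = true;
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    const bool frozen = !passed.load();                // cannot have passed while paused
    gate.resume();
    worker.join();
    assert(passed.load());                             // after join the worker went through
    return frozen;
}
```

The controlled thread only freezes at the points where it calls wait_if_paused(), so those calls belong wherever its state is consistent.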

Have a look at this answer: Multithreading, when to yield versus sleep. Locking a mutex (in the manner you've described), is a reasonable solution to your problem.
Here's an MSDN article that's worth a read. Quote:
"Until threads that are suspended or blocked become ready to run, the scheduler does not allocate any processor time to them, regardless of their priority."
If a thread isn't being scheduled it's not being run.

How to prevent threads from starvation in C++11

I am just wondering if there is any locking policy in C++11 which would prevent threads from starvation.
I have a bunch of threads competing for one mutex. My problem is that the thread leaving the critical section immediately starts competing for the same mutex again and wins most of the time. As a result, the other threads waiting on the mutex starve.
I do not want to make the thread that leaves the critical section sleep for some minimal amount of time just to give other threads a chance to lock the mutex.
I thought there must be some parameter that would enable fair locking for threads waiting on the mutex, but I wasn't able to find an appropriate solution.
I did find the std::this_thread::yield() function, which is supposed to reschedule the execution of threads, but it is only a hint to the scheduler, and whether the threads actually get rescheduled depends on the scheduler implementation.
Is there any way to provide a fair locking policy for threads waiting on the same mutex in C++11?
What are the usual strategies?
Thanks

This is a common optimization in mutexes designed to avoid wasting time switching tasks when the same thread can take the mutex again. If the current thread still has time left in its time slice then you get more throughput in terms of user-instructions-executed-per-second by letting it take the mutex rather than suspending it, and switching to another thread (which likely causes a big reload of cache lines and various other delays).
If you have so much contention on a mutex that this is a problem then your application design is wrong. You have all these threads blocked on a mutex, and therefore not doing anything: you are probably better off without so many threads.
You should design your application so that if multiple threads compete for a mutex then it doesn't matter which thread gets the lock. Direct contention should also be a rare thing, especially direct contention with lots of threads.
The only situation where I can think this is an OK scenario is where every thread is waiting on a condition variable, which is then broadcast to wake them all. Every thread will then contend for the mutex, but if you are doing this right then they should all do a quick check that this isn't a spurious wake and then release the mutex. Even then, this is called a "thundering herd" situation, and is not ideal, precisely because it serializes all these threads.
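A minimal sketch of that broadcast scenario, assuming a plain bool as the condition: every waiter sleeps on the same condition variable, a single notify_all() wakes them, and each one re-acquires the mutex, checks the condition and lets go again. The function name broadcast_wakes_all is invented for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical demo: start `n` waiters, broadcast once, and return how many
// of them got past the wait.
inline int broadcast_wakes_all(int n) {
    assert(n >= 0);
    std::mutex m;
    std::condition_variable cv;
    bool go = false;
    int woken = 0;
    std::vector<std::thread> waiters;
    for (int i = 0; i < n; ++i) {
        waiters.emplace_back([&m, &cv, &go, &woken] {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&go] { return go; });  // predicate filters out spurious wakeups
            ++woken;                              // still under the mutex: one waiter at a time
        });                                       // lock released when the lambda returns
    }
    {
        std::lock_guard<std::mutex> guard(m);
        go = true;                                // change the condition under the mutex
    }
    cv.notify_all();                              // wake the whole herd
    for (auto& t : waiters) t.join();
    return woken;
}
```

The waiters pass through the mutex one by one after the broadcast, which is exactly the serialization described above; keeping the code under the lock this short is what makes it tolerable.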

Conditional wait overhead

When using boost::conditional_variable, ACE_Conditional or pthread_cond_wait directly, is there any overhead for the waiting itself? These are the more specific issues that trouble me:
After the waiting thread is unscheduled, will it be scheduled back before the wait expires and then unscheduled again, or will it stay unscheduled until signaled?
Does wait periodically acquire the mutex? If so, I guess each iteration wastes some CPU time on system calls to lock and release the mutex. Is it the same as constantly acquiring and releasing a mutex?
Also, then, how much time passes between the signal and the return from wait?
Afaik, when using semaphores the acquire calls responsiveness is dependent on scheduler time slice size. How does it work in pthread_cond_wait? I assume this is platform dependent. I am more interested in Linux but if someone knows how it works on other platforms, it will help too.
And one more question: are there any additional system resources allocated for each conditional? I won't create 30000 mutexes in my code, but should I worry about 30000 conditionals that use the same one mutex?

Here's what is written in the pthread_cond man page:
pthread_cond_wait atomically unlocks the mutex and waits for the condition variable cond to be signaled. The thread execution is suspended and does not consume any CPU time until the condition variable is signaled.
So from here I'd answer the questions as follows:
The waiting thread won't be scheduled back before the wait was signaled or canceled.
There are no periodic mutex acquisitions. The mutex is reacquired only once before wait returns.
The time that passes between the signal and the wait return is similar to that of thread scheduling due to mutex release.
Regarding the resources, on the same man page:
In the LinuxThreads implementation, no resources are associated with condition variables, thus pthread_cond_destroy actually does nothing except checking that the condition has no waiting threads.
Update: I dug into the sources of pthread_cond_* functions and the behavior is as follows:
All the pthread conditionals in Linux are implemented using futex.
When a thread calls wait it is suspended and unscheduled. The thread id is inserted at the tail of a list of waiting threads.
When a thread calls signal the thread at the head of the list is scheduled back.
So, the waking is as efficient as the scheduler, no OS resources are consumed and the only memory overhead is the size of the waiting list (see futex_wake function).

You should only call pthread_cond_wait if the variable is already in the "wrong" state. Since it always waits, there is always the overhead associated with putting the current thread to sleep and switching.
When the thread is unscheduled, it is unscheduled. It should not use any resources, but of course an OS can in theory be implemented badly. It is allowed to re-acquire the mutex, and even to return, before the signal (which is why you must double-check the condition), but the OS will be implemented so this doesn't impact performance much, if it happens at all. It doesn't happen spontaneously, but rather in response to another, possibly-unrelated signal.
30000 mutexes shouldn't be a problem, but some OSes might have a problem with 30000 sleeping threads.
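The "double-check the condition" rule is usually written as a loop around the wait. The sketch below uses std::condition_variable as a C++ stand-in for pthread_cond_wait (an assumption for illustration; the thread itself is about pthreads); Mailbox and send_and_receive are invented names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-slot mailbox: take() waits only if the slot is empty and
// re-checks the condition every time wait() returns.
class Mailbox {
public:
    void put(int v) {
        {
            std::lock_guard<std::mutex> guard(m_);
            value_ = v;
            full_ = true;
        }
        cv_.notify_one();
    }
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        while (!full_) {               // skip the wait entirely if the data is already there
            cv_.wait(lock);            // unlocks m_, sleeps, re-locks m_ before returning
        }
        assert(lock.owns_lock());      // the mutex is held again at this point
        full_ = false;
        return value_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool full_ = false;
    int value_ = 0;
};

// Hypothetical demo: one thread puts a value, the caller takes it.
inline int send_and_receive(int v) {
    Mailbox box;
    std::thread producer([&box, v] { box.put(v); });
    const int got = box.take();
    producer.join();
    return got;
}
```

Because the flag is checked before the first wait, a put() that happens before take() is never missed, and a wakeup that arrives without the flag being set simply sends the thread back to sleep.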
