std::mutex::lock blocking CPU usage

std::mutex::lock blocking CPU usage - c++

I want to be able to freeze and unfreeze a thread at will.
My current solution is done through callbacks and busy waiting with sleep.
This is obviously not an optimal solution.
I'm considering having the main thread lock a mutex, then have the slave thread run a function that locks and unlocks the same mutex.
My worry is the possible CPU usage if it's a true busy wait.
My question, as such, is: how does STL in C++11 specify "blocking", and if it is a busy wait, are there less CPU intensive solutions (e.g. pthreads)?

While mutexes can be used, it is not an optimal solution because mutexes should be used for resource protection (see also this q/a).
Usually, std::condition_variable instances are what you should be looking for.
It works as follows:
Create an instance of std::condition_variable and distribute it to your controlling thread and your controlled thread
In the controlled thread, create a std::unique_lock. Pass it to one of the condition variable's wait methods
In the controlling thread, invoke one of the notify methods on the condition variable.
Hope that helps.

Have a look at this answer: Multithreading, when to yield versus sleep. Locking a mutex (in the manner you've described), is a reasonable solution to your problem.
Here's an MSDN article that's worth a read. Quote:
Until threads that are suspended or blocked become ready to run, the
scheduler does not allocate any processor time to them, regardless of
their priority.
If a thread isn't being scheduled it's not being run.

Related

Do threads sleep when waiting on a locked mutex?

Do threads blocked by a std::mutex::lock() or a condition variable sleep in a way that frees the core for other processes, or am I required to manually put these threads to sleep? And if true, would std::mutex::try_lock() allow for a way to spin the thread without sleeping?
The reason I ask: I want to have three states for threads in my thread pool that are unused: spinning for 2 milliseconds, then locked by a mutex for 250-ish milliseconds (assuming this lets them sleep and unhog the core), then finally being deallocated.
I want to avoid calling sleep manually if I can help it, tuning the sleep duration would be hard. So can I safely leave that to the mutex?

That is implementation specific; the C++ standard does not speak to it directly.
In practice, mutexes may use a combination of spin lock and full sleep. Sleeping and waking up is relatively expensive, and a compiler may write the locks to spin for a few ms before putting the thread to sleep.
No C++ implementation on a major phone, PC or big iron is going to spin lock indefinitely however. I could imagine some embedded system doing so, but have not personally encountered one.

Yes. Such blocked threads sleep and don't take up any CPU cycles.

How expensive is a blocked mutex?

Say I have a mutex and thread 1 locked the mutex. Now, thread 2 tries to acquire the lock but it is blocked, say for a couple of seconds. How expensive is this blocked thread? Can the executing hardware thread be rescheduled to do something computationally more expensive? If yes, then who checks if the mutex gets unlocked?
EDIT: Ok so I try to reformulate what I wanted to ask.
What I dont really understand is how the following works. thread 2 got blocked, so what exactly does thread 2 do? From the answer it seems like it is not just constantly checking if the mutex gets unlocked. If this were the case, I would consider a blocked thread expensive, as I am using one of my hardware threads just for checking if some boolean value changes.
So am I correct in thinking that when the mutex gets released by thread 1, thread 1 notifies the sheduler and the shedular assigns a hardware thread to execute thread 2 which is waiting?

I am reading your questions as:
How expensive is a locked mutex?
Mutex can be considered as an integer in memory.
A thread trying to lock on a mutex has to read the existing state of the mutex and can set it depending on the value read.
test_and_set( &mutex_value, 0, 1 ); // if mutex_value is 0, set to 1
The trick is that both the read and write (also called test-and-set) operation should be atomic. The atomicity is achieved with CPU support.
However, the test-and-set operation doesn't offer any mechanism to block/wait.
CPU has no knowledge of threads blocking on a mutex. The OS takes the responsibility to manage the blocking by providing system calls to users. The implementation varies from OS to OS. In case of Linux, you can consider futex or pthreads as an example.
The overall costs of using a mutex sums up to the test-and-set operation and the system calls used to implement the mutex.
The test-and set operation is almost constant and is insignificant compared to the cost the other operation can amount to.
If there a multiple threads trying to acquire the lock, the cost of
mutex can be accredited to the following:
1. Kernel scheduling overhead cost
2. Context switch overhead cost
Kernel scheduling overhead
What happens to other threads, if one thread has already acquired lock on a mutex?
The other threads will continue. If any other thread(s) attempting to lock a mutex that is already locked, OS will (re)schedule the other thread(s) to wait. As soon as the original thread unlocks the mutex, kernel will wake up one of the threads waiting on the mutex.
Context switch overhead
User space code should be designed in such a manner that a thread should spend a very less time trying to lock on a mutex. If you have multiple thread trying to acquire lock on a mutex at multiple places, it may result in a disaster and the performance may be as poor as a single thread serving all requests.
Can the executing hardware thread be resheduled to do something
computationally more expensive?
If I am getting your question correctly, the thread which has acquired the lock can be context switched, depending on the scheduling mechanism. But, that is an overhead of multi-threaded programming, itself.
Can you provide a use case, to define this problem clearly?
who checks if the mutex gets unlocked?
Definitely the OS scheduler. Note, it is not just a blind sleep().

Threads are just a logical OS concept. There are no "hardware threads". Hardware has cores. The OS schedules a core to run a thread for a certain amount of time. If a thread gets blocked, there are always plenty left to run.
Taking your example, with mutexes, if thread 2 is blocked, the OS takes it off schedule and puts it in a queue associated with the mutex. When thread 1 releases the lock, it notifies the scheduler, which takes thread 2 off the queue and puts it back on the schedule. A blocked thread isn't using compute resources. However, there is overhead involved in the actual lock/unlock operation, which is an OS scheduling call.
That overhead is not insignificant, so you would generally use mutexes if you have longer tasks (reasonably longer than a scheduling time slice) and not too much lock competition.
So if a lock goes out of scope, there is some code in the destructor that tells the OS that the locked mutex is now unlocked?
Blockquote
If a std::mutex goes out of scope while locked, that is undefined behavior. (https://en.cppreference.com/w/cpp/thread/mutex/~mutex) Even with non-std mutex implementations, it's reasonable to expect one to unlock before going out of scope.
Keep in mind that there are other kinds of "lock" (like spinlock...which itself has many versions) but we're only talking about mutexes here.

Bottleneck in Threads C++

So I am just trying to verify my understanding and hope that you guys will be able to clear up any misunderstandings. So essentially I have two threads which use the same lock and perform calculations when they hold the lock but the interesting thing is that within the lock I will cause the thread to sleep for a short time. For both threads, this sleep time will be slightly different for either thread. Because of the way locks work, wont the faster thread be bottlenecked by the slower thread as it will have to wait for it to complete?
For example:
Thread1() {
lock();
usleep(10)
lock();
}
-
Thread2() {
lock();
sleep(100)
lock();
}
Now because Thread2 holds onto the lock longer, this will cause a bottleneck. And just to be sure, this system should have a back and forth happens on who gets the lock, right?
It should be:
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
and so on, right? Thread1 should never be able to acquire the lock right after it releases it, can it?

Thread1 should never be able to acquire the lock right after it releases it, can it?
No, Thread1 could reacquire the lock, right after it releases it, because Thread2 could still be suspended (sleeps because of the scheduler)
Also sleep only guarantees that the thread will sleep at least the wanted amount, it can and will often be more.
In practice you would not hold a lock while calculating a value, you would get the lock, get the needed values for calculation, unlock, calculate it, and then get the lock again, check if the old values for the calculation are still valid/wanted, and then store/return your calculated results.
For this purpose, the std::future and atomic data types were invented.
...this system should have a back and forth happens on who gets the lock, right?
Mostly The most of the time it will be a back and forth but some times there could/will be two lock/unlock cycles by Thread1. It depends on your scheduler and any execution and cycle will probably vary.

Absolutely nothing prevents either thread from immediately reacquiring the lock after releasing it. I have no idea what you think prevents this from happening, but nothing does.
In fact, in many implementations, a thread that is already running has an advantage in acquiring a lock over threads that have to be made ready-to-run. This is a sensible optimization to minimize context switches.
If you're using a sleep as a way to simulate work and think this represents some real world issue with lock fairness, you are wrong. Threads that sleep are voluntarily yielding the remainder of their timeslice and are treated very differently from threads that exhaust their timeslice doing work. If these threads were actually doing work, eventually one thread would exhaust its timeslice.

Depending on what you are trying to achieve there are several possibilities.
If you want your threads to run in a particular order then have a look here.
There are basically 2 options:
- one is to use events where a thread is signaling the next one it has done his job and so the next one could start.
- the other one is to have a scheduler thread that handle the ordering with events or semaphores.
If you want your threads to run independently but have a lock mechanism where the order of attempting to get the lock is preserved you can have a look here. The last part of the answer uses a queue of one condition variable per thread seem good.
And as it was said in previous answers and comments, using sleep for scheduling is a bad idea.
Also lock is just a mutual exclusion mechanism and has no guarentee on the execution order.
A lock is usually intended for preventing concurrent access on a critical resource so it should just do that. The smaller the critical section is, the better.
Finally yes trying to order threads is making "bottlenecks". In this particular case if all calculations are made in the locked sections the threads won't do anything in parallel so you can question the utility of using threads.
Edit :
Just on more warning: be careful, with threads it's not because is worked (was scheduled as you wanted to) 10 times on your machine that it always will, especially if you change any of the context (machine, workload...). You have to be sure of it by design.

How to prevent threads from starvation in C++11

I am just wondering if there is any locking policy in C++11 which would prevent threads from starvation.
I have a bunch of threads which are competing for one mutex. Now, my problem is that the thread which is leaving a critical section starts immediately compete for the same mutex and most of the time wins. Therefore other threads waiting on the mutex are starving.
I do not want to let the thread, leaving a critical section, sleep for some minimal amount of time to give other threads a chance to lock the mutex.
I thought that there must be some parameter which would enable fair locking for threads waiting on the mutex but I wasn't able to find any appropriate solution.
Well I found std::this_thread::yield() function, which suppose to reschedule the order of threads execution, but it is only hint to scheduler thread and depends on scheduler thread implementation if it reschedule the threads or not.
Is there any way how to provide fair locking policy for the threads waiting on the same mutex in C++11?
What are the usual strategies?
Thanks

This is a common optimization in mutexes designed to avoid wasting time switching tasks when the same thread can take the mutex again. If the current thread still has time left in its time slice then you get more throughput in terms of user-instructions-executed-per-second by letting it take the mutex rather than suspending it, and switching to another thread (which likely causes a big reload of cache lines and various other delays).
If you have so much contention on a mutex that this is a problem then your application design is wrong. You have all these threads blocked on a mutex, and therefore not doing anything: you are probably better off without so many threads.
You should design your application so that if multiple threads compete for a mutex then it doesn't matter which thread gets the lock. Direct contention should also be a rare thing, especially direct contention with lots of threads.
The only situation where I can think this is an OK scenario is where every thread is waiting on a condition variable, which is then broadcast to wake them all. Every thread will then contend for the mutex, but if you are doing this right then they should all do a quick check that this isn't a spurious wake and then release the mutex. Even then, this is called a "thundering herd" situation, and is not ideal, precisely because it serializes all these threads.

Conditional wait overhead

When using boost::conditional_variable, ACE_Conditional or directly pthread_cond_wait, is there any overhead for the waiting itself? These are more specific issues that trouble be:
After the waiting thread is unscheduled, will it be scheduled back before the wait expires and then unscheduled again or it will stay unscheduled until signaled?
Does wait acquires periodically the mutex? In this case, I guess it wastes each iteration some CPU time on system calls to lock and release the mutex. Is it the same as constantly acquiring and releasing a mutex?
Also, then, how much time passes between the signal and the return from wait?
Afaik, when using semaphores the acquire calls responsiveness is dependent on scheduler time slice size. How does it work in pthread_cond_wait? I assume this is platform dependent. I am more interested in Linux but if someone knows how it works on other platforms, it will help too.
And one more question: are there any additional system resources allocated for each conditional? I won't create 30000 mutexes in my code, but should I worry about 30000 conditionals that use the same one mutex?

Here's what is written in the pthread_cond man page:
pthread_cond_wait atomically unlocks the mutex and waits for the condition variable cond to be signaled. The thread execution is suspended and does not consume any CPU time until the condition variable is signaled.
So from here I'd answer to the questions as following:
The waiting thread won't be scheduled back before the wait was signaled or canceled.
There are no periodic mutex acquisitions. The mutex is reacquired only once before wait returns.
The time that passes between the signal and the wait return is similar to that of thread scheduling due to mutex release.
Regarding the resources, on the same man page:
In the LinuxThreads implementation, no resources are associated with condition variables, thus pthread_cond_destroy actually does nothing except checking that the condition has no waiting threads.
Update: I dug into the sources of pthread_cond_* functions and the behavior is as follows:
All the pthread conditionals in Linux are implemented using futex.
When a thread calls wait it is suspended and unscheduled. The thread id is inserted at the tail of a list of waiting threads.
When a thread calls signal the thread at the head of the list is scheduled back.
So, the waking is as efficient as the scheduler, no OS resources are consumed and the only memory overhead is the size of the waiting list (see futex_wake function).

You should only call pthread_cond_wait if the variable is already in the "wrong" state. Since it always waits, there is always the overhead associated with putting the current thread to sleep and switching.
When the thread is unscheduled, it is unscheduled. It should not use any resources, but of course an OS can in theory be implemented badly. It is allowed to re-acquire the mutex, and even to return, before the signal (which is why you must double-check the condition), but the OS will be implemented so this doesn't impact performance much, if it happens at all. It doesn't happen spontaneously, but rather in response to another, possibly-unrelated signal.
30000 mutexes shouldn't be a problem, but some OSes might have a problem with 30000 sleeping threads.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js