lock vs (try_lock, sleep, repeat) performance

lock vs (try_lock, sleep, repeat) performance - c++

I am updating some code and I came across several mutexs that were using the lines:
while (!mutex_.try_lock())
sleep_for(milliseconds(1));
instead of just locking the mutex straight away:
mutex_.lock();
Is there any performance difference positive or negative to using the try lock and sleep approach vs straight locking the mutex or is it just wasted instructions?

lock() will block if the lock is not available, while try_lock() returns straight away even if the lock is not available.
The first form polls the lock until it is available, which has a couple of disadvantages:
Depending on the delay in the loop, there can be unnecessary latency. The lock might become available just after the try_lock attempt, but your process still waits for the preset delay.
It is a waste of CPU cycles polling the lock
In general you should only use try_lock if there is something useful that the thread can do while it is waiting for the lock, in which case you would not want the thread to block waiting for the lock.
Another use case for try_lock would be to attempt locking with a timeout, so that if the lock did not become available within a certain time the program could abort.
cppreference says that:
lock() is usually not called directly: std::unique_lock and std::lock_guard are used to manage exclusive locking.
Using these classes may be more appropriate depending on the circumstances.

Related

How do I lock multiple mutexes without deadlock and without potential infinite blocking?

I'm trying to lock multiple mutexes without a deadlock while also making sure no threads block forever. From what I understand, std::scoped_lock() calls std::lock() to acquire its locks--which is blocking. I'm worried about the poor, unlucky thread that never catches a break and always fails to acquire all of the locks when multiple locks are needed amidst a lot of concurrent access.
I get that critical sections should be as small and fast as possible; one needs to get in and out as quickly as possible to avoid long blocking times. But surely even locking well-written code and critical sections can stress the mutex resources given enough concurrent requests.
I looked at using std::timed_mutex with std::scoped_lock(), but std::timed_mutex::lock() blocks and doesn't make use of the mutex's timed capability (would require std::timed_mutex::try_lock_for() or std::timed_mutex::try_lock_until(), which as far as I know isn't used by std::scoped_lock()).
So is there a way to lock multiple mutexes and guarantee that no thread will block forever while waiting to lock the mutexes (even if that means it doesn't acquire any locks)? Is there a non-blocking version of std::scoped_lock()? Can you break a blocked std::scoped_lock() call? Should I expect the scenario of a thread blocking forever to be so rare that it's an acceptable risk? Am I thinking too hard?

How expensive is a blocked mutex?

Say I have a mutex and thread 1 locked the mutex. Now, thread 2 tries to acquire the lock but it is blocked, say for a couple of seconds. How expensive is this blocked thread? Can the executing hardware thread be rescheduled to do something computationally more expensive? If yes, then who checks if the mutex gets unlocked?
EDIT: Ok so I try to reformulate what I wanted to ask.
What I dont really understand is how the following works. thread 2 got blocked, so what exactly does thread 2 do? From the answer it seems like it is not just constantly checking if the mutex gets unlocked. If this were the case, I would consider a blocked thread expensive, as I am using one of my hardware threads just for checking if some boolean value changes.
So am I correct in thinking that when the mutex gets released by thread 1, thread 1 notifies the sheduler and the shedular assigns a hardware thread to execute thread 2 which is waiting?

I am reading your questions as:
How expensive is a locked mutex?
Mutex can be considered as an integer in memory.
A thread trying to lock on a mutex has to read the existing state of the mutex and can set it depending on the value read.
test_and_set( &mutex_value, 0, 1 ); // if mutex_value is 0, set to 1
The trick is that both the read and write (also called test-and-set) operation should be atomic. The atomicity is achieved with CPU support.
However, the test-and-set operation doesn't offer any mechanism to block/wait.
CPU has no knowledge of threads blocking on a mutex. The OS takes the responsibility to manage the blocking by providing system calls to users. The implementation varies from OS to OS. In case of Linux, you can consider futex or pthreads as an example.
The overall costs of using a mutex sums up to the test-and-set operation and the system calls used to implement the mutex.
The test-and set operation is almost constant and is insignificant compared to the cost the other operation can amount to.
If there a multiple threads trying to acquire the lock, the cost of
mutex can be accredited to the following:
1. Kernel scheduling overhead cost
2. Context switch overhead cost
Kernel scheduling overhead
What happens to other threads, if one thread has already acquired lock on a mutex?
The other threads will continue. If any other thread(s) attempting to lock a mutex that is already locked, OS will (re)schedule the other thread(s) to wait. As soon as the original thread unlocks the mutex, kernel will wake up one of the threads waiting on the mutex.
Context switch overhead
User space code should be designed in such a manner that a thread should spend a very less time trying to lock on a mutex. If you have multiple thread trying to acquire lock on a mutex at multiple places, it may result in a disaster and the performance may be as poor as a single thread serving all requests.
Can the executing hardware thread be resheduled to do something
computationally more expensive?
If I am getting your question correctly, the thread which has acquired the lock can be context switched, depending on the scheduling mechanism. But, that is an overhead of multi-threaded programming, itself.
Can you provide a use case, to define this problem clearly?
who checks if the mutex gets unlocked?
Definitely the OS scheduler. Note, it is not just a blind sleep().

Threads are just a logical OS concept. There are no "hardware threads". Hardware has cores. The OS schedules a core to run a thread for a certain amount of time. If a thread gets blocked, there are always plenty left to run.
Taking your example, with mutexes, if thread 2 is blocked, the OS takes it off schedule and puts it in a queue associated with the mutex. When thread 1 releases the lock, it notifies the scheduler, which takes thread 2 off the queue and puts it back on the schedule. A blocked thread isn't using compute resources. However, there is overhead involved in the actual lock/unlock operation, which is an OS scheduling call.
That overhead is not insignificant, so you would generally use mutexes if you have longer tasks (reasonably longer than a scheduling time slice) and not too much lock competition.
So if a lock goes out of scope, there is some code in the destructor that tells the OS that the locked mutex is now unlocked?
Blockquote
If a std::mutex goes out of scope while locked, that is undefined behavior. (https://en.cppreference.com/w/cpp/thread/mutex/~mutex) Even with non-std mutex implementations, it's reasonable to expect one to unlock before going out of scope.
Keep in mind that there are other kinds of "lock" (like spinlock...which itself has many versions) but we're only talking about mutexes here.

Bottleneck in Threads C++

So I am just trying to verify my understanding and hope that you guys will be able to clear up any misunderstandings. So essentially I have two threads which use the same lock and perform calculations when they hold the lock but the interesting thing is that within the lock I will cause the thread to sleep for a short time. For both threads, this sleep time will be slightly different for either thread. Because of the way locks work, wont the faster thread be bottlenecked by the slower thread as it will have to wait for it to complete?
For example:
Thread1() {
lock();
usleep(10)
lock();
}
-
Thread2() {
lock();
sleep(100)
lock();
}
Now because Thread2 holds onto the lock longer, this will cause a bottleneck. And just to be sure, this system should have a back and forth happens on who gets the lock, right?
It should be:
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
and so on, right? Thread1 should never be able to acquire the lock right after it releases it, can it?

Thread1 should never be able to acquire the lock right after it releases it, can it?
No, Thread1 could reacquire the lock, right after it releases it, because Thread2 could still be suspended (sleeps because of the scheduler)
Also sleep only guarantees that the thread will sleep at least the wanted amount, it can and will often be more.
In practice you would not hold a lock while calculating a value, you would get the lock, get the needed values for calculation, unlock, calculate it, and then get the lock again, check if the old values for the calculation are still valid/wanted, and then store/return your calculated results.
For this purpose, the std::future and atomic data types were invented.
...this system should have a back and forth happens on who gets the lock, right?
Mostly The most of the time it will be a back and forth but some times there could/will be two lock/unlock cycles by Thread1. It depends on your scheduler and any execution and cycle will probably vary.

Absolutely nothing prevents either thread from immediately reacquiring the lock after releasing it. I have no idea what you think prevents this from happening, but nothing does.
In fact, in many implementations, a thread that is already running has an advantage in acquiring a lock over threads that have to be made ready-to-run. This is a sensible optimization to minimize context switches.
If you're using a sleep as a way to simulate work and think this represents some real world issue with lock fairness, you are wrong. Threads that sleep are voluntarily yielding the remainder of their timeslice and are treated very differently from threads that exhaust their timeslice doing work. If these threads were actually doing work, eventually one thread would exhaust its timeslice.

Depending on what you are trying to achieve there are several possibilities.
If you want your threads to run in a particular order then have a look here.
There are basically 2 options:
- one is to use events where a thread is signaling the next one it has done his job and so the next one could start.
- the other one is to have a scheduler thread that handle the ordering with events or semaphores.
If you want your threads to run independently but have a lock mechanism where the order of attempting to get the lock is preserved you can have a look here. The last part of the answer uses a queue of one condition variable per thread seem good.
And as it was said in previous answers and comments, using sleep for scheduling is a bad idea.
Also lock is just a mutual exclusion mechanism and has no guarentee on the execution order.
A lock is usually intended for preventing concurrent access on a critical resource so it should just do that. The smaller the critical section is, the better.
Finally yes trying to order threads is making "bottlenecks". In this particular case if all calculations are made in the locked sections the threads won't do anything in parallel so you can question the utility of using threads.
Edit :
Just on more warning: be careful, with threads it's not because is worked (was scheduled as you wanted to) 10 times on your machine that it always will, especially if you change any of the context (machine, workload...). You have to be sure of it by design.

std::mutex::lock blocking CPU usage

I want to be able to freeze and unfreeze a thread at will.
My current solution is done through callbacks and busy waiting with sleep.
This is obviously not an optimal solution.
I'm considering having the main thread lock a mutex, then have the slave thread run a function that locks and unlocks the same mutex.
My worry is the possible CPU usage if it's a true busy wait.
My question, as such, is: how does STL in C++11 specify "blocking", and if it is a busy wait, are there less CPU intensive solutions (e.g. pthreads)?

While mutexes can be used, it is not an optimal solution because mutexes should be used for resource protection (see also this q/a).
Usually, std::condition_variable instances are what you should be looking for.
It works as follows:
Create an instance of std::condition_variable and distribute it to your controlling thread and your controlled thread
In the controlled thread, create a std::unique_lock. Pass it to one of the condition variable's wait methods
In the controlling thread, invoke one of the notify methods on the condition variable.
Hope that helps.

Have a look at this answer: Multithreading, when to yield versus sleep. Locking a mutex (in the manner you've described), is a reasonable solution to your problem.
Here's an MSDN article that's worth a read. Quote:
Until threads that are suspended or blocked become ready to run, the
scheduler does not allocate any processor time to them, regardless of
their priority.
If a thread isn't being scheduled it's not being run.

How to prevent threads from starvation in C++11

I am just wondering if there is any locking policy in C++11 which would prevent threads from starvation.
I have a bunch of threads which are competing for one mutex. Now, my problem is that the thread which is leaving a critical section starts immediately compete for the same mutex and most of the time wins. Therefore other threads waiting on the mutex are starving.
I do not want to let the thread, leaving a critical section, sleep for some minimal amount of time to give other threads a chance to lock the mutex.
I thought that there must be some parameter which would enable fair locking for threads waiting on the mutex but I wasn't able to find any appropriate solution.
Well I found std::this_thread::yield() function, which suppose to reschedule the order of threads execution, but it is only hint to scheduler thread and depends on scheduler thread implementation if it reschedule the threads or not.
Is there any way how to provide fair locking policy for the threads waiting on the same mutex in C++11?
What are the usual strategies?
Thanks

This is a common optimization in mutexes designed to avoid wasting time switching tasks when the same thread can take the mutex again. If the current thread still has time left in its time slice then you get more throughput in terms of user-instructions-executed-per-second by letting it take the mutex rather than suspending it, and switching to another thread (which likely causes a big reload of cache lines and various other delays).
If you have so much contention on a mutex that this is a problem then your application design is wrong. You have all these threads blocked on a mutex, and therefore not doing anything: you are probably better off without so many threads.
You should design your application so that if multiple threads compete for a mutex then it doesn't matter which thread gets the lock. Direct contention should also be a rare thing, especially direct contention with lots of threads.
The only situation where I can think this is an OK scenario is where every thread is waiting on a condition variable, which is then broadcast to wake them all. Every thread will then contend for the mutex, but if you are doing this right then they should all do a quick check that this isn't a spurious wake and then release the mutex. Even then, this is called a "thundering herd" situation, and is not ideal, precisely because it serializes all these threads.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js