boost interprocess condition blocking on notify_all - c++

I have a managed shared memory segment which contains a boost::interprocess::interprocess_mutex and a boost::interprocess::interprocess_condition variable. Two processes access the shared memory and synchronize access using the mutex and condition. I have come across a case where my first process blocks in the notify_all method. Initially I thought this was a non-blocking method, but it seems the interprocess condition implements an internal mutex which is used to synchronize itself.
The case where I get this deadlock is when process 2 is killed ungracefully while it is waiting on the condition. I believe this prevents the condition's internal mutex from being unlocked, so when I run process 2 again it blocks. Is there any way to reset or clean up the interprocess condition the second time I start process 2?

http://www.boost.org/doc/libs/1_48_0/boost/interprocess/sync/interprocess_mutex.hpp
Are you using a timed lock?
For a simple deadlock-avoidance algorithm, take a look here: Wikipedia
It's for threads, but I believe it can be used with interprocess locks.
Recursively, only one thread is allowed to pass through a lock. If other threads enter the lock, they must wait until the initial thread that passed through completes n number of times. But if the number of threads that enter locking equals the number that are locked, assign one thread as the super-thread, and only allow it to run (tracking the number of times it enters/exits locking) until it completes.
After a super-thread is finished, the condition changes back to using the logic from the recursive lock: the exiting super-thread sets itself as not being a super-thread and notifies the locker that other locked, waiting threads need to re-check this condition.
If a deadlock scenario exists, set a new super-thread and follow that logic. Otherwise, resume regular locking.
Note that the above algorithm doesn't solve livelock situations, to prevent such behaviour use a semaphore if possible.
I was stunned to notice that interprocess_mutex doesn't implement any deadlock-avoidance algorithm, since these days most mutexes, e.g. std::mutex and boost::mutex, already do.
I guess it's down to OS-specific limitations.
For more flexibility, try using a named_upgradable_mutex.
Use a timed lock, handle the failure when the other process has crashed, and remove the upgradable mutex. This type also allows elevated privileges to be obtained by either thread!
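A minimal sketch of that recovery idea, assuming a named_upgradable_mutex under the hypothetical name "my_resource_mutex" (note that timed_lock reports a timeout by returning false rather than by throwing):

#include <boost/interprocess/sync/named_upgradable_mutex.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

namespace bip = boost::interprocess;

bool lock_or_recover() {
    bip::named_upgradable_mutex mtx(bip::open_or_create, "my_resource_mutex");
    boost::posix_time::ptime deadline =
        boost::posix_time::microsec_clock::universal_time() +
        boost::posix_time::seconds(5);
    if (mtx.timed_lock(deadline)) {
        // ... critical section ...
        mtx.unlock();
        return true;
    }
    // Assume the previous holder died while locked: remove the named
    // mutex so the next open_or_create starts from a clean state.
    bip::named_upgradable_mutex::remove("my_resource_mutex");
    return false;
}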

Related

How expensive is a blocked mutex?

Say I have a mutex and thread 1 locked the mutex. Now, thread 2 tries to acquire the lock but is blocked, say for a couple of seconds. How expensive is this blocked thread? Can the executing hardware thread be rescheduled to do something computationally more expensive? If yes, then who checks whether the mutex gets unlocked?
EDIT: OK, let me try to reformulate what I wanted to ask.
What I don't really understand is how the following works. Thread 2 gets blocked, so what exactly does thread 2 do? From the answer it seems like it is not just constantly checking whether the mutex gets unlocked. If that were the case, I would consider a blocked thread expensive, as I would be using one of my hardware threads just to check whether some boolean value changes.
So am I correct in thinking that when the mutex gets released by thread 1, thread 1 notifies the scheduler, and the scheduler assigns a hardware thread to execute the waiting thread 2?
I am reading your questions as:
How expensive is a locked mutex?
A mutex can be considered as an integer in memory.
A thread trying to lock a mutex has to read the existing state of the mutex and can set it depending on the value read.
test_and_set( &mutex_value, 0, 1 ); // if mutex_value is 0, set to 1
The trick is that both the read and write (also called test-and-set) operation should be atomic. The atomicity is achieved with CPU support.
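As an illustration only (this is not how std::mutex is implemented), a test-and-set spin lock can be sketched with std::atomic:

#include <atomic>

std::atomic<int> mutex_value{0}; // 0 = unlocked, 1 = locked

void spin_lock() {
    int expected = 0;
    // Atomically: if mutex_value is 0, set it to 1; otherwise retry.
    while (!mutex_value.compare_exchange_weak(expected, 1))
        expected = 0; // compare_exchange overwrites expected on failure
}

void spin_unlock() {
    mutex_value.store(0);
}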
However, the test-and-set operation doesn't offer any mechanism to block/wait.
The CPU has no knowledge of threads blocking on a mutex. The OS takes responsibility for managing the blocking by providing system calls to users. The implementation varies from OS to OS. In the case of Linux, you can consider futex or pthreads as an example.
The overall cost of using a mutex sums up to the test-and-set operation and the system calls used to implement the mutex.
The test-and-set operation is almost constant-time and is insignificant compared to the cost the system calls can amount to.
If there are multiple threads trying to acquire the lock, the cost of the mutex can be attributed to the following:
1. Kernel scheduling overhead cost
2. Context switch overhead cost
Kernel scheduling overhead
What happens to the other threads if one thread has already acquired a lock on a mutex?
The other threads will continue. If any other threads attempt to lock a mutex that is already locked, the OS will (re)schedule them to wait. As soon as the original thread unlocks the mutex, the kernel will wake up one of the threads waiting on the mutex.
Context switch overhead
User-space code should be designed in such a manner that a thread spends very little time trying to lock a mutex. If you have multiple threads trying to acquire the lock on a mutex in multiple places, it may result in a disaster, and performance may be as poor as a single thread serving all requests.
Can the executing hardware thread be rescheduled to do something computationally more expensive?
If I am getting your question correctly, the thread which has acquired the lock can be context-switched out, depending on the scheduling mechanism. But that is an overhead of multi-threaded programming itself.
Can you provide a use case, to define this problem clearly?
who checks if the mutex gets unlocked?
Definitely the OS scheduler. Note, it is not just a blind sleep().
Threads are just a logical OS concept. There are no "hardware threads". Hardware has cores. The OS schedules a core to run a thread for a certain amount of time. If a thread gets blocked, there are always plenty left to run.
Taking your example, with mutexes, if thread 2 is blocked, the OS takes it off schedule and puts it in a queue associated with the mutex. When thread 1 releases the lock, it notifies the scheduler, which takes thread 2 off the queue and puts it back on the schedule. A blocked thread isn't using compute resources. However, there is overhead involved in the actual lock/unlock operation, which is an OS scheduling call.
That overhead is not insignificant, so you would generally use mutexes if you have longer tasks (reasonably longer than a scheduling time slice) and not too much lock competition.
So if a lock goes out of scope, there is some code in the destructor that tells the OS that the locked mutex is now unlocked?
If a std::mutex goes out of scope while locked, that is undefined behavior. (https://en.cppreference.com/w/cpp/thread/mutex/~mutex) Even with non-std mutex implementations, it's reasonable to expect one to unlock before going out of scope.
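Roughly, yes, for RAII wrappers: with std::lock_guard the unlock happens in the wrapper's destructor, while the mutex itself is a separate object that must not be destroyed while locked. A minimal sketch:

#include <mutex>

std::mutex m;

void worker() {
    std::lock_guard<std::mutex> guard(m); // constructor calls m.lock()
    // ... critical section ...
}   // guard's destructor calls m.unlock(), which may involve an OS
    // call if other threads are blocked waiting on m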
Keep in mind that there are other kinds of "lock" (like spinlock...which itself has many versions) but we're only talking about mutexes here.

Bottleneck in Threads C++

So I am just trying to verify my understanding and hope that you guys will be able to clear up any misunderstandings. Essentially, I have two threads which use the same lock and perform calculations while they hold it, but the interesting thing is that within the lock I cause the thread to sleep for a short time, and this sleep time is slightly different for each thread. Because of the way locks work, won't the faster thread be bottlenecked by the slower thread, as it will have to wait for it to complete?
For example:
Thread1() {
    lock();
    usleep(10);
    unlock();
}

Thread2() {
    lock();
    sleep(100);
    unlock();
}
Now, because Thread2 holds onto the lock longer, this will cause a bottleneck. And just to be sure, this system should have a back-and-forth on who gets the lock, right?
It should be:
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
Thread1 gets lock
Thread1 releases lock
Thread2 gets lock
Thread2 releases lock
and so on, right? Thread1 should never be able to acquire the lock right after it releases it, can it?
Thread1 should never be able to acquire the lock right after it releases it, can it?
No. Thread1 could reacquire the lock right after it releases it, because Thread2 could still be suspended (sleeping because of the scheduler).
Also, sleep only guarantees that the thread will sleep for at least the requested amount; it can, and often will, be more.
In practice you would not hold a lock while calculating a value: you would take the lock, copy the values needed for the calculation, unlock, do the calculation, then take the lock again, check whether the old input values are still valid/wanted, and only then store/return your calculated result.
For this purpose, the std::future and atomic data types were invented.
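A minimal sketch of that pattern, with hypothetical shared variables input and result:

#include <mutex>

std::mutex m;
int input = 0;  // shared input (hypothetical)
int result = 0; // shared result (hypothetical)

void compute_outside_lock() {
    int local;
    {
        std::lock_guard<std::mutex> g(m);
        local = input;            // copy what the calculation needs
    }                             // unlock before the expensive part

    int computed = local * local; // stand-in for a long calculation

    std::lock_guard<std::mutex> g(m);
    if (input == local)           // old value still valid/wanted?
        result = computed;        // publish only if nothing changed
}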
...this system should have a back and forth happens on who gets the lock, right?
Mostly. Most of the time it will be a back and forth, but sometimes there could/will be two consecutive lock/unlock cycles by Thread1. It depends on your scheduler, and every execution/cycle will probably vary.
Absolutely nothing prevents either thread from immediately reacquiring the lock after releasing it. I have no idea what you think prevents this from happening, but nothing does.
In fact, in many implementations, a thread that is already running has an advantage in acquiring a lock over threads that have to be made ready-to-run. This is a sensible optimization to minimize context switches.
If you're using a sleep as a way to simulate work and think this represents some real world issue with lock fairness, you are wrong. Threads that sleep are voluntarily yielding the remainder of their timeslice and are treated very differently from threads that exhaust their timeslice doing work. If these threads were actually doing work, eventually one thread would exhaust its timeslice.
Depending on what you are trying to achieve there are several possibilities.
If you want your threads to run in a particular order then have a look here.
There are basically two options (a condition-variable sketch of the first follows below):
- one is to use events, where a thread signals the next one that it has done its job, so the next one can start.
- the other is to have a scheduler thread that handles the ordering with events or semaphores.
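A minimal sketch of the first option, using a condition variable to pass turns between two threads (the turn protocol is made up for illustration):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
int turn = 1; // whose turn it is

void worker(int id, int next) {
    for (int i = 0; i < 3; ++i) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return turn == id; }); // wait for our turn
        std::cout << "thread " << id << " does its job\n";
        turn = next;     // signal the next thread that we are done
        cv.notify_all();
    }
}

int main() {
    std::thread t1(worker, 1, 2);
    std::thread t2(worker, 2, 1);
    t1.join();
    t2.join();
}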
If you want your threads to run independently but want a lock mechanism where the order of attempting to get the lock is preserved, you can have a look here. The last part of that answer, which uses a queue of one condition variable per thread, seems like a good approach.
And as was said in previous answers and comments, using sleep for scheduling is a bad idea.
Also, a lock is just a mutual-exclusion mechanism and gives no guarantee on execution order.
A lock is usually intended to prevent concurrent access to a critical resource, so it should do just that. The smaller the critical section is, the better.
Finally, yes, trying to order threads creates "bottlenecks". In this particular case, if all calculations are made in the locked sections, the threads won't do anything in parallel, so you can question the utility of using threads at all.
Edit:
Just one more warning: be careful with threads. Just because it worked (was scheduled as you wanted) 10 times on your machine does not mean it always will, especially if you change any part of the context (machine, workload...). You have to be sure of it by design.

lock vs (try_lock, sleep, repeat) performance

I am updating some code and I came across several mutexes that were using the lines:
while (!mutex_.try_lock())
sleep_for(milliseconds(1));
instead of just locking the mutex straight away:
mutex_.lock();
Is there any performance difference, positive or negative, between the try_lock-and-sleep approach and locking the mutex directly, or is it just wasted instructions?
lock() will block if the lock is not available, while try_lock() returns straight away even if the lock is not available.
The first form polls the lock until it is available, which has a couple of disadvantages:
Depending on the delay in the loop, there can be unnecessary latency. The lock might become available just after the try_lock attempt, but your process still waits for the preset delay.
It is a waste of CPU cycles to poll the lock.
In general you should only use try_lock if there is something useful that the thread can do while it is waiting for the lock, in which case you would not want the thread to block waiting for the lock.
Another use case for try_lock would be to attempt locking with a timeout, so that if the lock did not become available within a certain time the program could abort.
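If a timeout is really what the polling loop was after, std::timed_mutex provides one directly; a minimal sketch:

#include <chrono>
#include <mutex>

std::timed_mutex tm;

bool do_work_or_give_up() {
    // Block for at most 100 ms instead of polling in a sleep loop.
    if (!tm.try_lock_for(std::chrono::milliseconds(100)))
        return false; // lock not acquired within the deadline
    // ... critical section ...
    tm.unlock();
    return true;
}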
cppreference says that:
lock() is usually not called directly: std::unique_lock and std::lock_guard are used to manage exclusive locking.
Using these classes may be more appropriate depending on the circumstances.

boost: how to monitor status of mutex and force release on deadlock [2]

I am trying to use the shared_lock and unique_lock classes from boost to implement a basic reader-writer lock on a resource. However, some of the threads accessing the resource have the potential to simply crash. I want to create another process that, given a mutex, monitors the mutex and keeps track of which processes locked the resource and how long each process has held the lock. It would also force a process to release its lock if it has held the lock for more than a given period of time.
Even though the boost locks are all scoped locks and will automatically unlock once they go out of scope, that still doesn't solve my problem if the server crashes, which sends SIGSEGV to the process and kills it. The killed process will not call any of its destructors and thus will not release any of its held resources.
One potential solution is to somehow put a timer on the lock so that the process is forced to release the lock after a given period of time. Even though this goes against the concept of locking, it works in our case because we can guarantee that if any process holds the lock for more than, say, 5 minutes, then it's pretty safe to say that the process was either killed or there is a deadlock situation.
Any suggestions on how to approach this problem are greatly appreciated!
My previous thread was closed due to "possible duplicate", but the stated duplicate question does not answer my question.
boost: how to monitor status of mutex and force release on deadlock
Putting aside whether this is a good idea or not, you could roll your own mutex implementation that utilizes shared memory to store a timestamp, a process identifier, and a thread identifier.
When a thread wants to take the lock, it will need to find an empty slot in the shared memory and use an atomic test-and-set operation, such as InterlockedCompareExchange on Windows, to set the process id if the current value is the empty value. If the set doesn't occur, it will need to start over. After getting the process id set, the thread will need to repeat the process for the thread identifier, and then do the same thing with the timestamp (it can't just set it, though; it still needs to be done atomically).
The thread will then need to check all of the other filled slots to determine whether it has the lowest timestamp. If not, it needs to make note of the slot that has the lowest timestamp and poll it until it is either emptied, has a higher timestamp, or has timed out. Then rinse and repeat until the thread holds the slot with the oldest timestamp, at which point it has acquired the lock.
If another slot has timed out, the thread should trigger the timeout handler (which may kill the other process or simply raise an exception in the thread with the lock) and then use atomic test-and-set operations to clear the slot.
When the thread with the lock unlocks, it uses atomic test-and-set operations to clear its slot.
Update: ties among the lowest timestamps would also need to be dealt with to avoid a possible deadlock, and the handling of that would need to avoid creating a race condition.
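A rough sketch of the slot-claiming step, using std::atomic compare-and-swap in place of InterlockedCompareExchange; the LockSlot layout and table size are hypothetical, and the slots are assumed to live in already-mapped shared memory with lock-free atomics:

#include <atomic>
#include <cstdint>
#include <ctime>

struct LockSlot {
    std::atomic<std::uint32_t> pid;       // 0 means the slot is empty
    std::atomic<std::uint32_t> tid;
    std::atomic<std::int64_t>  timestamp; // seconds since the epoch
};

constexpr int kSlots = 16; // hypothetical table size

// Try to claim an empty slot; returns the slot index, or -1 if full.
int claim_slot(LockSlot* slots, std::uint32_t my_pid, std::uint32_t my_tid) {
    for (int i = 0; i < kSlots; ++i) {
        std::uint32_t expected = 0;
        // Atomic test-and-set: succeeds only if the slot was empty.
        if (slots[i].pid.compare_exchange_strong(expected, my_pid)) {
            slots[i].tid.store(my_tid);
            slots[i].timestamp.store(
                static_cast<std::int64_t>(std::time(nullptr)));
            return i;
        }
    }
    return -1; // no empty slot, caller starts over
}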
@Arno: I disagree that the software needs to be so robust that it should not crash in the first place. Fault-tolerant systems (think along the lines of five nines of availability) need to have checks in place to recover in the face of sudden termination of critical processes. Something along the lines of pthread_mutexattr_setrobust.
Saving the owner pid and the last-used timestamp for the mutex should help in recovery.
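For reference, a minimal sketch of the POSIX robust-mutex recovery path (placing the mutex in shared memory for cross-process use is omitted here):

#include <cerrno>
#include <pthread.h>

pthread_mutex_t mtx; // in real use this would live in shared memory

void init_robust_mutex() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&mtx, &attr);
    pthread_mutexattr_destroy(&attr);
}

void lock_with_recovery() {
    int rc = pthread_mutex_lock(&mtx);
    if (rc == EOWNERDEAD) {
        // The previous owner died holding the lock: repair the protected
        // state, then mark the mutex consistent so it is usable again.
        pthread_mutex_consistent(&mtx);
    }
    // ... critical section ...
    pthread_mutex_unlock(&mtx);
}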

Conditional wait overhead

When using boost::condition_variable, ACE_Conditional, or pthread_cond_wait directly, is there any overhead for the waiting itself? These are the more specific issues that trouble me:
After the waiting thread is unscheduled, will it be scheduled back before the wait expires and then unscheduled again, or will it stay unscheduled until signaled?
Does wait periodically acquire the mutex? If so, I guess it wastes some CPU time on system calls to lock and release the mutex each iteration. Is it the same as constantly acquiring and releasing a mutex?
Also, how much time passes between the signal and the return from wait?
Afaik, when using semaphores, the responsiveness of the acquire calls depends on the scheduler's time-slice size. How does it work in pthread_cond_wait? I assume this is platform-dependent. I am most interested in Linux, but if someone knows how it works on other platforms, that would help too.
And one more question: are there any additional system resources allocated for each conditional? I won't create 30000 mutexes in my code, but should I worry about 30000 conditionals that all use the same mutex?
Here's what is written in the pthread_cond man page:
pthread_cond_wait atomically unlocks the mutex and waits for the condition variable cond to be signaled. The thread execution is suspended and does not consume any CPU time until the condition variable is signaled.
So from here I'd answer to the questions as following:
The waiting thread won't be scheduled back until the wait is signaled or canceled.
There are no periodic mutex acquisitions; the mutex is reacquired only once, before wait returns.
The time that passes between the signal and the return from wait is similar to the scheduling delay after a mutex release.
Regarding the resources, on the same man page:
In the LinuxThreads implementation, no resources are associated with condition variables, thus pthread_cond_destroy actually does nothing except checking that the condition has no waiting threads.
Update: I dug into the sources of pthread_cond_* functions and the behavior is as follows:
All the pthread conditionals in Linux are implemented using futex.
When a thread calls wait it is suspended and unscheduled. The thread id is inserted at the tail of a list of waiting threads.
When a thread calls signal the thread at the head of the list is scheduled back.
So, the waking is as efficient as the scheduler, no OS resources are consumed and the only memory overhead is the size of the waiting list (see futex_wake function).
You should only call pthread_cond_wait if the variable is already in the "wrong" state. Since it always waits, there is always the overhead associated with putting the current thread to sleep and switching.
When the thread is unscheduled, it is unscheduled. It should not use any resources, but of course an OS can in theory be implemented badly. It is allowed to re-acquire the mutex, and even to return, before the signal (which is why you must double-check the condition), but the OS will be implemented so this doesn't impact performance much, if it happens at all. It doesn't happen spontaneously, but rather in response to another, possibly-unrelated signal.
30000 mutexes shouldn't be a problem, but some OSes might have a problem with 30000 sleeping threads.
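For reference, the canonical wait pattern alluded to above, with the predicate re-checked in a loop because wait may return before the condition actually holds:

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
bool ready = false; // the actual condition, protected by m

void* consumer(void*) {
    pthread_mutex_lock(&m);
    while (!ready)                 // loop guards against spurious wakeups
        pthread_cond_wait(&c, &m); // atomically unlocks m and suspends
    // here ready is true and m is held again
    pthread_mutex_unlock(&m);
    return nullptr;
}

void* producer(void*) {
    pthread_mutex_lock(&m);
    ready = true;
    pthread_cond_signal(&c); // wake one waiter
    pthread_mutex_unlock(&m);
    return nullptr;
}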