Could someone please help me solve deadlock issues in C++, if possible with references or examples?
The scenario is as follows.
Thread1 holds a mutex and is doing some operation; thread2 and thread3 are waiting for thread1 to unlock it so they can access the resource.
Some abort/unexpected thing happened: thread1 was terminated without unlocking, and thread2 and thread3 are still waiting.
How can I protect the main thread (meaning nothing should happen to the main thread) in such situations?
Please shed some light on how to solve such issues in C++.
Thanks,
Sheik
Some abort/unexpected thing happened
Use something like std::lock_guard to prevent 'hanging' locks caused by exceptions or by forgotten or unexpectedly skipped, but necessary, unlock() operations.
The principle is pretty simple and you can easily implement it for any mechanism that uses a pair of methods that correspond together in a 'lock/unlock' manner:
class LockObject // E.g. mutex or alike
{
public:
// ...
void lock();
void unlock();
};
Bind the guard class's constructor to a reference to the lock object's instance, and call lock() in the constructor and unlock() in the destructor:
template<typename T>
class LockGuard
{
public:
LockGuard(T& lockObject)
: lockObject_(lockObject)
{
lockObject_.lock();
}
~LockGuard()
{
lockObject_.unlock();
}
private:
T& lockObject_;
};
Use LockGuard like this:
// Some scope providing 'LockObject lockObject'
{ LockGuard<LockObject> lock(lockObject);
// Do something while lockObject is locked
} // Call of lockObject.unlock() is guaranteed at least here, no matter what
// (exception, goto, break, etc.) caused leaving the block's scope.
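The standard library ships the same guard as std::lock_guard. Below is a minimal sketch, assuming C++11, of what happens when the code holding the lock throws. The names `g_mutex`, `g_values`, `update_shared` and `lock_guard_example` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

std::mutex g_mutex;          // protects g_values
std::vector<int> g_values;

// Hypothetical worker that may throw while it holds the lock.
void update_shared(int value) {
    std::lock_guard<std::mutex> guard(g_mutex);   // locks g_mutex
    if (value < 0) {
        throw std::invalid_argument("negative value");  // guard unlocks during unwinding
    }
    g_values.push_back(value);
}   // guard unlocks on normal exit as well

// Call from main(): the second thread still gets the mutex after the first one failed.
void lock_guard_example() {
    bool threw = false;
    std::thread failing([&threw] {
        try {
            update_shared(-1);
        } catch (const std::invalid_argument&) {
            threw = true;   // handled inside the thread, so the thread ends normally
        }
    });
    failing.join();

    std::thread succeeding([] { update_shared(42); });
    succeeding.join();      // would block forever if the lock had leaked

    assert(threw);
    assert(g_values.back() == 42);
}
```

Note that this only helps when the stack is unwound. If a thread is killed from outside, no destructor runs and the lock stays held.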
Generally, threads should not terminate unexpectedly. You may try using try/catch blocks. If you still want to free resources when a thread terminates unexpectedly, you can create a monitor thread that waits for the first thread to terminate.
On Windows, you can use something like ::WaitForSingleObject(m_htThread, INFINITE).
Once the first thread has terminated, you can proceed with freeing the abandoned locks.
You may want to add a flag that indicates whether the termination was graceful.
You'll probably also have to keep track of which thread holds which lock.
As I said, I wouldn't recommend this method except in extreme cases.
The way to solve deadlocks in any language or platform is always the same.
Always acquire the locks in the same order.
EDIT: However you have misdescribed your problem. This is not a deadlock. A deadlock is a circular chain of locks. This is simply an unreleased lock, i.e. a lock leak. The solution is the same as any other resource leak: don't. In C++ that means releasing resources in destructors, and ensuring that destructors are called. Somehow your thread has terminated without doing that. Find that problem and fix it.
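For the genuine deadlock case, here is a minimal sketch of acquiring two mutexes safely. It swaps in std::lock (C++11), which does not impose a fixed order but uses a deadlock-avoidance algorithm, so it has the same effect as a consistent ordering. The `Account` type and the function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical resource with its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Locks both accounts without deadlock, whatever order the arguments come in.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;                 // never lock the same mutex twice
    std::lock(from.m, to.m);                  // acquires both, avoiding deadlock
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);  // adopt for RAII release
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

void transfer_example() {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    // Opposite argument order in the two threads: the classic deadlock setup.
    std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    assert(a.balance == 1000 && b.balance == 1000);
}
```

With plain `from.m.lock(); to.m.lock();` the two threads above could each hold one mutex and wait forever for the other.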
I've been trying to learn how to multithread and came up with the following understanding. I was wondering if I'm correct or far off and, if I'm incorrect in any way, if someone could give me advice.
To create a thread, first you need to utilize a library such as <thread> or any alternative (I'm using boost's multithreading library to get cross-platform capabilities). Afterwards, you can create a thread by declaring it as such (for std::thread)
std::thread thread (foo);
Now, you can use thread.join() or thread.detach(). The former waits until the thread finishes and then continues, while the latter lets the thread run independently alongside whatever you plan to do next.
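As a small illustration of join(), here is a sketch assuming C++11. The names `g_counter`, `foo` and `join_example` are placeholders. One detail worth knowing: a std::thread that is still joinable when its destructor runs calls std::terminate, so every thread must be either joined or detached.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> g_counter{0};

void foo() { ++g_counter; }

void join_example() {
    std::thread t(foo);      // starts running foo() on a new thread
    t.join();                // blocks until foo() has returned
    assert(!t.joinable());   // after join() the object no longer owns a thread
    assert(g_counter == 1);  // the effect of foo() is visible after join()
}
```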
If you want to protect something, say a vector std::vector<double> data, from threads accessing simultaneously, you would use a mutex.
Mutexes would be declared as global variables so that the thread functions can access them (OR, if you're making a class that will be multithreaded, the mutex can be declared as a private/public member of the class). Afterwards, you can lock and unlock a thread using a mutex.
Let's take a quick look at this example pseudo code:
std::mutex mtx;
std::vector<double> data;
void threadFunction(){
// Do stuff
// ...
// Want to access a global variable
mtx.lock();
data.push_back(3.23);
mtx.unlock();
// Continue
}
In this code, when the mutex locks down on the thread, it only locks the lines of code between it and mtx.unlock(). Thus, other threads will still continue on their merry way until they try accessing data (note, we would likely use the mutex in the other threads as well). Then they would stop, wait to use data, lock it, push_back, unlock it and continue. Check here for a good description of mutexes.
That's about it on my understanding of multithreading. So, am I horribly wrong or accurate?
Your comments refer to "locking the whole thread". You can't lock part of a thread.
When you lock a mutex, the current thread takes ownership of the mutex. Conceptually, you can think of it as the thread places its mark on the mutex (stores its threadid in the mutex data structure). If any other thread comes along and attempts to acquire the same mutex instance, it sees that the mutex is already "claimed" by somebody else and it waits until the first thread has released the mutex. When the owning thread later releases the mutex, one of the threads that is waiting for the mutex can wake up, acquire the mutex for themselves, and carry on.
In your code example, there is a potential risk that the mutex might not be released once it is acquired. If the call to data.push_back(xxx) throws an exception (out of memory?), then execution will never reach mtx.unlock() and the mutex will remain locked forever. All subsequent threads that attempt to acquire that mutex will drop into a permanent wait state. They'll never wake up because the thread that owns the mutex is toast.
For this reason, acquiring and releasing critical resources like mutexes should be done in a manner that will guarantee they will be released regardless of how execution leaves the current scope. In other languages, this would mean putting the mtx.unlock() in the finally section of a try..finally block:
mtx.lock();
try
{
// do stuff
}
finally
{
mtx.unlock();
}
C++ doesn't have try..finally statements. Instead, C++ leverages its language rules for automatic destruction of local variables. You construct an object as a local variable, and the object acquires the mutex in its constructor. When execution leaves the enclosing scope, whether by a normal return or by an exception, C++ makes sure that the object's destructor runs, and the destructor releases the lock. That's the RAII others have mentioned. In effect, RAII relies on the implicit cleanup that C++ performs at the end of every scope, which plays the role of a try..finally block.
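Applied to the threadFunction from the question, this could look like the sketch below (C++11 assumed). The extra pair of braces keeps the critical section as short as the original lock()/unlock() pair did; `raii_example` is a hypothetical driver.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex mtx;
std::vector<double> data;

void threadFunction() {
    // ... work that does not touch shared state runs unlocked ...
    {
        std::lock_guard<std::mutex> guard(mtx);  // constructor calls mtx.lock()
        data.push_back(3.23);                    // if this throws, guard's destructor still runs
    }                                            // destructor calls mtx.unlock()
    // ... continue without holding the lock ...
}

void raii_example() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i) workers.emplace_back(threadFunction);
    for (auto& w : workers) w.join();
    assert(data.size() == 8);   // every push_back happened under the lock
}
```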
I have some code which needs to be thread safe and exception safe. The code below is a very simplified version of my problem:
#include <mutex>
#include <thread>
std::mutex mutex;
int n=0;
class Counter{
public:
Counter(){
std::lock_guard<std::mutex>guard(mutex);
n++;}
~Counter(){
std::lock_guard<std::mutex>guard(mutex);//How can I protect here the underlying code to mutex.lock() ?
n--;}
};
void doSomething(){
Counter counter;
//Here I could do something meaningful
}
int numberOfThreadInDoSomething(){
std::lock_guard<std::mutex>guard(mutex);
return n;}
I have a mutex that I need to lock in the destructor of an object. The problem is that my destructor should not throw exceptions.
What can I do?
0) I cannot replace n with an atomic variable (of course it would do the trick here, but that is not the point of my question)
1) I could replace my mutex with a spin lock
2) I could wrap the locking in a try/catch inside an infinite loop until I eventually acquire the lock with no exception raised
None of those solutions seems very appealing. Did you have the same problem? How did you solve it?
As suggested by Adam H. Peterson, I finally decided to write a no-throw mutex:
#include <atomic>
#include <mutex>
class NoThrowMutex{
private:
std::mutex mutex;
std::atomic_flag flag;
bool both;
public:
NoThrowMutex();
~NoThrowMutex();
void lock();
void unlock();
};
NoThrowMutex::NoThrowMutex():mutex(),flag(),both(false){
flag.clear(std::memory_order_release);}
NoThrowMutex::~NoThrowMutex(){}
void NoThrowMutex::lock(){
try{
mutex.lock();
while(flag.test_and_set(std::memory_order_acquire));
both=true;}
catch(...){
while(flag.test_and_set(std::memory_order_acquire));
both=false;}}
void NoThrowMutex::unlock(){
if(both){mutex.unlock();}
flag.clear(std::memory_order_release);}
The idea is to have two mutexes instead of only one. The real mutex is the spin mutex implemented with a std::atomic_flag. This spin mutex is protected by a std::mutex, which could throw.
In a normal situation, the standard mutex is acquired and the flag is set at the cost of only one atomic operation. If the standard mutex cannot be locked immediately, the thread goes to sleep.
If for any reason the standard mutex throws, the mutex enters its spin mode. The thread where the exception occurred then loops until it can set the flag. Since no other thread is aware that this thread bypassed the standard mutex completely, they could spin too.
In the worst case scenario, this locking mechanism degrades to a spin lock. Most of the time it reacts just like a normal mutex.
This is a bad situation to be in. Your destructor is doing something that might fail. If failure to update this counter will irrecoverably corrupt your application, you may want to simply let the destructor throw. This will crash your application with a call to terminate, but if your application is corrupted, it may be better to kill the process and rely on some higher-level recovery scheme (such as a watchdog for a daemon or retrying execution for another utility). If failure to decrement the counter is recoverable, you should absorb the exception with a try{}catch() block and recover (or potentially save information for some other operation to eventually recover). If it's not recoverable, but it's not fatal, you may want to catch and absorb the exception and log the failure (being sure to log in an exception-safe way, of course).
It would be ideal if the code could be restructured such that the destructor doesn't do anything that can't fail. However, if your code is otherwise correct, failure while acquiring a lock is probably rare except in cases of constrained resources, so either absorbing or aborting on failure may very well be acceptable. For some mutexes, lock() is probably a no-throw operation (such as a spinlock using atomic_flag), and if you can use such a mutex, you can expect lock_guard to never throw. Your only worry in that situation would be deadlock.
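To make the last option concrete, here is a rough sketch of a spinlock built on std::atomic_flag whose lock() is noexcept, plugged into the Counter from the question. It assumes C++11; `SpinLock`, `g_lock` and `spinlock_example` are hypothetical names, and a pure spinlock is only reasonable when the critical section is very short.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal spinlock; lock()/unlock() are all std::lock_guard needs.
class SpinLock {
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();   // give other threads a chance while spinning
        }
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

SpinLock g_lock;
int n = 0;

class Counter {
public:
    Counter()  { std::lock_guard<SpinLock> guard(g_lock); ++n; }
    ~Counter() { std::lock_guard<SpinLock> guard(g_lock); --n; }  // lock() cannot throw
};

int numberOfThreadInDoSomething() {
    std::lock_guard<SpinLock> guard(g_lock);
    return n;
}

void spinlock_example() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([] {
            for (int j = 0; j < 1000; ++j) { Counter c; }  // construct and destroy
        });
    }
    for (auto& w : workers) w.join();
    assert(numberOfThreadInDoSomething() == 0);
}
```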
Is there a way to have a boost mutex throw an exception on any waiting threads? I have a problem where an object is deleted, but due to the nature of the software library it is possible that threads are still waiting at a mutex within the object, and a rather nasty exception is thrown when the mutex is closed. I guess I could use a multiple mutex counter, but that could cause performance degradation. What I'd like to have happen is for the mutex to throw an exception on any threads that are waiting when it is closed, so that the stack is unrolled. Is there a clean way to do this that is platform-independent?
Such a concept of a mutex that throws when it is destroyed seems innocuous enough, but when it comes time to implement it, it reveals a flaw in how you are thinking about mutexes.
Let's take a look at some example code to get an idea of the pitfalls of such an approach.
Note: Please do not use the code below, it will cause nothing but endless hours of torment and suffering debugging synchronization problems.
class throwing_mutex
{
private:
std::mutex m_;
std::condition_variable cv_;
bool destroyed_ = false;
bool locked_ = false;
public:
void lock()
{
std::unique_lock<std::mutex> lock(m_);
cv_.wait(lock, [&]() {return !locked_ || destroyed_;}); // Wait until the mutex is unlocked or destroyed.
if (destroyed_) throw std::runtime_error("The mutex was destroyed while waiting.");
locked_ = true;
}
void unlock()
{
std::unique_lock<std::mutex> lock(m_);
locked_ = false;
lock.unlock();
cv_.notify_one();
}
~throwing_mutex()
{
std::unique_lock<std::mutex> lock(m_);
destroyed_ = true;
lock.unlock();
cv_.notify_all(); // Let all waiters know we are dead.
}
};
Upon destruction, everyone waiting on the throwing_mutex will have an exception thrown. But this opens up a pretty big race condition.
We've handled the case where everyone is waiting for the mutex -- they will safely throw. What we have not handled is the case where anyone is on their way to calling lock() but not quite there yet. When they finally get to the point where they can call lock(), the throwing_mutex has already been destroyed. The bug we've just introduced by means of our flawed methodology is called use-after-free. If we are lucky, the error will present itself early and clearly, but sometimes we aren't so lucky and we will be tormented for hours or days. There is no way that our throwing_mutex class can ever solve that problem and any code that would need such a class does not have well thought out ownership semantics.
So, how do we solve this problem if it isn't by means of a mutex that throws? We fix the lifetime of the mutex and the object[s] that are locked by it.
Presumably, this mutex is a member of a class. If that is the case, fixing the lifetime means delaying destruction until everyone who depends on the object is done with it. This is conveyed with the use of a shared_ptr. Without getting into the nitty-gritty of ownership semantics, that is the best this can be answered. Hopefully I've changed your way of thinking about the problem enough to steer you away from your original plan and toward something that will work more reliably.
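A minimal sketch of that idea, assuming C++11 and using std::shared_ptr and std::mutex in place of their Boost counterparts: each thread holds its own shared_ptr copy, so the object and the mutex inside it cannot be destroyed while any thread might still lock it. `Resource` and `lifetime_example` are hypothetical names.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared object that owns its mutex.
class Resource {
public:
    int add(int v) {
        std::lock_guard<std::mutex> guard(m_);
        total_ += v;
        return total_;
    }
private:
    std::mutex m_;
    int total_ = 0;
};

// Returns how many workers completed their update.
int lifetime_example() {
    std::atomic<int> completed{0};
    std::vector<std::thread> workers;
    {
        auto resource = std::make_shared<Resource>();
        for (int i = 0; i < 4; ++i) {
            // Each thread captures its own copy and therefore co-owns the object.
            workers.emplace_back([resource, &completed] {
                if (resource->add(1) > 0) ++completed;
            });
        }
    }   // the creator's reference ends here; the workers keep the Resource alive
    for (auto& w : workers) w.join();
    return completed.load();
}
```

The object is destroyed when the last copy goes away, so no thread can ever be waiting on, or about to lock, a mutex that no longer exists.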
I'm using the C++ boost::thread library, which in my case means I'm using pthreads. Officially, a mutex must be unlocked from the same thread which locks it, and I want the effect of being able to lock in one thread and then unlock in another. There are many ways to accomplish this. One possibility would be to write a new mutex class which allows this behavior.
For example:
class inter_thread_mutex{
bool locked;
boost::mutex mx;
boost::condition_variable cv;
public:
void lock(){
boost::unique_lock<boost::mutex> lck(mx);
while(locked) cv.wait(lck);
locked=true;
}
void unlock(){
{
boost::lock_guard<boost::mutex> lck(mx);
if(!locked) error();
locked=false;
}
cv.notify_one();
}
// bool try_lock(); void error(); a constructor that sets locked=false; etc.
};
I should point out that the above code doesn't guarantee FIFO access, since if one thread calls lock() while another calls unlock(), this first thread may acquire the lock ahead of other threads which are waiting. (Come to think of it, the boost::thread documentation doesn't appear to make any explicit scheduling guarantees for either mutexes or condition variables). But let's just ignore that (and any other bugs) for now.
My question is: if I decide to go this route, would I be able to use such a mutex as a model of the Boost Lockable concept? For example, would anything go wrong if I use a boost::unique_lock< inter_thread_mutex > for RAII-style access, and then pass this lock to boost::condition_variable_any.wait(), etc.?
On one hand I don't see why not. On the other hand, "I don't see why not" is usually a very bad way of determining whether something will work.
The reason I ask is that if it turns out that I have to write wrapper classes for RAII locks and condition variables and whatever else, then I'd rather just find some other way to achieve the same effect.
EDIT:
The kind of behavior I want is basically as follows. I have an object, and it needs to be locked whenever it is modified. I want to lock the object from one thread, and do some work on it. Then I want to keep the object locked while I tell another worker thread to complete the work. So the first thread can go on and do something else while the worker thread finishes up. When the worker thread gets done, it unlocks the mutex.
And I want the transition to be seamless so nobody else can get the mutex lock in between when thread 1 starts the work and thread 2 completes it.
Something like inter_thread_mutex seems like it would work, and it would also allow the program to interact with it as if it were an ordinary mutex. So it seems like a clean solution. If there's a better solution, I'd be happy to hear that also.
EDIT AGAIN:
The reason I need locks to begin with is that there are multiple master threads, and the locks are there to prevent them from accessing shared objects concurrently in invalid ways.
So the code already uses loop-level lock-free sequencing of operations at the master thread level. Also, in the original implementation, there were no worker threads, and the mutexes were ordinary kosher mutexes.
The inter_thread_thingy came up as an optimization, primarily to improve response time. In many cases, it was sufficient to guarantee that the "first part" of operation A occurs before the "first part" of operation B. As a dumb example, say I punch object 1 and give it a black eye. Then I tell object 1 to change its internal structure to reflect all the tissue damage. I don't want to wait around for the tissue damage before I move on to punch object 2. However, I do want the tissue damage to occur as part of the same operation; for example, in the interim, I don't want any other thread to reconfigure the object in such a way that would make tissue damage an invalid operation. (Yes, this example is imperfect in many ways, and no, I'm not working on a game.)
So we made the change to a model where ownership of an object can be passed to a worker thread to complete an operation, and it actually works quite nicely; each master thread is able to get a lot more operations done because it doesn't need to wait for them all to complete. And, since the event sequencing at the master thread level is still loop-based, it is easy to write high-level master-thread operations, as they can be based on the assumption that an operation is complete (more precisely, the critical "first part" upon which the sequencing logic depends is complete) when the corresponding function call returns.
Finally, I thought it would be nice to use inter_thread mutex/semaphore thingies using RAII with boost locks to encapsulate the necessary synchronization that is required to make the whole thing work.
man pthread_mutex_unlock (this is on OS X, similar wording on Linux) has the answer:
NAME
pthread_mutex_unlock -- unlock a mutex
SYNOPSIS
#include <pthread.h>
int
pthread_mutex_unlock(pthread_mutex_t *mutex);
DESCRIPTION
If the current thread holds the lock on mutex, then the
pthread_mutex_unlock() function unlocks mutex.
Calling pthread_mutex_unlock() with a mutex that the
calling thread does not hold will result in
undefined behavior.
...
My counter-question would be - what kind of synchronization problem are you trying to solve with this? Most probably there is an easier solution.
Neither pthreads nor boost::thread (built on top of it) guarantee any order in which a contended mutex is acquired by competing threads.
Sorry, but I don't understand. What will the state of your mutex be at line [1] in the following code if another thread can unlock it?
inter_thread_mutex m;
{
m.lock();
// [1]
m.unlock();
}
This makes no sense.
There are a few ways to approach this. Both of the ones I'm going to suggest involve adding an additional piece of information to the object, rather than adding a mechanism to unlock a mutex from a thread other than the one that owns it.
1) you can add some information to indicate the object's state:
enum modification_state { consistent, // ready to be examined or to start being modified
phase1_complete, // ready for the second thread to finish the work
};
// first worker thread
lock();
do_init_work(object);
object.mod_state = phase1_complete;
unlock();
signal();
do_other_stuff();
// second worker thread
lock();
while( object.mod_state != phase1_complete )
wait();
do_final_work(object);
object.mod_state = consistent;
unlock();
signal();
// some other thread that needs to read the data
lock();
while( object.mod_state != consistent )
wait();
read_data(object);
unlock();
Works just fine with condition variables, because obviously you're not writing your own lock.
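The state-based approach can be sketched with a standard mutex and condition variable roughly as follows. This assumes C++11 and uses std:: types rather than Boost. `Object`, `phase1`, `phase2`, `read_when_consistent` and `handoff_example` are hypothetical names, and the `log` vector is only a stand-in for the real data.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

enum class ModState { Consistent, Phase1Complete };

struct Object {
    std::mutex m;
    std::condition_variable cv;
    ModState state = ModState::Consistent;
    std::vector<int> log;   // stand-in for the real data
};

// First thread: does the initial work, then hands over via the state flag.
void phase1(Object& obj) {
    {
        std::lock_guard<std::mutex> lk(obj.m);
        obj.log.push_back(1);
        obj.state = ModState::Phase1Complete;
    }
    obj.cv.notify_all();
}

// Second thread: waits for phase 1, finishes the work, marks the object consistent.
void phase2(Object& obj) {
    std::unique_lock<std::mutex> lk(obj.m);
    obj.cv.wait(lk, [&obj] { return obj.state == ModState::Phase1Complete; });
    obj.log.push_back(2);
    obj.state = ModState::Consistent;
    lk.unlock();
    obj.cv.notify_all();
}

// Any other thread: may only look at the data while it is consistent.
std::vector<int> read_when_consistent(Object& obj) {
    std::unique_lock<std::mutex> lk(obj.m);
    obj.cv.wait(lk, [&obj] { return obj.state == ModState::Consistent; });
    return obj.log;
}

std::vector<int> handoff_example() {
    Object obj;
    phase1(obj);                                  // start the operation here
    std::thread worker([&obj] { phase2(obj); });  // finish it on another thread
    std::vector<int> seen = read_when_consistent(obj);  // blocks until phase 2 is done
    worker.join();
    assert(seen.size() == 2);
    return seen;
}
```

The mutex is only ever unlocked by the thread that locked it; what travels between threads is the state flag, which keeps readers out between the two phases.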
2) If you have a specific thread in mind, you can give the object an owner.
// first worker
lock();
while( obj.owner != this_thread() ) wait();
do_initial_work(obj);
obj.owner = second_thread_id;
unlock()
signal()
...
This is pretty much the same solution as my first solution, but more flexible in the adding/removing of phases, and less flexible in the adding/removing of threads.
To be honest, I'm not sure how inter thread mutex would help you here. You'd still need a semaphore or condition variable to signal the passing of the work to the second thread.
Small modification to what you already have: how about storing the id of the thread which you want to take the lock in your inter_thread_whatever? Then unlock it, and send a message to that thread, saying "I want you to execute whatever routine it is that tries to take this lock".
Then the condition in lock becomes while(locked || (desired_locker != thisthread && desired_locker != 0)). Technically you've "released the lock" in the first thread, and "taken it again" in the second thread, but there's no way that any other thread can grab it in between, so it's as if you've transferred it directly from one to the other.
There's a potential problem, that if a thread exits or is killed, while it's the desired locker of your lock, then that thread deadlocks. But you were already talking about the first thread waiting for a message from the second thread to say that it has successfully acquired the lock, so presumably you already have a plan in mind for what happens if that message is never received. To that plan, add "reset the desired_locker field on the inter_thread_whatever".
This is all very hairy, though, I'm not convinced that what I've proposed is correct. Is there a way that the "master" thread (the one that's directing all these helpers) can just make sure that it doesn't order any more operations to be performed on whatever is protected by this lock, until the first op is completed (or fails and some RAII thing notifies you)? You don't need locks as such, if you can deal with it at the level of the message loop.
I don't think it is a good idea to say that your inter_thread_mutex (binary_semaphore) can be seen as a model of Lockable. The main issue is that the main feature of your inter_thread_mutex defeats the Lockable concept. If inter_thread_mutex were a model of Lockable, you would expect that at [1] the inter_thread_mutex m is locked.
// thread T1
inter_thread_mutex m;
{
unique_lock<inter_thread_mutex> lk(m);
// [1]
}
But since another thread T2 can call m.unlock() while T1 is at [1], the guarantee is broken.
Binary semaphores can be used as Lockables as long as each thread locks before unlocking. But the main goal of your class is exactly the opposite.
This is one of the reason semaphores in Boost.Interprocess don't use lock/unlock to name the functions, but wait/notify. Curiously these are the same names used by conditions :)
A mutex is a mechanism for describing mutually exclusive blocks of code. It does not make sense for these blocks of code to cross thread boundaries. Trying to use such a concept in such a counterintuitive way can only lead to problems down the line.
It sounds very much like you're looking for a different multi-threading concept, but without more detail it's hard to know what.
How can I wait for a detached thread to finish in C++?
I don't care about an exit status, I just want to know whether or not the thread has finished.
I'm trying to provide a synchronous wrapper around an asynchronous third-party tool. The problem is a weird race condition crash involving a callback. The progression is:
(1) I call the third party and register a callback.
(2) When the third party finishes, it notifies me using the callback, in a detached thread I have no real control over.
I want the thread from (1) to wait until (2) is called.
I want to wrap this in a mechanism that provides a blocking call. So far, I have:
class Wait {
public:
void callback() {
pthread_mutex_lock(&m_mutex);
m_done = true;
pthread_cond_broadcast(&m_cond);
pthread_mutex_unlock(&m_mutex);
}
void wait() {
pthread_mutex_lock(&m_mutex);
while (!m_done) {
pthread_cond_wait(&m_cond, &m_mutex);
}
pthread_mutex_unlock(&m_mutex);
}
private:
pthread_mutex_t m_mutex;
pthread_cond_t m_cond;
bool m_done;
};
// elsewhere...
Wait waiter;
thirdparty_utility(&waiter);
waiter.wait();
As far as I can tell, this should work, and it usually does, but sometimes it crashes. Judging from the core file, my guess as to the problem is this:
When the callback sets m_done and broadcasts, the wait thread wakes up.
The wait thread is now done here, and Wait is destroyed. All of Wait's members are destroyed, including the mutex and cond.
The callback thread tries to continue from the broadcast point, but is now using memory that's been released, which results in memory corruption.
When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).
I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.
EDIT: More details:
This is part of a massively multithreaded application, so creating a static Wait isn't practical.
I ran a test, creating Wait on the heap, and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that resulted in no crashes. So I'm sure it's a problem of Wait being deallocated too soon.
I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.
EDIT: ThirdParty details:
I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:
The thirdparty stuff I mentioned, and why I have no control over the thread: this is using CORBA.
So, it's possible that CORBA is holding onto a reference to my object longer than intended.
Yes, I believe that what you're describing is happening (race condition on deallocate). One quick way to fix this is to create a static instance of Wait, one that won't get destroyed. This will work as long as you don't need to have more than one waiter at the same time.
You will also permanently use that memory, it will not deallocate. But it doesn't look like that's too bad.
The main issue is that it's hard to coordinate lifetimes of your thread communication constructs between threads: you will always need at least one leftover communication construct to communicate when it is safe to destroy (at least in languages without garbage collection, like C++).
EDIT:
See comments for some ideas about refcounting with a global mutex.
To the best of my knowledge there's no portable way to directly ask a thread if it's done running (i.e. no pthread_ function). What you are doing is the right way to do it, at least as far as having a condition that you signal. If you are seeing crashes that you are sure are due to the Wait object being deallocated when the thread that creates it quits (and not some other subtle locking issue, which is all too common), then you need to make sure the Wait isn't deallocated too early, by managing it from a thread other than the one that does the notification. Put it in global memory, or dynamically allocate it and share it with that thread. Most simply, don't have the thread being waited on own the memory for the Wait; have the thread doing the waiting own it.
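One way to share a dynamically allocated Wait is sketched below, assuming C++11 and, importantly, assuming the callback side is able to hold a std::shared_ptr. That may not be possible with the real third-party API. `fake_thirdparty_utility` is a hypothetical stand-in for the third-party call, and std::mutex and std::condition_variable replace the raw pthread primitives.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

class Wait {
public:
    void callback() {
        std::lock_guard<std::mutex> lk(m_mutex);
        m_done = true;
        m_cond.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_cond.wait(lk, [this] { return m_done; });  // handles spurious wakeups
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_done = false;
};

// Hypothetical stand-in for the third-party call: fires the callback on a detached thread.
void fake_thirdparty_utility(std::shared_ptr<Wait> waiter) {
    std::thread([waiter] { waiter->callback(); }).detach();
}

// Blocking wrapper: returns once the callback has run.
bool blocking_call() {
    auto waiter = std::make_shared<Wait>();
    fake_thirdparty_utility(waiter);   // the callback thread now co-owns the Wait
    waiter->wait();
    return true;
}   // this reference ends here; the Wait is freed only after the callback thread drops its copy
```

Because both sides own a reference, the Wait object outlives whichever thread finishes last, which removes the window in which the callback touches freed memory.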
Are you initializing and destroying the mutex and condition var properly?
Wait::Wait()
{
pthread_mutex_init(&m_mutex, NULL);
pthread_cond_init(&m_cond, NULL);
m_done = false;
}
Wait::~Wait()
{
assert(m_done);
pthread_mutex_destroy(&m_mutex);
pthread_cond_destroy(&m_cond);
}
Make sure that you aren't prematurely destroying the Wait object -- if it gets destroyed in one thread while the other thread still needs it, you'll get a race condition that will likely result in a segfault. I'd recommend making it a global static variable that gets constructed on program initialization (before main()) and gets destroyed on program exit.
If your assumption is correct then third party module appears to be buggy and you need to come up with some kind of hack to make your application work.
Static Wait is not feasible. How about a pool of Wait objects (it could even grow on demand)? Is your application using a thread pool?
There will still be a chance that the same Wait is reused while the third-party module is still using it, but you can minimize that chance by properly queuing vacant Waits in your pool.
Disclaimer: I am in no way an expert in thread safety, so consider this post as a suggestion from a layman.