Ran across this destructor in a codebase I am debugging.
ManagerImpl::~ManagerImpl() {
// don't go away if some thread is still hitting us
boost::unique_lock<boost::mutex> l(m_mutex);
}
Does it actually serve any useful purpose in a multi-threaded program? It looks like kludge.
I assume the idea is to defer destruction if another thread is calling a function that locks the mutex, but is it even effectively at doing that? ElectricFence segfaults would have me believe otherwise.
It is probably trying to postpone destruction until another thread unlocks the mutex and leaves another member function.
However, this wouldn't prevent another thread calling that function again after the lock in the destructor has been released.
There must be more interaction between threads (which you don't show) to make this code make sense. Still, thought, this doesn't seem to be robust code.
Related
I have a socket shared between 4 threads and I wanted to use the RAII principle for acquiring and releasing the mutex.
The ground realities
I am using the pthread library.
I cannot use Boost.
I cannot use anything newer than C++03.
I cannot use exceptions.
The Background
Instead of having to lock the mutex for the socket everytime before using it, and then unlocking the mutex right afterwards, I thought I could write a scoped_lock() which would lock the mutex, and once it goes out of scope, it would automatically unlock the mutex.
So, quite simply I do a lock in the constructor and an unlock in the destructor, as shown here.
ScopedLock::ScopedLock(pthread_mutex_t& mutex, int& errorCode)
: m_Mutex(mutex)
{
errorCode = m_lock();
}
ScopedLock::~ScopedLock()
{
errorCode = m_unlock();
}
where m_lock() and m_unlock() are quite simply two wrapper functions around the pthread_mutex_lock() and the pthread_mutex_unlock() functions respectively, with some additional tracelines/logging.
In this way, I would not have to write at least two unlock statements, one for the good case and one for the bad case (at least one, could be more bad-paths in some situations).
The Problem
The problem that I have bumped into and the thing that I don't like about this scheme is the destructor.
I have diligiently done for every function the error-handling, but from the destructor of this ScopedLock(), I cannot inform the caller about any errors that might be returned my m_unlock().
This is a fundamental problem with RAII, but in this case, you're in luck. pthread_unlock only fails if you set up the mutex wrong (EINVAL) or if you're attempting to unlock an error checking mutex from a thread that doesn't own it (EPERM). These errors are indications of bugs in your code, not runtime errors that you should be taking into account. asserting errorCode==0 is a reasonable strategy in this case.
Let's say I have a small operation which I want to perform in a separate thread. I do not need to know when it completes, nor do I need to wait for its completion, but I do not want the operation blocking my current thread. When I write the following code, I will get a crash:
void myFunction() {
// do other stuff
std::thread([]()
{
// do thread stuff
});
}
This crash is solved by assigning the thread to a variable, and detaching it:
void myFunction() {
// do other stuff
std::thread t([]()
{
// do thread stuff
});
t.detach();
}
Why is this step necessary? Or is there a better way to create a small single-use thread?
Because the std::thread::~thread() specification says so:
A thread object does not have an associated thread (and is safe to destroy) after
it was default-constructed
it was moved from
join() has been called
detach() has been called
It looks like detach() is the only one of these that makes sense in your case, unless you want to return the thread object (by moving) to the caller.
Why is this step necessary?
Consider that the thread object represents a long-running "thread" of execution (a lightweight process or kernel schedulable entity or similar).
Allowing you to destroy the object while the thread is still executing, leaves you no way to subsequently join (and find the result of) that thread. This may be a logical error, but it can also make it hard even to correctly exit your program.
Or is there a better way to create a small single-use thread?
Not obviously, but it's frequently better to use a thread pool for running tasks in the background, instead of starting and stopping lots of short-lived threads.
You might be able to use std::async() instead, but the future it returns may block in the destructor in some circumstances, if you try to discard it.
See the documentation of the destructor of std:thread:
If *this has an associated thread (joinable() == true), std::terminate() is called.
You should explicitly say that you don't care what's going to happen with the thread, and that you're OK with loosing any control over it. And that is what detach is for.
In general, this looks like a design problem so crashing makes sense: it's hard to propose a general and not surprising rule about what should happen in such a case (e.g. your program might as well normally end its execution - what should happen with the thread?).
Basically, your use case requires a call to detach() because your use case is pretty weird, and not what C++ is trying to make easy.
While Java and .Net blithely let you toss away a Thread object whose associated thread is still running, in the C++ model the Thread is closer to being the thread, in the sense that the existence of the Thread object coincides with the lifetime, or at least joinability, of the execution it refers to. Note how it's not possible to create a Thread without starting it (except in the case of the default constructor, which is really just there in the service of move semantics), or to copy it or to make one from a thread id. C++ wants Thread to outlive the thread.
Maintaining that condition has various benefits. Final cleanup of a thread's control data doesn't have to be done automagically by the OS, because once a Thread goes away, nothing can ever try to join it. It's easier to ensure that variables with thread storage get destroyed in time, since the main thread is the last to exit (barring some move shenanigans). And a missing join -- which is an extremely common type of bug -- gets properly flagged at runtime.
Letting some thread wander off into the distance, in contrast, is allowed, but it's an unusual thing to do. Unless it's interacting with your other threads through sync objects, there's no way to ensure it's done whatever it was meant to do. A detached thread is on the level of reinterpret_cast: You're allowed to tell the compiler that you know something it doesn't, but that has to be explicit, not just the consequence of the function you didn't call.
Consider this: thread A creates thread B and thread A leaves its scope of execution. The handle for thread B is about to be lost. What should happen now? There are several possibilities, with most obvious as follows:
Thread B is detached and continues its execution indempedently
Thread A waits (joins) thread B before quiting its own scope
Now you can argue which is better: 1 or 2? How should we (the compiler) decide on which one of these is better?
So what the designers did was something different: crash terminate the code so that the developer picks one of these solutions explicitely. In order to avoid implicit (perhaps unwanted) behaviuor. It's a signal for you: "hey, pay attention now, this piece of code is important and I (the compiler) don't want to decide for you".
A few days ago my friend told me about the situation, they had in their project.
Someone decided, that it would be good to destroy the object of NotVerySafeClass in parallel thread (like asynchronously). It was implemented some time ago.
Now they get crashes, because some method is called in main thread, while object is destroyed.
Some workaround was created to handle the situation.
Ofcourse, this is just an example of not very good solution, but still the question:
Is there some way to prevent the situation internally in NotVerySafeClass (deny running the methods, if destructor was called already, and force the destructor to wait, until any running method is over (let's assume there is only one method))?
No, no and no. This is a fundamental design issue, and it shows a common misconception in thinking about multithreaded situations and race conditions in general.
There is one thing that can happen equally likely, and this is really showing that you need an ownership concept: The function calling thread could call the function just right after the object has been destroyed, so there is no object anymore and try to call a function on it is UB, and since the object does not exist anymore, it also has no chance to prevent any interaction between the dtor and a member function.
What you need is a sound ownership policy. Why is the code destroying the object when it is still needed?
Without more details about the code, a std::shared_ptr would probably solve this issue. Depending on your specific situation, you may be able to solve it with a more lightweight policy.
Sounds like a horrible design. Can't you use smart pointer to make sure the object is destroyed only when no-one holds any references to it?
If not, I'd use some external synchronization mechanism. Synchronizing the destructor with a method is really awkward.
There is no methods that can be used to prevent this scenario.
In multithread programming, you need to make sure that an object will not be deleted if there are some others thread still accessing it.
If you are dealing with such code, it needs fundamental fix
(Not to promote bad design) but to answer your two questions:
... deny running the methods, if destructor was called already
You can do this with the solution proposed by #snemarch and #Simon (a lock). To handle the situation where one thread is inside the destructor, while another one is waiting for the lock at the beginning of your method, you need to keep track of the state of the object in a thread-safe way in memory shared between threads. E.g. a static atomic int that is set to 0 by the destructor before releasing the lock. The method checks for the int once it acquires the lock and bails if its 0.
... force the destructor to wait, until any running method is over
The solution proposed by #snemarch and #Simon (a lock) will handle this.
No. Just need to design the program propertly so that it is thread safe.
Why not make use of a mutex / semaphore ? At the beginning of any method the mutex is locked, and the destructor wait until the mutex is unlocked. It's a fix, not a solution. Maybe you should change the design of a part of your application.
Simple answer: no.
Somewhat longer answer: you could guard each and every member function and the destructor in your class with a mutex... welcome to deadlock opportunities and performance nightmares.
Gather a mob and beat some design sense into the 'someone' who thought parallel destruction was a good idea :)
If I understand correctly, then foo1() fails to unlock &private_value_. As a result, foo2()'s thread_mutex_lock does not work since foo1() never released it.
What are the other consequences?
int main ( ... )
foo1();
foo2();
return 0;
}
foo1()
{
pthread_mutex_lock(&private_value_);
do something
// no unlock!
}
foo2()
{
pthread_mutex_lock(&private_value_)
do something
pthread_mutex_unlock(&private_value_);
}
There seems to be some confusion here between how the program should have been written and how the program, as currently written, will behave.
This code will cause deadlock, and this does not indicate that there is something wrong with how the mutexes are working. They're working exactly how they are supposed to: If you try to re-acquire a non-recursive mutex that is already locked, your code will block until the mutex is unlocked. That's how it's supposed to work.
Since this code is single-threaded, the blocking in foo2 will never end, and so your program will deadlock and not progress. That is most likely not how the program should work (because it's not a very useful program that way). The error is not in how the mutexes are functioning, but in how the programmer chose to employ them. The programmer should have put an unlock call at the end of foo1.
The mutex works fine. It's doing what it is meant to do. The thread will block after foo1() exits until the mutex is obtained by foo2.
This is why people migrate to languages that offer scope-based solutions, like C++ and RAII. The mutex is working as intended, but the writer of the first function forgot a small call, and now the application halts.
It's a bug if a lock is taken on a mutex and nothing ever releases the lock.
However, that doesn't necessarily mean that foo1() has to release the lock, but something must.
There are patterns where one function will acquire a lock, and another will release it. But you need to take special care that these more complex mutex handling patterns are coded correctly. You might be looking at an example of this (and the boiled-down snippet in the question doesn't include that additional complexity).
And as Neil Butterworth mentioned in a comment, there are many situations in C++ where a mutex will be managed by an RAII class, so the lock will be released automatically when the 'lock manager' object gets destroyed (often by going out of scope). In this case, it may not be obvious that the lock is being released, since that is done as a side effect of the variable merely going out of scope.
No, it does not necessarily block. All depends on your private_value_ variable. pthread_mutex_t can show different behavior according to the properties that were set when the variable was initialized. See http://opengroup.org/onlinepubs/007908775/xsh/pthread_mutex_lock.html
To answer the question:
There are no other consequences other than a deadlock (since your example is single threaded and makes the calls synchronously)..
If only foo2 gets called everything will run normal. If foo1 gets called then, if foo2 gets called or any other function that requires the mutex it will lock up forever.
The behavior of this code depends on the type of the mutex:
Default: Undefined behavior (locking an already-locked mutex).
Normal: Deadlock (locking an already-locked mutex).
Recursive: Works, but leaves the mutex locked (since it's locked twice but only unlocked once).
Error-checking: Fully works, and unlocks the mutex at the end (the second call to lock fails with EDEADLK, and the error value returned is ignored by the caller).
How can I wait for a detached thread to finish in C++?
I don't care about an exit status, I just want to know whether or not the thread has finished.
I'm trying to provide a synchronous wrapper around an asynchronous thirdarty tool. The problem is a weird race condition crash involving a callback. The progression is:
I call the thirdparty, and register a callback
when the thirdparty finishes, it notifies me using the callback -- in a detached thread I have no real control over.
I want the thread from (1) to wait until (2) is called.
I want to wrap this in a mechanism that provides a blocking call. So far, I have:
class Wait {
public:
void callback() {
pthread_mutex_lock(&m_mutex);
m_done = true;
pthread_cond_broadcast(&m_cond);
pthread_mutex_unlock(&m_mutex);
}
void wait() {
pthread_mutex_lock(&m_mutex);
while (!m_done) {
pthread_cond_wait(&m_cond, &m_mutex);
}
pthread_mutex_unlock(&m_mutex);
}
private:
pthread_mutex_t m_mutex;
pthread_cond_t m_cond;
bool m_done;
};
// elsewhere...
Wait waiter;
thirdparty_utility(&waiter);
waiter.wait();
As far as I can tell, this should work, and it usually does, but sometimes it crashes. As far as I can determine from the corefile, my guess as to the problem is this:
When the callback broadcasts the end of m_done, the wait thread wakes up
The wait thread is now done here, and Wait is destroyed. All of Wait's members are destroyed, including the mutex and cond.
The callback thread tries to continue from the broadcast point, but is now using memory that's been released, which results in memory corruption.
When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).
I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.
EDIT: More details:
This is part of a massively multithreaded application, so creating a static Wait isn't practical.
I ran a test, creating Wait on the heap, and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that resulted in no crashes. So I'm sure it's a problem of Wait being deallocated too soon.
I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.
EDIT: ThirdParty details:
I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:
The thirdparty stuff I mentioned, and why I have no control over the thread: this is using CORBA.
So, it's possible that CORBA is holding onto a reference to my object longer than intended.
Yes, I believe that what you're describing is happening (race condition on deallocate). One quick way to fix this is to create a static instance of Wait, one that won't get destroyed. This will work as long as you don't need to have more than one waiter at the same time.
You will also permanently use that memory, it will not deallocate. But it doesn't look like that's too bad.
The main issue is that it's hard to coordinate lifetimes of your thread communication constructs between threads: you will always need at least one leftover communication construct to communicate when it is safe to destroy (at least in languages without garbage collection, like C++).
EDIT:
See comments for some ideas about refcounting with a global mutex.
To the best of my knowledge there's no portable way to directly ask a thread if its done running (i.e. no pthread_ function). What you are doing is the right way to do it, at least as far as having a condition that you signal. If you are seeing crashes that you are sure are due to the Wait object is being deallocated when the thread that creates it quits (and not some other subtle locking issue -- all too common), the issue is that you need to make sure the Wait isn't being deallocated, by managing from a thread other than the one that does the notification. Put it in global memory or dynamically allocate it and share it with that thread. Most simply don't have the thread being waited on own the memory for the Wait, have the thread doing the waiting own it.
Are you initializing and destroying the mutex and condition var properly?
Wait::Wait()
{
pthread_mutex_init(&m_mutex, NULL);
pthread_cond_init(&m_cond, NULL);
m_done = false;
}
Wait::~Wait()
{
assert(m_done);
pthread_mutex_destroy(&m_mutex);
pthread_cond_destroy(&m_cond);
}
Make sure that you aren't prematurely destroying the Wait object -- if it gets destroyed in one thread while the other thread still needs it, you'll get a race condition that will likely result in a segfault. I'd recommend making it a global static variable that gets constructed on program initialization (before main()) and gets destroyed on program exit.
If your assumption is correct then third party module appears to be buggy and you need to come up with some kind of hack to make your application work.
Static Wait is not feasible. How about Wait pool (it even may grow on demand)? Is you application using thread pool to run?
Although there will still be a chance that same Wait will be reused while third party module is still using it. But you can minimize such chance by properly queing vacant Waits in your pool.
Disclaimer: I am in no way an expert in thread safety, so consider this post as a suggestion from a layman.