Race condition between terminating worker threads and main thread - c++

I am having an issue with terminating worker threads from the main thread. So far each method I have tried leads to either a race condition or a deadlock.
The worker threads are stored in an inner class inside a class called ThreadPool. ThreadPool maintains a vector of these WorkerThreads using unique_ptr.
Here is the header for my ThreadPool:
class ThreadPool
typedef void (*pFunc)(const wpath&, const Args&, Global::mFile_t&, std::mutex&, std::mutex&); // function to point to
class WorkerThread
ThreadPool* const _thisPool; // reference enclosing class
// pointers to arguments
wpath _pPath; // member argument that will be modifyable to running thread
Args * _pArgs;
Global::mFile_t * _pMap;
// flags for thread management
bool _terminate; // terminate thread
bool _busy; // is thread busy?
bool _isRunning;
// thread management members
std::mutex _threadMtx;
std::condition_variable _threadCond;
std::thread _thisThread;
// exception ptr
std::exception_ptr _ex;
// private copy constructor
WorkerThread(const WorkerThread&): _thisPool(nullptr) {}
WorkerThread(ThreadPool&, Args&, Global::mFile_t&);
void setPath(const wpath); // sets a new task
void terminate(); // calls terminate on thread
bool busy() const; // returns whether thread is busy doing task
bool isRunning() const; // returns whether thread is still running
void join(); // thread join wrapper
std::exception_ptr exception() const;
// actual worker thread running tasks
void thisWorkerThread();
// thread specific information
DWORD _numProcs; // number of processors on system
unsigned _numThreads; // number of viable threads
std::vector<std::unique_ptr<WorkerThread>> _vThreads; // stores thread pointers - workaround for no move constructor in WorkerThread
pFunc _task; // the task threads will call
// synchronization members
unsigned _barrierLimit; // limit before barrier goes down
std::mutex _barrierMtx; // mutex for barrier
std::condition_variable _barrierCond; // condition for barrier
std::mutex _coutMtx;
// argument mutex
std::mutex matchesMap_mtx;
std::mutex coutMatch_mtx;
ThreadPool(pFunc f);
// wake a thread and pass it a new parameter to work on
void callThread(const wpath&);
// barrier synchronization
void synchronizeStartingThreads();
// starts and synchronizes all threads in a sleep state
void startThreads(Args&, Global::mFile_t&);
// terminate threads
void terminateThreads();
So far the real issue is that calling terminateThreads() from the main thread causes either a deadlock or a race condition.
When I set my _terminate flag to true, there is a chance that main will already have exited scope and destroyed all the mutexes before the thread has had a chance to wake up and terminate. In fact I have gotten this crash quite a few times (the console window displays: mutex destroyed while busy).
If I add a thread.join() after I notify_all() the thread, there is a chance the thread will terminate before the join occurs, causing an infinite deadlock, as joining a terminated thread suspends the program indefinitely.
If I detach: same issue as above, but it causes a program crash.
If I instead use a while(WorkerThread.isRunning()) Sleep(0); loop, the program may crash because the main thread may exit before the WorkerThread reaches that last closing brace.
I am not sure what else to do to halt main until all worker threads have terminated safely. Also, even with try-catch in the thread and in main, no exceptions are being caught (everything I have tried leads to a program crash).
What can I do to halt the main thread until worker threads have finished?
Here are the implementations of the primary functions:
Terminate Individual worker thread
void ThreadPool::WorkerThread::terminate()
_terminate = true;
The actual ThreadLoop
void ThreadPool::WorkerThread::thisWorkerThread()
while (!_terminate)
std::cout << std::this_thread::get_id() << " Sleeping..." << std::endl;
_busy = false;
std::unique_lock<std::mutex> lock(_threadMtx);
std::cout << std::this_thread::get_id() << " Awake..." << std::endl;
_thisPool->_task(_pPath, *_pArgs, *_pMap, _thisPool->coutMatch_mtx, _thisPool->matchesMap_mtx);
std::cout << std::this_thread::get_id() << " Finished Task..." << std::endl;
std::cout << std::this_thread::get_id() << " Terminating" << std::endl;
catch (const std::exception&)
_ex = std::current_exception();
_isRunning = false;
Terminate All Worker Threads
void ThreadPool::terminateThreads()
for (std::vector<std::unique_ptr<WorkerThread>>::iterator it = _vThreads.begin(); it != _vThreads.end(); ++it)
// if thread threw an exception, rethrow it in main
if (it->get()->exception() != nullptr)
and lastly, the function that is calling the thread pool (the scan function is running on main)
// scans a path recursively for all files of selected extension type, calls thread to parse file
unsigned int Functions::Scan(wpath path, const Args& args, ThreadPool& pool)
wrecursive_directory_iterator d(path), e;
unsigned int filesFound = 0;
while ( d != e )
if (args.verbose())
std::wcout << L"Grepping: " << d->path().string() << std::endl;
for (Args::ext_T::const_iterator it = args.extension().cbegin(); it != args.extension().cend(); ++it)
if (extension(d->path()) == *it)
std::cout << "Scan Function: Calling TerminateThreads() " << std::endl;
std::cout << "Scan Function: Called TerminateThreads() " << std::endl;
return filesFound;
I'll repeat the question again: What can I do to halt the main thread until worker threads have finished?

I don't get the issue with thread termination and join.
Joining threads is all about waiting until the given thread has terminated, so it's exactly what you want to do. If the thread has already finished execution, join will just return immediately.
So you'll just want to join each thread during the terminate call as you already do in your code.
Note: currently you immediately rethrow any exception if a thread you just terminated has an active exception_ptr. That might leave the remaining threads unjoined, so you'll have to keep that in mind when handling those exceptions.
Update: after looking at your code, I see a potential bug: std::condition_variable::wait() can return when a spurious wakeup occurs. If that is the case, you will work again on the path that was worked on the last time, leading to wrong results. You should have a flag for new work that is set if new work has been added, and that _threadCond.wait(lock) line should be in a loop that checks for the flag and _terminate. Not sure if that one will fix your problem, though.

The problem was twofold:
synchronizeStartingThreads() would sometimes leave 1 or 2 threads blocked, waiting for the okay to go ahead (a problem in the while (some_condition) barrierCond.wait(lock) loop). The condition would sometimes never evaluate to true. Removing the while loop fixed this blocking issue.
The second issue was that a worker thread could lock _threadMtx while notify_all was called just before it reached _threadCond.wait(). Since notify had already been called, the thread would wait forever.
// terminate() is called
std::unique_lock<std::mutex> lock(_threadMtx);
// _threadCond.notify_all() is called here
_busy = false;
// thread is blocked forever
Surprisingly, locking this mutex in terminate() did not stop this from happening.
This was solved by adding a timeout of 30ms to the _threadCond.wait().
Also, a check was added before the start of a task to make sure the same task wasn't being processed again.
The new code now looks like this:
_threadCond.wait_for(lock, std::chrono::milliseconds(30)); // hold the lock a max of 30ms
// after the lock, and the termination check
Global::mFile_t rMap = _thisPool->_task(_pPath, *_pArgs, _thisPool->coutMatch_mtx);
_workerMap.element.insert(rMap.element.begin(), rMap.element.end());


C++ scoped lock in loop blocks another thread

Simple example:
#include <iostream>
#include <thread>
#include <mutex>

std::mutex lock_m;

void childTh() {
    while(true) {
        std::unique_lock<std::mutex> lockChild(lock_m);
        std::cout << "childTh CPN1" << std::endl;
    }
}

int main(int, char**) {
    std::thread thr(childTh);
    std::unique_lock<std::mutex> lockMain(lock_m);
    std::cout << "MainTh CPN1" << std::endl;
    return 0;
}
The main thread blocks on lockMain and never reaches "MainTh CPN1". I expected the main thread to acquire lock_m when childTh reaches the end of an iteration, because lockChild is destroyed and lock_m is released. But this never happens.
Can you please describe in detail why the main thread doesn't have time to acquire the lock before childTh locks it again?
With sleep_for, main can reach "MainTh CPN1", but with yield it cannot.
I know that a condition_variable can be used to notify and unblock another thread, but is it possible to use just a scoped lock? It looks risky to use a scoped lock in different threads, even if it is the same lock.
In childTh, lockChild doesn't release the mutex until the iteration ends, and the next iteration starts right after that. The only window for main is the time between the destruction of lockChild and its initialization in the next iteration. Since the re-lock is essentially the next instruction, there is almost no time for lockMain to acquire the mutex. To save CPU cycles, a typical lock acquire yields for a short duration while it waits, and that duration is much longer than a single instruction, so lockMain would have to be timed perfectly to get the mutex. A std::mutex also makes no fairness guarantee, so nothing forces the waiting thread to go next. If you change childTh to
void childTh() {
    while(true) {
        {
            std::unique_lock<std::mutex> lockChild(lock_m);
            std::cout << "childTh CPN1" << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::seconds(1)); // sleeps with lock_m released
    }
}
now you have a 1 second delay between when the mutex is released by lockChild and when it is reacquired in the next iteration, which allows lockMain to acquire the mutex.
Also note that you are not calling join on thr at the end of main. If thr is still joinable when it is destroyed, its destructor calls std::terminate, which will cause your program to terminate improperly.

Infinite waiting on condition variable

The simplified goal is to force 3 member functions to be called in 3 different threads, one by one (thread A calls F::first, thread B F::second, and thread C F::third).
To enforce the order in which the threads execute, I used 1 condition variable and 2 bools indicating whether the first and second threads have finished their work.
In the code:
std::mutex mtx;
std::condition_variable cv;
bool firstPrinted = false;
bool secondPrinted = false;
class F {
void first(std::function<void()> printFirst) {
std::unique_lock<std::mutex> lck(mtx);
std::cout << "first\n";
firstPrinted = true;
void second(std::function<void()> printSecond) {
std::unique_lock<std::mutex> lck(mtx);
std::cout << "second\n";
cv.wait(lck, []() { return firstPrinted; });
secondPrinted = true;
void third(std::function<void()> printThird) {
std::unique_lock<std::mutex> lck(mtx);
std::cout << "third\n";
cv.wait(lck, []() { return secondPrinted; });
auto first = []() {
std::cout << "1";
auto second = []() {
std::cout << "2";
auto third = []() {
std::cout << "3";
F f;
std::thread A(&F::first, &f, first);
std::thread B(&F::second, &f, second);
std::thread C(&F::third, &f, third);
A.join(); B.join(); C.join();
Now let's consider this situation:
Thread A does not start first. Whether the first thread to start is B or C, both block (wait) until they get notified (B blocks until notified by A, and C blocks until notified by B).
The infinite waiting (or perhaps deadlock!?) appears when the first thread to start is C, which always yields this output:
...and stalling here
Theoretically, this should not happen, because calling cv.wait in thread C unlocks the mutex, which allows thread B to run. B in turn also waits (because its condition has not become true) and therefore unlocks the mutex as well, allowing thread A to start, enter the critical section, and notify B.
What is the call path that causes stalling of the program ?
What nuance did I miss ?
Please correct me if I was wrong in the thoughts above.
std::condition_variable::notify_one() wakes one of the threads waiting on the condition_variable. If multiple threads are waiting, one will be picked. It will wake, reacquire the lock, and check its predicate. If that predicate is still false, it returns to its waiting state and the notification is in essence lost.
That is what is happening here when the thread running first is the last to execute. When it reaches its notify_one, there will be two threads waiting on the condition_variable. If it notifies the thread running third, that thread's predicate will still return false. That thread will wake, fail its predicate test, and return to waiting. Your process now has no running threads and is frozen.
The solution is to use std::condition_variable::notify_all(). This function wakes all waiting threads, which will, one at a time, relock the mutex and check their own predicates.

Parent thread join(): Blocks Until Children Finish?

I have a C++ class that does some multi-threading. Consider the pseudo-code below:
void MyClass::Open() {
loop_flag = true;
// create consumer_thread (infinite loop)
// create producer_thread (infinite loop)
void MyClass::Close() {
loop_flag = false;
// join producer_thread
// join consumer_thread
MyClass::~MyClass() {
// do other stuff here
Note that consumer_thread, producer_thread, and their associated functions are all encapsulated in MyClass. The caller has no clue that their calls are multi-threaded and what's going on in the background.
Now, the class is part of a larger program. The program has some initial multi-threading to handle configuration of the system since there's a ton of stuff happening at once.
Like this (pseudo-code):
int main() {
// create config_thread1 (unrelated to MyClass)
// create thread for MyClass::Open()
// ...
// join all spawned configuration threads
So my question is, when I call join() for the thread linked to MyClass::Open() (i.e., the configuration thread spawned in main()), what happens? Does it join() immediately (since the MyClass::Open() function just returns after creation of producer_thread and consumer_thread) or does it wait for producer_thread and consumer_thread to finish (and therefore hangs my program).
Thanks in advance for the help. In terms of implementation details, I'm using Boost threads on a Linux box.
Edited to add this diagram:
|--->configuration_thread (that runs MyClass::Open())
|----> producer_thread
|----> consumer_thread
If I call join() on configuration_thread(), does it wait until producer_thread() and consumer_thread() are finished or does it return immediately (and producer_thread() and consumer_thread() continue to run)?
A (non-detached) thread remains joinable, even after it has returned from the function it was set to run, until it has been joined.
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;

void foo(){
    std::cout << "helper: I'm done\n";
}

int main(){
    cout << "starting helper...\n";
    thread helper(foo);
    this_thread::sleep_for(chrono::seconds(1)); // give the helper time to finish
    cout << "helper still joinable?..." << (helper.joinable()?"yes!":"no...:(") << "\n";
    helper.join();
    cout << "helper joined!\n";
    cout << "helper still joinable?..." << (helper.joinable()?"really?":"not anymore!") << "\n";
    cout << "done!\n";
}

Output:
starting helper...
helper: I'm done
helper still joinable?...yes!
helper joined!
helper still joinable?...not anymore!
done!
As for how long join takes, I don't think this is specified, but it surely doesn't have to wait for all the other threads to finish, or it would mean that only one thread would be able to join all the others.
From §30.3.5:
void join();
Requires: joinable() is true.
Effects: Blocks until the thread represented by *this has completed.
Synchronization: The completion of the thread represented by *this synchronizes with the corresponding successful join() return. [Note: Operations on *this are not synchronized. -- end note]

C++11 std::threads and waiting for threads to finish

I have a vector of Timer Objects. Each Timer Object launches an std::thread that simulates a growing period. I am using a Command pattern.
What is happening is each Timer is getting executed one after another but what I really want is for one to be executed....then once finished, the next one...once finished the next...while not interfering with the main execution of the program
class Timer
bool _bTimerStarted;
bool _bTimerCompleted;
int _timerDuration;
virtual ~Timer() { }
virtual void execute()=0;
virtual void runTimer()=0;
inline void setDuration(int _s) { _timerDuration = _s; };
inline int getDuration() { return _timerDuration; };
inline bool isTimerComplete() { return _bTimerCompleted; };
class GrowingTimer : public Timer
void execute()
//std::cout << "Timer execute..." << std::endl;
_bTimerStarted = false;
_bTimerCompleted = false;
//std::thread t1(&GrowingTimer::runTimer, this); //Launch a thread
void runTimer()
//std::cout << "Timer runTimer..." << std::endl;
_bTimerStarted = true;
auto start = std::chrono::high_resolution_clock::now();
std::this_thread::sleep_until(start + std::chrono::seconds(20));
_bTimerCompleted = true;
std::cout << "Growing Timer Finished..." << std::endl;
class Timers
std::vector<Timer*> _timers;
struct ExecuteTimer
void operator()(Timer* _timer) { _timer->execute(); }
void add_timer(Timer& _timer) { _timers.push_back(&_timer); }
void execute()
//std::for_each(_timers.begin(), _timers.end(), ExecuteTimer());
for (int i=0; i < _timers.size(); i++)
Timer* _t = _timers.at(i);
//while ( ! _t->isTimerComplete())
Executing the above like:
Timers _timer;
GrowingTimer _g, g1;
void start_timers()
In Timers::execute I am trying a few different ways to execute the first and not execute the next until I somehow signal it is done.
I am now doing this to execute everything:
Timers _timer;
GrowingTimer _g, g1;
std::thread t1(&Broccoli::start_timers, this); //Launch a thread
void start_timers()
The first timer completes (I see the "completed" cout), but it crashes at _t->execute(); inside the for loop with an EXC_BAD_ACCESS. I added a cout to check the size of the vector and it is 2, so both timers are inside. I do see this in the console:
this Timers * 0xbfffd998
_timers std::__1::vector<Timer *, std::__1::allocator<Timer *> >
If I change the detach() to join(), everything completes without the crash, but it blocks execution of my app until those timers finish.
Why are you using threads here? Timers::execute() calls execute on a timer, then waits for it to finish, then calls execute on the next, and so forth. Why don't you just call the timer function directly in Timers::execute() rather than spawning a thread and then waiting for it?
Threads allow you to write code that executes concurrently. What you want is serial execution, so threads are the wrong tool.
Update: In the updated code you run start_timers on a background thread, which is good. However, by detaching that thread you leave the thread running past the end of the scope. This means that the timer objects _g and _g1 and even the Timers object _timers are potentially destroyed before the thread has completed. Given the time-consuming nature of the timers thread, and the fact that you used detach rather than join in order to avoid your code blocking, this is certainly the cause of your problem.
If you run code on a thread then you need to ensure that all objects accessed by that thread have a long-enough lifetime that they are still valid when the thread accesses them. For detached threads this is especially hard to achieve, so detached threads are not recommended.
One option is to create an object containing _timers, _g and _g1 alongside the thread t1, and have its destructor join the thread. All you need to do then is ensure that the object lives until the point at which it is safe to wait for the timers to complete.
If you don't want to interfere with the execution of the program, you could do something like what @Joel said, but also add a thread in the Timers class that executes the threads in the vector.
You could include a unique_ptr to the thread in GrowingTimer instead of creating it as a local object in execute and calling detach. You can still create the thread in execute, but you would do it with a unique_ptr::reset call.
Then use join instead of isTimerComplete (add a join function to the Timer base class). The isTimerComplete polling mechanism will be extremely inefficient because it will basically use up that thread's entire time slice continually polling, whereas join will block until the other thread is complete.
An example of join:
#include <iostream>
#include <chrono>
#include <thread>
using namespace std;
void threadMain()
cout << "Done sleeping\n";
int main()
thread t(threadMain);
for (int i = 0; i < 10; ++i)
cout << i << "\n";
cout << "Press Enter to exit\n";
return 0;
Note how the main thread keeps running while the other thread does its thing. Note that Anthony's answer is right in that it doesn't really seem like you need more than one background thread that just executes tasks sequentially rather than starting a thread and waiting for it to finish before starting a new one.

Why might this thread management pattern result in a deadlock?

I'm using a common base class has_threads to manage any type that should be allowed to instantiate a boost::thread.
Instances of has_threads each own a set of threads (to support waitAll and interruptAll functions, which I do not include below), and should automatically invoke removeThread when a thread terminates to maintain this set's integrity.
In my program, I have just one of these. Threads are created on an interval every 10s, and each performs a database lookup. When the lookup is complete, the thread runs to completion and removeThread should be invoked; with a mutex set, the thread object is removed from internal tracking. I can see this working properly with the output ABC.
Once in a while, though, the mechanisms collide. removeThread is executed perhaps twice concurrently. What I can't figure out is why this results in a deadlock. All thread invocations from this point never output anything other than A. [It's worth noting that I'm using thread-safe stdlib, and that the issue remains when IOStreams are not used.] Stack traces indicate that the mutex is locking these threads, but why would the lock not be eventually released by the first thread for the second, then the second for the third, and so on?
Am I missing something fundamental about how scoped_lock works? Is there anything obvious here that I've missed that could lead to a deadlock, despite (or even due to?) the use of a mutex lock?
Sorry for the poor question, but as I'm sure you're aware it's nigh-on impossible to present real testcases for bugs like this.
class has_threads {
template <typename Callable>
void createThread(Callable f, bool allowSignals)
boost::mutex::scoped_lock l(threads_lock);
// Create and run thread
boost::shared_ptr<boost::thread> t(new boost::thread());
// Track thread
// Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler)
*t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals);
* Entrypoint function for a thread.
* Sets up the on-end handler then invokes the user-provided worker function.
template <typename Callable>
void runThread(Callable f, bool allowSignals)
if (!allowSignals)
try {
catch (boost::thread_interrupted& e) {
// Yes, we should catch this exception!
// Letting it bubble over is _potentially_ dangerous:
// http://stackoverflow.com/questions/6375121
std::cout << "Thread " << boost::this_thread::get_id() << " interrupted (and ended)." << std::endl;
catch (std::exception& e) {
std::cout << "Exception caught from thread " << boost::this_thread::get_id() << ": " << e.what() << std::endl;
catch (...) {
std::cout << "Unknown exception caught from thread " << boost::this_thread::get_id() << std::endl;
void has_threads::releaseThread(boost::thread::id thread_id)
std::cout << "A";
boost::mutex::scoped_lock l(threads_lock);
std::cout << "B";
for (threads_t::iterator it = threads.begin(), end = threads.end(); it != end; ++it) {
if ((*it)->get_id() != thread_id)
std::cout << "C";
void blockSignalsInThisThread()
sigset_t signal_set;
sigaddset(&signal_set, SIGINT);
sigaddset(&signal_set, SIGTERM);
sigaddset(&signal_set, SIGHUP);
sigaddset(&signal_set, SIGPIPE); // http://www.unixguide.net/network/socketfaq/2.19.shtml
pthread_sigmask(SIG_BLOCK, &signal_set, NULL);
typedef std::set<boost::shared_ptr<boost::thread> > threads_t;
threads_t threads;
boost::mutex threads_lock;
struct some_component : has_threads {
some_component() {
// set a scheduler to invoke createThread(bind(&some_work, this)) every 10s
void some_work() {
// usually pretty quick, but I guess sometimes it could take >= 10s
Well, a deadlock might occur if a thread locks a mutex it has already locked (unless you use a recursive mutex).
If the release part is called a second time by the same thread, as seems to happen with your code, you have a deadlock.
I have not studied your code in detail, but you probably have to redesign (simplify?) it to be sure that a lock cannot be acquired twice by the same thread. You can probably use a safeguard that checks for ownership of the lock ...
As said in my comment and in IronMensan's answer, one possible case is that the thread stops during creation, with the at_exit handler being called before the mutex locked in the creation part of your code is released.
Well, with a mutex and a scoped lock, I can only imagine a recursive lock, or a lock that is not released. That can happen if a loop becomes infinite due to memory corruption, for instance.
I suggest adding more logs with a thread id to check whether there is a recursive lock or something strange. Then I would check that the loop is correct. I would also check that the at_exit handler is only called once per thread ...
One more thing: check the effect of erasing a thread (and thus calling its destructor) while inside the at_exit function...
my 2 cents
You may need to do something like this:
void createThread(Callable f, bool allowSignals)
// Create and run thread
boost::shared_ptr<boost::thread> t(new boost::thread());
boost::mutex::scoped_lock l(threads_lock);
// Track thread
//Do not hold threads_lock while starting the new thread in case
//it completes immediately
// Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler)
*t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals);
In other words, use threads_lock exclusively to protect threads.
To expand on something in the comments with speculation about how boost::thread works, the lock patterns could look something like this:
createThread:
1. (createThread) obtain threads_lock
2. (boost::thread::operator=) obtain a boost::thread internal lock
3. (boost::thread::operator=) release a boost::thread internal lock
4. (createThread) release threads_lock
thread end handler:
1. (at_thread_exit) obtain a boost::thread internal lock
2. (releaseThread) obtain threads_lock
3. (releaseThread) release threads_lock
4. (at_thread_exit) release a boost::thread internal lock
If those two boost::thread locks are the same lock, the potential for deadlock is clear. But this is speculation because much of the boost code scares me and I try not to look at it.
createThread could/should be reworked to move step 4 up between steps 1 and 2 and eliminate the potential deadlock.
It is possible that the created thread finishes before or while the assignment operator in createThread completes. Using an event queue or some other structure might be necessary, though a simpler, if hack-ish, solution might work as well. Don't change createThread, since you have to use threads_lock to protect threads itself and the thread objects it points to. Instead change runThread to this:
template <typename Callable>
void runThread(Callable f, bool allowSignals)
//SNIP setup
try {
//SNIP catch blocks
//ensure that createThread is complete before this thread terminates
boost::mutex::scoped_lock l(threads_lock);