Unexpected behavior when std::thread.detach is called

I've been trying to develop a better understanding of C++ threading, so I wrote the following example:
#include <functional>
#include <iostream>
#include <thread>
class Test {
public:
    Test() { x = 5; }
    void act() {
        std::cout << "1" << std::endl;
        std::thread worker(&Test::changex, this);
        worker.detach(); // the call the title refers to
        std::cout << "2" << std::endl;
    }
    void changex() {
        std::cout << "3" << std::endl;
        x = 10;
        std::cout << "4" << std::endl;
    }
private:
    int x;
};
int main() {
    Test t;
    t.act();
    return 0;
}
I expected the following output when compiled with g++ and linked with -pthread:
1
2
3
4
since the cout calls are in that order. However, the output is inconsistent. 1 and 2 are always printed in order, but sometimes the 3 and/or the 4 are omitted or printed twice, e.g. 12, 123, 1234, or 12344.
My working theory is that the main thread exits before the worker thread starts or finishes, which causes the missing output. One fix I can think of is a global boolean that the worker sets when it has completed and that the main thread waits on before exiting. This alleviates the issue.
However, this feels like a messy approach, and there is likely a cleaner solution, especially for an issue that must come up often in threading.

Just some general advice, that holds both for using raw pthreads in C++ and for pthreads wrapped in std::thread: The best way to get readable, comprehensible and debuggable behavior is to make thread synchronization and lifetime management explicit. I.e. avoid using pthread_kill, pthread_cancel, and in most cases, avoid detaching threads and instead do explicit join.
One design pattern I like is using an atomic flag (e.g. a std::atomic<bool>). When the main thread wants to quit, it sets the atomic flag to true. The worker threads typically do their work in a loop, and check the atomic flag reasonably often, e.g. once per lap of the loop. When they find main has ordered them to quit, they clean up and return. The main thread then joins all workers.
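A minimal sketch of that stop-flag pattern, assuming a std::atomic<bool> as the flag. The names stop_requested, laps, worker_loop and run_workers are made up for illustration, and the sleeps only stand in for real work:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is a sketch, not a library API.
std::atomic<bool> stop_requested{false};
std::atomic<int> laps{0};

void worker_loop() {
    // Check the flag once per lap of the loop.
    while (!stop_requested.load()) {
        ++laps;  // stand-in for one unit of real work
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // Clean-up would go here before returning.
}

// Starts n workers, lets them run briefly, orders them to quit and joins them.
// Returns the number of workers that were joined.
int run_workers(int n) {
    stop_requested = false;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) workers.emplace_back(worker_loop);

    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    stop_requested = true;  // main orders the workers to quit

    int joined = 0;
    for (auto& w : workers) {
        w.join();  // explicit join: no thread outlives this function
        ++joined;
    }
    return joined;
}
```

Calling run_workers(3) from main starts three workers and returns only after all of them have seen the flag and exited.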
There are some special cases that require extra care, for example when one worker is stuck in a blocking syscall and/or C library function. Usually, the platform provides ways of getting out of such blocking calls without resorting to e.g. pthread_cancel, since thread cancellation works very badly with C++. One example of how to avoid blocking is the Linux manpage for getaddrinfo_a, i.e. asynchronous network address translation.
One additional nice design pattern applies when workers are sleeping in e.g. select(). You can then add an extra control pipe between main and the worker. Main signals the worker to quit by writing one byte to the pipe, thus waking up the worker if it sleeps in select().
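The control-pipe idea could look roughly like this on a POSIX system. This is only a sketch: the function name wake_worker_via_pipe is hypothetical, and a real worker would also put its data sockets into the fd_set. Note that a pipe is written with write(); send() is for sockets.

```cpp
#include <sys/select.h>
#include <unistd.h>

#include <cassert>
#include <thread>

// Hypothetical sketch for POSIX systems. The worker sleeps in select() on the
// read end of a control pipe. Returns true if the control byte woke it up.
bool wake_worker_via_pipe() {
    int fds[2];
    if (pipe(fds) != 0) return false;

    bool woke_up = false;
    std::thread worker([&] {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fds[0], &readable);
        // Blocks (no timeout) until the control pipe becomes readable.
        if (select(fds[0] + 1, &readable, nullptr, nullptr, nullptr) > 0 &&
            FD_ISSET(fds[0], &readable)) {
            char byte;
            woke_up = (read(fds[0], &byte, 1) == 1);
        }
    });

    const char quit = 'q';
    bool sent = (write(fds[1], &quit, 1) == 1);  // signal the worker to quit
    worker.join();  // after join it is safe to read woke_up

    close(fds[0]);
    close(fds[1]);
    return sent && woke_up;
}
```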

Example of how this could be done:
#include <functional>
#include <iostream>
#include <thread>
class Test {
private:
    std::thread worker; // worker is now a member
public:
    Test() { x = 5; } // worker deliberately left without a function to run.
    ~Test() {
        if (worker.joinable()) { // worker can be joined (act was called successfully)
            worker.join(); // wait for worker thread to exit.
            // Note destructor cannot complete if thread cannot be exited.
            // Some extra brains needed here for production code.
        }
    }
    void act() {
        std::cout << "1" << std::endl;
        worker = std::thread(&Test::changex, this); // give worker some work
        std::cout << "2" << std::endl;
    }
    // rest unchanged.
    void changex() {
        std::cout << "3" << std::endl;
        x = 10;
        std::cout << "4" << std::endl;
    }
private:
    int x;
};
int main() {
    Test t;
    t.act();
    return 0;
} // test destroyed here. Destruction halts and waits for thread.


How can I synchronize these two threads properly?

I would like to synchronize different threads properly, but so far I have only been able to write an inelegant solution. Can somebody kindly point out how I can improve the following code?
typedef void (*func)();
void thread(func func1, func func2, int& has_finished, int& id) {
    has_finished--; // unsynchronized access from both threads
    func1();
    has_finished++;
    while (has_finished != 0) std::cout << "thread " << id << " waiting\n";
    std::cout << "thread " << id << " resuming\n";
    func2();
}
int main() {
    int has_finished(0), id_one(0), id_two(1);
    std::thread t1(thread, fun, fun, std::ref(has_finished), std::ref(id_one));
    std::thread t2(thread, fun, fun, std::ref(has_finished), std::ref(id_two));
    t1.join();
    t2.join();
}
The gist of the program is described by the function thread, which is executed by two std::threads. It accepts two long-running functions, func1 and func2, and two references to ints as arguments. The threads should only invoke func2 after all threads have exited func1. The argument has_finished is used to coordinate the threads: upon entering the function, has_finished is zero. Each std::thread then decrements the value and invokes the long-running function func1. After leaving func1, it increments has_finished again. As long as this value is not back at its original value of zero, a thread waits. Then each thread works on func2. The main function is shown at the end.
How can I coordinate the two threads better? I was thinking of using a std::mutex and a std::condition_variable but could not figure out how to use them properly. Does somebody have an idea how I can improve the program?
Don't write this yourself. This kind of synchronization is known as a "latch" (or more generally a "barrier"), and it's available through various libraries and through the C++ Concurrency TS. (It might also make it into C++20 in some form.)
For example, using a version from Boost:
#include <functional>
#include <iostream>
#include <thread>
#include <boost/thread/latch.hpp>
void f(boost::latch& c) {
    std::cout << "Doing work in round 1\n";
    c.count_down_and_wait(); // block until both threads have finished round 1
    std::cout << "Doing work in round 2\n";
}
int main() {
    boost::latch c(2);
    std::thread t1(f, std::ref(c)), t2(f, std::ref(c));
    t1.join();
    t2.join();
}
The method you've chosen won't actually work and results in undefined behavior because of the race conditions. As you surmised, you need a condition variable.
Here is a Gate class demonstrating how to use a condition variable to implement a gate that waits for some number of threads to arrive at it before continuing:
#include <thread>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <sstream>
#include <utility>
#include <cassert>
struct Gate {
 public:
    explicit Gate(unsigned int count = 2) : count_(count) { } // How many threads need to reach the gate before it unlocks
    Gate(Gate const &) = delete;
    void operator =(Gate const &) = delete;
    void wait_for_gate();
 private:
    int count_;
    ::std::mutex count_mutex_;
    ::std::condition_variable count_gate_;
};
void Gate::wait_for_gate()
{
    ::std::unique_lock<::std::mutex> guard(count_mutex_);
    assert(count_ > 0); // Count being 0 here indicates an irrecoverable programming error.
    --count_; // This thread has arrived at the gate.
    count_gate_.wait(guard, [this](){ return this->count_ <= 0; });
    guard.unlock();
    count_gate_.notify_all(); // Wake the threads that arrived earlier.
}
void f1()
{
    ::std::ostringstream msg;
    msg << "In f1 with thread " << ::std::this_thread::get_id() << '\n';
    ::std::cout << msg.str();
}
void f2()
{
    ::std::ostringstream msg;
    msg << "In f2 with thread " << ::std::this_thread::get_id() << '\n';
    ::std::cout << msg.str();
}
void thread_func(Gate &gate)
{
    f1();
    gate.wait_for_gate();
    f2();
}
int main()
{
    Gate gate;
    ::std::thread t1{thread_func, ::std::ref(gate)};
    ::std::thread t2{thread_func, ::std::ref(gate)};
    t1.join();
    t2.join();
}
Hopefully the structure of this code looks enough like your code that you can understand what's going on here. From reading your code, it seems like you're looking for all threads to execute func1, then func2. You do not want func2 running while any thread is executing func1.
That can be thought of as a gate where all the threads are waiting to arrive at the 'finished func1' location before moving on to run func2.
I tested this code on my own local version of compiler explorer.
The main disadvantage of the latch in the other answer is that it is not yet standard C++. My Gate class is a simple implementation of the latch class mentioned in the other answer, and it is standard C++.
The basic way a condition variable works with a predicate is that it tests the condition while holding the mutex. If the condition is true, it continues without unlocking the mutex. If the condition is false, it unlocks the mutex, waits for a notify, then locks the mutex again and starts over by testing the condition.
So, after the condition variable says the condition is true, you have to do whatever you need to do, then unlock the mutex and notify everybody that you've done it.
The mutex here is guarding the shared count variable. Whenever you have a shared value you should guard it with a mutex so that no thread can see that value in an inconsistent state. The condition is that threads can wait for that count to reach 0, indicating that all threads have decremented the count variable.
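To make the wait loop concrete, here is a small sketch of what a predicate wait amounts to when written out by hand. The names state_mutex, state_changed, ready, payload and the three functions are hypothetical and exist only for this illustration:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical names; a sketch of what cv.wait(lock, pred) does.
std::mutex state_mutex;
std::condition_variable state_changed;
bool ready = false;  // shared state, guarded by state_mutex
int payload = 0;     // shared state, guarded by state_mutex

int wait_for_payload() {
    std::unique_lock<std::mutex> lock(state_mutex);
    // Equivalent to: state_changed.wait(lock, [] { return ready; });
    while (!ready) {               // test the condition with the mutex held
        state_changed.wait(lock);  // unlock and sleep; relocks before returning
    }
    return payload;                // the mutex is still held here
}

void publish(int value) {
    {
        std::lock_guard<std::mutex> lock(state_mutex);
        payload = value;
        ready = true;
    }                              // unlock before notifying
    state_changed.notify_all();
}

int demo_wait() {
    ready = false;
    int result = 0;
    std::thread consumer([&] { result = wait_for_payload(); });
    publish(42);
    consumer.join();
    return result;
}
```

Because the condition is tested before the first wait, it does not matter whether publish() runs before or after the consumer reaches the loop.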

What's the proper way of implementing 'sleeping' technique using C++?

There are two threads. The main one is constantly gathering notifications while the other one is processing some of them.
The way I implemented it is not correct, as I've been told. What problems does it cause and what's wrong with it?
#include <iostream>
#include <atomic>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <chrono>
std::condition_variable foo;
std::mutex mtx;
void secondThread()
{
    while (true)
    {
        foo.wait(std::unique_lock<std::mutex>(mtx)); // the line in question
        std::cout << " ----------------------------" << std::endl;
        std::cout << "|processing a notification...|" << std::endl;
        std::cout << " ----------------------------" << std::endl;
    }
}
int main()
{
    std::thread subThread = std::thread(&secondThread);
    int count = 0;
    while (true)
    {
        if (count % 10 == 0)
        {
            foo.notify_one();
        }
        std::cout << "Main thread working on gathering notifications..." << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(300)); // pause between rounds (interval assumed)
        count++;
    }
    return 0;
}
I was told that the line foo.wait(std::unique_lock<std::mutex>(mtx)) is not good practice according to the C++ spec and that this is not a proper way of solving this kind of problem. The technique is also called sleeping (as opposed to busy waiting).
Before you call wait, you must check that the thing you are waiting for hasn't already happened. And before you stop calling wait, you must check that the thing you are waiting for has happened. Condition variables are stateless and have no idea what you're waiting for. It's your job to code that.
Also, the associated mutex must protect the thing you're waiting for. The entire point of a condition variable is to provide an atomic "unlock and wait" operation to prevent this problem:
1. You check if you need to wait under the protection of a mutex.
2. You decide you do need to wait.
3. You unlock the mutex so other threads can make progress.
4. You wait.
But what if the thing you're waiting for happens after you unlocked the mutex but before you waited? You'll be waiting for something that already happened.
This is why the wait function takes a lock holder -- so that it can perform steps 3 and 4 atomically.
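As a hedged sketch of how the notification example could follow these rules: the thing being waited for (a pending counter plus a done flag, both hypothetical names) lives under the mutex, and the wait uses a predicate so it is checked before sleeping and after every wake-up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical names. 'pending' and 'done' are the state being waited for,
// and 'mtx' protects both of them.
std::mutex mtx;
std::condition_variable cv;
int pending = 0;    // notifications gathered but not yet processed
bool done = false;  // set when no more notifications will arrive

// Processes notifications until 'done' is set and nothing is pending.
// Returns how many notifications were processed.
int process_notifications() {
    int processed = 0;
    std::unique_lock<std::mutex> lock(mtx);
    for (;;) {
        // The predicate runs before sleeping and after each wake-up, so
        // early notifications and spurious wake-ups are both handled.
        cv.wait(lock, [] { return pending > 0 || done; });
        if (pending == 0) break;  // woken only because done is true
        processed += pending;
        pending = 0;
    }
    return processed;
}

// Gathers 'total' notifications on the calling thread while a second
// thread processes them; returns the number processed.
int run_demo(int total) {
    pending = 0;
    done = false;
    int processed = 0;
    std::thread consumer([&] { processed = process_notifications(); });
    for (int i = 0; i < total; ++i) {
        { std::lock_guard<std::mutex> lock(mtx); ++pending; }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lock(mtx); done = true; }
    cv.notify_one();
    consumer.join();
    return processed;
}
```

No notification is lost here even if notify_one() is called while the consumer is not yet waiting, because the counter records it.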

Creating a class to store threads and calling them

Here is a simplified version of what I am trying to do:
#include <iostream>
#include <string>
#include <vector>
#include <thread>
#include <atomic>
class client {
private:
    std::vector<std::thread> threads;
    std::atomic<bool> running;
    void main() {
        while(running) {
            std::cout << "main" << std::endl;
        }
    }
    void render() {
        while(running) {
            std::cout << "render" << std::endl;
        }
    }
public:
    client() {
        running = true;
        threads.push_back(std::thread(&client::main, this));
        threads.push_back(std::thread(&client::render, this));
    }
    ~client() {
        running = false;
        for(auto& th : threads) th.join();
    }
};
int main() {
    client c;
    std::string inputString;
    getline(std::cin, inputString);
    return 0;
}
(Note: code has been changed since question was written)
What I am trying to do is create a class that holds threads for the main loop (of the class), rendering, and a couple of other things. However, I cannot get this simplified version to work. I have tried using a mutex to lock and unlock the threads, but it didn't seem to help. I do not know why it is not working, but I suspect that it is a result of the use of this in threads.push_back(std::thread(this->main, this));.
The current structure of the code doesn't have to remain. The only requirement is that the class uses one of its own member functions as a thread (and that the thread is stored in the class). I am not sure if this requires two classes or if my attempt to do it in one class was the correct approach. I have seen many examples of creating an object and then calling a member that creates threads. I am trying to avoid this and instead create the threads within the constructor.
The problem here is that you do not wait for the threads to end. In main you create c. This then spawns the threads. The next thing to happen is the return, which destroys c. When c is destroyed, it destroys its members. Now, when a thread is destroyed and it has not been joined or detached, std::terminate is called and the program ends.
What you need to do is, in the destructor, set running to false and then call join on both threads. This will stop the loop in each thread and allow c to be destructed correctly.
Doing this, however, brings up another issue. running is not an atomic variable, so writing to it while threads are reading it is undefined behavior. We can fix that, though, by changing running to a std::atomic<bool>, which provides synchronization.
I also had to make a change to the thread construction. When you want to use a member function the syntax should be
std::thread(&class_name::function_name, pointer_to_instance_of_class_name, function_parameters)
so in this case it would be
threads.push_back(std::thread(&client::main, this));
threads.push_back(std::thread(&client::render, this));
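For the general form with function parameters, a small sketch may help. The Accumulator class and run_member_thread are hypothetical and only illustrate the argument order:

```cpp
#include <cassert>
#include <thread>

// Hypothetical class, used only to illustrate the syntax.
class Accumulator {
public:
    void add(int amount, int times) {
        for (int i = 0; i < times; ++i) total_ += amount;
    }
    int total() const { return total_; }

private:
    int total_ = 0;
};

int run_member_thread() {
    Accumulator acc;
    // &Class::function, pointer to the instance, then the function's parameters.
    std::thread t(&Accumulator::add, &acc, 2, 5);
    t.join();  // join before reading the result and before acc goes out of scope
    return acc.total();
}
```

Here the thread runs acc.add(2, 5), so run_member_thread() returns 10.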

Parent thread join(): Blocks Until Children Finish?

I have a C++ class that does some multi-threading. Consider the pseudo-code below:
void MyClass::Open() {
    loop_flag = true;
    // create consumer_thread (infinite loop)
    // create producer_thread (infinite loop)
}
void MyClass::Close() {
    loop_flag = false;
    // join producer_thread
    // join consumer_thread
}
MyClass::~MyClass() {
    // do other stuff here
}
Note that consumer_thread, producer_thread, and their associated functions are all encapsulated in MyClass. The caller has no clue that their calls are multi-threaded and what's going on in the background.
Now, the class is part of a larger program. The program has some initial multi-threading to handle configuration of the system since there's a ton of stuff happening at once.
Like this (pseudo-code):
int main() {
    // create config_thread1 (unrelated to MyClass)
    // create thread for MyClass::Open()
    // ...
    // join all spawned configuration threads
}
So my question is: when I call join() for the thread linked to MyClass::Open() (i.e., the configuration thread spawned in main()), what happens? Does it join() immediately (since the MyClass::Open() function just returns after creating producer_thread and consumer_thread), or does it wait for producer_thread and consumer_thread to finish (and therefore hang my program)?
Thanks in advance for the help. In terms of implementation details, I'm using Boost threads on a Linux box.
Edited to add this diagram:
|--->configuration_thread (that runs MyClass::Open())
          |----> producer_thread
          |----> consumer_thread
If I call join() on configuration_thread(), does it wait until producer_thread() and consumer_thread() are finished or does it return immediately (and producer_thread() and consumer_thread() continue to run)?
A (non-detached) thread stays joinable, even after having returned from the function it was set to run, until it has been joined.
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
void foo(){
    std::cout << "helper: I'm done\n";
}
int main(){
    cout << "starting helper...\n";
    thread helper(foo);
    this_thread::sleep_for(chrono::seconds(1)); // let the helper finish first (duration assumed)
    cout << "helper still joinable?..." << (helper.joinable()?"yes!":"no...:(") << "\n";
    helper.join();
    cout << "helper joined!\n";
    cout << "helper still joinable?..." << (helper.joinable()?"really?":"not anymore!") << "\n";
    cout << "done!\n";
}
Output:
starting helper...
helper: I'm done
helper still joinable?...yes!
helper joined!
helper still joinable?...not anymore!
done!
As for how much time the join method takes, I don't think this is specified, but surely it doesn't have to wait for all the other threads to finish, or it would mean that only one thread would be able to join all the others.
From §30.3.5:
void join();
Requires: joinable() is true
Effects: Blocks until the thread represented by *this has completed.
Synchronization: The completion of the thread represented by *this synchronizes with the corresponding successful join() return. [Note: Operations on *this are not synchronized. -- end note]
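Applied to the question, this means that joining the configuration thread only waits for Open() to return. A small sketch can check this. It uses std::thread rather than Boost threads, and the Service class is a hypothetical stand-in for MyClass with a single background loop:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for MyClass: Open() starts a background loop and returns.
class Service {
public:
    void Open() {
        loop_flag_ = true;
        background_ = std::thread([this] {
            while (loop_flag_) std::this_thread::yield();
            finished_ = true;
        });
    }
    void Close() {
        loop_flag_ = false;
        if (background_.joinable()) background_.join();
    }
    bool finished() const { return finished_; }

private:
    std::atomic<bool> loop_flag_{false};
    std::atomic<bool> finished_{false};
    std::thread background_;
};

// Returns true if joining the configuration thread completed while the
// background loop was still running.
bool join_returns_before_background_finishes() {
    Service service;
    std::thread configuration_thread([&] { service.Open(); });
    configuration_thread.join();  // waits only for Open() to return
    bool still_running = !service.finished();
    service.Close();              // now stop and join the background thread
    return still_running && service.finished();
}
```

The background loop cannot finish before Close() clears the flag, so the join on configuration_thread demonstrably returns while it is still running.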

When should I use std::thread::detach?

Sometimes I have to use std::thread to speed up my application. I also know join() waits until a thread completes. This is easy to understand, but what's the difference between calling detach() and not calling it?
I thought that even without detach(), the thread's function would run independently on its own thread.
Not detaching:
void Someclass::Somefunction() {
    std::thread t([ ] {
        printf("thread called without detach");
    });
    //some code here
}
Calling with detaching:
void Someclass::Somefunction() {
    std::thread t([ ] {
        printf("thread called with detach");
    });
    t.detach();
    //some code here
}
In the destructor of std::thread, std::terminate is called if:
- the thread was not joined (with t.join())
- and was not detached either (with t.detach())
Thus, you should always either join or detach a thread before the flow of execution reaches the destructor.
When a program terminates (i.e., main returns), the remaining detached threads executing in the background are not waited upon; instead their execution is suspended and their thread-local objects destructed.
Crucially, this means that the stack of those threads is not unwound and thus some destructors are not executed. Depending on the actions those destructors were supposed to undertake, this might be as bad a situation as if the program had crashed or had been killed. Hopefully the OS will release the locks on files, etc... but you could have corrupted shared memory, half-written files, and the like.
So, should you use join or detach?
- Use join
- Unless you need to have more flexibility AND are willing to provide a synchronization mechanism to wait for the thread completion on your own, in which case you may use detach
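One possible shape for such a do-it-yourself synchronization mechanism is a std::promise / std::future pair. This is only a sketch under that assumption, and run_detached_and_wait is a hypothetical name:

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical sketch: a detached thread reports completion through a promise.
int run_detached_and_wait() {
    std::promise<int> result_promise;
    std::future<int> result = result_promise.get_future();

    // The promise is moved into the thread, so it stays valid after detach().
    std::thread([p = std::move(result_promise)]() mutable {
        p.set_value(6 * 7);  // stand-in for the real work
    }).detach();

    return result.get();  // our own way of waiting for the detached thread
}
```

Note that set_value() makes the result ready while the thread is still running its last instructions. If the waiter must not proceed until the thread's thread-local objects are destroyed, std::promise::set_value_at_thread_exit is the stricter alternative.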
You should call detach if you're not going to wait for the thread to complete with join but the thread instead will just keep running until it's done and then terminate without having the spawner thread waiting for it specifically; e.g.
std::thread(func).detach(); // It's done when it's done
detach basically will release the resources needed to be able to implement join.
It is a fatal error if a thread object ends its life and neither join nor detach has been called; in this case terminate is invoked.
This answer is aimed at answering the question in the title, rather than explaining the difference between join and detach. So when should std::thread::detach be used?
In properly maintained C++ code std::thread::detach should not be used at all. The programmer must ensure that all created threads exit gracefully, releasing all acquired resources and performing other necessary cleanup actions. This implies that giving up ownership of threads by invoking detach is not an option, and therefore join should be used in all scenarios.
However, some applications rely on old and often poorly designed and supported APIs that may contain indefinitely blocking functions. Moving invocations of these functions into a dedicated thread to avoid blocking other work is a common practice. There is no way to make such a thread exit gracefully, so using join will just block the primary thread. That's a situation where using detach is a lesser evil than, say, allocating the thread object with dynamic storage duration and then purposely leaking it.
#include <LegacyApi.hpp>
#include <thread>
auto LegacyApiThreadEntry(void)
{
    auto result{NastyBlockingFunction()};
    // do something...
}
int main()
{
    ::std::thread legacy_api_thread{&LegacyApiThreadEntry};
    // do something...
    legacy_api_thread.detach(); // give up ownership instead of blocking in join
    return 0;
}
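When you detach a thread, you no longer have to join() it before its std::thread object is destroyed or before exiting main().
Be aware that the runtime does not wait for detached threads once main() returns: the process exits and any detached thread still running is simply stopped, so do not rely on it finishing.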
detach() is mainly useful when you have a task that has to be done in background, but you don't care about its execution. This is usually a case for some libraries. They may silently create a background worker thread and detach it so you won't even notice it.
According to cppreference.com:
Separates the thread of execution from the thread object, allowing
execution to continue independently. Any allocated resources will be
freed once the thread exits.
After calling detach *this no longer owns any thread.
For example:
std::thread my_thread([&](){XXXX});
Note the local variable my_thread: when its lifetime ends, the destructor of std::thread is called, and std::terminate() is called within the destructor if the thread has been neither joined nor detached.
But if you call detach() first, my_thread no longer owns the thread, so nothing happens to the new thread when the lifetime of my_thread ends. You should not use my_thread to refer to that thread afterwards.
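A short sketch of that ownership change, using joinable() to observe it. The function name detach_releases_ownership is hypothetical, and the promise is only there so the caller can tell that the detached thread actually ran:

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical sketch: after detach(), the std::thread object owns nothing,
// so its destructor does not call std::terminate().
bool detach_releases_ownership() {
    std::promise<void> finished;
    std::future<void> done = finished.get_future();
    bool joinable_before = false;
    bool joinable_after = true;
    {
        std::thread my_thread([p = std::move(finished)]() mutable {
            p.set_value();  // tell the caller that the thread ran
        });
        joinable_before = my_thread.joinable();  // still owns the thread
        my_thread.detach();
        joinable_after = my_thread.joinable();   // no longer owns it
    }  // my_thread is destroyed here; nothing happens to the detached thread
    done.wait();
    return joinable_before && !joinable_after;
}
```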
It may be a good idea to reiterate what was mentioned in one of the answers above: when the main function finishes and the main thread is closing, all spawned threads will be either terminated or suspended. So, if you are relying on detach to have a background thread continue running after the main thread has shut down, you are in for a surprise. To see the effect, try the following. If you uncomment the last sleep call, the output file is created and written correctly. Otherwise it is not:
#include <mutex>
#include <thread>
#include <iostream>
#include <fstream>
#include <array>
#include <chrono>
using Ms = std::chrono::milliseconds;
std::once_flag oflag;
std::mutex mx;
std::mutex printMx;
int globalCount{};
std::ofstream *logfile;
void do_one_time_task() {
    //std::cout<<"I am in thread with thread id: "<< std::this_thread::get_id() << std::endl;
    std::call_once(oflag, [&]() {
        // std::cout << "Called once by thread: " << std::this_thread::get_id() << std::endl;
        // std::cout<<"Initialized globalCount to 3\n";
        globalCount = 3;
        logfile = new std::ofstream("testlog.txt");
    });
    // some more here
    for(int i=0; i<10; ++i){
        std::lock_guard<std::mutex> lock(mx); // protects globalCount and the log file
        ++globalCount;
        *logfile << "thread: "<< std::this_thread::get_id() <<", globalCount = " << globalCount << std::endl;
        std::this_thread::sleep_for(Ms(50)); // simulate work (duration assumed)
    }
    // Note: oflag was already used above, so this second call_once never runs.
    std::call_once(oflag, [&]() {
        //std::cout << "Called once by thread: " << std::this_thread::get_id() << std::endl;
        //std::cout << "closing logfile:\n";
        logfile->close();
    });
}
int main()
{
    std::array<std::thread, 5> thArray;
    for (int i = 0; i < 5; ++i)
    {
        thArray[i] = std::thread(do_one_time_task);
    }
    for (int i = 0; i < 5; ++i)
    {
        thArray[i].detach();
    }
    //std::this_thread::sleep_for(Ms(5000)); // uncomment to let the detached threads finish (duration assumed)
    std::cout << "Main: globalCount = " << globalCount << std::endl;
    return 0;
}