C++ program unexpectedly blocks / throws

I'm learning about mutexes in C++ and have a problem with the following code (taken from N. Josuttis' "The C++ Standard Library").
I don't understand why it blocks / throws unless I add this_thread::sleep_for in the main thread (then it doesn't block and all three calls are carried out).
The compiler is cl.exe used from the command line.
#include <future>
#include <mutex>
#include <iostream>
#include <string>
#include <thread>
#include <chrono>
std::mutex printMutex;
void print(const std::string& s)
{
    std::lock_guard<std::mutex> lg(printMutex);
    for (char c : s)
    {
        std::cout.put(c);
    }
    std::cout << std::endl;
}
int main()
{
    auto f1 = std::async(std::launch::async, print, "Hello from thread 1");
    auto f2 = std::async(std::launch::async, print, "Hello from thread 2");
    // std::this_thread::sleep_for(std::chrono::seconds(1));
    print(std::string("Hello from main"));
}

I think what you are seeing is an issue with the conformance of the MSVC implementation of async (in combination with future). I believe it is not conformant. I am able to reproduce it with VS2013, but unable to reproduce the issue with gcc.
The crash is because the main thread exits (and starts to clean up) before the other two threads complete.
Hence a simple delay (the sleep_for), or calling .get() or .wait() on the two futures, should fix it for you. The modified main could look like this:
int main()
{
    auto f1 = std::async(std::launch::async, print, "Hello from thread 1");
    auto f2 = std::async(std::launch::async, print, "Hello from thread 2");
    print(std::string("Hello from main"));
    f1.get();
    f2.get();
}
Favour the explicit wait or get over the timed "sleep".
Notes on the conformance
There was a proposal from Herb Sutter to change the waiting or blocking on the shared state of the future returned from async. This may be the reason for the behaviour in MSVC; it could be seen as having implemented the proposal. I'm not sure what the final result of the proposal was, or whether it (or part of it) was integrated into C++14. At least w.r.t. the blocking of the future returned from async, it looks like the MSVC behaviour did not make it into the specification.
It is interesting to note that the wording in §30.6.8/5 changed:
From C++11
a call to a waiting function on an asynchronous return object that shares the shared state created by this async call shall block until the associated thread has completed, as if joined
To C++14
a call to a waiting function on an asynchronous return object that shares the shared state created by this async call shall block until the associated thread has completed, as if joined, or else time out
I'm not sure how the "time out" would be specified; I would imagine it is implementation-defined.
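One plausible reading (an assumption here, not something the quoted wording spells out) is that "or else time out" covers the timed waiting functions wait_for and wait_until, which are allowed to return before the thread has completed. A minimal sketch of that behaviour follows; poll_slow_task is a hypothetical name and the durations are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: start a task that takes about 300 ms, wait on its
// future for only 10 ms, and report what the timed wait returned.
std::future_status poll_slow_task()
{
    auto fut = std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
    });
    // A timed waiting function may give up before the thread has completed.
    std::future_status status = fut.wait_for(std::chrono::milliseconds(10));
    fut.get();  // the untimed wait still blocks until the task has finished
    return status;
}
```

On a conforming implementation the returned status should normally be std::future_status::timeout, since the task is still sleeping when the 10 ms wait expires.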

std::async returns a future. Its destructor blocks if get or wait has not been called:
it may block if all of the following are true: the shared state was created by a call to std::async, the shared state is not yet ready, and this was the last reference to the shared state.
See std::futures from std::async aren't special! for a detailed treatment of the subject.
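The blocking destructor can be observed without any timing. The sketch below assumes a conforming implementation; task_finished_before_scope_exit is a hypothetical name. No get() or wait() is called, so the only thing that can make the flag true before the function returns is the wait inside ~future().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: returns true if the task had finished by the time
// the future returned from std::async was destroyed.
bool task_finished_before_scope_exit()
{
    std::atomic<bool> done{false};
    {
        auto fut = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            done = true;
        });
        // No get() or wait() here: the only synchronization is ~future().
    }
    return done.load();
}
```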

Add these 2 lines at the end of main:
f1.wait();
f2.wait();
This will make sure the threads finish before main exits.

Related

Why std::future is different returned from std::packaged_task and std::async?

I learned that the future returned from std::async has a special shared state, which makes the future's destructor wait for the task. But when we use std::packaged_task, its future does not exhibit the same behavior.
To complete a packaged task, you have to explicitly call get() on the future object obtained from the packaged_task.
Now my questions are:
What could be the internal implementation of future (thinking std::async vs std::packaged_task)?
Why was the same behavior not applied to the future returned from std::packaged_task? Or, in other words, how is that behavior prevented for a std::packaged_task future?
To see the context, please see the code below:
It does not wait for the countdown task to finish. However, if I uncomment // int value = ret.get();, the countdown finishes, which is obvious because we are literally blocking on the returned future.
// packaged_task example
#include <iostream> // std::cout
#include <future> // std::packaged_task, std::future
#include <chrono> // std::chrono::seconds
#include <thread> // std::thread, std::this_thread::sleep_for
// count down taking a second for each value:
int countdown (int from, int to) {
for (int i=from; i!=to; --i) {
std::cout << i << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
}
std::cout << "Lift off!" <<std::endl;
return from-to;
}
int main ()
{
std::cout << "Start " << std::endl;
std::packaged_task<int(int,int)> tsk (countdown); // set up packaged_task
std::future<int> ret = tsk.get_future(); // get future
std::thread th (std::move(tsk),10,0); // spawn thread to count down from 10 to 0
// int value = ret.get(); // wait for the task to finish and get result
std::cout << "The countdown lasted for " << std::endl;//<< value << " seconds.\n";
th.detach();
return 0;
}
If I use std::async to execute the countdown task on another thread, it always finishes the task, whether or not I call get() on the returned future object.
// packaged_task example
#include <iostream> // std::cout
#include <future> // std::packaged_task, std::future
#include <chrono> // std::chrono::seconds
#include <thread> // std::thread, std::this_thread::sleep_for
// count down taking a second for each value:
int countdown (int from, int to) {
for (int i=from; i!=to; --i) {
std::cout << i << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
}
std::cout << "Lift off!" <<std::endl;
return from-to;
}
int main ()
{
std::cout << "Start " << std::endl;
std::packaged_task<int(int,int)> tsk (countdown); // set up packaged_task
std::future<int> ret = tsk.get_future(); // get future
auto fut = std::async(std::move(tsk), 10, 0);
// int value = fut.get(); // wait for the task to finish and get result
std::cout << "The countdown lasted for " << std::endl;//<< value << " seconds.\n";
return 0;
}
std::async has definite knowledge of how and where the task it is given is executed. That is its job: to execute the task. To do that, it has to actually put it somewhere. That somewhere could be a thread pool, a newly created thread, or in a place to be executed by whoever destroys the future.
Because async knows how the function will be executed, it has 100% of the information it needs to build a mechanism that can communicate when that potentially asynchronous execution has concluded, as well as to ensure that if you destroy the future, then whatever mechanism that's going to execute that function will eventually get around to actually executing it. After all, it knows what that mechanism is.
But packaged_task doesn't. All packaged_task does is store a callable object which can be called with the given arguments, create a promise with the type of the function's return value, and provide a means to both get a future and to execute the function that generates the value.
When and where the task actually gets executed is none of packaged_task's business. Without that knowledge, the synchronization needed to make future's destructor synchronize with the task simply can't be built.
Let's say you want to execute the task on a freshly-created thread. OK, so to synchronize its execution with the future's destruction, you'd need a mutex which the destructor will block on until the task thread finishes.
But what if you want to execute the task in the same thread as the caller of the future's destructor? Well, then you can't use a mutex to synchronize that, since it's all on the same thread. Instead, you need to make the destructor invoke the task. That's a completely different mechanism, and it is contingent on how you plan to execute.
Because packaged_task doesn't know how you intend to execute it, it cannot do any of that.
Note that this is not unique to packaged_task. All futures created from a user-created promise object will not have the special property of async's futures.
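To illustrate that last point, here is a small sketch (promise_future_destructor_returns is a hypothetical name). The promise never receives a value, so if ~future() waited for readiness the function could never return.

```cpp
#include <cassert>
#include <future>

// Hypothetical helper: destroys a future whose promise is never fulfilled.
// If ~future() waited for the shared state to become ready, this function
// would hang instead of returning.
bool promise_future_destructor_returns()
{
    std::promise<int> prom;
    {
        std::future<int> fut = prom.get_future();
        // fut is not ready and nobody will make it ready; its destructor
        // only releases its reference to the shared state.
    }
    return true;
}
```

The same holds for a future obtained from a std::packaged_task that is never invoked.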
So the question really ought to be why async works this way, not why everyone else doesn't.
If you want to know that, it's because of two competing needs: async needed to be a high-level, brain-dead simple way to get asynchronous execution (for which synchronization-on-destruction makes sense), and nobody wanted to create a new future type that was identical to the existing one save for the behavior of its destructor. So they decided to overload how future works, complicating its implementation and usage.
@Nicol Bolas has already answered this question quite satisfactorily. So I'll attempt to answer the question from a slightly different perspective, elaborating on the points already mentioned by @Nicol Bolas.
The design of related things and their goals
Consider this simple function which we want to execute, in various ways:
int add(int a, int b) {
std::cout << "adding: " << a << ", "<< b << std::endl;
return a + b;
}
Forget std::packaged_task, std::future and std::async for a while. Let's take one step back and revisit how std::function works and what problem it causes.
case 1 — std::function isn't good enough for executing things in different threads
std::function<int(int,int)> f { add };
Once we have f, we can execute it, in the same thread, like:
int result = f(1, 2); //note we can get the result here
Or, in a different thread, like this:
std::thread t { std::move(f), 3, 4 };
t.join();
If we look carefully, we realize that executing f in a different thread creates a new problem: how do we get the result of the function? Executing f in the same thread does not have that problem, since we get the result as the returned value, but when it is executed in a different thread, we don't have any way to get the result. That is exactly what is solved by std::packaged_task.
case 2 — std::packaged_task solves the problem which std::function does not solve
In particular, it creates a channel between threads to send the result to the other thread. Apart from that, it is more or less same as std::function.
std::packaged_task<int(int,int)> f { add }; // almost same as before
std::future<int> channel = f.get_future(); // get the channel
std::thread t{ std::move(f), 30, 40 }; // same as before
t.join(); // same as before
int result = channel.get(); // problem solved: get the result from the channel
Now you see how std::packaged_task solves the problem created by std::function. That however does not mean that std::packaged_task has to be executed in a different thread. You can execute it in the same thread as well, just like std::function, though you will still get the result from the channel.
std::packaged_task<int(int,int)> f { add }; // same as before
std::future<int> channel = f.get_future(); // same as before
f(10, 20); // execute it in the current thread !!
int result = channel.get(); // same as before
So fundamentally std::function and std::packaged_task are similar kinds of things: they simply wrap a callable entity, with one difference: std::packaged_task is multithreading-friendly, because it provides a channel through which it can pass the result to other threads. Neither of them executes the wrapped callable entity by itself. One needs to invoke them, either in the same thread or in another thread, to execute the wrapped callable entity. So basically there are two kinds of things in this space:
what is executed, i.e. regular functions, std::function, std::packaged_task, etc.
how/where it is executed, i.e. threads, thread pools, executors, etc.
case 3: std::async is an entirely different thing
It's a different thing because it combines what-is-executed with how/where-is-executed.
std::future<int> fut = std::async(add, 100, 200);
int result = fut.get();
Note that in this case, the future created has an associated executor, which means that the future will complete at some point as there is someone executing things behind the scene. However, in case of the future created by std::packaged_task, there is not necessarily an executor and that future may never complete if the created task is never given to any executor.
Hope that helps you understand how things work behind the scene. See the online demo.
The difference between two kinds of std::future
Well, at this point, it becomes pretty much clear that there are two kinds of std::future which can be created:
One kind can be created by std::async. Such a future has an associated executor and thus can complete.
The other kind can be created by std::packaged_task or things like that. Such a future does not necessarily have an associated executor and thus may or may not complete.
Since in the second case the future does not necessarily have an associated executor, its destructor is not designed to wait for completion, because it may never complete:
{
    std::packaged_task<int(int,int)> f { add };
    std::future<int> fut = f.get_future();
} // fut goes out of scope, but there is no point
  // in waiting in its destructor, as it cannot complete
  // because `f` is not given to any executor.
Hope this answer helps you understand things from a different perspective.
The change in behaviour is due to the difference between std::thread and std::async.
In the first example, you have created a daemon thread by detaching. Where you print std::cout << "The countdown lasted for " << std::endl; in your main thread, may occur before, during or after the print statements inside the countdown thread function. Because the main thread does not await the spawned thread, you will likely not even see all of the print outs.
In the second example, you call std::async without a launch policy, so the default (std::launch::async | std::launch::deferred) applies and the implementation chooses. The behaviour for std::async is:
If the async policy is chosen, the associated thread completion synchronizes-with the successful return from the first function that is waiting on the shared state, or with the return of the last function that releases the shared state, whichever comes first.
In this example, you have two futures: ret, obtained from the packaged_task, and fut, returned by std::async. If the async policy was chosen, the task must complete before fut's destructor returns when exiting main. Even if you had not explicitly defined any futures, the temporary future that gets created and destroyed (returned from the call to std::async) will mean that the task completes before the main thread exits.
Here is a great blog post by Scott Meyers, clarifying the behaviour of std::future & std::async.
Related SO post.

boost::future::then() not returning future that blocks on destruction

I wrote this sample code to test boost::future continuations to use in my application.
#include <iostream>
#include <functional>
#include <unistd.h>
#include <exception>
#define BOOST_THREAD_PROVIDES_FUTURE
#define BOOST_THREAD_PROVIDES_FUTURE_CONTINUATION
#include <boost/thread/future.hpp>
void magicNumber(std::shared_ptr<boost::promise<long>> p)
{
sleep(5);
p->set_value(0xcafebabe);
}
boost::future<long> foo()
{
std::shared_ptr<boost::promise<long>> p =
std::make_shared<boost::promise<long>>();
boost::future<long> f = p->get_future();
boost::thread t([p](){magicNumber(p);});
t.detach();
return f;
}
void bar()
{
auto f = foo();
f.then([](boost::future<long> f) { std::cout << f.get() << std::endl; });
std::cout << "Should have blocked?" << std::endl;
}
int main()
{
bar();
sleep (6);
return 0;
}
When compiled, linked and run with boost version 1.64.0_1, I am getting following output:
Should have blocked?
3405691582
But according to boost::future::then's documentation, execution should block at f.then() in function bar(), because the temporary of type boost::future<void> should block on destruction, and the output should be
3405691582
Should have blocked?
In my application, though, the call to f.then() blocks execution until the continuation has been invoked.
What is happening here?
Note that the only time a future would ever block in the destructor used to be documented as when you use std::async with a launch-policy of launch::async.
See Why is the destructor of a future returned from `std::async` blocking?
The answer lists the many discussions that have taken place around this subject. The proposal N3776 made it into C++14:
This paper provides proposed wording to implement a positive SG1 straw poll to clarify that ~future and ~shared_future don’t block except possibly in the presence of async.
cppreference.com documents std::async
Your code never used async, so it would be surprising if any future derived would block on destruction.
More generally, it is clear that the consensus is that blocking destruction is an unfortunate design wart, not something you'd expect to be introduced in newer extensions (such as .then continuations).
I can only assume this is a case of documentation error where the wording
The returned futures behave as the ones returned from boost::async, the destructor of the future object returned from then will block. This could be subject to change in future versions.
should be removed.

Confusion about threads launched by std::async with std::launch::async parameter

I am a little bit confused by the std::async function.
The specification says:
asynchronous operation being executed "as if in a new thread of execution" (C++11 §30.6.8/11).
Now, what is that supposed to mean?
In my understanding, the code
std::future<double> fut = std::async(std::launch::async, pow2, num);
should launch the function pow2 on a new thread and pass the variable num to the thread by value, then sometime in the future, when the function is done, place the result in fut (as long as the function pow2 has a signature like double pow2(double);). But the specification states "as if", which makes the whole thing kinda foggy for me.
The question is:
Is a new thread always launched in this case? I hope so. I mean for me, the parameter std::launch::async makes sense in a way that I am explicitly stating I indeed want to create a new thread.
And the code
std::future<double> fut = std::async(std::launch::deferred, pow2, num);
should make lazy evaluation possible, by delaying the pow2 function call to the point where I write something like var = fut.get();. In this case the parameter std::launch::deferred should mean that I am explicitly stating that I don't want a new thread; I just want to make sure the function gets called when there is a need for its return value.
Are my assumptions correct? If not, please explain.
Also, I know that by default the function is called as follows:
std::future<double> fut = std::async(std::launch::deferred | std::launch::async, pow2, num);
In this case, I was told that whether a new thread will be launched or not depends on the implementation. Again, what is that supposed to mean?
The std::async (part of the <future> header) function template is used to start a (possibly) asynchronous task. It returns a std::future object, which will eventually hold the return value of std::async's parameter function.
When the value is needed, we call get() on the std::future instance; this blocks the thread until the future is ready and then returns the value. std::launch::async or std::launch::deferred can be specified as the first parameter to std::async in order to specify how the task is run.
1. std::launch::async indicates that the function call must be run on its own (new) thread. (Take user @T.C.'s comment into account.)
2. std::launch::deferred indicates that the function call is to be deferred until either wait() or get() is called on the future. Ownership of the future can be transferred to another thread before this happens.
3. std::launch::async | std::launch::deferred indicates that the implementation may choose. This is the default option (when you don't specify one yourself). It can decide to run synchronously.
Is a new thread always launched in this case?
From 1., we can say that a new thread is always launched.
Are my assumptions [on std::launch::deferred] correct?
From 2., we can say that your assumptions are correct.
What is that supposed to mean? [in relation to a new thread being launched or not depending on the implementation]
From 3., as std::launch::async | std::launch::deferred is the default option, it means that the implementation of the template function std::async will decide whether it will create a new thread or not. This is because some implementations may be checking for over scheduling.
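The two explicit policies can be told apart by comparing thread ids. This is only a sketch; both function names are hypothetical. With std::launch::deferred nothing runs until get() is called, and then it runs on the calling thread; with std::launch::async the callable observes a different thread id.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: true if a deferred task is evaluated lazily on the
// thread that calls get().
bool deferred_runs_on_calling_thread()
{
    auto fut = std::async(std::launch::deferred,
                          [] { return std::this_thread::get_id(); });
    // Nothing has run yet; a timed wait reports the task as deferred.
    if (fut.wait_for(std::chrono::seconds(0)) != std::future_status::deferred)
        return false;
    return fut.get() == std::this_thread::get_id();
}

// Hypothetical helper: true if a launch::async task sees a thread id that
// differs from the caller's.
bool async_runs_on_another_thread()
{
    auto fut = std::async(std::launch::async,
                          [] { return std::this_thread::get_id(); });
    return fut.get() != std::this_thread::get_id();
}
```

With the default policy neither result is guaranteed, which is exactly the "depends on the implementation" part of the question.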
WARNING
The following section is not related to your question, but I think that it is important to keep in mind.
The C++ standard says that if a std::future holds the last reference to the shared state corresponding to a call to an asynchronous function, that std::future's destructor must block until the thread for the asynchronously running function finishes. An instance of std::future returned by std::async will thus block in its destructor.
void operation()
{
    auto func = [] { std::this_thread::sleep_for( std::chrono::seconds( 2 ) ); };
    std::async( std::launch::async, func );
    std::async( std::launch::async, func );
    std::future<void> f{ std::async( std::launch::async, func ) };
}
This misleading code can make you think that the std::async calls are asynchronous, but they are actually synchronous. The std::future instances returned by std::async are temporaries, and because they are not assigned to a variable, their destructor is called (and blocks) right when std::async returns.
The first call to std::async will block for 2 seconds, followed by another 2 seconds of blocking from the second call to std::async. We may think that the last call to std::async does not block, since we store its returned std::future instance in a variable, but since that is a local variable that is destroyed at the end of the scope, it will actually block for an additional 2 seconds at the end of the scope of the function, when local variable f is destroyed.
In other words, calling the operation() function will block whatever thread it is called on synchronously for approximately 6 seconds. Such requirements might not exist in a future version of the C++ standard.
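The serialization can be measured directly. In this sketch, time_two_discarded_futures is a hypothetical name and the 100 ms naps are arbitrary. Each discarded future blocks in its destructor, so the two tasks run one after the other and the total is at least the sum of the naps.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: time two fire-and-forget std::async calls whose
// returned futures are discarded immediately.
std::chrono::milliseconds time_two_discarded_futures()
{
    auto nap = [] { std::this_thread::sleep_for(std::chrono::milliseconds(100)); };
    auto start = std::chrono::steady_clock::now();
    // The (void) casts only silence [[nodiscard]] warnings in newer standard
    // libraries; the temporaries are still destroyed at the end of each line.
    (void)std::async(std::launch::async, nap);  // blocks here until nap returns
    (void)std::async(std::launch::async, nap);  // starts only after the first is done
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
}
```

Keeping both futures in named variables and waiting on them afterwards would let the two naps overlap instead.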
Sources of information I used to compile these notes:
C++ Concurrency in Action: Practical Multithreading, Anthony Williams
Scott Meyers' blog post: http://scottmeyers.blogspot.ca/2013/03/stdfutures-from-stdasync-arent-special.html
I was also confused by this and ran a quick test on Windows, which shows that the async future is run on the OS thread pool threads. A simple application can demonstrate this; breaking in Visual Studio will also show the executing threads named "TppWorkerThread".
#include <future>
#include <thread>
#include <iostream>
using namespace std;
int main()
{
cout << "main thread id " << this_thread::get_id() << endl;
future<int> f1 = async(launch::async, [](){
cout << "future run on thread " << this_thread::get_id() << endl;
return 1;
});
f1.get();
future<int> f2 = async(launch::async, [](){
cout << "future run on thread " << this_thread::get_id() << endl;
return 1;
});
f2.get();
future<int> f3 = async(launch::async, [](){
cout << "future run on thread " << this_thread::get_id() << endl;
return 1;
});
f3.get();
cin.ignore();
return 0;
}
Will result in an output similar to:
main thread id 4164
future run on thread 4188
future run on thread 4188
future run on thread 4188
That is not actually true.
Add a thread_local variable and you will see that std::async actually runs the f1, f2 and f3 tasks in different threads, but with the same std::thread::id.
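A sketch of such a thread_local check follows; the variable and function names are hypothetical. The assumption is that if every launch::async task behaves "as if in a new thread of execution", each task starts with a fresh copy of the thread_local counter and therefore observes 1. A value greater than 1 would mean a thread was reused together with its thread-local storage. Which result a particular MSVC version produces is not something this sketch assumes.

```cpp
#include <cassert>
#include <future>

// Counts how many tasks have run on the current thread.
thread_local int tasks_on_this_thread = 0;

// Hypothetical helper: run one launch::async task and return the counter
// value that the task observed after incrementing it.
int run_task_and_count()
{
    return std::async(std::launch::async,
                      [] { return ++tasks_on_this_thread; }).get();
}
```

Calling run_task_and_count() several times in a row and printing the results next to std::this_thread::get_id() shows whether equal ids really mean the same thread.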

How to give the user some assigned time to answer?

Something like a stopwatch: give the person using my program about 30 seconds to answer, and if no answer is given, the program exits?
Basically the response shouldn't take more than the time given, otherwise the program will exit.
I found the answer by Axalo interesting, however fatally flawed by the unfortunate minutiae of std::async and std::future. So I'm presenting an alternative that eschews std::async but otherwise follows Axalo's basic design.
When I run Axalo's answer on my platform (which is conforming in the pertinent details), if the client never answers, getInputWithin never returns or exits. The program just hangs. And if the client answers well within the timeout, getInputWithin returns with the correct answer, but doesn't do so until the timeout period has expired.
The reason for this problem is subtle. It is well described in Herb Sutter's excellent paper N3630. A ~std::future() can block if it was returned by std::async() and will block until the associated task is done. This feature was intentionally put into async/future, and in the eyes of some, makes future completely useless.
Axalo's r1 and r2 are such std::futures whose destructor is supposed to block until the associated task is done. And this is why this solution hangs if the client never answers.
Below is an alternative answer which is built from thread, mutex, and condition_variable. It is otherwise very similar to Axalo's answer, but does not suffer from (what some consider) the design flaws of std::async.
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <tuple>
std::string
getInputWithin(std::chrono::seconds timeout)
{
auto sp = std::make_shared<std::tuple<std::mutex, std::condition_variable,
std::string, bool>>();
std::thread([sp]() mutable
{
std::getline(std::cin, std::get<2>(*sp));
std::lock_guard<std::mutex> lk(std::get<0>(*sp));
std::get<3>(*sp) = true;
std::get<1>(*sp).notify_one();
sp.reset();
}).detach();
std::unique_lock<std::mutex> lk(std::get<0>(*sp));
if (!std::get<1>(*sp).wait_for(lk, timeout, [&]() {return std::get<3>(*sp);}))
throw std::runtime_error("time out");
return std::get<2>(*sp);
}
int main()
{
std::cout << "please answer within 10 seconds...\n";
std::string answer = getInputWithin(std::chrono::seconds(10));
std::cout << answer << '\n';
}
Notes:
The timing stays within the chrono type system always. Prefer the type std::chrono::seconds to a scalar with a suggestive name (int timeoutInSeconds vs std::chrono::seconds timeout).
We need to launch a std::thread to handle the read from std::cin, as Axalo demonstrated. However we are going to need a std::mutex and std::condition_variable for communication instead of using the convenience of std::future. Both the main thread and this auxiliary thread need to share ownership of these communication objects, and we don't know which will die first. If the client never responds, the auxiliary thread may live forever, creating an effective memory leak, which is another problem not solved herein. But at any rate, the easiest way to share ownership is to store the communication objects with a copied std::shared_ptr. Last one out turns out the lights.
Launch a std::thread that waits for std::cin and signals the main thread if it gets it. The signaling must be done with the mutex locked. Note that this thread can be (indeed must be) detached. The thread can not touch any memory that it does not own (because of the shared_ptr owning all referenced memory). If main exits while the auxiliary thread is running, the OS will bring the thread down gracefully with no UB.
The main thread then locks the mutex and does a wait_for on the condition_variable using the specified timeout, and a predicate that is checking for the bool in the tuple to turn to true. This wait_for will either return early with that bool set to true, or it will return with it set to false after timeout seconds. If they race (timeout and client answer at the same time) it is ok, either there will be a string there or not, and the bool in the tuple answers that question. While the main thread is executing the wait_for, the mutex is unlocked so the auxiliary thread can use it.
If the main thread returns and the bool in the tuple has not been set to true, then an exception is thrown. If this exception is not caught, std::terminate() will be called. Otherwise, the string in the tuple will have the client's response.
This approach is susceptible to a client creating many responses to which it never answers, and thus effectively growing memory leaks held by shared_ptrs which never get destructed. Solving that problem is not something I know how to do in portable C++.
In C++14, a slight modification can be done with getInputWithin which reduces the error of choosing the wrong member of the tuple. Since our tuple is composed of all different types, we can index it by type instead of by position:
std::string
getInputWithin(std::chrono::seconds timeout)
{
auto sp = std::make_shared<std::tuple<std::mutex, std::condition_variable,
std::string, bool>>();
std::thread([sp]() mutable
{
std::getline(std::cin, std::get<std::string>(*sp)); // here
std::lock_guard<std::mutex> lk(std::get<std::mutex>(*sp)); // here
std::get<bool>(*sp) = true; // here
std::get<std::condition_variable>(*sp).notify_one(); // here
sp.reset();
}).detach();
std::unique_lock<std::mutex> lk(std::get<std::mutex>(*sp)); // here
if (!std::get<std::condition_variable>(*sp).wait_for(lk, timeout,
[&]() {return std::get<bool>(*sp);})) // here
throw std::runtime_error("time out");
return std::get<std::string>(*sp); // here
}
That is, the lines marked // here have been changed with std::get<type>(*sp) as opposed to std::get<index>(*sp).
Update
In a fit of paranoia inspired by the good comment from TemplateRex below, I've added a call to sp.reset() as the last thing the aux thread does. This forces the main thread to be the one to destruct the tuple, eliminating the possibility that the aux thread could stall before destructing its local copy of sp, and let main blow through the atexit chain, and then have the aux thread wake up and run the tuple destructor.
There may be other reasons that exist to make the call to sp.reset() unnecessary. But by adding this preventative medicine, we don't have to worry about it.
If you don't want to use exit and kill the process you could do it this way:
#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>
std::string getInputWithin(int timeoutInSeconds, bool *noInput = nullptr)
{
    std::string answer;
    // Atomic flags: each is written by a worker task and read by the caller.
    std::atomic<bool> exceeded{false};
    std::atomic<bool> gotInput{false};
    // Reads one line from std::cin in the background.
    auto r1 = std::async([&answer, &gotInput]()
    {
        std::getline(std::cin, answer);
        gotInput = true;
    });
    // Acts as the timer.
    auto r2 = std::async([&timeoutInSeconds, &exceeded]()
    {
        std::this_thread::sleep_for(std::chrono::seconds(timeoutInSeconds));
        exceeded = true;
    });
    // Poll until either the input arrives or the timer expires.
    while(!gotInput && !exceeded)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    if(gotInput)
    {
        if(noInput != nullptr) *noInput = false;
        return answer;
    }
    if(noInput != nullptr) *noInput = true;
    return "";
}
int main()
{
    std::cout << "please answer within 10 seconds...\n";
    bool noInput;
    std::string answer = getInputWithin(10, &noInput);
    return 0;
}
The nice thing about this is that you can now handle the missing input by using a default value or simply give the user a second chance, etc...

std::thread::join() hangs if called after main() exits when using VS2012 RC

The following example runs successfully (i.e. doesn't hang) if compiled using Clang 3.2 or GCC 4.7 on Ubuntu 12.04, but hangs if I compile using VS11 Beta or VS2012 RC.
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <typeinfo>
#include "boost/thread/thread.hpp"
void SleepFor(int ms) {
std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}
template<typename T>
class ThreadTest {
public:
ThreadTest() : thread_([] { SleepFor(10); }) {}
~ThreadTest() {
std::cout << "About to join\t" << id() << '\n';
thread_.join();
std::cout << "Joined\t\t" << id() << '\n';
}
private:
std::string id() const { return typeid(decltype(thread_)).name(); }
T thread_;
};
int main() {
static ThreadTest<std::thread> std_test;
static ThreadTest<boost::thread> boost_test;
// SleepFor(100);
}
The issue appears to be that std::thread::join() never returns if it is invoked after main has exited. It is blocked at WaitForSingleObject in _Thrd_join defined in cthread.c.
Uncommenting SleepFor(100); at the end of main allows the program to exit properly, as does making std_test non-static. Using boost::thread also avoids the issue.
So I'd like to know if I'm invoking undefined behaviour here (seems unlikely to me), or if I should be filing a bug against VS2012?
Tracing through Fraser's sample code in his Connect bug (https://connect.microsoft.com/VisualStudio/feedback/details/747145) with VS2012 RTM seems to show a fairly straightforward case of deadlocking. This probably isn't specific to std::thread; _beginthreadex likely suffers the same fate.
What I see in the debugger is the following:
On the main thread, the main() function has completed, the process cleanup code has acquired a critical section called _EXIT_LOCK1, called the destructor of ThreadTest, and is waiting (indefinitely) on the second thread to exit (via the call to join()).
The second thread's anonymous function completed and is in the thread cleanup code waiting to acquire the _EXIT_LOCK1 critical section. Unfortunately, due to the timing of things (whereby the second thread's anonymous function's lifetime exceeds that of the main() function) the main thread already owns that critical section.
DEADLOCK.
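The lock ordering described above can be modelled in portable code. This is only an illustration: exitLock is a hypothetical stand-in for the CRT's _EXIT_LOCK1, and a timed lock is used so the sketch reports the problem instead of actually hanging.

```cpp
#include <chrono>
#include <future>
#include <mutex>

// Hypothetical stand-in for the CRT's _EXIT_LOCK1 critical section.
std::timed_mutex exitLock;

// Returns true if the worker's "thread cleanup" managed to take the exit lock.
// When mainHoldsExitLock is true, the caller owns the lock while waiting for
// the worker, which mirrors the process cleanup code calling join().
bool workerCanCleanUp(bool mainHoldsExitLock)
{
    std::unique_lock<std::timed_mutex> mainLock(exitLock, std::defer_lock);
    if (mainHoldsExitLock)
    {
        mainLock.lock();
    }
    auto worker = std::async(std::launch::async, [] {
        // The real runtime waits forever here; the 50 ms timeout is an
        // assumption made so that the sketch terminates.
        std::unique_lock<std::timed_mutex> cleanup(exitLock,
                                                   std::chrono::milliseconds(50));
        return cleanup.owns_lock();
    });
    return worker.get();   // the "join": wait for the worker to finish
}
```

With mainHoldsExitLock set to true the worker cannot take the lock while the caller waits for it, which is the situation that never resolves in the real runtime where there is no timeout.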
Anything that extends the lifetime of main() such that the second thread can acquire _EXIT_LOCK1 before the main thread does avoids the deadlock. That's why uncommenting the sleep in main() results in a clean shutdown.
Alternatively if you remove the static keyword from the ThreadTest local variable, the destructor call is moved up to the end of the main() function (instead of in the process cleanup code) which then blocks until the second thread has exited - avoiding the deadlock situation.
Or you could add a function to ThreadTest that calls join() and call that function at the end of main() - again avoiding the deadlock situation.
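A minimal sketch of that last option, assuming a simplified, non-template version of the class (the names JoinableThreadTest, Join and Joined are made up for illustration):

```cpp
#include <chrono>
#include <thread>

// Hypothetical variant of ThreadTest with an explicit Join() member.
class JoinableThreadTest {
public:
    JoinableThreadTest()
        : thread_([] { std::this_thread::sleep_for(std::chrono::milliseconds(10)); }) {}

    // Intended to be called at the end of main(), before static destruction starts.
    void Join()
    {
        if (thread_.joinable())
        {
            thread_.join();
        }
    }

    // True once the thread has been joined.
    bool Joined() const { return !thread_.joinable(); }

    // Fallback only: if Join() was never called, this joins during destruction,
    // which for a static object is exactly the case that hangs on the affected runtimes.
    ~JoinableThreadTest() { Join(); }

private:
    std::thread thread_;
};

// Intended usage:
//   static JoinableThreadTest test;
//   ...
//   test.Join();   // last statement in main()
```

Since Join() checks joinable() first, calling it at the end of main() and again from the destructor is harmless.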
I realize this is an old question regarding VS2012, but the bug is still present in VS2013. For those who are stuck on VS2013, perhaps due to Microsoft's refusal to provide upgrade pricing for VS2015, I offer the following analysis and workaround.
The problem is that the mutex (at_thread_exit_mutex) used by _Cnd_do_broadcast_at_thread_exit() is either not yet initialized, or has already been destroyed, depending on the exact circumstances. In the former case, _Cnd_do_broadcast_at_thread_exit() tries to initialize the mutex during shutdown, causing a deadlock. In the latter case, where the mutex has already been destroyed via the atexit stack, the program will crash on the way out.
The solution I found is to explicitly call _Cnd_do_broadcast_at_thread_exit() (which thankfully is declared publicly) early during program startup. This has the effect of creating the mutex before anyone else tries to access it, as well as ensuring that the mutex continues to exist until the last possible moment.
So, to fix the problem, insert the following code at the bottom of a source module, for instance somewhere below main().
#pragma warning(disable:4073) // initializers put in library initialization area
#pragma init_seg(lib)
#if _MSC_VER < 1900
struct VS2013_threading_fix
{
VS2013_threading_fix()
{
_Cnd_do_broadcast_at_thread_exit();
}
} threading_fix;
#endif
I believe your threads have already been terminated and their resources freed following the termination of your main function and before static destruction. This is the behavior of the VC runtimes dating back to at least VC6.
Related questions: "Do child threads exit when the parent thread terminates" and "boost thread and process cleanup on windows".
My answer is far too late, but I hope it helps someone.
I was stuck on this bug and found a trick that solved the problem; it worked in my code.
int main()
{
ThreadTest<std::thread> trick_obj; // the trick: a non-static instance; this line can go anywhere in main()
static ThreadTest<std::thread> std_test;
return 0;
}
I battled this bug for a day and found the following workaround, which turned out to be the least dirty trick:
Instead of returning, one can use the standard Windows API function call ExitThread() to terminate the thread. This method of course may mess up the internal state of the std::thread object and associated library, but since the program is going to terminate anyway, well, so be it.
#include <windows.h>
template<typename T>
class ThreadTest {
public:
ThreadTest() : thread_([] { SleepFor(10); ExitThread(0); }) {} // ExitThread takes a DWORD exit code
~ThreadTest() {
std::cout << "About to join\t" << id() << '\n';
thread_.join();
std::cout << "Joined\t\t" << id() << '\n';
}
private:
std::string id() const { return typeid(decltype(thread_)).name(); }
T thread_;
};
The join() call apparently works correctly. However, I chose a safer method for our solution. One can get the thread HANDLE via std::thread::native_handle(). With this handle we can call the Windows API directly to join the thread:
WaitForSingleObject(thread_.native_handle(), INFINITE);
CloseHandle(thread_.native_handle());
Thereafter, the std::thread object must not be destroyed: it still considers itself joinable, and the destructor of a joinable std::thread calls std::terminate(). So we just leave the std::thread object dangling at program exit.