C++ async threads terminate when calling thread finishes - c++

I am trying to do a recursive directory listing using a multithreaded approach. The following code works fine when the async calls are replaced with ordinary single-threaded recursive function calls. With async, however, the recursively started threads all seem to terminate when the initial async call made from main finishes: the output shows several calls to the function starting, but the only directory whose files are all output is the initial one, and "Finished" is output only once although "Started" is output several times and some files from other directories are also output. I suspect I am missing something fundamental. Can anyone explain what is wrong with this code?
#include <filesystem>
#include <future>
#include <functional>
#include <concurrent_vector.h>
#include <concurrent_queue.h>
#include <iostream>
using namespace std;
using namespace std::tr2::sys;
using namespace concurrency;
concurrent_vector<future<void>> taskList;
void searchFiles(wstring path, concurrent_queue<wstring>& fileList)
{
    wcout << L"Started " << path << endl;
    wdirectory_iterator directoryIterator(path);
    wdirectory_iterator endDirectory;
    for( ; directoryIterator != endDirectory; ++directoryIterator)
    {
        wcout << path + L"/" + (wstring)directoryIterator->path() << endl;
        if ( is_directory(directoryIterator->status() ) )
        {
            taskList.push_back( async( launch::async, searchFiles, path +
                L"/" + (wstring)directoryIterator->path(), ref(fileList) ));
        }
        fileList.push( path + L"/" + (wstring)directoryIterator->path() );
    }
    wcout << L"Finished " << path << endl;
}
int main()
{
    concurrent_queue<wstring> fileList;
    wstring path = L"..";
    taskList.push_back( async( launch::async, searchFiles, path, ref(fileList) ));
    for (auto &x: taskList)
        x.wait(); // wait for each task currently in the list
}
BTW some might ask why I am not using wrecursive_directory_iterator. Apparently wrecursive_directory_iterator will throw an exception and stop with no way to continue if you do not have read permission so this method should allow you to continue on in that case.

The problem is the range-based for loop.
If we take a look at how the range-based for statement is defined, we see that the end iterator of the loop is only calculated once. At the time of entering the loop, there is probably (this is a race) only one future in your vector: the one you pushed back in the line above. Thus, after that task finishes, the iterator is incremented, compares equal to your old end iterator, and the loop finishes even though the vector might now contain more elements that were pushed back by your first task. There are even more problems with this.
The destructor of the vector, which is called after the loop finishes, should normally call the destructor of all its elements, which for a future from std::async is equivalent to calling wait. However, you are still adding elements to the vector while it is already in its destructor, which is probably UB.
Another point is that the end iterator you created on entering the for loop will be invalidated as soon as you push_back to your vector in your first thread, which means that you are operating on invalidated iterators.
As a solution, I would propose avoiding the global task list and instead using a local task list in your searchFiles function; you can then wait on all your local futures in searchFiles at each level. This is a common pattern in non-managed recursive parallelism.
Note: I don't know all the details of the PPL concurrent_vector, but I assume it behaves similarly to a std::vector.


Multithreaded program works only with print statements

I wish I could have thought of a more descriptive title, but that is the case. I've got some code that I'd like to do some image processing with. I also need to get some statistical data from those processed images, and I'd like to do that on a separate thread so my main thread can keep doing image processing.
All that aside, here's my code. It shouldn't really be relevant, other than the fact that my Image class wraps an OpenCV Mat (though I'm not using OMP or anything, as far as I'm aware):
#include <thread>
#include <iostream>
#include <vector>
using namespace std;
//Data struct
struct CenterFindData{
    //Some images I'd like to store
    Image in, bpass, bpass_thresh, local_max, tmp;
    //Some Data (Particle Data = float[8])
    vector<ParticleData> data;
    //My thread flag
    bool goodToGo{ false };
    CenterFindData(const Image& m);
};
vector<ParticleData> statistics(CenterFindData& CFD);
void operate(vector<CenterFindData> v_CFD);
void operate(vector<CenterFindData> v_CFD){
    //Thread function, gathers statistics on processed images
    thread T( [&](){
        int nProcessed(0);
        for (auto& cfd : v_CFD){
            //Chill while the images are still being processed
            while (cfd.goodToGo == false){
                //This works if I uncomment this print statement
                /*cout << "Waiting" << endl;*/
            }
            cout << "Statistics gathered from " << nProcessed++ << " images" << endl;
            //This returns vector<ParticleData>
            cfd.data = m_Statistics(cfd);
        }
    });
    //Run some filters on the images before statistics
    int nProcessed(0);
    for (auto& cfd : v_CFD){
        //Preprocess images
        //Tell thread to do statistics, on to the next
        cfd.goodToGo = true;
        cout << "Ran filters on " << nProcessed++ << " images" << endl;
    }
    //Join thread
    T.join();
}
I figure that the delay from cout is avoiding some race condition I would otherwise run into, but which one? Since only one thread modifies the bool goodToGo while the other only checks it, shouldn't that be a thread-safe way of "gating" the two functions?
Sorry if anything is unclear, I'm very new to this and seem to make a lot of obvious mistakes WRT multithreaded programming.
Thanks for your help
When you have:
while (cfd.goodToGo == false){ }
the compiler doesn't see any reason to "reload" the value of goodToGo (it doesn't know that this value is affected by other threads!). So it reads it once, and then loops forever.
The reason printing something makes a difference is that the compiler doesn't know what the print function will and won't affect, so "just in case" it reloads the value. If the compiler could "see inside" all of the printing code, it could in fact decide that goodToGo is NOT changed by the printing and skip the reload. But there are limits to how much time (or some proxy for time, such as "number of levels of calls" or "number of intermediate instructions") the compiler spends on figuring this sort of thing out, and there may of course be calls to code whose source the compiler doesn't have access to, such as the system calls to write or similar.
The solution, however, is to use thread-safe mechanisms to update goodToGo. We could just mark the variable volatile, but that does not guarantee that, for example, another processor gets "told" that the value has been updated, so detection of the updated value could be delayed significantly (or even indefinitely under some conditions).
Use std::atomic_bool goodToGo and the store and load functions to access the value inside. That way, you will be guaranteed that the value is updated correctly and "immediately" (as in, a few dozen to hundreds of clock-cycles later) seen by the other thread.
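As a rough sketch of what that could look like: the Item struct below is a made-up stand-in for CenterFindData, and process_all plays the role of operate. The producer writes the data and then stores true with release ordering. The consumer spins on an acquire load, so once it sees true it is also guaranteed to see the data written before the store.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's CenterFindData.
struct Item {
    int value = 0;                      // the "processed image"
    std::atomic<bool> goodToGo{false};  // the gate flag
};

// Main thread fills each item and opens its gate; a second thread
// waits on each gate in turn and sums the values.
int process_all(std::vector<Item>& items)
{
    int sum = 0;
    std::thread consumer([&] {
        for (auto& item : items) {
            // The atomic load is re-read on every iteration.
            while (!item.goodToGo.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            sum += item.value;
        }
    });

    int next = 1;
    for (auto& item : items) {
        item.value = next++;                                   // "preprocess"
        item.goodToGo.store(true, std::memory_order_release);  // publish
    }

    consumer.join();  // after join, reading sum here is safe
    return sum;
}
```

The vector is taken by reference here and must already have its final size, since a struct holding a std::atomic can be neither copied nor moved.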
As a side-note, which probably should have been the actual answer: Busy-waiting for threads is a bad idea in general, you should probably look at some thread-primitives to wait for a condition_variable or similar.
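For the condition_variable route, something along these lines might do. It is only a sketch: the Gate struct, the ready counter and the "multiply by two" filter are placeholders, not anything from the question. The waiting thread sleeps inside wait instead of spinning, and the predicate form of wait takes care of spurious wakeups.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate: `ready` counts how many items have been processed so far.
struct Gate {
    std::mutex m;
    std::condition_variable cv;
    std::size_t ready = 0;
};

int process_with_cv(const std::vector<int>& inputs)
{
    Gate gate;
    std::vector<int> processed(inputs.size());
    int sum = 0;

    std::thread consumer([&] {
        for (std::size_t i = 0; i < processed.size(); ++i) {
            {
                std::unique_lock<std::mutex> lock(gate.m);
                // Sleeps until item i has been published.
                gate.cv.wait(lock, [&] { return gate.ready > i; });
            }
            sum += processed[i];  // "gather statistics"
        }
    });

    for (std::size_t i = 0; i < inputs.size(); ++i) {
        processed[i] = inputs[i] * 2;  // placeholder for the image filters
        {
            std::lock_guard<std::mutex> lock(gate.m);
            gate.ready = i + 1;        // publish under the mutex
        }
        gate.cv.notify_one();
    }

    consumer.join();
    return sum;
}
```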

Asynchronous IO in C++11

I need to run some iterative algorithm of which I don't know whether it will converge to the desired accuracy within reasonable time. It would therefore be cool if I could print the residual after each iteration and once I'm satisfied/out of patience I could tell the program to write the current solution to disk and terminate.
Usually, to achieve this the program would have to ask after every iteration whether it should terminate now, and most of the time I would have to tell it not to. This is clearly annoying. Can't I tell the program to run until I hit a certain key, and once I do it should finish the current iteration, write the approximation to disk and terminate?
Yes, you can, and you can even do it using only standard C++11 features. The trick is to spawn a new thread whose only job it is to listen to std::cin. Once the user writes anything, the listening thread sets a flag which tells the worker thread to abort. In the following small example, I implement a "stopwatch" using this technique.
#include <iostream>
#include <thread>
#include <atomic>
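int main() {
    std::atomic<bool> abort(false);
    std::thread t([&abort] () {
        std::cout << "Abort?";
        // Block until the user enters something (the exact read call is an
        // assumption; any blocking read from std::cin works here).
        std::cin.get();
        abort = true;
    });
    unsigned long i = 0;
    while (!abort) ++i;
    t.join();
    std::cout << "Counted to " << i << std::endl;
    return 0;
}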
You may now try to terminate the program exactly when it reached 100000000. :-)

Delayed Function Call

What's the most elegant way of performing a delayed (and therefore also asynchronous) function call using C++11, lambdas and async? Suggested naming: delayed_async. The reason for asking is that I want a GUI alert light to be switched off after a given time (in this case one second) without blocking the main (wxWidgets main loop) thread, of course. I've used wxWidgets' wxTimer for this, and I find wxTimer rather cumbersome to use in this case. So that got me curious about how much more conveniently this could be implemented if I instead used C++11's async. I'm aware that I need to protect the resources involved with mutexes when using async.
You mean something like this?
#include <iostream>
#include <chrono>
#include <thread>
#include <future>
int main()
{
    // Use async to launch a function (lambda) in parallel.
    // Keep the returned future: if it is discarded, its destructor blocks
    // right here until the lambda has finished, making the call synchronous.
    auto lightsOut = std::async(std::launch::async, [] () {
        // Use sleep_for to wait specified time (or sleep_until).
        std::this_thread::sleep_for( std::chrono::seconds{1});
        // Do whatever you want.
        std::cout << "Lights out!" << std::endl;
    } );
    std::this_thread::sleep_for( std::chrono::seconds{2});
    std::cout << "Finished" << std::endl;
}
Just make sure that you don't capture a variable by reference in the lambda unless it is guaranteed to outlive the asynchronous call.
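If you want this wrapped up under the name from the question, a small helper could look roughly like the following. This is a sketch, not an established API: delayed_async is a hypothetical function, and it relies on C++14 return type deduction to stay short. The callable is taken by value and moved into the task, which sidesteps the dangling reference issue for the callable itself.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs `f` on another thread after `delay` has elapsed
// and returns a future for its result.
template <class Rep, class Period, class F>
auto delayed_async(std::chrono::duration<Rep, Period> delay, F f)
{
    return std::async(std::launch::async,
        [delay](F fn) {
            std::this_thread::sleep_for(delay);
            return fn();
        },
        std::move(f));
}
```

The caller has to keep the returned future alive for as long as it wants to carry on with other work, because the destructor of a future obtained from std::async blocks until the task is done. Also note that the callable runs on the worker thread, so a GUI update would typically have to be handed back to the main thread by whatever means the toolkit offers.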

BOOST threading : cout behavior

I am new to Boost threading and I am stuck with how output is performed from multiple threads.
I have a simple boost::thread counting down from 9 to 1; the main thread waits and then prints "LiftOff..!!"
#include <iostream>
#include <boost/thread.hpp>
using namespace std;
struct callable {
    void operator() ();
};
void callable::operator() () {
    int i = 10;
    while(--i > 0) {
        cout << "#" << i << ", ";
    }
}
int main() {
    callable x;
    boost::thread myThread(x);
    // Wait for the countdown thread before printing
    myThread.join();
    cout << "LiftOff..!!" << endl;
    return 0;
}
The problem is that I have to use an explicit "cout.flush()" statement in my thread to display the output. If I don't use flush(), I only get "LiftOff!!" as the output.
Could someone please advise why I need to use flush() explicitly?
This isn't specifically thread related: cout will usually buffer on a per-thread basis and only write its output when the implementation decides to, so the thread's output will only appear at an implementation-specific point. By calling flush you force the buffers to be flushed.
This will vary across implementations. Usually, though, it happens after a certain number of characters or when a newline is sent.
I've found that multiple threads writing to the same stream or file is mostly OK, provided that the output is performed as atomically as possible. It's not something that I'd recommend in a production environment, though, as it is too unpredictable.
This behaviour seems to depend on the OS-specific implementation of the cout stream. I guess that in your case write operations on cout are first buffered in some thread-specific memory, and the flush() operation forces them to be printed on the console. I guess this because endl includes a call to flush(), and the endl in your main function doesn't see your changes even after the thread has been joined.
BTW, it would be a good idea to synchronize output to an ostream shared between threads anyway; otherwise you might see the outputs intermingled. We do so for our logging classes, which use a background thread to write the logging messages to the associated ostream.
Given the short length of your messages, there's no reason anything should appear without a flush. (Don't forget that std::endl is the equivalent of << '\n' << std::flush.)
I get the asked behaviour with and without flush (gcc 4.3.2 boost 1.47 Linux RH5)
I assume that your cygwin system chooses to implement several std::cout objects with associated std::streambuf. This I assume is implementation specific.
Since flush or endl only forces its buffer to flush onto its OS controlled output sequence the cout object of your thread remains buffered.
Sharing a reference of an ostream between the threads should solve the problem.

Multithreading using the boost library

I wish to call a function multiple times simultaneously, using threads so that the machine's capability is utilized to the fullest. This is an 8-core machine, and my requirement is to be able to drive the machine's CPU usage anywhere from 10% to 100% or more.
My requirement is to use the boost class. Is there any way I can accomplish this using the boost thread or threadpool library? Or some other way to do it?
Also, if I have to call multiple functions with different parameters each time (with separate threads), what is the best way to do this? [using boost or not using boost] and how?
#include <iostream>
#include <fstream>
#include <string.h>
#include <time.h>
#include <boost/thread/mutex.hpp>
#include <boost/bind.hpp>
using namespace std;
using boost::mutex;
using boost::thread;
int threadedAPI1( );
int threadedAPI2( );
int threadedAPI3( );
int threadedAPI4( );
int threadedAPI1( ) {
    cout << "Thread0" << endl;
    return 0;
}
int threadedAPI2( ) {
    cout << "Thread1" << endl;
    return 0;
}
int threadedAPI3( ) {
    cout << "Thread2" << endl;
    return 0;
}
int threadedAPI4( ) {
    cout << "Thread3" << endl;
    return 0;
}
int main(int argc, char* argv[]) {
    boost::threadpool::thread_pool<> threads(4);
    // start a new thread that calls the "threadLockedAPI" function
    // wait for the thread to finish
    return 0;
}
The above is not working and I am not sure why? :-(
I suggest that you read up on the documentation for the functions you use. From your comment in James Hopkin's answer, it seems like you don't know what boost::bind does, but simply copy-pasted the code.
boost::bind takes a function (call it f), and optionally a number of parameters, and returns a function which, when called, calls f with the specified parameters.
That is, boost::bind(threadedAPI1, 0)() (creating a function which takes no arguments and calls threadedAPI1() with the argument 0, and then calling that) is equivalent to threadedAPI1(0).
Since your threadedAPI functions don't actually take any parameters, you can't pass any arguments to them. That is just fundamental C++. You can't call threadedAPI1(0), but only threadedAPI1(), and yet when you call the function, you try (via boost::bind) to pass the integer 0 as an argument.
So the simple answer to your question is to simply define threadedAPI1 as follows:
int threadedAPI1(int i);
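To make the equivalence concrete, here is a small sketch. It uses std::bind from <functional>, which behaves like boost::bind for this simple case, and a hypothetical threadedAPI that returns a string instead of printing so that the result can be compared. make_task is likewise just an illustrative name.

```cpp
#include <functional>
#include <string>

// Hypothetical variant of the question's function that does take a parameter.
std::string threadedAPI(int i)
{
    return "Thread" + std::to_string(i);
}

// Binding the argument up front yields a callable that takes no arguments;
// calling it later calls threadedAPI(i).
std::function<std::string()> make_task(int i)
{
    return std::bind(threadedAPI, i);
}
```

So make_task(0)() gives the same result as threadedAPI(0), and a nullary callable like this is the kind of thing a thread or a pool can run.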
However, one way to avoid the boost::bind calls is to call a functor instead of a free function when launching the thread. Declare a class something like this:
struct threadedAPI {
    // A constructor taking the arguments you wish to pass to the thread, and saving them in the class instance.
    threadedAPI(int i) : i(i) {}
    // The () operator is the function that is actually called when the thread starts, and because it is just a
    // regular class member function, it can see the 'i' variable initialized by the constructor.
    void operator()() {
        // No need to create 4 identical functions. We can just reuse this one, and pass a different `i` each time we call it.
        cout << "Thread" << i << endl;
    }
    int i;
};
Finally, depending on what you need, plain threads may be better suited than a threadpool. In general, a thread pool only runs a limited number of threads, so it may queue up some tasks until one of its threads finish executing. It is mainly intended for cases where you have many short-lived tasks.
If you have a fixed number of longer-duration tasks, creating a dedicated thread for each may be the way to go.
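A dedicated thread per task might be sketched like this, using a functor in the style shown above. The sketch uses std::thread so that it is self-contained (boost::thread is used the same way for this purpose), and SquareTask and run_dedicated_threads are made-up names, with squaring standing in for the real work.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical functor: squares its input and stores the result
// in a slot that belongs to this task alone.
struct SquareTask {
    SquareTask(int input, int& result) : input(input), result(&result) {}
    void operator()() const { *result = input * input; }
    int input;
    int* result;
};

// Starts one dedicated thread per task, then waits for all of them.
std::vector<int> run_dedicated_threads(int task_count)
{
    // Sized up front so the element addresses stay valid while threads run.
    std::vector<int> results(static_cast<std::size_t>(task_count));
    std::vector<std::thread> threads;
    threads.reserve(results.size());

    for (int i = 0; i < task_count; ++i) {
        threads.emplace_back(SquareTask(i, results[static_cast<std::size_t>(i)]));
    }
    for (auto& t : threads) {
        t.join();
    }
    return results;
}
```

Each thread writes only to its own element, so no locking is needed here. For CPU-bound work, a task count close to the number of cores (8 on the machine in the question) is a reasonable starting point.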
You're binding parameters to functions that don't take parameters:
int threadedAPI1( );
Just pass the function directly if there are no parameters:
If your interest is in using your processor efficiently, then you might want to consider Intel's Threading Building Blocks (http://www.intel.com/cd/software/products/asmo-na/eng/294797.htm). I believe it is designed specifically to utilise multi-core processors, while boost threads leave control up to the user (i.e. TBB will thread differently on a quad core compared to a dual core).
As for your code you are binding functions which don't take parameters to a parameter. Why? You might also want to check the return code from schedule.