std::async vs. thread [duplicate] - c++

I am trying to understand how exactly async differs from using threads. On a conceptual level, I thought multithreading was by definition asynchronous, because you are doing context switches between threads for things like I/O.
But it seems that even for instances like single-threaded applications, just adding threads would be the same as using async. For example:
#include <iostream> // std::cout
#include <future> // std::async, std::future
// a non-optimized way of checking for prime numbers:
bool is_prime (int x) {
std::cout << "Calculating. Please, wait...\n";
for (int i=2; i<x; ++i) if (x%i==0) return false;
return true;
}
int main ()
{
// call is_prime(313222313) asynchronously:
std::future<bool> fut = std::async (is_prime,313222313);
std::cout << "Checking whether 313222313 is prime.\n";
// ...
bool ret = fut.get(); // waits for is_prime to return
if (ret) std::cout << "It is prime!\n";
else std::cout << "It is not prime.\n";
return 0;
}
Why can't I just create a thread to call is_prime that writes to some variable, and then call join() before I print that variable? If I can do this, what really is the benefit of using async? Some specific examples would be very helpful.

This is not C++ specific, so I'll try to be a little bit generic. I'm sure there are C++ specific quirks as well.
Generally speaking, yes. You could just create a variable for the output, start a thread, give the address of the variable to the thread and later .join the thread and access the variable after the thread wrote to it. That works. Nothing wrong with it. We did that for many years.
But as the program gets more complicated, this gets more and more messy. More and more threads to keep running, more and more variables whose safe access you have to keep track of. Can I print i here, or do I need to .join a specific thread first? Who knows.
Futures (or Promises or Tasks) and async/await are patterns many languages use nowadays under those or very similar names. They don't do anything we could not do before, but they make the code a lot easier to maintain when the program grows and is no longer a one-page example program that everybody can read on one screen.
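To make the comparison concrete, here is a minimal sketch of both styles side by side. The names prime_with_thread and prime_with_async are made up for this illustration, and is_prime is the question's function without the console output.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Same check as in the question, minus the printing.
bool is_prime(int x) {
    if (x < 2) return false;
    for (int i = 2; i < x; ++i)
        if (x % i == 0) return false;
    return true;
}

// Manual style: the caller owns the result variable and must remember
// not to read it before join() has returned.
bool prime_with_thread(int x) {
    bool result = false;
    std::thread t([&result, x] { result = is_prime(x); });
    t.join();  // reading result before this line would be a data race
    return result;
}

// Future style: the result travels inside the future; get() both waits
// for the task and hands over the value.
bool prime_with_async(int x) {
    std::future<bool> fut = std::async(std::launch::async, is_prime, x);
    return fut.get();
}
```

Both functions compute the same thing. The difference is that in the second one there is no shared variable whose "is it safe to read yet?" status you have to track by hand.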

Related

Benefits of using std::stop_source and std::stop_token instead of std::atomic<bool> for deferred cancellation?

When I run several std::threads in parallel and need to cancel the other threads in a deferred manner if one thread fails, I use a std::atomic<bool> flag:
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <thread>
void threadFunction(unsigned int id, std::atomic<bool>& terminated) {
    srand(id);
    while (!terminated) {
        int r = rand() % 100;
        if (r == 0) {
            std::cerr << "Thread " << id << ": an error occurred.\n";
            terminated = true; // without this line we have to wait for other thread to finish
            return;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
int main()
{
    std::atomic<bool> terminated{false};
    std::thread t1(&threadFunction, 1, std::ref(terminated));
    std::thread t2(&threadFunction, 2, std::ref(terminated));
    t1.join();
    t2.join();
    std::cerr << "Both threads finished.\n";
    int k;
    std::cin >> k;
}
However, now I am reading about std::stop_source and std::stop_token.
I find that I can achieve the same as above by passing both a std::stop_source by reference and a std::stop_token by value to the thread function?
How would that be superior?
I understand that when using std::jthread the std::stop_token is very convenient if I want to stop threads from outside the threads.
I could then call std::jthread::request_stop() from the main program.
However in the case where I want to stop threads from a thread is it still better?
I managed to achieve the same thing as in my code using std::stop_source:
void threadFunction(std::stop_token stoken, unsigned int id, std::stop_source source) {
srand(id);
while (!stoken.stop_requested()) {
int r = rand() % 100;
if (r == 0) {
std::cerr << "Thread " << id << ": an error occurred.\n";
source.request_stop(); // without this line we have to wait for other thread to finish
return;
}
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
}
int main()
{
std::stop_source source;
std::stop_token stoken = source.get_token();
std::thread t1(&threadFunction, stoken, 1, source);
std::thread t2(&threadFunction, stoken, 2, source);
t1.join();
t2.join();
std::cerr << "Both threads finished.\n";
int k;
std::cin >> k;
}
Using std::jthread would have resulted in more compact code:
std::jthread t1(&threadFunction, 1, source);
std::jthread t2(&threadFunction, 2, source);
But that did not seem to work.
It didn't work because std::jthread has a special feature: if the first parameter of the thread function is a std::stop_token, it fills that token in from an internal stop_source object.
What you ought to do is only pass a stop_source (by value, not by reference), and extract the token from it within your thread function.
As for why this is better than a reference to an atomic, there are a myriad of reasons. The first being that stop_source is a lot safer than a bare reference to an object whose lifetime is not under the local control of the thread function. The second being that you don't have to do std::ref gymnastics to pass parameters. This can be a source of bugs since you might accidentally forget to do that in some place.
The standard stop_token mechanism has features beyond just requesting and responding to a stop. Since the response to a stop happens at an arbitrary time after issuing it, it may be necessary to execute some code when the stop is actually requested rather than when it is responded to. The stop_callback mechanism allows you to register a callback with a stop_token. This callback will be called in the thread of the stop_source::request_stop call (unless you register the callback after the stop was requested, in which case it's called right when you register it). This can be useful in limited cases, and it's not simple code to write yourself. Especially when all you have is an atomic<bool>.
And then there's simple readability. Passing a stop_source tells you exactly what is going on without having to even see the name of a parameter. Passing an atomic<bool> tells you very little from just the typename; you have to look at the parameter name or its usage in the function to know that it is for halting the thread.
Apart from being more expressive and communicating intentions better, stop_token and friends achieve something really important for jthread. To understand it you have to consider its destructor which looks something like this:
~jthread()
{
    if(joinable())
    {
        // Not only user code, but the destructor as well
        // will let your callback know it's time to go.
        request_stop();
        join();
    }
}
By encapsulating a stop_source, jthread facilitates what is called cooperative cancellation. As you've also noted, you never have to pass the stop_token to a jthread; just provide a callback that accepts the token as its first parameter. The class then detects that your callback accepts a stop token and passes it a token from its internal stop source when calling it.
What does this mean for cooperative cancellation? Safer termination of course! Since jthread will always attempt to join on destruction, it now has the means to prevent endless loops and deadlocks where two or more threads wait for each other to finish. By using stop_token your code can make sure that it can safely join when it's time to go.
However in the case where I want to stop threads from a thread is it still better?
Now regarding the feature you are requesting, that's what C# calls "linked cancellation". Yes, there are requests and discussions to add a parameter in the jthread constructor so that it can refer to an external stop source, but that's not yet available (and has many implications). Doing something similar purely with stop tokens would require a stop_callback to tie all cancellations together, but still it could be suboptimal (as shown in the link). The bottom line is that jthread needs stop_token, but in some cases you may not need jthread, especially if the following solution does not appeal to you:
std::stop_source ssource;
std::stop_callback cb {ssource.get_token(), [&] {
    t1.request_stop();
    t2.request_stop();
}};
ssource.request_stop(); // This stops both threads.
The good news is that if you don't fall into the suboptimal pattern described in the link (i.e. you don't need an asynchronous termination), then this functionality is easy to abstract into a utility, something like:
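auto linked_cancellations = [](auto&... jthreads) {
    std::stop_source s;
    auto stop_all = [&] { (jthreads.request_stop(), ...); };
    // std::stop_callback is neither copyable nor movable, so it cannot be
    // placed into a pair directly; hand it back through a unique_ptr (<memory>).
    auto cb = std::make_unique<std::stop_callback<decltype(stop_all)>>(
        s.get_token(), std::move(stop_all));
    return std::make_pair(s, std::move(cb));
};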
which you'd use as
auto [stop_source, cb] = linked_cancellations(t1, t2);
// or as many thread objects as you want to link ^^^
stop_source.request_stop(); // Stops all the threads that you linked.
Now if you want to control the linked threads from within the thread, I'd use the initial pattern (std::atomic<bool>), since having a callback with both a stop token and a stop source is somewhat confusing.

Why is the std::future returned from std::packaged_task different from the one returned from std::async?

I learned that the future returned from std::async has a special shared state, because of which the destructor of the returned future waits for the task to finish. But when we use std::packaged_task, its future does not exhibit the same behavior.
To complete a packaged task, you have to explicitly call get() on the future object obtained from the packaged_task.
Now my questions are:
What could be the internal implementation of future (thinking std::async vs std::packaged_task)?
Why was the same behavior not applied to the future returned from std::packaged_task? Or, in other words, how is the same behavior prevented for a std::packaged_task future?
To see the context, please see the code below:
It does not wait for the countdown task to finish. However, if I un-comment // int value = ret.get();, it finishes the countdown, which is obvious because we are literally blocking on the returned future.
// packaged_task example
#include <iostream> // std::cout
#include <future> // std::packaged_task, std::future
#include <chrono> // std::chrono::seconds
#include <thread> // std::thread, std::this_thread::sleep_for
// count down taking a second for each value:
int countdown (int from, int to) {
for (int i=from; i!=to; --i) {
std::cout << i << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
}
std::cout << "Lift off!" <<std::endl;
return from-to;
}
int main ()
{
std::cout << "Start " << std::endl;
std::packaged_task<int(int,int)> tsk (countdown); // set up packaged_task
std::future<int> ret = tsk.get_future(); // get future
std::thread th (std::move(tsk),10,0); // spawn thread to count down from 10 to 0
// int value = ret.get(); // wait for the task to finish and get result
std::cout << "The countdown lasted for " << std::endl;//<< value << " seconds.\n";
th.detach();
return 0;
}
If I use std::async to execute task countdown on another thread, no matter if I use get() on returned future object or not, it will always finish the task.
// packaged_task example
#include <iostream> // std::cout
#include <future> // std::packaged_task, std::future
#include <chrono> // std::chrono::seconds
#include <thread> // std::thread, std::this_thread::sleep_for
// count down taking a second for each value:
int countdown (int from, int to) {
for (int i=from; i!=to; --i) {
std::cout << i << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
}
std::cout << "Lift off!" <<std::endl;
return from-to;
}
int main ()
{
std::cout << "Start " << std::endl;
std::packaged_task<int(int,int)> tsk (countdown); // set up packaged_task
std::future<int> ret = tsk.get_future(); // get future
auto fut = std::async(std::move(tsk), 10, 0);
// int value = fut.get(); // wait for the task to finish and get result
std::cout << "The countdown lasted for " << std::endl;//<< value << " seconds.\n";
return 0;
}
std::async has definite knowledge of how and where the task it is given is executed. That is its job: to execute the task. To do that, it has to actually put it somewhere. That somewhere could be a thread pool, a newly created thread, or a place where it is executed by whoever destroys the future.
Because async knows how the function will be executed, it has 100% of the information it needs to build a mechanism that can communicate when that potentially asynchronous execution has concluded, as well as to ensure that if you destroy the future, then whatever mechanism that's going to execute that function will eventually get around to actually executing it. After all, it knows what that mechanism is.
But packaged_task doesn't. All packaged_task does is store a callable object which can be called with the given arguments, create a promise with the type of the function's return value, and provide a means to both get a future and to execute the function that generates the value.
When and where the task actually gets executed is none of packaged_task's business. Without that knowledge, the synchronization needed to make future's destructor synchronize with the task simply can't be built.
Let's say you want to execute the task on a freshly-created thread. OK, so to synchronize its execution with the future's destruction, you'd need a mutex which the destructor will block on until the task thread finishes.
But what if you want to execute the task in the same thread as the caller of the future's destructor? Well, then you can't use a mutex to synchronize that, since it's all on the same thread. Instead, you need to make the destructor invoke the task. That's a completely different mechanism, and it is contingent on how you plan to execute.
Because packaged_task doesn't know how you intend to execute it, it cannot do any of that.
Note that this is not unique to packaged_task. All futures created from a user-created promise object will not have the special property of async's futures.
So the question really ought to be why async works this way, not why everyone else doesn't.
If you want to know that, it's because of two competing needs: async needed to be a high-level, brain-dead simple way to get asynchronous execution (for which synchronization-on-destruction makes sense), and nobody wanted to create a new future type that was identical to the existing one save for the behavior of its destructor. So they decided to overload how future works, complicating its implementation and usage.
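The difference in destructor behaviour can be observed directly. The following is a sketch, not a description of any particular implementation; slow_task and the two wrapper functions are hypothetical names, and the 50 ms sleep is an arbitrary choice.

```cpp
#include <cassert>
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical task: sleeps briefly, then records that it ran to completion.
void slow_task(std::atomic<bool>& done) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    done = true;
}

// A future obtained from std::async with launch::async blocks in its
// destructor, so the task has finished by the time the scope ends.
bool async_future_waits() {
    std::atomic<bool> done{false};
    {
        auto fut = std::async(std::launch::async, slow_task, std::ref(done));
    }  // ~future blocks here until slow_task returns
    return done.load();
}

// A future obtained from a packaged_task does not block in its destructor.
// The return value reports whether the task happened to be finished when the
// future was destroyed (most likely not, but that is timing dependent).
bool packaged_task_future_waits() {
    std::atomic<bool> done{false};
    std::packaged_task<void()> task([&done] { slow_task(done); });
    std::thread th;
    {
        std::future<void> fut = task.get_future();
        th = std::thread(std::move(task));
    }  // ~future returns immediately
    bool finished_at_scope_exit = done.load();
    th.join();  // join before done goes out of scope
    return finished_at_scope_exit;
}
```

async_future_waits always returns true. packaged_task_future_waits will normally return false, because nothing made the future's destructor wait for the thread.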
@Nicol Bolas has already answered this question quite satisfactorily. So I'll attempt to answer the question from a slightly different perspective, elaborating on the points already mentioned by @Nicol Bolas.
The design of related things and their goals
Consider this simple function which we want to execute, in various ways:
int add(int a, int b) {
std::cout << "adding: " << a << ", "<< b << std::endl;
return a + b;
}
Forget std::packaged_task, std::future and std::async for a while; let's take one step back and revisit how std::function works and what problem it causes.
case 1 — std::function isn't good enough for executing things in different threads
std::function<int(int,int)> f { add };
Once we have f, we can execute it, in the same thread, like:
int result = f(1, 2); //note we can get the result here
Or, in a different thread, like this:
std::thread t { std::move(f), 3, 4 };
t.join();
If we look carefully, we realize that executing f in a different thread creates a new problem: how do we get the result of the function? Executing f in the same thread does not have that problem, since we get the result as the returned value, but when it is executed in a different thread, we don't have any way to get the result. That is exactly what is solved by std::packaged_task.
case 2 — std::packaged_task solves the problem which std::function does not solve
In particular, it creates a channel between threads to send the result to the other thread. Apart from that, it is more or less the same as std::function.
std::packaged_task<int(int,int)> f { add }; // almost same as before
std::future<int> channel = f.get_future(); // get the channel
std::thread t{ std::move(f), 30, 40 }; // same as before
t.join(); // same as before
int result = channel.get(); // problem solved: get the result from the channel
Now you see how std::packaged_task solves the problem created by std::function. That however does not mean that std::packaged_task has to be executed in a different thread. You can execute it in the same thread as well, just like std::function, though you will still get the result from the channel.
std::packaged_task<int(int,int)> f { add }; // same as before
std::future<int> channel = f.get_future(); // same as before
f(10, 20); // execute it in the current thread !!
int result = channel.get(); // same as before
So fundamentally std::function and std::packaged_task are a similar kind of thing: they simply wrap a callable entity, with one difference: std::packaged_task is multithreading-friendly, because it provides a channel through which it can pass the result to other threads. Neither of them executes the wrapped callable entity by itself. One needs to invoke them, either in the same thread or in another thread, to execute the wrapped callable entity. So basically there are two kinds of thing in this space:
what is executed, i.e. regular functions, std::function, std::packaged_task, etc.
how/where it is executed, i.e. threads, thread pools, executors, etc.
case 3: std::async is an entirely different thing
It's a different thing because it combines what-is-executed with how/where-is-executed.
std::future<int> fut = std::async(add, 100, 200);
int result = fut.get();
Note that in this case, the future created has an associated executor, which means that the future will complete at some point, as there is someone executing things behind the scenes. However, in the case of the future created by std::packaged_task, there is not necessarily an executor, and that future may never complete if the created task is never given to any executor.
Hope that helps you understand how things work behind the scenes. See the online demo.
The difference between two kinds of std::future
Well, at this point, it becomes pretty much clear that there are two kinds of std::future which can be created:
One kind can be created by std::async. Such a future has an associated executor and thus can complete.
The other kind can be created by std::packaged_task or things like that. Such a future does not necessarily have an associated executor and thus may or may not complete.
Since in the second case the future does not necessarily have an associated executor, its destructor is not designed to wait for completion, because it may never complete:
{
std::packaged_task<int(int,int)> f { add };
std::future<int> fut = f.get_future();
} // fut goes out of scope, but there is no point
// in waiting in its destructor, as it cannot complete
// because `f` is not given to any executor.
Hope this answer helps you understand things from a different perspective.
The change in behaviour is due to the difference between std::thread and std::async.
In the first example, you have created a daemon thread by detaching. The print statement std::cout << "The countdown lasted for " << std::endl; in your main thread may occur before, during or after the print statements inside the countdown thread function. Because the main thread does not await the spawned thread, you will likely not even see all of the printouts.
In the second example, you launch the task with std::async without specifying a policy, so the default (std::launch::async | std::launch::deferred) applies and the implementation is free to pick the async policy. The behaviour for std::async is:
If the async policy is chosen, the associated thread completion synchronizes-with the successful return from the first function that is waiting on the shared state, or with the return of the last function that releases the shared state, whichever comes first.
In this example, you have two futures: ret, which comes from the packaged_task, and fut, which is returned by std::async. When fut's destructor runs on exiting main, it blocks until the async task has completed. Even if you had not stored the returned future, the temporary future returned from the call to std::async would be destroyed at the end of that statement, and its destructor would block until the task had completed, so the task still finishes before the main thread exits.
Here is a great blog post by Scott Meyers, clarifying the behaviour of std::future & std::async.
Related SO post.

Check if a thread is finished to send another param to it

I want to check whether a thread's job has finished, so that I can call it again and send another parameter to it. The code is something like this:
void SendMassage(double Speed)
{
Sleep(200);
cout << "Speed:" << Speed << endl;
}
int main() {
int Speed_1 = 0;
thread f(SendMassage, Speed_1);
for (int i = 0; i < 50; i++)
{
Sleep(20);
if (?)
{
another call of thread // If last thread done then call it again, otherwise not.
}
Speed_1++;
}
}
How should I do it?
Use, e.g., an atomic flag to indicate that the thread has finished:
std::atomic<bool> finished_flag{false};
void SendMassage(double Speed) {
Sleep(200);
cout << "Speed:" << Speed << endl;
finished_flag = true;
}
int main() {
int Speed_1 = 0;
thread f(SendMassage, Speed_1);
while (Speed_1 < 50) {
Sleep(20);
if (finished_flag) {
f.join();
finished_flag = false;
f = std::thread(SendMassage, Speed_1);
}
Speed_1++;
}
f.join();
}
Working example: https://wandbox.org/permlink/BrEMHFvlInshBy5V
Note that I assumed that, according to your code, you don't want to block when checking whether the thread f has finished. Otherwise, simply call f.join().
If you want to wait until a thread has finished its job without using Sleep, you need to call its join method, like so:
thread t(SendMassage, Speed_1);
t.join();
//Code here will start executing after returning from join
You can read more about it here http://en.cppreference.com/w/cpp/thread/thread/join
About sending another parameter, I think the best way would be to split that into another function that you call after this thread has been joined. If you need some information that's known only inside the function, you could create a class that stores that information in its fields, and use it in the function you're threading.
Possibly the simplest way of doing so is just joining the thread. Nothing clever, but...
OK, but why would you want to have another thread at all if your main thread spends all its time sleeping anyway? So you are quite surely looking for something cleverer.
I personally like the principle of queues; you could use e.g. a std::deque for this:
Your producer thread places in some values, your consumer thread just takes them out. Of course, you need to protect your queue via a std::mutex (or by other appropriate means) against race conditions...
The consumer would be running in an endless loop, processing the queue if entries are available, or sleeping if this is not the case. Have a look at this response for how to do the waiting...
There is the danger, though, that your queue runs full, so you might define some threshold at which you stop or at least slow down producing new values if you discover your producer being too fast. The queue has another advantage, though: if your producer is too fast, you might have more than one consumer, all serving the same queue (depending on your needs, putting together the results might need some extra effort to keep the ordering correct).
Admittedly, that's quite some work to do; it might be worth the effort, it might be overkill. If simpler approaches fit your needs already, Daniel's answer is fine, too...
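A rough sketch of the queue idea described above, using a std::deque protected by a std::mutex and a std::condition_variable for the waiting. The class SpeedQueue, its close() method, and run_pipeline are hypothetical names for this illustration, and the "queue runs full" threshold is left out to keep it short.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal queue: the consumer sleeps while the queue is empty.
class SpeedQueue {
public:
    void push(double speed) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(speed);
        }
        ready_.notify_one();
    }
    // Signals that no more values will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until a value is available; returns false once closed and drained.
    bool pop(double& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<double> items_;
    bool closed_ = false;
};

// One consumer thread drains the queue while the caller produces 0..count-1.
std::vector<double> run_pipeline(int count) {
    SpeedQueue queue;
    std::vector<double> received;
    std::thread consumer([&] {
        double speed;
        while (queue.pop(speed)) received.push_back(speed);
    });
    for (int i = 0; i < count; ++i) queue.push(i);
    queue.close();
    consumer.join();  // after join, received is safe to read
    return received;
}
```

With a single consumer the values arrive in the order they were pushed; with several consumers the ordering caveat mentioned above applies.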

C++ - Threads without coordinating mechanism like mutex_Lock

I attended an interview two days ago. The interviewer was good at C++, but not at multithreading. He asked me to write multithreaded code with two threads, where one thread prints 1,3,5,... and the other prints 2,4,6,..., but the output should be 1,2,3,4,5,.... So I gave the code below (pseudocode):
mutex_Lock LOCK;
int last=2;
int last_Value = 0;
void function_Thread_1()
{
while(1)
{
mutex_Lock(&LOCK);
if(last == 2)
{
cout << ++last_Value << endl;
last = 1;
}
mutex_Unlock(&LOCK);
}
}
void function_Thread_2()
{
while(1)
{
mutex_Lock(&LOCK);
if(last == 1)
{
cout << ++last_Value << endl;
last = 2;
}
mutex_Unlock(&LOCK);
}
}
After this, he said "these threads will work correctly even without those locks. Those locks will reduce the efficiency". My point was that without the lock there will be situations where one thread checks last (== 1 or 2) at the same time as the other thread tries to change the value to 2 or 1. So my conclusion is that it may work without the lock, but that is not a correct/standard way. Now I want to know who is correct, and on what basis?
Without the lock, running the two functions concurrently would be undefined behaviour because there's a data race in the access of last and last_Value. Moreover (though not causing UB), the printing would be unpredictable.
With the lock, the program becomes essentially single-threaded, and is probably slower than the naive single-threaded code. But that's just in the nature of the problem (i.e. to produce a serialized sequence of events).
I think the interviewer might have thought about using atomic variables.
Each instantiation and full specialization of the std::atomic template defines an atomic type. Objects of atomic types are the only C++ objects that are free from data races; that is, if one thread writes to an atomic object while another thread reads from it, the behavior is well-defined.
In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by std::memory_order.
[Source]
By this I mean the only thing you should change is to remove the locks and change the last variable to std::atomic<int> last = 2; instead of int last = 2;
This should make it safe to access the last variable concurrently.
Out of curiosity I have edited your code a bit, and ran it on my Windows machine:
#include <iostream>
#include <atomic>
#include <thread>
#include <Windows.h>
std::atomic<int> last=2;
std::atomic<int> last_Value = 0;
std::atomic<bool> running = true;
void function_Thread_1()
{
while(running)
{
if(last == 2)
{
last_Value = last_Value + 1;
std::cout << last_Value << std::endl;
last = 1;
}
}
}
void function_Thread_2()
{
while(running)
{
if(last == 1)
{
last_Value = last_Value + 1;
std::cout << last_Value << std::endl;
last = 2;
}
}
}
int main()
{
std::thread a(function_Thread_1);
std::thread b(function_Thread_2);
while(last_Value != 6){}//we want to print 1 to 6
running = false;//inform threads we are about to stop
a.join();
b.join();//join
while(!GetAsyncKeyState('Q')){}//wait for 'Q' press
return 0;
}
and the output is always:
1
2
3
4
5
6
Ideone refuses to run this code (compilation errors)..
Edit: But here is a working linux version :) (thanks to soon)
The interviewer doesn't know what he is talking about. Without the locks you get races on both last and last_value. The compiler could for example reorder the assignment to last before the print and increment of last_value, which could lead to the other thread executing on stale data. Furthermore you could get interleaved output, meaning things like two numbers not being separated by a linebreak.
Another thing that could go wrong is that the compiler might decide not to reload last and (less importantly) last_value each iteration, since they can't (legally) change between those iterations anyway (data races are undefined behaviour under the C++11 standard and aren't acknowledged in previous standards). This means that the code suggested by the interviewer actually has a good chance of creating infinite loops that do absolutely nothing.
While it is possible to make that code correct without mutexes, that absolutely needs atomic operations with appropriate ordering constraints (release semantics on the assignment to last and acquire on the load of last inside the if statement).
Of course your solution does lower efficiency due to effectively serializing the whole execution. However, since the runtime is almost completely spent inside the stream-out operation, which is almost certainly internally synchronized by the use of locks, your solution doesn't lower the efficiency any more than it already is. Waiting on the lock in your code might actually be faster than busy waiting for it, depending on the available resources (the non-locking version using atomics would absolutely tank when executed on a single-core machine).
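For illustration, the release/acquire hand-off mentioned above could look roughly like this. It is a sketch under a few assumptions: the function alternate and its limit parameter are made up, and the numbers are appended to a vector instead of being printed so the result can be checked.

```cpp
#include <cassert>
#include <atomic>
#include <thread>
#include <vector>

// Returns the sequence produced by two threads taking strict turns.
std::vector<int> alternate(int limit) {
    std::atomic<int> last{2};  // id of the thread that wrote most recently
    int last_value = 0;        // plain int, protected by the turn hand-off
    std::vector<int> out;      // stands in for std::cout

    auto run = [&](int me, int other) {
        for (;;) {
            // acquire: see everything the other thread wrote before handing over
            while (last.load(std::memory_order_acquire) != other)
                std::this_thread::yield();
            if (last_value >= limit) {
                // pass the turn on so the other thread can leave its loop too
                last.store(me, std::memory_order_release);
                return;
            }
            out.push_back(++last_value);
            // release: publish last_value and out before handing over the turn
            last.store(me, std::memory_order_release);
        }
    };

    std::thread a(run, 1, 2);  // thread 1 runs when last == 2, so it goes first
    std::thread b(run, 2, 1);
    a.join();
    b.join();
    return out;
}
```

Only the thread that currently holds the turn touches last_value and out, and each hand-off is a release store paired with an acquire load, so there is no data race even though those two variables are not atomic. As noted above, the spinning makes this a poor fit for a single-core machine.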

How can I cancel a std::async function? [duplicate]

Possible Duplicate:
Is there a way to cancel/detach a future in C++11?
There is a member function which runs asynchronously using std::future and std::async. In some cases, I need to cancel it. (The function loads nearby objects consecutively, and sometimes an object gets out of range while it is being loaded.) I already read the answers to this question addressing the same issue, but I cannot get it to work.
This is simplified code with the same structure as my actual program. Calling Start() and then Kill() while the asynchronous task is running causes a crash because of an access violation on input.
In my eyes the code should work as follows. When Kill() is called, the running flag is disabled. The next command get() should wait for thread to end, which it does soon since it checks the running flag. After the thread is canceled, the input pointer is deleted.
#include <vector>
#include <future>
using namespace std;
class Class
{
future<void> task;
bool running;
int *input;
vector<int> output;
void Function()
{
for(int i = 0; i < *input; ++i)
{
if(!running) return;
output.push_back(i);
}
}
void Start()
{
input = new int(42534);
running = true;
task = async(launch::async, &Class::Function, this);
}
void Kill()
{
running = false;
task.get();
delete input;
}
};
It seems like the thread doesn't notice toggling the running flag to false. What is my mistake?
Since no one's actually answered the question yet, I'll do so.
The writes and reads to the running variable are not atomic operations, so there is nothing in the code that causes any synchronisation between the two threads, so nothing ever ensures that the async thread sees that the variable has changed.
One possible way that can happen is that the compiler analyzes the code of Function, determines that there are never any writes to the variable in that thread, and as it's not an atomic object writes by other threads are not required to be visible, so it's entirely legal to rearrange the code to this:
void Function()
{
if(!running) return;
for(int i = 0; i < *input; ++i)
{
output.push_back(i);
}
}
Obviously in this code if running changes after the function has started it won't cause the loop to stop.
There are two ways the C++ standard allows you to synchronize the two threads, which is either to use a mutex and only read or write the running variable while the mutex is locked, or to make the variable an atomic variable. In your case, changing running from bool to atomic<bool> will ensure that writes to the variable are synchronized with reads from it, and the async thread will terminate.
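A reduced sketch of the atomic<bool> variant. The class name Loader and the IsRunning helper are invented for this example, and the loop body is simplified to an endless loop so that it only ends when the flag is cleared.

```cpp
#include <cassert>
#include <atomic>
#include <future>
#include <thread>
#include <vector>

// Hypothetical reduced version of the question's class.
class Loader {
public:
    void Start() {
        running = true;
        task = std::async(std::launch::async, &Loader::Function, this);
    }
    void Kill() {
        running = false;  // atomic store: becomes visible to the worker thread
        task.get();       // wait until Function has returned
    }
    bool IsRunning() const { return running; }

private:
    void Function() {
        while (running) {  // atomic load on every iteration
            output.push_back(static_cast<int>(output.size()));
            std::this_thread::yield();
        }
    }
    std::future<void> task;
    std::atomic<bool> running{false};
    std::vector<int> output;
};
```

Because running is atomic, the compiler may not hoist the check out of the loop, so Kill() returns shortly after the flag is cleared. Anything the worker touched (here output) is safe to use or delete once task.get() has returned.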