Can long-running std::asyncs starve other std::asyncs? - c++

As I understand it, usual implementations of std::async schedule these jobs on threads from a pre-allocated thread pool.
So let's say I first create and schedule enough long-running std::asyncs to keep all threads from that thread pool occupied. Directly afterwards (long before they finish executing) I also create and schedule some short-running std::asyncs. Could it happen that the short-running ones aren't executed at all until at least one of the long-running ones has finished? Or is there some guarantee in the standard (specifically C++11) that prevents this kind of situation (like spawning more threads so that the OS can schedule them in a round-robin fashion)?

The standard reads:
[futures.async#3.1] If launch::async is set in policy, calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) ([func.require], [thread.thread.constr]) as if in a new thread of execution represented by a thread object with the calls to DECAY_COPY being evaluated in the thread that called async.[...]
So, under the as-if rule, new threads must be spawned when async() is invoked with the async launch policy. Of course, an implementation may use a thread pool internally but, usual thread creation overhead aside, no special 'starving' can occur. Moreover, things like the initialization of thread locals should always happen.
In fact, clang libc++ trunk async implementation reads:
unique_ptr<__async_assoc_state<_Rp, _Fp>, __release_shared_count>
__h(new __async_assoc_state<_Rp, _Fp>(_VSTD::forward<_Fp>(__f)));
_VSTD::thread(&__async_assoc_state<_Rp, _Fp>::__execute, __h.get()).detach();
return future<_Rp>(__h.get());
As you can see, no 'explicit' thread pool is used internally.
Moreover, as you can read here, the libstdc++ implementation shipping with gcc 5.4.0 also just launches a plain thread.
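One way to check this on a given implementation is a small probe: block many std::launch::async tasks on a gate, then see whether a short task still completes. This is a sketch under stated assumptions: the function name short_task_not_starved, the task count and the 5-second timeout are arbitrary illustrative choices, and a true result only describes the implementation it ran on, not a guarantee of the standard.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical probe: returns true if a short std::async task completes while
// `long_count` other std::async tasks are still blocked. The 5 s timeout is an
// arbitrary choice for this sketch.
bool short_task_not_starved(std::size_t long_count) {
    std::promise<void> gate;  // opened once the check is done
    std::shared_future<void> gate_open = gate.get_future().share();

    // "Long-running" tasks: each one blocks until the gate is opened.
    std::vector<std::future<void>> long_tasks;
    for (std::size_t i = 0; i < long_count; ++i) {
        long_tasks.push_back(std::async(std::launch::async,
                                        [gate_open] { gate_open.wait(); }));
    }

    // Short task launched while every long task is still occupying a thread.
    std::future<int> short_task =
        std::async(std::launch::async, [] { return 42; });

    // With "as if in a new thread" semantics this becomes ready promptly;
    // an exhausted pool would leave it waiting for a free worker.
    bool ready = short_task.wait_for(std::chrono::seconds(5)) ==
                 std::future_status::ready;

    gate.set_value();                    // release the long-running tasks
    for (auto& f : long_tasks) f.get();
    assert(short_task.get() == 42);      // runs eventually in either case
    return ready;
}
```

On libstdc++ and libc++, which start a fresh thread per call as described above, short_task_not_starved(64) is expected to return true; on a pool-based implementation with fewer workers than tasks it could return false.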

Yes, MSVC's std::async seems to have exactly that property, at least as of MSVC2015.
I don't know if they fixed it in a 2017 update.
This is against the spirit of the standard. However, the standard is extremely vague about thread forward progress guarantees (at least as of C++14). So while std::async must behave as if it wraps a std::thread, the guarantees on std::thread forward progress are sufficiently weak that this isn't much of a guarantee under the as-if rule.
In practice, this has led me to replace std::async in my thread pool implementations with raw calls to std::thread, as raw use of std::thread in MSVC2015 doesn't appear to have that problem.
I find that a thread pool (with a task queue) is far more practical than raw calls to either std::async or std::thread, and as it is really easy to write a thread pool with either std::thread or std::async, I'd advise writing one with std::thread.
Your thread pool can return std::futures just like std::async does (but without the auto-blocking on destruction feature, as the pool itself manages the thread lifetimes).
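The difference in destructor behavior can be seen with std::packaged_task, the usual building block for such a pool. The sketch below is a hypothetical demo (the function name is made up, and a single std::thread stands in for a pool): the future is destroyed while the task is still blocked, and the caller, not the future, is responsible for joining the thread.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical demo: returns true once a packaged_task has run to completion
// even though its std::future was destroyed while the task was still blocked.
bool packaged_task_future_does_not_block() {
    std::promise<void> gate;                        // opened later
    std::shared_future<void> gate_open = gate.get_future().share();
    std::atomic<bool> ran(false);

    std::packaged_task<int()> task([gate_open, &ran] {
        gate_open.wait();                           // blocked until the gate opens
        ran = true;
        return 7;
    });
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));            // the caller owns this thread

    // Destroying this future returns immediately: only futures created by
    // std::async may wait in their destructor. If it blocked, this function
    // would deadlock, because the gate is still closed at this point.
    { std::future<int> dropped = std::move(result); }
    assert(!result.valid());                        // ownership moved away

    gate.set_value();                               // let the task finish
    worker.join();                                  // lifetime managed explicitly
    return ran.load();
}
```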
I have read that C++17 added better forward progress guarantees, but I lack sufficient understanding to conclude if MSVC's behavior is now against the standard requirements.

Related

Thread pool for std::async

In the documentation of std::async it's mentioned that (emphasis mine):
The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
Is it possible to specify a thread pool for calls to async? If not can I somehow check whether a thread pool was used or not?
The standard std::async lacks this level of control, although implementations may provide it as non-standard guarantees/extensions.
The lack of this level of control is recognized: there's a proposal to address it, https://wg21.link/p0443.
For now, the best option is probably to write your own thread pool using std::thread and std::packaged_task.
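A minimal sketch of such a pool, assuming a fixed number of workers and a single mutex-protected FIFO queue; the ThreadPool class and its submit member are illustrative names, not a standard or library API. Each job is wrapped in a std::packaged_task so the caller gets a std::future, and an exception thrown by the job is rethrown by get().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool built from std::thread and std::packaged_task.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only but std::function needs a copyable
        // callable, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (stop_ && queue_.empty()) return;  // nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // exceptions are captured by the packaged_task
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```

Usage: ThreadPool pool(4); auto f = pool.submit([] { return 6 * 7; }); then f.get() returns 42. Unlike std::async futures, the futures returned by submit do not block in their destructor; the pool's destructor finishes the queued jobs and joins the workers instead. Note that with a fixed worker count a long-running job does occupy a worker, so the starvation scenario from the first question is possible here by design.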

Why is there no thread pool in C++ standard library? [duplicate]

This question already has an answer here:
Why doesn't C++ have a std::thread_pool in the standard library?
Since C++11 there has been a surge in the amount of parallel/concurrent programming tools in C++: threads, async functions, parallel algorithms, coroutines… But what about a popular parallel programming pattern: thread pool?
As far as I can see, nothing in the standard library implements this directly. Threading via std::thread can be used to implement a thread pool, but this requires manual labor. Asynchronous function via std::async can be launched either in a new thread (std::launch::async) or in the calling thread (std::launch::deferred).
I think std::async could've been easily made to support thread pooling: via another launch policy (std::launch::thread_pool) which executes the task in an implicitly created global thread pool; or there could be a std::thread_pool object plus an overload of std::async which takes a thread pool.
Was something like this considered, and if so, why was it rejected? Or is there a standard solution that I am missing?
In principle std::async could use a thread pool, and it seems to me that allowing this was the intention. But in practice the existence of thread_local makes it difficult.
From cppreference on std::async with std::launch::async:
[...] execute the callable object f on a new thread of execution (with all thread-locals initialized) as if spawned by std::thread(std::forward<F>(f), std::forward<Args>(args)...) [...]
If the function contains any local thread_local variables, and if std::async used a thread pool, the behavior of running the function with std::async could differ from the behavior with std::thread.
One example may be that the thread_local might not have its initial value the second time the function is called on the same thread. If you were to use std::thread instead, it would always have the initial value.
Another way the behavior would diverge is that thread_local objects' destructors would not run in the same way for std::async and std::thread. This is illustrated by Microsoft's attempt at using a thread pool, which I suspect scared off others from trying it. You can read about this non-conformance here: In Visual Studio, thread_local variables' destructor not called when used with std::async, is this a bug?
To be completely conforming, the implementation would need to "reset" all the thread_local objects anyway. This requires compiler support and starts to look an awful lot like starting a new thread anyway.
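The thread_local difference can be probed directly. In this sketch (function names are hypothetical), each std::launch::async task increments a function-local thread_local counter. Under "as if spawned by std::thread" semantics every task starts with a fresh counter, so the second task should also see 1; an implementation that reuses pooled threads without resetting thread_locals could report 2 instead.

```cpp
#include <cassert>
#include <future>

// Increments a counter that exists once per thread and returns its new value.
int bump_thread_local() {
    thread_local int calls = 0;   // initialized separately in every thread
    return ++calls;
}

// Runs bump_thread_local() in two std::launch::async tasks, one after the
// other, and returns what the second task saw.
int counter_seen_by_second_task() {
    int first = std::async(std::launch::async, bump_thread_local).get();
    assert(first == 1);           // a fresh thread starts from 0
    return std::async(std::launch::async, bump_thread_local).get();
}
```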

Why should I use std::async?

I'm trying to explore all the options of the new C++11 standard in depth. While using std::async and reading its definition, I noticed two things, at least under Linux with gcc 4.8.1:
it's called async, but it has a really "sequential behaviour": basically, at the line where you call get() on the future associated with your async function foo, the program blocks until the execution of foo is completed.
it depends on the exact same external library as other, better, non-blocking solutions, which means pthread: if you want to use std::async you need pthread.
At this point it's natural for me to ask: why choose std::async over even a simple set of functors? It's a solution that doesn't scale at all; the more futures you call, the less responsive your program will be.
Am I missing something? Can you show an example that is guaranteed to be executed in an async, non-blocking way?
it's called async, but it has a really "sequential behaviour",
No, if you use the std::launch::async policy then it runs asynchronously in a new thread. If you don't specify a policy it might run in a new thread.
basically, at the line where you call get() on the future associated with your async function foo, the program blocks until the execution of foo is completed.
It only blocks if foo hasn't completed, but if it was run asynchronously (e.g. because you use the std::launch::async policy) it might have completed before you need it.
it depends on the exact same external library as other, better, non-blocking solutions, which means pthread: if you want to use std::async you need pthread.
Wrong: it doesn't have to be implemented using Pthreads (and on Windows it isn't; it uses the ConcRT features).
at this point it's natural for me to ask: why choose std::async over even a simple set of functors?
Because it guarantees thread-safety and propagates exceptions across threads. Can you do that with a simple set of functors?
It's a solution that doesn't scale at all; the more futures you call, the less responsive your program will be.
Not necessarily. If you don't specify the launch policy then a smart implementation can decide whether to start a new thread, or return a deferred function, or return something that decides later, when more resources may be available.
Now, it's true that with GCC's implementation, if you don't provide a launch policy then with current releases it will never run in a new thread (there's a bugzilla report for that) but that's a property of that implementation, not of std::async in general. You should not confuse the specification in the standard with a particular implementation. Reading the implementation of one standard library is a poor way to learn about C++11.
Can you show an example that is guaranteed to be executed in an async, non-blocking way?
This shouldn't block:
auto fut = std::async(std::launch::async, doSomethingThatTakesTenSeconds);
auto result1 = doSomethingThatTakesTwentySeconds();
auto result2 = fut.get();
By specifying the launch policy you force asynchronous execution, and if you do other work while it's executing then the result will be ready when you need it.
If you need the result of an asynchronous operation, then you have to block, no matter what library you use. The idea is that you get to choose when to block, and, hopefully when you do that, you block for a negligible time because all the work has already been done.
Note also that std::async can be launched with policies std::launch::async or std::launch::deferred. If you don't specify it, the implementation is allowed to choose, and it could well choose to use deferred evaluation, which would result in all the work being done when you attempt to get the result from the future, resulting in a longer block. So if you want to make sure that the work is done asynchronously, use std::launch::async.
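Whether a given future refers to a deferred task can be checked without blocking: wait_for with a zero timeout returns std::future_status::deferred for a task that has not been started. A small sketch (is_deferred and answer are illustrative names):

```cpp
#include <cassert>
#include <chrono>
#include <future>

// True if `f` refers to a deferred task that has not started yet.
// A zero timeout makes wait_for a non-blocking status query.
template <class T>
bool is_deferred(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}

// Hypothetical task used for the demonstration.
int answer() { return 42; }
```

With the default policy, std::async(answer), the result of is_deferred depends on the implementation's choice, which is why forcing std::launch::async is the safe option when asynchronous execution matters.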
I think your problem is with std::future saying that it blocks on get. It only blocks if the result isn't already ready.
If you can arrange for the result to be already ready, this isn't a problem.
There are many ways to know that the result is already ready. You can poll the future and ask it (relatively simple), you could use locks or atomic data to relay the fact that it is ready, you could build up a framework to deliver "finished" future items into a queue that consumers can interact with, you could use signals of some kind (which is just blocking on multiple things at once, or polling).
Or, you could finish all the work you can do locally, and then block on the remote work.
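A sketch of polling combined with "finish the local work, then block": wait_for with a zero timeout asks the future whether it is ready without blocking, and get() is only called once there is nothing else to do. The slow_square task, the 5 ms step and the 1000-step cap are hypothetical stand-ins for real work.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical "remote" work: sleeps briefly, then returns a value.
int slow_square(int x) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return x * x;
}

// Does local work in small steps, checking the future between steps with a
// non-blocking zero-timeout wait_for. When local work runs out (the cap),
// it blocks in get().
int poll_then_get(int& local_steps) {
    std::future<int> fut = std::async(std::launch::async, slow_square, 9);
    local_steps = 0;
    while (fut.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
        ++local_steps;  // stand-in for one slice of useful local work
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        if (local_steps == 1000) break;  // out of local work: block below
    }
    return fut.get();  // immediate if ready, otherwise waits
}
```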
As an example, imagine a parallel recursive merge sort. It splits the array into two chunks, then does an async sort on one chunk while sorting the other chunk. Once it is done sorting its half, the originating thread cannot progress until the second task is finished. So it does a .get() and blocks. Once both halves have been sorted, it can then do a merge (in theory, the merge can be done at least partially in parallel as well).
This task behaves like a linear task to those interacting with it on the outside -- when it is done, the array is sorted.
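The recursive merge sort described above could look roughly like this. It is a sketch, not a tuned implementation: the depth limit (which caps how many threads get spawned) and the sequential cutoff are arbitrary assumptions, and std::sort plus std::inplace_merge do the sequential parts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Sorts v[lo, hi). Small ranges, or recursion past `depth`, are sorted
// sequentially; otherwise the left half is sorted in a std::async task while
// this thread sorts the right half, then it blocks in get() and merges.
void parallel_merge_sort(std::vector<int>& v, std::size_t lo, std::size_t hi,
                         int depth = 3, std::size_t cutoff = 1000) {
    if (hi - lo <= cutoff || depth == 0) {
        std::sort(v.begin() + lo, v.begin() + hi);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    // The two halves are disjoint ranges, so no locking is needed.
    std::future<void> left = std::async(std::launch::async,
        [&v, lo, mid, depth, cutoff] {
            parallel_merge_sort(v, lo, mid, depth - 1, cutoff);
        });
    parallel_merge_sort(v, mid, hi, depth - 1, cutoff);
    left.get();  // cannot merge until the other half is sorted
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}
```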
We can then wrap this in a std::async task, and have a future sorted array. If we want, we could add in a signaling procedure to let us know that the future is finished, but that only makes sense if we have a thread waiting on the signals.
In the reference: http://en.cppreference.com/w/cpp/thread/async
If the async flag is set (i.e. policy & std::launch::async != 0), then async executes the function f on a separate thread of execution as if spawned by std::thread(f, args...), except that if the function f returns a value or throws an exception, it is stored in the shared state accessible through the std::future that async returns to the caller.
It is a nice property to keep a record of exceptions thrown.
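A small illustration of that property: an exception thrown inside the asynchronous task is stored in the shared state and rethrown by get() on the calling thread. The functions checked_sqrt_floor and run_and_report are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical task that fails on negative input.
int checked_sqrt_floor(int x) {
    if (x < 0) throw std::invalid_argument("negative input");
    int r = 0;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r;
}

// The exception thrown on the worker thread is stored in the shared state
// and rethrown here by get(), on the calling thread.
std::string run_and_report(int x) {
    std::future<int> fut = std::async(std::launch::async, checked_sqrt_floor, x);
    try {
        return "result " + std::to_string(fut.get());
    } catch (const std::invalid_argument& e) {
        return std::string("caught: ") + e.what();
    }
}
```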
http://www.cplusplus.com/reference/future/async/
There are three types of policy:
launch::async
launch::deferred
launch::async|launch::deferred
By default, launch::async|launch::deferred is passed to std::async.

Does async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?

It is loosely related to this question: Are std::thread pooled in C++11?. Though the question differs, the intention is the same:
Question 1: Does it still make sense to use your own (or 3rd-party library) thread pools to avoid expensive thread creation?
The conclusion in the other question was that you cannot rely on std::thread to be pooled (it might or it might be not). However, std::async(launch::async) seems to have a much higher chance to be pooled.
I don't think that it is forced by the standard, but IMHO I would expect all good C++11 implementations to use thread pooling if thread creation is slow. Only on platforms where it is inexpensive to create a new thread would I expect them to always spawn a new thread.
Question 2: This is just what I think, but I have no facts to prove it. I may very well be mistaken. Is it an educated guess?
Finally, here I have provided some sample code that first shows how I think thread creation can be expressed by async(launch::async):
Example 1:
thread t([]{ f(); });
// ...
t.join();
becomes
auto future = async(launch::async, []{ f(); });
// ...
future.wait();
Example 2: Fire and forget thread
thread([]{ f(); }).detach();
becomes
// a bit clumsy...
auto dummy = async(launch::async, []{ f(); });
// ... but I hope soon it can be simplified to
async(launch::async, []{ f(); });
Question 3: Would you prefer the async versions to the thread versions?
The rest is no longer part of the question, but only for clarification:
Why must the return value be assigned to a dummy variable?
Unfortunately, the current C++11 standard forces you to capture the return value of std::async: otherwise the returned temporary std::future is destroyed at the end of the statement, and its destructor blocks until the action terminates. Some consider this an error in the standard (e.g., Herb Sutter).
This example from cppreference.com illustrates it nicely:
{
std::async(std::launch::async, []{ f(); });
std::async(std::launch::async, []{ g(); }); // does not run until f() completes
}
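The same effect can be observed without relying on output ordering. In this sketch (the function name is hypothetical), f sleeps briefly and then sets a flag, and g records whether the flag was already set. Because both std::async results are discarded temporaries, the first future's destructor waits for f before the second call starts, so g is expected to see the flag set.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns whether g observed that f had already finished.
bool g_saw_f_finished() {
    std::atomic<bool> f_done(false);
    std::atomic<bool> seen(false);
    // Temporary future: destroyed at the end of this statement, and its
    // destructor waits for the task, so f completes before the next line.
    std::async(std::launch::async, [&f_done] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        f_done = true;
    });
    // g: records whether f's effect is already visible.
    std::async(std::launch::async, [&f_done, &seen] { seen = f_done.load(); });
    return seen.load();
}
```

Binding the results to named variables (auto a = std::async(...); auto b = std::async(...);) lets f and g overlap; the blocking then happens when b and a go out of scope.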
Another clarification:
I know that thread pools may have other legitimate uses but in this question I am only interested in the aspect of avoiding expensive thread creation costs.
I think there are still situations where thread pools are very useful, especially if you need more control over resources.
For example, a server might decide to handle only a fixed number of requests simultaneously to guarantee fast response times and to increase the predictability of memory usage. Thread pools should be fine, here.
Thread-local variables may also be an argument for your own thread pools, but I'm not sure whether it is relevant in practice:
A new thread created with std::thread starts with freshly initialized thread-local variables, so no state carries over from earlier work. Maybe this is not what you want.
In threads spawned by async, it is somewhat unclear to me, because the thread could have been reused. From my understanding, thread-local variables are not guaranteed to be reset, but I may be mistaken.
Using your own (fixed-size) thread pools, on the other hand, gives you full control if you really need it.
Question 1:
I changed this from the original because the original was wrong. I was under the impression that Linux thread creation was very cheap and after testing I determined that the overhead of function call in a new thread vs. a normal one is enormous. The overhead for creating a thread to handle a function call is something like 10000 or more times slower than a plain function call. So, if you're issuing a lot of small function calls, a thread pool might be a good idea.
It's quite apparent that the standard C++ library that ships with g++ doesn't have thread pools. But I can definitely see a case for them. Even with the overhead of having to shove the call through some kind of inter-thread queue, it would likely be cheaper than starting up a new thread. And the standard allows this.
IMHO, the Linux kernel people should work on making thread creation cheaper than it currently is. But the standard C++ library should also consider using a pool to implement launch::async | launch::deferred.
And the OP is correct, using ::std::thread to launch a thread of course forces the creation of a new thread instead of using one from a pool. So ::std::async(::std::launch::async, ...) is preferred.
Question 2:
Yes, basically this 'implicitly' launches a thread. But really, it's still quite obvious what's happening. So I don't really think the word implicitly is a particularly good word.
I'm also not convinced that forcing you to wait for a return before destruction is necessarily an error. I don't know that you should be using the async call to create 'daemon' threads that aren't expected to return. And if they are expected to return, it's not OK to be ignoring exceptions.
Question 3:
Personally, I like thread launches to be explicit. I place a lot of value on islands where you can guarantee serial access. Otherwise you end up with mutable state that you always have to be wrapping a mutex around somewhere and remembering to use it.
I liked the work queue model a whole lot better than the 'future' model because there are 'islands of serial' lying around so you can more effectively handle mutable state.
But really, it depends on exactly what you're doing.
Performance Test
So, I tested the performance of various methods of calling things and came up with these numbers on an 8 core (AMD Ryzen 7 2700X) system running Fedora 29 compiled with clang version 7.0.1 and libc++ (not libstdc++):
Do nothing calls per second: 35365257
Empty calls per second: 35210682
New thread calls per second: 62356
Async launch calls per second: 68869
Worker thread calls per second: 970415
And native, on my MacBook Pro 15" (Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz) with Apple LLVM version 10.0.0 (clang-1000.10.44.4) under OSX 10.13.6, I get this:
Do nothing calls per second: 22078079
Empty calls per second: 21847547
New thread calls per second: 43326
Async launch calls per second: 58684
Worker thread calls per second: 2053775
For the worker thread, I started up a thread, then used a lockless queue to send requests to another thread and then wait for a "It's done" reply to be sent back.
The "Do nothing" is just to test the overhead of the test harness.
It's clear that the overhead of launching a thread is enormous. And even the worker thread with the inter-thread queue slows things down by a factor of 20 or so on Fedora 25 in a VM, and by about 8 on native OS X.
I created an OSDN chamber holding the code I used for the performance test. It can be found here: https://osdn.net/users/omnifarious/pf/launch_thread_performance/

cancel a c++ 11 async task

How can I stop/cancel an asynchronous task created with std::async and policy std::launch::async? In other words, I have started a task running on another thread, using future object. Is there a way to cancel or stop the running task?
In short no.
Longer explanation: There is no safe way to cancel any threads in standard C++. This would require thread cancellation. This feature has been discussed many times during the C++11 standardisation, and the general consensus is that there is no safe way to do so. To my knowledge there were three main approaches considered for thread cancellation in C++.
Abort the thread. This would be rather like an emergency stop. Unfortunately it would result in no stack unwinding or destructors called. The thread could have been in any state, so possibly holding mutexes, having heap-allocated data which would be leaked, etc. This was clearly never going to be considered for long, since it would make the entire program undefined. If you want to do this yourself, however, just use native_handle to do it. It will, however, be non-portable.
Compulsory cancellation/interruption points. When a thread cancel is requested, it internally sets some variable so that the next time any of a predefined set of interruption points is called (such as sleep, wait, etc.) it will throw some exception. This would cause the stack to unwind, and cleanup can be done. Unfortunately, this type of system makes it very difficult to make any code exception safe, since most multithreaded code can then suddenly throw. This is the model that Boost.Thread uses. It uses disable_interruption to work around some of the problems, but it is still exceedingly difficult to get right for anything but the simplest of cases. This model has always been considered risky, and understandably it was not accepted into the standard along with the rest.
Voluntary cancellation/interruption points. Ultimately this boils down to checking some condition yourself when you want to and, if appropriate, exiting the thread yourself in a controlled fashion. I vaguely recall some talk about adding some library features to help with this, but it was never agreed upon.
I would just use a variation of 3. If you are using lambdas for instance it would be quite easy to reference an atomic "cancel" variable which you can check from time to time.
In C++11 (I think) there is no standard way to cancel a thread. If you get std::thread::native_handle(), you can do something with it but that's not portable.
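Maybe you can do it this way, by checking some condition:
#include <atomic>
#include <functional>
#include <future>
#include <vector>

class Timer {
public:
    Timer() : timer_destroyed(false) {}
    ~Timer() {
        timer_destroyed = true;            // ask every task to stop
        for (auto& result : async_result)  // futures are move-only: iterate by reference
            result.wait();                 // wait for each task to notice the flag and exit
    }
    void register_event() {
        async_result.push_back(
            std::async(std::launch::async, [](std::atomic<bool>& destroyed) {
                while (!destroyed) {
                    // do something, checking the flag regularly
                }
            }, std::ref(timer_destroyed)));
    }
private:
    std::vector<std::future<void>> async_result;
    std::atomic<bool> timer_destroyed;
};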