Why is there no thread pool in C++ standard library? [duplicate] - c++

Since C++11 there has been a surge in the number of parallel/concurrent programming tools in C++: threads, async functions, parallel algorithms, coroutines… But what about a popular parallel programming pattern, the thread pool?
As far as I can see, nothing in the standard library implements this directly. std::thread can be used to implement a thread pool, but this requires manual work. An asynchronous function launched via std::async runs either in a new thread (std::launch::async) or in the calling thread (std::launch::deferred).
I think std::async could've been easily made to support thread pooling: via another launch policy (std::launch::thread_pool) which executes the task in an implicitly created global thread pool; or there could be a std::thread_pool object plus an overload of std::async which takes a thread pool.
Was something like this considered, and if so, why was it rejected? Or is there a standard solution that I am missing?

In principle std::async could use a thread pool, and it seems to me that allowing this was the intention. But in practice the existence of thread_local makes it difficult.
From cppreference on std::async with std::launch::async:
[...] execute the callable object f on a new thread of execution (with all thread-locals initialized) as if spawned by std::thread(std::forward<F>(f), std::forward<Args>(args)...) [...]
If the function contains any local thread_local variables, and if std::async used a thread pool, the behavior of running the function with std::async could differ from the behavior of running it with std::thread.
One example is that a thread_local might not have its initial value the second time the function is called on the same pooled thread. If you were to use std::thread instead, it would always have the initial value.
Another way the behavior would diverge is that the destructors of thread_local objects would not run in the same way for std::async and std::thread. This is illustrated by Microsoft's attempt at using a thread pool, which I suspect scared off others from trying it. You can read about this non-conformance here: In Visual Studio, thread_local variables' destructor not called when used with std::async, is this a bug?.
To be completely conforming, the implementation would need to "reset" all the thread_local objects between tasks. This requires compiler support and starts to look an awful lot like starting a new thread anyway.
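To make the thread_local difference concrete, here is a minimal sketch. The names bump_counter, run_on_fresh_threads and run_on_reused_thread are made up for illustration, and the "reused" case simply runs the function twice on one std::thread as a stand-in for a pooled worker; it is not how any particular standard library implements std::async.

```cpp
#include <thread>
#include <vector>

// Hypothetical helper: bumps a per-thread counter and returns the new value.
int bump_counter() {
    thread_local int calls = 0;  // initialized once per thread
    return ++calls;
}

// Calls bump_counter() twice, each time on a brand-new std::thread.
std::vector<int> run_on_fresh_threads() {
    std::vector<int> seen(2);
    for (int i = 0; i < 2; ++i) {
        std::thread t([&seen, i] { seen[i] = bump_counter(); });
        t.join();
    }
    return seen;
}

// Calls bump_counter() twice on one thread, standing in for a reused pool worker.
std::vector<int> run_on_reused_thread() {
    std::vector<int> seen(2);
    std::thread t([&seen] {
        for (int i = 0; i < 2; ++i) seen[i] = bump_counter();
    });
    t.join();
    return seen;
}
```

With fresh threads both calls see the initial value, so the result is {1, 1}. On the reused thread the second call sees the state left behind by the first, giving {1, 2}, which is the observable difference a pooled std::async would have to hide.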

Related

Thread pool for std::async

In the documentation of std::async it's mentioned that (emphasis mine):
The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
Is it possible to specify a thread pool for calls to async? If not, can I somehow check whether a thread pool was used or not?
Standard std::async lacks this level of control, although an implementation may provide it through non-standard guarantees or extensions.
The lack of this level of control is recognized -- the proposal at https://wg21.link/p0443 aims to address it.
For now the best option is probably to write your own thread pool using std::thread and std::packaged_task.
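As a rough idea of what such a hand-written pool could look like, here is a minimal sketch built on std::thread and std::packaged_task. The class name SimpleThreadPool and its submit() member are my own invention, not a standard or proposed API, and it leaves out things a production pool would want (task arguments, shutdown policies and so on).

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool; the name and interface are illustrative only.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { work(); });
    }

    // Drains the queue, then joins every worker.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();
    }

    SimpleThreadPool(const SimpleThreadPool&) = delete;
    SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

    // Queues a nullary callable and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // shared_ptr because packaged_task is move-only and std::function needs copyable targets.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        ready_.notify_one();
        return result;
    }

private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

Usage would be along the lines of SimpleThreadPool pool(2); auto f = pool.submit([] { return 6 * 7; }); followed by f.get(). Unlike futures from std::async, the futures returned here do not block in their destructor, because the pool owns the threads.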

C++ Which thread pool is cppreference.com talking about?

I was reading the description of std::async at cppreference.com.
The first description says:
The template function async runs the function f asynchronously (potentially in a separate thread which may be part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
[cppreference link: std::async]
What is the thread pool cppreference.com is talking about?
I read the standard draft N4713 (C++ 17) and there is no mention of a possible thread pool usage.
I also know that there is no thread pool in the standard C++ as of now.
cppreference and the C++ standard are in fact at odds about this. cppreference says this (emphasis and strikethrough mine):
The template function async runs the function f asynchronously ([struck through: potentially] [emphasized: optionally] in a separate thread which may be part of a thread pool).
Whereas the C++ standard says this:
If launch::async is set in policy, [std::async] calls [the function f] as if in a new thread of execution ...
And these are clearly two different things.
Only Windows' implementation of std::async uses a thread pool AFAIK, while gcc and clang start a new thread for every invocation of std::async (when launch::async is set in policy), and thus follow the standard.
More analysis here: https://stackoverflow.com/a/50898570/5743288
Purely hypothetical. cppreference is trying to tell you that the standard allows the task to be executed in a thread pool (as opposed to launching a new thread to execute it). And although the standard may not explicitly allow it, there is nothing that would prohibit it either.
I am not aware of any implementation which would use a thread pool for std::async.
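For what it's worth, the launch policies can at least be observed from the outside by comparing thread ids. The sketch below uses only standard calls; the two function names are made up. Note that it cannot tell you whether the other thread came from a pool, only whether the task ran on the calling thread or not.

```cpp
#include <future>
#include <thread>

// Returns true if a launch::async task saw a thread id different from the caller's.
bool async_runs_off_calling_thread() {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> worker =
        std::async(std::launch::async, [] { return std::this_thread::get_id(); });
    return worker.get() != caller;
}

// Returns true if a launch::deferred task ran on the thread that called get().
bool deferred_runs_on_calling_thread() {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> worker =
        std::async(std::launch::deferred, [] { return std::this_thread::get_id(); });
    return worker.get() == caller;  // the task only runs here, inside get()
}
```

Both functions should return true on a conforming implementation, whether or not it pools threads behind the scenes.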

Can long-running std::asyncs starve other std::asyncs?

As I understand it, usual implementations of std::async schedule these jobs on threads from a pre-allocated thread pool.
So let's say I first create and schedule enough long-running std::asyncs to keep all threads from that thread pool occupied. Directly afterwards (long before they finish executing) I also create and schedule some short-running std::asyncs. Could it happen that the short-running ones aren't executed at all until at least one of the long-running ones has finished? Or is there some guarantee in the standard (specifically C++11) that prevents this kind of situation (like spawning more threads so that the OS can schedule them in a round-robin fashion)?
The standard reads:
[futures.async#3.1] If launch::async is set in policy, calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) ([func.require], [thread.thread.constr]) as if in a new thread of execution represented by a thread object with the calls to DECAY_COPY being evaluated in the thread that called async. [...]
So, under the as-if rule, async() must behave as though a new thread were spawned whenever it is invoked with the async launch policy. Of course, an implementation may use a thread pool internally but, the usual thread creation overhead aside, no special 'starving' can occur. Moreover, things like the initialization of thread locals should always happen.
In fact, the async implementation in clang's libc++ trunk reads:
unique_ptr<__async_assoc_state<_Rp, _Fp>, __release_shared_count>
__h(new __async_assoc_state<_Rp, _Fp>(_VSTD::forward<_Fp>(__f)));
_VSTD::thread(&__async_assoc_state<_Rp, _Fp>::__execute, __h.get()).detach();
return future<_Rp>(__h.get());
As you can see, no 'explicit' thread pool is used internally.
Moreover, as you can read here, the libstdc++ implementation shipping with gcc 5.4.0 also just invokes a plain thread.
Yes, MSVC's std::async seems to have exactly that property, at least as of MSVC2015.
I don't know if they fixed it in a 2017 update.
This is against the spirit of the standard. However, the standard is extremely vague about thread forward progress guarantees (at least as of C++14). So while std::async must behave as if it wraps a std::thread, the guarantees on std::thread forward progress are sufficiently weak that this isn't much of a guarantee under the as-if rule.
In practice, this has led me to replace std::async in my thread pool implementations with raw calls to std::thread, as raw use of std::thread in MSVC2015 doesn't appear to have that problem.
I find that a thread pool (with a task queue) is far more practical than raw calls to either std::async or std::thread, and as it is really easy to write a thread pool with either std::thread or std::async, I'd advise writing one with std::thread.
Your thread pool can return std::futures just like std::async does (but without the auto-blocking on destruction feature, as the pool itself manages the thread lifetimes).
I have read that C++17 added better forward progress guarantees, but I lack sufficient understanding to conclude if MSVC's behavior is now against the standard requirements.
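If you want to probe a given toolchain yourself, something like the following sketch could do. The function name and the 5-second timeout are arbitrary choices of mine. It parks a number of std::async tasks on a shared_future and then checks whether one more short task still gets to run.

```cpp
#include <chrono>
#include <future>
#include <vector>

// Hypothetical probe: returns true if a short std::async task finishes while
// `blockers` long-running std::async tasks are still parked.
bool short_task_runs_while_others_block(unsigned blockers) {
    // Declared first so it is destroyed last, after the gate has been released.
    std::vector<std::future<void>> long_tasks;
    std::promise<void> release;
    std::shared_future<void> gate = release.get_future().share();

    for (unsigned i = 0; i < blockers; ++i)
        long_tasks.push_back(std::async(std::launch::async, [gate] { gate.wait(); }));

    std::future<int> quick = std::async(std::launch::async, [] { return 1; });
    const bool finished =
        quick.wait_for(std::chrono::seconds(5)) == std::future_status::ready;

    release.set_value();                 // let the long-running tasks finish
    for (auto& t : long_tasks) t.get();
    return finished;
}
```

To say anything about starvation, blockers has to exceed whatever pool size the implementation might be using, so pass something comfortably above the core count. An implementation that starts a thread per call should return true for any value that does not exhaust system thread limits.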

Is it possible to get a thread object for the main thread, and `join()` with it?

Is there a way to treat the main thread like any other thread with the C++11 (or later) facilities?
Concretely, what I am looking for is the ability to join() with the main thread. So, basically, I would like to do something like main_thread.join(), but I don't know how to obtain the main_thread object.
The thread constructors do not seem to offer any facilities based, for instance, on the thread id obtained with get_id(). The this_thread namespace also offers only minimal functionality and lacks, for instance, join(), which is what I am looking for.
As pointed out in the comments by @molbdnilo and @yohjb (see also What happens to a detached thread when main() exits?), C++11 semantics say that all threads are ended when the main() function terminates.
Since C++11 does not have a pthread_exit() equivalent, the main thread cannot be joined, because the program would end anyway.
So, to answer my question, it does not seem to be possible, and with the terminating semantics of main(), it would not be very useful.

How is std::async implemented?

I wanted to know how appropriate it is to use std::async in performance-oriented code. Specifically:
1. Is there any penalty in catching the exception from worker thread to main thread?
2. How are the values returned from worker to main?
3. Are input arguments passed by reference really never copied?
I am planning to pass a heavy session object to a thread or write std::async.
bool fun(MySession& sessRef);
MySession sess;
auto r = std::async(&fun, sess);
EDIT:
I am planning to use it with both GCC 4.9.1 and VS2013, since the application is platform agnostic. However, most deployments will be *nix based, so at least GCC should be performant.
We can't tell exactly how std::async is implemented, since you're not referring to a particular toolchain that provides the implementation.
1. Is there any penalty in catching the exception from worker thread to main thread?
What exactly do you mean by "penalty"? That can't be answered unless you clarify your concerns and requirements.
Usually there shouldn't be any penalty for just catching the exception in the thread that created the throwing one. The exception has to be stored and handed over to the creating thread, where it is rethrown when the result is retrieved from the future, and keeping it around until then has some cost.
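To illustrate the mechanics (not the cost), here is a small sketch of an exception travelling from the worker to the caller. failing_worker and catch_worker_exception are hypothetical names; the calls themselves are standard.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical worker that fails on purpose.
int failing_worker() {
    throw std::runtime_error("worker failed");
}

// Returns the message of the exception as seen in the calling thread.
std::string catch_worker_exception() {
    std::future<int> result = std::async(std::launch::async, failing_worker);
    try {
        result.get();  // the exception stored in the shared state is rethrown here
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "no exception";
}
```

The try/catch sits around get() in the calling thread, not around the std::async call, because that is where the stored exception resurfaces.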
2. How are the values returned from worker to main?
To cite what the C++ standard says about this point:
30.6.8 Function template async
4 Returns: An object of type future<typename result_of<typename decay<F>::type(typename decay<Args>::type...)>::type> that refers to the shared state created by this call to async.
3. Are input arguments passed by reference really never copied?
That point is answered in detail here: Passing arguments to std::async by reference fails. As you can see, by default they are copied.
According to @Yakk's comment, it is possible to pass these parameters via std::ref so that the function operates on references rather than on copies.
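Applied to the snippet from the question, the std::ref approach could look roughly like this. MySession here is a made-up stand-in with its copy operations deleted, so an accidental copy would fail to compile, and requests_after_async_call is a hypothetical wrapper added just to have something to call.

```cpp
#include <functional>
#include <future>

// Hypothetical stand-in for the heavy session object from the question.
struct MySession {
    int requests = 0;
    MySession() = default;
    MySession(const MySession&) = delete;             // make accidental copies a compile error
    MySession& operator=(const MySession&) = delete;
};

// Mutates the session through the reference it receives.
bool fun(MySession& sessRef) {
    ++sessRef.requests;
    return true;
}

// Runs fun() asynchronously on the caller's session without copying it.
int requests_after_async_call() {
    MySession sess;
    std::future<bool> r = std::async(std::launch::async, &fun, std::ref(sess));
    r.get();  // sess must outlive the task, so wait before it goes out of scope
    return sess.requests;
}
```

The caller sees requests == 1 afterwards, which shows that the worker modified the original object. The price is that you now have to guarantee the referenced object outlives the task yourself.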
how is std::async implemented
I can only speak to what the C++ standard requires of an implementation, unless you refer to a particular toolchain that provides its own implementation of std::async.