C++: Which thread pool is cppreference.com talking about?

I was reading the description of std::async at cppreference.com.
The first description says:
The template function async runs the function f asynchronously
(potentially in a separate thread which may be part of a thread
pool) and returns a std::future that will eventually hold the
result of that function call.
[cppreference link]: std::async
What is the thread pool cppreference.com is talking about?
I read the standard draft N4713 (C++17) and there is no mention of a possible thread pool usage.
I also know that there is no thread pool in the standard C++ as of now.

cppreference and the C++ standard are in fact at odds about this. cppreference says this (emphasis and strikethrough mine):
The template function async runs the function f asynchronously (potentially optionally in a separate thread which may be part of a thread pool).
Whereas the C++ standard says this:
If launch::async is set in policy, [std::async] calls [the function f] as if in a new thread of execution ...
And these are clearly two different things.
Only Windows' implementation of std::async uses a thread pool AFAIK, while gcc and clang start a new thread for every invocation of std::async (when launch::async is set in policy), and thus follow the standard.
More analysis here: https://stackoverflow.com/a/50898570/5743288
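As a rough illustration of the two launch policies (a sketch, not taken from cppreference or the standard; the function names are made up for this example), the observable part of the wording can be checked with thread ids. Note that this cannot tell a freshly created thread from a pooled one; it only shows that launch::async leaves the calling thread and launch::deferred does not.

```cpp
#include <future>
#include <thread>

// Hypothetical helper: true if a task launched with std::launch::async
// ran on a thread other than the caller's.
bool async_runs_off_calling_thread() {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(std::launch::async, [] { return std::this_thread::get_id(); });
    return f.get() != caller;
}

// Hypothetical helper: true if a deferred task ran on the caller's thread.
// With std::launch::deferred the callable is invoked lazily inside get().
bool deferred_runs_on_calling_thread() {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(std::launch::deferred, [] { return std::this_thread::get_id(); });
    return f.get() == caller;
}
```

Both helpers should return true on a conforming implementation, whether or not a pool is used behind the scenes.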

Purely hypothetical. cppreference is trying to tell you that the standard allows execution of the task in a thread pool (as opposed to launching a new thread to execute it). And although the standard may not explicitly allow it, there is nothing which would prohibit it either.
I am not aware of any implementation which would use a thread pool for std::async.

Related

C++20 coroutines and happens-before relation

Do C++20 coroutines provide some guarantees about synchronizes-with relation?
I would hope for something more or less like "suspension synchronizes-with resumption" but that's probably too much. So far I was only able to find this statement on cppreference.com
Note that because the coroutine is fully suspended before entering awaiter.await_suspend(), that function is free to transfer the coroutine handle across threads, with no additional synchronization. For example, it can put it inside a callback, scheduled to run on a threadpool when async I/O operation completes.[...]
I also found this in [coroutine.handle.resumption]
Resuming a coroutine via resume, operator(), or destroy on an execution agent other than the one on which it was suspended has implementation-defined behavior unless each execution agent either is an instance of std::thread or std::jthread, or is the thread that executes main.
[ Note: A coroutine that is resumed on a different execution agent should avoid relying on consistent thread identity throughout, such as holding a mutex object across a suspend point. -- end note ]
[ Note: A concurrent resumption of the coroutine may result in a data race. -- end note ]
I also realize that the linked documentation of handle.resume() does not say that it happens-after some other operation. But still, the quoted statements seem to indicate that there are some synchronization guarantees when we pass the handle between different std::threads. So what exactly is guaranteed? [Some links to relevant fragments of the standard for further reading will be deeply appreciated.]

Thread pool for std::async

In the documentation of std::async it's mentioned that (emphasis mine):
The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
Is it possible to specify a thread pool for calls to async? If not can I somehow check whether a thread pool was used or not?
The standard std::async lacks this level of control, although an implementation may provide it as a non-standard guarantee or extension.
The lack of this level of control is recognized: the proposal https://wg21.link/p0443 aims to address it.
For now the best option is probably to write your own thread pool using std::thread and std::packaged_task.
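A minimal sketch of such a pool, assuming a fixed number of workers and nullary tasks. The names SimpleThreadPool, submit and sum_of_squares_on_pool are invented for this example and are not part of any standard or proposal.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool built from std::thread and std::packaged_task.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Drains the queue, then joins all workers.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    SimpleThreadPool(const SimpleThreadPool&) = delete;
    SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

    // Queues a nullary callable and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only but std::function needs a copyable
        // target, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest
};

// Example use: squares computed on two pool threads, summed by the caller.
int sum_of_squares_on_pool(int n) {
    SimpleThreadPool pool(2);
    std::vector<std::future<int>> results;
    for (int i = 1; i <= n; ++i)
        results.push_back(pool.submit([i] { return i * i; }));
    int total = 0;
    for (std::future<int>& r : results) total += r.get();
    return total;
}
```

Unlike futures obtained from std::async, the futures returned by submit do not block in their destructor; the pool's destructor is what waits for outstanding work.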

Why is there no thread pool in C++ standard library? [duplicate]

Since C++11 there has been a surge in the amount of parallel/concurrent programming tools in C++: threads, async functions, parallel algorithms, coroutines… But what about a popular parallel programming pattern: thread pool?
As far as I can see, nothing in the standard library implements this directly. Threading via std::thread can be used to implement a thread pool, but this requires manual labor. Asynchronous function via std::async can be launched either in a new thread (std::launch::async) or in the calling thread (std::launch::deferred).
I think std::async could've been easily made to support thread pooling: via another launch policy (std::launch::thread_pool) which executes the task in an implicitly created global thread pool; or there could be a std::thread_pool object plus an overload of std::async which takes a thread pool.
Was something like this considered, and if so, why was it rejected? Or is there a standard solution that I am missing?
In principle std::async could use a thread pool, and it seems to me that allowing this was the intention. But in practice the existence of thread_local makes it difficult.
From cppreference on std::async with std::launch::async:
[...] execute the callable object f on a new thread of execution (with all thread-locals initialized) as if spawned by std::thread(std::forward<F>(f), std::forward<Args>(args)...) [...]
If the function contains any local thread_local variables, and if std::async used a thread pool, the behavior of running the function with std::async could be different from the behavior with std::thread.
One example is that a thread_local might not have its initial value the second time the function is called on the same thread. If you were to use std::thread instead, it would always have the initial value.
Another way the behavior would diverge is that thread_local objects' destructors would not run in the same way for std::async and std::thread. This is illustrated by Microsoft's attempt at using a thread pool, which I suspect scared off others from trying it. You can read about this non-conformance here: In Visual Studio, thread_local variables' destructor not called when used with std::async, is this a bug?.
To be completely conforming, the implementation would need to "reset" all the thread_local objects anyway. This requires compiler support and starts to look an awful lot like starting a new thread anyway.
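The initial-value problem can be sketched without any pool at all, by comparing two fresh std::threads against one thread that runs the same function twice, which is what a pool worker would do. The function names here are hypothetical and only serve the illustration.

```cpp
#include <thread>
#include <vector>

// Hypothetical task: counts how often it has run on the current thread.
int calls_on_this_thread() {
    thread_local int calls = 0;  // initialized once per thread
    return ++calls;
}

// Runs the task twice, each time on a fresh std::thread.
std::vector<int> run_on_fresh_threads() {
    std::vector<int> seen(2);
    for (int i = 0; i < 2; ++i) {
        std::thread t([&seen, i] { seen[i] = calls_on_this_thread(); });
        t.join();
    }
    return seen;  // each thread starts from the initial value
}

// Runs the task twice on one reused thread, as a pool worker would.
std::vector<int> run_on_reused_thread() {
    std::vector<int> seen(2);
    std::thread worker([&seen] {
        for (int i = 0; i < 2; ++i) seen[i] = calls_on_this_thread();
    });
    worker.join();
    return seen;  // the second call sees the state left by the first
}
```

With fresh threads both calls observe 1; on the reused thread the second call observes 2, which is the divergence a pooled std::async would have to hide.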

Can long-running std::asyncs starve other std::asyncs?

As I understand it, usual implementations of std::async schedule these jobs on threads from a pre-allocated thread pool.
So let's say I first create and schedule enough long-running std::asyncs to keep all threads from that thread pool occupied. Directly afterwards (long before they have finished executing) I also create and schedule some short-running std::asyncs. Could it happen that the short-running ones aren't executed at all until at least one of the long-running ones has finished? Or is there some guarantee in the standard (specifically C++11) that prevents this kind of situation (like spawning more threads so that the OS can schedule them in a round-robin fashion)?
The standard reads:
[futures.async#3.1] If launch::async is set in policy, calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) ([func.require], [thread.thread.constr]) as if in a new thread of execution represented by a thread object with the calls to DECAY_COPY being evaluated in the thread that called async.[...]
so, under the as-if rule, new threads must be spawned when async() is invoked with the async launch policy. Of course, an implementation may use a thread pool internally but, usual thread creation overhead aside, no special 'starving' can occur. Moreover, things like the initialization of thread locals should always happen.
In fact, clang libc++ trunk async implementation reads:
    unique_ptr<__async_assoc_state<_Rp, _Fp>, __release_shared_count>
        __h(new __async_assoc_state<_Rp, _Fp>(_VSTD::forward<_Fp>(__f)));
    _VSTD::thread(&__async_assoc_state<_Rp, _Fp>::__execute, __h.get()).detach();
    return future<_Rp>(__h.get());
As you can see, no 'explicit' thread pool is used internally.
Moreover, as you can read here, the libstdc++ implementation shipping with gcc 5.4.0 also just invokes a plain thread.
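If you want to probe a particular implementation for the starvation scenario from the question, something like the following sketch could be used. The function name, the number of blocked tasks and the 5 second timeout are arbitrary choices made for this example, not anything the standard prescribes.

```cpp
#include <chrono>
#include <future>
#include <vector>

// Hypothetical probe: starts `long_tasks` tasks that block until released,
// then checks whether a short task still completes in the meantime.
bool short_task_finishes_while_long_tasks_block(int long_tasks) {
    std::promise<void> release;
    std::shared_future<void> gate = release.get_future().share();

    std::vector<std::future<void>> blocked;
    for (int i = 0; i < long_tasks; ++i)
        blocked.push_back(std::async(std::launch::async, [gate] { gate.wait(); }));

    // The short task; on a starving implementation it would sit in a queue.
    std::future<int> quick = std::async(std::launch::async, [] { return 42; });
    const bool finished =
        quick.wait_for(std::chrono::seconds(5)) == std::future_status::ready;

    release.set_value();  // let the long-running tasks finish
    for (std::future<void>& b : blocked) b.get();
    return finished;
}
```

On an implementation that starts one thread per call, as the libc++ and libstdc++ code above does, this should return true for any reasonable number of blocked tasks. A false result would indicate that the short task was queued behind the long ones.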
Yes, MSVC's std::async seems to have exactly that property, at least as of MSVC2015.
I don't know if they fixed it in a 2017 update.
This is against the spirit of the standard. However, the standard is extremely vague about thread forward progress guarantees (at least as of C++14). So while std::async must behave as if it wraps a std::thread, the guarantees on std::thread forward progress are sufficiently weak that this isn't much of a guarantee under the as-if rule.
In practice, this has led me to replace std::async in my thread pool implementations with raw calls to std::thread, as raw use of std::thread in MSVC2015 doesn't appear to have that problem.
I find that a thread pool (with a task queue) is far more practical than raw calls to either std::async or std::thread, and as it is really easy to write a thread pool with either std::thread or std::async, I'd advise writing one with std::thread.
Your thread pool can return std::futures just like std::async does (but without the auto-blocking on destruction feature, as the pool itself manages the thread lifetimes).
I have read that C++17 added better forward progress guarantees, but I lack sufficient understanding to conclude if MSVC's behavior is now against the standard requirements.
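The "auto-blocking on destruction" mentioned above refers to futures obtained from std::async. A small sketch of that behavior follows; the function name and the 50 ms delay are only illustrative.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical check: the destructor of a future returned by std::async
// (with std::launch::async) waits for the task to complete.
bool async_future_blocks_on_destruction() {
    std::atomic<bool> done{false};
    {
        std::future<void> f = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            done = true;
        });
        // get() and wait() are deliberately not called here.
    }  // ~future blocks here until the task has finished
    return done.load();
}
```

A future taken from a std::packaged_task or std::promise that you run on your own thread does not block this way, which is why a hand-written pool has to manage thread lifetimes itself.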

Are std::signal and std::raise thread-safe?

The C and C++ standards support the concept of signal. However, the C11 standard says that the function signal() cannot be called in multi-threaded environments, or the behavior is undefined. But I think the signal mechanism is by nature for multi-threaded environments.
A quote from the C11 standard, 7.14.1.1, paragraph 7:
"Use of this function in a multi-threaded program results in undefined behavior. The implementation shall behave as if no library function calls the signal function."
Any explanations about this?
The following code should be self-explanatory.
#include <thread>
#include <csignal>
using namespace std;
void SignalHandler(int)
{
    // Which thread context here?
}
void f()
{
    // Running in another thread context.
    raise(SIGINT); // Is this call safe?
}
int main()
{
    // Register the signal handler in main thread context.
    signal(SIGINT, SignalHandler);
    thread(f).join();
}
"But I think the signal mechanism is by nature for multi-threaded environments."
I think this sentence is the central misunderstanding. signal() is a method for inter-process communication, not for inter-thread. Threads share common memory and can therefore communicate via mutexes and control structures. Processes don't have common memory and must make do with some explicit communication structures like signal() or the filesystem.
I think you're confusing signaling, which is process specific, with communication between threads. If it is sharing information between threads that you're after, you will probably find what you want in the new C++11 thread support library. Of course, it depends on what you really want to do.
From what I can tell of your code, you want a thread to "signal" an event in some way and you want to be able to run some code when that event is signalled. Given that, I'd take a closer look at the Futures section in the thread support library.
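As a rough sketch of that suggestion, a worker thread can "signal" an event through a std::promise while another thread waits on the matching std::future. The function name and the payload value 42 are placeholders for this example.

```cpp
#include <future>
#include <thread>

// Hypothetical example: the worker reports an event (with an int payload)
// to the thread that is waiting for it.
int wait_for_event_from_worker() {
    std::promise<int> event;
    std::future<int> signalled = event.get_future();

    std::thread worker([&event] {
        event.set_value(42);  // "signal" the event
    });

    const int value = signalled.get();  // blocks until the event is signalled
    worker.join();                      // join before `event` goes out of scope
    return value;
}
```

The code that should run "when the event is signalled" simply goes after the call to get().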
The C11 standard's statement that "Use of this function in a multi-threaded program results in undefined behavior," refers specifically to the function signal(). So the question is if the use of signal() is done "in a multi-threaded program."
The term 'multi-threaded program' isn't defined in the C standard as far as I can tell, but I would take it to mean a program in which multiple threads of execution have been created and have not completed. That would mean that at the time signal() is called in your example program the program is not multi-threaded and therefore the program's behavior is not undefined under this requirement.
(However C++11 requires that "All signal handlers shall have C linkage," [18.10 Other runtime support [support.runtime] p9]. Since your example program uses a handler with C++ linkage the behavior is undefined.)
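For reference, a handler that satisfies the linkage requirement could look like the sketch below. It stays single-threaded on purpose, so the multi-threading question does not arise; the names on_sigint, g_signal_seen and raise_and_observe are made up for this example.

```cpp
#include <csignal>

namespace {
// A handler may portably write only to volatile std::sig_atomic_t objects
// (or lock-free atomics), so the flag uses that type.
volatile std::sig_atomic_t g_signal_seen = 0;
}

// The handler has C linkage, as the quoted paragraph requires.
extern "C" void on_sigint(int) {
    g_signal_seen = 1;
}

// Installs the handler and raises SIGINT from the same (only) thread.
bool raise_and_observe() {
    g_signal_seen = 0;
    if (std::signal(SIGINT, on_sigint) == SIG_ERR) return false;
    std::raise(SIGINT);            // the handler runs before raise() returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return g_signal_seen == 1;
}
```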
As others have pointed out, signals aren't intended for communication between threads. For example, the C and C++ standards don't even specify what thread signal handlers run on. The standard library instead provides other tools for inter-thread communication, such as mutexes, atomics, etc.
I think you just misinterpret the term undefined behavior, which is unfortunately much overloaded to mean "bad things will happen". Here the term really just means what it says: the C standard doesn't make any assumption about what it means to use signal in a multi-threaded context.
In general the signal/raise interface in the C standard is not very useful by itself, but only a placeholder for platform/OS specific things that are defined on top of it.
So the standard doesn't give you a contract for the interaction between signal and threads. Stated otherwise, the interaction of signal and threads is left to the platform implementation.