Thread pool for std::async - c++

In the documentation of std::async it's mentioned that (emphasis mine):
The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
Is it possible to specify a thread pool for calls to async? If not, can I somehow check whether a thread pool was used?

Standard std::async does not offer this level of control, although an implementation may provide it as a non-standard guarantee or extension.
This gap is recognized: the proposal https://wg21.link/p0443 aims to address it.
For now, the best option is probably to write your own thread pool using std::thread and std::packaged_task.
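As a rough illustration of that suggestion, here is a minimal sketch of a pool built from std::thread and std::packaged_task. The class name ThreadPool and the member function submit are hypothetical names chosen for this example, not part of any standard or proposed API, and the sketch leaves out features a production pool would want (task arguments, cancellation, work stealing).

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool; the names are illustrative only.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t thread_count) {
        for (std::size_t i = 0; i < thread_count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers finish all queued jobs, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues f for execution and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only but std::function requires a copyable
        // callable, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    // Worker loop: wait for a job, run it outside the lock, repeat.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // exceptions end up in the future, not in this thread
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With this sketch, something like ThreadPool pool(2); auto f = pool.submit([] { return 6 * 7; }); gives a std::future<int> whose get() returns 42, and the caller decides exactly which pool runs the task.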

In a thread which never calls asio::io_context.run(), must I invoke post() to dispatch tasks to the thread which has called io_context.run()?

In a thread which has not ever called and would never call asio::io_context.run(), must I invoke post() or dispatch() to dispatch tasks to the thread which has called asio::io_context.run()?
Is it safe to directly call asio::async_write() or asio::async_read() in a thread which has not ever called and would never call asio::io_context.run() to dispatch tasks to the thread which has called asio::io_context.run()?
In a thread which has not ever called and would never call asio::io_context.run(), must I invoke post() or dispatch() to dispatch tasks to the thread which has called asio::io_context.run()?
That's basically how that works. Yes, it's also a "cheap" way to implement a task queue (see e.g. stackoverflow.com/questions/…)
To the first part: yes. (You can replace "must" with "can")
Is it safe to directly call asio::async_write() or asio::async_read() in a thread which has not ever called and would never call asio::io_context.run() to dispatch tasks to the thread which has called asio::io_context.run()?
Yes (with caveats).
The caveats are that you're responsible for thread safety/synchronization. E.g. a tcp::socket object is not thread-safe. You should only call methods on it from one logical thread (e.g. a strand) or inside a critical section (e.g. guarded by a mutex).
The async initiation functions get the work onto the execution context (which may be run() on any number of threads). From there, it is highly idiomatic for all subsequent async initiation functions to be called from completion handlers, that is, from those threads already.
Note that none of this is magical. In fact, all the async_ initiation functions know the executor (associated with the IO object, usually) and this determines where the completion handler gets post/dispatch/defer-ed to. In some cases you want to override this (e.g. using strand.wrap() or the newer bind_executor() function).
See also When must you pass io_context to boost::asio::spawn? (C++)

Why is there no thread pool in C++ standard library? [duplicate]

Since C++11 there has been a surge in the amount of parallel/concurrent programming tools in C++: threads, async functions, parallel algorithms, coroutines… But what about a popular parallel programming pattern: thread pool?
As far as I can see, nothing in the standard library implements this directly. Threading via std::thread can be used to implement a thread pool, but this requires manual labor. Asynchronous function via std::async can be launched either in a new thread (std::launch::async) or in the calling thread (std::launch::deferred).
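To make the two launch policies concrete, here is a small sketch that compares thread ids. The helper name runs_on_calling_thread is made up for this example. It relies only on the documented behavior that a deferred task runs on the thread that calls get(), while an async task runs as if on a new thread.

```cpp
#include <future>
#include <thread>

// Returns true if a task launched with the given policy ran on the thread
// that called this function (hypothetical helper, for illustration only).
bool runs_on_calling_thread(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    // For std::launch::deferred the task runs right here, inside get().
    return f.get() == caller;
}
```

Calling it with std::launch::deferred should return true, and with std::launch::async it should return false, since the calling thread is still alive while the task runs elsewhere.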
I think std::async could've been easily made to support thread pooling: via another launch policy (std::launch::thread_pool) which executes the task in an implicitly created global thread pool; or there could be a std::thread_pool object plus an overload of std::async which takes a thread pool.
Was something like this considered, and if so, why was it rejected? Or is there a standard solution that I am missing?
In principle std::async could use a thread pool, and it seems to me that allowing this was the intention. But in practice the existence of thread_local makes it difficult.
From cppreference on std::async with std::launch::async:
[...] execute the callable object f on a new thread of execution (with all thread-locals initialized) as if spawned by std::thread(std::forward<F>(f), std::forward<Args>(args)...) [...]
If the function contains any local thread_local variables, and if std::async used a thread pool, the behavior of running the function with std::async could differ from the behavior of running it with std::thread.
One example is that a thread_local might not have its initial value the second time the function is called on the same thread. If you were to use std::thread instead, it would always have the initial value.
Another way the behavior would diverge is that the destructors of thread_local objects would not run in the same way for std::async and std::thread. This is illustrated by Microsoft's attempt at using a thread pool, which I suspect scared off others from trying it. You can read about this non-conformance here: In Visual Studio, thread_local variables' destructor not called when used with std::async, is this a bug?
To be completely conforming, the implementation would need to "reset" all the thread_local objects between tasks. This requires compiler support and starts to look an awful lot like starting a new thread anyway.
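The first divergence described above can be reproduced with plain std::thread, without depending on how any particular std::async is implemented. In this sketch all function names are invented for illustration; the "reused" case simply calls the function twice on one thread, which is what a pooled worker would effectively do.

```cpp
#include <thread>

// Counts how many times it has been called on the current thread.
int calls_on_this_thread() {
    thread_local int count = 0;  // initialized once per thread
    return ++count;
}

// Runs the function once on a fresh std::thread and returns what it saw.
int call_on_fresh_thread() {
    int seen = 0;
    std::thread t([&seen] { seen = calls_on_this_thread(); });
    t.join();
    return seen;
}

// Runs the function twice on one thread, as a pooled worker would when it
// picks up two tasks in a row, and returns what the second call saw.
int second_call_on_reused_thread() {
    int seen = 0;
    std::thread t([&seen] {
        calls_on_this_thread();
        seen = calls_on_this_thread();
    });
    t.join();
    return seen;
}
```

Every call to call_on_fresh_thread() returns 1, whereas second_call_on_reused_thread() returns 2: the second "task" observes state left behind by the first, which is exactly what the as-if-new-thread wording rules out.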

C++ Which thread pool is cppreference.com talking about?

I was reading the description of std::async at cppreference.com.
The first description says:
The template function async runs the function f asynchronously (potentially in a separate thread which may be part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
[cppreference: std::async]
What is the thread pool cppreference.com is talking about?
I read the standard draft N4713 (C++17) and there is no mention of possible thread pool usage.
I also know that there is no thread pool in standard C++ as of now.
cppreference and the C++ standard are in fact at odds about this. cppreference says this (emphasis and strikethrough mine: "potentially" is struck through and replaced by "optionally"):
The template function async runs the function f asynchronously (potentially optionally in a separate thread which may be part of a thread pool).
Whereas the C++ standard says this:
If launch::async is set in policy, [std::async] calls [the function f] as if in a new thread of execution ...
And these are clearly two different things.
Only Microsoft's implementation of std::async (MSVC) uses a thread pool AFAIK, while gcc and clang start a new thread for every invocation of std::async (when launch::async is set in policy), and thus follow the standard.
More analysis here: https://stackoverflow.com/a/50898570/5743288
Purely hypothetical. cppreference is trying to tell you that the standard allows the task to be executed in a thread pool (as opposed to launching a new thread to execute it). Although the standard may not explicitly allow it, nothing prohibits it either.
I am not aware of any implementation which would use a thread pool for std::async.

Can long-running std::asyncs starve other std::asyncs?

As I understand it, usual implementations of std::async schedule these jobs on threads from a pre-allocated thread pool.
So let's say I first create and schedule enough long-running std::asyncs to keep all threads from that thread pool occupied. Directly afterwards (long before they have finished executing) I also create and schedule some short-running std::asyncs. Could it happen that the short-running ones aren't executed at all until at least one of the long-running ones has finished? Or is there some guarantee in the standard (specifically C++11) that prevents this kind of situation (like spawning more threads so that the OS can schedule them in a round-robin fashion)?
The standard reads:
[futures.async#3.1] If launch::async is set in policy, calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) ([func.require], [thread.thread.constr]) as if in a new thread of execution represented by a thread object with the calls to DECAY_COPY being evaluated in the thread that called async. [...]
So, under the as-if rule, new threads must be spawned when async() is invoked with the async launch policy. Of course, an implementation may use a thread pool internally but, usual thread creation overhead aside, no special 'starving' can occur. Moreover, things like the initialization of thread locals should always happen.
In fact, clang libc++ trunk async implementation reads:
    unique_ptr<__async_assoc_state<_Rp, _Fp>, __release_shared_count>
        __h(new __async_assoc_state<_Rp, _Fp>(_VSTD::forward<_Fp>(__f)));
    _VSTD::thread(&__async_assoc_state<_Rp, _Fp>::__execute, __h.get()).detach();
    return future<_Rp>(__h.get());
as you can see, no 'explicit' thread pool is used internally.
Moreover, as you can read here, the libstdc++ implementation shipping with gcc 5.4.0 also just invokes a plain thread.
Yes, MSVC's std::async seems to have exactly that property, at least as of MSVC2015.
I don't know if they fixed it in a 2017 update.
This is against the spirit of the standard. However, the standard is extremely vague about thread forward progress guarantees (at least as of C++14). So while std::async must behave as if it wraps a std::thread, the guarantees on std::thread forward progress are sufficiently weak that this isn't much of a guarantee under the as-if rule.
In practice, this has led me to replace std::async in my thread pool implementations with raw calls to std::thread, as raw use of std::thread in MSVC2015 doesn't appear to have that problem.
I find that a thread pool (with a task queue) is far more practical than raw calls to either std::async or std::thread, and as it is really easy to write a thread pool with either std::thread or std::async, I'd advise writing one with std::thread.
Your thread pool can return std::futures just like std::async does (but without the auto-blocking on destruction feature, as the pool itself manages the thread lifetimes).
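The "auto-blocking on destruction" difference mentioned above can be seen in a short sketch. The function names are hypothetical. The first function relies on the rule that the destructor of a future obtained from std::async waits for the task; the second shows that a future obtained from a std::packaged_task (which is what a pool would typically hand out) does not wait.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Returns true if the task had finished by the time the future returned by
// std::async had been destroyed.
bool async_future_waits_on_destruction() {
    std::atomic<bool> done{false};
    {
        auto f = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            done = true;
        });
    }  // f's destructor blocks here until the task completes
    return done;
}

// Returns true if the task had already run when its future was destroyed.
bool packaged_task_future_waits_on_destruction() {
    std::atomic<bool> done{false};
    std::packaged_task<void()> task([&done] { done = true; });
    {
        std::future<void> f = task.get_future();
    }  // returns immediately; nothing has run the task yet
    const bool ran_before_destruction = done;
    task();  // whoever owns the task (e.g. a pool worker) runs it later
    return ran_before_destruction;
}
```

The first function returns true and the second returns false, so a pool that returns packaged_task futures lets callers drop a future without stalling, at the cost of the pool having to manage thread lifetimes itself.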
I have read that C++17 added better forward progress guarantees, but I lack sufficient understanding to conclude if MSVC's behavior is now against the standard requirements.

C++1z coroutine threading context and coroutine scheduling

Per this latest C++ TS: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4628.pdf, and based on the understanding of C# async/await language support, I'm wondering what is the "execution context" (terminology borrowed from C#) of the C++ coroutines?
My simple test code in Visual C++ 2017 RC reveals that coroutines seem to always execute on a thread pool thread, and little control is given to the application developer over which threading context the coroutines execute in. For example, could an application force all the coroutines (with the compiler-generated state machine code) to be executed on the main thread only, without involving any thread pool thread?
In C#, SynchronizationContext is a way to specify the "context" where all the coroutine "halves" (compiler generated state machine code) will be posted and executed, as illustrated in this post: https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/, while the current coroutine implementation in Visual C++ 2017 RC seems to always rely on the concurrency runtime, which by default executes the generated state machine code on a thread pool thread. Is there a similar concept of synchronization context that the user application can use to bind coroutine execution to a specific thread?
Also, what's the current default "scheduler" behavior of the coroutines as implemented in Visual C++ 2017 RC? I.e. 1) how exactly is a wait condition specified? and 2) when a wait condition is satisfied, who invokes the "bottom half" of the suspended coroutine?
My (naive) speculation regarding Task scheduling in C# is that C# "implements" the wait condition purely by task continuation. A wait condition is synthesized by a task owned by a TaskCompletionSource, and any code that needs to wait is chained as a continuation to it. When the wait condition is satisfied, e.g. when a full message is received from the low-level network handler, it calls TaskCompletionSource.SetResult, which transitions the underlying task to the completed state and effectively allows the chained continuation to start executing (moving the task from the created state to the ready state/list). For C++ coroutines, I'm speculating that std::future and std::promise would be used as a similar mechanism (std::future being the task and std::promise being the TaskCompletionSource; the usage is surprisingly similar too!). So does the C++ coroutine scheduler, if any, rely on some similar mechanism to carry out this behavior?
[EDIT]: After doing some further research, I was able to code a very simple yet very powerful abstraction called awaitable that supports single-threaded, cooperative multitasking and features a simple thread_local-based scheduler, which can execute coroutines on the thread on which the root coroutine was started. The code can be found in this GitHub repo: https://github.com/llint/Awaitable
Awaitable is composable in that it maintains correct invocation ordering at nested levels. It features primitive yielding, timed waits, and setting ready from somewhere else, and very complex usage patterns can be derived from these (such as infinitely looping coroutines that only get woken up when certain events happen). The programming model follows the C# Task-based async/await pattern closely. Feedback is welcome.
The opposite!
C++ coroutines are all about control. The key point here is the void await_suspend(std::experimental::coroutine_handle<> handle) function.
Every co_await expects an awaitable type. In a nutshell, an awaitable type is a type which provides these three functions:
bool await_ready() - is the result already available? If it returns true, the coroutine is not suspended; if it returns false, it is.
void await_suspend(handle) - the program passes you a continuation context for that coroutine frame. If you activate the handle (for example, by calling the operator() that the handle provides), the current thread resumes the coroutine immediately.
T await_resume() - tells the thread which resumes the coroutine what to do when resuming it and what to return from co_await.
So when you call co_await on an awaitable type, the program asks the awaitable whether the coroutine should be suspended (that is, whether await_ready returns false), and if so, you get a coroutine handle with which you can do whatever you like.
For example, you can pass the coroutine handle to a thread pool. In this case a thread-pool thread will resume the coroutine.
You can pass the coroutine handle to a simple std::thread: the thread you created will resume the coroutine.
You can attach the coroutine handle to a class derived from OVERLAPPED and resume the coroutine when the asynchronous IO finishes.
As you can see, you control where and when the coroutine is suspended and resumed by managing the coroutine handle passed to await_suspend. There is no "default scheduler": how you implement your awaitable type decides how the coroutine is scheduled.
So, what happens in VC++? Unfortunately, std::future still doesn't have a then function, so you can't pass the coroutine handle to a std::future. If you await on a std::future, the program will just start a new thread. Look at the source code in the future header:
    template<class _Ty>
    void await_suspend(future<_Ty>& _Fut,
        experimental::coroutine_handle<> _ResumeCb)
    {   // change to .then when future gets .then
        thread _WaitingThread([&_Fut, _ResumeCb]{
            _Fut.wait();
            _ResumeCb();
        });
        _WaitingThread.detach();
    }
So why did you see a Win32 thread-pool thread if the coroutines are launched on a regular std::thread? Because it wasn't the coroutine. Behind the scenes, std::async calls concurrency::create_task, and a concurrency::task is launched on the Win32 thread pool by default. After all, the whole purpose of std::async is to launch the callable on another thread.