C++1z coroutine threading context and coroutine scheduling


Per this latest C++ TS: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4628.pdf, and based on the understanding of C# async/await language support, I'm wondering what is the "execution context" (terminology borrowed from C#) of the C++ coroutines?
My simple test code in Visual C++ 2017 RC reveals that coroutines seem to always execute on a thread pool thread, and the application developer is given little control over the threading context in which the coroutines execute. For example, could an application force all the coroutines (with the compiler generated state machine code) to be executed on the main thread only, without involving any thread pool thread?
In C#, SynchronizationContext is a way to specify the "context" where all the coroutine "halves" (compiler generated state machine code) will be posted and executed, as illustrated in this post: https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/, while the current coroutine implementation in Visual C++ 2017 RC seems to always rely on the concurrency runtime, which by default executes the generated state machine code on a thread pool thread. Is there a similar concept of synchronization context that the user application can use to bind coroutine execution to a specific thread?
Also, what's the current default "scheduler" behavior of the coroutines as implemented in Visual C++ 2017 RC? That is: 1) how exactly is a wait condition specified? and 2) when a wait condition is satisfied, who invokes the "bottom half" of the suspended coroutine?
My (naive) speculation regarding Task scheduling in C# is that C# "implements" the wait condition purely by task continuation. A wait condition is synthesized by a task owned by a TaskCompletionSource, and any code logic that needs to wait is chained to it as a continuation. When the wait condition is satisfied, e.g. when a full message is received from the low level network handler, the handler calls TaskCompletionSource.SetResult, which transitions the underlying task to the completed state and effectively allows the chained continuation logic to start executing (moving the task from the created state into the ready state/list). For C++ coroutines, I'm speculating that std::future and std::promise would be used as a similar mechanism (std::future being the task and std::promise being the TaskCompletionSource; the usage is surprisingly similar too!). So does the C++ coroutine scheduler, if any, rely on some similar mechanism to carry out this behavior?
[EDIT]: after doing some further research, I was able to code a very simple yet very powerful abstraction called awaitable that supports single threaded and cooperative multitasking, and features a simple thread_local based scheduler, which can execute coroutines on the thread the root coroutine is started. The code can be found from this github repo: https://github.com/llint/Awaitable
Awaitable is composable in that it maintains correct invocation ordering at nested levels. It features primitive yielding, timed waits, and setting ready from somewhere else, and very complex usage patterns can be derived from these (such as infinitely looping coroutines that only get woken up when certain events happen). The programming model follows the C# Task based async/await pattern closely. Please feel free to give feedback.

The opposite!
C++ coroutines are all about control. The key point here is the void await_suspend(std::experimental::coroutine_handle<> handle) function.
Every co_await expects an awaitable type. In a nutshell, an awaitable type is a type which provides these three functions:
bool await_ready() - is the result already available? If it returns true, the coroutine continues without suspending; if it returns false, the coroutine is suspended.
void await_suspend(handle) - the program passes you a continuation context for that coroutine frame. If you activate the handle (for example, by calling the operator() that the handle provides), the current thread resumes the coroutine immediately.
T await_resume() - tells the thread which resumes the coroutine what to do when resuming it and what to return from co_await.
So when you call co_await on an awaitable type, the program asks the awaitable whether the coroutine should be suspended (that is, whether await_ready returns false), and if so, you get a coroutine handle with which you can do whatever you like.
For example, you can pass the coroutine handle to a thread pool. In this case a thread-pool thread will resume the coroutine.
You can pass the coroutine handle to a simple std::thread, and the thread you created will resume the coroutine.
You can attach the coroutine handle to a derived class of OVERLAPPED and resume the coroutine when the asynchronous IO finishes.
As you can see, you control where and when the coroutine is suspended and resumed by managing the coroutine handle passed to await_suspend. There is no "default scheduler": how you implement your awaitable type decides how the coroutine is scheduled.
So, what happens in VC++? Unfortunately, std::future still doesn't have a then function, so you can't pass the coroutine handle to a std::future. If you await on a std::future, the program will just start a new thread. Look at the source code in the future header:
template<class _Ty>
    void await_suspend(future<_Ty>& _Fut,
        experimental::coroutine_handle<> _ResumeCb)
    {   // change to .then when future gets .then
        thread _WaitingThread([&_Fut, _ResumeCb]{
            _Fut.wait();
            _ResumeCb();
        });
        _WaitingThread.detach();
    }
So why did you see a Win32 thread-pool thread if the coroutines are launched in a regular std::thread? That's because it wasn't the coroutine. Behind the scenes, std::async calls concurrency::create_task, and a concurrency::task is launched on the Win32 thread pool by default. After all, the whole purpose of std::async is to launch the callable in another thread.

Related

Thread pool for std::async

In the documentation of std::async it's mentioned that (emphasis mine):
The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool) and returns a std::future that will eventually hold the result of that function call.
Is it possible to specify a thread pool for calls to async? If not can I somehow check whether a thread pool was used or not?

The standard std::async lacks this level of control, although an implementation may provide it as a non-standard guarantee or extension.
The lack of this level of control is recognized: the https://wg21.link/p0443 proposal aims to address it.
For now the best option is probably to write your own thread pool using std::thread and std::packaged_task.
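As a rough illustration of that last suggestion, the sketch below builds a small fixed-size pool from std::thread and std::packaged_task. It assumes C++17; simple_pool and submit are hypothetical names, and the design (one shared queue, no work stealing, no exception handling beyond what packaged_task already does) is deliberately minimal.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal pool: n worker threads pulling jobs from one queue.
class simple_pool {
public:
    explicit simple_pool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { work(); });
    }

    ~simple_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // queued jobs are finished first
    }

    // Wraps f in a packaged_task and returns the future for its result.
    template <class F>
    auto submit(F f) -> std::future<std::invoke_result_t<F&>> {
        using R = std::invoke_result_t<F&>;
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

Unlike std::async, the caller here knows exactly which threads run the work: `simple_pool pool(4); auto f = pool.submit([] { return 42; });` always executes the lambda on one of the four pool threads.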

Scheduling a coroutine with a context

There are plenty of tutorials that explain how it's easy to use coroutines in C++, but I've spent a lot of time getting how to schedule "detached" coroutines.
Assume, I have the following definition of coroutine result type:
struct task {
    struct promise_type {
        auto initial_suspend() const noexcept { return std::suspend_never{}; }
        auto final_suspend() const noexcept { return std::suspend_never{}; }
        void return_void() const noexcept { }
        void unhandled_exception() const { std::terminate(); }
        task get_return_object() const noexcept { return {}; }
    };
};
And there is also a method that runs "detached" coroutine, i.e. runs it asynchronously.
/// Handler should have overloaded operator() returning task.
template<class Handler>
void schedule_coroutine(Handler &&handler) {
    std::thread([handler = std::forward<Handler>(handler)]() { handler(); }).detach();
}
Obviously, I can not pass lambda-functions or any other functional object that has a state into this method, because once the coroutine is suspended, the lambda passed into std::thread method will be destroyed with all the captured variables.
task coroutine_1() {
    std::vector<object> objects;
    // ...
    schedule_coroutine([objects]() -> task {
        // ...
        co_await something;
        // ...
        co_return;
    });
    // ...
    co_return;
}

int main() {
    // ...
    schedule_coroutine(coroutine_1);
    // ...
}
I think there should be a way to save the handler somehow (preferably near or within the coroutine promise) so that the next time the coroutine is resumed it won't try to access destroyed object data. But unfortunately I have no idea how to do it.

I think your problem is a general (and common) misunderstanding of how co_await coroutines are intended to work.
When a function performs co_await <expr>, this (generally) means that the function suspends execution until expr resumes its execution. That is, your function is waiting until some process completes (and typically returns a value). That process, represented by expr, is the one who is supposed to resume the function (generally).
The whole point of this is to make code that executes asynchronously look like synchronous code as much as possible. In synchronous code, you would do something like <expr>.wait(), where wait is a function that waits for the task represented by expr to complete. Instead of "waiting" on it, you "a-wait" or "asynchronously wait" on it. The rest of your function executes asynchronously relative to your caller, based on when expr completes and how it decides to resume your function's execution. In this way, co_await <expr> looks and appears to act very much like <expr>.wait().
Compiler Magic(tm) then goes in behind the scenes to make it asynchronous.
So the idea of launching a "detached coroutine" doesn't make sense within this framework. The caller of a coroutine function (usually) isn't the one who determines where the coroutine executes; it's the processes the coroutine invokes during its execution that decides that.
Your schedule_coroutine function really ought to just be a regular "execute a function asynchronously" operation. It shouldn't have any particular association with coroutines, nor any expectation that the given functor is or represents some asynchronous task or if it happens to invoke co_await. The function is just going to create a new thread and execute a function on it.
Just as you would have done pre-C++20.
If your task type represents an asynchronous task, then in proper RAII style, its destructor ought to wait until the task is completed before exiting. This includes any resumptions of coroutines scheduled by that task, throughout its entire execution; the task isn't done until it is entirely done. Therefore, if handler() in your schedule_coroutine call returns a task, then that task will be initialized and immediately destroyed. Since the destructor waits for the asynchronous task to complete, the thread will not die until the task is done. And since the thread's functor is copied/moved from the function object given to the thread constructor, any captures will continue to exist until the thread itself exits.

I hope I got you right, but I think there might be a couple of misconceptions here. First off, you clearly cannot detach a coroutine, that would not make any sense at all. But you can execute asynchronous tasks inside a coroutine for sure, even though in my opinion this defeats its purpose entirely.
But let's take a look at the second block of code you posted. Here you invoke std::async and forward a handler to it. Now, in order to prevent any kind of early destruction you should use std::move instead and pass the handler to the lambda so it will be kept alive for as long as the scope of the lambda function is valid. This should probably already answer your final question as well, because the place where you want this handler to be stored would be the lambda capture itself.
Another thing that bothers me is the usage of std::async. The call returns a std::future whose destructor blocks until the lambda has finished, but only when the task was launched with std::launch::async. If you do not pass a launch policy, the default is std::launch::async | std::launch::deferred, so the implementation may choose deferred execution, in which case the lambda fires lazily, only when you actually request the result by calling .get() or .wait() on the future.
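The difference between the two launch policies can be observed by recording which thread runs the callable. This is only a small sketch; launch_report and demo_launch_policies are made-up names.

```cpp
#include <cassert>
#include <future>
#include <thread>

struct launch_report {
    bool deferred_ran_on_caller;  // deferred work runs inside get(), on the caller
    bool async_ran_elsewhere;     // async work runs on a different thread
};

launch_report demo_launch_policies() {
    const std::thread::id caller = std::this_thread::get_id();

    // Nothing runs yet: the callable is stored until get() or wait().
    auto deferred = std::async(std::launch::deferred,
                               [] { return std::this_thread::get_id(); });

    // Starts running right away, as if on a new thread.
    auto eager = std::async(std::launch::async,
                            [] { return std::this_thread::get_id(); });

    return {deferred.get() == caller, eager.get() != caller};
}
```

Passing the policy explicitly avoids depending on which of the two behaviors the implementation picks by default.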
So, in your case and if you really wanted to use coroutines that way, I would suggest to use a std::thread instead and store it for a later join() somewhere pseudo-globally. But again, I don't think you would really want to use coroutines mechanics that way.

Your question makes perfect sense. The misunderstanding is that C++20 coroutines are actually generators, mistakenly occupying the coroutine header name.
Let me explain how generators work and then answer how to schedule detached coroutine.
How generators work
Your question "Scheduling a detached coroutine" then becomes "How to schedule a detached generator", and the answer is: it is not possible, because a special convention transforms a regular function into a generator function.
What is not obvious here is that yielding a value must take place inside the generator function body. When you want to call a helper function that yields a value for you, you can't. Instead you also make the helper function a generator and then await it instead of just calling it. This effectively chains generators and might feel like writing synchronous code that executes asynchronously.
In JavaScript the special convention is the async keyword. In Python the special convention is using yield instead of the return keyword.
C++20 coroutines are a low-level mechanism that allows implementing JavaScript-like async/await.
Nothing wrong with including this low-level mechanism in C++ language except placing it in header named coroutine.
How to schedule detached coroutine
This question makes sense if you want to have green threads or fibers and you are writing scheduler logic that uses symmetric or asymmetric coroutines to accomplish this.
Now others might ask: why should anyone bother with fibers(not windows fibers;) when you have generators? The answer is because you can have encapsulated concurrency and parallelism logic, meaning rest of your team isn't required to learn and apply additional mental gymnastics while working on the project.
The result is true asynchronous programming where the rest of the team write linear code, without callbacks and such, with simple concept of concurrency for example single spawn() library function, avoiding any locks/mutexes and other multithreading complexity.
The beauty of encapsulation is seen when all details are hidden in low level i/o methods. All context switching, scheduling, etc. happens deep inside i/o classes like Channel, Queue or File.
Everyone involved in async programming should experience working like this. The feeling is intense.
To accomplish this, instead of C++20 coroutines use Boost.Fiber, which includes a scheduler, or Boost.Context, which allows symmetric coroutines. Symmetric coroutines can suspend and switch to any other coroutine, while asymmetric coroutines suspend and resume the calling coroutine.

boost::coroutine2 vs CoroutineTS

Boost::Coroutine2 and CoroutineTS(C++20) are popular coroutine implementations in C++. Both suspend and resume, but the two implementations follow quite different approaches.
CoroutineTS(C++20)
Stackless
Suspend by return
Uses special keywords
generator<int> Generate()
{
    co_yield 1;
}
boost::coroutine2
Stackful
Suspend by call
Do not use special keywords
pull_type source([](push_type& sink)
{
    sink();
});
Are there any specific use cases where I should select only one of them?

The main technical distinction is whether you want to be able to yield from within a nested call. This cannot be done using stackless coroutines.
Another thing to consider is that stackful coroutines have a stack and context (such as signal masks, the stack pointer, the CPU registers, etc.) of their own, so they have a larger memory footprint than stackless coroutines. This can be an issue especially if you have a resource constrained system or massive amounts of coroutines existing simultaneously.
I have no idea how they compare performance-wise in the real world, but in general, stackless coroutines are more efficient, as they have less overhead (stackless task switches do not have to swap stacks, store/load registers, and restore the signal mask, etc.).
For an example of a minimal stackless coroutine implementation, see Simon Tatham's coroutines using Duff's Device. It is pretty intuitive that they are as efficient as you can get.
Also, this question has nice answers that go more into details about the differences between stackful and stackless coroutines.
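To show what "suspend by return" means without any language support, here is a small sketch in the spirit of the switch-based technique mentioned above. It is not Simon Tatham's code; countdown and collect_countdown are made-up names, and all state that must survive a suspension is kept in the object instead of on a stack.

```cpp
#include <cassert>
#include <vector>

// Hand-written stackless coroutine: yields 3, 2, 1 and then finishes.
struct countdown {
    int resume_point = 0;  // where to continue on the next call
    int i = 0;             // loop variable, kept in the object across suspensions

    // Returns true and sets `out` for each yielded value, false once finished.
    bool next(int& out) {
        switch (resume_point) {
        case 0:
            for (i = 3; i > 0; --i) {
                out = i;
                resume_point = 1;
                return true;  // "yield": suspend by returning to the caller
        case 1:;              // re-entry jumps back into the loop body
            }
        }
        resume_point = 2;     // no matching case: later calls skip the switch
        return false;
    }
};

std::vector<int> collect_countdown() {
    countdown c;
    std::vector<int> values;
    int v = 0;
    while (c.next(v)) values.push_back(v);
    return values;
}
```

A switch is just an integer compare and a jump, which is why this style has so little overhead. It also shows the limitation discussed here: next() can only yield from its own body, not from a function it calls.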
How to yield from a nested call in stackless coroutines? 
Even though I said it's not possible, that was not 100% true: You can use (at least two) tricks to achieve this, each with some drawbacks:
First, you have to convert every call that should be able to yield your calling coroutine into a coroutine as well. Now, there are two ways:
The trampoline approach: You simply call the child coroutine from the parent coroutine in a loop, until it returns. Every time you notify the child coroutine, if it does not finish, you also yield the calling coroutine. Note that this approach forbids calling the child coroutine directly, you always have to call the outermost coroutine, which then has to re-enter the whole callstack. This has a call and return complexity of O(n) for nesting depth n. If you are waiting for an event, the event simply has to notify the outermost coroutine.
The parent link approach: You pass the parent coroutine address to the child coroutine, yield the parent coroutine, and the child coroutine manually resumes the parent coroutine once it finishes. Note that this approach forbids calling any coroutine besides the inner-most coroutine directly. This approach has a call and return complexity of O(1), so it is generally preferable. The drawback is that you have to manually register the innermost coroutine somewhere, so that the next event that wants to resume the outer coroutine knows which inner coroutine to directly target.
Note: By call and return complexity I mean the number of steps taken when notifying a coroutine to resume it, and the steps taken after notifying it to return to the calling notifier again.

In what context will resumable functions execute in C++14?

One of the proposals for C++14 is Resumable Functions which gives C++ what is available in C# today with the async/await mechanisms. The basic idea is that a function can be paused while waiting for an asynchronous operation to complete. When the asynchronous operation completes the function can be resumed where it was paused. This is done in a non-blocking way so that the thread from which the resumable function was invoked will not be blocked.
It is not obvious to me in which context (thread) the function will be resumed. Will it be resumed by the thread from which the function was paused (this is how it is done in C# as I understand it) or does it use another thread?
If it is resumed by the thread from which it was paused, does the thread have to be put in some special state or will the scheduler handle this?

To quote from N3564:
After suspending, a resumable function may be resumed by the scheduling logic of the runtime and will eventually complete its logic, at which point it executes a return statement (explicit or implicit) and sets the function’s result value in the placeholder.
It should thus be noted that there is an asymmetry between the function’s observed behavior from the outside (caller) and the inside: the outside perspective is that function returns a value of type future at the first suspension point, while the inside perspective is that the function returns a value of type T via a return statement, functions returning future/shared_future behaving somewhat different still.
A resumable function may continue execution on another thread after resuming following a suspension of its execution.
This essentially means that
When first called, a resumable function executes in the thread context of its caller.
After each suspension point, the implementation can freely choose on which thread to continue the execution of a resumable function
From the perspective of the calling code, a resumable function works like an asynchronous function, where part of the (observable) behaviour is reliably executed by the time the function call returns, but the final result might not be in yet (the returned future<T> does not have to be in a ready state).
As a programmer, you don't have to jump through hoops to get a resumable function to resume.

C++ pthread - How to cancel a thread?

I have a pthread that I created, and now I want the thread to execute some code at a specific time interval. But the user should also be able to cancel the thread. How can I cancel a thread and ensure that the thread is not cancelled while it executes the code?
In Java you handle this with
while(!isInterrupted)
Is there any similar solution with pthreads?

In the Question's example code you are checking some variable. This is not the normal pattern for interrupting threads in Java.
In Java, you interrupt a thread by calling the interrupt() method.
The thread then checks if it is interrupted inside IO and system calls (which can throw InterruptedException when this happens; this means a thread that is sleeping or waiting on IO can be awoken when interrupted) or by sampling the isInterrupted() flag (typically used in a condition in a loop, as in Question).
The distinction is important; checking some flag variable you've declared is only possible in loops and your own code; the Java interrupting system works for all threads and all non-CPU-blocking code without special effort on the part of the programmer.
Pthreads has the pthread_cancel() pattern which works like the Java interrupting pattern.

pthread_cancel is available for sending cancel requests:
A thread's cancellation type, determined by pthread_setcanceltype(3), may be
either asynchronous or deferred (the default for new threads). Asynchronous
cancelability means that the thread can be canceled at any time (usually
immediately, but the system does not guarantee this). Deferred cancelability
means that cancellation will be delayed until the thread next calls a function
that is a cancellation point. A list of functions that are or may be
cancellation points is provided in pthreads(7).
A thread's cancelability state, determined by pthread_setcancelstate(3), can
be enabled (the default for new threads) or disabled. If a thread has
disabled cancellation, then a cancellation request remains queued until the
thread enables cancellation. If a thread has enabled cancellation, then its
cancelability type determines when cancellation occurs.
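Based on the man page text above, one way to make sure the periodic work is never cut short is to disable cancellation around it and re-enable it before sleeping. The sketch below assumes a POSIX system (tested mentally against glibc semantics); periodic_worker, sleep_ms, demo_cancel and the two counters are made-up names.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <time.h>

std::atomic<int> work_started{0};
std::atomic<int> work_finished{0};

static void sleep_ms(long ms) {
    timespec ts{ms / 1000, (ms % 1000) * 1000000L};
    nanosleep(&ts, nullptr);  // nanosleep is a cancellation point
}

extern "C" void* periodic_worker(void*) {
    for (;;) {
        int old_state = 0;
        // A cancel request arriving from here on stays pending.
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        ++work_started;   // stands in for the code that must not be interrupted
        ++work_finished;
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_state);
        pthread_testcancel();  // act on a pending request right away
        sleep_ms(1);           // the interval; the thread can be cancelled here too
    }
}

bool demo_cancel() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, periodic_worker, nullptr) != 0) return false;
    while (work_finished.load() == 0) sleep_ms(1);  // let it run at least once
    pthread_cancel(tid);
    void* result = nullptr;
    pthread_join(tid, &result);
    // Cancelled, and never in the middle of the protected section.
    return result == PTHREAD_CANCELED && work_started.load() == work_finished.load();
}
```

With the default deferred cancellation type, the thread can only be cancelled at pthread_testcancel() or inside the sleep, so every started unit of work is also finished.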

So there are several options:
1: checking a flag in the while condition (works very well, but you don't have much control).
2: pthread_cancel; check its man page. It works too, but with strict rules.
3: a condition variable (pthread_cond_wait and pthread_cond_signal): the thread first has to block, then another thread signals it to resume. It has the same issues as the second option.
Blocking only works from within the thread that must be blocked, so set a variable to tell that thread to block itself. Unblocking can be done by any other thread.
The same can be done using mutexes or semaphores (pthread_mutex_t, sem_t).
A site I recommend: http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html

You can use pthread_cancel to cancel the thread, as mentioned (but I would advise against it, unless you know what you're doing), and you have to set up your own timers. But while(!isInterrupted) is a pretty acceptable way of doing it.
It should basically be like this:
while(!isInterrupted)
{
// whatever you want to do
sleep(howLongYouWantToWait);
}
// clean up and exit the thread function here
and in the main thread have a global (or other, see below)
volatile bool isInterrupted = false;
and set it to true when you're done, and pthread_join if you want to wait for the thread to finish.
Instead of global, you can use a class variable, or a flag pointer passed to the thread function, or any other way, global is the simplest and the least preferable.
Of course, if you want to cancel the thread while it waits, and not to have it canceled only after it finishes the whole loop, then you need to deal with signals, and other stuff, but I think you're not looking for that.
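For the case where the thread should also stop promptly while it is waiting, one option that avoids signals is to replace sleep() with a timed wait on a condition variable. The sketch below uses the C++ standard library on top of pthreads rather than the raw pthread API, and guards the flag with a mutex because a plain volatile bool is not a synchronization primitive in C++. periodic_job and demo_interruptible_wait are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class periodic_job {
public:
    void start(std::chrono::milliseconds interval) {
        worker_ = std::thread([this, interval] {
            std::unique_lock<std::mutex> lock(m_);
            while (!stop_) {
                ++runs_;  // "whatever you want to do" (kept under the lock for brevity)
                // Waits for `interval`, but returns early as soon as stop_ is set.
                cv_.wait_for(lock, interval, [this] { return stop_; });
            }
            // clean up and exit the thread function here
        });
    }

    // Requests the stop, waits for the thread, and reports how often it ran.
    int stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
        return runs_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    int runs_ = 0;
    std::thread worker_;
};

bool demo_interruptible_wait() {
    periodic_job job;
    const auto t0 = std::chrono::steady_clock::now();
    job.start(std::chrono::minutes(1));  // a long interval on purpose
    const int runs = job.stop();         // must not take a minute
    const auto elapsed = std::chrono::steady_clock::now() - t0;
    return runs <= 1 && elapsed < std::chrono::seconds(30);
}
```

The work itself is never interrupted halfway: the stop request is only observed at the top of the loop or during the wait, which matches the behavior asked for in the question.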
