There are plenty of tutorials that explain how it's easy to use coroutines in C++, but I've spent a lot of time getting how to schedule "detached" coroutines.
Assume, I have the following definition of coroutine result type:
struct task {
struct promise_type {
auto initial_suspend() const noexcept { return std::suspend_never{}; }
auto final_suspend() const noexcept { return std::suspend_never{}; }
void return_void() const noexcept { }
void unhandled_exception() const { std::terminate(); }
task get_return_object() const noexcept { return {}; }
};
};
And there is also a method that runs "detached" coroutine, i.e. runs it asynchronously.
/// Handler should have overloaded operator() returning task.
template<class Handler>
void schedule_coroutine(Handler &&handler) {
std::thread([handler = std::forward<Handler>(handler)]() { handler(); }).detach();
}
Obviously, I can not pass lambda-functions or any other functional object that has a state into this method, because once the coroutine is suspended, the lambda passed into std::thread method will be destroyed with all the captured variables.
task coroutine_1() {
std::vector<object> objects;
// ...
schedule_coroutine([objects]() -> task {
// ...
co_await something;
// ...
co_return;
});
// ...
co_return;
}
int main() {
// ...
schedule_coroutine(coroutine_1);
// ...
}
I think there is should be a way to save the handler somehow (preferably near or within the coroutine promise) so that the next time coroutine is resumed it won't try to access to the destroyed object data. But unfortunately I have no idea how to do it.
I think your problem is a general (and common) misunderstanding of how co_await coroutines are intended to work.
When a function performs co_await <expr>, this (generally) means that the function suspends execution until expr resumes its execution. That is, your function is waiting until some process completes (and typically returns a value). That process, represented by expr, is the one who is supposed to resume the function (generally).
The whole point of this is to make code that executes asynchronously look like synchronous code as much as possible. In synchronous code, you would do something like <expr>.wait(), where wait is a function that waits for the task represented by expr to complete. Instead of "waiting" on it, you "a-wait" or "asynchronously wait" on it. The rest of your function executes asynchronously relative to your caller, based on when expr completes and how it decides to resume your function's execution. In this way, co_await <expr> looks and appears to act very much like <expr>.wait().
Compiler Magictm then goes in behind the scenes to make it asynchronous.
So the idea of launching a "detached coroutine" doesn't make sense within this framework. The caller of a coroutine function (usually) isn't the one who determines where the coroutine executes; it's the processes the coroutine invokes during its execution that decides that.
Your schedule_coroutine function really ought to just be a regular "execute a function asynchronously" operation. It shouldn't have any particular association with coroutines, nor any expectation that the given functor is or represents some asynchronous task or if it happens to invoke co_await. The function is just going to create a new thread and execute a function on it.
Just as you would have done pre-C++20.
If your task type represents an asynchronous task, then in proper RAII style, its destructor ought to wait until the task is completed before exiting (this includes any resumptions of coroutines scheduled by that task, throughout the entire execution of said task. The task isn't done until it is entirely done). Therefore, if handler() in your schedule_coroutine call returns a task, then that task will be initialized and immediately destroyed. Since the destructor waits for the asynchronous task to complete, the thread will not die until the task is done. And since the thread's functor is copied/moved from the function object given to the thread constructor, any captures will continue to exist until the thread itself exits.
I hope I got you right, but I think there might be a couple of misconceptions here. First off, you clearly cannot detach a coroutine, that would not make any sense at all. But you can execute asynchronous tasks inside a coroutine for sure, even though in my opinion this defeats its purpose entirely.
But let's take a look at the second block of code you posted. Here you invoke std::async and forward a handler to it. Now, in order to prevent any kind of early destruction you should use std::move instead and pass the handler to the lambda so it will be kept alive for as long as the scope of the lambda function is valid. This should probably already answer your final question as well, because the place where you want this handler to be stored would be the lambda capture itself.
Another thing that bothers me is the usage of std::async. The call will return a std::future-kind of type that will block until the lambda has been executed. But this will only happen if you set the launch type to std::launch::async, otherwise you will need to call .get() or .wait() on the future as the default launch type is std::launch::deferred and this will only lazy fire (meaning: when you actually request the result).
So, in your case and if you really wanted to use coroutines that way, I would suggest to use a std::thread instead and store it for a later join() somewhere pseudo-globally. But again, I don't think you would really want to use coroutines mechanics that way.
Your question makes perfect sense, the misunderstanding is C++20 coroutines are actually generators mistakenly occupying coroutine header name.
Let me explain how generators work and then answer how to schedule detached coroutine.
How generators work
Your question Scheduling a detached coroutine then looks How to schedule a detached generator and answer is: not possible because special convention transforms regular function into generator function.
What is not obvious right there is the yielding a value must take place inside generator function body. When you want to call a helper function that yields value for you - you can't. Instead you also make a helper function into generator and then await instead of just calling helper function. This effectively chains generators and might feel like writing synchronous code that executes async.
In Javascript special convention is async keyword. In Python special convention is yield instead of return keyword.
The C++20 coroutines are low level mechanism allowing to implement Javascipt like async/await.
Nothing wrong with including this low-level mechanism in C++ language except placing it in header named coroutine.
How to schedule detached coroutine
This question makes sense if you want to have green threads or fibers and you are writing scheduler logic that uses symmetric or asymmetric coroutines to accomplish this.
Now others might ask: why should anyone bother with fibers(not windows fibers;) when you have generators? The answer is because you can have encapsulated concurrency and parallelism logic, meaning rest of your team isn't required to learn and apply additional mental gymnastics while working on the project.
The result is true asynchronous programming where the rest of the team write linear code, without callbacks and such, with simple concept of concurrency for example single spawn() library function, avoiding any locks/mutexes and other multithreading complexity.
The beauty of encapsulation is seen when all details are hidden in low level i/o methods. All context switching, scheduling, etc. happens deep inside i/o classes like Channel, Queue or File.
Everyone involved in async programming should experience working like this. The feeling is intense.
To accomplish this instead of C++20 coroutines use Boost::fiber that includes scheduler or Boost::context that allows symmetric coroutines. Symmetric coroutines allow to suspend and switch to any other coroutine while asymmetric coroutines suspend and resume calling coroutine.
Related
I had a question regarding the working of co_await in C++. I have the following code snippet:-
// Downloads url to cache and
// returns cache file path.
future<path> cacheUrl(string url)
{
cout << "Downloading url.";
string text = co_await downloadAsync(url); // suspend coroutine
cout << "Saving in cache.";
path p = randomFileName();
co_await saveInCacheAsync(p, text); // suspend coroutine
co_return p;
}
int main(void) {
future<path> filePath = cacheUrl("https://localhost:808/");
return 0;
}
The co_await keyword is used to suspend the execution of any co-routine. We have 2 instances in the above code where it is used. In the main function, we get access to the co-routine. When the program executes the line co_await downloadAsync(url) will it invoke downloadAsync or just suspend the co-routine.
Also, for executing the next saveInCacheAsync(p, text) function, should the main function call resume on the co-routine ? Or will it get called automatically ?
The coroutine model in C++ is opaque: the caller of a coroutine sees it as an ordinary function call that synchronously returns a value of the declared type (here, future<path>). That value is but a placeholder, though: the function body executes only when that result is awaited—but not necessarily co_awaited, since the caller need not be a coroutine (the opacity again).
Separately, co_await may suspend a coroutine, but need not do so (consider that it might be “waiting” on a coroutine with an empty function body). It’s also quite separate from calling the coroutine: one may write
auto cr=coroutine(…);
do_useful_work();
co_await cr;
to create the placeholder long before using it.
co_await in C++ is an operator, just like prefix * or whatever. If you saw *downloadAsync(...), you would expect the function call to happen, then the * operator would act on the value returned by that function. So too with co_await.
The objects that downloadAsync and saveInCacheAsync return are expected to have some mechanism in them to determine when and where to continue the execution of the coroutine once their asynchronous processes have concluded. The co_await expression (potentially) suspends execution of the coroutine and then accesses those mechanisms, scheduling the resumption of the coroutine's execution with that mechanism.
The future object return value defined by your coroutine function is meant to be able to shepherd the co_returned value from your function to whomever asks for it. How that works depends entirely on how you wrote your promise/future machinery for your coroutine.
The typical way to handle it is to be able to block the thread who asks for the value (eg. with a mutex) until the asynchronous process computing that value has completed. But it could do something else. Indeed, being able to co_await on such things, and thereby form long-chains of asynchronous continuations to build more complex values, is a common part of most coroutine machinery.
But at some point, someone has to actually retrieve the value resulting from all of this.
Boost::Coroutine2 and CoroutineTS(C++20) are popular coroutine implementations in C++. Both do suspend and resume but two implementations follow a quite different approaches.
CoroutineTS(C++20)
Stackless
Suspend by return
Uses special keywords
generator<int> Generate()
{
co_yield;
});
boost::coroutine2
Stackful
Suspend by call
Do not use special keywords
pull_type source([](push_type& sink)
{
sink();
});
Are there any specific use cases where I should select only one of them?
The main technical distinction is whether you want to be able to yield from within a nested call. This cannot be done using stackless coroutines.
Another thing to consider is that stackful coroutines have a stack and context (such as signal masks, the stack pointer, the CPU registers, etc.) of their own, so they have a larger memory footprint than stackless coroutines. This can be an issue especially if you have a resource constrained system or massive amounts of coroutines existing simultaneously.
I have no idea how they compare performance-wise in the real world, but in general, stackless coroutines are more efficient, as they have less overhead (stackless task switches do not have to swap stacks, store/load registers, and restore the signal mask, etc.).
For an example of a minimal stackless coroutine implementation, see Simon Tatham's coroutines using Duff's Device. It is pretty intuitive that they are as efficient as you can get.
Also, this question has nice answers that go more into details about the differences between stackful and stackless coroutines.
How to yield from a nested call in stackless coroutines?
Even though I said it's not possible, that was not 100% true: You can use (at least two) tricks to achieve this, each with some drawbacks:
First, you have to convert every call that should be able to yield your calling coroutine into a coroutine as well. Now, there are two ways:
The trampoline approach: You simply call the child coroutine from the parent coroutine in a loop, until it returns. Every time you notify the child coroutine, if it does not finish, you also yield the calling coroutine. Note that this approach forbids calling the child coroutine directly, you always have to call the outermost coroutine, which then has to re-enter the whole callstack. This has a call and return complexity of O(n) for nesting depth n. If you are waiting for an event, the event simply has to notify the outermost coroutine.
The parent link approach: You pass the parent coroutine address to the child coroutine, yield the parent coroutine, and the child coroutine manually resumes the parent coroutine once it finishes. Note that this approach forbids calling any coroutine besides the inner-most coroutine directly. This approach has a call and return complexity of O(1), so it is generally preferable. The drawback is that you have to manually register the innermost coroutine somewhere, so that the next event that wants to resume the outer coroutine knows which inner coroutine to directly target.
Note: By call and return complexity I mean the number of steps taken when notifying a coroutine to resume it, and the steps taken after notifying it to return to the calling notifier again.
What are coroutines in c++20?
In what ways it is different from "Parallelism2" or/and "Concurrency2" (look into below image)?
The below image is from ISOCPP.
https://isocpp.org/files/img/wg21-timeline-2017-03.png
At an abstract level, Coroutines split the idea of having an execution state off of the idea of having a thread of execution.
SIMD (single instruction multiple data) has multiple "threads of execution" but only one execution state (it just works on multiple data). Arguably parallel algorithms are a bit like this, in that you have one "program" run on different data.
Threading has multiple "threads of execution" and multiple execution states. You have more than one program, and more than one thread of execution.
Coroutines has multiple execution states, but does not own a thread of execution. You have a program, and the program has state, but it has no thread of execution.
The easiest example of coroutines are generators or enumerables from other languages.
In pseudo code:
function Generator() {
for (i = 0 to 100)
produce i
}
The Generator is called, and the first time it is called it returns 0. Its state is remembered (how much state varies with implementation of coroutines), and the next time you call it it continues where it left off. So it returns 1 the next time. Then 2.
Finally it reaches the end of the loop and falls off the end of the function; the coroutine is finished. (What happens here varies based on language we are talking about; in python, it throws an exception).
Coroutines bring this capability to C++.
There are two kinds of coroutines; stackful and stackless.
A stackless coroutine only stores local variables in its state and its location of execution.
A stackful coroutine stores an entire stack (like a thread).
Stackless coroutines can be extremely light weight. The last proposal I read involved basically rewriting your function into something a bit like a lambda; all local variables go into the state of an object, and labels are used to jump to/from the location where the coroutine "produces" intermediate results.
The process of producing a value is called "yield", as coroutines are bit like cooperative multithreading; you are yielding the point of execution back to the caller.
Boost has an implementation of stackful coroutines; it lets you call a function to yield for you. Stackful coroutines are more powerful, but also more expensive.
There is more to coroutines than a simple generator. You can await a coroutine in a coroutine, which lets you compose coroutines in a useful manner.
Coroutines, like if, loops and function calls, are another kind of "structured goto" that lets you express certain useful patterns (like state machines) in a more natural way.
The specific implementation of Coroutines in C++ is a bit interesting.
At its most basic level, it adds a few keywords to C++: co_return co_await co_yield, together with some library types that work with them.
A function becomes a coroutine by having one of those in its body. So from their declaration they are indistinguishable from functions.
When one of those three keywords are used in a function body, some standard mandated examining of the return type and arguments occurs and the function is transformed into a coroutine. This examining tells the compiler where to store the function state when the function is suspended.
The simplest coroutine is a generator:
generator<int> get_integers( int start=0, int step=1 ) {
for (int current=start; true; current+= step)
co_yield current;
}
co_yield suspends the functions execution, stores that state in the generator<int>, then returns the value of current through the generator<int>.
You can loop over the integers returned.
co_await meanwhile lets you splice one coroutine onto another. If you are in one coroutine and you need the results of an awaitable thing (often a coroutine) before progressing, you co_await on it. If they are ready, you proceed immediately; if not, you suspend until the awaitable you are waiting on is ready.
std::future<std::expected<std::string>> load_data( std::string resource )
{
auto handle = co_await open_resouce(resource);
while( auto line = co_await read_line(handle)) {
if (std::optional<std::string> r = parse_data_from_line( line ))
co_return *r;
}
co_return std::unexpected( resource_lacks_data(resource) );
}
load_data is a coroutine that generates a std::future when the named resource is opened and we manage to parse to the point where we found the data requested.
open_resource and read_lines are probably async coroutines that open a file and read lines from it. The co_await connects the suspending and ready state of load_data to their progress.
C++ coroutines are much more flexible than this, as they were implemented as a minimal set of language features on top of user-space types. The user-space types effectively define what co_return co_await and co_yield mean -- I've seen people use it to implement monadic optional expressions such that a co_await on an empty optional automatically propogates the empty state to the outer optional:
modified_optional<int> add( modified_optional<int> a, modified_optional<int> b ) {
co_return (co_await a) + (co_await b);
}
instead of
std::optional<int> add( std::optional<int> a, std::optional<int> b ) {
if (!a) return std::nullopt;
if (!b) return std::nullopt;
return *a + *b;
}
A coroutine is like a C function which has multiple return statements and when called a 2nd time does not start execution at the begin of the function but at the first instruction after the previous executed return. This execution location is saved together with all automatic variables that would live on the stack in non coroutine functions.
A previous experimental coroutine implementation from Microsoft did use copied stacks so you could even return from deep nested functions. But this version was rejected by the C++ committee. You can get this implementation for example with the Boosts fiber library.
coroutines are supposed to be (in C++) functions that are able to "wait" for some other routine to complete and to provide whatever is needed for the suspended, paused, waiting, routine to go on. the feature that is most interesting to C++ folks is that coroutines would ideally take no stack space...C# can already do something like this with await and yield but C++ might have to be rebuilt to get it in.
concurrency is heavily focused on the separation of concerns where a concern is a task that the program is supposed to complete. this separation of concerns may be accomplished by a number of means...usually be delegation of some sort. the idea of concurrency is that a number of processes could run independently (separation of concerns) and a 'listener' would direct whatever is produced by those separated concerns to wherever it is supposed to go. this is heavily dependent on some sort of asynchronous management. There are a number of approaches to concurrency including Aspect oriented programming and others. C# has the 'delegate' operator which works quite nicely.
parallelism sounds like concurrency and may be involved but is actually a physical construct involving many processors arranged in a more or less parallel fashion with software that is able to direct portions of code to different processors where it will be run and the results will be received back synchronously.
Per this latest C++ TS: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4628.pdf, and based on the understanding of C# async/await language support, I'm wondering what is the "execution context" (terminology borrowed from C#) of the C++ coroutines?
My simple test code in Visual C++ 2017 RC reveals that coroutines seem to always execute on a thread pool thread, and little control is given to the application developer on which threading context the coroutines could be executed - e.g. Could an application forces all the coroutines (with the compiler generated state machine code) to be executed on the main thread only, without involving any thread pool thread?
In C#, SynchronizationContext is a way to specify the "context" where all the coroutine "halves" (compiler generated state machine code) will be posted and executed, as illustrated in this post: https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/, while the current coroutine implementation in Visual C++ 2017 RC seems to always rely on the concurrency runtime, which by default executes the generated state machine code on a thread pool thread. Is there a similar concept of synchronization context that the user application can use to bind coroutine execution to a specific thread?
Also, what's the current default "scheduler" behavior of the coroutines as implemented in Visual C++ 2017 RC? i.e. 1) how a wait condition is exactly specified? and 2) when a wait condition is satisfied, who invokes the "bottom half" of the suspended coroutine?
My (naive) speculation regarding Task scheduling in C# is that C# "implements" the wait condition purely by task continuation - a wait condition is synthesized by a TaskCompletionSource owned task, and any code logic that needs to wait will be chained as a continuation to it, so if the wait condition is satisfied, e.g. if a full message is received from the low level network handler, it does TaskCompletionSource.SetValue, which transitions the underlying task to the completed state, effectively allowing the chained continuation logic to start execution (putting the task into the ready state/list from the previous created state) - In C++ coroutine, I'm speculating that std::future and std::promise would be used as similar mechanism (std::future being the task, while std::promise being the TaskCompletionSource, and the usage is surprisingly similar too!) - so does the C++ coroutine scheduler, if any, relies on some similar mechanism to carry out the behavior?
[EDIT]: after doing some further research, I was able to code a very simple yet very powerful abstraction called awaitable that supports single threaded and cooperative multitasking, and features a simple thread_local based scheduler, which can execute coroutines on the thread the root coroutine is started. The code can be found from this github repo: https://github.com/llint/Awaitable
Awaitable is composable in a way that it maintains correct invocation ordering at nested levels, and it features primitive yielding, timed wait, and setting ready from somewhere else, and very complex usage pattern can be derived from this (such as infinite looping coroutines that only get woken up when certain events happen), the programming model follows C# Task based async/await pattern closely. Please feel free to give your feedbacks.
The opposite!
C++ coroutine is all about control. the key point here is the void await_suspend(std::experimental::coroutine_handle<> handle)
function.
evey co_await expects awaitable type. in a nutshell, awaitable type is a type which provide these three functions:
bool await_ready() - should the program halt the execution of the coroutine?
void await_suspend(handle) - the program passes you a continuation context for that coroutine frame. if you activate the handle (for example, by calling operator () that the handle provides - the current thread resumes the coroutine immediately).
T await_resume() - tells the thread which resumes the coroutine what to do when resuming the coroutine and what to return from co_await.
so when you call co_await on awaitable type, the program asks the awaitable if the coroutine should be suspended (if await_ready returns false) and if so - you get a coroutine handle in which you can do whatever you like.
for example, you can pass the coroutine handle to a thread-pool. in this case a thread-pool thread will resume the coroutine.
you can pass the coroutine handle to a simple std::thread - your own create thread will resume the coroutine.
you can attach the coroutine handle into a derived class of OVERLAPPED and resume the coroutine when the asynchronous IO finishes.
as you can see - you can control where and when the coroutine is suspended and resumes - by managing the coroutine handle passed in await_suspend. there is no "default scheduler" - how you implement you awaitable type will decide how the coroutine is schedueled.
So, what happens in VC++? unfortunately, std::future still doesn't have then function, so you can't pass the coroutine handle to a std::future. if you await on std::future - the program will just open a new thread. look at the source code given by the future header:
template<class _Ty>
void await_suspend(future<_Ty>& _Fut,
experimental::coroutine_handle<> _ResumeCb)
{ // change to .then when future gets .then
thread _WaitingThread([&_Fut, _ResumeCb]{
_Fut.wait();
_ResumeCb();
});
_WaitingThread.detach();
}
So why did you see a win32 threadpool-thread if the coroutines are launched in a regular std::thread? that's because it wasn't the coroutine. std::async calls behind the scenes to concurrency::create_task. a concurrency::task is launched under the win32 threadpool by default. after all, the whole purpose of std::async is to launch the callable in another thread.
I need something like this:
void launch_task()
{
std::thread([](){ run_async_task(); });
}
Except thread's destructor will terminate my task. I don't need any control over the task, don't need a return value either. It just has to run its course and then the thread should terminate and C++ thread object should be disposed of. What C++ 11 facility do I need?
I've looked at std::async, but couldn't find an example of usage for my case. It seems to be a pretty complicated system, and I'd need to somehow store and manipulate std::future or it'd become synchronous (if my understanding is correct; I didn't find a good clear article on std::async).
Just detach it immediately after creation.
std::thread([](){ run_async_task(); }).detach();
Once detached, the thread will no longer be joinable, so ~thread() will have no effect.
This answer discusses more details of this behavior.
As mentioned by W.B. below, std::async will not work for the following reason, pulled from this reference.
If the std::future obtained from std::async has temporary object
lifetime (not moved or bound to a variable), the destructor of the
std::future will block at the end of the full expression until the
asynchronous operation completes
Resurrecting an old thread, but there is a neat trick* how to achieve "fire and forget" functionality by using std::async as well, despite the blocking std::future that it returns. The main ingredient is a shared pointer to returned std::future that is captured in lambda by value, causing its reference counter to be incremented. This way the destructor of the std::future won't be invoked until lambda finished its work, providing real asynchronous behaviour, as desired.
template <class F>
void call_async(F&& fun) {
auto futptr = std::make_shared<std::future<void>>();
*futptr = std::async(std::launch::async, [futptr, fun]() {
fun();
});
}
*Kudos to a colleague of mine and true C++ expert, MVV, who showed this trick to me.