What are coroutines in C++20? - c++

What are coroutines in c++20?
In what ways it is different from "Parallelism2" or/and "Concurrency2" (look into below image)?
The below image is from ISOCPP.
https://isocpp.org/files/img/wg21-timeline-2017-03.png

At an abstract level, Coroutines split the idea of having an execution state off of the idea of having a thread of execution.
SIMD (single instruction multiple data) has multiple "threads of execution" but only one execution state (it just works on multiple data). Arguably parallel algorithms are a bit like this, in that you have one "program" run on different data.
Threading has multiple "threads of execution" and multiple execution states. You have more than one program, and more than one thread of execution.
Coroutines has multiple execution states, but does not own a thread of execution. You have a program, and the program has state, but it has no thread of execution.
The easiest example of coroutines are generators or enumerables from other languages.
In pseudo code:
function Generator() {
for (i = 0 to 100)
produce i
}
The Generator is called, and the first time it is called it returns 0. Its state is remembered (how much state varies with implementation of coroutines), and the next time you call it it continues where it left off. So it returns 1 the next time. Then 2.
Finally it reaches the end of the loop and falls off the end of the function; the coroutine is finished. (What happens here varies based on language we are talking about; in python, it throws an exception).
Coroutines bring this capability to C++.
There are two kinds of coroutines; stackful and stackless.
A stackless coroutine only stores local variables in its state and its location of execution.
A stackful coroutine stores an entire stack (like a thread).
Stackless coroutines can be extremely light weight. The last proposal I read involved basically rewriting your function into something a bit like a lambda; all local variables go into the state of an object, and labels are used to jump to/from the location where the coroutine "produces" intermediate results.
The process of producing a value is called "yield", as coroutines are bit like cooperative multithreading; you are yielding the point of execution back to the caller.
Boost has an implementation of stackful coroutines; it lets you call a function to yield for you. Stackful coroutines are more powerful, but also more expensive.
There is more to coroutines than a simple generator. You can await a coroutine in a coroutine, which lets you compose coroutines in a useful manner.
Coroutines, like if, loops and function calls, are another kind of "structured goto" that lets you express certain useful patterns (like state machines) in a more natural way.
The specific implementation of Coroutines in C++ is a bit interesting.
At its most basic level, it adds a few keywords to C++: co_return co_await co_yield, together with some library types that work with them.
A function becomes a coroutine by having one of those in its body. So from their declaration they are indistinguishable from functions.
When one of those three keywords are used in a function body, some standard mandated examining of the return type and arguments occurs and the function is transformed into a coroutine. This examining tells the compiler where to store the function state when the function is suspended.
The simplest coroutine is a generator:
generator<int> get_integers( int start=0, int step=1 ) {
for (int current=start; true; current+= step)
co_yield current;
}
co_yield suspends the functions execution, stores that state in the generator<int>, then returns the value of current through the generator<int>.
You can loop over the integers returned.
co_await meanwhile lets you splice one coroutine onto another. If you are in one coroutine and you need the results of an awaitable thing (often a coroutine) before progressing, you co_await on it. If they are ready, you proceed immediately; if not, you suspend until the awaitable you are waiting on is ready.
std::future<std::expected<std::string>> load_data( std::string resource )
{
auto handle = co_await open_resouce(resource);
while( auto line = co_await read_line(handle)) {
if (std::optional<std::string> r = parse_data_from_line( line ))
co_return *r;
}
co_return std::unexpected( resource_lacks_data(resource) );
}
load_data is a coroutine that generates a std::future when the named resource is opened and we manage to parse to the point where we found the data requested.
open_resource and read_lines are probably async coroutines that open a file and read lines from it. The co_await connects the suspending and ready state of load_data to their progress.
C++ coroutines are much more flexible than this, as they were implemented as a minimal set of language features on top of user-space types. The user-space types effectively define what co_return co_await and co_yield mean -- I've seen people use it to implement monadic optional expressions such that a co_await on an empty optional automatically propogates the empty state to the outer optional:
modified_optional<int> add( modified_optional<int> a, modified_optional<int> b ) {
co_return (co_await a) + (co_await b);
}
instead of
std::optional<int> add( std::optional<int> a, std::optional<int> b ) {
if (!a) return std::nullopt;
if (!b) return std::nullopt;
return *a + *b;
}

A coroutine is like a C function which has multiple return statements and when called a 2nd time does not start execution at the begin of the function but at the first instruction after the previous executed return. This execution location is saved together with all automatic variables that would live on the stack in non coroutine functions.
A previous experimental coroutine implementation from Microsoft did use copied stacks so you could even return from deep nested functions. But this version was rejected by the C++ committee. You can get this implementation for example with the Boosts fiber library.

coroutines are supposed to be (in C++) functions that are able to "wait" for some other routine to complete and to provide whatever is needed for the suspended, paused, waiting, routine to go on. the feature that is most interesting to C++ folks is that coroutines would ideally take no stack space...C# can already do something like this with await and yield but C++ might have to be rebuilt to get it in.
concurrency is heavily focused on the separation of concerns where a concern is a task that the program is supposed to complete. this separation of concerns may be accomplished by a number of means...usually be delegation of some sort. the idea of concurrency is that a number of processes could run independently (separation of concerns) and a 'listener' would direct whatever is produced by those separated concerns to wherever it is supposed to go. this is heavily dependent on some sort of asynchronous management. There are a number of approaches to concurrency including Aspect oriented programming and others. C# has the 'delegate' operator which works quite nicely.
parallelism sounds like concurrency and may be involved but is actually a physical construct involving many processors arranged in a more or less parallel fashion with software that is able to direct portions of code to different processors where it will be run and the results will be received back synchronously.

Related

Scheduling a coroutine with a context

There are plenty of tutorials that explain how it's easy to use coroutines in C++, but I've spent a lot of time getting how to schedule "detached" coroutines.
Assume, I have the following definition of coroutine result type:
struct task {
struct promise_type {
auto initial_suspend() const noexcept { return std::suspend_never{}; }
auto final_suspend() const noexcept { return std::suspend_never{}; }
void return_void() const noexcept { }
void unhandled_exception() const { std::terminate(); }
task get_return_object() const noexcept { return {}; }
};
};
And there is also a method that runs "detached" coroutine, i.e. runs it asynchronously.
/// Handler should have overloaded operator() returning task.
template<class Handler>
void schedule_coroutine(Handler &&handler) {
std::thread([handler = std::forward<Handler>(handler)]() { handler(); }).detach();
}
Obviously, I can not pass lambda-functions or any other functional object that has a state into this method, because once the coroutine is suspended, the lambda passed into std::thread method will be destroyed with all the captured variables.
task coroutine_1() {
std::vector<object> objects;
// ...
schedule_coroutine([objects]() -> task {
// ...
co_await something;
// ...
co_return;
});
// ...
co_return;
}
int main() {
// ...
schedule_coroutine(coroutine_1);
// ...
}
I think there is should be a way to save the handler somehow (preferably near or within the coroutine promise) so that the next time coroutine is resumed it won't try to access to the destroyed object data. But unfortunately I have no idea how to do it.
I think your problem is a general (and common) misunderstanding of how co_await coroutines are intended to work.
When a function performs co_await <expr>, this (generally) means that the function suspends execution until expr resumes its execution. That is, your function is waiting until some process completes (and typically returns a value). That process, represented by expr, is the one who is supposed to resume the function (generally).
The whole point of this is to make code that executes asynchronously look like synchronous code as much as possible. In synchronous code, you would do something like <expr>.wait(), where wait is a function that waits for the task represented by expr to complete. Instead of "waiting" on it, you "a-wait" or "asynchronously wait" on it. The rest of your function executes asynchronously relative to your caller, based on when expr completes and how it decides to resume your function's execution. In this way, co_await <expr> looks and appears to act very much like <expr>.wait().
Compiler Magictm then goes in behind the scenes to make it asynchronous.
So the idea of launching a "detached coroutine" doesn't make sense within this framework. The caller of a coroutine function (usually) isn't the one who determines where the coroutine executes; it's the processes the coroutine invokes during its execution that decides that.
Your schedule_coroutine function really ought to just be a regular "execute a function asynchronously" operation. It shouldn't have any particular association with coroutines, nor any expectation that the given functor is or represents some asynchronous task or if it happens to invoke co_await. The function is just going to create a new thread and execute a function on it.
Just as you would have done pre-C++20.
If your task type represents an asynchronous task, then in proper RAII style, its destructor ought to wait until the task is completed before exiting (this includes any resumptions of coroutines scheduled by that task, throughout the entire execution of said task. The task isn't done until it is entirely done). Therefore, if handler() in your schedule_coroutine call returns a task, then that task will be initialized and immediately destroyed. Since the destructor waits for the asynchronous task to complete, the thread will not die until the task is done. And since the thread's functor is copied/moved from the function object given to the thread constructor, any captures will continue to exist until the thread itself exits.
I hope I got you right, but I think there might be a couple of misconceptions here. First off, you clearly cannot detach a coroutine, that would not make any sense at all. But you can execute asynchronous tasks inside a coroutine for sure, even though in my opinion this defeats its purpose entirely.
But let's take a look at the second block of code you posted. Here you invoke std::async and forward a handler to it. Now, in order to prevent any kind of early destruction you should use std::move instead and pass the handler to the lambda so it will be kept alive for as long as the scope of the lambda function is valid. This should probably already answer your final question as well, because the place where you want this handler to be stored would be the lambda capture itself.
Another thing that bothers me is the usage of std::async. The call will return a std::future-kind of type that will block until the lambda has been executed. But this will only happen if you set the launch type to std::launch::async, otherwise you will need to call .get() or .wait() on the future as the default launch type is std::launch::deferred and this will only lazy fire (meaning: when you actually request the result).
So, in your case and if you really wanted to use coroutines that way, I would suggest to use a std::thread instead and store it for a later join() somewhere pseudo-globally. But again, I don't think you would really want to use coroutines mechanics that way.
Your question makes perfect sense, the misunderstanding is C++20 coroutines are actually generators mistakenly occupying coroutine header name.
Let me explain how generators work and then answer how to schedule detached coroutine.
How generators work
Your question Scheduling a detached coroutine then looks How to schedule a detached generator and answer is: not possible because special convention transforms regular function into generator function.
What is not obvious right there is the yielding a value must take place inside generator function body. When you want to call a helper function that yields value for you - you can't. Instead you also make a helper function into generator and then await instead of just calling helper function. This effectively chains generators and might feel like writing synchronous code that executes async.
In Javascript special convention is async keyword. In Python special convention is yield instead of return keyword.
The C++20 coroutines are low level mechanism allowing to implement Javascipt like async/await.
Nothing wrong with including this low-level mechanism in C++ language except placing it in header named coroutine.
How to schedule detached coroutine
This question makes sense if you want to have green threads or fibers and you are writing scheduler logic that uses symmetric or asymmetric coroutines to accomplish this.
Now others might ask: why should anyone bother with fibers(not windows fibers;) when you have generators? The answer is because you can have encapsulated concurrency and parallelism logic, meaning rest of your team isn't required to learn and apply additional mental gymnastics while working on the project.
The result is true asynchronous programming where the rest of the team write linear code, without callbacks and such, with simple concept of concurrency for example single spawn() library function, avoiding any locks/mutexes and other multithreading complexity.
The beauty of encapsulation is seen when all details are hidden in low level i/o methods. All context switching, scheduling, etc. happens deep inside i/o classes like Channel, Queue or File.
Everyone involved in async programming should experience working like this. The feeling is intense.
To accomplish this instead of C++20 coroutines use Boost::fiber that includes scheduler or Boost::context that allows symmetric coroutines. Symmetric coroutines allow to suspend and switch to any other coroutine while asymmetric coroutines suspend and resume calling coroutine.

boost::coroutine2 vs CoroutineTS

Boost::Coroutine2 and CoroutineTS(C++20) are popular coroutine implementations in C++. Both do suspend and resume but two implementations follow a quite different approaches.
CoroutineTS(C++20)
Stackless
Suspend by return
Uses special keywords
generator<int> Generate()
{
co_yield;
});
boost::coroutine2
Stackful
Suspend by call
Do not use special keywords
pull_type source([](push_type& sink)
{
sink();
});
Are there any specific use cases where I should select only one of them?
The main technical distinction is whether you want to be able to yield from within a nested call. This cannot be done using stackless coroutines.
Another thing to consider is that stackful coroutines have a stack and context (such as signal masks, the stack pointer, the CPU registers, etc.) of their own, so they have a larger memory footprint than stackless coroutines. This can be an issue especially if you have a resource constrained system or massive amounts of coroutines existing simultaneously.
I have no idea how they compare performance-wise in the real world, but in general, stackless coroutines are more efficient, as they have less overhead (stackless task switches do not have to swap stacks, store/load registers, and restore the signal mask, etc.).
For an example of a minimal stackless coroutine implementation, see Simon Tatham's coroutines using Duff's Device. It is pretty intuitive that they are as efficient as you can get.
Also, this question has nice answers that go more into details about the differences between stackful and stackless coroutines.
How to yield from a nested call in stackless coroutines? 
Even though I said it's not possible, that was not 100% true: You can use (at least two) tricks to achieve this, each with some drawbacks:
First, you have to convert every call that should be able to yield your calling coroutine into a coroutine as well. Now, there are two ways:
The trampoline approach: You simply call the child coroutine from the parent coroutine in a loop, until it returns. Every time you notify the child coroutine, if it does not finish, you also yield the calling coroutine. Note that this approach forbids calling the child coroutine directly, you always have to call the outermost coroutine, which then has to re-enter the whole callstack. This has a call and return complexity of O(n) for nesting depth n. If you are waiting for an event, the event simply has to notify the outermost coroutine.
The parent link approach: You pass the parent coroutine address to the child coroutine, yield the parent coroutine, and the child coroutine manually resumes the parent coroutine once it finishes. Note that this approach forbids calling any coroutine besides the inner-most coroutine directly. This approach has a call and return complexity of O(1), so it is generally preferable. The drawback is that you have to manually register the innermost coroutine somewhere, so that the next event that wants to resume the outer coroutine knows which inner coroutine to directly target.
Note: By call and return complexity I mean the number of steps taken when notifying a coroutine to resume it, and the steps taken after notifying it to return to the calling notifier again.

C++1z coroutine threading context and coroutine scheduling

Per this latest C++ TS: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4628.pdf, and based on the understanding of C# async/await language support, I'm wondering what is the "execution context" (terminology borrowed from C#) of the C++ coroutines?
My simple test code in Visual C++ 2017 RC reveals that coroutines seem to always execute on a thread pool thread, and little control is given to the application developer on which threading context the coroutines could be executed - e.g. Could an application forces all the coroutines (with the compiler generated state machine code) to be executed on the main thread only, without involving any thread pool thread?
In C#, SynchronizationContext is a way to specify the "context" where all the coroutine "halves" (compiler generated state machine code) will be posted and executed, as illustrated in this post: https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/, while the current coroutine implementation in Visual C++ 2017 RC seems to always rely on the concurrency runtime, which by default executes the generated state machine code on a thread pool thread. Is there a similar concept of synchronization context that the user application can use to bind coroutine execution to a specific thread?
Also, what's the current default "scheduler" behavior of the coroutines as implemented in Visual C++ 2017 RC? i.e. 1) how a wait condition is exactly specified? and 2) when a wait condition is satisfied, who invokes the "bottom half" of the suspended coroutine?
My (naive) speculation regarding Task scheduling in C# is that C# "implements" the wait condition purely by task continuation - a wait condition is synthesized by a TaskCompletionSource owned task, and any code logic that needs to wait will be chained as a continuation to it, so if the wait condition is satisfied, e.g. if a full message is received from the low level network handler, it does TaskCompletionSource.SetValue, which transitions the underlying task to the completed state, effectively allowing the chained continuation logic to start execution (putting the task into the ready state/list from the previous created state) - In C++ coroutine, I'm speculating that std::future and std::promise would be used as similar mechanism (std::future being the task, while std::promise being the TaskCompletionSource, and the usage is surprisingly similar too!) - so does the C++ coroutine scheduler, if any, relies on some similar mechanism to carry out the behavior?
[EDIT]: after doing some further research, I was able to code a very simple yet very powerful abstraction called awaitable that supports single threaded and cooperative multitasking, and features a simple thread_local based scheduler, which can execute coroutines on the thread the root coroutine is started. The code can be found from this github repo: https://github.com/llint/Awaitable
Awaitable is composable in a way that it maintains correct invocation ordering at nested levels, and it features primitive yielding, timed wait, and setting ready from somewhere else, and very complex usage pattern can be derived from this (such as infinite looping coroutines that only get woken up when certain events happen), the programming model follows C# Task based async/await pattern closely. Please feel free to give your feedbacks.
The opposite!
C++ coroutine is all about control. the key point here is the void await_suspend(std::experimental::coroutine_handle<> handle)
function.
evey co_await expects awaitable type. in a nutshell, awaitable type is a type which provide these three functions:
bool await_ready() - should the program halt the execution of the coroutine?
void await_suspend(handle) - the program passes you a continuation context for that coroutine frame. if you activate the handle (for example, by calling operator () that the handle provides - the current thread resumes the coroutine immediately).
T await_resume() - tells the thread which resumes the coroutine what to do when resuming the coroutine and what to return from co_await.
so when you call co_await on awaitable type, the program asks the awaitable if the coroutine should be suspended (if await_ready returns false) and if so - you get a coroutine handle in which you can do whatever you like.
for example, you can pass the coroutine handle to a thread-pool. in this case a thread-pool thread will resume the coroutine.
you can pass the coroutine handle to a simple std::thread - your own create thread will resume the coroutine.
you can attach the coroutine handle into a derived class of OVERLAPPED and resume the coroutine when the asynchronous IO finishes.
as you can see - you can control where and when the coroutine is suspended and resumes - by managing the coroutine handle passed in await_suspend. there is no "default scheduler" - how you implement you awaitable type will decide how the coroutine is schedueled.
So, what happens in VC++? unfortunately, std::future still doesn't have then function, so you can't pass the coroutine handle to a std::future. if you await on std::future - the program will just open a new thread. look at the source code given by the future header:
template<class _Ty>
void await_suspend(future<_Ty>& _Fut,
experimental::coroutine_handle<> _ResumeCb)
{ // change to .then when future gets .then
thread _WaitingThread([&_Fut, _ResumeCb]{
_Fut.wait();
_ResumeCb();
});
_WaitingThread.detach();
}
So why did you see a win32 threadpool-thread if the coroutines are launched in a regular std::thread? that's because it wasn't the coroutine. std::async calls behind the scenes to concurrency::create_task. a concurrency::task is launched under the win32 threadpool by default. after all, the whole purpose of std::async is to launch the callable in another thread.

Event based state machine in c++ with coroutines

Co-routines in c++ is a really powerful technique for implementing state machines however examples that I find on the internet are overly simplistic, e.g. they usually represent some kind of iterator which after calling to some "Next" routine moves along, dependent only on initial arguments of the coroutine. However in reasonably complicated event based state machines each next step depends, on the specific event received which caused to resume the running and also some default event handlers should be implemented for events that can occur at any time.
Suppose we have a simple phone state machine.
STATE:HOOK OFF-->[EVT:DIAL TONE]--> [STATE:DIALING] --> [EVT: NUMBER DIALED] --> STATE:TALKING.
Now I would like a coroutine that would see something like.
PhoneSM()
{
HookOf();
Yield_Till(DialTone_Event);
Dial();
Yield_Till(EndOfDial_Event);
Talk();
...
}
e.g. requirements
Yield_Till would only continue when specific event was receive (how???) when the couroutine run is resumed.If not then it should yield again.
Yield_Till must know how to run the events to default handlers like Hangup_Event because really it can happen any time and it will be cumbersome to add it yield call each time.
Any help with c++ (only!!!) implementation or ready made infrastructures for meeting the requirements will be highly appreciated.
It seems to me that you are trying to code an event-driven state machine as a sequential flow-chart. The difference between state-charts and flow-charts is quite fundamental, and is explained, for example in the article "A Crash Course in UML State Machines":
A state machine should be rather coded as a one-shot function that processes the current event and returns without yielding or blocking. If you wish to use co-routines, you can call this state-machine function from a co-routine, which then yields after each event.
This is an old question but the first I found when searching how to achieve this with C++20 coroutines. As I already implemented it a few times with different approaches I still try to answer it for future readers.
First some background why this is actually a state machine. You might skip this part if you are only interested in how to implement it. State machines were introduced as a standard way to do code that is called once in a while with new events and progresses some internal state. As in this case program counter and state-variables obviously can't live in registers and on stack there is some additional code required to continue where you left of. State machines are a standard way to achieve this without extreme overhead. However it is possible to write coroutines for the same task and every state-machine could be transferred in such a coroutine where each state is a label and the event handling code ends with a goto to the next state at which point it yields. As every developer knows goto-code is spaghetti code and there is a cleaner way to express the intent with flow-control structures. And in fact I have yet to see a state-machine which couldn't be written in a more compact and easier to understand way using coroutines and flow-control. That being said: How can this be implemented in C/C++?
There are a few approaches to do coroutines: it could be done with a switch statement inside a loop like in Duff's device, there were POSIX coroutines which are now obsolete and removed from the standard and C++20 brings modern C++ based coroutines. In order to have a full event-handling state-machine there are a few additional requirements. First of all the coroutine has to yield a set of events that will continue it. Then there needs to be a way to pass the actually occurred event together with its arguments back into the coroutine. And finally there has to be some driver code which manages the events and registers event-handlers, callbacks or signal-slot connections on the awaited events and calls the coroutine once such an event occurred.
In my latest implementations I used event objects that reside inside the coroutine and are yielded by reference/pointer. This way the coroutine is able to decide when such an event is of interest to it even when it might not be in a state where it is able to process it (e.g. a response to a previously send request got answered but the answer isn't to be processed yet). It also allows to use different event-types that might need different approaches to listen for events independent from the used driver code (which could be simplified this way).
Here is a small Duff's device coroutine for the state-machine in the question (with an extra occupied event for demonstration purpose):
class PhoneSM
{
enum State { Start, WaitForDialTone, WaitForEndOfDial, … };
State state = Start;
std::unique_ptr<DialTone_Event> dialToneEvent;
std::unique_ptr<EndOfDial_Event> endOfDialEvent;
std::unique_ptr<Occupied_Event> occupiedEvent;
public:
std::vector<Event*> operator()(Event *lastEvent = nullptr)
{
while (1) {
switch (state) {
case Start:
HookOf();
dialToneEvent = std::make_unique<DialTone_Event>();
state = WaitForDialTone;
// yield ( dialToneEvent )
return std::vector<Event*>{ dialToneEvent.get() };
case WaitForDialTone:
assert(lastEvent == dialToneEvent);
dialToneEvent.reset();
Dial();
endOfDialEvent = std::make_unique<EndOfDial_Event>();
occupiedEvent = std::make_unique<Occupied_Event>();
state = WaitForEndOfDial;
// yield ( endOfDialEvent, occupiedEvent )
return std::vector<Event*>{ endOfDialEvent.get(), occupiedEvent.get() };
case WaitForEndOfDial:
if (lastEvent == occupiedEvent) {
// Just return from the coroutine
return std::vector<Event*>();
}
assert(lastEvent == endOfDialEvent);
occupiedEvent.reset();
endOfDialEvent.reset();
Talk();
…
}
}
}
}
Of course implementing all the coroutine handling makes this overly complicated. A real coroutine would be much simpler. The following is pseudocode:
coroutine std::vector<Event*> PhoneSM() {
HookUp();
{
DialToneEvent dialTone;
yield { & dialTone };
}
Dial();
{
EndOfDialEvent endOfDial;
OccupiedEvent occupied;
Event *occurred = yield { & endOfDial, & occupied };
if (occurred == & occupied) {
return;
}
}
Talk();
…
}
Most co-routine libraries do not off sophisticated yield function. They simply yield and your co-routine will get control back at some arbitrary point. Hence, after a yield you will have to test the appropriate conditions in your code and yield again if they are not met. In this code you would also put tests for events like hangup, in which case you would terminate your co-routine.
There are a number of implementation in the public domain and some operating systems (e.g. Windows) offer co-routine services. Just google for co-routine or fiber.

Why should I use std::async?

I'm trying to explore all the options of the new C++11 standard in depth, while using std::async and reading its definition, I noticed 2 things, at least under linux with gcc 4.8.1 :
it's called async, but it got a really "sequential behaviour", basically in the row where you call the future associated with your async function foo, the program blocks until the execution of foo it's completed.
it depends on the exact same external library as others, and better, non-blocking solutions, which means pthread, if you want to use std::async you need pthread.
at this point it's natural for me asking why choosing std::async over even a simple set of functors ? It's a solution that doesn't even scale at all, the more future you call, the less responsive your program will be.
Am I missing something ? Can you show an example that is granted to be executed in an async, non blocking, way ?
it's called async, but it got a really "sequential behaviour",
No, if you use the std::launch::async policy then it runs asynchronously in a new thread. If you don't specify a policy it might run in a new thread.
basically in the row where you call the future associated with your async function foo, the program blocks until the execution of foo it's completed.
It only blocks if foo hasn't completed, but if it was run asynchronously (e.g. because you use the std::launch::async policy) it might have completed before you need it.
it depends on the exact same external library as others, and better, non-blocking solutions, which means pthread, if you want to use std::async you need pthread.
Wrong, it doesn't have to be implemented using Pthreads (and on Windows it isn't, it uses the ConcRT features.)
at this point it's natural for me asking why choosing std::async over even a simple set of functors ?
Because it guarantees thread-safety and propagates exceptions across threads. Can you do that with a simple set of functors?
It's a solution that doesn't even scale at all, the more future you call, the less responsive your program will be.
Not necessarily. If you don't specify the launch policy then a smart implementation can decide whether to start a new thread, or return a deferred function, or return something that decides later, when more resources may be available.
Now, it's true that with GCC's implementation, if you don't provide a launch policy then with current releases it will never run in a new thread (there's a bugzilla report for that) but that's a property of that implementation, not of std::async in general. You should not confuse the specification in the standard with a particular implementation. Reading the implementation of one standard library is a poor way to learn about C++11.
Can you show an example that is granted to be executed in an async, non blocking, way ?
This shouldn't block:
auto fut = std::async(std::launch::async, doSomethingThatTakesTenSeconds);
auto result1 = doSomethingThatTakesTwentySeconds();
auto result2 = fut.get();
By specifying the launch policy you force asynchronous execution, and if you do other work while it's executing then the result will be ready when you need it.
If you need the result of an asynchronous operation, then you have to block, no matter what library you use. The idea is that you get to choose when to block, and, hopefully when you do that, you block for a negligible time because all the work has already been done.
Note also that std::async can be launched with policies std::launch::async or std::launch::deferred. If you don't specify it, the implementation is allowed to choose, and it could well choose to use deferred evaluation, which would result in all the work being done when you attempt to get the result from the future, resulting in a longer block. So if you want to make sure that the work is done asynchronously, use std::launch::async.
I think your problem is with std::future saying that it blocks on get. It only blocks if the result isn't already ready.
If you can arrange for the result to be already ready, this isn't a problem.
There are many ways to know that the result is already ready. You can poll the future and ask it (relatively simple), you could use locks or atomic data to relay the fact that it is ready, you could build up a framework to deliver "finished" future items into a queue that consumers can interact with, you could use signals of some kind (which is just blocking on multiple things at once, or polling).
Or, you could finish all the work you can do locally, and then block on the remote work.
As an example, imagine a parallel recursive merge sort. It splits the array into two chunks, then does an async sort on one chunk while sorting the other chunk. Once it is done sorting its half, the originating thread cannot progress until the second task is finished. So it does a .get() and blocks. Once both halves have been sorted, it can then do a merge (in theory, the merge can be done at least partially in parallel as well).
This task behaves like a linear task to those interacting with it on the outside -- when it is done, the array is sorted.
We can then wrap this in a std::async task, and have a future sorted array. If we want, we could add in a signally procedure to let us know that the future is finished, but that only makes sense if we have a thread waiting on the signals.
In the reference: http://en.cppreference.com/w/cpp/thread/async
If the async flag is set (i.e. policy & std::launch::async != 0), then
async executes the function f on a separate thread of execution as if
spawned by std::thread(f, args...), except that if the function f
returns a value or throws an exception, it is stored in the shared
state accessible through the std::future that async returns to the
caller.
It is a nice property to keep a record of exceptions thrown.
http://www.cplusplus.com/reference/future/async/
there are three type of policy,
launch::async
launch::deferred
launch::async|launch::deferred
by default launch::async|launch::deferred is passed to std::async.