What is the purpose of .then construction in PPL tasks?


I'm interested in the purpose of the .then construction in PPL, and where can I test it? It seems Visual Studio 2012 doesn't support it yet (maybe some future CTP will?). And does it have an equivalent in the standard C++11 async library?

The purpose is for you to be able to express asynchronous tasks that have to be executed in sequence.
For example, let's say I'm in a GUI application. When the user presses a button, I want to start a task asynchronously to retrieve a file online, then process it to retrieve some kind of data, then use this data to update the GUI. While this is happening, there are tons of other tasks going on, mainly to keep the GUI responsive.
This can be done by using callbacks that call callbacks.
The .then() feature, combined with lambdas, lets you write the body of each callback right where you set it up (you can still use separate callbacks if you want).
It also doesn't guarantee that each separate task will run on the same thread, which makes it possible for idle threads to steal tasks if the initial thread already has too much work to do.
The .then() function doesn't exist in C++11, but it has been proposed for the std::future class (which is basically a handle to a task or task result).
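Until a real .then() exists, a continuation on a std::future can only be emulated. The sketch below assumes C++14 and uses a hypothetical then() helper (not PPL's API and not the proposed std::future::then): it launches a task that waits for the input future and then calls the continuation with it. Unlike a genuine continuation, it ties up one thread per waiting step, which is exactly the cost a library-level .then() is meant to avoid.

```cpp
#include <cassert>
#include <future>
#include <utility>

// Hypothetical helper (not part of C++11): attach a continuation to a
// std::future. A new task blocks on `fut` and then passes it to `cont`,
// so `cont` receives a ready future and can call get() (which rethrows
// any exception from the previous step).
template <class T, class F>
auto then(std::future<T> fut, F cont) {
    return std::async(std::launch::async,
                      [fut = std::move(fut), cont = std::move(cont)]() mutable {
                          return cont(std::move(fut));
                      });
}
```

Chaining works as in the GUI scenario above: fetch, then process, then use the result, each step written inline as a lambda.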

Klaim already made a great answer, but I thought I'd give a specific example.
.then attaches a continuation to the task, and is the async equivalent to the synchronous .get, basically.
C++11 has std::future, which is the equivalent to a concurrency::task. std::future currently only has .get, but there is a proposal to add .then (and other good stuff).
std::async(calculate_answer, the_question_of_everything)
    .then([](std::future<int> f){ std::cout << f.get() << "\n"; });
The above snippet will create an asynchronous task (launched with std::async) and then attach a continuation, which gets passed the std::future of the finished task as soon as that task is done. This actually returns another std::future for the continuation, and under the current C++11 standard its destructor would block, but there is another proposal to make the destructor non-blocking. So with the above code, you create a fire-and-forget task that prints the answer as soon as it's calculated.
The blocking equivalent would be:
auto f = std::async(calculate_answer, the_question_of_everything);
std::cout << f.get() << "\n";
This code will block in f.get() until the answer becomes available.

Related

Scheduling a coroutine with a context

There are plenty of tutorials that explain how easy it is to use coroutines in C++, but I've spent a lot of time trying to figure out how to schedule "detached" coroutines.
Assume I have the following definition of a coroutine result type:
#include <coroutine>
#include <exception>
struct task {
    struct promise_type {
        auto initial_suspend() const noexcept { return std::suspend_never{}; }
        auto final_suspend() const noexcept { return std::suspend_never{}; }
        void return_void() const noexcept { }
        void unhandled_exception() const { std::terminate(); }
        task get_return_object() const noexcept { return {}; }
    };
};
And there is also a function that runs a "detached" coroutine, i.e. runs it asynchronously.
/// Handler should have overloaded operator() returning task.
template<class Handler>
void schedule_coroutine(Handler &&handler) {
    std::thread([handler = std::forward<Handler>(handler)]() { handler(); }).detach();
}
Obviously, I cannot pass lambdas or any other function object that has state into this function, because once the coroutine is suspended, the lambda passed to std::thread will be destroyed along with all its captured variables.
task coroutine_1() {
    std::vector<object> objects;
    // ...
    schedule_coroutine([objects]() -> task {
        // ...
        co_await something;
        // ...
        co_return;
    });
    // ...
    co_return;
}
int main() {
    // ...
    schedule_coroutine(coroutine_1);
    // ...
}
I think there should be a way to save the handler somehow (preferably near or within the coroutine promise) so that the next time the coroutine is resumed it won't try to access the destroyed object's data. But unfortunately I have no idea how to do it.

I think your problem is a general (and common) misunderstanding of how co_await coroutines are intended to work.
When a function performs co_await <expr>, this (generally) means that the function suspends execution until expr resumes its execution. That is, your function is waiting until some process completes (and typically returns a value). That process, represented by expr, is the one who is supposed to resume the function (generally).
The whole point of this is to make code that executes asynchronously look like synchronous code as much as possible. In synchronous code, you would do something like <expr>.wait(), where wait is a function that waits for the task represented by expr to complete. Instead of "waiting" on it, you "a-wait" or "asynchronously wait" on it. The rest of your function executes asynchronously relative to your caller, based on when expr completes and how it decides to resume your function's execution. In this way, co_await <expr> looks and appears to act very much like <expr>.wait().
Compiler Magic™ then goes in behind the scenes to make it asynchronous.
So the idea of launching a "detached coroutine" doesn't make sense within this framework. The caller of a coroutine function (usually) isn't the one who determines where the coroutine executes; it's the processes the coroutine invokes during its execution that decide that.
Your schedule_coroutine function really ought to be just a regular "execute a function asynchronously" operation. It shouldn't have any particular association with coroutines, nor any expectation that the given functor is or represents some asynchronous task, or that it happens to invoke co_await. The function is just going to create a new thread and execute a function on it.
Just as you would have done pre-C++20.
If your task type represents an asynchronous task, then in proper RAII style, its destructor ought to wait until the task is completed before exiting (this includes any resumptions of coroutines scheduled by that task, throughout the entire execution of said task. The task isn't done until it is entirely done). Therefore, if handler() in your schedule_coroutine call returns a task, then that task will be initialized and immediately destroyed. Since the destructor waits for the asynchronous task to complete, the thread will not die until the task is done. And since the thread's functor is copied/moved from the function object given to the thread constructor, any captures will continue to exist until the thread itself exits.
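A minimal sketch of that "regular" asynchronous launch, assuming the caller keeps the returned std::thread and joins it rather than detaching. run_async and start_processing are hypothetical names; the point is that the callable, including its by-value captures, is moved into the thread and lives until the call returns on that thread.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Plain "run this callable on another thread", with no coroutine-specific
// logic. The callable and all its captures are moved into the thread.
template <class Handler>
std::thread run_async(Handler&& handler) {
    return std::thread([h = std::forward<Handler>(handler)]() mutable { h(); });
}

// Usage shaped like the question: `objects` is captured by value, so the
// thread owns its own copy even after this function's scope ends.
std::thread start_processing(int& sum_out) {
    std::vector<int> objects{1, 2, 3};
    return run_async([objects, &sum_out] {
        for (int x : objects) sum_out += x;
    });
}  // the caller's `objects` is destroyed here; the lambda's copy is not
```

Returning the thread instead of detaching it leaves the decision about when to wait with the caller.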

I hope I got you right, but I think there might be a couple of misconceptions here. First off, you clearly cannot detach a coroutine; that would not make any sense at all. But you can certainly execute asynchronous tasks inside a coroutine, even though in my opinion this defeats its purpose entirely.
But let's take a look at the second block of code you posted. Here you invoke std::async and forward a handler to it. Now, in order to prevent any kind of early destruction you should use std::move instead and pass the handler to the lambda so it will be kept alive for as long as the scope of the lambda function is valid. This should probably already answer your final question as well, because the place where you want this handler to be stored would be the lambda capture itself.
Another thing that bothers me is the usage of std::async. The call returns a std::future whose destructor will block until the lambda has been executed. But this only happens if the launch policy is std::launch::async. The default policy is std::launch::async | std::launch::deferred, which allows the implementation to choose deferred execution; in that case the lambda only runs lazily, when you call .get() or .wait() on the future (meaning: when you actually request the result).
So, in your case and if you really wanted to use coroutines that way, I would suggest to use a std::thread instead and store it for a later join() somewhere pseudo-globally. But again, I don't think you would really want to use coroutines mechanics that way.
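One way to read "store it for a later join() somewhere pseudo-globally" is a small registry that owns every spawned thread and joins them at shutdown. ThreadRegistry is a hypothetical class sketched under that assumption, not an existing library type.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical registry: owns spawned threads so they can be joined
// later instead of being detached.
class ThreadRegistry {
public:
    template <class F>
    void spawn(F&& f) {
        std::lock_guard<std::mutex> lock(mutex_);
        threads_.emplace_back(std::forward<F>(f));
    }

    // Joins everything spawned so far. Threads are moved out under the lock
    // and joined outside it, so running tasks may still call spawn().
    void join_all() {
        std::vector<std::thread> threads;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            threads.swap(threads_);
        }
        for (auto& t : threads) t.join();
    }

    ~ThreadRegistry() { join_all(); }

private:
    std::mutex mutex_;
    std::vector<std::thread> threads_;
};
```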

Your question makes perfect sense; the misunderstanding is that C++20 coroutines are actually generators mistakenly occupying the coroutine header name.
Let me explain how generators work and then answer how to schedule detached coroutine.
How generators work
Your question "Scheduling a detached coroutine" then becomes "How do I schedule a detached generator?", and the answer is: it's not possible, because a special convention transforms a regular function into a generator function.
What is not obvious here is that yielding a value must take place inside the generator function body. When you want to call a helper function that yields a value for you, you can't. Instead, you make the helper function a generator too and then await it instead of just calling it. This effectively chains generators and can feel like writing synchronous code that executes asynchronously.
In JavaScript, the special convention is the async keyword. In Python, it is the yield keyword used instead of return.
C++20 coroutines are a low-level mechanism that allows implementing JavaScript-like async/await.
There is nothing wrong with including this low-level mechanism in the C++ language, except placing it in a header named coroutine.
How to schedule detached coroutine
This question makes sense if you want to have green threads or fibers and you are writing scheduler logic that uses symmetric or asymmetric coroutines to accomplish this.
Now others might ask: why should anyone bother with fibers (not Windows fibers ;)) when you have generators? The answer is that you can encapsulate the concurrency and parallelism logic, meaning the rest of your team isn't required to learn and apply additional mental gymnastics while working on the project.
The result is true asynchronous programming where the rest of the team writes linear code, without callbacks and such, with a simple concept of concurrency (for example, a single spawn() library function), avoiding locks/mutexes and other multithreading complexity.
The beauty of encapsulation shows when all the details are hidden in low-level I/O methods. All context switching, scheduling, etc. happens deep inside I/O classes like Channel, Queue or File.
Everyone involved in async programming should experience working like this. The feeling is intense.
To accomplish this, instead of C++20 coroutines use Boost.Fiber, which includes a scheduler, or Boost.Context, which allows symmetric coroutines. Symmetric coroutines can suspend and switch to any other coroutine, while asymmetric coroutines can only suspend back to the coroutine that resumed them.

Custom creation of QFuture

I've faced quite an odd problem with QtConcurrent, mostly because of strange programming desires, maybe it's just an XY-problem, but...
So, there is my code, which communicates with the database: backend code, actually (on Qt, yes). It has to be fast and handle many requests, so I need a thread pool. Establishing a connection is, I suppose, a well-known time-consuming operation, so I need persistent database connections, which in turn means persistent threads (a QSqlDatabase cannot be moved between threads). It is also quite natural to want asynchronous request handling, which calls for a simple way to pass requests to those persistent threads.
Nothing too complex; let's assume there already exists some boilerplate in a form like...
// That's what I want for now
QFuture<int> res = workers[i]->async(param1, param2);
// OR
// That's what I DO NOT want to get
workers[i]->async(param1, param2, [](QFuture<int> res) { // QFuture to pass exceptions
    // callback here
});
That can be done for sure. Why not std::future? Well, it is much easier to use QFutureWatcher and its signals for notifications that a result is ready. Pure C++ notification solutions are muuuch more complex, and callbacks are also something that has to be dragged through the class hierarchy. Each worker is the interface to a thread with its DB connection, obviously.
Okay, all of that can be written, but... a custom thread pool would mean no QtConcurrent convenience, and there seem to be only risky ways to create a QFuture that a custom worker could return. QThreadPool is of no use, because creating persistent runnables in it would be a whole big story. What's more, the boilerplate I've briefly described is going to be a core part of the project, used in many places, not something to be easily replaced by a hundred hand-made bits of thread management.
In short: if I could construct a QFuture for my results, the problem would be solved.
Could anyone point me to a solution or a workaround? Would be grateful for any bright ideas.
UPD:
@VladimirBershov offered a good modern solution which implements the observer pattern. After some googling I've found a QPromise library. Of course, constructing a custom QFuture is still hacky and can only be done via the undocumented QFutureInterface class, but as far as I can judge, a "promise-like" solution still makes asynchronous calls far neater.
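For comparison, the standard library's version of "constructing a future yourself" is std::promise: the caller gets a std::future immediately, and whoever holds the promise fulfils it later, from any thread, with a value or an exception. This is a plain C++ sketch, not Qt; run_query and start_query are hypothetical, and turning the result into a QFuture would still need QFutureInterface or a library like the ones mentioned.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical producer: fulfils the promise with a value or an exception.
void run_query(std::promise<int> result, int id) {
    try {
        if (id < 0) throw std::invalid_argument("negative id");
        result.set_value(id * 10);  // stand-in for a real DB query
    } catch (...) {
        result.set_exception(std::current_exception());
    }
}

// Creates the future up front and hands the promise to a worker thread,
// which the caller must join.
std::future<int> start_query(int id, std::thread& worker) {
    std::promise<int> promise;
    std::future<int> future = promise.get_future();
    worker = std::thread(run_query, std::move(promise), id);
    return future;
}
```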

You can use AsyncFuture library as a custom QFuture creation tool or ideas source:
AsyncFuture - Use QFuture like a Promise object
QFuture is used together with QtConcurrent to represent the result of an asynchronous computation. It is a powerful component for multi-thread programming. But its usage is limited to the result of threads, it doesn't work with the asynchronous signal emitted by QObject. And it is a bit trouble to setup the listener function via QFutureWatcher.
AsyncFuture is designed to enhance the function to offer a better way to use it for asynchronous programming. It provides a Promise object like interface. This project is inspired by AsynQt and RxCpp.
Features:
Convert a signal from QObject into a QFuture object
Combine multiple futures with different type into a single future object
Use Future like a Promise object
Chainable Callback - Advanced multi-threading programming model
Convert a signal from QObject into a QFuture object:
#include "asyncfuture.h"
using namespace AsyncFuture;
// Convert a signal from QObject into a QFuture object
QFuture<void> future = observe(timer, &QTimer::timeout).future();
/* Listen from the future without using QFutureWatcher<T>*/
observe(future).subscribe([]() {
    // onCompleted. It is invoked when the observed future is finished successfully
    qDebug() << "onCompleted";
}, []() {
    // onCanceled
    qDebug() << "onCancel";
});

My idea is to use thread pools with maximum 1 thread available for each.
QThreadPool* persistentThread = new QThreadPool; // no need to write custom thread pool
persistentThread->setMaxThreadCount(1);
persistentThread->setExpiryTimeout(-1);
and then
QFuture<int> future_1 = QtConcurrent::run(persistentThread, func_1);
QFuture<int> future_2 = QtConcurrent::run(persistentThread, func_2);
func_2 will be executed after func_1 in the same single "persistent" thread.
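The same "pool with exactly one thread" idea can be sketched in standard C++ for readers not on Qt: one persistent thread drains a FIFO queue, and each submission returns a std::future. PersistentWorker and its async() member are hypothetical names chosen to echo the question's workers[i]->async(...); this is an assumption-laden sketch, not QtConcurrent's behaviour or API.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical analogue of a QThreadPool with maxThreadCount == 1: tasks run
// in submission order on one long-lived thread (e.g. the one owning a DB
// connection), and each submission yields a std::future.
class PersistentWorker {
public:
    PersistentWorker() : thread_([this] { loop(); }) {}
    ~PersistentWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        thread_.join();  // remaining queued tasks are run before the thread exits
    }

    template <class F>
    auto async(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only; std::function needs a copyable target.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // exceptions end up in the task's future
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last: starts after the members above exist
};
```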

Consume a std::future by connecting a QObject

I have some existing code that uses std::future/std::promise that I'd like to integrate with a Qt GUI cleanly.
Ideally, one could just:
std::future<int> future{do_something()};
connect(future, this, &MyObject::resultOfFuture);
and then implement resultOfFuture as a slot that gets one argument: the int value that came out of the std::future<int>. I've added this suggestion as a comment on QTBUG-50676. I like this best because most of my future/promises are not concurrent anyway, so I'd like to avoid firing up a thread just to wait on them. Also, type inference could then work between the future and the slot's parameter.
But it seems to me that this shouldn't be hard to implement using a wrapper Qt object (e.g., a version of QFutureWatcher that takes a std::future<int>). The two issues with a wrapper are:
the wrapper will have to be concrete in its result type.
the watcher would have to be concurrent in a thread?
Is there a best-practice to implement this sort of connection? Is there another way that can hook into the Qt main loop and avoid thread creation?

std::future is missing continuations. The only way to turn the result of a std::future asynchronously into a function call delivering the result is to launch a thread watching it, and if you want to avoid busy-waiting you need one such thread per std::future, as there is no way to lazy-wait on multiple futures at once.
There are plans to create a future with continuations (a then operation), but they are not in C++ as of C++17, let alone C++11.
You could write your own system of future/promise that mimics the interface of std::future and std::promise that does support continuations, or find a library that already did that.
A busy-wait solution that regularly checked if the future was ready could avoid launching a new thread.
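A polling approach can be sketched with std::future::wait_for and a zero timeout, which checks readiness without blocking. FuturePoller is a hypothetical class; the assumption is that poll() would be called periodically from the GUI thread (for example from a QTimer slot, wiring not shown), so the callbacks run there and no watcher thread is needed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper: holds futures plus callbacks and fires the callbacks
// for the ones that have become ready. Note that a deferred future
// (std::launch::deferred) never reports ready and would never fire.
template <class T>
class FuturePoller {
public:
    void watch(std::future<T> fut, std::function<void(std::future<T>)> on_ready) {
        pending_.push_back({std::move(fut), std::move(on_ready)});
    }

    // Call periodically; returns how many callbacks ran.
    int poll() {
        std::vector<Entry> ready;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->fut.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
                ready.push_back(std::move(*it));
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        // Callbacks run after the scan, so they may safely call watch().
        for (auto& e : ready) e.on_ready(std::move(e.fut));
        return static_cast<int>(ready.size());
    }

    bool empty() const { return pending_.empty(); }

private:
    struct Entry {
        std::future<T> fut;
        std::function<void(std::future<T>)> on_ready;
    };
    std::vector<Entry> pending_;
};
```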
In any case, std::experimental::future::then (from the Concurrency TS) would make your problem trivial.
future.then( [some_state](auto future){
    try {
        auto x = future.get();
        // send message with x
    } catch( ... ) {
        // deal with exception
    }
} );
You can write your own std::experimental::future or find an implementation to use, but this functionality cannot be provided for a std::future without using an extra thread.

Why should I use std::async?

I'm trying to explore all the options of the new C++11 standard in depth. While using std::async and reading its definition, I noticed 2 things, at least under Linux with gcc 4.8.1:
it's called async, but it has a really "sequential behaviour": basically, on the line where you call the future associated with your async function foo, the program blocks until the execution of foo is completed.
it depends on the exact same external library as other, better, non-blocking solutions, namely pthread; if you want to use std::async you need pthread.
At this point it's natural for me to ask why one would choose std::async over even a simple set of functors. It's a solution that doesn't scale at all: the more futures you call, the less responsive your program will be.
Am I missing something? Can you show an example that is guaranteed to be executed in an async, non-blocking way?

it's called async, but it got a really "sequential behaviour",
No, if you use the std::launch::async policy then it runs asynchronously in a new thread. If you don't specify a policy it might run in a new thread.
basically in the row where you call the future associated with your async function foo, the program blocks until the execution of foo it's completed.
It only blocks if foo hasn't completed, but if it was run asynchronously (e.g. because you use the std::launch::async policy) it might have completed before you need it.
it depends on the exact same external library as others, and better, non-blocking solutions, which means pthread, if you want to use std::async you need pthread.
Wrong, it doesn't have to be implemented using Pthreads (and on Windows it isn't, it uses the ConcRT features.)
at this point it's natural for me asking why choosing std::async over even a simple set of functors ?
Because it guarantees thread-safety and propagates exceptions across threads. Can you do that with a simple set of functors?
It's a solution that doesn't even scale at all, the more future you call, the less responsive your program will be.
Not necessarily. If you don't specify the launch policy then a smart implementation can decide whether to start a new thread, or return a deferred function, or return something that decides later, when more resources may be available.
Now, it's true that with GCC's implementation, if you don't provide a launch policy then with current releases it will never run in a new thread (there's a bugzilla report for that) but that's a property of that implementation, not of std::async in general. You should not confuse the specification in the standard with a particular implementation. Reading the implementation of one standard library is a poor way to learn about C++11.
Can you show an example that is granted to be executed in an async, non blocking, way ?
This shouldn't block:
auto fut = std::async(std::launch::async, doSomethingThatTakesTenSeconds);
auto result1 = doSomethingThatTakesTwentySeconds();
auto result2 = fut.get();
By specifying the launch policy you force asynchronous execution, and if you do other work while it's executing then the result will be ready when you need it.

If you need the result of an asynchronous operation, then you have to block, no matter what library you use. The idea is that you get to choose when to block, and, hopefully when you do that, you block for a negligible time because all the work has already been done.
Note also that std::async can be launched with policies std::launch::async or std::launch::deferred. If you don't specify it, the implementation is allowed to choose, and it could well choose to use deferred evaluation, which would result in all the work being done when you attempt to get the result from the future, resulting in a longer block. So if you want to make sure that the work is done asynchronously, use std::launch::async.
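The difference between the two policies is easy to observe by asking which thread actually ran the task. runner_id is a hypothetical helper for this sketch: a deferred task runs inside get() on the calling thread, while std::launch::async runs it on a different thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Returns the id of the thread that executed the task under `policy`.
std::thread::id runner_id(std::launch policy) {
    return std::async(policy, [] { return std::this_thread::get_id(); }).get();
}
```

With the default policy (std::launch::async | std::launch::deferred) either outcome is allowed, which is why the answer recommends spelling out std::launch::async when you need real concurrency.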

I think your problem is with std::future saying that it blocks on get. It only blocks if the result isn't already ready.
If you can arrange for the result to be already ready, this isn't a problem.
There are many ways to know that the result is already ready. You can poll the future and ask it (relatively simple), you could use locks or atomic data to relay the fact that it is ready, you could build up a framework to deliver "finished" future items into a queue that consumers can interact with, you could use signals of some kind (which is just blocking on multiple things at once, or polling).
Or, you could finish all the work you can do locally, and then block on the remote work.
As an example, imagine a parallel recursive merge sort. It splits the array into two chunks, then does an async sort on one chunk while sorting the other chunk. Once it is done sorting its half, the originating thread cannot progress until the second task is finished. So it does a .get() and blocks. Once both halves have been sorted, it can then do a merge (in theory, the merge can be done at least partially in parallel as well).
This task behaves like a linear task to those interacting with it on the outside -- when it is done, the array is sorted.
We can then wrap this in a std::async task and have a future sorted array. If we want, we could add a signaling procedure to let us know that the future is finished, but that only makes sense if we have a thread waiting on the signals.
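The parallel recursive merge sort described above can be sketched as follows. The depth limit is an assumption added to cap the number of threads; each level sorts one half locally, blocks on the other half with get(), and then merges with std::inplace_merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Sorts v[lo, hi). While depth > 0, the left half is sorted by an async task
// and the right half by the current thread; the two ranges never overlap.
void parallel_merge_sort(std::vector<int>& v, std::size_t lo, std::size_t hi, int depth) {
    if (hi - lo < 2) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    if (depth > 0) {
        auto left = std::async(std::launch::async, parallel_merge_sort,
                               std::ref(v), lo, mid, depth - 1);
        parallel_merge_sort(v, mid, hi, depth - 1);  // do the local work first
        left.get();  // then block on the other half (rethrows its exceptions)
    } else {
        parallel_merge_sort(v, lo, mid, 0);
        parallel_merge_sort(v, mid, hi, 0);
    }
    auto at = [&v](std::size_t i) { return v.begin() + static_cast<std::ptrdiff_t>(i); };
    std::inplace_merge(at(lo), at(mid), at(hi));
}

// Looks like a linear task to callers: when it returns, v is sorted.
void parallel_sort(std::vector<int>& v) {
    parallel_merge_sort(v, 0, v.size(), 2);  // depth 2: at most 3 extra tasks
}
```

Wrapping parallel_sort itself in std::async then gives the "future sorted array" from the last paragraph.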

In the reference: http://en.cppreference.com/w/cpp/thread/async
If the async flag is set (i.e. policy & std::launch::async != 0), then async executes the function f on a separate thread of execution as if spawned by std::thread(f, args...), except that if the function f returns a value or throws an exception, it is stored in the shared state accessible through the std::future that async returns to the caller.
It is a nice property to keep a record of exceptions thrown.
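To illustrate that property: an exception thrown inside the asynchronous task is stored in the shared state and rethrown by get() on the calling thread. caught_message is a hypothetical helper for this sketch.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Runs a throwing task on another thread; get() rethrows the exception here.
std::string caught_message() {
    auto fut = std::async(std::launch::async, []() -> int {
        throw std::runtime_error("failed in worker");
    });
    try {
        fut.get();
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "no exception";
}
```

With a raw std::thread, the same exception would escape the thread function and call std::terminate.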

http://www.cplusplus.com/reference/future/async/
there are three types of policy:
launch::async
launch::deferred
launch::async|launch::deferred
by default launch::async|launch::deferred is passed to std::async.

Does async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?

It is loosely related to this question: Are std::thread pooled in C++11?. Though the question differs, the intention is the same:
Question 1: Does it still make sense to use your own (or 3rd-party library) thread pools to avoid expensive thread creation?
The conclusion in the other question was that you cannot rely on std::thread to be pooled (it might or it might be not). However, std::async(launch::async) seems to have a much higher chance to be pooled.
I don't think that it is required by the standard, but IMHO I would expect all good C++11 implementations to use thread pooling if thread creation is slow. Only on platforms where it is inexpensive to create a new thread would I expect them to always spawn a new thread.
Question 2: This is just what I think, but I have no facts to prove it. I may very well be mistaken. Is it an educated guess?
Finally, here I have provided some sample code that first shows how I think thread creation can be expressed by async(launch::async):
Example 1:
thread t([]{ f(); });
// ...
t.join();
becomes
auto future = async(launch::async, []{ f(); });
// ...
future.wait();
Example 2: Fire and forget thread
thread([]{ f(); }).detach();
becomes
// a bit clumsy...
auto dummy = async(launch::async, []{ f(); });
// ... but I hope soon it can be simplified to
async(launch::async, []{ f(); });
Question 3: Would you prefer the async versions to the thread versions?
The rest is no longer part of the question, but only for clarification:
Why must the return value be assigned to a dummy variable?
Unfortunately, the current C++11 standard forces you to capture the return value of std::async, as otherwise the destructor of the returned temporary is executed immediately, which blocks until the action terminates. Some consider this an error in the standard (e.g., Herb Sutter).
This example from cppreference.com illustrates it nicely:
{
    std::async(std::launch::async, []{ f(); });
    std::async(std::launch::async, []{ g(); }); // does not run until f() completes
}
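The blocking destructor can be checked directly. In this sketch (discarded_async_blocks is a hypothetical name), the future returned by std::async is a temporary, so its destructor runs at the end of the statement and waits for the task; by the next line the task is guaranteed to have finished.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns true if the discarded task had finished by the time the
// statement that launched it completed.
bool discarded_async_blocks() {
    std::atomic<bool> done{false};
    // The temporary future is destroyed at the end of this full-expression
    // and waits for the task. The (void) cast only silences [[nodiscard]]
    // warnings in newer standard libraries; it does not change the behaviour.
    (void)std::async(std::launch::async, [&done] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        done = true;
    });
    return done.load();
}
```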
Another clarification:
I know that thread pools may have other legitimate uses but in this question I am only interested in the aspect of avoiding expensive thread creation costs.
I think there are still situations where thread pools are very useful, especially if you need more control over resources.
For example, a server might decide to handle only a fixed number of requests simultaneously to guarantee fast response times and to increase the predictability of memory usage. Thread pools should be fine, here.
Thread-local variables may also be an argument for your own thread pools, but I'm not sure whether it is relevant in practice:
Creating a new thread with std::thread starts without initialized thread-local variables. Maybe this is not what you want.
In threads spawned by async, it is somewhat unclear to me, because the thread could have been reused. From my understanding, thread-local variables are not guaranteed to be reset, but I may be mistaken.
Using your own (fixed-size) thread pools, on the other hand, gives you full control if you really need it.
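The std::thread half of this point can be demonstrated: every new std::thread starts with freshly initialized thread_local variables. The sketch below covers only std::thread (the std::async case is the open question above); requests_handled and handle_on_new_thread are hypothetical names.

```cpp
#include <cassert>
#include <thread>

thread_local int requests_handled = 0;  // per-thread state, e.g. a cached connection

// Increments the counter on a brand-new thread and reports what it saw.
int handle_on_new_thread() {
    int seen = 0;
    std::thread([&seen] { seen = ++requests_handled; }).join();
    return seen;
}
```

Each call reports 1, because each new thread starts from 0. A fixed-size pool would instead let the counter accumulate per worker thread, which is the kind of control the previous paragraph refers to.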

Question 1:
I changed this from the original because the original was wrong. I was under the impression that Linux thread creation was very cheap and after testing I determined that the overhead of function call in a new thread vs. a normal one is enormous. The overhead for creating a thread to handle a function call is something like 10000 or more times slower than a plain function call. So, if you're issuing a lot of small function calls, a thread pool might be a good idea.
It's quite apparent that the standard C++ library that ships with g++ doesn't have thread pools. But I can definitely see a case for them. Even with the overhead of having to shove the call through some kind of inter-thread queue, it would likely be cheaper than starting up a new thread. And the standard allows this.
IMHO, the Linux kernel people should work on making thread creation cheaper than it currently is. But the standard C++ library should also consider using a pool to implement launch::async | launch::deferred.
And the OP is correct, using ::std::thread to launch a thread of course forces the creation of a new thread instead of using one from a pool. So ::std::async(::std::launch::async, ...) is preferred.
Question 2:
Yes, basically this 'implicitly' launches a thread. But really, it's still quite obvious what's happening. So I don't really think the word implicitly is a particularly good word.
I'm also not convinced that forcing you to wait for a return before destruction is necessarily an error. I don't know that you should be using the async call to create 'daemon' threads that aren't expected to return. And if they are expected to return, it's not OK to be ignoring exceptions.
Question 3:
Personally, I like thread launches to be explicit. I place a lot of value on islands where you can guarantee serial access. Otherwise you end up with mutable state that you always have to be wrapping a mutex around somewhere and remembering to use it.
I liked the work queue model a whole lot better than the 'future' model because there are 'islands of serial' lying around so you can more effectively handle mutable state.
But really, it depends on exactly what you're doing.
Performance Test
So, I tested the performance of various methods of calling things and came up with these numbers on an 8 core (AMD Ryzen 7 2700X) system running Fedora 29 compiled with clang version 7.0.1 and libc++ (not libstdc++):
Do nothing calls per second: 35365257
Empty calls per second: 35210682
New thread calls per second: 62356
Async launch calls per second: 68869
Worker thread calls per second: 970415
And native, on my MacBook Pro 15" (Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz) with Apple LLVM version 10.0.0 (clang-1000.10.44.4) under OSX 10.13.6, I get this:
Do nothing calls per second: 22078079
Empty calls per second: 21847547
New thread calls per second: 43326
Async launch calls per second: 58684
Worker thread calls per second: 2053775
For the worker thread, I started up a thread, then used a lockless queue to send requests to another thread and then wait for a "It's done" reply to be sent back.
The "Do nothing" is just to test the overhead of the test harness.
It's clear that the overhead of launching a thread is enormous. And even the worker thread with the inter-thread queue slows things down by a factor of 20 or so on Fedora 25 in a VM, and by about 8 on native OS X.
I created an OSDN chamber holding the code I used for the performance test. It can be found here: https://osdn.net/users/omnifarious/pf/launch_thread_performance/
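For readers who want to reproduce the rough shape of these numbers, here is a much smaller, independent harness. It is not the code behind the link and omits the worker-thread/lockless-queue variant; the function names are hypothetical, and absolute results will depend on the machine, compiler and standard library.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Times `iterations` calls of f and returns calls per second.
template <class F>
double calls_per_second(F f, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) f();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return iterations / elapsed.count();
}

volatile int sink = 0;  // keeps the "empty call" from being optimized away
void empty_call() { sink = sink + 1; }

double empty_call_rate(int n) { return calls_per_second(empty_call, n); }

double new_thread_rate(int n) {
    return calls_per_second([] { std::thread(empty_call).join(); }, n);
}

double async_launch_rate(int n) {
    return calls_per_second([] { std::async(std::launch::async, empty_call).get(); }, n);
}
```

Each thread or async call is joined before the next starts, so the numbers measure launch-and-wait latency, which matches the "calls per second" framing above.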
