Problems using C++ coroutines - c++

I'm experimenting with the upcoming C++ coroutines using liburing.
I'm a JavaScript developer trying to implement a function like Promise.all.
My first attempt:
template <typename T>
task<std::vector<T>> taskAll1(std::vector<task<T>> list) {
    std::vector<T> result;
    result.reserve(list.size());
    for (auto&& t : list) {
        result.push_back(co_await t);
    }
    co_return result;
}
task is modified from gor_task. I mainly made task::promise_type::initial_suspend return suspend_never so that coroutines start eagerly by default.
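For reference, the eager-start tweak amounts to something like this (a minimal sketch assuming a gor_task-like promise_type, not the actual gor_task code):

struct promise_type {
    std::suspend_never initial_suspend() noexcept { return {}; }  // start eagerly instead of lazily
    std::suspend_always final_suspend() noexcept { return {}; }   // keep the frame alive so the awaiter can fetch the result
    // ... get_return_object, return_value, unhandled_exception ...
};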
The usage:
// async_delay: https://github.com/CarterLi/liburing-http-demo/blob/12af992d0d87a721bbe67bb67aee8e4b0e965114/async_coro.hpp#L68
// It waits some seconds asynchronously and then prints the number of seconds it waits.
task<bool> start() {
    // TODO: less verbose code?
    std::vector<task<int>> vec;
    vec.emplace_back(async_delay(1));
    vec.emplace_back(async_delay(2));
    vec.emplace_back(async_delay(3));
    co_await taskAll<int>(std::move(vec));
    co_return true;
}
The code works as expected. But if I rearrange the order of the emplace_backs, for example so that async_delay(2) comes before async_delay(1), the code still runs but freezes at the end of the program. I know it is because the task async_delay(1) is resolved before it's awaited, but I have no idea how to fix it.
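One common fix for this (a sketch of the general technique, not the actual gor_task code) is to make the task's awaiter observe completion, so that awaiting an already-resolved task resumes immediately instead of suspending forever; assuming the task stores its std::coroutine_handle as coro_:

bool await_ready() const noexcept {
    return !coro_ || coro_.done(); // already resolved: don't suspend at all
}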
The second attempt: what I want to do is translate the following JS code into C++.
/** @param {Promise<any>[]} list */
function taskAll(list) {
    return new Promise(resolve => {
        const result = new Array(list.length);
        let left = list.length;
        list.forEach((p, i) => {
            p.then(x => {
                result[i] = x;
                left--;
                if (!left) resolve(result);
            });
        });
    });
}
The C++ code:
// completion: https://github.com/CarterLi/liburing-http-demo/blob/12af992d0d87a721bbe67bb67aee8e4b0e965114/task.hpp#L78
// It's like JS Promise and can be awaited directly without calling an async function
template <typename T>
task<std::vector<T>> taskAll(std::vector<task<T>> list) {
    std::vector<T> result(list.size());
    size_t left = list.size();
    completion<std::vector<T>> promise;
    for (size_t i = 0; i < list.size(); ++i) {
        [&, i]() mutable -> task<bool> {
            result[i] = co_await list[i];
            left--;
            if (!left) promise.resolve(std::move(result));
            co_return true;
        }();
    }
    co_await promise;
    co_return result;
}
The code crashes immediately. Async code is really hard to debug and I gave up.
Full code can be found on GitHub; please help.

For the 2nd attempt, I found that I forgot to assign the task returned by the lambda expression, which made the task destruct immediately after the lambda function returned.
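With that fix, a repaired taskAll might look like this (a sketch using the question's types; it assumes completion's awaiter yields the vector passed to resolve):

template <typename T>
task<std::vector<T>> taskAll(std::vector<task<T>> list) {
    std::vector<T> result(list.size());
    size_t left = list.size();
    completion<std::vector<T>> promise;
    std::vector<task<bool>> holders; // keep the helper tasks alive until they complete
    holders.reserve(list.size());
    for (size_t i = 0; i < list.size(); ++i) {
        holders.push_back([&, i]() mutable -> task<bool> {
            result[i] = co_await list[i];
            if (--left == 0) promise.resolve(std::move(result));
            co_return true;
        }());
    }
    co_return co_await promise; // yields the vector moved into resolve()
}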

Related

Why does the below program abort when a std::vector<std::future<T>> is used?

I wanted to perform hashing of a stream of input messages using multithreading, so I was trying to use
std::vector<std::future<HashData>> futures;
but the program aborts from abort.h when debugging in Visual Studio 2019.
Code Snippet:
std::vector<std::future<HashData>> futures;
std::vector<std::string> messages;
for (int i = 0; i < messages.size(); i++)
{
    std::promise<HashData> promiseHashData;
    std::future<HashData> futureHashData = promiseHashData.get_future();
    futures.emplace_back(std::move(futureHashData));
    std::async(std::launch::async, [&]() { PerformHash(std::move(promiseHashData), messages[i]); });
}
std::vector<HashData> vectorOfHashData;
// wait for all async tasks to complete
for (auto& futureObj : futures)
{
    vectorOfHashData.push_back(futureObj.get());
}

void PerformHash(std::promise<HashData>&& promObject, std::string& message)
{
    ComputeHashUsingSHA256(message);
    HashData data;
    // set data for HashData object
    data.i = i;
    data.blocks = blocks;
    data.blocksize = blocksize;
    data.blockbufs = blockbufs;
    data.secs = secs;
    memcpy(data.digest, digest, SHA256_DIGEST_SIZE);
    data.has_hashdata = has_hashdata;
    memcpy(data.hashdata_buf, hashdata_buf, c_hashsize);
    promObject.set_value(data);
}
While debugging the code, I observed that only a few threads were created using async, and after that the program aborts from abort.h, as shown in this image.
The problem is that you capture promiseHashData by reference. It is destroyed at the end of each loop iteration while the async thread is still performing computation on it.
You need to capture the instance of the promise by moving it into the lambda, like:
std::async(std::launch::async, [promiseHashData2 = std::move(promiseHashData), &messages, i]() mutable { PerformHash(std::move(promiseHashData2), messages[i]); });
Or use std::async's feature of returning a std::future while changing PerformHash to return HashData. Using both async and promise is redundant.
To build on the good answer from @ALX23z and answer your comments there:
The reason you get that error is that PerformHash (and your lambda) returns void. The return value from std::async is std::future<X>, where X is the return value of the function you give std::async. Here is a small toy example:
#include <future>
#include <iostream>
#include <string>
#include <vector>

struct HashData { std::size_t h; };

HashData performHash(const std::string& msg) // <- returns HashData
{
    HashData hd = {msg.size()};
    return hd;
}

int main()
{
    std::vector<std::string> messages = {"Bla", "klaf", "this is a message"};
    std::vector<std::future<HashData>> futures;
    for (const auto& msg : messages)
    {
        auto fut = std::async(std::launch::async, [&]()
                              { return performHash(msg); }); // <- also returns HashData
        futures.emplace_back(std::move(fut));
    }
    std::vector<HashData> results;
    for (auto& fut : futures)
        results.push_back(fut.get());
    for (const auto& hash : results)
        std::cout << hash.h << '\n';
}
Also note that you can skip the lambda, and call std::async like this:
auto fut = std::async(std::launch::async, performHash, msg); // performHash is a free function
// If performHash is a method of class HashCalculator - included for completeness sake
HashCalculator calc; // Need an instance somewhere
auto fut = std::async(std::launch::async, &HashCalculator::performHash, calc, msg);
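A related detail (standard std::async behavior, not specific to this example): std::async decay-copies its arguments, so calc and msg above are copied into the task. Wrap them to pass by reference instead, assuming performHash takes its message by const reference:

// Avoid copying the calculator/message into the async task:
auto fut = std::async(std::launch::async, &HashCalculator::performHash, std::ref(calc), std::cref(msg));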

How to use std future and async with threading in a for loop with a shared resource as param?

I got stuck with an implementation problem in my threading practice project. I know what I want to achieve but I don't know how. I am new to the topic of std::future and std::async, so I am not even sure if the implementation is possible in the way I imagined.
I have a NuclearPP class which has this function (it should be straightforward):
const error_codes::PPErrorCode& NuclearPowerPlant::Operation(const float& power_required, float& power_generated)
{
    error_codes::PPErrorCode power_plant_error_code = error_codes::success;
    float energy_required_per_core = power_required / (float)operating_generators_;
    for (const auto& core_it : reactor_cores_)
    {
        // Threaded call of each core's GenerateEnergy(energy_required_per_core, power_generated)
    }
    // Error handling per core with param power_plant_error_code
    return power_plant_error_code;
}
I also have a NPPCore class with a function which generates the energy:
const error_codes::NPPCoreErrorCode& GenerateEnergy(const float& energy_required_per_core, float& produced_energy)
{
    // Complicated stuff which adds the generated energy to the shared resource "produced_energy" received as a param
}
My question is: how can I start a thread for every core_it->GenerateEnergy(energy_required_per_core, power_generated)?
Thank you very much in advance. If you need any more information feel free to ask.
Br,
Steve
First - define what information each thread shall provide.
In this case - it is probably something like this:
struct Result
{
    error_codes::NPPCoreErrorCode error;
    float produced_energy;
};
So your future type is std::future<Result>.
Then start your work in many threads:
std::vector<std::future<Result>> results;
for (const auto& core_it : reactor_cores_)
{
    auto action = [&]{
        Result res;
        res.error = core_it.GenerateEnergy(energy_required_per_core, res.produced_energy);
        return res;
    };
    // start thread
    results.emplace_back(std::async(std::launch::async, action));
}
Then wait for each thread to finish:
for (auto& f : results) f.wait();
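(Strictly speaking this wait() pass is optional: the get() calls in the loop below also block until each result is ready.)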
Then, I guess, you want to sum up:
for (auto& f : results) {
    Result res = f.get();
    if (res.error == error_codes::success)
        power_generated += res.produced_energy;
    else {
        power_plant_error_code = res.error;
        // depending on your error strategy, you might break here
        break;
    }
}
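Assembled into the Operation function from the question, it might look like this (a sketch under a couple of assumptions: reactor_cores_ holds NPPCore objects, and the error code is returned by value, since returning a reference to a local, as in the question's signature, would dangle):

error_codes::PPErrorCode NuclearPowerPlant::Operation(const float& power_required, float& power_generated)
{
    error_codes::PPErrorCode power_plant_error_code = error_codes::success;
    const float energy_required_per_core = power_required / (float)operating_generators_;

    std::vector<std::future<Result>> results;
    for (auto& core_it : reactor_cores_)
    {
        // each core runs in its own thread and reports into its own Result
        results.emplace_back(std::async(std::launch::async, [&core_it, energy_required_per_core] {
            Result res{};
            res.error = core_it.GenerateEnergy(energy_required_per_core, res.produced_energy);
            return res;
        }));
    }

    for (auto& f : results)
    {
        Result res = f.get();
        if (res.error == error_codes::success)
            power_generated += res.produced_energy;
        else
        {
            power_plant_error_code = res.error;
            break; // or collect every core's error, depending on your strategy
        }
    }
    return power_plant_error_code;
}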
Read more here.

Invoke a callback in a boost asio GUI loop exactly once per frame

The following problem originates from https://github.com/cycfi/elements/issues/144, which is me struggling to find a way in the elements GUI library to invoke a callback once per frame.
So far, in every library I have seen, there is some callback/explicit loop that continuously processes user input, measures time since the last frame, and performs the render.
In the elements library, such a loop is a platform-specific implementation detail; instead, the library-using code is given access to a boost::asio::io_context object to which any callable can be posted. poll is invoked inside the platform-specific event loop.
I had no problems changing code from the typical waterfall update(time_since_last_frame) to posting functors that do it; however, this is where the real problem begins:
Posted functors are only invoked once. The answer from the library author is "just post again".
If I post again immediately from the functor, I create an endless busy loop because as soon as one functor from the poll is completed, boost asio runs the newly posted one. This completely freezes the thread that runs the GUI because of an infinite self-reposting callback loop. The answer from the library author is "post with a timer".
If I post with a timer, I don't fix anything:
If the time is too small, it runs out before the callback finishes, so the newly posted callback copy is invoked again ... which brings back the infinite loop.
If the time is too large to cause an infinite loop, but small enough to fit multiple times within one frame, it is run multiple times per frame ... which is a waste because there is no point in calculating UI/animation/input state multiple times per frame.
If the time is too large, the callback is not invoked on each frame. The application renders multiple times without processing user-generated events ... which is a waste because identical state is rendered multiple times for each logic update.
There is no way to calculate FPS because library-using code does not even know how many frames have been rendered between posted callbacks (if any).
In other words:
In a typical update+input+render loop the loop runs as fast as possible, yielding as many frames as it can (or to a specified cap thanks to sleeps). If the code is slow, it's just FPS loss.
In elements library, if the callback is too fast it is repeated multiple times per frame because registered timer may finish multiple times within one frame. If the code is too slow, it's a "deadlock" callback loop that never gets out of asio's poll.
I do not want my code to be invoked every X time (or more-than-X because of the OS scheduler). I want my code to be invoked once per frame (preferably with a time-delta argument, but I can also measure that myself from the previous invocation).
Is such usage of asio in the elements library a bad design? I find the "post with a timer" solution to be an antipattern. It feels to me like fixing a deadlock between 2 threads by adding a sleep in one of them and hoping they will never collide after such a change - in the case of elements, I'm posting a timed callback and hoping it's not too fast (wasting CPU) but also not too slow (causing an infinite timed-callback loop). The ideal time is too hard to calculate because of the many factors that can affect it, including user actions - basically a lose-lose situation.
Extra note 1: I have tried defer instead of poll, no difference.
Extra note 2: I have already created 100+ issues/PRs for the library so it's very likely that a motivating answer will end in another PR. In other words, solutions that attempt to modify library are fine too.
Extra note 3: MCVE (here without a timer, which causes an almost-infinite loop until the counter finishes; during the counting the GUI thread is frozen):
#include <elements.hpp>
using namespace cycfi::elements;

bool func()
{
    static int x = 0;
    if (++x == 10'000'000)
        return true;
    return false;
}

void post_func(view& v)
{
    if (!func())
        v.post([&v](){ post_func(v); });
}

int main(int argc, char* argv[])
{
    app _app(argc, argv);
    window _win(_app.name());
    _win.on_close = [&_app]() { _app.stop(); };
    view view_(_win);
    view_.content(box(rgba(35, 35, 37, 255)));
    view_.post([&view_](){ post_func(view_); });
    _app.run();
    return 0;
}
So, I finally found time to look at this.
In the back-end it seems that Elements already integrates with Asio. Therefore, when you post tasks to the view, they become async tasks.
You can give them a delay, so you don't have to busy-loop.
Let's do a demo
Defining A Task
Let's define a task that has fake progress and a fixed deadline for completion:
#include <utility>
#include <chrono>
using namespace std::chrono_literals;
auto now = std::chrono::high_resolution_clock::now;

struct Task {
    static constexpr auto deadline = 2.0s;
    std::chrono::high_resolution_clock::time_point _start = now();
    bool _done = false;
    void reset() { *this = {}; }
    auto elapsed() const { return now() - _start; } // fake progress
    // returns the previous flag, so the reporter still renders one final
    // update after the deadline passes before it sees done() == true
    auto done() { return std::exchange(_done, elapsed() > deadline); }
};
How To Self-Chain?
As you noticed, this is tricky. You can stoop and just type-erase your handler:
std::function<void()> cheat;
cheat = [&cheat]() {
    // do something
    cheat(); // self-chain
};
However, just to humor you, let me introduce what functional programming calls the Y combinator.
#include <functional>
template<class Fun> struct ycombi {
    Fun fun_;
    explicit ycombi(Fun fun) : fun_(std::move(fun)) {}
    template<class... Args> void operator()(Args&&... args) const {
        return fun_(*this, std::forward<Args>(args)...);
    }
};
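To make the combinator less abstract, here is a standalone illustration (my own toy example, independent of Elements):

#include <iostream>
// assumes the ycombi definition from above
int main() {
    int n = 3;
    auto countdown = ycombi{[&n](auto self) {
        if (n > 0) {
            std::cout << n-- << '\n';
            self(); // self-chain without ever naming the lambda
        }
    }};
    countdown(); // prints 3, 2, 1
}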
With that, we can create a generic handler posting chainer:
auto chain = [&view_](auto f) {
    return ycombi{ [=, &view_](auto self) {
        view_.post(10ms, [=] {
            if (f())
                self();
        });
    } };
};
I opted for a 10ms delay, but you don't have to. No delay means "asap", which would amount to every frame, given the resources.
A Reporter Task
Let's update a progress-bar:
auto prog_bar = share(progress_bar(rbox(colors::black), rbox(pgold)));
auto make_reporter = [=, &view_](Task& t) {
    static int s_reporter_id = 1;
    return [=, id = s_reporter_id++, &t, &view_] {
        std::clog << "reporter " << id << " task at " << (t.elapsed() / 1.0ms) << "ms " << std::endl;
        prog_bar->value(t.elapsed() / Task::deadline);
        view_.refresh(*prog_bar);
        if (t.done()) {
            std::clog << "done" << std::endl;
            return false;
        }
        return true;
    };
};
Now, let's add a button to start updating the progress bar.
auto task_btn = button("Task #1");
task_btn.on_click = [=, &task1](bool) {
    if (task1.done())
        task1.reset();
    auto progress = chain(make_reporter(task1));
    progress();
};
Let's put the button and the bar in the view and run the app:
view_.content(task_btn, prog_bar);
view_.scale(8);
_app.run();
Full Listing
Used current Elements master (a7d1348ae81f7c)
File test.cpp
#include <utility>
#include <chrono>
using namespace std::chrono_literals;
auto now = std::chrono::high_resolution_clock::now;

struct Task {
    static constexpr auto deadline = 2.0s;
    std::chrono::high_resolution_clock::time_point _start = now();
    bool _done = false;
    void reset() { *this = {}; }
    auto elapsed() const { return now() - _start; } // fake progress
    auto done() { return std::exchange(_done, elapsed() > deadline); }
};

#include <functional>
template<class Fun> struct ycombi {
    Fun fun_;
    explicit ycombi(Fun fun) : fun_(std::move(fun)) {}
    template<class... Args> void operator()(Args&&... args) const {
        return fun_(*this, std::forward<Args>(args)...);
    }
};

#include <elements.hpp>
#include <iostream>
using namespace cycfi::elements;

constexpr auto bred   = colors::red.opacity(0.4);
constexpr auto bgreen = colors::green.level(0.7).opacity(0.4);
constexpr auto bblue  = colors::blue.opacity(0.4);
constexpr auto brblue = colors::royal_blue.opacity(0.4);
constexpr auto pgold  = colors::gold.opacity(0.8);

int main(int argc, char* argv[]) {
    app _app(argc, argv);
    window _win(_app.name());
    _win.on_close = [&_app]() { _app.stop(); };
    view view_(_win);

    Task task1;

    auto chain = [&view_](auto f) {
        return ycombi{ [=, &view_](auto self) {
            view_.post(10ms, [=] {
                if (f())
                    self();
            });
        } };
    };

    auto prog_bar = share(progress_bar(rbox(colors::black), rbox(pgold)));
    auto make_reporter = [=, &view_](Task& t) {
        static int s_reporter_id = 1;
        return [=, id = s_reporter_id++, &t, &view_] {
            std::clog << "reporter " << id << " task at " << (t.elapsed() / 1.0ms) << "ms " << std::endl;
            prog_bar->value(t.elapsed() / Task::deadline);
            view_.refresh(*prog_bar);
            if (t.done()) {
                std::clog << "done" << std::endl;
                return false;
            }
            return true;
        };
    };

    auto task_btn = button("Task #1");
    task_btn.on_click = [=, &task1](bool) {
        if (task1.done())
            task1.reset();
        auto progress = chain(make_reporter(task1));
        progress();
    };

    view_.content(task_btn, prog_bar);
    view_.scale(8);
    _app.run();
}

D parallel loop

First, how does D create a parallel foreach (what is the underlying logic)?
int main(string[] args)
{
    int[] arr;
    arr.length = 100000000;
    /* Why does this work? It's a simple foreach working with references
       to ints from arr. The parallel function returns ParallelForeach!R
       (here ParallelForeach!(int[])), but I don't know what that is.
       parallel is part of the Phobos library, not a D builtin, so what
       kind of magic is used for this? */
    foreach (ref e; parallel(arr))
    {
        e = 100;
    }
    foreach (ref e; parallel(arr))
    {
        e *= e;
    }
    return 0;
}
And second, why is it slower than a simple foreach?
Finally, if I create my own taskPool (and don't use the global taskPool object), the program never ends. Why?
parallel returns a struct (of type ParallelForeach) that implements the opApply(int delegate(...)) foreach overload.
When called, the struct submits a parallel function to the private submitAndExecute, which submits the same task to all threads in the pool.
This then does:
scope(failure)
{
    // If an exception is thrown, all threads should bail.
    atomicStore(shouldContinue, false);
}

while (atomicLoad(shouldContinue))
{
    immutable myUnitIndex = atomicOp!"+="(workUnitIndex, 1);
    immutable start = workUnitSize * myUnitIndex;
    if (start >= len)
    {
        atomicStore(shouldContinue, false);
        break;
    }
    immutable end = min(len, start + workUnitSize);
    foreach (i; start..end)
    {
        static if (withIndex)
        {
            if (dg(i, range[i])) foreachErr();
        }
        else
        {
            if (dg(range[i])) foreachErr();
        }
    }
}
where workUnitIndex and shouldContinue are shared variables and dg is the foreach delegate.
The reason it is slower is simply the overhead of passing the function to the threads in the pool and of atomically accessing the shared variables.
The reason your custom pool doesn't shut down is likely that you don't shut down the thread pool with finish.
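(Background, to the best of my knowledge of std.parallelism rather than from the answer itself: the global taskPool uses daemon worker threads, which do not keep the program alive, while a TaskPool you construct yourself uses non-daemon threads by default, so the program cannot exit until you call finish or stop on it.)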

Understanding an example about resumable functions in proposal N3650 for C++1y

Consider the following example taken from N3650:
int cnt = 0;
do {
    cnt = await streamR.read(512, buf);
    if (cnt == 0)
        break;
    cnt = await streamW.write(cnt, buf);
} while (cnt > 0);
I am probably missing something, but if I understood async and await well, what is the point of showing the usefulness of the two constructs with the above example, when the effects are equivalent to writing:
int cnt = 0;
do {
    cnt = streamR.read(512, buf).get();
    if (cnt == 0)
        break;
    cnt = streamW.write(cnt, buf).get();
} while (cnt > 0);
where both the read().get() and write().get() calls are synchronous?
The await keyword is not equal to calling get on a future. You might look at it more like this; suppose you start from this:
future<T> complex_function()
{
    do_some_stuff();
    future<Result> x = await some_async_operation();
    return do_some_other_stuff(x);
}
This is functionally more or less the same as
future<T> complex_function()
{
    do_some_stuff();
    return some_async_operation().then([=](future<Result> x) {
        return do_some_other_stuff(x);
    });
}
Note the "more or less", because there are some resource-management implications: variables created in do_some_stuff shouldn't be copied in order to execute do_some_other_stuff, the way the lambda version will do.
The second variant makes it more clear what will happen upon invocation:
1. do_some_stuff() will be invoked synchronously when you call complex_function.
2. some_async_operation is called asynchronously and results in a future. The exact moment when this operation is executed depends on your actual asynchronous calling implementation; it might be immediate when you use threads, or it might be whenever .get() is called when you use deferred execution.
3. We don't execute do_some_other_stuff immediately, but rather chain it to the future obtained in step 2. This means that it can be executed as soon as the result from some_async_operation is ready, but not before. Aside from that, its moment of execution is determined by the runtime. If the implementation would just wrap the then proposal, this means it would inherit the parent future's executor/launch policy (as per N3558).
4. The function returns the last future, which represents the eventual result. Note this NEEDS to be a future, as part of the function body is executed asynchronously.
A more complete example (hopefully correct):
future<void> forwardMsgs(istream& streamR, ostream& streamW) async
{
    char buf[512];
    int cnt = 0;
    do {
        cnt = await streamR.read(512, buf);
        if (cnt == 0)
            break;
        cnt = await streamW.write(cnt, buf);
    } while (cnt > 0);
}

future<void> fut = forwardMsgs(myStreamR, myStreamW);
/* do something */
fut.get();
The important point is (quoting from the draft):
After suspending, a resumable function may be resumed by the scheduling logic of the runtime and will eventually complete its logic, at which point it executes a return statement (explicit or implicit) and sets the function’s result value in the placeholder.
and:
A resumable function may continue execution on another thread after resuming following a suspension of its execution.
That is, the thread that originally called forwardMsgs can return at any of the suspension points. If it does, during the /* do something */ line, the code inside forwardMsgs can be executed by another thread even though the function has been called "synchronously".
This example is very similar to
future<void> fut = std::async(forwardMsgs, myStreamR, myStreamW);
/* do something */
fut.get();
The difference is the resumable function can be executed by different threads: a different thread can resume execution (of the resumable function) after each resumption/suspension point.
I think the idea is that the streamR.read() and streamW.write() calls are asynchronous I/O operations and return futures, which are automatically waited on by the await expressions.
So the equivalent synchronous version would have to call future::get() to obtain the results e.g.
int cnt = 0;
do {
    cnt = streamR.read(512, buf).get();
    if (cnt == 0)
        break;
    cnt = streamW.write(cnt, buf).get();
} while (cnt > 0);
You're correct to point out that there is no concurrency here. However, in the context of a resumable function, the await makes the behaviour different from the snippet above. When the await is reached, the function returns a future, so the caller of the function can proceed without blocking even if the resumable function is blocked at an await while waiting for some other result (e.g., in this case, for the read() or write() calls to finish). The resumable function might resume running asynchronously, so the result becomes available in the background while the caller is doing something else.
Here's the correct translation of the example function to not use await:
struct Copy$StackFrame {
    promise<void> $result;
    input_stream& streamR;
    output_stream& streamW;
    int cnt;
    char buf[512];
};
using Copy$StackPtr = std::shared_ptr<Copy$StackFrame>;

future<void> Copy(input_stream& streamR, output_stream& streamW) {
    Copy$StackPtr $stack{ new Copy$StackFrame{ {}, streamR, streamW, 0 } };
    future<int> f$1 = $stack->streamR.read(512, $stack->buf);
    f$1.then([$stack](future<int> f) { Copy$Cont1($stack, std::move(f)); });
    return $stack->$result.get_future();
}

void Copy$Cont1(Copy$StackPtr $stack, future<int> f$1) {
    try {
        $stack->cnt = f$1.get();
        if ($stack->cnt == 0) {
            // break;
            $stack->$result.set_value();
            return;
        }
        future<int> f$2 = $stack->streamW.write($stack->cnt, $stack->buf);
        f$2.then([$stack](future<int> f) { Copy$Cont2($stack, std::move(f)); });
    } catch (...) {
        $stack->$result.set_exception(std::current_exception());
    }
}

void Copy$Cont2(Copy$StackPtr $stack, future<int> f$2) {
    try {
        $stack->cnt = f$2.get();
        // while (cnt > 0)
        if ($stack->cnt <= 0) {
            $stack->$result.set_value();
            return;
        }
        future<int> f$1 = $stack->streamR.read(512, $stack->buf);
        f$1.then([$stack](future<int> f) { Copy$Cont1($stack, std::move(f)); });
    } catch (...) {
        $stack->$result.set_exception(std::current_exception());
    }
}
As you can see, the compiler transformation here is quite complex. The key point here is that, unlike the get() version, the original Copy returns its future as soon as the first async call has been made.
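For instance (my illustration, not part of the proposal), a caller observes:

// Copy returns as soon as the first read has been issued:
future<void> done = Copy(myStreamR, myStreamW);
// ... the caller keeps doing useful work while the read/write
// continuations chain in the background ...
done.get(); // blocks only here; rethrows anything the continuations captured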
I have the same issue with the meaning of the difference between these two code samples. Let's rewrite them a little to be more complete.
// Having two functions
future<void> f(istream& streamR, ostream& streamW) async
{
    int cnt = 0;
    do {
        cnt = await streamR.read(512, buf);
        if (cnt == 0)
            break;
        cnt = await streamW.write(cnt, buf);
    } while (cnt > 0);
}

void g(istream& streamR, ostream& streamW)
{
    int cnt = 0;
    do {
        cnt = streamR.read(512, buf).get();
        if (cnt == 0)
            break;
        cnt = streamW.write(cnt, buf).get();
    } while (cnt > 0);
}

// what is the difference between
auto a = f(streamR, streamW);
// and
auto b = async(g, streamR, streamW);
You still need at least three stacks. In both cases the main thread is not blocked. Is the assumption that await would be implemented by the compiler more efficiently than future<>::get()? Well, the one without await can be used now.
Thanks
Adam Zielinski