Suspend and Resume Lambdas in C++ - c++

I need a way to suspend lambdas in C++ and resume them later. I have tried to narrow it down to a very simple example:
Let's assume I have a singleton class orchestrator where I can register a lambda:
int main() {
orchestrator::getInstance().registerLambda([&](){
// Do something:
...
wait(); // Suspend here
// When woken up, continue here:
...
wait(); // Suspend here
...
});
orchestrator::start();
}
The orchestrator class itself has a main loop that calls this lambda from time to time:
orchestrator::start()
{
while(true) {
lambda();
// Do other stuff:
...
}
}
I thought about coroutines, but they seem too complex in my opinion. The solution should stick with the concept of lambdas and standard C++. Modern C++ such as C++11, C++17 or C++20 would also be fine.

In the interest of the future where coroutine support will be more complete, here's one way a coroutine could look:
resumable foo() {
std::cout << "Starting foo\n";
while (true) {
co_await std::suspend_always{}; // Could be co_await wait(); if you prefer, if wait() returns suspend_always
std::cout << "Resuming foo\n";
}
}
int main() {
auto m = foo();
for (int i = 0; i < 5; ++i) {
std::cout << "Back in main to resume foo\n";
m();
}
std::cout << "Done main\n";
}
This outputs "Starting foo", followed by 5 back-and-forths between main and foo, and then "Done main". Using a lambda is trivial: specify resumable as the return type. (See the live example)
The messy part is defining resumable, and this part belongs in a library. I'd say it's a good candidate for the standard library in some form after some more common types like task and generator. In fact, this type is basically a generator<void> with a different iteration API. Without using a library, it's not too bad, but note that I haven't bothered to do things like define what happens if you try to resume the lambda after it's done:
class resumable {
std::coroutine_handle<> _coro;
explicit resumable(std::coroutine_handle<> h) noexcept
: _coro(h) {}
public:
// All fluff except for giving resumable a coroutine handle
struct promise_type {
resumable get_return_object() noexcept { return resumable(std::coroutine_handle<promise_type>::from_promise(*this)); }
std::suspend_never initial_suspend() noexcept { return {}; }
std::suspend_never final_suspend() noexcept { return {}; }
void return_void() noexcept {}
void unhandled_exception() noexcept {}
};
// This is how the caller interacts, they just call this object repeatedly.
void operator()() const noexcept {
_coro.resume();
}
};
If not using coroutines, it's back to good old state machines:
struct state_resumable {
// TODO: Store all state
state_resumable() {
std::cout << "Starting resumable\n";
}
void operator()() {
// TODO: Figure out what to execute next based on the stored state
std::cout << "Resuming resumable\n";
}
};
int main() {
auto m = state_resumable();
for (int i = 0; i < 5; ++i) {
std::cout << "Back in main to resume resumable\n";
m();
}
std::cout << "Done main\n";
}
What isn't shown here is the effort required to manually keep track of state. Coroutines automatically store away the local variables in your function and restore them when the coroutine is resumed, plus keep track of which part of the function to execute next. With a state machine, you have to do all of these yourself. You cannot use a single lambda as above because only coroutines can actually suspend mid-execution. With a state machine, you're pretending to do this, but the function must actually finish completely each time.
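To make the manual bookkeeping concrete while still registering a plain lambda, here is a minimal sketch. The helper make_resumable, the step counter and the log vector are hypothetical names introduced only for illustration. Each call runs one step and returns, and that return stands in for wait().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: builds a lambda that emulates suspension by
// remembering which step to run next. Every call runs exactly one step
// and then returns to the caller.
std::function<void()> make_resumable(std::vector<std::string>& log) {
    // `step` lives in the closure, so it survives between calls.
    return [&log, step = 0]() mutable {
        switch (step) {
        case 0:
            log.push_back("first part");
            step = 1;   // returning here plays the role of wait()
            return;
        case 1:
            log.push_back("second part");
            step = 2;
            return;
        default:
            log.push_back("finished");  // nothing left to run
            return;
        }
    };
}
```

Any local variable that must survive a "suspension" has to be moved into the capture list in the same way as step, which is exactly the work a coroutine frame does for you.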

Related

Switching between threads with C++20 coroutines

There is an example of switching to a different thread with C++20 coroutines:
#include <coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
auto switch_to_new_thread(std::jthread& out) {
struct awaitable {
std::jthread* p_out;
bool await_ready() { return false; }
void await_suspend(std::coroutine_handle<> h) {
std::jthread& out = *p_out;
if (out.joinable())
throw std::runtime_error("Output jthread parameter not empty");
out = std::jthread([h] { h.resume(); });
// Potential undefined behavior: accessing potentially destroyed *this
// std::cout << "New thread ID: " << p_out->get_id() << '\n';
std::cout << "New thread ID: " << out.get_id() << '\n'; // this is OK
}
void await_resume() {}
};
return awaitable{ &out };
}
struct task {
struct promise_type {
task get_return_object() { return {}; }
std::suspend_never initial_suspend() { return {}; }
std::suspend_never final_suspend() noexcept { return {}; }
void return_void() {}
void unhandled_exception() {}
};
};
task resuming_on_new_thread(std::jthread& out) {
std::cout << "Coroutine started on thread: " << std::this_thread::get_id() << '\n';
co_await switch_to_new_thread(out);
// awaiter destroyed here
std::cout << "Coroutine resumed on thread: " << std::this_thread::get_id() << '\n';
}
int main() {
std::jthread out;
resuming_on_new_thread(out);
}
The coroutine starts on the main thread and switches to a newly created thread.
What is the right way to make it switch back to the main thread?
That is, so that the code below
task resuming_on_new_thread(std::jthread& out) {
std::cout << "Coroutine started on thread: " << std::this_thread::get_id() << '\n';
co_await switch_to_new_thread(out);
// awaiter destroyed here
std::cout << "Coroutine resumed on thread: " << std::this_thread::get_id() << '\n';
co_await switch_to_main_thread();
std::cout << "Coroutine resumed on thread: " << std::this_thread::get_id() << '\n';
}
would print
Coroutine started on thread: 139972277602112
New thread ID: 139972267284224
Coroutine resumed on thread: 139972267284224
Coroutine resumed on thread: 139972277602112
switch_to_new_thread does not switch to an existing thread; it creates a new thread and injects code into it that resumes the coroutine.
To run code on a specific thread, you have to actually run code on that thread. To resume a coroutine there, that specific thread has to run code that resumes the coroutine.
Here you did it by creating a brand-new thread and injecting code that does a resume.
A traditional way to do stuff like this is with a message pump. The thread you want to participate has a message pump and a queue of events. It runs the events in order.
To make a specific thread run some code, you send a message to that queue of events with the instructions (maybe the actual code, maybe just a value) in it.
To this end, such an "event consuming thread" is more than a std::jthread or std::thread; it is a thread-safe queue plus some code in the thread that pops tasks off it and executes them.
In such a system, you'd move between threads by sending messages.
So you'd have a queue:
template<class T>
struct threadsafe_queue {
[[nodiscard]] std::optional<T> pop();
[[nodiscard]] std::deque<T> pop_many(std::optional<std::size_t> count = {}); // defaults to all
[[nodiscard]] bool push(T);
template<class C, class D>
[[nodiscard]] std::optional<T> wait_until_pop(std::chrono::time_point<C,D>);
void abort();
[[nodiscard]] bool is_aborted() const { return aborted; }
private:
mutable std::mutex m;
std::condition_variable cv;
std::deque<T> queue;
bool aborted = false;
auto lock() const { return std::unique_lock(m); }
};
of tasks:
using task_queue = threadsafe_queue<std::function<void()>>;
a basic message pump is:
void message_pump( task_queue& q ) {
while (auto f = q.pop()) {
if (*f) (*f)();
}
}
You'd then make two task_queues, one for your main thread and one for your worker thread. To switch to the worker, instead of creating a new jthread you'd push a task that resumes the coroutine. Capture the handle h by value, because await_suspend may have returned by the time the task runs:
workerq.push( [h]{ h.resume(); } );
and similarly, to switch to the main thread:
mainq.push( [h]{ h.resume(); } );
There are lots of details I have skipped over, but this is a sketch of how you'd do it.
One way to make this happen is to have a thread-safe queue that the coroutine places itself in to tell the main thread "please resume me now". At that point, you're basically building a thread pool. The main function has to watch that queue (poll it at regular intervals or wait for something to be placed in it), then fetch and execute an element (work item) once one is available.
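To make the queue-and-pump idea concrete, here is a minimal sketch that fills in only push, pop and abort. The names simple_task_queue and run_on_worker are hypothetical, and the demo pushes ordinary lambdas. In the coroutine case the pushed task would be something like `[h]{ h.resume(); }`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, reduced version of the queue: a blocking queue of tasks.
class simple_task_queue {
public:
    using task = std::function<void()>;

    // Returns false if the queue was already aborted.
    bool push(task t) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (aborted_) return false;
            queue_.push_back(std::move(t));
        }
        cv_.notify_one();
        return true;
    }

    // Blocks until a task is available. Returns std::nullopt once the
    // queue has been aborted and all pending tasks have been handed out.
    std::optional<task> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return aborted_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        task t = std::move(queue_.front());
        queue_.pop_front();
        return t;
    }

    void abort() {
        {
            std::lock_guard<std::mutex> lock(m_);
            aborted_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<task> queue_;
    bool aborted_ = false;
};

// Runs tasks in order until the queue is aborted and drained.
void message_pump(simple_task_queue& q) {
    while (auto f = q.pop()) {
        if (*f) (*f)();
    }
}

// Hypothetical demo: every task runs on the worker thread, in push order.
std::vector<int> run_on_worker(int count) {
    simple_task_queue q;
    std::vector<int> results;
    std::thread worker([&q] { message_pump(q); });
    for (int i = 0; i < count; ++i) {
        q.push([&results, i] { results.push_back(i * i); });
    }
    q.abort();      // already queued tasks still run before the pump exits
    worker.join();  // after the join it is safe to read results here
    return results;
}
```

Switching back to the main thread would need a second queue of this kind that main itself pumps, as described above.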

Calling destroy() from final_suspend() results in a crash

I call h.destroy() in final_suspend to destroy the coroutine automatically when it finishes execution, and then I resume the awaiting coroutine (the one that waits for the task to complete). I found a question about this technique and an answer explaining why it should work.
As far as I can see, this technique really works, but not with MSVC 2022, which calls the task destructor twice; see the code below:
#include <coroutine>
#include <optional>
#include <iostream>
#include <thread>
#include <chrono>
#include <queue>
#include <vector>
// simple timers
// stored timer tasks
struct timer_task
{
std::chrono::steady_clock::time_point target_time;
std::coroutine_handle<> handle;
};
// comparator
struct timer_task_before_cmp
{
bool operator()(const timer_task& left, const timer_task& right) const
{
return left.target_time > right.target_time;
}
};
std::priority_queue<timer_task, std::vector<timer_task>, timer_task_before_cmp> timers;
inline void submit_timer_task(std::coroutine_handle<> handle, std::chrono::nanoseconds timeout)
{
timers.push(timer_task{ std::chrono::steady_clock::now() + timeout, handle });
}
//template <bool owning>
struct UpdatePromise;
//template <bool owning>
struct UpdateTask
{
// declare promise type
using promise_type = UpdatePromise;
UpdateTask(std::coroutine_handle<promise_type> handle) :
handle(handle)
{
std::cout << "UpdateTask constructor." << std::endl;
}
UpdateTask(const UpdateTask&) = delete;
UpdateTask(UpdateTask&& other) : handle(other.handle)
{
std::cout << "UpdateTask move constructor." << std::endl;
}
UpdateTask& operator = (const UpdateTask&) = delete;
UpdateTask& operator = (const UpdateTask&& other)
{
handle = other.handle;
std::cout << "UpdateTask move assignment." << std::endl;
return *this;
}
~UpdateTask()
{
std::cout << "UpdateTask destructor." << std::endl;
}
std::coroutine_handle<promise_type> handle;
};
struct UpdatePromise
{
std::coroutine_handle<> awaiting_coroutine;
UpdateTask get_return_object();
std::suspend_never initial_suspend()
{
return {};
}
void unhandled_exception()
{
std::terminate();
}
auto final_suspend() noexcept
{
// if there is a coroutine that is awaiting on this coroutine resume it
struct transfer_awaitable
{
std::coroutine_handle<> awaiting_coroutine;
// always stop at final suspend
bool await_ready() noexcept
{
return false;
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<UpdatePromise> h) noexcept
{
// resume awaiting coroutine or if there is no coroutine to resume return special coroutine that do
// nothing
std::coroutine_handle<> val = awaiting_coroutine ? awaiting_coroutine : std::noop_coroutine();
h.destroy();
return val;
}
void await_resume() noexcept {}
};
return transfer_awaitable{ awaiting_coroutine };
}
void return_void() {}
// use `co_await std::chrono::seconds{n}` to wait specified amount of time
auto await_transform(std::chrono::milliseconds d)
{
struct timer_awaitable
{
std::chrono::milliseconds m_d;
// always suspend
bool await_ready()
{
return m_d <= std::chrono::milliseconds(0);
}
// h is a handler for current coroutine which is suspended
void await_suspend(std::coroutine_handle<> h)
{
// submit suspended coroutine to be resumed after timeout
submit_timer_task(h, m_d);
}
void await_resume() {}
};
return timer_awaitable{ d };
}
// also we can await other UpdateTask<T>
auto await_transform(UpdateTask& update_task)
{
if (!update_task.handle)
{
throw std::runtime_error("coroutine without promise awaited");
}
if (update_task.handle.promise().awaiting_coroutine)
{
throw std::runtime_error("coroutine already awaited");
}
struct task_awaitable
{
std::coroutine_handle<UpdatePromise> handle;
// check if this UpdateTask already has value computed
bool await_ready()
{
return handle.done();
}
// h - is a handle to coroutine that calls co_await
// store coroutine handle to be resumed after computing UpdateTask value
void await_suspend(std::coroutine_handle<> h)
{
handle.promise().awaiting_coroutine = h;
}
// when ready return value to a consumer
auto await_resume()
{
}
};
return task_awaitable{ update_task.handle };
}
};
inline UpdateTask UpdatePromise::get_return_object()
{
return { std::coroutine_handle<UpdatePromise>::from_promise(*this) };
}
// timer loop
void loop()
{
while (!timers.empty())
{
auto& timer = timers.top();
// if it is time to run a coroutine
if (timer.target_time < std::chrono::steady_clock::now())
{
auto handle = timer.handle;
timers.pop();
handle.resume();
}
else
{
std::this_thread::sleep_until(timer.target_time);
}
}
}
// example
using namespace std::chrono_literals;
UpdateTask TestTimerAwait()
{
using namespace std::chrono_literals;
std::cout << "testTimerAwait started." << std::endl;
co_await 1s;
std::cout << "testTimerAwait finished." << std::endl;
}
UpdateTask TestNestedTimerAwait()
{
using namespace std::chrono_literals;
std::cout << "testNestedTimerAwait started." << std::endl;
auto task = TestTimerAwait();
co_await 2s;
//co_await task;
std::cout << "testNestedTimerAwait finished." << std::endl;
}
// main can't be a coroutine and usually need some sort of looper (io_service or timer loop in this example)
int main()
{
auto task = TestNestedTimerAwait();
// execute deferred coroutines
loop();
}
The output with MSVC 2022 is:
UpdateTask constructor.
testNestedTimerAwait started.
UpdateTask constructor.
testTimerAwait started.
testTimerAwait finished.
testNestedTimerAwait finished.
UpdateTask destructor.
UpdateTask destructor.
UpdateTask destructor.
But the output with GCC 11.1.0 is:
UpdateTask constructor.
testNestedTimerAwait started.
UpdateTask constructor.
testTimerAwait started.
testTimerAwait finished.
testNestedTimerAwait finished.
UpdateTask destructor.
UpdateTask destructor.
As you can see, there is one extra destructor call with MSVC 2022, so the behaviour of the code generated by MSVC 2022 is undefined and it can potentially format your hard drive.
MSVC 2022 version: Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x86
EDIT9:
Figured out what happens. The destructor of UpdateTask is called twice with MSVC 2022, see updated code.
EDIT10:
From docs: The coroutine is suspended (its coroutine state is populated with local variables and current suspension point).
awaiter.await_suspend(handle) is called, where handle is the coroutine handle representing the current coroutine. Inside that function, the suspended coroutine state is observable via that handle, and it's this function's responsibility to schedule it to resume on some executor, or to be destroyed (returning false counts as scheduling)
It looks like this was a compiler bug that is probably fixed in Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31106.2 for x86; at least the output is now:
UpdateTask constructor.
testNestedTimerAwait started.
UpdateTask constructor.
testTimerAwait started.
testTimerAwait finished.
testNestedTimerAwait finished.
UpdateTask destructor.
UpdateTask destructor.

c++ coroutines final_suspend for promise_type

Below is a snippet that tests an empty coroutine and experiments with promise_type:
#include <iostream>
#include <coroutine>
#define DEBUG std::cout << __PRETTY_FUNCTION__ << std::endl
struct TaskSuspendAll {
// must be of this name
struct promise_type {
TaskSuspendAll get_return_object() noexcept {
return TaskSuspendAll{
std::coroutine_handle<promise_type>::from_promise(*this)
};
}
std::suspend_always initial_suspend() noexcept {
DEBUG;
return {};
}
std::suspend_always final_suspend() noexcept {
DEBUG;
return {};
}
void unhandled_exception() {}
void return_void() {
DEBUG;
}
};
std::coroutine_handle<promise_type> ch;
};
TaskSuspendAll TestSuspendAll() {
DEBUG;
co_return;
}
int main() {
std::cout << std::endl;
auto t = TestSuspendAll();
t.ch.resume();
//t.ch.resume()
//t.ch.destroy();
return 0;
}
Running this, I get:
std::__n4861::suspend_always TaskSuspendAll::promise_type::initial_suspend()
TaskSuspendAll TestSuspendAll()
void TaskSuspendAll::promise_type::return_void()
std::__n4861::suspend_always TaskSuspendAll::promise_type::final_suspend()
My understanding is that co_await is applied to the results of initial_suspend and final_suspend. When I call TestSuspendAll in main, it eventually evaluates co_await promise.initial_suspend() and returns to the caller, since I use the std::suspend_always awaitable. Then I resume the coroutine and the body gets executed. At some point we reach co_await promise.final_suspend() and again return to the caller.
Question: I would expect that I have to call resume a second time so that co_await promise.final_suspend() completes and the coroutine finishes. However, that causes a segmentation fault. I know that calling resume on a completed coroutine is undefined behavior, but as far as I understand it is not 100% completed yet. My expectation was that final_suspend behaves the same as initial_suspend. What is the logic here? Do we have to call destroy after the coroutine reaches final_suspend?
thanks a lot for clarification!
VK
Being suspended at its final suspend point is the definition of a coroutine being done. Literally; that's what coroutine_handle::done returns. Attempting to resume such a coroutine is UB.
So your expectation is not correct.

Custom std::thread that ends only on signal

I already asked this question in another post, but it came out poorly, so I want to rephrase it better.
I have to start a series of threads doing different tasks. They should only return if an exit signal was sent; otherwise (if they run into exceptions or anything else) they just restart their code from the beginning.
To make my intent clear, here's some code:
class thread_wrapper
{
public:
template<typename _Callable, typename... _Args>
thread_wrapper();
void signal_exit() {exit_requested_ = true;}
void join() {th_.join();}
private:
std::thread th_;
bool exit_requested_{false};
void execute()
{
while(!exit_requested_)
{
try
{
// Do thread processing
}
catch (const std::exception& e)
{
std::cout << e.what() << std::endl;
}
}
return;
}
};
What I want to achieve is to use this class as if it were a normal std::thread, passing a function and its arguments when it is initialized. The inner std::thread should then run the "execute" function, and only inside the try block should it run the behaviour passed to the constructor.
How could I achieve this? Thanks in advance.
EDIT: I found a solution, but it only compiles as C++17 (because the class is templated on the lambda type), and it is not really that elegant in my opinion.
template<typename Lambda>
class thread_wrapper
{
public:
explicit thread_wrapper(Lambda&& lambda) : lambda_{std::move(lambda)}, th_(&thread_wrapper::execute, this){};
void signal_exit() {exit_requested_ = true;}
void join() {th_.join();}
private:
std::thread th_;
bool exit_requested_{false};
Lambda lambda_;
void execute()
{
while(!exit_requested_)
{
try
{
lambda_();
}
catch (const std::exception& e)
{
std::cout << e.what() << std::endl;
}
}
return;
}
};
And here is a sample main:
class Foo
{
public:
void say_hello() { std::cout << "Hello!" << std::endl;}
};
int main()
{
Foo foo;
thread_wrapper th([&foo](){foo.say_hello(); std::this_thread::sleep_for(2s);});
std::this_thread::sleep_for(10s);
th.signal_exit();
th.join();
}
What do you think?
I'd say the solution you found is fine. You might want to avoid the thread_wrapper itself being a templated class and only template the constructor:
// no template
class thread_wrapper {
public:
template<typename Lambda, typename... Args>
explicit thread_wrapper(Lambda lambda, Args&&... args)
: lambda_(std::bind(lambda, std::forward<Args>(args)...))
{}
// ...
private:
std::function<void()> lambda_;
// ...
};
(I didn't try to compile this - small syntax errors etc are to be expected. It's more to show the concept)
Important: if you do call signal_exit, it will not abort the execution of lambda_. It will only exit once the lambda has returned/thrown.
Two little naming things to consider:
thread_wrapper is not a great name. It doesn't tell us anything about the purpose, or what it does different than a regular thread. Maybe robust_thread (to signify the automatic exception recovery) or something.
The method signal_exit could just be named exit. There is no reason to make the interface of this class specific to signals. You could use this class for any thread that should auto-restart until it is told to stop by some other part of the code.
Edit: One more thing I forgot, exit_requested_ must be either atomic or protected by a mutex to protect from undefined behavior. I'd suggest an std::atomic<bool>, that should be enough in your case.
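Putting these suggestions together, a hedged sketch could look like the following. The names restarting_thread, request_exit and count_runs_until are hypothetical. Note that the std::thread member is declared last: members are initialized in declaration order, so the callable and the flag exist before the thread starts running execute.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical non-template wrapper: reruns the callable until asked to stop.
class restarting_thread {
public:
    template <typename F, typename... Args>
    explicit restarting_thread(F f, Args... args)
        : work_(std::bind(std::move(f), std::move(args)...)),
          th_(&restarting_thread::execute, this) {}

    // Stops and joins if the caller has not done so already.
    ~restarting_thread() {
        if (th_.joinable()) {
            request_exit();
            th_.join();
        }
    }

    // Only takes effect once the current run of the callable has returned or thrown.
    void request_exit() noexcept { exit_requested_ = true; }
    void join() { th_.join(); }

private:
    void execute() {
        while (!exit_requested_) {
            try {
                work_();
            } catch (const std::exception&) {
                // Swallow the error and restart the callable on the next iteration.
            }
        }
    }

    std::function<void()> work_;
    std::atomic<bool> exit_requested_{false};
    std::thread th_;  // declared last so it starts after the members above exist
};

// Hypothetical demo: the callable throws on every second run, yet it keeps
// being restarted until the exit is requested.
int count_runs_until(int target) {
    std::atomic<int> runs{0};
    restarting_thread t([&runs] {
        int n = ++runs;
        if (n % 2 == 0) throw std::runtime_error("simulated failure");
        std::this_thread::yield();
    });
    while (runs.load() < target) std::this_thread::yield();
    t.request_exit();
    t.join();
    return runs.load();
}
```

Exceptions that do not derive from std::exception would still escape execute and terminate the program; whether to add a catch (...) is a design choice the original code leaves open.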
I would use std::async and a condition variable construction for this.
I wrapped all the condition variable logic in one class so it can easily be reused.
More info on condition variables here : https://www.modernescpp.com/index.php/c-core-guidelines-be-aware-of-the-traps-of-condition-variables
Don't hesitate to ask for more information if you need it.
#include <chrono>
#include <future>
#include <condition_variable>
#include <mutex>
#include <iostream>
#include <thread>
//-----------------------------------------------------------------------------
// synchronization signal between two threads.
// by using a condition variable the waiting thread
// can even react within the "sleep" time of your example
class signal_t
{
public:
void set()
{
std::unique_lock<std::mutex> lock{m_mtx};
m_signalled = true;
// notify waiting threads that something worth waking up for has happened
m_cv.notify_all();
}
bool wait_for(const std::chrono::steady_clock::duration& duration)
{
std::unique_lock<std::mutex> lock{ m_mtx };
// condition variable wait is better than using sleep
// it can detect signal almost immediately
m_cv.wait_for(lock, duration, [this]
{
return m_signalled;
});
if ( m_signalled ) std::cout << "signal set detected\n";
return m_signalled;
}
private:
std::mutex m_mtx;
std::condition_variable m_cv;
bool m_signalled = false;
};
//-----------------------------------------------------------------------------
class Foo
{
public:
void say_hello() { std::cout << "Hello!" << std::endl; }
};
//-----------------------------------------------------------------------------
int main()
{
Foo foo;
signal_t stop_signal;
// no need to create a threadwrapper object
// all the logic fits within the lambda
// also std::async is a better abstraction than
// using std::thread. Through the future
// information on the asynchronous process can
// be fed back into the calling thread.
auto ft = std::async(std::launch::async, [&foo, &stop_signal]
{
while (!stop_signal.wait_for(std::chrono::seconds(2)))
{
foo.say_hello();
}
});
std::this_thread::sleep_for(std::chrono::seconds(10));
std::cout << "setting stop signal\n";
stop_signal.set();
std::cout << "stop signal set\n";
// synchronize with stopping of the asynchronous process.
ft.get();
std::cout << "async process stopped\n";
}

Should the coroutine result object ever be constructed after initial_suspend()?

I've been writing a coroutine library, and I've run into a peculiar problem. In some cases, the construction of the coroutine result object was sequenced after the call to initial_suspend.
Question: is this sequencing a bug on the part of the compiler?
Background
In my particular case, this was causing a crash, as a generator promise executed under the assumption that it had no owner.
The relevant section of the C++20 standard is section 9.5.4.7, which states:
The expression promise.get_return_object() is used to initialize the glvalue result or prvalue result object of a call to a coroutine. The call to get_return_object is sequenced before the call to initial_suspend and is invoked at most once.
When I read this portion of the standard, I originally interpreted it to mean that the initialization of the coroutine's result object is sequenced immediately after the call to promise.get_return_object(): when you initialize something, initialization occurs immediately after the computation of the arguments to the constructor, and not as a delayed effect.
Unfortunately, this is not the behavior I observed. The source of the crash is that the initialization of the result object was sequenced after the initial suspend, despite the call to promise.get_return_object() being sequenced before the initial suspend.
Observing this behavior in code (view live example here)
Let's write a very simple type that can be returned from a coroutine:
template <class Promise>
struct coroutine {
std::coroutine_handle<Promise> handle;
using promise_type = Promise;
coroutine(std::coroutine_handle<Promise> p) : handle(p) {
std::cout << " Running coroutine(std::coroutine_handle<Promise> p)\n";
}
~coroutine() {
if (handle) {
handle.destroy();
}
}
};
Because coroutine<Promise> can be constructed from a std::coroutine_handle<Promise>, the call to promise.get_return_object() can return either a std::coroutine_handle<Promise> which is used to construct the coroutine, or it can return a coroutine<Promise> directly.
Let's write two different promise types, one for each option:
struct promise_base {
std::suspend_never initial_suspend() {
std::cout << " Running initial_suspend()\n";
return {};
}
std::suspend_always final_suspend() noexcept { return {}; }
void return_void() {}
void unhandled_exception() { std::terminate(); }
};
struct promise_A : promise_base {
using handle = std::coroutine_handle<promise_A>;
handle get_return_object() {
std::cout << " Running get_return_object()\n";
return handle::from_promise(*this);
}
};
struct promise_B : promise_base {
using handle = std::coroutine_handle<promise_B>;
coroutine<promise_B> get_return_object() {
std::cout << " Running get_return_object()\n";
return {handle::from_promise(*this)};
}
};
We can then write two identical coroutines based on promise_A and promise_B:
coroutine<promise_A> run_A() {
std::cout << "Inside coroutine body\n";
co_return;
}
coroutine<promise_B> run_B() {
std::cout << "Inside coroutine body\n";
co_return;
}
In this code,
promise_A returns a handle from get_return_object, and this is used to construct a result object of type coroutine<promise_A>.
promise_B directly returns a coroutine<promise_B>, which is returned as the result object.
In the case of promise_A, the construction of the coroutine<promise_A> result object is sequenced after the call to initial_suspend().
We can write the following test code to verify this:
void test_promise_A() {
std::cout << "------ Testing A ------\n";
run_A();
std::cout << "\n\n";
}
void test_promise_B() {
std::cout << "------ Testing B ------\n";
run_B();
std::cout << "\n\n";
}
int main() {
test_promise_A();
test_promise_B();
}
Under GCC 10.1, this produces the following output. Note the difference in sequencing between Testing A and Testing B.
------ Testing A ------
Running get_return_object()
Running initial_suspend()
Inside coroutine body
Running coroutine(std::coroutine_handle<Promise> p)
------ Testing B ------
Running get_return_object()
Running coroutine(std::coroutine_handle<Promise> p)
Running initial_suspend()
Inside coroutine body
To reiterate the question: Is the sequencing of Testing A a bug?