Based on the answer to "Turning a function call which takes a callback into a coroutine", I was able to come up with my own version of a generic CallbackAwaiter class that I can inherit from to wait for callbacks. However, I can't figure out how to make it support symmetric transfer, which causes stack overflows in certain cases (mostly in complicated business logic). My code looks like this:
template <typename T>
struct CallbackAwaiter
bool await_ready() noexcept
return false;
const T &await_resume() const noexcept(false)
assert(result_.has_value() == true || exception_ != nullptr);
if (exception_)
std::rethrow_exception(exception_);
return result_.value();
std::optional<T> result_;
std::exception_ptr exception_{nullptr};
void setException(const std::exception_ptr &e)
exception_ = e;
void setValue(const T &v)
void setValue(T &&v)
// For example, I can inherit from the class and fill in `await_suspend` to convert callbacks into coroutines.
struct SQLAwaiter : CallbackAwaiter<std::map<std::string, std::string>>
void await_suspend(std::coroutine_handle<> handle)
dbClient->runSQL(..., [handle](){
This works, but because I call handle.resume() manually, I don't get symmetric transfer, and the stack overflows after deeply nested coroutine resumes. So far I have tried adding promise_type and using std::noop_coroutine to get symmetric transfer working. For example:
std::noop_coroutine_handle await_suspend(std::coroutine_handle<> handle)
dbClient->runSQL(..., [handle](){
return std::noop_coroutine();
// and
struct CallbackAwaiter
CallbackAwaiter() : coro_(std::noop_coroutine{}) {}
std::coroutine_handle<promise_type> coro_;
But obviously these don't work. Returning noop_coroutine doesn't magically make handle.resume() stop taking up stack space, nor does adding promise_type help, as there is no compiler-generated coroutine.
I'm out of ideas. How can I support symmetric transfer in such a case?
In most cases, your existing code
void await_suspend(std::coroutine_handle<> handle)
dbClient->runSQL(..., [handle](){
should already be fine.
Depending on your implementation of dbClient->runSQL, the callback which calls handle.resume will be executed on a fresh stack, with only the stack frames from the IO multiplexer, and a couple of other dbClient-internal functions on it.
Symmetric transfer is only a concern if your runSQL function calls its callback synchronously. As long as runSQL always calls its callback asynchronously (i.e. on a "fresh" stack), you are already fine.
I have problems finding the right place for an actor and a timer used in a state machine.
I found some inspiration from this site about the state pattern:
State Design Pattern in Modern C++ and created a small example:
Simple door state machine
There might be more transitions possible but I kept it short and simple.
class Door
void open() {}
void close() {}
class EventOpenDoor
EventOpenDoor(Door* door) : m_pDoor(door) {}
Door* m_pDoor;
class EventOpenDoorTemporary
EventOpenDoorTemporary(Door* door) : m_pDoor(door) {}
Door* m_pDoor;
class EventOpenDoorTimeout
EventOpenDoorTimeout(Door* door) : m_pDoor(door) {}
Door* m_pDoor;
class EventCloseDoor
EventCloseDoor(Door* door) : m_pDoor(door) {}
Door* m_pDoor;
using Event = std::variant<EventOpenDoor,
class StateClosed {};
class StateOpen {};
class StateTemporaryOpen {};
using State = std::variant<StateClosed,
Transitions (not complete):
struct Transitions {
std::optional<State> operator()(StateClosed &s, const EventOpenDoor &e) {
if (e.m_pDoor)
auto newState = StateOpen{};
return newState;
std::optional<State> operator()(StateClosed &s, const EventOpenDoorTemporary &e) {
if (e.m_pDoor)
// start timer here?
auto newState = StateTemporaryOpen{};
return newState;
std::optional<State> operator()(StateTemporaryOpen &s, const EventOpenDoorTimeout &e) {
if (e.m_pDoor)
auto newState = StateClosed{}; // the timeout closes the door again
return newState;
std::optional<State> operator()(StateTemporaryOpen &s, const EventOpenDoor &e) {
if (e.m_pDoor)
// stop active timer here?
auto newState = StateOpen{};
return newState;
/* --- default ---------------- */
template <typename State_t, typename Event_t>
std::optional<State> operator()(State_t &s, const Event_t &e) const {
// "Unknown transition!";
return std::nullopt;
Door controller:
template <typename StateVariant, typename EventVariant, typename Transitions>
class DoorController {
StateVariant m_curr_state;
void dispatch(const EventVariant &Event)
std::optional<StateVariant> new_state = visit(Transitions{this}, m_curr_state, Event);
if (new_state)
m_curr_state = *move(new_state);
template <typename... Events>
void handle(Events... e)
{ (dispatch(e), ...); }
void setState(StateVariant s)
m_curr_state = s;
The events can be triggered by a client which holds an instance to the "DoorController"
In the events I pass a pointer to the door itself so it's available in the transitions. The door is operated within the transitions only.
I have problems now with modelling the 20s timeout/timer. Where to have such a timer, which triggers the transition to close the door?
Having a timer within the door instance means, I have a circular dependency, because in case of a timeout it needs to call "handle()" of the "door_controller".
I can break the circular dependency with a forward declaration.
But is there a better solution?
Maybe I have not modelled it well. I'm open to suggestions for improvement.
Thanks a lot!
This isn't going to be the best answer, but I have more questions than answers.
Some of your choices seem odd. I presume there's a complicated reason why you're storing state based on a variant rather than using an enum class State{}, for instance.
I also get nervous when I see raw pointers in modern C++. I'd feel a whole lot better with smart pointers.
When I've done state machines, the events I can handle always subclass from a common Event class, or I might even just use a single class and give it as many distinct data fields as are required for the things that I need to handle. It's a little odd that you use unrelated classes and depend on a dispatch method. Does that even work? Aren't you pushing objects onto an event queue? How do you end up calling that dispatch method with random objects?
You don't show your event loop, but maybe you have a state machine without an event loop. Is it a state machine then? Or maybe you didn't show it. Maybe you can have a state machine without an event loop, but I thought the two concepts were tied together.
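On whether dispatching unrelated event classes can work at all: std::visit over two variants selects the overload that matches the current state type and event type, and a templated catch-all covers every other combination. The following is only a sketch under my own assumptions, not the asker's design. `FakeTimer` and `tick()` are hypothetical names, and time is advanced by hand so the example stays deterministic. It also shows one possible home for the timer: the controller owns it and turns its expiry into an ordinary timeout event, so the door needs no reference back to the controller.

```cpp
#include <cassert>
#include <optional>
#include <variant>

struct EventOpen {};
struct EventOpenTemporary {};
struct EventTimeout {};
struct EventClose {};
using Event = std::variant<EventOpen, EventOpenTemporary, EventTimeout, EventClose>;

struct StateClosed {};
struct StateOpen {};
struct StateTemporaryOpen {};
using State = std::variant<StateClosed, StateOpen, StateTemporaryOpen>;

// Hypothetical countdown timer, advanced manually instead of by a real clock.
struct FakeTimer {
    int remaining = 0;
    void start(int ticks) { remaining = ticks; }
    void stop() { remaining = 0; }
    bool tick() { return remaining > 0 && --remaining == 0; }  // true on expiry
};

struct Transitions {
    FakeTimer& timer;

    std::optional<State> operator()(StateClosed&, const EventOpen&) { return StateOpen{}; }
    std::optional<State> operator()(StateClosed&, const EventOpenTemporary&) {
        timer.start(20);  // transitions start the timer...
        return StateTemporaryOpen{};
    }
    std::optional<State> operator()(StateTemporaryOpen&, const EventTimeout&) {
        return StateClosed{};
    }
    std::optional<State> operator()(StateTemporaryOpen&, const EventOpen&) {
        timer.stop();     // ...and stop it
        return StateOpen{};
    }
    std::optional<State> operator()(StateOpen&, const EventClose&) { return StateClosed{}; }

    // Catch-all for every state/event pair without a dedicated overload.
    template <typename S, typename E>
    std::optional<State> operator()(S&, const E&) { return std::nullopt; }
};

class DoorController {
    State state_ = StateClosed{};
    FakeTimer timer_;  // owned here, so the door itself knows nothing about the controller

public:
    void dispatch(const Event& e) {
        if (auto next = std::visit(Transitions{timer_}, state_, e)) state_ = *next;
    }
    // Driven by whatever provides time; expiry becomes an ordinary event.
    void tick() {
        if (timer_.tick()) dispatch(EventTimeout{});
    }
    template <typename S>
    bool is() const { return std::holds_alternative<S>(state_); }
};

bool demo_temporary_open_closes_after_timeout() {
    DoorController c;
    c.dispatch(EventOpenTemporary{});
    if (!c.is<StateTemporaryOpen>()) return false;
    for (int i = 0; i < 20; ++i) c.tick();
    return c.is<StateClosed>();
}

bool demo_unknown_transition_is_ignored() {
    DoorController c;
    c.dispatch(EventTimeout{});  // no overload for (StateClosed, EventTimeout)
    return c.is<StateClosed>();
}
```

With a real timer the expiry would arrive on another thread or from an event loop, so `dispatch` would then need a queue or a lock around it.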
class close_queue
class dispatcher
queue* q;
bool chained;
dispatcher(dispatcher const&)=delete;
dispatcher& operator=(dispatcher const&)=delete;
typename Dispatcher,
typename Msg,
typename Func>
friend class TemplateDispatcher;
void wait_and_dispatch()
auto msg=q->wait_and_pop();
bool dispatch(
std::shared_ptr<message_base> const& msg)
throw close_queue();
return false;
dispatcher(dispatcher&& other):
explicit dispatcher(queue* q_):
template<typename Message,typename Func>
handle(Func&& f)
return TemplateDispatcher<dispatcher,Message,Func>(
~dispatcher() noexcept(false)
class receiver
queue q;
sender operator()()
return sender(&q);
dispatcher wait()
return dispatcher(&q);
template<typename PreviousDispatcher,typename Msg,typename Func>
class TemplateDispatcher
queue* q;
PreviousDispatcher* prev;
Func f;
bool chained;
TemplateDispatcher(TemplateDispatcher const&)=delete;
TemplateDispatcher& operator=(TemplateDispatcher const&)=delete;
template<typename Dispatcher,typename OtherMsg,typename OtherFunc>
friend class TemplateDispatcher;
void wait_and_dispatch()
auto msg=q->wait_and_pop();
bool dispatch(std::shared_ptr<message_base> const& msg)
if(wrapped_message<Msg>* wrapper=
return true;
return prev->dispatch(msg);
TemplateDispatcher(TemplateDispatcher&& other):
TemplateDispatcher(queue* q_,PreviousDispatcher* prev_,Func&& f_):
template<typename OtherMsg,typename OtherFunc>
handle(OtherFunc&& of)
return TemplateDispatcher<
~TemplateDispatcher() noexcept(false)
class bank_machine
messaging::receiver incoming;
void run()
[&](verify_pin const& msg)
[&](withdraw const& msg)
[&](get_balance const& msg)
[&](withdrawal_processed const& msg)
[&](cancel_withdrawal const& msg)
The code above is a snippet from
C++ Concurrency in Action.
and I was wondering if someone could explain what looks like chained template instantiation inside bank_machine::run(). Why is it that we can have a long chain of handle<some_type>(...).handle<some_type>(...).handle<some_type>(...)? If you could point me to some resources and also correct any misuse of nomenclature, I would appreciate it.
Why is it that we can have a long chain of handle<some_type>(...).handle<some_type>(...).handle<some_type>(...)?
For the same reason that you can chain any operator, e.g.
a + b + c + ...
works, so long as a + b returns an object that can be used as the left-hand side of operator+ with c as the right hand side.
In your example
must return an object that has a member access operator . that can be invoked on it, where the member itself can be a handle<some_other_type> that can then be invoked, and so on.
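As a small illustration of that rule, here is a sketch with made-up names (`Builder`, `Ping`, `Pong` are not from the book). Each `handle<Msg>()` call returns a new object that itself has a `handle` member template, which is all the chain needs.

```cpp
#include <cassert>
#include <string>

struct Ping {};
struct Pong {};

class Builder {
    std::string names_;

public:
    // Msg is only a tag here; the point is that the return type has handle<> too.
    template <typename Msg>
    Builder handle(const std::string& label) const {
        Builder next = *this;
        next.names_ += label + ";";
        return next;
    }
    const std::string& names() const { return names_; }
};

std::string demo_chain() {
    // Each call operates on the temporary returned by the previous one.
    return Builder{}.handle<Ping>("ping").handle<Pong>("pong").names();
}
```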
I studied this example code snippet as well and used it in some of my own projects, so I wanted to understand it in depth. This is my best guess:
Every time you call
on the temporary dispatcher-object constructed by
a temporary object of type TemplateDispatcher is created. The magic happens in the destructor:
the last object to be created will be destroyed first and this will trigger the call to
This makes the current thread of execution which is executing this line of code wait for a message to arrive at the message queue (it will sleep, besides spurious wakeups, as the code for dequeueing messages involves condition_variables and associated mutexes). If a message arrives, the thread of execution will check if it is able to deal with the message type and else, if this is not the case, delegate the call to the previously chained TemplateDispatcher or dispatcher-object. When this call is resolved, the destruction of the temporary objects will conclude and due to the
the thread will continue to wait for incoming messages in the same manner, again, until a close_queue-message arrives at the queue and will trigger an exception thrown in the temporary dispatcher-object's code, exactly in:
The TemplateDispatcher-objects created by calls to
will not deal with close_queue-message objects and will therefore delegate any objects of this type to their predecessor in the call chain (dispatcher-class should be the only class that can deal with close_queue-objects) and it will finally be delegated to the dispatcher-object which will then trigger the exception.
Message objects that are not handled by a particular TemplateDispatcher object (identified by the template parameters given on instantiation in the handle calls) are delegated to the previous TemplateDispatcher object in the method
of class TemplateDispatcher. A dynamic cast is used to determine if the current TemplateDispatcher-object has to deal with the arrived message.
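The mechanism described above can be reduced to a single-threaded sketch. This is a heavy simplification of the book's code, written under my own assumptions: a plain std::queue replaces the thread-safe queue, there is no waiting and no close_queue exception, and exactly one message is dispatched per chain. What it keeps is the essential part: the last temporary in the chain is destroyed first, its destructor pops a message, and a failed dynamic_cast delegates to the previous link.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>

struct message_base { virtual ~message_base() = default; };

template <typename Msg>
struct wrapped_message : message_base {
    Msg contents;
    explicit wrapped_message(Msg m) : contents(std::move(m)) {}
};

using queue_t = std::queue<std::shared_ptr<message_base>>;

template <typename Prev, typename Msg, typename Func> class TemplateDispatcher;

class Dispatcher {
    queue_t* q;
    bool chained = false;
    template <typename, typename, typename> friend class TemplateDispatcher;

    // End of the chain: nobody handled the message.
    bool dispatch(const std::shared_ptr<message_base>&) { return false; }

public:
    explicit Dispatcher(queue_t* q_) : q(q_) {}
    Dispatcher(const Dispatcher&) = delete;

    template <typename Msg, typename Func>
    TemplateDispatcher<Dispatcher, Msg, Func> handle(Func f) {
        return TemplateDispatcher<Dispatcher, Msg, Func>(q, this, std::move(f));
    }
};

template <typename Prev, typename Msg, typename Func>
class TemplateDispatcher {
    queue_t* q;
    Prev* prev;
    Func f;
    bool chained = false;
    template <typename, typename, typename> friend class TemplateDispatcher;

    bool dispatch(const std::shared_ptr<message_base>& msg) {
        if (auto* w = dynamic_cast<wrapped_message<Msg>*>(msg.get())) {
            f(w->contents);  // this link handles Msg
            return true;
        }
        return prev->dispatch(msg);  // not ours: delegate to the previous link
    }

public:
    TemplateDispatcher(queue_t* q_, Prev* prev_, Func f_)
        : q(q_), prev(prev_), f(std::move(f_)) {
        prev->chained = true;  // the previous link must not dispatch in its destructor
    }
    TemplateDispatcher(const TemplateDispatcher&) = delete;

    template <typename OtherMsg, typename OtherFunc>
    TemplateDispatcher<TemplateDispatcher, OtherMsg, OtherFunc> handle(OtherFunc of) {
        return TemplateDispatcher<TemplateDispatcher, OtherMsg, OtherFunc>(
            q, this, std::move(of));
    }

    // Only the last link (never chained) does the work, when the temporary dies.
    ~TemplateDispatcher() {
        if (!chained && !q->empty()) {
            auto msg = q->front();
            q->pop();
            dispatch(msg);
        }
    }
};

struct Deposit { int amount; };
struct Withdraw { int amount; };

int run_once(std::shared_ptr<message_base> msg) {
    queue_t q;
    q.push(std::move(msg));
    int balance = 100;
    Dispatcher{&q}
        .handle<Deposit>([&](const Deposit& d) { balance += d.amount; })
        .handle<Withdraw>([&](const Withdraw& w) { balance -= w.amount; });
    return balance;  // the chain's temporaries were destroyed at the end of the statement
}
```

A `Withdraw` message is handled by the last link directly, while a `Deposit` message fails the cast there and is passed back to the link before it.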
I have two functions foo and bar that should be mutually exclusive since they operate on the same data. However foo duplicates a lot of code from bar, so I would like to refactor foo to make a call to bar.
This is a problem because then I can't use a single mutex for both functions, because then foo would deadlock when it calls bar. So rather than "mutually exclusive" I only want "mutually exclusive from different threads".
Is there a pattern for implementing this? I'm using C++ and I'm okay with C++14/boost if I need something like shared_mutex.
Define a private "unlocked" function and use that from both foo and bar:
void bar_unlocked()
// assert that mx_ is locked
// real work
void bar()
std::lock_guard<std::mutex> lock(mx_);
bar_unlocked();
void foo()
std::lock_guard<std::mutex> lock(mx_);
// stuff
bar_unlocked();
// more stuff
Another way, which has the advantage that you can prove that the lock has been taken:
void bar_impl(std::unique_lock<std::mutex> lock)
// real work
void bar()
void foo()
// stuff
// more stuff
std::mutex is not (mandated by the standard to be) moveable, but a std::unique_lock<std::mutex> is. For this reason, we can move a lock into a callee and return it back to a caller (if necessary).
This allows us to prove ownership of the lock at every stage of a call chain.
In addition, once the optimiser gets involved, it's likely that all the lock-moving will be optimised away. This gives us the best of both worlds - provable ownership and maximal performance.
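A compact sketch of moving the lock into a callee and getting it back; the `Counter` class and its member names are invented for illustration. The implementation function takes the lock by value and returns it, so a caller can keep holding the mutex across several internal calls.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

class Counter {
    using lock_type = std::unique_lock<std::mutex>;

    mutable std::mutex mx_;
    int value_ = 0;

    // The lock is a parameter, so the callee can check that the caller holds it.
    lock_type add_impl(lock_type lock, int n) {
        assert(lock.owns_lock() && lock.mutex() == &mx_);
        value_ += n;
        return lock;  // hand ownership back to the caller
    }

public:
    void add(int n) { add_impl(lock_type(mx_), n); }

    // Holds the mutex once across two internal calls, without deadlocking.
    void add_twice(int n) {
        auto lock = add_impl(lock_type(mx_), n);
        lock = add_impl(std::move(lock), n);
    }

    int get() const {
        lock_type lock(mx_);
        return value_;
    }
};

int demo_counter() {
    Counter c;
    c.add(1);
    c.add_twice(2);
    return c.get();
}
```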
A more complete example:
#include <mutex>
#include <cassert>
#include <functional>
struct actor
// public interface
// perform a simple synchronous action
void simple_action()
/// perform an action either now or asynchronously in the future
/// handler() is called when the action is complete
/// handler is a latch - i.e. it will be called exactly once
/// #pre an existing handler must not be pending
void complex_action(std::function<void()> handler)
impl_complex_action(take_lock(), std::move(handler));
// private external interface (for callbacks)
void my_callback()
auto lock = take_lock();
_condition_met = true;
// private interface
using mutex_type = std::mutex;
using lock_type = std::unique_lock<mutex_type>;
void impl_simple_action(const lock_type& lock)
// assert preconditions
// actions here
void impl_complex_action(lock_type my_lock, std::function<void()> handler)
_handler = std::move(handler);
if (_condition_met)
return impl_condition_met(std::move(my_lock));
else {
// initiate some action that will result in my_callback() being called
// some time later
void impl_condition_met(lock_type lock)
_condition_met = false;
auto copy = std::move(_handler);
// unlock here because the callback may call back into our public interface
auto take_lock() const -> lock_type
return lock_type(_mutex);
mutable mutex_type _mutex;
std::function<void()> _handler = {};
bool _condition_met = false;
void act(actor& a)
// other stuff...
// note: calling another public interface function of a
// during a handler initiated by a
// the unlock() in impl_condition_met() makes this safe.
I want a member std::future<void> to continuously call a function inside a loop until the parent object is destroyed.
My current solution involves wrapping the future in a class with a boolean flag and setting the flag to false on destruction.
class Wrapper
std::future<void> fut;
bool wrapperAlive{true};
Wrapper() : fut{std::async(std::launch::async, [this]
while(wrapperAlive) doSomething();
})} { }
wrapperAlive = false;
Is there a more idiomatic way of doing this?
This is a data-race free version of your code:
class Wrapper {
std::atomic<bool> wrapperAlive{true}; // construct flag first!
std::future<void> fut;
Wrapper() :
fut{std::async(std::launch::async, [this]
~Wrapper() {
wrapperAlive = false;
fut.get(); // block, so it sees wrapperAlive before it is destroyed.
The next thing I'd do is write:
template<class F>
struct repeat_async_t {
F f;
// ...
using repeat_async = repeat_async_t<std::function<void()>>;
template<class F>
repeat_async_t<std::decay_t<F>> make_repeat_async(F&&f){
return {std::forward<F>(f)};
which takes a task to repeat forever and bundles it up, rather than mixing the control-flow logic with the logic being executed.
At this point, we will probably want to add in an abort method.
Finally, it is very rarely a good idea to busy-loop a thread. So we'd add in some kind of wait-for-more-data-to-consume system.
And it ends up looking a lot different than your code.
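To give a rough idea of where this ends up, here is a sketch with an abort path and a condition variable instead of a busy loop. The class name `Worker`, the `submit`/`finish` members and the integer "work items" are all assumptions for the example; adding up the integers stands in for doSomething().

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>

class Worker {
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> work_;
    bool stop_ = false;
    int sum_ = 0;
    std::future<void> fut_;  // declared last: everything above exists before the task starts

    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // Sleeps until there is work or a stop request; no busy loop.
            cv_.wait(lock, [this] { return stop_ || !work_.empty(); });
            while (!work_.empty()) {
                sum_ += work_.front();  // stand-in for doSomething()
                work_.pop_front();
            }
            if (stop_) return;
        }
    }

public:
    Worker() : fut_(std::async(std::launch::async, [this] { run(); })) {}

    void submit(int x) {
        {
            std::lock_guard<std::mutex> g(m_);
            work_.push_back(x);
        }
        cv_.notify_one();
    }

    // Abort: request a stop, wake the task, and wait for it to finish.
    int finish() {
        {
            std::lock_guard<std::mutex> g(m_);
            stop_ = true;
        }
        cv_.notify_one();
        if (fut_.valid()) fut_.get();
        return sum_;  // safe to read: the task has exited
    }

    ~Worker() { finish(); }
};

int demo_worker() {
    Worker w;
    w.submit(1);
    w.submit(2);
    w.submit(3);
    return w.finish();  // queued items are drained before the task exits
}
```

For brevity the sketch keeps the mutex locked while it processes items; real work would normally be moved out of the queue and run with the lock released.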
For example, I have an EventGenerator class that calls IEventHandler::onEvent for all registered event handlers:
class IEventHandler {
public: virtual void onEvent(...) = 0;
class EventGenerator {
std::vector<IEventHandler*> _handlers;
std::mutex _mutex; // [1]
void AddHandler(IEventHandler* handler) {
std::lock_guard<std::mutex> lck(_mutex); // [2]
void RemoveHandler(IEventHandler* handler) {
std::lock_guard<std::mutex> lck(_mutex); // [3]
// remove from "_handlers"
void threadMainTask() {
while(true) {
// Do some work ...
// Post event to all registered handlers
std::lock_guard<std::mutex> lck(_mutex); // [4]
for(auto& h : _handlers) { h->onEvent(...); }
// Do some work ...
The code should be thread safe in the following manner:
one thread is executing the EventGenerator::threadMainTask
many threads might access EventGenerator::AddHandler and EventGenerator::RemoveHandler APIs.
To support this, I have the following synchronization (see the comments in the code):
[1] is the mutex that protects the vector _handlers from multiple thread access.
[2] and [3] protect against handlers being added or removed simultaneously.
[4] prevents the vector from being changed while the main thread is posting events.
This code works until, for some reason, the code calls EventGenerator::RemoveHandler or EventGenerator::AddHandler during the execution of IEventHandler::onEvent(...). The result is a runtime exception.
What is the best approach to handle registration of the event handlers and executing the event handler callback in the thread safe manner?
>> UPDATE <<
So based on the inputs, I've updated to the following design:
class IEventHandler {
public: virtual void onEvent(...) = 0;
class EventDelegate {
IEventHandler* _handler;
std::atomic<bool> _cancelled;
EventDelegate(IEventHandler* h) : _handler(h), _cancelled(false) {};
void Cancel() { _cancelled = true; }
void Invoke(...) { if (!_cancelled) _handler->onEvent(...); }
class EventGenerator {
std::vector<std::shared_ptr<EventDelegate>> _handlers;
std::mutex _mutex;
void AddHandler(std::shared_ptr<EventDelegate> handler) {
std::lock_guard<std::mutex> lck(_mutex);
void RemoveHandler(std::shared_ptr<EventDelegate> handler) {
std::lock_guard<std::mutex> lck(_mutex);
// remove from "_handlers"
void threadMainTask() {
while(true) {
// Do some work ...
std::vector<std::shared_ptr<EventDelegate>> handlers_copy;
std::lock_guard<std::mutex> lck(_mutex);
handlers_copy = _handlers;
for(auto& h : handlers_copy) { h->Invoke(...); }
// Do some work ...
As you can see, there is an additional class, EventDelegate, that has two purposes:
hold the event callback
enable to cancel the callback
In threadMainTask, I use a local copy of the std::vector<std::shared_ptr<EventDelegate>> and release the lock before invoking the callbacks. This approach solves the issue that arises when EventGenerator::{AddHandler,RemoveHandler} is called during IEventHandler::onEvent(...).
Any thoughts about the new design?
A copy-on-write vector implemented with an atomic swap of shared_ptrs (assuming callback registration occurs far less frequently than the events the callbacks are notified about):
using callback_t = std::shared_ptr<std::function<void(event_t const&)> >;
using callbacks_t = std::shared_ptr<std::vector<callback_t> >;
callbacks_t callbacks_;
mutex_t mutex_; // a mutex of your choice
void register_callback(callback_t cb) // "register" itself is a reserved keyword in C++
// the mutex is to serialize concurrent callbacks registrations
// this is not always necessary, as depending on the application
// architecture, single writer may be enforced by design
scoped_lock lock(mutex_);
auto callbacks = atomic_load(&callbacks_);
auto new_callbacks = std::make_shared< std::vector<callback_t> >();
new_callbacks->reserve(callbacks->size() + 1);
*new_callbacks = *callbacks; // copy the current vector, not the pointer to it
new_callbacks->push_back(std::move(cb));
atomic_store(&callbacks_, new_callbacks);
void invoke(event_t const& evt)
auto callbacks = atomic_load(&callbacks_);
// many people wrap each callback invocation into a try-catch
// and de-register on exception
for(auto& cb: *callbacks) (*cb)(evt);
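The copy-on-write idea above can be made runnable roughly as follows. This is a sketch, not a drop-in replacement: `Registry`, `add` and the `int` event type are placeholder names, and the demo is single-threaded so the result is deterministic. It uses the free functions std::atomic_load and std::atomic_store for shared_ptr, which are deprecated since C++20 in favour of std::atomic<std::shared_ptr<T>>. The demo registers a new callback from inside a running callback, which is the re-entrant case that breaks the mutex-around-the-loop version.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using event_t = int;  // placeholder event type for the sketch
using fn_t = std::function<void(const event_t&)>;
using callback_t = std::shared_ptr<fn_t>;
using callbacks_t = std::shared_ptr<std::vector<callback_t>>;

class Registry {
    callbacks_t callbacks_ = std::make_shared<std::vector<callback_t>>();
    std::mutex writers_;  // serializes concurrent registrations only

public:
    void add(callback_t cb) {
        std::lock_guard<std::mutex> lock(writers_);
        auto current = std::atomic_load(&callbacks_);
        // Copy the current vector, extend the copy, then publish it atomically.
        auto next = std::make_shared<std::vector<callback_t>>(*current);
        next->push_back(std::move(cb));
        std::atomic_store(&callbacks_, std::move(next));
    }

    void invoke(const event_t& evt) {
        // Readers take no lock: they iterate over an immutable snapshot.
        auto snapshot = std::atomic_load(&callbacks_);
        for (auto& cb : *snapshot) (*cb)(evt);
    }
};

int demo_cow() {
    Registry r;
    int calls = 0;
    r.add(std::make_shared<fn_t>([&](const event_t&) {
        ++calls;
        // Re-entrant registration: invoke() holds no lock, so this cannot deadlock,
        // and the snapshot being iterated is not modified.
        r.add(std::make_shared<fn_t>([&](const event_t&) { ++calls; }));
    }));
    r.invoke(1);  // snapshot holds one callback
    r.invoke(2);  // snapshot holds two callbacks; a third is added while it runs
    return calls;
}
```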
Specifically on the subject of asynchronous behavior when a callback is executed while it is being de-registered, the best approach is to remember the Separation of Concerns principle.
The callback should not be able to die until it has been executed. This is achieved via another classic trick called an "extra level of indirection". Namely, instead of registering the user-provided callback directly, one wraps it in something like the struct below. De-registration, apart from updating the vector, then calls the discharge() method defined below on the callback wrapper, and can even tell the caller of the de-registration method whether the callback execution finished successfully.
template <class CB> struct cb_wrapper
mutable std::atomic<bool> done_;
CB cb_;
cb_wrapper(CB&& cb): done_(false), cb_(std::move(cb)) {}
bool discharge()
bool not_done = false;
return done_.compare_exchange_strong(not_done, true);
void operator()(event_t const&)
if (discharge())
I don't see a fully correct solution here. In your update there is still a problem: the invoke method is not synchronized with callback removal. There is an atomic flag, but it is not enough. For example, suppose that just after this line of code:
if (!_cancelled)
another thread calls the remove method. onEvent() can then still be called, even though the remove method has already removed the callback from the list and returned, because nothing synchronizes the two execution flows. The answer from @bobah has the same problem.
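One possible way to close that window, offered only as a sketch, is to make the check and the call a single critical section so that Cancel() cannot return while onEvent() is still running. The names mirror the question's EventDelegate, but the std::function handler, the `int` event and the choice of std::recursive_mutex (so that a handler may cancel itself) are my assumptions. The demo is single-threaded and only exercises the self-cancel path.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

class EventDelegate {
    std::recursive_mutex m_;  // recursive so a handler can cancel itself
    bool cancelled_ = false;
    std::function<void(int)> handler_;

public:
    explicit EventDelegate(std::function<void(int)> h) : handler_(std::move(h)) {}

    // Blocks while another thread is inside Invoke(); once this returns,
    // the handler is neither running on another thread nor will it run again.
    void Cancel() {
        std::lock_guard<std::recursive_mutex> lock(m_);
        cancelled_ = true;
    }

    // The flag check and the call happen under the same lock.
    bool Invoke(int evt) {
        std::lock_guard<std::recursive_mutex> lock(m_);
        if (cancelled_) return false;
        handler_(evt);
        return true;
    }
};

int demo_cancel() {
    int calls = 0;
    EventDelegate* self = nullptr;
    EventDelegate d([&](int) {
        ++calls;
        self->Cancel();  // re-entrant cancel from inside the handler
    });
    self = &d;
    d.Invoke(1);  // runs the handler, which cancels the delegate
    d.Invoke(2);  // skipped
    return calls;
}
```

The cost is that the handler runs with the delegate's lock held, so invocations of one delegate are serialized and a slow handler delays Cancel().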