boost::asio Strictly sequential invocation of event handlers using strand


I have a question about the usage of strands in the boost::asio framework.
The documentation says the following:
In the case of composed asynchronous operations, such as async_read()
or async_read_until(), if a completion handler goes through a strand,
then all intermediate handlers should also go through the same strand.
This is needed to ensure thread safe access for any objects that are
shared between the caller and the composed operation (in the case of
async_read() it's the socket, which the caller can close() to cancel
the operation). This is done by having hook functions for all
intermediate handlers which forward the calls to the customisable hook
associated with the final handler:
Let's say that we have the following example.
A strand wraps the handler of an asynchronous socket read. The handler receives the data and forwards it to another socket with an asynchronous write. Both operations use the same io_service. Is this write operation thread safe as well? Is it implicitly called in the same strand, or do I need to explicitly call async_write through the strand?
read_socket.async_read_some(my_buffer,
    boost::asio::bind_executor(my_strand,
        [](error_code ec, size_t length)
        {
            write_socket.async_write_some(boost::asio::buffer(data, size), handler);
        }));
Is async_write_some executed sequentially in the example above, or does it need a strand as well?

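Yes. Since you bound the completion handler to the strand executor (explicitly, as well), you know it will be invoked on the strand, and that includes the call that initiates async_write_some. The write's own completion handler (handler) runs on the strand only if it is also associated with it, for example because write_socket was constructed on the strand.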
Note you can also have an implicit default executor for the completion by constructing the socket on the strand:
tcp::socket read_socket { my_strand };
In that case you don't have to explicitly bind the handler to the strand:
read_socket.async_read_some( //
    my_buffer, [](error_code ec, size_t length) {
        write_socket.async_write_some(asio::buffer(data, size), handler);
    });
I prefer this style because it makes it much easier to write generic code which may or may not require strands.
Note that the quoted documentation has no relation to the question because none of the async operations are composed operations.

Related

Is the asio strand object thread safe?

I have to develop an asynchronous client that talks to a server. The client runs in a separate thread from the main application and just reads what the server sends using a callback chain. Each read handler registers the next one through a strand (it is a bit more complex since I use a class method as a callback so I need to bind *this to match the handler's signature):
_socketObject.async_read_some(
    asio::buffer(_recv_buf.data(), _recv_buf.size()),
    asio::bind_executor(_strand, std::bind(
        &Connection::_handleRead, shared_from_this(),
        std::placeholders::_1, std::placeholders::_2)));
To write to the server I'd like the main application to post (https://think-async.com/Asio/asio-1.16.1/doc/asio/reference/post/overload2.html) through the same strand a callback that performs the write to the server (this is to avoid concurrent access to the socket and some shared data).
The thing that I want to know is if it is sufficient to copy the strand object used in the client or it is necessary to keep a reference to the original. In the latter case I am concerned about the thread safety of the operation.
I'd like to avoid an explicit mutex on the strand object, if possible.
I use the header only version of the library (non-Boost).

Yes. See docs
Thread Safety
Distinct objects: Safe.
Shared objects: Safe.
Strands can be copied. In fact, you can create a new strand off another executor and if that was on a strand it will end up representing the same strand identity.
Additionally, a mutex on a strand couldn't possibly work because composed operations need to dispatch work on the strand, and they would not be aware of the need for locking.
In general, locking is a no-no in async tasks. See "Strands: Use Threads Without Explicit Locking" in the Asio documentation.

Which thread async operations take place

After reading the Asio documentation, it's clear to me that completion handlers are called by one of the threads that called the io_service's io.run() method. However, what is not clear to me is on which thread the asynchronous read/write operations themselves take place. Is it the thread from which I call the methods, or one of the threads that called io.run()? Or, as a last possibility, does the library create another thread behind the scenes to perform the operation?

The I/O operation will be attempted within the initiating async_* function. If either the operation's completion condition is satisfied or an error occurs, then the operation is complete and the completion handler will be posted into the io_service. Otherwise, the operation is not complete, and it will be enqueued into the io_service, where an application thread running the io_service's function poll(), poll_one(), run(), or run_one() performs the underlying I/O operation. In both cases, the completion handler is invoked by a thread processing the io_service.
The async_write() documentation notes that the asynchronous operation may be completed immediately:
Regardless of whether the asynchronous operation completes immediately or not, the handler will not be invoked from within this function. Invocation of the handler will be performed in a manner equivalent to using boost::asio::io_service::post().
This behavior is also noted in the Requirements on Asynchronous Operations documentation:
When an asynchronous operation is complete, the handler for the operation will be invoked as if by:
Constructing a bound completion handler bch for the handler ...
Calling ios.post(bch) to schedule the handler for deferred invocation ...
This implies that the handler must not be called directly from within the initiating function, even if the asynchronous operation completes immediately.
Here is a complete example demonstrating this behavior. In it, socket1 and socket2 are connected. Initially, socket2 has no data available. However, after invoking async_write(socket1, ...), socket2 has data even though the io_service has not been run:
#include <boost/asio.hpp>
#include <cassert>
#include <string>

constexpr auto noop = [](auto&& ...){};

int main()
{
    using boost::asio::ip::tcp;
    boost::asio::io_service io_service;

    // Create all I/O objects.
    tcp::acceptor acceptor{io_service, {{}, 0}};
    tcp::socket socket1{io_service};
    tcp::socket socket2{io_service};

    // Connect sockets.
    acceptor.async_accept(socket1, noop);
    socket2.async_connect(acceptor.local_endpoint(), noop);
    io_service.run();
    io_service.reset();

    // Verify socket2 has no data.
    assert(0 == socket2.available());

    // Initiate an asynchronous write. However, do not run
    // the `io_service`.
    std::string data{"example"};
    async_write(socket1, boost::asio::buffer(data), noop);

    // Verify socket2 has data.
    assert(0 < socket2.available());
}

For instance, suppose you want to send some data to a remote partner asynchronously.
boost::asio::async_write(_socket, boost::asio::buffer(msg.data(), msg.size()),
    std::bind(&Socket::WriteHandlerInternal, this->shared_from_this(),
        std::placeholders::_1, std::placeholders::_2));
// Where 'this' is the class Socket
Before that, you have probably created a thread that calls ioService.run(). The async_write function uses the same ioService that you used to create your socket. It places the write operation and its handler into the queue of your ioService, to be executed on the thread your ioService runs on, as the async_ prefix already suggests.

How strands guarantee correct execution of pending events in boost.asio

Consider an echo server implemented using Boost.asio. Read events from connected clients result in blocks of data being placed on to an arrival event queue. A pool of threads works through these events - for each event, a thread takes the data in the event and echos it back to the connected client.
As shown in the diagram above, there could be multiple events in the event queue all from a single client. In order to ensure that these events for a given client are executed and delivered in order, strands are used. In this case, all events from a given connected client will be executed in a strand for the client.
My question is: how do strands guarantee the correct order of processing of events? I presume there must be some kind of lock-per-strand, but even that won't be sufficient, so there must be more to it, and I was hoping someone could perhaps explain it or point me to some code which does this?
I found this document:
How strands work and why you should use them
It sheds some light on the mechanism, but says that in a strand "Handler execution order is not guaranteed". Does that mean that we could end up with receiving back "Strawberry forever. fields"?
Also - whenever a new client connects, do we have to create a new strand, so that there is one strand per client?
Finally - when a read event arrives, how do we know which strand to add it to? Does the strand have to be looked up from all strands using the connection as a key?

strand provides a guarantee for non-concurrency and the invocation order of handlers; strand does not control the order in which operations are executed and demultiplexed. Use a strand if you have either:
multiple threads accessing a shared object that is not thread safe
a need for a guaranteed sequential ordering of handlers
The io_service will provide the desired and expected ordering of buffers being filled or used in the order in which operations are initiated. For instance, if the socket has "Strawberry fields forever." available to be read, then given:
buffer1.resize(11); // buffer is a std::vector managed elsewhere
buffer2.resize(7); // buffer is a std::vector managed elsewhere
buffer3.resize(8); // buffer is a std::vector managed elsewhere
socket.async_read_some(boost::asio::buffer(buffer1), handler1);
socket.async_read_some(boost::asio::buffer(buffer2), handler2);
socket.async_read_some(boost::asio::buffer(buffer3), handler3);
When the operations complete:
handler1 is invoked, buffer1 will contain "Strawberry "
handler2 is invoked, buffer2 will contain "fields "
handler3 is invoked, buffer3 will contain "forever."
However, the order in which the completion handlers are invoked is unspecified. This unspecified ordering remains true even with a strand.
Operation Demultiplexing
Asio uses the Proactor design pattern[1] to demultiplex operations. On most platforms, this is implemented in terms of a Reactor. The official documentation mentions the components and their responsibilities. Consider the following example:
socket.async_read_some(buffer, handler);
The caller is the initiator, starting an async_read_some asynchronous operation and creating the handler completion handler. The asynchronous operation is executed by the StreamSocketService operation processor:
Within the initiating function, if the socket has no other outstanding asynchronous read operations and data is available, then StreamSocketService will read from the socket and enqueue the handler completion handler into the io_service
Otherwise, the read operation is queued onto the socket, and the reactor is informed to notify Asio once data becomes available on the socket. When the io_service is run and data is available on the socket, then the reactor will inform Asio. Next, Asio will dequeue an outstanding read operation from the socket, execute it, and enqueue the handler completion handler into the io_service
The io_service proactor will dequeue a completion handler, demultiplex the handler to threads that are running the io_service, from which the handler completion handler will be executed. The order of invocation of the completion handlers is unspecified.
Multiple Operations
If multiple operations of the same type are initiated on a socket, it is currently unspecified as to the order in which the buffers will be used or filled. However, in the current implementation, each socket uses a FIFO queue for each type of pending operation (e.g. a queue for read operations; a queue for write operations; etc). The networking-ts draft, which is based partially on Asio, specifies:
the buffers are filled in the order in which these operations were issued. The order of invocation of the completion handlers for these operations is unspecified.
Given:
socket.async_read_some(buffer1, handler1); // op1
socket.async_read_some(buffer2, handler2); // op2
As op1 was initiated before op2, then buffer1 is guaranteed to contain data that was received earlier in the stream than the data contained in buffer2, but handler2 may be invoked before handler1.
Composed Operations
Composed operations are composed of zero or more intermediate operations. For example, the async_read() composed asynchronous operation is composed of zero or more intermediate stream.async_read_some() operations.
The current implementation uses operation chaining to create a continuation, where a single async_read_some() operation is initiated, and within its internal completion handler, it determines whether or not to initiate another async_read_some() operation or to invoke the user's completion handler. Because of the continuation, the async_read documentation requires that no other reads occur until the composed operation completes:
The program must ensure that the stream performs no other read operations (such as async_read, the stream's async_read_some function, or any other composed operations that perform reads) until this operation completes.
If a program violates this requirement, one may observe interwoven data, because of the aforementioned order in which buffers are filled.
For a concrete example, consider the case where an async_read() operation is initiated to read 26 bytes of data from a socket:
buffer.resize(26); // buffer is a std::vector managed elsewhere
boost::asio::async_read(socket, boost::asio::buffer(buffer), handler);
If the socket receives "Strawberry ", "fields ", and then "forever.", then the async_read() operation may be composed of one or more socket.async_read_some() operations. For instance, it could be composed of 3 intermediate operations:
The first async_read_some() operation reads 11 bytes containing "Strawberry " into the buffer starting at an offset of 0. The completion condition of reading 26 bytes has not been satisfied, so another async_read_some() operation is initiated to continue the operation
The second async_read_some() operation reads 7 bytes containing "fields " into the buffer starting at an offset of 11. The completion condition of reading 26 bytes has not been satisfied, so another async_read_some() operation is initiated to continue the operation
The third async_read_some() operation reads 8 bytes containing "forever." into the buffer starting at an offset of 18. The completion condition of reading 26 bytes has been satisfied, so handler is enqueued into the io_service
When the handler completion handler is invoked, buffer contains "Strawberry fields forever."
Strand
strand is used to provide serialized execution of handlers in a guaranteed order. Given:
a strand object s
a function object f1 that is added to strand s via s.post(), or s.dispatch() when s.running_in_this_thread() == false
a function object f2 that is added to strand s via s.post(), or s.dispatch() when s.running_in_this_thread() == false
then the strand provides a guarantee of ordering and non-concurrency, such that f1 and f2 will not be invoked concurrently. Furthermore, if the addition of f1 happens before the addition of f2, then f1 will be invoked before f2.
With:
auto wrapped_handler1 = strand.wrap(handler1);
auto wrapped_handler2 = strand.wrap(handler2);
socket.async_read_some(buffer1, wrapped_handler1); // op1
socket.async_read_some(buffer2, wrapped_handler2); // op2
As op1 was initiated before op2, then buffer1 is guaranteed to contain data that was received earlier in the stream than the data contained in buffer2, but the order in which the wrapped_handler1 and wrapped_handler2 will be invoked is unspecified. The strand guarantees that:
handler1 and handler2 will not be invoked concurrently
if wrapped_handler1 is invoked before wrapped_handler2, then handler1 will be invoked before handler2
if wrapped_handler2 is invoked before wrapped_handler1, then handler2 will be invoked before handler1
Similar to the composed operation implementation, the strand implementation uses operation chaining to create a continuation. The strand manages all handlers posted to it in a FIFO queue. When the queue is empty and a handler is posted to the strand, then the strand will post an internal handler to the io_service. Within the internal handler, a handler will be dequeued from the strand's FIFO queue, executed, and then if the queue is not empty, the internal handler posts itself back to the io_service.
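The chaining scheme described in this section can be modeled with the standard library alone. The following is a toy sketch, not Asio's actual implementation: ToyService, ToyStrand, and run_in_toy_strand are hypothetical names, and a `scheduled_` flag is an assumption added so that no second internal handler is posted while one is still executing a handler.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for io_service: a thread-safe task queue drained by run().
class ToyService {
public:
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }
    void run() {  // returns once this thread finds the queue empty
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();
        }
    }
private:
    std::mutex mutex_;
    std::deque<std::function<void()>> tasks_;
};

// Toy strand: FIFO queue plus one self-reposting internal handler.
class ToyStrand {
public:
    explicit ToyStrand(ToyService& service) : service_(service) {}
    void post(std::function<void()> handler) {
        bool schedule = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(handler));
            if (!scheduled_) { scheduled_ = true; schedule = true; }
        }
        if (schedule) service_.post([this] { run_one(); });
    }
private:
    void run_one() {  // the "internal handler"
        std::function<void()> handler;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            handler = std::move(queue_.front());
            queue_.pop_front();
        }
        handler();  // executed outside the lock
        bool more = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            more = !queue_.empty();
            if (!more) scheduled_ = false;
        }
        if (more) service_.post([this] { run_one(); });  // chain to the next
    }
    ToyService& service_;
    std::mutex mutex_;
    std::deque<std::function<void()>> queue_;
    bool scheduled_ = false;
};

// Posts `count` handlers to the toy strand, runs the service on `workers`
// threads, and returns the order in which the handlers executed.
std::vector<int> run_in_toy_strand(int count, int workers) {
    ToyService service;
    ToyStrand strand(service);
    std::vector<int> order;  // unsynchronized: only touched via the strand
    for (int i = 0; i < count; ++i)
        strand.post([&order, i] { order.push_back(i); });
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w)
        pool.emplace_back([&service] { service.run(); });
    for (auto& t : pool) t.join();
    return order;
}
```

Because only one internal handler exists at any time, at most one strand handler runs at once, and the FIFO queue preserves the order in which handlers were posted.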
Consider reading this answer to find out how a composed operation uses asio_handler_invoke() to wrap intermediate handlers within the same context (i.e. strand) of the completion handler. The implementation details can be found in the comments on this question.
1. [POSA2] D. Schmidt et al, Pattern Oriented Software Architecture, Volume 2. Wiley, 2000.

A strand is an execution context which executes handlers within a critical section, on a correct thread.
That critical section is implemented (more or less) with a mutex.
It's a little cleverer than that because if a dispatcher detects that a thread is already in the strand, it appends the handler to a queue of handlers to be executed before the critical section has been left, but after the current handler has completed.
Thus, in this case the new handler is 'sort of' posted to the currently executing thread.
There are some guarantees in ordering.
strand::post/dispatch(x);
strand::post/dispatch(y);
will always result in x happening before y.
But if x dispatches a handler z during its execution, then the execution order will be:
x, z, y
Note that the idiomatic way to handle I/O completion handlers with strands is not to post work to a strand in the completion handler, but to wrap the completion handler in the strand, and do the work there.
Asio contains code to detect this and will do the right thing, ensuring correct ordering and eliding unnecessary intermediate posts.
e.g.:
async_read(sock, buffer, mystrand.wrap([](const auto& ec, auto transferred)
{
    // this code happens in the correct strand, in the correct order.
}));

When do I have to use boost::asio::strand

After reading the boost::asio documentation, it is still not clear to me when I need to use asio::strand. Suppose that I have one thread running the io_service. Is it then safe to write to a socket as follows?
void Connection::write(boost::shared_ptr<string> msg)
{
    _io_service.post(boost::bind(&Connection::_do_write,this,msg));
}
void Connection::_do_write(boost::shared_ptr<string> msg)
{
    if(_write_in_progress)
    {
        _msg_queue.push_back(msg);
    }
    else
    {
        _write_in_progress=true;
        boost::asio::async_write(_socket, boost::asio::buffer(*(msg.get())),
            boost::bind(&Connection::_handle_write,this,
                boost::asio::placeholders::error));
    }
}
void Connection::_handle_write(boost::system::error_code const &error)
{
    if(!error)
    {
        if(!_msg_queue.empty())
        {
            boost::shared_ptr<string> msg=_msg_queue.front();
            _msg_queue.pop_front();
            boost::asio::async_write(_socket, boost::asio::buffer(*(msg.get())),
                boost::bind(&Connection::_handle_write,this,
                    boost::asio::placeholders::error));
        }
        else
        {
            _write_in_progress=false;
        }
    }
}
Is this safe when multiple threads call Connection::write(..), or do I have to use asio::strand?

Short answer: no, you don't have to use a strand in this case.
Broadly simplified, an io_service contains a list of function objects (handlers). Handlers are put into the list when post() is called on the service. For example, whenever an asynchronous operation completes, the handler and its arguments are put into the list. io_service::run() executes one handler after another. So if there is only one thread calling run(), as in your case, there are no synchronisation problems and no strands are needed.
Only if multiple threads call run() on the same io_service, multiple handlers will be executed at the same time, in N threads up to N concurrent handlers. If that is a problem, e.g. if there might be two handlers in the queue at the same time that access the same object, you need the strand.
You can see the strand as a kind of lock for a group of handlers. If a thread executes a handler associated to a strand, that strand gets locked, and it gets released after the handler is done. Any other thread can execute only handlers that are not associated to a locked strand.
Caution: this explanation may be over-simplified and technically not accurate, but it gives a basic concept of what happens in the io_service and of the strands.

Calling io_service::run() from only one thread will cause all event handlers to execute within the thread, regardless of how many threads are invoking Connection::write(...). Therefore, with no possible concurrent execution of handlers, it is safe. The documentation refers to this as an implicit strand.
On the other hand, if multiple threads are invoking io_service::run(), then a strand would become necessary. This answer covers strands in much more detail.

concurrent async_write. is there a wait-free solution?

async_write() is forbidden to be called concurrently from different threads. It sends data by chunks using async_write_some and such chunks can be interleaved. So it is up to the user to take care of not calling async_write() concurrently.
Is there a nicer solution than this pseudocode?
void send(shared_ptr<char> p) {
    boost::mutex::scoped_lock lock(m_write_mutex);
    async_write(p, handler);
}
I do not like the idea to block other threads for a quite long time (there are ~50Mb sends in my application).
Maybe something like this would work?
void handler(const boost::system::error_code& e) {
    if(!e) {
        bool empty = lockfree_pop_front(m_queue);
        if(!empty) {
            shared_ptr<char> p = lockfree_queue_get_first(m_queue);
            async_write(p, handler);
        }
    }
}
void send(shared_ptr<char> p) {
    bool q_was_empty = lockfree_queue_push_back(m_queue, p);
    if(q_was_empty)
        async_write(p, handler);
}
I'd prefer to find a ready-to-use cookbook recipe. Dealing with lock-free is not easy, a lot of subtle bugs can appear.

async_write() is forbidden to be called concurrently from different threads
This statement is not quite correct. Applications can freely invoke async_write concurrently, as long as they are on different socket objects.
Is there a nicer solution than this pseudocode?
void send(shared_ptr<char> p) {
    boost::mutex::scoped_lock lock(m_write_mutex);
    async_write(p, handler);
}
This likely isn't accomplishing what you intend since async_write returns immediately. If you intend the mutex to be locked for the entire duration of the write operation, you will need to keep the scoped_lock in scope until the completion handler is invoked.
There are nicer solutions for this problem, the library has built-in support using the concept of a strand. It fits this scenario nicely.
A strand is defined as a strictly sequential invocation of event handlers (i.e. no concurrent invocation). Use of strands allows execution of code in a multithreaded program without the need for explicit locking (e.g. using mutexes).
Using an explicit strand here will ensure your handlers are only invoked by a single thread that has invoked io_service::run(). With your example, the m_queue member would be protected by a strand, ensuring atomic access to the outgoing message queue. After adding an entry to the queue, if the size is 1, it means no outstanding async_write operation is in progress and the application can initiate one wrapped through the strand. If the queue size is greater than 1, the application should wait for the async_write to complete. In the async_write completion handler, pop off an entry from the queue and handle any errors as necessary. If the queue is not empty, the completion handler should initiate another async_write from the front of the queue.
This is a much cleaner design than sprinkling mutexes in your classes since it uses the built-in Asio constructs as they are intended. This other answer I wrote has some code implementing this design.

We've solved this problem by having a separate queue of data to be written held in our socket object. When the first piece of data to be written is "queued", we start an async_write(). In our async_write's completion handler, we start subsequent async_write operations if there is still data to be transmitted.
