concurrent async_write. is there a wait-free solution?

async_write() is forbidden to be called concurrently from different threads. It sends data by chunks using async_write_some and such chunks can be interleaved. So it is up to the user to take care of not calling async_write() concurrently.
Is there a nicer solution than this pseudocode?
void send(shared_ptr<char> p) {
boost::mutex::scoped_lock lock(m_write_mutex);
async_write(p, handler);
}
I do not like the idea of blocking other threads for quite a long time (there are ~50Mb sends in my application).
Maybe something like this would work?
void handler(const boost::system::error_code& e) {
if(!e) {
bool empty = lockfree_pop_front(m_queue);
if(!empty) {
shared_ptr<char> p = lockfree_queue_get_first(m_queue);
async_write(p, handler);
}
}
}
void send(shared_ptr<char> p) {
bool q_was_empty = lockfree_queue_push_back(m_queue, p);
if(q_was_empty)
async_write(p, handler);
}
I'd prefer to find a ready-to-use cookbook recipe. Dealing with lock-free is not easy, a lot of subtle bugs can appear.

async_write() is forbidden to be called concurrently from different threads
This statement is not quite correct. Applications can freely invoke async_write concurrently, as long as they are on different socket objects.
Is there a nicer solution than this pseudocode?
void send(shared_ptr<char> p) {
boost::mutex::scoped_lock lock(m_write_mutex);
async_write(p, handler);
}
This likely isn't accomplishing what you intend since async_write returns immediately. If you intend the mutex to be locked for the entire duration of the write operation, you will need to keep the scoped_lock in scope until the completion handler is invoked.
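To make that point concrete, here is a small model that uses only the standard library and no Asio at all. FakeAsyncWriter and lock_released_before_completion are hypothetical names for this illustration; the "write" is just a stored callback that completes when it is invoked later. It is a sketch of the timing issue, assuming initiation returns immediately, not a reproduction of what async_write does internally.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a connection; not an Asio type.
struct FakeAsyncWriter {
    std::mutex write_mutex;
    std::function<void()> pending_completion;  // set while a "write" is in flight

    // Mirrors the shape of the question's send(): lock, initiate, return.
    void send(std::function<void()> on_complete) {
        std::lock_guard<std::mutex> lock(write_mutex);
        pending_completion = std::move(on_complete);  // initiation returns at once
    }   // the lock is released here, before the write has completed
};

// Returns true if the mutex is already free while the write is still pending.
inline bool lock_released_before_completion() {
    FakeAsyncWriter writer;
    bool completed = false;
    writer.send([&] { completed = true; });

    const bool write_in_flight =
        !completed && static_cast<bool>(writer.pending_completion);
    const bool mutex_free = writer.write_mutex.try_lock();
    if (mutex_free) {
        writer.write_mutex.unlock();
    }

    writer.pending_completion();  // the "write" completes only now
    return write_in_flight && mutex_free && completed;
}
```

Calling lock_released_before_completion() returns true in this model: a second caller could take the mutex and start another write while the first one is still outstanding.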
There are nicer solutions for this problem, the library has built-in support using the concept of a strand. It fits this scenario nicely.
A strand is defined as a strictly sequential invocation of event handlers (i.e. no concurrent invocation). Use of strands allows execution of code in a multithreaded program without the need for explicit locking (e.g. using mutexes).
Using an explicit strand here will ensure your handlers are only invoked by a single thread that has invoked io_service::run(). With your example, the m_queue member would be protected by a strand, ensuring atomic access to the outgoing message queue. After adding an entry to the queue, if the size is 1, it means no outstanding async_write operation is in progress and the application can initiate one wrapped through the strand. If the queue size is greater than 1, the application should wait for the async_write to complete. In the async_write completion handler, pop off an entry from the queue and handle any errors as necessary. If the queue is not empty, the completion handler should initiate another async_write from the front of the queue.
This is a much cleaner design than sprinkling mutexes in your classes since it uses the built-in Asio constructs as they are intended. This other answer I wrote has some code implementing this design.
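The queue logic described above can be sketched without Asio. In this model, OutgoingQueue, start_write and demo_write_order are hypothetical names: start_write stands in for initiating async_write on the front message, and on_write_complete stands in for the completion handler. It assumes every call is already serialized (in real code, by running through the strand), so it shows only the "size is 1, start a write" rule and the chaining, not the strand itself.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the outgoing message queue; not an Asio type.
class OutgoingQueue {
public:
    // start_write models initiating async_write on the given message.
    explicit OutgoingQueue(std::function<void(const std::string&)> start_write)
        : start_write_(std::move(start_write)) {}

    // In real code this would run on the strand.
    void send(std::string msg) {
        queue_.push_back(std::move(msg));
        if (queue_.size() == 1) {
            start_write_(queue_.front());  // nothing in flight: start now
        }
        // Otherwise a write is outstanding; its completion continues the chain.
    }

    // Models the async_write completion handler (success path only).
    void on_write_complete() {
        queue_.pop_front();  // drop the message that was just written
        if (!queue_.empty()) {
            start_write_(queue_.front());
        }
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::deque<std::string> queue_;
    std::function<void(const std::string&)> start_write_;
};

// Drives the queue by hand and returns the order in which writes were started.
inline std::vector<std::string> demo_write_order() {
    std::vector<std::string> started;
    OutgoingQueue queue([&](const std::string& m) { started.push_back(m); });
    queue.send("a");  // starts immediately
    queue.send("b");  // queued behind "a"
    queue.send("c");
    assert(started.size() == 1);  // only one write is in flight
    queue.on_write_complete();    // "a" done, "b" starts
    queue.on_write_complete();    // "b" done, "c" starts
    queue.on_write_complete();    // "c" done, queue drained
    assert(queue.pending() == 0);
    return started;
}
```

The message stays at the front of the queue until its write completes, which is also what keeps the buffer alive for the duration of the operation.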

We've solved this problem by having a separate queue of data to be written held in our socket object. When the first piece of data to be written is "queued", we start an async_write(). In our async_write's completion handler, we start subsequent async_write operations if there is still data to be transmitted.

Related

boost::asio Strictly sequential invocation of event handlers using strand

I have a question regarding the usage of strands in the boost::asio framework.
The manual says the following:
In the case of composed asynchronous operations, such as async_read()
or async_read_until(), if a completion handler goes through a strand,
then all intermediate handlers should also go through the same strand.
This is needed to ensure thread safe access for any objects that are
shared between the caller and the composed operation (in the case of
async_read() it's the socket, which the caller can close() to cancel
the operation). This is done by having hook functions for all
intermediate handlers which forward the calls to the customisable hook
associated with the final handler:
Let's say that we have the following example
A strand is bound to an async read operation on one socket. The read handler takes the data and forwards it to an async write on a second socket. Both operations use the same io_service. Is this write operation thread safe as well? Is it implicitly called on the same strand, or do I need to call async_write explicitly through the strand?
read_socket.async_read_some(my_buffer,
boost::asio::bind_executor(my_strand,
[](error_code ec, size_t length)
{
write_socket.async_write_some(boost::asio::buffer(data, size), handler);
}));
Does the async_write_some in the example above execute sequentially, or does it need a strand as well?

Yes, since you bound the completion handler to the strand executor (explicitly, as well), you know it will be invoked on the strand - which includes async_write_some.
Note you can also have an implicit default executor for the completion by constructing the socket on the strand:
tcp::socket read_socket { my_strand };
In that case you don't have to explicitly bind the handler to the strand:
read_socket.async_read_some( //
my_buffer, [](error_code ec, size_t length) {
write_socket.async_write_some(asio::buffer(data, size), handler);
});
I prefer this style because it makes it much easier to write generic code which may or may not require strands.
Note that the quoted documentation has no relation to the question because none of the async operations are composed operations.

Can two threads share boost::asio tcp socket for exclusively reading and writing? [duplicate]

( This is a simplified version of my original question )
I have several threads that write to a boost asio socket. This seems to work very well, with no problems.
The documentation says a shared socket is not thread safe( here, way down at the bottom ) so I am wondering if I should protect the socket with mutex, or something.
This question insists that protection is necessary, but gives no advice on how to do so.
All the answers to my original question also insisted that what I was doing was dangerous, and most urged me to replace my writes with async_writes or even more complicated things. However, I am reluctant to do this, since it would complicate code that is already working and none of the answerers convinced me they knew what they were talking about - they seemed to have read the same documentation as I and were guessing, just as I was.
So, I wrote a simple program to stress test writing to a shared socket from two threads.
Here is the server, which simply writes out whatever it receives from the client
int main()
{
boost::asio::io_service io_service;
tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 3001));
tcp::socket socket(io_service);
acceptor.accept(socket);
for (;;)
{
char mybuffer[1256];
// read at most sizeof(mybuffer) - 1 bytes so the terminator below always fits
size_t len = socket.read_some(boost::asio::buffer(mybuffer, sizeof(mybuffer) - 1));
mybuffer[len] = '\0';
std::cout << mybuffer;
std::cout.flush();
}
return 0;
}
Here is the client, which creates two threads that write to a shared socket as fast as they can
boost::asio::ip::tcp::socket * psocket;
void speaker1()
{
string msg("speaker1: hello, server, how are you running?\n");
for( int k = 0; k < 1000; k++ ) {
boost::asio::write(
*psocket,boost::asio::buffer(msg,msg.length()));
}
}
void speaker2()
{
string msg("speaker2: hello, server, how are you running?\n");
for( int k = 0; k < 1000; k++ ) {
boost::asio::write(
*psocket,boost::asio::buffer(msg,msg.length()));
}
}
int main(int argc, char* argv[])
{
boost::asio::io_service io_service;
// connect to server
tcp::resolver resolver(io_service);
tcp::resolver::query query("localhost", "3001");
tcp::resolver::iterator endpoint_iterator = resolver.resolve(query);
tcp::resolver::iterator end;
psocket = new tcp::socket(io_service);
boost::system::error_code error = boost::asio::error::host_not_found;
while (error && endpoint_iterator != end)
{
psocket->close();
psocket->connect(*endpoint_iterator++, error);
}
boost::thread t1( speaker1 );
boost::thread t2( speaker2 );
// wait for both writers to finish instead of sleeping for a fixed time
t1.join();
t2.join();
}
This works! Perfectly, as far as I can tell. The client does not crash. The messages arrive at the server without garbles. They usually arrive alternately, one from each thread. Sometimes one thread gets two or three messages in before the other, but I do not think this is a problem so long as there are no garbles and all the messages arrive.
My conclusion: the socket may not be thread safe in some theoretical sense, but it is so hard to make it fail that I am not going to worry about it.

After restudying the code for async_write I am now convinced that any write operation is thread safe if and only if the packet size is smaller than
default_max_transfer_size = 65536;
What happens is that as soon as an async_write is called an async_write_some is called in the same thread. Any threads in the pool calling some form of io_service::run will keep on calling async_write_some for that write operation until it completes.
These async_write_some calls can and will interleave if it has to be called more than once (the packets are larger than 65536).
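For a sense of scale, the arithmetic behind "more than once" can be written down. This is only a lower bound under the assumption that every partial write sends a full chunk; a real socket may accept fewer bytes per call, so small writes can also need several calls. kDefaultMaxTransferSize and min_partial_writes are names made up for this sketch, and the 65536 figure is the one quoted above.

```cpp
#include <cstddef>

// Chunk size quoted in the answer above.
constexpr std::size_t kDefaultMaxTransferSize = 65536;

// Smallest number of partial writes needed for a payload of `bytes` bytes,
// assuming every partial write sends a full chunk (real sockets may send less).
// `chunk` must be greater than zero.
constexpr std::size_t min_partial_writes(
    std::size_t bytes, std::size_t chunk = kDefaultMaxTransferSize) {
    return (bytes + chunk - 1) / chunk;  // ceiling division
}

static_assert(min_partial_writes(65536) == 1, "exactly one chunk");
static_assert(min_partial_writes(65537) == 2, "one byte over needs a second call");
```

Under that assumption a 50 MiB payload needs at least 800 partial writes, and each gap between two of them is a point where another write on the same socket could slip in.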
ASIO does not queue writes to a socket as you would expect, one finishing after the other. In order to ensure both thread and interleave safe writes consider the following piece of code:
void my_connection::async_serialized_write(
boost::shared_ptr<transmission> outpacket) {
m_tx_mutex.lock();
bool in_progress = !m_pending_transmissions.empty();
m_pending_transmissions.push(outpacket);
if (!in_progress) {
if (m_pending_transmissions.front()->scatter_buffers.size() > 0) {
boost::asio::async_write(m_socket,
m_pending_transmissions.front()->scatter_buffers,
boost::asio::transfer_all(),
boost::bind(&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
} else { // Send single buffer
boost::asio::async_write(m_socket,
boost::asio::buffer(
m_pending_transmissions.front()->buffer_references.front(), m_pending_transmissions.front()->num_bytes_left),
boost::asio::transfer_all(),
boost::bind(
&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
}
}
m_tx_mutex.unlock();
}
void my_connection::handle_async_serialized_write(
const boost::system::error_code& e, size_t bytes_transferred) {
if (!e) {
boost::shared_ptr<transmission> transmission;
m_tx_mutex.lock();
transmission = m_pending_transmissions.front();
m_pending_transmissions.pop();
if (!m_pending_transmissions.empty()) {
if (m_pending_transmissions.front()->scatter_buffers.size() > 0) {
boost::asio::async_write(m_socket,
m_pending_transmissions.front()->scatter_buffers,
boost::asio::transfer_exactly(
m_pending_transmissions.front()->num_bytes_left),
boost::bind(
&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
} else { // Send single buffer
boost::asio::async_write(m_socket,
boost::asio::buffer(
m_pending_transmissions.front()->buffer_references.front(),
m_pending_transmissions.front()->num_bytes_left),
boost::asio::transfer_all(),
boost::bind(
&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
}
}
m_tx_mutex.unlock();
transmission->handler(e, bytes_transferred, transmission);
} else {
MYLOG_ERROR(
m_connection_oid.toString() << " " << "handle_async_serialized_write: " << e.message());
stop(connection_stop_reasons::stop_async_handler_error);
}
}
This basically makes a queue for sending one packet at a time. async_write is called only after the first write succeeds which then calls the original handler for the first write.
It would have been easier if asio made write queues automatic per socket/stream.

Use a boost::asio::io_service::strand for asynchronous handlers that are not thread safe.
A strand is defined as a strictly sequential invocation of event
handlers (i.e. no concurrent invocation). Use of strands allows
execution of code in a multithreaded program without the need for
explicit locking (e.g. using mutexes).
The timer tutorial is probably the easiest way to wrap your head around strands.

It sounds like this question boils down to:
what happens when async_write_some() is called simultaneously on a single socket from two different threads
I believe this is exactly the operation that's not thread safe. The order those buffers will go out on the wire is undefined, and they may even be interleaved. Especially if you use the convenience function async_write(), since it's implemented as a series of calls to async_write_some() underneath, until the whole buffer has been sent. In this case each fragment that's sent from the two threads may be interleaved randomly.
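The interleaving can be pictured with a deterministic model. The sketch below does not use Asio: split_chunks, interleaved and serialized are hypothetical helpers, and the strict one-chunk-each alternation is just one possible schedule chosen to make the effect visible. Real scheduling is not predictable like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split a message into pieces of at most max_chunk bytes, the way a composed
// write issues several partial writes for one large buffer.
// max_chunk must be greater than zero.
inline std::vector<std::string> split_chunks(const std::string& msg,
                                             std::size_t max_chunk) {
    std::vector<std::string> chunks;
    for (std::size_t pos = 0; pos < msg.size(); pos += max_chunk) {
        chunks.push_back(msg.substr(pos, max_chunk));
    }
    return chunks;
}

// One possible schedule: two composed writes take turns, one chunk each.
inline std::string interleaved(const std::string& a, const std::string& b,
                               std::size_t max_chunk) {
    const std::vector<std::string> ca = split_chunks(a, max_chunk);
    const std::vector<std::string> cb = split_chunks(b, max_chunk);
    std::string wire;
    for (std::size_t i = 0; i < std::max(ca.size(), cb.size()); ++i) {
        if (i < ca.size()) wire += ca[i];
        if (i < cb.size()) wire += cb[i];
    }
    return wire;
}

// Serialized schedule: the second write starts only after the first finishes.
inline std::string serialized(const std::string& a, const std::string& b) {
    return a + b;
}
```

With a chunk size of 2, interleaved("AAAA", "BBBB", 2) yields "AABBAABB", while serialized("AAAA", "BBBB") yields "AAAABBBB". When each message fits in one chunk, as in interleaved("AAAA", "BBBB", 4), the two schedules produce the same bytes in this model.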
The only way to protect you from hitting this case is to build your program to avoid situations like this.
One way to do that is by writing an application layer send buffer which a single thread is responsible for pushing onto the socket. That way you could protect the send buffer itself only. Keep in mind though that a simple std::vector won't work, since adding bytes to the end may end up re-allocating it, possibly while there is an outstanding async_write_some() referencing it. Instead, it's probably a good idea to use a linked list of buffers, and make use of the scatter/gather feature of asio.
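The reallocation concern can be checked in isolation. This sketch uses std::list of std::string as the "linked list of buffers" (an assumption for illustration; any node-based container would do) and checks that the bytes of the first buffer stay at the same address while more buffers are appended. front_buffer_stays_put is a hypothetical helper name, and no Asio calls are involved.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Returns true if the bytes of the first queued buffer stay at the same
// address while `extra` more buffers are appended behind it.
inline bool front_buffer_stays_put(std::size_t extra) {
    std::list<std::string> send_queue;
    send_queue.push_back("in-flight message");

    // This is the address an outstanding write would be referencing.
    const char* in_flight = send_queue.front().data();

    for (std::size_t i = 0; i < extra; ++i) {
        // Appending to a std::list never moves existing elements.
        send_queue.push_back("queued message " + std::to_string(i));
    }
    return send_queue.front().data() == in_flight;
}
```

Appending bytes to a single growing std::vector gives no such guarantee, because push_back may move the whole storage once capacity is exceeded.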

The key to understanding ASIO is to realize that completion handlers only run in the context of a thread that has called io_service.run() no matter which thread called the asynchronous method. If you've only called io_service.run() in one thread then all completion handlers will execute serially in the context of that thread. If you've called io_service.run() in more than one thread then completion handlers will execute in the context of one of those threads. You can think of this as a thread pool where the threads in the pool are those threads that have called io_service.run() on the same io_service object.
If you have multiple threads call io_service.run() then you can force completion handlers to be serialized by putting them in a strand.
To answer the last part of your question, you should call boost::asio::async_write(). This will dispatch the write operation onto a thread that has called io_service.run() and will invoke the completion handler when the write is done. If you need to serialize this operation then it's a little more complicated and you should read the documentation on strands here.

Consider first that the socket is a stream and is not internally guarded against concurrent read and/or write. There are three distinct considerations.
Concurrent execution of functions that access the same socket.
Concurrent execution of delegates that enclose the same socket.
Interleaved execution of delegates that write to the same socket.
The chat example is asynchronous but not concurrent. The io_service is run from a single thread, making all chat client operations non-concurrent. In other words, it avoids all of these problems. Even the async_write must internally complete sending all parts of a message before any other work can proceed, avoiding the interleaving problem.
Handlers are invoked only by a thread that is currently calling any overload of run(), run_one(), poll() or poll_one() for the io_service.
By posting work to the single thread io_service other threads can safely avoid both concurrency and blocking by queuing up work in the io_service. If however your scenario precludes you from buffering all work for a given socket, things get more complicated. You may need to block the socket communication (but not threads) as opposed to queuing up work indefinitely. Also, the work queue can be very difficult to manage as it's entirely opaque.
If your io_service runs more than one thread you can still easily avoid the above problems, but you can only invoke reads or writes from the handlers of other reads or writes (and at startup). This sequences all access to the socket while remaining non-blocking. The safety arises from the fact that the pattern is using only one thread at any given time. But posting work from an independent thread is problematic - even if you don't mind buffering it.
A strand is an asio class that posts work to an io_service in a way that ensures non-concurrent invocation. However using a strand to invoke async_read and/or async_write solves only the first of the three problems. These functions internally post work to the io_service of the socket. If that service is running multiple threads the work can be executed concurrently.
So how do you, for a given socket, safely invoke async_read and/or async_write concurrently?
With concurrent callers the first problem can be resolved with a mutex or a strand, using the former if you don't want to buffer the work and the latter if you do. This protects the socket during the function invocations but does nothing for the other problems.
The second problem seems hardest, because it's difficult to see what's going on inside of the code executing asynchronously from the two functions. The async functions both post work to the io_service of the socket.
From the boost socket source:
/**
* This constructor creates a stream socket without opening it. The socket
* needs to be opened and then connected or accepted before data can be sent
* or received on it.
*
* @param io_service The io_service object that the stream socket will use to
* dispatch handlers for any asynchronous operations performed on the socket.
*/
explicit basic_stream_socket(boost::asio::io_service& io_service)
: basic_socket<Protocol, StreamSocketService>(io_service)
{
}
And from the io_service::run()
/**
* The run() function blocks until all work has finished and there are no
* more handlers to be dispatched, or until the io_service has been stopped.
*
* Multiple threads may call the run() function to set up a pool of threads
* from which the io_service may execute handlers. All threads that are
* waiting in the pool are equivalent and the io_service may choose any one
* of them to invoke a handler.
*
* ...
*/
BOOST_ASIO_DECL std::size_t run();
So if you give a socket multiple threads, it has no choice but to utilize multiple threads - despite not being thread safe. The only way to avoid this problem (apart from replacing the socket implementation) is to give the socket only one thread to work with. For a single socket this is what you want anyway (so don't bother running off to write a replacement).
The third problem can be resolved by using a (different) mutex that is locked before the async_write, passed into the completion handler and unlocked at that point. This will prevent any caller from beginning a write until all parts of the preceding write are complete.
Note that the async_write posts work to a queue - that's how it is able to return almost immediately. If you throw too much work at it you may have to deal with some consequences. Despite using a single io_service thread for the socket, you may have any number of threads posting work via concurrent or non-concurrent calls to async_write.
On the other hand, async_read is straightforward. There is no interleaving problem and you simply loop back from the handler of the previous call. You may or may not want to dispatch the resulting work to another thread or queue, but if you perform it on the completion handler thread you are simply blocking all reads and writes on your single-threaded socket.
UPDATE
I did some more digging into the underlying implementation of the socket stream (for one platform). It appears to be the case that the socket consistently executes platform socket calls on the invoking thread, not the delegate posted to the io_service. In other words, despite the fact that async_read and async_write appear to return immediately, they do in fact execute all socket operations before returning. Only the handlers are posted to the io_service. This is neither documented nor exposed by the example code I've reviewed, but assuming it is guaranteed behavior, it significantly impacts the second problem above.
Assuming that the work posted to the io_service does not incorporate socket operations, there is no need to limit the io_service to a single thread. It does however reinforce the importance of guarding against concurrent execution of the async functions. So, for example, if one follows the chat example but instead adds another thread to the io_service, there becomes a problem. With async function invocations executing within function handlers, you have concurrent function execution. This would require either a mutex, or all async function invocations to be reposted for execution on a strand.
UPDATE 2
With respect to the third problem (interleaving), if the data size exceeds 65536 bytes, the work is broken up internal to async_write and sent in parts. But it is critical to understand that, if there is more than one thread in the io_service, chunks of work other than the first will be posted to different threads. This all happens internal in the async_write function before your completion handler is called. The implementation creates its own intermediate completion handlers and uses them to execute all but the first socket operation.
This means any guard around the async_write call (mutex or strand) will not protect the socket if there are multiple io_service threads and more than 64kb of data to post (by default, this may possibly vary). Therefore, in this case, the interleave guard is necessary not only for interleave safety, but also thread safety of the socket. I verified all of this in a debugger.
THE MUTEX OPTION
The async_read and async_write functions internally use the io_service in order to obtain threads on which to post completion handlers, blocking until threads are available. This makes them hazardous to guard with mutex locks. When a mutex is used to guard these functions a deadlock will occur when threads back up against the lock, starving the io_service. Given that there is no other way to guard async_write when sending > 64k with a multithread io_service, it effectively locks us into a single thread in that scenario - which of course resolves the concurrency question.

According to Nov. 2008 boost 1.37 asio updates, certain synchronous operations including writes "are now thread safe" allowing "concurrent synchronous operations on an individual socket, if supported by the OS" boost 1.37.0 history. This would seem to support what you are seeing but the oversimplification "Shared objects: Unsafe" clause remains in the boost docs for ip::tcp::socket.

Another comment on an old post...
I think the key sentence in the asio documentation for asio::async_write() overloads is the following:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
As I understand it, this documents what was assumed in many of the above answers:
Data from calls to asio::async_write may be interleaved if multiple threads execute io_context.run().
Maybe this helps someone ;-)

It depends on whether you access the same socket object from several threads. Let's say you have two threads running the same io_service::run() function.
If, for example, you read and write simultaneously, or perform a cancel operation from another thread, then it is not safe.
However, if your protocol does only one operation at a time, or if only one thread runs io_service::run(), then there is no problem. If you want to execute something on the socket from another thread, you can call io_service::post() with a handler that performs the operation on the socket, so that it is executed in the same thread.
If you have several threads executing io_service::run() and you try to do operations simultaneously - let's say a cancel and a read operation - then you should use strands. There is a tutorial for this in the Boost.Asio documentation.

I have been running extensive tests and haven't been able to break asio. Even without locking any mutex.
I would nevertheless advise that you use async_read and async_write with a mutex around each of those calls.
I believe the only drawback is that your completion handlers could be called concurrently if you have more than one thread calling io_service::run.
In my case this has not been an issue. Here is my test code:
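#include <boost/thread.hpp>
#include <boost/date_time.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <cstdlib>
#include <vector>
using namespace std;
char databuffer[256];
vector<boost::asio::const_buffer> scatter_buffer;
boost::mutex my_test_mutex;
// no-op completion handler; its definition was not shown in the original post
void mycallback() {}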
void my_test_func(boost::asio::ip::tcp::socket* socket, boost::asio::io_service *io) {
while(1) {
boost::this_thread::sleep(boost::posix_time::microsec(rand()%1000));
//my_test_mutex.lock(); // It would be safer
socket->async_send(scatter_buffer, boost::bind(&mycallback));
//my_test_mutex.unlock(); // It would be safer
}
}
int main(int argc, char **argv) {
for(int i = 0; i < 256; ++i)
databuffer[i] = i;
for(int i = 0; i < 4*90; ++i)
scatter_buffer.push_back(boost::asio::buffer(databuffer));
boost::asio::io_service my_test_ioservice;
boost::asio::ip::tcp::socket my_test_socket(my_test_ioservice);
boost::asio::ip::tcp::resolver my_test_tcp_resolver(my_test_ioservice);
boost::asio::ip::tcp::resolver::query my_test_tcp_query("192.168.1.10", "40000");
boost::asio::ip::tcp::resolver::iterator my_test_tcp_iterator = my_test_tcp_resolver.resolve(my_test_tcp_query);
boost::asio::connect(my_test_socket, my_test_tcp_iterator);
for (size_t i = 0; i < 8; ++i) {
boost::shared_ptr<boost::thread> thread(
new boost::thread(my_test_func, &my_test_socket, &my_test_ioservice));
}
while(1) {
my_test_ioservice.run_one();
boost::this_thread::sleep(boost::posix_time::microsec(rand()%1000));
}
return 0;
}
And here is my makeshift server in python:
import socket
def main():
    mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysocket.bind((socket.gethostname(), 40000))
    mysocket.listen(1)
    while 1:
        (clientsocket, address) = mysocket.accept()
        print("Connection from: " + str(address))
        i = 0
        count = 0
        while i == ord(clientsocket.recv(1)):
            i += 1
            i %= 256
            count+=1
            if count % 1000 == 0:
                print(count/1000)
        print("Error!")
    return 0
if __name__ == '__main__':
    main()
Please note that running this code can cause your computer to thrash.

does boost::strand producer/consumer make sense?

In this blog post (2010), someone is trying to solve the producer/consumer problem using the boost::strand facility. I have the feeling that he missed the point and that his program never runs a producer and a consumer simultaneously, but I'm not enough of a Boost expert to be confident about it.
He's got only one strand, on which both producer() and consumer() calls are dispatched by some timers;
He's got two threads, both invoking io_service::run()
Yet, one strand only, with the guarantee that "none of those handlers will execute concurrently", also means that we'll be either producing or consuming at any one time, while I'd say nothing should prevent a producer from producing unit U+t while the consumer uses unit U, right?
void producer_consumer::producer() {
if ( count_ < num) {
++count_;
intvec_.push_back(count_);
std::cout << count_ << " pushed back into integer vector." << std::endl;
timer1_.async_wait(strand_.wrap(
boost::bind(&producer_consumer::producer, this))); // loops back
timer2_.async_wait(strand_.wrap(
boost::bind(&producer_consumer::consumer, this))); // start consumer
}
}
Or am I missing the fact that there would be some File::async_read() accepting a strand-wrapped-"produce" function as completion callback and a similar Socket::ready-to-write-again that would explain that his proposal make sense as long as "producer()" and "consumer()" are actually the monitor-protected parts that interface with the shared buffer ?

The example code tends to focus much more on demonstrating strand as a synchronization mechanism, rather than providing a solution for the producer-consumer problem.
For a motivational case of using strand to solve a producer-consumer problem, consider a GUI based chat client using TCP. The GUI can produce multiple messages, trying to send a message before the previous message has been written to the connection. Meanwhile, the application needs to consume and write each message to the TCP connection while keeping each message intact, so that no data is interleaved. Composed operations, such as async_write, require that the stream perform no other write operations until the composed operation completes. To account for these behaviors:
- A queue can buffer chat messages.
- The GUI can post an operation into the strand that will:
  - Add the chat message to the queue.
  - Conditionally start the consumer.
- The consumer is an asynchronous call-chain that reads from the queue and writes to the socket within the strand.
See this answer for an implementation.

When do I have to use boost::asio::strand

Reading the documentation of boost::asio, it is still not clear when I need to use asio::strand. Suppose that I have one thread using io_service; is it then safe to write on a socket as follows?
void Connection::write(boost::shared_ptr<string> msg)
{
_io_service.post(boost::bind(&Connection::_do_write,this,msg));
}
void Connection::_do_write(boost::shared_ptr<string> msg)
{
if(_write_in_progress)
{
_msg_queue.push_back(msg);
}
else
{
_write_in_progress=true;
boost::asio::async_write(_socket, boost::asio::buffer(*(msg.get())),
boost::bind(&Connection::_handle_write,this,
boost::asio::placeholders::error));
}
}
void Connection::_handle_write(boost::system::error_code const &error)
{
if(!error)
{
if(!_msg_queue.empty())
{
boost::shared_ptr<string> msg=_msg_queue.front();
_msg_queue.pop_front();
boost::asio::async_write(_socket, boost::asio::buffer(*(msg.get())),
boost::bind(&Connection::_handle_write,this,
boost::asio::placeholders::error));
}
else
{
_write_in_progress=false;
}
}
}
Where multiple threads call Connection::write(..), or do I have to use asio::strand?

Short answer: no, you don't have to use a strand in this case.
Broadly simplified, an io_service contains a list of function objects (handlers). Handlers are put into the list when post() is called on the service, e.g. whenever an asynchronous operation completes, the handler and its arguments are put into the list. io_service::run() executes one handler after another. So if there is only one thread calling run() like in your case, there are no synchronisation problems and no strands are needed.
Only if multiple threads call run() on the same io_service, multiple handlers will be executed at the same time, in N threads up to N concurrent handlers. If that is a problem, e.g. if there might be two handlers in the queue at the same time that access the same object, you need the strand.
You can see the strand as a kind of lock for a group of handlers. If a thread executes a handler associated to a strand, that strand gets locked, and it gets released after the handler is done. Any other thread can execute only handlers that are not associated to a locked strand.
Caution: this explanation may be over-simplified and technically not accurate, but it gives a basic concept of what happens in the io_service and of the strands.
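In the same simplified spirit, the "list of handlers" picture can be turned into a toy model. ToyService below is a hypothetical class written for this illustration and is not how io_service is implemented; it only shows handlers being queued by post() and executed one after another by a single caller of run(), including a handler that posts a follow-up handler.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of the handler list described above; NOT the real io_service.
class ToyService {
public:
    void post(std::function<void()> handler) {
        handlers_.push_back(std::move(handler));
    }

    // Executes handlers one after another until the list is empty and
    // returns how many were executed.
    std::size_t run() {
        std::size_t executed = 0;
        while (!handlers_.empty()) {
            std::function<void()> handler = std::move(handlers_.front());
            handlers_.pop_front();
            handler();  // a handler may post further handlers
            ++executed;
        }
        return executed;
    }

private:
    std::deque<std::function<void()>> handlers_;
};

// Posts two handlers, the first of which chains a third, and returns the
// order in which they ran.
inline std::vector<std::string> demo_run_order() {
    ToyService service;
    std::vector<std::string> log;
    service.post([&] {
        log.push_back("first");
        service.post([&] { log.push_back("chained"); });
    });
    service.post([&] { log.push_back("second"); });
    service.run();
    return log;
}
```

With one thread in run(), the handlers execute in the order first, second, chained, and never overlap. That is the situation the question describes; the model says nothing about what happens once several threads call run().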

Calling io_service::run() from only one thread will cause all event handlers to execute within the thread, regardless of how many threads are invoking Connection::write(...). Therefore, with no possible concurrent execution of handlers, it is safe. The documentation refers to this as an implicit strand.
On the other hand, if multiple threads are invoking io_service::run(), then a strand would become necessary. This answer covers strands in much more detail.

boost::asio::socket thread safety

( This is a simplified version of my original question )
I have several threads that write to a boost asio socket. This seems to work very well, with no problems.
The documentation says a shared socket is not thread safe( here, way down at the bottom ) so I am wondering if I should protect the socket with mutex, or something.
This question insists that protection is necessary, but gives no advice on how to do so.
All the answers to my original question also insisted that what I was doing was dangerous, and most urged me to replace my writes with async_writes or even more complicated things. However, I am reluctant to do this, since it would complicate code that is already working, and none of the answerers convinced me they knew what they were talking about - they seemed to have read the same documentation as I had and were guessing, just as I was.
So, I wrote a simple program to stress test writing to a shared socket from two threads.
Here is the server, which simply writes out whatever it receives from the client
int main()
{
boost::asio::io_service io_service;
tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 3001));
tcp::socket socket(io_service);
acceptor.accept(socket);
for (;;)
{
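char mybuffer[1256];
// leave room for the terminating '\0' so a full read cannot overflow the buffer
std::size_t len = socket.read_some(boost::asio::buffer(mybuffer, sizeof(mybuffer) - 1));
mybuffer[len] = '\0';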
std::cout << mybuffer;
std::cout.flush();
}
return 0;
}
Here is the client, which creates two threads that write to a shared socket as fast as they can
boost::asio::ip::tcp::socket * psocket;
void speaker1()
{
string msg("speaker1: hello, server, how are you running?\n");
for( int k = 0; k < 1000; k++ ) {
boost::asio::write(
*psocket,boost::asio::buffer(msg,msg.length()));
}
}
void speaker2()
{
string msg("speaker2: hello, server, how are you running?\n");
for( int k = 0; k < 1000; k++ ) {
boost::asio::write(
*psocket,boost::asio::buffer(msg,msg.length()));
}
}
int main(int argc, char* argv[])
{
boost::asio::io_service io_service;
// connect to server
tcp::resolver resolver(io_service);
tcp::resolver::query query("localhost", "3001");
tcp::resolver::iterator endpoint_iterator = resolver.resolve(query);
tcp::resolver::iterator end;
psocket = new tcp::socket(io_service);
boost::system::error_code error = boost::asio::error::host_not_found;
while (error && endpoint_iterator != end)
{
psocket->close();
psocket->connect(*endpoint_iterator++, error);
}
boost::thread t1( speaker1 );
boost::thread t2( speaker2 );
Sleep(50000);
}
This works! Perfectly, as far as I can tell. The client does not crash. The messages arrive at the server without garbles. They usually arrive alternately, one from each thread. Sometimes one thread gets two or three messages in before the other, but I do not think this is a problem so long as there are no garbles and all the messages arrive.
My conclusion: the socket may not be thread safe in some theoretical sense, but it is so hard to make it fail that I am not going to worry about it.

After restudying the code for async_write I am now convinced that any write operation is thread safe if and only if the packet size is smaller than
default_max_transfer_size = 65536;
What happens is that as soon as an async_write is called an async_write_some is called in the same thread. Any threads in the pool calling some form of io_service::run will keep on calling async_write_some for that write operation until it completes.
These async_write_some calls can and will interleave if it has to be called more than once (the packets are larger than 65536).
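The effect of the chunking can be sketched with a little arithmetic. The following is only a model, assuming the 65536-byte limit quoted above; chunk_sizes() and wire_order() are hypothetical helpers written for this illustration, and the round-robin schedule is just one possible ordering, since the real one is not deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Chunk limit quoted in the answer (default_max_transfer_size).
const std::size_t kMaxChunk = 65536;

// Sizes of the pieces a single composed write would be split into.
std::vector<std::size_t> chunk_sizes(std::size_t total,
                                     std::size_t max_chunk = kMaxChunk) {
    std::vector<std::size_t> sizes;
    while (total > 0) {
        std::size_t n = std::min(total, max_chunk);
        sizes.push_back(n);
        total -= n;
    }
    return sizes;
}

// One possible wire order when writes 'A' and 'B' are in flight together:
// chunks are taken alternately. This schedule is an assumption made to
// show the effect; the actual order depends on thread scheduling.
std::string wire_order(std::size_t bytes_a, std::size_t bytes_b) {
    std::vector<std::size_t> a = chunk_sizes(bytes_a);
    std::vector<std::size_t> b = chunk_sizes(bytes_b);
    std::string order;
    for (std::size_t i = 0; i < std::max(a.size(), b.size()); ++i) {
        if (i < a.size()) order += 'A';
        if (i < b.size()) order += 'B';
    }
    return order;
}
```

Under this model two 100000-byte writes each split into a 65536-byte and a 34464-byte chunk, and wire_order(100000, 100000) gives "ABAB": B's first chunk lands in the middle of A's message. With two 1000-byte writes the result is "AB", one chunk each, so nothing is split.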
ASIO does not queue writes to a socket as you would expect, one finishing after the other. In order to ensure both thread and interleave safe writes consider the following piece of code:
void my_connection::async_serialized_write(
boost::shared_ptr<transmission> outpacket) {
m_tx_mutex.lock();
bool in_progress = !m_pending_transmissions.empty();
m_pending_transmissions.push(outpacket);
if (!in_progress) {
if (m_pending_transmissions.front()->scatter_buffers.size() > 0) {
boost::asio::async_write(m_socket,
m_pending_transmissions.front()->scatter_buffers,
boost::asio::transfer_all(),
boost::bind(&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
} else { // Send single buffer
boost::asio::async_write(m_socket,
boost::asio::buffer(
m_pending_transmissions.front()->buffer_references.front(), m_pending_transmissions.front()->num_bytes_left),
boost::asio::transfer_all(),
boost::bind(
&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
}
}
m_tx_mutex.unlock();
}
void my_connection::handle_async_serialized_write(
const boost::system::error_code& e, size_t bytes_transferred) {
if (!e) {
boost::shared_ptr<transmission> transmission;
m_tx_mutex.lock();
transmission = m_pending_transmissions.front();
m_pending_transmissions.pop();
if (!m_pending_transmissions.empty()) {
if (m_pending_transmissions.front()->scatter_buffers.size() > 0) {
boost::asio::async_write(m_socket,
m_pending_transmissions.front()->scatter_buffers,
boost::asio::transfer_exactly(
m_pending_transmissions.front()->num_bytes_left),
boost::bind(
&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
} else { // Send single buffer
boost::asio::async_write(m_socket,
boost::asio::buffer(
m_pending_transmissions.front()->buffer_references.front(),
m_pending_transmissions.front()->num_bytes_left),
boost::asio::transfer_all(),
boost::bind(
&my_connection::handle_async_serialized_write,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
}
}
m_tx_mutex.unlock();
transmission->handler(e, bytes_transferred, transmission);
} else {
MYLOG_ERROR(
m_connection_oid.toString() << " " << "handle_async_serialized_write: " << e.message());
stop(connection_stop_reasons::stop_async_handler_error);
}
}
This basically makes a queue for sending one packet at a time. async_write is called only after the first write succeeds which then calls the original handler for the first write.
It would have been easier if asio made write queues automatic per socket/stream.

Use a boost::asio::io_service::strand for asynchronous handlers that are not thread safe.
A strand is defined as a strictly sequential invocation of event
handlers (i.e. no concurrent invocation). Use of strands allows
execution of code in a multithreaded program without the need for
explicit locking (e.g. using mutexes).
The timer tutorial is probably the easiest way to wrap your head around strands.

It sounds like this question boils down to:
what happens when async_write_some() is called simultaneously on a single socket from two different threads
I believe this is exactly the operation that's not thread safe. The order those buffers will go out on the wire is undefined, and they may even be interleaved. Especially if you use the convenience function async_write(), since it's implemented as a series of calls to async_write_some() underneath, until the whole buffer has been sent. In this case each fragment that's sent from the two threads may be interleaved randomly.
The only way to protect you from hitting this case is to build your program to avoid situations like this.
One way to do that is by writing an application layer send buffer which a single thread is responsible for pushing onto the socket. That way you could protect the send buffer itself only. Keep in mind though that a simple std::vector won't work, since adding bytes to the end may end up re-allocating it, possibly while there is an outstanding async_write_some() referencing it. Instead, it's probably a good idea to use a linked list of buffers, and make use of the scatter/gather feature of asio.
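A minimal sketch of the linked-list idea, using only the standard library. SendBuffer and front_stays_put_while_appending() are made-up names for illustration, not part of asio, and the socket write itself is left out. The point is only that appending to a std::list never moves a message that an outstanding write may still reference.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical application-level send buffer (not an asio class).
// Each queued message lives in its own std::list node, so push() never
// relocates the bytes of a message that is already queued.
class SendBuffer {
public:
    void push(std::string message) { pending_.push_back(std::move(message)); }

    // Bytes of the oldest message; they stay valid until pop() is called.
    const char* front_data() const { return pending_.front().data(); }
    std::size_t front_size() const { return pending_.front().size(); }

    void pop() { pending_.pop_front(); }
    bool empty() const { return pending_.empty(); }

private:
    std::list<std::string> pending_;
};

// Simulates "a write is outstanding on the front message while other
// code keeps appending": the front message's address must not change.
bool front_stays_put_while_appending() {
    SendBuffer buf;
    buf.push("first message");
    const char* in_flight = buf.front_data();
    for (int i = 0; i < 10000; ++i) buf.push("more data");
    return in_flight == buf.front_data();
}
```

In a real multi-threaded program, access to the SendBuffer itself would still need a mutex, as described above; only the thread that owns the socket would call front_data() and pop().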

The key to understanding ASIO is to realize that completion handlers only run in the context of a thread that has called io_service.run() no matter which thread called the asynchronous method. If you've only called io_service.run() in one thread then all completion handlers will execute serially in the context of that thread. If you've called io_service.run() in more than one thread then completion handlers will execute in the context of one of those threads. You can think of this as a thread pool where the threads in the pool are those threads that have called io_service.run() on the same io_service object.
If you have multiple threads call io_service.run() then you can force completion handlers to be serialized by putting them in a strand.
To answer the last part of your question, you should call boost::asio::async_write(). This will dispatch the write operation onto a thread that has called io_service.run() and will invoke the completion handler when the write is done. If you need to serialize this operation then it's a little more complicated and you should read the documentation on strands here.

Consider first that the socket is a stream and is not internally guarded against concurrent read and/or write. There are three distinct considerations.
Concurrent execution of functions that access the same socket.
Concurrent execution of delegates that enclose the same socket.
Interleaved execution of delegates that write to the same socket.
The chat example is asynchronous but not concurrent. The io_service is run from a single thread, making all chat client operations non-concurrent. In other words, it avoids all of these problems. Even the async_write must internally complete sending all parts of a message before any other work can proceed, avoiding the interleaving problem.
Handlers are invoked only by a thread that is currently calling any overload of run(), run_one(), poll() or poll_one() for the io_service.
By posting work to the single-threaded io_service, other threads can safely avoid both concurrency and blocking by queuing up work in the io_service. If however your scenario precludes you from buffering all work for a given socket, things get more complicated. You may need to block the socket communication (but not threads) as opposed to queuing up work indefinitely. Also, the work queue can be very difficult to manage as it's entirely opaque.
If your io_service runs more than one thread you can still easily avoid the above problems, but you can only invoke reads or writes from the handlers of other reads or writes (and at startup). This sequences all access to the socket while remaining non-blocking. The safety arises from the fact that the pattern is using only one thread at any given time. But posting work from an independent thread is problematic - even if you don't mind buffering it.
A strand is an asio class that posts work to an io_service in a way that ensures non-concurrent invocation. However, using a strand to invoke async_read and/or async_write solves only the first of the three problems. These functions internally post work to the io_service of the socket. If that service is running multiple threads, the work can be executed concurrently.
So how do you, for a given socket, safely invoke async_read and/or async_write concurrently?
With concurrent callers the first problem can be resolved with a mutex or a strand, using the former if you don't want to buffer the work and the latter if you do. This protects the socket during the function invocations but does nothing for the other problems.
The second problem seems hardest, because it's difficult to see what's going on inside of the code executing asynchronously from the two functions. The async functions both post work to the io_service of the socket.
From the boost socket source:
/**
* This constructor creates a stream socket without opening it. The socket
* needs to be opened and then connected or accepted before data can be sent
* or received on it.
*
* @param io_service The io_service object that the stream socket will use to
* dispatch handlers for any asynchronous operations performed on the socket.
*/
explicit basic_stream_socket(boost::asio::io_service& io_service)
: basic_socket<Protocol, StreamSocketService>(io_service)
{
}
And from the io_service::run()
/**
* The run() function blocks until all work has finished and there are no
* more handlers to be dispatched, or until the io_service has been stopped.
*
* Multiple threads may call the run() function to set up a pool of threads
* from which the io_service may execute handlers. All threads that are
* waiting in the pool are equivalent and the io_service may choose any one
* of them to invoke a handler.
*
* ...
*/
BOOST_ASIO_DECL std::size_t run();
So if you give a socket multiple threads, it has no choice but to utilize multiple threads - despite not being thread safe. The only way to avoid this problem (apart from replacing the socket implementation) is to give the socket only one thread to work with. For a single socket this is what you want anyway (so don't bother running off to write a replacement).
The third problem can be resolved by using a (different) mutex that is locked before the async_write, passed into the completion handler and unlocked at that point. This will prevent any caller from beginning a write until all parts of the preceding write are complete.
Note that the async_write posts work to a queue - that's how it is able to return almost immediately. If you throw too much work at it you may have to deal with some consequences. Despite using a single io_service thread for the socket, you may have any number of threads posting work via concurrent or non-concurrent calls to async_write.
On the other hand, async_read is straightforward. There is no interleaving problem and you simply loop back from the handler of the previous call. You may or may not want to dispatch the resulting work to another thread or queue, but if you perform it on the completion handler thread you are simply blocking all reads and writes on your single-threaded socket.
UPDATE
I did some more digging into the underlying implementation of the socket stream (for one platform). It appears that the socket consistently executes platform socket calls on the invoking thread, not in the delegate posted to the io_service. In other words, despite the fact that async_read and async_write appear to return immediately, they do in fact execute all socket operations before returning. Only the handlers are posted to the io_service. This is neither documented nor exposed by the example code I've reviewed, but assuming it is guaranteed behavior, it significantly impacts the second problem above.
Assuming that the work posted to the io_service does not incorporate socket operations, there is no need to limit the io_service to a single thread. It does however reinforce the importance of guarding against concurrent execution of the async functions. So, for example, if one follows the chat example but instead adds another thread to the io_service, there becomes a problem. With async function invocations executing within function handlers, you have concurrent function execution. This would require either a mutex, or all async function invocations to be reposted for execution on a strand.
UPDATE 2
With respect to the third problem (interleaving), if the data size exceeds 65536 bytes, the work is broken up internal to async_write and sent in parts. But it is critical to understand that, if there is more than one thread in the io_service, chunks of work other than the first will be posted to different threads. This all happens internal in the async_write function before your completion handler is called. The implementation creates its own intermediate completion handlers and uses them to execute all but the first socket operation.
This means any guard around the async_write call (mutex or strand) will not protect the socket if there are multiple io_service threads and more than 64 KB of data to post (this is the default and may vary). Therefore, in this case, the interleave guard is necessary not only for interleave safety, but also for thread safety of the socket. I verified all of this in a debugger.
THE MUTEX OPTION
The async_read and async_write functions internally use the io_service in order to obtain threads on which to post completion handlers, blocking until threads are available. This makes them hazardous to guard with mutex locks. When a mutex is used to guard these functions a deadlock will occur when threads back up against the lock, starving the io_service. Given that there is no other way to guard async_write when sending > 64k with a multithread io_service, it effectively locks us into a single thread in that scenario - which of course resolves the concurrency question.

According to the Nov. 2008 Boost 1.37 Asio updates, certain synchronous operations, including writes, "are now thread safe", allowing "concurrent synchronous operations on an individual socket, if supported by the OS" (Boost 1.37.0 history). This would seem to support what you are seeing, but the oversimplified "Shared objects: Unsafe" clause remains in the Boost docs for ip::tcp::socket.

Another comment on an old post...
I think the key sentence in the asio documentation for asio::async_write() overloads is the following:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
As I understand it, this documents what was assumed in many of the above answers:
Data from calls to asio::async_write may be interleaved if multiple threads execute io_context.run().
Maybe this helps someone ;-)

It depends on whether you access the same socket object from several threads. Let's say you have two threads running io_service::run() on the same io_service.
If, for example, you read and write simultaneously, or perform a cancel operation from another thread, then it is not safe.
However, if your protocol performs only one operation at a time, this problem does not arise.
If only one thread runs io_service::run(), there is no problem. If you want to execute something on the socket from another thread, you may call io_service::post() with a handler that performs the operation on the socket, so that it is executed in the same thread.
If you have several threads executing io_service::run() and you try to perform operations simultaneously, say a cancel and a read, then you should use strands. There is a tutorial for this in the Boost.Asio documentation.

I have been running extensive tests and haven't been able to break asio. Even without locking any mutex.
I would nevertheless advise that you use async_read and async_write with a mutex around each of those calls.
I believe the only drawback is that your completion handlers could be called concurrently if you have more than one thread calling io_service::run.
In my case this has not been an issue. Here is my test code:
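#include <boost/thread.hpp>
#include <boost/date_time.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <cstdlib>
#include <vector>
using namespace std;
char databuffer[256];
vector<boost::asio::const_buffer> scatter_buffer;
boost::mutex my_test_mutex;
// mycallback was not shown in the original post; an empty handler is assumed here
void mycallback() {}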
void my_test_func(boost::asio::ip::tcp::socket* socket, boost::asio::io_service *io) {
while(1) {
boost::this_thread::sleep(boost::posix_time::microsec(rand()%1000));
//my_test_mutex.lock(); // It would be safer
socket->async_send(scatter_buffer, boost::bind(&mycallback));
//my_test_mutex.unlock(); // It would be safer
}
}
int main(int argc, char **argv) {
for(int i = 0; i < 256; ++i)
databuffer[i] = i;
for(int i = 0; i < 4*90; ++i)
scatter_buffer.push_back(boost::asio::buffer(databuffer));
boost::asio::io_service my_test_ioservice;
boost::asio::ip::tcp::socket my_test_socket(my_test_ioservice);
boost::asio::ip::tcp::resolver my_test_tcp_resolver(my_test_ioservice);
boost::asio::ip::tcp::resolver::query my_test_tcp_query("192.168.1.10", "40000");
boost::asio::ip::tcp::resolver::iterator my_test_tcp_iterator = my_test_tcp_resolver.resolve(my_test_tcp_query);
boost::asio::connect(my_test_socket, my_test_tcp_iterator);
for (size_t i = 0; i < 8; ++i) {
boost::shared_ptr<boost::thread> thread(
new boost::thread(my_test_func, &my_test_socket, &my_test_ioservice));
}
while(1) {
my_test_ioservice.run_one();
boost::this_thread::sleep(boost::posix_time::microsec(rand()%1000));
}
return 0;
}
And here is my makeshift server in python:
import socket
def main():
mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysocket.bind((socket.gethostname(), 40000))
mysocket.listen(1)
while 1:
(clientsocket, address) = mysocket.accept()
print("Connection from: " + str(address))
i = 0
count = 0
while i == ord(clientsocket.recv(1)):
i += 1
i %= 256
count+=1
if count % 1000 == 0:
print(count/1000)
print("Error!")
return 0
if __name__ == '__main__':
main()
Please note that running this code can cause your computer to thrash.
