I'm writing a networking library that uses Boost.Asio and am confused about whether I should run the io_service on a separate thread or not.
I currently have a class that wraps all asio work. It has one io_service, one socket, etc, and uses async_read and async_write methods to communicate with the remote server. This class exposes read and write methods to allow users to communicate with the remote server.
This class is then called by other classes that use its read/write methods to send data to and receive data from the remote server. In some cases there are chained calls that read/write data from the server until a final user-provided callback is invoked to pass on the final result of the computation.
I'm now trying to implement a connection pool and am wondering whether I need a thread pool: all reads and writes to the remote server use async methods, and no post-read processing involves blocking calls before the final user-provided callback. Shouldn't it be OK to have a series of connection objects running at the same time without the need for a separate thread pool?
If you only have one thread, then when you get the data and process it, you are blocking any other calls. Of course, if the only thing you do in an async_read or async_write handler is start the next async call, then the io_service thread is always waiting for new data to arrive, and it populates the relevant connection's underlying data structures. No problem with just one thread.
But you probably have some kind of processing that interacts with the read/write data, and this is the part that you can parallelize with the thread pool. So the question is: how big is the fraction of time consumed in this processing? Is it the bottleneck (latency and bandwidth) of the server?
I saw different cases here in the past. One case was a simple server working on one list of jobs to do and dispatching the data to clients. It didn't require threading, I didn't care about the latency, as the clients would come only from time to time, and no bottleneck. Then I had another case where everything needed to be processed quickly and in this instance, I used a thread pool.
So the real question is: where is the bottleneck?
Related
I am working on designing a websocket server which receives a message and saves it to an embedded database. For reading the messages I am using boost asio. To save the messages to the embedded database I see a few options in front of me:
Save the messages synchronously as soon as I receive them over the same thread.
Save the messages asynchronously on a separate thread.
I am pretty sure the second option is what I want. However, I am not sure how to pass messages from the socket thread to the IO thread. I see the following options:
Use one io service per thread and use the post function to communicate between threads. Here I have to worry about lock contention. Should I?
Use Linux domain sockets to pass messages between threads. No lock contention as far as I understand. Here I can probably use BOOST_ASIO_DISABLE_THREADS macro to get some performance boost.
Also, I believe it would help to have multiple IO threads which would receive messages in a round robin fashion to save to the embedded database.
Which architecture would be the most performant? Are there any other alternatives from the ones I mentioned?
A few things to note:
The messages are exactly 8 bytes in length.
Cannot use an external database. The database must be embedded in the running process.
I am thinking about using RocksDB as the embedded database.
I don't think you want to use a unix socket, which is always going to require a system call and pass data through the kernel. That is generally more suitable as an inter-process mechanism than an inter-thread mechanism.
Unless your database API requires that all calls be made from the same thread (which I doubt) you don't have to use a separate boost::asio::io_service for it. I would instead create an io_service::strand on your existing io_service instance and use the strand::dispatch() member function (instead of io_service::post()) for any blocking database tasks. Using a strand in this manner guarantees that at most one thread may be blocked accessing the database, leaving all the other threads in your io_service instance available to service non-database tasks.
Why might this be better than using a separate io_service instance? One advantage is that having a single instance with one set of threads is slightly simpler to code and maintain. Another minor advantage is that using strand::dispatch() will execute in the current thread if it can (i.e. if no task is already running in the strand), which may avoid a context switch.
For the ultimate optimization I would agree that using a specialized queue whose enqueue operation cannot make a system call could be fastest. But given that you have network i/o by producers and disk i/o by consumers, I don't see how the implementation of the queue is going to be your bottleneck.
After benchmarking/profiling I found the facebook folly implementation of MPMC Queue to be the fastest by at least a 50% margin. If I use the non-blocking write method, then the socket thread has almost no overhead and the IO threads remain busy. The number of system calls are also much less than other queue implementations.
The SPSC queue with a condition variable in Boost is slower. I am not sure why that is. It might have something to do with the adaptive spin that the folly queue uses.
Also, message passing (Unix domain datagram sockets in this case) turned out to be orders of magnitude slower, especially for larger messages. This might have something to do with the data being copied twice.
You probably only need one io_service -- you can create additional threads which will process events occurring within the io_service by providing boost::asio::io_service::run as the thread function. This should scale well for receiving 8-byte messages from clients over the network socket.
For storing the messages in the database, it depends on the database & interface. If it's multi-threaded, then you might as well just send each message to the DB from the thread that received it. Otherwise, I'd probably set up a boost::lockfree::queue where a single reader thread pulls items off and sends them to the database, and the io_service threads append new messages to the queue when they arrive.
Is that the most efficient approach? I dunno. It's definitely simple, and gives you a baseline that you can profile if it's not fast enough for your situation. But I would recommend against designing something more complicated at first: you don't know whether you'll need it at all, and unless you know a lot about your system, it's practically impossible to say whether a complicated approach would perform any better than the simple one.
std::atomic&lt;bool&gt; Finished(false);
std::mutex cond_mutex;
std::condition_variable cond_var;

void Consumer( boost::lockfree::queue&lt;uint64_t&gt; &message_queue ) {
    // Connect to database...
    std::unique_lock&lt;std::mutex&gt; lock(cond_mutex);
    while (!Finished) {
        // add_to_database is a functor that takes a message.
        message_queue.consume_all( add_to_database );
        // Use a timed wait to avoid missing a signal. It's OK to
        // consume_all() even if there's nothing in the queue.
        cond_var.wait_for( lock, std::chrono::milliseconds(100) );
    }
}

void Producer( boost::lockfree::queue&lt;uint64_t&gt; &message_queue ) {
    while (!Finished) {
        uint64_t m = receive_from_network( );
        message_queue.push( m );
        cond_var.notify_all( );
    }
}
Assuming that the constraint of using C++11 is not too hard in your situation, I would try using std::async to make an asynchronous call to the embedded DB.
I am trying to learn C++ (with prior programming knowledge) by creating a server application with multiple clients. The server application will run on a Raspberry Pi/Debian (Raspbian). I thought this would also be a good opportunity to learn about low-level concurrent programming with threads (e.g. POSIX). Then I came across the select() function, which basically allows the use of blocking functions in a single thread to handle multiple clients, which is interesting. Some people here on StackOverflow mentioned that threads cause a lot of overhead, and select() seems to be a nice alternative.
In reality, I will have 1-3 clients connected, but I would like to keep my application flexible. As a design, I was thinking about a main thread invoking a data thread (processing stuff non-stop) and a server thread (listening for incoming connections). Since the accept() call is blocking, the latter needs to be a separate thread. If a client connects, then I may need a separate thread for each client as well.
In the end, the worker thread will write to shared memory, and the client threads will read from there and communicate with the clients. Some people were opposed to the use of threads, but in my understanding, threads are fine if they are invoked rarely (and are long-lived) and if there are blocking function calls. For the last case, there is the select() function, which, used in a loop, allows handling of multiple sockets in a single thread.
I think at least for the data processing and server accept() call, I will need 2 separate threads initiated at the beginning. I may handle all clients with select() in a single thread or separate threads. What would be the correct approach and are there smarter alternatives?
I'm trying to implement two-way communication using boost::asio. I'm writing the server, which will communicate with multiple clients.
I want the writes and reads to and from clients to happen without any synchronization or ordering - the client can send a command to the server at any time, and it still receives some data in a loop. Of course, access to shared resources must be protected.
What is the best way to achieve this? Is having two threads - one for reading and one for writing a good option? What about accepting the connections and managing many clients?
//edit
By "no synchronization and order" I mean that the server should stream its data to the clients all the time, and that it can respond to (change its behaviour in reaction to) client requests at any time, regardless of what is currently being sent to them.
One key idea behind asio is exactly that you don't need multiple threads to deal with multiple client sessions. Your description is a bit generic, and I'm not sure I understand what you mean by 'I want the writes and reads to and from clients to happen without any synchronization and order'.
A good starting point would be the asio chat server example. Notice how in this example an instance of the class chat_session is created for each connected client. Objects of that class keep posting asynchronous reads as long as the connection is alive, and at the same time they can write data to the connected clients. In the meantime, an object of class chat_server keeps accepting new incoming client connections.
At work we're doing something conceptually very similar, and there I noticed the big impact a heavy handler has on performance. The writing side of the code/write handler does too much work and occupies a worker thread for too long, thereby jeopardizing the program flow. In particular, RST packets (closed connections) weren't detected quickly enough by the read handler, because the write actions were taking their sweet time and hogging most of the processing time in the worker thread. For now I have fixed that by creating two worker threads, so that one part of the code is not starved of processing time. Admittedly, this is far from ideal, and it is on my lengthy to-do list of optimizations.
Long story short, you can get away with using a single thread for reading and writing if your handlers are light-weight while a second thread handles the rest of your program. Once you notice weird synchronization issues it's time to either lighten your network handlers or add an extra thread to the worker pool.
Basically, what I'm trying to achieve is to implement a generic multithreaded TCP server that can handle arbitrary requests for usage by 2 different servers with slightly different needs.
My requirements are:
A request cannot begin to be processed until an entire initial request has been received. (Essentially, I have a request header of a fixed size that among other things, includes the size of the entire request).
Handling a request may result in multiple response messages to the requesting client. That is, normally requests can be handled with a single response, but at times, during long-running database transactions, I need to ping back to the client, letting them know that I'm still working, so they don't time out the connection.
To achieve this, I've been following fairly closely the HTTP server example #2 from Boost v1.44. In general, the example has worked for simple cases. What I've noticed is that when I scale up to handling multiple requests concurrently, the changes I've made have somehow resulted in all requests being handled serially, by a single thread. Obviously, I'm doing something wrong.
I cannot post the entirety of the actual code I'm using, due to employer restrictions, but suffice it to say, I've kept the async calls to accept new connections, but have replaced the async read/writes with synchronous calls. If there are specific pieces that you think you need to see, I can see what I can do.
Essentially, what I'm looking for are pointers on how to use boost::asio for a multithreaded TCP server where individual connections are handled by a single thread with synchronous I/O. Again, keep in mind, my abstraction is based upon HTTP server example #2 (one io_service per CPU), but I am flexible to alternatives.
The Boost.Asio documentation suggests using a single io_service per application, and invoking io_service::run from a pool of threads.
It's also not obvious to me why you cannot use asynchronous reads and writes combined with deadline_timer objects to periodically ping your clients. Such a design will almost certainly scale better than a thread-per-connection design using synchronous reads and writes.
Some diagnostics: can you print the value of io_service_pool_.get_io_service() before using it in the following code?
// from server.cpp
void server::handle_accept(const boost::system::error_code& e)
{
  if (!e)
  {
    new_connection_->start();
    new_connection_.reset(new connection(
        io_service_pool_.get_io_service(), request_handler_));
    acceptor_.async_accept(new_connection_->socket(),
        boost::bind(&server::handle_accept, this,
            boost::asio::placeholders::error));
  }
}
You'll need to store it in a temporary before passing it to new_connection_.reset(); that is, don't call get_io_service() twice for this test.
We first must make sure you're getting a new io_service.
If you are doing lots of synchronous I/O, your concurrency is limited to the number of threads you have. I would suggest having one io_service for all your asynchronous I/O (ie: all the comms, timers) as you have now, and then decide on how to deal with the synchronous I/O.
For the synchronous I/O you need to decide what your peak concurrency will be. Because it is synchronous I/O, you will want more threads than CPUs, and the decision will be based on how much I/O concurrency you want. Use a separate io_service, and then use io_service::dispatch() to distribute work to the threads doing the synchronous workload.
Doing it this way avoids the problem of a blocking I/O call stopping processing on other asynchronous events.
I want to have several connections in several different threads. I'm basically writing a base class that uses boost/asio.hpp and the TCP functionality there.
now i was reading this: http://www.boost.org/doc/libs/1_44_0/doc/html/boost_asio/tutorial/tutdaytime1.html
it says that "All programs that use asio need to have at least one io_service object."
So should my base class have a static io_service (which means there will be only one for the whole program, and all the different threads and connections will use the same io_service object), or should each connection have its own io_service?
Thanks in advance!
update:
OK, so basically what I wish to do is a class for a basic client that will have a socket in it.
For each socket I'm going to have a thread that always receives, and a different thread that sometimes sends packets.
After looking here: www.boost.org/doc/libs/1_44_0/doc/html/boost_asio/reference/ip__tcp/socket.html (can't make a hyperlink since I'm new here; only one hyperlink per post), I can see that the socket class isn't entirely thread-safe.
So a few questions:
1. Based on the design I just described, do I need one io_service for all the sockets (meaning make it a static class member), or should I have one for each?
2. How can I make this thread-safe? Should I put the socket inside a "thread-safe environment", meaning a new socket class with mutexes and such that doesn't let you send and receive at the same time, or do you have other suggestions?
3. Maybe I should go with an async design? (Of course each socket would still have its own thread, but the sending and receiving would happen on the same thread?)
Just to clarify: I'm writing a TCP client that connects to a lot of servers.
You need to decide first which style of socket communication you are going to use:
synchronous - means that all low-level operations are blocking; typically you need a thread for the accept, and then threads (or an io_service) to handle each client.
asynchronous - means that all low-level operations are non-blocking, and here you only need a single thread (io_service), and you need to be able to handle callbacks when certain things happen (i.e. accepts, partial writes, result of reads etc.)
The advantage of approach 1 is that it's a lot simpler to code than 2; however, I find that 2 is the most flexible. In fact, with 2 you have a single-threaded application by default (internally the event callbacks are invoked in a separate thread from the main dispatching thread). The downside of 2 is that your processing delays the next read/write operations. Of course, you can make multi-threaded applications with approach 2, but not vice versa (i.e. single-threaded with 1) - hence the flexibility.
So, fundamentally, it all depends on the selection of style...
EDIT: updated for the new information, this is quite long, I can't be bothered to write the code, there is plenty in the boost docs, I'll simply describe what is happening for your benefit...
[main thread]
- declare an instance of io_service
- for each of the servers you are connecting to (I'm assuming this information is available at start-up), create a class (say ServerConnection); in this class, create a tcp::socket using the same io_service instance from above, and in the constructor itself call async_connect. NOTE: this call schedules a request to connect rather than performing the actual connection operation (that doesn't happen till later)
- once all the ServerConnection objects are created (and their respective async_connects queued up), call run() on the io_service instance. Now the main thread is blocked, dispatching events in the io_service queue.
[asio thread] io_service by default has a thread in which scheduled events are invoked. You don't control this thread, and to implement a "multi-threaded" program you can increase the number of threads that the io_service uses, but for the moment stick with one; it will make your life simpler...
asio will invoke methods in your ServerConnection class depending on which events are ready from the scheduled list. The first event you queued up (before calling run()) was async_connect; asio will now call you back when a connection is established to a server. Typically, you will implement a handle_connect method which will get called (you pass this method in to the async_connect call). In handle_connect, all you have to do is schedule the next request - in this case, you want to read some data (potentially from this socket), so you call async_read_some and pass in a function to be notified when there is data. Once done, the main asio dispatch thread will continue dispatching other events which are ready (these could be the other connect requests or even the async_read_some requests that you added).
Let's say you get called because there is some data on one of the server sockets; this is passed to you via your handler for async_read_some. You can then process this data as you need to, but - and this is the most important bit - once done, schedule the next async_read_some; this way asio will deliver more data as it becomes available. VERY IMPORTANT NOTE: if you no longer schedule any requests (i.e. you exit from the handler without queueing one), then the io_service will run out of events to dispatch, and run() (which you called in the main thread) will return.
Now, as for writing, this is slightly trickier. If all your writes are done as part of handling data from a read call (i.e. in the asio thread), then you don't need to worry about locking (unless your io_service has multiple threads). Otherwise, in your write method, append the data to a buffer and schedule an async_write_some request (with a write handler that will get called when the buffer is written, either partially or completely). When asio handles this request, it will invoke your handler once the data is written, and you then have the option of calling async_write_some again if there is more data left in the buffer; if there is none, you don't have to schedule another write. At this point I will mention one technique to consider: double buffering - I'll leave it at that.
If you have a completely different thread that is outside of the io_service and you want to write, you must call the io_service::post method and pass in a method to execute (in your ServerConnection class) along with the data. The io_service will then invoke this method when it can, and within that method you can buffer the data and optionally call async_write_some if a write is not currently in progress.
Now there is one VERY important thing that you must be careful about: you must NEVER schedule an async_read_some or async_write_some if one is already in progress. That is, say you called async_read_some on a socket - until this event is invoked by asio, you must not schedule another async_read_some, else you'll have lots of crap in your buffers!
A good starting point is the asio chat server/client that you find in the boost docs, it shows how the async_xxx methods are used. And keep this in mind, all async_xxx calls return immediately (within some tens of microseconds), so there are no blocking operations, it all happens asynchronously. http://www.boost.org/doc/libs/1_39_0/doc/html/boost_asio/example/chat/chat_client.cpp, is the example I was referring to.
Now, if you find that the performance of this mechanism is too slow and you want threading, all you need to do is increase the number of threads available to the main io_service and implement the appropriate locking in your read/write methods in ServerConnection, and you're done.
For asynchronous operations, you should use a single io_service object for the entire program. Whether it's a static member of a class or instantiated elsewhere is up to you. Multiple threads can invoke its run method; this is described in Inverse's answer.
Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked. This approach may also be used with io_service::post() as a means to perform any computational tasks across a thread pool.
Note that all threads that have joined an io_service's pool are considered equivalent, and the io_service may distribute work across them in an arbitrary fashion.
If you have handlers that are not thread-safe, read about strands.
A strand is defined as a strictly sequential invocation of event handlers (i.e. no concurrent invocation). Use of strands allows execution of code in a multithreaded program without the need for explicit locking (e.g. using mutexes).
The io_service is what invokes all the handler functions for your connections. So you should have run() invoked from each of your threads in order to distribute the work across them. Here is a page explaining io_service and threads:
Threads and Boost.Asio