boost asio multithreaded tcp server with thread pool


I have a single-threaded asynchronous TCP server written with Boost.Asio. Each incoming request goes through several processing steps (synchronous and asynchronous), and the response is finally sent back with async_write.
With a small load of 10 concurrent requests it works decently. However, when I test with 100 concurrent requests, things get worse: response latency keeps increasing as time goes on. So I want to try multi-threaded request handling.
I am looking for a good example of, or help with, creating and running multiple threads for asynchronous reads and writes to clients. I have the following doubts:
Should I use a single io_service (IOS) object and call its run() method from every thread in the thread pool, or should I use a separate IOS per thread?
If I use a single IOS, is it possible that part of the TCP data is handled by one thread while another part is handled by another thread, and so on? Is this understanding correct?
Is there any other better way?
Thanks for any help and pointers here.

Without seeing your code I can only guess what goes wrong. Most probably you're running long actions inside async completion handlers. The completion handlers should be fast - get the data, hand it off for further processing, done.
As a first priority, I would go fully asynchronous and run all processing in a thread pool. You can find an example here, where a new thread is started for every new client; you can replace that with a thread pool.
Use a single io_service. A single io_service can handle a lot of parallelism, provided you don't delay it inside completion handlers. Running it from one thread also simplifies the implementation, because you don't have to worry about completion handlers running in parallel, which is what happens once io_service::run() is called from multiple threads (whether on one shared io_service or one per thread).

Q1: Should I use a single IOS object and call its run method in all of the threads of the thread pool, or should I use a separate IOS per thread?
You can do either; the Boost.Asio HTTP server examples show both:
HTTP Server 2 - IOS per thread
HTTP Server 3 - single IOS with thread pool
Q2: If I use a single IOS, is there a possibility that part of the tcp data goes to one thread, while another part going to another thread and so on.. Is this understanding correct?
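Yes. With a single io_service run from several threads, successive handlers for the same connection can run on different threads, which can cause race conditions, but Boost.Asio provides strands to serialize them.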
Q3: Is there any other better way?
I haven't found a better way; if you find one, please tell me or post it here. Thank you.
BTW, as @rustyx said, your program blocks on synchronous calls; switching to fully asynchronous calls will help.

Related

For a client server program, what is the best approach to receive multiple client connection requests in parallel?

The program is a client-server socket application being developed in C on Linux. There is a remote server to which each client connects and logs itself as being online. There will most likely be several clients online at any given point in time, all trying to connect to the server to log themselves as online/busy/idle, etc. So how can the server handle these concurrent requests? What's a good design approach (forking or multithreading for each connection request, maybe)?

Personally, I would use the event-driven approach for servers. There you register a callback that is called as soon as a connection arrives, and event callbacks whenever a socket is ready to read or write.
With a huge number of connections you get a big performance and resource benefit compared to threads, but I would also prefer this approach for a smaller number of connections.
I would only use threads if you really need multiple cores, or if some requests take a long time to process and handling them without threads would be too complicated.
I use libev as the base library for event-driven networking.

Generally speaking, you want a thread pool to service requests.
A typical structure will start with a single thread that does nothing but queue up incoming requests. Since it doesn't do very much, it's typically pretty easy for one thread to keep up with the maximum speed of the network.
It puts the items into some sort of concurrent queue. Then a pool of other threads reads items from the queue, does what's needed, and deposits the result in another queue (repeating until the server shuts down).
Finally, you have another single thread that just takes items from the result queue, and sends replies out to the clients.

Best approach is a combination of event driven model with multithreaded model.
You create a bunch of non-blocking sockets, but the thread count should be much lower, e.g. 10 sockets per thread.
Each thread then waits for events (incoming requests) on its own sockets in non-blocking mode and processes them as they happen.
This technique usually performs better than either non-blocking sockets or the multithreaded model on its own.

Take a look at Comer's "Internetworking with TCP/IP" volume 3 (BSD sockets version); it has detailed examples of different ways of writing servers and clients. The full code (sans explanations, unfortunately) is on the web. Or rummage around http://tldp.org, where you'll find a collection of tutorials.

select or poll or epoll
These are facilities on *nix systems that aggregate multiple event sources (connections) into a single waiting point. The server adds the connections to a data structure and then waits by calling select (or poll, epoll_wait). It is woken up when something happens on any of these connections, figures out which one, handles it, and goes back to sleep. See the man pages for details.
There are several higher-level libraries built on top of these mechanisms that make programming them somewhat easier, such as libevent and libev.

POCO raise event on TCPServer connected threads

I'm new to the Poco framework and not too good with C++, but I am learning. I have to create a client-server application on Windows.
The problem I have now is that I need to send some data to the clients every minute, but only to the clients that have an active TCP connection with the server. I don't know how to create an event, or something triggered in one thread, that makes all the active threads send data to their clients.
My first idea is that I have to rewrite or extend the TCPServerDispatcher class, and I don't know how to identify the active threads in the ThreadPool.
Do you have any ideas, suggestions, or a tutorial?
I can't figure out how to do it...
I hope somebody can give me an idea or some code example. Thank you.

Can these server<> client threads not obtain the data for themselves? It would be fairly easy to add a 60-second timeout on a read() in each thread and send the data then. Maybe this would involve too many database connections?
Failing that, can you put the latest data in a lockable object and have the threads just lock, write and unlock the latest data on a timeout? Such a solution should really have a write timeout as well to prevent a badly-behaved client causing its server thread to block while holding the lock. If it's not too large, I suppose the server<> client thread could make a copy of the data to send, but I'm not a great fan of copying, TBH.
There are more complex ways of signaling the server<>client threads that new data is available. It is quite possible to signal each thread that new data is available and have them act upon it 'immediately'. This usually means the server<>client thread waiting on more than one signal. In general, the lower the latency, the more complex the solution :(
Rgds,
Martin

boost::asio, threads and synchronization

This is somewhat related to this question, but I think I need to know a little bit more. I've been trying to get my head around how to do this for a few days (whilst working on other parts), but the time has come for me to bite the bullet and get multi-threaded. Also, I'm after a bit more information than the question linked.
Firstly, about multi-threading. As I have been testing my code, I've not bothered with any multi-threading. It's just a console application that starts a connection to a test server and everything else is then handled. The main loop is this:
while (true)
{
    Root::instance().performIO(); // calls io_service::run_one();
}
I'm guessing this solution won't be acceptable in my main application: it would have to be called from the message loop, which, while possible, causes problems when the message queue blocks waiting for a message. I could make the message loop non-blocking, but wouldn't that push CPU usage through the roof?
The solution it seems is to throw another thread at it. Okay, fine. But then I've read that io_service::run() returns when there is no work to do. What is that? Is that when there's no data, or no connections? If at least one connection exists does it stay alive? If so, that's not so much of a problem as I only have to start up a new thread when the first connection is made and I'm happy if it all stops when there is nothing going on at all. I guess I am confused by the definition of 'no work to do'.
Then I have to worry about synchronizing my boost thread with my main GUI thread. So, I guess my questions are:
What is the best-practice way of using boost::asio in a client application with regard to threads and keeping them alive?
When writing to a socket from the main thread to the IO thread, is synchronization achieved using boost::asio::post, so that the call happens later in the io_service?
When data is received, how do people get the data back to the UI thread? In the past when I used completion ports, I made a special event that could post the data back to the main UI thread using a ::SendMessage. It wasn't elegant, but it worked.
I'll be reading some more today, but it would be great to get a heads up from someone who has done this already. The Boost::asio documentation isn't great, and most of my work so far has been based on a bit of the documentation, some trial/error, some example code on the web.

1) Have a look at io_service::work. As long as a work object exists, io_service::run() will not return. So when you start cleaning up, destroy the work object, cancel any outstanding operations (for example an async_read on a socket), wait for run() to return, and then clean up your resources.
2) io_service::post will asynchronously execute the given handler from a thread running the io_service. A callback can be used to get the result of the executed operation.
3) You need some form of messaging system to inform your GUI thread of the new data. There are several possibilities here.
As for your remark about the documentation, I think Asio is one of the better-documented Boost libraries, and it comes with clear examples.

boost::asio::io_service::run() returns only when there's nothing left to do, i.e. no async operations are pending, such as an async accept/connect, an async read/write, or an async timer wait. So before calling io_service::run() you first have to start at least one async operation.
I'm not sure whether you have a console or a GUI app, but in either case multithreading looks like overkill; you can use Asio together with your message loop. If it's a Win32 GUI, you can call io_service::poll_one() (the non-blocking counterpart of run_one()) from your OnIdle() handler. For a console application, you can set up a deadline_timer that regularly checks for user input (every 200 ms?) and use it with io_service::run(). Keeping everything in a single thread greatly simplifies the solution.

1) What is the best-practice way of using boost::asio in a client application with regard to threads and keeping them alive?
As the documentation suggests, a pool of threads invoking io_service::run is the most scalable and easiest to implement.
2) When writing to a socket from the main thread to the IO thread, is synchronization achieved using boost::asio::post, so that the call happens later in the io_service?
You will need to use a strand to protect any handlers that can be invoked by multiple threads. See this answer as it may help you, as well as this example.
3) When data is received, how do people get the data back to the UI thread? In the past when I used completion ports, I made a special event that could post the data back to the main UI thread using a ::SendMessage. It wasn't elegant, but it worked.
How about providing a callback in the form of a boost::function when you post an asynchronous event to the io_service? Then the event's handler can invoke the callback and update the UI with the results.

When data is received, how do people get the data back to the UI thread? In the past when I used completion ports, I made a special event that could post the data back to the main UI thread using a ::SendMessage. It wasn't elegant, but it worked
::PostMessage may be more appropriate.
Unless everything runs in one thread these mechanisms must be used to safely post events to the UI thread.

Processing messages is too slow, resulting in a jerky, unresponsive UI - how can I use multiple threads to alleviate this?

I'm having trouble keeping my app responsive to user actions. Therefore, I'd like to split message processing between multiple threads.
Can I simply create several threads, all reading from the same message queue, and let whichever one is free process each message?
If so, how can this be accomplished?
If not, can you suggest another way of resolving this problem?

You cannot have more than one thread which interacts with the message pump or any UI elements. That way lies madness.
If there are long processing tasks which can be farmed out to worker threads, you can do it that way, but you'll have to use another thread-safe queue to manage them.

If this were a bit later, I would say use the Asynchronous Agents APIs (plug for what I'm working on) in the yet-to-be-released Visual Studio 2010. With today's tools, however, I would separate the work: in your message pump, do as little work as possible to identify the message and pass it along to another thread that will process it (hopefully no thread-local information is needed). Passing it along to another thread means inserting it into a thread-safe queue of some sort, either locked or lock-free, and then setting an event that other threads can watch to pull items from the queue (or they can just pull them directly). You can also look at using a work-stealing queue with a thread pool for efficiency.
This gets the work off the UI thread. To have the UI thread do additional work (like painting the results of that work), you need to generate a Windows message to wake up the UI thread so it checks for the results. An easy way to do this is to have another 'work ready' queue of work objects to execute on the UI thread; imagine a queue that looks like this: threadsafe_queue<function<void(void)>>. On the UI thread you check whether it is non-empty, and if there are work items, you execute them inline. You'll want the work objects to be as short-lived as possible and preferably not to block at all.
Another technique that can help if you still see jerky movement or poor responsiveness is to ensure that your thread callback doesn't execute for longer than 16 ms and that you aren't taking any locks or doing any sort of I/O on the UI thread. There is a series of tools that can help identify these operations; the most freely available is the Windows Performance Toolkit.

Create a separate thread for the long operation, i.e. keep it simple: the problem is some code you are running that takes too long, and that's the code that should get its own thread.

Network Multithreading

I'm programming an online game for two reasons: to familiarize myself with server/client requests in a real-time environment (as opposed to something like a typical web browser, which is not real-time), and to actually get my hands wet in that area, so I can go on to design one properly.
Anyway, I'm doing this in C++, and I've been using Winsock for my very basic network tests. At some point I obviously want a frame limiter, 3D, and all of that, and my main issue is that when I call send() or recv(), the program just idles there waiting for a response. That would lead to maybe 8 fps on even the best internet connection.
So the obvious solution to me is to take the networking code out of the main process and start it up in its own thread. Ideally, I would call a "send" in my main process which would pass the networking thread a pointer to the message, and then periodically (every frame) check to see if the networking thread had received the reply, or timed out, or what have you. In a perfect world, I would actually have 2 or more networking threads running simultaneously, so that I could say run a chat window and do a background download of a piece of armor and still allow the player to run around all at once.
The bulk of my problem is that this is new to me. I understand the concept of threading, but I can see some serious issues, like what happens if two threads try to read/write the same memory address at the same time. I know there are already methods in place to handle this sort of thing, so I'm looking for suggestions on the best way to implement something like this. Basically, I need thread A to be able to start a process in thread B by sending a chunk of data, poll thread B's status, and then receive the reply, also as a chunk of data, ideally without any major crashing going on. ^_^ I'll worry about what that data actually contains and how to handle dropped packets, etc. later; I just need to get this working first.
Thanks for any help/advice.
PS: Just thought about this, may make the question simpler. Is there a way to use the windows event handling system to my advantage? Like, would it be possible to have thread A initialize data somewhere, then trigger an event in thread B to have it pick up the data, and vice versa for thread B to tell thread A it was done? That would probably solve a lot of my problems, since I don't really need both threads to be able to work on the data at the same time, more of a baton pass really. I just don't know if this is possible between two different threads. (I know one thread can create its own messages for the event handler.)

The easiest thing
Simply call the Windows API QueueUserWorkItem. All you have to specify is the function the thread will execute and the input passed to it. A thread pool is created for you automatically and the jobs are executed in it; new threads are created as required.
http://msdn.microsoft.com/en-us/library/ms684957(VS.85).aspx
More Control
You can get more detailed control with another set of APIs, which again manage the thread pool for you:
http://msdn.microsoft.com/en-us/library/ms686980(VS.85).aspx
Do it yourself
If you want to control every aspect of thread creation and pool management, you have to create the threads yourself and decide how they should end, how many to create, etc. (_beginthreadex is the API you should use to create threads; if you use MFC, use the AfxBeginThread function).
Send jobs to worker threads - I/O completion ports
In this case you also have to worry about how to hand your jobs to the threads; I would recommend I/O completion ports for that. They are the most scalable notification mechanism I currently know of for this purpose, and they have the additional advantage of being implemented in the kernel, so you avoid the deadlock situations you could run into if you hand-rolled something yourself.
This article will show you how with code samples -
http://blogs.msdn.com/larryosterman/archive/2004/03/29/101329.aspx
Communicate Back - Windows Messages
You could use windows messages to communicate the status back to your parent thread since it is doing the message wait anyway. use the PostMessage function to do this. (and check for errors)
PS: You could also allocate the data that needs to be sent behind a dedicated pointer and let the worker thread delete it after sending it out. That way you also avoid passing the pointer back.

BlodBath's suggestion of non-blocking sockets is potentially the right approach.
If you're trying to avoid using a multithreaded approach, then you could investigate the use of setting up overlapped I/O on your sockets. They will not block when you do a transmit or receive, but have the added bonus of giving you the option of waiting for multiple events within your single event loop. When your transmit has finished, you will receive an event. (see this for some details)
This is not incompatible with a multithreaded approach, so there's the option of changing your mind later. ;-)
On the design of your multithreaded app: the best thing to do is to work out all of the external activities you want to be alerted to. For example, so far in your question you've listed network transmits, network receives, and user activity.
Depending on the number of concurrent connections you're going to be dealing with you'll probably find it conceptually simpler to have a thread per socket (assuming small numbers of sockets), where each thread is responsible for all of the processing for that socket.
Then you can implement some form of messaging system between your threads as RC suggested.
Arrange your system so that when a message is sent to a particular thread, an event is also signaled. Your threads can then sleep waiting for one of those events (as well as any other stimulus, such as socket events or user events).
You're quite right that you need to be careful of situations where more than one thread is trying to access the same piece of memory. Mutexes and semaphores are the things to use there.
Also be aware of the limitations that your gui has when it comes to multithreading.
Some discussion on the subject can be found in this question.
The abbreviated version is that most GUIs (Windows included) don't allow multiple threads to perform GUI operations simultaneously. To get around this, you can make use of the message pump in your application by sending custom messages to your GUI thread to get it to perform GUI operations.

I suggest looking into non-blocking sockets for the quick fix. With non-blocking sockets, send() and recv() do not block, and with the select() function you can pick up any waiting data every frame.
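A minimal POSIX sketch of the per-frame check: the socket is switched to non-blocking mode with `fcntl`, and each frame calls `select()` with a zero timeout before `recv()`, so the game loop never waits. On Winsock the equivalents are `ioctlsocket(s, FIONBIO, ...)` and `WSAEWOULDBLOCK`. `poll_frame` is a hypothetical helper, and a `socketpair` stands in for the TCP connection.

```cpp
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <string>

// Put a socket into non-blocking mode; returns false on failure.
bool set_nonblocking(int fd) {
    int flags = ::fcntl(fd, F_GETFL, 0);
    return flags != -1 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical per-frame helper: appends whatever data has arrived to inbox
// without ever blocking. Returns false if the peer closed the connection or
// a hard error occurred.
bool poll_frame(int fd, std::string& inbox) {
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    timeval zero{0, 0};  // zero timeout: check readiness, do not wait
    int ready = ::select(fd + 1, &readable, nullptr, nullptr, &zero);
    if (ready == 0) return true;           // nothing arrived this frame
    if (ready < 0) return errno == EINTR;  // interrupted: try again next frame

    char buf[512];
    for (;;) {
        ssize_t n = ::recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            inbox.append(buf, static_cast<std::size_t>(n));
            continue;                      // keep reading until drained
        }
        if (n == 0) return false;          // orderly shutdown by the peer
        if (errno == EAGAIN || errno == EWOULDBLOCK) return true;  // drained
        if (errno != EINTR) return false;  // real error
    }
}
```

Called once per frame, this costs one system call when nothing has arrived, which is why it fits a frame-limited game loop better than a blocking `recv()`.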

See it as a producer-consumer problem: when receiving, your network communication thread is the producer whereas the UI thread is the consumer. When sending, it's just the opposite. Implement a simple buffer class which gives you methods like push and pop (pop should be blocking for the network thread and non-blocking for the UI thread).
Rather than using the Windows event system, I would prefer something that is more portable, for example Boost condition variables.

I don't code games, but I've used a system similar to what pukku suggested. It lends nicely to doing things like having the buffer prioritize your messages to be processed if you have such a need.
I think of them as mailboxes per thread. You want to send a packet? Have the ProcessThread create a "thread message" with the payload to go on the wire and "send" it to the NetworkThread (i.e. push it on the NetworkThread's queue/mailbox and signal the condition variable of the NetworkThread so he'll wake up and pull it off). When the NetworkThread receives the response, package it up in a thread message and send it back to the ProcessThread in the same manner. Difference is the ProcessThread won't be blocked on a condition variable, just polling on mailbox.empty( ) when you want to check for the response.
You may want to push and pop directly, but a more convenient way for larger projects is to implement a toThreadName/fromThreadName scheme in a ThreadMsg base class, plus a PostOffice that threads register their Mailbox with. The PostOffice then has a send(ThreadMsg*) function that pushes each message to the appropriate Mailbox based on its to and from fields. Mailbox (the buffer/queue class) provides ThreadMsg* receiveMessage(), which basically pops a message off the underlying queue.
Depending on your needs, you could have ThreadMsg contain a virtual function process(..) that is overridden accordingly in derived classes, or just have an ordinary ThreadMsg class with to and from members and a getPayload() function to get the raw data back and deal with it directly in the ProcessThread.
Hope this helps.

Some topics you might be interested in:
mutex: A mutex allows you to lock access to specific resources for one thread only
semaphore: a counter that limits how many threads can use a certain resource at the same time; a thread waits on it to gain access and signals it when done. A mutex is a special case of a semaphore (a count of one).
critical section: a mutex-protected piece of code (street with only one lane) that can only be travelled by one thread at a time.
message queue: a way of distributing messages in a centralized queue
inter-process communication (IPC) - a way of threads and processes to communicate with each other through named pipes, shared memory and many other ways (it's more of a concept than a special technique)
All of these topics can easily be looked up with a search engine.
