I've begun using Boost.Asio for some simple network programming. My understanding of the library is not great, so please bear with me and my newbie question.
At the moment my project has only one io_service object, which I use for all the async I/O operations.
My understanding is that one can create multiple threads and pass the run method of an io_service instance to the thread to provide more threads to the io_service.
My question: is it good design to have multiple io_service objects? Say, for example, I have 2 distinct io_service instances, each with 2 threads associated: do they somehow know about each other (and hence cooperate with each other), or, if not, would they negatively affect each other?
My intention is to have 1 io_service for socket based I/O and another for serial based (tty) I/O.
We use multiple io_service's because some of the components in our application need to run all their worker threads at certain fixed priorities, different for each component. Thus each component is given its own io_service, and each component has its own pool of threads executing run().
Other designs I can think of would be needing a different number of threads in the pool for each kind of I/O, or, more relevant to your case, keeping the pools separate so that, for example, your network I/O cannot take out every thread and leave your serial I/O waiting.
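For illustration, here is a minimal sketch of that kind of split (the names net_io/serial_io and the thread counts are placeholders of mine, nothing prescribed by Asio):

#include <boost/asio.hpp>
#include <thread>
#include <vector>

int main()
{
    boost::asio::io_service net_io;     // all socket I/O lives here
    boost::asio::io_service serial_io;  // all tty I/O lives here

    // work objects keep run() from returning while the queues are empty
    // (a real program would eventually destroy them or call stop())
    boost::asio::io_service::work net_work(net_io);
    boost::asio::io_service::work serial_work(serial_io);

    std::vector<std::thread> pool;
    for (int i = 0; i < 2; ++i)
        pool.emplace_back([&] { net_io.run(); });
    for (int i = 0; i < 2; ++i)
        pool.emplace_back([&] { serial_io.run(); });

    // ... open sockets on net_io and serial ports on serial_io ...

    for (auto& t : pool)
        t.join();
}

The two io_services are completely independent: neither knows about the other, and each pool only ever executes handlers queued on its own io_service, so they cannot starve each other.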
IIRC, during Michael Caisse's BoostCon Asio talk (which is worth watching anyway), this question was explicitly asked by an audience member and OK'd as a potential solution. My take from that is that it's not wrong per se, and can be used that way if it suits your design.
This discussion may be enlightening:
http://thread.gmane.org/gmane.comp.lib.boost.asio.user/1300
I don't have the code right here, but why would you use multiple io_services? I thought it used one io_service and multiple threads executing run on that one io_service.
IIUC, each io_service owns a select/epoll/whatever queue, so having multiple io_services is akin to having multiple independent select/epoll loops. In some situations, e.g. large numbers of sockets and multiple CPUs, this might help.

Something I'm less sure about is multiple threads all running io_service::run (with the same io_service). I think this just means the handlers run concurrently, while the select/epoll/etc. loop is 'shared'. I think this is best when your handlers are relatively long-running operations.
Related
I'm reading some of the answers regarding Asio, and a pattern that stands out, both in the examples and here on SO, is to use a single io_service and share it between workers that handle opening, sending and receiving messages over sockets.
Are there any benefits in sharing an io_service between multiple socket abstractions? Why not let each have their own io_service?
As far as I understand it, the io_service "owns" the resource. If you have one io_service handling all asio functions, then you can manage priorities. If you have multiple io_service instances, all "owning" the same resource, then they will clash.
I tried that pattern and would not recommend it anymore, except for some very specific scenarios. Instead I recommend the "use a socket always from only a single io_service" approach, and using multiple io_services (each running in a dedicated thread) if the need arises.
The reason for this is that if you use one io_service from multiple threads all your callbacks (completion handlers) can be invoked from any of the participating threads, and you have to provide additional synchronization for them. In the "resource belongs to one io_service which is executing on one thread" model you don't need this, since no concurrent handlers will be executed from another thread.
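As a minimal sketch of that model (the read loop is just an illustration; all names are mine): the socket is created on one io_service, and only one thread calls run() on it, so its completion handlers are never invoked concurrently:

#include <boost/asio.hpp>
#include <thread>

boost::asio::io_service io;              // this socket's one io_service
boost::asio::ip::tcp::socket sock(io);
char buf[512];

void start_read()
{
    sock.async_read_some(boost::asio::buffer(buf),
        [](const boost::system::error_code& ec, std::size_t /*n*/)
        {
            if (!ec)
            {
                // Only the single thread running io.run() ever gets here,
                // so per-connection state needs no extra locking.
                start_read();
            }
        });
}

int main()
{
    // ... connect sock to a peer ...
    start_read();
    std::thread t([] { io.run(); });
    t.join();
}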
I'm running a fully operational IOCP TCP socket application. Today I was thinking about the critical section design and now I have one endless question in my head: global or per-client critical section? I came to this because, as I see it, there is no point in using multiple worker threads if every thread depends on a single lock, right? I mean... right now I don't see any performance issue with 100 simultaneous clients, but what if there were 10,000?
My shared resource is a per-client pre-allocated struct, so each client has its own I/O context, socket and other data. There is no inter-client resource sharing, which I think is another point in favor of the per-client CS. I use one accept thread and 8 (processors * 2) worker threads. This application is basically designed for small (< 1KB) packets, but sometimes for file streaming.
The "correct" answer probably depends on your design, the number of concurrent clients and the performance that you require from the hardware that you have available.
In general, I find it best to go with the simplest thing that works and then profile to locate hot spots.
However... You say that you have no inter-client shared resources so I assume the only synchronisation that you need to do is around 'per-connection' state.
Since it's per connection the obvious (to me) design would be for the per-connection state to contain its own critical section. What do you perceive to be the downside of this approach?
The problem with a single shared lock is that you introduce contention between connections (and threads) that have no reason to block each other. This will adversely affect performance and will likely become a hot-spot as connection numbers rise.
Once you have a per connection lock you might want to look at avoiding using it as often as possible by having the IOCP threads simply lock to place completions in a per connection queue for processing. This has the advantage of allowing a single IOCP thread to work on each connection and preventing a single connection from having additional IOCP threads blocking on it. It also works well with 'skip completion port on success' processing.
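To make the shape of that concrete, here is a sketch of the locking scheme in portable C++ (this is not real IOCP code, and all names are mine):

#include <deque>
#include <mutex>
#include <vector>

// Per-connection state carries its own lock, so two connections
// never contend with each other.
struct Connection
{
    std::mutex lock;                        // per-client critical section
    std::deque<std::vector<char>> pending;  // completions queued for processing
    bool processing = false;                // true while one thread owns the queue
};

// Called by an I/O thread when a completion arrives for this connection.
// The lock is held only long enough to enqueue; if another thread is
// already draining the queue, we hand the item over and return at once.
void on_completion(Connection& c, std::vector<char> data)
{
    {
        std::lock_guard<std::mutex> g(c.lock);
        c.pending.push_back(std::move(data));
        if (c.processing)
            return;           // the draining thread will pick this up
        c.processing = true;  // we become the connection's single processor
    }
    for (;;)
    {
        std::vector<char> item;
        {
            std::lock_guard<std::mutex> g(c.lock);
            if (c.pending.empty())
            {
                c.processing = false;
                return;
            }
            item = std::move(c.pending.front());
            c.pending.pop_front();
        }
        // ... process 'item' outside the lock ...
    }
}

This guarantees at most one thread is working on a given connection at a time, while the other I/O threads are never blocked for longer than a queue push.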
I am implementing a custom server that needs to maintain a very large number (100K or more) of long-lived connections. The server simply passes messages between sockets and doesn't do any serious data processing. Messages are small, but many of them are received/sent every second. Reducing latency is one of the goals. I realize that using multiple cores won't improve performance, and I therefore decided to run the server in a single thread by calling the run_one or poll methods of the io_service object. In any case, a multi-threaded server would be much harder to implement.
What are the possible bottlenecks? Syscalls, bandwidth, the completion queue / event demultiplexing? I suspect that dispatching handlers may require locking (done internally by the asio library). Is it possible to disable event queue locking (or any other locking) in boost.asio?
EDIT: related question. Does syscall performance improve with multiple threads? My feeling is that, because syscalls are atomic/synchronized by the kernel, adding more threads won't improve speed.
You might want to read my question from a few years ago, I asked it when first investigating the scalability of Boost.Asio while developing the system software for the Blue Gene/Q supercomputer.
Scaling to 100k or more connections should not be a problem, though you will need to be aware of the obvious resource limitations such as the maximum number of open file descriptors. If you haven't read the seminal C10K paper, I suggest reading it.
After you have implemented your application using a single thread and a single io_service, I suggest investigating a pool of threads invoking io_service::run(), and only then investigate pinning an io_service to a specific thread and/or cpu. There are multiple examples included in the Asio documentation for all three of these designs, and several questions on SO with more information. Be aware that as you introduce multiple threads invoking io_service::run() you may need to implement strands to ensure the handlers have exclusive access to shared data structures.
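As a minimal sketch of that last point (the counter is just a stand-in for whatever shared data your handlers touch):

#include <boost/asio.hpp>
#include <thread>
#include <vector>

int main()
{
    boost::asio::io_service io;
    boost::asio::io_service::strand strand(io); // serializes handlers posted through it
    int counter = 0;                            // shared state, touched only via the strand

    for (int i = 0; i < 1000; ++i)
        strand.post([&counter] { ++counter; }); // no mutex needed

    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&io] { io.run(); });
    for (auto& t : pool)
        t.join();
    // counter == 1000: the strand never ran two increments concurrently
}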
Using boost::asio you can write a single-threaded or multi-threaded server at approximately the same development cost. You can write the single-threaded version first, then convert it to multithreaded if needed.
Typically, the only bottleneck in boost::asio is that the epoll/kqueue reactor runs inside a mutex, so only one thread is doing epoll at a time. This can decrease performance when you have a multithreaded server serving lots and lots of very small packets. But IMO it should still be faster than a plain single-threaded server.
Now about your task. If you just want to pass messages between connections, I think it must be a multithreaded server. The problem is the syscalls (recv/send etc.). An instruction is a very cheap thing for the CPU to do, but any syscall is not a very "light" operation (everything is relative, but relative to the other jobs in your task). So, with a single thread you will get big syscall overhead, which is why I recommend the multithreaded scheme.
Also, you can separate io_services and use the "io_service per thread" idiom. I think this should give the best performance, but it has a drawback: if one of the io_services gets too big a queue, the other threads will not help it, so some connections may slow down. On the other side, with a single io_service, queue overrun can lead to big locking overhead. All you can do is implement both variants and measure bandwidth/latency; it should not be too difficult to implement both.
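A sketch of the "io_service per thread" variant (the port number, thread count and round-robin policy are illustrative choices of mine):

#include <boost/asio.hpp>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

using boost::asio::ip::tcp;

int main()
{
    const std::size_t n = 4; // one io_service + one thread per core, say

    std::vector<std::unique_ptr<boost::asio::io_service>> services;
    std::vector<std::unique_ptr<boost::asio::io_service::work>> work;
    for (std::size_t i = 0; i < n; ++i)
    {
        services.emplace_back(new boost::asio::io_service);
        work.emplace_back(new boost::asio::io_service::work(*services[i]));
    }

    boost::asio::io_service accept_io;
    tcp::acceptor acceptor(accept_io, tcp::endpoint(tcp::v4(), 5555));

    std::size_t next = 0;
    std::function<void()> do_accept = [&]
    {
        // Each new socket is created on the next io_service in turn,
        // so all of its handlers run on that io_service's single thread.
        auto sock = std::make_shared<tcp::socket>(*services[next]);
        next = (next + 1) % n;
        acceptor.async_accept(*sock,
            [&, sock](const boost::system::error_code& ec)
            {
                if (!ec) { /* start the read/write chain on *sock */ }
                do_accept();
            });
    };
    do_accept();

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i)
        threads.emplace_back([&, i] { services[i]->run(); });

    accept_io.run(); // accept loop on this thread
    for (auto& t : threads)
        t.join();
}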
Regarding the answer on: How game servers with Boost:Asio work asynchronously?
What if I have a server which does calculations and at the same time sends/receive packets from clients?
I mean, if I were coding an http-server, the example in the answer would suffice, since all the data sent is a function of the data received.
Assume my program calculates values and needs to update clients according to their needs (some may want an update frequency of 1 Hz, while another wants 10 Hz, etc.).
This kind of structure would be very helpful to me:
while (1) {
    pollNetworking(); // <- my function
    value1 += 5;
    value2 = random();
}
In my pollNetworking function I was thinking of calling something like acceptor.accept(*socket, 10); where 10 is the timeout in milliseconds, but since there is no timeout parameter, I don't know how to structure this.
Scalability is not the biggest issue. Can I spawn a thread per socket, an extra thread for accepting and another one for calculations? Will this be easy to implement? Because I want this to be as stable as possible; then comes speed, then comes scalability. And when it comes to multi-threading, I don't yet trust myself to code and debug it cleanly.
Edit: I learned that I can use io_service::poll, which dispatches only the ready handlers without blocking. So it is a synchronous function with a 0 timeout, exactly what I needed.
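So the loop can become something like this (a minimal sketch; the setup of the async operations is omitted):

#include <boost/asio.hpp>
#include <cstdlib>

int main()
{
    boost::asio::io_service io;
    // ... start async_accept / async_read chains on io here ...

    int value1 = 0, value2 = 0;
    while (true)
    {
        io.poll();            // run whatever handlers are ready, then return
        value1 += 5;          // calculations proceed every iteration,
        value2 = std::rand(); // regardless of whether any I/O completed
    }
}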
The server can do calculations at the same time as data is being sent to and received from the client. However, the buffers and socket will likely need to be protected from concurrent access.
For most Boost.Asio operations, portable timeout functionality is only possible on asynchronous actions. This requires issuing an async operation on an entity, setting a timer, then waiting. For an example of canceling async_read with a timeout, see this question.
The simplest, and least scalable, approach is to designate a thread per responsibility (a thread per socket, one for accepting, and one for calculations). Synchronization will likely need to occur, such as protecting calculation results. For example, if value1 and value2 are only meaningful in the same iteration, then the socket threads need to guarantee that the values are written together, without the calculation thread changing them mid-write. Various synchronization constructs, such as those provided by Boost.Thread, can be used to accomplish this. Also, it may be easier to implement and debug by minimizing the amount of asynchronous calls being used.
For a more scalable approach, most of the program will be written as a series of handlers invoked from asynchronous operations. This allows the program to take advantage of threads and thread pools much more easily. However, it can scatter program logic across numerous functions and can quickly become difficult to follow. Often, programs written with asynchronous actions in mind perform synchronization with boost::asio::strand and manage object lifetimes through boost::shared_ptr.
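For example, a common shape for such a handler-driven connection looks roughly like this (a sketch only; the class and member names are made up):

#include <boost/asio.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/shared_ptr.hpp>

using boost::asio::ip::tcp;

// The connection keeps itself alive by binding shared_from_this() into
// each handler; it is destroyed once no outstanding operation refers to it.
class connection : public boost::enable_shared_from_this<connection>
{
public:
    explicit connection(boost::asio::io_service& io)
        : socket_(io), strand_(io) {}

    tcp::socket& socket() { return socket_; }

    void start()
    {
        boost::shared_ptr<connection> self = shared_from_this();
        socket_.async_read_some(boost::asio::buffer(buf_),
            strand_.wrap( // the strand serializes this connection's handlers
                [self](const boost::system::error_code& ec, std::size_t /*n*/)
                {
                    if (!ec)
                        self->start(); // keep reading
                }));
    }

private:
    tcp::socket socket_;
    boost::asio::io_service::strand strand_;
    char buf_[1024];
};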
The ease of implementation will depend on experience. Keep in mind that network programming, concurrency, and asynchronous operations are innately difficult. There is rarely a solution that is both simple and complete.
You can still have asynchronous accept and receive, but send to the clients synchronously whenever you need to send to them.
If you can use separate threads for each connected client (I'm guessing you won't be expecting hundreds or thousands of connections) then you can use one thread per connected client for both calculations and sending, while keeping the receiving asynchronous.
Seems like all the examples always show running the same io_service in all threads.
Can you start multiple io_services? Here is what I would like to do:
Start io_service A in the main thread for handling user input...
Start another io_service B in another thread that then can start a bunch of worker threads all sharing io_service B.
Users on io_service A can "post" work on io_service B so that it gets done on the worker pool but no work is to be done on io_service A, i.e. the main thread.
Is this possible? Does this make sense?
Thanks
In my experience, it really depends on the application if an io_service per cpu or one per process is better performing. There was a discussion on the asio-users mailing list a few years ago on this very topic.
The Boost.Asio documentation has some great examples showing these two techniques in the HTTP Server 2 and HTTP Server 3 examples. But keep in mind the second HTTP server just shows how to use this technique, not when or why to use it. Those questions will need to be answered by profiling your application.
In general, you should use the following order when creating applications using Boost.Asio:

1. Single threaded
2. Thread pool with a single io_service
3. Multiple io_service objects with some sort of CPU affinity
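And to the original question: yes, io_service A can post work to io_service B exactly as you describe. A minimal sketch (the thread count and the shutdown choreography are choices of mine):

#include <boost/asio.hpp>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

int main()
{
    boost::asio::io_service io_a; // run by the main thread (user input)
    boost::asio::io_service io_b; // run by the worker pool

    // keep io_b alive until we decide the pool may finish
    std::unique_ptr<boost::asio::io_service::work>
        work_b(new boost::asio::io_service::work(io_b));

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&] { io_b.run(); });

    // A handler running on io_service A hands the heavy work to B,
    // so nothing but dispatching ever runs on the main thread.
    io_a.post([&]
    {
        io_b.post([] { std::cout << "heavy work on a worker thread\n"; });
        work_b.reset(); // let the pool drain its queue and exit
    });

    io_a.run(); // returns once A's queue is empty
    for (auto& t : workers)
        t.join();
}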
Good question!
Yes, it is possible. In the application I'm currently working on, I have broken the application up into separate components, each responsible for a different aspect of the system. Each component runs in its own thread, has its own set of timers, and does its own network I/O using asio. From a testability/design perspective it seems cleaner to me, since no component can interfere with another, but I stand to be corrected. I suppose I could rewrite everything passing the io_service in as a parameter, but so far I haven't found the need to do so.
So coming back to your question, you can do whatever you want, IMO it's more a case of try it out and change it if you run into any issues.
Also, you might want to take a look at what Sam Miller pointed out in a different post WRT handling user input (that is, if you're using a console): https://stackoverflow.com/questions/5210796/boost-asio-how-to-write-console-server