Boost's documentation says: it is important to give the io_service some work to do before calling boost::asio::io_service::run(). But what happen if I give some work to do and my io_service object run method is running onto multiple threads? Should I give 1 work per thread, to prevent others to finish? Or I may start io's run on many threads and give only 1 work to do. I wish to mention, the word 'work' in my question DOES NOT refer to io_service::work::work.
The io_service's work state is not determined by the amount of threads processing the io_service. For example, if an io_service has work, all threads processing the io_service via io_service::run() will remain blocked processing the event loop, even if the amount of threads is greater than the amount of posted work. Therefore, it is safe to add a single work operation to an io_service, then have many threads process the io_service.
Overall, unless concurrency is specifically hinted in the io_service constructor, the io_service does not make a distinction between its event loop being processed by a single thread or multiple threads. As noted in the threads overview, an io_service will treat all threads that have joined its pool as being equivalent, distributing work across threads in an arbitrary fashion.
Related
I have hard time finding out how exactly does thread pool built with boost::asio::io_service behave.
The documentation says:
Multiple threads may call the run() function to set up a pool of
threads from which the io_service may execute handlers. All threads
that are waiting in the pool are equivalent and the io_service may
choose any one of them to invoke a handler.
I would imagine, that when threads executing run() are taking a handler to execute, they execute it, and then come back to wait for next handlers to execute. When executing a handler, a thread is not considered waiting, and hence no new handlers to execute are assigned to it. Is that correct? Or does io_service assign work to threads, without considering whether these are busy or not?
I am asking, because in one project that we are using (OSRM), that uses boost::asio::io_service based thread pool to handle incoming HTTP requests, I noticed that long running request, sometimes block other, fast requests, even though more threads and cores are available.
When executing a handler, a thread is not considered waiting, and hence no new handlers to execute are assigned to it. Is that correct?
Yes. It's a pull model queue.
A notable "apparent" exception is when strands are used: handlers wrapped on a on a strand do synchronize with other handlers running on that same strand.
I'm writing a thread pool class in C++ which receives tasks to be executed in parallel. I want all cores to be busy, if possible, but sometimes some threads are idle because they are blocked for a time for synchronization purposes. When this happens I would like to start a new thread, so that there are always approximately as many threads awake as there are cpu cores. For this purpose I need a way to find out whether a certain thread is awake or sleeping (blocked). How can I find this out?
I'd prefer to use the C++11 standard library or boost for portability purposes. But if necessary I would also use WinAPI. I'm using Visual Studio 2012 on Windows 7. But really, I'd like to have a portable way of doing this.
Preferably this thread-pool should be able to master cases like
MyThreadPool pool;
for ( int i = 0; i < 100; ++i )
pool.addTask( &block_until_this_function_has_been_called_a_hundred_times );
pool.join(); // waits until all tasks have been dispatched.
where the function block_until_this_function_has_been_called_a_hundred_times() blocks until 100 threads have called it. At this time all threads should continue running. One requirement for the thread-pool is that it should not deadlock because of a too low number of threads in the pool.
Add a facility to your thread pool for a thread to say "I'm blocked" and then "I'm no longer blocked". Before every significant blocking action (see below for what I mean by that) signal "I'm blocked", and then "I'm no longer blocked" afterwards.
What constitutes a "significant blocking action"? Certainly not a simple mutex lock: mutexes should only be held for a short period of time, so blocking on a mutex is not a big deal. I mean things like:
Waiting for I/O to complete
Waiting for another pool task to complete
Waiting for data on a shared queue
and other similar events.
Use Boost Asio. It has its own thread pool management and scheduling framework. The basic idea is to push tasks to the io_service object using the post() method, and call run() from as many threads as many CPU cores you have. You should create a work object while the calculation is running to avoid the threads from exiting if they don't have enough jobs.
The important thing about Asio is never to use any blocking calls. For I/O calls, use the asynchronous calls of Asio's own I/O objects. For synchronization, use strand objects instead of mutexes. If you post functions to the io service that is wrapped in a strand, then it ensures that at any time at most one task runs that belongs to a certain strand. If there is a conflict, the task remains in Asio's event queue instead of blocking a working thread.
There is one drawback of using asynchronous programming though. It is much harder to read a code that is scattered into several asynchronous calls than one with a clear control flow. You should be aware of this when designing your program.
I need a threadpool for my application, and I'd like to rely on standard (C++11 or boost) stuff as much as possible. I realize there is an unofficial(!) boost thread pool class, which basically solves what I need, however I'd rather avoid it because it is not in the boost library itself -- why is it still not in the core library after so many years?
In some posts on this page and elsewhere, people suggested using boost::asio to achieve a threadpool like behavior. At first sight, that looked like what I wanted to do, however I found out that all implementations I have seen have no means to join on the currently active tasks, which makes it useless for my application. To perform a join, they send stop signal to all the threads and subsequently join them. However, that completely nullifies the advantage of threadpools in my use case, because that makes new tasks require the creation of a new thread.
What I want to do is:
ThreadPool pool(4);
for (...)
{
for (int i=0;i<something;i++)
pool.pushTask(...);
pool.join();
// do something with the results
}
Can anyone suggest a solution (except for using the existing unofficial thread pool on sourceforge)? Is there anything in C++11 or core boost that can help me here?
At first sight, that looked like what I wanted to do, however I found out that all implementations I have seen have no means to join on the currently active tasks, which makes it useless for my application. To perform a join, they send stop signal to all the threads and subsequently join them. However, that completely nullifies the advantage of threadpools in my use case, because that makes new tasks require the creation of a new thread.
I think you might have misunderstood the asio example:
IIRC (and it's been a while) each thread running in the thread pool has called io_service::run which means that effectively each thread has an event loop and a scheduler. To then get asio to complete tasks you post tasks to the io_service using the io_service::post method and asio's scheduling mechanism takes care of the rest. As long as you don't call io_service::stop, the thread pool will continue running using as many threads as you started running (assuming that each thread has work to do or has been assigned a io_service::work object).
So you don't need to create new threads for new tasks, that would go against the concept of a threadpool.
Have each task class derive from a Task that has an 'OnCompletion(task)' method/event. The threadpool threads can then call that after calling the main run() method of the task.
Waiting for a single task to complete is then easy. The OnCompletion() can perform whatever is required to signal the originating thread, signaling a condvar, queueing the task to a producer-consumer queue, calling SendMessage/PostMessage API's, Invoke/BeginInvoke, whatever.
If an oringinating thread needs to wait for several tasks to all complete, you could extend the above and issue a single 'Wait task' to the pool. The wait task has its own OnCompletion to communicate the completion of other tasks and has a thread-safe 'task counter', (atomic ops or lock), set to the number of 'main' tasks to be issued. The wait task is issued to the pool first and the thread that runs it waits on a private 'allDone' condvar in the wait task. The 'main' tasks are then issued to the pool with their OnCompletion set to call a method of the wait task that decrements the task counter towards zero. When the task counter reaches zero, the thread that achieves this signals the allDone condvar. The wait task OnCompletion then runs and so signals the completion of all the main tasks.
Such a mechansism does not require the continual create/terminate/join/delete of threadpool threads, places no restriction on how the originating task needs to be signaled and you can issue as many such task-groups as you wish. You should note, however, that each wait task blocks one threadpool thread, so make sure you create a few extra threads in the pool, (not usually any problem).
This seems like a job for boost::futures. The example in the docs seems to demonstrate exactly what you're looking to do.
Joining a thread mean stop for it until it stop, and if it stop and you want to assign a new task to it, you must create a new thread. So in your case you should wait for a condition (for example boost::condition_variable) to indicate end of tasks. So using this technique it is very easy to implement it using boost::asio and boost::condition_variable. Each thread call boost::asio::io_service::run and tasks will be scheduled and executed on different threads and at the end, each task will set a boost::condition_variable or event decrement a std::atomic to indicate end of the job! that's really easy, isn't it?
Since the strand will not be executed concurrently, what is the difference in performance between strand and single thread? Moreover, a lock is not necessary to protect the share data in the handler of post function, right?
suppose an application performance several jobs, below is some sample code.
strand.post(boost::bind(&onJob, this, job1));
void onJob(tJobType oType)
{
if (oType == job1)
// do something
else if(oType == job2)
// do something
}
Edit: I try to measure the latency from post and calling onJob is quite high. I would like to know if there is any way to reduce it
A strand will typically perform better than a single thread. This is because a strand gives the scheduler and the program logic more flexibility. However, the differences are typically not significant (except in the special case I discuss below).
For example, consider the case where something happens that requires service. With a strand, there can be more than one thread that could perform the service, and whichever of those threads gets scheduled first will do the job. With a thread, that very thread must get scheduled for the job to start.
Suppose, for example, a timer fires that creates some new work to be done by the strand. If the timer thread then calls into the strand's dispatch routine, the timer thread can do the work with no context switch. If you had a dedicated thread rather than a strand, then the timer thread could not do the work, and a context switch would be needed before the work created by the timer routine could even begin.
Note that if you just have one thread that executes the strand, you don't get these benefits. (But, IMO, that's a dumb way to do things if you care about performance at this fine a level.)
For some applications, carefully breaking your program into strands can significantly reduce the amount of lock operations required. Objects that are only accessed in a single strand need not be locked. But you can still get a lot of the advantages of multi-threading. (One big disadvantage though -- if any of your code ever blocks, it will stall the entire strand. So you either have to not mind if a strand stalls or make sure none of your code for a critical strand ever blocks.)
In this case, you can have three strands, A, B, and C, and a single thread can do some work for strand A, some for strand B, and some for strand C with no context switches (and with the data hot in the cache). Using a thread for each task would require two context switches to do the same job, and each task would likely not find the data in cache. If you constantly "hand things" from strand to strand, strands can significantly outperform dedicated threads.
As to your second question, a lock is not needed unless data is being accessed in one thread while it could possibly be being modified in another thread. If all accesses to an object are through a single strand, locks are not needed because a strand can only execute in one thread at a time. Typically, strands will access some data that is only accessed by that strand and some that is shared with other threads or strands.
I've begun using Boost.ASIO for some simple network programming, my understanding of the library is not a great deal, so please bear with me and my newbie question.
At the moment in my project I only have 1 io_service object. Which use for all the async I/O operations etc.
My understanding is that one can create multiple threads and pass the run method of an io_service instance to the thread to provide more threads to the io_service.
My question: Is it good design to have multiple io_service objects? say for example have 2 distinct io_service instances, each with 2 threads associated, do they somehow know about each other (and hence cooperate with each), or if not would they negatively affect each other?
My intention is to have 1 io_service for socket based I/O and another for serial based (tty) I/O.
We use multiple io_service's because some of the components in our application need to run all their worker threads at certain fixed priorities, different for each component. Thus each component is given its own io_service, and each component has its own pool of threads executing run().
Other designs I could think of would be if a different number of threads in the pool is required for each IO, or, more relevant to your case, is if the pool cannot be shared because, for example, if your network IO can take out every thread and leave your serial IO waiting.
IIRC, during Michael Caisse's Boostcon ASIO talk (which is worth watching anyway), I believe this question is explicitly asked by an audience member and ok'd as a potential solution. I take from that that it's not wrong per se, and can be used that way according to your design.
This discussion may be enlightening:
http://thread.gmane.org/gmane.comp.lib.boost.asio.user/1300
I don't have the code right here, but why would you use multiple io_services?
I thought it used one io_service and multiple threads executing run on
that one io_service.
IIUC, each io_service owns a select/epoll/whatever queue, so having multiple
io_services is akin to having multiple independent select/epoll loops. In some
situations, eg. large numbers of sockets and multiple CPUs, this might help.
Something I'm less sure about is with multiple threads all running
io_service::run (with the same io_service). I think this just means the
handlers are run concurrently, while the select/epoll/etc. loop is 'shared'.
I think this is best for when your handlers are relatively long-running
operations.