I am writing a multi-threaded c++ application. When thread A has a very computationally expensive operation to perform, it slows down threads B, C, and D. How can I prevent this?
On Windows you can call Sleep(0) to give up the remainder of your timeslice to other threads that are ready to run.
Hard to tell without seeing code so I can only give you the advice to lower Thread A's priority. This can be done using the SetThreadPriority function.
Note that you can set thread priorities with SetThreadPriority.
Also, I advise having the background worker pick its work from a queue. The queue can then be used as a way to throttle the calculations:
you can configure how many 'tasks' are taken from the queue for processing in one swoop
you can lock the queue (use semaphores + condition event) so you can temporarily prevent new tasks from being picked up.
you can now distribute the load across more workers (say if thread B, C, D are temporarily idle, they can start to lift the work off thread A; very useful on a Quad-core + desktop)
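A minimal sketch of the queue idea above, written with portable C++11 threading rather than Windows primitives. The names WorkQueue, pop_batch, set_paused and drain_with_workers are hypothetical and chosen only for illustration; the batch size and pause flag stand in for the "tasks per swoop" and "lock the queue" points.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work queue; the names and the batch policy are illustrative only.
class WorkQueue {
public:
    using Task = std::function<void()>;

    void push(Task task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        ready_.notify_one();
    }

    // Temporarily prevents (or re-allows) workers from picking up tasks.
    void set_paused(bool paused) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            paused_ = paused;
        }
        ready_.notify_all();
    }

    // No more tasks will be needed; workers exit once the queue is drained.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
            paused_ = false;
        }
        ready_.notify_all();
    }

    // Blocks until work is available and the queue is not paused, then takes
    // at most max_batch tasks. An empty batch means "closed and drained".
    std::vector<Task> pop_batch(std::size_t max_batch) {
        assert(max_batch > 0);
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !paused_ && (closed_ || !tasks_.empty()); });
        std::vector<Task> batch;
        while (!tasks_.empty() && batch.size() < max_batch) {
            batch.push_back(std::move(tasks_.front()));
            tasks_.pop_front();
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Task> tasks_;
    bool paused_ = false;
    bool closed_ = false;
};

// Runs worker_count threads that pull batches until the queue is closed and empty.
void drain_with_workers(WorkQueue& queue, unsigned worker_count, std::size_t batch_size) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([&queue, batch_size] {
            for (;;) {
                std::vector<WorkQueue::Task> batch = queue.pop_batch(batch_size);
                if (batch.empty()) return;
                for (WorkQueue::Task& task : batch) task();
            }
        });
    }
    for (std::thread& worker : workers) worker.join();
}
```

A smaller batch size makes the workers return to the queue more often, which gives the pause flag more chances to take effect.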
$0.02
There are a couple of ways:
As RedX suggested, add Sleep(0) in thread A's inner loop to have it yield time more frequently. This is the cheap and lazy solution.
Better would be to change the thread priority. When you call CreateThread, pass CREATE_SUSPENDED so that the thread does not start immediately. Then call SetThreadPriority to set the thread to a lower priority, followed by ResumeThread. (SetPriorityClass would change the priority class of the whole process, not of a single thread.)
You might also want to look at having your compute-bound thread yield the processor to other threads. See this post for various ways to do this.
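As a rough illustration of yielding from a compute-bound loop, here is a sketch that uses the portable std::this_thread::yield() in place of Sleep(0). The function sum_with_yield and its yield_every parameter are made up for this example, and how much a yield actually helps depends on the OS scheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical compute kernel: sums 0..n-1 and offers the CPU to other
// threads every yield_every iterations.
std::uint64_t sum_with_yield(std::uint64_t n, std::uint64_t yield_every) {
    assert(yield_every > 0);
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += i;
        if ((i + 1) % yield_every == 0) {
            // A scheduling hint only; comparable in intent to Sleep(0) on Windows.
            std::this_thread::yield();
        }
    }
    return total;
}
```

Yielding on every iteration would cost more than it saves, so the interval is a tuning knob rather than a fixed recommendation.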
In a multi threaded app, is
while (result->Status == Result::InProgress) Sleep(50);
//process results
better than
while (result->Status == Result::InProgress);
//process results
?
By that, I'm asking will the first method be polite to other threads while waiting for results rather than spinning constantly? The operation I'm waiting for usually takes about 1-2 seconds and is on a different thread.
I would suggest using semaphores in such a case instead of polling. If you prefer active waiting, sleeping is a much better solution than evaluating the loop condition constantly.
It's better, but not by much.
As long as result->Status is not volatile, the compiler is allowed to reduce
while(result->Status == Result::InProgress);
to
if(result->Status == Result::InProgress) for(;;) ;
as the condition does not change inside the loop.
Calling the external function Sleep changes this: the compiler usually cannot see inside it, so it must assume that Sleep may modify the result structure, unless it knows that Sleep never modifies data. Thus, depending on the compiler, the version that calls Sleep is a lot less likely to turn into an endless loop.
There is also no guarantee that accesses to result->Status will be atomic. For specific memory layouts and processor architectures, reading and writing this variable may consist of multiple steps, which means that the scheduler may decide to step in in the middle.
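If C++11 or later is available, one way to address both concerns (the compiler hoisting the read, and non-atomic access) is to make the status a std::atomic. The sketch below is an assumption-laden stand-in for the asker's code: the Status enum, the value field and the helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

enum class Status { InProgress, Done };

// Hypothetical result object shared between the worker and the waiting thread.
struct Result {
    std::atomic<Status> status{Status::InProgress};
    int value = 0;  // written by the worker before status becomes Done
};

// Worker side: write the payload first, then publish the status.
void publish(Result& result, int value) {
    result.value = value;
    result.status.store(Status::Done, std::memory_order_release);
}

// Waiting side: re-reads the atomic on every iteration and sleeps in between.
// Returns the number of times it had to sleep.
int wait_by_polling(const Result& result, std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    int polls = 0;
    while (result.status.load(std::memory_order_acquire) == Status::InProgress) {
        std::this_thread::sleep_for(interval);
        ++polls;
    }
    return polls;
}
```

The release/acquire pair makes the write to value visible to the waiting thread once it observes Done. This is still polling, so it keeps the latency and wake-up costs of the Sleep approach.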
As all you are communicating at this point is a simple yes/no, and the receiving thread should also wait on a negative reply, the best way is to use the appropriate thread synchronisation primitive provided by your OS that achieves this effect. This has the advantage that your thread is woken up immediately when the condition changes, and that it uses no CPU in the meantime as the OS is aware what your thread is waiting for.
On Windows, use CreateEvent and co. to communicate using an event object; on Unix, use a pthread_cond_t object.
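For a portable illustration of the same idea, the sketch below builds a one-shot event from std::mutex and std::condition_variable instead of CreateEvent or pthread_cond_t. CompletionEvent and compute_on_worker are hypothetical names, and the "work" is a placeholder multiplication.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event, similar in spirit to a manual-reset event.
class CompletionEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // Blocks without consuming CPU until set() has been called.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_; });
    }

    bool is_set() {
        std::lock_guard<std::mutex> lock(mutex_);
        return done_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Starts a worker, sleeps until it signals completion, and returns its answer.
int compute_on_worker() {
    CompletionEvent done;
    int answer = 0;
    std::thread worker([&answer, &done] {
        answer = 6 * 7;  // placeholder for the real 1-2 second operation
        done.set();
    });
    done.wait();
    assert(done.is_set());
    worker.join();
    return answer;
}
```

The predicate passed to cv_.wait guards against spurious wake-ups and against set() being called before wait().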
Yes, sleep and variants give up the processor. Other threads can take over. But there are better ways to wait on other threads.
Don't use the empty loop.
That depends on your OS scheduling policy too. For example, Linux uses the CFS scheduler by default, which distributes the processor fairly among all tasks. But if you make this thread a real-time thread with the FIFO policy, then the code without sleep will never relinquish the processor unless a higher-priority thread becomes runnable; threads of the same or lower priority will never get scheduled until you break out of the loop. If you apply SCHED_RR, then threads of the same or higher priority will get scheduled, but not lower-priority ones.
I'm using pthread on Linux. I have a circular buffer to pass data from one thread to another. Maybe the circular buffer is not the best structure to use here, but changing that would not make my problem go away, so we'll just refer it as a queue.
Whenever my queue is either full or empty, pop/push operations return NULL. This is problematic since my threads fire periodically. Waiting for another thread loop would take too long.
I've tried using semaphores (sem_post, sem_wait), but unlocking under contention takes up to 25 ms, which is about the period of my loop. I've tried waiting with pthread_cond_t, but the unlocking takes between 10 and 15 ms.
Is there a faster mechanism I could use to wait for data?
EDIT*
Ok I used condition variables. I'm on an embedded device so adding "more cores or cpu power" is not an option. This made me realise I had all sorts of thread priorities set all over the place so I'll sort this out before going further
You should use condition variables. The only faster ways are platform-specific, and they're only negligibly faster.
You're seeing what you think is poor performance simply because your threads are being de-scheduled. You're seeing long "delays" when your thread is near the end of its timeslice and the scheduler allows the unblocked thread to pre-empt the running thread. If you have more cores than threads or set your thread to a higher priority, you won't see these delays.
But these delays are actually a good thing, and you shouldn't be concerned about them. Other threads just get a chance to run too.
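A minimal sketch of a bounded queue whose push and pop block on condition variables instead of returning NULL. It uses the C++ standard library wrappers rather than raw pthread calls, and BoundedQueue and produce_and_consume are hypothetical names, not the asker's circular buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical bounded queue: push waits while full, pop waits while empty.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(value));
        lock.unlock();
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();
        return value;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// Pushes 0..count-1 from a producer thread and sums them on the calling thread.
long long produce_and_consume(int count, std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    std::thread producer([&queue, count] {
        for (int i = 0; i < count; ++i) queue.push(i);
    });
    long long sum = 0;
    for (int i = 0; i < count; ++i) sum += queue.pop();
    producer.join();
    return sum;
}
```

How quickly a woken thread actually runs is still up to the scheduler and the thread priorities, as described above.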
As already stated in the title, I have a large number of threads (probably many more than 100) that mostly hold program state rather than run. I want only a few of them (enough to use all physical processors) to really run concurrently, and the rest should wait until one of the running threads blocks. When that happens, a new one should start running.
Is it possible to achieve this with pthreads for example with the pthread scheduling functions? How would you do this?
Regards,
Nobody
EDIT
More Information:
Each thread fetches a job from the taskpool on its own and goes on to a certain point.
I need 100 threads to gather at a certain point of program execution that cannot be calculated in parallel. When the calculation is done, the threads should be awakened and continue. To make this efficient, I have to prevent the scheduler from wasting time switching between 100 threads instead of 4.
Just use a semaphore with initial count of 4?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_init.html
You could always launch 4 at a time, assigning them to a thread group, then waiting with a join all on the thread group. But I think more information is needed to develop a really useful answer.
Initialize a global variable to the number of threads to run concurrently.
When a thread wants to do work, it obtains a slot. Using a mutex and condition variable, it waits until slots_available > 0. It then decrements slots_available, releases the mutex, and proceeds with its work.
When a thread has completed its work, it releases the slot by locking the mutex and incrementing slots_available. It signals all threads waiting on the condition variable so they can wake and check whether slots_available > 0.
See https://computing.llnl.gov/tutorials/pthreads/#Mutexes for specific pthread library calls to use for the above.
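The slot scheme described above could look roughly like this. The sketch uses std::mutex and std::condition_variable instead of the raw pthread calls; SlotLimiter and max_observed_concurrency are hypothetical names, and the 1 ms sleep only simulates work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical limiter: at most `slots` threads are between acquire() and release().
class SlotLimiter {
public:
    explicit SlotLimiter(int slots) : slots_available_(slots) { assert(slots > 0); }

    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return slots_available_ > 0; });
        --slots_available_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++slots_available_;
        }
        cv_.notify_all();  // wake waiters so they can re-check the count
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int slots_available_;
};

// Starts thread_count threads and reports the highest number that were
// inside the limited section at the same time.
int max_observed_concurrency(int thread_count, int slots) {
    SlotLimiter limiter(slots);
    std::atomic<int> running{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&limiter, &running, &peak] {
            limiter.acquire();
            const int now = ++running;
            int seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // simulated work
            --running;
            limiter.release();
        });
    }
    for (std::thread& t : threads) t.join();
    return peak.load();
}
```

A thread that is about to block on I/O can call release() first and acquire() again afterwards, which lets another waiting thread take its slot in the meantime.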
I don't know how to do this with pthread functions, but I do have an idea:
I would implement this by adding some intelligence to the threadpool/taskpool to count the number of active threads and only make 4 - number of active threads available at any one time. This could be done by having an idle queue, a ready queue, and an active queue (or just active count). Tasks would grab from the ready queue, and the threadpool would only migrate tasks from the idle queue to the ready queue conditionally.
I am trying to speed up a piece of code by having background threads already setup to solve one specific task. When it is time to solve my task I would like to wake up these threads, do the job and block them again waiting for the next task. The task is always the same.
I tried using condition variables (and mutex that need to go with them), but I ended up slowing my code down instead of speeding it up; mostly it happened because the calls to all needed functions are very expensive (pthread_cond_wait/pthread_cond_signal/pthread_mutex_lock/pthread_mutex_unlock).
There is no point in using a thread pool (which I don't have anyway) because it is too generic a construct; here I want to address only my specific task. Depending on the implementation, I would also pay a performance penalty for the queue.
Do you have any suggestion for a quick wake-up without using mutex or con_var?
I was thinking of setting up the threads like timers that read an atomic variable: if the variable is set to 1, the threads do the job; if it is set to 0, they go to sleep for a few microseconds (I would start with a microsecond sleep, since I would like to avoid spinlocks, which might be too expensive for the CPU). What do you think about it? Any suggestion is much appreciated.
I am using Linux, gcc, C and C++.
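For concreteness, the polling idea described in the question might be sketched as follows. PollingWorker is a hypothetical name, the "job" is reduced to incrementing a counter, and the 50-microsecond sleep is an arbitrary starting value; the real wake-up latency depends on the kernel's timer granularity and scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker that polls an atomic flag instead of blocking on a
// condition variable.
class PollingWorker {
public:
    PollingWorker() : thread_([this] { run(); }) {}

    ~PollingWorker() {
        stop_.store(true, std::memory_order_release);
        thread_.join();
    }

    // Requests one run of the fixed task.
    void start_job() { job_requested_.store(true, std::memory_order_release); }

    int jobs_done() const { return jobs_done_.load(std::memory_order_acquire); }

private:
    void run() {
        while (!stop_.load(std::memory_order_acquire)) {
            if (job_requested_.exchange(false, std::memory_order_acq_rel)) {
                // Placeholder for the real task.
                jobs_done_.fetch_add(1, std::memory_order_release);
            } else {
                std::this_thread::sleep_for(std::chrono::microseconds(50));
            }
        }
    }

    std::atomic<bool> stop_{false};
    std::atomic<bool> job_requested_{false};
    std::atomic<int> jobs_done_{0};
    std::thread thread_;  // declared last so the flags exist before the thread starts
};
```

The trade-off is that an idle worker still wakes up many times per second, so this only pays off if jobs arrive often enough to justify the background load.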
These functions should be fast. If they are taking a large fraction of your time, it is quite possible that you are trying to switch threads too often.
Try buffering up a work queue, and send the signal once a significant amount of work has accumulated.
If this is impossible due to dependencies between the tasks, then your application is not amenable to multithreading at all.
In order to gain performance in a multithreaded application, spawn as many threads as there are CPUs, not a separate thread for each task. Otherwise you end up with a lot of overhead from context switching.
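As a small illustration of sizing the thread count to the hardware, the sketch below splits a sum across std::thread::hardware_concurrency() threads. The helper names worker_count and parallel_sum are hypothetical, and summing a vector is only a placeholder workload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Number of worker threads to use; hardware_concurrency() may return 0
// when the value cannot be determined, so fall back to 1.
unsigned worker_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Sums data using one thread per CPU, each working on a contiguous chunk.
long long parallel_sum(const std::vector<int>& data) {
    const std::size_t workers =
        std::min<std::size_t>(worker_count(), std::max<std::size_t>(data.size(), 1));
    assert(workers >= 1);
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, &partial, w, begin, end] {
            // Each thread writes only its own slot, so no locking is needed.
            partial[w] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (std::thread& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Because each thread gets one large chunk, the threads synchronize only once at the join, which keeps signalling overhead small relative to the work.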
You may also consider making your algorithm more linear (i.e. by using non-blocking calls).
I have a program with a main thread and a diagnostics thread. The main thread is basically a while(1) loop that performs various tasks. One of these tasks is to provide a diagnostics engine with information about the system and then check back later (i.e. in the next loop) to see if there are any problems that should be dealt with. An iteration of the main loop should take no longer than 0.1 seconds. If all is well, then the diagnostic engine takes almost no time to come back with an answer. However, if there is a problem, the diagnostic engine can take seconds to isolate the problem. For this reason each time the diagnostic engine receives new information it spins up a new diagnostics thread.
The problem we're having is that the diagnostics thread is stealing time away from the main thread. Effectively, even though we have two threads, the main thread is not able to run as often as I would like because the diagnostic thread is still spinning.
Using Boost threads, is it possible to limit the amount of time that a thread can run before moving on to another thread? Also of importance here is that the diagnostic algorithm we are using is blackbox, so we can't put any threading code inside of it. Thanks!
If you run multiple threads they will indeed consume CPU time. If you only have a single processor, and one thread is doing processor intensive work then that thread will slow down the work done on other threads. If you use OS-specific facilities to change the thread priority then you can make the diagnostic thread have a lower priority than the main thread. Also, you mention that the diagnostic thread is "spinning". Do you mean it literally has the equivalent of a spin-wait like this:
while(!check_done()) ; // loop until done
If so, I would strongly suggest that you try and avoid such a busy-wait, as it will consume CPU time without achieving anything.
However, though multiple threads can cause each other to slow-down, if you are seeing an actual delay of several seconds this would suggest there is another problem, and that the main thread is actually waiting for the diagnostic thread to complete. Check that the call to join() for the diagnostic thread is outside the main loop.
Another possibility is that the diagnostic thread is locking a mutex needed by the main thread loop. Check which mutexes are locked and where.
To really help, I'd need to see some code.
It looks like your threads are interlocked, so your main thread waits until the background thread has finished its work. Check for any multithreading synchronization that could cause this.
To check that this is not related to OS scheduling, run your program on a dual-core system, so that both threads can really execute in parallel.
From the way you've worded your question, it appears that you're not quite sure how threads work. I assume by "the amount of time that a thread can run before moving on to another thread" you mean the number of cpu cycles spent per thread. This happens hundreds of thousands of times per second.
Boost.Thread does not have support for thread priorities, although your OS-specific thread API will. However, your problem seems to indicate the necessity for a fundamental redesign -- or at least heavy profiling to find bottlenecks.
You can't do this generally at the OS level, so I doubt boost has anything specific for limiting execution time. You can kinda fake it with small-block operations and waits, but it's not clean.
I would suggest looking into processor affinity, either at a thread or process level (this will be OS-specific). If you can isolate your diagnostic processing to a limited subset of [logical] processors on a multi-core machine, it will give you a very coarse mechanism to cap its execution time relative to the main process. That's the best solution I have found when trying to do a similar type of thing.
Hope that helps.