Scheduling huge number of threads so only 4 are executed in parallel

As the title says, I have a large number of threads (probably well over 100) that mostly hold program state rather than run. I want only a few of them (enough to use all physical processors) to actually run concurrently; the rest should wait until one of the running threads blocks. When that happens, another thread should start running.
Is it possible to achieve this with pthreads for example with the pthread scheduling functions? How would you do this?
Regards,
Nobody
EDIT
More Information:
Each thread fetches a job from the taskpool on its own and goes on to a certain point.
I need 100 threads to gather at a certain point of program execution that cannot be calculated in parallel. When the calculation is done, the threads should be woken up and continue. To make this efficient, I have to prevent the scheduler from wasting time switching between 100 threads instead of 4.

Just use a semaphore with initial count of 4?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_init.html
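A minimal sketch of the semaphore idea, assuming a POSIX system where unnamed semaphores (sem_init) are available. The function name run_limited, the peak counter, and the 2 ms sleep standing in for real work are all made up for illustration; only sem_init, sem_wait, sem_post and sem_destroy come from the POSIX API linked above.

```cpp
#include <semaphore.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Starts num_threads threads but lets at most `limit` of them be inside the
// guarded section at once. Returns the highest concurrency that was observed.
// (Hypothetical helper, not part of any library.)
int run_limited(int num_threads, unsigned limit) {
    sem_t slots;
    int rc = sem_init(&slots, 0, limit);  // 0: shared only between threads of this process
    assert(rc == 0);
    (void)rc;

    std::atomic<int> running{0};
    std::atomic<int> peak{0};

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&] {
            while (sem_wait(&slots) != 0) {}  // block until a slot is free; retry if interrupted
            int now = ++running;
            int seen = peak.load();
            // Record the new maximum if this thread pushed concurrency higher.
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(2));  // stand-in for real work
            --running;
            sem_post(&slots);  // hand the slot to the next waiting thread
        });
    }
    for (auto& t : threads) t.join();
    sem_destroy(&slots);
    return peak.load();
}
```

Calling run_limited(100, 4) creates 100 threads, but the value it returns never exceeds 4, because a thread can only enter the guarded section after sem_wait has taken one of the 4 slots.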

You could always launch 4 at a time, assigning them to a thread group, then waiting with a join all on the thread group. But I think more information is needed to develop a really useful answer.

Initialize a global variable to the number of threads to run concurrently.
When a thread wants to do work it obtains a slot. Using a mutex and condition variable, it waits until slots_available > 0. It then decrements slots_available, releases the mutex and proceeds with its work.
When a thread has completed its work, it releases the slot by locking the mutex and incrementing slots_available. It signals all threads waiting on the condition variable so they can wake and see if slots_available > 0.
See https://computing.llnl.gov/tutorials/pthreads/#Mutexes for specific pthread library calls to use for the above.
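The slot scheme described above could look roughly like this. This is a sketch using std::mutex and std::condition_variable instead of the raw pthread calls; the class name SlotGate and its member names are invented for the example. It wakes a single waiter per released slot, since one freed slot can only admit one thread; waking all waiters, as described above, would also be correct.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: allows at most `slots` threads to hold a slot at once.
class SlotGate {
public:
    explicit SlotGate(int slots) : slots_available_(slots) { assert(slots > 0); }

    // Blocks until a slot is free, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate is re-checked after every wakeup, so spurious
        // wakeups and lost races against other waiters are harmless.
        cv_.wait(lock, [this] { return slots_available_ > 0; });
        --slots_available_;
    }

    // Returns a slot and wakes one waiting thread, if any.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++slots_available_;
        }
        cv_.notify_one();
    }

    // Number of free slots right now (mainly useful for inspection).
    int available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return slots_available_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int slots_available_;
};
```

Each worker would call acquire() before its CPU-heavy part and release() when it finishes or is about to block, so that a waiting thread can take over the slot.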

I don't know how to do this with pthread functions, but I do have an idea:
I would implement this by adding some intelligence to the threadpool/taskpool to count the number of active threads and only make 4 - number of active threads available at any one time. This could be done by having an idle queue, a ready queue, and an active queue (or just active count). Tasks would grab from the ready queue, and the threadpool would only migrate tasks from the idle queue to the ready queue conditionally.

Related

Threading in an endless C++ program

I have a web interface where the user submits some data and it gets written to a database. In the background there is a C++ program which periodically checks the database for new entries. It then takes these entries, processes them and writes their result to a directory. It then proceeds to sleep and keep checking for new entries to process.
My question is in regards to adding multithreading to the C++ program. I have read that it's generally a bad idea just to create a new thread every time you need a another job done, but rather add the jobs to a queue and disperse them out to a fixed number of threads that have already been created (say, 5 or so). Is this the proper design route to take for my situation? Also, if I understand pthread_join correctly, I don't actually need to call it because I don't want to wait for all of the jobs to finish before continuing to check for new updates to the database.
I just wanted to make sure I'm headed in the right direction, any affirmations/criticisms/resources?

You should first decide whether you even need more than one thread - it sounds like checking the database and writing files at some given interval can be accomplished using only one thread. Multiple threads would become useful when you start having to write different data to multiple files simultaneously at non-regular intervals. You are correct that using a queue of sorts would be the best way to distribute these 'jobs' to your threads, and that using a thread pool will give you a little more control over how many 'jobs' you want running simultaneously at any given time. pthread_join makes the calling thread wait until another thread has finished - I've used it mostly to make sure that the program's initial thread doesn't exit right after creating the thread pool, because when the initial thread returns from main the whole program stops. Some pseudocode based on my comments below.
main thread:
    spawn child threads
    while (some exit condition) {
        check database for new jobs
        if (new jobs) {
            acquire job queue mutex        // the mutex ensures only one thread accesses
            add job to queue               // the shared data at a time
            signal on shared condition variable
            release job queue mutex
        }
        sleep(some regular duration)
    }
child thread:
    while (some exit condition) {
        acquire job queue mutex
        while (job queue's size == 0) {    // re-check after every wakeup: waits can wake spuriously
            wait on the shared condition variable
        }
        grab job from queue
        release job queue mutex
        handle job
    }
See here for pthread/mutex/CV usage notes.
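For reference, the pseudocode above could be turned into something like the following. This is only a sketch under a few assumptions: it uses the C++11 standard library rather than raw pthreads, the class name WorkerPool and its members are made up, and the "exit condition" is modeled as a stopping flag set by the destructor, which lets workers drain the queue before they exit.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers sleep on a condition variable until
// a job is queued, then run it outside the lock.
class WorkerPool {
public:
    explicit WorkerPool(unsigned workers) {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }

    // Signals the workers to stop, lets them finish queued jobs, joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Called by the main thread: add a job and wake one sleeping worker.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Predicate form of wait: re-checked after every wakeup.
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // handle the job without holding the queue mutex
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

The main loop would create one WorkerPool up front, call submit() for each new database entry it finds, and sleep between checks. No join is needed per job; the threads are joined once, when the pool is destroyed.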

In my experience creating a thread will most likely take tens of milliseconds. For today's computers this is not a big deal. Nothing bad will happen if threads are created and destroyed often. Looking for a simple and flawless app-level design might be more important.
As a possible variant, I would recommend considering a pool of threads, one thread per available CPU core. These threads should simply sleep at the end of the loop and regularly check if there is something to do or not.
This simplistic design will add minimal overhead and allow using all available CPU power at the same time.
My 2 cents.

sleeping a thread in the middle of execution

What happens when a thread is put to sleep by another thread, possibly the main thread, in the middle of its execution?
Assume I have a function Producer. What if Consumer sleep()s the Producer in the middle of producing one unit?
Suppose the unit is half produced and then the Producer is put to sleep(). The integrity of the system may be at risk.

The thread that sleep is invoked on is put in the idle queue by the thread scheduler and is context switched out of the CPU it is running on, so other threads can take its place.
All context (registers, stack pointer, base pointer, etc) are saved on the thread stack, so when it's run next time, it can continue from where it left off.
The OS is constantly doing context switches between threads in order to make your system seem like it's doing multiple things. The OS thread scheduler algorithm takes care of that.
Thread scheduling and threading is a big subject, if you want to really understand it, I suggest you start reading up on it. :)
EDIT: Using sleep for thread synchronization purposes is not advised; you should use proper synchronization mechanisms to tell the thread to wait for other threads, etc.

There is no problem associated with this, unless some state is mutated while the thread sleeps, so it wakes up with a different set of values than before going to sleep.
Threads are switched in and out of execution by the CPU all the time, but that does not affect the overall outcome of their execution, assuming no data races or other bugs are present.

It would be unadvisable for one thread to forcibly and synchronously interfere with the execution of another thread. One thread could send an asynchronous message to another requesting that it reschedule itself in some way, but that would be handled by the other thread when it was in a suitable state to do so.
Assuming they communicate using channels that are thread-safe, nothing bad should happen, as the sleeping thread will wake up eventually and grab data from its task queue or see that some semaphore has been set and read the produced data.
If the threads communicate using nonvolatile variables or direct function calls that change state, that's when Bad Things occur.

I don't know of a way for a thread to forcibly cause another thread to sleep. If two threads are accessing a shared resource (like an input/output queue, which seems likely for your Producer/Consumer example), then both threads may contend for the same lock. The losing thread must wait for the other thread to release the lock if the contention is not the "trylock" variety. The thread that waits is placed into a waiting queue associated with the lock, and is removed from the scheduler's run queue. When the winning thread releases the lock, the code checks the queue to see if there are threads still waiting to acquire it. If there are, one is chosen as the winner and is given the lock, and placed in the scheduler's run queue.

What happens when pthreads wait in mutex_lock/cond_wait?

I have a program that should get the maximum out of my cpu.
It is multithreaded via pthreads that do their job well apart from the fact that they "only" get my cores to about 60% load which is not enough in my opinion.
I am searching for the reason and am asking myself (and hereby you) if the blocking functions mutex_lock/cond_wait are candidates?
What happens when a thread cannot run on in such a function?
Does pthread switch to another thread it handles or
does the thread yield its time to the system and if the latter is the case, can I change this behavior?
Regards,
Nobody
More Information
The setting is one mainthread that fills the taskpool and countless workers that fetch jobs from there and wait on a conditional that is signaled via broadcast when a serialized calculation is done. They go on with the values from this calculation until they are done, deliver their mail and fetch the next job...

On a typical modern pthreads implementation, each thread is managed by the kernel not unlike a separate process. Any blocking call like pthread_mutex_lock or pthread_cond_wait (but also, say, read) will yield its time to the system. The system will then find another eligible thread to schedule, whether in your process or another process, and run it.
If your program is only taking 60% of the CPU, it is more likely blocked on I/O than on pthread operations, unless you have done something way too granular with your pthread operations.

If a thread is waiting on a mutex/condition, it doesn't use resources (well, uses just a tiny amount). Whenever the thread enters waiting state, control switches to other threads. When the mutex is released (or condition variable signalled), the thread wakes up and may acquire the mutex (if no other thread grabs it first), and continue to run. If however some other thread acquires the mutex (this can happen if several threads are waiting for it), the thread returns to sleeping state.

C++ Multi-Thread Execution Speed Slow-Down

I am writing a multi-threaded c++ application. When thread A has a very computationally expensive operation to perform, it slows down threads B, C, and D. How can I prevent this?

On Windows you can use Sleep(0) to release the remainder of your timeslice for other threads that are waiting.

Hard to tell without seeing code so I can only give you the advice to lower Thread A's priority. This can be done using the SetThreadPriority function.

Note that you can set the thread priorities (SetThreadPriority)
Also, I advise having the background worker pick its work from a queue. The queue can then be used as a way to throttle the calculations:
you can configure how many 'tasks' are taken from the queue for processing in one swoop
you can lock the queue (use semaphores + condition event) so you can temporarily prevent new tasks from being picked up.
you can now distribute the load across more workers (say if thread B, C, D are temporarily idle, they can start to lift the work off thread A; very useful on a Quad-core + desktop)
$0.02

There are a couple of ways:
As RedX suggested, add Sleep(0) in thread A's inner loop to have it yield time more frequently. This is the cheap and lazy solution.
Better would be to change the thread priority. When you call CreateThread, pass CREATE_SUSPENDED so that the thread does not start immediately. Then call SetThreadPriority to set the thread to a lower priority (SetPriorityClass would change the priority class of the whole process, not of one thread), followed by ResumeThread.

You might also want to look at having your compute-bound thread yield the processor to other threads. See this post for various ways to do this.

Possible frameworks/ideas for thread managment and work allocation in C++

I am developing a C++ application that needs to process large amount of data. I am not in position to partition data so that multi-processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Manage threads should include at least below functionality.
1. Decide on how many workers threads are required. We may need to provide user-defined function to calculate number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include below functionality.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.

Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
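As a rough illustration of combining the processor count with information about the work: the helper below is hypothetical (the name pick_worker_count and the "one thread per core, never more threads than jobs" policy are assumptions, not a recommendation from any library). Note that std::thread::hardware_concurrency() is allowed to return 0 when the value cannot be determined, so the sketch falls back to 1 in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical policy: one worker per hardware thread, but never more
// workers than pending jobs and never fewer than one.
unsigned pick_worker_count(std::size_t pending_jobs) {
    unsigned cores = std::thread::hardware_concurrency();  // 0 means "unknown"
    if (cores == 0) cores = 1;                              // assumed fallback

    std::size_t wanted = std::min<std::size_t>(cores, pending_jobs);
    unsigned result = static_cast<unsigned>(std::max<std::size_t>(wanted, 1));

    assert(result >= 1 && result <= cores);
    return result;
}
```

A user-defined function with the same signature could be passed to the pool instead, which would cover the "user-defined function to calculate number of threads" requirement from the question.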
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.
