Threading in an endless C++ program - c++

I have a web interface where the user submits some data and it gets written to a database. In the background there is a C++ program which periodically checks the database for new entries. It then takes these entries, processes them and writes their result to a directory. It then proceeds to sleep and keep checking for new entries to process.
My question is in regards to adding multithreading to the C++ program. I have read that it's generally a bad idea just to create a new thread every time you need a another job done, but rather add the jobs to a queue and disperse them out to a fixed number of threads that have already been created (say, 5 or so). Is this the proper design route to take for my situation? Also, if I understand pthread_join correctly, I don't actually need to call it because I don't want to wait for all of the jobs to finish before continuing to check for new updates to the database.
I just wanted to make sure I'm headed in the right direction, any affirmations/criticisms/resources?

You should first decide whether you even need more than one thread - it sounds like checking the database and writing files at some given interval can be accomplished using only one thread. Multiple threads would become useful when you start having to write different data to multiple files simultaneously at non-regular intervals. You are correct that using a queue of sorts would be the best way to distribute these 'jobs' to your threads, and that using a thread pool will give you a little more control over how many 'jobs' you want running simultaneously at any given time. The pthread_join method is used when you want to make sure one thread doesn't exit before another - I've used this mostly to make sure that the program's initial thread doesn't exit after creating the thread pool, as when the parent thread exits the program's execution stops. Some psuedo code based on my comments below.
main thread:
spawn child threads
while(some exit condition){
check database for new jobs
if(new jobs){
acquire job queue mutex //mutexes ensures only one thread accesses shared
add job to queue //data at a time
signal on shared condition variable
release job queue mutex
}
sleep(some regular duration)
}
child thread:
while(some exit condition){
acquire job queue mutex
if(job queue's size == 0){
wait on the shared condition variable
}
grab job from queue
release job queue mutex
handle job
}
See here for pthread/mutex/CV usage notes.

In my experience creating a thread will most likely take tens of milliseconds. For your days computers this is not a big deal. Nothing bad will happen if it will be created/destroyed often. Looking for simple and flawless app level design might be more important.
As a possible variant, I would recommend considering a pool of threads, one thread per available CPU core. These threads should simply sleep at the end of the loop and regularly check if there is something to do or not.
This simplistic design will add minimal overhead and allow using all available CPU power at the same time.
My 2 cents.

Related

How do I manage thread shutdown in a multithreaded crawler?

Let's say I'm writing a mulithreaded web crawler. Threads get a job (for example, in a form of URL) from a queue, do some work, and then might add some new jobs to the queue. Sounds simple enough, but I'm not sure how to handle the situation where all the jobs are done. Let's say there are currently 0 jobs in the queue, and some thread is trying to get a new job. At this point two situations are possible:
Some other threads are working and might actually produce new jobs for this thread to get. In this case, it is probably possible to just wait for a new task (with a blocking .pop(), if the queue supports it, or just by sleeping and waking up time to time to check if a job is available)
All other threads are also waiting for a job. In this case, no new jobs can be produced, so threads must be terminated.
One solution I can think of is having an integer (behind a mutex), which should serve as a number of "busy" threads - it will be increased when thread gets a job, and decreased once it is finished processing it. This way, if there is 0 jobs and 0 threads working, a thread can safely be terminated. However, I'm not sure it is the best solution possible. Are there any other options to handle such a situation?

Notify caller that a thread has finished

I am trying to use the multithreading features in the C++11 standard library and have the following situation envisioned.
I have a parent class which maintains a queue of thread. So something like:
std::queue<MyMTObject *> _my_threads;
The class MyMTObject contains the std::thread object.
The queue has a fixed size of 5 and the class initially starts with the queue being full.
As I have jobs to process I launch threads and I remove them from the queue. What I would like is to get a notification when the job is finished along with the pointer to the MyMTObject, so that I can reinsert them into the queue and make them available again.
I have basically 2 questions:
1: Is this a sound idea? I know I have not specified specifics but broadly speaking. I will, of course, control all access to the queue with a mutex.
2: Is there a way to implement this notification mechanism without using external libraries like Qt or boost.
For duplicates, I did look on the site but could not find anything that was suitable to manage a collection of threads.
I'm not sure if I need to mention this, but std::thread objects can't be re-used. Generally, the only reason you keep a std::thread reference is to std::thread::join the thread. If you don't plan to join the thread later (e.g. dispatch to threads and wait for completion), it's generally advised to std::thread::detach it.
If you're trying to keep threads for a thread pool, it's probably easier to have each thread block on the std::queue and pull objects from the queue to work on. This is relatively easy to implement using a std::mutex and a std::condition_variable. It generally gives good throughput, but to get finer control over scheduling you can do things like keep a seperate std::queue for each thread.
Detaching the threads and creating a work queue also has the added benefit that it avoids redundantly requesting the operating system create new threads which adds overhead and increases overall resource usage.
You could try to deploy some version of Reactor pattern I think. So, you could start one additional control thread that cleans after these workers. Now, you create a ThreadSafeQueue that will be used to communicate events from worker threads to control thread. This queue should be implemented in such a way that you can select on it and wait for any activity on the other end (some thread terminates and calls queue.push for example).
All in all I think it's quite elegant solution. I does add an overhead of an additional thread, but this thread will be mostly sleeping and waking up only once a while to clean up after the worker.
There is no elegant way to do this in Posix, and C++ threading model is almost a thin wrapper on Posix.
You can join a specific thread (one at a time), or you can wait on futures - again, one future at a time.
The best you can do to avoid looping is to employ a conditional variable, and make all threads singal on it (as well as indicating which one just exited by setting some sort of per-thread flag) just before they are about to exit. The 'reaper' would notice the signal and check the flags.
The issue is that this solution requires thread cooperation. But I know not of any better.

Scheduling huge number of threads so only 4 are executed in parallel

As already stated in the title I have a large number of threads (probably much higher than 100) that are rather saving a program state than running. I want only few of them (enough to use all physical processors) to really run concurrent and the rest should wait until one of the running is blocked. When this happens a new one should be running.
Is it possible to achieve this with pthreads for example with the pthread scheduling functions? How would you do this?
Regards,
Nobody
EDIT
More Information:
Each thread fetches a job from the taskpool on its own and goes on to a certain point.
I need 100 threads to gather at that certain point of program execution that cannot be calculated in parallel. When the calculation is done the threads should be awakened and go on. To make this efficient I have to avoid the scheduler from wasting time on switching between 100 threads instead of 4.
Just use a semaphore with initial count of 4?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_init.html
You could always launch 4 at a time, assigning them to a thread group, then waiting with a join all on the thread group. But I think more information is needed to develop a really useful answer.
Initialize a global variable to the number of threads to run concurrently.
When a thread wants to do work it obtains a slot. Using a mutex and condition variable, it waits until slots_available > 1. It then decrements slots_available releases the mutex and proceeds with its work.
When a thread has completed its work, it releases the slot by locking the mutex and incrementing slots_available. It signals all threads waiting on the condition variable so they can wake and see if slots_available > 1.
See https://computing.llnl.gov/tutorials/pthreads/#Mutexes for specific pthread library calls to use for the above.
I don't know how to do this with pthread functions, but I do have an idea:
I would implement this by adding some intelligence to the threadpool/taskpool to count the number of active threads and only make 4 - number of active threads available at any one time. This could be done by having an idle queue, a ready queue, and an active queue (or just active count). Tasks would grab from the ready queue, and the threadpool would only migrate tasks from the idle queue to the ready queue conditionally.

Possible frameworks/ideas for thread managment and work allocation in C++

I am developing a C++ application that needs to process large amount of data. I am not in position to partition data so that multi-processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Manage threads should include at least below functionality.
1. Decide on how many workers threads are required. We may need to provide user-defined function to calculate number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include below functionality.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.

How can I improve my real-time behavior in multi-threaded app using pthreads and condition variables?

I have a multi-threaded application that is using pthreads. I have a mutex() lock and condition variables(). There are two threads, one thread is producing data for the second thread, a worker, which is trying to process the produced data in a real time fashion such that one chuck is processed as close to the elapsing of a fixed time period as possible.
This works pretty well, however, occasionally when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again.
I know this because right before the producer releases the condition upon which the worker is waiting, it does a chuck of processing for the worker if it is time to process another chuck, then immediately upon receiving the condition in the worker thread, it also does a chuck of processing if it is time to process another chuck.
In this later case, I am seeing that I am late processing the chuck many times. I'd like to eliminate this lost efficiency and do what I can to keep the chucks ticking away as close to possible to the desired frequency.
Is there anything I can do to reduce the delay between the release condition from the producer and the detection that that condition is released such that the worker resumes processing? For example, would it help for the producer to call something to force itself to be context switched out?
Bottom line is the worker has to wait each time it asks the producer to create work for itself so that the producer can muck with the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during this period, I am also checking for real-time work to be done by the producer on behalf of the worker while the producer has exclusive access. Somehow my hand off back to running in parallel again results in significant delay occasionally that I would like to avoid. Please suggest how this might be best accomplished.
I could suggest the following pattern. Generally the same technique could be used, e.g. when prebuffering frames in some real-time renderers or something like that.
First, it's obvious that approach that you describe in your message would only be effective if both of your threads are loaded equally (or almost equally) all the time. If not, multi-threading would actually benefit in your situation.
Now, let's think about a thread pattern that would be optimal for your problem. Assume we have a yielding and a processing thread. First of them prepares chunks of data to process, the second makes processing and stores the processing result somewhere (not actually important).
The effective way to make these threads work together is the proper yielding mechanism. Your yielding thread should simply add data to some shared buffer and shouldn't actually care about what would happen with that data. And, well, your buffer could be implemented as a simple FIFO queue. This means that your yielding thread should prepare data to process and make a PUSH call to your queue:
X = PREPARE_DATA()
BUFFER.LOCK()
BUFFER.PUSH(X)
BUFFER.UNLOCK()
Now, the processing thread. It's behaviour should be described this way (you should probably add some artificial delay like SLEEP(X) between calls to EMPTY)
IF !EMPTY(BUFFER) PROCESS(BUFFER.TOP)
The important moment here is what should your processing thread do with processed data. The obvious approach means making a POP call after the data is processed, but you will probably want to come with some better idea. Anyway, in my variant this would look like
// After data is processed
BUFFER.LOCK()
BUFFER.POP()
BUFFER.UNLOCK()
Note that locking operations in yielding and processing threads shouldn't actually impact your performance because they are only called once per chunk of data.
Now, the interesting part. As I wrote at the beginning, this approach would only be effective if threads act somewhat the same in terms of CPU / Resource usage. There is a way to make these threading solution effective even if this condition is not constantly true and matters on some other runtime conditions.
This way means creating another thread that is called controller thread. This thread would merely compare the time that each thread uses to process one chunk of data and balance the thread priorities accordingly. Actually, we don't have to "compare the time", the controller thread could simply work the way like:
IF BUFFER.SIZE() > T
DECREASE_PRIORITY(YIELDING_THREAD)
INCREASE_PRIORITY(PROCESSING_THREAD)
Of course, you could implement some better heuristics here but the approach with controller thread should be clear.