Thread Pool and Job Queue Architecture? - c++

I have an epoll to receive incoming events and put them into a Job Queue.
When an event is put into the job queue, a pthread condition signal is sent to wake up the worker threads.
However, I've hit a problem: when all worker threads are busy, jobs keep stacking up on the queue. This is serious, because if jobs are stacked and no new event comes in for a while, the jobs in the queue won't be handed to the worker threads.
As soon as a thread becomes available, I'd like it to take a job from the job queue automatically (if possible).
Is there any way to do this? All I can think of is adding a queue observer that sends a condition signal at intervals.
Also, I know that the STL queue is not thread-safe. Does that mean I have to lock a mutex every time I access it? Won't this slow down my processing?
Any suggestions for solving this problem would be great.

The classic way of counting/signaling jobs on a producer-consumer queue is to use a semaphore. The producer signals it after pushing on a job and the consumer waits on it before popping one off. You need a mutex around the push/pop as well to protect the queue from multiple access.
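A minimal sketch of that semaphore-plus-mutex arrangement, assuming C++11. The names Semaphore, JobQueue and run_semaphore_demo are made up for illustration, and the semaphore is hand-built from a mutex and a condition variable rather than taken from pthreads. Because the semaphore counts every pushed job, a worker that finishes a job and calls pop() again picks up the backlog immediately, even if no new event arrives.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hand-rolled counting semaphore (hypothetical helper, not a library type).
class Semaphore {
public:
    explicit Semaphore(int initial = 0) : count_(initial) { assert(initial >= 0); }
    void post() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

class JobQueue {
public:
    using Job = std::function<void()>;
    void push(Job job) {
        { std::lock_guard<std::mutex> lock(m_); jobs_.push(std::move(job)); }
        available_.post();  // one post per queued job, so no job is forgotten
    }
    Job pop() {
        available_.wait();  // blocks only while the job count is zero
        std::lock_guard<std::mutex> lock(m_);
        Job job = std::move(jobs_.front());
        jobs_.pop();
        return job;
    }
private:
    std::mutex m_;
    std::queue<Job> jobs_;
    Semaphore available_;
};

// Queues every job before any worker exists, then lets the workers drain them.
int run_semaphore_demo(int job_count, int worker_count) {
    JobQueue queue;
    std::atomic<int> done{0};
    for (int i = 0; i < job_count; ++i) queue.push([&done] { ++done; });
    // An empty job serves as a stop marker, one per worker.
    for (int i = 0; i < worker_count; ++i) queue.push(JobQueue::Job());
    std::vector<std::thread> workers;
    for (int i = 0; i < worker_count; ++i)
        workers.emplace_back([&queue] {
            for (;;) {
                JobQueue::Job job = queue.pop();
                if (!job) break;
                job();
            }
        });
    for (auto& w : workers) w.join();
    return done.load();
}
```

The mutex is held only for the push or pop itself, never while a job runs, so the locking cost stays small compared with real work.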

Take a look at the work-stealing thread pool in .NET. Yes, you'll have to lock the global queue with a mutex; to that end I've written a double-lock deque so that operations on the front and back can proceed in parallel. I also have a lock-free deque, but its overhead is too high for client-side apps.

Related

Both blocking and non blocking queue

I need to set up a producer-consumer scheme with two threads linked by a queue (the producer pushing tasks into the queue, the consumer executing them as they come).
Since the queue will be empty most of the time I must make it so that the consumer thread can sleep and be woken up as soon as something is pushed by the producer. However I must ensure that the producer is never blocked, not even shortly. In other words, I need some one-sided blocking queue.
There are lock free queues, but since those are by definition, well, lock free, it isn't possible for the consumer thread to be blocked by them.
I have thought of associating a lock-free queue with a condition variable. When the consumer thread finds the queue empty, it would sleep waiting for the condition to be notified. The producer thread would notify the condition when pushing a task into the queue, waking up the consumer thread (if it was sleeping). However, a condition variable must be protected by a mutex, which means there is still a small chance that the producer thread blocks when trying to acquire it to notify the condition.
I have yet to find a really good way to solve this problem, so your ideas are more than welcome.
Note: I'm planning to use Boost.Thread to implement this.
Note 2: I'm not considering the case where the producer tries to push something and the queue is full. This is never going to happen.
The TBB library provides both blocking and non-blocking queues:
tbb::concurrent_queue<> provides non-blocking try_pop() and push() for unlimited growth.
tbb::concurrent_bounded_queue<> provides push(), which can block when a capacity limit is specified and has been reached, and pop(), which waits for items when the queue is empty. It also provides non-blocking try_push() and try_pop() alternatives for the same queue.

Signaling between threads

Application design
I have a C++ application that has a producer thread, multiple queues (created at run time) and a consumer thread.
The producer thread receives data over TCP/IP and puts it into the respective queue (e.g., data of type A goes into queue A).
The consumer thread currently loops over the queues from 1 to n to process the data from each queue.
Per the requirements, there is no need to track which queue was updated most or least recently. As long as any queue is updated, the consumer should process queues 1 to n.
If any queue's size exceeds the defined limit, the producer thread pops the first item before inserting the new one (to manage the queue size).
Resource synchronization and signaling between threads:
In this implementation, the consumer thread should sleep as long as no queue has data from the listener. It should wake up only when the producer puts data into one of the queues.
The queues are synchronized between the 2 threads using a mutex.
Event signaling is implemented between the threads to wake up the consumer thread whenever the producer puts data into any of the queues.
However, with this way of signaling, it is possible for the consumer to sleep even though there is data in one of the queues.
Issue:
Take this scenario: the consumer is processing the n-th queue's data. At the same time the producer may put data into queues n-1 and n-2, and the signal has no effect because the consumer is awake and processing the n-th queue. Once the consumer finishes processing the n-th queue, it goes to sleep, and the data in queues n-1 and n-2 will not be processed until the listener gives another signal.
How can we address this scenario?
People are also advising me to use a semaphore. Is a semaphore relevant to this scenario?
Thanks in advance.
This is the classical example for a C++11 std::condition_variable.
The condition in this case is the availability of consumable resources. If a consumer thread runs out of work, it waits on the condition variable, which effectively puts it to sleep. The producer notifies after each insert into a queue. Care must be taken to arrange the locking so that contention on the queues is kept minimal, while still avoiding the scenario where a consumer misses a notification and goes to sleep although work is available.
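One way this could look, as a rough sketch assuming C++11. MultiQueue, wait_and_drain and run_multi_queue_demo are hypothetical names, the items are plain ints, and the size-limit rule from the question is left out. The key point is that the consumer waits on a predicate (a pending counter shared by all queues) that is checked under the same mutex the producer holds while inserting, so a notification sent while the consumer is busy cannot be lost.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class MultiQueue {
public:
    explicit MultiQueue(std::size_t queue_count) : queues_(queue_count) {
        assert(queue_count > 0);
    }

    // Producer side: insert into queue `index` and record that work is pending.
    void push(std::size_t index, int value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queues_.at(index).push_back(value);
            ++pending_;
        }
        cv_.notify_one();
    }

    // Tells the consumer that no more data will arrive.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }

    // Consumer side: sleeps only while every queue is empty, then takes the
    // contents of all queues in order. Returns false once closed and empty.
    bool wait_and_drain(std::vector<int>& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ > 0 || closed_; });
        if (pending_ == 0) return false;
        for (auto& q : queues_) {
            out.insert(out.end(), q.begin(), q.end());
            q.clear();
        }
        pending_ = 0;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<std::deque<int>> queues_;
    std::size_t pending_ = 0;
    bool closed_ = false;
};

// One producer spreads `items` values over 3 queues; one consumer counts them.
int run_multi_queue_demo(int items) {
    MultiQueue mq(3);
    int consumed = 0;
    std::thread consumer([&] {
        std::vector<int> batch;
        while (mq.wait_and_drain(batch)) {
            consumed += static_cast<int>(batch.size());  // processing happens outside the lock
            batch.clear();
        }
    });
    for (int i = 0; i < items; ++i) mq.push(static_cast<std::size_t>(i % 3), i);
    mq.close();
    consumer.join();
    return consumed;
}
```

A single mutex for all queues is the simplest arrangement; per-queue locks would lower contention but need more care to keep the "is anything pending" check race-free.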
A semaphore would work, yes.
But I'm not entirely certain if it's even necessary. It sounds like your problem is caused purely because the consumer thread fails to loop back after processing queue N. It should go to sleep only after it has seen N empty queues in succession, while holding a mutex to ensure that no entries were added in the mean time.
Of course, holding that mutex all the time is overkill. Instead, you should just keep looping, emptying queues one by one and counting how many empty queues you've seen. Once you've seen N empty queues in a row, take the mutex so you know no new entries can be added, and now recheck.
It does depend on your signalling mechanism. Robust signalling mechanisms allow you to signal a thread before it enters the check for that signal. This is necessary because you otherwise have a race condition.
You can use select to wait on a file descriptor created from a signal: it can time out (select supports timeouts) and it wakes up when the signal is received (the signal must be blocked in the signal mask). When the signalfd (see man signalfd) is readable, you can read a struct signalfd_siginfo from it and check ssi_signo for the signal number (to see whether it's the one you are using for communication).

Why do some thread pool implementations not use the producer-consumer model?

I intend to implement a thread pool to manage threads in my project. The basic structure that comes to mind is a queue: some threads put tasks into the queue, and the threads managed by the pool wait to handle those tasks. I think this is the classic producer-consumer problem. But when I search the web for thread pool implementations, I find that they seldom use this classic model. So my question is: why don't they use it? Does this model have any drawbacks? Why don't they use a full semaphore and an empty semaphore to synchronize?
If you have multiple threads waiting on a single resource (in this case the semaphores and queue) then you are creating a bottleneck. You are forcing all tasks through one queue, even though you have multiple workers. Logically this might make sense if the workers are usually idle, but the whole point of a thread pool is to deal with a heavily loaded scenario where the workers are kept busy (for maximum throughput). Using a single input queue will be particularly bad on a multi-processor system where all workers read and write the head of the queue when they are trying to get the next task. Even though the lock contention might be low, the queue head pointer will still need to be shared/communicated from one CPU cache to another each time it is updated.
Think about the ideal case: all workers are always busy. When a new task is enqueued you want it to be dispatched to the worker that will complete its current/pending task(s) first.
If, as a client, you had a contention-free oracle that could tell you which worker to enqueue a new task to, and each worker had its own queue, then you could implement each worker with its own multi-writer-single-reader queue and always dispatch new tasks to the best queue, thus eliminating worker contention on a single shared input queue. Of course you don't have such an oracle, but this mechanism still works pretty well until a worker runs out of tasks or the queues get imbalanced. "Work stealing" deals with these cases, while still reducing contention compared to the single queue case.
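To make the per-worker-queue idea concrete, here is a deliberately simplified sketch, assuming C++11. WorkerQueue, pop_local, steal and run_work_stealing_demo are invented names, each deque is guarded by a plain mutex rather than a lock-free structure, and idle workers simply yield instead of sleeping. The demo loads every task onto worker 0 to force an imbalance, so the other workers can only make progress by stealing.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One queue per worker, each with its own lock.
struct WorkerQueue {
    std::mutex m;
    std::deque<std::function<void()>> tasks;
};

// The owner takes work from the back of its own deque.
bool pop_local(WorkerQueue& q, std::function<void()>& task) {
    std::lock_guard<std::mutex> lock(q.m);
    if (q.tasks.empty()) return false;
    task = std::move(q.tasks.back());
    q.tasks.pop_back();
    return true;
}

// A thief takes work from the front of another worker's deque.
bool steal(WorkerQueue& victim, std::function<void()>& task) {
    std::lock_guard<std::mutex> lock(victim.m);
    if (victim.tasks.empty()) return false;
    task = std::move(victim.tasks.front());
    victim.tasks.pop_front();
    return true;
}

int run_work_stealing_demo(int task_count, std::size_t worker_count) {
    assert(worker_count > 0);
    std::vector<WorkerQueue> queues(worker_count);
    std::atomic<int> executed{0};
    std::atomic<int> remaining{task_count};

    // Deliberately unbalanced: every task is queued on worker 0.
    for (int i = 0; i < task_count; ++i)
        queues[0].tasks.push_back([&] { ++executed; --remaining; });

    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < worker_count; ++id)
        workers.emplace_back([&, id] {
            std::function<void()> task;
            while (remaining.load() > 0) {
                bool found = pop_local(queues[id], task);
                // Own queue empty: try the other workers in turn.
                for (std::size_t k = 1; !found && k < worker_count; ++k)
                    found = steal(queues[(id + k) % worker_count], task);
                if (found) task();
                else std::this_thread::yield();  // a real pool would block here
            }
        });
    for (auto& w : workers) w.join();
    return executed.load();
}
```

In the balanced case each worker touches only its own lock, which is where the reduction in contention over a single shared queue comes from.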
See also:
Is Work Stealing always the most appropriate user-level thread scheduling algorithm?
Why there's no producer-consumer model implementation:
This model is very generic and can be implemented in many different ways; one such implementation is a queue.
Try the Apache APR queue:
It's documented as a thread-safe FIFO bounded queue.
http://apr.apache.org/docs/apr-util/1.3/apr__queue_8h.html

Dealing with boost threads race conditions in C++

I have 6 threads running in my application continuously. The scenario is:
One thread continuously receives messages and inserts them into a message queue. Another 4 threads act as workers that continuously fetch messages from the queue and process them. The final thread populates the analytics information.
Problem:
The sleep duration for the message-fetching thread is 100 ms; for the worker threads it is 200 ms. When I run the application, the message-fetching thread takes control and keeps inserting into the queue, growing the heap. The worker threads don't get a chance to process the messages and deallocate them. Eventually this results in running out of memory.
How can I manage this scenario so that the message-fetching thread and the worker threads get an equal opportunity to run?
Thanks in advance :)
You need to add back-pressure to your producer thread. Usually this is done by using blocking producer-consumer queues. The producer adds items to the queue; consumers dequeue items and process them. If the queue is empty, consumers block until the producer adds something. If the queue is full, the producer blocks until consumers fetch items from the queue.
One system of flow-control that I use often is to create a large pool of message objects at startup and never create any more. The *objects are stored on a thread-safe, blocking 'pool queue' and circulated around, popped from the pool by producer/s, queued to consumer/s on other blocking queues and then pushed back onto the pool queue when 'consumed'.
This caps memory use, provides flow control (if the pool empties, the producers block on it until messages are returned from consumers), and eliminates continual new/delete/malloc/free. The more complex and slower bounded queues are not necessary, and all queues only need to be large enough to hold the (known) maximum number of messages.
Using 'classic' blocking queues does not require any Sleep() calls.
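A small sketch of the pool-queue scheme described above, assuming C++11. BlockingQueue, Message and run_pool_demo are hypothetical names, and the single consumer stands in for the 4 workers in the question. All messages are allocated once; the producer blocks on the pool when every message is in flight, which is what provides the flow control.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Unbounded blocking queue: pop() sleeps while the queue is empty.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        { std::lock_guard<std::mutex> lock(m_); items_.push_back(value); }
        cv_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T value = items_.front();
        items_.pop_front();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> items_;
};

struct Message {
    int payload = 0;
    bool last = false;  // marks the final message of the demo
};

// Sends the values 1..message_count through a pool of pool_size messages
// and returns the sum seen by the consumer.
long run_pool_demo(std::size_t pool_size, int message_count) {
    assert(pool_size > 0 && message_count > 0);
    std::vector<Message> storage(pool_size);  // every message is created up front
    BlockingQueue<Message*> pool;             // free messages
    BlockingQueue<Message*> work;             // messages waiting to be consumed
    for (auto& m : storage) pool.push(&m);

    long sum = 0;
    std::thread consumer([&] {
        for (;;) {
            Message* m = work.pop();
            sum += m->payload;
            bool last = m->last;
            pool.push(m);  // hand the message back for reuse
            if (last) break;
        }
    });

    for (int i = 1; i <= message_count; ++i) {
        Message* m = pool.pop();  // blocks when all messages are in flight
        m->payload = i;
        m->last = (i == message_count);
        work.push(m);
    }
    consumer.join();
    return sum;
}
```

Memory use is fixed at pool_size messages no matter how fast the producer runs, and neither thread needs a sleep call.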
Your question is a little vague, so I can give you these guidelines instead of code:
Protect shared data with a mutex. In a multi-threaded producer-consumer problem there is usually a race condition on the shared data (the message in your program). One thread attempts to write to the shared memory location while another tries to read from it. The message read by the reader might be corrupted because the writer has written over it in the middle of the read. You can lock the shared memory location with a mutex. Each thread must acquire this lock in order to read or modify the shared data. This way the consumer can be sure that the data is not modified while it holds the lock. Note, however, that acquiring this lock might hold back the producer thread, so you should release the lock as soon as possible.
Use condition variables to notify consumer threads. Without a notification mechanism, all consumer threads have to actively poll for new data, which uses up system resources. With one, consumer threads can go to sleep knowing that the producer thread will notify them whenever a message is ready.
The threading library in C++11 has everything you need to implement a producer-consumer application. However, if you are not able to upgrade your compiler, you could use the Boost threading library as well.
You want to use a bounded queue that, when full, blocks threads trying to enqueue until more space is available.
You can use concurrent_bounded_queue from tbb, or simply use a semaphore initialized to the maximum queue size, and decrement on enqueue and increment on dequeue. boost::thread doesn't provide semaphores natively, but you can implement one using locks and condition variables.
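For illustration, a bounded queue along these lines might look like the sketch below, assuming C++11. It uses two condition variables directly instead of the semaphore mentioned above, and BoundedQueue, high_water and run_bounded_demo are made-up names. The demo mirrors the question's setup of one producer and several workers, with negative values as stop markers.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    // Blocks while the queue is full: this is the back-pressure.
    void push(T value) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(value));
        if (items_.size() > high_water_) high_water_ = items_.size();
        lock.unlock();
        not_empty_.notify_one();
    }
    // Blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();
        return value;
    }
    // Largest size the queue ever reached (for the demo only).
    std::size_t high_water() {
        std::lock_guard<std::mutex> lock(m_);
        return high_water_;
    }
private:
    const std::size_t capacity_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::deque<T> items_;
    std::size_t high_water_ = 0;
};

// One producer sends 1..message_count to worker_count workers; returns the sum.
long run_bounded_demo(std::size_t capacity, int message_count, int worker_count) {
    BoundedQueue<int> queue(capacity);
    std::atomic<long> sum{0};
    std::vector<std::thread> workers;
    for (int w = 0; w < worker_count; ++w)
        workers.emplace_back([&] {
            for (;;) {
                int value = queue.pop();
                if (value < 0) break;  // negative value = stop marker
                sum += value;
            }
        });
    for (int i = 1; i <= message_count; ++i) queue.push(i);
    for (int w = 0; w < worker_count; ++w) queue.push(-1);
    for (auto& t : workers) t.join();
    assert(queue.high_water() <= capacity);  // the backlog never exceeded the bound
    return sum.load();
}
```

With the capacity fixed, heap growth is bounded by capacity items regardless of how much faster the producer is than the workers.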

How can I wait on an I/O completion port and an event at the same time?

Is there any possible way to achieve this?
For instance, I have an I/O completion port that 10 worker threads are pulling tasks out of. Each task is associated with an object. Some objects cannot be worked on concurrently, so if one thread is working with one of these objects and a second thread pulls out a task that requires this object, the second thread has to wait for the first to complete.
As a workaround, objects could have an event that gets signaled upon release. If a thread is 'stuck' because the task it received requires a locked object, it could wait either for the locked object to be released or for a new task to be queued. If it picks up a new task, it will push the task it couldn't work on back into the queue.
I am aware of alternative approaches, but this seems like functionality that should exist. Can this be achieved with Windows API?
Change your design.
Add an internal task queue to the object. Then, when a task is posted to the IOCP, have the IOCP thread place the task in the object's task queue and, if no other thread is "processing" tasks for this object, have this IOCP thread mark the object as being processed and begin processing the task (lock the per-object queue, add the task, check whether we should be the processing thread, unlock the queue). The thread then either processes the task in the object or returns to the IOCP.
When another thread has a task for the same object it also goes through the same process. Note that the thread processing the object DOES NOT hold a lock on the object's task queue so the new IOCP thread can add the task to the object's queue and then see that a thread is already processing and simply return to the IOCP.
Once the thread has finished the current task it checks the object's task queue again and either continues processing the next task, or, if the queue is empty, marks the object as not processing and returns to the IOCP.
This prevents you blocking IOCP threads on tasks which can't yet run and maintains locality of data to the thread that happens to be processing at the time.
The one potential issue is that some always-busy objects can starve others, but you can avoid this by counting how many tasks you have processed and, if the count exceeds a tunable maximum, pushing the next task back into the IOCP so that other objects get a chance.
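The per-object queue can be sketched in portable C++11 without any Windows calls. In the sketch below, ordinary threads stand in for the IOCP workers, SerializedObject, post and run_serialized_demo are hypothetical names, and the fairness cap from the previous paragraph is left out. A thread calling post() either becomes the object's processing thread or returns immediately, so no worker ever blocks waiting for the object.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class SerializedObject {
public:
    // Called by whichever pool thread dequeued a task for this object.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push_back(std::move(task));
            if (processing_) return;  // another thread is already processing
            processing_ = true;       // this thread becomes the processor
        }
        drain();
    }

private:
    // Runs queued tasks one at a time until the queue is empty.
    void drain() {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(m_);
                if (tasks_.empty()) {
                    processing_ = false;  // go back to the pool
                    return;
                }
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // executed without holding the queue lock
        }
    }

    std::mutex m_;
    std::deque<std::function<void()>> tasks_;
    bool processing_ = false;
};

// Several threads post tasks for one object. Returns the number of tasks run,
// or -1 if two tasks were ever observed running at the same time.
int run_serialized_demo(int thread_count, int tasks_per_thread) {
    assert(thread_count > 0);
    SerializedObject object;
    int counter = 0;  // plain int: safe only because tasks never overlap
    std::atomic<int> inside{0};
    std::atomic<bool> overlapped{false};
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < tasks_per_thread; ++i)
                object.post([&] {
                    if (inside.fetch_add(1) != 0) overlapped = true;
                    ++counter;
                    inside.fetch_sub(1);
                });
        });
    for (auto& th : threads) th.join();
    return overlapped ? -1 : counter;
}
```

The lock protects only the queue and the processing flag, so adding a task stays cheap even while a long task is running on the object.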
The ideal solution is to have a thread wait for the event and post to the completion port when the event occurs. Alternatively, have a thread wait for the event and just handle it. If you have two fundamentally different things you need to do, use two threads to do them.