Dealing with boost threads race conditions in C++

I have 6 threads running in my application continuously. The scenario is:
One thread continuously gets the messages and inserts into a message queue. Other 4 threads can be considered as workers which continuously fetch messages from queue and process them. The other final thread populates the analytics information.
Problem:
The sleep duration for the message-fetch thread is 100 ms, and for the worker threads it is 200 ms. When I run the application, the message-fetch thread takes control and keeps inserting into the queue, so the heap keeps growing. The worker threads do not get a chance to process the messages and deallocate them. Eventually this results in an out-of-memory condition.
How can I manage this kind of scenario so that the message-fetch thread and the worker threads get an equal opportunity to run?
Thanks in advance :)

You need to add back-pressure to your producer thread. Usually this is done with a blocking producer-consumer queue. The producer adds items to the queue; the consumers dequeue items and process them. If the queue is empty, the consumers block until the producer adds something. If the queue is full, the producer blocks until the consumers fetch items from the queue.
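As a rough illustration of the blocking queue described above, here is a minimal sketch using C++11 std::mutex and std::condition_variable (boost::mutex and boost::condition_variable have a very similar interface). The names BoundedQueue and produce_and_consume are made up for this example and do not come from any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded blocking queue; name and interface are illustrative only.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full: this is the back-pressure on the producer.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        lock.unlock();
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty, so consumers need no sleep loop.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        lock.unlock();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Small demo: one producer thread pushes 0..count-1, the caller consumes them
// and returns their sum. The queue never holds more than `capacity` items.
long produce_and_consume(std::size_t capacity, int count) {
    BoundedQueue<int> queue(capacity);
    std::thread producer([&queue, count] {
        for (int i = 0; i < count; ++i) queue.push(i);
    });
    long sum = 0;
    for (int i = 0; i < count; ++i) sum += queue.pop();
    producer.join();
    return sum;
}
```

With a queue like this, memory use is capped by the capacity you choose, and neither side needs a fixed sleep interval.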

One system of flow control that I use often is to create a large pool of message objects at startup and never create any more. Pointers to the objects are stored on a thread-safe, blocking 'pool queue' and circulated around: popped from the pool by the producer(s), queued to the consumer(s) on other blocking queues, and then pushed back onto the pool queue when 'consumed'.
This caps memory use, provides flow control (if the pool empties, the producers block on it until messages are returned by the consumers), and eliminates continual new/delete/malloc/free. The more complex and slower bounded queues are not necessary, and all queues need only be large enough to hold the known maximum number of messages.
Using 'classic' blocking queues does not require any Sleep() calls.
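The pool idea can be sketched roughly as follows. This is only an illustration under my own assumptions: Message, BlockingQueue and run_with_pool are hypothetical names, the queues are simple unbounded blocking queues, and there is one producer and one consumer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

struct Message { std::string payload; };  // hypothetical message type
using MessagePtr = std::unique_ptr<Message>;

// Plain unbounded blocking queue: pop() blocks while the queue is empty.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};

// Circulates `total` messages through a fixed pool of `pool_size` objects.
// Returns how many messages arrived with the expected payload.
int run_with_pool(std::size_t pool_size, int total) {
    assert(pool_size > 0);
    BlockingQueue<MessagePtr> pool;  // free messages
    BlockingQueue<MessagePtr> work;  // messages waiting for the consumer
    for (std::size_t i = 0; i < pool_size; ++i) {
        pool.push(MessagePtr(new Message()));  // the only allocations made
    }
    std::thread producer([&] {
        for (int i = 0; i < total; ++i) {
            MessagePtr m = pool.pop();  // blocks when every message is in flight
            m->payload = std::to_string(i);
            work.push(std::move(m));
        }
    });
    int processed = 0;
    for (int i = 0; i < total; ++i) {
        MessagePtr m = work.pop();
        if (m->payload == std::to_string(i)) ++processed;
        pool.push(std::move(m));  // hand the object back to the pool
    }
    producer.join();
    return processed;
}
```

Because the producer can only obtain a message from the pool, it can never get more than pool_size messages ahead of the consumer.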

Your question is a little vague, so I can only give you these guidelines instead of code:
Protect shared data with a mutex. In a multi-threaded producer-consumer problem there is usually a race condition on the shared data (the message in your program). One thread attempts to write to the shared memory location while another tries to read from the same location. The message seen by the reader might be corrupted because the writer has written over it in the middle of the read. You can guard the shared memory location with a mutex. Each thread must acquire this lock before it reads or modifies the shared data. This way the consumer can be sure that the data is not modified while it is being read. Note, however, that holding this lock can hold back the producer thread, so you should release the lock as soon as possible.
Use condition variables to notify consumer threads. If you do not use a notification mechanism, all consumer threads have to actively poll for new data, which wastes system resources. With a condition variable, the consumer threads can simply go to sleep, knowing that the producer thread will notify them whenever a message is ready.
The threading library in C++11 has everything you need to implement a producer-consumer application. However, if you are not able to upgrade your compiler, you could use the Boost threading library instead.

You want to use a bounded queue which when full will block threads trying to enqueue until more space is available.
You can use concurrent_bounded_queue from tbb, or simply use a semaphore initialized to the maximum queue size, and decrement on enqueue and increment on dequeue. boost::thread doesn't provide semaphores natively, but you can implement it using locks and condition variables.
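For illustration, a counting semaphore can be built from a mutex and a condition variable roughly like this, and then used to cap the queue size as described. The class Semaphore and the function bounded_transfer are hypothetical names for this sketch; the std:: primitives can be swapped for their boost:: equivalents.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Minimal counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}

    void acquire() {  // decrement, blocking while the count is zero
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    bool try_acquire() {  // non-blocking variant
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }
    void release() {  // increment and wake one waiter
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        available_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::size_t count_;
};

// Moves 0..count-1 through a queue that never holds more than `capacity`
// entries and returns the sum of the values received.
long bounded_transfer(std::size_t capacity, int count) {
    assert(capacity > 0);
    Semaphore free_slots(capacity);  // decremented on enqueue
    Semaphore filled_slots(0);       // incremented on enqueue
    std::mutex queue_mutex;
    std::queue<int> queue;

    std::thread producer([&] {
        for (int i = 0; i < count; ++i) {
            free_slots.acquire();  // blocks while the queue is full
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                queue.push(i);
            }
            filled_slots.release();
        }
    });
    long sum = 0;
    for (int i = 0; i < count; ++i) {
        filled_slots.acquire();  // blocks while the queue is empty
        int value = 0;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            value = queue.front();
            queue.pop();
        }
        free_slots.release();  // incremented on dequeue
        sum += value;
    }
    producer.join();
    return sum;
}
```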

Related

Make datastream from thread readable for all other threads

I have to distribute a data stream among the clients of a multithreaded server instance; the client threads only need to read. That means I have one thread that the data comes from, and all other threads need to read that data (they do not change it) so that they can send it to their clients.
I tried a thread safe queue (https://blog.chrisd.info/a-simple-thread-safe-queue-for-use-in-multi-threaded-c-applications/) but as soon as I tried it with more than one client only the second or the new one received the data.
How do I solve the problem? Are there any thread safe queues that can be used in multiple threads?
Luick

From what you described, the usual queue semantics won't work, since you actually want to pop an element only when all the threads have received it, not on the first access. So you have several options:
Maintain a queue per each client thread, and the producer thread always pushes the data into each of the client threads. By wrapping the data into an std::shared_ptr you could reduce memory overhead and create semantics, where the data is destroyed when the last client is done with it.
Have a single queue but multiple tail pointers for each thread. Although this can get complex in terms of handling the threads as they spawn/terminate. But you haven't stated what the constraints are in your system - is the thread count fixed or dynamic.
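The first option (one queue per client, with the payload shared through std::shared_ptr) might look something like this sketch. Broadcaster, ClientQueue and Packet are names invented for the example, and std::string stands in for whatever the real data type is.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// The payload is immutable and shared; it is freed when the last client drops it.
using Packet = std::shared_ptr<const std::string>;

// One blocking queue per client thread.
class ClientQueue {
public:
    void push(Packet packet) {
        assert(packet);  // packets are expected to be non-null
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(packet));
        }
        ready_.notify_one();
    }
    Packet pop() {  // blocks until this client has something to send
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        Packet packet = std::move(items_.front());
        items_.pop_front();
        return packet;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Packet> items_;
};

// The producer publishes once; every subscribed client gets its own reference.
class Broadcaster {
public:
    std::shared_ptr<ClientQueue> subscribe() {
        std::shared_ptr<ClientQueue> queue = std::make_shared<ClientQueue>();
        std::lock_guard<std::mutex> lock(mutex_);
        clients_.push_back(queue);
        return queue;
    }
    void publish(std::string data) {
        Packet packet = std::make_shared<std::string>(std::move(data));
        std::lock_guard<std::mutex> lock(mutex_);
        for (const std::shared_ptr<ClientQueue>& client : clients_) {
            client->push(packet);  // copies the pointer, not the data
        }
    }
private:
    std::mutex mutex_;
    std::vector<std::shared_ptr<ClientQueue>> clients_;
};
```

Each client thread would call pop() on its own queue in a loop. Removing a client's queue when that client disconnects is left out of the sketch.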

Signaling between threads

Application design
I have a c++ application that has a producer thread, multiple queues (created during run time) and consumer thread.
The producer thread gets data via TCP/IP and puts it into the respective queue (e.g., if the data is of type A, it is put into queue A).
The consumer thread currently loops over the queues from 1 to n to process the data from each queue.
As per the requirement, there is no need to track which queue was updated last or least recently. As long as any of the queues is updated, the consumer should process queues 1 to n.
If the size of any queue exceeds the defined limit, the producer thread pops the first item before it inserts the new item (to manage the queue size).
Resource synchronization and signaling between threads:
In this implementation, the consumer thread should sleep as long as no queue has data from the listener. The consumer thread should wake up only when the producer puts data into any one of the queues.
Multiple queues are synchronized between 2 threads using mutex.
Event signaling is implemented between threads to wake up consumer thread whenever producer puts data into any of the queues.
However, with this way of signaling it is possible for the consumer to sleep although there is data in one of the queues.
Issue:
Let's take this scenario: the consumer is processing the n-th queue's data. At the same time the producer may put data into queue n-1 or n-2, and the signal has no effect because the consumer is awake and processing the n-th queue. Once the consumer finishes processing the n-th queue, it goes to sleep, and the data in queues n-1 and n-2 will not be processed until the listener sends a further signal.
How can we address this scenario?
People are also advising us to use a semaphore. Is a semaphore relevant to this scenario?
Thanks in advance.

This is the classical example for a C++11 std::condition_variable.
The condition in this case is the availability of consumable resources. If a consumer thread runs out of work, it waits on the condition variable, which effectively puts it to sleep. The producer notifies after each insert into a queue. Care must be taken to arrange the locking so that contention on the queues is kept minimal, while still avoiding the scenario where a consumer misses a notification and goes to sleep although work is available.
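A minimal sketch of that arrangement, assuming a single mutex guards all the queues and int stands in for the real data type (MultiQueue and its member names are invented for this example). The key point is that the "is any queue non-empty?" check runs under the same mutex the producer holds while inserting, so a notification cannot be lost between the check and the wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Several queues guarded by one mutex and one condition variable.
class MultiQueue {
public:
    MultiQueue(std::size_t queue_count, std::size_t limit)
        : queues_(queue_count), limit_(limit) {
        assert(queue_count > 0 && limit > 0);
    }

    // Producer side: drop the oldest entry if the queue is at its limit,
    // then insert and notify the consumer.
    void push(std::size_t index, int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::deque<int>& queue = queues_.at(index);
            if (queue.size() >= limit_) queue.pop_front();
            queue.push_back(value);
        }
        data_ready_.notify_one();
    }

    // Consumer side: sleep until at least one queue has data, then drain
    // every queue in order. The predicate is evaluated while holding the
    // mutex, so data added during processing is seen on the next call.
    std::vector<int> wait_and_drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        data_ready_.wait(lock, [this] { return any_data(); });
        std::vector<int> batch;
        for (std::deque<int>& queue : queues_) {
            batch.insert(batch.end(), queue.begin(), queue.end());
            queue.clear();
        }
        return batch;  // process outside the lock
    }

private:
    bool any_data() const {
        for (const std::deque<int>& queue : queues_) {
            if (!queue.empty()) return true;
        }
        return false;
    }

    std::mutex mutex_;
    std::condition_variable data_ready_;
    std::vector<std::deque<int>> queues_;
    std::size_t limit_;
};
```

The consumer loop then becomes "call wait_and_drain(), process the batch, repeat". If the producer inserts while the consumer is busy, the next wait_and_drain() call returns immediately instead of sleeping.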

A semaphore would work, yes.
But I'm not entirely certain if it's even necessary. It sounds like your problem is caused purely because the consumer thread fails to loop back after processing queue N. It should go to sleep only after it has seen N empty queues in succession, while holding a mutex to ensure that no entries were added in the mean time.
Of course, holding that mutex all the time is overkill. Instead, you should just keep looping, emptying queues one by one and counting how many empty queues you've seen. Once you've seen N empty queues in a row, take the mutex so you know no new entries can be added, and now recheck.
It does depend on your signalling mechanism. Robust signalling mechanisms allow you to signal a thread before it enters the check for that signal. This is necessary because you otherwise have a race condition.

You can use select() to wait on a file descriptor created from the signal with signalfd (see man signalfd). select() gives you a timeout, and the descriptor becomes readable when the signal arrives (the signal must be blocked so that it is delivered through the descriptor instead of a handler). When the descriptor is readable, you can read a struct signalfd_siginfo from it and check ssi_signo for the signal number (to confirm it is the one you are using for communication).

C++ coordinating processes when launched?

Using C++ I'm planning to have a producer process writing a data vector and then several consumer processes reading that data. There will be a shared memory segment (Boost::Interprocess) where the data vector will be stored. Issue is: I have no control over the order in which the processes will be launched by a third party application, it could be that the consumer might launch before the producer has produced any data. What mechanisms are available to coordinate the processes so that the consumer processes can be commanded to wait patiently until the producer signals the data is ready; no matter what the order in which the processes launch?

I guess a named semaphore is a good choice. Your producer and consumer applications should agree on (hard-code) the name of the semaphore, something like /tmp/mySem. Only the producer should create and post the semaphore, while the consumers wait for the semaphore's existence and state.

If creating shared memory is responsibility of producer process, then you can use boost barrier for synchronizing startup.
You can create a barrier for creating shared memory, maybe some number of jobs to deploy. After reaching this barrier, consumer processes can continue to process them.
You can look to details of boost barrier at this page

Thread Pool and Job Queue Architecture?

I have an epoll to receive incoming events and put them into a Job Queue.
When the event is put into Job Queue, pthread conditional signal is sent to wake up the worker threads.
However, I've run into a problem when all worker threads are busy and jobs keep stacking up in the queue. This is a serious problem because if jobs are stacked up and no new event comes in for a while, the jobs in the queue won't be handed to the worker threads.
As soon as a thread becomes available, I want it to take a job from the Job Queue automatically (if possible).
Is there any way to do this? All I can think of is adding a queue observer that sends a condition signal at intervals.
Also, I know that the STL queue is not thread-safe. Does that mean I have to lock a mutex every time I access the STL queue? Won't this slow down my processing?
Any suggestion to solve this problem will be great.

The classic way of counting/signaling jobs on a producer-consumer queue is to use a semaphore. The producer signals it after pushing on a job and the consumer waits on it before popping one off. You need a mutex around the push/pop as well to protect the queue from multiple access.
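Here is a rough sketch of a worker pool along these lines. Note that it uses a condition variable with a "queue not empty" predicate in place of a semaphore; the counting role is played by the queue size itself. WorkerPool and run_jobs are hypothetical names. Each worker re-checks the queue after finishing a job, so queued jobs are picked up even if no new event arrives.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(std::size_t workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers finish all queued jobs, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& thread : threads_) thread.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);  // protects the queue
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Returns immediately if jobs are already waiting.
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the job without holding the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Demo: submits `jobs` trivial jobs and returns how many actually ran.
int run_jobs(std::size_t workers, int jobs) {
    std::atomic<int> done(0);
    {
        WorkerPool pool(workers);
        for (int i = 0; i < jobs; ++i) pool.submit([&done] { ++done; });
    }  // the destructor drains the queue and joins the workers
    return done.load();
}
```

The mutex is held only for the push and pop themselves, so the locking cost is usually small compared with the work each job does.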

Take a look at the work-stealing thread pool in .NET. Yes you'll have to mutex lock the global queue, to that end I've written a double-lock deque so operations on the front / back can be done in parallel. I also have a lock-free deque but the overhead is too high for client-side apps.

Solution for non-blocking timer and server is boost threads?

My project has a queue, a server and a timer. The server receives data and puts it in the queue and the timer process the queue. When the queue is processed, external processes are open with popen, which means that popen will block the timer until a process has ended.
Correct me if I'm wrong, but as both the server and the timer are bound to the same io_service, if the server receives data it will somehow block the io_service from proceeding to the next event, and vice versa: the timer blocks it while a process from the queue is being executed.
I'm thinking of a solution based on boost::thread, but I'm not sure what architecture I should use, as I have never used threads. My options are:
Two threads - one for the timer and one for the server, each one using its own io_service
One thread - one for the timer with its own io_service. the server remains in main process
In both ways the queue (a simple map) must be shared, so I think I'll have some trouble with mutexes and other things
If someone wants to take a look at the code, it is at https://github.com/MendelGusmao/CGI-for-LCD-Smartie
Thanks!

I don't see why you can't have your server listening for connections, processing data, and placing that data in the queue in one thread while your timer takes those items out of the queue in another thread and then spawns processes via popen() to process the queue data. Unless there is a detail here that I've missed, the socket that the server will be listening on (or pipe, FIFO, etc.), is separate from the pipe that will be internally opened by the libc runtime via popen(), so your server and timer threads won't be blocking each other. You'll simply have to make sure that you have enough space in the queue to store the data coming in from the server without overflowing memory (i.e., if this is a high-data-rate application, and data is coming in much faster than it's being processed, you'll eventually run out of memory).
Finally, while guarding a shared queue with mutexes is a good thing, it's actually unnecessary for a single-producer/single-consumer situation like the one you're describing if you decide to use a bounded queue (i.e., a ring buffer), provided the read and write indices are updated atomically. If you decide on an unbounded queue, there are some lockless algorithms out there, but they're pretty complex, so guarding an unbounded queue like std::queue<T> with a mutex is an absolute must.
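To make the ring-buffer remark concrete, here is a sketch of a single-producer/single-consumer ring buffer that uses C++11 atomics for the two indices instead of a mutex. SpscRing and spsc_transfer are names invented for this example. It is only safe with exactly one producer thread and one consumer thread, and one slot is deliberately left unused to tell "full" from "empty".

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Single-producer/single-consumer ring buffer; holds up to Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "one slot is always kept empty");
public:
    // Called only by the producer thread. Returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Called only by the consumer thread. Returns false if the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer)
};

// Demo: one producer pushes 0..count-1, the caller pops them and returns the sum.
long spsc_transfer(int count) {
    assert(count >= 0);
    SpscRing<int, 8> ring;
    std::thread producer([&ring, count] {
        for (int i = 0; i < count; ++i) {
            while (!ring.try_push(i)) std::this_thread::yield();  // ring is full
        }
    });
    long sum = 0;
    int value = 0;
    for (int received = 0; received < count;) {
        if (ring.try_pop(value)) {
            sum += value;
            ++received;
        } else {
            std::this_thread::yield();  // ring is empty
        }
    }
    producer.join();
    return sum;
}
```

The demo spins with yield() when the ring is full or empty; a real server and timer would more likely fall back to a timeout or a condition variable at those points.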

I have implemented almost the exact thing you have described using windows threads. I had my consumer wait on an event HANDLE which is fired by the producer when the queue gets too long. There was a timeout on the wait as well so that if the queue was not filled fast enough the consumer would still wait and process the queue. It was a service in windows so the main thread was used for that. And yes, mutexes will be required to access the shared object.
So I used two threads (not including the main), 1 mutex, 1 shared object. I think your better option is also two threads as it keeps the logic cleaner. The main thread just starts the two threads and then waits (or can be used for signalling, control, output), and the two other threads are just doing their own jobs.
