Dynamically spawn/cancel threads to handle varying load - c++

Here's a situation.
Multiple TCP requests come in from clients, and the request load varies.
Currently the server-side thread pool instantiates a fixed number of threads at program initialization.
What would be a good strategy to dynamically spawn/cancel threads in a thread pool?

There's no point. You don't need more threads to do more work. Just create a few more threads than the number of things you can usefully do at once and leave it at that.

A simple way to implement a thread pool is to have a vector of threads and, for each thread, a queue of std::function objects. Each thread simply pops a function from its queue and executes it.
You don't have to spawn/cancel threads, as they are already running, created when the thread pool is created. While a thread has no work to do, it either waits for a signal telling it that there are functions in its queue, or polls the queue with sleeps in between.
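A minimal sketch of such a pool, with two simplifying assumptions that are not in the answer above: it uses one shared queue instead of a queue per thread, and workers wait on a condition variable rather than polling. The name SimplePool and its members are made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers are created once and sleep on a
// condition variable while the job queue is empty.
class SimplePool {
public:
    explicit SimplePool(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the remaining jobs, then joins them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();  // wake one sleeping worker
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the job outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Usage would look like `SimplePool pool(4); pool.submit([] { /* handle one request */ });`. Idle workers cost little beyond their stack, which is why resizing the pool at run time is rarely worth the complexity.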

Related

Make datastream from thread readable for all other threads

I have to distribute a data stream among the clients of a multithreaded server instance. The client threads only need to read: one thread produces the data, and all the other threads need to read that data (they never modify it) so that they can send it to the clients.
I tried a thread safe queue (https://blog.chrisd.info/a-simple-thread-safe-queue-for-use-in-multi-threaded-c-applications/) but as soon as I tried it with more than one client only the second or the new one received the data.
How do I solve the problem? Are there any thread safe queues that can be used in multiple threads?
Luick
From what you describe, the usual queue semantics won't work, since you want an element to be popped only once all the threads have seen it, not on the first access. So you have several options:
Maintain one queue per client thread, and have the producer thread push each piece of data into every client's queue. By wrapping the data in a std::shared_ptr you can reduce the memory overhead and get semantics where the data is destroyed when the last client is done with it.
Have a single queue with a separate tail pointer for each thread. This can get complex in terms of handling threads as they spawn/terminate, though, and you haven't stated what the constraints are in your system: is the thread count fixed or dynamic?
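The first option could be sketched roughly like this. Mailbox and Broadcaster are hypothetical names, and the payload is assumed to be a std::string purely for illustration; every client gets its own blocking queue, and all queues share the same immutable message through a std::shared_ptr.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using MessagePtr = std::shared_ptr<const std::string>;

// Hypothetical per-client queue of shared, read-only messages.
class Mailbox {
public:
    void push(MessagePtr msg) {
        assert(msg != nullptr);
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Blocks until a message is available for this client.
    MessagePtr pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        MessagePtr msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<MessagePtr> q_;
};

// Hypothetical producer side: copies the pointer, not the data, to every client.
class Broadcaster {
public:
    std::shared_ptr<Mailbox> subscribe() {
        auto box = std::make_shared<Mailbox>();
        std::lock_guard<std::mutex> lock(m_);
        boxes_.push_back(box);
        return box;
    }

    void publish(std::string data) {
        MessagePtr msg = std::make_shared<std::string>(std::move(data));
        std::lock_guard<std::mutex> lock(m_);
        for (auto& box : boxes_) box->push(msg);
    }

private:
    std::mutex m_;
    std::vector<std::shared_ptr<Mailbox>> boxes_;
};
```

Each client thread calls subscribe() once and then loops on pop(). The message is freed when the last client drops its pointer. Removing the mailboxes of disconnected clients is left out of this sketch.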

Dealing with boost threads race conditions in C++

I have 6 threads running in my application continuously. The scenario is:
One thread continuously gets the messages and inserts them into a message queue. Another 4 threads can be considered workers, which continuously fetch messages from the queue and process them. The final thread populates the analytics information.
Problem:
Now the sleep duration for the message-fetching thread is 100ms, and for the worker threads it is 200ms. When I run this application, the message-fetching thread takes control and keeps inserting into the queue, so the heap keeps growing. The worker threads do not get a chance to process the messages and deallocate them. Eventually the application runs out of memory.
How do I manage this kind of scenario so that the message-fetching thread and the worker threads get an equal opportunity to run?
Thanks in advance :)
You need to add back-pressure to your producer thread. Usually this is done by using a blocking producer-consumer queue. The producer adds items to the queue; the consumers dequeue items from the queue and process them. If the queue is empty, consumers block until the producer adds something to the queue. If the queue is full, the producer blocks until consumers fetch items from the queue.
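A minimal sketch of such a blocking queue, using C++11 primitives rather than Boost. BoundedQueue and produce_and_consume are illustrative names, not from any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical bounded blocking queue: push() blocks while full,
// pop() blocks while empty.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push(T item) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push_back(std::move(item));
        lock.unlock();
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop_front();
        lock.unlock();
        not_full_.notify_one();
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return q_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> q_;
};

// Hypothetical demo: one producer and one consumer exchange `count` integers
// through a queue of the given capacity; returns the sum the consumer saw.
long produce_and_consume(int count, std::size_t capacity) {
    BoundedQueue<int> q(capacity);
    long sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) sum += q.pop();
    });
    for (int i = 0; i < count; ++i) q.push(i);  // blocks whenever the queue is full
    consumer.join();
    return sum;
}
```

With this in place neither side needs a sleep: the producer is throttled to the consumers' pace, and memory use is capped at `capacity` items.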
One system of flow-control that I use often is to create a large pool of message objects at startup and never create any more. Pointers to the objects are stored on a thread-safe, blocking 'pool queue' and circulated around: popped from the pool by the producer/s, queued to the consumer/s on other blocking queues, and then pushed back onto the pool queue when 'consumed'.
This caps memory use, provides flow-control (if the pool empties, the producer/s block on it until messages are returned from consumers), and eliminates continual new/delete/malloc/free. The more complex and slower bounded queues are not necessary, and all queues need only be large enough to hold the (known) maximum number of messages.
Using 'classic' blocking queues does not require any Sleep() calls.
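A rough sketch of the pool side of that scheme. Message and MessagePool are hypothetical names, and the payload type is an assumption; the point is that all messages are allocated once and that acquire() blocks when none are free.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical message type; the real payload is whatever the application needs.
struct Message {
    std::string payload;
};

// Hypothetical pool: every Message is allocated once, up front.
class MessagePool {
public:
    explicit MessagePool(std::size_t count) : storage_(count) {
        assert(count > 0);
        for (auto& m : storage_) free_.push_back(&m);
    }

    // Blocks while the pool is empty, which is what throttles the producer.
    Message* acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        Message* msg = free_.back();
        free_.pop_back();
        return msg;
    }

    // Called by a consumer once it has finished with the message.
    void release(Message* msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            free_.push_back(msg);
        }
        cv_.notify_one();
    }

    std::size_t available() const {
        std::lock_guard<std::mutex> lock(m_);
        return free_.size();
    }

private:
    std::vector<Message> storage_;  // never resized, so the pointers stay valid
    std::vector<Message*> free_;
    mutable std::mutex m_;
    std::condition_variable cv_;
};
```

The producer would acquire() a message, fill it, and hand the pointer to a consumer queue; the consumer calls release() when done.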
Your question is a little vague, so I can only give you these guidelines instead of code:
Protect shared data with a mutex. In a multi-threaded producer-consumer problem there is usually a race condition on the shared data (the message in your program). One thread is attempting to write to the shared memory location while the other is trying to read from the same location. The message read by the reader might be corrupted because the writer has written over it in the middle of the read. You can protect the shared memory location with a mutex. Each thread must acquire this lock before it reads or modifies the shared data. This way the consumer can be sure that the data is not modified while it is reading. However, you should note that holding this lock might hold back the producer thread, so you should release the lock as soon as possible.
Use condition variables to notify consumer threads. If you do not use a notification mechanism, all consumer threads must actively check for new data, which uses up system resources. With a condition variable, the consumer threads can go to sleep knowing that the producer thread will notify them whenever a message is ready.
The threading library in C++11 has everything you need to implement a producer-consumer application. However, if you are not able to upgrade your compiler, you could use the Boost threading library as well.
You want to use a bounded queue which when full will block threads trying to enqueue until more space is available.
You can use concurrent_bounded_queue from tbb, or simply use a semaphore initialized to the maximum queue size, and decrement on enqueue and increment on dequeue. boost::thread doesn't provide semaphores natively, but you can implement it using locks and condition variables.
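A semaphore built from a lock and a condition variable might look like the sketch below. It is written against the C++11 standard library; boost::mutex and boost::condition_variable have the same shape. The class name and the ceiling check in release() are assumptions added for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical counting semaphore tracking free slots in a queue.
class Semaphore {
public:
    explicit Semaphore(std::size_t slots) : count_(slots), max_(slots) {}

    // Call before enqueueing: blocks while no slot is free.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Non-blocking variant: returns false instead of waiting.
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    // Call after dequeueing: frees one slot and wakes one waiting producer.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(count_ < max_);  // more releases than acquires is a bug here
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
    const std::size_t max_;
};
```

This only limits the producer. The queue itself still needs its own lock, and consumers need a way to wait for items, for example a second semaphore that counts filled slots.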

Solution for non-blocking timer and server is boost threads?

My project has a queue, a server and a timer. The server receives data and puts it in the queue, and the timer processes the queue. When the queue is processed, external processes are opened with popen, which means that popen will block the timer until the process has ended.
Correct me if I'm wrong, but as both server and timer are linked to the same io_service, if the server receives data, it will somehow block io_service from proceeding to the next event, and vice versa: the timer blocks the server while a process from the queue is being executed.
I'm thinking of a solution based on boost::thread, but I'm not sure what architecture I should use, as I have never used threads. My options are:
Two threads - one for the timer and one for the server, each one using its own io_service
One thread - one for the timer with its own io_service. the server remains in main process
In both ways the queue (a simple map) must be shared, so I think I'll have some trouble with mutexes and other things
If someone wants to take a look at the code, it is at https://github.com/MendelGusmao/CGI-for-LCD-Smartie
Thanks!
I don't see why you can't have your server listening for connections, processing data, and placing that data in the queue in one thread while your timer takes those items out of the queue in another thread and then spawns processes via popen() to process the queue data. Unless there is a detail here that I've missed, the socket that the server will be listening on (or pipe, FIFO, etc.), is separate from the pipe that will be internally opened by the libc runtime via popen(), so your server and timer threads won't be blocking each other. You'll simply have to make sure that you have enough space in the queue to store the data coming in from the server without overflowing memory (i.e., if this is a high-data-rate application, and data is coming in much faster than it's being processed, you'll eventually run out of memory).
Finally, while guarding a shared queue with mutexes is a good thing, it's actually unnecessary for a single-producer/single-consumer situation like the one you're describing if you decide to use a bounded queue (i.e., a ring buffer) whose read and write indices are updated atomically. If you decide on an unbounded queue, while there are some lockless algorithms out there, they're pretty complex, and so guarding an unbounded queue like std::queue<T> with a mutex is an absolute must.
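For the bounded case, a single-producer/single-consumer ring buffer can be sketched as follows. SpscRing is a hypothetical name, and the sketch assumes C++11 atomics are available. It is only safe with exactly one pushing thread and one popping thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lock-free ring buffer for ONE producer and ONE consumer.
template <typename T>
class SpscRing {
public:
    // One extra slot distinguishes "full" from "empty".
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Consumer thread only. Returns false if the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

Because try_push and try_pop never block, the caller decides what to do when the ring is full or empty (retry, drop, or sleep). With a second producer or consumer this scheme breaks, and a mutex-guarded queue is the simpler choice.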
I have implemented almost the exact thing you have described using Windows threads. I had my consumer wait on an event HANDLE which is signalled by the producer when the queue gets too long. There was a timeout on the wait as well, so that if the queue was not filled fast enough the consumer would still wake up and process the queue. It was a Windows service, so the main thread was used for that. And yes, mutexes will be required to access the shared object.
So I used two threads (not including the main), 1 mutex, 1 shared object. I think your better option is also two threads as it keeps the logic cleaner. The main thread just starts the two threads and then waits (or can be used for signalling, control, output), and the two other threads are just doing their own jobs.

responsively checking two queues without pegging CPU

I have a thread pool system which uses message passing to organize events, and I am also using the Windows API which also does a bit of message passing. So essentially I need to use the functions which check for the presence of messages without blocking. If I block (if I use GetMessage I think it will block) while checking either queue, I may miss any incoming messages on the other queue.
The first solution I know of is to Sleep a couple of milliseconds somewhere during my loop of peeking on both queues.
Another way I can think of is to have an additional thread, so that now I have one for each loop I am listening to. I make it not responsible for doing anything other than running the Windows message loop, then use it to process and forward any events to my own message queue for the event to be handled. But this won't work if Windows specifically sends the messages I'm interested in to the original thread.
Are there other good solutions?
Your requirement is a bit unclear, but I can agree that Windows message queues are awkward in that only one thread can wait on them. Windows binds windows to threads, and only the thread that creates a window can interact with it.
If you have user-defined messages that contain work to be processed by your thread pool, I suggest that you do exactly what you suggest in your question: use one thread to process all the Windows messages (GetMessage() loop), requeue any work that turns up to your thread pool input queue, and handle 'normal' Windows messages with the usual Translate/Dispatch mechanism.
If you need more help, could you describe more clearly the flow of Windows messages and/or work objects through your system? It is not obvious where the work for the thread pool comes from and how it is transported (if forced to use a WMQ, I usually PostMessage a reference in wParam/lParam, but your system?).
Rgds,
Martin
Normally, a thread pool would not be involved in the Windows message loop, and blocking indefinitely when there is no work is not only allowable for a worker thread, but even desirable.
The most elegant way of implementing a thread pool that can receive messages via some kind of queue, which automatically keeps all CPU cores busy, and which as a bonus is very efficient, is using a completion port.
Calling CreateIoCompletionPort with INVALID_HANDLE_VALUE as the file handle and a null existing-port handle creates a new completion port and returns its handle. Passing zero as NumberOfConcurrentThreads tells the operating system to keep as many threads running as there are cores available.
Create any number of worker threads (a few more than you have cores) and have each one call GetQueuedCompletionStatus with an INFINITE timeout on the handle returned by the first call. That call is what associates a worker with the completion port, and it blocks the worker indefinitely until a message arrives.
Make a struct which has an OVERLAPPED as the first member, plus any data that you want to hand as a task (some pointers to data, or anything).
For every task, set up one of your message structs, and PostQueuedCompletionStatus to the completion port handle. At application exit, post null. You can use the dwNumberOfBytesTransferred field (and the completion key) to pass some additional info.
Now Windows will wake one thread for every message you posted, picking among the waiting threads in last-in-first-out order, up to the number of cores available. If one of the workers blocks on IO, Windows will wake another one for another task (keeping the CPU busy as long as there is work to do).
After finishing a task, go back to GetQueuedCompletionStatus.
A way to gracefully terminate all workers is to pass "zero bytes transferred" and have the worker re-post the event, and exit if it encounters that.
I am not an expert on windows queues, but I am nearly certain there has to be an asynchronous event driven mechanism for message passing.

Possible frameworks/ideas for thread management and work allocation in C++

I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Thread management should include at least the functionality below.
1. Decide on how many workers threads are required. We may need to provide user-defined function to calculate number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include the functionality below.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++11 (C++0x at the time of writing) this is std::thread::hardware_concurrency(), which returns 0 if the value cannot be determined. You can then use this in combination with information about the work to be done to pick a number of worker threads.
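As a small illustration, a user-defined sizing function could look like this. pick_worker_count and its fallback parameter are hypothetical; the policy (never more workers than cores or than pending tasks) is just one possible choice.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical sizing policy: at most one worker per core, never more workers
// than pending tasks, and always at least one.
unsigned pick_worker_count(unsigned pending_tasks, unsigned fallback_cores = 2) {
    assert(fallback_cores > 0);
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = fallback_cores;  // 0 means "could not be determined"
    return std::max(1u, std::min(cores, pending_tasks));
}
```

For I/O-bound work a larger count than the core count may be appropriate, since blocked threads leave cores idle.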
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.