I am a beginner using multithreading in C++, so I'd appreciate it if you can give me some recommendations.
I have a function which receives the previous frame and current frame from a video stream (let's call this function, readFrames()). The task of that function is to compute Motion Estimation.
The idea when calling readFrames() would be:
Store the previous and current frame in a buffer.
I want to compute the value of Motion between each pair of frames from the buffer but without blocking the function readFrames(), because more frames can be received while computing that value. I suppose I have to write a function computeMotionValue() and every time I want to execute it, create a new thread and launch it. This function should return some float motionValue.
Every time the motionValue returned by any thread is over a threshold, I want to increment a shared int variable, let's call it nValidMotion.
My problem is that I don't know how to "synchronize" the threads when accessing motionValue and nValidMotion.
Can you please explain to me, in some pseudocode, how I can do that?
and every time I want to execute it, create a new thread and launch it
That's usually a bad idea. Threads are usually fairly heavy-weight, and spawning one is usually slower than just passing a message to an existing thread pool.
Anyway, if you fall behind, you'll end up with more threads than processor cores and then you'll fall even further behind due to context-switching overhead and memory pressure. Eventually creating a new thread will fail.
My problem is that I don't know how to "synchronize" the threads when accessing motionValue and nValidMotion.
Synchronization of access to a shared resource is usually handled with std::mutex (mutex means "mutual exclusion", because only one thread can hold the lock at once).
If you need to wait for another thread to do something, use std::condition_variable to wait/signal. You're waiting-for/signalling a change in state of some shared resource, so you need a mutex for that as well.
The usual recommendation for this kind of processing is to have at most one thread per available core, all serving a thread pool. A thread pool has a work queue (protected by a mutex, and with the empty->non-empty transition signalled by a condvar).
For combining the results, you could have a global counter protected by a mutex (but this is relatively heavy-weight for a single integer), or you could have each task added to the thread pool return a bool via the promise/future mechanism, or you could just make your counter atomic.
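To make the thread pool and atomic counter options concrete, here is a minimal sketch. Everything named here is an assumption for illustration: Frame is reduced to a float, computeMotionValue() is a stand-in that returns an absolute difference, and MotionPool with its submit() and finish() members is a hypothetical class, not an existing library.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a "frame" is just a float here, and the motion
// value is the absolute difference between two frames.
using Frame = float;
inline float computeMotionValue(Frame prev, Frame cur) {
    return prev > cur ? prev - cur : cur - prev;
}

class MotionPool {
public:
    MotionPool(unsigned nThreads, float threshold) : threshold_(threshold) {
        for (unsigned i = 0; i < nThreads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Called from readFrames(): only enqueues, never waits for the computation.
    void submit(Frame prev, Frame cur) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.emplace(prev, cur);
        }
        cv_.notify_one();
    }

    // Lets the workers drain the queue, joins them and returns the final count.
    int finish() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
        workers_.clear();
        return nValidMotion_.load();
    }

    ~MotionPool() { if (!workers_.empty()) finish(); }

private:
    void run() {
        for (;;) {
            std::pair<Frame, Frame> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // done_ is set and nothing is left
                job = jobs_.front();
                jobs_.pop();
            }
            // The expensive part runs without holding the lock.
            if (computeMotionValue(job.first, job.second) > threshold_)
                ++nValidMotion_;  // atomic increment, no mutex needed
        }
    }

    float threshold_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::pair<Frame, Frame>> jobs_;
    bool done_ = false;
    std::atomic<int> nValidMotion_{0};
    std::vector<std::thread> workers_;
};
```

You would create one pool up front (for example with std::thread::hardware_concurrency() threads), call submit() for every pair of frames, and read the counter whenever you need it. The queue here is unbounded, so if you fall behind for long you would still want to cap it or drop frames.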
Here is some sample pseudocode you may use:
// The following thread waits for notifications from the worker threads that detect motion
nValidMotion_worker_Thread()
{
    while(true) { message_receive(msg_q); ++nValidMotion; }
}
// Worker thread: computes motion on 2 frames; if motion is detected, notifies nValidMotion_worker_Thread via the message queue
WorkerThread(frame1, frame2)
{
    x = computeMotionValue(frame1, frame2);
    if x > THRESHOLD
        msg_q.send();
}
// main thread
main_thread()
{
    // 1. create a new message queue for inter-thread communication
    msg_q = new msg_q();
    // start the listening thread
    Thread a = new nValidMotion_worker_Thread();
    a.start();
    while(true)
    {
        // collect 2 frames
        frame1 = readFrames();
        frame2 = readFrames();
        // start a worker thread
        Thread b = new WorkerThread(frame1, frame2);
        b.start();
    }
}
I am working on a plotting algorithm. To do this I get the data from a DAQ board in my main GUI thread and then I send the data over to a worker thread to be processed, the worker thread emits a signal with the new QImage which I display as a plot in my GUI. The problem is the function, let's call it generateImage(), to calculate and generate the QImage takes a long time (~50-70 milliseconds, depending on data length) and in between this time another set of data might arrive which will require the worker thread to recalculate the plot from the beginning. I want the generateImage() to abandon the calculation and restart from the beginning if the new data arrives while it is still calculating. My approach is to set a member boolean variable, let's call it b_abort_, and check if it is set to true inside generateImage() and return if it's true, outside generateImage() it always remains true and I only set it to false before generateImage() is called.
All this happens in a worker thread, I subclass QObject and use moveToThread() to move it to a worker thread.
The function which starts calculation:
void WorkerThread::startCalc()
{
b_abort_ = false;
generateImage();
// if b_abort_ is set to true, generateImage() will return prematurely
if(b_abort_)
emit calcFinished();
else
b_abort_ = true;
}
Function which does all calculations and generates image:
void WorkerThread::generateImage()
{
/* Calculation of some parts */
for(int ii = 0; ii < RawData.length(); ++ii) // Starting main time consuming loop
{
if(b_abort_)
return;
/* Perform calculation for one data point from RawData */
}
// data calculation complete, now it's time to generate QImage
// before that I check if b_abort_ is set to true
if(b_abort_)
return;
for(int ii = 0; ii < CalculatedData.length(); ++ii) // plotting calculated data on QImage
{
if(b_abort_)
return;
/* plot one data point from CalculatedData vector */
}
// generation of QImage finished, time to send the signal
emit renderedPlot(image); // image is a QImage object
}
In my worker thread, I have a slot to receive data from the main GUI Thread, it is configured with Qt::QueuedConnection (the default) as the connection type:
void WorkerThread::receiveData(QVector<double> data)
{
if(!b_abort_) // check if calculation is still running
{
QEventLoop loop;
connect(this, &WorkerThread::calcFinished, &loop, &QEventLoop::quit);
b_abort_ = true; // set it to true and wait for calculation to stop
loop.exec();
// start new calculation
RawData = data;
startCalc();
}
else
{
RawData = data;
startCalc();
}
}
When I use this approach in my main GUI thread, the generateImage() function blocks all event loops and my GUI freezes, which makes me think that inside a single thread (main GUI thread or a worker thread) only one function can run at a time, and so any change to b_abort_ is not applied until the thread's event loop returns to process other functions. When using WorkerThread it is difficult to verify whether this is working: sometimes it works fine, while other times it produces a bad allocation error, which suggests it is not working (although that might be for a different reason entirely, I am not sure). I would like to ask your opinion: is this the right approach to stop a long-running calculation prematurely? Are there any other methods I can use that would be more robust than my current approach?
How to stop a long-running function in another thread prematurely?
You're correct that the only sane way to do this is to have the long-running thread check, at regular intervals, whether it should stop early.
Note that the flag you're checking must be atomic, or protected by a mutex, or otherwise somehow synchronized. Otherwise it's entirely legitimate for the worker thread to check the variable and never see the value change (no, you can't use volatile for this).
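As a minimal sketch of such a flag in standard C++ (not Qt-specific): the Calculator class, its member names and the sum-of-squares "work" are all hypothetical placeholders for your worker object and generateImage().

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical worker: sums the squares of the input, checking an atomic
// flag on every iteration so that another thread can cancel it.
class Calculator {
public:
    // May be called from any thread while run() is executing.
    void requestAbort() { abort_.store(true); }

    // Clears a previous abort request before starting a new calculation.
    void reset() { abort_.store(false); }

    // Returns true and fills `result` on completion, false if it was aborted.
    bool run(const std::vector<double>& raw, double& result) {
        double sum = 0.0;
        for (std::size_t i = 0; i < raw.size(); ++i) {
            if (abort_.load()) return false;  // stop early
            sum += raw[i] * raw[i];
        }
        result = sum;
        return true;
    }

private:
    std::atomic<bool> abort_{false};
};
```

Because the flag is a std::atomic<bool>, the store in one thread is guaranteed to become visible to the load in the other, which a plain bool does not promise. Checking it on every iteration is probably more often than needed; every few hundred iterations would usually do.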
... which makes me think that inside a single thread (main GUI thread or a worker thread) only one function can run at a time ...
Yes, that's exactly what a thread is! It is a single, linear thread of execution. It can't do two things at once. Doing two things at once is the whole reason for having more than one thread.
The approach should be to have a worker thread waiting for work to do, and a main thread that only ever sends it asynchronous messages (start generating an image with this data, or interrupt processing and start again with this data instead, or whatever).
If the main thread calls a function that should happen in the worker thread instead, well, you've deliberately started executing it in the main thread, and the main thread won't do anything until it returns. Just like every other function.
As an aside, your design has a problem: it's possible to never finish generating a single image if it keeps being interrupted by new data.
The usual solution is double-buffering: you let the worker thread finish generating the current image while the main thread accumulates data for the next one. When the worker has finished one image, it can be passed back to the main thread for display. Then the worker can start processing the next, so it takes the buffer of "dirty" updates that the main thread has prepared for it. Subsequent updates are again added to the (now empty) buffer for the next image.
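A rough sketch of the hand-over part of that double-buffering scheme, assuming the data is a plain vector of doubles. PendingData, append() and take() are made-up names for illustration; in a Qt program the same idea could equally sit behind signals and slots.

```cpp
#include <mutex>
#include <vector>

class PendingData {
public:
    // Main thread: accumulate samples for the next image.
    void append(const std::vector<double>& samples) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.insert(pending_.end(), samples.begin(), samples.end());
    }

    // Worker thread: take everything accumulated so far and leave an
    // empty buffer behind for subsequent updates.
    std::vector<double> take() {
        std::vector<double> out;
        std::lock_guard<std::mutex> lock(m_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex m_;
    std::vector<double> pending_;
};
```

The lock is only held for the append or the swap, never for the image generation itself, so the main thread is never blocked for long.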
I have a very specific problem to solve. I'm pretty sure someone else in the world has already encountered and solved it but I didn't find any solutions yet.
Here it is :
I have a thread that pops commands from a queue and executes them asynchronously
I can call from any other thread a function to execute a command synchronously, bypassing the queue mechanism, returning a result, and taking priority of execution (after the current execution is over).
I have a mutex protecting a command execution so only one is executed at a time
The problem is that, with a simple mutex, I have no guarantee that a synchronous call will get the mutex before the asynchronous thread when they are in conflict. In fact, our tests show that the allocation is very unfair and that the asynchronous thread always wins.
So I want to block the asynchronous thread while there is a synchronous call waiting. I don't know in advance how many synchronous calls can be made, and I don't control the threads that make the calls (so any solution using a pool of threads is not possible).
I'm using C++ and the Microsoft library. I know the basic synchronization objects, but maybe there is a more advanced object or method suitable for my problem that I don't know about.
I'm open to any idea!
OK, so I finally got the chance to close this. I tried some of the solutions proposed here and in the link posted.
In the end, I combined a mutex for the command execution with a counter of waiting sync calls (the counter is also protected by a mutex, of course).
The async thread checks the counter before trying to get the mutex, and waits for the counter to be 0. Also, to avoid a loop with sleep, I added an event that is set when the counter reaches 0. The async thread waits for this event before trying to get the mutex.
void incrementSyncCounter()
{
DLGuardThread guard(_counterMutex);
_synchCount++;
}
void decrementSyncCounter()
{
DLGuardThread guard(_counterMutex);
_synchCount--;
// If the counter is 0, it means that no other sync call is waiting, so we notify the main thread by setting the event
if(_synchCount == 0)
{
_counterEvent.set();
}
}
unsigned long getSyncCounter()
{
DLGuardThread guard(_counterMutex);
return _synchCount;
}
bool executeCommand(Command* command)
{
// Increment the sync call counter so the main thread can be locked while at least one sync call is waiting
incrementSyncCounter();
// Execute the command using mutex protection
DLGuardThread guard(_theCommandMutex);
bool res = command->execute();
guard.release();
// Decrement the sync call counter so the main thread can be unlocked if there is no sync call waiting
decrementSyncCounter();
return res;
}
void main ()
{
[...]
// Infinite loop
while(!_bStop)
{
// While the Synchronous call counter is not 0, this main thread is locked to give priority to the sync calls.
// _counterEvent will be set when the counter is decremented to 0, then this thread will check the value once again to be sure no other call has arrived in between.
while(getSyncCounter() > 0)
{
::WaitForSingleObject (_counterEvent.hEvent(), INFINITE);
}
// Take mutex
DLGuardThread guard(_theCommandMutex);
status = command->execute();
// Release mutex
guard.release();
}
}
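For comparison, the same idea can be sketched in portable C++ with a single mutex and a condition variable instead of two mutexes plus an event. This is a different construction from the code above, not a translation of it. PriorityGate and its members are hypothetical names, and the sketch assumes the commands do not throw.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>

class PriorityGate {
public:
    using Command = std::function<bool()>;

    // Called from any thread: takes priority over queued (async) commands.
    bool executeSync(const Command& command) {
        std::unique_lock<std::mutex> lock(m_);
        ++waitingSync_;  // from now on the async thread holds back
        cv_.wait(lock, [this] { return !busy_; });
        --waitingSync_;
        return runLocked(lock, command);
    }

    // Called by the queue thread: only runs when no sync call is waiting.
    bool executeAsync(const Command& command) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !busy_ && waitingSync_ == 0; });
        return runLocked(lock, command);
    }

private:
    // Precondition: `lock` is held and busy_ is false.
    bool runLocked(std::unique_lock<std::mutex>& lock, const Command& command) {
        busy_ = true;
        lock.unlock();      // run the command without holding the mutex
        bool res = command();
        lock.lock();
        busy_ = false;
        lock.unlock();
        cv_.notify_all();   // wake both sync and async waiters
        return res;
    }

    std::mutex m_;
    std::condition_variable cv_;
    int waitingSync_ = 0;   // number of sync callers waiting for their turn
    bool busy_ = false;     // true while some command is executing
};
```

Since the counter and the "busy" state are guarded by the same mutex, the async thread cannot slip in between a sync caller announcing itself and that caller taking its turn.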
I am working on multithreading in C++ using pthread. My problem is I am using frames from webcam to perform feature extraction. The feature extraction routine takes around 4-5 seconds to perform the task. However, I want the video streaming to continue and wait for the signal from the Feature extraction routine telling to send another frame. I think there are 2 functions to use here but I am unsure of its implementation. Functions are : pthread_cond_wait and pthread_cond_signal.
My program outline is as follows:
void *makefeature(void * arg){
// compute features using SURF
//HERE I WANT TO SIGNAL TO THE MAIN THAT I AM DONE SEND A NEW FRAME NOW
}
int main(){
// All video streaming functions and all
pthread_create(); //! call to make feature routine
}
How can I implement the two calls, pthread_cond_wait and pthread_cond_signal? Please help.
Independent of which library to use, the idea of condition variables is that 1 thread waits in a blocking state for a condition to change, so it doesn't have to poll it. Since you want your streamer to continue, it might as well poll the condition each time, so you only need a mutex to synchronize the condition.
So, the extractor:
doExtraction(Frame);
mutex.lock();
Ready = true;
mutex.unlock(); // can be avoided with RAII
streamer:
while(true)
{
doStreaming();
bool localReady;
mutex.lock();
localReady = Ready;
Ready = false;
mutex.unlock();
if (localReady) prepareFrame();
}
You might want to use a condition variable to pass the frame to the extractor thread.
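In C++ the flag handling above could look roughly like this, using std::mutex with RAII locking rather than raw pthread calls. ReadyFlag and its member names are invented for this sketch.

```cpp
#include <mutex>

class ReadyFlag {
public:
    // Extractor thread: call after doExtraction() has finished.
    void set() {
        std::lock_guard<std::mutex> lock(m_);  // unlocked automatically
        ready_ = true;
    }

    // Streamer thread: returns the current value and clears it in one
    // locked step, so a signal is never seen twice.
    bool testAndClear() {
        std::lock_guard<std::mutex> lock(m_);
        bool wasReady = ready_;
        ready_ = false;
        return wasReady;
    }

private:
    std::mutex m_;
    bool ready_ = false;
};
```

The streamer would call testAndClear() once per loop iteration and prepare a new frame only when it returns true. For a single bool, a std::atomic<bool> with exchange(false) would give the same behaviour without a mutex.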
I am experimenting with multithreaded synchronization at the moment. For background, I have a set of about 100000 objects, possibly more, that I want to process in different ways multiple times per second.
Now the thing concerning me most is the performance of the synchronization.
This is what I think should work just fine (I omitted all security aspects as this is just a test program, and in case of an error the program will just crash). I wrote two functions, the first to be executed by the main thread of the program, the second to be run by all additional threads.
void SharedWorker::Start()
{
while (bRunning)
{
// Send the command to start task1
SetEvent(hTask1Event);
// Do task1 (on a subset of all objects) here
// Wait for all workers to finish task1
WaitForMultipleObjects(<NumberOfWorkers>, <ListOfTask1WorkerEvents>, TRUE, INFINITE);
// Reset the command for task1
ResetEvent(hTask1Event);
// Send the command to start task2
SetEvent(hTask2Event);
// Do task2 (on a subset of all objects) here
// Wait for all workers to finish task2
WaitForMultipleObjects(<NumberOfWorkers>, <ListOfTask2WorkerEvents>, TRUE, INFINITE);
// Reset the command for task2
ResetEvent(hTask2Event);
// Send the command to do cleanup
SetEvent(hCleanupEvent);
// Do some (on a subset of all objects) cleanup
// Wait for all workers to finish cleanup
WaitForMultipleObjects(<NumberOfWorkers>, <ListOfCleanupWorkerEvents>, TRUE, INFINITE);
// Reset the command for cleanup
ResetEvent(hCleanupEvent);
}
}
DWORD WINAPI WorkerThreads(LPVOID lpParameter)
{
while (bRunning)
{
WaitForSingleObject(hTask1Event, INFINITE);
// Unset finished cleanup
ResetEvent(hCleanedUp);
// Do task1 (on a subset of all objects) here
// Signal finished task1
SetEvent(hTask1);
WaitForSingleObject(hTask2Event, INFINITE);
// Reset task1 event
ResetEvent(hTask1);
// Do task2 (on a subset of all objects) here
// Signal finished task2
SetEvent(hTask2);
WaitForSingleObject(hCleanupEvent, INFINITE);
// Reset update event
ResetEvent(hTask2);
// Do cleanup (on a subset of all objects) here
// Signal finished cleanup
SetEvent(hCleanedUp);
}
return 0;
}
To point out my requirements, I'll just give you a little example:
Say we got the 100000 objects from above, split into 8 subsets of 12500 objects each, a modern multicore processor with 8 logical cores. The relevant part is the time. All tasks must be performed in about 8ms.
My questions are: can I get a significant speedup from split processing, or is the synchronization via events too expensive? Or is there maybe another way of synchronizing threads with less effort or processing time, if all the tasks need to be done this way?
If processing a single object is fast, do not split it between threads. Thread synchronization on Windows can eat well over 50 ms on a context switch. That time is not consumed by the system itself; it is just time during which something else is running on the system.
However, if processing each object takes around 8 ms, it does make sense to schedule the work across a pool of threads. Processing time may vary a bit between objects, though, and with large counts the worker threads will finish their work at different moments.
A better approach is to organize a synchronized object queue, to which you add objects to be processed and from which the workers take them for processing. Furthermore, since processing a single object takes considerably less time than a thread's scheduling interval, it is good to hand them to the processing threads in batches (say 10-20). You can estimate the best number of worker threads in your pool and the best batch size with tests.
So the pseudocode can look like:
main_thread:
init queue
start workers
set counter to 100000
add 100000 objects to queue
while (counter) wait();
worker_thread:
while (!done)
get up to 10 objects from queue
process objects
counter -= processed count
if (counter == 0) notify done
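A compact C++ sketch of the batching idea follows. Note that it swaps the locked queue for a shared atomic index into a vector, and the main thread simply joins the workers instead of waiting on a counter; processObject(), processAll() and the int "objects" are placeholders, not anything from the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-object work: double the value in place.
inline void processObject(int& obj) { obj *= 2; }

void processAll(std::vector<int>& objects, unsigned nWorkers, std::size_t batch) {
    if (batch == 0) batch = 1;         // a zero batch would never make progress
    std::atomic<std::size_t> next{0};  // index of the first unclaimed object

    auto worker = [&] {
        for (;;) {
            // Claim the next batch; fetch_add returns the previous value.
            std::size_t begin = next.fetch_add(batch);
            if (begin >= objects.size()) return;  // nothing left to claim
            std::size_t end = std::min(begin + batch, objects.size());
            for (std::size_t i = begin; i < end; ++i)
                processObject(objects[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nWorkers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();     // returns once every object is processed
}
```

Starting threads on every call is itself costly, so with a time budget of a few milliseconds you would keep the workers alive between rounds and only hand them a new range. The batch size and worker count are the two knobs to measure.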
I have one thread responsible for streaming data from a device in buffers. In addition, I have N threads doing some processing on that data. In my setup, I would like the streamer thread to fetch data from the device, and wait until the N threads are done with the processing before fetching new data or a timeout is reached. The N threads should wait until new data has been fetched before continuing to process. I believe that this framework should work if I don't want the N threads to repeat processing on a buffer and if I want all buffers to be processed without skipping any.
After careful reading, I found that condition variables are what I needed. I have followed tutorials and other Stack Overflow questions, and this is what I have:
global variables:
boost::condition_variable cond;
boost::mutex mut;
member variables:
std::vector<double> buffer;
std::vector<bool> data_ready; // Size equal to number of threads
data receiver loop (1 thread runs this):
while (!gotExitSignal())
{
{
boost::unique_lock<boost::mutex> ll(mut);
while(any(data_ready))
cond.wait(ll);
}
receive_data(buffer);
{
boost::lock_guard<boost::mutex> ll(mut);
set_true(data_ready);
}
cond.notify_all();
}
data processing loop (N threads run this)
while (!gotExitSignal())
{
{
boost::unique_lock<boost::mutex> ll(mut);
while(!data_ready[thread_id])
cond.wait(ll);
}
process_data(buffer);
{
boost::lock_guard<boost::mutex> ll(mut);
data_ready[thread_id] = false;
}
cond.notify_all();
}
These two loops are in their own member functions of the same class. The variable buffer is a member variable, so it can be shared across threads.
The receiver thread will be launched first. The data_ready variable is a vector of bools of size N. data_ready[i] is true if data is ready to be processed and false if the thread has already processed data. The function any(data_ready) outputs true if any of the elements of data_ready is true, and false otherwise. The set_true(data_ready) function sets all of the elements of data_ready to true. The receiver thread will check if any processing thread still is processing. If not, it will fetch data, set the data_ready flags, notify the threads, and continue with the loop which will stop at the beginning until processing is done. The processing threads will check their respective data_ready flag to be true. Once it is true, the processing thread will do some computations, set its respective data_ready flag to 0, and continue with the loop.
If I only have one processing thread, the program runs fine. Once I add more threads, I run into issues where the output of the processing is garbage. In addition, the order of the processing threads matters for some reason; in other words, the LAST thread I launch will output correct data whereas the previous threads will output garbage, no matter what the input parameters are for the processing (assuming valid parameters). I don't know if the problem is due to my threading code or if there is something wrong with my device or data processing setup. I tried adding cout statements at the processing and receiving steps, and with N processing threads I see the output as it should be:
receive data
process 1
process 2
...
process N
receive data
process 1
process 2
...
Is the usage of the condition variables correct? What could be the problem?
EDIT: I followed fork's suggestions and changed the code to:
data receiver loop (1 thread runs this):
while (!gotExitSignal())
{
if(!any(data_ready))
{
receive_data(buffer);
boost::lock_guard<boost::mutex> ll(mut);
set_true(data_ready);
cond.notify_all();
}
}
data processing loop (N threads run this)
while (!gotExitSignal())
{
// boost::unique_lock<boost::mutex> ll(mut);
boost::mutex::scoped_lock ll(mut);
cond.wait(ll);
process_data(buffer);
data_ready[thread_id] = false;
}
It works somewhat better. Am I using the correct locks?
I did not read your whole story, but looking at the code quickly I see that you use conditions wrong.
A condition is like a state: once you put a thread in a waiting condition, it gives up the CPU. So your thread will effectively stop running until some other process/thread notifies it.
In your code you have a while loop and each time you check for data you wait. That is wrong, it should be an if instead of a while. But then again it should not be there. The checking for data should be done somewhere else. And your worker thread should put itself in waiting condition after it has done its work.
Your worker threads are the consumers. And the producers are the ones that deliver the data.
I think a better construction would be to make a thread check if there is data and notify the worker(s).
PSEUDO CODE:
//producer
while (true) {
1. lock mutex
2. is data available
3. unlock mutex
if (dataAvailableVariable) {
4. notify a worker
5. set waiting condition
}
}
//consumer
while (true) {
1. lock mutex
2. do some work
3. unlock mutex
4. notify producer that work is done
5. set wait condition
}
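As a small illustration of a producer and a consumer handing data back and forth, here is a single-slot sketch in standard C++ (std::condition_variable in place of boost, which works along the same lines). Slot, put() and take() are made-up names. One thing to be aware of: a condition variable is allowed to wake up spuriously, so this sketch uses the predicate form of wait(), which re-checks the condition in a loop.

```cpp
#include <condition_variable>
#include <mutex>

class Slot {
public:
    // Producer: waits until the slot is empty, then stores a value.
    void put(int value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !full_; });
        value_ = value;
        full_ = true;
        lock.unlock();
        cv_.notify_all();  // tell the consumer that data is available
    }

    // Consumer: waits until the slot is full, then removes the value.
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });
        full_ = false;
        int value = value_;
        lock.unlock();
        cv_.notify_all();  // tell the producer that the slot is free again
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int value_ = 0;
    bool full_ = false;
};
```

The state being waited on (full_) is only ever read or written with the mutex held, which is what keeps a notification from being missed between the check and the wait.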
You should also take care that some thread always stays alive in order to avoid a deadlock, i.e. all threads being in the waiting condition.
I hope that helps you a little.