Caffe feature extraction blocks other executing threads (Qt / C++)

Background
I am developing a Qt application with three threads: main, thread1, and thread2.
main creates, starts, and displays the results of thread1 and
thread2, while also iteratively feeding them input.
thread1 performs an intensive computation (~1s) once every n
inputs, ignoring all other inputs. The majority of this time is spent on feature extraction
using the Caffe framework.
thread2 performs fast computations (~20ms) on every input, but its result for
input n+1 depends on the output thread1 produced for input n.
During thread execution, thread1 appears to block thread2 when extracting features using the Caffe network. However, thread1 does not block thread2 during other processing steps (e.g. network input preprocessing).
At first, I thought this was caused by an unmet dependency: i.e. thread1 "blocks" thread2 because input (for example) 2n + 1 is ready to be processed by thread2 but input 2n has not been fully processed by thread1.
However, from analysing the execution flow, I noticed this "blocking" behaviour was occurring whilst dependencies were met: i.e. let n = 10, thread2 would pause execution at input 15 while thread1 was extracting Caffe features from input 20.
Question
How do I prevent thread1 from blocking thread2 during Caffe feature extraction?
Code
Below is a stripped-down version of my code, which shows the key components and logic of my program.
I have highlighted the problem in comments // !!! PROBLEM: ..., which can be found in thread1worker.cpp and featureengine.cpp.
main.cpp:
int main(int argc, char* argv[])
{
QApplication app(argc, argv);
qRegisterMetaType<Mat>("Mat");
Camera camera; // grabs camera frame every 30ms, emitting newFrame(frame)
/* thread1 */
QThread* thread1 = new QThread();
Thread1Worker* thread1_worker = new Thread1Worker();
thread1_worker->moveToThread(thread1);
QThread::connect(&camera, SIGNAL(newFrame(Mat)),
thread1_worker, SLOT(doWork(Mat)));
QThread::connect(thread1, SIGNAL(finished()),
thread1_worker, SLOT(deleteLater()));
QThread::connect(thread1, SIGNAL(finished()),
thread1, SLOT(deleteLater()));
/* thread2 */
ImageQueue* thread2_images = new ImageQueue();
QThread::connect(&camera, SIGNAL(newFrame(Mat)),
thread2_images, SLOT(add(Mat)));
QThread* thread2 = new QThread();
Thread2Worker* thread2_worker = new Thread2Worker(thread2_images);
thread2_worker->moveToThread(thread2);
QThread::connect(thread1_worker, SIGNAL(workFinished(OutputType)), // thread1's result feeds thread2
thread2_worker, SLOT(addThread1Result(OutputType)));
QThread::connect(thread2, SIGNAL(finished()),
thread2_worker, SLOT(deleteLater()));
QThread::connect(thread2, SIGNAL(finished()),
thread2, SLOT(deleteLater()));
/* start threads */
thread1->start();
thread2->start();
camera.start();
return app.exec();
}
thread1worker.cpp:
Thread1Worker::Thread1Worker()
{
thread1_interval = 10; // this is "n"
is_initialized = false;
}
void Thread1Worker::doWork(Mat frame)
{
if (!is_initialized)
initialize();
// process only every nth frame; count every frame, including skipped ones
bool is_thread1_frame = isThread1Frame();
frame_count++;
if (!is_thread1_frame)
return;
// ... break frame up into multiple image patches
// !!! PROBLEM: this call blocks thread2
vector<vector<float> > features = feature_engine->extractFeatures(patches);
// ... use features to compute output
emit workFinished(output);
}
void Thread1Worker::initialize()
{
InitGoogleLogging("caffe-demo");
feature_engine = new FeatureEngine();
is_initialized = true;
}
bool Thread1Worker::isThread1Frame()
{
return frame_count % thread1_interval == 0;
}
thread2worker.cpp:
void Thread2Worker::addThread1Result(OutputType output)
{
if (!is_initialized)
initialize();
thread1_output_queue.push(output);
thread1_count++;
processFrames();
}
void Thread2Worker::processFrames()
{
size_t num_process = (thread1_count * thread1_interval) - process_count;
size_t num_queue = thread2_images->size();
for (size_t i = 0; i < num_process && i < num_queue; i++)
{
Mat frame = thread2_images->get();
if (isThread1Frame())
{
curr_result = thread1_output_queue.front();
thread1_output_queue.pop();
}
else
{
curr_result = propagator->propagate(prev_result);
}
// update
prev_result = curr_result;
process_count++;
emit resultReady(curr_result);
}
}
void Thread2Worker::initialize()
{
propagator = new Propagator();
is_initialized = true;
}
bool Thread2Worker::isThread1Frame()
{
return process_count % thread1_interval == 0;
}
featureengine.cpp:
vector<vector<float> > FeatureEngine::extractFeatures(const vector<Mat>& images)
{
// setup Caffe network for feature extraction:
Blob<float>* input_layer = net->input_blobs()[0];
int num_images = images.size();
int height = input_geometry.height;
int width = input_geometry.width;
input_layer->Reshape(num_images, num_channels, height, width);
net->Reshape();
vector<Mat> input_channels;
wrapInputLayer(input_channels);
preprocess(images, &input_channels);
// !!! PROBLEM: this ~1s computation blocks thread2
// details: https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp#L594
net->ForwardPrefilled();
// copy Caffe network output to features vector
vector<vector<float> > features;
// ...
return features;
}
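
The dependency rule that processFrames() applies can be checked in isolation. The sketch below is a Qt-free restatement of that arithmetic, assuming thread1 has delivered results for frames 0, n, 2n, and so on. The function name framesReady and its parameters are illustrative and do not appear in the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the bounds used in Thread2Worker::processFrames().
// thread1_count: results received from thread1 so far
// interval:      "n", i.e. thread1 handles every nth frame
// process_count: frames thread2 has already processed
// queued:        frames currently waiting in thread2's image queue
std::size_t framesReady(std::size_t thread1_count, std::size_t interval,
                        std::size_t process_count, std::size_t queued)
{
    assert(interval > 0);
    // Frames covered by the thread1 results received so far.
    const std::size_t covered = thread1_count * interval;
    // Guard against unsigned underflow if thread2 is somehow ahead.
    const std::size_t unblocked = covered > process_count ? covered - process_count : 0;
    return std::min(unblocked, queued);
}
```

With n = 10 and two thread1 results received (frames 0 and 10), framesReady(2, 10, 15, 8) evaluates to 5: frames 15 to 19 have no unmet dependency, which is consistent with the observation that a pause at input 15 is not explained by a missing thread1 result.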

Related

GUI app frozen while process and generate a file

I have a Qt GUI application in C++ that processes data and generates binary files.
The app works fine, but it appears frozen while it is in the loop that processes the data and writes the file.
I solved this by adding qApp->processEvents(); inside the loop, but the problem is that generating a file now takes much longer:
without qApp->processEvents(); in the loop => it takes 4 seconds
with qApp->processEvents(); in the loop => it takes 50 seconds for exactly the same file
for (unsigned long int k=0; k<DATASIZE ; k++){
qApp->processEvents();
/* Some process*/
DataToFile.push_back(Process_Function(DATA));
}
/*Generating file*/
myFile.write((char *)&DataToFile[0], DataToFile.size()*sizeof (float));
DATASIZE: around a couple of million.
Process_Function: take specific data, calculate the value and return it back.
Questions:
1- Is there a way to process the data and generate the files without the UI freezing and without the huge delay caused by qApp->processEvents();?
2- Is it possible to run qApp->processEvents(); in another thread? / OR is there another way to do it?

First, create an object that encapsulates your work:
class Generator : public QObject {
Q_OBJECT
signals:
void progress(float);
void done();
public slots:
void doWork() {
QVector<float> DataToFile;
for (unsigned long int k=0; k<DATASIZE ; k++){
/* Some process*/
DataToFile.push_back(Process_Function(DATA));
if (k % 100 == 0) { // Inform the UI thread every 100 datapoints
emit progress(float(k)/DATASIZE); // convert first: integer division would always yield 0
}
}
myFile.write((char *)&DataToFile[0], DataToFile.size()*sizeof (float));
emit done();
}
};
Then create a new thread and have the object do its work there:
QThread *t = new QThread(this);
Generator *g = new Generator;
g->moveToThread(t);
QObject::connect(t, &QThread::started, g, &Generator::doWork);
QObject::connect(g, &Generator::done, t, &QThread::quit);
QObject::connect(t, &QThread::finished, g, &QObject::deleteLater);
QObject::connect(t, &QThread::finished, t, &QThread::deleteLater);
t->start();
The first two connect statements start the work when the thread starts and stop the thread once the work is done; the last two clean up everything once the thread exits.
And you can of course connect to the Generator::progress signal to monitor the generation progress.
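
For readers who want to try the pattern without Qt, here is a minimal sketch of the same idea using only the C++ standard library: the work runs on its own thread and publishes a progress fraction that another thread can poll. The class BackgroundGenerator and the stand-in processSample are hypothetical names introduced for illustration; they are not part of the answer above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for Process_Function(DATA) from the question.
inline float processSample(std::size_t k) { return static_cast<float>(k) * 0.5f; }

class BackgroundGenerator {
public:
    // Starts the work on its own thread; the caller stays responsive.
    void start(std::size_t total) {
        assert(total > 0);
        assert(!worker_.joinable());
        worker_ = std::thread([this, total] { run(total); });
    }

    // Fraction of the work completed, in [0, 1]; safe to poll from another thread.
    float progress() const { return progress_.load(); }

    // Blocks until the work is done and returns the generated data.
    const std::vector<float>& waitForResult() {
        if (worker_.joinable())
            worker_.join();
        return data_;
    }

    ~BackgroundGenerator() {
        if (worker_.joinable())
            worker_.join();
    }

private:
    void run(std::size_t total) {
        data_.reserve(total);
        for (std::size_t k = 0; k < total; ++k) {
            data_.push_back(processSample(k));
            if (k % 100 == 0) {
                // Convert before dividing: k / total on integers would always be 0 here.
                progress_.store(static_cast<float>(k) / static_cast<float>(total));
            }
        }
        progress_.store(1.0f);
    }

    std::atomic<float> progress_{0.0f};
    std::vector<float> data_;  // written only by the worker until it is joined
    std::thread worker_;
};
```

A caller would invoke start(), poll progress() from its own loop or timer, and call waitForResult() before writing the file. In the Qt version the progress signal plays the role of the polled atomic.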

How can I pause an action in real time (Qt5)?

I have a function that reads some commands and does things related to those commands. The problem is that I want to pause the program after each command. This is my code:
void MainWindow::moveDown(){
QPoint l = ui->label->pos();
int x = l.rx();
int y = l.ry();
if(y+50 <= 630){
QPixmap pix(":/resources/img/Penguin.png");
int w = ui->label->width();
int h = ui->label->height();
ui->label->setPixmap(pix.scaled(w,h,Qt::KeepAspectRatio));
y = y+50;
ui->label->setGeometry(x, y, 50, 50);
//sleep(1);
}
}
As you can see, I have tried the sleep() function, but it pauses the program before the label starts moving. What else should I try?

Many years ago I implemented this function, which I have been using ever since. It keeps the event loop running so that the UI doesn't freeze during the sleep. It does this by executing a local event loop for the requested number of milliseconds. It also aborts the loop if the application wants to quit:
/*
* Sleep for 'ms' milliseconds without freezing the UI.
*/
void sleepLoop(const long ms)
{
QEventLoop idle_loop;
QTimer timer;
timer.setSingleShot(true);
QObject::connect(&timer, &QTimer::timeout, &idle_loop, &QEventLoop::quit);
QObject::connect(qApp, &QCoreApplication::aboutToQuit, &idle_loop, &QEventLoop::quit);
timer.start(ms);
idle_loop.exec();
}

How to solve the problem that multithreaded drawing is not smooth?

I wrote a data acquisition program with Qt. The data is collected in child threads through a double buffer guarded by QSemaphore.
void QThreadShow::run() {
m_stop=false; // when start thread,m_stop=false
int n=fullBufs.available();
if (n>0)
fullBufs.acquire(n);
while (!m_stop) {
fullBufs.acquire(); // wait for a full buffer
QVector<double> dataPackage(BufferSize);
double seq=bufNo;
if (curBuf==1)
for (int i=0;i<BufferSize;i++){
dataPackage[i]=buffer2[i]; // copy data from full buffer
}
else
for (int i=0;i<BufferSize;i++){
dataPackage[i]=buffer1[i];
}
for (int k=0;k<BufferSize;k++) {
vectorQpointFbufferData[k]=QPointF(x,dataPackage[k]);
}
emptyBufs.release(); // release a buffer
QVariant variantBufferData;
variantBufferData.setValue(vectorQpointFbufferData);
emit newValue(variantBufferData,seq); // send data to main thread
}
quit();
}
When a child thread's buffer has collected 500 data points, they are copied into a QVector and sent to the main thread, where they are appended directly to a line series in a QChartView for drawing every 20ms. I use QtCharts to chart the data.
void MainWindow::onthreadB_newValue(QVariant bufferData, double bufNo) {
// Analysis of QVariant data
CH1.hardSoftDataPointPackage = bufferData.value<QVector<QPointF>>();
if (ui->CH1_Source->currentIndex()==0) {
for (int p = 0;p<CH1.hardSoftDataPointPackage.size();p++) {
series_CH3->append(CH1.hardSoftDataPointPackage[p]);
}
}
}
There is a timer in the main thread. Its interval is 20ms, and a double variable time (time = time + 1) controls the X-axis.
void MainWindow::drawAxis(double time) {
// dynamic draw x axis
if (time<100) {
axisX->setRange(0, TimeBase/(1000/FrameRate) * 10);
// FrameRate=50
} else {
axisX->setRange(time-TimeBase/(1000/FrameRate) * 10, time);
}
}
But when I run my program, every time the child thread sends data to the main thread, the main thread freezes for a few seconds, and so does the plot. I added a second curve that gets its data from the main thread itself, and found that both curves freeze at the same time. I don't know how to solve this problem.
Besides, I want the main thread to draw the data from the child thread evenly over the 20ms, instead of drawing all the points at once.

Your main thread gets stuck because you copy (append to the series) a lot of data at once. Instead, you can collect all the data inside your thread instance without emitting a signal, and have the main thread take small pieces of the collected data every 20 ms.
Something like this:
while(!m_stop)
{
...
//QVariant variantBufferData;
//variantBufferData.setValue(vectorQpointFbufferData);
//emit newValue(variantBufferData,seq);//send data to main thread
//instead of this, just store the data in an internal buffer
m_mutex.lock();
m_internalBuffer.append(vectorQpointFbufferData);
m_mutex.unlock();
}
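The read method:
QVector<QPointF> QThreadShow::takeDataPiece()
{
int n = 4;
QVector<QPointF> piece;
piece.reserve(n);
m_mutex.lock();
// stop early if fewer than n points are buffered; takeFirst() must not be called on an empty container
for (int i = 0; i < n && !m_internalBuffer.isEmpty(); i++)
{
QPointF point = m_internalBuffer.takeFirst();
piece.append(point);
}
m_mutex.unlock();
return piece;
}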
And in the main thread, read in the timer's timeout slot:
void MainWindow::OnDrawTimer()
{
QVector<QPointF> piece = m_childThread.takeDataPiece();
//add to series
...
//drawAxis
...
}
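
The same store-then-take-pieces idea can be sketched without Qt. The class below is a hypothetical, simplified counterpart of the internal buffer (plain double samples instead of QPointF, std::mutex instead of QMutex); it returns fewer samples than requested when the buffer is running low rather than reading past the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical Qt-free counterpart of the internal buffer in QThreadShow.
class SampleBuffer {
public:
    // Called by the acquisition thread.
    void append(const std::vector<double>& samples) {
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_.insert(buffer_.end(), samples.begin(), samples.end());
    }

    // Called by the drawing thread on each timer tick.
    // Returns at most max_count samples, fewer if the buffer is running low.
    std::vector<double> takePiece(std::size_t max_count) {
        assert(max_count > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        const std::ptrdiff_t n =
            static_cast<std::ptrdiff_t>(std::min(max_count, buffer_.size()));
        std::vector<double> piece(buffer_.begin(), buffer_.begin() + n);
        buffer_.erase(buffer_.begin(), buffer_.begin() + n);
        return piece;
    }

private:
    std::mutex mutex_;
    std::deque<double> buffer_;
};
```

The scoped std::lock_guard releases the mutex on every return path, which is harder to get wrong than paired lock()/unlock() calls. Note that the piece size has to keep up with the acquisition rate on average, otherwise the buffer grows without bound.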

How to Monitor Qt Signal Event Queue Depth

There are two objects in my program. One object is emitting a signal. The other one receives the signal in a slot and processes the incoming signals one by one. Both objects are running in different threads. Now I need to measure and monitor the workload for my receiving object.
The problem is I do not know how many signals are waiting for my second object to process in the Qt signal queue. Is there a way to get the size of this queue? Or is there a workaround to find out how many signals still have to be processed?

The qGlobalPostedEventsCount() is a starting point, although it only works for the current thread.
To poll an arbitrary thread, we can use Qt's internals. The implementation is then very simple. It works even when the thread is blocked and doesn't process events.
// https://github.com/KubaO/stackoverflown/tree/master/questions/queue-poll-44440584
#include <QtCore>
#include <private/qthread_p.h>
#include <climits>
uint postedEventsCountForThread(QThread * thread) {
if (!thread)
return -1;
auto threadData = QThreadData::get2(thread);
QMutexLocker lock(&threadData->postEventList.mutex);
return threadData->postEventList.size() - threadData->postEventList.startOffset;
}
uint postedEventsCountFor(QObject * target) {
return postedEventsCountForThread(target->thread());
}
If one really wishes not to use private APIs, we can have a less straightforward solution with more overhead. First, let's recall that the lowest overhead means of "doing stuff in some object's thread" is to do said "stuff" in an event's destructor - see this answer for more details. We can post the highest priority event to the target object's event queue. The event wraps a task that invokes qGlobalPostedEventsCount, updates the count variable, and releases a mutex that we then acquire. At the time of mutex acquisition, the count has a valid value that is returned. If the target thread is unresponsive and the request times out, -1 is returned.
uint qGlobalPostedEventsCount(); // exported in Qt but not declared
uint postedEventsCountForPublic(QObject * target, int timeout = 1000) {
uint count = -1;
QMutex mutex;
struct Event : QEvent {
QMutex & mutex;
QMutexLocker lock;
uint & count;
Event(QMutex & mutex, uint & count) :
QEvent(QEvent::None), mutex(mutex), lock(&mutex), count(count) {}
~Event() {
count = qGlobalPostedEventsCount();
}
};
QCoreApplication::postEvent(target, new Event(mutex, count), INT_MAX);
if (mutex.tryLock(timeout)) {
mutex.unlock();
return count;
}
return -1;
}
And a test harness:
int main(int argc, char ** argv) {
QCoreApplication app(argc, argv);
struct Receiver : QObject {
bool event(QEvent *event) override {
if (event->type() == QEvent::User)
QThread::currentThread()->quit();
return QObject::event(event);
}
} obj;
struct Thread : QThread {
QMutex mutex;
Thread() { mutex.lock(); }
void run() override {
QMutexLocker lock(&mutex);
QThread::run();
}
} thread;
thread.start();
obj.moveToThread(&thread);
QCoreApplication::postEvent(&obj, new QEvent(QEvent::None));
QCoreApplication::postEvent(&obj, new QEvent(QEvent::None));
QCoreApplication::postEvent(&obj, new QEvent(QEvent::None));
QCoreApplication::postEvent(&obj, new QEvent(QEvent::User));
auto count1 = postedEventsCountFor(&obj);
thread.mutex.unlock();
auto count2 = postedEventsCountForPublic(&obj);
thread.wait();
auto count3 = postedEventsCountFor(&obj);
Q_ASSERT(count1 == 4);
Q_ASSERT(count2 == 4);
Q_ASSERT(count3 == 0);
}
QT = core-private
CONFIG += console c++11
CONFIG -= app_bundle
TARGET = queue-poll-44440584
TEMPLATE = app
SOURCES += main.cpp
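
A different workaround, not taken from the answer above, is to do the bookkeeping in application code: the sender increments a shared counter just before emitting, and the receiver decrements it when its slot finishes. The sketch below shows only that counter, with no Qt types; PendingCounter and its method names are hypothetical. It measures items reported through it, not Qt's internal event queue, so it only reflects the signals you instrument.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: tracks how many work items were handed to a receiver
// but not yet processed.
class PendingCounter {
public:
    // Call in the sender immediately before emitting the signal.
    void itemPosted() { pending_.fetch_add(1, std::memory_order_relaxed); }

    // Call in the receiver at the end of the slot.
    void itemProcessed() {
        const int before = pending_.fetch_sub(1, std::memory_order_relaxed);
        assert(before > 0);  // more completions than posts indicates a bookkeeping bug
        (void)before;
    }

    // Approximate backlog; may already be stale when the caller looks at it.
    int pending() const { return pending_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> pending_{0};
};
```

Because the counter is a plain atomic, a monitoring thread can read pending() at any time without touching either object's event loop.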

Multi-threading with QT + OpenCV

I'm trying to code a simple program that reads three video files (actually, 3 cameras that are in the same room), using 3 different threads. The code I'm using is the following:
mainwindow.cpp
void MainWindow::init()
{
numCams = 3;
// Resize the video for displaying to the size of the widget
int WidgetHeight = ui->CVWidget1->height();
int WidgetWidth = ui->CVWidget1->width();
for (int i = 0; i < numCams; i++){
// Create threads
threads[i] = new QThread;
// Create workers
string Path = "/Users/alex/Desktop/PruebasHilos/Videos/" + to_string(i+1) + ".m2v";
workers[i] = new Worker(QString::fromStdString(Path), i, WidgetHeight, WidgetWidth);
workers[i]->moveToThread(threads[i]);
connectSignals2Slots(threads[i], workers[i]);
threads[i]->start();
qDebug() << "Thread from camera " << (i+1) << " started";
}
}
void MainWindow::connectSignals2Slots(QThread *thread, Worker *worker)
{
connect(thread, SIGNAL(started()), worker, SLOT(readVideo()));
connect(thread, SIGNAL(finished()), thread, SLOT(deleteLater()));
connect(worker, SIGNAL(frameFinished(Mat, int)), this, SLOT(displayFrame(Mat,int)));
connect(worker, SIGNAL(finished(int)), thread, SLOT(quit()));
connect(worker, SIGNAL(finished(int)), worker, SLOT(deleteLater()));
}
void MainWindow::displayFrame(Mat frame, int index)
{
if (index == 0) {
// Camera 1
ui->CVWidget1->showImage(frame);
}
else if (index == 1) {
// Camera 2
ui->CVWidget2->showImage(frame);
}
else if (index == 2) {
// Camera 3
ui->CVWidget3->showImage(frame);
}
}
worker.cpp
Worker::Worker(QString path, int id, int WidgetHeight, int WidgetWidth) : filepath(path), index(id), WidgetHeight(WidgetHeight), WidgetWidth(WidgetWidth) {
}
Worker::~Worker(){
}
void Worker::readVideo()
{
VideoCapture cap(filepath.toStdString());
if (! cap.isOpened()) {
qDebug() << "Can't open video file " << filepath;
emit finished(index);
return;
}
Mat ActualFrame;
while (true) {
cap >> ActualFrame;
if (ActualFrame.empty()) {
// Empty frame to display when the video has finished
ActualFrame = Mat(Size(720, 576), CV_8UC3, Scalar(192, 0, 0));
emit frameFinished(ActualFrame, index);
qDebug() << "Video finished";
break;
}
// Background Subtraction
BackgroundSubtraction(ActualFrame, BackgroundMask);
emit frameFinished(ActualFrame.clone(), index);
QThread::msleep(35);
}
emit finished(index);
}
void Worker::BackgroundSubtraction(Mat ActualFrame, Mat &BackgroundMask)
{
pMOG2->apply(ActualFrame, BackgroundMask);
}
Just reading the frames from VideoCapture and displaying them in the UI through a separate class that uses QWidgets works well.
However, when I include the BackgroundSubtraction method, the UI does not display the same frame number for the three cameras; for example, Camera1 may be computing frame 100 while Camera2 and Camera3 are at frame 110.
This is because some frames are computed faster than others, which leads to synchronization problems.
I'm quite new to using threads in Qt, so I would like to add some synchronization between the threads so that I know when all three frames have been processed before calling the displayFrame method, and so that the same three frames are displayed at exactly the same time.
EDIT:
I assume that the easiest way to do this is using Barriers.
http://www.boost.org/doc/libs/1_55_0/doc/html/thread/synchronization.html#thread.synchronization.barriers . But I have no clue how to do this.
EDIT 2:
I have implemented this synchronization using barriers, and now the code looks like this:
barrier.h
#ifndef BARRIER_H
#define BARRIER_H
#include <QMutex>
#include <QWaitCondition>
#include <QSharedPointer>
// Data "pimpl" class (not to be used directly)
class BarrierData
{
public:
BarrierData(int count) : count(count) {}
void wait() {
mutex.lock();
--count;
if (count > 0)
condition.wait(&mutex);
else
condition.wakeAll();
mutex.unlock();
}
private:
Q_DISABLE_COPY(BarrierData)
int count;
QMutex mutex;
QWaitCondition condition;
};
class Barrier {
public:
// Create a barrier that will wait for count threads
Barrier(int count) : d(new BarrierData(count)) {}
void wait() {
d->wait();
}
private:
QSharedPointer<BarrierData> d;
};
#endif // BARRIER_H
updated worker.cpp
void Worker::readVideo()
{
VideoCapture cap(filepath.toStdString());
int framenumber = 0;
if (! cap.isOpened()) {
qDebug() << "Can't open video file " << filepath;
emit finished(index);
return;
}
Mat ActualFrame;
while (true) {
cap >> ActualFrame;
if (ActualFrame.empty()) {
// Empty frame to display when the video has finished
ActualFrame = Mat(Size(720, 576), CV_8UC3, Scalar(192, 0, 0));
emit frameFinished(ActualFrame, index);
qDebug() << "Video finished";
break;
}
// Background Subtraction
BackgroundSubtraction(ActualFrame, BackgroundMask);
QThread::msleep(5);
barrier.wait();
qDebug() << "Thread " << index << " processing frame " << framenumber ;
emit frameFinished(ActualFrame.clone(), index);
framenumber++;
}
emit finished(index);
}
void Worker::BackgroundSubtraction(Mat ActualFrame, Mat &BackgroundMask)
{
pMOG2->apply(ActualFrame, BackgroundMask);
}
It seems to work perfectly, however the output of the program is the following:
Thread 1 processing frame 0
Thread 0 processing frame 0
Thread 2 processing frame 0
Thread 2 processing frame 1
Thread 1 processing frame 1
Thread 0 processing frame 1
Thread 2 processing frame 2
Thread 1 processing frame 2
Thread 0 processing frame 2
Thread 2 processing frame 3
Thread 1 processing frame 3
Thread 0 processing frame 3
Thread 2 processing frame 4
Thread 1 processing frame 4
Thread 0 processing frame 4
Thread 2 processing frame 5
Thread 0 processing frame 5
Thread 1 processing frame 5
Thread 2 processing frame 6
Thread 1 processing frame 6
Thread 2 processing frame 7
Thread 0 processing frame 6
Thread 1 processing frame 7
Thread 2 processing frame 8
Thread 0 processing frame 7
Thread 1 processing frame 8
Thread 2 processing frame 9
Thread 0 processing frame 8
Thread 1 processing frame 9
Thread 1 processing frame 10
Thread 2 processing frame 10
Thread 0 processing frame 9
Thread 1 processing frame 11
Thread 2 processing frame 11
Thread 0 processing frame 10
Thread 1 processing frame 12
At the beginning the synchronization works perfectly, but then it seems that the barrier stops working and the threads no longer wait for each other...
EDIT 3: SOLVED
It seems that changing the value of
QThread::msleep(5);
to
QThread::msleep(35);
solves the synchronization problem, although I do not really understand the reason.
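
A note on the barrier as posted: BarrierData counts down once and never resets. After the first round count is 0, so the next call to wait() decrements it to -1, takes the else branch, and returns immediately. From frame 1 onward nothing is actually waiting, which is consistent with the log above; the longer msleep probably only hides this by making every thread's per-frame time nearly identical. Below is a minimal sketch of a reusable barrier, written with the C++ standard library rather than QMutex/QWaitCondition. CyclicBarrier and roundsStayInLockstep are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: unlike a plain countdown, it resets itself after each round.
class CyclicBarrier {
public:
    explicit CyclicBarrier(int parties) : parties_(parties), waiting_(0), generation_(0) {
        assert(parties > 0);
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long my_generation = generation_;
        if (++waiting_ == parties_) {
            // Last arrival: reset for the next round and release everyone.
            waiting_ = 0;
            ++generation_;
            condition_.notify_all();
        } else {
            // Sleep until the round we arrived in has completed; the predicate
            // also protects against spurious wakeups.
            condition_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    const int parties_;
    int waiting_;
    unsigned long generation_;
};

// Hypothetical check: returns true if no thread ever left round r's barrier
// before all threads had arrived for round r.
bool roundsStayInLockstep(int threads, int rounds) {
    CyclicBarrier barrier(threads);
    std::atomic<int> arrivals(0);
    std::atomic<bool> ok(true);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                arrivals.fetch_add(1);
                barrier.wait();
                if (arrivals.load() < (r + 1) * threads)
                    ok.store(false);
            }
        });
    }
    for (std::thread& th : pool)
        th.join();
    return ok.load();
}
```

The generation counter is what makes the barrier reusable: a thread that races ahead into the next round cannot be confused with one that is still waking up from the previous round.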

Even without the background subtraction, you'd need some synchronization to be sure that the same frame number is processed by each thread.
In Qt the easiest (and imho the right) way to do it is to remove the infinite loop and instead call a slot on each thread to compute the next image once all the threads have emitted their frameFinished signal.
You could further use some buffering to precompute images in your threads and just load them from that buffer. In that scenario you could do the following:
Each thread fills its buffer in an endless loop as long as free buffer space is available. If the buffer is full, the thread waits until buffer space is freed.
When your GUI has displayed the images and waited some time, it emits a signal that is connected to a slot on each thread, e.g. sendMeANewImage.
Each thread sends the next available image from its buffer, or waits (infinite loop or conditional wait) for an image if the buffer is empty. It then emits a frameFinished signal and frees the used buffer space.
When every thread has emitted the signal, display all the images, wait some time, and emit sendMeANewImage again.
This isn't thread-safe yet: reading from and writing to the buffer are critical sections. For each buffer, create a QMutex and call mutex.lock() before reading, writing, or querying the size of that buffer. Call mutex.unlock() immediately afterwards.
When a mutex is locked and another thread tries to lock it, that thread waits until the owning thread has unlocked the mutex. That way, only a single thread can be inside a critical section at a time. (The same thread must not lock a non-recursive QMutex twice, as it would deadlock.)
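
The per-thread buffer described above (the producer waits while it is full, the consumer waits while it is empty) can be sketched as follows. This is a standard-library illustration under stated assumptions, not Qt code: BoundedBuffer and pumpFrames are hypothetical names, plain ints stand in for frames, and std::condition_variable plays the role a QWaitCondition would play in a Qt implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity frame buffer shared by one reader thread
// (producer) and the display side (consumer).
template <typename T>
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the buffer is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [&] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks while the buffer is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [&] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// Hypothetical demo: a producer thread precomputes "frames" while the caller
// pulls them one at a time, as the GUI would on each sendMeANewImage.
std::vector<int> pumpFrames(int frame_count, std::size_t capacity) {
    BoundedBuffer<int> buffer(capacity);
    std::thread producer([&] {
        for (int i = 0; i < frame_count; ++i)
            buffer.push(i);
    });
    std::vector<int> received;
    for (int i = 0; i < frame_count; ++i)
        received.push_back(buffer.pop());
    producer.join();
    return received;
}
```

Waiting on a condition variable avoids the busy "infinite loop" variant mentioned above, and the scoped lock keeps every access to the queue inside the critical section. With one such buffer per camera, the display side would pop one frame from each buffer before showing all three.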
