OpenCV execution speed (for loops and meanStdDev) - C++

I'm fairly new to OpenCV and C++ (learning it now after doing a fair share of image processing in MATLAB and LabVIEW).
I'm having a weird issue that I wanted to ask your opinion about.
I'm trying to do a fairly simple thing: a moving-window 1x9 standard deviation on a grayscale image (~4500x2000 px).
Here is the heart of the code:
Mat src = imread("E:\\moon project\\Photos\\Skyline testing\\IMGP6043 sourse.jpg");
Mat src_gray;
Scalar roi_mean, roi_stdev;
Mat stdev_map(src.rows, src.cols, CV_64FC1, Scalar(0));
cvtColor(src, src_gray, CV_BGR2GRAY);
int t = clock();
for (int i = 0; i < src_gray.cols - 1; i++)
{
    for (int j = 0; j < src_gray.rows - 8; j++)
    {
        meanStdDev(src_gray.col(i).rowRange(j, j + 9), roi_mean, roi_stdev);
        stdev_map.at<double>(j, i) = roi_stdev[0];
    }
}
t = clock() - t;
cout << "stdev calc : " << t << " msec" << endl;
Now, on the aforementioned image it takes 35 seconds to run the double loop (the delta-t value), and even if I throw out the meanStdDev call and just assign a constant to stdev_map.at(j, i), the double loop still takes 14 seconds.
I'm pretty sure I'm doing something wrong, since LabVIEW chews through this baby in only 2.5 seconds with the exact same math.
Please help me.

To answer your question and some of the comments: compiling the lib (and your code) in release mode will surely speed up the computation; by how much depends on the setup, for example if you are using Eigen it will probably speed things up a lot.
If you really want to do the loop yourself, consider getting a pointer to the data directly via mat.data or mat.ptr<cv::Vec3b>.
If you want to speed up the task of computing the mean/stdDev on any part of your image, then use integral images. The documentation is pretty clear about them, and I'm pretty sure it will take less than 2.5 s, probably even in debug mode.
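For reference, here is roughly what that could look like for the 1x9 column window from the question (a sketch only, not a benchmarked implementation; it assumes src_gray is the 8-bit grayscale image from the code above, and uses cv::integral so each window needs a few table lookups instead of a meanStdDev call):
cv::Mat sum, sqsum;
cv::integral(src_gray, sum, sqsum, CV_64F);        // (rows+1) x (cols+1) tables of sums and squared sums
const int win = 9;
cv::Mat stdev_map(src_gray.rows, src_gray.cols, CV_64FC1, cv::Scalar(0)); // same 64-bit output map as in the question
for (int i = 0; i < src_gray.cols; i++)
{
    for (int j = 0; j + win <= src_gray.rows; j++)
    {
        // sum and squared sum of the window in column i, rows [j, j+win)
        double s  = sum.at<double>(j + win, i + 1) - sum.at<double>(j, i + 1)
                  - sum.at<double>(j + win, i)     + sum.at<double>(j, i);
        double sq = sqsum.at<double>(j + win, i + 1) - sqsum.at<double>(j, i + 1)
                  - sqsum.at<double>(j + win, i)     + sqsum.at<double>(j, i);
        double mean = s / win;
        stdev_map.at<double>(j, i) = std::sqrt(std::max(0.0, sq / win - mean * mean)); // population stdev, like meanStdDev
    }
}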

Related

How can I make it faster in C++11 with std::vector?

I have cv::Mat Mat_A and cv::Mat Mat_B, both (800000 x 512) floats,
and the code below looks slow.
int rows = Mat_B.rows;
double dis;
cv::Point point;
cv::repeat(img, rows, 1, Mat_A);   // img is a single row, tiled to fill Mat_A (rows x 512)
Mat_A = Mat_A - Mat_B;
cv::pow(Mat_A, 2, Mat_A);
cv::reduce(Mat_A, Mat_A, 1, CV_REDUCE_SUM);
cv::minMaxLoc(Mat_A, &dis, 0, &point, 0);
How can I do this with std::vector?
I think it should be faster.
On my 2.4 GHz MacBook Pro it took 4 seconds, which is very slow.
I don't think you should use std::vector for these operations. Image processing (CV, aka computer vision) algorithms tend to be quite computationally heavy because there is so much data to deal with. The OpenCV 2.0 C++ API is highly optimized for this kind of operation; e.g. cv::Mat has a header, and whenever a cv::Mat is copied with the copy assignment or copy constructor, only the header and a pointer to the data are copied. Reference counting is used to keep track of instances, so memory management is done for you, and that's a good thing.
https://docs.opencv.org/2.4/doc/tutorials/core/mat_the_basic_image_container/mat_the_basic_image_container.html
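A small illustration of that shallow-copy behaviour (my own example, not taken from the tutorial above):
cv::Mat A = cv::Mat::ones(3, 3, CV_32F);
cv::Mat B = A;          // copies only the header; B shares A's pixel data
B.at<float>(0, 0) = 5;  // the change is visible through A as well
cv::Mat C = A.clone();  // clone() makes a deep copy with its own buffer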
You could try to compile without debug symbols, i.e. release vs debug. You can also try to compile with optimization flags, e.g. -O3 for gcc, which should reduce the size of your binary and speed up runtime operations. It might make a difference.
https://www.rapidtables.com/code/linux/gcc/gcc-o.html
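For instance, with gcc a release-style build is just a matter of flags (an illustrative command line; the pkg-config package name depends on your OpenCV install, e.g. opencv vs opencv4):
g++ -O3 -DNDEBUG main.cpp -o main `pkg-config --cflags --libs opencv`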
Another thing you could try is to give your process a higher priority, i.e. the higher the priority, the less the process yields the CPU. Again, that might not make a lot of difference; it all depends on the other processes and their priorities, etc.
https://superuser.com/questions/42817/is-there-any-way-to-set-the-priority-of-a-process-in-mac-os-x
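On macOS/Linux that comes down to nice/renice (shown purely as an illustration; <pid> stands for the id of your running process, and raising the priority above the default needs sudo):
sudo renice -n -5 -p <pid>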
I hope that helps a bit.
Well, your thinking is wrong.
Why your program is slow:
Your CPU has to loop through a lot of numbers and do calculations, which makes the computational complexity high. That's why it's slow. Your program's speed is proportional to the size of Mat_A and Mat_B; you can check this by reducing or increasing their size.
Can we accelerate it with std::vector?
Sorry, but no. Using std::vector will not reduce the computational complexity. OpenCV's matrix arithmetic is about as good as it gets; rewriting it yourself will only lead to slower code.
How to accelerate the calculation: you need to enable the acceleration options for OpenCV.
You can see them at https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options . Intel provides the Intel MKL library to accelerate matrix calculations; you could try that first.
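An illustrative CMake invocation for such a build (the exact flag names can change between OpenCV versions, so double-check them against the wiki page above):
cmake -DCMAKE_BUILD_TYPE=Release -DWITH_IPP=ON -DCPU_BASELINE=AVX2 ..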
Personally, the easiest approach is to use the GPU, but your machine doesn't have one, so that's out of scope here.
You keep iterating over the data over and over again to do independent operations on it.
Something like the following iterates over the data only once.
// assumes Mat_B and img are cv::Mat, and img is a single row with Mat_B.cols elements
using px_t = float; // you mentioned float, so I'll assume both img and Mat_B use floats
int rows = Mat_B.rows;
cv::Mat output(1, rows, Mat_B.type());
auto output_ptr = output.ptr<px_t>(0);
auto img_ptr = img.ptr<px_t>(0);
int min_idx = 0;
int max_idx = 0;
px_t min_ele = std::numeric_limits<px_t>::max();
px_t max_ele = std::numeric_limits<px_t>::lowest(); // note: min() would be the smallest positive float
for (int i = 0; i < rows; ++i)
{
    output_ptr[i] = 0;
    auto mat_row = Mat_B.ptr<px_t>(i);
    for (int j = 0; j < Mat_B.cols; ++j)
    {
        output_ptr[i] += (img_ptr[j] - mat_row[j]) * (img_ptr[j] - mat_row[j]);
    }
    if (output_ptr[i] < min_ele)
    {
        min_idx = i;
        min_ele = output_ptr[i];
    }
    if (output_ptr[i] > max_ele)
    {
        max_idx = i;
        max_ele = output_ptr[i];
    }
}
While I am also not sure whether it is faster, you can do this, assuming Mat_B contains uchar:
std::vector<uchar> array_B(Mat_B.rows * Mat_B.cols);
if (Mat_B.isContinuous())
    array_B.assign(Mat_B.data, Mat_B.data + Mat_B.total());

Dividing image to tiles in qt

I have a very big image (31000x26000 pixels). I need to create tiles of a given size from this image and store them. I'm trying to use Qt's QImageReader, but I've noticed that after calling setClipRect for the second time, it can't read from the image.
The code I have so far works, but is very slow (the first row takes 7 seconds, the second 14, the third 21, and so on...)
for (int i = 0; i < tilesPerRow; i++){
    for (int j = 0; j < tilesPerCol; j++){
        QImageReader reader(curImage);
        reader.setClipRect(QRect(j*(tileSize-OVERLAP), i*(tileSize-OVERLAP), tileSize, tileSize));
        QImage img = reader.read();
        if (img.isNull())
            qDebug() << reader.errorString();
        else{
            retImg.setTile(img, i, j);
        }
    }
}
What am I doing wrong? Is it reasonable that I have to create a new reader each time? Does the location of the tile I'm trying to access affect speed and performance? If you have any suggestions for a better practice, I would appreciate them.

Image Magick slow drawing

I am trying to draw a bunch of lines on an image using the ImageMagick library (Magick++ API), and the total execution time appears to be quite large.
Are there any ways to optimize IMagick drawing performance?
int SIZE = 700, LINES_NUM = 6000;
Image outputImage(Geometry(SIZE, SIZE), Color("white"));
for (int i = 0; i < LINES_NUM; i++) {
    outputImage.draw(DrawableLine(lines[i].x1, lines[i].y1,
                                  lines[i].x2, lines[i].y2));
}
Try to avoid repeated Magick::Image::draw calls.
std::vector<Magick::Drawable> drawList;
for (int i = 0; i < LINES_NUM; i++) {
    drawList.push_back(DrawableLine(lines[i].x1, lines[i].y1,
                                    lines[i].x2, lines[i].y2));
}
outputImage.draw(drawList);
Also ensure that the ImageMagick libraries have been compiled with OpenMP support. If you're going for speed rather than quality, I would recommend recompiling without High Dynamic Range Imagery (--enable-hdri=no) and with a low quantum depth (--with-quantum-depth=8).
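Putting those switches together, the rebuild would look roughly like this (illustrative only; check ./configure --help for the exact option names in your ImageMagick version):
./configure --enable-openmp --enable-hdri=no --with-quantum-depth=8
make && sudo make install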

OpenCV - Ensemble of exemplar SVMs

I have been trying to implement an ensemble of exemplar SVMs using OpenCV. The general idea behind it is that, given K exemplars for training (e.g., images of cars), one can train K SVMs, where each SVM is trained using one positive sample and K-1 negatives. At test time, the ensemble fires K times, with the highest score being the best prediction.
To implement this, I am using OpenCV's SVM implementation (current git master - 3.0 at the time of writing) and the C++ interface. As features, I am using HOGs, with a response of ~7000 values, hence each image has a 7000-dimensional feature vector.
The problem I am facing is that the various SVMs do not train properly. Actually, they do not train at all! The training executes extremely fast and always returns 1 support vector per SVM, with alpha = 1.0. I am not sure if this is due to the fact that I have a single positive versus many (>900) negatives, or if it's simply a bug in my code. However, after looking at it several times, I cannot spot any obvious mistakes.
This is how I set up my problem (assuming we got the HOG responses for the entire dataset and put them in a std::vector<std::vector<float> > trainingData). Please note that EnsambleSVMElement is a struct holding the SVM plus a bunch of other info.
In brief: I set up a training matrix where each row contains the HOG response for a particular sample. I then train each SVM separately. For each training iteration, I create a label vector where each entry is set to -1 (negative sample), except the entry associated with the current SVM I am training, which is set to 1 (so if I am training entry 100, the only positive label will be at labels[100]).
Training code
int ensambles = trainingData.size();
if (ensambles > 1)
{
    //get params to normalise the data in [0-1]
    std::vector<float> mins(trainingData.size());
    std::vector<float> maxes(trainingData.size());
    for (int i = 0; i < trainingData.size(); ++i)
    {
        mins[i] = *std::min_element(trainingData[i].begin(), trainingData[i].end());
        maxes[i] = *std::max_element(trainingData[i].begin(), trainingData[i].end());
    }
    float min_val = *std::min_element(mins.begin(), mins.end());
    float max_val = *std::min_element(maxes.begin(), maxes.end());
    int featurevector_size = trainingData[0].size();
    if (featurevector_size > 0)
    {
        //set-up training data. i-th row contains HOG response for sample i
        cv::Mat trainingDataMat(ensambles, featurevector_size, CV_32FC1);
        for (int i = 0; i < trainingDataMat.rows; ++i)
            for (int j = 0; j < trainingDataMat.cols; ++j)
                trainingDataMat.at<float>(i, j) = (trainingData.at(i).at(j) - min_val) / (max_val - min_val); //make sure data are normalised in [0-1] - libSVM constraint
        for (int i = 0; i < ensambles; ++i)
        {
            std::vector<int> labels(ensambles, -1);
            labels[i] = 1; //one positive only, and is the current sample
            cv::Mat labelsMat(ensambles, 1, CV_32SC1, &labels[0]);
            cv::Ptr<cv::ml::SVM> this_svm = cv::ml::StatModel::train<SVM>(trainingDataMat, ROW_SAMPLE, labelsMat, svmparams);
            ensamble_svm.push_back(EnsambleSVMElement(this_svm));
            Mat sv = ensamble_svm[i].svm->getSupportVectors();
            std::cout << "SVM_" << i << " has " << ensamble_svm[i].svm->getSupportVectors().rows << " support vectors." << std::endl;
        }
    }
    else
        std::cout << "You passed empty feature vectors!" << std::endl;
}
else
    std::cout << "I need at least 2 SVMs to create an ensamble!" << std::endl;
The cout always prints "SVM_i has 1 support vectors".
For completeness, these are my SVM parameters:
cv::ml::SVM::Params params;
params.svmType = cv::ml::SVM::C_SVC;
params.C = 0.1;
params.kernelType = cv::ml::SVM::LINEAR;
params.termCrit = cv::TermCriteria(cv::TermCriteria::MAX_ITER, (int)1e4, 1e-6);
Varying C between 0.1 and 1.0 doesn't affect the results. Neither does setting up weights for the samples, as read here. Just for reference, this is how I am setting up the weights (big penalty for negatives):
cv::Mat1f class_weights(1,2);
class_weights(0,0) = 0.01;
class_weights(0,1) = 0.99;
params.classWeights = class_weights;
There is clearly something wrong either in my code or in my formulation of the problem. Can anyone spot it?
Thanks!
Have you made any progress?
My guess is that your C parameter is too low; try bigger values (10, 100, 1000).
Another important aspect is that in the Exemplar-SVM framework the training phase is not this simple. The author alternates between a training step and a hard-negative mining step in order to make training more effective (a rough sketch follows below).
If you need more details than what is reported in the Exemplar-SVM article, you can look at Malisiewicz's PhD thesis: http://people.csail.mit.edu/tomasz/papers/malisiewicz_thesis.pdf
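A rough sketch of that alternation (my own outline, not the author's code; trainLinearSVM and mineHardNegatives are hypothetical helpers standing in for the training call and the hard-negative scan over the negative images):
cv::Mat positives = exemplarHog;             // 1 x 7000 HOG of the single exemplar
cv::Mat negatives = initialNegatives;        // N x 7000 HOGs of random negative windows
for (int round = 0; round < maxRounds; ++round)
{
    cv::Ptr<cv::ml::SVM> svm = trainLinearSVM(positives, negatives); // hypothetical helper
    cv::Mat hard = mineHardNegatives(svm, negativeImages);           // hypothetical: windows the SVM wrongly scores high
    if (hard.empty())
        break;                               // no new hard negatives -> done
    cv::vconcat(negatives, hard, negatives); // grow the negative set and retrain
}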
I think you have a small error here: on the second line, I think it should be
*std::max_element(...), not *std::min_element(...):
float min_val = *std::min_element(mins.begin(), mins.end());
float max_val = *std::min_element(maxes.begin(), maxes.end());

Access pixel values without loading image in memory for large images

I need to compute the mean value of an image using the CImg library, like this:
int i = 0;
float mean = 0;
CImg<float> img("image.cimg");
float *ptr = img.data(); //retrieves pointer to the first value
while (i < img.width()*img.height()*img.spectrum()) {
    mean += *(ptr + i);
    ++i;
}
std::cout << "mean: " << mean/i << std::endl;
I know that img.mean() would do the trick, but here I want to do it in a low-level way.
When the size of the image gets too large, the 3rd line of my code consumes too many resources, because according to the documentation it stores all the image pixels in a memory buffer at the same time.
I thought about an even lower-level solution using the system calls open() and read(), as follows:
int i = 0;
int k = WIDTH*HEIGHT*SPECTRUM; //assuming these values are known
float mean = 0, aux;
int fd = open("image.cimg", O_RDONLY);
while (i < k) {
    read(fd, &aux, sizeof(float));
    mean += aux;
    ++i;
}
close(fd);
std::cout << "mean: " << mean/i << std::endl;
But the results I obtain now don't make any sense. I wonder whether this solution makes sense at all, whether the image is stored on disk the same way it is when loaded into memory, and whether in the end this solution would save time and memory or not.
The problem is the second line of your code, because you have made mean (although it would be better named sum) a simple float. As each pixel in your image is also a float, you will run into precision problems if your image is, say, 10,000x10,000, because you would be trying to store the sum of 100M floats in a float.
The easiest solution is to change line 2 to:
double mean=0;
As an alternative, you can calculate the mean incrementally as you go along, without the accumulator losing precision, like this:
float mean = 0;
int i = 1;
while (i <= img.width()*img.height()*img.spectrum()) {
    mean += (*(ptr + i - 1) - mean) / i;   // running mean of the first i values
    ++i;
}
By the way, if you have really large images, may I recommend vips; it is very fast and very memory-efficient. For example, if I create a 10,000x10,000 pixel TIF and ask vips to average it from the command line:
time vips avg image.tif --vips-leak
0.499994
memory: high-water mark 7.33 MB
real 0m0.384s
user 0m0.492s
sys 0m0.233s
You can see it takes 0.4 seconds and peaks out at about 7 MB of memory usage.