OpenCV: Fast template matching algorithm - C++

I am currently trying to look for an image inside a video. The main goal is to follow some actions on the video like a pressed button or a pop-up window displayed on the screen.
The code I'm using relies on OpenCV's template matching function:
// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
if( matchingMethod == CV_TM_SQDIFF || matchingMethod == CV_TM_SQDIFF_NORMED )
    matchLoc = minLoc;
else
    matchLoc = maxLoc;

if( !((matchLoc.x == 0) && (matchLoc.y == 0)) || maxVal >= 0.8 )
    return TRUE;
return FALSE;
}
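For context, the snippet above presumably sits inside a flow like the following minimal sketch (the names img, templ and result are assumptions, and CV_TM_CCOEFF_NORMED is just an example method for which maxVal thresholds such as 0.8 make sense):
// Sketch of the surrounding matching flow (assumed variable names).
cv::Mat result;
cv::matchTemplate(img, templ, result, matchingMethod);   // e.g. CV_TM_CCOEFF_NORMED, maxVal in [-1, 1]

double minVal, maxVal;
cv::Point minLoc, maxLoc;
cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc);
// minLoc/maxLoc and maxVal are then used exactly as in the snippet above.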
The test is done with these two templates:
And the full image is a 3840x2160 image (I can't post the whole image since it is too big as a BMP):
1) The question is: how is it possible that, for two templates with very few pixels of difference, the algorithm can detect the first one but completely skip the second one?
2) Is it possible that color depth could cause problems in the detection?
Both templates are loaded as BMP files at 24-bit depth. The source image is converted to 24-bit depth.
The threshold is set to 0.92 for good accuracy.
MaxLevels is set to 1 for very good accuracy, since 2 does not find any matches.
Thank you for your help and advice.

For those who maybe have the same issue, I just had to manage the return value differently.
Instead of
if( !((matchLoc.x == 0) && (matchLoc.y == 0)) || maxVal >= 0.8)
return TRUE;
which returns true as soon as a potential match (an 80% match) has been found.
Now I return true only if maxVal is greater than or equal to 0.99, which means a very high match.
if( maxVal >= 0.99)
return TRUE;
The second element I changed is the threshold value used to classify the pixel values. I decreased it from 0.94 to 0.82 to get more possible matches, and then filter them using maxVal.

Related

Smoothing a contour with a lookup table / levels mapping (OpenCV)

I'm trying to smooth jagged contours drawn by OpenCV's drawContours() method. I'm applying a Gaussian blur to the contour, then trying to use a lookup table to map the pixel intensities.
However, I don't know what values to use in my lookup table. Right now I'm just guessing at arbitrary numbers. I put together a small mockup: The first two images are results directly from OpenCV. The last image is achieved through Photoshop's levels feature. As you can see it's smoother.
How do I know what values to use in my look up table?
std::vector<uchar> lut(256);   // uchar, not char: assigning 255 to a signed char would overflow
for (int i = 0; i <= 255; ++i) {
    if (i >= 75)      lut[i] = 255;   // force bright pixels to white
    else if (i <= 25) lut[i] = 0;     // force dark pixels to black
    else              lut[i] = i;     // leave the mid range untouched
}
cv::LUT(contoursOverlay, lut, contoursOverlay);
Have you thought of applying only a dilation filter (and optionally a blur afterward)? http://docs.opencv.org/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.html
It could be simpler and give a better result. I'm not sure the lookup-table approach is good practice in your case.
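A minimal sketch of that suggestion (assuming contoursOverlay is the overlay image from the question; the kernel and blur sizes are guesses to tune):
// Sketch: dilate the drawn contours, then blur them to soften the edges.
cv::Mat element = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
cv::dilate(contoursOverlay, contoursOverlay, element);                   // thicken the jagged contour
cv::GaussianBlur(contoursOverlay, contoursOverlay, cv::Size(5, 5), 0);   // optional smoothing afterward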

Is there a way to fill a linear gradient along a diagonal?

I am trying to write code that will fill a rectangular region with a gradient that varies along a diagonal of that region. I had thought that I could play with the direction parameter as follows:
context->GradientFillLinear(
wxrect,
get_wx_colour(gradient.front()),
get_wx_colour(gradient.back()),
wxNORTH | wxEAST);
When I do this, the compiler converts the direction subexpression to an int and fails to compile because of a type mismatch. I suspect that gradients can only be filled horizontally or vertically and this is why the parameter is written expecting an enum value. Can anyone confirm this suspicion?
As of wxWidgets 3.0.2, the implementation of GradientFillLinear eventually calls a specific implementation which looks somewhat like:
wxDCImpl::DoGradientFillLinear()
{
    ...
    if ( nDirection == wxEAST || nDirection == wxWEST )
    {
        ...
    }
    else // nDirection == wxNORTH || nDirection == wxSOUTH
    {
        ...
    }
So your suspicion appears to be correct: even if you did manage to somehow pass wxNORTH | wxEAST as the direction argument of GradientFillLinear, the implementation would not have supported it.
As SleuthEye's answer correctly says, this can't be done directly, but you can always apply a transformation to rotate the horizontal or vertical gradient by 45 degrees.
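For illustration, one way to get the same effect is to drop down to wxGraphicsContext, whose linear gradient brush takes arbitrary start and end points, so the gradient axis can simply run along the rectangle's diagonal instead of being rotated afterwards. This is only a sketch: it assumes 'dc' is a wxWindowDC (or similar paint DC), and reuses wxrect and get_wx_colour from the question.
// Sketch: diagonal gradient via wxGraphicsContext (assumed variables: dc, wxrect, gradient).
wxGraphicsContext* gc = wxGraphicsContext::Create(dc);
if (gc)
{
    wxGraphicsBrush brush = gc->CreateLinearGradientBrush(
        wxrect.GetLeft(),  wxrect.GetTop(),       // gradient start: top-left corner
        wxrect.GetRight(), wxrect.GetBottom(),    // gradient end: bottom-right corner
        get_wx_colour(gradient.front()),
        get_wx_colour(gradient.back()));
    gc->SetBrush(brush);
    gc->DrawRectangle(wxrect.GetX(), wxrect.GetY(), wxrect.GetWidth(), wxrect.GetHeight());
    delete gc;
}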

Recognizing an image from a list with OpenCV SIFT using the FLANN matching

The point of the application is to recognize an image from a predefined list of images. The images have had their SIFT descriptors extracted and saved to files. Nothing interesting here:
std::vector<cv::KeyPoint> detectedKeypoints;
cv::Mat objectDescriptors;
// Extract data
cv::SIFT sift;
sift.detect(image, detectedKeypoints);
sift.compute(image, detectedKeypoints, objectDescriptors);
// Save the file
cv::FileStorage fs(file, cv::FileStorage::WRITE);
fs << "descriptors" << objectDescriptors;
fs << "keypoints" << detectedKeypoints;
fs.release();
Then the device takes a picture. SIFT descriptors are extracted in the same way. The idea now was to compare the descriptors to the ones from the files. I am doing that using the FLANN matcher from OpenCV. I am trying to quantify the similarity, image by image. After going through the whole list I should have the best match.
const cv::Ptr<cv::flann::IndexParams>& indexParams = new cv::flann::KDTreeIndexParams(1);
const cv::Ptr<cv::flann::SearchParams>& searchParams = new cv::flann::SearchParams(64);
// Match using Flann
cv::Mat indexMat;
cv::FlannBasedMatcher matcher(indexParams, searchParams);
std::vector< cv::DMatch > matches;
matcher.match(objectDescriptors, readDescriptors, matches);
After matching, I understand that I get a list of the closest found distances between the feature vectors. I find the minimum distance and, using it, I can count "good matches" and even get a list of the respective points:
// Count the number of matches where the distance is less than 2 * min_dist
int goodCount = 0;
for (int i = 0; i < objectDescriptors.rows; i++)
{
if (matches[i].distance < 2 * min_dist)
{
++goodCount;
// Save the points for the homography calculation
obj.push_back(detectedKeypoints[matches[i].queryIdx].pt);
scene.push_back(readKeypoints[matches[i].trainIdx].pt);
}
}
I'm showing only the simple parts of the code to make this easier to follow; I know some of it doesn't need to be here.
Continuing, I was hoping that simply counting the number of good matches like this would be enough, but it turned out mostly just to point me to the image with the most descriptors. What I tried after this was computing the homography. The aim was to compute it and see whether it's a valid homography or not. The hope was that a good match, and only a good match, would have a homography that is a good transformation. Creating the homography was done simply using cv::findHomography on obj and scene, which are std::vector<cv::Point2f>. I checked the validity of the homography using some code I found online:
bool niceHomography(cv::Mat H)
{
std::cout << H << std::endl;
const double det = H.at<double>(0, 0) * H.at<double>(1, 1) - H.at<double>(1, 0) * H.at<double>(0, 1);
if (det < 0)
{
std::cout << "Homography: bad determinant" << std::endl;
return false;
}
const double N1 = sqrt(H.at<double>(0, 0) * H.at<double>(0, 0) + H.at<double>(1, 0) * H.at<double>(1, 0));
if (N1 > 4 || N1 < 0.1)
{
std::cout << "Homography: bad first column" << std::endl;
return false;
}
const double N2 = sqrt(H.at<double>(0, 1) * H.at<double>(0, 1) + H.at<double>(1, 1) * H.at<double>(1, 1));
if (N2 > 4 || N2 < 0.1)
{
std::cout << "Homography: bad second column" << std::endl;
return false;
}
const double N3 = sqrt(H.at<double>(2, 0) * H.at<double>(2, 0) + H.at<double>(2, 1) * H.at<double>(2, 1));
if (N3 > 0.002)
{
std::cout << "Homography: bad third row" << std::endl;
return false;
}
return true;
}
I don't understand the math behind this so, while testing, I sometimes replaced this function with a simple check whether the determinant of the homography was positive. The problem is that I kept having issues here. The homographies were either all bad, or good when they shouldn't have been (when I was checking only the determinant).
I figured I should actually use the homography and for a number of points just compute their position in the destination image using their position in the source image. Then I would compare these average distances, and I would ideally get a very obvious smaller average distance in the case of the correct image. This did not work at all. All the distances were colossal. I thought I might have used the homography the other way around to calculate the right position, but switching obj and scene with each other gave similar results.
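For clarity, the reprojection check described above would look roughly like this; it is only a sketch following the post's variable names obj, scene and H, not the exact code that was used:
// Sketch: project the object points through H and measure the mean distance
// to the matched scene points (assumed variables: obj, scene, H).
std::vector<cv::Point2f> projected;
cv::perspectiveTransform(obj, projected, H);           // obj -> scene direction

double meanError = 0.0;
for (size_t i = 0; i < projected.size(); ++i)
{
    cv::Point2f d = projected[i] - scene[i];
    meanError += std::sqrt(d.x * d.x + d.y * d.y);
}
if (!projected.empty())
    meanError /= projected.size();                     // a small mean error suggests a plausible match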
Other things I tried were SURF descriptors instead of SIFT, BFMatcher (brute force) instead of FLANN, getting the n smallest distances for every image instead of a number depending on the minimum distance, and getting distances depending on a global maximum distance. None of these approaches gave me definitively good results, and I feel stuck now.
My only remaining strategy would be to sharpen the images, or even turn them into binary images using some local threshold or some algorithm used for segmentation. I am looking for any suggestions or mistakes anyone can see in my work.
I don't know whether this is relevant, but I added some of the images I am testing this on. Many times in the test images most of the SIFT vectors come from the frame (higher contrast) rather than the painting. This is why I'm thinking sharpening the images might work, but I don't want to go deeper in case something I did previously is wrong.
The gallery of images is here, with the descriptions in the titles. The images are of quite high resolution; please view them in case they give some hints.
You can test whether, when matching, the lines between the source image and the target image are relatively parallel. If it's not a correct match, you'll have a lot of noise and the lines won't be parallel.
See the attached image, which shows a correct match (using SURF and BF) - all the lines are mostly parallel (though I should point out that this is an easy example).
You are going the correct way.
First, use the second-nearest-neighbor ratio test instead of your "good match by 2*min_dist" criterion: https://stackoverflow.com/a/23019889/1983544.
Second, use the homography the other way. When you find a homography, you get not only the H matrix but also the number of correspondences consistent with it. Check whether that is a reasonable number, say >= 15; if it is less, the object is not matched. (A sketch of these first two points follows below.)
Third, if you have a big viewpoint change, SIFT or SURF are unable to match the images. Try MODS instead (http://cmp.felk.cvut.cz/wbs/ has Windows and Linux binaries, as well as a paper describing the algorithm) or ASIFT (much slower and matches much worse, but open source): http://www.ipol.im/pub/art/2011/my-asift/
Or at least use an MSER or Hessian-Affine detector instead of SIFT (retaining SIFT as the descriptor).
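A rough sketch of the ratio test and the inlier count, reusing the variable names from the question (matcher, objectDescriptors, readDescriptors, detectedKeypoints, readKeypoints); the 0.8 ratio and the threshold of 15 inliers are typical values to tune:
// Sketch: Lowe's second-nearest-neighbour ratio test plus RANSAC inlier count.
std::vector<std::vector<cv::DMatch> > knnMatches;
matcher.knnMatch(objectDescriptors, readDescriptors, knnMatches, 2);

std::vector<cv::Point2f> obj, scene;
for (size_t i = 0; i < knnMatches.size(); ++i)
{
    if (knnMatches[i].size() == 2 &&
        knnMatches[i][0].distance < 0.8f * knnMatches[i][1].distance)   // ratio test
    {
        obj.push_back(detectedKeypoints[knnMatches[i][0].queryIdx].pt);
        scene.push_back(readKeypoints[knnMatches[i][0].trainIdx].pt);
    }
}

if (obj.size() >= 4)
{
    std::vector<uchar> inlierMask;
    cv::Mat H = cv::findHomography(obj, scene, CV_RANSAC, 3, inlierMask);
    int inliers = cv::countNonZero(inlierMask);   // correspondences consistent with H
    bool matched = (inliers >= 15);               // threshold suggested in the answer above
}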

OpenCV floodFill() fills unconnected regions

I have implemented the connected component identification algorithm from here, but it seems that cv::floodFill(...) fills unconnected regions in some cases.
First of all, here is the code:
void ImageMatchingOpenCV::getConnectedComponents(const cv::Mat& binImg, vector<vector<cv::Point>>& components, vector<vector<cv::Point>>& contours, const int minSize)
{
    cv::Mat ccImg;
    binImg.convertTo(ccImg, CV_32FC1);
    int gap = startPointParams.gap;
    int label = 1;
    for(int y=gap; y<binImg.rows-gap; ++y)
    {
        for(int x=gap; x<binImg.cols-gap; ++x)
        {
            if((int)ccImg.at<float>(y, x)!=255) continue;
            cv::Rect bBox;
            cv::floodFill(ccImg, cv::Point(x, y), cv::Scalar(label), &bBox, cv::Scalar(0), cv::Scalar(0), 4 /*| cv::FLOODFILL_FIXED_RANGE*/);
            if(bBox.x<gap || bBox.y<gap || bBox.x+bBox.width>=binImg.cols-gap || bBox.y+bBox.height>=binImg.rows-gap) continue;
            components.push_back(vector<cv::Point>());
            contours.push_back(vector<cv::Point>());
            for(int i=bBox.y; i<bBox.y+bBox.height; ++i)
            {
                for(int j=bBox.x; j<bBox.x+bBox.width; ++j)
                {
                    if((int)ccImg.at<float>(i, j)!=label) continue;
                    components.back().push_back(cv::Point(j, i));
                    if( (int)ccImg.at<float>(i+1, j)!=label
                     || (int)ccImg.at<float>(i-1, j)!=label
                     || (int)ccImg.at<float>(i, j+1)!=label
                     || (int)ccImg.at<float>(i, j-1)!=label) contours.back().push_back(cv::Point(j, i));
                }
            }
            if(components.back().size()<minSize)
            {
                components.pop_back();
                contours.pop_back();
            }
            else
            {
                ++label;
                if(label==255) ++label;
                break;
            }
        }
        if(label!=1) break;
    }
}
The input cv::Mat contains 2448x2050 pixels of type CV_8U. The pixel values are either 0 (background) or 255 (foreground). There are 17 connected components in the image. All components but the first are identified correctly. The erroneous component is by far the largest one (~1.5 million pixels) and contains some small disconnected pixel groups. It encompasses all of the other components. The small disconnected pixel groups that are wrongly assigned to the first component are all connected to the top of the component's bounding box.
EDIT: I added some images to visualize the problem. The first image shows all identified connected components. The second image shows only the erroneous component (notice the small disconnected pixel groups at the top). The third image zooms in on a part of the second image:
If someone has an idea where the error might be, I would be thankful.
I found the bug myself. At the end of the method small components are thrown away. In this case the component's number (label) is not increased:
if(components.back().size()<minSize)
{
    components.pop_back();
    contours.pop_back();
}
else
{
    ++label;
    if(label==255) ++label;
}
This means the label number is used again to mark the next component in the image. Hence, several small components and a sufficiently large component might end up with the same label number. If the bounding box of the large component is then iterated, it might contain some small, previously identified but discarded components carrying the same label number.
The solution is to remove the else branch and always increase the label number instead.
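For clarity, with that fix the end of the loop body becomes something like the following (only the label handling described above; the early-exit break from the question is omitted here):
if(components.back().size()<minSize)
{
    components.pop_back();
    contours.pop_back();
}
// always move on to the next label, even when the component was discarded,
// so a discarded small component can never share a label with a later one
++label;
if(label==255) ++label;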

Using time in OpenCV for frame processes and other tasks

I want to count the vehicles from a video. After frame differencing I get a grayscale image, or a kind of binary image. I have defined a Region of Interest to work on a specific area of the frames; the values of the pixels of the vehicles passing through the Region of Interest are higher than 0, or even higher than 40 or 50, because they are white.
My idea is that when a certain number of pixels in a specific interval of time (say 1-2 seconds) are white, then there must be a vehicle passing, so I will increment the counter.
What I want is to check whether white pixels are still coming after 1-2 seconds. If no white pixels are coming, it means that the vehicle has passed and the next vehicle is about to come; at that point the counter must be incremented.
One method that came to my mind is to count the frames of the video and store the count in a variable called No_of_frames. Then, using that variable, I think I can estimate the time passed. If the value of No_of_frames is greater than, let's say, 20, it means that nearly 1 second has passed, since my video's frame rate is 25-30 fps.
I am using Qt Creator with windows 7 and OpenCV 2.3.1
My code is something like:
for (int i = 0; i < matFrame.rows; i++)
{
    for (int j = 0; j < matFrame.cols; j++)
        if (matFrame.at<uchar>(i, j) > 100) // values of pixels greater than 100
                                            // will be considered as white
        {
            whitePixels++;
        }

    if () // here I want to use time. The 'if' statement must be like:
          // if (total_no._of_whitepixels > 100 && no_white_pixel_came_after 2secs)
          // which means that a vehicle has just passed, so increment the counter.
    {
        counter++;
    }
}
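A minimal, self-contained sketch of the frame-count timing idea described above (the variable names, the white-pixel threshold of 100 and the 2-second window are assumptions taken from the description, not working code):
// Sketch: estimate elapsed time from the frame counter using the video's fps.
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("traffic2.mp4");           // path assumed from later in the post
    if (!cap.isOpened()) return -1;

    double fps = cap.get(CV_CAP_PROP_FPS);          // typically 25-30 for these videos
    if (fps <= 0) fps = 25.0;                       // fall back if the container reports no fps

    int counter = 0;                                // vehicles counted so far
    int framesSinceLastWhite = 0;
    bool vehicleInRoi = false;

    cv::Mat frame;
    while (cap.read(frame))
    {
        // whitePixels would come from the ROI loop shown above; stubbed here.
        int whitePixels = 0;

        if (whitePixels > 100)                      // enough white pixels => a vehicle is present
        {
            vehicleInRoi = true;
            framesSinceLastWhite = 0;
        }
        else
        {
            ++framesSinceLastWhite;
        }

        // no white pixels for ~2 seconds, expressed in frames => the vehicle has passed
        if (vehicleInRoi && framesSinceLastWhite > 2.0 * fps)
        {
            ++counter;
            vehicleInRoi = false;
        }
    }
    return 0;
}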
Any other idea for counting the vehicles, better than mine, will be most welcomed. Thanks in advance.
For background segmentation I am using the following algorithm, but it is very slow and I don't know why. The whole code is as follows:
// opencv2/video/background_segm.hpp OpenCV header file must be included.
IplImage* tmp_frame = NULL;
CvCapture* cap = NULL;
bool update_bg_model = true;

Mat element = getStructuringElement( 0, Size( 2,2 ), Point() );
Mat eroded_frame;
Mat before_erode;

if( argc > 2 )
    cap = cvCaptureFromCAM(0);
else
    // cap = cvCreateFileCapture( "C:\\4.avi" );
    cap = cvCreateFileCapture( "C:\\traffic2.mp4" );

if( !cap )
{
    printf("can not open camera or video file\n");
    return -1;
}

tmp_frame = cvQueryFrame(cap);
if( !tmp_frame )
{
    printf("can not read data from the video source\n");
    return -1;
}

cvNamedWindow("BackGround", 1);
cvNamedWindow("ForeGround", 1);

CvBGStatModel* bg_model = 0;

for( int fr = 1; tmp_frame; tmp_frame = cvQueryFrame(cap), fr++ )
{
    if( !bg_model )
    {
        // create BG model
        bg_model = cvCreateGaussianBGModel( tmp_frame );
        // bg_model = cvCreateFGDStatModel( temp );
        continue;
    }

    double t = (double)cvGetTickCount();
    cvUpdateBGStatModel( tmp_frame, bg_model, update_bg_model ? -1 : 0 );
    t = (double)cvGetTickCount() - t;
    printf( "%d. %.1f\n", fr, t/(cvGetTickFrequency()*1000.) );

    before_erode = bg_model->foreground;
    cv::erode( (Mat)bg_model->background, (Mat)bg_model->foreground, element );
    // eroded_frame = bg_model->foreground;
    // frame = (IplImage *)erode_frame.data;

    cvShowImage("BackGround", bg_model->background);
    cvShowImage("ForeGround", bg_model->foreground);
    // cvShowImage("ForeGround", bg_model->foreground);

    char k = cvWaitKey(5);
    if( k == 27 ) break;
    if( k == ' ' )
    {
        update_bg_model = !update_bg_model;
        if( update_bg_model )
            printf("Background update is on\n");
        else
            printf("Background update is off\n");
    }
}

cvReleaseBGStatModel( &bg_model );
cvReleaseCapture(&cap);
return 0;
A great deal of research has been done on vehicle tracking and counting. The approach you describe appears to be quite fragile, and is unlikely to be robust or accurate. The main issue is using a count of pixels above a certain threshold, without regard for their spatial connectivity or temporal relation.
Frame differencing can be useful for separating a moving object from its background, provided the object of interest is the only (or largest) moving object.
What you really need is to first identify the object of interest, segment it from the background, and track it over time using an adaptive filter (such as a Kalman filter). Have a look at the OpenCV video reference. OpenCV provides background subtraction and object segmentation to do all the required steps.
I suggest you read up on OpenCV - Learning OpenCV is a great read. And also on more general computer vision algorithms and theory - http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm has a good list.
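To make the background-subtraction suggestion concrete, here is a rough outline using the C++ API (this assumes the OpenCV 2.4-style cv::BackgroundSubtractorMOG2 interface; blob sizes and thresholds are guesses, and a real tracker would still be needed on top of it):
// Sketch: background subtraction + blob extraction for vehicle candidates.
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("traffic2.mp4");            // video path taken from the question
    if (!cap.isOpened()) return -1;

    cv::BackgroundSubtractorMOG2 bg;                 // adaptive Gaussian-mixture background model
    cv::Mat frame, fgMask;

    while (cap.read(frame))
    {
        bg(frame, fgMask);                           // update the model and get the foreground mask
        cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY); // drop shadow pixels (~127)

        // clean up noise in the mask
        cv::erode(fgMask, fgMask, cv::Mat());
        cv::dilate(fgMask, fgMask, cv::Mat());

        // each sufficiently large blob is a vehicle candidate; tracking
        // (e.g. a Kalman filter) is still needed to avoid double counting
        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(fgMask.clone(), contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

        for (size_t i = 0; i < contours.size(); ++i)
            if (cv::contourArea(contours[i]) > 500)  // area threshold is a guess to tune
                cv::rectangle(frame, cv::boundingRect(contours[i]), cv::Scalar(0, 255, 0), 2);

        cv::imshow("vehicles", frame);
        if (cv::waitKey(5) == 27) break;
    }
    return 0;
}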
Normally they just put a small pneumatic pipe across the road (a soft pipe semi-filled with air), attached to a simple counter. Each vehicle passing over the pipe generates two pulses (first the front wheels, then the rear). The counter records the number of pulses in specified time intervals and divides by 2 to get the approximate vehicle count.