How does the Lowe's ratio test work? - computer-vision

Suppose I have a set of N images and I have already computed the SIFT descriptors of each image. I now would like to compute the matches between the different features. I have heard that a common approach is Lowe's ratio test, but I cannot understand how it works. Can someone explain it to me?

Short version: each keypoint of the first image is matched with a number of keypoints from the second image. We keep the 2 best matches for each keypoint (best matches = the ones with the smallest distance measurement). Lowe's test checks that the two distances are sufficiently different. If they are not, then the keypoint is eliminated and will not be used for further calculations.
Long version:
David Lowe proposed a simple method for filtering keypoint matches by eliminating matches when the second-best match is almost as good. Do note that, although popularized in the context of computer vision, this method is not tied to CV. Here I describe the method, and how it is implemented/applied in the context of computer vision.
Let's suppose that L1 is the set of keypoints of image 1, each keypoint having a description that lists information about the keypoint, the nature of that info really depending on the descriptor algorithm that was used. And L2 is the set of keypoints for image 2. A typical matching algorithm will work by finding, for each keypoint in L1, the closest match in L2. If using Euclidean distance, as in Lowe's paper, this means the keypoint from set L2 that has the smallest Euclidean distance from the keypoint in L1.
Here we could be tempted to just set a threshold and eliminate all the pairings where the distance is above that threshold. But it's not that simple, because not all variables inside the descriptors are equally "discriminant": two keypoints could have a small distance measurement because most of the variables inside their descriptors have similar values, but those variables could be irrelevant to the actual matching. One could always add weighting to the variables of the descriptors so that the more discriminating traits "count" more. Lowe proposes a much simpler solution, described below.
First, we match each keypoint in L1 with the two closest keypoints in L2. Working from the assumption that a keypoint in image 1 can't have more than one equivalent in image 2, we deduce that those two matches can't both be right: at least one of them is wrong. Following Lowe's reasoning, the match with the smallest distance is the "good" match, and the match with the second-smallest distance is the equivalent of random noise, a base rate of sorts. If the "good" match can't be distinguished from noise, then it should be rejected, because it does not bring anything interesting, information-wise. So the general principle is that there needs to be enough difference between the best and second-best matches.
How the concept of "enough difference" is operationalized is important: Lowe uses a ratio of the two distances, often expressed as such:
if distance1 < distance2 * a_constant then ....
Where distance1 is the distance between the keypoint and its best match, and distance2 is the distance between the keypoint and its second-best match. The use of a "smaller than" sign can be somewhat confusing, but it becomes obvious when you take into consideration that a smaller distance means that the point is closer. In the OpenCV world, the knnMatch function will return the matches from best to worst, so the 1st match will have a smaller distance. The question is really "how much smaller?" To figure that out, we multiply distance2 by a constant that has to be between 0 and 1, thus decreasing the value of distance2. Then we look at distance1 again: is it still smaller than distance2? If it is, then it passed the test and will be added to the list of good points. If not, it must be eliminated.
So that explains the "smaller than" part, but what about the multiplication? Since we are looking at the difference between the distances, why not just use an actual mathematical difference between distance1 and distance2? Although technically we could, the resulting difference would be in absolute terms and too dependent on the variables inside the descriptors, the type of distance measurement that we use, etc. What if the code for extracting descriptions changes, affecting all distance measurements? In short, doing distance1 - distance2 would be less robust, would require frequent tweaking and would make methodological comparisons more complicated. It's all about the ratio.
Take-away message: Lowe's solution is interesting not only because of its simplicity, but because it is in many ways algorithm-agnostic.
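To make this concrete, here is a minimal sketch of how the test is typically applied to OpenCV's knnMatch output. The function name ratioTestFilter and the 0.75 constant are illustrative choices, not anything prescribed; you would tune the constant for your data.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>
// Minimal sketch: Lowe's ratio test on OpenCV knnMatch output.
// descriptors1/descriptors2 are assumed to be SIFT descriptors (float, hence NORM_L2).
std::vector<cv::DMatch> ratioTestFilter(const cv::Mat& descriptors1,
                                        const cv::Mat& descriptors2,
                                        float ratio = 0.75f)   // illustrative constant
{
    cv::BFMatcher matcher(cv::NORM_L2);                        // Euclidean distance for SIFT
    std::vector<std::vector<cv::DMatch>> knnMatches;
    matcher.knnMatch(descriptors1, descriptors2, knnMatches, 2);  // 2 best matches per keypoint
    std::vector<cv::DMatch> goodMatches;
    for (const auto& m : knnMatches)
    {
        // Keep the best match only if it is clearly better than the second best.
        if (m.size() == 2 && m[0].distance < ratio * m[1].distance)
            goodMatches.push_back(m[0]);
    }
    return goodMatches;
}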

Lowe's Ratio Test
Algorithm:
First, we compute the distance between feature fi in image one and all the features fj in image two.
We choose feature fc in image two with the minimum distance to feature fi in image one as our closest match.
We then take feature fs, the feature in image two with the second-smallest distance to feature fi.
Then we measure how much closer our closest match fc is than our second-closest match fs through the distance ratio.
Finally, we keep the matches whose distance ratio is below the distance-ratio threshold.
The distance ratio = d(fi, fc)/d(fi, fs) is the distance computed between feature fi in image one and fc, its closest match in image two, divided by the distance computed between fi and fs, its second-closest match in image two.
We usually set the distance ratio threshold (ρ) to around 0.5, which means that we require our best match to be at least twice as close to our initial feature's descriptor as our second-best match, thus discarding ambiguous matches and retaining the good ones.
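A minimal brute-force sketch of these steps, assuming one std::vector<float> descriptor per feature and the 0.5 threshold mentioned above; the function name passesRatioTest is hypothetical.
#include <cmath>
#include <limits>
#include <vector>
// For one descriptor fi from image one, find the closest (fc) and second-closest (fs)
// descriptors in image two by brute force, then apply the ratio threshold rho.
bool passesRatioTest(const std::vector<float>& fi,
                     const std::vector<std::vector<float>>& imageTwoDescriptors,
                     double rho = 0.5)   // threshold from the text above
{
    double best = std::numeric_limits<double>::max();
    double second = std::numeric_limits<double>::max();
    for (const auto& fj : imageTwoDescriptors)
    {
        double d = 0.0;
        for (size_t k = 0; k < fi.size(); ++k)
            d += (fi[k] - fj[k]) * (fi[k] - fj[k]);
        d = std::sqrt(d);                      // Euclidean distance d(fi, fj)
        if (d < best)        { second = best; best = d; }
        else if (d < second) { second = d; }
    }
    // distance ratio = d(fi, fc) / d(fi, fs); keep the match only if it is below rho
    return second > 0.0 && (best / second) < rho;
}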

For a better understanding of the ratio test, you should read Lowe's article; only by reading it will you fully understand the answer.
The short answer is that the threshold is a low value that Lowe arrived at experimentally: when choosing between two candidates with similar distances, keep the best one only if its distance is less than roughly 0.7 times that of the other.
Check the link below:
https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

Related

Template Matching Subpixel Accuracy

I use template matching to detect a specific pattern in an image. The shift it determines is very shaky. Currently I apply it to the R, G, B channels separately and average the result to obtain float values. Please suggest how to obtain subpixel accuracy. I was planning to resize the image and then return the data in the original scale; please suggest any better method.
I use the code from the OpenCV tutorial: http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
I believe the underlying issue is that minMaxLoc has only pixel accuracy. You could try out a subpixel-accurate patch http://www.longrange.net/Temp/CV_SubPix.cpp from the discussion here: http://answers.opencv.org/question/29665/getting-subpixel-with-matchtemplate/ .
As a quick and dirty experiment to see whether a sub-pixel accurate minMaxLoc would resolve your issue, you can scale up the template matching result image (by a factor of 4, for instance) with cubic interpolation, INTER_CUBIC (http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#resize), and apply minMaxLoc on it. (Contrary to linear interpolation, cubic interpolation can place maxima between the original pixels.)
Apart from this, you can always apply Gaussian blur to both input images and template matching results to reduce high-frequency noise and suppress local maxima.
I would first try out the quick experiment. If it helps, you can integrate the minMaxLogSubPix implementation, but that will take longer.
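A sketch of that quick experiment; the TM_CCOEFF_NORMED method, the factor of 4 and the function name are illustrative choices.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
// Upscale the matchTemplate result with cubic interpolation, locate the maximum
// there, and map it back to the original scale.
cv::Point2f roughSubPixelMatch(const cv::Mat& image, const cv::Mat& templ, int scale = 4)
{
    cv::Mat result;
    cv::matchTemplate(image, templ, result, cv::TM_CCOEFF_NORMED);
    cv::Mat resultUp;
    cv::resize(result, resultUp, cv::Size(), scale, scale, cv::INTER_CUBIC);
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(resultUp, &minVal, &maxVal, &minLoc, &maxLoc);
    // Dividing by the scale factor gives a rough sub-pixel location in the result image.
    return cv::Point2f(maxLoc.x / (float)scale, maxLoc.y / (float)scale);
}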
It's a good thing to start with pixel accuracy before moving to subpixel accuracy. Checking the whole image at subpixel accuracy would be way too expensive.
An easy solution could be to have 4 versions of your template. Besides the base one, have one that's shifted 1/2 pixel left, another that's shifted 1/2 pixel down, and finally one that's shifted 1/2 pixel in both directions. When you have a match at {x,y}, check the neighborhood to see if the half-shifted templates are a better match.
The benefit of this method is that you only need to shift the small template, and it can be done up front.
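A sketch of how those shifted versions could be generated up front with warpAffine; the use of linear interpolation and the helper name are assumptions.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>
// Build the four template versions up front: the original plus versions shifted
// by half a pixel along x, along y, and along both, via a translation warp.
std::vector<cv::Mat> buildShiftedTemplates(const cv::Mat& templ)
{
    std::vector<cv::Mat> versions{templ.clone()};
    const float shifts[3][2] = { {0.5f, 0.0f}, {0.0f, 0.5f}, {0.5f, 0.5f} };
    for (const auto& s : shifts)
    {
        cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, s[0], 0, 1, s[1]);
        cv::Mat shifted;
        cv::warpAffine(templ, shifted, M, templ.size(), cv::INTER_LINEAR);
        versions.push_back(shifted);
    }
    return versions;
}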
Having said that, it seems you're tracking an object position over time. It may be worthwhile to low-pass filter that position.

How can we use both Hamming distance and distance between coordinates to match features?

As known, for tracking objects in OpenCV we can use:
FeatureDetector to find features
DescriptorMatcher to match similarity between features of the desired object and features of current frame in video
and then use findHomography to find the new position of the object
For matching features, DescriptorMatcher uses the Hamming distance (the number of positions at which two equal-length binary sequences differ, not the distance between coordinates).
I.e. we find the most similar object in the current frame, but not the one nearest to the previous position (even if we know that position).
How can we use both the Hamming distance and the distance between coordinates for matching, for example with a weight for each, and not only the Hamming distance?
It could solve the following problems:
If we start to track an object from position (x,y) in the previous frame, and the current frame contains two similar objects, then we will find the most similar one, but not the nearest one. Yet due to inertia, coordinates usually change more slowly than similarity (which can jump with a sharp change in light or a rotation of the object), so we should find the similar object with the nearest coordinates.
This way we keep the features that are not only the most similar, but that will also give the most accurate homography, because we exclude features which, although very similar, are very far away in coordinates and most likely belong to other objects.
What you need is probably something like:
Compute matches as usual.
DMatch has queryIdx and trainIdx indices. You can use these to retrieve the corresponding keypoints, compute the Euclidean distance between them, and update the distance value of the DMatch with some kind of weighting function (see the sketch after these steps).
Sort matches by distance (since distance has changed).
Now the matches vector is sorted according to both the Hamming distance between descriptors and the Euclidean distance between keypoints.
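A sketch of those steps; the alpha weight and the function name are assumptions, and you would tune the weighting for your scene.
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>
// Blend the Hamming (descriptor) distance with the Euclidean (pixel) distance
// between the matched keypoints, then re-sort by the combined score.
void reweightMatches(std::vector<cv::DMatch>& matches,
                     const std::vector<cv::KeyPoint>& queryKeypoints,
                     const std::vector<cv::KeyPoint>& trainKeypoints,
                     float alpha = 0.5f)   // hypothetical weight to tune
{
    for (cv::DMatch& m : matches)
    {
        const cv::Point2f& p1 = queryKeypoints[m.queryIdx].pt;
        const cv::Point2f& p2 = trainKeypoints[m.trainIdx].pt;
        float pixelDist = (float)std::hypot(p1.x - p2.x, p1.y - p2.y);  // distance between coordinates
        m.distance = alpha * m.distance + (1.0f - alpha) * pixelDist;   // combined score
    }
    // DMatch::operator< compares the distance field, so this sorts by the new score.
    std::sort(matches.begin(), matches.end());
}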
I think there's no built-in method in OpenCV to do that.
What I'd do is use cv::DescriptorMatcher::radiusMatch. It finds all the matches that are under a certain Hamming distance. You'd need to find a radius/distance that ensures the features are similar enough for your application, but not so large that it makes the whole calculation slow.
Then, from these candidate features you can choose the one that's closest to the position where you predicted the feature would be, or calculate some kind of weighted score based on the Hamming distance and the coordinate distance, etc.
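A sketch of this approach; the radius value, the predicted-position input and the function name are assumptions.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <cmath>
#include <limits>
#include <vector>
// Collect every candidate within a Hamming radius, then pick the one whose
// keypoint lies closest to the predicted position.
cv::DMatch pickNearestCandidate(const cv::Mat& queryDescriptors,
                                const cv::Mat& trainDescriptors,
                                const std::vector<cv::KeyPoint>& trainKeypoints,
                                int queryIdx,
                                const cv::Point2f& predictedPos,
                                float maxHammingDistance = 40.0f)   // radius to tune
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch>> candidates;
    matcher.radiusMatch(queryDescriptors, trainDescriptors, candidates, maxHammingDistance);
    cv::DMatch best;                                   // queryIdx stays -1 if nothing is found
    double bestPixelDist = std::numeric_limits<double>::max();
    for (const cv::DMatch& m : candidates[queryIdx])
    {
        const cv::Point2f& p = trainKeypoints[m.trainIdx].pt;
        double d = std::hypot(p.x - predictedPos.x, p.y - predictedPos.y);
        if (d < bestPixelDist) { bestPixelDist = d; best = m; }
    }
    return best;
}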

opencv c++ compare keypoint locations in different images

When comparing 2 images via feature extraction, how do you compare keypoint distances so as to disregard those that are obviously incorrect?
I've found that when comparing similar images against each other, most of the time it can be fairly accurate, but other times it throws up matches that are completely wrong.
So I'm after a way of looking at the 2 sets of keypoints from both images and determining whether the matched keypoints are relatively in the same locations on both. As in, it knows that keypoints 1, 2, and 3 are a certain distance apart on image 1, so the corresponding keypoints matched on image 2 should be a fairly similar distance apart from each other again.
I've used RANSAC and minimum distance checks in the past but only to some effect, they don't seem to be as thorough as I'm after.
(Using ORB and BruteForce)
EDIT
Changed "x, y, and z" to "1, 2, and 3"
EDIT 2 -- I'll try to explain further with quick Paint made examples:
Say I have this as my image:
And I give it this image to compare against:
It's a cropped and squashed version of the original, but obviously similar.
Now, say you ran it through feature detection and it came back with these results for the keypoints for the two images:
The keypoints on both images are in roughly the same areas, and proportionately the same distance away from each other. Take the keypoint I've circled; let's call it "Image 1 Keypoint 1".
We can see that there are 5 keypoints around it. It's these distances between them and "Image 1 Keypoint 1" that I want to obtain, so as to compare them against "Image 2 Keypoint 1" and its 5 surrounding keypoints in the same area (see below), and so not just compare a keypoint to another keypoint, but compare "known shapes" based off of the locations of the keypoints.
--
Does that make sense?
Keypoint matching is a problem with several dimensions. These dimensions are:
spatial distance, ie, the (x,y) distance as measured from the locations of two keypoints in different images
feature distance, that is, a distance that describes how much two keypoints look alike.
Depending on your context, you will want to compute one of these distances or to combine both. Here are some use cases:
optical flow, as implemented by opencv's sparse Lucas-Kanade optical flow. In this case, keypoints called good features are computed in each frame, then matched on a spatial distance basis. This works because the image is supposed to change relatively slowly (the input frames have a video framerate);
image stitching, as you can implement from opencv's features2d (free or non-free). In this case, the images change radically since you move your camera around. Then, your goal becomes to find stable points, ie, points that are present in two or more images whatever their location is. In this case you will use feature distance. This also holds when you have a template image of an object that you want to find in query images.
In order to compute feature distance, you need to compute a coded version of their appearance. This operation is performed by the DescriptorExtractor class.
Then, you can compute distances between the output of the descriptions: if the distance between two descriptions is small then the original keypoints are very likely to correspond to the same scene point.
Pay attention when you compute distances to use the correct distance function: ORB, FREAK and BRISK rely on the Hamming distance, while SIFT and SURF use the more usual L2 distance.
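A tiny illustration of that pairing, just the matcher construction and nothing pipeline-specific.
#include <opencv2/features2d.hpp>
// Binary descriptors (ORB, BRISK, FREAK) are compared with the Hamming distance,
// while float descriptors (SIFT, SURF) are compared with the L2 (Euclidean) distance.
cv::BFMatcher binaryMatcher(cv::NORM_HAMMING);   // for ORB / BRISK / FREAK
cv::BFMatcher floatMatcher(cv::NORM_L2);         // for SIFT / SURF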
Match filtering
When you have individual matches, you may want to perform match filtering in order to reject matches that look good individually but arise from scene ambiguities. Think for example of a keypoint that originates from the corner of a window of a house. It is then very likely to match with another window in another house, but this may not be the right house or the right window.
You have several ways of doing it:
RANSAC performs a consistency check of the computed matches against the current solution estimate. Basically, it picks some matches at random, computes a solution to the problem (usually a geometric transform between 2 images) and then counts how many of the matches agree with this estimate. The estimate with the highest count of inliers wins (a findHomography-based sketch is given below);
David Lowe performed another kind of filtering in the original SIFT paper.
He kept the two best candidates for a match with a given query keypoint, ie, points that had the lowest distance (or highest similarity). Then, he computed the ratio similarity(query, best)/similarity(query, 2nd best). If this ratio is too low then the second best is also a good candidate for a match, and the matching result is dubbed ambiguous and rejected.
Exactly how you should do it in your case is thus very likely to depend on your exact application.
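For the RANSAC option above, a common sketch is to let findHomography compute the inlier mask and keep only the inlier matches; the 3-pixel reprojection threshold and the function name are illustrative.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>
// Estimate a homography between the matched keypoint locations with RANSAC
// and keep only the matches flagged as inliers in the mask.
std::vector<cv::DMatch> ransacFilter(const std::vector<cv::DMatch>& matches,
                                     const std::vector<cv::KeyPoint>& kp1,
                                     const std::vector<cv::KeyPoint>& kp2)
{
    if (matches.size() < 4) return {};        // findHomography needs at least 4 point pairs
    std::vector<cv::Point2f> pts1, pts2;
    for (const auto& m : matches)
    {
        pts1.push_back(kp1[m.queryIdx].pt);
        pts2.push_back(kp2[m.trainIdx].pt);
    }
    std::vector<uchar> inlierMask;
    cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, inlierMask);  // 3 px reprojection threshold
    std::vector<cv::DMatch> inliers;
    for (size_t i = 0; i < matches.size(); ++i)
        if (inlierMask[i])
            inliers.push_back(matches[i]);
    return inliers;
}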
Your specific case
In your case, you want to develop an alternate feature descriptor that is based on neighbouring keypoints.
The sky is obviously the limit here, but here are some steps that I would follow:
make your descriptor rotation and scale invariant by computing the PCA of the keypoint locations:
// Form a matrix from keypoint locations in the current image (one (x, y) row per keypoint)
cv::Mat allKeyPointsMatrix = gatherAllKeypoints(keypoints);
// Compute the PCA basis (data given as rows, keep 2 components)
cv::PCA currentPCA(allKeyPointsMatrix, cv::Mat(), cv::PCA::DATA_AS_ROW, 2);
// Reproject the keypoints into the new basis
cv::Mat normalizedKeyPoints = currentPCA.project(allKeyPointsMatrix);
(optional) sort the keypoints in a quadtree or kd-tree for faster spatial indexing
Compute for each keypoint a descriptor that is (for example) the offsets in normalized coordinates of the 4 or 5 closest keypoints
Do the same in your query image
Match keypoints from both images based on these new descriptors.
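A brute-force sketch of the neighbour-offset descriptor built on the normalizedKeyPoints matrix from the snippet above, assuming it is a CV_32F matrix with one (x, y) row per keypoint; k = 4 and the function name are illustrative.
#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>
// For row i of the PCA-normalized keypoint matrix, return the offsets to its
// k nearest neighbours, sorted by distance. Brute force; a kd-tree would
// replace the inner loop for larger keypoint sets.
std::vector<cv::Point2f> neighbourOffsets(const cv::Mat& normalizedKeyPoints, int i, int k = 4)
{
    cv::Point2f pi(normalizedKeyPoints.at<float>(i, 0), normalizedKeyPoints.at<float>(i, 1));
    std::vector<std::pair<float, cv::Point2f>> candidates;   // (squared distance, offset)
    for (int j = 0; j < normalizedKeyPoints.rows; ++j)
    {
        if (j == i) continue;
        cv::Point2f pj(normalizedKeyPoints.at<float>(j, 0), normalizedKeyPoints.at<float>(j, 1));
        cv::Point2f offset = pj - pi;
        candidates.emplace_back(offset.x * offset.x + offset.y * offset.y, offset);
    }
    std::sort(candidates.begin(), candidates.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    std::vector<cv::Point2f> offsets;
    for (int n = 0; n < k && n < (int)candidates.size(); ++n)
        offsets.push_back(candidates[n].second);
    return offsets;
}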
What is it you are trying to do exactly? More information is needed to give you a good answer. Otherwise it'll have to be very broad and most likely not useful to your needs.
And with your statement "determining whether the matched keypoints are relatively in the same locations on both" do you mean literally on the same x,y positions between 2 images?
I would try out the SURF algorithm. It works extremely well for what you described above (though I found it to be a bit slow unless you use gpu acceleration, 5fps vs 34fps).
Here is the tutorial for SURF; I personally found it very useful, but the prebuilt executables are only provided for some platforms. However, you can simply remove the OS-specific bindings in the source code, keep only the OpenCV-related bindings, and have it compile and run just the same.
https://code.google.com/p/find-object/#Tutorials
Hope this helped!
You can filter on the pixel distance between the two matched keypoints.
Let's say matches is your vector of matches, kp_1 your vector of keypoints on the first picture and kp_2 on the second. You can use the code below to eliminate obviously incorrect matches. You just need to fix a threshold.
double threshold = YourValue;
vector<DMatch> good_matches;
for (size_t i = 0; i < matches.size(); i++)
{
    // Euclidean distance between the matched keypoints' pixel coordinates
    double dx = kp_1[matches[i].queryIdx].pt.x - kp_2[matches[i].trainIdx].pt.x;
    double dy = kp_1[matches[i].queryIdx].pt.y - kp_2[matches[i].trainIdx].pt.y;
    double dist_p = sqrt(dx * dx + dy * dy);
    if (dist_p < threshold)
    {
        good_matches.push_back(matches[i]);
    }
}

Finding "how straight" is a shape. openCV

I'm working on an application where I have a set of contours (each one representing a potential line) and I want to check "how straight" that contour/shape is.
The article I am using as a reference uses the following technique:
It matches a "segmented" line crossing the shape, like so:
Then it grades how "straight" the line is.
Here's an example of the contours I am working on:
How would you go about implementing this technique?
Is there any other way of checking "how straight" a contour/shape is?
Regards!
My first guess would be to use a coefficient of determination. That is, fit a line to all your points, assuming some reasonable origin where you won't run into rounding errors, and calculate R^2.
A more advanced approach, if all contours are disconnected components, would be to calculate the structure model index (the link is for bone morphometry, but they explain the concept and cite the original paper.) This gives you a number that tells you how much your segment is "like a rod". This is just an idea, though. Anything that forms curves or has branches will be less and less like a rod.
I would say that it also depends on what you are using the metric for and whether your contours always run generally left to right.
An additional method would be to create the covariance matrix of your points, calculate the eigenvalues of that matrix, and take their ratio (where the ratio is greater than or equal to 1; otherwise, invert the ratio). This is the basic principle behind a PCA, apart from the final ratio. If you have a rather linear data set (the data set varies in only one direction) then you will have a very large ratio. As the data set becomes less and less linear (or more uncorrelated) you will see the ratio approach one. A perfectly linear data set would give infinity and a perfect circle one (I believe, but I would appreciate it if someone could verify this for me). Also, working in two dimensions means the calculation is computationally cheap and straightforward.
This would handle outliers very well and would be invariant to the rotation and shape of your contour. You also have a number which is always positive. The only issue would be preventing overflow when dividing the two eigenvalues. Then again you could always divide the smaller eigenvalue by the larger and your metric would be bound between zero and one, one being a circle and zero being a straight line.
Either way, you would need to test if this parameter is sensitive enough for your application.
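A sketch of this eigenvalue-ratio idea with cv::PCA on the contour points, using the smaller/larger form suggested at the end so the result stays between 0 (straight line) and 1 (points spread equally in both directions); the function name is hypothetical.
#include <opencv2/core.hpp>
#include <vector>
// Run a 2D PCA on the contour points and return smaller/larger eigenvalue.
double eigenRatioStraightness(const std::vector<cv::Point>& contour)
{
    cv::Mat data((int)contour.size(), 2, CV_32F);
    for (int i = 0; i < (int)contour.size(); ++i)
    {
        data.at<float>(i, 0) = (float)contour[i].x;
        data.at<float>(i, 1) = (float)contour[i].y;
    }
    cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW, 2);
    double e0 = pca.eigenvalues.at<float>(0);   // larger eigenvalue (spread along major axis)
    double e1 = pca.eigenvalues.at<float>(1);   // smaller eigenvalue (spread across it)
    return e0 > 0.0 ? e1 / e0 : 0.0;
}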
One example for a simple algorithm is using the dot product between two segments to determine the angle between them. The formula for dot product is:
A * B = ||A|| ||B|| cos(theta)
Solving the equation for cos(theta) yields
cos(theta) = (A * B / (||A|| ||B||))
Since cos(0) = 1 and cos(pi) = -1.0, and you're checking for the "straightness" of the lines, a line whose averaged cos(theta) values are closest to -1.0 is the straightest.
straightness = SUM(cos(theta))/(number of line segments)
where a straight line is close to -1.0, and a non-straight line approaches 1.0. Keep in mind this is a cursory evaluation of this algorithm and it obviously has edge cases and caveats that would need to be addressed in an implementation.
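A sketch of this measure, under the convention that at each interior point the two segment vectors point back to the previous point and forward to the next point, so they are antiparallel on a straight polyline, matching the -1.0 reading above; the function name is hypothetical.
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>
// At each interior point, compute cos(theta) between the vector to the previous
// point and the vector to the next point, then average. Near -1 means straight,
// values approaching +1 mean tight folds.
double dotProductStraightness(const std::vector<cv::Point2f>& pts)
{
    double sum = 0.0;
    int count = 0;
    for (size_t i = 1; i + 1 < pts.size(); ++i)
    {
        cv::Point2f a = pts[i - 1] - pts[i];     // segment back to the previous point
        cv::Point2f b = pts[i + 1] - pts[i];     // segment forward to the next point
        double na = std::hypot(a.x, a.y);
        double nb = std::hypot(b.x, b.y);
        if (na == 0.0 || nb == 0.0) continue;
        sum += (a.x * b.x + a.y * b.y) / (na * nb);   // cos(theta) = A * B / (||A|| ||B||)
        ++count;
    }
    return count > 0 ? sum / count : 0.0;
}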
The trick is to use image moments. In short, you calculate the minimum inertia around an axis, the inertia around the axis perpendicular to it, and the ratio between them (which is always between 0 and 1, since inertia is non-negative).
For a straight line, the inertia along the line is zero, so the ratio is also zero. For a circle, the inertia is the same along all axes, so the ratio is one. Your segmented line will be 0.01 or so, as it's a fairly good match.
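A sketch of this with cv::moments: the eigenvalues of the second-order central moment matrix act as the inertias about the two principal axes, and their min/max ratio behaves as described above; the function name is hypothetical.
#include <opencv2/imgproc.hpp>
#include <cmath>
#include <vector>
// Ratio of the inertia about the major axis to the inertia about the minor axis:
// ~0 for a straight line, 1 for a circle.
double momentStraightness(const std::vector<cv::Point>& contour)
{
    cv::Moments m = cv::moments(contour);
    double common = (m.mu20 + m.mu02) / 2.0;
    double diff = std::sqrt(4.0 * m.mu11 * m.mu11 +
                            (m.mu20 - m.mu02) * (m.mu20 - m.mu02)) / 2.0;
    double iMax = common + diff;   // inertia about the minor axis
    double iMin = common - diff;   // inertia about the major axis (0 for a line)
    return iMax > 0.0 ? iMin / iMax : 0.0;
}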
A simpler method is to compare the circumference of the convex polygon containing the shape (its convex hull) with the circumference of the shape itself. For a line they're trivially equal, and for a not-too-crooked shape they are still comparable.
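A sketch of that comparison with convexHull and arcLength: the ratio is 1 for a straight or convex shape and drops as the contour gets more crooked; the function name is hypothetical.
#include <opencv2/imgproc.hpp>
#include <vector>
// Compare the convex hull's perimeter with the contour's own perimeter.
double hullPerimeterRatio(const std::vector<cv::Point>& contour)
{
    std::vector<cv::Point> hull;
    cv::convexHull(contour, hull);
    double contourLen = cv::arcLength(contour, true);
    double hullLen = cv::arcLength(hull, true);
    return contourLen > 0.0 ? hullLen / contourLen : 0.0;  // close to 1.0 means not crooked
}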

Measure of accuracy in pattern recognition using SURF in OpenCV

I'm currently working on pattern recognition using SURF in OpenCV. What I have so far: I've written a program in C# where I can select a source image and a template which I want to find. After that I transfer both pictures into a C++ DLL where I've implemented a program using the OpenCV SURF detector, which returns all the keypoints and matches back to my C# program, where I try to draw a rectangle around my matches.
Now my question: Is there a common measure of accuracy in pattern recognition? Like for example number of matches in proportion to the number of keypoints in the template? Or maybe the size-difference between my match-rectangle and the original size of the template-image? What are common parameters that are used to say if a match is a “real” and “good” match?
Edit: To make my question clearer. I have a bunch of matchpoints, that are already thresholded by minHessian and distance value. After that I draw something like a rectangle around my matchpoints as you can see in my picture. This is my MATCH. How can I tell now how good this match is? I'm already calculating angle, size and color differences between my now found match and my template. But I think that is much too vague.
I am not 100% sure about what you are really asking, because what you call a "match" is vague. But since you said you already matched your SURF points and mentioned pattern recognition and the use of a template, I am assuming that, ultimately, you want to localize the template in your image and you are asking about a localization score to decide whether you found the template in the image or not.
This is a challenging problem and I am not aware that a good and always-appropriate solution has been found yet.
However, given your approach, what you could do is analyze the density of matched points in your image: consider local or global maxima as possible locations for your template (global if you know your template appears only once in the image, local if it can appear multiple times) and use a threshold on the density to decide whether or not the template appears. A sketch of the algorithm could be something like this:
Allocate a floating point density map of the size of your image
Compute the density map by increasing the density map by a fixed amount in the neighborhood of each matched point (for instance, for each matched point, add a fixed value epsilon in the rectangle you are displaying in your question)
Find the global or local maxima of the density map (the global maximum can be found using the OpenCV function minMaxLoc, and local maxima can be found using morphological operations, e.g. How can I find local maxima in an image in MATLAB?)
For each maxima obtained, compare the corresponding density value to a threshold tau, to decide whether your template is there or not
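A minimal sketch of those steps using only the global maximum; epsilon, the neighbourhood radius and the threshold tau are values you would have to tune, and the function name is hypothetical.
#include <opencv2/core.hpp>
#include <vector>
// Add a fixed epsilon in a rectangular neighbourhood around each matched point,
// then compare the global maximum of the density map against the threshold tau.
bool templateLikelyPresent(const std::vector<cv::Point2f>& matchedPoints,
                           const cv::Size& imageSize,
                           int radius, float epsilon, float tau,
                           cv::Point* bestLocation = nullptr)
{
    cv::Mat density = cv::Mat::zeros(imageSize, CV_32F);
    for (const cv::Point2f& p : matchedPoints)
    {
        cv::Rect roi(cv::Point((int)p.x - radius, (int)p.y - radius),
                     cv::Size(2 * radius + 1, 2 * radius + 1));
        roi &= cv::Rect(0, 0, imageSize.width, imageSize.height);  // clip to the image
        cv::Mat patch = density(roi);                              // header into the density map
        patch += epsilon;                                          // raise the density locally
    }
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(density, &minVal, &maxVal, &minLoc, &maxLoc);    // global maximum
    if (bestLocation) *bestLocation = maxLoc;
    return maxVal >= tau;
}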
If you are into research articles, you can check the following ones for improvements of this basic algorithm:
"ADABOOST WITH KEYPOINT PRESENCE FEATURES FOR REAL-TIME VEHICLE VISUAL DETECTION", by T.Bdiri, F.Moutarde, N.Bourdis and B.Steux, 2009.
"Interleaving Object Categorization and Segmentation", by B.Leibe and B.Schiele, 2006.
EDIT: another way to address your problem is to try and remove accidentally matched points in order to keep only those truly corresponding to your template image. This can be done by enforcing a constraint of consistency between close matched points. The following research article presents an approach like this: "Context-dependent logo matching and retrieval", by H.Sahbi, L.Ballan, G.Serra, A.Del Bimbo, 2010 (however, this may require some background knowledge...).
Hope this helps.
Well, when you compare points you use some metric.
So the result of each comparison is a distance, and the smaller this distance is, the better the match.
Example of code:
BFMatcher matcher(NORM_L2, true);   // cross-check enabled
vector<DMatch> matches;
matcher.match(descriptors1, descriptors2, matches);
// std::remove_if requires <algorithm>
matches.erase(std::remove_if(matches.begin(), matches.end(), bad_dist), matches.end());
where bad_dist is defined as
bool bad_dist(const DMatch &m) {
    return m.distance > 150;   // threshold on the descriptor distance
}
In this code I get rid of 'bad' matches.
There are many ways to match two patterns in the same image; it's actually a very open topic in computer vision, because there isn't a globally best solution.
For instance, if you know your object can appear rotated (I'm not familiar with SURF, but I guess the descriptors are rotation invariant like SIFT descriptors), you can estimate the rotation between the pattern you have in the training set and the pattern you just matched with. A match with the minimum error will be a better match.
I recommend you consult Computer Vision: Algorithms and Applications. There's no code in it, but it covers lots of useful techniques typically used in computer vision (most of them already implemented in OpenCV).