Find a good homography from different points of view of an object? - c++

I am doing object detection using feature extraction (SIFT, ORB).
I want to extract ORB features from different points of view of the object (train images) and then match all of them against a query image.
The problem I am facing is: how can I create a good homography from keypoints coming from different points of view of the object, which of course have different sizes?
Edit
I was thinking of creating a homography for each train image that gets, say, 3-4 matches and then calculating some "mean" homography...
The problem arises when you have, say, just 1-2 matches from each train image; at that point you cannot create even one homography.
Code for creating the homography:
//> For each train image with at least some good matches ??
H = findHomography( train, scene, CV_RANSAC );
perspectiveTransform( trainCorners, sceneCorners, H);

I think there is no point in doing that, as a pair of images A and B has nothing to do with a pair of images B and C when you talk about homography. You will get different sets of good matches and different homographies, but the homographies will be unrelated, so no error minimization across them would make sense.
All minimization has to happen within the matches, keypoints and descriptors of a single pair of images.
There is an idea similar to what you ask in the FREAK descriptor. You can train the selected pairs with a set of images. That means that FREAK will decide the best pattern for extracting descriptors based on a set of images. After this training you are supposed to find more robust matches that will give you a better homography.

To find a good homography you need accurate matches of your keypoints, and you need at least 4 of them.
The most common method is DLT combined with RANSAC. DLT is a linear transform that finds the 3x3 homography matrix that projects your keypoints into the scene. RANSAC finds the best set of inliers/outliers that satisfies the mathematical model, so it will find the best 4 points as input to DLT.
EDIT
You need to find robust keypoints. SIFT is supposed to do that, being scale and perspective invariant. I don't think you need to train with different images. Finding a mean homography makes no sense. You need to find a single homography for a detected object, and that homography will be the transformation between the marker and the detected object. A homography is precise; there is no point in averaging several of them.
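For reference, here is a minimal sketch of the above with OpenCV (the function and variable names are placeholders, and the inlier threshold is only illustrative): estimate one homography from the matched point coordinates and inspect the RANSAC inlier mask to judge how well it is supported.
#include <opencv2/opencv.hpp>
#include <vector>

// trainPts/scenePts: matched point coordinates (placeholder names)
cv::Mat estimateHomography(const std::vector<cv::Point2f>& trainPts,
                           const std::vector<cv::Point2f>& scenePts)
{
    if (trainPts.size() < 4)              // DLT needs at least 4 correspondences
        return cv::Mat();

    std::vector<uchar> mask;              // per-match flag: 1 = inlier, 0 = outlier
    cv::Mat H = cv::findHomography(trainPts, scenePts, CV_RANSAC, 3.0, mask);
    if (H.empty())
        return cv::Mat();

    // A homography supported by only a handful of inliers is usually unreliable.
    if (cv::countNonZero(mask) < 8)       // illustrative threshold
        return cv::Mat();

    return H;
}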

Have you tried the approach of getting keypoints from several views of the object (train_kps_1, train_kps_2, ...), matching each of those arrays against the scene, then selecting the best matches from those several arrays into a single array of good matches, and finally using that result as the 'train' input of findHomography? (A sketch of this is given at the end of this answer.)
The key here is how to select the best matches, which is a different question; you can find a nice answer here:
http://answers.opencv.org/question/15/how-to-get-good-matches-from-the-orb-feature/
And maybe here:
http://answers.opencv.org/question/2493/best-matches/
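Here is a minimal sketch of that multi-view idea, assuming ORB descriptors and a brute-force Hamming matcher. Note that pooling correspondences into one findHomography call only makes geometric sense if the keypoint coordinates of all train views are expressed in (or mapped back to) a common reference view; all names and thresholds below are placeholders.
#include <opencv2/opencv.hpp>
#include <vector>

// descsPerView / kpsPerView: hypothetical per-view ORB descriptors and keypoints
// sceneDescs / sceneKps:      descriptors and keypoints of the query (scene) image
cv::Mat matchAllViewsThenHomography(
    const std::vector<cv::Mat>& descsPerView,
    const std::vector<std::vector<cv::KeyPoint> >& kpsPerView,
    const cv::Mat& sceneDescs,
    const std::vector<cv::KeyPoint>& sceneKps)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);   // cross-check filtering
    std::vector<cv::Point2f> train, scene;           // pooled correspondences

    for (size_t v = 0; v < descsPerView.size(); ++v)
    {
        std::vector<cv::DMatch> matches;
        matcher.match(descsPerView[v], sceneDescs, matches);

        for (size_t i = 0; i < matches.size(); ++i)
        {
            if (matches[i].distance < 40)            // placeholder threshold
            {
                train.push_back(kpsPerView[v][matches[i].queryIdx].pt);
                scene.push_back(sceneKps[matches[i].trainIdx].pt);
            }
        }
    }

    if (train.size() < 4)                            // not enough correspondences
        return cv::Mat();

    return cv::findHomography(train, scene, CV_RANSAC);
}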

Related

opencv c++ compare keypoint locations in different images

When comparing 2 images via feature extraction, how do you compare keypoint distances so as to disregard those that are obviously incorrect?
I've found that when comparing similar images against each other, most of the time it can be fairly accurate, but other times it throws up matches that are completely separate.
So I'm after a way of looking at the 2 sets of keypoints from both images and determining whether the matched keypoints are relatively in the same locations on both. As in, it knows that keypoints 1, 2, and 3 are so far apart on image 1, so the corresponding keypoints matched on image 2 should be a fairly similar distance away from each other again.
I've used RANSAC and minimum distance checks in the past, but only to some effect; they don't seem to be as thorough as I'm after.
(Using ORB and BruteForce)
EDIT
Changed "x, y, and z" to "1, 2, and 3"
EDIT 2 -- I'll try to explain further with quick Paint made examples:
Say I have this as my image:
And I give it this image to compare against:
It's a cropped and squashed version of the original, but obviously similar.
Now, say you ran it through feature detection and it came back with these results for the keypoints for the two images:
The keypoints on both images are in roughly the same areas, and proportionately the same distance away from each other. Take the keypoint I've circled; let's call it "Image 1 Keypoint 1".
We can see that there are 5 keypoints around it. It's the distances between them and "Image 1 Keypoint 1" that I want to obtain, so as to compare them against "Image 2 Keypoint 1" and its 5 surrounding keypoints in the same area (see below), so as to not just compare a keypoint to another keypoint, but to compare "known shapes" based on the locations of the keypoints.
--
Does that make sense?
Keypoint matching is a problem with several dimensions. These dimensions are:
spatial distance, i.e., the (x, y) distance measured between the locations of two keypoints in different images;
feature distance, that is, a distance that describes how much two keypoints look alike.
Depending on your context, you will want one or the other, or a combination of both. Here are some use cases:
optical flow, as implemented by OpenCV's sparse Lucas-Kanade optical flow. In this case, keypoints called good features are computed in each frame, then matched on a spatial distance basis. This works because the image is supposed to change relatively slowly (the input frames have a video framerate);
image stitching, as you can implement from OpenCV's features2d (free or non-free). In this case, the images change radically since you move your camera around. Your goal then becomes to find stable points, i.e., points that are present in two or more images whatever their location is. In this case you will use feature distance. This also holds when you have a template image of an object that you want to find in query images.
In order to compute feature distance, you need to compute a coded version of their appearance. This operation is performed by the DescriptorExtractor class.
Then, you can compute distances between the output of the descriptions: if the distance between two descriptions is small then the original keypoints are very likely to correspond to the same scene point.
Pay attention, when you compute distances, to use the correct distance function: ORB, FREAK and BRISK rely on the Hamming distance, while SIFT and SURF use the more usual L2 distance.
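A small sketch of that choice with OpenCV's BFMatcher (the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <vector>

// Match descriptors with the norm that fits their type:
// binary descriptors (ORB, FREAK, BRISK) -> Hamming distance,
// float descriptors (SIFT, SURF)         -> L2 distance.
std::vector<cv::DMatch> matchDescriptors(const cv::Mat& d1, const cv::Mat& d2)
{
    const bool isBinary = (d1.type() == CV_8U);
    cv::BFMatcher matcher(isBinary ? cv::NORM_HAMMING : cv::NORM_L2);

    std::vector<cv::DMatch> matches;
    matcher.match(d1, d2, matches);
    return matches;
}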
Match filtering
When you have individual matches, you may want to perform match filtering in order to reject matches that look good individually but arise from scene ambiguities. Think for example of a keypoint that originates from the corner of a window of a house. It is then very likely to match with another window in another house, but this may not be the right house or the right window.
You have several ways of doing it:
RANSAC performs a consistency check of the computed matches against the current solution estimate. Basically, it picks some matches at random, computes a solution to the problem (usually a geometric transform between 2 images), and then counts how many of the matches agree with this estimate. The estimate with the highest count of inliers wins;
David Lowe performed another kind of filtering in the original SIFT paper.
He kept the two best candidates for a match with a given query keypoint, i.e., the points that had the lowest distance (or highest similarity). Then, he computed the ratio similarity(query, best) / similarity(query, 2nd best). If this ratio is too low, the second best is also a good candidate for the match, so the matching result is dubbed ambiguous and rejected.
Exactly how you should do it in your case is thus very likely to depend on your exact application.
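As an illustration, here is how Lowe's ratio test is commonly written with OpenCV's knnMatch (the 0.75 threshold is a commonly used value, not something prescribed above, and the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <vector>

// Keep a match only if its best candidate is clearly better than the second best.
std::vector<cv::DMatch> ratioTest(const cv::Mat& queryDescs,
                                  const cv::Mat& trainDescs,
                                  float ratio = 0.75f)
{
    cv::BFMatcher matcher(cv::NORM_L2);                 // NORM_HAMMING for binary descriptors
    std::vector<std::vector<cv::DMatch> > knn;
    matcher.knnMatch(queryDescs, trainDescs, knn, 2);   // two best candidates

    std::vector<cv::DMatch> good;
    for (size_t i = 0; i < knn.size(); ++i)
    {
        if (knn[i].size() == 2 &&
            knn[i][0].distance < ratio * knn[i][1].distance)
        {
            good.push_back(knn[i][0]);                  // unambiguous match
        }
    }
    return good;
}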
Your specific case
In your case, you want to develop an alternate feature descriptor that is based on neighbouring keypoints.
The sky is obviously the limit here, but here are some steps that I would follow:
make your descriptor rotation and scale invariant by computing the PCA of the keypoint locations:
// Form a matrix from keypoint locations in the current image (one row per keypoint)
cv::Mat allKeyPointsMatrix = gatherAllKeypoints(keypoints);
// Compute the PCA basis, keeping the 2 principal components
cv::PCA currentPCA(allKeyPointsMatrix, cv::Mat(), CV_PCA_DATA_AS_ROW, 2);
// Reproject the keypoints into the new basis
cv::Mat normalizedKeyPoints = currentPCA.project(allKeyPointsMatrix);
(optional) sort the keypoints in a quadtree or kd-tree for faster spatial indexing
Compute for each keypoint a descriptor that is (for example) the offsets in normalized coordinates of the 4 or 5 closest keypoints (a rough sketch of this step follows after the list)
Do the same in your query image
Match keypoints from both images based on these new descriptors.
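Here is a rough, brute-force sketch of the neighbour-offset descriptor (no kd-tree, purely for illustration; it assumes the PCA-projected matrix from the snippet above is CV_32F with one row per keypoint, and the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// For each keypoint, store the offsets (dx, dy) to its k nearest neighbours
// in the PCA-normalized coordinates. Returns one descriptor row per keypoint.
cv::Mat neighbourOffsetDescriptors(const cv::Mat& normalizedKeyPoints, int k = 4)
{
    const int n = normalizedKeyPoints.rows;
    cv::Mat descriptors(n, 2 * k, CV_32F, cv::Scalar(0));

    for (int i = 0; i < n; ++i)
    {
        // Collect (squared distance, index) pairs to all other keypoints.
        std::vector<std::pair<float, int> > dists;
        for (int j = 0; j < n; ++j)
        {
            if (j == i) continue;
            float dx = normalizedKeyPoints.at<float>(j, 0) - normalizedKeyPoints.at<float>(i, 0);
            float dy = normalizedKeyPoints.at<float>(j, 1) - normalizedKeyPoints.at<float>(i, 1);
            dists.push_back(std::make_pair(dx * dx + dy * dy, j));
        }
        std::sort(dists.begin(), dists.end());

        // Keep the offsets of the k closest neighbours.
        for (int m = 0; m < k && m < (int)dists.size(); ++m)
        {
            int j = dists[m].second;
            descriptors.at<float>(i, 2 * m)     = normalizedKeyPoints.at<float>(j, 0) - normalizedKeyPoints.at<float>(i, 0);
            descriptors.at<float>(i, 2 * m + 1) = normalizedKeyPoints.at<float>(j, 1) - normalizedKeyPoints.at<float>(i, 1);
        }
    }
    return descriptors;
}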
What is it you are trying to do exactly? More information is needed to give you a good answer; otherwise it will have to be very broad and most likely not useful to your needs.
And with your statement "determining whether the matched keypoints are relatively in the same locations on both", do you mean literally at the same x, y positions between the 2 images?
I would try out the SURF algorithm. It works extremely well for what you described above (though I found it to be a bit slow unless you use GPU acceleration: 5 fps vs 34 fps).
Here is the tutorial for SURF. I personally found it very useful, but the executables are for Linux users only. However, you can simply remove the OS-specific bindings in the source code, keep only the OpenCV-related bindings, and have it compile and run just the same.
https://code.google.com/p/find-object/#Tutorials
Hope this helped!
You can filter on the pixel distance between two matched keypoints.
Let's say matches is your vector of matches, kp_1 your vector of keypoints in the first picture and kp_2 in the second. You can use the code below to eliminate obviously incorrect matches; you just need to fix a threshold.
double threshold = YourValue;          // maximum allowed pixel distance
vector<DMatch> good_matches;
for (size_t i = 0; i < matches.size(); i++)
{
    Point2f p1 = kp_1[matches[i].queryIdx].pt;
    Point2f p2 = kp_2[matches[i].trainIdx].pt;
    double dist_p = sqrt((p1.x - p2.x) * (p1.x - p2.x) +
                         (p1.y - p2.y) * (p1.y - p2.y));
    if (dist_p < threshold)
    {
        good_matches.push_back(matches[i]);
    }
}

displacement between two images using opencv surf

I am working on image processing with OpenCV.
I want to find the x, y and rotational displacement between two images in OpenCV.
I have found the features of the images using SURF and the features have been matched.
Now I want to find the displacement between the images. How do I do that? Can RANSAC be useful here?
regards,
shiksha
Rotation plus two translations gives three unknowns, so your minimum number of matches is two (since each match delivers two equations, or constraints). Indeed, imagine a line segment between two points in one image and the corresponding (matched) line segment in the other image. The difference between the segments' orientations gives you the rotation angle. After you have rotated, just use any of the matched points to find the translation. Thus this is a 3-DOF problem that requires two points. It is called a Euclidean transformation, rigid body transformation, or orthogonal Procrustes.
Using a homography (an 8-DOF problem) that has no closed-form solution and relies on non-linear optimization is a bad idea. It is slow (in the RANSAC case) and inaccurate, since it adds 5 extra DOF. RANSAC is only needed if you have outliers. In the case of pure noise and an overdetermined system (more than 2 points), the optimal solution that minimizes the sum of squared geometric distances between matched points is given in closed form by:
Problem statement: min ||R*P + t - Q||^2, with R the rotation and t the translation
Solution: R = V*U^T, t = Q_mean - R*P_mean
where X = P - P_mean, Y = Q - Q_mean, and the SVD gives X*Y^T = U*L*V^T; all matrices have the data points as columns. For a gentle intro to rigid transformations see this
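A sketch of that closed-form solution using OpenCV's SVD (no outlier handling, so run it on inlier matches; the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <vector>

// Closed-form 2-D rigid (Euclidean) transform Q ~ R*P + t (Procrustes/Kabsch).
void rigidTransform2D(const std::vector<cv::Point2f>& P,
                      const std::vector<cv::Point2f>& Q,
                      cv::Matx22d& R, cv::Point2d& t)
{
    CV_Assert(P.size() == Q.size() && P.size() >= 2);

    // Centroids P_mean and Q_mean
    cv::Point2d pMean(0, 0), qMean(0, 0);
    for (size_t i = 0; i < P.size(); ++i)
    {
        pMean += cv::Point2d(P[i].x, P[i].y);
        qMean += cv::Point2d(Q[i].x, Q[i].y);
    }
    pMean *= 1.0 / P.size();
    qMean *= 1.0 / Q.size();

    // Cross-covariance H = X*Y^T = sum over i of (p_i - P_mean)(q_i - Q_mean)^T
    cv::Matx22d H(0, 0, 0, 0);
    for (size_t i = 0; i < P.size(); ++i)
    {
        cv::Point2d x(P[i].x - pMean.x, P[i].y - pMean.y);
        cv::Point2d y(Q[i].x - qMean.x, Q[i].y - qMean.y);
        H += cv::Matx22d(x.x * y.x, x.x * y.y,
                         x.y * y.x, x.y * y.y);
    }

    // SVD: H = U*L*V^T, optimal rotation R = V*U^T (with a reflection guard)
    cv::SVD svd(cv::Mat(H));
    cv::Mat Rm = svd.vt.t() * svd.u.t();
    if (cv::determinant(Rm) < 0)            // avoid mirroring
    {
        cv::Mat D = cv::Mat::eye(2, 2, CV_64F);
        D.at<double>(1, 1) = -1;
        Rm = svd.vt.t() * D * svd.u.t();
    }
    R = cv::Matx22d(Rm.at<double>(0, 0), Rm.at<double>(0, 1),
                    Rm.at<double>(1, 0), Rm.at<double>(1, 1));

    // t = Q_mean - R*P_mean
    t = qMean - cv::Point2d(R(0, 0) * pMean.x + R(0, 1) * pMean.y,
                            R(1, 0) * pMean.x + R(1, 1) * pMean.y);
}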

Measure of accuracy in pattern recognition using SURF in OpenCV

I'm currently working on pattern recognition using SURF in OpenCV. What I have so far: I've written a program in C# where I can select a source image and a template which I want to find. After that, I transfer both pictures into a C++ DLL where I've implemented a program using the OpenCV SURF detector, which returns all the keypoints and matches back to my C# program, where I try to draw a rectangle around my matches.
Now my question: Is there a common measure of accuracy in pattern recognition? For example, the number of matches in proportion to the number of keypoints in the template? Or maybe the size difference between my match rectangle and the original size of the template image? What are common parameters that are used to say whether a match is a "real" and "good" match?
Edit: To make my question clearer: I have a bunch of match points that are already thresholded by minHessian and distance value. After that I draw something like a rectangle around my match points, as you can see in my picture. This is my MATCH. How can I tell now how good this match is? I'm already calculating angle, size and color differences between my found match and my template, but I think that is much too vague.
I am not 100% sure what you are really asking, because what you call a "match" is vague. But since you said you already matched your SURF points and mentioned pattern recognition and the use of a template, I am assuming that, ultimately, you want to localize the template in your image, and you are asking about a localization score to decide whether you found the template in the image or not.
This is a challenging problem, and I am not aware that a good and always-appropriate solution has been found yet.
However, given your approach, what you could do is analyze the density of matched points in your image: consider local or global maxima as possible locations for your template (global if you know your template appears only once in the image, local if it can appear multiple times) and use a threshold on the density to decide whether or not the template appears. A sketch of the algorithm could be something like this (a rough code version follows after the steps):
Allocate a floating point density map of the size of your image
Compute the density map by increasing it by a fixed amount in the neighborhood of each matched point (for instance, for each matched point, add a fixed value epsilon in the rectangle you are displaying in your question)
Find the global or local maxima of the density map (the global maximum can be found using the OpenCV function minMaxLoc, and local maxima can be found using morphological operations, e.g. How can I find local maxima in an image in MATLAB?)
For each maximum obtained, compare the corresponding density value to a threshold tau, to decide whether your template is there or not
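A bare-bones version of that density map (the neighbourhood half-size and the increment epsilon are placeholders you would tune; only the global maximum is returned here, and the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <vector>

// Build a density map from matched keypoint locations and return its global peak.
cv::Point densityPeak(const std::vector<cv::Point2f>& matchedPts,
                      cv::Size imageSize,
                      int halfWindow = 40,        // half-size of the neighbourhood
                      float epsilon  = 1.0f)      // increment per matched point
{
    cv::Mat density = cv::Mat::zeros(imageSize, CV_32F);

    for (size_t i = 0; i < matchedPts.size(); ++i)
    {
        // Rectangle around the matched point, clipped to the image borders
        cv::Rect roi(cvRound(matchedPts[i].x) - halfWindow,
                     cvRound(matchedPts[i].y) - halfWindow,
                     2 * halfWindow + 1, 2 * halfWindow + 1);
        roi &= cv::Rect(0, 0, imageSize.width, imageSize.height);

        cv::Mat window = density(roi);
        window += epsilon;                        // raise the density in the window
    }

    double maxVal;
    cv::Point maxLoc;
    cv::minMaxLoc(density, 0, &maxVal, 0, &maxLoc);

    // Compare maxVal against a threshold tau to decide whether the template is there.
    return maxLoc;
}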
If you are into research articles, you can check the following ones for improvements of this basic algorithm:
"ADABOOST WITH KEYPOINT PRESENCE FEATURES FOR REAL-TIME VEHICLE VISUAL DETECTION", by T.Bdiri, F.Moutarde, N.Bourdis and B.Steux, 2009.
"Interleaving Object Categorization and Segmentation", by B.Leibe and B.Schiele, 2006.
EDIT: another way to address your problem is to try to remove accidentally matched points in order to keep only those truly corresponding to your template image. This can be done by enforcing a consistency constraint between nearby matched points. The following research article presents an approach like this: "Context-dependent logo matching and retrieval", by H. Sahbi, L. Ballan, G. Serra, A. Del Bimbo, 2010 (however, this may require some background knowledge...).
Hope this helps.
Well, when you compare points you use some metric, so the result of a comparison is a distance, and the smaller this distance, the better the match.
Example of code:
BFMatcher matcher(NORM_L2,true);
vector<DMatch> matches;
matcher.match(descriptors1, descriptors2, matches);
matches.erase(std::remove_if(matches.begin(),matches.end(),bad_dist),matches.end());
where bad_dist is defined as
bool bad_dist(const DMatch &m) {
    return m.distance > 150;
}
In this code I get rid of the 'bad' matches.
There are many ways to match two patterns in the same image; it is actually a very open topic in computer vision, because there is no globally best solution.
For instance, if you know your object can appear rotated (I'm not familiar with SURF, but I guess the descriptors are rotation invariant, like SIFT descriptors), you can estimate the rotation between the pattern you have in the training set and the pattern you just matched. A match with the minimum error will be a better match.
I recommend you consult Computer Vision: Algorithms and Applications. There's no code in it, but lots of useful techniques typically used in computer vision (most of them already implemented in OpenCV).

Is there a feature extraction method that is scale invariant but not rotation invariant?

Is there a feature extraction method that is scale invariant but not rotation invariant? I would like to match similar images that have been scaled but not rotated...
EDIT: Let me rephrase. How can I check whether an image is a scaled version (or close to) of an original?
Histograms and Gaussian pyramids are used to extract scale-invariant features.
How can I check whether an image is a scaled version (or close to) of an original?
It's a puzzle for me. Do you mean that, given two images, one is the original and the other is a scaled version of it? Or is one the original and the other a fragment of the original, but scaled, and you want to locate the fragment in the original?
[updated]
Given two images, a and b:
Detect their SIFT or SURF feature points and descriptors.
Get the corresponding regions between a and b. If there are none, return false. Refer to Pattern Matching - Find reference object in second image and Trying to match two images using sift in OpenCv, but too many matches. Name the region in a as Ra, and the one in b as Rb.
Use an algorithm like template matching to determine whether Ra is similar enough to Rb. If yes, calculate the scale ratio (one possible sketch follows below).
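One simple way to estimate that scale ratio, assuming SIFT/SURF keypoints whose size field reflects the detection scale (a sketch, not a validated recipe; the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Estimate the scale factor between image a and image b from matched keypoints.
double estimateScaleRatio(const std::vector<cv::KeyPoint>& kpA,
                          const std::vector<cv::KeyPoint>& kpB,
                          const std::vector<cv::DMatch>& matches)
{
    std::vector<double> ratios;
    for (size_t i = 0; i < matches.size(); ++i)
    {
        const cv::KeyPoint& a = kpA[matches[i].queryIdx];
        const cv::KeyPoint& b = kpB[matches[i].trainIdx];
        if (a.size > 0)
            ratios.push_back(b.size / a.size);   // per-match scale estimate
    }
    if (ratios.empty())
        return 0.0;                              // no usable matches

    // The median is robust against a few bad matches.
    std::nth_element(ratios.begin(), ratios.begin() + ratios.size() / 2, ratios.end());
    return ratios[ratios.size() / 2];
}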

stitching aerial images

I am trying to stitch 2 aerial images together with very little overlap, probably <500 px of overlap. These images have 3600x2100 resolution. I am using the OpenCV library to complete this task.
Here is my approach:
1. Find feature points and match points between the two images.
2. Find homography between two images
3. Warp one of the images using the homography
4. Stitch the two images
Right now I am trying to get this to work with two images. I am having trouble with step 3 and possibly step 2. I used findHomography() from the OpenCV library to get the homography between the two images. Then I called warpPerspective() on one of my images using that homography.
The problem with this approach is that the transformed image is all distorted. Also, it seems to only transform a certain part of the image. I have no idea why it is not transforming the whole image.
Can someone give me some advice on how I should approach this problem? Thanks
In the results that you have posted, I can see that you have at least one keypoint mismatch. If you use findHomography(src, dst, 0), it will mess up your homography. You should use findHomography(src, dst, CV_RANSAC) instead.
You can also try to use warpAffine instead of warpPerspective.
Edit: From the results that you posted in the comments to your question, I had the impression that the matching worked quite reliably. That means that you should be able to get good results with this example as well. Since you mostly seem to have to deal with translation, you could try to filter out the outliers with the following sketched algorithm (a code sketch follows after the list):
calculate the average (or median) motion vector x_avg
calculate the normalized dot product <x_avg, x_match>
discard x_match if the dot product is smaller than a threshold
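A hedged code version of that sketch (the 0.9 cosine threshold is only an illustrative value, and the function name is a placeholder):
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Keep only matches whose motion vector points roughly in the same direction
// as the average motion vector.
std::vector<cv::DMatch> filterByMotionDirection(
    const std::vector<cv::KeyPoint>& kp1,
    const std::vector<cv::KeyPoint>& kp2,
    const std::vector<cv::DMatch>& matches,
    double threshold = 0.9)
{
    // 1. Average motion vector x_avg
    cv::Point2d avg(0, 0);
    for (size_t i = 0; i < matches.size(); ++i)
    {
        cv::Point2f d = kp2[matches[i].trainIdx].pt - kp1[matches[i].queryIdx].pt;
        avg += cv::Point2d(d.x, d.y);
    }
    if (!matches.empty())
        avg *= 1.0 / matches.size();

    std::vector<cv::DMatch> kept;
    for (size_t i = 0; i < matches.size(); ++i)
    {
        cv::Point2f d = kp2[matches[i].trainIdx].pt - kp1[matches[i].queryIdx].pt;
        cv::Point2d v(d.x, d.y);

        double denom = std::sqrt(avg.ddot(avg)) * std::sqrt(v.ddot(v));
        if (denom == 0)
            continue;

        // 2./3. Normalized dot product <x_avg, x_match>; discard if below threshold
        if (avg.ddot(v) / denom >= threshold)
            kept.push_back(matches[i]);
    }
    return kept;
}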
To make it work for images with a smaller overlap, you would have to look at the detector, the descriptors and the matches. You do not specify which descriptors you work with, but I would suggest using SIFT or SURF descriptors and the corresponding detectors. You should also set the detector parameters to get a denser sampling (i.e., try to detect more features).
You can refer to this answer which is slightly related: OpenCV - Image Stitching
To stitch images using a homography, the most important thing to take care of is finding correspondence points in both images. The fewer the outliers in the correspondence points, the better the generated homography.
Using a robust technique such as RANSAC along with OpenCV's findHomography() function (use CV_RANSAC as the method) will still generate a reasonable homography, provided the percentage of inliers is higher than the percentage of outliers. Also make sure that there are at least 4 inliers among the correspondence points passed to the findHomography function.