When comparing 2 images via feature extraction, how do you compare keypoint distances so as to disregard those that are obviously incorrect?
I've found that when comparing similar images against each other, most of the time it can be fairly accurate, but other times it throws up matches that are completely wrong.
So I'm after a way of looking at the 2 sets of keypoints from both images and determining whether the matched keypoints are relatively in the same locations on both. As in, it knows that keypoints 1, 2, and 3 are a certain distance apart on image 1, so the corresponding keypoints matched on image 2 should be a fairly similar distance from each other again.
I've used RANSAC and minimum distance checks in the past, but only to some effect; they don't seem to be as thorough as I'm after.
(Using ORB and BruteForce)
EDIT
Changed "x, y, and z" to "1, 2, and 3"
EDIT 2 -- I'll try to explain further with quick Paint made examples:
Say I have this as my image:
And I give it this image to compare against:
It's a cropped and squashed version of the original, but obviously similar.
Now, say you ran it through feature detection and it came back with these results for the keypoints for the two images:
The keypoints on both images are in roughly the same areas, and proportionately the same distance away from each other. Take the keypoint I've circled; let's call it "Image 1 Keypoint 1".
We can see that there are 5 keypoints around it. It's the distances between them and "Image 1 Keypoint 1" that I want to obtain, so as to compare them against "Image 2 Keypoint 1" and its 5 surrounding keypoints in the same area (see below), so that I'm not just comparing a keypoint to another keypoint, but comparing "known shapes" based on the locations of the keypoints.
--
Does that make sense?
Keypoint matching is a problem with several dimensions. These dimensions are:
spatial distance, i.e., the (x,y) distance measured between the locations of two keypoints in different images;
feature distance, i.e., a distance that describes how much two keypoints look alike.
Depending on your context, you do not want to compute the same distance, or you want to combine both. Here are some use cases:
optical flow, as implemented by OpenCV's sparse Lucas-Kanade optical flow. In this case, keypoints called good features are computed in each frame, then matched on a spatial distance basis. This works because the image is supposed to change relatively slowly (the input frames have a video framerate);
image stitching, as you can implement from OpenCV's features2d (free or non-free). In this case, the images change radically since you move your camera around. Then, your goal becomes to find stable points, i.e., points that are present in two or more images whatever their location is. In this case you will use feature distance. This also holds when you have a template image of an object that you want to find in query images.
In order to compute feature distance, you need to compute a coded version of their appearance. This operation is performed by the DescriptorExtractor class.
Then, you can compute distances between the output of the descriptions: if the distance between two descriptions is small then the original keypoints are very likely to correspond to the same scene point.
Pay attention when you compute distances to use the correct distance function: ORB, FREAK, and BRISK rely on the Hamming distance, while SIFT and SURF use the more usual L2 distance.
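For illustration, here is a minimal sketch of the ORB + Hamming case with the brute-force matcher, assuming the OpenCV 3+ C++ API (where detection and description are combined in detectAndCompute); for SIFT or SURF you would construct the matcher with cv::NORM_L2 instead:
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// Binary descriptors (ORB, BRISK, FREAK) are compared with the Hamming distance;
// float descriptors (SIFT, SURF) would use cv::NORM_L2 instead.
std::vector<cv::DMatch> matchOrb(const cv::Mat& img1, const cv::Mat& img2)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kps1, kps2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kps1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kps2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING);   // the norm must match the descriptor type
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    return matches;
}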
Match filtering
When you have individual matches, you may want to perform match filtering in order to reject matches that look good individually but actually arise from scene ambiguities. Think for example of a keypoint that originates from the corner of a window of a house. It is then very likely to match with another window in another house, but this may not be the right house or the right window.
You have several ways of doing it:
RANSAC performs a consistency check of the computed matches against the current solution estimate. Basically, it picks some matches at random, computes a solution to the problem (usually a geometric transform between 2 images) and then counts how many of the matches agree with this estimate. The estimate with the highest count of inliers wins;
David Lowe performed another kind of filtering in the original SIFT paper.
He kept the two best candidates for a match with a given query keypoint, i.e., the points that had the lowest distance (or highest similarity). Then, he computed the ratio similarity(query, best)/similarity(query, 2nd best). If this ratio is too close to 1, the second best is almost as good a candidate for a match, and the matching result is deemed ambiguous and rejected.
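With knnMatch this filter only takes a few lines. A rough sketch, assuming ORB-style binary descriptors and a 0.75 threshold (an assumed value in the range Lowe suggested, to be tuned):
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// Lowe's ratio test: keep a match only if the best candidate is clearly better
// than the second best one.
std::vector<cv::DMatch> ratioTestFilter(const cv::Mat& queryDescs, const cv::Mat& trainDescs,
                                        float ratio = 0.75f)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);            // ORB-style binary descriptors
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(queryDescs, trainDescs, knn, 2);   // two best candidates per query keypoint

    std::vector<cv::DMatch> good;
    for (const auto& pair : knn)
        if (pair.size() == 2 && pair[0].distance < ratio * pair[1].distance)
            good.push_back(pair[0]);
    return good;
}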
Exactly how you should do it in your case is thus very likely to depend on your exact application.
Your specific case
In your case, you want to develop an alternate feature descriptor that is based on neighbouring keypoints.
The sky is obviously the limit here, but here are some steps that I would follow:
make your descriptor rotation- and scale-invariant by computing the PCA of the keypoint locations:
// Form an N x 2 matrix of (x, y) keypoint locations in the current image
// (gatherAllKeypoints is a placeholder for that conversion)
cv::Mat allKeyPointsMatrix = gatherAllKeypoints(keypoints);
// Compute the PCA basis (one row per sample, keep 2 components)
cv::PCA currentPCA(allKeyPointsMatrix, cv::Mat(), cv::PCA::DATA_AS_ROW, 2);
// Reproject the keypoints into the new basis
cv::Mat normalizedKeyPoints = currentPCA.project(allKeyPointsMatrix);
(optional) sort the keypoints in a quadtree or kd-tree for faster spatial indexing
Compute for each keypoint a descriptor that is (for example) the offsets in normalized coordinates of the 4 or 5 closest keypoints
Do the same in your query image
Match keypoints from both images based on these new descriptors (see the sketch below).
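For illustration, a rough C++ sketch of the PCA step and the neighbour-offset descriptor (a brute-force neighbour search is used instead of a kd-tree, and k = 4 is an arbitrary choice); this is a sketch of the idea, not a tuned implementation:
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Describe each keypoint by the offsets to its k closest neighbours, measured in a
// PCA-normalized coordinate frame of the keypoint locations.
static cv::Mat buildNeighbourDescriptors(const std::vector<cv::KeyPoint>& kps, int k = 4)
{
    cv::Mat locations((int)kps.size(), 2, CV_32F);
    for (int i = 0; i < (int)kps.size(); ++i) {
        locations.at<float>(i, 0) = kps[i].pt.x;
        locations.at<float>(i, 1) = kps[i].pt.y;
    }

    // PCA gives a rotation-normalized frame; dividing each projected axis by the
    // square root of its eigenvalue would additionally normalize scale.
    cv::PCA pca(locations, cv::Mat(), cv::PCA::DATA_AS_ROW, 2);
    cv::Mat normalized = pca.project(locations);

    cv::Mat descriptors = cv::Mat::zeros((int)kps.size(), 2 * k, CV_32F);
    for (int i = 0; i < normalized.rows; ++i) {
        // Brute-force search for the k nearest neighbours of keypoint i
        std::vector<std::pair<float, int>> dists;
        for (int j = 0; j < normalized.rows; ++j) {
            if (j == i) continue;
            float dx = normalized.at<float>(j, 0) - normalized.at<float>(i, 0);
            float dy = normalized.at<float>(j, 1) - normalized.at<float>(i, 1);
            dists.push_back({ std::sqrt(dx * dx + dy * dy), j });
        }
        std::sort(dists.begin(), dists.end());
        for (int n = 0; n < k && n < (int)dists.size(); ++n) {
            int j = dists[n].second;
            descriptors.at<float>(i, 2 * n)     = normalized.at<float>(j, 0) - normalized.at<float>(i, 0);
            descriptors.at<float>(i, 2 * n + 1) = normalized.at<float>(j, 1) - normalized.at<float>(i, 1);
        }
    }
    return descriptors;
}
The two keypoint sets can then be matched on these purely spatial descriptors with a standard L2 brute-force matcher, e.g. cv::BFMatcher(cv::NORM_L2).match(buildNeighbourDescriptors(kps1), buildNeighbourDescriptors(kps2), matches).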
What is it you are trying to do exactly? More information is needed to give you a good answer. Otherwise it'll have to be very broad and most likely not useful to your needs.
And with your statement "determining whether the matched keypoints are relatively in the same locations on both" do you mean literally on the same x,y positions between 2 images?
I would try out the SURF algorithm. It works extremely well for what you described above (though I found it to be a bit slow unless you use GPU acceleration: 5 fps vs 34 fps).
Here is the tutorial for SURF. I personally found it very useful, but the executables are for Linux users only. However, you can simply remove the OS-specific bindings in the source code, keep only the OpenCV-related bindings, and have it compile and run on other platforms just the same.
https://code.google.com/p/find-object/#Tutorials
Hope this helped!
You can filter on the pixel distance between two matched keypoints.
Let's say matches is your vector of matches, kp_1 your vector of keypoints on the first picture and kp_2 on the second. You can use the code below to eliminate obviously incorrect matches; you just need to fix a threshold.
double threshold = YourValue;
vector<DMatch> good_matches;
// matches is assumed to come from knnMatch, so matches[i][0] is the best candidate
for (size_t i = 0; i < matches.size(); i++)
{
    // Euclidean distance in pixels between the two matched keypoint locations
    Point2f d = kp_1[matches[i][0].queryIdx].pt - kp_2[matches[i][0].trainIdx].pt;
    double dist_p = sqrt(d.x * d.x + d.y * d.y);
    if (dist_p < threshold)
    {
        good_matches.push_back(matches[i][0]);
    }
}
Related
As is known, for tracking objects in OpenCV we can use:
FeatureDetector to find features
DescriptorMatcher to match similarity between features of the desired object and features of current frame in video
and then use findHomography to find the new position of the object
For matching features, DescriptorMatcher uses the Hamming distance (the number of positions at which two binary descriptors of the same length differ, not the distance between coordinates).
I.e., we find the most similar object in the current frame, but not the one nearest to the previous position (if we know it).
How can we match using both the Hamming distance and the distance between coordinates, for example by weighting both, rather than the Hamming distance only?
It could solve the following problems:
If we start to track an object from position (x,y) on the previous frame, and the current frame contains two similar objects, then we will find the most similar one, but not the nearest one. But due to inertia, coordinates usually change more slowly than similarity (a sharp change in light or a rotation of the object), so we should find the similar object with the nearest coordinates.
Thus we find the features which are not only the most similar, but which will give the most accurate homography, because we exclude features which, although very similar, are very far away in coordinates and most likely belong to other objects.
What you need is probably something like:
Compute matches as usual.
DMatch has queryIdx and trainIdx indices. You can use these to retrieve the corresponding keypoints. Compute the Euclidean distance between them, and update the distance value of the DMatch with some kind of weighting function.
Sort matches by distance (since distance has changed).
Now the matches vector is sorted according to both the Hamming distance between descriptors and the Euclidean distance between keypoints.
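A minimal sketch of that idea in OpenCV C++; the ORB/Hamming setup and the alpha weight are assumptions you would adapt to your data:
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Blend the descriptor (Hamming) distance and the pixel distance into one score,
// then re-sort the matches on that blended score. alpha is an assumed weight.
std::vector<cv::DMatch> matchWithSpatialPrior(const cv::Mat& desc1, const cv::Mat& desc2,
                                              const std::vector<cv::KeyPoint>& kp1,
                                              const std::vector<cv::KeyPoint>& kp2,
                                              float alpha = 0.5f)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);    // ORB/BRISK-style binary descriptors
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    for (cv::DMatch& m : matches) {
        // Euclidean distance in pixels between the matched keypoint locations
        cv::Point2f d = kp1[m.queryIdx].pt - kp2[m.trainIdx].pt;
        float pixelDist = std::hypot(d.x, d.y);
        m.distance = alpha * m.distance + (1.0f - alpha) * pixelDist;
    }
    std::sort(matches.begin(), matches.end());  // DMatch sorts by its distance field
    return matches;
}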
I think there's no built-in method in OpenCV to do that.
What I'd do is use cv::DescriptorMatcher::radiusMatch. It finds all the matches that are under a certain Hamming distance. You'd need to find a radius/distance that ensures the features are similar enough for your application, but not so big that it makes the whole calculation slow.
Then, from these features you can choose the one that's closest to the position where you predicted the feature would be, or calculate some kind of weighted score based on the Hamming distance and the coordinate distance, etc.
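A rough sketch of that last idea: collect every candidate within a Hamming radius with radiusMatch, then keep the one closest to the predicted position (the 64.0 radius is an arbitrary assumed value):
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <cmath>
#include <limits>
#include <vector>

// queryDesc is a single-row descriptor; returns a default-constructed DMatch
// (trainIdx == -1) if no candidate fell inside the radius.
cv::DMatch pickNearestWithinRadius(const cv::Mat& queryDesc, const cv::Mat& trainDesc,
                                   const std::vector<cv::KeyPoint>& trainKps,
                                   const cv::Point2f& predictedPos,
                                   float maxHamming = 64.0f)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch>> candidates;
    matcher.radiusMatch(queryDesc, trainDesc, candidates, maxHamming);

    cv::DMatch best;
    double bestDist = std::numeric_limits<double>::max();
    for (const cv::DMatch& m : candidates[0]) {
        // Pixel distance between the candidate keypoint and the predicted position
        cv::Point2f d = trainKps[m.trainIdx].pt - predictedPos;
        double dist = std::hypot(d.x, d.y);
        if (dist < bestDist) { bestDist = dist; best = m; }
    }
    return best;
}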
PROBLEM
I have a picture that is taken from a swinging vehicle. For simplicity I have converted it into a black and white image. An example is shown below:
The image shows the high intensity returns and has a pattern in it that is found in all of the valid images; it is circled in red. This image can be taken from multiple angles depending on the rotation of the vehicle. Another example is here:
The intention here is to attempt to identify the picture cells in which this pattern exists.
CURRENT APPROACHES
I have tried a couple of methods so far. I am using MATLAB to test but will eventually be implementing this in C++. It is desirable for the algorithm to be time efficient; however, I am interested in any suggestions.
SURF (Speeded Up Robust Features) Feature Recognition
I tried the default MATLAB implementation of SURF to attempt to find features. MATLAB's SURF is able to identify features in 2 examples (not the same as above); however, it is not able to identify common ones:
I know that the points are different but the pattern is still somewhat identifiable. I have tried on multiple sets of pictures and there are almost never common points. From reading about SURF it seems like it is not robust to skewed images anyway.
Perhaps some recommendations on pre-processing here?
Template Matching
So template matching was tried but is definitely not ideal for the application because it is not robust to scale or skew change. I am open to pre-processing ideas to fix the skew. This could be quite easy, some discussion on extra information on the picture is provided further down.
For now let's investigate template matching. Say we have the following two images as the template and the current image:
The template is chosen from one of the most forward facing images. And using it on a very similar image we can match the position:
But then (and somewhat obviously) if we change the picture to a different angle it won't work. Of course we expect this because the template no longer looks like the pattern in the image:
So we obviously need some pre-processing work here as well.
Hough Lines and RANSAC
Hough lines and RANSAC might be able to identify the lines for us but then how do we get the pattern position?
Other that I don't know about yet
I am pretty new to the image processing scene so I would love to hear about any other techniques that would suit this simple yet difficult image recognition problem.
The sensor and how it will help pre-processing
The sensor is a 3D laser; it has been turned into an image for this experiment but still retains its distance information. If we plot with distance scaled from 0 - 255 we get the following image:
Lighter is further away. This could definitely help us to align the image; any thoughts on the best way? So far I have thought of things like calculating the normals of the cells that are not 0. We could also do some sort of gradient descent or least squares fitting such that the difference in distance is 0, which could align the image so that it is always straight. The problem with that is that the solid white stripe is further away; maybe we could segment that out? We are building algorithms on top of our algorithms at that point, so we need to be careful that this doesn't become a monster.
Any help or ideas would be great, I am happy to look into any serious answer!
I came up with the following program to segment the regions and hopefully locate the pattern of interest using template matching. I've added some comments and figure titles to explain the flow and some resulting images. Hope it helps.
im = imread('sample.png');
gr = rgb2gray(im);
bw = im2bw(gr, graythresh(gr));
bwsm = imresize(bw, .5);
dism = bwdist(bwsm);
dismnorm = dism/max(dism(:));
figure, imshow(dismnorm, []), title('distance transformed')
eq = histeq(dismnorm);
eqcl = imclose(eq, ones(5));
figure, imshow(eqcl, []), title('histogram equalized and closed')
eqclbw = eqcl < .2; % .2 worked for samples given
eqclbwcl = imclose(eqclbw, ones(5));
figure, imshow(eqclbwcl, []), title('binarized and closed')
filled = imfill(eqclbwcl, 'holes');
figure, imshow(filled, []), title('holes filled')
% -------------------------------------------------
% template
tmpl = zeros(16);
tmpl(3:4, 2:6) = 1;tmpl(11:15, 13:14) = 1;
tmpl(3:10, 7:14) = 1;
st = regionprops(tmpl, 'orientation');
tmplAngle = st.Orientation;
% -------------------------------------------------
lbl = bwlabel(filled);
stats = regionprops(lbl, 'BoundingBox', 'Area', 'Orientation');
figure, imshow(label2rgb(lbl), []), title('labeled')
% here I just take the largest contour for convenience. should consider aspect ratio and any
% other features that can be used to uniquely identify the shape
[mx, id] = max([stats.Area]);
mxbb = stats(id).BoundingBox;
% resize and rotate the template
tmplre = imresize(tmpl, [mxbb(4) mxbb(3)]);
tmplrerot = imrotate(tmplre, stats(id).Orientation-tmplAngle);
xcr = xcorr2(double(filled), double(tmplrerot));
figure, imshow(xcr, []), title('template matching')
Resized image:
Segmented:
Template matching:
Given the poor image quality (low resolution + binarization), I would prefer template matching because it is based on a simple global measure of similarity and does not attempt to do any feature extraction (there are no reliable features in your samples).
But you will need to apply template matching with rotation. One way is to precompute rotated instances of the template, perform matchings for every angle and keep the best.
It is possible to integrate depth information in the comparison (if that helps).
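For illustration, a rough OpenCV C++ sketch of that rotate-and-match loop; the angle step is an assumed parameter, and template corners clipped by the rotation are ignored for simplicity:
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Try the template at several rotations and keep the best normalized-correlation score.
void bestRotatedMatch(const cv::Mat& image, const cv::Mat& templ, double angleStep,
                      double& bestScore, cv::Point& bestLoc, double& bestAngle)
{
    bestScore = -1.0;
    cv::Point2f center(templ.cols * 0.5f, templ.rows * 0.5f);
    for (double angle = 0.0; angle < 360.0; angle += angleStep) {
        // Rotate the template around its centre (corners are clipped, kept simple here)
        cv::Mat rot = cv::getRotationMatrix2D(center, angle, 1.0);
        cv::Mat rotated;
        cv::warpAffine(templ, rotated, rot, templ.size());

        // Normalized cross-correlation of the rotated template against the whole image
        cv::Mat result;
        cv::matchTemplate(image, rotated, result, cv::TM_CCOEFF_NORMED);

        double maxVal;
        cv::Point maxLoc;
        cv::minMaxLoc(result, nullptr, &maxVal, nullptr, &maxLoc);
        if (maxVal > bestScore) { bestScore = maxVal; bestLoc = maxLoc; bestAngle = angle; }
    }
}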
This is quite similar to the problem of recognising hand-sketched characters that we tackle in our lab, in the sense that the target pattern is binary, low resolution, and liable to moderate deformation.
Based on our experience I don't think SURF is the right way to go: as pointed out elsewhere, it assumes a continuous 2D image, not a binary one, and will break in your case. Template matching is not good for this kind of binary image either: your pixels need only be slightly misaligned to return a low match score, as there is no local spatial coherence in the pixel values to mitigate minor misalignments of the window.
Our approach in this scenario is to try to "convert" the binary image into a continuous or "greyscale" image. For example, see below:
These conversions are made by running a first-derivative edge detector, e.g. convolving the 3x3 template [0 0 0 ; 1 0 -1 ; 0 0 0] and its transpose over image I to get dI/dx and dI/dy.
At any pixel we can get the edge orientation atan2(dI/dy,dI/dx) from these two fields. We treat this information as known at the sketched pixels (the white pixels in your problem) and unknown at the black pixels. We then use a Laplacian smoothness assumption to extrapolate values for the black pixels from the white ones. Details are in this paper:
http://personal.ee.surrey.ac.uk/Personal/J.Collomosse/pubs/Hu-CVIU-2013.pdf
If this is a major hassle you could try using a distance transform instead (convenient in MATLAB using bwdist), but it won't give as accurate results.
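For reference, a rough OpenCV C++ analogue of that bwdist idea (the function name and the min-max normalization are illustrative choices):
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Turn a binary pattern (CV_8U, pattern pixels nonzero) into a smooth "greyscale"
// image via a distance transform.
cv::Mat binaryToContinuous(const cv::Mat& binary)
{
    // distanceTransform measures the distance to the nearest zero pixel, so invert
    // first to get, at every pixel, the distance to the nearest pattern pixel.
    cv::Mat inverted = (binary == 0);
    cv::Mat dist;
    cv::distanceTransform(inverted, dist, cv::DIST_L2, 3);
    cv::normalize(dist, dist, 0.0, 1.0, cv::NORM_MINMAX);   // CV_32F image in [0, 1]
    return dist;
}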
Now we have the "continuous" image (as per right hand column of images above). The greyscale patterns encode the local structure in the image, and are much more amenable to gradient based descriptors like SURF and template matching.
My hunch would be to try template matching first, but since this is affine-sensitive I would go the whole way and use a HOG / Bag of Visual Words approach, again just as in our paper above, to match those patterns.
We have found this pipeline to give state of the art results in sketch-based shape recognition, and my PhD student has successfully used it in subsequent work for matching hieroglyphs, so I think it could have a good shot at working on the kind of pattern you pose in your example images.
I do not think SURF is the right approach to use here. SURF is designed to work on regular 2D intensity images, but what you have here is a 3D point cloud. There is an algorithm for point cloud registration called Iterative Closest Point (ICP). There are several implementations on MATLAB File Exchange, such as this one.
Edit
The Computer Vision System Toolbox now (as of the R2015b release) includes point cloud processing functionality. See this example for point cloud registration and stitching.
I would:
segment image
by Z coordinate (distance from camera/laser): where the Z coordinate jumps by more than a threshold there is a border, either between object and background (if the neighboring Z value is large or out of range), between two objects (if the neighboring Z value is different), or within the object itself (if the neighboring Z value is different but can still be connected to it). This will give you a set of objects.
align to viewer
compute the boundary points of each object (outermost edges), compute their direction via atan2, and rotate the object back so it faces the camera perpendicularly.
Your image looks like a flag marker, so in that case rotation around the Y axis should suffice. You can also scale the object to a predefined distance (if the target is always the same size).
You will need to know the FOV of your camera system and have a calibrated Z axis for this.
now try to identify object
here use what you have by now; you can also add filters, like skipping objects whose size or aspect ratio does not match ... you can use DFT/DCT or compare histograms of the normalized/equalized image, etc. ...
[PS]
for features it is not a good idea to use a 1-bit BW image because you lose too much info. Use gray-scale or color instead (gray-scale is usually enough). I usually add a few simplified histograms of a small area (with a few different radii) around the point of interest, which is invariant to rotation.
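For illustration, a rough OpenCV C++ sketch of such a local-histogram comparison on a grayscale image; the patch radius, bin count and correlation metric are assumed values to tune:
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Compare grey-level histograms of small circular patches around two points of
// interest; a histogram over a circular patch is insensitive to rotation.
double compareLocalHistograms(const cv::Mat& img1, cv::Point p1,
                              const cv::Mat& img2, cv::Point p2,
                              int patchRadius = 16, int bins = 32)
{
    auto localHist = [&](const cv::Mat& img, cv::Point p) {
        cv::Mat mask = cv::Mat::zeros(img.size(), CV_8U);
        cv::circle(mask, p, patchRadius, cv::Scalar(255), cv::FILLED);

        int histSize[] = { bins };
        float range[] = { 0.0f, 256.0f };
        const float* ranges[] = { range };
        int channels[] = { 0 };
        cv::Mat hist;
        cv::calcHist(&img, 1, channels, mask, hist, 1, histSize, ranges);
        cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);   // make the patch size irrelevant
        return hist;
    };
    cv::Mat h1 = localHist(img1, p1), h2 = localHist(img2, p2);
    return cv::compareHist(h1, h2, cv::HISTCMP_CORREL);     // 1.0 means identical histograms
}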
Have a look at log-polar template matching; it is rotation and scale invariant:
http://etd.lsu.edu/docs/available/etd-07072005-113808/unrestricted/Thunuguntla_thesis.pdf
I am doing object detection using feature extraction (SIFT, ORB).
I want to extract ORB features from different points of view of the object (train images) and then match all of them against a query image.
The problem I am facing is: how can I create a good homography from keypoints coming from different points of view of the object, which of course have different sizes?
Edit
I was thinking of creating a homography for each train image that got, say, 3-4 matches and then calculating some "mean" homography...
The problem arises when you have, for example, just 1-2 matches from each train image; at that point you cannot create even one homography.
Code for creating the homography
// For each train image with at least some good matches ??
H = findHomography( train, scene, CV_RANSAC );
perspectiveTransform( trainCorners, sceneCorners, H);
I think there is no point in doing that, as a pair of images A and B has nothing to do with a pair of images B and C when you talk about homographies. You will get different sets of good matches and different homographies, but the homographies will be unrelated and no error minimization would make sense.
All minimization has to happen within the matches, keypoints and descriptors of a single pair of images.
There is an idea similar to what you ask in the FREAK descriptor. You can train the selected pairs with a set of images. That means that FREAK will decide the best pattern for extracting descriptors based on a set of images. After this training you are supposed to find more robust matches that will give you a better homography.
To find a good homography you need accurate matches of your keypoints. You need at least 4 matches.
The most common method is DLT combined with RANSAC. DLT is a linear transform that finds the 3x3 homography matrix that projects your keypoints into the scene. RANSAC finds the best set of inliers/outliers that satisfies the mathematical model, so it will find the best 4 points as input for DLT.
EDIT
You need to find robust keypoints. SIFT is supposed to do that: it is scale and perspective invariant. I don't think you need to train with different images. Finding a mean homography makes no sense. You need to find a single homography for a detected object, and that homography will be the transformation between the marker and the detected object. The homography is precise; there is no point in finding a mean.
Have you tried the approach of getting keypoints from several views of the object (train_kps_1, train_kps_2, ...), matching those arrays against the scene, then selecting the best matches from those several arrays so that you end up with a single array of good matches, and finally using that result as the 'train' side of findHomography? (A sketch of this idea follows the links below.)
The key here is how to select the best matches, which is a different question, for which you can find a nice answer here:
http://answers.opencv.org/question/15/how-to-get-good-matches-from-the-orb-feature/
And maybe here:
http://answers.opencv.org/question/2493/best-matches/
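For illustration, a rough OpenCV C++ sketch of that pooling idea, assuming ORB descriptors and a Lowe-style ratio test as the "good matches" filter (the names and the 0.75 threshold are illustrative assumptions):
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// Match each train view against the scene, pool the surviving matches, then feed
// the pooled correspondences to findHomography.
cv::Mat homographyFromViews(const std::vector<cv::Mat>& trainDescsPerView,
                            const std::vector<std::vector<cv::KeyPoint>>& trainKpsPerView,
                            const cv::Mat& sceneDescs,
                            const std::vector<cv::KeyPoint>& sceneKps)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<cv::Point2f> trainPts, scenePts;

    for (size_t v = 0; v < trainDescsPerView.size(); ++v) {
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(trainDescsPerView[v], sceneDescs, knn, 2);
        for (const auto& pair : knn) {
            // Lowe-style ratio test to keep only unambiguous matches
            if (pair.size() == 2 && pair[0].distance < 0.75f * pair[1].distance) {
                trainPts.push_back(trainKpsPerView[v][pair[0].queryIdx].pt);
                scenePts.push_back(sceneKps[pair[0].trainIdx].pt);
            }
        }
    }
    if (trainPts.size() < 4)
        return cv::Mat();                       // not enough correspondences for a homography
    return cv::findHomography(trainPts, scenePts, cv::RANSAC);
}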
Finding Circle Edges:
Here are the two sample images that I have posted.
I need to find the edges of the circle:
Is it possible to develop one generic circle algorithm that could find all possible circles in all scenarios? For example:
1. The circle may be a different color (white, black, gray, red)
2. The background color may be different
3. The circle may differ in size
http://postimage.org/image/tddhvs8c5/
http://postimage.org/image/8kdxqiiyb/
Please suggest some ideas for writing an algorithm that would work on the circles above.
Sounds like a job for the Hough circle transform:
I have not used it myself so far, but it is included in OpenCV. Among other parameters, you can give it a minimum and maximum radius.
Here are links to documentation and a tutorial.
I'd imagine your second example picture will be very hard to detect, though.
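For illustration, a minimal cv::HoughCircles sketch (assuming OpenCV 3+; the blur and the parameter values are assumptions to tune per image):
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Detect circles with the Hough circle transform on a grayscale image.
std::vector<cv::Vec3f> detectCircles(const cv::Mat& gray, int minRadius, int maxRadius)
{
    cv::Mat smoothed;
    cv::GaussianBlur(gray, smoothed, cv::Size(9, 9), 2.0);  // reduce noise before voting

    std::vector<cv::Vec3f> circles;                          // (x, y, radius) per circle
    cv::HoughCircles(smoothed, circles, cv::HOUGH_GRADIENT,
                     1,                   // dp: accumulator resolution = image resolution
                     smoothed.rows / 8,   // minimum distance between circle centres
                     100, 30,             // Canny high threshold, accumulator threshold
                     minRadius, maxRadius);
    return circles;
}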
You could apply an edge detection transformation to both images.
Here is what I did in Paint.NET using the outline effect:
You could test edge detection too, but that requires more contrast in the images.
Another thing to take into consideration is what exactly it is that you want to detect. In the first image, do you want to detect the white ring or the disc inside? In the second image, do you want to detect all the circles (there are many tiny ones) or just the big one(s)? These requirements will influence what transformation to use and how to initialize it.
After transforming the images into versions that 'highlight' the circles you'll need an algorithm to find them.
Again, there are more options than just one. Here is a paper describing an algorithm.
Searching the web for image processing circle recognition gives lots of results.
I think you will have to use a couple of different feature calculations that can be used for segmentation. In the first picture the circle is recognizable by intensity alone, so that one is easy. In the second picture it is mostly the texture that differentiates the circle edge; in that case a feature image based on some kind of texture filter will be needed. Calculating the local variance, for instance, will result in a scalar image that can segment out the circle. If there are other features that define the circle in other scenarios (different colors for background/foreground, etc.) you might need other explicit filters that give a scalar difference for those cases.
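For illustration, a minimal sketch of such a local-variance feature image in OpenCV C++ (the window size is an assumed value):
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Local variance as a texture feature: var = E[x^2] - (E[x])^2 over a sliding window.
cv::Mat localVariance(const cv::Mat& gray, int windowSize = 7)
{
    cv::Mat f, mean, meanOfSquares;
    gray.convertTo(f, CV_32F);

    cv::blur(f, mean, cv::Size(windowSize, windowSize));                  // E[x]
    cv::blur(f.mul(f), meanOfSquares, cv::Size(windowSize, windowSize));  // E[x^2]

    // High values where the texture changes, e.g. along the circle edge
    return meanOfSquares - mean.mul(mean);
}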
When you have scalar images where the circles stand out you can use the circular Hough transform to find the circle. Either run it for different circle sizes or modify it to detect a range of sizes.
If you know that there will be only one circle and you know the kind of noise that will be present (vertical/horizontal lines etc) an alternative approach is to design a more specific algorithm e.g. filter out the noise and find center of gravity etc.
Answer to comment:
The idea is to separate the algorithm into independent stages. I do not know how your specific algorithm works, but presumably it could take a binary or grayscale image where high values mean the pixel is part of a circle and low values mean it is not; the present algorithm also needs to give some kind of confidence value for the circle it finds. This present algorithm would then represent some stage(s) at the end of the complete algorithm. You will then have to add a first stage which generates feature images for every kind of input you want to handle. For the two examples it should suffice to have one intensity image (simply grayscale) and one image where each pixel represents the local variance. In the color case, do a color transform and use the hue value, perhaps? For every input, feed all feature images to the later stage and use the confidence value to select the most likely candidate. If you have other unknowns that your algorithm needs as input parameters (circle size etc.), just iterate over the possible values and make sure your later stages return confidence values.
I am trying to stitch 2 aerial images together with very little overlap, probably <500 px of overlap. These images have 3600x2100 resolution. I am using the OpenCV library to complete this task.
Here is my approach:
1. Find feature points and match points between the two images.
2. Find homography between two images
3. Warp one of the images using the homography
4. Stitch the two images
Right now I am trying to get this to work with two images. I am having trouble with step 3 and possibly step 2. I used findHomography() from the OpenCV library to grab my homography between the two images. Then I called warpPerspective() on one of my images using the homography.
The problem with the approach is that the transformed image is all distorted. Also it seems to only transform a certain part of the image. I have no idea why it is not transforming the whole image.
Can someone give me some advice on how I should approach this problem? Thanks
In the results that you have posted, I can see that you have at least one keypoint mismatch. If you use findHomography(src, dst, 0), it will mess up your homography. You should use findHomography(src, dst, CV_RANSAC) instead.
You can also try to use warpAffine instead of warpPerspective.
Edit: In the results that you posted in the comments to your question, I had the impression that the matching worked quite stably. That means that you should be able to get good results with the example as well. Since you mostly seem to have to deal with translation, you could try to filter out the outliers with the following sketched algorithm (a code sketch follows the list):
calculate the average (or median) motion vector x_avg
calculate the normalized dot product <x_avg, x_match>
discard x_match if the dot product is smaller than a threshold
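A rough sketch of that filter in OpenCV C++; the cosine threshold is an assumed value:
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Keep only matches whose motion vector points roughly the same way as the
// average motion vector between the two images.
std::vector<cv::DMatch> filterByMotionDirection(const std::vector<cv::DMatch>& matches,
                                                const std::vector<cv::KeyPoint>& kp1,
                                                const std::vector<cv::KeyPoint>& kp2,
                                                double minCosine = 0.9)
{
    // Average motion vector x_avg over all matches
    cv::Point2f avg(0.0f, 0.0f);
    for (const cv::DMatch& m : matches)
        avg += kp2[m.trainIdx].pt - kp1[m.queryIdx].pt;
    avg *= 1.0f / static_cast<float>(std::max<std::size_t>(matches.size(), 1));

    std::vector<cv::DMatch> filtered;
    for (const cv::DMatch& m : matches) {
        cv::Point2f v = kp2[m.trainIdx].pt - kp1[m.queryIdx].pt;
        double denom = std::hypot(avg.x, avg.y) * std::hypot(v.x, v.y);
        // Normalized dot product = cosine of the angle between x_avg and x_match
        double cosine = denom > 1e-6 ? avg.dot(v) / denom : 0.0;
        if (cosine > minCosine)
            filtered.push_back(m);
    }
    return filtered;
}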
To make it work for images with smaller overlap, you would have to look at the detector, descriptors and matches. You do not specify which descriptors you work with, but I would suggest using SIFT or SURF descriptors and the corresponding detectors. You should also set the detector parameters to make a dense sampling (i.e., try to detect more features).
You can refer to this answer which is slightly related: OpenCV - Image Stitching
To stitch images using a homography, the most important thing to take care of is finding correspondence points in both images. The fewer the outliers among the correspondence points, the better the generated homography.
Using robust techniques such as RANSAC along with OpenCV's findHomography() function (use CV_RANSAC as the option) will still generate a reasonable homography provided the percentage of inliers is higher than the percentage of outliers. Also make sure that there are at least 4 inliers among the correspondence points passed to the findHomography function.
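For illustration, a minimal sketch of that check in OpenCV C++ (the 3.0 px reprojection threshold is an assumed value):
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Estimate a homography with RANSAC and verify how many correspondences
// survive as inliers before trusting the result.
cv::Mat robustHomography(const std::vector<cv::Point2f>& objPts,
                         const std::vector<cv::Point2f>& scenePts)
{
    if (objPts.size() < 4 || objPts.size() != scenePts.size())
        return cv::Mat();                        // a homography needs at least 4 correspondences

    std::vector<uchar> inlierMask;
    cv::Mat H = cv::findHomography(objPts, scenePts, cv::RANSAC, 3.0, inlierMask);
    if (H.empty())
        return cv::Mat();

    int inliers = cv::countNonZero(inlierMask);  // matches that agree with H
    if (inliers < 4)
        return cv::Mat();                        // too few inliers: estimate is unreliable
    return H;
}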