OpenCV object (stamp) recognition/detection - c++

Good day! Just sorry for my English.
There are a few samples (etalons) stamps of various companies. And there are test images - photos of documents on which these stamps can occur. It is necessary to determine whether there is in the picture with the document or that stamp (reference). Stamps can be rotated.
Doing a similar task with the usual etalons (cartoon characters), finds good. But with stamps problem is probably due to the fact that they are very similar - all round.
Use SurfFeatureDetector, SurfDescriptorExtractor
It might be worth other detector and a Descriptor?
Thank you.

If stamp size is too small compared to the main image, using SURF descriptor alone may not be a feasible choice.
Since you have a finite number of stamp reference images, you can try template matching and after detecting the stamp region and calculating the orientation, you can just calculate the correlations with references and select the highest correlation as the detected object.
Template matching operation can be applied on each RGB channel (assuming color image) and matching scores can be summed as a final decision. However, the harder part is the detection of stamps and their orientations.
You can also use SURF keypoints on 3 RGB channels separately in order to utilize color information. After extracting keypoints for each channel, you can calculate matching scores separately and sum the three scores to obtain a final decision.
As another feature, you can use calculate color histograms of reference images and during testing, you can commpare these histograms to find a good match. THis feature is rotation independent and simple to calculate.

Related

opencv c++ compare keypoint locations in different images

When comparing 2 images via feature extraction, how do you compare keypoint distances so to disregard those that are obviously incorrect?
I've found when comparing similar images against each other, most of the time it can fairly accurate, but other times it can throws matches that are completely separate.
So I'm after a way of looking at the 2 sets of keypoints from both images and determining whether the matched keypoints are relatively in the same locations on both. As in it knows that keypoints 1, 2, and 3 are so far apart on image 1, so the corresponding keypoints matched on image 2 should be of a fairly similar distance away from each other again.
I've used RANSAC and minimum distance checks in the past but only to some effect, they don't seem to be as thorough as I'm after.
(Using ORB and BruteForce)
EDIT
Changed "x, y, and z" to "1, 2, and 3"
EDIT 2 -- I'll try to explain further with quick Paint made examples:
Say I have this as my image:
And I give it this image to compare against:
Its a cropped and squashed version of the original, but obviously similar.
Now, say you ran it through feature detection and it came back with these results for the keypoints for the two images:
The keypoints on both images are in roughly the same areas, and proportionately the same distance away from each other. Take the keypoint I've circled, lets call it "Image 1 Keypoint 1".
We can see that there are 5 keypoints around it. Its these distances between them and "Image 1 Keypoint 1" that I want to obtain so to compare them against "Image 2 Keypoint 1" and its 5 surround keypoints in the same area (see below) so as to not just compare a keypoint to another keypoint, but to compare "known shapes" based off of the locations of the keypoints.
--
Does that make sense?
Keypoint matching is a problem with several dimensions. These dimensions are:
spatial distance, ie, the (x,y) distance as measured from the locations of two keypoints in different images
feature distance, that is, a distance that describes how much two keypoints look alike.
Depending on your context, you do not want to compute the same distance, or you want to combine both. Here are some use cases:
optical flow, as implemented by opencv's sparse Lucas-Kanade optical flow. In this case, keypoints called good features are computed in each frame, then matched on a spatial distance basis. This works because the image is supposed to change relatively slowly (the input frames have a video framerate);
image stitching, as you can implement from opencv's features2d (free or non-free). In this case, the images change radically since you move your camera around. Then, your goal becomes to find stable points, ie, points that are present in two or more images whatever their location is. In this case you will use feature distance. This also holds when you have a template image of an object that you want to find in query images.
In order to compute feature distance, you need to compute a coded version of their appearance. This operation is performed by the DescriptorExtractor class.
Then, you can compute distances between the output of the descriptions: if the distance between two descriptions is small then the original keypoints are very likely to correspond to the same scene point.
Pay attention when you compute distances to use the correct distance function: ORB, FREAK, BRISK rely on Hamming distance, while SIFt and SURF use a more usual L2 distance.
Match filtering
When you have individual matches, you may want to perform match filtering in order to reject good individual matches that may arise from scene ambiguities. Think for example of a keypoint that originates from the corner of a window of a house. Then it is very likely to match with another window in another house, but this may not be the good house or the good window.
You have several ways of doing it:
RANSAC performs a consistency check of the computed matches with the current solution estimate. Basically, it picks up some matches at random, computes a solution to the problem (usually a geometric transform between 2 images) and then counts how many of the matchings agree with this estimate. The estimate with the higher count of inliers wins;
David Lowe performed another kind of filtering in the original SIFT paper.
He kept the two best candidates for a match with a given query keypoint, ie, points that had the lowest distance (or highest similarity). Then, he computed the ratio similarity(query, best)/similarity(query, 2nd best). If this ratio is too low then the second best is also a good candidate for a match, and the matching result is dubbed ambiguous and rejected.
Exactly how you should do it in your case is thus very likely to depend on your exact application.
Your specific case
In your case, you want to develop an alternate feature descriptor that is based on neighbouring keypoints.
The sky is obviously the limit here, but here are some steps that I would follow:
make your descriptor rotation and scale invariant by computing the PCA of the keypoints :
// Form a matrix from KP locations in current image
cv::Mat allKeyPointsMatrix = gatherAllKeypoints(keypoints);
// Compute PCA basis
cv::PCA currentPCA(allKeyPointsMatrix, 2);
// Reproject keypoints in new basis
cv::Mat normalizedKeyPoints = currentPCA.project(allKeyPointsMatrix);
(optional) sort the keypoints in a quadtree or kd-tree for faster spatial indexing
Compute for each keypoint a descriptor that is (for example) the offsets in normalized coordinates of the 4 or 5 closest keypoints
Do the same in your query image
Match keypoints from both mages based on these new descriptors.
What is it you are trying to do exactly? More information is needed to give you a good answer. Otherwise it'll have to be very broad and most likely not useful to your needs.
And with your statement "determining whether the matched keypoints are relatively in the same locations on both" do you mean literally on the same x,y positions between 2 images?
I would try out the SURF algorithm. It works extremely well for what you described above (though I found it to be a bit slow unless you use gpu acceleration, 5fps vs 34fps).
Here is the tutorial for surf, I personally found it very useful, but the executables are for linux users only. However you can simply remove the OS specific bindings in the source code and keep only the opencv related bindings and have it compile + run on linux just the same.
https://code.google.com/p/find-object/#Tutorials
Hope this helped!
You can do a filter on the pixels distance between two keypoint.
Let's say matches is your vector of matches, kp_1 your vector of keypoints on the first picture and kp_2 on the second. You can use the code above to eliminate obviously incorrect matches. You just need to fix a threshold.
double threshold= YourValue;
vector<DMatch> good_matches;
for (int i = 0; i < matches.size(); i++)
{
double dist_p = sqrt(pow(abs(kp_1[matches[i][0].queryIdx].pt.x - kp_2[matches[i][0].trainIdx].pt.x), 2) + pow(abs(kp_1[matches[i][0].queryIdx].pt.y - kp_2[matches[i][0].trainIdx].pt.y), 2));
if (dist_p < threshold)
{
good_matches.push_back(matches[i][0]);
}
}

C++ - Using Bag of Words for matching pictures together?

I would like to compare a picture (with his descriptors) with thousand of pictures inside a database in order to do a matching. (if two pictures are the same,that is to say the same thing but it can bo rotated, a bit blured, has a different scale etc.).
For example :
I saw on StackOverflaw that compute descriptors for each picture and compare them one to one is very a long process.
I did some researches and i saw that i can do an algorithm based on Bag of Words.
I don't know exactly how is works yet, but it seems to be good. But in think, i can be mistaked, it is only to detect what kind of object is it not ?
I would like to know according to you if using it can be a good solution to compare a picture to a thousands of pictures using descriptors like Sift of Surf ?
If yes, do you have some suggestions about how i can do that ?
Thank,
Yes, it is possible. The only thing you have to pay attention is the computational requirement which can be a little overwhelming. If you can narrow the search, that usually help.
To support my answer I will extract some examples from a recent work of ours. We aimed at recognizing a painting on a museum's wall using SIFT + RANSAC matching. We have a database of all the paintings in the museum and a SIFT descriptor for each one of them. We aim at recognizing the paining in a video which can be recorded from a different perspective (all the templates are frontal) or under different lighting conditions. This image should give you an idea: on the left you can see the template and the current frame. The second image is the SIFT matching and the third shows the results after RANSAC.
Once you have the matching between your image and each SIFT descriptor in your database, you can compute the reprojection error, namely the ratio between matched points (after RANSAC) and the total number of keypoints. This can be repeated for each image and the image with the lowest reprojection error can be declared as the match.
We used this for paintings but I think that can be generalized for every kind of image (the android logo you posted in the question is a fair example i think).
Hope this helps!

Measure of accuracy in pattern recognition using SURF in OpenCV

I’m currently working on pattern recognition using SURF in OpenCV. What do I have so far: I’ve written a program in C# where I can select a source-image and a template which I want to find. After that I transfer both pictures into a C++-dll where I’ve implemented a program using the OpenCV-SURFdetector, which returns all the keypoints and matches back to my C#-program where I try to draw a rectangle around my matches.
Now my question: Is there a common measure of accuracy in pattern recognition? Like for example number of matches in proportion to the number of keypoints in the template? Or maybe the size-difference between my match-rectangle and the original size of the template-image? What are common parameters that are used to say if a match is a “real” and “good” match?
Edit: To make my question clearer. I have a bunch of matchpoints, that are already thresholded by minHessian and distance value. After that I draw something like a rectangle around my matchpoints as you can see in my picture. This is my MATCH. How can I tell now how good this match is? I'm already calculating angle, size and color differences between my now found match and my template. But I think that is much too vague.
I am not 100% sure about what you are really asking, because what you call a "match" is vague. But since you said you already matched your SURF points and mentionned pattern recognition and the use of a template, I am assuming that, ultimately, you want to localize the template in your image and you are asking about a localization score to decide whether you found the template in the image or not.
This is a challenging problem and I am not aware that a good and always-appropriate solution has been found yet.
However, given your approach, what you could do is analyze the density of matched points in your image: consider local or global maximas as possible locations for your template (global if you know your template appears only once in the image, local if it can appear multiple times) and use a threshold on the density to decide whether or not the template appears. A sketch of the algorithm could be something like this:
Allocate a floating point density map of the size of your image
Compute the density map, by increasing by a fixed amount the density map in the neighborhood of each matched point (for instance, for each matched point, add a fixed value epsilon in the rectangle your are displaying in your question)
Find the global or local maximas of the density map (global can be found using opencv function MinMaxLoc, and local maximas can be found using morpho maths, e.g. How can I find local maxima in an image in MATLAB?)
For each maxima obtained, compare the corresponding density value to a threshold tau, to decide whether your template is there or not
If you are into resarch articles, you can check the following ones for improvement of this basic algorithm:
"ADABOOST WITH KEYPOINT PRESENCE FEATURES FOR REAL-TIME VEHICLE VISUAL DETECTION", by T.Bdiri, F.Moutarde, N.Bourdis and B.Steux, 2009.
"Interleaving Object Categorization and Segmentation", by B.Leibe and B.Schiele, 2006.
EDIT: another way to address your problem is to try and remove accidently-matched points in order to keep only those truly corresponding to your template image. This can be done by enforcing a constraint of consistancy between close matched points. The following research article presents an approach like this: "Context-dependent logo matching and retrieval", by H.Sahbi, L.Ballan, G.Serra, A.Del Bimbo, 2010 (however, this may require some background knowledge...).
Hope this helps.
Well, when you compare points you use some metric.
So results or comparison have some resulting distance.
And the less this distance is the better.
Example of code:
BFMatcher matcher(NORM_L2,true);
vector<DMatch> matches;
matcher.match(descriptors1, descriptors2, matches);
matches.erase(std::remove_if(matches.begin(),matches.end(),bad_dist),matches.end());
where bad_dist is defined as
bool dist(const DMatch &m) {
return m.distance > 150;
}
In this code i get rid of 'bad' matches.
There are many ways to match two patterns in the same image, actually it's a very open topic in computer vision, because there isn't a global best solution.
For instance, if you know your object can appear rotated (I'm not familiar with SURF, but I guess the descriptors are rotation invariant like SIFT descriptors), you can estimate the rotation between the pattern you have in the training set and the pattern you just matched with. A match with the minimum error will be a better match.
I recommend you consult Computer Vision: Algorithms and Applications. There's no code in it, but lots of useful techniques typically used in computer vision (most of them already implemented in opencv).

Methods for calculating an optimal distance threshold for fisherfaces. Source code provided

I've been working on a face recognizer based on the fisherfaces implementation (the pre opencv 2.4 version) provided by bytefish. The actual fisherfaces algorithm is the same, the differences are mostly convenience ones:
-Image/storage compression.
-Classifications by string opposed to ints.
-Inclusive predictions (multiple results sorted by a percentage).
Note: The percentage is calculated with: percent = 1.0 - (lowdist/distthreshold)
Where lowdist is the lowest euclidian distance between a src matrix (the test face image) and a matrix in a projection set (the trained face images) and distthreshold is the maximum distance allowed.
The inclusive predictions are where I'm having trouble. I haven't found a decent way to calculate an optimal threshold to use. Currently I'm just choosing 2200.0 as a random value to test with. This of course produces a lot of flaky results, especially when face images are coming from random sources with different lighting and resolutions.
So my question is: Is there a way to calculate an optimal distance threshold to use with fisherfaces?
I've provided the source code to the recognizer below.
Ignore the method, "FBaseLDARecognizer::calculateOptimalThreshold", it isn't finished. The goal was to add a group of faces to the recognizer and then test against an unknown set of faces with known classifications and get the maximum and minimum correct distances. That is as far as I got, I haven't thought of a useful way to use that data yet. So currently it always returns 0.0.
Note: This is not finished, there are a few performance issues I've yet to clean up. Also, this code isn't commented. If further explanation is needed please let me know and I can comment and reupload the files.
Source Files:
Header
Source

Image comparison method with C++ and OpenCV

I am new to OpenCV. I would like to know if we can compare two images (one of the images made by photoshop i.e source image and the otherone will be taken from the camera) and find if they are same or not.
I tried to compare the images using template matching. It does not work. Can you tell me what are the other procedures which we can use for this kind of comparison?
Comparison of images can be done in different ways depending on which purpose you have in mind:
if you just want to compare whether two images are approximately equal (with a few
luminance differences), but with the same perspective and camera view, you can simply
compute a pixel-to-pixel squared difference, per color band. If the sum of squares over
the two images is smaller than a threshold the images match, otherwise not.
If one image is a black-white variant of the other, conversion of the color images is
needed (see e.g. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale). Afterwarts simply perform the step above.
If one image is a subimage of the other, you need to perform registration of the two
images. This means determining the scale, possible rotation and XY-translation that is
necessary to lay the subimage on the larger image (for methods to register images, see:
Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A. , Mutual-information-based registration of
medical images: a survey, IEEE Transactions on Medical Imaging, 2003, Volume 22, Issue 8,
pp. 986 – 1004)
If you have perspective differences, you need an algorithm for deskewing one image to
match the other as well as possible. For ways of doing deskewing look for example in
http://javaanpr.sourceforge.net/anpr.pdf from page 15 and onwards.
Good luck!
You should try SIFT. You apply SIFT to your marker (image saved in memory) and you get some descriptors (points robust to be recognized). Then you can use FAST algorithm with the camera frames in order to find the coprrespondent keypoints of the marker in the camera image.
You have many threads about this topic:
How to get a rectangle around the target object using the features extracted by SIFT in OpenCV
How to search the image for an object with SIFT and OpenCV?
OpenCV - Object matching using SURF descriptors and BruteForceMatcher
Good luck