I am using openCV to do some dense feature extraction. For example, The code
DenseFeatureDetector detector(12.f, 1, 0.1f, 10);
I don't really understand the parameters in the above constructor. What does it mean ? Reading the opencv documentation about it does not help much either. In the documentation the arguments are:
DenseFeatureDetector( float initFeatureScale=1.f, int featureScaleLevels=1,
float featureScaleMul=0.1f,
int initXyStep=6, int initImgBound=0,
bool varyXyStepWithScale=true,
bool varyImgBoundWithScale=false );
What are they supposed to do ? i.e. what is the meaning of scale, initFeatureScale, featureScaleLevels etc ? How do you know the grid or grid spacing etc for the dense sampling.
I'm using opencv with dense detector too and I think I can help you with something. I'm not sure about what I'm going to say but the experience learnt me that.
When I use Dense detector I pass there the gray scale image. The detector makes some threshold filters where opencv uses a gray minimum value with is used to transform the image. The píxels where have a more gray level than the threshold will be made like black points and the others are white point. This action is repeated in a loop where the threshold will be bigger and bigger. So the parameter initFeatureScale determine the first threshold you put to do this loop, the featureScaleLevels parameter indicates how much this threshold is bigger between one loop iteration and the next one and featureScaleMul is a multiply factor to calculate the next threshold.
Anyway if you are looking for a your optimal parameters to use Dense detector to detect any particular points You would offer a program I made for that. It is liberated in github. This is a program where you can test some detectors (Dense detector is one of them) and check how it works if you change their parameters thanks to a user interface that let you change the detectors parameters as long as you are executing the program. You will see how the detected points will be change. For try it just click on the link, and download the files. You might need almost all the files to execute the program.
Apologies in advance, i'm predominantly using Python so i'll avoid embarressing myself by referring to C++.
DenseFeatureDetector populates a vector with KeyPoints to pass to compute feature descriptors. These keypoints have a point vector and their scale set. In the documentation, scale is the pixel radius of the keypoint.
KeyPoints are evenly spaced across the width and height of the image matrix passed to DenseFeatureVector.
Now to the arguments:
initFeatureScale
Set the initial KeyPoint feature radius in pixels (as far as I am aware this has no effect)
featureScaleLevels
Number of scales overwhich we wish to make keypoints
featureScaleMuliplier
Scale adjustment for initFeatureScale over featureScaleLevels, this scale adjustment can also be applied to the border (initImgBound) and the step size (initxystep). So when we set featureScaleLevels>1 then this multiplier will be applied to successive scales, to adjust feature scale, step and the boundary around the image.
initXyStep
moving column and row step in pixels. Self explanatory I hope.
initImgBound
row/col bounding region to ignore around the image (pixels), So a 100x100 image, with an initImgBound of 10, would create keypoints in the central 80x80 portion of the image.
varyXyStepWithScale
Boolean, if we have multiple featureScaleLevels do we want to adjust the step size using featureScaleMultiplier.
varyImgBoundWithScale
Boolean,as varyXyStepWithScale, but applied to the border.
Here is the DenseFeatureDetector source code from detectors.cpp in the OpenCV 2.4.3 source, which will probably explain better than my words:
DenseFeatureDetector::DenseFeatureDetector( float _initFeatureScale, int _featureScaleLevels,
float _featureScaleMul, int _initXyStep,
int _initImgBound, bool _varyXyStepWithScale,
bool _varyImgBoundWithScale ) :
initFeatureScale(_initFeatureScale), featureScaleLevels(_featureScaleLevels),
featureScaleMul(_featureScaleMul), initXyStep(_initXyStep), initImgBound(_initImgBound),
varyXyStepWithScale(_varyXyStepWithScale), varyImgBoundWithScale(_varyImgBoundWithScale)
{}
void DenseFeatureDetector::detectImpl( const Mat& image, vector<KeyPoint>& keypoints, const Mat& mask ) const
{
float curScale = static_cast<float>(initFeatureScale);
int curStep = initXyStep;
int curBound = initImgBound;
for( int curLevel = 0; curLevel < featureScaleLevels; curLevel++ )
{
for( int x = curBound; x < image.cols - curBound; x += curStep )
{
for( int y = curBound; y < image.rows - curBound; y += curStep )
{
keypoints.push_back( KeyPoint(static_cast<float>(x), static_cast<float>(y), curScale) );
}
}
curScale = static_cast<float>(curScale * featureScaleMul);
if( varyXyStepWithScale ) curStep = static_cast<int>( curStep * featureScaleMul + 0.5f );
if( varyImgBoundWithScale ) curBound = static_cast<int>( curBound * featureScaleMul + 0.5f );
}
KeyPointsFilter::runByPixelsMask( keypoints, mask );
}
You might expect a call to compute would calculate additional KeyPoint characteristics using the relevant keypoint detection algorithm (e.g. angle), based on the KeyPoints generated by DenseFeatureDetector. Unfortunately this isn't the case for SIFT under Python - i've not looked at at the other feature detectors, nor looked at the behaviour in C++.
Also note that DenseFeatureDetector is not in OpenCV 3.2 (unsure at which release it was removed).
Related
I have been struggling trying to implement the outlining algorithm described here and here.
The general idea of the paper is determining the Hausdorff distance of binary images and using it to find the template image from a test image.
For template matching, it is recommended to construct image pyramids along with sliding windows which you'll use to slide over your test image for detection. I was able to do both of these as well.
I am stuck on how to move forward from here on. Do I slide my template over the test image from different pyramid layers? Or is it the test image over the template? And with regards to the sliding window, is/are they meant to be a ROI of the test or template image?
In a nutshell, I have pieces to the puzzle but no idea of which direction to take to solve the puzzle
int distance(vector<Point>const& image, vector<Point>const& tempImage)
{
int maxDistance = 0;
for(Point imagePoint: image)
{
int minDistance = numeric_limits<int>::max();
for(Point tempPoint: tempImage)
{
Point diff = imagePoint - tempPoint;
int length = (diff.x * diff.x) + (diff.y * diff.y);
if(length < minDistance) minDistance = length;
if(length == 0) break;
}
maxDistance += minDistance;
}
return maxDistance;
}
double hausdorffDistance(vector<Point>const& image, vector<Point>const& tempImage)
{
double maxDistImage = distance(image, tempImage);
double maxDistTemp = distance(tempImage, image);
return sqrt(max(maxDistImage, maxDistTemp));
}
vector<Mat> buildPyramids(Mat& frame)
{
vector<Mat> pyramids;
int count = 6;
Mat prevFrame = frame, nextFrame;
while(count > 0)
{
resize(prevFrame, nextFrame, Size(), .85, .85);
prevFrame = nextFrame;
pyramids.push_back(nextFrame);
--count;
}
return pyramids;
}
vector<Rect> slidingWindows(Mat& image, int stepSize, int width, int height)
{
vector<Rect> windows;
for(size_t row = 0; row < image.rows; row += stepSize)
{
if((row + height) > image.rows) break;
for(size_t col = 0; col < image.cols; col += stepSize)
{
if((col + width) > image.cols) break;
windows.push_back(Rect(col, row, width, height));
}
}
return windows;
}
Edit I: More analysis on my solution can be found here
This is a bi-directional task.
Forward Direction
1. Translation
For each contour, calculate its moment. Then for each point in that contour, translate it about the moment i.e. contour.point[i] = contour.point[i] - contour.moment[i]. This moves all of the contour points to the origin.
PS: You need to keep track of each contour's produced moment because it will be used in the next section
2. Rotation
With the newly translated points, calculate their rotated rect. This will give you the angle of rotation. Depending on this angle, you would want to calculate the new angle which you want to rotate this contour by; this answer would be helpful.
After attaining the new angle, calculate the rotation matrix. Remember that your center here will be the origin i.e. (0, 0). I did not take scaling into account (that's where the pyramids come into play) when calculating the rotation matrix hence I passed 1.
PS: You need to keep track of each contour's produced matrix because it will be used in the next section
Using this matrix, you can go ahead and rotate each point in the contour by it as shown here*.
Once all of this is done, you can go ahead and calculate the Hausdorff distance and find contours which pass your set threshold.
Back Direction
Everything done in the first section, has to be undone in order for us to draw the valid contours onto our camera feed.
1. Rotation
Recall that each detected contour produced a rotation matrix. You want to undo the rotation of the valid contours. Just perform the same rotation but using the inverse matrix.
For each valid contour and corresponding matrix
inverse_matrix = matrix[i].inv(cv2.DECOMP_SVD)
Use * to rotate the points but with inverse_matrix as parameter
PS: When calculating the inverse, if the produced matrix was not a square one, it would fail. cv2.DECOMP_SVD will produce an inverse matrix even if the original matrix was a non-square.
2. Translation
With the valid contours' points rotated back, you just have to undo the previously performed translation. Instead of subtracting, just add the moment to each point.
You can now go ahead and draw these contours to your camera feed.
Scaling
This is were image pyramids come into play.
All you have to do is resize your template image by a fixed size/ratio upto your desired number of times (called layers). The tutorial found here does a good job of explaining how to do this in OpenCV.
It goes without saying that the values you choose to resize your image by and number of layers will and do play a huge role in how robust your program will be.
Put it all together
Template Image Operations
Create a pyramid consisting of n layers
For each layer in n
Find contours
Translate the contour points
Rotate the contour points
This operation should only be performed once and only store the results of the rotated points.
Camera Feed Operations
Assumptions
Let the rotated contours of the template image at each level be stored in templ_contours. So if I say templ_contours[0], this is going to give me the rotated contours at pyramid level 0.
Let the image's translated, rotated contours and moments be stored in transCont, rotCont and moment respectively.
image_contours = Find Contours
for each contour detected in image
moment = calculate moment
for each point in image_contours
transCont.thisPoint = forward_translate(image_contours.thisPoint)
rotCont.thisPoint = forward_rotate(transCont.thisPoint)
for each contour_layer in templ_contours
for each contour in rotCont
calculate Hausdorff Distance
valid_contours = contours_passing_distance_threshold
for each point in valid_contours
valid_point = backward_rotate(valid_point)
for each point in valid_contours
valid_point = backward_translate(valid_point)
drawContours(valid_contours, image)
I am undertaking a project that will automatically count values of coins from an input image. So far I have segmented the coins using some pre-processing with edge detection and using the Hough-Transform.
My question is how do I proceed from here? I need to do some template matching on the segmented images based on some previously stored features. How can I go about doing this.
I have also read about something called K-Nearest Neighbours and I feel it is something I should be using. But I am not too sure how to go about using it.
Research articles I have followed:
Coin
Detector
Coin
Recognition
One way of doing pattern matching is using cv::matchTemplate.
This takes an input image and a smaller image which acts as template. It compares the template against overlapped image regions computing the similarity of the template with the overlapped region. Several methods for computing the comparision are available.
This methods does not directly support scale or orientation invariance. But it is possible to overcome that by scaling candidates to a reference size and by testing against several rotated templates.
A detailed example of this technique is shown to detect pressence and location of 50c coins. The same procedure can be applied to the other coins.
Two programs will be built. One to create templates from the big image template for the 50c coin. And another one which will take as input those templates as well as the image with coins and will output an image where the 50c coin(s) are labelled.
Template Maker
#define TEMPLATE_IMG "50c.jpg"
#define ANGLE_STEP 30
int main()
{
cv::Mat image = loadImage(TEMPLATE_IMG);
cv::Mat mask = createMask( image );
cv::Mat loc = locate( mask );
cv::Mat imageCS;
cv::Mat maskCS;
centerAndScale( image, mask, loc, imageCS, maskCS);
saveRotatedTemplates( imageCS, maskCS, ANGLE_STEP );
return 0;
}
Here we load the image which will be used to construct our templates.
Segment it to create a mask.
Locate the center of masses of said mask.
And we rescale and copy that mask and the coin so that they ocupy a square of fixed size where the edges of the square are touching the circunference of mask and coin. That is, the side of the square has the same lenght in pixels as the diameter of the scaled mask or coin image.
Finally we save that scaled and centered image of the coin. And we save further copies of it rotated in fixed angle increments.
cv::Mat loadImage(const char* name)
{
cv::Mat image;
image = cv::imread(name);
if ( image.data==NULL || image.channels()!=3 )
{
std::cout << name << " could not be read or is not correct." << std::endl;
exit(1);
}
return image;
}
loadImage uses cv::imread to read the image. Verifies that data has been read and the image has three channels and returns the read image.
#define THRESHOLD_BLUE 130
#define THRESHOLD_TYPE_BLUE cv::THRESH_BINARY_INV
#define THRESHOLD_GREEN 230
#define THRESHOLD_TYPE_GREEN cv::THRESH_BINARY_INV
#define THRESHOLD_RED 140
#define THRESHOLD_TYPE_RED cv::THRESH_BINARY
#define CLOSE_ITERATIONS 5
cv::Mat createMask(const cv::Mat& image)
{
cv::Mat channels[3];
cv::split( image, channels);
cv::Mat mask[3];
cv::threshold( channels[0], mask[0], THRESHOLD_BLUE , 255, THRESHOLD_TYPE_BLUE );
cv::threshold( channels[1], mask[1], THRESHOLD_GREEN, 255, THRESHOLD_TYPE_GREEN );
cv::threshold( channels[2], mask[2], THRESHOLD_RED , 255, THRESHOLD_TYPE_RED );
cv::Mat compositeMask;
cv::bitwise_and( mask[0], mask[1], compositeMask);
cv::bitwise_and( compositeMask, mask[2], compositeMask);
cv::morphologyEx(compositeMask, compositeMask, cv::MORPH_CLOSE,
cv::Mat(), cv::Point(-1, -1), CLOSE_ITERATIONS );
/// Next three lines only for debugging, may be removed
cv::Mat filtered;
image.copyTo( filtered, compositeMask );
cv::imwrite( "filtered.jpg", filtered);
return compositeMask;
}
createMask does the segmentation of the template. It binarizes each of the BGR channels, does the AND of those three binarized images and performs a CLOSE morphologic operation to produce the mask.
The three debug lines copy the original image into a black one using the computed mask as a mask for the copy operation. This helped in chosing the proper values for the threshold.
Here we can see the 50c image filtered by the mask created in createMask
cv::Mat locate( const cv::Mat& mask )
{
// Compute center and radius.
cv::Moments moments = cv::moments( mask, true);
float area = moments.m00;
float radius = sqrt( area/M_PI );
float xCentroid = moments.m10/moments.m00;
float yCentroid = moments.m01/moments.m00;
float m[1][3] = {{ xCentroid, yCentroid, radius}};
return cv::Mat(1, 3, CV_32F, m);
}
locate computes the center of mass of the mask and its radius. Returning those 3 values in a single row mat in the form { x, y, radius }.
It uses cv::moments which calculates all of the moments up to the third order of a polygon or rasterized shape. A rasterized shape in our case. We are not interested in all of those moments. But three of them are useful here. M00 is the area of the mask. And the centroid can be calculated from m00, m10 and m01.
void centerAndScale(const cv::Mat& image, const cv::Mat& mask,
const cv::Mat& characteristics,
cv::Mat& imageCS, cv::Mat& maskCS)
{
float radius = characteristics.at<float>(0,2);
float xCenter = characteristics.at<float>(0,0);
float yCenter = characteristics.at<float>(0,1);
int diameter = round(radius*2);
int xOrg = round(xCenter-radius);
int yOrg = round(yCenter-radius);
cv::Rect roiOrg = cv::Rect( xOrg, yOrg, diameter, diameter );
cv::Mat roiImg = image(roiOrg);
cv::Mat roiMask = mask(roiOrg);
cv::Mat centered = cv::Mat::zeros( diameter, diameter, CV_8UC3);
roiImg.copyTo( centered, roiMask);
cv::imwrite( "centered.bmp", centered); // debug
imageCS.create( TEMPLATE_SIZE, TEMPLATE_SIZE, CV_8UC3);
cv::resize( centered, imageCS, cv::Size(TEMPLATE_SIZE,TEMPLATE_SIZE), 0, 0 );
cv::imwrite( "scaled.bmp", imageCS); // debug
roiMask.copyTo(centered);
cv::resize( centered, maskCS, cv::Size(TEMPLATE_SIZE,TEMPLATE_SIZE), 0, 0 );
}
centerAndScale uses the centroid and radius computed by locate to get a region of interest of the input image and a region of interest of the mask such that the center of the such regions is also the center of the coin and mask and the side length of the regions are equal to the diameter of the coin/mask.
These regions are later scaled to a fixed TEMPLATE_SIZE. This scaled region will be our reference template. When later on in the matching program we want to check if a detected candidate coin is this coin we will also take a region of the candidate coin, center and scale that candidate coin in the same way before performing template matching. This way we achieve scale invariance.
void saveRotatedTemplates( const cv::Mat& image, const cv::Mat& mask, int stepAngle )
{
char name[1000];
cv::Mat rotated( TEMPLATE_SIZE, TEMPLATE_SIZE, CV_8UC3 );
for ( int angle=0; angle<360; angle+=stepAngle )
{
cv::Point2f center( TEMPLATE_SIZE/2, TEMPLATE_SIZE/2);
cv::Mat r = cv::getRotationMatrix2D(center, angle, 1.0);
cv::warpAffine(image, rotated, r, cv::Size(TEMPLATE_SIZE, TEMPLATE_SIZE));
sprintf( name, "template-%03d.bmp", angle);
cv::imwrite( name, rotated );
cv::warpAffine(mask, rotated, r, cv::Size(TEMPLATE_SIZE, TEMPLATE_SIZE));
sprintf( name, "templateMask-%03d.bmp", angle);
cv::imwrite( name, rotated );
}
}
saveRotatedTemplates saves the previous computed template.
But it saves several copies of it, each one rotated by an angle, defined in ANGLE_STEP. The goal of this is to provide orientation invariance. The lower that we define stepAngle the better orientation invariance we get but it also implies a higher computational cost.
You may download the whole template maker program here.
When run with ANGLE_STEP as 30 I get the following 12 templates :
Template Matching.
#define INPUT_IMAGE "coins.jpg"
#define LABELED_IMAGE "coins_with50cLabeled.bmp"
#define LABEL "50c"
#define MATCH_THRESHOLD 0.065
#define ANGLE_STEP 30
int main()
{
vector<cv::Mat> templates;
loadTemplates( templates, ANGLE_STEP );
cv::Mat image = loadImage( INPUT_IMAGE );
cv::Mat mask = createMask( image );
vector<Candidate> candidates;
getCandidates( image, mask, candidates );
saveCandidates( candidates ); // debug
matchCandidates( templates, candidates );
for (int n = 0; n < candidates.size( ); ++n)
std::cout << candidates[n].score << std::endl;
cv::Mat labeledImg = labelCoins( image, candidates, MATCH_THRESHOLD, false, LABEL );
cv::imwrite( LABELED_IMAGE, labeledImg );
return 0;
}
The goal here is to read the templates and the image to be examined and determine the location of coins which match our template.
First we read into a vector of images all the template images we produced in the previous program.
Then we read the image to be examined.
Then we binarize the image to be examined using exactly the same function as in the template maker.
getCandidates locates the groups of points which are toghether forming a polygon. Each of these polygons is a candidate for coin. And all of them are rescaled and centered in a square of size equal to that of our templates so that we can perform matching in a way invariant to scale.
We save the candidate images obtained for debugging and tuning purposes.
matchCandidates matches each candidate with all the templates storing for each the result of the best match. Since we have templates for several orientations this provides invariance to orientation.
Scores of each candidate are printed so we can decide on a threshold to separate 50c coins from non 50c coins.
labelCoins copies the original image and draws a label over the ones which have a score greater than (or lesser than for some methods) the threshold defined in MATCH_THRESHOLD.
And finally we save the result in a .BMP
void loadTemplates(vector<cv::Mat>& templates, int angleStep)
{
templates.clear( );
for (int angle = 0; angle < 360; angle += angleStep)
{
char name[1000];
sprintf( name, "template-%03d.bmp", angle );
cv::Mat templateImg = cv::imread( name );
if (templateImg.data == NULL)
{
std::cout << "Could not read " << name << std::endl;
exit( 1 );
}
templates.push_back( templateImg );
}
}
loadTemplates is similar to loadImage. But it loads several images instead of just one and stores them in a std::vector.
loadImage is exactly the same as in the template maker.
createMask is also exactly the same as in the tempate maker. This time we apply it to the image with several coins. It should be noted that binarization thresholds were chosen to binarize the 50c and those will not work properly to binarize all the coins in the image. But that is of no consequence since the program objective is only to identify 50c coins. As long as those are properly segmented we are fine. It actually works in our favour if some coins are lost in this segmentation since we will save time evaluating them (as long as we only lose coins which are not 50c).
typedef struct Candidate
{
cv::Mat image;
float x;
float y;
float radius;
float score;
} Candidate;
void getCandidates(const cv::Mat& image, const cv::Mat& mask,
vector<Candidate>& candidates)
{
vector<vector<cv::Point> > contours;
vector<cv::Vec4i> hierarchy;
/// Find contours
cv::Mat maskCopy;
mask.copyTo( maskCopy );
cv::findContours( maskCopy, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cv::Point( 0, 0 ) );
cv::Mat maskCS;
cv::Mat imageCS;
cv::Scalar white = cv::Scalar( 255 );
for (int nContour = 0; nContour < contours.size( ); ++nContour)
{
/// Draw contour
cv::Mat drawing = cv::Mat::zeros( mask.size( ), CV_8UC1 );
cv::drawContours( drawing, contours, nContour, white, -1, 8, hierarchy, 0, cv::Point( ) );
// Compute center and radius and area.
// Discard small areas.
cv::Moments moments = cv::moments( drawing, true );
float area = moments.m00;
if (area < CANDIDATES_MIN_AREA)
continue;
Candidate candidate;
candidate.radius = sqrt( area / M_PI );
candidate.x = moments.m10 / moments.m00;
candidate.y = moments.m01 / moments.m00;
float m[1][3] = {
{ candidate.x, candidate.y, candidate.radius}
};
cv::Mat characteristics( 1, 3, CV_32F, m );
centerAndScale( image, drawing, characteristics, imageCS, maskCS );
imageCS.copyTo( candidate.image );
candidates.push_back( candidate );
}
}
The heart of getCandidates is cv::findContours which finds the contours of areas present in its input image. Which here is the mask previously computed.
findContours returns a vector of contours. Each contour itself being a vector of points which form the outer line of the detected polygon.
Each polygon delimites the region of each candidate coin.
For each contour we use cv::drawContours to draw the filled polygon over a black image.
With this drawn image we use the same procedure earlier explained to compute centroid and radius of the polygon.
And we use centerAndScale, the same function used in the template maker, to center and scale the image contained in that poligon in an image which will have the same size as our templates. This way we will later on be able to perform a proper matching even for coins from photos of different scales.
Each of these candidate coins is copied in a Candidate structure which contains :
Candidate image
x and y for centroid
radius
score
getCandidates computes all these values except for score.
After composing the candidate it is put in a vector of candidates which is the result we get from getCandidates.
These are the 4 candidates obtained :
void saveCandidates(const vector<Candidate>& candidates)
{
for (int n = 0; n < candidates.size( ); ++n)
{
char name[1000];
sprintf( name, "Candidate-%03d.bmp", n );
cv::imwrite( name, candidates[n].image );
}
}
saveCandidates saves the computed candidates for debugging purpouses. And also so that I may post those images here.
void matchCandidates(const vector<cv::Mat>& templates,
vector<Candidate>& candidates)
{
for (auto it = candidates.begin( ); it != candidates.end( ); ++it)
matchCandidate( templates, *it );
}
matchCandidates just calls matchCandidate for each candidate. After completion we will have the score for all candidates computed.
void matchCandidate(const vector<cv::Mat>& templates, Candidate& candidate)
{
/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
candidate.score;
if (MATCH_METHOD == CV_TM_SQDIFF || MATCH_METHOD == CV_TM_SQDIFF_NORMED)
candidate.score = FLT_MAX;
else
candidate.score = 0;
for (auto it = templates.begin( ); it != templates.end( ); ++it)
{
float score = singleTemplateMatch( *it, candidate.image );
if (MATCH_METHOD == CV_TM_SQDIFF || MATCH_METHOD == CV_TM_SQDIFF_NORMED)
{
if (score < candidate.score)
candidate.score = score;
}
else
{
if (score > candidate.score)
candidate.score = score;
}
}
}
matchCandidate has as input a single candidate and all the templates. It's goal is to match each template against the candidate. That work is delegated to singleTemplateMatch.
We store the best score obtained, which for CV_TM_SQDIFF and CV_TM_SQDIFF_NORMED is the smallest one and for the other matching methods is the biggest one.
float singleTemplateMatch(const cv::Mat& templateImg, const cv::Mat& candidateImg)
{
cv::Mat result( 1, 1, CV_8UC1 );
cv::matchTemplate( candidateImg, templateImg, result, MATCH_METHOD );
return result.at<float>( 0, 0 );
}
singleTemplateMatch peforms the matching.
cv::matchTemplate uses two imput images, the second smaller or equal in size to the first one.
The common use case is for a small template (2nd parameter) to be matched against a larger image (1st parameter) and the result is a bidimensional Mat of floats with the matching of the template along the image. Locating the maximun (or minimun depending on the method) of this Mat of floats we get the best candidate position for our template in the image of the 1st parameter.
But we are not interested in locating our template in the image, we already have the coordinates of our candidates.
What we want is to get a measure of similitude between our candidate and template. Which is why we use cv::matchTemplate in a way which is less usual; we do so with a 1st parameter image of size equal to the 2nd parameter template. In this situation the result is a Mat of size 1x1. And the single value in that Mat is our score of similitude (or dissimilitude).
for (int n = 0; n < candidates.size( ); ++n)
std::cout << candidates[n].score << std::endl;
We print the scores obtained for each of our candidates.
In this table we can see the scores for each of the methods available for cv::matchTemplate. The best score is in green.
CCORR and CCOEFF give a wrong result, so those two are discarded. Of the remaining 4 methods the two SQDIFF methods are the ones with higher relative difference between the best match (which is a 50c) and the 2nd best (which is not a 50c). Which is why I have choosen them.
I have chosen SQDIFF_NORMED but there is no strong reason for that. In order to really chose a method we should test with a higher ammount of samples, not just one.
For this method a working threshold could be 0.065. Selection of a proper threshold also requires many samples.
bool selected(const Candidate& candidate, float threshold)
{
/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
if (MATCH_METHOD == CV_TM_SQDIFF || MATCH_METHOD == CV_TM_SQDIFF_NORMED)
return candidate.score <= threshold;
else
return candidate.score>threshold;
}
void drawLabel(const Candidate& candidate, const char* label, cv::Mat image)
{
int x = candidate.x - candidate.radius;
int y = candidate.y;
cv::Point point( x, y );
cv::Scalar blue( 255, 128, 128 );
cv::putText( image, label, point, CV_FONT_HERSHEY_SIMPLEX, 1.5f, blue, 2 );
}
cv::Mat labelCoins(const cv::Mat& image, const vector<Candidate>& candidates,
float threshold, bool inverseThreshold, const char* label)
{
cv::Mat imageLabeled;
image.copyTo( imageLabeled );
for (auto it = candidates.begin( ); it != candidates.end( ); ++it)
{
if (selected( *it, threshold ))
drawLabel( *it, label, imageLabeled );
}
return imageLabeled;
}
labelCoins draws a label string at the location of candidates with a score bigger than ( or lesser than depending on the method) the threshold.
And finally the result of labelCoins is saved with
cv::imwrite( LABELED_IMAGE, labeledImg );
The result being :
The whole code for the coin matcher can be downloaded here.
Is this a good method?
That is hard to tell.
The method is consistent. It correctly detects the 50c coin for the sample and input image provided.
But we have no idea if the method is robust because it has not been tested with a proper sample size. And even more important is to test it against samples which were not available when the program was being coded, that is the true measure of robustness when done with a large enough sample size.
I am rather confident in the method not having false positives from silver coins. But I am not so sure about other copper coins like the 20c. As we can see from the scores obtained the 20c coin gets a score very similar to the 50c.
It is also quite possible that false negatives will happen under varying lighting conditions. Which is something that can and should be avoided if we have control over lighting conditions such as when we are designing a machine to take photos of coins and count them.
If the method works the same method can be repeated for each type of coin leading to full detection of all coins.
Code in this answer is also available under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
If you detect all coins correctly Its better to use size(radial) and RGB features to recognize its value. Its not a good idea that concatenate these features because their number are not equal ( size is one number and number of RGB features are much larger than one). I recommend you to use two classifier for this purpose. One for size and another for RGB features.
You have to classify all coins into for example 3 (It depends on type
of your coins) size class. You can do this with a simple 1NN
classifier (just calculate the radial of test coin and classify it to
nearest predefined radial)
Then you should have some templates in each size and use template matching to recognize its value.(all templates and detected coins should be resize to a particular size. e.g. (100,100) ) For template
matching you can use matchtemplate function. I thing that the CV_TM_CCOEFF method may be the best one, but you can test all methods
to get a good result. (Note you don't need to search on image for coin because you detect the coin previously as you mentioned in your
question. You just need to use this function to get one number as a similarity/difference between two image and classify the test coin to a class which the similarity is maximized or difference is minimized)
EDIT1: You should have all rotations in your templates in each class to compensate the rotation of test coin.
EDIT2: If all coins are in different sizes the first step is enough. Otherwise you should patch the similar sizes to one class and classify the test coin using the second step (RGB features).
(1) Find the coins edge, using Hough Transform Algorithm.
(2) Determine the origin dot of the coins. I don't know how you'll do this.
(3) You can use k from KNN Algorithm for comparing the diameter or of the coins. Don't forget to set the bias value.
You could try and set up a training set of coin images and generate SIFT/SURF etc. descriptors of it. (EDIT: OpenCV feature detectors
Using these data you could set up a kNN classifier, using the coins values as training labels.
Once you perform kNN classification on you segmented coin images, your classification result would yield the coins value.
I want to test whether two images match. Partial matches also interest me.
The problem is that the images suffer from strong noise. Another problem is that the images might be rotated with an unknown angle. The objects shown in the images will roughly always have the same scale!
The images show area scans from a top-shot perspective. "Lines" are mostly walls and other objects are mostly trees and different kinds of plants.
Another problem was, that the left image was very blurry and the right one's lines were very thin.
To compensate for this difference I used dilation. The resulting images are the ones I uploaded.
Although It can easily be seen that these images match almost perfectly I cannot convince my algorithm of this fact.
My first idea was a feature based matching, but the matches are horrible. It only worked for a rotation angle of -90°, 0° and 90°. Although most descriptors are rotation invariant (in past projects they really were), the rotation invariance seems to fail for this example.
My second idea was to split the images into several smaller segments and to use template matching. So I segmented the images and, again, for the human eye they are pretty easy to match. The goal of this step was to segment the different walls and trees/plants.
The upper row are parts of the left, and the lower are parts of the right image. After the segmentation the segments were dilated again.
As already mentioned: Template matching failed, as did contour based template matching and contour matching.
I think the dilation of the images was very important, because it was nearly impossible for the human eye to match the segments without dilation before the segmentation. Another dilation after the segmentation made this even less difficult.
Your first job should be to fix the orientation. I am not sure what is the best algorithm to do that but here is an approach I would use: fix one of the images and start rotating the other. For each rotation compute a histogram for the color intense on each of the rows/columns. Compute some distance between the resulting vectors(e.g. use cross product). Choose the rotation that results in smallest cross product. It may be good idea to combine this approach with hill climbing.
Once you have the images aligned in approximately the same direction, I believe matching should be easier. As the two images are supposed to be at the same scale, compute something analogous to the geometrical center for both images: compute weighted sum of all pixels - a completely white pixel would have a weight of 1, and a completely black - weight 0, the sum should be a vector of size 2(x and y coordinate). After that divide those values by the dimensions of the image and call this "geometrical center of the image". Overlay the two images in a way that the two centers coincide and then once more compute cross product for the difference between the images. I would say this should be their difference.
You can also try following methods to find rotation and similarity.
Use image moments to get the rotation as shown here.
Once you rotate the image, use cross-correlation to evaluate the similarity.
EDIT
I tried this with OpenCV and C++ for the two sample images. I'm posting the code and results below as it seems to work well at least for the given samples.
Here's the function to calculate the orientation vector using image moments:
Mat orientVec(Mat& im)
{
Moments m = moments(im);
double cov[4] = {m.mu20/m.m00, m.mu11/m.m00, m.mu11/m.m00, m.mu02/m.m00};
Mat covMat(2, 2, CV_64F, cov);
Mat evals, evecs;
eigen(covMat, evals, evecs);
return evecs.row(0);
}
Rotate and match sample images:
Mat im1 = imread(INPUT_FOLDER_PATH + string("WojUi.png"), 0);
Mat im2 = imread(INPUT_FOLDER_PATH + string("XbrsV.png"), 0);
// get the orientation vector
Mat v1 = orientVec(im1);
Mat v2 = orientVec(im2);
double angle = acos(v1.dot(v2))*180/CV_PI;
// rotate im2. try rotating with -angle and +angle. here using -angle
Mat rot = getRotationMatrix2D(Point(im2.cols/2, im2.rows/2), -angle, 1.0);
Mat im2Rot;
warpAffine(im2, im2Rot, rot, Size(im2.rows, im2.cols));
// add a border to rotated image
int borderSize = im1.rows > im2.cols ? im1.rows/2 + 1 : im1.cols/2 + 1;
Mat im2RotBorder;
copyMakeBorder(im2Rot, im2RotBorder, borderSize, borderSize, borderSize, borderSize,
BORDER_CONSTANT, Scalar(0, 0, 0));
// normalized cross-correlation
Mat& image = im2RotBorder;
Mat& templ = im1;
Mat nxcor;
matchTemplate(image, templ, nxcor, CV_TM_CCOEFF_NORMED);
// take the max
double max;
Point maxPt;
minMaxLoc(nxcor, NULL, &max, NULL, &maxPt);
// draw the match
Mat rgb;
cvtColor(image, rgb, CV_GRAY2BGR);
rectangle(rgb, maxPt, Point(maxPt.x+templ.cols-1, maxPt.y+templ.rows-1), Scalar(0, 255, 255), 2);
cout << "max: " << max << endl;
With -angle rotation in code, I get max = 0.758. Below is the rotated image in this case with the matching region.
Otherwise max = 0.293
I'm to build a panorama image of the ground covered by a downward facing camera (at a fixed height, around 1 metre above ground). This could potentially run to thousands of frames, so the Stitcher class' built in panorama method isn't really suitable - it's far too slow and memory hungry.
Instead I'm assuming the floor and motion is planar (not unreasonable here) and trying to build up a cumulative homography as I see each frame. That is, for each frame, I calculate the homography from the previous one to the new one. I then get the cumulative homography by multiplying that with the product of all previous homographies.
Let's say I get H01 between frames 0 and 1, then H12 between frames 1 and 2. To get the transformation to place frame 2 onto the mosaic, I need to get H01*H12. This continues as the frame count increases, such that I get H01*H12*H23*H34*H45*....
In code, this is something akin to:
cv::Mat previous, current;
// Init cumulative homography
cv::Mat cumulative_homography = cv::Mat::eye(3);
video_stream >> previous;
for(;;) {
video_stream >> current;
// Here I do some checking of the frame, etc
// Get the homography using my DenseMosaic class (using Farneback to get OF)
cv::Mat tmp_H = DenseMosaic::get_homography(previous,current);
// Now normalise the homography by its bottom right corner
tmp_H /= tmp_H.at<double>(2, 2);
cumulative_homography *= tmp_H;
previous = current.clone( );
}
It works pretty well, except that as the camera moves "up" in the viewpoint, the homography scale decreases. As it moves down, the scale increases again. This gives my panoramas a perspective type effect that I really don't want.
For example, this is taken on a few seconds of video moving forward then backward. The first frame looks ok:
The problem comes as we move forward a few frames:
Then when we come back again, you can see the frame gets bigger again:
I'm at a loss as to where this is coming from.
I'm using Farneback dense optical flow to calculate pixel-pixel correspondences as below (sparse feature matching doesn't work well on this data) and I've checked my flow vectors - they're generally very good, so it's not a tracking problem. I also tried switching the order of the inputs to find homography (in case I'd mixed up the frame numbers), still no better.
cv::calcOpticalFlowFarneback(grey_1, grey_2, flow_mat, 0.5, 6,50, 5, 7, 1.5, flags);
// Using the flow_mat optical flow map, populate grid point correspondences between images
std::vector<cv::Point2f> points_1, points_2;
median_motion = DenseMosaic::dense_flow_to_corresp(flow_mat, points_1, points_2);
cv::Mat H = cv::findHomography(cv::Mat(points_2), cv::Mat(points_1), CV_RANSAC, 1);
Another thing I thought it could be was the translation I include in the transformation to ensure my panorama is centred within the scene:
cv::warpPerspective(init.clone(), warped, translation*homography, init.size());
But having checked the values in the homography before the translation is applied, the scaling issue I mention is still present.
Any hints are gratefully received. There's a lot of code I could put in but it seems irrelevant, please do let me know if there's something missing
UPDATE
I've tried switching out the *= operator for the full multiplication and tried reversing the order the homographies are multiplied in, but no luck. Below is my code for calculating the homography:
/**
\brief Calculates the homography between the current and previous frames
*/
cv::Mat DenseMosaic::get_homography()
{
cv::Mat grey_1, grey_2; // Grayscale versions of frames
cv::cvtColor(prev, grey_1, CV_BGR2GRAY);
cv::cvtColor(cur, grey_2, CV_BGR2GRAY);
// Calculate the dense flow
int flags = cv::OPTFLOW_FARNEBACK_GAUSSIAN;
if (frame_number > 2) {
flags = flags | cv::OPTFLOW_USE_INITIAL_FLOW;
}
cv::calcOpticalFlowFarneback(grey_1, grey_2, flow_mat, 0.5, 6,50, 5, 7, 1.5, flags);
// Convert the flow map to point correspondences
std::vector<cv::Point2f> points_1, points_2;
median_motion = DenseMosaic::dense_flow_to_corresp(flow_mat, points_1, points_2);
// Use the correspondences to get the homography
cv::Mat H = cv::findHomography(cv::Mat(points_2), cv::Mat(points_1), CV_RANSAC, 1);
return H;
}
And this is the function I use to find the correspondences from the flow map:
/**
\brief Calculate pixel->pixel correspondences given a map of the optical flow across the image
\param[in] flow_mat Map of the optical flow across the image
\param[out] points_1 The set of points from #cur
\param[out] points_2 The set of points from #prev
\param[in] step_size The size of spaces between the grid lines
\return The median motion as a point
Uses a dense flow map (such as that created by cv::calcOpticalFlowFarneback) to obtain a set of point correspondences across a grid.
*/
cv::Point2f DenseMosaic::dense_flow_to_corresp(const cv::Mat &flow_mat, std::vector<cv::Point2f> &points_1, std::vector<cv::Point2f> &points_2, int step_size)
{
std::vector<double> tx, ty;
for (int y = 0; y < flow_mat.rows; y += step_size) {
for (int x = 0; x < flow_mat.cols; x += step_size) {
/* Flow is basically the delta between left and right points */
cv::Point2f flow = flow_mat.at<cv::Point2f>(y, x);
tx.push_back(flow.x);
ty.push_back(flow.y);
/* There's no need to calculate for every single point,
if there's not much change, just ignore it
*/
if (fabs(flow.x) < 0.1 && fabs(flow.y) < 0.1)
continue;
points_1.push_back(cv::Point2f(x, y));
points_2.push_back(cv::Point2f(x + flow.x, y + flow.y));
}
}
// I know this should be median, not mean, but it's only used for plotting the
// general motion direction so it's unimportant.
cv::Point2f t_median;
cv::Scalar mtx = cv::mean(tx);
t_median.x = mtx[0];
cv::Scalar mty = cv::mean(ty);
t_median.y = mty[0];
return t_median;
}
It turns out this was because my viewpoint was close to the features, meaning that the non-planarity of the tracked features was causing skew to the homography. I managed to prevent this (it's more of a hack than a method...) by using estimateRigidTransform instead of findHomography, as this does not estimate for perspective variations.
In this particular case, it makes sense to do so, as the view does only ever undergo rigid transformations.
I am creating a program to stabilize the video stream. At the moment, my program works based on the phase correlation algorithm. I'm calculating an offset between two images - base and current. Next I correct the current image according to the new coordinates. This program works, but the result is not satisfactory. The related links you may find that the treated video appears undesirable and shake the whole video is becoming worse.
Orininal video
Unshaked video
There is my current realisation:
Calculating offset between images:
Point2d calculate_offset_phase_optimized(Mat one, Mat& two) {
if(two.type() != CV_64F) {
cvtColor(two, two, CV_BGR2GRAY);
two.convertTo(two, CV_64F);
}
cvtColor(one, one, CV_BGR2GRAY);
one.convertTo(one, CV_64F);
return phaseCorrelate(one, two);
}
Shifting image according this coordinate:
void move_image_roi_alt(Mat& img, Mat& trans, const Point2d& offset) {
trans = Mat::zeros(img.size(), img.type());
img(
Rect(
_0(static_cast<int>(offset.x)),
_0(static_cast<int>(offset.y)),
img.cols-abs(static_cast<int>(offset.x)),
img.rows-abs(static_cast<int>(offset.y))
)
).copyTo(trans(
Rect(
_0ia(static_cast<int>(offset.x)),
_0ia(static_cast<int>(offset.y)),
img.cols-abs(static_cast<int>(offset.x)),
img.rows-abs(static_cast<int>(offset.y))
)
));
}
int _0(const int x) {
return x < 0 ? 0 : x;
}
int _0ia(const int x) {
return x < 0 ? abs(x) : 0;
}
I was looking through the document authors stabilizer YouTube and algorithm based on corner detection seemed attractive, but I'm not entirely clear how it works.
So my question is how to effectively solve this problem.
One of the conditions - the program will run on slower computers, so heavy algorithms may not be suitable.
Thanks!
P.S.
I apologize for any mistakes in the text - it is an automatic translation.
You can use image descriptors such as SIFT in each frame and calculate robust matches between the frames. Then you can calculate homography between the frames and use that to align them. Using sparse features can lead to faster implementation than using a dense correlation.
Alternately, if you know the camera parameters you can calculate 3D positions of the points and of the cameras and reproject the images onto a stable projection plane. In the result, you also get a sparse 3D reconstruction of the scene (somewhat imprecise, usually it needs to be optimized to be usable). This is what e.g. Autostitch would do, but it is quite difficult to implement, however.
Note that the camera parameters can also be calculated, but that is even more difficult.
OpenCV can do it for you in 3 lines of code (it is definitely shortest way, may be even the best):
t = estimateRigidTransform(newFrame, referenceFrame, 0); // 0 means not all transformations (5 of 6)
if(!t.empty()){
warpAffine(newFrame, stableFrame, t, Size(newFrame.cols, newFrame.rows)); // stableFrame should be stable now
}
You can turn off some kind of transformations by modifying matrix t, it can lead to more stable result. It is just core idea, then you can modify it in the way you want: change referenceFrame, smooth set of transformation parameters from matrix t etc.