I am undertaking a project that will automatically count values of coins from an input image. So far I have segmented the coins using some pre-processing with edge detection and using the Hough-Transform.
My question is how do I proceed from here? I need to do some template matching on the segmented images based on some previously stored features. How can I go about doing this.
I have also read about something called K-Nearest Neighbours and I feel it is something I should be using. But I am not too sure how to go about using it.
Research articles I have followed:
Coin
Detector
Coin
Recognition
One way of doing pattern matching is using cv::matchTemplate.
This takes an input image and a smaller image which acts as template. It compares the template against overlapped image regions computing the similarity of the template with the overlapped region. Several methods for computing the comparision are available.
This methods does not directly support scale or orientation invariance. But it is possible to overcome that by scaling candidates to a reference size and by testing against several rotated templates.
A detailed example of this technique is shown to detect pressence and location of 50c coins. The same procedure can be applied to the other coins.
Two programs will be built. One to create templates from the big image template for the 50c coin. And another one which will take as input those templates as well as the image with coins and will output an image where the 50c coin(s) are labelled.
Template Maker
#define TEMPLATE_IMG "50c.jpg"
#define ANGLE_STEP 30
int main()
{
cv::Mat image = loadImage(TEMPLATE_IMG);
cv::Mat mask = createMask( image );
cv::Mat loc = locate( mask );
cv::Mat imageCS;
cv::Mat maskCS;
centerAndScale( image, mask, loc, imageCS, maskCS);
saveRotatedTemplates( imageCS, maskCS, ANGLE_STEP );
return 0;
}
Here we load the image which will be used to construct our templates.
Segment it to create a mask.
Locate the center of masses of said mask.
And we rescale and copy that mask and the coin so that they ocupy a square of fixed size where the edges of the square are touching the circunference of mask and coin. That is, the side of the square has the same lenght in pixels as the diameter of the scaled mask or coin image.
Finally we save that scaled and centered image of the coin. And we save further copies of it rotated in fixed angle increments.
cv::Mat loadImage(const char* name)
{
cv::Mat image;
image = cv::imread(name);
if ( image.data==NULL || image.channels()!=3 )
{
std::cout << name << " could not be read or is not correct." << std::endl;
exit(1);
}
return image;
}
loadImage uses cv::imread to read the image. Verifies that data has been read and the image has three channels and returns the read image.
#define THRESHOLD_BLUE 130
#define THRESHOLD_TYPE_BLUE cv::THRESH_BINARY_INV
#define THRESHOLD_GREEN 230
#define THRESHOLD_TYPE_GREEN cv::THRESH_BINARY_INV
#define THRESHOLD_RED 140
#define THRESHOLD_TYPE_RED cv::THRESH_BINARY
#define CLOSE_ITERATIONS 5
cv::Mat createMask(const cv::Mat& image)
{
cv::Mat channels[3];
cv::split( image, channels);
cv::Mat mask[3];
cv::threshold( channels[0], mask[0], THRESHOLD_BLUE , 255, THRESHOLD_TYPE_BLUE );
cv::threshold( channels[1], mask[1], THRESHOLD_GREEN, 255, THRESHOLD_TYPE_GREEN );
cv::threshold( channels[2], mask[2], THRESHOLD_RED , 255, THRESHOLD_TYPE_RED );
cv::Mat compositeMask;
cv::bitwise_and( mask[0], mask[1], compositeMask);
cv::bitwise_and( compositeMask, mask[2], compositeMask);
cv::morphologyEx(compositeMask, compositeMask, cv::MORPH_CLOSE,
cv::Mat(), cv::Point(-1, -1), CLOSE_ITERATIONS );
/// Next three lines only for debugging, may be removed
cv::Mat filtered;
image.copyTo( filtered, compositeMask );
cv::imwrite( "filtered.jpg", filtered);
return compositeMask;
}
createMask does the segmentation of the template. It binarizes each of the BGR channels, does the AND of those three binarized images and performs a CLOSE morphologic operation to produce the mask.
The three debug lines copy the original image into a black one using the computed mask as a mask for the copy operation. This helped in chosing the proper values for the threshold.
Here we can see the 50c image filtered by the mask created in createMask
cv::Mat locate( const cv::Mat& mask )
{
// Compute center and radius.
cv::Moments moments = cv::moments( mask, true);
float area = moments.m00;
float radius = sqrt( area/M_PI );
float xCentroid = moments.m10/moments.m00;
float yCentroid = moments.m01/moments.m00;
float m[1][3] = {{ xCentroid, yCentroid, radius}};
return cv::Mat(1, 3, CV_32F, m);
}
locate computes the center of mass of the mask and its radius. Returning those 3 values in a single row mat in the form { x, y, radius }.
It uses cv::moments which calculates all of the moments up to the third order of a polygon or rasterized shape. A rasterized shape in our case. We are not interested in all of those moments. But three of them are useful here. M00 is the area of the mask. And the centroid can be calculated from m00, m10 and m01.
void centerAndScale(const cv::Mat& image, const cv::Mat& mask,
const cv::Mat& characteristics,
cv::Mat& imageCS, cv::Mat& maskCS)
{
float radius = characteristics.at<float>(0,2);
float xCenter = characteristics.at<float>(0,0);
float yCenter = characteristics.at<float>(0,1);
int diameter = round(radius*2);
int xOrg = round(xCenter-radius);
int yOrg = round(yCenter-radius);
cv::Rect roiOrg = cv::Rect( xOrg, yOrg, diameter, diameter );
cv::Mat roiImg = image(roiOrg);
cv::Mat roiMask = mask(roiOrg);
cv::Mat centered = cv::Mat::zeros( diameter, diameter, CV_8UC3);
roiImg.copyTo( centered, roiMask);
cv::imwrite( "centered.bmp", centered); // debug
imageCS.create( TEMPLATE_SIZE, TEMPLATE_SIZE, CV_8UC3);
cv::resize( centered, imageCS, cv::Size(TEMPLATE_SIZE,TEMPLATE_SIZE), 0, 0 );
cv::imwrite( "scaled.bmp", imageCS); // debug
roiMask.copyTo(centered);
cv::resize( centered, maskCS, cv::Size(TEMPLATE_SIZE,TEMPLATE_SIZE), 0, 0 );
}
centerAndScale uses the centroid and radius computed by locate to get a region of interest of the input image and a region of interest of the mask such that the center of the such regions is also the center of the coin and mask and the side length of the regions are equal to the diameter of the coin/mask.
These regions are later scaled to a fixed TEMPLATE_SIZE. This scaled region will be our reference template. When later on in the matching program we want to check if a detected candidate coin is this coin we will also take a region of the candidate coin, center and scale that candidate coin in the same way before performing template matching. This way we achieve scale invariance.
void saveRotatedTemplates( const cv::Mat& image, const cv::Mat& mask, int stepAngle )
{
char name[1000];
cv::Mat rotated( TEMPLATE_SIZE, TEMPLATE_SIZE, CV_8UC3 );
for ( int angle=0; angle<360; angle+=stepAngle )
{
cv::Point2f center( TEMPLATE_SIZE/2, TEMPLATE_SIZE/2);
cv::Mat r = cv::getRotationMatrix2D(center, angle, 1.0);
cv::warpAffine(image, rotated, r, cv::Size(TEMPLATE_SIZE, TEMPLATE_SIZE));
sprintf( name, "template-%03d.bmp", angle);
cv::imwrite( name, rotated );
cv::warpAffine(mask, rotated, r, cv::Size(TEMPLATE_SIZE, TEMPLATE_SIZE));
sprintf( name, "templateMask-%03d.bmp", angle);
cv::imwrite( name, rotated );
}
}
saveRotatedTemplates saves the previous computed template.
But it saves several copies of it, each one rotated by an angle, defined in ANGLE_STEP. The goal of this is to provide orientation invariance. The lower that we define stepAngle the better orientation invariance we get but it also implies a higher computational cost.
You may download the whole template maker program here.
When run with ANGLE_STEP as 30 I get the following 12 templates :
Template Matching.
#define INPUT_IMAGE "coins.jpg"
#define LABELED_IMAGE "coins_with50cLabeled.bmp"
#define LABEL "50c"
#define MATCH_THRESHOLD 0.065
#define ANGLE_STEP 30
int main()
{
vector<cv::Mat> templates;
loadTemplates( templates, ANGLE_STEP );
cv::Mat image = loadImage( INPUT_IMAGE );
cv::Mat mask = createMask( image );
vector<Candidate> candidates;
getCandidates( image, mask, candidates );
saveCandidates( candidates ); // debug
matchCandidates( templates, candidates );
for (int n = 0; n < candidates.size( ); ++n)
std::cout << candidates[n].score << std::endl;
cv::Mat labeledImg = labelCoins( image, candidates, MATCH_THRESHOLD, false, LABEL );
cv::imwrite( LABELED_IMAGE, labeledImg );
return 0;
}
The goal here is to read the templates and the image to be examined and determine the location of coins which match our template.
First we read into a vector of images all the template images we produced in the previous program.
Then we read the image to be examined.
Then we binarize the image to be examined using exactly the same function as in the template maker.
getCandidates locates the groups of points which are toghether forming a polygon. Each of these polygons is a candidate for coin. And all of them are rescaled and centered in a square of size equal to that of our templates so that we can perform matching in a way invariant to scale.
We save the candidate images obtained for debugging and tuning purposes.
matchCandidates matches each candidate with all the templates storing for each the result of the best match. Since we have templates for several orientations this provides invariance to orientation.
Scores of each candidate are printed so we can decide on a threshold to separate 50c coins from non 50c coins.
labelCoins copies the original image and draws a label over the ones which have a score greater than (or lesser than for some methods) the threshold defined in MATCH_THRESHOLD.
And finally we save the result in a .BMP
void loadTemplates(vector<cv::Mat>& templates, int angleStep)
{
templates.clear( );
for (int angle = 0; angle < 360; angle += angleStep)
{
char name[1000];
sprintf( name, "template-%03d.bmp", angle );
cv::Mat templateImg = cv::imread( name );
if (templateImg.data == NULL)
{
std::cout << "Could not read " << name << std::endl;
exit( 1 );
}
templates.push_back( templateImg );
}
}
loadTemplates is similar to loadImage. But it loads several images instead of just one and stores them in a std::vector.
loadImage is exactly the same as in the template maker.
createMask is also exactly the same as in the tempate maker. This time we apply it to the image with several coins. It should be noted that binarization thresholds were chosen to binarize the 50c and those will not work properly to binarize all the coins in the image. But that is of no consequence since the program objective is only to identify 50c coins. As long as those are properly segmented we are fine. It actually works in our favour if some coins are lost in this segmentation since we will save time evaluating them (as long as we only lose coins which are not 50c).
typedef struct Candidate
{
cv::Mat image;
float x;
float y;
float radius;
float score;
} Candidate;
void getCandidates(const cv::Mat& image, const cv::Mat& mask,
vector<Candidate>& candidates)
{
vector<vector<cv::Point> > contours;
vector<cv::Vec4i> hierarchy;
/// Find contours
cv::Mat maskCopy;
mask.copyTo( maskCopy );
cv::findContours( maskCopy, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cv::Point( 0, 0 ) );
cv::Mat maskCS;
cv::Mat imageCS;
cv::Scalar white = cv::Scalar( 255 );
for (int nContour = 0; nContour < contours.size( ); ++nContour)
{
/// Draw contour
cv::Mat drawing = cv::Mat::zeros( mask.size( ), CV_8UC1 );
cv::drawContours( drawing, contours, nContour, white, -1, 8, hierarchy, 0, cv::Point( ) );
// Compute center and radius and area.
// Discard small areas.
cv::Moments moments = cv::moments( drawing, true );
float area = moments.m00;
if (area < CANDIDATES_MIN_AREA)
continue;
Candidate candidate;
candidate.radius = sqrt( area / M_PI );
candidate.x = moments.m10 / moments.m00;
candidate.y = moments.m01 / moments.m00;
float m[1][3] = {
{ candidate.x, candidate.y, candidate.radius}
};
cv::Mat characteristics( 1, 3, CV_32F, m );
centerAndScale( image, drawing, characteristics, imageCS, maskCS );
imageCS.copyTo( candidate.image );
candidates.push_back( candidate );
}
}
The heart of getCandidates is cv::findContours which finds the contours of areas present in its input image. Which here is the mask previously computed.
findContours returns a vector of contours. Each contour itself being a vector of points which form the outer line of the detected polygon.
Each polygon delimites the region of each candidate coin.
For each contour we use cv::drawContours to draw the filled polygon over a black image.
With this drawn image we use the same procedure earlier explained to compute centroid and radius of the polygon.
And we use centerAndScale, the same function used in the template maker, to center and scale the image contained in that poligon in an image which will have the same size as our templates. This way we will later on be able to perform a proper matching even for coins from photos of different scales.
Each of these candidate coins is copied in a Candidate structure which contains :
Candidate image
x and y for centroid
radius
score
getCandidates computes all these values except for score.
After composing the candidate it is put in a vector of candidates which is the result we get from getCandidates.
These are the 4 candidates obtained :
void saveCandidates(const vector<Candidate>& candidates)
{
for (int n = 0; n < candidates.size( ); ++n)
{
char name[1000];
sprintf( name, "Candidate-%03d.bmp", n );
cv::imwrite( name, candidates[n].image );
}
}
saveCandidates saves the computed candidates for debugging purpouses. And also so that I may post those images here.
void matchCandidates(const vector<cv::Mat>& templates,
vector<Candidate>& candidates)
{
for (auto it = candidates.begin( ); it != candidates.end( ); ++it)
matchCandidate( templates, *it );
}
matchCandidates just calls matchCandidate for each candidate. After completion we will have the score for all candidates computed.
void matchCandidate(const vector<cv::Mat>& templates, Candidate& candidate)
{
/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
candidate.score;
if (MATCH_METHOD == CV_TM_SQDIFF || MATCH_METHOD == CV_TM_SQDIFF_NORMED)
candidate.score = FLT_MAX;
else
candidate.score = 0;
for (auto it = templates.begin( ); it != templates.end( ); ++it)
{
float score = singleTemplateMatch( *it, candidate.image );
if (MATCH_METHOD == CV_TM_SQDIFF || MATCH_METHOD == CV_TM_SQDIFF_NORMED)
{
if (score < candidate.score)
candidate.score = score;
}
else
{
if (score > candidate.score)
candidate.score = score;
}
}
}
matchCandidate has as input a single candidate and all the templates. It's goal is to match each template against the candidate. That work is delegated to singleTemplateMatch.
We store the best score obtained, which for CV_TM_SQDIFF and CV_TM_SQDIFF_NORMED is the smallest one and for the other matching methods is the biggest one.
float singleTemplateMatch(const cv::Mat& templateImg, const cv::Mat& candidateImg)
{
cv::Mat result( 1, 1, CV_8UC1 );
cv::matchTemplate( candidateImg, templateImg, result, MATCH_METHOD );
return result.at<float>( 0, 0 );
}
singleTemplateMatch peforms the matching.
cv::matchTemplate uses two imput images, the second smaller or equal in size to the first one.
The common use case is for a small template (2nd parameter) to be matched against a larger image (1st parameter) and the result is a bidimensional Mat of floats with the matching of the template along the image. Locating the maximun (or minimun depending on the method) of this Mat of floats we get the best candidate position for our template in the image of the 1st parameter.
But we are not interested in locating our template in the image, we already have the coordinates of our candidates.
What we want is to get a measure of similitude between our candidate and template. Which is why we use cv::matchTemplate in a way which is less usual; we do so with a 1st parameter image of size equal to the 2nd parameter template. In this situation the result is a Mat of size 1x1. And the single value in that Mat is our score of similitude (or dissimilitude).
for (int n = 0; n < candidates.size( ); ++n)
std::cout << candidates[n].score << std::endl;
We print the scores obtained for each of our candidates.
In this table we can see the scores for each of the methods available for cv::matchTemplate. The best score is in green.
CCORR and CCOEFF give a wrong result, so those two are discarded. Of the remaining 4 methods the two SQDIFF methods are the ones with higher relative difference between the best match (which is a 50c) and the 2nd best (which is not a 50c). Which is why I have choosen them.
I have chosen SQDIFF_NORMED but there is no strong reason for that. In order to really chose a method we should test with a higher ammount of samples, not just one.
For this method a working threshold could be 0.065. Selection of a proper threshold also requires many samples.
bool selected(const Candidate& candidate, float threshold)
{
/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
if (MATCH_METHOD == CV_TM_SQDIFF || MATCH_METHOD == CV_TM_SQDIFF_NORMED)
return candidate.score <= threshold;
else
return candidate.score>threshold;
}
void drawLabel(const Candidate& candidate, const char* label, cv::Mat image)
{
int x = candidate.x - candidate.radius;
int y = candidate.y;
cv::Point point( x, y );
cv::Scalar blue( 255, 128, 128 );
cv::putText( image, label, point, CV_FONT_HERSHEY_SIMPLEX, 1.5f, blue, 2 );
}
cv::Mat labelCoins(const cv::Mat& image, const vector<Candidate>& candidates,
float threshold, bool inverseThreshold, const char* label)
{
cv::Mat imageLabeled;
image.copyTo( imageLabeled );
for (auto it = candidates.begin( ); it != candidates.end( ); ++it)
{
if (selected( *it, threshold ))
drawLabel( *it, label, imageLabeled );
}
return imageLabeled;
}
labelCoins draws a label string at the location of candidates with a score bigger than ( or lesser than depending on the method) the threshold.
And finally the result of labelCoins is saved with
cv::imwrite( LABELED_IMAGE, labeledImg );
The result being :
The whole code for the coin matcher can be downloaded here.
Is this a good method?
That is hard to tell.
The method is consistent. It correctly detects the 50c coin for the sample and input image provided.
But we have no idea if the method is robust because it has not been tested with a proper sample size. And even more important is to test it against samples which were not available when the program was being coded, that is the true measure of robustness when done with a large enough sample size.
I am rather confident in the method not having false positives from silver coins. But I am not so sure about other copper coins like the 20c. As we can see from the scores obtained the 20c coin gets a score very similar to the 50c.
It is also quite possible that false negatives will happen under varying lighting conditions. Which is something that can and should be avoided if we have control over lighting conditions such as when we are designing a machine to take photos of coins and count them.
If the method works the same method can be repeated for each type of coin leading to full detection of all coins.
Code in this answer is also available under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
If you detect all coins correctly Its better to use size(radial) and RGB features to recognize its value. Its not a good idea that concatenate these features because their number are not equal ( size is one number and number of RGB features are much larger than one). I recommend you to use two classifier for this purpose. One for size and another for RGB features.
You have to classify all coins into for example 3 (It depends on type
of your coins) size class. You can do this with a simple 1NN
classifier (just calculate the radial of test coin and classify it to
nearest predefined radial)
Then you should have some templates in each size and use template matching to recognize its value.(all templates and detected coins should be resize to a particular size. e.g. (100,100) ) For template
matching you can use matchtemplate function. I thing that the CV_TM_CCOEFF method may be the best one, but you can test all methods
to get a good result. (Note you don't need to search on image for coin because you detect the coin previously as you mentioned in your
question. You just need to use this function to get one number as a similarity/difference between two image and classify the test coin to a class which the similarity is maximized or difference is minimized)
EDIT1: You should have all rotations in your templates in each class to compensate the rotation of test coin.
EDIT2: If all coins are in different sizes the first step is enough. Otherwise you should patch the similar sizes to one class and classify the test coin using the second step (RGB features).
(1) Find the coins edge, using Hough Transform Algorithm.
(2) Determine the origin dot of the coins. I don't know how you'll do this.
(3) You can use k from KNN Algorithm for comparing the diameter or of the coins. Don't forget to set the bias value.
You could try and set up a training set of coin images and generate SIFT/SURF etc. descriptors of it. (EDIT: OpenCV feature detectors
Using these data you could set up a kNN classifier, using the coins values as training labels.
Once you perform kNN classification on you segmented coin images, your classification result would yield the coins value.
Related
Firstly I integrate OpenCV framework to XCode and All the OpenCV code is on ObjectiveC and I am using in Swift Using bridging header. I am new to OpenCV Framework and trying to achieve count of vertical lines from the image.
Here is my code:
First I am converting the image to GrayScale
+ (UIImage *)convertToGrayscale:(UIImage *)image {
cv::Mat mat;
UIImageToMat(image, mat);
cv::Mat gray;
cv::cvtColor(mat, gray, CV_RGB2GRAY);
UIImage *grayscale = MatToUIImage(gray);
return grayscale;
}
Then, I am detecting edges so I can find the line of gray color
+ (UIImage *)detectEdgesInRGBImage:(UIImage *)image {
cv::Mat mat;
UIImageToMat(image, mat);
//Prepare the image for findContours
cv::threshold(mat, mat, 128, 255, CV_THRESH_BINARY);
//Find the contours. Use the contourOutput Mat so the original image doesn't get overwritten
std::vector<std::vector<cv::Point> > contours;
cv::Mat contourOutput = mat.clone();
cv::findContours( contourOutput, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE );
NSLog(#"Count =>%lu", contours.size());
//For Blue
/*cv::GaussianBlur(mat, gray, cv::Size(11, 11), 0); */
UIImage *grayscale = MatToUIImage(mat);
return grayscale;
}
This both Function is written on Objective C
Here, I am calling both function Swift
override func viewDidLoad() {
super.viewDidLoad()
let img = UIImage(named: "imagenamed")
let img1 = Wrapper.convert(toGrayscale: img)
self.capturedImageView.image = Wrapper.detectEdges(inRGBImage: img1)
}
I was doing this for some days and finding some useful documents(Reference Link)
OpenCV - how to count objects in photo?
How to count number of lines (Hough Trasnform) in OpenCV
OPENCV Documents
https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?#findcontours
Basically, I understand the first we need to convert this image to black and white, and then using cvtColor, threshold and findContours we can find the colors or lines.
I am attaching the image that vertical Lines I want to get.
Original Image
Output Image that I am getting
I got number of lines count =>10
I am not able to get accurate count here.
Please guide me on this. Thank You!
Since you want to detect the number of the vertical lines, there is a very simple approach I can suggest for you. You already got a clear output and I used this output in my code. Here are the steps before the code:
Preprocess the input image to get the lines clearly
Check each row and check until get a pixel whose value is higher than 100(threshold value I chose)
Then increase the line counter for that row
Continue on that line until get a pixel whose value is lower than 100
Restart from step 3 and finish the image for each row
At the end, check the most repeated element in the array which you assigned line numbers for each row. This number will be the number of vertical lines.
Note: If the steps are difficult to understand, think like this way:
" I am checking the first row, I found a pixel which is higher than
100, now this is a line edge starting, increase the counter for this
row. Search on this row until get a pixel smaller than 100, and then
research a pixel bigger than 100. when row is finished, assign the
line number for this row to a big array. Do this for all image. At the
end, since some lines looks like two lines at the top and also some
noises can occur, you should take the most repeated element in the big
array as the number of lines."
Here is the code part in C++:
#include <vector>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
int main()
{
cv::Mat img = cv::imread("/ur/img/dir/img.jpg",cv::IMREAD_GRAYSCALE);
std::vector<int> numberOfVerticalLinesForEachRow;
cv::Rect r(0,0,img.cols-10,200);
img = img(r);
bool blackCheck = 1;
for(int i=0; i<img.rows; i++)
{
int numberOfLines = 0;
for(int j=0; j<img.cols; j++)
{
if((int)img.at<uchar>(cv::Point(j,i))>100 && blackCheck)
{
numberOfLines++;
blackCheck = 0;
}
if((int)img.at<uchar>(cv::Point(j,i))<100)
blackCheck = 1;
}
numberOfVerticalLinesForEachRow.push_back(numberOfLines);
}
// In this part you need a simple algorithm to check the most repeated element
for(int k:numberOfVerticalLinesForEachRow)
std::cout<<k<<std::endl;
cv::namedWindow("WinWin",0);
cv::imshow("WinWin",img);
cv::waitKey(0);
}
Here's another possible approach. It relies mainly on the cv::thinning function from the extended image processing module to reduce the lines at a width of 1 pixel. We can crop a ROI from this image and count the number of transitions from 255 (white) to 0 (black). These are the steps:
Threshold the image using Otsu's method
Apply some morphology to clean up the binary image
Get the skeleton of the image
Crop a ROI from the center of the image
Count the number of jumps from 255 to 0
This is the code, be sure to include the extended image processing module (ximgproc) and also link it before compiling it:
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/ximgproc.hpp> // The extended image processing module
// Read Image:
std::string imagePath = "D://opencvImages//";
cv::Mat inputImage = cv::imread( imagePath+"IN2Xh.png" );
// Convert BGR to Grayscale:
cv::cvtColor( inputImage, inputImage, cv::COLOR_BGR2GRAY );
// Get binary image via Otsu:
cv::threshold( inputImage, inputImage, 0, 255, cv::THRESH_OTSU );
The above snippet produces the following image:
Note that there's a little bit of noise due to the thresholding, let's try to remove those isolated blobs of white pixels by applying some morphology. Maybe an opening, which is an erosion followed by dilation. The structuring elements and iterations, though, are not the same, and these where found by experimentation. I wanted to remove the majority of the isolated blobs without modifying too much the original image:
// Apply Morphology. Erosion + Dilation:
// Set rectangular structuring element of size 3 x 3:
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(3, 3) );
// Set the iterations:
int morphoIterations = 1;
cv::morphologyEx( inputImage, inputImage, cv::MORPH_ERODE, SE, cv::Point(-1,-1), morphoIterations);
// Set rectangular structuring element of size 5 x 5:
SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(5, 5) );
// Set the iterations:
morphoIterations = 2;
cv::morphologyEx( inputImage, inputImage, cv::MORPH_DILATE, SE, cv::Point(-1,-1), morphoIterations);
This combination of structuring elements and iterations yield the following filtered image:
Its looking alright. Now comes the main idea of the algorithm. If we compute the skeleton of this image, we would "normalize" all the lines to a width of 1 pixel, which is very handy, because we could reduce the image to a 1 x 1 (row) matrix and count the number of jumps. Since the lines are "normalized" we could get rid of possible overlaps between lines. Now, skeletonized images sometimes produce artifacts near the borders of the image. These artifacts resemble thickened anchors at the first and last row of the image. To prevent these artifacts we can extend borders prior to computing the skeleton:
// Extend borders to avoid skeleton artifacts, extend 5 pixels in all directions:
cv::copyMakeBorder( inputImage, inputImage, 5, 5, 5, 5, cv::BORDER_CONSTANT, 0 );
// Get the skeleton:
cv::Mat imageSkelton;
cv::ximgproc::thinning( inputImage, imageSkelton );
This is the skeleton obtained:
Nice. Before we count jumps, though, we must observe that the lines are skewed. If we reduce this image directly to a one row, some overlapping could indeed happen between to lines that are too skewed. To prevent this, I crop a middle section of the skeleton image and count transitions there. Let's crop the image:
// Crop middle ROI:
cv::Rect linesRoi;
linesRoi.x = 0;
linesRoi.y = 0.5 * imageSkelton.rows;
linesRoi.width = imageSkelton.cols;
linesRoi.height = 1;
cv::Mat imageROI = imageSkelton( linesRoi );
This would be the new ROI, which is just the middle row of the skeleton image:
Let me prepare a BGR copy of this just to draw some results:
// BGR version of the Grayscale ROI:
cv::Mat colorROI;
cv::cvtColor( imageROI, colorROI, cv::COLOR_GRAY2BGR );
Ok, let's loop through the image and count the transitions between 255 and 0. That happens when we look at the value of the current pixel and compare it with the value obtained an iteration earlier. The current pixel must be 0 and the past pixel 255. There's more than a way to loop through a cv::Mat in C++. I prefer to use cv::MatIterator_s and pointer arithmetic:
// Set the loop variables:
cv::MatIterator_<cv::Vec3b> it, end;
uchar pastPixel = 0;
int jumpsCounter = 0;
int i = 0;
// Loop thru image ROI and count 255-0 jumps:
for (it = imageROI.begin<cv::Vec3b>(), end = imageROI.end<cv::Vec3b>(); it != end; ++it) {
// Get current pixel
uchar ¤tPixel = (*it)[0];
// Compare it with past pixel:
if ( (currentPixel == 0) && (pastPixel == 255) ){
// We have a jump:
jumpsCounter++;
// Draw the point on the BGR version of the image:
cv::line( colorROI, cv::Point(i, 0), cv::Point(i, 0), cv::Scalar(0, 0, 255), 1 );
}
// current pixel is now past pixel:
pastPixel = currentPixel;
i++;
}
// Show image and print number of jumps found:
cv::namedWindow( "Jumps Found", CV_WINDOW_NORMAL );
cv::imshow( "Jumps Found", colorROI );
cv::waitKey( 0 );
std::cout<<"Jumps Found: "<<jumpsCounter<<std::endl;
The points where the jumps were found are drawn in red, and the number of total jumps printed is:
Jumps Found: 9
I have to write a program that detect 3 types of road signs (speed limit, no parking and warnings). I know how to detect a circle using HoughCircles but I have several images and the parameters for HoughCircles are different for each image. There's a general way to detect circles without changing parameters for each image?
Moreover I need to detect triangle (warning signs) so I'm searching for a general shape detector. Have you any suggestions/code that can help me in this task?
Finally for detect the number on speed limit signs I thought to use SIFT and compare the image with some templates in order to identify the number on the sign. Could it be a good approach?
Thank you for the answer!
I know this is a pretty old question but I had been through the same problem and now I show you how I solved it.
The following images show some of the most accurate results that are displayed by the opencv program.
In the following images the street signs detected are circled with three different colors that distinguish the three kinds of street signs (warning, no parking, speed limit).
Red for warning signs
Blue for no parking signs
Fuchsia for speed limit signs
The speed limit value is written in green above the speed limit signs
[![example][1]][1]
[![example][2]][2]
[![example][3]][3]
[![example][4]][4]
As you can see the program performs quite well, it is able to detect and distinguish the three kinds of sign and to recognize the speed limit value in case of speed limit signs. Everything is done without computing too many false positives when, for instance, in the image there are some signs that do not belong to one of the three categories.
In order to achieve this result the software computes the detection in three main steps.
The first step involves a color based approach where the red objects in the image are detected and their region are extract to be analyzed. This step is particularly useful in order to prevent the detection of false positives, because only a small part of the image is processed.
The second step works with a machine learning algorithm: in particular we use a Cascade Classifier to compute the detection. This operation firstly requires to train the classifiers and on a later stage to use them to detect the signs.
In the last step the speed limit values inside the speed limit signs are read, also in this case through a machine learning algorithm but using the k-nearest neighbor algorithm.
Now we are going to see in detail each step.
COLOR BASED STEP
Since the street signs are always circled by a red frame, we can afford to take out and analyze only the regions where the red objects are detected.
In order to select the red objects, we consider all the ranges of the red color: even if this may produce some false positives, they will be easily discarded in the next steps.
inRange(image, Scalar(0, 70, 50), Scalar(10, 255, 255), mask1);
inRange(image, Scalar(170, 70, 50), Scalar(180, 255, 255), mask2);
In the image below we can see an example of the red objects detected with this method.
After having found the red pixels we can gather them to find the regions using a clustering algorithm, I use the method
partition(<#_ForwardIterator __first#>, _ForwardIterator __last, <#_Predicate __pred#>)
After the execution of this method we can save all the points in the same cluster in a vector (one for each cluster) and extract the bounding boxes which represent the
regions to be analyzed in the next step.
HAAR CASCADE CLASSIFIERS FOR SIGNS DETECTION
This is the real detection step where the street signs are detected. In order to perform a cascade classifier the first step consist in building a dataset of positives and negatives images. Now I explain how I have built my own datasets of images.
The first thing to note is that we need to train three different Haar cascades in order to distinguish between the three kind of signs that we have to detect, hence we must repeat the following steps for each of the three kinds of sign.
We need two datasets: one for the positive samples (which must be a set of images that contains the road signs that we are going to detect) and another one for the negative samples which can be any kind of image without street signs.
After collecting a set of 100 images for the positive samples and a set of 200 images for the negatives in two different folders, we need to write two text files:
Signs.info which contains a list of file names like the one below,
one for each positive sample in the positive folder.
pos/image_name.png 1 0 0 50 45
Here, the numbers after the name represent respectively the number
of street signs in the image, the coordinate of the upper left
corner of the street sign, his height and his width.
Bg.txt which contains a list of file names like the one below, one
for each sign in the negative folder.
neg/street15.png
With the command line below we generate the .vect file which contains all the information that the software retrieves from the positive samples.
opencv_createsamples -info sign.info -num 100 -w 50 -h 50 -vec signs.vec
Afterwards we train the cascade classifier with the following command:
opencv_traincascade -data data -vec signs.vec -bg bg.txt -numPos 60 -numNeg 200 -numStages 15 -w 50 -h 50 -featureType LBP
where the number of stages indicates the number of classifiers that will be generated in order to build the cascade.
At the end of this process we gain a file cascade.xml which will be used from the CascadeClassifier program in order to detect the objects in the image.
Now we have trained our algorithm and we can declare a CascadeClassifier for each kind of street sign, than we detect the signs in the image through
detectMultiScale(<#InputArray image#>, <#std::vector<Rect> &objects#>)
this method creates a Rect around each object that has been detected.
It is important to note that exactly as every machine learning algorithm, in order to perform well, we need a large number of samples in the dataset. The dataset that I have built, is not extremely large, thus in some situations it is not able to detect all the signs. This mostly happens when a small part of the street sign is not visible in the image like in the warning sign below:
I have expanded my dataset up to the point where I have obtained a fairly accurate result without
too many errors.
SPEED LIMIT VALUE DETECTION
Like for the street signs detection also here I used a machine learning algorithm but with a different approach. After some work, I realized that an OCR (tesseract) solution does not perform well, so I decided to build my own ocr software.
For the machine learning algorithm I took the image below as training data which contains some speed limit values:
The amount of training data is small. But, since in speed limit signs all letters have the same font, it is not a huge problem.
To prepare the data for training, I made a small code in OpenCV. It does the following things:
It loads the image on the left;
It selects the digits (obviously by contour finding and applying constraints on area and height of letters to avoid false detections).
It draws the bounding rectangle around one letter and it waits for the key to be manually pressed. This time the user presses the digit key corresponding to the letter in box by himself.
Once the corresponding digit key is pressed, it saves 100 pixel values in an array and the correspondent manually entered digit in another array.
Eventually it saves both the arrays in separate txt files.
Following the manual digit classification all the digits in the train data( train.png) are manually labeled, and the image will look like the one below.
Now we enter into training and testing part.
For training we do as follows:
Load the txt files we already saved earlier
Create an instance of classifier that we are going to use ( KNearest)
Then we use KNearest.train function to train the data
Now the detection:
We load the image with the speed limit sign detected
Process the image as before and extract each digit using contour methods
Draw bounding box for it, then resize to 10x10, and store its pixel values in an array as done earlier.
Then we use KNearest.find_nearest() function to find the nearest item to the one we gave.
And it recognizes the correct digit.
I tested this little OCR on many images, and just with this small dataset I have obtained an accuracy of about 90%.
CODE
Below I post all my openCv c++ code in a single class, following my instruction you should be able to achive my result.
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <cmath>
#include <stdlib.h>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui.hpp"
#include <string.h>
#include <opencv2/ml/ml.hpp>
using namespace std;
using namespace cv;
std::vector<cv::Rect> getRedObjects(cv::Mat image);
vector<Mat> detectAndDisplaySpeedLimit( Mat frame );
vector<Mat> detectAndDisplayNoParking( Mat frame );
vector<Mat> detectAndDisplayWarning( Mat frame );
void trainDigitClassifier();
string getDigits(Mat image);
vector<Mat> loadAllImage();
int getSpeedLimit(string speed);
//path of the haar cascade files
String no_parking_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/no_parking_cascade.xml";
String speed_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/speed_limit_cascade.xml";
String warning_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/warning_cascade.xml";
CascadeClassifier speed_limit_cascade;
CascadeClassifier no_parking_cascade;
CascadeClassifier warning_cascade;
int main(int argc, char** argv)
{
//train the classifier for digit recognition, this require a manually train, read the report for more details
trainDigitClassifier();
cv::Mat sceneImage;
vector<Mat> allImages = loadAllImage();
for(int i = 0;i<=allImages.size();i++){
sceneImage = allImages[i];
//load the haar cascade files
if( !speed_limit_cascade.load( speed_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
if( !no_parking_cascade.load( no_parking_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
if( !warning_cascade.load( warning_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
Mat scene = sceneImage.clone();
//detect the red objects
std::vector<cv::Rect> allObj = getRedObjects(scene);
//use the three cascade classifier for each object detected by the getRedObjects() method
for(int j = 0;j<allObj.size();j++){
Mat img = sceneImage(Rect(allObj[j]));
vector<Mat> warningVec = detectAndDisplayWarning(img);
if(warningVec.size()>0){
Rect box = allObj[j];
}
vector<Mat> noParkVec = detectAndDisplayNoParking(img);
if(noParkVec.size()>0){
Rect box = allObj[j];
}
vector<Mat> speedLitmitVec = detectAndDisplaySpeedLimit(img);
if(speedLitmitVec.size()>0){
Rect box = allObj[j];
for(int i = 0; i<speedLitmitVec.size();i++){
//get speed limit and skatch it in the image
int digit = getSpeedLimit(getDigits(speedLitmitVec[i]));
if(digit > 0){
Point point = box.tl();
point.y = point.y + 30;
cv::putText(sceneImage,
"SPEED LIMIT " + to_string(digit),
point,
cv::FONT_HERSHEY_COMPLEX_SMALL,
0.7,
cv::Scalar(0,255,0),
1,
cv::CV__CAP_PROP_LATEST);
}
}
}
}
imshow("currentobj",sceneImage);
waitKey(0);
}
}
/*
* detect the red object in the image given in the param,
* return a vector containing all the Rect of the red objects
*/
std::vector<cv::Rect> getRedObjects(cv::Mat image)
{
Mat3b res = image.clone();
std::vector<cv::Rect> result;
cvtColor(image, image, COLOR_BGR2HSV);
Mat1b mask1, mask2;
//ranges of red color
inRange(image, Scalar(0, 70, 50), Scalar(10, 255, 255), mask1);
inRange(image, Scalar(170, 70, 50), Scalar(180, 255, 255), mask2);
Mat1b mask = mask1 | mask2;
Mat nonZeroCoordinates;
vector<Point> pts;
findNonZero(mask, pts);
for (int i = 0; i < nonZeroCoordinates.total(); i++ ) {
cout << "Zero#" << i << ": " << nonZeroCoordinates.at<Point>(i).x << ", " << nonZeroCoordinates.at<Point>(i).y << endl;
}
int th_distance = 2; // radius tolerance
// Apply partition
// All pixels within the radius tolerance distance will belong to the same class (same label)
vector<int> labels;
// With lambda function (require C++11)
int th2 = th_distance * th_distance;
int n_labels = partition(pts, labels, [th2](const Point& lhs, const Point& rhs) {
return ((lhs.x - rhs.x)*(lhs.x - rhs.x) + (lhs.y - rhs.y)*(lhs.y - rhs.y)) < th2;
});
// You can save all points in the same class in a vector (one for each class), just like findContours
vector<vector<Point>> contours(n_labels);
for (int i = 0; i < pts.size(); ++i){
contours[labels[i]].push_back(pts[i]);
}
// Get bounding boxes
vector<Rect> boxes;
for (int i = 0; i < contours.size(); ++i)
{
Rect box = boundingRect(contours[i]);
if(contours[i].size()>500){//prima era 1000
boxes.push_back(box);
Rect enlarged_box = box + Size(100,100);
enlarged_box -= Point(30,30);
if(enlarged_box.x<0){
enlarged_box.x = 0;
}
if(enlarged_box.y<0){
enlarged_box.y = 0;
}
if(enlarged_box.height + enlarged_box.y > res.rows){
enlarged_box.height = res.rows - enlarged_box.y;
}
if(enlarged_box.width + enlarged_box.x > res.cols){
enlarged_box.width = res.cols - enlarged_box.x;
}
Mat img = res(Rect(enlarged_box));
result.push_back(enlarged_box);
}
}
Rect largest_box = *max_element(boxes.begin(), boxes.end(), [](const Rect& lhs, const Rect& rhs) {
return lhs.area() < rhs.area();
});
//draw the rects in case you want to see them
for(int j=0;j<=boxes.size();j++){
if(boxes[j].area() > largest_box.area()/3){
rectangle(res, boxes[j], Scalar(0, 0, 255));
Rect enlarged_box = boxes[j] + Size(20,20);
enlarged_box -= Point(10,10);
rectangle(res, enlarged_box, Scalar(0, 255, 0));
}
}
rectangle(res, largest_box, Scalar(0, 0, 255));
Rect enlarged_box = largest_box + Size(20,20);
enlarged_box -= Point(10,10);
rectangle(res, enlarged_box, Scalar(0, 255, 0));
return result;
}
/*
* code for detect the speed limit sign , it draws a circle around the speed limit signs
*/
vector<Mat> detectAndDisplaySpeedLimit( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
//normalizes the brightness and increases the contrast of the image
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
speed_limit_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
cout << speed_limit_cascade.getFeatureType();
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 255 ), 4, 8, 0 );
Mat resultImage = frame(Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height));
result.push_back(resultImage);
}
return result;
}
/*
* code for detect the warning sign , it draws a circle around the warning signs
*/
vector<Mat> detectAndDisplayWarning( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
warning_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
cout << warning_cascade.getFeatureType();
Rect previus;
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
Rect newRect = Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height);
if((previus & newRect).area()>0){
previus = newRect;
}else{
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 0, 0, 255 ), 4, 8, 0 );
Mat resultImage = frame(newRect);
result.push_back(resultImage);
previus = newRect;
}
}
return result;
}
/*
* code for detect the no parking sign , it draws a circle around the no parking signs
*/
vector<Mat> detectAndDisplayNoParking( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
no_parking_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
cout << no_parking_cascade.getFeatureType();
Rect previus;
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
Rect newRect = Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height);
if((previus & newRect).area()>0){
previus = newRect;
}else{
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 0 ), 4, 8, 0 );
Mat resultImage = frame(newRect);
result.push_back(resultImage);
previus = newRect;
}
}
return result;
}
/*
* train the classifier for digit recognition, this could be done only one time, this method save the result in a file and
* it can be used in the next executions
* in order to train user must enter manually the corrisponding digit that the program shows, press space if the red box is just a point (false positive)
*/
void trainDigitClassifier(){
Mat thr,gray,con;
Mat src=imread("/Users/giuliopettenuzzo/Desktop/all_numbers.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,125,255,THRESH_BINARY_INV); //Threshold to find contour
imshow("ci",thr);
waitKey(0);
thr.copyTo(con);
// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour
for( int i = 0; i< contours.size(); i=hierarchy[i][0] ) // iterate through first hierarchy level contours
{
Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
Mat ROI = thr(r); //Crop the image
Mat tmp1, tmp2;
resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
tmp1.convertTo(tmp2,CV_32FC1); //convert to float
imshow("src",src);
int c=waitKey(0); // Read corresponding label for contour from keyoard
c-=0x30; // Convert ascii to intiger value
response_array.push_back(c); // Store label to a mat
rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);
sample.push_back(tmp2.reshape(1,1)); // Store sample data
}
// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert to float
FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();
FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;
imshow("src",src);
waitKey(0);
}
/*
* get digit from the image given in param, using the classifier trained before
*/
string getDigits(Mat image)
{
Mat thr1,gray1,con1;
Mat src1 = image.clone();
cvtColor(src1,gray1,CV_BGR2GRAY);
threshold(gray1,thr1,125,255,THRESH_BINARY_INV); // Threshold to create input
thr1.copyTo(con1);
// Read stored sample and label for training
Mat sample1;
Mat response1,tmp1;
FileStorage Data1("TrainingData.yml",FileStorage::READ); // Read traing data to a Mat
Data1["data"] >> sample1;
Data1.release();
FileStorage Label1("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label1["label"] >> response1;
Label1.release();
Ptr<ml::KNearest> knn(ml::KNearest::create());
knn->train(sample1, ml::ROW_SAMPLE,response1); // Train with sample and responses
cout<<"Training compleated.....!!"<<endl;
vector< vector <Point> > contours1; // Vector for storing contour
vector< Vec4i > hierarchy1;
//Create input sample by contour finding and cropping
findContours( con1, contours1, hierarchy1,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
Mat dst1(src1.rows,src1.cols,CV_8UC3,Scalar::all(0));
string result;
for( int i = 0; i< contours1.size(); i=hierarchy1[i][0] ) // iterate through each contour for first hierarchy level .
{
Rect r= boundingRect(contours1[i]);
Mat ROI = thr1(r);
Mat tmp1, tmp2;
resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
tmp1.convertTo(tmp2,CV_32FC1);
Mat bestLabels;
float p=knn -> findNearest(tmp2.reshape(1,1),4, bestLabels);
char name[4];
sprintf(name,"%d",(int)p);
cout << "num = " << (int)p;
result = result + to_string((int)p);
putText( dst1,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}
imwrite("dest.jpg",dst1);
return result ;
}
/*
* from the digits detected, it returns a speed limit if it is detected correctly, -1 otherwise
*/
int getSpeedLimit(string numbers){
if ((numbers.find("30") != std::string::npos) || (numbers.find("03") != std::string::npos)) {
return 30;
}
if ((numbers.find("50") != std::string::npos) || (numbers.find("05") != std::string::npos)) {
return 50;
}
if ((numbers.find("80") != std::string::npos) || (numbers.find("08") != std::string::npos)) {
return 80;
}
if ((numbers.find("70") != std::string::npos) || (numbers.find("07") != std::string::npos)) {
return 70;
}
if ((numbers.find("90") != std::string::npos) || (numbers.find("09") != std::string::npos)) {
return 90;
}
if ((numbers.find("100") != std::string::npos) || (numbers.find("001") != std::string::npos)) {
return 100;
}
if ((numbers.find("130") != std::string::npos) || (numbers.find("031") != std::string::npos)) {
return 130;
}
return -1;
}
/*
* load all the image in the file with the path hard coded below
*/
vector<Mat> loadAllImage(){
vector<cv::String> fn;
glob("/Users/giuliopettenuzzo/Desktop/T1/dataset/*.jpg", fn, false);
vector<Mat> images;
size_t count = fn.size(); //number of png files in images folder
for (size_t i=0; i<count; i++)
images.push_back(imread(fn[i]));
return images;
}
maybe you should try implementing the ransac algorithm, if you are using color images, migt be a good idea (if you are in europe) to get the red channel only since the speed limits are surrounded by a red cricle (or a thin white i think also).
For that you need to filter the image to get the edges, (canny filter).
Here are some useful links:
OpenCV detect partial circle with noise
https://hal.archives-ouvertes.fr/hal-00982526/document
Finally for the numbers detection i think its ok. Other approach is to use something like Viola-Jones algorithm to detect the signals, with pretrained existing models... It's up to you!
I have used cv::calcOpticalFlowFarneback to calculate the optical flow in the current and previous frames of video with ofxOpenCv in openFrameworks.
I then draw the video with the optical flow field on top and then draw vectors showing the flow of motion in areas that are above a certain threshold.
What I want to do now is create a bounding box of those areas of motion and get the centroid and store that x,y position in a variable for tracking.
This is how I'm drawing my flow field if that helps.
if (calculatedFlow){
ofSetColor( 255, 255, 255 );
video.draw( 0, 0);
int w = gray1.width;
int h = gray1.height;
//1. Input images + optical flow
ofPushMatrix();
ofScale( 4, 4 );
//Optical flow
float *flowXPixels = flowX.getPixelsAsFloats();
float *flowYPixels = flowY.getPixelsAsFloats();
ofSetColor( 0, 0, 255 );
for (int y=0; y<h; y+=5) {
for (int x=0; x<w; x+=5) {
float fx = flowXPixels[ x + w * y ];
float fy = flowYPixels[ x + w * y ];
//Draw only long vectors
if ( fabs( fx ) + fabs( fy ) > .5 ) {
ofDrawRectangle( x-0.5, y-0.5, 1, 1 );
ofDrawLine( x, y, x + fx, y + fy );
}
}
}
}
For what you are asking, there is no simple answer. Here is a suggested solution. It involves multiple steps, but if your domain is simple enough, you could simplify this.
For each frame,
Calculate flow as two images flow_x,flow_y comparing current frame with previous frame using farneback method.(you seem to be doing this, in your code)
Translate the flow images into an hsv image, where the hue component of each pixel denotes the angle of the flow atan2(flow_y/flow_x) and value component of each pixel denotes the magnitude of the flow sqrt(flow_x\*\*2 + flow_y\*\*2)
In the above step, use your thresholding mechanism to suppress flow- pixels (make them black) whose magnitude falls below a certain threshold.
Segment the HSV image based on color ranges. You could use apriori information about your domain, or you could take histogram of hue components and identify prominent ranges of hues to classify pixels. As a result of this step, you can assign a class to each pixel.
Separate the pixels belonging to each class into multiple images. All pixels belonging to segmented class-1 will goto image-1, all pixels belonging to segmented class-2 will go to image-2 etc. Now each segmented image contains pixels in the HSV image, in a particular color range.
Transform each segmented image as a black and white image, and using opencv's morphological operations split into multiple regions using connectivity. (connected components).
Find the centroid of each connected component.
I found this reference to be helpful in this context.
I resolved my problem by creating new image from my flowX, and flowY. This was done by adding flowX and flowY to a new CV FloatImage.
flowX +=flowY;
flowXY = flowX;
Then I was able to do the contour finding from the pixels of the newly created image and then I could store all the centroids of all the blobs of movement.
Like so:
contourFinder.findContours( mask, 10, 10000, 20, false );
//Storing the objects centers with contour finder.
vector<ofxCvBlob> &blobs = contourFinder.blobs;
int n = blobs.size(); //Get number of blobs
obj.resize( n ); //Resize obj array
for (int i=0; i<n; i++) {
obj[i] = blobs[i].centroid; //Fill obj array
}
I initially noticed that movement was only being tracked in one direction in the x-axis and y-axis because of negative values. I resolved this by changing the calculation for my optical flow by calling the abs() function in cv::Mat.
Mat img1( gray1.getCvImage() ); //Create OpenCV images
Mat img2( gray2.getCvImage() );
Mat flow;
calcOpticalFlowFarneback( img1, img2, flow, 0.7, 3, 11, 5, 5, 1.1, 0 );
//Split flow into separate images
vector<Mat> flowPlanes;
Mat newFlow;
newFlow = abs(flow); //abs flow so values are absolute. Allows tracking in both directions.
split( newFlow, flowPlanes );
//Copy float planes to ofxCv images flowX and flowY
IplImage iplX( flowPlanes[0] );
flowX = &iplX;
IplImage iplY( flowPlanes[1] );
flowY = &iplY;
I am using openCV to do some dense feature extraction. For example, The code
DenseFeatureDetector detector(12.f, 1, 0.1f, 10);
I don't really understand the parameters in the above constructor. What does it mean ? Reading the opencv documentation about it does not help much either. In the documentation the arguments are:
DenseFeatureDetector( float initFeatureScale=1.f, int featureScaleLevels=1,
float featureScaleMul=0.1f,
int initXyStep=6, int initImgBound=0,
bool varyXyStepWithScale=true,
bool varyImgBoundWithScale=false );
What are they supposed to do ? i.e. what is the meaning of scale, initFeatureScale, featureScaleLevels etc ? How do you know the grid or grid spacing etc for the dense sampling.
I'm using opencv with dense detector too and I think I can help you with something. I'm not sure about what I'm going to say but the experience learnt me that.
When I use Dense detector I pass there the gray scale image. The detector makes some threshold filters where opencv uses a gray minimum value with is used to transform the image. The píxels where have a more gray level than the threshold will be made like black points and the others are white point. This action is repeated in a loop where the threshold will be bigger and bigger. So the parameter initFeatureScale determine the first threshold you put to do this loop, the featureScaleLevels parameter indicates how much this threshold is bigger between one loop iteration and the next one and featureScaleMul is a multiply factor to calculate the next threshold.
Anyway if you are looking for a your optimal parameters to use Dense detector to detect any particular points You would offer a program I made for that. It is liberated in github. This is a program where you can test some detectors (Dense detector is one of them) and check how it works if you change their parameters thanks to a user interface that let you change the detectors parameters as long as you are executing the program. You will see how the detected points will be change. For try it just click on the link, and download the files. You might need almost all the files to execute the program.
Apologies in advance, i'm predominantly using Python so i'll avoid embarressing myself by referring to C++.
DenseFeatureDetector populates a vector with KeyPoints to pass to compute feature descriptors. These keypoints have a point vector and their scale set. In the documentation, scale is the pixel radius of the keypoint.
KeyPoints are evenly spaced across the width and height of the image matrix passed to DenseFeatureVector.
Now to the arguments:
initFeatureScale
Set the initial KeyPoint feature radius in pixels (as far as I am aware this has no effect)
featureScaleLevels
Number of scales overwhich we wish to make keypoints
featureScaleMuliplier
Scale adjustment for initFeatureScale over featureScaleLevels, this scale adjustment can also be applied to the border (initImgBound) and the step size (initxystep). So when we set featureScaleLevels>1 then this multiplier will be applied to successive scales, to adjust feature scale, step and the boundary around the image.
initXyStep
moving column and row step in pixels. Self explanatory I hope.
initImgBound
row/col bounding region to ignore around the image (pixels), So a 100x100 image, with an initImgBound of 10, would create keypoints in the central 80x80 portion of the image.
varyXyStepWithScale
Boolean, if we have multiple featureScaleLevels do we want to adjust the step size using featureScaleMultiplier.
varyImgBoundWithScale
Boolean,as varyXyStepWithScale, but applied to the border.
Here is the DenseFeatureDetector source code from detectors.cpp in the OpenCV 2.4.3 source, which will probably explain better than my words:
DenseFeatureDetector::DenseFeatureDetector( float _initFeatureScale, int _featureScaleLevels,
float _featureScaleMul, int _initXyStep,
int _initImgBound, bool _varyXyStepWithScale,
bool _varyImgBoundWithScale ) :
initFeatureScale(_initFeatureScale), featureScaleLevels(_featureScaleLevels),
featureScaleMul(_featureScaleMul), initXyStep(_initXyStep), initImgBound(_initImgBound),
varyXyStepWithScale(_varyXyStepWithScale), varyImgBoundWithScale(_varyImgBoundWithScale)
{}
void DenseFeatureDetector::detectImpl( const Mat& image, vector<KeyPoint>& keypoints, const Mat& mask ) const
{
float curScale = static_cast<float>(initFeatureScale);
int curStep = initXyStep;
int curBound = initImgBound;
for( int curLevel = 0; curLevel < featureScaleLevels; curLevel++ )
{
for( int x = curBound; x < image.cols - curBound; x += curStep )
{
for( int y = curBound; y < image.rows - curBound; y += curStep )
{
keypoints.push_back( KeyPoint(static_cast<float>(x), static_cast<float>(y), curScale) );
}
}
curScale = static_cast<float>(curScale * featureScaleMul);
if( varyXyStepWithScale ) curStep = static_cast<int>( curStep * featureScaleMul + 0.5f );
if( varyImgBoundWithScale ) curBound = static_cast<int>( curBound * featureScaleMul + 0.5f );
}
KeyPointsFilter::runByPixelsMask( keypoints, mask );
}
You might expect a call to compute would calculate additional KeyPoint characteristics using the relevant keypoint detection algorithm (e.g. angle), based on the KeyPoints generated by DenseFeatureDetector. Unfortunately this isn't the case for SIFT under Python - i've not looked at at the other feature detectors, nor looked at the behaviour in C++.
Also note that DenseFeatureDetector is not in OpenCV 3.2 (unsure at which release it was removed).
The plan
My project is able to capture the bitmap of a target window and convert it into an IplImage, and then display that image in a cvNamedWindow, where further processing can take place.
For the sake of testing, I've loaded an image into MSPaint like so:
The user is then allowed to click and drag the mouse over any number of pixels within the image to create a vector<cv::Scalar_<BYTE>> containing these RGB color values.
Then, with the help of ColorRGBToHLS(), this array is then sorted from left to right by hue, like so:
// PixelColor is just a cv::Scalar_<BYTE>
bool comparePixelColors( PixelColor& pc1, PixelColor& pc2 ) {
WORD h1 = 0, h2 = 0;
WORD s1 = 0, s2 = 0;
WORD l1 = 0, l2 = 0;
ColorRGBToHLS(RGB(pc1.val[2], pc1.val[1], pc1.val[0]), &h1, &l1, &s1);
ColorRGBToHLS(RGB(pc2.val[2], pc2.val[1], pc2.val[0]), &h2, &l2, &s2);
return ( h1 < h2 );
}
//..(elsewhere in code)
std::sort(m_colorRange.begin(), m_colorRange.end(), comparePixelColors);
...and then shown in a new cvNamedWindow, which looks something like:
The problem
Now, the idea here is to create a binary threshold image (or "mask") where this selected range of colors become white, and the rest of the source image becomes black... similar to the way the "Select By Color" tool operates in GIMP, or the "magic wand" tool works in Photoshop... except instead of limiting ourselves to a specific contoured selection, we are literally operating on the image as a whole.
I've read into cvInRangeS, and it sounds like it's precisely what I need.
However, and for whatever reason, the thresholded image always ends up being totally black...
VOID ShowThreshedImage(const IplImage* src, const PixelColor& min, const PixelColor& max)
{
IplImage* imgHSV = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 3);
cvCvtColor(src, imgHSV, CV_RGB2HLS);
cvNamedWindow("T1");
cvShowImage("T1", imgHSV); // <-- Shows up like the image below
IplImage* imgThreshed = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);
cvInRangeS(imgHSV, min, max, imgThreshed);
cvNamedWindow("T2");
cvShowImage("T2", imgThreshed); // <-- SHOWS UP PITCH BLACK!
}
This is what the "T1" window ends up looking like (which I suppose is correct?):
Bearing in mind that because the color range vector is stored as RGB (and that OpenCV internally reverses this order into BGR), I have converted the min/max values into HLS before passing them into ShowThreshedImage() like so:
CvScalar rgbPixelToHSV(const PixelColor& pixelColor)
{
WORD h = 0, s = 0, l = 0;
ColorRGBToHLS(RGB(pixelColor.val[2], pixelColor.val[1], pixelColor.val[0]), &h, &l, &s);
return PixelColor(h, s, l);
}
//...(elsewhere in code)
if(m_colorRange.size() > 0)
m_minHSV = rgbPixelToHSV(m_colorRange[0]);
if(m_colorRange.size() > 1)
m_maxHSV = rgbPixelToHSV(m_colorRange[m_colorRange.size() - 1]);
ShowThreshedImage(m_imgSrc, m_minHSV, m_maxHSV);
...But even without this conversion and simply passing RGB values instead, the result is still an entirely black image. I've even tried manually plugging in certain min/max values, and the best result I got was a few lit pixels (albeit, the incorrect ones).
The question:
What am I doing wrong here?
Is there something that I don't understand about the cvInRangeS method?
Do I need to step through each and every single color in order to properly threshold the selected range out of the source image?
Are there any other ways of accomplishing this?
Thank you for your time.
Update:
I have discovered that cvInRangeS expects all values for min to be lower than that of max. But when a range of colors are selected, there doesn't appear to be any guarantee that this will be the case, often resulting in a black thresholded image.
And swapping values to enforce this rule may result in unwanted colors within the new range (in some cases, this could include all colors instead of just the desired ones).
So I suppose the real question here would be:
"How do you segment an array of RGB colors, and use them to threshold an image?"
Your problem might be caused by the simple fact that OpenCV maintains a different range for values than for instanc MSpaint. For instance the HSV color space in paint is 360,100,100 while in OpenCV it is 180,255,255. Check your input values in openCV bu outputting the pixel value when clicking on a certain pixel. inRangeS should be the correct tool for the job. That said, in RGB it should work just as well because the range is the same as in paint.
cvSetMouseCallback("MyWindow", mouseEvent, (void*) &myImage);
void mouseEvent(int evt, int x, int y, int flags, void *param) {
if (evt == CV_EVENT_LBUTTONDOWN) {
printf("%d %d\n", x, y);
IplImage* imageSource = (IplImage*) param;
Mat image(imageSource);
cout << "Image cols " << image.cols << " rows " << image.rows << endl;
Mat imageHSV;
cvtColor(image, imageHSV, CV_BGR2HSV);
Vec3b p = imageHSV.at<Vec3b > (y, x);
char text[20];
sprintf(text, "H=%d, S=%d, V=%d", p[0], p[1], p[2]);
cout << text << endl;
}
}
When you have an idea about the HSV values by using this values, use these as lower and upper bounds for the in range method after converting the image to HSV by using cvtColor(image, imageHSV, CV_BGR2HSV). That should make you able to get the desired result.
It is not going to be too inefficient to iterate through every pixel. That is exactly what cvInRangeS would do - see this: http://docs.opencv.org/doc/tutorials/core/how_to_scan_images/how_to_scan_images.html#the-efficient-way (I do this all the time and it is instantaneous for reasonable size images).
I would treat the color in the array as points in 3D RGB space. Find two color points that specify a prism that includes all other color points. That is just finding the min and max of all r,g, and b values. If this idea is not ok then you might have to check every image pixel against every pixel in the vector.
Then for each pixel in the image: result is black if (pixel.r < min.r) || (pixel.r > max.r) || (pixel.g < min.g) || (pixel.g > max.g) || (pixel.b < min.b) || (pixel.b > max.b), result is the pixel value otherwise.
This all should be very easy, so long as it is actually what you want.