Face and eye detection not working properly - C++

Using the code below I want to detect faces and eyes in a video.
The code runs without errors, but the video and the detection result are never displayed when I run it. What is the problem?
I tried it on still images: it works fine on some images, while on others it only detects the faces.
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

int main()
{
    float EYE_SX = 0.16f;
    float EYE_SY = 0.26f;
    float EYE_SW = 0.30f;
    float EYE_SH = 0.28f;

    Mat dest, gray, frame;
    VideoCapture capture("m.mp4");
    CascadeClassifier detector, eyes_detector;

    if (!capture.isOpened()) // check if we succeeded
        return -1;
    if (!detector.load("haarcascade_frontalface_alt2.xml"))
        cout << "Cannot open the face classifier." << endl;
    if (!eyes_detector.load("haarcascade_eye_tree_eyeglasses.xml"))
        cout << "Cannot open the eye classifier." << endl;

    for (;;)
    {
        capture >> frame;
        cvtColor(frame, gray, CV_BGR2GRAY);
        equalizeHist(gray, dest);
        vector<Rect> rect;
        detector.detectMultiScale(dest, rect);
        for (Rect rc : rect)
        {
            rectangle(frame,
                      Point(rc.x, rc.y),
                      Point(rc.x + rc.width, rc.y + rc.height),
                      CV_RGB(0, 255, 0), 2);
        }
        if (rect.size() > 0)
        {
            Mat face = dest(rect[0]).clone();
            vector<Rect> leftEye, rightEye;
            int leftX = cvRound(face.cols * EYE_SX);
            int topY = cvRound(face.rows * EYE_SY);
            int widthX = cvRound(face.cols * EYE_SW);
            int heightY = cvRound(face.rows * EYE_SH);
            int rightX = cvRound(face.cols * (1.0 - EYE_SX - EYE_SW));
            Mat topLeftOfFace = face(Rect(leftX, topY, widthX, heightY));
            Mat topRightOfFace = face(Rect(rightX, topY, widthX, heightY));
            eyes_detector.detectMultiScale(topLeftOfFace, leftEye);
            eyes_detector.detectMultiScale(topRightOfFace, rightEye);
            if ((int)leftEye.size() > 0)
            {
                rectangle(frame,
                          Point(leftEye[0].x + leftX + rect[0].x, leftEye[0].y + topY + rect[0].y),
                          Point(leftEye[0].width + widthX + rect[0].x - 5, leftEye[0].height + heightY + rect[0].y),
                          CV_RGB(0, 255, 255), 2);
            }
            if ((int)rightEye.size() > 0)
            {
                rectangle(frame,
                          Point(rightEye[0].x + rightX + leftX + rect[0].x, rightEye[0].y + topY + rect[0].y),
                          Point(rightEye[0].width + widthX + rect[0].x + 5, rightEye[0].height + heightY + rect[0].y),
                          CV_RGB(0, 255, 255), 2);
            }
        }
    }
    imshow("Ojos", frame);
    waitKey(0);
    return 1;
}

So, right now, the imshow("Ojos", frame); and the waitKey(0); are only reached after the for (;;) loop ends - and since that loop never breaks, they are never reached at all. Calling them once at the end would be fine for a single image, but not for video, where you want them to run once per frame.
If you move them up a few lines, inside that for loop (basically, move the closing bracket that's one line above them to one line below them), it should start working better for videos. Also use a short waitKey delay instead of waitKey(0), which blocks until a key is pressed.
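A minimal sketch of the per-frame loop (it also guards against the empty frame returned at the end of the video, which would otherwise make cvtColor throw):

for (;;)
{
    capture >> frame;
    if (frame.empty())      // end of video reached
        break;
    cvtColor(frame, gray, CV_BGR2GRAY);
    equalizeHist(gray, dest);
    // ... detection and drawing as before ...
    imshow("Ojos", frame);  // show every frame
    if (waitKey(30) >= 0)   // ~30 ms per frame; press any key to quit
        break;
}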
However, there are a couple of other things you might want to tweak in the code - it's only going to show one right eye and one left eye. This is normally what you want to happen, but if you have false positives you might end up with somebody's hair or skin being labeled an eye, and you're none the wiser as to how it happened. I'd recommend displaying all of the items in the leftEye and rightEye vectors. This can be done simply by replacing those if statements (if ((int)rightEye.size() > 0), etc.) with:
for (size_t i = 0; i < rightEye.size(); i++) {
    rectangle(frame,
              Point(rightEye[i].x + rightX + leftX + rect[0].x,  // rect[0]: the offsets come from the detected face, not from the eye index
                    rightEye[i].y + topY + rect[0].y),
              Point(rightEye[i].width + widthX + rect[0].x + 5,
                    rightEye[i].height + heightY + rect[0].y),
              CV_RGB(0, 255, 255), 2);
}
If you're having problems with false positives or negatives, you might want to look into tweaking the parameters of detectMultiScale - right now, you're leaving everything at the defaults. detectMultiScale takes a number of optional parameters. The image and the output objects you already have, but there are others, such as:
scaleFactor – Parameter specifying how much the image size is reduced at each image scale. The default is 1.1. The bigger it is, the larger the jumps between scales, which takes the cascade less time, but detections at in-between sizes are more easily missed.
minNeighbors – Parameter specifying how many neighbors each candidate rectangle should have to retain it. The default is 3. The bigger it is, the more overlapping detections are required to keep a candidate, resulting in fewer false positives. Tweak it too high, and it'll start giving false negatives.
flags – Parameter with the same meaning as for an old cascade in the function cvHaarDetectObjects. It is not used for a new cascade. The default is zero; just leave it at zero for the most part.
minSize – Minimum possible object size. Objects smaller than that are ignored. The default is Size(0,0). I tend to bump it up just a little. Again, bigger means it's faster with fewer false positives, but too big will skip over whatever you're looking for.
maxSize – Maximum possible object size. Objects larger than that are ignored. The default is, I believe, the size of the image passed in. I tend to limit it to smaller than that. Again, smaller means it's faster with fewer false positives, but too small will skip over whatever you're looking for.
As an example:
cascade_name.detectMultiScale( frame_gray, frame_rectangle, 1.1, 2, 0, Size(30, 30) );

Related

How to obtain the different staves (pentagrams) from a musical score (OpenCV)

For the detection and processing of a picture of a score, and then of each of its staves, I proceed as follows:
I take the picture and keep only the horizontal components, so that only the staff lines remain; I remove the unnecessary components and the noise, mark the lines with HoughLinesP, and verify that the lines are parallel to each other, in order to later crop the image into the different staves.
Finally, my question is: can you think of some other way, simpler or more accurate, to detect and separate the different staves?
I attach an image so you can see how it looks once processed, ready for further cutting.
Now I need to obtain the different staves.
(image: pre-processing)
(image: post-processing)
int offset_x = 50;
int offset_y = 50;
cv::Rect roi;
roi.x = offset_x;
roi.y = offset_y;
roi.width = horizontal.size().width - (offset_x * 2);
roi.height = horizontal.size().height - (offset_y * 2);
cv::Mat crop = horizontal(roi);
namedWindow("crop", WINDOW_NORMAL);
cv::imshow("crop", crop);
cv::waitKey(0);
//Use HoughLinesP to detect the different lines of each staff
HoughLinesP(crop, lines, 1, CV_PI / 180, 80, 200, 10);
//Draw the detected lines
for (size_t i = 0; i < lines.size(); i++) // Draw the lines
{
line(crop, Point(lines[i][0], lines[i][1]),Point(lines[i][2], lines[i][3]), Scalar(255, 255, 255), 3, 3);
}
// the final result
namedWindow("Detected Lines", WINDOW_NORMAL);
imshow("Detected Lines", crop);
waitKey(0);
Now I need to separate the different staves in order to extract them from the initial image, but I cannot figure out how to do it.
Here is a tutorial that does exactly what your first algorithm does: https://docs.opencv.org/3.4.0/dd/dd7/tutorial_morph_lines_detection.html
I imagine it is a little bit simpler than your approach. The opposite case is also shown in the example, where you keep only the notes and refine them, which is presumably what you ultimately want to do.
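For reference, the core of that tutorial boils down to a morphological opening with a wide, flat kernel; a minimal sketch (assuming bw is the binarized score, white symbols on black, and with a kernel width that you would tune):

// Keep only horizontal structures (the staff lines): erode then dilate
// with a kernel that is much wider than it is tall.
Mat horizontal = bw.clone();
int horizontalSize = horizontal.cols / 30;   // tuning parameter
Mat horizontalStructure = getStructuringElement(MORPH_RECT,
                                                Size(horizontalSize, 1));
erode(horizontal, horizontal, horizontalStructure);
dilate(horizontal, horizontal, horizontalStructure);
// 'horizontal' now contains essentially only the staff lines; the same
// idea with a Size(1, verticalSize) kernel keeps the notes instead.
// Grouping the line rows (e.g. with reduce() along x and scanning for
// bands of white) then gives one crop rectangle per staff.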

OpenCV and C++ - Shape and road signs detection

I have to write a program that detects 3 types of road signs (speed limit, no parking and warning). I know how to detect a circle using HoughCircles, but I have several images and the HoughCircles parameters are different for each one. Is there a general way to detect circles without changing the parameters for each image?
Moreover, I need to detect triangles (warning signs), so I'm searching for a general shape detector. Do you have any suggestions/code that can help me with this task?
Finally, to detect the number on speed limit signs, I thought of using SIFT and comparing the image against some templates in order to identify the number on the sign. Could that be a good approach?
Thank you for the answer!
I know this is a pretty old question, but I have been through the same problem and here is how I solved it.
The following images show some of the most accurate results produced by the OpenCV program.
In the following images the detected street signs are circled with three different colors that distinguish the three kinds of street sign (warning, no parking, speed limit).
Red for warning signs
Blue for no parking signs
Fuchsia for speed limit signs
The speed limit value is written in green above the speed limit signs
(four example result images)
As you can see, the program performs quite well: it is able to detect and distinguish the three kinds of signs, and to recognize the speed limit value in the case of speed limit signs. Everything is done without producing too many false positives when, for instance, the image contains signs that do not belong to one of the three categories.
In order to achieve this result the software computes the detection in three main steps.
The first step involves a color-based approach where the red objects in the image are detected and their regions are extracted to be analyzed. This step is particularly useful for preventing false positives, because only a small part of the image is processed.
The second step works with a machine learning algorithm: in particular, we use a cascade classifier to compute the detection. This operation first requires training the classifiers, and at a later stage using them to detect the signs.
In the last step the speed limit values inside the speed limit signs are read, again through a machine learning algorithm, in this case the k-nearest neighbors algorithm.
Now we are going to see in detail each step.
COLOR BASED STEP
Since the street signs are always circled by a red frame, we can afford to take out and analyze only the regions where the red objects are detected.
In order to select the red objects, we consider all the ranges of the red color: even if this may produce some false positives, they will be easily discarded in the next steps.
inRange(image, Scalar(0, 70, 50), Scalar(10, 255, 255), mask1);
inRange(image, Scalar(170, 70, 50), Scalar(180, 255, 255), mask2);
In the image below we can see an example of the red objects detected with this method.
After having found the red pixels we can gather them into regions using a clustering algorithm; I use the method
cv::partition(points, labels, equivalencePredicate)
After the execution of this method we can save all the points of the same cluster in a vector (one for each cluster) and extract the bounding boxes which represent the regions to be analyzed in the next step.
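A condensed sketch of this clustering step (the full version appears in the CODE section below; the distance threshold is a tuning parameter):

// pts: red pixel coordinates collected with findNonZero(mask, pts)
int th = 2;                          // radius tolerance in pixels (tune)
int th2 = th * th;
vector<int> labels;
int nLabels = cv::partition(pts, labels,
    [th2](const Point& a, const Point& b) {
        int dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy < th2;  // same cluster if close enough
    });
vector<vector<Point>> clusters(nLabels);  // group the points by label
for (size_t i = 0; i < pts.size(); ++i)
    clusters[labels[i]].push_back(pts[i]);
vector<Rect> boxes;                       // one bounding box per cluster
for (size_t i = 0; i < clusters.size(); ++i)
    boxes.push_back(boundingRect(clusters[i]));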
HAAR CASCADE CLASSIFIERS FOR SIGNS DETECTION
This is the real detection step, where the street signs are detected. In order to use a cascade classifier, the first step consists in building a dataset of positive and negative images. Here is how I built my own datasets of images.
The first thing to note is that we need to train three different Haar cascades in order to distinguish between the three kinds of signs that we have to detect, hence we must repeat the following steps for each of the three kinds of sign.
We need two datasets: one for the positive samples (which must be a set of images that contains the road signs that we are going to detect) and another one for the negative samples which can be any kind of image without street signs.
After collecting a set of 100 images for the positive samples and a set of 200 images for the negatives in two different folders, we need to write two text files:
signs.info, which contains a list of file names like the one below, one for each positive sample in the positive folder.
pos/image_name.png 1 0 0 50 45
Here, the numbers after the name represent, respectively, the number of street signs in the image, the coordinates of the upper left corner of the street sign, its width and its height.
bg.txt, which contains a list of file names like the one below, one for each image in the negative folder.
neg/street15.png
With the command line below we generate the .vec file, which contains all the information that the software retrieves from the positive samples.
opencv_createsamples -info signs.info -num 100 -w 50 -h 50 -vec signs.vec
Afterwards we train the cascade classifier with the following command:
opencv_traincascade -data data -vec signs.vec -bg bg.txt -numPos 60 -numNeg 200 -numStages 15 -w 50 -h 50 -featureType LBP
where the number of stages indicates the number of classifiers that will be generated in order to build the cascade.
At the end of this process we obtain a file, cascade.xml, which will be used by the CascadeClassifier in order to detect the objects in the image.
Now that we have trained our algorithm, we can declare a CascadeClassifier for each kind of street sign, and then detect the signs in the image through
detectMultiScale(image, objects)
This method fills a Rect for each object that has been detected.
It is important to note that, exactly as with every machine learning algorithm, we need a large number of samples in the dataset in order to perform well. The dataset that I have built is not extremely large, thus in some situations it is not able to detect all the signs. This mostly happens when a small part of the street sign is not visible in the image, like in the warning sign below:
I have expanded my dataset up to the point where I have obtained a fairly accurate result without too many errors.
SPEED LIMIT VALUE DETECTION
As for the street sign detection, here too I used a machine learning algorithm, but with a different approach. After some work, I realized that an OCR (Tesseract) solution does not perform well, so I decided to build my own OCR software.
For the machine learning algorithm I took the image below as training data which contains some speed limit values:
The amount of training data is small. But, since in speed limit signs all letters have the same font, it is not a huge problem.
To prepare the data for training, I wrote a small program in OpenCV. It does the following things:
It loads the image on the left;
It selects the digits (by contour finding, applying constraints on the area and height of the letters to avoid false detections);
It draws a bounding rectangle around one letter and waits for a key to be pressed manually: the user presses the digit key corresponding to the letter in the box;
Once the corresponding digit key is pressed, it saves the 100 pixel values (of the 10x10 resized letter) in one array and the corresponding manually entered digit in another;
Eventually it saves both arrays in separate text files.
Once the manual classification is done, all the digits in the training data (train.png) are labeled, and the image will look like the one below.
Now we enter into training and testing part.
For training we do as follows:
Load the txt files we already saved earlier
Create an instance of classifier that we are going to use ( KNearest)
Then we use KNearest.train function to train the data
Now the detection:
We load the image with the detected speed limit sign
Process the image as before and extract each digit using contour methods
Draw a bounding box around it, resize it to 10x10, and store its pixel values in an array as done earlier.
Then we use KNearest.find_nearest() function to find the nearest item to the one we gave.
And it recognizes the correct digit.
I tested this little OCR on many images, and just with this small dataset I have obtained an accuracy of about 90%.
CODE
Below I post all my OpenCV C++ code in a single file; following my instructions you should be able to achieve my result.
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <cmath>
#include <stdlib.h>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui.hpp"
#include <string.h>
#include <opencv2/ml/ml.hpp>
using namespace std;
using namespace cv;
std::vector<cv::Rect> getRedObjects(cv::Mat image);
vector<Mat> detectAndDisplaySpeedLimit( Mat frame );
vector<Mat> detectAndDisplayNoParking( Mat frame );
vector<Mat> detectAndDisplayWarning( Mat frame );
void trainDigitClassifier();
string getDigits(Mat image);
vector<Mat> loadAllImage();
int getSpeedLimit(string speed);
//path of the haar cascade files
String no_parking_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/no_parking_cascade.xml";
String speed_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/speed_limit_cascade.xml";
String warning_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/warning_cascade.xml";
CascadeClassifier speed_limit_cascade;
CascadeClassifier no_parking_cascade;
CascadeClassifier warning_cascade;
int main(int argc, char** argv)
{
//train the classifier for digit recognition; this requires manual labeling, as described above
trainDigitClassifier();
cv::Mat sceneImage;
vector<Mat> allImages = loadAllImage();
for(size_t i = 0;i<allImages.size();i++){ // was i<=size(): read past the end of the vector
sceneImage = allImages[i];
//load the haar cascade files
if( !speed_limit_cascade.load( speed_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
if( !no_parking_cascade.load( no_parking_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
if( !warning_cascade.load( warning_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
Mat scene = sceneImage.clone();
//detect the red objects
std::vector<cv::Rect> allObj = getRedObjects(scene);
//use the three cascade classifier for each object detected by the getRedObjects() method
for(int j = 0;j<allObj.size();j++){
Mat img = sceneImage(Rect(allObj[j]));
vector<Mat> warningVec = detectAndDisplayWarning(img);
if(warningVec.size()>0){
Rect box = allObj[j];
}
vector<Mat> noParkVec = detectAndDisplayNoParking(img);
if(noParkVec.size()>0){
Rect box = allObj[j];
}
vector<Mat> speedLimitVec = detectAndDisplaySpeedLimit(img);
if(speedLimitVec.size()>0){
Rect box = allObj[j];
for(int i = 0; i<speedLimitVec.size();i++){
//get the speed limit and sketch it in the image
int digit = getSpeedLimit(getDigits(speedLimitVec[i]));
if(digit > 0){
Point point = box.tl();
point.y = point.y + 30;
cv::putText(sceneImage,
"SPEED LIMIT " + to_string(digit),
point,
cv::FONT_HERSHEY_COMPLEX_SMALL,
0.7,
cv::Scalar(0,255,0),
1,
cv::LINE_AA); // line type; CV__CAP_PROP_LATEST was passed here by mistake
}
}
}
}
imshow("currentobj",sceneImage);
waitKey(0);
}
}
/*
* detect the red objects in the image given as a parameter,
* return a vector containing a Rect for each red object
*/
std::vector<cv::Rect> getRedObjects(cv::Mat image)
{
Mat3b res = image.clone();
std::vector<cv::Rect> result;
cvtColor(image, image, COLOR_BGR2HSV);
Mat1b mask1, mask2;
//ranges of red color
inRange(image, Scalar(0, 70, 50), Scalar(10, 255, 255), mask1);
inRange(image, Scalar(170, 70, 50), Scalar(180, 255, 255), mask2);
Mat1b mask = mask1 | mask2;
vector<Point> pts;
findNonZero(mask, pts); // coordinates of all red pixels
int th_distance = 2; // radius tolerance
// Apply partition
// All pixels within the radius tolerance distance will belong to the same class (same label)
vector<int> labels;
// With lambda function (require C++11)
int th2 = th_distance * th_distance;
int n_labels = partition(pts, labels, [th2](const Point& lhs, const Point& rhs) {
return ((lhs.x - rhs.x)*(lhs.x - rhs.x) + (lhs.y - rhs.y)*(lhs.y - rhs.y)) < th2;
});
// You can save all points in the same class in a vector (one for each class), just like findContours
vector<vector<Point>> contours(n_labels);
for (int i = 0; i < pts.size(); ++i){
contours[labels[i]].push_back(pts[i]);
}
// Get bounding boxes
vector<Rect> boxes;
for (int i = 0; i < contours.size(); ++i)
{
Rect box = boundingRect(contours[i]);
if(contours[i].size()>500){//prima era 1000
boxes.push_back(box);
Rect enlarged_box = box + Size(100,100);
enlarged_box -= Point(30,30);
if(enlarged_box.x<0){
enlarged_box.x = 0;
}
if(enlarged_box.y<0){
enlarged_box.y = 0;
}
if(enlarged_box.height + enlarged_box.y > res.rows){
enlarged_box.height = res.rows - enlarged_box.y;
}
if(enlarged_box.width + enlarged_box.x > res.cols){
enlarged_box.width = res.cols - enlarged_box.x;
}
Mat img = res(Rect(enlarged_box));
result.push_back(enlarged_box);
}
}
if (!boxes.empty()) { // guard: max_element on an empty vector must not be dereferenced
Rect largest_box = *max_element(boxes.begin(), boxes.end(), [](const Rect& lhs, const Rect& rhs) {
return lhs.area() < rhs.area();
});
//draw the rects in case you want to see them
for(size_t j=0;j<boxes.size();j++){ // was j<=boxes.size(): out-of-bounds read
if(boxes[j].area() > largest_box.area()/3){
rectangle(res, boxes[j], Scalar(0, 0, 255));
Rect enlarged_box = boxes[j] + Size(20,20);
enlarged_box -= Point(10,10);
rectangle(res, enlarged_box, Scalar(0, 255, 0));
}
}
rectangle(res, largest_box, Scalar(0, 0, 255));
Rect enlarged_box = largest_box + Size(20,20);
enlarged_box -= Point(10,10);
rectangle(res, enlarged_box, Scalar(0, 255, 0));
}
return result;
}
/*
* code to detect the speed limit signs; it draws an ellipse around each one
*/
vector<Mat> detectAndDisplaySpeedLimit( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
//normalizes the brightness and increases the contrast of the image
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
speed_limit_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
cout << speed_limit_cascade.getFeatureType();
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 255 ), 4, 8, 0 );
Mat resultImage = frame(Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height));
result.push_back(resultImage);
}
return result;
}
/*
* code to detect the warning signs; it draws an ellipse around each one
*/
vector<Mat> detectAndDisplayWarning( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
warning_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
cout << warning_cascade.getFeatureType();
Rect previus;
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
Rect newRect = Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height);
if((previus & newRect).area()>0){
previus = newRect;
}else{
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 0, 0, 255 ), 4, 8, 0 );
Mat resultImage = frame(newRect);
result.push_back(resultImage);
previus = newRect;
}
}
return result;
}
/*
* code to detect the no parking signs; it draws an ellipse around each one
*/
vector<Mat> detectAndDisplayNoParking( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
no_parking_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
cout << no_parking_cascade.getFeatureType();
Rect previus;
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
Rect newRect = Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height);
if((previus & newRect).area()>0){
previus = newRect;
}else{
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 0 ), 4, 8, 0 );
Mat resultImage = frame(newRect);
result.push_back(resultImage);
previus = newRect;
}
}
return result;
}
/*
* train the classifier for digit recognition; this has to be done only once,
* since the method saves the result to files that can be reused in later runs.
* To train, the user must manually enter the corresponding digit that the
* program shows; press space if the red box is just a point (false positive)
*/
void trainDigitClassifier(){
Mat thr,gray,con;
Mat src=imread("/Users/giuliopettenuzzo/Desktop/all_numbers.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,125,255,THRESH_BINARY_INV); //Threshold to find contour
imshow("ci",thr);
waitKey(0);
thr.copyTo(con);
// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour
for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // follow the next-sibling links of the first hierarchy level
{
Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
Mat ROI = thr(r); //Crop the image
Mat tmp1, tmp2;
resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
tmp1.convertTo(tmp2,CV_32FC1); //convert to float
imshow("src",src);
int c=waitKey(0); // Read the corresponding label for the contour from the keyboard
c-=0x30; // Convert ASCII to integer value
response_array.push_back(c); // Store label to a mat
rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);
sample.push_back(tmp2.reshape(1,1)); // Store sample data
}
// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert to float
FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();
FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;
imshow("src",src);
waitKey(0);
}
/*
* get the digits from the image given as a parameter, using the classifier trained before
*/
string getDigits(Mat image)
{
Mat thr1,gray1,con1;
Mat src1 = image.clone();
cvtColor(src1,gray1,CV_BGR2GRAY);
threshold(gray1,thr1,125,255,THRESH_BINARY_INV); // Threshold to create input
thr1.copyTo(con1);
// Read stored sample and label for training
Mat sample1;
Mat response1,tmp1;
FileStorage Data1("TrainingData.yml",FileStorage::READ); // Read training data into a Mat
Data1["data"] >> sample1;
Data1.release();
FileStorage Label1("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label1["label"] >> response1;
Label1.release();
Ptr<ml::KNearest> knn(ml::KNearest::create());
knn->train(sample1, ml::ROW_SAMPLE,response1); // Train with sample and responses
cout<<"Training compleated.....!!"<<endl;
vector< vector <Point> > contours1; // Vector for storing contour
vector< Vec4i > hierarchy1;
//Create input sample by contour finding and cropping
findContours( con1, contours1, hierarchy1,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
Mat dst1(src1.rows,src1.cols,CV_8UC3,Scalar::all(0));
string result;
for( int i = 0; i >= 0 && i < (int)contours1.size(); i=hierarchy1[i][0] ) // iterate through the first hierarchy level contours
{
Rect r= boundingRect(contours1[i]);
Mat ROI = thr1(r);
Mat tmp1, tmp2;
resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
tmp1.convertTo(tmp2,CV_32FC1);
Mat bestLabels;
float p=knn -> findNearest(tmp2.reshape(1,1),4, bestLabels);
char name[4];
sprintf(name,"%d",(int)p);
cout << "num = " << (int)p;
result = result + to_string((int)p);
putText( dst1,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}
imwrite("dest.jpg",dst1);
return result ;
}
/*
* from the digits detected, it returns a speed limit if it is detected correctly, -1 otherwise
*/
int getSpeedLimit(string numbers){
if ((numbers.find("30") != std::string::npos) || (numbers.find("03") != std::string::npos)) {
return 30;
}
if ((numbers.find("50") != std::string::npos) || (numbers.find("05") != std::string::npos)) {
return 50;
}
if ((numbers.find("80") != std::string::npos) || (numbers.find("08") != std::string::npos)) {
return 80;
}
if ((numbers.find("70") != std::string::npos) || (numbers.find("07") != std::string::npos)) {
return 70;
}
if ((numbers.find("90") != std::string::npos) || (numbers.find("09") != std::string::npos)) {
return 90;
}
if ((numbers.find("100") != std::string::npos) || (numbers.find("001") != std::string::npos)) {
return 100;
}
if ((numbers.find("130") != std::string::npos) || (numbers.find("031") != std::string::npos)) {
return 130;
}
return -1;
}
/*
* load all the images from the folder with the path hardcoded below
*/
vector<Mat> loadAllImage(){
vector<cv::String> fn;
glob("/Users/giuliopettenuzzo/Desktop/T1/dataset/*.jpg", fn, false);
vector<Mat> images;
size_t count = fn.size(); //number of jpg files in the images folder
for (size_t i=0; i<count; i++)
images.push_back(imread(fn[i]));
return images;
}
Maybe you should try implementing the RANSAC algorithm; if you are using color images, it might be a good idea (if you are in Europe) to use the red channel only, since the speed limit signs are surrounded by a red circle (or a thin white one too, I think).
For that you need to filter the image to get the edges (Canny filter).
Here are some useful links:
OpenCV detect partial circle with noise
https://hal.archives-ouvertes.fr/hal-00982526/document
Finally, for the number detection, I think your approach is OK. Another approach is to use something like the Viola-Jones algorithm to detect the signs, with pretrained existing models... It's up to you!
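As a rough illustration of that idea (the channel choice exploits OpenCV's B,G,R ordering; every threshold below is a starting point to tune, not a tested value):

// sketch: look for red-rimmed circular signs using the red channel only
Mat bgr = imread("scene.jpg");
vector<Mat> channels;
split(bgr, channels);                  // OpenCV stores B, G, R
Mat red = channels[2];
GaussianBlur(red, red, Size(9, 9), 2); // suppress noise before Hough
vector<Vec3f> circles;
// the HOUGH_GRADIENT method runs a Canny edge step internally; param1 is
// its high threshold, param2 the accumulator threshold for centers
HoughCircles(red, circles, CV_HOUGH_GRADIENT, 1,
             red.rows / 8,  // minimum distance between circle centers
             150, 40,       // param1 (Canny), param2 (accumulator)
             10, 200);      // min / max radius in pixels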

OpenCV C++ - Find an image contained within an image?

I have code that searches for one small image inside another, bigger image:
int* MyLib::MatchingMethod(int, void*)
{
/// Source image to display
img.copyTo(img_display);
/// Create the result matrix
int result_cols = img.cols - templ.cols + 1;
int result_rows = img.rows - templ.rows + 1;
result.create(result_rows, result_cols, CV_32FC1);
match_method = 0;
/// Do the Matching and Normalize
matchTemplate(img, templ, result, match_method);
normalize(result, result, 0, 1, cv::NORM_MINMAX, -1, cv::Mat());
/// Localizing the best match with minMaxLoc
double minVal;
double maxVal;
cv::Point minLoc;
cv::Point maxLoc;
cv::Point matchLoc;
minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, cv::Mat());
/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
if (match_method == CV_TM_SQDIFF || match_method == CV_TM_SQDIFF_NORMED)
{
matchLoc = minLoc;
}
else
{
matchLoc = maxLoc;
}
if (showOpenCVWindow) {
/// Show me what you got
rectangle(img_display, matchLoc, cv::Point(matchLoc.x + templ.cols, matchLoc.y + templ.rows), cv::Scalar(255, 0, 0, 255), 2, 8, 0);
rectangle(result, matchLoc, cv::Point(matchLoc.x + templ.cols, matchLoc.y + templ.rows), cv::Scalar(255, 0, 0, 255), 2, 8, 0);
imshow(image_window, img_display);
imshow(result_window, result);
}
double myX = (matchLoc.x + (templ.cols) / 2);
double myY = (matchLoc.y + (templ.rows) / 2);
static int o[2];
o[0] = myX;
o[1] = myY;
return o;
}
But this code can mistakenly "find" some area even when the bigger image doesn't contain the small one.
How can I change this code to make the search for the small image strict? For example, if the smaller image is not present in the bigger image, the code must show an info message like "Image not found".
Update 1. It looks like matchTemplate doesn't work well. For example, I have 3 images - one template ( http://s6.postimg.org/nj2ts3lf5/image.png ), one image that contains the image from the template ( http://s6.postimg.org/fp6tkg301/image.png ), and one image that doesn't contain the template ( http://s6.postimg.org/9x23zk3sh/image.png ).
For the first image, which contains the template, maxVal=0.99999994039535522 and the area is correctly selected: http://s6.postimg.org/65x4qzfht/image.png
But for the image that doesn't contain the template, maxVal=1.0000000000000000 and an area that doesn't contain the template is incorrectly selected: http://s6.postimg.org/5132llt0x/screenshot_544.png
Thank you!
You are visualizing the result regardless of the certainty with which the algorithm performed the matching. Template matching will always give you an output - what you want to do is figure out whether it is valid or not.
Try outputting minVal or maxVal, depending on the match_method. You should compare the values in the cases when the correct match was found and in the cases when it gave you a false positive. Those experiments should allow you to establish a threshold that distinguishes between true hits and false positives. Thus, you will be able to say how big - for example - maxVal has to be for you to be sure it was a match. Pseudo code would go something like this:
if maxVal > threshold:
match_found = true
match_position = maxLoc
Now that's a theoretical approach. Since you didn't provide any images, it might or might not be the solution for your problem.
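In C++, on top of your existing minMaxLoc call, that check might look like the sketch below. One caveat first: the normalize(result, result, 0, 1, cv::NORM_MINMAX, ...) call in your code rescales every result matrix to the full [0,1] range, which is exactly why maxVal comes out as ~1.0 even when there is no match; drop that call (or use it only for display) so that minVal/maxVal keep their absolute meaning. The threshold value itself is a placeholder you would calibrate on known images:

// assumes a normalized method such as CV_TM_CCOEFF_NORMED, where higher
// scores are better; for the SQDIFF variants test minVal against a low
// bound instead
const double matchThreshold = 0.9; // placeholder: calibrate on your images
if (maxVal > matchThreshold)
{
    matchLoc = maxLoc;             // confident hit: use the location
}
else
{
    std::cout << "Image not found" << std::endl;
}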
EDIT:
If you cannot find a definite threshold value (which in my opinion should be possible in most cases, if you maintain quality, size, etc), try doing one of two things:
Try looking at all the obtained results before minMaxLoc, calculate the mean value, and see whether the maxVal found in the true positive cases is much bigger than the mean. Maybe you can define the threshold as a percentage of the mean value, thus saying: if maxVal > meanVal + meanVal * n%: match_found = true
It is a common situation that template matching works better with edges than with the raw image. Again, you haven't provided samples, so it's hard to say how reliable that approach will be here. But if you have enough high frequencies to light up an image with Canny edges, that might give you a much clearer threshold for discriminating between true and false positives.
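A sketch of that edge-based variant (assuming img and templ are 8-bit grayscale; the Canny thresholds 50/150 are starting points to tune, not tested values):

// match on edge maps instead of raw intensities; both images must get
// exactly the same preprocessing
cv::Mat imgEdges, templEdges, edgeResult;
cv::Canny(img, imgEdges, 50, 150);
cv::Canny(templ, templEdges, 50, 150);
cv::matchTemplate(imgEdges, templEdges, edgeResult, CV_TM_CCOEFF_NORMED);
double minVal, maxVal;
cv::Point minLoc, maxLoc;
cv::minMaxLoc(edgeResult, &minVal, &maxVal, &minLoc, &maxLoc);
// edge images tend to separate true hits from false positives more
// sharply, so a fixed threshold on maxVal is easier to pick here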
EDIT2:
Since you're using match_method = 0, that means CV_TM_SQDIFF. For more control over the process, use the name explicitly. You can find information on the methods here.
Also, put the cout inside the if statement, so that you print the correct value - the one that actually indicates the match (in your case, minVal).
if (match_method == CV_TM_SQDIFF || match_method == CV_TM_SQDIFF_NORMED)
{
matchLoc = minLoc;
std::cout << minVal << std::endl;
}
else
{
matchLoc = maxLoc;
std::cout << maxVal << std::endl;
}
And again: fairly well-tuned contour detection should almost certainly help if this doesn't give you the expected results.

Robust card detection/perspective correction OpenCV

I currently have a method for detecting a card in an image, and for the most part it works when the lighting is fairly consistent and the background is very calm.
Here is the code I am using to perform this operation:
Mat img = inImg.clone();
outImg = Mat(inImg.size(), CV_8UC1);
inImg.copyTo(outImg);
Mat img_fullRes = img.clone();
pyrDown(img, img);
Mat imgGray;
cvtColor(img, imgGray, CV_RGB2GRAY);
outImg_gray = imgGray.clone();
// Find Edges //
Mat detectedEdges = imgGray.clone();
bilateralFilter(imgGray, detectedEdges, 0, 185, 3, 0);
Canny( detectedEdges, detectedEdges, 20, 65, 3 );
dilate(detectedEdges, detectedEdges, Mat::ones(3,3,CV_8UC1));
Mat cdst = img.clone();
vector<Vec4i> lines;
HoughLinesP(detectedEdges, lines, 1, CV_PI/180, 60, 50, 3 );
for( size_t i = 0; i < lines.size(); i++ )
{
Vec4i l = lines[i];
// For debug
//line( cdst, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]), Scalar(0,0,255), 1);
}
//cdst.copyTo(inImg);
// Find points of intersection //
cv::Rect imgROI;
int ext = 10;
imgROI.x = ext;
imgROI.y = ext;
imgROI.width = img.size().width - ext;
imgROI.height = img.size().height - ext;
int N = lines.size();
// Creating N amount of points // N == lines.size()
cv::Point** poi = new cv::Point*[N];
for( int i = 0; i < N; i++ )
poi[i] = new cv::Point[N];
vector<cv::Point> poiList;
for( int i = 0; i < N; i++ )
{
poi[i][i] = cv::Point(-1,-1);
Vec4i line1 = lines[i];
for( int j = i + 1; j < N; j++ )
{
Vec4i line2 = lines[j];
cv::Point p = computeIntersect(line1, line2, imgROI);
if( p.x != -1 )
{
//line(cdst, p-cv::Point(2,0), p+cv::Point(2,0), Scalar(0,255,0));
//line(cdst, p-cv::Point(0,2), p+cv::Point(0,2), Scalar(0,255,0));
poiList.push_back(p);
}
poi[i][j] = p;
poi[j][i] = p;
}
}
cdst.copyTo(inImg);
if(poiList.size()==0)
{
outImg = inImg.clone();
//circle(outImg, cv::Point(100,100), 50, Scalar(255,0,0), -1);
return;
}
convexHull(poiList, poiList, false, true);
for( int i=0; i<poiList.size(); i++ )
{
cv::Point p = poiList[i];
//circle(cdst, p, 3, Scalar(255,0,0), 2);
}
//Evaluate all possible quadrilaterals
cv::Point cardCorners[4];
float metric_max = 0;
int Npoi = poiList.size();
for( int p1=0; p1<Npoi; p1++ )
{
cv::Point pts[4];
pts[0] = poiList[p1];
for( int p2=p1+1; p2<Npoi; p2++ )
{
pts[1] = poiList[p2];
if( isCloseBy(pts[1],pts[0]) )
continue;
for( int p3=p2+1; p3<Npoi; p3++ )
{
pts[2] = poiList[p3];
if( isCloseBy(pts[2],pts[1]) || isCloseBy(pts[2],pts[0]) )
continue;
for( int p4=p3+1; p4<Npoi; p4++ )
{
pts[3] = poiList[p4];
if( isCloseBy(pts[3],pts[0]) || isCloseBy(pts[3],pts[1])
|| isCloseBy(pts[3],pts[2]) )
continue;
// get the metrics
float area = getArea(pts);
cv::Point a = pts[0]-pts[1];
cv::Point b = pts[1]-pts[2];
cv::Point c = pts[2]-pts[3];
cv::Point d = pts[3]-pts[0];
float oppLenDiff = abs(a.dot(a)-c.dot(c)) + abs(b.dot(b)-d.dot(d));
float metric = area - 0.35*oppLenDiff;
if( metric > metric_max )
{
metric_max = metric;
cardCorners[0] = pts[0];
cardCorners[1] = pts[1];
cardCorners[2] = pts[2];
cardCorners[3] = pts[3];
}
}
}
}
}
// find the corners corresponding to the 4 corners of the physical card
sortPointsClockwise(cardCorners);
// Calculate Homography //
vector<Point2f> srcPts(4);
srcPts[0] = cardCorners[0]*2;
srcPts[1] = cardCorners[1]*2;
srcPts[2] = cardCorners[2]*2;
srcPts[3] = cardCorners[3]*2;
vector<Point2f> dstPts(4);
cv::Size outImgSize(1400,800);
dstPts[0] = Point2f(0,0);
dstPts[1] = Point2f(outImgSize.width-1,0);
dstPts[2] = Point2f(outImgSize.width-1,outImgSize.height-1);
dstPts[3] = Point2f(0,outImgSize.height-1);
Mat Homography = findHomography(srcPts, dstPts);
// Apply Homography
warpPerspective( img_fullRes, outImg, Homography, outImgSize, INTER_CUBIC );
outImg.copyTo(inImg);
Where computeIntersect is defined as:
cv::Point computeIntersect(cv::Vec4i a, cv::Vec4i b, cv::Rect ROI)
{
int x1 = a[0], y1 = a[1], x2 = a[2], y2 = a[3];
int x3 = b[0], y3 = b[1], x4 = b[2], y4 = b[3];
cv::Point p1 = cv::Point (x1,y1);
cv::Point p2 = cv::Point (x2,y2);
cv::Point p3 = cv::Point (x3,y3);
cv::Point p4 = cv::Point (x4,y4);
// Check to make sure all points are within the image boundaries; if not, reject them.
if( !ROI.contains(p1) || !ROI.contains(p2)
|| !ROI.contains(p3) || !ROI.contains(p4) )
return cv::Point (-1,-1);
cv::Point vec1 = p1-p2;
cv::Point vec2 = p3-p4;
float vec1_norm2 = vec1.x*vec1.x + vec1.y*vec1.y;
float vec2_norm2 = vec2.x*vec2.x + vec2.y*vec2.y;
float cosTheta = (vec1.dot(vec2))/sqrt(vec1_norm2*vec2_norm2);
float den = ((float)(x1-x2) * (y3-y4)) - ((y1-y2) * (x3-x4));
if(den != 0)
{
cv::Point2f pt;
pt.x = ((x1*y2 - y1*x2) * (x3-x4) - (x1-x2) * (x3*y4 - y3*x4)) / den;
pt.y = ((x1*y2 - y1*x2) * (y3-y4) - (y1-y2) * (x3*y4 - y3*x4)) / den;
if( !ROI.contains(pt) )
return cv::Point (-1,-1);
// no-confidence metric
float d1 = MIN( dist2(p1,pt), dist2(p2,pt) )/vec1_norm2;
float d2 = MIN( dist2(p3,pt), dist2(p4,pt) )/vec2_norm2;
float no_confidence_metric = MAX(sqrt(d1),sqrt(d2));
// If end point ratios are greater than .5 reject
if( no_confidence_metric < 0.5 && cosTheta < 0.707 )
return cv::Point (int(pt.x+0.5), int(pt.y+0.5));
}
return cv::Point(-1, -1);
}
sortPointsClockwise is defined as:
void sortPointsClockwise(cv::Point a[])
{
cv::Point b[4];
cv::Point ctr = (a[0]+a[1]+a[2]+a[3]);
ctr.x /= 4;
ctr.y /= 4;
b[0] = a[0]-ctr;
b[1] = a[1]-ctr;
b[2] = a[2]-ctr;
b[3] = a[3]-ctr;
for( int i=0; i<4; i++ )
{
if( b[i].x < 0 )
{
if( b[i].y < 0 )
a[0] = b[i]+ctr;
else
a[3] = b[i]+ctr;
}
else
{
if( b[i].y < 0 )
a[1] = b[i]+ctr;
else
a[2] = b[i]+ctr;
}
}
}
getArea is defined as:
float getArea(cv::Point arr[])
{
cv::Point diag1 = arr[0]-arr[2];
cv::Point diag2 = arr[1]-arr[3];
return 0.5*(diag1.cross(diag2));
}
isCloseBy is defined as:
bool isCloseBy( cv::Point p1, cv::Point p2 )
{
int D = 10;
// Checking that X values are within 10, same for Y values.
return ( abs(p1.x-p2.x)<=D && abs(p1.y-p2.y)<=D );
}
And finally dist2:
float dist2( cv::Point p1, cv::Point p2 )
{
return float((p1.x-p2.x)*(p1.x-p2.x) + (p1.y-p2.y)*(p1.y-p2.y));
}
Here are several test images and their results:
Sorry for the very lengthy post, but I am hoping someone can suggest a way to make my method for extracting the card from the image more robust, one that can better handle disruptive backgrounds along with inconsistent lighting.
When a card is placed on a contrasting background with good lighting, my method works nearly 90% of the time. But it is clear I need a more robust approach.
Does anyone have any suggestions?
Thanks.
Attempt at dhanushka's solution:
Mat gray, bw;
pyrDown(inImg, inImg);
cvtColor(inImg, gray, CV_RGB2GRAY);
int morph_size = 3;
Mat element = getStructuringElement( MORPH_ELLIPSE, cv::Size( 4*morph_size + 1, 2*morph_size+1 ), cv::Point( morph_size, morph_size ) );
morphologyEx(gray, gray, MORPH_OPEN, element); // MORPH_OPEN == 2
threshold(gray, bw, 160, 255, CV_THRESH_BINARY);
vector<vector<cv::Point> > contours;
vector<Vec4i> hierarchy;
findContours( bw, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cv::Point(0, 0) );
double largest_area=0; // contourArea returns a double
int largest_contour_index=0;
cv::Rect bounding_rect;
for( int i = 0; i< contours.size(); i++ )
{
double a=contourArea( contours[i],false); // Find the area of contour
if(a>largest_area){
largest_area=a;
largest_contour_index=i; //Store the index of largest contour
bounding_rect=boundingRect(contours[i]);
}
}
//Scalar color( 255,255,255);
rectangle(inImg, bounding_rect, Scalar(0,255,0),1, 8,0);
Mat biggestRect = inImg(bounding_rect);
Mat card1 = biggestRect.clone();
The art of image processing is (in my 10+ years of experience) just that: an art. No single answer exists, and there is always more than one way to do it. And it will definitely fail in some cases.
In my experience of working on automatically detecting features in medical images, it takes a long time to build a reliable algorithm, but in hindsight the best result is obtained with a relatively simple algorithm. However, it takes a lot of time to get to this simple algorithm.
To get there, the general approach is always the same:
Get started by building up a large database of test images (at least 100). This defines the 'normal' images which should work. By collecting the images you already start thinking about the problem.
Annotate the images to build a kind of 'ground truth'. In this case, the ground truth should contain the 4 corners of the card, since these are the interesting points.
Create an application which runs an algorithm over these images and compares the result with the ground truth. In this case, 'comparing with the ground truth' would mean taking the mean distance of the 4 found corner points to the ground truth corner points.
Output a tab-delimited file which you call .xls, so it can be opened (on Windows) in Excel by double clicking. Good for getting a quick overview of the cases. Look at the worst cases first, then open these cases manually to try to understand why they do not work.
Now you are ready to change the algorithm. Change something, and re-run. Compare the new Excel sheet to the old one. Now you will start realizing the trade-offs you have to make.
That said, I think you need to answer these questions during the tuning of the algorithm:
Do you allow slightly folded cards? If so, no completely straight lines can be assumed; concentrate more on corners instead of lines/edges.
Do you allow gradual differences in lighting? If so, a local contrast-stretch filter might help.
Do you allow the same color for the card as the background? If so, you have to concentrate on the contents of the card instead of its border.
Do you allow non-perfect lenses? If so, to what extent?
Do you allow rotated cards? If so, to what extent?
Should the background be uniform in color and/or texture?
How small should the smallest detectable card be relative to the image size? If you assume that at least 80% of the width or height should be covered, you get robustness back.
If more than one card is visible in the image, should the algorithm be robust and only pick one, or is any output ok?
If no card is visible, should it detect this case? Building in detection of this case will make it more user friendly ('no card found'), but also less robust.
These will make the requirements and assumptions on the image to acquire. Assumptions on which you can rely are very strong: they make the algorithm fast, robust and simple if you choose the right ones. Also let these requirements and assumptions be part of the testing database.
So what would I choose? Based on the three images you provided I would start with something like:
Assume the cards are filling the image from 50% to 100%.
Assume the cards are rotated at most 10 degrees or so.
Assume the corners are well visible.
Assume the aspect ratio (height divided by width) of the cards to be between 1/3 and 3.
Assume no card-like objects in the background
The algorithm then would look like:
Detect in each quadrant of the image a specific corner with a corner filter. So in the upper left quadrant of the image, look for the upper left corner of the card. See for example http://www.ee.surrey.ac.uk/CVSSP/demos/corners/results3.html , or use an OpenCV function for it like cornerHarris.
To be more robust, calculate more than one corner per quadrant.
Try to build parallelograms with one corner from each quadrant by combining points from the quadrants. Create a fitness function which gives a higher score to:
having internal angles close to 90 degrees
be large
optionally, compare the corners of the card based on lighting or another feature.
This fitness function gives a lot of tuning possibilities later on.
Return the parallelogram with the highest score.
So why use corner detection instead of a Hough transform for line detection? In my opinion the Hough transform is (besides being slow) quite sensitive to patterns in the background (which is what you see in your first image - it detects a stronger line in the background than on the card), and it cannot handle slightly curved lines very well, unless you use a larger bin size, which will worsen the detection.
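To make the first step concrete, here is a minimal sketch of collecting corner candidates per quadrant. It uses goodFeaturesToTrack with its Harris option (which wraps the cornerHarris measure mentioned above) purely to keep the sketch short; counts and quality levels are assumptions to tune, and gray is assumed to be the grayscale input:

// collect a few corner candidates in each quadrant of the image
std::vector<std::vector<cv::Point2f> > quadrantCorners(4);
int halfW = gray.cols / 2, halfH = gray.rows / 2;
cv::Rect quadrants[4] = {
    cv::Rect(0,     0,     halfW, halfH),   // top-left
    cv::Rect(halfW, 0,     halfW, halfH),   // top-right
    cv::Rect(0,     halfH, halfW, halfH),   // bottom-left
    cv::Rect(halfW, halfH, halfW, halfH) }; // bottom-right
for (int q = 0; q < 4; q++)
{
    std::vector<cv::Point2f> corners;
    cv::goodFeaturesToTrack(gray(quadrants[q]), corners,
                            5,      // a handful of candidates per quadrant
                            0.01,   // quality level (tune)
                            10,     // min distance between corners
                            cv::Mat(), 3,
                            true);  // use the Harris corner measure
    for (size_t i = 0; i < corners.size(); i++) // back to full-image coords
        quadrantCorners[q].push_back(corners[i] +
            cv::Point2f((float)quadrants[q].x, (float)quadrants[q].y));
}
// next: combine one candidate per quadrant into quadrilaterals and score them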
Good luck!
A more general approach would definitely be something like Rutger Nijlunsing suggested in his answer. However, in your case, at least for the provided sample images, a very simple approach like morphological opening followed by thresholding, contour processing and convex hull would yield the result you want. Use a scaled-down version of the images for processing so that you don't have to use a large kernel for the morphological operations. Below are the images processed this way.
pyrDown(large, rgb0);
pyrDown(rgb0, rgb0);
pyrDown(rgb0, rgb0);
Mat small;
cvtColor(rgb0, small, CV_BGR2GRAY);
Mat morph;
Mat kernel = getStructuringElement(MORPH_ELLIPSE, Size(11, 11));
morphologyEx(small, morph, MORPH_OPEN, kernel);
Mat bw;
threshold(morph, bw, 0, 255.0, CV_THRESH_BINARY | CV_THRESH_OTSU);
Mat bdry;
kernel = getStructuringElement(MORPH_ELLIPSE, Size(3, 3));
erode(bw, bdry, kernel);
subtract(bw, bdry, bdry);
// do contour processing on bdry
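The contour step left as a comment above could be filled in roughly like this (a sketch, assuming the card is the largest blob in bw; the approxPolyDP epsilon factor is a tuning assumption):

// take the largest outer contour of the boundary image, wrap it in a
// convex hull and reduce the hull to (ideally) a quadrilateral
vector<vector<cv::Point> > contours;
findContours(bdry, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
int best = -1;
double bestArea = 0;
for (size_t i = 0; i < contours.size(); i++)
{
    double area = contourArea(contours[i]);
    if (area > bestArea) { bestArea = area; best = (int)i; }
}
if (best >= 0)
{
    vector<cv::Point> hull, corners;
    convexHull(contours[best], hull);
    // tighten the hull to ~4 points; the 0.02 factor needs tuning
    approxPolyDP(hull, corners, 0.02 * arcLength(hull, true), true);
    // if corners.size() == 4, scale them back up by the pyrDown factor
    // (8x here) and feed them into findHomography / warpPerspective
}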
This approach will not work in general, so I would strongly recommend something like Rutger suggested.

OpenCV 2.4.2 calcOpticalFlowPyrLK doesn't find any points

I am using OpenCV 2.4.2 on Linux. I am writing in C++. I want to track simple objects (e.g. black rectangle on the white background). Firstly I am using goodFeaturesToTrack and then calcOpticalFlowPyrLK to find those points on another image. The problem is that calcOpticalFlowPyrLK doesn't find those points.
I have found code that does it in C, which does not work in my case: http://dasl.mem.drexel.edu/~noahKuntz/openCVTut9.html
I have converted it into C++:
int main(int, char**) {
Mat imgAgray = imread("ImageA.png", CV_LOAD_IMAGE_GRAYSCALE);
Mat imgBgray = imread("ImageB.png", CV_LOAD_IMAGE_GRAYSCALE);
Mat imgC = imread("ImageC.png", CV_LOAD_IMAGE_UNCHANGED);
vector<Point2f> cornersA;
goodFeaturesToTrack(imgAgray, cornersA, 30, 0.01, 30);
for (unsigned int i = 0; i < cornersA.size(); i++) {
drawPixel(cornersA[i], &imgC, 2, blue);
}
// I have no idea what this does
// cornerSubPix(imgAgray, cornersA, Size(15, 15), Size(-1, -1),
// TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 20, 0.03));
vector<Point2f> cornersB;
vector<uchar> status;
vector<float> error;
// winsize has to be 11 or 13, otherwise nothing is found
int winsize = 11;
int maxlvl = 5;
calcOpticalFlowPyrLK(imgAgray, imgBgray, cornersA, cornersB, status, error,
Size(winsize, winsize), maxlvl);
for (unsigned int i = 0; i < cornersB.size(); i++) {
if (status[i] == 0 || error[i] > 0) {
drawPixel(cornersB[i], &imgC, 2, red);
continue;
}
drawPixel(cornersB[i], &imgC, 2, green);
line(imgC, cornersA[i], cornersB[i], Scalar(255, 0, 0));
}
namedWindow("window", 1);
moveWindow("window", 50, 50);
imshow("window", imgC);
cvWaitKey(0);
return 0;
}
ImageA: http://oi50.tinypic.com/14kv05v.jpg
ImageB: http://oi46.tinypic.com/4l3xom.jpg
ImageC: http://oi47.tinypic.com/35n3uox.jpg
I have found out that it works only for winsize = 11. I have tried using it on a moving rectangle to check how far it is from the origin. It hardly ever detects all four corners.
int main(int, char**) {
std::cout << "Compiled at " << __TIME__ << std::endl;
Scalar white = Scalar(255, 255, 255);
Scalar black = Scalar(0, 0, 0);
Scalar red = Scalar(0, 0, 255);
Rect rect = Rect(50, 100, 100, 150);
Mat org = Mat(Size(640, 480), CV_8UC1, white);
rectangle(org, rect, black, -1, 0, 0);
vector<Point2f> features;
goodFeaturesToTrack(org, features, 30, 0.01, 30);
std::cout << "POINTS FOUND:" << std::endl;
for (unsigned int i = 0; i < features.size(); i++) {
std::cout << "Point found: " << features[i].x;
std::cout << " " << features[i].y << std::endl;
}
bool goRight = 1;
while (1) {
if (goRight) {
rect.x += 30;
rect.y += 30;
if (rect.x >= 250) {
goRight = 0;
}
} else {
rect.x -= 30;
rect.y -= 30;
if (rect.x <= 50) {
goRight = 1;
}
}
Mat frame = Mat(Size(640, 480), CV_8UC1, white);
rectangle(frame, rect, black, -1, 0, 0);
vector<Point2f> found;
vector<uchar> status;
vector<float> error;
calcOpticalFlowPyrLK(org, frame, features, found, status, error,
Size(11, 11), 5);
Mat display;
cvtColor(frame, display, CV_GRAY2BGR);
for (unsigned int i = 0; i < found.size(); i++) {
if (status[i] == 0 || error[i] > 0) {
continue;
} else {
line(display, features[i], found[i], red);
}
}
namedWindow("window", 1);
moveWindow("window", 50, 50);
imshow("window", display);
if (cvWaitKey(300) > 0) {
break;
}
}
}
The OpenCV implementation of Lucas-Kanade seems to be unable to track a rectangle on a binary image. Am I doing something wrong, or does this function just not work?
The Lucas-Kanade method estimates the motion of a region by using the gradients in that region. It is in essence a gradient descent method. So if you don't have gradients in both the x AND y directions, the method will fail. The second important note is that the Lucas-Kanade equation
E = sum_{winsize} (Ix * u + Iy * v + It)²
is a first-order Taylor approximation of the intensity constancy constraint
I(x,y,t) = I(x+u, y+v, t+1)
so a restriction of the method without levels (image pyramids) is that the image needs to be approximately a linear function. In practice this means only small motions can be estimated, depending on the winsize you choose. That's why you use the levels, which linearize the images. So a level of 5 is a little bit too high; 3 should be enough. The top-level image in your case has a size of 640x480 / 2^5 = 20 x 15.
Finally, the problem in your code is the line:
if (status[i] == 0 || error[i] > 0) {
The error you get back from the Lucas-Kanade method is the resulting SSD, that is:
error = sum_{winsize} (I(x,y,0) - I(x+u, y+v, 1))² / (winsize * winsize)
It is very unlikely that this error is exactly 0, so with that test you end up skipping all features. I have had good experiences with simply ignoring the error; it is just a confidence measure. There are very good alternative confidence measures, such as the forward/backward confidence. You could also start experimenting with ignoring the status flag if too many features are discarded.
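A sketch of that forward/backward check, reusing the variable names from your first listing (the 1-pixel tolerance is an assumption to tune):

// forward/backward confidence: track A->B, then track the results back
// B->A; keep a point only if it lands close to where it started
vector<Point2f> cornersBack;
vector<uchar> statusBack;
vector<float> errorBack;
calcOpticalFlowPyrLK(imgBgray, imgAgray, cornersB, cornersBack,
                     statusBack, errorBack, Size(winsize, winsize), 3);
const float fbTol = 1.0f; // max forward-backward drift in pixels (tune)
for (unsigned int i = 0; i < cornersA.size(); i++) {
    Point2f d = cornersA[i] - cornersBack[i];
    bool reliable = status[i] && statusBack[i]
                    && (d.x * d.x + d.y * d.y) < fbTol * fbTol;
    // use 'reliable' instead of the error[i] > 0 test
}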
KLT does point tracking by finding a transformation between two sets of points within a certain window. The window size is the area over which each point will be chased in order to match it in the other frame.
It is another gradient-based algorithm that finds the good features to track.
Normally KLT uses a pyramidal approach in order to maintain tracking even with big movements; it uses the window size you specified at up to "maxLevel" pyramid levels.
I have never tried KLT on binary images. The problem might be that the KLT implementation begins the search in a wrong direction and then simply loses the points. When you change the window size, the search also changes. In your picture you have at most 4 interest points, each only 1 pixel in size.
These are the parameters you're interested in:
winSize – size of the search window at each pyramid level
maxLevel – 0-based maximal pyramid level number. If 0, pyramids are not used (single level); if 1, two levels are used, and so on.
criteria – specifies the termination criteria of the iterative search algorithm (it stops after the specified maximum number of iterations criteria.maxCount, or when the search window moves by less than criteria.epsilon).
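Put together, a call that sets all three explicitly might look like this (the values are the library's usual defaults, not something tuned for your case):

calcOpticalFlowPyrLK(org, frame, features, found, status, error,
                     Size(21, 21),  // winSize: search window per level
                     3,             // maxLevel: use 4 pyramid levels (0..3)
                     TermCriteria(TermCriteria::COUNT + TermCriteria::EPS,
                                  30,      // stop after 30 iterations...
                                  0.01));  // ...or once the step < 0.01 px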
Suggestion:
Did you try it with natural pictures (two photos, for instance)? You'd have many more features to track; 4 or fewer is quite hard to keep. I would try this first.