Related
I am developing an app where I compare two images using matchShapes() of OpenCV.
I implemented the method in Objective-C code is below
- (void) someMethod:(UIImage *)image :(UIImage *)temp {
RNG rng(12345);
cv::Mat src_base, hsv_base;
cv::Mat src_test1, hsv_test1;
src_base = [self cvMatWithImage:image];
src_test1 = [self cvMatWithImage:temp];
int thresh=150;
double ans=0, result=0;
Mat imageresult1, imageresult2;
cv::cvtColor(src_base, hsv_base, cv::COLOR_BGR2HSV);
cv::cvtColor(src_test1, hsv_test1, cv::COLOR_BGR2HSV);
std::vector<std::vector<cv::Point>>contours1, contours2;
std::vector<Vec4i>hierarchy1, hierarchy2;
Canny(hsv_base, imageresult1, thresh, thresh*2);
Canny(hsv_test1, imageresult2, thresh, thresh*2);
findContours(imageresult1,contours1,hierarchy1,CV_RETR_TREE,CV_CHAIN_APPROX_SIMPLE,cvPoint(0,0));
for(int i=0;i<contours1.size();i++)
{
Scalar color=Scalar(rng.uniform(0, 255), rng.uniform(0,255), rng.uniform(0,255));
drawContours(imageresult1,contours1,i,color,1,8,hierarchy1,0,cv::Point());
}
findContours(imageresult2,contours2,hierarchy2,CV_RETR_TREE,CV_CHAIN_APPROX_SIMPLE,cvPoint(0,0));
for(int i=0;i<contours2.size();i++)
{
Scalar color=Scalar(rng.uniform(0, 255), rng.uniform(0,255), rng.uniform(0,255));
drawContours(imageresult2,contours2,i,color,1,8,hierarchy2,0,cv::Point());
}
for(int i=0;i<contours1.size();i++)
{
ans = matchShapes(contours1[i],contours2[i],CV_CONTOURS_MATCH_I1,0);
std::cout<<ans<<" ";
getchar();
}
}
I got those results but do not know what exactly those numbers mean: 0 0 0.81946 0.816337 0.622353 0.634221 0
this blogpost I think should give a lot more insight into how matchShapes works.
You obviously already know what the input parameters are but for anyone finding this that doesn't:
double matchShapes(InputArray contour1, InputArray contour2, int method, double parameter)
The output is a metric where:
The lower the result, the better match it is. It is calculated based on the hu-moment values. Different measurement methods are explained in the docs.
The findings on the blogpost mentioned are as follows: ( max = 1 , min = 0)
I got following results:
Matching Image A with itself = 0.0
Matching Image A with Image B = 0.001946
Matching Image A with Image C = 0.326911
See, even image rotation doesn’t affect much on this comparison.
This basically shows that for your results:
The first two are great, you got a compelte match at 0
The second two (0.81946 0.816337) are quite an incompatible match
the third two are OK at around 62% incompatible
the last one is complete match.
If my computer vision learnings have taught me anything is always be sceptical of a complete match unless you are 100% using the same images.
Edit1: I think it might also be rotationally invarient so in your case you might have three very similar drawn lines that have been rotated to the same way (i.e. horizontal) and compared
I'm attempting to work with a depth sensor to add positional tracking to the Oculus Rift dev kit. However, I'm having trouble with the sequence of operations producing a usable result.
I'm starting with a 16 bit depth image, where the values sort of (but not really) correspond to millimeters. Undefined values in the image have already been set to 0.
First I'm eliminating everything outside a certain near and far distance by updating a mask image to exclude them.
cv::Mat result = cv::Mat::zeros(depthImage.size(), CV_8UC3);
cv::Mat depthMask;
depthImage.convertTo(depthMask, CV_8U);
for_each_pixel<DepthImagePixel, uint8_t>(depthImage, depthMask,
[&](DepthImagePixel & depthPixel, uint8_t & maskPixel){
if (!maskPixel) {
return;
}
static const uint16_t depthMax = 1200;
static const uint16_t depthMin = 200;
if (depthPixel < depthMin || depthPixel > depthMax) {
maskPixel = 0;
}
});
Next, since the feature I want is likely to be closer to the camera than the overall scene average, I update the mask again to exclude anything that isn't within a certain range of the median value:
const float depthAverage = cv::mean(depthImage, depthMask)[0];
const uint16_t depthMax = depthAverage * 1.0;
const uint16_t depthMin = depthAverage * 0.75;
for_each_pixel<DepthImagePixel, uint8_t>(depthImage, depthMask,
[&](DepthImagePixel & depthPixel, uint8_t & maskPixel){
if (!maskPixel) {
return;
}
if (depthPixel < depthMin || depthPixel > depthMax) {
maskPixel = 0;
}
});
Finally, I zero out everything that's not in the mask, and scale the remaining values to between 10 & 255 before converting the image format to 8 bit
cv::Mat outsideMask;
cv::bitwise_not(depthMask, outsideMask);
// Zero out outside the mask
cv::subtract(depthImage, depthImage, depthImage, outsideMask);
// Within the mask, normalize to the range + X
cv::subtract(depthImage, depthMin, depthImage, depthMask);
double minVal, maxVal;
minMaxLoc(depthImage, &minVal, &maxVal);
float range = depthMax - depthMin;
float scale = (((float)(UINT8_MAX - 10) / range));
depthImage *= scale;
cv::add(depthImage, 10, depthImage, depthMask);
depthImage.convertTo(depthImage, CV_8U);
The results looks like this:
I'm pretty happy with this section of the code, since it produces pretty clear visual features.
I'm then applying a couple of smoothing operations to get rid of the ridiculous amount of noise from the depth camera:
cv::medianBlur(depthImage, depthImage, 9);
cv::Mat blurred;
cv::bilateralFilter(depthImage, blurred, 5, 250, 250);
depthImage = blurred;
cv::Mat result = cv::Mat::zeros(depthImage.size(), CV_8UC3);
cv::insertChannel(depthImage, result, 0);
Again, the features look pretty clear visually, but I wonder if they couldn't be sharpened somehow:
Next I'm using canny for edge detection:
cv::Mat canny_output;
{
cv::Canny(depthImage, canny_output, 20, 80, 3, true);
cv::insertChannel(canny_output, result, 1);
}
The lines I'm looking for are there, but not well represented towards the corners:
Finally I'm using probabilistic Hough to identify lines:
std::vector<cv::Vec4i> lines;
cv::HoughLinesP(canny_output, lines, pixelRes, degreeRes * CV_PI / 180, hughThreshold, hughMinLength, hughMaxGap);
for (size_t i = 0; i < lines.size(); i++)
{
cv::Vec4i l = lines[i];
glm::vec2 a((l[0], l[1]));
glm::vec2 b((l[2], l[3]));
float length = glm::length(a - b);
cv::line(result, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]), cv::Scalar(0, 0, 255), 3, CV_AA);
}
This results in this image
At this point I feel like I've gone off the rails, because I can't find a good set of parameters for Hough to produce a reasonable number of candidate lines in which to search for my shape, and I'm not sure if I should be fiddling with Hough or looking at improving the outputs of the prior steps.
Is there a good way of objectively validating my results at each stage, as opposed to just fiddling with the input values until I think it 'looks good'? Is there a better approach to finding the rectangle given the starting image (and given that it won't necessarily be oriented in a particular direction?
Very cool project!
Though, I feel like your approach does not use all the info that you could get from the depthmap (e.g. 3D points, normals, etc), which would help a lot.
The Point Cloud Library (PCL), which is a C++ library dedicated to the processing of RGB-D data, has a tutorial on plane segmentation using RANSAC which could inspire you. You might not want to use PCL in your program, due to the numerous dependencies, however as it is open-source, you can find the algorithm implementation on Github (PCL SAC segmentation). However, RANSAC might be slow and produce unwanted results depending on the scene.
You could also try to use the approach presented in "Real-Time Plane Segmentation
using RGB-D Cameras" by Holz, Holzer, Rusu and Behnke, 2011 (PDF), which suggests fast normal estimation using integral images followed by plane detection using clustering of normals.
This is my code for training the dataset of for example vehicles , when it train fully , i want it to predict the data(vehicle) from video(.avi) , how to predict trained data from video and how to add that part in it ? , i want that when the vehicle is shown in the video it count it as 1 and cout that the object is detected and if second vehicle come it increment the count as 2
IplImage *img2;
cout<<"Vector quantization..."<<endl;
collectclasscentroids();
vector<Mat> descriptors = bowTrainer.getDescriptors();
int count=0;
for(vector<Mat>::iterator iter=descriptors.begin();iter!=descriptors.end();iter++)
{
count += iter->rows;
}
cout<<"Clustering "<<count<<" features"<<endl;
//choosing cluster's centroids as dictionary's words
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
cout<<"extracting histograms in the form of BOW for each image "<<endl;
Mat labels(0, 1, CV_32FC1);
Mat trainingData(0, dictionarySize, CV_32FC1);
int k = 0;
vector<KeyPoint> keypoint1;
Mat bowDescriptor1;
//extracting histogram in the form of bow for each image
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch,"%s%d%s%d%s","train/",j," (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName, 0);
detector.detect(img2, keypoint1);
bowDE.compute(img2, keypoint1, bowDescriptor1);
trainingData.push_back(bowDescriptor1);
labels.push_back((float) j);
}
//Setting up SVM parameters
CvSVMParams params;
params.kernel_type = CvSVM::RBF;
params.svm_type = CvSVM::C_SVC;
params.gamma = 0.50625000000000009;
params.C = 312.50000000000000;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
CvSVM svm;
printf("%s\n", "Training SVM classifier");
bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);
cout<<"Processing evaluation data..."<<endl;
Mat groundTruth(0, 1, CV_32FC1);
Mat evalData(0, dictionarySize, CV_32FC1);
k = 0;
vector<KeyPoint> keypoint2;
Mat bowDescriptor2;
Mat results(0, 1, CV_32FC1);;
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch, "%s%d%s%d%s", "eval/", j, " (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName,0);
detector.detect(img2, keypoint2);
bowDE.compute(img2, keypoint2, bowDescriptor2);
evalData.push_back(bowDescriptor2);
groundTruth.push_back((float) j);
float response = svm.predict(bowDescriptor2);
results.push_back(response);
}
//calculate the number of unmatched classes
double errorRate = (double) countNonZero(groundTruth- results) / evalData.rows;
The question isThis code is not predicting from video , i want to know how to predict it from the video , mean like i want to detect the vehicle from movie , like it should show 1 when it find the vehicle from movie
For those who didn't understand the question :
I want to play a movie in above code
VideoCapture cap("movie.avi"); //movie.avi is with deleted background
Suppose i have a trained data which contain vehicle's , and "movie.avi" contain 5 vehicles , so it should detect that vehicles from the movie.avi and give me 5 as output
How to do this part in the above code
From looking at your code setup
params.svm_type = CvSVM::C_SVC;
it appears that you train your classifier with more than two classes. A typical example in traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video it's hard to tell, if your idea makes sense. I guess what the previous answers are assuming is the following:
You loop through each frame and want to output the number of cars in that frame. Thus, a frame may contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", even if the setup might be a little off, conceptually. You cannot retrieve the number of cars reliably with this approach.
Instead, the suggestion is to try a sliding-window approach. This means, for example, you loop over each pixel of the frame and take the region around the pixel (called sub-window or region-of-interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50px as well as your training data would. You might fixate the scale of the cars in your training data, but in real-world videos, the cars will be of different size. In order to find a car of different scale, let's say it's two-times as large as in the training data, the typical approach is to scale the image (say with a factor of 2) and repeat the sliding-window approach.
By repeating this for all relevant scales you end up with an algorithm that gives you for each pixel location and each scale the result of your classifier. This means you have three loops, or, in other words, there are three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say 2) the image gets smaller (/larger) and the next scale is an image of different size (for eample half the size).
The pixel locations indicate the position of the car and the scale indicates the size of it. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), then you would end up with each slot containing 0 or 1. Even in this simple case, where you would be tempted to simply count the number of 1 and output the count as the number of cars, you still have the problem that there could be multiple responses for the same car. Thus, it would be better if you had a car detector that gives continous responses between 0 and 1 and then you could find maxima in this pyramid. Each maximum would indicate a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
To summarize, no matter if you are simplifying the problem to a binary classification problem ("car"/"no car"), or if you are sticking to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have the problem of scale and location in each frame to solve.
The code you have for using images is written with OpenCV's C interface so it's probably easy to stick with that rather than use the C++ video interface.
In which case somthing along these lines should work:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *img = 0;
while(img = cvQueryFrame(capture))
{
// Process image
...
}
You should implement a sliding window approach. In each window, you should apply the SVM to get candidates. Then, once you've done it for the whole image, you should merge the candidates (if you detected an object, then it is very likely that you'll detect it again in shift of a few pixels - that's the meaning of candidates).
Take a look at the V&J code at openCV or the latentSVM code (detection by parts) to see how it's done there.
By the way, I would use the LatentSVM code (detection by parts) to detect vehicles. It has trained models for cars and for buses.
Good luck.
You need detector, not classifier. Take a look at Haar cascades, LBP cascades, latentSVM, as mentioned before or HOG detector.
I'll explain why. Detector usually scan image by sliding window, line by line. In several scales. In every window detector solve problem: "object/not object". It may give you rough results but very fast. Classifiers such as BOW, works very slow for this task. Then you should apply classifiers to regions, found by detector.
I'm working on a project where I will use homography as features in a classifier. My problem is in automatically calculating homographies, i'm using SIFT descriptors to find the points between the two images on which to calculate homography but SIFT are giving me very poor results, hence i can't use them in my work.
I'm using OpenCV 2.4.3.
At first I was using SURF, but I had similar results and I decided to use SIFT which are slower but more precise. My first guess was that the image resolution in my dataset was too low but i ran my algorithm on a state-of-the-art dataset (Pointing 04) and I obtained pretty much the same results, so the problem lies in what I do and not in my dataset.
The match between the SIFT keypoints found in each image is done with the FlannBased matcher, i tried the BruteForce one but the results were again pretty much the same.
This is an example of the match I found (image from Pointing 04 dataset)
The above image shows how poor is the match found with my program. Only 1 point is a correct match. I need (at least) 4 correct matches for what I have to do.
Here is the code i use:
This is the function that extracts SIFT descriptors from each image
void extract_sift(const Mat &img, vector<KeyPoint> &keypoints, Mat &descriptors, Rect* face_rec) {
// Create masks for ROI on the original image
Mat mask1 = Mat::zeros(img.size(), CV_8U); // type of mask is CV_8U
Mat roi1(mask1, *face_rec);
roi1 = Scalar(255, 255, 255);
// Extracts keypoints in ROIs only
Ptr<DescriptorExtractor> featExtractor;
Ptr<FeatureDetector> featDetector;
Ptr<DescriptorMatcher> featMatcher;
featExtractor = new SIFT();
featDetector = FeatureDetector::create("SIFT");
featDetector->detect(img,keypoints,mask1);
featExtractor->compute(img,keypoints,descriptors);
}
This is the function that matches two images' descriptors
void match_sift(const Mat &img1, const Mat &img2, const vector<KeyPoint> &kp1,
const vector<KeyPoint> &kp2, const Mat &descriptors1, const Mat &descriptors2,
vector<Point2f> &p_im1, vector<Point2f> &p_im2) {
// Matching descriptor vectors using FLANN matcher
Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");
std::vector< DMatch > matches;
matcher->match( descriptors1, descriptors2, matches );
double max_dist = 0; double min_dist = 100;
// Quick calculation of max and min distances between keypoints
for( int i = 0; i < descriptors1.rows; ++i ){
double dist = matches[i].distance;
if( dist < min_dist ) min_dist = dist;
if( dist > max_dist ) max_dist = dist;
}
// Draw only the 4 best matches
std::vector< DMatch > good_matches;
// XXX: DMatch has no sort method, maybe a more efficent min extraction algorithm can be used here?
double min=matches[0].distance;
int min_i = 0;
for( int i = 0; i < (matches.size()>4?4:matches.size()); ++i ) {
for(int j=0;j<matches.size();++j)
if(matches[j].distance < min) {
min = matches[j].distance;
min_i = j;
}
good_matches.push_back( matches[min_i]);
matches.erase(matches.begin() + min_i);
min=matches[0].distance;
min_i = 0;
}
Mat img_matches;
drawMatches( img1, kp1, img2, kp2,
good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );
imwrite("imgMatch.jpeg",img_matches);
imshow("",img_matches);
waitKey();
for( int i = 0; i < good_matches.size(); i++ )
{
// Get the points from the best matches
p_im1.push_back( kp1[ good_matches[i].queryIdx ].pt );
p_im2.push_back( kp2[ good_matches[i].trainIdx ].pt );
}
}
And these functions are called here:
extract_sift(dataset[i].img,dataset[i].keypoints,dataset[i].descriptors,face_rec);
[...]
// Extract keypoints from i+1 image and calculate homography
extract_sift(dataset[i+1].img,dataset[i+1].keypoints,dataset[i+1].descriptors,face_rec);
dataset[front].points_r.clear(); // XXX: dunno if clearing the points every time is the best way to do it..
match_sift(dataset[front].img,dataset[i+1].img,dataset[front].keypoints,dataset[i+1].keypoints,
dataset[front].descriptors,dataset[i+1].descriptors,dataset[front].points_r,dataset[i+1].points_r);
dataset[i+1].H = findHomography(dataset[front].points_r,dataset[i+1].points_r, RANSAC);
Any help on how to improve the matching performance would be really appreciated, thanks.
You apparently use the "best four points" in your code w.r.t. the distance of the matches. In other words, you consider that a match is valid if both descriptors are really similar. I believe this is wrong. Did you try to draw all of the matches? Many of them should be wrong, but many should be good as well.
The distance of a match just tells how similar both points are. This doesn't tell if the match is coherent geometrically. Selecting the best matches should definitely consider the geometry.
Here is how I would do:
Detect the corners (you already do this)
Find the matches (you already do this)
Try to find a homography transform between both images by using the matches (don't filter them before!) using findHomography(...)
findHomography(...) will tell you which are the inliers. Those are your good_matches.
I am trying to use SURF but I am having trouble finding way to do so in C. The documentation only seems to have stuff for C++ in terms of.
I have been able to detect SURF feature:
IplImage *img = cvLoadImage("img5.jpg");
CvMat* image = cvCreateMat(img->height, img->width, CV_8UC1);
cvCvtColor(img, image, CV_BGR2GRAY);
// detecting keypoints
CvSeq *imageKeypoints = 0, *imageDescriptors = 0;
int i;
//Extract SURF points by initializing parameters
CvSURFParams params = cvSURFParams(1, 1);
cvExtractSURF( image, 0, &imageKeypoints, &imageDescriptors, storage, params );
printf("Image Descriptors: %d\n", imageDescriptors->total);
//draw the keypoints on the captured frame
for( i = 0; i < imageKeypoints->total; i++ )
{
CvSURFPoint* r = (CvSURFPoint*)cvGetSeqElem( imageKeypoints, i );
CvPoint center;
int radius;
center.x = cvRound(r->pt.x);
center.y = cvRound(r->pt.y);
radius = cvRound(r->size*1.2/9.*2);
cvCircle( image, center, radius, CV_RGB(0,255,0), 1, 8, 0 );
}
But I can't find the method that I need to compare the descriptors of 2 images. I found this code in C++ but I'm having trouble translating it:
// matching descriptors
BruteForceMatcher<L2<float> > matcher;
vector<DMatch> matches;
matcher.match(descriptors1, descriptors2, matches);
// drawing the results
namedWindow("matches", 1);
Mat img_matches;
drawMatches(img1, keypoints1, img2, keypoints2, matches, img_matches);
imshow("matches", img_matches);
waitKey(0);
I would appreciate if someone could lead me on to a descriptor matcher or even better, let me know where I can find the OpenCV documentation in C only.
This link might give you a hint. https://projects.developer.nokia.com/opencv/browser/opencv/opencv-2.3.1/samples/c/find_obj.cpp . Look in the function naiveNearestNeighbor
Check out the blog post from thioldhack. Contains a sample code. Its for QT, but you can easily do it for VC++ or any other. You will need to match the Key points using K-nearest neighbour algorithm. It has all.
The slightly longer but surest way is to compile OpenCV on your computer with debug information and just to step into the C++ implementation with a debugger. You can also copy it aside to your project and start peeling it like an onion till you get to pure C.