How to detect object from video using SVM - c++

This is my code for training the dataset of for example vehicles , when it train fully , i want it to predict the data(vehicle) from video(.avi) , how to predict trained data from video and how to add that part in it ? , i want that when the vehicle is shown in the video it count it as 1 and cout that the object is detected and if second vehicle come it increment the count as 2
IplImage *img2;
cout<<"Vector quantization..."<<endl;
collectclasscentroids();
vector<Mat> descriptors = bowTrainer.getDescriptors();
int count=0;
for(vector<Mat>::iterator iter=descriptors.begin();iter!=descriptors.end();iter++)
{
count += iter->rows;
}
cout<<"Clustering "<<count<<" features"<<endl;
//choosing cluster's centroids as dictionary's words
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
cout<<"extracting histograms in the form of BOW for each image "<<endl;
Mat labels(0, 1, CV_32FC1);
Mat trainingData(0, dictionarySize, CV_32FC1);
int k = 0;
vector<KeyPoint> keypoint1;
Mat bowDescriptor1;
//extracting histogram in the form of bow for each image
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch,"%s%d%s%d%s","train/",j," (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName, 0);
detector.detect(img2, keypoint1);
bowDE.compute(img2, keypoint1, bowDescriptor1);
trainingData.push_back(bowDescriptor1);
labels.push_back((float) j);
}
//Setting up SVM parameters
CvSVMParams params;
params.kernel_type = CvSVM::RBF;
params.svm_type = CvSVM::C_SVC;
params.gamma = 0.50625000000000009;
params.C = 312.50000000000000;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
CvSVM svm;
printf("%s\n", "Training SVM classifier");
bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);
cout<<"Processing evaluation data..."<<endl;
Mat groundTruth(0, 1, CV_32FC1);
Mat evalData(0, dictionarySize, CV_32FC1);
k = 0;
vector<KeyPoint> keypoint2;
Mat bowDescriptor2;
Mat results(0, 1, CV_32FC1);;
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch, "%s%d%s%d%s", "eval/", j, " (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName,0);
detector.detect(img2, keypoint2);
bowDE.compute(img2, keypoint2, bowDescriptor2);
evalData.push_back(bowDescriptor2);
groundTruth.push_back((float) j);
float response = svm.predict(bowDescriptor2);
results.push_back(response);
}
//calculate the number of unmatched classes
double errorRate = (double) countNonZero(groundTruth- results) / evalData.rows;
The question isThis code is not predicting from video , i want to know how to predict it from the video , mean like i want to detect the vehicle from movie , like it should show 1 when it find the vehicle from movie
For those who didn't understand the question :
I want to play a movie in above code
VideoCapture cap("movie.avi"); //movie.avi is with deleted background
Suppose i have a trained data which contain vehicle's , and "movie.avi" contain 5 vehicles , so it should detect that vehicles from the movie.avi and give me 5 as output
How to do this part in the above code

From looking at your code setup
params.svm_type = CvSVM::C_SVC;
it appears that you train your classifier with more than two classes. A typical example in traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video it's hard to tell, if your idea makes sense. I guess what the previous answers are assuming is the following:
You loop through each frame and want to output the number of cars in that frame. Thus, a frame may contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", even if the setup might be a little off, conceptually. You cannot retrieve the number of cars reliably with this approach.
Instead, the suggestion is to try a sliding-window approach. This means, for example, you loop over each pixel of the frame and take the region around the pixel (called sub-window or region-of-interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50px as well as your training data would. You might fixate the scale of the cars in your training data, but in real-world videos, the cars will be of different size. In order to find a car of different scale, let's say it's two-times as large as in the training data, the typical approach is to scale the image (say with a factor of 2) and repeat the sliding-window approach.
By repeating this for all relevant scales you end up with an algorithm that gives you for each pixel location and each scale the result of your classifier. This means you have three loops, or, in other words, there are three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say 2) the image gets smaller (/larger) and the next scale is an image of different size (for eample half the size).
The pixel locations indicate the position of the car and the scale indicates the size of it. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), then you would end up with each slot containing 0 or 1. Even in this simple case, where you would be tempted to simply count the number of 1 and output the count as the number of cars, you still have the problem that there could be multiple responses for the same car. Thus, it would be better if you had a car detector that gives continous responses between 0 and 1 and then you could find maxima in this pyramid. Each maximum would indicate a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
To summarize, no matter if you are simplifying the problem to a binary classification problem ("car"/"no car"), or if you are sticking to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have the problem of scale and location in each frame to solve.

The code you have for using images is written with OpenCV's C interface so it's probably easy to stick with that rather than use the C++ video interface.
In which case somthing along these lines should work:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *img = 0;
while(img = cvQueryFrame(capture))
{
// Process image
...
}

You should implement a sliding window approach. In each window, you should apply the SVM to get candidates. Then, once you've done it for the whole image, you should merge the candidates (if you detected an object, then it is very likely that you'll detect it again in shift of a few pixels - that's the meaning of candidates).
Take a look at the V&J code at openCV or the latentSVM code (detection by parts) to see how it's done there.
By the way, I would use the LatentSVM code (detection by parts) to detect vehicles. It has trained models for cars and for buses.
Good luck.

You need detector, not classifier. Take a look at Haar cascades, LBP cascades, latentSVM, as mentioned before or HOG detector.
I'll explain why. Detector usually scan image by sliding window, line by line. In several scales. In every window detector solve problem: "object/not object". It may give you rough results but very fast. Classifiers such as BOW, works very slow for this task. Then you should apply classifiers to regions, found by detector.

Related

OCR & OpenCV: Difference between two frames on high resolution images

According to this post OCR: Difference between two frames, I now know how to find pixel differences between two images with OpenCV.
I would like to improve this solution and use it with high resolution images (from a video) with rich content. The example above is not applicable with big images because the process is to slow (too much differences found, the "findCountours method" fills the tab with 250k elements which takes a huge time to process).
My application uses a RLE decoder to decode the compressed frames of the video. Once the frame is decoded, I would like to compare the current frame with the previous one in order to store the differences between the two frames in a "Mat" tab for example.
The goal of all of this is to be able to perform an analysis on the different pixels and to check if there is any latin character. This allows me to reduce the amount of pixels to analyze and to save precious time.
If anyone has other ideas instead of this one to perform such operations, feel free to propose it please.
Thank you for your help.
EDIT 1:
Example of two high resolution images of a computer screen. These are for the moment the perfect example of what I'm trying to analyse. As we can see there is just a window as difference between the two big images and I would like to analyze just the new "Challenge" window for any character.
EDIT 2:
I'm trying to tune the algorithm depending on the data analyzed. Typically on the two following pictures I only get the green lines as differences and no text at all (which is what is the most interesting). I'm trying to understand better how things work for this.
1st image:
2nd image:
3rd image:
As you can see I only have those green lines and never the text (at the best I can have just ONE letter when decreasing the countours[i].size())
In addition to the post you mentioned, you need to:
When you binarize the mask, use a threshold higher then 0 to remove small differences.
Remove some noise. You can find all connected components, and remove smaller ones.
Find the area of the bigger connected components. You can use convexHull and fillConvexPoly to get the mask of the different objects on screen
Copy the second image to a new image, with the given mask.
The result will look like:
Code:
#include <opencv2/opencv.hpp>
#include <vector>
using namespace std;
using namespace cv;
int main()
{
Mat3b img1 = imread("path_to_image_1");
Mat3b img2 = imread("path_to_image_2");
Mat3b diff;
absdiff(img1, img2, diff);
// Split each channel
vector<Mat1b> masks;
split(diff, masks);
// Create a black mask
Mat1b mask(diff.rows, diff.cols, uchar(0));
// OR with each channel of the N channels mask
for (int i = 0; i < masks.size(); ++i)
{
mask |= masks[i];
}
// Binarize mask
mask = mask > 100;
// Results images
vector<Mat3b> difference_images;
// Remove small blobs
//Mat kernel = getStructuringElement(MORPH_RECT, Size(5,5));
//morphologyEx(mask, mask, MORPH_OPEN, kernel);
// Find connected components
vector<vector<Point>> contours;
findContours(mask.clone(), contours, CV_RETR_EXTERNAL, CHAIN_APPROX_NONE);
for (int i = 0; i < contours.size(); ++i)
{
if (contours[i].size() > 1000)
{
Mat1b mm(mask.rows, mask.cols, uchar(0));
vector<Point> hull;
convexHull(contours[i], hull);
fillConvexPoly(mm, hull, Scalar(255));
Mat3b difference_img(img2.rows, img2.cols, Vec3b(0,0,0));
img2.copyTo(difference_img, mm);
difference_images.push_back(difference_img.clone());
}
}
return 0;
}

HOG features for object detection using GPU in OpenCV

In my project I am calculating HOG features on GPU for different levels in the same image. My aim is to detect the following objects.
1. Truck
2. Car
3. Person
Most important question is the selection of window size in case of multi class object detector. This post provide a very good base but it does not provide an answer for the selection of window size in case of multi class feature.
To solve this problem I calculated the HOG features of each positive image at different levels/resolution keeping the window size(48*96) same but the file for each image is around 600 MB which is too large.
Please let me know how to select the window size, block size and cell size in case of multi class object detection. Here is my code that I used to calculate the HOG features.
void App::run()
{
unsigned int count = 1;
FileStorage fs;
running = true;
//int width;
//int height;
Size win_size(args.win_width, args.win_width * 2);
Size win_stride(args.win_stride_width, args.win_stride_height);
cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);
VideoCapture vc("/home/ubuntu/Desktop/getdescriptor/images/image%d.jpg");
Mat frame;
Mat Left;
Mat img_aux, img, img_to_show, img_new;
cv::Mat temp;
gpu::GpuMat gpu_img, descriptors, new_img;
char cbuff[20];
while (running)
{
vc.read(frame);
if (!frame.empty())
{
workBegin();
width = frame.rows;
height = frame.cols;
sprintf (cbuff, "%04d", count);
// Change format of the image
if (make_gray) cvtColor(frame, img_aux, CV_BGR2GRAY);
else if (use_gpu) cvtColor(frame, img_aux, CV_BGR2BGRA);
else Left.copyTo(img_aux);
// Resize image
if (args.resize_src) resize(img_aux, img, Size(args.width, args.height));
else img = img_aux;
img_to_show = img;
gpu_hog.nlevels = nlevels;
hogWorkBegin();
if (use_gpu)
{
gpu_img.upload(img);
new_img.upload(img_new);
fs.open(cbuff, FileStorage::WRITE);
for(int levels = 0; levels < nlevels; levels++)
{
gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
descriptors.download(temp);
//printf("size %d %d\n", temp.rows, temp.cols);
fs <<"level" << levels;
fs << "features" << temp;
cout<<"("<<width<<","<<height<<")"<<endl;
width = round(width/scale);
height = round(height/scale);
if( width < win_size.width || height < win_size.height )
break;
cout<<"Levels "<<levels<<endl;
resize(img,img_new,Size(width,height));
scale *= scale;
}
cout<<count<< " Image feature calculated !"<<endl;
count++;
//width = 640; height = 480;
scale = 1.05;
}
hogWorkEnd();
fs.release();
}
else running = false;
}
}
The window size should be chosen, s.t. the object(s) you want to detect fit into the window. If you want to have different window sizes for different types this might become tricky.
Usually what you do is the following
Take training data for each type of objects, and train [number of object types] many models using the features extracted at the known position of the objects.
Then you take each test image and use a sliding window approach to extract features at each location. These features are then compared to each model. If one of the models lead to a score higher than a certain threshold you have found this object. If more than one model scores higher than the threshold simply take the one scoring highest.
If you want to use differently sized detection windows you will get feature vectors of different size (by nature of the HoG features). The tricky thing is, that in the testing phase you have to use as many sliding windows as object types you use. This would definitely work, but you have to process each testing image several times leading to higher processing time)
To answer your question of the sizes: There is no value I can give you, it always depends on your images. Using an image pyramid as you mentioned above is a good way to deal with differently scaled objects.
window size: the whole object should fit in; has to be divisible by block size
block size has to be divisible by cell size
Sample code for visualization of HoG features can be found here. This also helps understand how the feature vectors look like.
EDIT: Found out the hard way, that only cv::Size(8,8) is allowed for cell size. See documentation.

C++ OpenCV SVM Predict Not Working

Trying to create a functional SVM. I have 114 training images, 60 Positive/54 Negative, and 386 testing images for the SVM to predict against.
I read in the training image features to float like this:
trainingDataFloat[i][0] = trainFeatures.rows;
trainingDataFloat[i][1] = trainFeatures.cols;
And the same for the testing images too:
testDataFloat[i][0] = testFeatures.rows;
testDataFloat[i][2] = testFeatures.cols;
Then, using Micka's answer to this question, I turn the testDataFloat into a 1 Dimensional Array, and feed it to a Mat like this so to predict on the SVM:
float* testData1D = (float*)testDataFloat;
Mat testDataMat1D(height*width, 1, CV_32FC1, testData1D);
float testPredict = SVMmodel.predict(testDataMat1D);
Once this was all in place, there is the Debug Error of:
Sizes of input arguments do not match (the sample size is different from what has been used for training) in cvPreparePredictData
Looking at this post I found (Thanks to berak) that:
"all images (used in training & prediction) have to be the same size"
So I included a re-size function that would re-size the images to be all square at whatever size you wished (100x100, 200x200, 1000, 1000 etc.)
Run it again with the images re-sized to a new directory that the program now loads the images in from, and I get the exact same error as before of:
Sizes of input arguments do not match (the sample size is different from what has been used for training) in cvPreparePredictData
I just have no idea anymore on what to do. Why is it still throwing that error?
EDIT
I changed
Mat testDataMat1D(TestDFheight*TestDFwidth, 1, CV_32FC1, testData1D);
to
Mat testDataMat1D(1, TestDFheight*TestDFwidth, CV_32FC1, testData1D);
and placed the .predict inside the loop that the features are given to the float so that each image is given to the .predict individually because of this question. With the to int swapped so that .cols = 1 and .rows = TestDFheight*TestDFwidth the program seems to actually run, but then stops on image 160 (.exe has stopped working)... So that's a new concern.
EDIT 2
Added a simple
std::cout << testPredict;
To view the determined output of the SVM, and it seems to be positively matching everything until Image 160, where it stops running:
Please check your training and test feature vector.
I'm assuming your feature data is some form of cv::Mat containing features on each row.
In which case you want your training matrix to be a concatenation of each feature matrix from each image.
These line doesn't look right:
trainingDataFloat[i][0] = trainFeatures.rows;
trainingDataFloat[i][1] = trainFeatures.cols;
This is setting an element of a 2d matrix to the number of rows and columns in trainFeatures. This has nothing to do with the actual data that is in the trainFeatures matrix.
What are you trying to detect? If each image is a positive and negative example, then are you trying to detect something in an image? What are your features?
If you're trying to detect an object in the image on a per image basis, then you need a feature vector describing the whole image in one vector. In which case you'd do something like this with your training data:
int N; // Set to number of images you plan on using for training
int feature_size; // Set to the number of features extracted in each image. Should be constant across all images.
cv::Mat X = cv::Mat::zeros(N, feature_size, CV_32F); // Feature matrix
cv::Mat Y = cv::Mat::zeros(N, 1, CV_32F); // Label vector
// Now use a for loop to copy data into X and Y, Y = +1 for positive examples and -1 for negative examples
for(int i = 0; i < trainImages.size(); ++i)
{
X.row(i) = trainImages[i].features; // Where features is a cv::Mat row vector of size N of the extracted features
Y.row(i) = trainImages[i].isPositive ? 1:-1;
}
// Now train your cv::SVM on X and Y.

How do I update this Neural Net to use image pixel data

I'm learning Neural Networks from this bytefish machine learning guide and code. I understand it well but I would like to update the code at the previous link to use image pixel data instead of random values as the input data. In this section of the aforementioned code:
cv::randu(trainingData,0,1);
cv::randu(testData,0,1);
the training and test matrices are filled with random data. Then label data is added to the classes matrices here:
cv::Mat trainingClasses = labelData(trainingData, eq);
cv::Mat testClasses = labelData(testData, eq);
using this function:
// label data with equation
cv::Mat labelData(cv::Mat points, int equation) {
cv::Mat labels(points.rows, 1, CV_32FC1);
for(int i = 0; i < points.rows; i++) {
float x = points.at<float>(i,0);
float y = points.at<float>(i,1);
labels.at<float>(i, 0) = f(x, y, equation);
// the f() function used above
//is only a case statement with 5
//switches in it eg on of the switches is:
//case 0:
//return y > sin(x*10) ? -1 : 1;
//break;
}
return labels;
}
Then points are plotted in a window here:
plot_binary(trainingData, trainingClasses, "Training Data");
plot_binary(testData, testClasses, "Test Data");
with this function:
;; Plot Data and Class function
void plot_binary(cv::Mat& data, cv::Mat& classes, string name) {
cv::Mat plot(size, size, CV_8UC3);
plot.setTo(cv::Scalar(255.0,255.0,255.0));
for(int i = 0; i < data.rows; i++) {
float x = data.at<float>(i,0) * size;
float y = data.at<float>(i,1) * size;
if(classes.at<float>(i, 0) > 0) {
cv::circle(plot, Point(x,y), 2, CV_RGB(255,0,0),1);
} else {
cv::circle(plot, Point(x,y), 2, CV_RGB(0,255,0),1);
}
}
imshow(name, plot);
}
The plotted points, as I understand it, represent the input data multiplied by the equations in the f() function and is used by the predict functions to predict which point to plot in the mlp, knn, svm etc. functions. How do I update what is going on here to do something with Image pixel data. Any advice to get me farther would be appreciated.
"How do I update what is going on here to do something with Image pixel data" is a broad and generic question. May I ask in exchange: what do you want to do with "Image pixel data"?
Do you want an answer to: what can be done with "Image pixel data" on machine learning algorithms like ANN, SVM etc. ?
The answer is a loooong list of things encompassing thousands of research papers and hundreds of PhD theses. Some examples include: supervised and/or un-supervised classification of images into labels/tags/categories based on features like image content, objects in image, patterns in image etc. The possibilities are endless. You may perhaps want to take a look at this: http://stuff.mit.edu/afs/athena/course/urop/profit/PDFS/EdwardTolson.pdf
Now, coming back to you original objective: "I would like to update the code at the previous link to use image pixel data instead of random values as the input data"...
The implementation technique would depend largely on what you want to do. I can cite one/two easy techniques for extracting feature vectors from image, which can be fed into any machine learning algorithm of your choice...
Example 1:
You may start with using pixel intensity data as a feature vector. Here's how you may go ahead with it:
Load image using
Mat image = imread(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
Resize image into a smaller area using resize. You may want to begin with small image sizes, like 8x8 or 10x10 pixels.
Loop through the image matrix, somewhat like this:
for(int row = 0; row < img.rows; ++row)
{
uchar* p = img.ptr(row);
for(int col = 0; col < img.cols; ++col)
{
*p++ //points to each pixel value in turn assuming a CV_8UC1 greyscale image
}
}
A collection of all the pixel values will give you a feature vector for that image.
Now suppose you have two classes of image. For each set of feature vector you generate, you'll have to prepare (for supervised classification) a corresponding label Mat (somewhat like the example you've mentioned). It needs to contain the class label (say, 0 and 1) for all the feature vectors present in your feature Mat.
Now feed the feature vectors and label Mat to your machine learning code and see what happens.
However, the ability of image classification based on image pixel data alone is quite limited. There are thousands of techniques for extracting image features, most of which are dependent on the application area.
Example 2:
I'll finish off with one more example for extracting feature vectors, which, in some cases, will prove to be more effective than simple image pixel values.
You may use the Histogram of Oriented Gradients descriptor for slightly better results, use this:
cv::HOGDescriptor hog;
vector<float> descriptors;
hog.compute(mat, descriptors);
The vector descriptors is your feature vector.
HOGDescriptors, when used with SVM, provides a decent classification mechanism.
You can put the pixel data of an image into a Mat called trainingData using something similar to this:
cv::Mat labelData(cv::Mat points, int equation)
{
cv::Mat labels(points.rows, 1, CV_32FC1);
for(int i = 0; i < points.rows; i++)
{
float x = points.at<float>(i,0);
float y = points.at<float>(i,1);
labels.at<float>(i, 0) = f(x, y, equation);
}
return labels;
}
Now, instead of labelData, we're going to return a Mat of pixel data. One obvious way is to use the image itself as a feature vector. However, some machine learning algorithms in openCV, including ANN, SVM etc., required special formatting of input data.
You may try something like this:
cv::Mat trainingData(cv::Mat image)
{
cv::Mat trainingVector(image.rows*image.cols, 1, CV_32FC1);
for(int i = 0; i < image.rows; i++)
{
for(int j = 0; j < image.cols; j++)
{
float valueOfPixel = image.at<float>(i,j);
trainingVector.at<float>((i*image.cols)+j, 0) = valueOfPixel;
}
}
return trainingVector;
}
(Please recheck the syntax of the code before using, I just typed it out here)
So, what the above block effectively does is change the 2D matrix of the image into a 1D array. Now, how and where you use it depends on your requirements.
Please make necessary modifications before invoking the machine learning modules.
Hope this answers your question.
Thanks.

Updating OpenCV CvNormalBayesClassifier

I'm trying to use CvNormalBayesClassifier to train my program to learn skin pixel colors. I have a set of training images and response images. The response images are in black and white, skin regions are marked white. The following is my code,
CvNormalBayesClassifier classifier;
for (int i = 0; i < numFiles; i++) {
string trainFile = "images/" + int2str(i) + ".jpg";
string responseFile = "images/" + int2str(i) + "_mask.jpg";
Mat trainData = imread(trainFile, 1);
Mat responseData = imread(responseFile, CV_LOAD_IMAGE_GRAYSCALE);
trainData = trainData.reshape(1, trainData.rows * trainData.cols);
responseData = responseData.reshape(0, responseData.rows * responseData.cols);
trainData.convertTo(trainData, CV_32FC1);
responseData.convertTo(responseData, CV_32FC1);
classifier.train(trainData, responseData, Mat(), Mat(), i != 0);
}
However, it is giving the following error,
The function/feature is not implemented (In the current implementation the new training data must have absolutely the same set of class labels as used in the original training data) in CvNormalBayesClassifier::train
Many thanks.
As the error message states, you cannot 'update' the classifier in light of new class labels. The Normal Bayes Classifier learns a Mixture of Gaussians to represent the training data. If you suddenly start adding new labels this mixture model will cease to be correct and a new model must be learned from scratch.
Ok, I found that the problem was that the black and white images have been compressed and thus contain values ranging from 0-255. Therefore, there can be a new class label in the other images.
To solve this problem, use thresholding to make the value all become 0 or 255.