OpenCV get faces from image and predict with model - c++

The piece of code which retrieves faces from grayscale image (already converted to cv::Mat) works oddly,
what I'm doing wrong?
// in initializer list
model(cv::face::FisherFaceRecognizer::create())
// ....
const cv::Mat grayscale = cv::imread("photo_15.jpeg",cv::IMREAD_GRAYSCALE);
std::vector<cv::Rect> faceCandidates;
m_cascade.detectMultiScale(grayscale, faceCandidates);
uint32 label = -1;
double confidence = 0.0;
// this line for the testing purposes only
model->predict(grayscale, label, confidence);
this works fine : label refers to correct person and confidence within 10.
but lets continue with this function code:
for (auto &faceCandidateRegion : faceCandidates) {
cv::Mat faceResized;
// size_ is a member and contains 1280x720 for my case, equal to model trained photos.
cv::resize( cv::Mat(grayscale, faceCandidateRegion), faceResized, cv::Size(size_.width(), size_.height()));
// Recognize current face.
m_model->predict(faceResized, label, confidence);
// ... other processing
this piece of code works absolutely wrong: it always produces incorrect label and confidence is about ~45-46K even if I use a recognition photo from training photo set
any idea what I'm doing wrong here?
for the testing : I've tried to perform this with fisher, eigen and lbph with the same wrong result
update: each model in the app is a few user's group, where each user presented by 2-6 photos , so this is a reason why I train a few users in the model
here is a code which trains the models:
std::size_t
Recognizer::extractFacesAndConvertGrayscale(const QByteArray &rgb888, std::vector<cv::Mat> &faces)
{
cv::Mat frame = cv::imdecode(std::vector<char>{rgb888.cbegin(), rgb888.cend()}, cv::IMREAD_GRAYSCALE);
std::vector<cv::Rect> faceCandidates;
m_cascade.detectMultiScale(frame, faceCandidates);
int label = 0;
for(const auto &face : faceCandidates) {
cv::Mat faceResized;
cv::resize(cv::Mat{frame, face}, faceResized,
cv::Size(this->m_size.width(), this->m_size.height()));
faces.push_back(faceResized);
}
return faceCandidates.size();
}
bool Recognizer::train(const std::vector<qint32> &labels, const std::vector<QByteArray> &rgb888s)
{
if (labels.empty() || rgb888s.empty() || labels.size() != rgb888s.size())
return false;
std::vector<cv::Mat> mats = {};
std::vector<int32_t> processedLabels = {};
std::size_t i = 0;
for(const QByteArray &data : rgb888s)
{
std::size_t count = this->extractFacesAndConvertGrayscale(data, mats);
if (count)
std::fill_n(std::back_inserter(processedLabels), count, labels[i++]);
}
m_model->train(mats, processedLabels);
return true;
}

We resolved this in the comments, but for future reference:
The fact that this line
// this line for the testing purposes only
model->predict(grayscale, label, confidence);
had better confidence than
// Recognize current face.
m_model->predict(faceResized, label, confidence);
occurred because the model was trained with non-cropped images, while the detector crops the faces.
Rather than using the whole image with prediction, to match the input, the model should be trained with cropped faces:
The classifier performs independently of the size of the faces in the original image, due to the multiscale detection; i.e. size and position of the faces in the image become invariants.
Background does not interfere with classification. The original input had 16:9 aspect ratio, so at least the sides of the image would produce noise in the descriptors.

Related

Opencv - How to get number of vertical lines present in image (count of lines)

Firstly I integrate OpenCV framework to XCode and All the OpenCV code is on ObjectiveC and I am using in Swift Using bridging header. I am new to OpenCV Framework and trying to achieve count of vertical lines from the image.
Here is my code:
First I am converting the image to GrayScale
+ (UIImage *)convertToGrayscale:(UIImage *)image {
cv::Mat mat;
UIImageToMat(image, mat);
cv::Mat gray;
cv::cvtColor(mat, gray, CV_RGB2GRAY);
UIImage *grayscale = MatToUIImage(gray);
return grayscale;
}
Then, I am detecting edges so I can find the line of gray color
+ (UIImage *)detectEdgesInRGBImage:(UIImage *)image {
cv::Mat mat;
UIImageToMat(image, mat);
//Prepare the image for findContours
cv::threshold(mat, mat, 128, 255, CV_THRESH_BINARY);
//Find the contours. Use the contourOutput Mat so the original image doesn't get overwritten
std::vector<std::vector<cv::Point> > contours;
cv::Mat contourOutput = mat.clone();
cv::findContours( contourOutput, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE );
NSLog(#"Count =>%lu", contours.size());
//For Blue
/*cv::GaussianBlur(mat, gray, cv::Size(11, 11), 0); */
UIImage *grayscale = MatToUIImage(mat);
return grayscale;
}
This both Function is written on Objective C
Here, I am calling both function Swift
override func viewDidLoad() {
super.viewDidLoad()
let img = UIImage(named: "imagenamed")
let img1 = Wrapper.convert(toGrayscale: img)
self.capturedImageView.image = Wrapper.detectEdges(inRGBImage: img1)
}
I was doing this for some days and finding some useful documents(Reference Link)
OpenCV - how to count objects in photo?
How to count number of lines (Hough Trasnform) in OpenCV
OPENCV Documents
https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?#findcontours
Basically, I understand the first we need to convert this image to black and white, and then using cvtColor, threshold and findContours we can find the colors or lines.
I am attaching the image that vertical Lines I want to get.
Original Image
Output Image that I am getting
I got number of lines count =>10
I am not able to get accurate count here.
Please guide me on this. Thank You!
Since you want to detect the number of the vertical lines, there is a very simple approach I can suggest for you. You already got a clear output and I used this output in my code. Here are the steps before the code:
Preprocess the input image to get the lines clearly
Check each row and check until get a pixel whose value is higher than 100(threshold value I chose)
Then increase the line counter for that row
Continue on that line until get a pixel whose value is lower than 100
Restart from step 3 and finish the image for each row
At the end, check the most repeated element in the array which you assigned line numbers for each row. This number will be the number of vertical lines.
Note: If the steps are difficult to understand, think like this way:
" I am checking the first row, I found a pixel which is higher than
100, now this is a line edge starting, increase the counter for this
row. Search on this row until get a pixel smaller than 100, and then
research a pixel bigger than 100. when row is finished, assign the
line number for this row to a big array. Do this for all image. At the
end, since some lines looks like two lines at the top and also some
noises can occur, you should take the most repeated element in the big
array as the number of lines."
Here is the code part in C++:
#include <vector>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
int main()
{
cv::Mat img = cv::imread("/ur/img/dir/img.jpg",cv::IMREAD_GRAYSCALE);
std::vector<int> numberOfVerticalLinesForEachRow;
cv::Rect r(0,0,img.cols-10,200);
img = img(r);
bool blackCheck = 1;
for(int i=0; i<img.rows; i++)
{
int numberOfLines = 0;
for(int j=0; j<img.cols; j++)
{
if((int)img.at<uchar>(cv::Point(j,i))>100 && blackCheck)
{
numberOfLines++;
blackCheck = 0;
}
if((int)img.at<uchar>(cv::Point(j,i))<100)
blackCheck = 1;
}
numberOfVerticalLinesForEachRow.push_back(numberOfLines);
}
// In this part you need a simple algorithm to check the most repeated element
for(int k:numberOfVerticalLinesForEachRow)
std::cout<<k<<std::endl;
cv::namedWindow("WinWin",0);
cv::imshow("WinWin",img);
cv::waitKey(0);
}
Here's another possible approach. It relies mainly on the cv::thinning function from the extended image processing module to reduce the lines at a width of 1 pixel. We can crop a ROI from this image and count the number of transitions from 255 (white) to 0 (black). These are the steps:
Threshold the image using Otsu's method
Apply some morphology to clean up the binary image
Get the skeleton of the image
Crop a ROI from the center of the image
Count the number of jumps from 255 to 0
This is the code, be sure to include the extended image processing module (ximgproc) and also link it before compiling it:
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/ximgproc.hpp> // The extended image processing module
// Read Image:
std::string imagePath = "D://opencvImages//";
cv::Mat inputImage = cv::imread( imagePath+"IN2Xh.png" );
// Convert BGR to Grayscale:
cv::cvtColor( inputImage, inputImage, cv::COLOR_BGR2GRAY );
// Get binary image via Otsu:
cv::threshold( inputImage, inputImage, 0, 255, cv::THRESH_OTSU );
The above snippet produces the following image:
Note that there's a little bit of noise due to the thresholding, let's try to remove those isolated blobs of white pixels by applying some morphology. Maybe an opening, which is an erosion followed by dilation. The structuring elements and iterations, though, are not the same, and these where found by experimentation. I wanted to remove the majority of the isolated blobs without modifying too much the original image:
// Apply Morphology. Erosion + Dilation:
// Set rectangular structuring element of size 3 x 3:
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(3, 3) );
// Set the iterations:
int morphoIterations = 1;
cv::morphologyEx( inputImage, inputImage, cv::MORPH_ERODE, SE, cv::Point(-1,-1), morphoIterations);
// Set rectangular structuring element of size 5 x 5:
SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(5, 5) );
// Set the iterations:
morphoIterations = 2;
cv::morphologyEx( inputImage, inputImage, cv::MORPH_DILATE, SE, cv::Point(-1,-1), morphoIterations);
This combination of structuring elements and iterations yield the following filtered image:
Its looking alright. Now comes the main idea of the algorithm. If we compute the skeleton of this image, we would "normalize" all the lines to a width of 1 pixel, which is very handy, because we could reduce the image to a 1 x 1 (row) matrix and count the number of jumps. Since the lines are "normalized" we could get rid of possible overlaps between lines. Now, skeletonized images sometimes produce artifacts near the borders of the image. These artifacts resemble thickened anchors at the first and last row of the image. To prevent these artifacts we can extend borders prior to computing the skeleton:
// Extend borders to avoid skeleton artifacts, extend 5 pixels in all directions:
cv::copyMakeBorder( inputImage, inputImage, 5, 5, 5, 5, cv::BORDER_CONSTANT, 0 );
// Get the skeleton:
cv::Mat imageSkelton;
cv::ximgproc::thinning( inputImage, imageSkelton );
This is the skeleton obtained:
Nice. Before we count jumps, though, we must observe that the lines are skewed. If we reduce this image directly to a one row, some overlapping could indeed happen between to lines that are too skewed. To prevent this, I crop a middle section of the skeleton image and count transitions there. Let's crop the image:
// Crop middle ROI:
cv::Rect linesRoi;
linesRoi.x = 0;
linesRoi.y = 0.5 * imageSkelton.rows;
linesRoi.width = imageSkelton.cols;
linesRoi.height = 1;
cv::Mat imageROI = imageSkelton( linesRoi );
This would be the new ROI, which is just the middle row of the skeleton image:
Let me prepare a BGR copy of this just to draw some results:
// BGR version of the Grayscale ROI:
cv::Mat colorROI;
cv::cvtColor( imageROI, colorROI, cv::COLOR_GRAY2BGR );
Ok, let's loop through the image and count the transitions between 255 and 0. That happens when we look at the value of the current pixel and compare it with the value obtained an iteration earlier. The current pixel must be 0 and the past pixel 255. There's more than a way to loop through a cv::Mat in C++. I prefer to use cv::MatIterator_s and pointer arithmetic:
// Set the loop variables:
cv::MatIterator_<cv::Vec3b> it, end;
uchar pastPixel = 0;
int jumpsCounter = 0;
int i = 0;
// Loop thru image ROI and count 255-0 jumps:
for (it = imageROI.begin<cv::Vec3b>(), end = imageROI.end<cv::Vec3b>(); it != end; ++it) {
// Get current pixel
uchar &currentPixel = (*it)[0];
// Compare it with past pixel:
if ( (currentPixel == 0) && (pastPixel == 255) ){
// We have a jump:
jumpsCounter++;
// Draw the point on the BGR version of the image:
cv::line( colorROI, cv::Point(i, 0), cv::Point(i, 0), cv::Scalar(0, 0, 255), 1 );
}
// current pixel is now past pixel:
pastPixel = currentPixel;
i++;
}
// Show image and print number of jumps found:
cv::namedWindow( "Jumps Found", CV_WINDOW_NORMAL );
cv::imshow( "Jumps Found", colorROI );
cv::waitKey( 0 );
std::cout<<"Jumps Found: "<<jumpsCounter<<std::endl;
The points where the jumps were found are drawn in red, and the number of total jumps printed is:
Jumps Found: 9

How to form data for SVM training OpenCV3

I am trying to write utility for training svm classifier for image classification in OpenCV3. But I have Floating point exception (core dumped) error during training process.
My main problem is that I don't know, I'm not sure exactly how to form training data to feed svm.train method.
This is code which is forming training data.
TrainingDataType SVMTrainer::prepareDataForTraining() {
cv::Mat trainingData(m_numOfAllImages, 28*28, CV_32FC1);
cv::Mat trainingLabels(m_numOfAllImages, 1, CV_32FC1);
int rowNum = 0;
// Item is pair of classId (int) and vector of images.
for(auto item : m_data){
int classId = item.first;
for(auto item1 : item.second){
Mat temp = item1.reshape(1,1);
temp.copyTo(trainingData.row(rowNum));
trainingLabels.at<float>(rowNum) = item.first;
++rowNum;
}
}
return cv::ml::TrainData::create(trainingData,
cv::ml::SampleTypes::ROW_SAMPLE,
trainingLabels) ;
}
void SVMTrainer::train(std::string& configPath){
// Read and store images in memory.
formClassifierData(configPath);
m_classifier = cv::ml::SVM::create();
// Training parameters:
m_classifier->setType(cv::ml::SVM::C_SVC);
m_classifier->setKernel(cv::ml::SVM::POLY);
m_classifier->setGamma(3);
m_classifier->setDegree(3);
TrainingDataType trainData = prepareDataForTraining();
m_classifier->trainAuto(trainData);
}
All images are already prepared with dimensions 28*28, black&white.
And actual train call is in this method
Can somebody tell me what I am doing wrong.
Thanks,
Its simple. Change the label format to CV_32SC1. It will definitely resolve your issue in opencv 3.0 ml.

How do I update this Neural Net to use image pixel data

I'm learning Neural Networks from this bytefish machine learning guide and code. I understand it well but I would like to update the code at the previous link to use image pixel data instead of random values as the input data. In this section of the aforementioned code:
cv::randu(trainingData,0,1);
cv::randu(testData,0,1);
the training and test matrices are filled with random data. Then label data is added to the classes matrices here:
cv::Mat trainingClasses = labelData(trainingData, eq);
cv::Mat testClasses = labelData(testData, eq);
using this function:
// label data with equation
cv::Mat labelData(cv::Mat points, int equation) {
cv::Mat labels(points.rows, 1, CV_32FC1);
for(int i = 0; i < points.rows; i++) {
float x = points.at<float>(i,0);
float y = points.at<float>(i,1);
labels.at<float>(i, 0) = f(x, y, equation);
// the f() function used above
//is only a case statement with 5
//switches in it eg on of the switches is:
//case 0:
//return y > sin(x*10) ? -1 : 1;
//break;
}
return labels;
}
Then points are plotted in a window here:
plot_binary(trainingData, trainingClasses, "Training Data");
plot_binary(testData, testClasses, "Test Data");
with this function:
;; Plot Data and Class function
void plot_binary(cv::Mat& data, cv::Mat& classes, string name) {
cv::Mat plot(size, size, CV_8UC3);
plot.setTo(cv::Scalar(255.0,255.0,255.0));
for(int i = 0; i < data.rows; i++) {
float x = data.at<float>(i,0) * size;
float y = data.at<float>(i,1) * size;
if(classes.at<float>(i, 0) > 0) {
cv::circle(plot, Point(x,y), 2, CV_RGB(255,0,0),1);
} else {
cv::circle(plot, Point(x,y), 2, CV_RGB(0,255,0),1);
}
}
imshow(name, plot);
}
The plotted points, as I understand it, represent the input data multiplied by the equations in the f() function and is used by the predict functions to predict which point to plot in the mlp, knn, svm etc. functions. How do I update what is going on here to do something with Image pixel data. Any advice to get me farther would be appreciated.
"How do I update what is going on here to do something with Image pixel data" is a broad and generic question. May I ask in exchange: what do you want to do with "Image pixel data"?
Do you want an answer to: what can be done with "Image pixel data" on machine learning algorithms like ANN, SVM etc. ?
The answer is a loooong list of things encompassing thousands of research papers and hundreds of PhD theses. Some examples include: supervised and/or un-supervised classification of images into labels/tags/categories based on features like image content, objects in image, patterns in image etc. The possibilities are endless. You may perhaps want to take a look at this: http://stuff.mit.edu/afs/athena/course/urop/profit/PDFS/EdwardTolson.pdf
Now, coming back to you original objective: "I would like to update the code at the previous link to use image pixel data instead of random values as the input data"...
The implementation technique would depend largely on what you want to do. I can cite one/two easy techniques for extracting feature vectors from image, which can be fed into any machine learning algorithm of your choice...
Example 1:
You may start with using pixel intensity data as a feature vector. Here's how you may go ahead with it:
Load image using
Mat image = imread(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
Resize image into a smaller area using resize. You may want to begin with small image sizes, like 8x8 or 10x10 pixels.
Loop through the image matrix, somewhat like this:
for(int row = 0; row < img.rows; ++row)
{
uchar* p = img.ptr(row);
for(int col = 0; col < img.cols; ++col)
{
*p++ //points to each pixel value in turn assuming a CV_8UC1 greyscale image
}
}
A collection of all the pixel values will give you a feature vector for that image.
Now suppose you have two classes of image. For each set of feature vector you generate, you'll have to prepare (for supervised classification) a corresponding label Mat (somewhat like the example you've mentioned). It needs to contain the class label (say, 0 and 1) for all the feature vectors present in your feature Mat.
Now feed the feature vectors and label Mat to your machine learning code and see what happens.
However, the ability of image classification based on image pixel data alone is quite limited. There are thousands of techniques for extracting image features, most of which are dependent on the application area.
Example 2:
I'll finish off with one more example for extracting feature vectors, which, in some cases, will prove to be more effective than simple image pixel values.
You may use the Histogram of Oriented Gradients descriptor for slightly better results, use this:
cv::HOGDescriptor hog;
vector<float> descriptors;
hog.compute(mat, descriptors);
The vector descriptors is your feature vector.
HOGDescriptors, when used with SVM, provides a decent classification mechanism.
You can put the pixel data of an image into a Mat called trainingData using something similar to this:
cv::Mat labelData(cv::Mat points, int equation)
{
cv::Mat labels(points.rows, 1, CV_32FC1);
for(int i = 0; i < points.rows; i++)
{
float x = points.at<float>(i,0);
float y = points.at<float>(i,1);
labels.at<float>(i, 0) = f(x, y, equation);
}
return labels;
}
Now, instead of labelData, we're going to return a Mat of pixel data. One obvious way is to use the image itself as a feature vector. However, some machine learning algorithms in openCV, including ANN, SVM etc., required special formatting of input data.
You may try something like this:
cv::Mat trainingData(cv::Mat image)
{
cv::Mat trainingVector(image.rows*image.cols, 1, CV_32FC1);
for(int i = 0; i < image.rows; i++)
{
for(int j = 0; j < image.cols; j++)
{
float valueOfPixel = image.at<float>(i,j);
trainingVector.at<float>((i*image.cols)+j, 0) = valueOfPixel;
}
}
return trainingVector;
}
(Please recheck the syntax of the code before using, I just typed it out here)
So, what the above block effectively does is change the 2D matrix of the image into a 1D array. Now, how and where you use it depends on your requirements.
Please make necessary modifications before invoking the machine learning modules.
Hope this answers your question.
Thanks.

How to detect object from video using SVM

This is my code for training the dataset of for example vehicles , when it train fully , i want it to predict the data(vehicle) from video(.avi) , how to predict trained data from video and how to add that part in it ? , i want that when the vehicle is shown in the video it count it as 1 and cout that the object is detected and if second vehicle come it increment the count as 2
IplImage *img2;
cout<<"Vector quantization..."<<endl;
collectclasscentroids();
vector<Mat> descriptors = bowTrainer.getDescriptors();
int count=0;
for(vector<Mat>::iterator iter=descriptors.begin();iter!=descriptors.end();iter++)
{
count += iter->rows;
}
cout<<"Clustering "<<count<<" features"<<endl;
//choosing cluster's centroids as dictionary's words
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
cout<<"extracting histograms in the form of BOW for each image "<<endl;
Mat labels(0, 1, CV_32FC1);
Mat trainingData(0, dictionarySize, CV_32FC1);
int k = 0;
vector<KeyPoint> keypoint1;
Mat bowDescriptor1;
//extracting histogram in the form of bow for each image
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch,"%s%d%s%d%s","train/",j," (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName, 0);
detector.detect(img2, keypoint1);
bowDE.compute(img2, keypoint1, bowDescriptor1);
trainingData.push_back(bowDescriptor1);
labels.push_back((float) j);
}
//Setting up SVM parameters
CvSVMParams params;
params.kernel_type = CvSVM::RBF;
params.svm_type = CvSVM::C_SVC;
params.gamma = 0.50625000000000009;
params.C = 312.50000000000000;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
CvSVM svm;
printf("%s\n", "Training SVM classifier");
bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);
cout<<"Processing evaluation data..."<<endl;
Mat groundTruth(0, 1, CV_32FC1);
Mat evalData(0, dictionarySize, CV_32FC1);
k = 0;
vector<KeyPoint> keypoint2;
Mat bowDescriptor2;
Mat results(0, 1, CV_32FC1);;
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch, "%s%d%s%d%s", "eval/", j, " (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName,0);
detector.detect(img2, keypoint2);
bowDE.compute(img2, keypoint2, bowDescriptor2);
evalData.push_back(bowDescriptor2);
groundTruth.push_back((float) j);
float response = svm.predict(bowDescriptor2);
results.push_back(response);
}
//calculate the number of unmatched classes
double errorRate = (double) countNonZero(groundTruth- results) / evalData.rows;
The question isThis code is not predicting from video , i want to know how to predict it from the video , mean like i want to detect the vehicle from movie , like it should show 1 when it find the vehicle from movie
For those who didn't understand the question :
I want to play a movie in above code
VideoCapture cap("movie.avi"); //movie.avi is with deleted background
Suppose i have a trained data which contain vehicle's , and "movie.avi" contain 5 vehicles , so it should detect that vehicles from the movie.avi and give me 5 as output
How to do this part in the above code
From looking at your code setup
params.svm_type = CvSVM::C_SVC;
it appears that you train your classifier with more than two classes. A typical example in traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video it's hard to tell, if your idea makes sense. I guess what the previous answers are assuming is the following:
You loop through each frame and want to output the number of cars in that frame. Thus, a frame may contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", even if the setup might be a little off, conceptually. You cannot retrieve the number of cars reliably with this approach.
Instead, the suggestion is to try a sliding-window approach. This means, for example, you loop over each pixel of the frame and take the region around the pixel (called sub-window or region-of-interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50px as well as your training data would. You might fixate the scale of the cars in your training data, but in real-world videos, the cars will be of different size. In order to find a car of different scale, let's say it's two-times as large as in the training data, the typical approach is to scale the image (say with a factor of 2) and repeat the sliding-window approach.
By repeating this for all relevant scales you end up with an algorithm that gives you for each pixel location and each scale the result of your classifier. This means you have three loops, or, in other words, there are three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say 2) the image gets smaller (/larger) and the next scale is an image of different size (for eample half the size).
The pixel locations indicate the position of the car and the scale indicates the size of it. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), then you would end up with each slot containing 0 or 1. Even in this simple case, where you would be tempted to simply count the number of 1 and output the count as the number of cars, you still have the problem that there could be multiple responses for the same car. Thus, it would be better if you had a car detector that gives continous responses between 0 and 1 and then you could find maxima in this pyramid. Each maximum would indicate a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
To summarize, no matter if you are simplifying the problem to a binary classification problem ("car"/"no car"), or if you are sticking to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have the problem of scale and location in each frame to solve.
The code you have for using images is written with OpenCV's C interface so it's probably easy to stick with that rather than use the C++ video interface.
In which case somthing along these lines should work:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *img = 0;
while(img = cvQueryFrame(capture))
{
// Process image
...
}
You should implement a sliding window approach. In each window, you should apply the SVM to get candidates. Then, once you've done it for the whole image, you should merge the candidates (if you detected an object, then it is very likely that you'll detect it again in shift of a few pixels - that's the meaning of candidates).
Take a look at the V&J code at openCV or the latentSVM code (detection by parts) to see how it's done there.
By the way, I would use the LatentSVM code (detection by parts) to detect vehicles. It has trained models for cars and for buses.
Good luck.
You need detector, not classifier. Take a look at Haar cascades, LBP cascades, latentSVM, as mentioned before or HOG detector.
I'll explain why. Detector usually scan image by sliding window, line by line. In several scales. In every window detector solve problem: "object/not object". It may give you rough results but very fast. Classifiers such as BOW, works very slow for this task. Then you should apply classifiers to regions, found by detector.

Updating OpenCV CvNormalBayesClassifier

I'm trying to use CvNormalBayesClassifier to train my program to learn skin pixel colors. I have a set of training images and response images. The response images are in black and white, skin regions are marked white. The following is my code,
CvNormalBayesClassifier classifier;
for (int i = 0; i < numFiles; i++) {
string trainFile = "images/" + int2str(i) + ".jpg";
string responseFile = "images/" + int2str(i) + "_mask.jpg";
Mat trainData = imread(trainFile, 1);
Mat responseData = imread(responseFile, CV_LOAD_IMAGE_GRAYSCALE);
trainData = trainData.reshape(1, trainData.rows * trainData.cols);
responseData = responseData.reshape(0, responseData.rows * responseData.cols);
trainData.convertTo(trainData, CV_32FC1);
responseData.convertTo(responseData, CV_32FC1);
classifier.train(trainData, responseData, Mat(), Mat(), i != 0);
}
However, it is giving the following error,
The function/feature is not implemented (In the current implementation the new training data must have absolutely the same set of class labels as used in the original training data) in CvNormalBayesClassifier::train
Many thanks.
As the error message states, you cannot 'update' the classifier in light of new class labels. The Normal Bayes Classifier learns a Mixture of Gaussians to represent the training data. If you suddenly start adding new labels this mixture model will cease to be correct and a new model must be learned from scratch.
Ok, I found that the problem was that the black and white images have been compressed and thus contain values ranging from 0-255. Therefore, there can be a new class label in the other images.
To solve this problem, use thresholding to make the value all become 0 or 255.