I'm trying to use CvNormalBayesClassifier to train my program to learn skin pixel colors. I have a set of training images and response images. The response images are black and white; skin regions are marked white. The following is my code:
CvNormalBayesClassifier classifier;
for (int i = 0; i < numFiles; i++) {
    string trainFile = "images/" + int2str(i) + ".jpg";
    string responseFile = "images/" + int2str(i) + "_mask.jpg";

    // Load the colour training image and its grayscale mask.
    Mat trainData = imread(trainFile, 1);
    Mat responseData = imread(responseFile, CV_LOAD_IMAGE_GRAYSCALE);

    // One row per pixel: three columns (B, G, R) for the samples, one column for the responses.
    trainData = trainData.reshape(1, trainData.rows * trainData.cols);
    responseData = responseData.reshape(0, responseData.rows * responseData.cols);
    trainData.convertTo(trainData, CV_32FC1);
    responseData.convertTo(responseData, CV_32FC1);

    // Train from scratch on the first file, then update the model on every later iteration.
    classifier.train(trainData, responseData, Mat(), Mat(), i != 0);
}
However, it gives the following error:
The function/feature is not implemented (In the current implementation the new training data must have absolutely the same set of class labels as used in the original training data) in CvNormalBayesClassifier::train
Many thanks.
As the error message states, you cannot 'update' the classifier in light of new class labels. The Normal Bayes classifier fits a Gaussian model to each class in the training data. If you suddenly start adding new labels, that set of per-class models ceases to be correct and a new model must be learned from scratch.
OK, I found that the problem was that the black-and-white mask images had been compressed as JPEG and therefore contain values ranging from 0 to 255, not just 0 and 255. As a result, another image can appear to introduce a new class label.
To solve this problem, threshold the masks so that every value becomes either 0 or 255.
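A minimal sketch of that fix, applied where the mask is loaded (the 128 cutoff is an arbitrary midpoint):
// Force the mask to contain only the two class labels 0 and 255.
Mat responseData = imread(responseFile, CV_LOAD_IMAGE_GRAYSCALE);
threshold(responseData, responseData, 128, 255, THRESH_BINARY);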
The piece of code below, which retrieves faces from a grayscale image (already converted to cv::Mat), works oddly. What am I doing wrong?
// in the initializer list
model(cv::face::FisherFaceRecognizer::create())
// ....
const cv::Mat grayscale = cv::imread("photo_15.jpeg", cv::IMREAD_GRAYSCALE);
std::vector<cv::Rect> faceCandidates;
m_cascade.detectMultiScale(grayscale, faceCandidates);

int label = -1;          // predict() expects an int label
double confidence = 0.0;
// this line for testing purposes only
model->predict(grayscale, label, confidence);
This works fine: the label refers to the correct person and the confidence is within 10.
But let's continue with the function:
for (auto &faceCandidateRegion : faceCandidates) {
    cv::Mat faceResized;
    // size_ is a member and contains 1280x720 in my case, equal to the model's training photos.
    cv::resize(cv::Mat(grayscale, faceCandidateRegion), faceResized,
               cv::Size(size_.width(), size_.height()));
    // Recognize the current face.
    m_model->predict(faceResized, label, confidence);
    // ... other processing
}
This piece of code works absolutely wrong: it always produces an incorrect label, and the confidence is around 45-46K, even when I use a recognition photo from the training set.
Any idea what I'm doing wrong here?
For testing, I've tried this with the Fisher, Eigen, and LBPH recognizers, all with the same wrong result.
Update: each model in the app represents a small group of users, and each user is represented by 2-6 photos; that is why I train several users into one model.
Here is the code which trains the models:
std::size_t
Recognizer::extractFacesAndConvertGrayscale(const QByteArray &rgb888, std::vector<cv::Mat> &faces)
{
    cv::Mat frame = cv::imdecode(std::vector<char>{rgb888.cbegin(), rgb888.cend()}, cv::IMREAD_GRAYSCALE);
    std::vector<cv::Rect> faceCandidates;
    m_cascade.detectMultiScale(frame, faceCandidates);
    for (const auto &face : faceCandidates) {
        cv::Mat faceResized;
        // Crop each detected face and resize it to the recognizer's training size.
        cv::resize(cv::Mat{frame, face}, faceResized,
                   cv::Size(this->m_size.width(), this->m_size.height()));
        faces.push_back(faceResized);
    }
    return faceCandidates.size();
}
bool Recognizer::train(const std::vector<qint32> &labels, const std::vector<QByteArray> &rgb888s)
{
    if (labels.empty() || rgb888s.empty() || labels.size() != rgb888s.size())
        return false;

    std::vector<cv::Mat> mats = {};
    std::vector<int32_t> processedLabels = {};
    std::size_t i = 0;
    for (const QByteArray &data : rgb888s)
    {
        std::size_t count = this->extractFacesAndConvertGrayscale(data, mats);
        // One label per extracted face; advance the label index for every photo,
        // even when no face is found, so labels stay aligned with their photos.
        if (count)
            std::fill_n(std::back_inserter(processedLabels), count, labels[i]);
        ++i;
    }
    m_model->train(mats, processedLabels);
    return true;
}
We resolved this in the comments, but for future reference:
The fact that this line
// this line for the testing purposes only
model->predict(grayscale, label, confidence);
had better confidence than
// Recognize current face.
m_model->predict(faceResized, label, confidence);
occurred because the model was trained with non-cropped images, while the detector crops the faces.
Rather than using the whole image for prediction, the model should be trained with cropped faces so that training and prediction inputs match:
The classifier then performs independently of the size of the faces in the original image, thanks to the multiscale detection; i.e. the size and position of the faces in the image become invariants.
Background does not interfere with classification. The original input had a 16:9 aspect ratio, so at least the sides of the image would add noise to the descriptors.
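A minimal sketch of that preprocessing, meant to be shared by the training and prediction paths. The cascade and targetSize names are placeholders; targetSize should match whatever size the recognizer was trained on:
// Detect, crop and resize one face so training and prediction see the same kind of input.
cv::Mat preprocessFace(const cv::Mat &grayscale, cv::CascadeClassifier &cascade, cv::Size targetSize)
{
    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(grayscale, faces);
    if (faces.empty())
        return cv::Mat();               // caller must check for an empty result

    cv::Mat face = grayscale(faces.front()).clone();
    cv::resize(face, face, targetSize); // same size for every training and test sample
    return face;
}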
I am trying to write a utility for training an SVM classifier for image classification in OpenCV 3, but I get a Floating point exception (core dumped) error during training.
My main problem is that I'm not sure exactly how to form the training data to feed the svm train method.
This is the code which forms the training data:
TrainingDataType SVMTrainer::prepareDataForTraining() {
    cv::Mat trainingData(m_numOfAllImages, 28*28, CV_32FC1);
    cv::Mat trainingLabels(m_numOfAllImages, 1, CV_32FC1);
    int rowNum = 0;

    // Item is a pair of classId (int) and a vector of images.
    for (auto item : m_data) {
        int classId = item.first;
        for (auto item1 : item.second) {
            Mat temp = item1.reshape(1, 1);
            temp.copyTo(trainingData.row(rowNum));
            trainingLabels.at<float>(rowNum) = classId;
            ++rowNum;
        }
    }
    return cv::ml::TrainData::create(trainingData,
                                     cv::ml::SampleTypes::ROW_SAMPLE,
                                     trainingLabels);
}

void SVMTrainer::train(std::string& configPath) {
    // Read and store images in memory.
    formClassifierData(configPath);

    m_classifier = cv::ml::SVM::create();
    // Training parameters:
    m_classifier->setType(cv::ml::SVM::C_SVC);
    m_classifier->setKernel(cv::ml::SVM::POLY);
    m_classifier->setGamma(3);
    m_classifier->setDegree(3);

    TrainingDataType trainData = prepareDataForTraining();
    m_classifier->trainAuto(trainData);
}
All images are already prepared with dimensions 28x28, black and white, and the actual train call is in the train method above.
Can somebody tell me what I am doing wrong?
Thanks.
It's simple: change the label format to CV_32SC1. That will resolve your issue in the OpenCV 3.0 ml module.
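A minimal sketch of that change against the code above; only the label matrix type and the write into it change:
// Labels for cv::ml classifiers must be 32-bit signed integers (CV_32S), one per sample row.
cv::Mat trainingLabels(m_numOfAllImages, 1, CV_32SC1);
// ... and inside the loop over images:
trainingLabels.at<int>(rowNum) = classId;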
I'm trying to create a functional SVM. I have 114 training images (60 positive, 54 negative) and 386 testing images for the SVM to predict against.
I read the training image features into a float array like this:
trainingDataFloat[i][0] = trainFeatures.rows;
trainingDataFloat[i][1] = trainFeatures.cols;
And the same for the testing images too:
testDataFloat[i][0] = testFeatures.rows;
testDataFloat[i][2] = testFeatures.cols;
Then, using Micka's answer to this question, I turn testDataFloat into a one-dimensional array and feed it to a Mat like this, so as to predict with the SVM:
float* testData1D = (float*)testDataFloat;
Mat testDataMat1D(height*width, 1, CV_32FC1, testData1D);
float testPredict = SVMmodel.predict(testDataMat1D);
Once this was all in place, I get the debug error:
Sizes of input arguments do not match (the sample size is different from what has been used for training) in cvPreparePredictData
Looking at this post I found (thanks to berak) that:
"all images (used in training & prediction) have to be the same size"
So I included a resize function that resizes the images to be square at whatever size you wish (100x100, 200x200, 1000x1000, etc.).
Running it again with the resized images, which the program now loads from a new directory, I get the exact same error as before:
Sizes of input arguments do not match (the sample size is different from what has been used for training) in cvPreparePredictData
I just have no idea anymore what to do. Why is it still throwing that error?
EDIT
I changed
Mat testDataMat1D(TestDFheight*TestDFwidth, 1, CV_32FC1, testData1D);
to
Mat testDataMat1D(1, TestDFheight*TestDFwidth, CV_32FC1, testData1D);
and placed the .predict inside the loop that feeds the features to the float array, so that each image is given to .predict individually, because of this question. With the two ints swapped so that .cols = 1 and .rows = TestDFheight*TestDFwidth, the program seems to actually run, but then it stops on image 160 (.exe has stopped working)... So that's a new concern.
EDIT 2
I added a simple
std::cout << testPredict;
to view the output of the SVM, and it seems to match everything as positive until image 160, where it stops running.
Please check your training and test feature vectors.
I'm assuming your feature data is some form of cv::Mat containing features on each row.
In which case you want your training matrix to be a concatenation of each feature matrix from each image.
These lines don't look right:
trainingDataFloat[i][0] = trainFeatures.rows;
trainingDataFloat[i][1] = trainFeatures.cols;
This is setting an element of a 2d matrix to the number of rows and columns in trainFeatures. This has nothing to do with the actual data that is in the trainFeatures matrix.
What are you trying to detect? If each image is a positive or negative example, are you trying to detect something in the image as a whole? What are your features?
If you're trying to detect an object in the image on a per image basis, then you need a feature vector describing the whole image in one vector. In which case you'd do something like this with your training data:
int N;            // Set to the number of images you plan on using for training
int feature_size; // Set to the number of features extracted from each image. Must be constant across all images.

cv::Mat X = cv::Mat::zeros(N, feature_size, CV_32F); // Feature matrix
cv::Mat Y = cv::Mat::zeros(N, 1, CV_32F);            // Label vector

// Now use a for loop to copy data into X and Y; Y = +1 for positive examples and -1 for negative examples.
for (int i = 0; i < trainImages.size(); ++i)
{
    // features is a 1 x feature_size cv::Mat row vector of the extracted features.
    // Use copyTo rather than operator= so the data actually lands in X's row.
    trainImages[i].features.copyTo(X.row(i));
    Y.at<float>(i) = trainImages[i].isPositive ? 1.f : -1.f;
}
// Now train your SVM on X and Y.
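For completeness, a hedged sketch of the training and prediction calls that could follow, using the OpenCV 2.x CvSVM interface that the question's SVMmodel.predict call suggests (the kernel and parameters here are placeholders):
// Minimal training call: X holds one feature row per image, Y the +1/-1 labels.
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;

CvSVM svm;
svm.train(X, Y, cv::Mat(), cv::Mat(), params);

// Prediction then takes a single 1 x feature_size row, the same width used for training.
float response = svm.predict(X.row(0));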
This is my code for training a dataset of, for example, vehicles. Once it is fully trained, I want it to predict the data (vehicle) from a video (.avi). How do I predict the trained data from a video, and how do I add that part to the code? I want it so that when a vehicle is shown in the video it counts it as 1 and prints that the object is detected, and when a second vehicle comes it increments the count to 2.
IplImage *img2;
cout<<"Vector quantization..."<<endl;
collectclasscentroids();
vector<Mat> descriptors = bowTrainer.getDescriptors();
int count = 0;
for (vector<Mat>::iterator iter = descriptors.begin(); iter != descriptors.end(); iter++)
{
    count += iter->rows;
}
cout<<"Clustering "<<count<<" features"<<endl;
//choosing cluster's centroids as dictionary's words
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
cout<<"extracting histograms in the form of BOW for each image "<<endl;
Mat labels(0, 1, CV_32FC1);
Mat trainingData(0, dictionarySize, CV_32FC1);
int k = 0;
vector<KeyPoint> keypoint1;
Mat bowDescriptor1;
//extracting histogram in the form of bow for each image
for (j = 1; j <= 4; j++)
    for (i = 1; i <= 60; i++)
    {
        sprintf(ch, "%s%d%s%d%s", "train/", j, " (", i, ").jpg");
        const char* imageName = ch;
        img2 = cvLoadImage(imageName, 0);
        detector.detect(img2, keypoint1);
        bowDE.compute(img2, keypoint1, bowDescriptor1);
        trainingData.push_back(bowDescriptor1);
        labels.push_back((float) j);
    }
//Setting up SVM parameters
CvSVMParams params;
params.kernel_type = CvSVM::RBF;
params.svm_type = CvSVM::C_SVC;
params.gamma = 0.50625000000000009;
params.C = 312.50000000000000;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
CvSVM svm;
printf("%s\n", "Training SVM classifier");
bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);
cout<<"Processing evaluation data..."<<endl;
Mat groundTruth(0, 1, CV_32FC1);
Mat evalData(0, dictionarySize, CV_32FC1);
k = 0;
vector<KeyPoint> keypoint2;
Mat bowDescriptor2;
Mat results(0, 1, CV_32FC1);
for (j = 1; j <= 4; j++)
    for (i = 1; i <= 60; i++)
    {
        sprintf(ch, "%s%d%s%d%s", "eval/", j, " (", i, ").jpg");
        const char* imageName = ch;
        img2 = cvLoadImage(imageName, 0);
        detector.detect(img2, keypoint2);
        bowDE.compute(img2, keypoint2, bowDescriptor2);
        evalData.push_back(bowDescriptor2);
        groundTruth.push_back((float) j);
        float response = svm.predict(bowDescriptor2);
        results.push_back(response);
    }
//calculate the number of unmatched classes
double errorRate = (double) countNonZero(groundTruth- results) / evalData.rows;
The question is: this code does not predict from video. I want to know how to predict from the video, i.e. I want to detect the vehicles in a movie, and it should output 1 when it finds a vehicle in the movie.
For those who didn't understand the question:
I want to play a movie in the above code
VideoCapture cap("movie.avi"); //movie.avi is with deleted background
Suppose I have trained data which contains vehicles, and "movie.avi" contains 5 vehicles; the code should detect those vehicles in movie.avi and give me 5 as output.
How do I do this part in the above code?
From looking at your code setup
params.svm_type = CvSVM::C_SVC;
it appears that you train your classifier with more than two classes. A typical example in traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video it's hard to tell, if your idea makes sense. I guess what the previous answers are assuming is the following:
You loop through each frame and want to output the number of cars in that frame. Thus, a frame may contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", even if the setup might be a little off, conceptually. You cannot retrieve the number of cars reliably with this approach.
Instead, the suggestion is to try a sliding-window approach. This means, for example, you loop over each pixel of the frame and take the region around the pixel (called sub-window or region-of-interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50 px, just like your training data. You might fix the scale of the cars in your training data, but in real-world videos the cars will be of different sizes. In order to find a car of a different scale, let's say twice as large as in the training data, the typical approach is to scale the image (say by a factor of 2) and repeat the sliding-window approach.
By repeating this for all relevant scales you end up with an algorithm that gives you, for each pixel location and each scale, the result of your classifier. This means you have three loops, or, in other words, there are three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say by a factor of 2) the image gets smaller (or larger), and the next scale is an image of a different size (for example half the size).
The pixel locations indicate the position of the car and the scale indicates its size. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), then each slot would contain 0 or 1. Even in this simple case, where you would be tempted to simply count the number of 1s and output the count as the number of cars, you still have the problem that there can be multiple responses for the same car. Thus, it would be better to have a car detector that gives continuous responses between 0 and 1, so that you could find maxima in this pyramid. Each maximum would indicate a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
To summarize, no matter if you are simplifying the problem to a binary classification problem ("car"/"no car"), or if you are sticking to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have the problem of scale and location in each frame to solve.
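To make the sliding-window idea concrete, here is a minimal sketch. The window size, stride and scale factor are placeholders, and classifyWindow is a hypothetical stand-in for whatever per-window classifier you use (e.g. computing a BOW descriptor for the window and calling svm.predict on it):
#include <opencv2/opencv.hpp>

// Placeholder: replace with your own per-window classifier.
static bool classifyWindow(const cv::Mat &window)
{
    (void)window;
    return false;
}

// Scan one frame with a fixed-size window over several scales of an image pyramid.
std::vector<cv::Rect> slidingWindowDetect(const cv::Mat &frame)
{
    const cv::Size win(150, 50);   // window size, matching the training data
    const int stride = 8;          // step in pixels between neighbouring windows
    const double scaleStep = 1.25; // pyramid scale factor between levels

    std::vector<cv::Rect> detections;
    cv::Mat scaled = frame.clone();
    double scale = 1.0;

    while (scaled.cols >= win.width && scaled.rows >= win.height)
    {
        for (int y = 0; y + win.height <= scaled.rows; y += stride)
            for (int x = 0; x + win.width <= scaled.cols; x += stride)
            {
                cv::Rect roi(x, y, win.width, win.height);
                if (classifyWindow(scaled(roi)))
                    // Map the window back to coordinates in the original frame.
                    detections.push_back(cv::Rect(cvRound(x * scale), cvRound(y * scale),
                                                  cvRound(win.width * scale), cvRound(win.height * scale)));
            }
        // Move to the next pyramid level: shrink the original frame a little more.
        scale *= scaleStep;
        cv::resize(frame, scaled, cv::Size(), 1.0 / scale, 1.0 / scale);
    }
    return detections;
}
Overlapping hits on the same car can then be merged, for example with cv::groupRectangles or a simple non-maximum suppression, before counting.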
The code you have for using images is written with OpenCV's C interface, so it's probably easiest to stick with that rather than use the C++ video interface.
In that case, something along these lines should work:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *img = 0;
while ((img = cvQueryFrame(capture)) != NULL)
{
    // Process image
    ...
}
cvReleaseCapture(&capture);
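A hedged sketch of what the per-frame processing could look like, reusing the detector, bowDE and svm objects from the training code above. It treats each whole frame as one sample, which only makes sense if the frame shows a single vehicle; otherwise see the sliding-window answers below:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *frame = 0;
int vehicleCount = 0;

while ((frame = cvQueryFrame(capture)) != NULL)
{
    cv::Mat gray = cv::cvarrToMat(frame);
    if (gray.channels() == 3)
        cv::cvtColor(gray, gray, CV_BGR2GRAY);

    // Same pipeline as in training: keypoints -> BOW histogram -> SVM.
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat bowDescriptor;
    detector.detect(gray, keypoints);
    bowDE.compute(gray, keypoints, bowDescriptor);

    if (!bowDescriptor.empty())
    {
        float response = svm.predict(bowDescriptor);
        if (response == 1.0f)  // assuming class 1 is "vehicle" in the training labels
        {
            ++vehicleCount;
            std::cout << "vehicle detected, count = " << vehicleCount << std::endl;
        }
    }
}
cvReleaseCapture(&capture);
Note that this counts frames in which a vehicle is classified, not distinct vehicles; counting distinct vehicles requires localisation and tracking, as the other answers explain.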
You should implement a sliding-window approach. In each window, apply the SVM to get candidates. Then, once you've done it for the whole image, merge the candidates (if you detected an object, it is very likely that you'll detect it again shifted by a few pixels; that's the meaning of candidates).
Take a look at the V&J (Viola-Jones) code in OpenCV or the latentSVM code (detection by parts) to see how it's done there.
By the way, I would use the LatentSVM code (detection by parts) to detect vehicles. It has trained models for cars and for buses.
Good luck.
You need a detector, not a classifier. Take a look at Haar cascades, LBP cascades, latentSVM (as mentioned before), or the HOG detector.
I'll explain why. A detector usually scans the image with a sliding window, line by line, at several scales. In every window the detector solves the problem "object / not object". It may give you rough results, but it is very fast. Classifiers such as BOW work very slowly for this task. You should then apply classifiers to the regions found by the detector.
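As a rough sketch of the detector route, assuming a BGR frame in a cv::Mat named frame and a pre-trained vehicle cascade in a hypothetical cars.xml file (OpenCV does not ship one, so it would have to be trained or obtained separately):
// Detect vehicles in one frame with a cascade detector, then count the hits.
cv::CascadeClassifier carCascade("cars.xml");    // hypothetical cascade trained for cars

cv::Mat gray;
cv::cvtColor(frame, gray, CV_BGR2GRAY);

std::vector<cv::Rect> cars;
carCascade.detectMultiScale(gray, cars, 1.1, 3); // internally slides a window over several scales

std::cout << "vehicles in this frame: " << cars.size() << std::endl;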
As an aside: Apologies if I'm flooding SO with OpenCV questions :p
I'm currently trying to port my old C code over to the new C++ interface, and I've got to the point where I'm rebuilding my Eigenfaces face recogniser class.
Mat img = imread("1.jpg");
Mat img2 = imread("2.jpg");
FaceDetector* detect = new HaarDetector("haarcascade_frontalface_alt2.xml");
// convert to grey scale
Mat g_img, g_img2;
cvtColor(img, g_img, CV_BGR2GRAY);
cvtColor(img2, g_img2, CV_BGR2GRAY);
// find the faces in the images
Rect r = detect->getFace(g_img);
Mat img_roi = g_img(r);
r = detect->getFace(g_img2);
Mat img2_roi = g_img2(r);
// create the data matrix for PCA
Mat data;
data.create(2,1, img2_roi.type());
data.row(0) = img_roi;
data.row(1) = img2_roi;
// perform PCA
Mat averageFace;
PCA pca(data, averageFace, CV_PCA_DATA_AS_ROW, 2);
//namedWindow("avg",1); imshow("avg", averageFace); - causes segfault
//namedWindow("avg",1); imshow("avg", Mat(pca.mean)); - doesn't work
I'm trying to create the PCA space, and then see if it's working by displaying the computed average image. Are there any other steps to this?
Perhaps I need to project the images onto the PCA subspace first of all?
Your error is probably here:
Mat data;
data.create(2,1, img2_roi.type());
data.row(0) = img_roi;
data.row(1) = img2_roi;
PCA expects a matrix with the data vectors as rows. However, you never scale the images to the same size, so they do not have the same number of pixels (and thus the same dimension). Also, in data.create(2, 1, ...) the 1 needs to be the dimension of your vectors, i.e. the number of pixels. Then copy the pixels from each crop into its row of the matrix.
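A minimal sketch of that fix, assuming both face crops are resized to a common size (100x100 here is an arbitrary choice):
// Resize both crops to the same size so each sample has the same dimension.
cv::Size sampleSize(100, 100);
cv::Mat face1, face2;
cv::resize(img_roi, face1, sampleSize);
cv::resize(img2_roi, face2, sampleSize);

// One row per image, one column per pixel, as 32-bit floats.
cv::Mat data(2, sampleSize.area(), CV_32F);
cv::Mat row0, row1;
face1.reshape(1, 1).convertTo(row0, CV_32F);
face2.reshape(1, 1).convertTo(row1, CV_32F);
row0.copyTo(data.row(0));
row1.copyTo(data.row(1));

// Compute the PCA and view the mean face, reshaped back to image dimensions.
cv::PCA pca(data, cv::Mat(), CV_PCA_DATA_AS_ROW, 2);
cv::Mat averageFace = pca.mean.reshape(1, sampleSize.height);
cv::normalize(averageFace, averageFace, 0, 255, cv::NORM_MINMAX, CV_8U);
cv::imshow("avg", averageFace);
cv::waitKey();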