SVM classifier based on HOG features for "object detection" in OpenCV - c++

I have a project, which I want to detect objects in the images; my aim is to use HOG features. By using OpenCV SVM implementation , I could find the code for detecting people, and I read some papers about tuning the parameters in order to detect object instead of people. Unfortunately, I couldn't do that for a few reasons; first of all, I am probably tuning the parameters incorrectly, second of all, I am not a good programmer in C++ but I have to do it with C++/OpenCV... here you can find the code for detecting HOG features for people by using C++/OpenCV.
Let's say that I want to detect the object in this image. Now, I will show you what I have tried to change in the code but it didn't work out with me.
The code that I tried to change:
HOGDescriptor hog;
hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
I tried to change getDefaultPeopleDetector() with the following parameters, but it didn't work:
(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9, 0,-1, 0, 0.2, true, cv::HOGDescriptor::DEFAULT_NLEVELS)
I then tried to make a vector, but when I wanted to print the results, it seems to be empty.
vector<float> detector;
HOGDescriptor hog(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9, 0,-1, 0, 0.2, true, cv::HOGDescriptor::DEFAULT_NLEVELS);
hog.setSVMDetector(detector);
Please, I need help solving this problem.

In order to detect arbitrary objects with using opencv HOG descriptors and SVM classifier, you need to first train the classifier. Playing with the parameters will not help here, sorry :( .
In broad terms, you will need to complete the following steps:
Step 1) Prepare some training images of the objects you want to detect (positive samples). Also you will need to prepare some images with no objects of interest (negative samples).
Step 2) Detect HOG features of the training sample and use this features to train an SVM classifier (also provided in OpenCV).
Step 3) Use the coefficients of the trained SVM classifier in HOGDescriptor::setSVMDetector() method.
Only then, you can use the peopledetector.cpp sample code, to detect the objects you want to detect.

I've been dealing with the same problem and surprised with the lack of some clean C++ solutions I have create ~> this wrapper of SVMLight <~, which is a static library that provides classes SVMTrainer and SVMClassifier that simplify the training to something like:
// we are going to use HOG to obtain feature vectors:
HOGDescriptor hog;
hog.winSize = Size(32,48);
// and feed SVM with them:
SVMLight::SVMTrainer svm("features.dat");
then for each training sample:
// obtain feature vector describing sample image:
vector<float> featureVector;
hog.compute(img, featureVector, Size(8, 8), Size(0, 0));
// and write feature vector to the file:
svm.writeFeatureVectorToFile(featureVector, true); // true = positive sample
till the features.dat file contains feature vectors for all samples and at the end you just call:
std::string modelName("classifier.dat");
svm.trainAndSaveModel(modelName);
Once you have a file with model (or features.dat that you can just train the classifier with):
SVMLight::SVMClassifier c(classifierModelName);
vector<float> descriptorVector = c.getDescriptorVector();
hog.setSVMDetector(descriptorVector);
...
vector<Rect> found;
Size padding(Size(0, 0));
Size winStride(Size(8, 8));
hog.detectMultiScale(segment, found, 0.0, winStride, padding, 1.01, 0.1);
just check the documentation of HOGDescriptor for more info :)

I have done similar things as you did: collect samples of positive and negative images using HOG to extract features of car, train the feature set using linear SVM (I use SVM light), then use the model to detect car using HOG multidetect function.
I get lot of false positives, then I retrain the data using positive samples and false positive+negative samples. The resulting model is then tested again. The resulting detection improves (less false positives) but the result is not satisfying (average 50% hit rate and 50% false positives). Tuning up multidetect parameters improve the result but not much (10% less false positives and increase in hit rate).
Edit
I can share you the source code if you'd like, and I am very open for discussion as I have not get satisfactory results using HOG. Anyway, I think the code can be good starting point on using HOG for training and detection
Edit: adding code
static void calculateFeaturesFromInput(const string& imageFilename, vector<float>& featureVector, HOGDescriptor& hog)
{
Mat imageData = imread(imageFilename, 1);
if (imageData.empty()) {
featureVector.clear();
printf("Error: HOG image '%s' is empty, features calculation skipped!\n", imageFilename.c_str());
return;
}
// Check for mismatching dimensions
if (imageData.cols != hog.winSize.width || imageData.rows != hog.winSize.height) {
featureVector.clear();
printf("Error: Image '%s' dimensions (%u x %u) do not match HOG window size (%u x %u)!\n", imageFilename.c_str(), imageData.cols, imageData.rows, hog.winSize.width, hog.winSize.height);
return;
}
vector<Point> locations;
hog.compute(imageData, featureVector, winStride, trainingPadding, locations);
imageData.release(); // Release the image again after features are extracted
}
...
int main(int argc, char** argv) {
// <editor-fold defaultstate="collapsed" desc="Init">
HOGDescriptor hog; // Use standard parameters here
hog.winSize.height = 128;
hog.winSize.width = 64;
// Get the files to train from somewhere
static vector<string> tesImages;
static vector<string> positiveTrainingImages;
static vector<string> negativeTrainingImages;
static vector<string> validExtensions;
validExtensions.push_back("jpg");
validExtensions.push_back("png");
validExtensions.push_back("ppm");
validExtensions.push_back("pgm");
// </editor-fold>
// <editor-fold defaultstate="collapsed" desc="Read image files">
getFilesInDirectory(posSamplesDir, positiveTrainingImages, validExtensions);
getFilesInDirectory(negSamplesDir, negativeTrainingImages, validExtensions);
/// Retrieve the descriptor vectors from the samples
unsigned long overallSamples = positiveTrainingImages.size() + negativeTrainingImages.size();
// </editor-fold>
// <editor-fold defaultstate="collapsed" desc="Calculate HOG features and save to file">
// Make sure there are actually samples to train
if (overallSamples == 0) {
printf("No training sample files found, nothing to do!\n");
return EXIT_SUCCESS;
}
/// #WARNING: This is really important, some libraries (e.g. ROS) seems to set the system locale which takes decimal commata instead of points which causes the file input parsing to fail
setlocale(LC_ALL, "C"); // Do not use the system locale
setlocale(LC_NUMERIC,"C");
setlocale(LC_ALL, "POSIX");
printf("Reading files, generating HOG features and save them to file '%s':\n", featuresFile.c_str());
float percent;
/**
* Save the calculated descriptor vectors to a file in a format that can be used by SVMlight for training
* #NOTE: If you split these steps into separate steps:
* 1. calculating features into memory (e.g. into a cv::Mat or vector< vector<float> >),
* 2. saving features to file / directly inject from memory to machine learning algorithm,
* the program may consume a considerable amount of main memory
*/
fstream File;
File.open(featuresFile.c_str(), ios::out);
if (File.good() && File.is_open()) {
File << "# Use this file to train, e.g. SVMlight by issuing $ svm_learn -i 1 -a weights.txt " << featuresFile.c_str() << endl; // Remove this line for libsvm which does not support comments
// Iterate over sample images
for (unsigned long currentFile = 0; currentFile < overallSamples; ++currentFile) {
storeCursor();
vector<float> featureVector;
// Get positive or negative sample image file path
const string currentImageFile = (currentFile < positiveTrainingImages.size() ? positiveTrainingImages.at(currentFile) : negativeTrainingImages.at(currentFile - positiveTrainingImages.size()));
// Output progress
if ( (currentFile+1) % 10 == 0 || (currentFile+1) == overallSamples ) {
percent = ((currentFile+1) * 100 / overallSamples);
printf("%5lu (%3.0f%%):\tFile '%s'", (currentFile+1), percent, currentImageFile.c_str());
fflush(stdout);
resetCursor();
}
// Calculate feature vector from current image file
calculateFeaturesFromInput(currentImageFile, featureVector, hog);
if (!featureVector.empty()) {
/* Put positive or negative sample class to file,
* true=positive, false=negative,
* and convert positive class to +1 and negative class to -1 for SVMlight
*/
File << ((currentFile < positiveTrainingImages.size()) ? "+1" : "-1");
// Save feature vector components
for (unsigned int feature = 0; feature < featureVector.size(); ++feature) {
File << " " << (feature + 1) << ":" << featureVector.at(feature);
}
File << endl;
}
}
printf("\n");
File.flush();
File.close();
} else {
printf("Error opening file '%s'!\n", featuresFile.c_str());
return EXIT_FAILURE;
}
// </editor-fold>
// <editor-fold defaultstate="collapsed" desc="Pass features to machine learning algorithm">
/// Read in and train the calculated feature vectors
printf("Calling SVMlight\n");
SVMlight::getInstance()->read_problem(const_cast<char*> (featuresFile.c_str()));
SVMlight::getInstance()->train(); // Call the core libsvm training procedure
printf("Training done, saving model file!\n");
SVMlight::getInstance()->saveModelToFile(svmModelFile);
// </editor-fold>
// <editor-fold defaultstate="collapsed" desc="Generate single detecting feature vector from calculated SVM support vectors and SVM model">
printf("Generating representative single HOG feature vector using svmlight!\n");
vector<float> descriptorVector;
vector<unsigned int> descriptorVectorIndices;
// Generate a single detecting feature vector (v1 | b) from the trained support vectors, for use e.g. with the HOG algorithm
SVMlight::getInstance()->getSingleDetectingVector(descriptorVector, descriptorVectorIndices);
// And save the precious to file system
saveDescriptorVectorToFile(descriptorVector, descriptorVectorIndices, descriptorVectorFile);
// </editor-fold>
// <editor-fold defaultstate="collapsed" desc="Test detecting vector">
cout << "Test Detecting Vector" << endl;
hog.setSVMDetector(descriptorVector); // Set our custom detecting vector
cout << "descriptorVector size: " << sizeof(descriptorVector) << endl;
getFilesInDirectory(tesSamplesDir, tesImages, validExtensions);
namedWindow("Test Detector", 1);
for( size_t it = 0; it < tesImages.size(); it++ )
{
cout << "Process image " << tesImages[it] << endl;
Mat image = imread( tesImages[it], 1 );
detectAndDrawObjects(image, hog);
for(;;)
{
int c = waitKey();
if( (char)c == 'n')
break;
else if( (char)c == '\x1b' )
exit(0);
}
}
// </editor-fold>
return EXIT_SUCCESS;
}

Related

opencv neural network, incorrect predict

I'm trying to create a neural network in C++ with OpenCV. The aim is recognition of road signs. I have created the network in this way, but it predicts badly, because it returns strange results:
Sample images from the training selection look like this:
Can someone help?
trainNN() {
char* templates_directory[] = {
"speed50ver1\\",
"speed60ver1\\",
"speed70ver1\\",
"speed80ver1\\"
};
int const numFilesChars[]={ 213, 100, 385, 163};
char const strCharacters[] = { '5', '6', '7', '8' };
Mat trainingData;
Mat trainingLabels(0, 0, CV_32S);
int const numCharacters = 4;
// load images from directory
for (int i = 0; i != numCharacters; ++i) {
int numFiles = numFilesChars[i];
DIR *dir;
struct dirent *ent;
char* s1 = templates_directory[i];
if ((dir = opendir (s1)) != NULL) {
Size size(80, 80);
while ((ent = readdir (dir)) != NULL) {
string s = s1;
s.append(ent->d_name);
if(s.substr(s.find_last_of(".") + 1) == "jpg") {
Mat img = imread(s,0);
Mat img_mat;
resize(img, img_mat, size);
Mat new_img = img_mat.reshape(1, 1);
trainingData.push_back(new_img);
trainingLabels.push_back(i);
}
}
int b = 0;
closedir (dir);
} else {
/* could not open directory */
perror ("");
}
}
trainingData.convertTo(trainingData, CV_32FC1);
Mat trainClasses(trainingData.rows, numCharacters, CV_32FC1);
for( int i = 0; i != trainClasses.rows; ++i){
int const labels = *trainingLabels.ptr<int>(i);
auto train_ptr = trainClasses.ptr<float>(i);
for(int k = 0; k != trainClasses.cols; ++k){
*train_ptr = k != labels ? 0 : 1;
++train_ptr;
}
}
int layers_d[] = { trainingData.cols, 10, numCharacters};
Mat layers(1, 3, CV_32SC1, layers_d);
ann.create(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(
// terminate the training after either 1000
// iterations or a very small change in the
// network wieghts below the specified value
cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
// use backpropogation for training
CvANN_MLP_TrainParams::BACKPROP,
// co-efficents for backpropogation training
// (refer to manual)
0.1,
0.1);
int iterations = ann.train(trainingData, trainClasses, cv::Mat(), cv::Mat(), params);
CvFileStorage* storage = cvOpenFileStorage( "neural_network_2.xml", 0, CV_STORAGE_WRITE );
ann.write(storage,"digit_recognition");
cvReleaseFileStorage(&storage);
}
void analysis(char* file, bool a) {
//trainNN(a);
read_nn();
// load image
Mat img = imread(file, 0);
Size my_size(80,80);
resize(img, img, my_size);
Mat r_img = img.reshape(1,1);
r_img.convertTo(r_img, CV_32FC1);
Mat classOut(1,4,CV_32FC1);
ann.predict(r_img, classOut);
double min1, max1;
cv::Point min_loc, max_loc;
minMaxLoc(classOut, &min1, &max1, &min_loc, &max_loc);
int x = max_loc.x;
//create windows
namedWindow("Original Image", CV_WINDOW_AUTOSIZE);
imshow("Original Image", img);
waitKey(0); //wait for key press
img.release();
rr.release();
destroyAllWindows(); //destroy all open windows
}
strange results: for this input answer is 3 (because i have only 4 classes - speed limit 50, 60, 70, 80). It's correct for speed limit 80 sign.
But for the rest inputs results are incorrect. They are same for signs 50, 60, 70. max1 = min1 = 1.02631...(as on the first picture) It's strange.
I have adapted your code to train a classifier on 4 hand positions (since that's the image data I have). I kept your logic as similar as possible, only changing what was absolutely necessary to make it run on my Windows machine on my images. Long story short, there is nothing fundamentally wrong with your code - I don't see the failure mode you described.
One thing you left out was the code for read_nn(). I assume that just does something like the following:
ann.load("neural_network_2.xml");
Anyway, my suspicion is that either your neural network is not converging at all or it's badly overfitting. Perhaps there's not enough variation in the training data. Are you running analysis() on separate test data that the ANN wasn't trained on? If so, is the ANN able to predict training data properly at least?
EDIT: OK, I just downloaded your image data and tried it out and saw the same behavior. After some analysis, it looks like your ANN is not converging. The training operation exits after only about 250 iterations, even if you specify only CV_TERMCRIT_ITER for the cvTermCriteria. After increasing your hidden layer size from 10 to 20, I saw a marked improvement, with successful classification on the training data for 212, 72, 94, and 143 of the images respectively to the classes (50, 60, 70, and 80). That's not very good, but it demonstrates that you're on the right track.
Basically, the network architecture is not expressive enough to adequately model the problem you're trying to solve, so the network weights never converge and it abandons the backprop early. For one class, you may see some success, but I believe that's largely a function of the lack of shuffling of training data. If it stops after having just trained on a couple hundred very similar images, it may be able to manage to classify those correctly.
In short, I would recommend doing the following:
Build a way to test the results - e.g.: create a function to run prediction on all training data, and ideally set aside some images as a validation set in order to also confirm that the model is not overfitting the training data.
Shuffle the training data prior to training. Otherwise, backprop will not converge as easily.
Experiment with different architectures such as more than one hidden layer with varying sizes.
Really, this is a problem that would benefit dramatically from using a Convolutional Neural Net, but OpenCV's machine learning facilities are pretty limited. Ultimately, if you're serious about creating ANNs, you might want to investigate some more robust tools. I personally use Tensorflow, but I've heard good things about Theano as well.
I've only implemented NN with OpenCV for boolean classification, but I think that for a task where you need to classify more than two distinct classes this might also apply:
"If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results."
So, where you do:
*train_ptr = k != labels ? 0 : 1;
You might want to try:
*train_ptr = k != labels ? -1 : 1;
Disregard if I'm way off track here.

HOG features for object detection using GPU in OpenCV

In my project I am calculating HOG features on GPU for different levels in the same image. My aim is to detect the following objects.
1. Truck
2. Car
3. Person
Most important question is the selection of window size in case of multi class object detector. This post provide a very good base but it does not provide an answer for the selection of window size in case of multi class feature.
To solve this problem I calculated the HOG features of each positive image at different levels/resolution keeping the window size(48*96) same but the file for each image is around 600 MB which is too large.
Please let me know how to select the window size, block size and cell size in case of multi class object detection. Here is my code that I used to calculate the HOG features.
void App::run()
{
unsigned int count = 1;
FileStorage fs;
running = true;
//int width;
//int height;
Size win_size(args.win_width, args.win_width * 2);
Size win_stride(args.win_stride_width, args.win_stride_height);
cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);
VideoCapture vc("/home/ubuntu/Desktop/getdescriptor/images/image%d.jpg");
Mat frame;
Mat Left;
Mat img_aux, img, img_to_show, img_new;
cv::Mat temp;
gpu::GpuMat gpu_img, descriptors, new_img;
char cbuff[20];
while (running)
{
vc.read(frame);
if (!frame.empty())
{
workBegin();
width = frame.rows;
height = frame.cols;
sprintf (cbuff, "%04d", count);
// Change format of the image
if (make_gray) cvtColor(frame, img_aux, CV_BGR2GRAY);
else if (use_gpu) cvtColor(frame, img_aux, CV_BGR2BGRA);
else Left.copyTo(img_aux);
// Resize image
if (args.resize_src) resize(img_aux, img, Size(args.width, args.height));
else img = img_aux;
img_to_show = img;
gpu_hog.nlevels = nlevels;
hogWorkBegin();
if (use_gpu)
{
gpu_img.upload(img);
new_img.upload(img_new);
fs.open(cbuff, FileStorage::WRITE);
for(int levels = 0; levels < nlevels; levels++)
{
gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
descriptors.download(temp);
//printf("size %d %d\n", temp.rows, temp.cols);
fs <<"level" << levels;
fs << "features" << temp;
cout<<"("<<width<<","<<height<<")"<<endl;
width = round(width/scale);
height = round(height/scale);
if( width < win_size.width || height < win_size.height )
break;
cout<<"Levels "<<levels<<endl;
resize(img,img_new,Size(width,height));
scale *= scale;
}
cout<<count<< " Image feature calculated !"<<endl;
count++;
//width = 640; height = 480;
scale = 1.05;
}
hogWorkEnd();
fs.release();
}
else running = false;
}
}
The window size should be chosen, s.t. the object(s) you want to detect fit into the window. If you want to have different window sizes for different types this might become tricky.
Usually what you do is the following
Take training data for each type of objects, and train [number of object types] many models using the features extracted at the known position of the objects.
Then you take each test image and use a sliding window approach to extract features at each location. These features are then compared to each model. If one of the models lead to a score higher than a certain threshold you have found this object. If more than one model scores higher than the threshold simply take the one scoring highest.
If you want to use differently sized detection windows you will get feature vectors of different size (by nature of the HoG features). The tricky thing is, that in the testing phase you have to use as many sliding windows as object types you use. This would definitely work, but you have to process each testing image several times leading to higher processing time)
To answer your question of the sizes: There is no value I can give you, it always depends on your images. Using an image pyramid as you mentioned above is a good way to deal with differently scaled objects.
window size: the whole object should fit in; has to be divisible by block size
block size has to be divisible by cell size
Sample code for visualization of HoG features can be found here. This also helps understand how the feature vectors look like.
EDIT: Found out the hard way, that only cv::Size(8,8) is allowed for cell size. See documentation.

c++ Bag Of Words - OpenCV: Assertion Failed

I'm trying to get to grips with Bag Of Words in c++ and I have some sample code, but this Error keeps on throwing it and I don't know why.
I'm completely new to this and am very much lost.
Here's the entirety of the code:
#include "stdafx.h"
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <opencv2/nonfree/features2d.hpp>
using namespace cv;
using namespace std;
#define DICTIONARY_BUILD 1 // set DICTIONARY_BUILD 1 to do Step 1, otherwise it goes to step 2
int _tmain(int argc, _TCHAR* argv[])
{
#if DICTIONARY_BUILD == 1
//Step 1 - Obtain the set of bags of features.
//to store the input file names
char * filename = new char[100];
//to store the current input image
Mat input;
//To store the keypoints that will be extracted by SIFT
vector<KeyPoint> keypoints;
//To store the SIFT descriptor of current image
Mat descriptor;
//To store all the descriptors that are extracted from all the images.
Mat featuresUnclustered;
//The SIFT feature extractor and descriptor
SiftDescriptorExtractor detector;
//I select 20 (1000/50) images from 1000 images to extract feature descriptors and build the vocabulary
for(int f=0;f<999;f+=50){
//create the file name of an image
sprintf(filename,"G:\\testimages\\image\\%i.jpg",f);
//open the file
input = imread(filename, CV_LOAD_IMAGE_GRAYSCALE); // -- Forgot to add in
//detect feature points
detector.detect(input, keypoints);
//compute the descriptors for each keypoint
detector.compute(input, keypoints,descriptor);
//put the all feature descriptors in a single Mat object
featuresUnclustered.push_back(descriptor);
//print the percentage
printf("%i percent done\n",f/10);
}
//Construct BOWKMeansTrainer
//the number of bags
int dictionarySize=200;
//define Term Criteria
TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
//retries number
int retries=1;
//necessary flags
int flags=KMEANS_PP_CENTERS;
//Create the BoW (or BoF) trainer
BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
//cluster the feature vectors
Mat dictionary;
dictionary=bowTrainer.cluster(featuresUnclustered); // -- BREAKS
//store the vocabulary
FileStorage fs("dictionary.yml", FileStorage::WRITE);
fs << "vocabulary" << dictionary;
fs.release();
#else
//Step 2 - Obtain the BoF descriptor for given image/video frame.
//prepare BOW descriptor extractor from the dictionary
Mat dictionary;
FileStorage fs("dictionary.yml", FileStorage::READ);
fs["vocabulary"] >> dictionary;
fs.release();
//create a nearest neighbor matcher
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
//create Sift feature point extracter
Ptr<FeatureDetector> detector(new SiftFeatureDetector());
//create Sift descriptor extractor
Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);
//create BoF (or BoW) descriptor extractor
BOWImgDescriptorExtractor bowDE(extractor,matcher);
//Set the dictionary with the vocabulary we created in the first step
bowDE.setVocabulary(dictionary);
//To store the image file name
char * filename = new char[100];
//To store the image tag name - only for save the descriptor in a file
char * imageTag = new char[10];
//open the file to write the resultant descriptor
FileStorage fs1("descriptor.yml", FileStorage::WRITE);
//the image file with the location. change it according to your image file location
sprintf(filename,"G:\\testimages\\image\\1.jpg");
//read the image
Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);
//To store the keypoints that will be extracted by SIFT
vector<KeyPoint> keypoints;
//Detect SIFT keypoints (or feature points)
detector->detect(img,keypoints);
//To store the BoW (or BoF) representation of the image
Mat bowDescriptor;
//extract BoW (or BoF) descriptor from given image
bowDE.compute(img,keypoints,bowDescriptor);
//prepare the yml (some what similar to xml) file
sprintf(imageTag,"img1");
//write the new BoF descriptor to the file
fs1 << imageTag << bowDescriptor;
//You may use this descriptor for classifying the image.
//release the file storage
fs1.release();
#endif
printf("\ndone\n");
return 0;
}
But then it throws this up:
OpenCV Error: Assertion failed (data.dims <= 2 && type == CV_32F && K > 0) in cv::kmeans, file C:\buildslave64\win64_amdoc1\2_4_PackSlave-win32-vc11-shared\opencv\modules\core\src\matrix.cpp, line 2701
Help, please.
EDIT
Line that it breaks on:
dictionary = bowTrainer.cluster(featuresUnclustered); // -- Breaks
EDIT 2
Ive come across this, but i am unsure how to translate it to help with my cause.
I'm not 100% sure of what the code is doing since I'm not an OpenCV expert. However I can see that you are not initializing input in any way. This probably results in you not getting the descriptors you want, and thus not really doing anything. The code then probably breaks since it expects actual data in, but there is none.
In general, when dealing with OpenCV or other big "kind of messy" libraries I would advise you to proceed step by step, and checking that results are what you expect every step of the way. Copy-pasting a big blob of code and expecting it to work is never the best course of action.
if (allDescriptors.type() != CV_32F)
{
allDescriptors.convertTo(allDescriptors, CV_32F);
}
Make sure that your image directory in 1st step is correct. It should exist training images as 0.jpg, 50.jpg, ... etc. Cause in a lot of situations, this error occurs when image is not loaded. You can add following codes after imread to check. Hope it can work.
if(input.empty())
{
cout << "Error: Image cannot be loaded !" << endl;
system("Pause");
return -1;
}

The opencv code for face recognition does not predict correctly in visual c++

This is my code for face recognition in videos. It runs without any error but it's prediction
is wrong most of the time.I am using LBPH face recognizer to recognize the faces.
I tried using haar cascades but it does not load. so i switched to LBHP.please help me to improve the prediction.
I am using gray scale cropped images of size 500 x 500 (pixels) for training the cascade classifier.
#include <opencv2/core/core.hpp>
#include <opencv2/contrib/contrib.hpp
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <iostream>
#include <fstream>
#include <sstream>
using namespace cv;
using namespace std;
static void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
std::ifstream file(filename.c_str(), ifstream::in);
if (!file) {
string error_message = "No valid input file was given, please check the given filename.";
CV_Error(CV_StsBadArg, error_message);
}
string line, path, classlabel;
while (getline(file, line)) {
stringstream liness(line);
getline(liness, path, separator);
getline(liness, classlabel);
if(!path.empty() && !classlabel.empty()) {
images.push_back(imread(path, 0));
labels.push_back(atoi(classlabel.c_str()));
}
}
}
string g_listname_t[]=
{
"ajay","Aasai","famiz"
};
int main(int argc, const char *argv[]) {
// Check for valid command line arguments, print usage
// if no arguments were given.
//if (argc != 4) {
// cout << "usage: " << argv[0] << " </path/to/haar_cascade> </path/to/csv.ext> </path/to/device id>"<<endl;
// cout << "\t </path/to/haar_cascade> -- Path to the Haar Cascade for face detection." << endl;
// cout << "\t </path/to/csv.ext> -- Path to the CSV file with the face database." << endl;
// cout << "\t <device id> -- The webcam device id to grab frames from." << endl;
// exit(1);
//}
//// Get the path to your CSV:
//string fn_haar = string(argv[1]);
//string fn_csv = string(argv[2]);
//int deviceId = atoi(argv[3]);
//// Get the path to your CSV:
// please set the correct path based on your folder
string fn_haar = "lbpcascade_frontalface.xml";
string fn_csv = "reader.ext ";
int deviceId = 0; // here is my webcam Id.
// These vectors hold the images and corresponding labels:
vector<Mat> images;
vector<int> labels;
// Read in the data (fails if no valid input filename is given, but you'll get an error message):
try {
read_csv(fn_csv, images, labels);
} catch (cv::Exception& e) {
cerr << "Error opening file \"" << fn_csv << "\". Reason: " << e.msg << endl;
// nothing more we can do
exit(1);
}
// Get the height from the first image. We'll need this
// later in code to reshape the images to their original
// size AND we need to reshape incoming faces to this size:
int im_width = images[0].cols;
int im_height = images[0].rows;
// Create a FaceRecognizer and train it on the given images:
Ptr<FaceRecognizer> model = createLBPHFaceRecognizer();
model->train(images, labels);
cout<<("Facerecognizer created");
// That's it for learning the Face Recognition model. You now
// need to create the classifier for the task of Face Detection.
// We are going to use the haar cascade you have specified in the
// command line arguments:
CascadeClassifier lbp_cascade;
if ( ! lbp_cascade.load(fn_haar) )
{
cout<<("\nlbp cascade not loaded");
}
else
{
cout<<("\nlbp cascade loaded");
}
// Get a handle to the Video device:
VideoCapture cap(deviceId);
cout<<("\nvideo device is opened");
// Check if we can use this device at all:
if(!cap.isOpened()) {
cerr << "Capture Device ID " << deviceId << "cannot be opened." << endl;
return -1;
}
// Holds the current frame from the Video device:
Mat frame;
for(;;) {
cap >> frame;
// Clone the current frame:
Mat original = frame.clone();
cout<<("\nframe is cloned");
// Convert the current frame to grayscale:
Mat gray;
//gray = imread("G:\Picture\003.jpg",0);
cvtColor(original, gray, CV_BGR2GRAY);
imshow("gray image", gray);
// And display it:
char key1 = (char) waitKey(50);
// Find the faces in the frame:
cout<<("\ncolor converted");
vector< Rect_<int> > faces;
cout<<("\ndetecting faces");
lbp_cascade.detectMultiScale(gray, faces);
// At this point you have the position of the faces in
// faces. Now we'll get the faces, make a prediction and
// annotate it in the video. Cool or what?
cout<<("\nfaces detected\n");
cout<<faces.size();
for(int i = 0; i < faces.size(); i++)
{
// Process face by face:
cout<<("\nprocessing faces");
Rect face_i = faces[i];
// Crop the face from the image. So simple with OpenCV C++:
Mat face = gray(face_i);
// Resizing the face is necessary for Eigenfaces and Fisherfaces. You can easily
// verify this, by reading through the face recognition tutorial coming with OpenCV.
// Resizing IS NOT NEEDED for Local Binary Patterns Histograms, so preparing the
// input data really depends on the algorithm used.
//
// I strongly encourage you to play around with the algorithms. See which work best
// in your scenario, LBPH should always be a contender for robust face recognition.
//
// Since I am showing the Fisherfaces algorithm here, I also show how to resize the
// face you have just found:
/*Mat face_resized;
cv::resize(face, face_resized, Size(im_width, im_height), 1.0, 1.0, INTER_CUBIC);
// Now perform the prediction, see how easy that is:
cout<<("\nface resized");
imshow("resized face image", face_resized);*/
int prediction = model->predict(face);
cout<<("\nface predicted");
// And finally write all we've found out to the original image!
// First of all draw a green rectangle around the detected face:
cout<<("\nnow writing to original");
rectangle(original, face_i, CV_RGB(0, 255,0), 1);
// Create the text we will annotate the box with:
string box_text;
box_text = format( "Prediction =",prediction);
// Get stringname
if ( prediction >= 0 && prediction <=1 )
{
box_text.append( g_listname_t[prediction] );
}
else box_text.append( "Unknown" );
// Calculate the position for annotated text (make sure we don't
// put illegal values in there):
int pos_x = std::max(face_i.tl().x - 10, 0);
int pos_y = std::max(face_i.tl().y - 10, 0);
// And now put it into the image:
putText(original, box_text, Point(pos_x, pos_y), FONT_HERSHEY_PLAIN, 1.0, CV_RGB(0,255,0), 2.0);
}
// Show the result:
imshow("face_recognizer", original);
// And display it:
char key = (char) waitKey(50);
// Exit this loop on escape:
if(key == 27)
break;
}
return 0;
}
That is an expected result if you ask me, the code which you showed is the basic one to do recognition, there are some backdrops which we need to take care of before implementing.
1) the quality of training images, how did you crop them ?
do they contain any extra information apart from face, if you used haar classifier in our opencv data to crop faces, then, the images tend to contain extra information than the face, as the rectangles are a bit large in size when compared to face.
2) there might be a chance that, even the rotated faces might be trained, so, its tough to classify with the features of rotated faces.
3) how many images, you trained the recognizer with ?, it playes a crucial role.
Answer for the first question, is most likely to be out of opencv, we cant do much about it, as there is very less probability that, we ll find a face detector which is as good and as simple as haar detector, so, we could make this as an exemption, if we can adjust with an accuracy around 70 %.
the second problem could be solved with some preprocessing techniques on training and testing dataset.
Like., aligning faces which are being rotated
follow this link, very good suggestions for face alignment are being suggested.
How to align face images c++ opencv
the third problem is solved with good number of samples which is not a hard task to achieve, take care of alignment before training, so that correct features could be extracted to classify.
there might be other factors that can improve the accuracy which I might have missed.

Store in hard disk a CvSVM object

I have create a SVM detector for faces in opencv. I want to separate the process of train and predict. Thus, I want to find a way to store in my hard disk CvSVM SVM object in order to access it whenever I want without train again a new model. In this way, I could use predict method by just loading SVM object from hard disk. This is pretty straight forward in Matlab, but how can I succeed it in opencv c++?
Using load and save functions in order to load and save SVM model it leads to munmap_chunk()::invalid pointer message(glibc detected) message.
CvSVM SVM = detect.SVMtrain("dbpath/");
SVM.save("svmTrain.xml", 0);
cout <<"Detection system in progress progress...\n";
string pathFile = databasePath+ imageFilename;
vector<float> confidence;
detections = detect.detection(pathFile);
CvSVM SVM;
SVM.load("svmTrain.xml",0);
confidence = detect.SVMpredict(detections.at(0), SVM);
//cout << "confidence is: " << confidence.at(0) << endl;
When I ran the above code I received the above message. The problem stands in svmpredict function. And SVMpredict function:
vector<float> Detection::SVMpredict(Mat image, CvSVM SVM){
vector<float> res;
//Mat image = imread(fileName,0); //Read as a gray image
//cvtColor(sampleMat, sampleMat, CV_BGR2GRAY);
image = eigenfacesExtraction(image);
image.convertTo(image, CV_32FC1); // <-- Convert to CV_32F for matrix mult
float response = SVM.predict(image); res.push_back(response);
float val = SVM.predict(image,true); res.push_back(val);
cout << "svm result: "<< response <<" val: "<< val << endl;
return res;
}
Straightforward too. It's a CvStatModel, so it comes with save and load methods.