Learning an SVM with bag of features - C++

Hello, I am trying to train an SVM with a dataset of negative and positive examples.
I am using k-means clustering and a bag-of-words model for it.
My steps are:
1. Compute the descriptors and keypoints for each image with SURF.
2. Put all descriptors into one Mat (the unclustered Mat) and create the label Mat (1 and -1).
3. Cluster with k-means: feed in the unclustered Mat and run the algorithm; the result is the vocabulary.
4. Start the bag-of-features procedure with BOWImgDescriptorExtractor, using the FlannBasedMatcher and the SURF extractor for it.
5. Set the extracted vocabulary on the BOWImgDescriptorExtractor.
6. Compute the BoW descriptor from the image and its keypoints.
7. Train the SVM with the BoW descriptors and the labels.
The syntax is correct, but if I call svm->isTrained() the SVM returns false.
Something in my procedure must be wrong; please give me some advice on what I am doing wrong.
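For reference, here is a minimal sketch of that pipeline, assuming OpenCV 3.x with the xfeatures2d contrib module; the vocabulary size, variable names, and the omitted image loading are illustrative, not taken from the original program:
#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>
using namespace cv;

int main()
{
    Ptr<Feature2D> surf = xfeatures2d::SURF::create();
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");

    std::vector<Mat> trainImages; // loading the positive/negative images is omitted
    Mat labels;                   // one CV_32S row per image, holding 1 or -1

    // Steps 1-2: pool the SURF descriptors of all training images.
    Mat unclustered;
    for (const Mat& img : trainImages) {
        std::vector<KeyPoint> keypoints;
        Mat descriptors;
        surf->detectAndCompute(img, noArray(), keypoints, descriptors);
        unclustered.push_back(descriptors);
    }

    // Step 3: k-means clustering of the pooled descriptors gives the vocabulary.
    BOWKMeansTrainer bowTrainer(200); // vocabulary size is illustrative
    Mat vocabulary = bowTrainer.cluster(unclustered);

    // Steps 4-6: one BoW descriptor (histogram row) per training image.
    BOWImgDescriptorExtractor bowExtractor(surf, matcher);
    bowExtractor.setVocabulary(vocabulary);
    Mat trainData;
    for (const Mat& img : trainImages) {
        std::vector<KeyPoint> keypoints;
        surf->detect(img, keypoints);
        Mat bowDescriptor;
        bowExtractor.compute(img, keypoints, bowDescriptor);
        trainData.push_back(bowDescriptor);
    }

    // Step 7: train the SVM. The labels Mat must have exactly one row per
    // row of trainData, otherwise training fails or never happens.
    Ptr<ml::SVM> svm = ml::SVM::create();
    svm->train(trainData, ml::ROW_SAMPLE, labels);
    return 0;
}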

Related

Prediction Confidence SVM OpenCV 3.0 returns 0

I am looking for a way to get the prediction confidence for a multi-class prediction with an OpenCV SVM. What I found so far:
// data is a cv::Mat containing the samples to be predicted, of shape (num_samples, dimension) and type CV_32F
// SVM_classifier is a trained OpenCV SVM classifier
cv::Mat predLabels(data.rows, 1, CV_32F);
cv::Mat predProb(data.rows, 1, CV_32F);
predProb = SVM_classifier->predict(data, predLabels, cv::ml::StatModel::RAW_OUTPUT);
Executing this code, I get pretty good prediction results (F-measure around 0.95), but the predProb matrix contains only zeros... Can someone help me get the correct results? Is the issue that I have a multi-class problem instead of a binary one?
Also, what is the difference between using cv::ml::StatModel::RAW_OUTPUT and cv::ml::SVM::Flags::RAW_OUTPUT? No matter which one I use, I get a vector filled with zeros as the result... Any help is appreciated!
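Not a definitive answer, but a sketch of how RAW_OUTPUT is commonly used with a binary OpenCV SVM (svm and samples are placeholder names):
// predict() returns a float and writes the per-sample responses into its
// output Mat; it does not return a Mat. Assigning its float return value to
// predProb sets every element of predProb to that single number, which may
// be why the matrix comes back all zeros.
// cv::ml::StatModel::RAW_OUTPUT and cv::ml::SVM::Flags::RAW_OUTPUT have the
// same value (1), which is why switching between them changes nothing.
cv::Mat rawResponses;
svm->predict(samples, rawResponses, cv::ml::StatModel::RAW_OUTPUT);
// For a two-class SVM, each entry of rawResponses is the signed distance to
// the separating hyperplane: the sign gives the class, the magnitude can
// serve as a confidence-like score. For the multi-class (one-vs-one voting)
// case, OpenCV's SVM does not expose per-class probabilities.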

dlib 19.6 multi classifier training data

I'm working on object detection using dlib 19.6 (C++). Till now I was using a single classifier, with labels generated using imglab from dlib.
For this I'm using fhog_object_detector_ex.cpp. It works well and is able to detect the object.
Now I have defined multiple classes, like below:
1. clock
2. pot
So how can I use/modify fhog_object_detector_ex.cpp to train on this data? I already know how to test an image:
std::vector<object_detector<image_scanner_type> > my_detectors;
my_detectors.push_back(detector1); // clock.svm
my_detectors.push_back(detector2); // pot.svm
my_detectors.push_back(detector3); // any other.svm
std::vector<dlib::rectangle> dets2 = evaluate_detectors(my_detectors, image);
But I'm not sure how to train the data. Do I need to label each image twice, separately, and run the object trainer twice? Or is it possible to train two classifiers at the same time?
Dlib's HOG detector doesn't support multi-class classification, so you need to train a separate detector for each label and then combine the detectors to run on a single image; the index of the detector that fired gives you the label.
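A rough sketch of that workflow, adapted from fhog_object_detector_ex.cpp; the XML file name and the detection window size are illustrative, and it assumes a single imglab XML containing both labels, filtered per class with boxes_match_label:
#include <dlib/svm_threaded.h>
#include <dlib/image_processing.h>
#include <dlib/data_io.h>
using namespace dlib;

typedef scan_fhog_pyramid<pyramid_down<6>> image_scanner_type;

// Train one detector on the boxes carrying a single label, taken from a
// multi-label imglab XML file.
object_detector<image_scanner_type> train_for_label(
    const std::string& xml_file, const std::string& label)
{
    dlib::array<array2d<unsigned char>> images;
    std::vector<std::vector<rectangle>> boxes;
    // Keep only the boxes whose label matches, so each detector sees one class.
    load_image_dataset(images, boxes,
                       image_dataset_file(xml_file).boxes_match_label(label));

    image_scanner_type scanner;
    scanner.set_detection_window_size(80, 80); // illustrative size
    structural_object_detection_trainer<image_scanner_type> trainer(scanner);
    trainer.set_num_threads(4);
    trainer.set_c(1);
    return trainer.train(images, boxes);
}

int main()
{
    object_detector<image_scanner_type> clock_det = train_for_label("dataset.xml", "clock");
    object_detector<image_scanner_type> pot_det   = train_for_label("dataset.xml", "pot");
    serialize("clock.svm") << clock_det;
    serialize("pot.svm")   << pot_det;
}
So you label once (a single XML with both label names) and train twice, filtering by label; at detection time, the position of the firing detector inside my_detectors tells you the class.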

OpenCV BoW assert error while computing histograms for SVM training

I am trying to do classification of images by combining SIFT features, a bag of visual words, and an SVM.
Now I am on the training part. I need to get a BoW histogram for each training image to be able to train the SVM. For this I am using BOWImgDescriptorExtractor from OpenCV, version 3.1.0.
The problem is that it computes the histogram for some images, but for others it gives me this error:
OpenCV Error: Assertion failed (queryIdx == (int)i) in compute,
file /Users/opencv-3.1.0/modules/features2d/src/bagofwords.cpp, line 200
libc++abi.dylib: terminating with uncaught exception of type
cv::Exception: /Users/opencv-3.1.0/modules/feature/src/bagofwords.cpp:200: error: (-215) queryIdx == (int)i in function compute
The training images are all of the same size and all have the same number of channels.
For creating the dictionary I use a different image set than for training the SVM.
Here's part of the code:
Ptr<FeatureDetector> detector(cv::xfeatures2d::SIFT::create());
Ptr<DescriptorMatcher> matcher(new BFMatcher(NORM_L2, true));
BOWImgDescriptorExtractor bow_descr(detector, matcher);
bow_descr.setVocabulary(dict);
Mat features_svm;
for (int i = 0; i < num_svm_data; ++i) {
    Mat hist;
    std::vector<KeyPoint> keypoints;
    detector->detect(data_svm[i], keypoints);
    bow_descr.compute(data_svm[i], keypoints, hist);
    features_svm.push_back(hist);
}
data_svm is of type vector<Mat>; it holds the training images I will use for the SVM.
What can the problem be?
I ran into the same issue. I was able to fix it by changing the cross-check option of the brute-force matcher to false. I think it's because the cross-check option keeps only a select few matches and removes the rest, which messes up the indexes in the process.
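In code, the fix is the one-line change below; the rest of the snippet from the question stays the same:
// Cross-checking keeps only mutual best matches, so some query descriptors
// end up with no match and the surviving DMatch entries no longer satisfy
// queryIdx == i, which trips the assertion inside BOWImgDescriptorExtractor.
// Disabling it restores one match per query descriptor:
Ptr<DescriptorMatcher> matcher(new BFMatcher(NORM_L2, false));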

checking duplicate images with ORB

Currently I am working on checking for duplicate images, and I am using ORB for that. The first part is almost complete: I have the descriptor vectors of both images. Now, as the second part, I want to know how to calculate a score from the Hamming distances of the matches, and what the threshold should be for declaring the images duplicates.
img1 = gray_image15
img2 = gray_image25
# Initiate ORB detector
orb = cv2.ORB_create()
# find the keypoints with ORB
kp1 = orb.detect(img1,None)
kp2 = orb.detect(img2,None)
# compute the descriptors with ORB
kp1, des1 = orb.compute(img1, kp1)
kp2, des2 = orb.compute(img2, kp2)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
I just want to know the next step in this process, so that I can ultimately print yes or no for duplicates. I am using OpenCV 3.0.0 with Python 2.7.
Once you obtain the descriptors, you can use a bag-of-words model to cluster the descriptors of the reference image, that is, build a vocabulary (visual words).
Then project the descriptors of the other image on to this vocabulary.
Then you can obtain a histogram showing the distribution of each of the visual words in the two images.
Compare these two histograms using a histogram comparison technique and use a threshold to detect the duplicates. For example, if you use Bhattacharyya distance, a low value means a good match.
I don't have a Python implementation of this, but you can find something similar in C++ here.
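As a rough illustration of that comparison step in C++ (hist1 and hist2 are the BoW histograms of the two images; the function name and the threshold value are illustrative, and the threshold should be tuned on known duplicate and non-duplicate pairs):
#include <opencv2/opencv.hpp>

// A low Bhattacharyya distance means the visual-word distributions of the
// two images are similar, i.e. a likely duplicate.
bool looksLikeDuplicate(const cv::Mat& hist1, const cv::Mat& hist2)
{
    double d = cv::compareHist(hist1, hist2, cv::HISTCMP_BHATTACHARYYA);
    const double threshold = 0.3; // illustrative; tune on labeled pairs
    return d < threshold;
}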

Hu moments and SVM does not work

I have come across a problem when trying to train data with an SVM.
I get several different regions (sets of connected pixels) from face images, and the regions from eyes are very similar, so I want to use Hu moments for shape description and an SVM for training.
But the SVM does not work properly: svm.predict afterwards evaluates everything as non-eye; moreover, the same regions that were labeled and used as eyes in the training phase are evaluated as non-eyes.
The feature data consists only of the 7 Hu moments. I will post some samples of the source code here in a moment, thanks in advance :)
Additional info:
input image:
http://i.stack.imgur.com/GyLO0.png
Setting up a basic SVM for one image:
int image_regions = 10;
Mat training_mat(image_regions, 7, CV_32FC1); // 7 Hu moments per region
Mat labels(image_regions, 1, CV_32FC1);       // labels: 1 (eye) and -1 (non-eye)

// computing the Hu moments of the current region
Moments moments2 = moments(croppedImage, false);
double hu[7];
HuMoments(moments2, hu);

// putting them into the SVM training mat (huCounter == 7)
for (int k = 0; k < huCounter; k++)
    training_mat.at<float>(counter, k) = hu[k]; // counter is the index of the current region
if (isEye(...))
{
    labels.at<float>(counter, 0) = 1.0;
}
else
{
    labels.at<float>(counter, 0) = -1.0;
}

// I use the following (OpenCV 2.x CvSVM API):
CvSVM svm;
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);
// ... do the above mentioned phase, and then:
svm.train(training_mat, labels, Mat(), Mat(), params);
I hope the following suggestions help:
The simplest approach is to use a clustering algorithm and try to cluster the data into two classes. If an algorithm like k-means can do the job, why make things complex by using SVMs and neural nets? I suggest this technique because your feature vector dimension is very small (7 Hu moments), as is your number of samples.
Perform feature normalization (see the normalization point below) to make sure the values fall in a limited range.
Check whether your data is really separable. As your data set is small, take a few samples from positive images and a few from negative images and plot the feature vectors. If you can see the difference visually, surely any learning algorithm can do the job for you. As I said earlier, simple tricks can do better than complex math.
Only if you then decide to use an SVM should you note the following:
• As I can see from your code, you are using a linear SVM; maybe your data is not separable by a linear kernel. Try a polynomial or another kernel. There is also the option bool CvSVM::train_auto in OpenCV; have a look.
• Check whether the feature vector values you are getting are sensible (make sure they are not garbage values).
• Also, perform feature normalization (zero mean and unit variance) before training; a sketch follows at the end of this answer.
• Most importantly, increase the number of training images, both positively and negatively labeled.
• Last but not least, an SVM is not magic; at the end of the day it is just drawing a line between two sets of points, so don't expect it to classify anything you give it as input.
If nothing works: "Just improve your feature extraction technique."
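A minimal sketch of the zero-mean / unit-variance normalization suggested above, for a CV_32F training matrix with one sample per row (the function name is illustrative, not from the question's code):
#include <opencv2/opencv.hpp>

// Normalize each feature column to zero mean and unit variance, in place.
void normalizeColumns(cv::Mat& features) // features: CV_32F, one row per sample
{
    for (int c = 0; c < features.cols; ++c) {
        cv::Mat col = features.col(c); // header sharing the column's data
        cv::Scalar mean, stddev;
        cv::meanStdDev(col, mean, stddev);
        col -= mean[0];
        if (stddev[0] > 1e-12) // guard against constant columns
            col /= stddev[0];
    }
}
Hu moments in particular span several orders of magnitude, so applying a log-magnitude transform to each moment before this normalization is also commonly done.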