OpenCV Random Decision Forest: How to get posterior probability - c++

I did research on multiple websites, but I couldn't find any solution.
Here's the problem:
I am implementing a pixel-wise classification using RTrees from OpenCV. I need the posterior probability for each class. I tried to get it via cv::ml::StatModel::predict(), but the output matrix only contains the predicted value. Is there another way to get the posterior probability from RTrees?
PS: I'm still quite new to Machine Learning, so please forgive my lack of knowledge ^^"

Instead of using cv::ml::StatModel::predict, you could use the cv::ml::RTrees::getVotes member function. This way, in the classification case, you get the number of trees which voted for each class for a given sample. By dividing these vote counts by the forest size you get an approximation of the posterior probabilities.
The getVotes function should be called instead of predict like this:
cv::Mat samples = ...; // one or more samples (their feature vectors), one per row
cv::Mat votes;
classifier.getVotes(samples, votes, 0);
// pass 0 here unless you want to set specific RTrees flags
Be aware that the votes matrix will have one more row than the number of samples. In this first row your class labels are enumerated (in ascending order, if I remember correctly from the OpenCV source code).
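For example, a minimal sketch of turning the vote counts into approximate posteriors (assuming a trained RTrees model in rtrees and a single-row sample; the variable names are only illustrative) could look like this:

cv::Mat votes;
rtrees->getVotes(sample, votes, 0);       // row 0: class labels, row 1: votes for our single sample

int nTrees = 0;
for (int c = 0; c < votes.cols; ++c)
    nTrees += votes.at<int>(1, c);        // the votes sum up to the number of trees

for (int c = 0; c < votes.cols; ++c) {
    int label = votes.at<int>(0, c);
    float posterior = votes.at<int>(1, c) / static_cast<float>(nTrees);
    std::cout << "class " << label << ": " << posterior << std::endl;
}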
The answer is up to date as of the 3.4.1 version of OpenCV.

Related

OpenCv/C++ - Find similarities from a picture with a big database easily

I would like to compare a query image with the pictures in a database (about 2000 of them).
Before posting here I read a lot of papers on methods for matching a picture against a big database, and a lot of posts on Stack Overflow.
The papers contain some interesting material, but they are quite technical and it is hard to understand the algorithms well. (I have only just started to specialize in this field.)
Posts (the most interesting):
Simple and fast method to compare images for similarity ;
Nearest neighbors in high-dimensional data? ;
How to understand Locality Sensitive Hashing? ;
Image fingerprint to compare similarity of many images ;
C++/SIFT/SQL - If there a way to compare efficiently a SIFT descriptor of an image with a SIFT descriptor in a SQL database?
Papers:
Object retrieval with large vocabularies and fast spatial matching,
Image Similarity Search with Compact Data Structures,
LSH,
Near Duplicate Image Detection min-Hash and tf-idf Weighting
Vocabulary tree
Aggregating local descriptors
But I'm still confused.
The first thing I did was to implement BoW. I trained the Bag of Words (with ORB as detector and descriptor, and using VLAD features) on 5 classes in order to test its efficiency. After a long training, I ran it. It worked well, with an accuracy of 94%. That's pretty good.
But there is a problem for me:
I don't want to do classification. My database will contain about 2000 different pictures. I just want to find the best matches between my query and the database.
So if I have 2000 different pictures and I follow that logic, I have to treat these 2000 pictures as 2000 different classes, and obviously that's impossible...
For this first point, do you agree with me? This is obviously not the best way to do what I want, is it?
Maybe there is another way to use BoW to find similarities in the database?
The second thing I did is "simpler".
I compute the descriptors of my query. Then I loop over the whole database, compute the descriptors of each picture, and add each set of descriptors to a vector.
std::vector<cv::Mat> all_descriptors_database;
for (int i = 0; i < 2000; ++i) {
    cv::Mat request = cv::imread(img);                 // img: path of the i-th database image
    computeKeypoints(request);                         // detect the keypoints (ORB)
    computeDescriptors(request);                       // compute descriptors_of_request (ORB)
    all_descriptors_database.push_back(descriptors_of_request);
}
At the end I have a big vector which contains the descriptors of the whole database (and the same for all the keypoints).
Then, this is where I get confused.
At the beginning, I wanted to compute the matching inside the loop, that is to say: for each image in the database, compute its descriptors and match them against the query. But it took a lot of time.
So after reading a lot of papers about how to find similarities in big databases, I found the LSH algorithm, which seems appropriate for that kind of search.
Therefore I wanted to use this method.
So inside my loop I did something like this:
//Create Flann LSH index
cv::flann::Index flannIndex(all_descriptors_database.at(i), cv::flann::LshIndexParams(12, 20, 2), cvflann::FLANN_DIST_HAMMING);
cv::Mat results, dists;
int k=2; // find the 2 nearest neighbors
// search (nearest neighbor)
flannIndex.knnSearch(query_descriptors, results, dists, k, cv::flann::SearchParams() );
However I have some questions:
It took more than 5 seconds to loop over my whole database (2000 images), whereas I thought it would take less than 1 s (in the papers they have huge databases, unlike me, and LSH is supposed to be more efficient there). Did I do something wrong?
I found some libraries on the internet which implement LSH, like http://lshkit.sourceforge.net/ or http://www.mit.edu/~andoni/LSH/. So what is the difference between those libraries and the four lines of code I wrote using OpenCV? I checked the libraries, and for a beginner like me they were very difficult to use; I got a bit confused.
The third thing:
I wanted to compute a kind of fingerprint for each picture's descriptors (in order to compute the Hamming distance against the database), but that does not seem to be possible: OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?
I've been stuck on this task for three days now. I don't know whether I'm on the wrong track or not.
Maybe I missed something.
I hope this is clear enough for you. Thanks for reading.
Your question is kind of big. I'll give you some hints, though.
Bag of Words can work, but classification is unnecessary. A BoW pipeline typically consists of:
keypoint detection - ORB
keypoint description (feature extraction) - ORB
quantization - VLAD (Fisher encoding might be better, but plain old k-means might be enough in your case)
classification - you probably can skip this stage
You can treat the quantization result (e.g. the VLAD encoding) of each image as its fingerprint. Computing the distance between fingerprints then yields a similarity measure. Still, you have to do 1-vs-all matching, which is going to be tremendously expensive once your database gets big enough.
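As a rough sketch of what that 1-vs-all matching could look like (computeVlad and databaseFingerprints are hypothetical placeholders for your own encoding step and for the precomputed fingerprints of the database):

cv::Mat queryFingerprint = computeVlad(query);                 // 1 x D row vector, CV_32F
int best = -1;
double bestDist = std::numeric_limits<double>::max();
for (size_t i = 0; i < databaseFingerprints.size(); ++i) {     // 1-vs-all over the whole database
    double d = cv::norm(queryFingerprint, databaseFingerprints[i], cv::NORM_L2);
    if (d < bestDist) { bestDist = d; best = static_cast<int>(i); }
}
// 'best' now indexes the most similar database image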
I didn't get your point.
I'd suggest reading G. Hinton's papers (e.g. this one) on dimensionality reduction with deep autoencoders and convolutional neural networks. He boasts of beating LSH. As for tools, I'd recommend taking a look at BVLC's Caffe, a great neural network library.

cvSVM training produces poor results for HOGDescriptor

My objective is to train an SVM and get support vectors which I can plug into OpenCV's HOGDescriptor for object detection.
I have gathered ~4000 positives and ~15000 negatives, and I train using the SVM provided by OpenCV. The results give me too many false positives (up to 20 per image). I would clip out the false positives and add them to the pool of negatives to retrain, and at times I would end up with even more false positives! I have tried adjusting the L2HysThreshold of my HOGDescriptor upwards to 300 without significant improvement. Is my pool of positives and negatives large enough?
The SVM training is also much faster than expected. I have tried with feature vector sizes of 2916 and 12996, using grayscale images and color images on separate tries. SVM training has never taken longer than 20 minutes. I use auto_train. I am new to machine learning, but from what I hear, training with a dataset as large as mine should take at least a day, no?
I believe cvSVM is not doing much learning and, according to http://opencv-users.1802565.n2.nabble.com/training-a-HOG-descriptor-td6363437.html, is not suited for this purpose. Does anyone with experience with cvSVM have more input on this?
I am considering using SVMLight http://svmlight.joachims.org/, but it looks like there isn't a way to visualize the SVM hyperplane. What are my options?
I use OpenCV 2.4.3 and have tried the following setups for the HOGDescriptor:
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(5,5);
hog.blockSize = cv::Size(10,10);
hog.blockStride = cv::Size(5,5); //12996 feature vector
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(10,10);
hog.blockSize = cv::Size(20,20);
hog.blockStride = cv::Size(10,10); //2916 feature vector
Your first descriptor dimension is way too large to be of any use. To form any reliable SVM hyperplane, you need at least as many positive and negative samples as your descriptor has dimensions, because ideally you need separating information in every dimension of the hyperplane.
The number of positive and negative samples should be more or less the same unless you provide your SVM trainer with a bias parameter (may not be available in cvSVM).
There is no guarantee that HOG is a good descriptor for the type of problem you are trying to solve. Can you visually confirm that the object you are trying to detect has a distinct shape with similar orientation in all samples? A single type of flower, for example, may have a unique shape, but many types of flowers together do not share one unique shape. A bamboo has a unique shape but may not be easily distinguishable from other objects, or may not have the same orientation in all sample images.
cvSVM is normally not the tool used to train SVMs for OpenCV HOG. Use the binary form of SVMLight (not free for commercial purposes) or libSVM (OK for commercial purposes):
Calculate HOGs for all samples using your C++/OpenCV code and write them to a text file in the correct input format for SVMLight/libSVM.
Use either program to train a model with a linear kernel and the optimal C. Find the optimal C by searching for the best accuracy while varying C in a loop.
Calculate the detector vector (an N+1 dimensional vector, where N is the dimension of your descriptor): take all the support vectors, multiply each by its alpha value, and sum these alpha-weighted support vectors dimension by dimension to obtain an N-dimensional vector. As the last element, append -b, where b is the hyperplane bias (you can find it in the model file produced by SVMLight/libSVM training).
Feed this N+1 dimensional detector to HOGDescriptor::setSVMDetector() and use HOGDescriptor::detect() or HOGDescriptor::detectMultiScale() for detection.
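A minimal sketch of the detector computation described above might look like this (the containers holding the support vectors, alphas and bias are hypothetical; you would fill them by parsing the SVMLight/libSVM model file):

// support_vectors: one support vector per row (N values each)
// alphas: the corresponding alpha coefficients, b: the hyperplane bias from the model file
std::vector<float> computeHogDetector(const std::vector<std::vector<float> >& support_vectors,
                                      const std::vector<float>& alphas,
                                      float b)
{
    const size_t N = support_vectors.front().size();
    std::vector<float> detector(N + 1, 0.0f);
    for (size_t sv = 0; sv < support_vectors.size(); ++sv)
        for (size_t d = 0; d < N; ++d)
            detector[d] += alphas[sv] * support_vectors[sv][d];  // sum of alpha * support vector
    detector[N] = -b;                                            // the last element is -b
    return detector;
}

// usage:
// cv::HOGDescriptor hog;
// hog.setSVMDetector(computeHogDetector(support_vectors, alphas, b));
// hog.detectMultiScale(image, found_locations);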
I have had successful results using SVMLight to learn SVM models when training from OpenCV, but haven't used cvSVM, so can't compare.
The hogDraw function from http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html will visualise your descriptor.

Estimate color distribution with Gaussian mixture model

I am trying to use a mixture of two Gaussians with the EM algorithm to estimate the color distribution of a video frame. For that, I want to use two separate peaks in the color distribution as the two Gaussian means to facilitate the EM calculation. I have several difficulties with implementing this in OpenCV.
My first question is: how can I determine the two peaks? I've searched for peak estimation in OpenCV, but still couldn't find any separate function for it. So I am going to determine two regions and then find their maximum values as the peaks. Is this approach correct?
My second question is: how do I fit a Gaussian mixture model with EM in OpenCV? As far as I know, the cv::EM::predict function can give me the index of the most probable mixture component, but I have difficulties with training EM. I've searched and found some other code, but finding the correct parameters is too difficult for me. Could someone provide some example code for this? Thank you in advance.
#ederman, try {OpenCV library location}\opencv\samples\cpp\em.cpp instead of the web link. I think the sample code in the link is out of date now. I have successfully compiled the sample code in OpenCV 2.3.1. It shouldn't be a problem for 2.4.2.
Good luck:)
My first question is: how can I determine the two peaks?
I would iterate through the range of possible sample values and test where EM::predict(sample)[0] peaks.
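For the training part, a minimal sketch using the OpenCV 2.4 cv::EM API might look like this (the sample matrix is a placeholder you would fill with your frame's pixel colors):

// samples: one pixel color per row, CV_64FC1 (e.g. N x 3 for BGR values)
cv::Mat samples = ...;                               // fill with the frame's pixel colors
cv::EM em(2, cv::EM::COV_MAT_DIAGONAL);              // mixture of two Gaussians
em.train(samples);

cv::Mat sample = samples.row(0);                     // a single color to classify
cv::Mat posteriors;                                  // per-component posterior probabilities
cv::Vec2d result = em.predict(sample, posteriors);
// result[0] = log-likelihood of the sample, result[1] = index of the most probable component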

OpenCV's KNN Unknown Classifications

At the moment I am using OpenCV's KNN implementation to classify images. It currently classifies images into P, S or rectangle, and it does so correctly. However, if I feed it an image of noise, it will attempt to classify it as one of the 3 classes I stated earlier. To get it to classify noise as such, should I train the KNN with a 'noise' category, or is there some kind of accuracy rating I can use?
The way to do it is to use the dists output of the find_nearest function. It gives you the distance between your vector and the K nearest vectors; the further the distance, the less they have in common with the test data.
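For instance, with the CvKNearest interface, a rough sketch of rejecting a sample as noise could look like this (knn is your trained classifier, sampleRow the feature vector of the test image, and the threshold is a hypothetical value you would tune on validation data):

int K = 5;
cv::Mat results, neighborResponses, dists;
float response = knn.find_nearest(sampleRow, K, results, neighborResponses, dists);

double meanDist = cv::mean(dists)[0];      // average distance to the K nearest neighbors
const double noiseThreshold = 1000.0;      // hypothetical threshold
if (meanDist > noiseThreshold)
    std::cout << "unknown / noise" << std::endl;
else
    std::cout << "class " << response << std::endl;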
Yes, but I wouldn't advise it. If you have a classifier which is good at distinguishing between oranges and apples, you shouldn't try making it recognize "not a fruit". First, because you can feed wrong inputs to almost anything; second, because it will lower the original performance; and third, because you would need the noise to have a pattern. How do you define noise?

Implementation of Non Local Means Noise reduction algorithm in image processing

I am working on an implementation of the Non-Local Means noise reduction algorithm in C++. There are papers on this algorithm (such as this paper), but they are not very clear about it either.
I know it uses a weighted mean, but I don't know what the search window is used for here and how it is related to the comparison window.
Being a new user, Stack Overflow is not allowing me to upload images, but you can find the formula in the NL-means section of the link provided above.
From the paper you refer to: when determining the result value for a given pixel p, all the other pixels of the image are weighted and summed according to the similarity between their neighborhoods and the neighborhood of the pixel p.
But that is computationally very expensive, so the authors restrict the set of pixels which contribute to the weighted sum; that must be what you call the search window. This search window is a 21x21 region centered on the pixel p. The neighborhoods being compared are of size 7x7 (section 5).
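To make the relationship concrete, here is a naive, unoptimized sketch of that weighted mean for a single grayscale pixel with those window sizes (no border handling; the paper additionally applies a Gaussian weighting inside the patch distance, which is omitted here, and h is the filtering parameter):

double nlMeansPixel(const cv::Mat& img, int px, int py, double h)
{
    const int searchRadius = 10;   // 21x21 search window centered on p
    const int patchRadius  = 3;    // 7x7 comparison window
    double weightedSum = 0.0, weightTotal = 0.0;

    for (int qy = py - searchRadius; qy <= py + searchRadius; ++qy) {
        for (int qx = px - searchRadius; qx <= px + searchRadius; ++qx) {
            // squared distance between the neighborhood of p and the neighborhood of q
            double dist2 = 0.0;
            for (int dy = -patchRadius; dy <= patchRadius; ++dy)
                for (int dx = -patchRadius; dx <= patchRadius; ++dx) {
                    double diff = img.at<uchar>(py + dy, px + dx) - img.at<uchar>(qy + dy, qx + dx);
                    dist2 += diff * diff;
                }
            double w = std::exp(-dist2 / (h * h));   // similarity weight for pixel q
            weightedSum += w * img.at<uchar>(qy, qx);
            weightTotal += w;
        }
    }
    return weightedSum / weightTotal;                // the weighted mean is the denoised value at p
}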
I was able to put together a quick prototype in Mathematica, and I can confirm it becomes very costly as the size of the search window increases. I expect the same behavior when you implement it in C++.
There's some GPL'd C++ code along with a brief writeup of the algorithm by the original authors here: http://www.ipol.im/pub/algo/bcm_non_local_means_denoising/
This has since been added to OpenCV:
http://docs.opencv.org/modules/photo/doc/denoising.html
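If you just want results rather than your own implementation, the OpenCV function can be used like this (the parameter values are only a reasonable starting point to tune):

#include <opencv2/photo/photo.hpp>

cv::Mat noisy = cv::imread("frame.png", 0);          // load as grayscale
cv::Mat denoised;
// h = 10 (filter strength), 7x7 template (comparison) window, 21x21 search window
cv::fastNlMeansDenoising(noisy, denoised, 10, 7, 21);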