OpenCV Neural Network Sigmoid Output - C++

I have been using OpenCV for quite some time. Lately I decided to check its power for machine learning, so I ended up implementing a neural network for face recognition. To summarize my strategy for face recognition:
Read images from a csv of some face database.
Roll images to a Mat array row wise.
Apply PCA for dimensionality reduction.
Use projections of PCA to train the network.
Predict the test data using the trained network.
Everything was OK until the prediction stage. I was using the output unit with the maximum response to classify the face. Normally OpenCV's sigmoid implementation should give values in the range -1 to 1, as stated in the docs, where 1 means the closest match to the class. After I got nearly 0 accuracy I checked the output responses for each class for each test sample. I was surprised by values like 14.53, -1.7, #IND. If the sigmoid was applied, how could I get these values? What am I doing wrong?
To help you understand the matter, and for anyone wondering how to apply PCA and use it with a NN, I'm sharing my code:
Reading csv:
void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';')
{
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file)
    {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(1, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line))
    {
        stringstream liness(line);
        getline(liness, path, separator);
        getline(liness, classlabel);
        if(!path.empty() && !classlabel.empty())
        {
            Mat im = imread(path, 0);
            images.push_back(im);
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}
Rolling images row by row:
Mat rollVectortoMat(const vector<Mat> &data)
{
    Mat dst(static_cast<int>(data.size()), data[0].rows*data[0].cols, CV_32FC1);
    for(unsigned int i = 0; i < data.size(); i++)
    {
        Mat image_row = data[i].clone().reshape(1,1);
        Mat row_i = dst.row(i);
        image_row.convertTo(row_i, CV_32FC1, 1/255.);
    }
    return dst;
}
Converting the vector of labels to a Mat of labels:
Mat getLabels(const vector<int> &data, int classes = 20)
{
    // zero-initialize so that only the target class column of each row is set to 1
    Mat labels = Mat::zeros((int)data.size(), classes, CV_32FC1);
    for(int i = 0; i < data.size(); i++)
    {
        int cls = data[i] - 1;
        labels.at<float>(i, cls) = 1.0;
    }
    return labels;
}
MAIN
int main()
{
    PCA pca;

    vector<Mat> images_train;
    vector<Mat> images_test;
    vector<int> labels_train;
    vector<int> labels_test;

    read_csv("train1k.txt", images_train, labels_train);
    read_csv("test1k.txt", images_test, labels_test);

    Mat rawTrainData = rollVectortoMat(images_train);
    Mat rawTestData  = rollVectortoMat(images_test);
    Mat trainLabels  = getLabels(labels_train);
    Mat testLabels   = getLabels(labels_test);

    int pca_size = 500;
    Mat trainData(rawTrainData.rows, pca_size, rawTrainData.type());
    Mat testData(rawTestData.rows, pca_size, rawTestData.type());

    pca(rawTrainData, Mat(), CV_PCA_DATA_AS_ROW, pca_size);

    for(int i = 0; i < rawTrainData.rows; i++)
        pca.project(rawTrainData.row(i), trainData.row(i));

    for(int i = 0; i < rawTestData.rows; i++)
        pca.project(rawTestData.row(i), testData.row(i));

    Mat layers = Mat(3, 1, CV_32SC1);
    int sz = trainData.cols;

    layers.row(0) = Scalar(sz);
    layers.row(1) = Scalar(1000);
    layers.row(2) = Scalar(20);

    CvANN_MLP mlp;
    CvANN_MLP_TrainParams params;
    CvTermCriteria criteria;

    criteria.max_iter = 1000;
    criteria.epsilon  = 0.00001f;
    criteria.type     = CV_TERMCRIT_ITER | CV_TERMCRIT_EPS;

    params.train_method    = CvANN_MLP_TrainParams::BACKPROP;
    params.bp_dw_scale     = 0.1f;
    params.bp_moment_scale = 0.1f;
    params.term_crit       = criteria;

    mlp.create(layers, CvANN_MLP::SIGMOID_SYM);
    int i = mlp.train(trainData, trainLabels, Mat(), Mat(), params);

    int t = 0, f = 0;
    for(int i = 0; i < testData.rows; i++)
    {
        Mat response(1, 20, CV_32FC1);
        Mat sample = testData.row(i);

        mlp.predict(sample, response);

        float max = -1000000000000.0f;
        int cls = -1;
        for(int j = 0; j < 20; j++)
        {
            float value = response.at<float>(0, j);
            if(value > max)
            {
                max = value;
                cls = j + 1;
            }
        }

        if(cls == labels_test[i])
            t++;
        else
            f++;
    }
    return 0;
}
NOTE: I used the first 20 classes of the AT&T face database for my dataset.

Thanks to Canberk Baci's comment I managed to overcome the sigmoid output discrepancy. The problem seems to be in the default parameters of mlp's create function, which takes alpha and beta as 0 by default. When they are both given as 1, the sigmoid function works as stated in the docs and the neural network can predict something, though with errors of course.
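For reference, a minimal sketch of the corrected call (based on the OpenCV 2.4 CvANN_MLP::create signature, where the third and fourth arguments are the activation function's alpha and beta):
// pass alpha = 1 and beta = 1 explicitly instead of relying on the 0,0 defaults
mlp.create(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);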
And for the results of the neural network:
By modifying some parameters like momentum, and without any illumination correction algorithm, I got 72% accuracy on a dataset (randomly sampled 936 training and 262 test images) of the first 20 classes of CroppedYaleB from the OpenCV tutorials. As for other factors that affect accuracy: when I applied PCA, I directly gave the reduced dimension size as 500. This may also reduce accuracy, because the retained variance may be below 95% or worse. So when I have free time I will apply these to increase accuracy:
Tan-Triggs illumination correction
Train PCA with 0.95 as the retained-variance parameter, to keep 95% of the variance (see the sketch after this list).
Modify the neural network parameters (I wish we had a less parametric NN in the OpenCV library).
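A minimal sketch of that retained-variance PCA step (this assumes an OpenCV build whose cv::PCA constructor accepts a retained-variance fraction instead of a component count):
// keep enough principal components to retain 95% of the variance
PCA pca(rawTrainData, Mat(), CV_PCA_DATA_AS_ROW, 0.95);
// project() also accepts a whole matrix of row samples at once
Mat trainData = pca.project(rawTrainData);
Mat testData  = pca.project(rawTestData);
// the input layer size would then be trainData.cols instead of a fixed 500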
I shared these for anyone wondering how to increase the classification accuracy of a NN. I hope it helps.
By the way you can track the issue about this here:
http://code.opencv.org/issues/3583

Related

Convolution using FFT gives a bad result

I'm trying to convolve an image using the FFT. I use OpenCV, so images are in Mat containers. I convert the color image to a gray image, then add a second, all-zero channel for the imaginary numbers. Then I take this 2-channel Mat and convolve it with Prewitt's kernel. I get a result very different from the one I get with the normal convolution algorithm. The left image is the output I get using the FFT and the right image is the output of normal convolution.
Below is the pseudo-algorithm of how I do the operation:
Convert image Mat and kernel Mat to complex Mats by adding second channel (Result Mat type is CV_32FC2)
Assign all Mat elements to complex vectors
Zero pad the vectors to the same next power of 2
FFT the vectors
Signal multiply both vectors elementwise and assign result to result vector
Inverse FFT the result vector
Convert result vector to Mat
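A note on the zero-padding step above: for the product of the two spectra to correspond to a linear rather than a circular convolution, the padded length should be at least the sum of the two original lengths minus one (a general DFT property, not specific to this code). A small sketch of that size computation, using imVec and kerVec as in the code below:
// linear convolution needs length >= imVec.size() + kerVec.size() - 1,
// rounded up to a power of two for the radix-2 FFT
size_t paddedLength = 1;
while (paddedLength < imVec.size() + kerVec.size() - 1)
    paddedLength <<= 1;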
I think the FFT algorithm is not the problem, because when I take an image, FFT it, then inverse-FFT it, I get the original image back just fine. But I could be wrong. So here is the FFT algorithm. Notice that there are two of them; I use the second one. I also tried other FFT algorithms and they all output the same. FFT'ing and IFFT'ing the same image only skips the signal multiplication step above, so I think that's where the problem is. Here is the code of the operation:
std::vector<cf> signalMultiplication(std::vector<cf> lh, std::vector<cf> rh) {
    std::vector<cf> imVec = lh, kerVec = rh, resultVec;
    resultVec.resize(imVec.size());
    std::transform(imVec.begin(), imVec.end(), kerVec.begin(), resultVec.begin(), std::multiplies<cf>());
    return resultVec;
}
I tried multiplying them using a for loop, but the result was the same. I don't know where the problem is and I can't post the whole code here since it is too long, so tell me where you think the problem is and I'll give the code of that part.
@Paul, below is the main body of the code:
cv::Mat convolution2D(cv::Mat image, cv::Mat kernel) {
    cv::Mat imMat, kerMat;
    imMat = convertToComplexMat(image);
    kerMat = convertToComplexMat(kernel);

    std::vector<cf> imVec, kerVec, resultVec;
    imVec = matElementsToVector<cf>(imMat);
    kerVec = matElementsToVector<cf>(kerMat);

    float power = log2f(imVec.size());
    if (abs(power - (int)power) == 0)
        power++;
    else
        power = ceil(power);
    zeroPadding(imVec, power);
    zeroPadding(kerVec, power);

    //FFT code I linked takes valarray as argument so I convert vectors to valarray and back
    std::valarray<cf> imCArr(imVec.data(), imVec.size());
    std::valarray<cf> kerCArr(kerVec.data(), kerVec.size());
    fftRosetta(imCArr);
    fftRosetta(kerCArr);
    imVec.assign(std::begin(imCArr), std::end(imCArr));
    kerVec.assign(std::begin(kerCArr), std::end(kerCArr));

    resultVec = signalMultiplication(imVec, kerVec);

    std::valarray<cf> resCArr(resultVec.data(), resultVec.size());
    ifftRosetta(resCArr);
    resultVec.assign(std::begin(resCArr), std::end(resCArr));

    cv::Mat resultMat;
    resultMat = vectorToMatElementsRowMajor(resultVec, imMat.rows, imMat.cols, imMat.type());

    std::vector<cv::Mat> matVec;
    cv::split(resultMat, matVec);
    return matVec[0];
}
These are the custom functions:
convertToComplexMat, matElementsToVector, zeroPadding, fftRosetta, ifftRosetta, signalMultiplication, vectorToMatElementsRowMajor
signalMultiplication is posted above and fftRosetta/ifftRosetta are linked, so here are the rest of the functions:
using cf = std::complex<float>;

cv::Mat convertToComplexMat(cv::Mat imageMat) {
    cv::Mat matOper;
    if (imageMat.channels() == 3)
        cv::cvtColor(imageMat, matOper, cv::COLOR_BGR2GRAY);
    else
        matOper = imageMat.clone();
    matOper.convertTo(matOper, CV_32FC1);

    cv::Mat compChannel = cv::Mat::zeros(matOper.rows, matOper.cols, CV_32FC1);

    std::vector<cv::Mat> channels;
    channels.push_back(matOper);
    channels.push_back(compChannel);

    cv::merge(channels, matOper);
    return matOper;
}

template <typename T>
std::vector<T> matElementsToVector(cv::Mat operand) {
    std::vector<T> vecOper;
    int cn = operand.channels();
    int lele = operand.total();
    for (int i = 0; i < operand.total(); i++) {
        if (cn == 1)
            vecOper.push_back(operand.at<cv::Vec<T, 1>>(i)[0]);
        else if (cn == 2) {
            if (typeid(T) == typeid(cf)) {
                T xd = operand.at<T>(i);
                vecOper.push_back(xd);
            }
            else
                for (int k = 0; k < cn; k++)
                    vecOper.push_back(operand.at<cv::Vec<T, 2>>(i)[k]);
        }
        else if (cn == 3)
            for (int k = 0; k < cn; k++)
                vecOper.push_back(operand.at<cv::Vec<T, 3>>(i)[k]);
    }
    return vecOper;
}

void zeroPadding(std::vector<cf>& a, int power) {
    int p, ioper;
    if (power == -1)
        p = ceil(log2f(a.size()));
    else
        p = power;
    ioper = pow(2, p);
    int size = a.size();
    for (int i = 0; i < ioper - size; i++) {
        a.push_back(0);
    }
}

template <typename T>
cv::Mat vectorToMatElementsRowMajor(std::vector<T> operand, int mrows, int mcols, int mtype) {
    cv::Mat matoper(mrows, mcols, mtype);
    for (int j = 0; j < matoper.total(); j++) {
        matoper.at<T>(j) = operand[j];
    }
    return matoper;
}
@Cris, I tried it again with the OpenCV DFT as you said, following the directions here. I applied the DFT to the image and the kernel, then element-wise multiplied them, then applied the IDFT. But the result is something very different now. I can see a resemblance of the original image in there, but there are multiple shadows of it at different angles. I think the problem is how I do the signal multiplication, but I can't find any answers on how to multiply 2D signals. Here is the code; the output image is below it:
cv::Mat convolution2DopenCV(cv::Mat image, cv::Mat kernel) {
    cv::Mat paddedImage, paddedKernel, imgOper, kerOper;
    if (image.channels() == 3)
        cv::cvtColor(image, imgOper, cv::COLOR_BGR2GRAY);
    else
        imgOper = image.clone();
    kerOper = kernel;

    int m = cv::getOptimalDFTSize(imgOper.rows);
    int n = cv::getOptimalDFTSize(imgOper.cols);
    cv::copyMakeBorder(imgOper, paddedImage, 0, m - imgOper.rows, 0, n - imgOper.cols, cv::BORDER_CONSTANT, cv::Scalar::all(0));
    cv::copyMakeBorder(kerOper, paddedKernel, 0, m - kerOper.rows, 0, n - kerOper.cols, cv::BORDER_CONSTANT, cv::Scalar::all(0));

    cv::Mat planesImage[] = { cv::Mat_<float>(paddedImage), cv::Mat::zeros(paddedImage.size(), CV_32F) };
    cv::Mat cmpImgMat;
    cv::merge(planesImage, 2, cmpImgMat);
    cv::dft(cmpImgMat, cmpImgMat);

    cv::Mat planesKernel[] = { cv::Mat_<float>(paddedKernel), cv::Mat::zeros(paddedKernel.size(), CV_32F) };
    cv::Mat cmpKerMat;
    cv::merge(planesKernel, 2, cmpKerMat);
    cv::dft(cmpKerMat, cmpKerMat);

    cv::Mat resultMat = cmpImgMat.mul(cmpKerMat);

    cv::Mat planes[2];
    cv::idft(resultMat, resultMat);
    cv::split(resultMat, planes);
    cv::normalize(planes[0], planes[0], 0, 255, cv::NORM_MINMAX);
    return planes[0];
}
That's everything; if there is something I'm missing, let me know.
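For what it's worth, Mat::mul on a two-channel Mat multiplies the channels independently instead of performing complex multiplication, so the spectra are not combined as complex numbers there. OpenCV has cv::mulSpectrums for per-element complex multiplication of DFT results; a minimal sketch of that step (not a verified fix for the exact artifact above):
// per-element complex multiplication of the two spectra
cv::Mat spectrumProduct;
cv::mulSpectrums(cmpImgMat, cmpKerMat, spectrumProduct, 0, false /*conjB*/);
cv::idft(spectrumProduct, spectrumProduct, cv::DFT_SCALE);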

SVM predict on OpenCV: how can I extract the same number of features

I am playing with OpenCV and SVM to make a classifier to predict facial expressions. I have no problem classifying the test dataset, but when I try to predict a new image, I get this:
OpenCV Error: Assertion failed (samples.cols == var_count && samples.type() == CV_32F) in cv::ml::SVMImpl::predict
The error is pretty clear: I have a different number of columns, but of the same type.
I do not know how to achieve that, because I have a matrix of dimensions 1 x number_of_features, but number_of_features is not the same as for the trained and tested samples. How can I extract the same number of features from another image? Am I missing something?
To train the classifier I did:
Detect face and save ROI;
SIFT to extract features;
kmeans to cluster them;
bag of words to get the same number of features for each image;
PCA to reduce;
train on the train dataset;
predict on the test dataset;
On the new image I did the same thing.
I tried to resize the new image to the same size, but nothing changed: same error (and a different number of columns, aka features). The vectors are of the same type (CV_32F).
[EDIT 1] Let's try to be more specific.
After successfully training my classifier, I save the SVM model in this way:
svmClassifier->save(baseDatabasePath);
Then I load it when I need to do real time prediction in this way
cv::Ptr<cv::ml::SVM> svmClassifier;
svmClassifier = cv::ml::StatModel::load<ml::SVM>(path);
Then loop,
while (true)
{
    getOneImage();
    cv::Mat feature = extractFeaturesFromSingleImage();
    float labelPredicted = svmClassifier->predict(feature);
    cout << "Label predicted is: " << labelPredicted << endl;
}
But predict returns the error. The feature dimension is 1x66, for example. As you can see below, I need something like 140 features:
<?xml version="1.0"?>
<opencv_storage>
<opencv_ml_svm>
<format>3</format>
<svmType>C_SVC</svmType>
<kernel>
<type>RBF</type>
<gamma>5.0625000000000009e-01</gamma></kernel>
<C>1.2500000000000000e+01</C>
<term_criteria><epsilon>1.1920928955078125e-07</epsilon>
<iterations>1000</iterations></term_criteria>
<var_count>140</var_count>
<class_count>7</class_count>
<class_labels type_id="opencv-matrix">
<rows>7</rows>
<cols>1</cols>
<dt>i</dt>
<data>
0 1 2 3 4 5 6</data></class_labels>
<sv_total>172</sv_total>
I do not know how to achieve 140 features, when SIFT, FAST or SURF just give me around 60 features. What am I missing?
EDIT 2: I am going to try to be more formal: how can I put my real-time sample into the same dimension as the train and test datasets?
EDIT 3:
Extract features with SIFT and push them onto a vector of Mat:
std::vector<cv::Mat> featuresVector;
for (int i = 0; i < numberImages; ++i)
{
    cv::Mat face = cv::imread(facePath, CV_LOAD_IMAGE_GRAYSCALE);
    cv::Mat featuresExtracted = runExtractFeature(face, featuresExtractionAlgorithm);
    featuresVector.push_back(featuresExtracted);
}
Get the total number of features extracted from all images:
int numberFeatures = 0;
for (int i = 0; i < featuresVector.size(); ++i)
{
    numberFeatures += featuresVector[i].rows;
}
Prepare a Mat to cluster the features (I tried to follow this example):
cv::Mat featuresData = cv::Mat::zeros(numberFeatures, featuresVector[0].cols, CV_32FC1);
int currentIndex = 0;
for (int i = 0; i < featuresVector.size(); ++i)
{
    featuresVector[i].copyTo(featuresData.rowRange(currentIndex, currentIndex + featuresVector[i].rows));
    currentIndex += featuresVector[i].rows;
}
Perform clustering (I do not know how well these parameters suit my case, but I think they are OK for now):
cv::Mat labels;
cv::Mat centers;
int binSize = 1000;
kmeans(featuresData, binSize, labels, cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 100, 1.0), 3, KMEANS_PP_CENTERS, centers);
Prepare a Mat to perform BoW:
cv::Mat featuresDataHist = cv::Mat::zeros(numberImages, binSize, CV_32FC1);
currentIndex = 0; // reset the running offset into the kmeans labels before building the histograms
for (int i = 0; i < numberImages; ++i)
{
    cv::Mat feature = cv::Mat::zeros(1, binSize, CV_32FC1);
    int numberImageFeatures = featuresVector[i].rows;
    for (int j = 0; j < numberImageFeatures; ++j)
    {
        int bin = labels.at<int>(currentIndex + j);
        feature.at<float>(0, bin) += 1;
    }
    cv::normalize(feature, feature);
    feature.copyTo(featuresDataHist.row(i));
    currentIndex += featuresVector[i].rows;
}
PCA to try to reduce the dimensionality:
cv::PCA pca(featuresDataHist, cv::Mat(), CV_PCA_DATA_AS_ROW, 50/*0.90*/);
cv::Mat feature;
for (int i = 0; i < numberImages; ++i)
{
    feature = pca.project(featuresDataHist.row(i));
}
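For reference, one way to put a single new image into the same dimension as the training data is to run it through the same vocabulary (the kmeans centers) and the same fitted pca object, so that it ends up with the same number of columns as var_count. A rough sketch under those assumptions (extractFeaturesFromSingleImage is assumed here to return the raw SIFT descriptors of the new face, one per row; centers, binSize and pca are the ones built above; the nearest-center search is a plain L2 loop, just for illustration):
cv::Mat descriptors = extractFeaturesFromSingleImage(); // raw SIFT descriptors, CV_32F
cv::Mat hist = cv::Mat::zeros(1, binSize, CV_32FC1);
for (int r = 0; r < descriptors.rows; ++r)
{
    // assign this descriptor to its nearest kmeans center (visual word)
    int bestCenter = 0;
    double bestDist = cv::norm(descriptors.row(r), centers.row(0), cv::NORM_L2);
    for (int c = 1; c < centers.rows; ++c)
    {
        double d = cv::norm(descriptors.row(r), centers.row(c), cv::NORM_L2);
        if (d < bestDist) { bestDist = d; bestCenter = c; }
    }
    hist.at<float>(0, bestCenter) += 1;
}
cv::normalize(hist, hist);
// project with the same PCA that was fitted on the training histograms,
// so the result has the same number of columns as the train/test features
cv::Mat feature = pca.project(hist);
float labelPredicted = svmClassifier->predict(feature);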

In OpenCV, What does the function CvSVM::predict() returns 0 imply?

I was trying to use the SVM included in the OpenCV library. I made two labels: either 1 or -1. However, sometimes the predict() function returns 0. (It works as expected sometimes, when testing cases are close to my training data, but fails when the testing case is far away from, or exactly the same as, my training data.) My guess is that the data can't be separated linearly. I couldn't find any information online about what a return value of 0 implies.
Also, should I try to use the method for non-linearly separable data? What should the settings be for CvSVMParams?
Thank you.
Optional information: I am working on color recognition, so my training cases are all the pixels from a sample photo. Since each pixel contains 3 values (RGB), my array is like arr[numOfPixels][3].
int main(void){
    //read two source img for the training data
    Mat color1 = imread("zi.png");
    Mat color2 = imread("lv.png");

    //mark 0-2499 to label 1 and 2500-5000 to label -1
    float labels[5000] = {1.0};
    for(int c = 2500; c < 5000; c++){
        labels[c] = -1.0;
    }
    Mat LabelsMat(5000, 1, CV_32FC1, labels);

    //get 5000 pixels training data
    float trainingdata[5000][3] = {{0,0,0}};
    int count = 0;
    for(int i = 0; i < 50; i++){
        for(int j = 0; j < 50; j++){
            trainingdata[count][0] = (float)color1.at<Vec3b>(i,j)[0];
            trainingdata[count][1] = (float)color1.at<Vec3b>(i,j)[1];
            trainingdata[count][2] = (float)color1.at<Vec3b>(i,j)[2];
            cout<<trainingdata[count][0]<<","<<trainingdata[count][1]<<","<<trainingdata[count][2]<<endl;
            count++;
        }
    }
    cout<<"green"<<endl;
    for(int i = 0; i < 50; i++){
        for(int j = 0; j < 50; j++){
            trainingdata[count][0] = (float)color2.at<Vec3b>(i,j)[0];
            trainingdata[count][1] = (float)color2.at<Vec3b>(i,j)[1];
            trainingdata[count][2] = (float)color2.at<Vec3b>(i,j)[2];
            cout<<trainingdata[count][0]<<","<<trainingdata[count][1]<<","<<trainingdata[count][2]<<endl;
            count++;
        }
    }
    Mat trainingDataMat(5000, 3, CV_32FC1, trainingdata);

    //sets SVM Parameters
    CvSVMParams params;
    params.svm_type = CvSVM::C_SVC;
    params.kernel_type = CvSVM::LINEAR;
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);

    //train the SVM
    CvSVM SVM;
    SVM.train(trainingDataMat, LabelsMat, Mat(), Mat(), params);

    //here's a quick test case
    Mat sampleMat = (Mat_<float>(1,3) << 70, 43, 65);
    cout<<(float)SVM.predict(sampleMat);
    system("pause");

    //set up the full testcase
    Mat testcase = imread("zi.png");
    if(!testcase.data) system("pause");
    cout<<"testcase"<<endl;
    float testdata[900][3] = {{0,0,0}};
    count = 0;
    for(int i = 0; i < 30; i++){
        for(int j = 0; j < 30; j++){
            testdata[count][0] = (float)testcase.at<Vec3b>(i,j)[0];
            testdata[count][1] = (float)testcase.at<Vec3b>(i,j)[1];
            testdata[count][2] = (float)testcase.at<Vec3b>(i,j)[2];
            cout<<testdata[count][0]<<","<<testdata[count][1]<<","<<testdata[count][2]<<endl;
            count++;
        }
    }

    //test the testcase
    for(int i = 0; i < 900; i++){
        float temp[3] = {50,50,50};
        Mat sampleMat(1, 3, CV_32FC1, temp);
        cout<< SVM.predict(sampleMat)<<endl;
    }
    system("pause");
    return 0;
}
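As a follow-up on the CvSVMParams question: for data that is not linearly separable, one common setup is an RBF kernel. A rough sketch with hand-picked values (just a guess, not a tuned configuration; CvSVM::train_auto can cross-validate C and gamma instead):
CvSVMParams params;
params.svm_type    = CvSVM::C_SVC;
params.kernel_type = CvSVM::RBF;   // non-linear kernel
params.gamma       = 0.5;          // guessed RBF width parameter
params.C           = 10;           // guessed soft-margin penalty
params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);

CvSVM SVM;
SVM.train(trainingDataMat, LabelsMat, Mat(), Mat(), params);
// or let OpenCV grid-search the parameters:
// SVM.train_auto(trainingDataMat, LabelsMat, Mat(), Mat(), params);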

Texture Analysis using Local Binary Patterns (faces module)

Problem:
Since I didn't find an implementation of LBP in the OpenCV lib, I added the faces module to my OpenCV build, which does have an LBP implementation.
I'm trying to do texture analysis using local binary patterns (LBP) to determine what is road, what are lines, and what isn't road. So I divided my problem up into 3 'labels' (road / lines / the rest).
After that I took 3 random sample images and made subimages with these labels. In other words, I selected the road, the lines, and all the rest, and used this as input for the training.
Training:
for (int i = 0; i < names.size(); i++) {
    Mat trainingData = imread(names[i]), output;
    if (!trainingData.empty()) {
        cvtColor(trainingData, output, CV_RGB2GRAY);
        int pos = names[i].find_last_of('_');
        cv::String l = names[i].substr(pos + 1, 1);
        const char *sub = l.c_str();
        int label = atoi(sub);
        grayscaleImages.push_back(output);
        labels.push_back(label);
    }
}
Ptr<LBPHFaceRecognizer> lbp = createLBPHFaceRecognizer(2, 4, 10, 10);
lbp->train(grayscaleImages, labels);
Using LBP to predict:
for (int i = 0; i < dataset.images.size(); i++) {
    Mat image = (dataset.images[i]).clone();
    Mat original = image.clone();
    for (int i = 0; i < image.size().width; i += squareSize) {
        for (int j = image.size().height / 2; j < image.size().height; j += squareSize) {
            Mat subImage = image(Range(j, j + squareSize), Range(i, i + squareSize));
            subImage = applySobel(subImage);
            int predict = lbp->predict(subImage);
        }
    }
    imshow("Image", original);
    waitKey(0);
}
My question is: why isn't this working? I'm kind of puzzled about the prediction as well; it doesn't seem to be anywhere near what I want it to do.
Do I have to implement LBP myself? Does the LBP implementation of the faces module do something special?
Do you have any suggestions for other texture analysis methods, specifically for the road / no-road case?
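For reference (in case I do have to implement it myself), the basic LBP operator is small enough to sketch: a plain 3x3 LBP over a grayscale image, which is not necessarily what the faces module does internally (LBPH additionally builds histograms over a grid of cells):
// plain 3x3 LBP: each pixel becomes an 8-bit code of sign comparisons with its neighbors
Mat basicLBP(const Mat &gray) // gray is CV_8UC1
{
    Mat lbp = Mat::zeros(gray.rows, gray.cols, CV_8UC1);
    for (int y = 1; y < gray.rows - 1; y++) {
        for (int x = 1; x < gray.cols - 1; x++) {
            uchar center = gray.at<uchar>(y, x);
            uchar code = 0;
            code |= (gray.at<uchar>(y - 1, x - 1) >= center) << 7;
            code |= (gray.at<uchar>(y - 1, x    ) >= center) << 6;
            code |= (gray.at<uchar>(y - 1, x + 1) >= center) << 5;
            code |= (gray.at<uchar>(y,     x + 1) >= center) << 4;
            code |= (gray.at<uchar>(y + 1, x + 1) >= center) << 3;
            code |= (gray.at<uchar>(y + 1, x    ) >= center) << 2;
            code |= (gray.at<uchar>(y + 1, x - 1) >= center) << 1;
            code |= (gray.at<uchar>(y,     x - 1) >= center) << 0;
            lbp.at<uchar>(y, x) = code;
        }
    }
    return lbp;
}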
So here is a general overview of the files I'm using.
Input:
Selecting our training data:
Results of selecting: (every file has a _number to specify the label)
Final result after training on a random image of the same set. Purple represents road, green is line (or road marking).

OpenCV: color extraction based on Gaussian mixture model

I am trying to use the OpenCV EM algorithm to do color extraction. I am using the following code, based on the example in the OpenCV documentation:
cv::Mat capturedFrame ( height, width, CV_8UC3 );
int i, j;
int nsamples = 1000;
cv::Mat samples ( nsamples, 2, CV_32FC1 );
cv::Mat labels;
cv::Mat img = cv::Mat::zeros ( height, height, CV_8UC3 );
img = capturedFrame;
cv::Mat sample ( 1, 2, CV_32FC1 );
CvEM em_model;
CvEMParams params;
samples = samples.reshape ( 2, 0 );

for ( i = 0; i < N; i++ )
{
    //from the training samples
    cv::Mat samples_part = samples.rowRange ( i*nsamples/N, (i+1)*nsamples/N);
    cv::Scalar mean (((i%N)+1)*img.rows/(N1+1),((i/N1)+1)*img.rows/(N1+1));
    cv::Scalar sigma (30,30);
    cv::randn(samples_part,mean,sigma);
}
samples = samples.reshape ( 1, 0 );

//initialize model parameters
params.covs      = NULL;
params.means     = NULL;
params.weights   = NULL;
params.probs     = NULL;
params.nclusters = N;
params.cov_mat_type       = CvEM::COV_MAT_SPHERICAL;
params.start_step         = CvEM::START_AUTO_STEP;
params.term_crit.max_iter = 300;
params.term_crit.epsilon  = 0.1;
params.term_crit.type     = CV_TERMCRIT_ITER|CV_TERMCRIT_EPS;

//cluster the data
em_model.train ( samples, Mat(), params, &labels );

cv::Mat probs;
probs = em_model.getProbs();
cv::Mat weights;
weights = em_model.getWeights();

cv::Mat modelIndex = cv::Mat::zeros ( img.rows, img.cols, CV_8UC3 );
for ( i = 0; i < img.rows; i ++ )
{
    for ( j = 0; j < img.cols; j ++ )
    {
        sample.at<float>(0) = (float)j;
        sample.at<float>(1) = (float)i;
        int response = cvRound ( em_model.predict ( sample ) );
        modelIndex.data [ modelIndex.cols*i + j] = response;
    }
}
My questions here are:
Firstly, I want to extract each model, five in total here, and then store the corresponding pixel values in five different matrices. In this case, I could have five different colors separately. Here I only obtained their indexes; is there any way to get their corresponding colors? To make it easy, I can start by finding the dominant color based on these five GMMs.
Secondly, my sample data points here number 100, and it takes nearly 3 seconds for them. But I want to do all of this in no more than 30 milliseconds. I know OpenCV background extraction, which uses a GMM, performs really fast, below 20 ms; that means there must be a way for me to do all of this within 30 ms for all 600x800 = 480,000 pixels. I found the predict function to be the most time-consuming one.
First Question:
In order to do color extraction you first need to train the EM with your input pixels. After that you simply loop over all the input pixels again and use predict() to classify each of them. I've attached a small example that utilizes EM for foreground/background separation based on colors. It shows you how to extract the dominant color (mean) of each gaussian and how to access the original pixel color.
#include <opencv2/opencv.hpp>

int main(int argc, char** argv) {
    cv::Mat source = cv::imread("test.jpg");

    //output images
    cv::Mat meanImg(source.rows, source.cols, CV_32FC3);
    cv::Mat fgImg(source.rows, source.cols, CV_8UC3);
    cv::Mat bgImg(source.rows, source.cols, CV_8UC3);

    //convert the input image to float
    cv::Mat floatSource;
    source.convertTo(floatSource, CV_32F);

    //now convert the float image to column vector
    cv::Mat samples(source.rows * source.cols, 3, CV_32FC1);
    int idx = 0;
    for (int y = 0; y < source.rows; y++) {
        cv::Vec3f* row = floatSource.ptr<cv::Vec3f > (y);
        for (int x = 0; x < source.cols; x++) {
            samples.at<cv::Vec3f > (idx++, 0) = row[x];
        }
    }

    //we need just 2 clusters
    cv::EMParams params(2);
    cv::ExpectationMaximization em(samples, cv::Mat(), params);

    //the two dominating colors
    cv::Mat means = em.getMeans();
    //the weights of the two dominant colors
    cv::Mat weights = em.getWeights();

    //we define the foreground as the dominant color with the largest weight
    const int fgId = weights.at<float>(0) > weights.at<float>(1) ? 0 : 1;

    //now classify each of the source pixels
    idx = 0;
    for (int y = 0; y < source.rows; y++) {
        for (int x = 0; x < source.cols; x++) {
            //classify
            const int result = cvRound(em.predict(samples.row(idx++), NULL));
            //get the according mean (dominant color)
            const double* ps = means.ptr<double>(result, 0);

            //set the according mean value to the mean image
            float* pd = meanImg.ptr<float>(y, x);
            //float images need to be in [0..1] range
            pd[0] = ps[0] / 255.0;
            pd[1] = ps[1] / 255.0;
            pd[2] = ps[2] / 255.0;

            //set either foreground or background
            if (result == fgId) {
                fgImg.at<cv::Point3_<uchar> >(y, x, 0) = source.at<cv::Point3_<uchar> >(y, x, 0);
            } else {
                bgImg.at<cv::Point3_<uchar> >(y, x, 0) = source.at<cv::Point3_<uchar> >(y, x, 0);
            }
        }
    }

    cv::imshow("Means", meanImg);
    cv::imshow("Foreground", fgImg);
    cv::imshow("Background", bgImg);
    cv::waitKey(0);

    return 0;
}
I've tested the code with the following image and it performs quite well.
Second Question:
I've noticed that the maximum number of clusters has a huge impact on performance, so it's better to set this to a very conservative value instead of leaving it empty or setting it to the number of samples as in your example. Furthermore, the documentation mentions an iterative procedure to repeatedly optimize the model with less-constrained parameters. Maybe this gives you some speed-up. To read more, please have a look at the docs and the sample code that is provided for train() here.
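A rough sketch of that iterative refinement, modeled on the old CvEM sample (hedged: the exact getter names and parameter types differ between OpenCV versions, so treat this as a pattern rather than drop-in code):
// first pass: cheap spherical covariances, automatic start
CvEM em_model;
params.cov_mat_type = CvEM::COV_MAT_SPHERICAL;
params.start_step   = CvEM::START_AUTO_STEP;
em_model.train(samples, Mat(), params, &labels);

// second pass: re-train with less constrained (diagonal) covariances,
// starting from the parameters estimated in the first pass
CvEM em_model2;
params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;
params.start_step   = CvEM::START_E_STEP;
params.means        = em_model.get_means();
params.covs         = (const CvMat**)em_model.get_covs();
params.weights      = em_model.get_weights();
em_model2.train(samples, Mat(), params, &labels);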