OpenCV Face Recognition strange result - c++

I have been using OpenCV's SVM and RF (random forest) for a multi-class face recognition problem with 11 classes and only 5 images per class. I used two kinds of features: initially a toy intensity feature (each image resized to 32x32 grayscale), and then a second toy feature using Tan-Triggs preprocessing (link). Here is the feature code:
void Feature::makeFeature(cv::Mat &image, cv::Mat &result)
{
    cv::resize(image, image, cv::Size(32, 32), 0, 0, cv::INTER_CUBIC);
    cv::equalizeHist(image, image);

    // Images must be aligned - only pitch executed, yaw and roll assumed negligible
    algmt->getAlignedImage(image, image); // image alignment

    // Tan-Triggs feature
    {
        tan_triggs_preprocessing(image, result);
        result = result.reshape(0, 1); // make a single row vector, needed for the training samples matrix
    }

    // plain intensity feature (alternative)
    {
        // image.copyTo(result);
        // result.convertTo(result, CV_32F, 1.0f/255.0f);
        // result = result.reshape(0, 1); // make a single row vector, needed for the training samples matrix
    }
}
Here tan_triggs_preprocessing is the same as the Tan-Triggs preprocessing function given in the link. I added one step: I normalized the result to the range [0, 1].
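A minimal sketch of what that extra normalization step can look like (assuming the Tan-Triggs output result is a single-channel float image; cv::normalize with NORM_MINMAX is one way to map it to [0, 1]):

    // Sketch of the added normalization step (assumes `result` is single-channel float)
    cv::normalize(result, result, 0.0, 1.0, cv::NORM_MINMAX, CV_32F);
    result = result.reshape(0, 1); // single row vector for the training samples matrix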
The test results for both features were not very good, as expected, but then a silly mistake revealed something strange: when I accidentally gave the training directory as input for both training and testing, I got 100% accuracy with the plain intensity feature, but the Tan-Triggs feature gave the following result:
SVM Training Complete
Total number of correct: 51 and accuracy: 92.7273
RF Training Complete
Total number of correct: 53 and accuracy: 96.3636
As far as I know, however much you overfit, the result should be perfect when the training set itself is used as the test set. Everything else is standard; both the SVM and the RF are set up as in the OpenCV examples. Besides, I get 100% with the plain intensity feature, so I must be mucking something up when using Tan-Triggs. Does anyone have an idea what mistake I am making?
I have used other, more complex features like LTPs and LQPs without issue, but this preprocessing method is something I want to use. I use the Jain and Learned-Miller congealing algorithm for alignment, since I assume frontal faces for recognition and do no pose correction.

Related

How to align 2 images based on their content with OpenCV

I am totally new to OpenCV and have just started to dive into it, but I need a little bit of help.
So I want to combine these 2 images:
I would like the 2 images to match along their edges (ignoring the very right part of the image for now).
Can anyone please point me in the right direction? I have tried using the findTransformECC function. Here's my implementation:
cv::Mat im1 = [imageArray[1] CVMat3];
cv::Mat im2 = [imageArray[0] CVMat3];

// Convert images to grayscale
cv::Mat im1_gray, im2_gray;
cvtColor(im1, im1_gray, CV_BGR2GRAY);
cvtColor(im2, im2_gray, CV_BGR2GRAY);

// Define the motion model
const int warp_mode = cv::MOTION_AFFINE;

// Set a 2x3 or 3x3 warp matrix depending on the motion model and initialize it to identity
cv::Mat warp_matrix;
if (warp_mode == cv::MOTION_HOMOGRAPHY)
    warp_matrix = cv::Mat::eye(3, 3, CV_32F);
else
    warp_matrix = cv::Mat::eye(2, 3, CV_32F);

// Specify the number of iterations and the threshold of the increment
// in the correlation coefficient between two iterations
int number_of_iterations = 50;
double termination_eps = 1e-10;

// Define termination criteria
cv::TermCriteria criteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, number_of_iterations, termination_eps);

// Run the ECC algorithm. The results are stored in warp_matrix.
findTransformECC(im1_gray, im2_gray, warp_matrix, warp_mode, criteria);

// Storage for the warped image
cv::Mat im2_aligned;
if (warp_mode != cv::MOTION_HOMOGRAPHY)
    // Use warpAffine for Translation, Euclidean and Affine
    warpAffine(im2, im2_aligned, warp_matrix, im1.size(), cv::INTER_LINEAR + cv::WARP_INVERSE_MAP);
else
    // Use warpPerspective for Homography
    warpPerspective(im2, im2_aligned, warp_matrix, im1.size(), cv::INTER_LINEAR + cv::WARP_INVERSE_MAP);

UIImage* result = [UIImage imageWithCVMat:im2_aligned];
return result;
I have tried playing around with the termination_eps and number_of_iterations and increased/decreased those values, but they didn't really make a big difference.
So here's the result:
What can I do to improve my result?
EDIT: I have marked the problematic edges with red circles. The goal is to warp the bottom image and make it match with the lines from the image above:
I did a little bit of research and I'm afraid the findTransformECC function won't give me the result I'd like to have :-(
Something important to add:
I actually have an array of those image "stripes" (8 in this case); they all look similar to the images shown here, and they all need to be aligned along the lines. I have tried experimenting with OpenCV's stitch function, but the results were horrible.
EDIT:
Here are the 3 source images:
The result should be something like this:
I transformed every image along the lines that should match. Lines that are too far away from each other can be ignored (the shadow and the piece of road on the right portion of the image)
Judging by your images, they overlap. Since you said the stitch function didn't give you the desired results, you could implement your own stitching. I'm trying to do something similar myself. Here is a tutorial on how to implement it in C++: https://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
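For a rough idea of what a hand-rolled, feature-based stitch can look like (only a sketch assuming OpenCV 3.x; the file names, ORB settings and canvas size below are placeholders, not the tutorial's code):

    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat top = cv::imread("strip_top.jpg");        // placeholder file names
        cv::Mat bottom = cv::imread("strip_bottom.jpg");

        // Detect and describe keypoints in both strips
        cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);
        std::vector<cv::KeyPoint> kp1, kp2;
        cv::Mat desc1, desc2;
        orb->detectAndCompute(top, cv::noArray(), kp1, desc1);
        orb->detectAndCompute(bottom, cv::noArray(), kp2, desc2);

        // Match descriptors (Hamming distance for ORB, with cross-checking)
        cv::BFMatcher matcher(cv::NORM_HAMMING, true);
        std::vector<cv::DMatch> matches;
        matcher.match(desc1, desc2, matches);

        std::vector<cv::Point2f> pts1, pts2;
        for (const cv::DMatch &m : matches) {
            pts1.push_back(kp1[m.queryIdx].pt);
            pts2.push_back(kp2[m.trainIdx].pt);
        }

        // Homography mapping the bottom strip onto the top strip (needs enough good matches)
        cv::Mat H = cv::findHomography(pts2, pts1, cv::RANSAC, 3.0);

        // Warp the bottom strip into a canvas large enough for both, then paste the top strip
        cv::Mat canvas;
        cv::warpPerspective(bottom, canvas, H, cv::Size(top.cols, top.rows * 2));
        top.copyTo(canvas(cv::Rect(0, 0, top.cols, top.rows)));
        cv::imwrite("stitched.jpg", canvas);
        return 0;
    }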
You can run the Hough transform with a high threshold on both images and then compare the vertical lines in each of them - most of them should be shifted a bit but keep their angle.
This is what I've got from running this algorithm on one of the pictures:
Filtering out horizontal lines should be easy (they are represented as Vec4i), and then you can align the remaining lines together.
There is an example of using it in OpenCV's documentation.
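A rough sketch of that filtering step (not tested on these images; it assumes image is one of the loaded strips and that the usual OpenCV and <cmath> headers are included):

    // Detect segments with a high threshold, then keep only the near-vertical ones
    cv::Mat gray, edges;
    cvtColor(image, gray, CV_BGR2GRAY);
    cv::Canny(gray, edges, 50, 150);

    std::vector<cv::Vec4i> segments;
    cv::HoughLinesP(edges, segments, 1, CV_PI / 180, 100 /*threshold*/, 50 /*minLineLength*/, 10 /*maxLineGap*/);

    std::vector<cv::Vec4i> verticals;
    for (const cv::Vec4i &s : segments) {
        double dx = s[2] - s[0], dy = s[3] - s[1];
        double angle = std::atan2(std::fabs(dy), std::fabs(dx)); // 0 = horizontal, CV_PI/2 = vertical
        if (angle > CV_PI / 3)   // keep segments steeper than ~60 degrees
            verticals.push_back(s);
    }
    // `verticals` from the two images can now be compared to estimate the shift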
UPDATE: another thought. Aligning the lines can be done with a concept similar to how a cross-correlation function (CCF) works. It doesn't matter if picture 1 has 10 lines and picture 2 has 100 lines: the shift at which the most lines align (roughly the maximum of the CCF) should be pretty close to the answer, though this might require some tweaking - for example, weighting every line by its length, angle, etc. Computer vision never has a direct way, huh :)
UPDATE 2: I actually wonder whether taking the bottom pixel row of the top image as array 1 and the top pixel row of the bottom image as array 2, running a general CCF over them, and using its maximum as the shift could work too... but I suppose it would be a well-known method if it worked well.
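For what it's worth, a sketch of that row-correlation idea (assuming both strips are already grayscale, single-channel images; topStrip and bottomStrip are hypothetical names):

    // Correlate the bottom row of the top strip against the top row of the bottom
    // strip over a range of horizontal shifts; keep the shift with the highest score.
    cv::Mat rowA, rowB;
    topStrip.row(topStrip.rows - 1).convertTo(rowA, CV_32F);
    bottomStrip.row(0).convertTo(rowB, CV_32F);

    int bestShift = 0;
    double bestScore = -1.0;
    const int maxShift = 100;                          // search window, tune per data set
    for (int shift = -maxShift; shift <= maxShift; ++shift) {
        double score = 0.0;
        int count = 0;
        for (int x = 0; x < rowA.cols; ++x) {
            int xb = x + shift;
            if (xb < 0 || xb >= rowB.cols) continue;
            score += rowA.at<float>(0, x) * rowB.at<float>(0, xb); // naive dot product over the overlap
            ++count;
        }
        if (count > 0 && score / count > bestScore) {
            bestScore = score / count;
            bestShift = shift;
        }
    }
    // bestShift is the estimated horizontal offset between the two strips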

What accuracy should I expect from basic opencv ortho-rectification algorithms?

So, I'm taking over the work on an ortho-rectification algorithm that is intended to produce "accurate" results. I'm running into trouble trying to increase the accuracy and could use a little help.
Here is the basic approach.
Extract a calibration pattern from an image that was taken from a mobile phone.
Rectify the image based on the calibration pattern found in the image.
Scale the image to get the real world size of the scene around the pattern.
The calibration pattern is held against a flat surface (a wall, counter, table, or floor) and the user takes a picture. With that picture, we want to measure artifacts on the same surface as the calibration pattern. We have tried this with calibration patterns ranging from the size of a credit card to a sheet of paper (8.5" x 11").
Here is an example input picture
With this resulting output image
Right now our measurements are usually within 1-2% of what we expect. This is sufficient for small areas (less than 25 cm away from the calibration pattern). However, we'd like the algorithm to scale so that we can accurately measure a 2x2 meter area; at that size, the current error is too large (2-4 cm).
Here is the algorithm we are following.
// convert original image to grayscale and perform morphological dilation to reduce
// false matches when finding the circle grid
Mat imgGray;
cvtColor(imgOriginal, imgGray, CV_BGR2GRAY);

// find calibration pattern in original image
Size patternSize(4, 11);
vector<Point2f> circleCenters_OriginalImage;
if (!findCirclesGrid(imgGray, patternSize, circleCenters_OriginalImage, CALIB_CB_ASYMMETRIC_GRID))
{
    return false;
}

// pick the four outer corners of the asymmetric circle grid
Point2f inputQuad[4];
inputQuad[0] = Point2f(circleCenters_OriginalImage[0].x,  circleCenters_OriginalImage[0].y);
inputQuad[1] = Point2f(circleCenters_OriginalImage[3].x,  circleCenters_OriginalImage[3].y);
inputQuad[2] = Point2f(circleCenters_OriginalImage[43].x, circleCenters_OriginalImage[43].y);
inputQuad[3] = Point2f(circleCenters_OriginalImage[40].x, circleCenters_OriginalImage[40].y);

// create model points for calibration pattern
vector<Point2f> circleCenters_ObjectSpace = GeneratePatternPointsInObjectSpace(
    circleCenters_OriginalImage[0],
    Distance(circleCenters_OriginalImage[0], circleCenters_OriginalImage[1]) / 2.0f,
    ioData.marker_up);

Point2f outputQuad[4];
outputQuad[0] = Point2f(circleCenters_ObjectSpace[0].x,  circleCenters_ObjectSpace[0].y);
outputQuad[1] = Point2f(circleCenters_ObjectSpace[3].x,  circleCenters_ObjectSpace[3].y);
outputQuad[2] = Point2f(circleCenters_ObjectSpace[43].x, circleCenters_ObjectSpace[43].y);
outputQuad[3] = Point2f(circleCenters_ObjectSpace[40].x, circleCenters_ObjectSpace[40].y);

// compute the perspective transform and warp the image into object space
Mat lambda = getPerspectiveTransform(inputQuad, outputQuad);
warpPerspective(imgOriginal, imgOrthorectified, lambda, imgOrthorectified.size());
...
My Questions:
Is it reasonable to shoot for error < 0.25%? Is there a different algorithm that would yield more accurate results? What are the most valuable sources of error to identify and resolve?
As I've worked on this, I've also looked at removing pincushion/barrel distortions and at using homographies to find the perspective transform. The best approaches I have found so far still sit in the 1-2% error range.
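A sketch of what such a distortion-removal step might look like (assuming the intrinsics and distortion coefficients were estimated beforehand with cv::calibrateCamera; the file name and keys below are placeholders):

    // Undistort the phone image before finding the circle grid
    cv::Mat cameraMatrix, distCoeffs;    // 3x3 intrinsics, distortion coefficients
    cv::FileStorage fs("camera.yml", cv::FileStorage::READ);   // hypothetical calibration file
    fs["camera_matrix"] >> cameraMatrix;
    fs["distortion_coefficients"] >> distCoeffs;

    cv::Mat imgUndistorted;
    cv::undistort(imgOriginal, imgUndistorted, cameraMatrix, distCoeffs);
    // ... then run findCirclesGrid / getPerspectiveTransform on imgUndistorted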
Any suggestions on where to go next would be really helpful.

OpenCV: detectMultiScale() gives too many points out of the object

I ran opencv_traincascade on my PC for a whole day to detect 2€ coins, using more than 6000 positive images similar to the following:
Now I have just tried to run a simple OpenCV program to see the results and to check the cascade.xml file. The final result is very disappointing:
There are many points on the coin, but there are also many other points on the background. Could it be a problem with the positive images used for training? Or am I maybe using detectMultiScale() with the wrong parameters?
Here's my code:
#include "opencv2/opencv.hpp"
using namespace cv;
int main(int, char**) {
Mat src = imread("2c.jpg", CV_LOAD_IMAGE_COLOR);
Mat src_gray;
std::vector<cv::Rect> money;
CascadeClassifier euro2_cascade;
cvtColor(src, src_gray, CV_BGR2GRAY );
//equalizeHist(src_gray, src_gray);
if ( !euro2_cascade.load( "/Users/lory/Desktop/cascade.xml" ) ) {
printf("--(!)Error loading\n");
return -1;
}
euro2_cascade.detectMultiScale( src_gray, money, 1.1, 0, CV_HAAR_FIND_BIGGEST_OBJECT|CV_HAAR_SCALE_IMAGE, cv::Size(10, 10),cv::Size(2000, 2000) );
for( size_t i = 0; i < money.size(); i++ ) {
cv::Point center( money[i].x + money[i].width*0.5, money[i].y + money[i].height*0.5 );
ellipse( src, center, cv::Size( money[i].width*0.5, money[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 255 ), 4, 8, 0 );
}
//namedWindow( "Display window", WINDOW_AUTOSIZE );
imwrite("result.jpg",src);
}
I have also tried reducing the number of neighbours, but the effect is the same, just with far fewer points... Could the problem be that the positive images have those 4 background corners around the coin? I generated the PNG images with Gimp from a video showing the coin, so I don't know why opencv_createsamples puts those 4 corners there.
Those positive images are just plain wrong.
The more "noise" you give your images in the training data, the more robust the detector will be, but yes, the longer it will take to train. This is, however, where your negative samples come into action. If you have as many negative training samples as possible, covering as wide a range as possible, you will create a more robust detector. You need to make sure your positive images contain only your coins, and that your negative images contain everything but coins.
I've seen a couple of your questions so far, and I think you want to detect three different types of euro coins. You would be best off training three classifiers, one per coin type, and then running all three on your images.
I also think you are missing a key piece of knowledge about how HAAR (or LBP, or whatever) works: effectively it creates a set of "features" from your positive images and then tries to find those features in the images you run the classifier over. It builds these features by working out what differs between your positive images and your negative images. You don't want anything in your positive images that isn't the thing you are trying to detect.
Edit 1 - An Example
Imagine creating a classifier for a road stop sign, which is a similar detection problem to coins. It's big, it's red and it's octagonal. Creating a classifier for this is relatively easy - as long as you don't confuse the training stage with erroneous data.
Edit 2 - Scaling of images:
You also have to remember that the detection stage takes your classifier, starts small and then scales up. Large, obvious features will get detected quicker - in my previous example, big red blobs and octagonal shapes. It would then move on to smaller features, i.e. text or numbers.
Edit 3 - a much better example
This example shows really well how training a cascade object detector works. In fact, it even uses the same stop sign example!
To detect a euro coin in an image you can use several methods:
1) Train an OpenCV cascade (HAAR or LBP). Don't forget to use a large number of negative images. Also extend the coin image (add a border).
2) Compute an image of the absolute gradients of the original image and use the Hough transform to detect circles (a coin has the shape of a circle).
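A rough sketch of option 2, reusing src from the question's code (the parameters are guesses and would need tuning for real coin images; HOUGH_GRADIENT computes the gradients internally):

    cv::Mat gray;
    cv::cvtColor(src, gray, CV_BGR2GRAY);
    cv::GaussianBlur(gray, gray, cv::Size(9, 9), 2);   // reduce noise before the circle transform

    std::vector<cv::Vec3f> circles;
    cv::HoughCircles(gray, circles, CV_HOUGH_GRADIENT, 1 /*dp*/, gray.rows / 8 /*minDist*/,
                     100 /*Canny high threshold*/, 50 /*accumulator threshold*/,
                     20 /*minRadius*/, 200 /*maxRadius*/);

    for (const cv::Vec3f &c : circles) {
        cv::circle(src, cv::Point(cvRound(c[0]), cvRound(c[1])), cvRound(c[2]),
                   cv::Scalar(255, 0, 255), 4);
    }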

OpenCV Linear SVM not training

I've been stuck on this for some time now. OpenCV's SVM implementation doesn't seem to work for a linear kernel. I'm fairly sure there's no bug in the code: when I change the kernel_type to RBF or POLY, keeping everything else as is, it works.
The reason I say it doesn't work is that I save the generated model and inspect it: it shows a support vector count of 1, which is not the case with the RBF or polynomial kernels.
There's nothing special about the code itself; I've used OpenCV's SVM implementation before, but never with a linear kernel. I tried setting the degree to 1 with a POLY kernel and it produces the same model, which makes me believe something is buggy here.
The code structure, if required:
Mat trainingdata; // acquired from files. done and correct.
Mat testingdata;  // acquired from files. done and correct again.
Mat labels;       // corresponding labels. checked and correct.

SVM my_svm;
SVMParams my_params;
my_params.svm_type = SVM::C_SVC;
my_params.kernel_type = SVM::LINEAR; // or POLY, with my_params.degree = 1.
my_params.C = 0.02; // doesn't matter if I set it to 20000, makes no difference.

my_svm.train(trainingdata, labels, Mat(), Mat(), my_params);
// train_auto(..) with 10-fold cross-validation takes the same time as above (~2 sec)!

Mat responses;
my_svm.predict(testingdata, responses);
// responses matrix is all wrong.
I have 500 samples from one class and 600 from the other class to test, and the correct classifications I get are: 1/500 and 597/600.
Craziest part:
I have done the same experiment with the same data using libSVM's MATLAB wrapper, and it works. I was just trying to do an OpenCV version of it.
It is not a bug that you always get only one support vector with linear CvSVM.
OpenCV optimizes a linear SVM down to one support vector.
The idea is that the support vectors define the classification margin, but for the actual classification only the separating hyperplane is needed, and that can be defined by a single vector.
The parameter C doesn't matter if your training data is linearly separable; maybe that is your case.
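A quick way to see this, as a sketch against the 2.4-era CvSVM interface used in the question (check the accessor names against your OpenCV version):

    // With a LINEAR kernel, CvSVM typically reports a single compressed support
    // vector, which is (up to sign/scale) the normal of the separating hyperplane.
    int n = my_svm.get_support_vector_count();   // expected to be 1 for SVM::LINEAR
    int dims = my_svm.get_var_count();
    for (int i = 0; i < n; ++i) {
        const float *sv = my_svm.get_support_vector(i);
        // sv[0..dims-1] holds the hyperplane weights; print or inspect them here
    }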

Hu moments and SVM does not work

I have come across a problem when trying to train data with an SVM.
I extract various regions (sets of connected pixels) from face images, and the regions from eyes are very similar to each other, so I want to use Hu moments for shape description and an SVM for training.
But the SVM does not work properly: svm.predict afterwards evaluates everything as non-eye; moreover, the same regions that were labeled as eye and used in the training phase are evaluated as non-eye.
The feature data consists only of the 7 Hu moments. I will post some samples of the source code in a moment, thanks in advance :)
Additional info:
input image:
http://i.stack.imgur.com/GyLO0.png
Setting up basic svm for 1 image:
int image_regions = 10;
Mat training_mat(image_regions, 7, CV_32FC1); // 7 Hu moments per region
Mat labels(image_regions, 1, CV_32FC1);       // labels: 1 (eye) and -1 (non-eye)

// computing Hu moments
Moments moments2 = moments(croppedImage, false);
double hu[7];
HuMoments(moments2, hu);

// putting them into the SVM training mat
for (int k = 0; k < huCounter; k++)             // huCounter is presumably 7 here
    training_mat.at<float>(counter, k) = hu[k]; // counter is the current region index

if (isEye(...))
{
    labels.at<float>(counter, 0) = 1.0;
}
else
{
    labels.at<float>(counter, 0) = -1.0;
}

// I use the following:
CvSVM svm;
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);

// ... do the above-mentioned phase, and then:
svm.train(training_mat, labels, Mat(), Mat(), params);
I hope the following suggestions help you.
1) The simplest approach is to use a clustering algorithm and try to cluster the data into two classes. If an algorithm like k-means can do the job, why make things complex by using SVMs and neural nets? I suggest this technique because your feature vector dimension is very small (7 Hu moments), as is your number of samples.
2) Perform feature normalization (specified in point 4) to make sure the values fall in a limited range.
3) Check whether your data is really separable. As your data set is small, take a few samples from positive images and a few from negative images and plot the feature vectors. If you can visually see the difference, surely any learning algorithm can do the job for you. As I said earlier, simple tricks can do better than complex math.
4) Only if you then decide to use an SVM should you know the following:
• As I can see from your code, you are using a linear SVM; maybe your data is not separable by a linear kernel. Try a polynomial or some other kernel. There is also the option CvSVM::train_auto in OpenCV; have a look.
• Check whether the feature vector values you are getting are sensible (make sure they are not garbage values).
• You can also perform feature normalization (zero mean and unit variance) before using the features for training; a short sketch follows at the end of this answer.
• Most importantly, increase the number of training images, both positively and negatively labeled.
• Last but not least, an SVM is not magic; at the end of the day it just draws a boundary between two sets of points, so don't expect it to classify anything you give it as input.
5) If nothing works, "just improve your feature extraction technique".
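A sketch of the zero-mean / unit-variance normalization mentioned above, applied column-wise to the training_mat from the question (the same mean and standard deviation would have to be reused on the test samples):

    // Per-feature (column-wise) zero-mean / unit-variance normalization
    for (int col = 0; col < training_mat.cols; ++col) {
        cv::Mat column = training_mat.col(col);            // header sharing data with training_mat
        cv::Scalar mean, stddev;
        cv::meanStdDev(column, mean, stddev);
        double sd = (stddev[0] > 1e-12) ? stddev[0] : 1.0; // guard against constant features
        column -= mean[0];                                 // zero mean
        column /= sd;                                      // unit variance
    }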