I am trying to build face recognition software using OpenCV with a C++ implementation. The problem I am having is that I have a lot of faces, more than 10 million, which cannot be labeled using the int type (vector<int> labels). I tried using long long int, but I get this error shortly after training starts (after about 0.5 seconds):
OpenCV Error: Bad argument (Labels must be given as integer (CV_32SC1). Expected 4, but was 7.) in train, file /home/roka/opencv-2.4.6.1/modules/contrib/src/facerec.cpp, line 349 terminate called after throwing an instance of 'cv::Exception' what(): /home/roka/opencv-2.4.6.1/modules/contrib/src/facerec.cpp:349: error: (-5) Labels must be given as integer (CV_32SC1). Expected 4, but was 7. in function train
If you're going for millions of faces, you've got bigger problems than ints being too small to label them.
The eigenface approach will need far more memory than fits on a single machine, because you have to give it all available data in one go for training.
You'll have to break that data into several chunks and choose an algorithm (e.g. LBP) that can work with independent measurements that are combined afterwards. You'll also need a separate index that translates int labels (from a local recognition) back into global bigint labels.
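The separate index mentioned above could look roughly like the following. This is only a minimal sketch in Python, not OpenCV code: the class name GlobalLabelIndex, the chunk IDs, and the example global ID are all hypothetical. It hands out small per-chunk labels that fit the 32-bit signed labels (CV_32SC1) the error message asks for, and remembers which global ID each one stands for.

```python
class GlobalLabelIndex:
    """Maps (chunk_id, local_label) pairs to arbitrary-size global IDs and back."""

    INT32_MAX = 2**31 - 1  # largest value a CV_32SC1 label can hold

    def __init__(self):
        self._to_global = {}   # (chunk_id, local_label) -> global_id
        self._to_local = {}    # global_id -> (chunk_id, local_label)
        self._next_local = {}  # chunk_id -> next unused local label

    def register(self, chunk_id, global_id):
        """Return the local label for global_id, assigning a new one if needed."""
        if global_id in self._to_local:
            return self._to_local[global_id][1]
        local = self._next_local.get(chunk_id, 0)
        if local > self.INT32_MAX:
            raise OverflowError("chunk has run out of 32-bit labels")
        self._next_local[chunk_id] = local + 1
        self._to_global[(chunk_id, local)] = global_id
        self._to_local[global_id] = (chunk_id, local)
        return local

    def to_global(self, chunk_id, local_label):
        """Translate a label predicted by one chunk's model into the global ID."""
        return self._to_global[(chunk_id, local_label)]


index = GlobalLabelIndex()
big_id = 12345678901  # hypothetical global ID, too large for 32 bits
local = index.register(chunk_id=0, global_id=big_id)
print(local, index.to_global(0, local))
```

Each chunk's recognizer is then trained with the local labels only, and a prediction from chunk k is passed through to_global(k, label) afterwards.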
I am learning about two-dimensional neural networks. I am facing many obstacles, but I believe it is worth it and I am really enjoying the learning process.
Here's my plan: make a 2-D NN that recognizes images of digits. The images are 5 by 3 grids, and I prepared 10 images, from zero to nine. For example, this would be the number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or, equivalently, 3,4,6,7,9,10,12,13 as 0s), and so on. Therefore, my input layer will be a 5 by 3 neuron layer, and I will be feeding it only zeros or ones (nothing in between; which indexes are set depends on the image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit was recognized, one neuron should fire a value of one and the rest should be zero (should not fire).
I am done implementing everything, but I have a problem with the computation and would really appreciate any help. I am getting an extremely high error rate and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my backpropagation methods, since I believe the problem is there. However, to break down my work I would like to hear some comments first; I want to know whether my design is workable.
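To make the encoding concrete, here is a small sketch of the input and target vectors described above. It assumes the indexes run row by row (row-major) over the 5 by 3 grid; the helper names encode_digit and one_hot are made up for illustration.

```python
ROWS, COLS = 5, 3

def encode_digit(on_indexes):
    """Return a flat 15-element list with 1.0 at the given indexes, 0.0 elsewhere."""
    on = set(on_indexes)
    return [1.0 if i in on else 0.0 for i in range(ROWS * COLS)]

def one_hot(digit, classes=10):
    """Return the target vector: 1.0 for the recognized digit, 0.0 for the rest."""
    return [1.0 if i == digit else 0.0 for i in range(classes)]

seven = encode_digit([0, 1, 2, 5, 8, 11, 14])
target = one_hot(7)

# Print the grid row by row to check that it really looks like a 7.
for r in range(ROWS):
    row = seven[r * COLS:(r + 1) * COLS]
    print("".join("#" if v else "." for v in row))
```

Under the row-major assumption the top row is fully set and the right column is set in every row, which is the shape of a 7.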
Does my plan make sense?
All the posts talk about ranges (0 -> 1, -1 -> +1, 0.01 -> 0.5, etc.). Will it work with only the values 0 or 1 on the output layer rather than a range? If yes, how can I control that?
I am using the hyperbolic tangent (tanh) as my transfer function. Does it make a difference whether I use this, sigmoid, or another function?
Any ideas, comments, or guidance are appreciated. Thanks in advance.
Well, going by the description above, I think the design and the approach are correct! Regarding the choice of activation function, remember that these functions help to pick out the neurons with the largest activation, and that their algebraic properties, such as an easy derivative, help with the definition of backpropagation. Taking this into account, you should not worry too much about your choice of activation function.
The ranges you mention correspond to scaling the input: it is better to have your input images in the range 0 to 1. This scales the error surface and helps the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum and maximum values a pixel can take are 0 and 255, respectively. To scale such input, divide each value by 255.
Now, regarding the training problems: have you checked that your gradient calculation routine is correct, i.e. by evaluating the cost function J directly? If not, try building a toy vector theta that contains all the weight matrices in your neural network and evaluate the gradient at each point using the definition of the gradient. Sorry for the Matlab example, but it should be easy to port to C++:
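For reference, here is a short sketch of the two transfer functions from the question and the derivatives backpropagation needs. The function names are my own. One thing it shows is that tanh outputs lie in (-1, 1) while the logistic sigmoid outputs lie in (0, 1). Mapping the 0/1 targets to -1/+1 for a tanh output layer is only one possible option, not something the question or this answer prescribes.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

def rescale_targets_for_tanh(targets):
    """Optional: map 0/1 targets to -1/+1 so they span the tanh output range."""
    return [2.0 * t - 1.0 for t in targets]

# tanh is a shifted and stretched sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(math.tanh(0.5), 2.0 * sigmoid(1.0) - 1.0)
print(rescale_targets_for_tanh([0.0, 1.0, 0.0]))
```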
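The scaling described above is a single division per pixel. A minimal sketch, with scale_pixels as a made-up helper name, could be the following. Note that the 5 by 3 grids in the question already contain only 0s and 1s, so they are in range as they are; the division matters for 8-bit grayscale input.

```python
def scale_pixels(pixels, max_value=255.0):
    """Scale raw pixel values from [0, max_value] into [0, 1]."""
    return [p / max_value for p in pixels]

print(scale_pixels([0, 51, 255]))
```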
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient (central difference)
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient with the gradient calculated by backpropagation. If the difference between the two is less than 3e-9, your implementation should be correct.
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory. There you can find a lot of information on neural networks and their paradigms; it's worth taking a look!
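Here is the same check as a self-contained Python sketch. The cost function J below is a toy stand-in (a sum of squares, whose exact gradient is 2 * theta), not the network's real cost; in practice you would plug in your own cost and compare against your backpropagation output instead of analytic_gradient.

```python
def numerical_gradient(J, theta, eps=1e-4):
    """Central-difference estimate of dJ/dtheta, one component at a time."""
    grad = []
    for p in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[p] += eps
        minus[p] -= eps
        grad.append((J(plus) - J(minus)) / (2.0 * eps))
    return grad

def J(theta):
    """Toy cost (assumption for illustration): sum of squared parameters."""
    return sum(t * t for t in theta)

def analytic_gradient(theta):
    """Exact gradient of the toy cost; stands in for the backprop result."""
    return [2.0 * t for t in theta]

theta = [0.5, -1.5, 2.0]
num = numerical_gradient(J, theta)
ana = analytic_gradient(theta)
max_diff = max(abs(n - a) for n, a in zip(num, ana))
print(max_diff)
```

If max_diff is large for your real network, the backpropagation code is the first place to look.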
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/
I've pasted all my code here in case you'd need that to understand my question: Plotting a graph on axes but getting no results while trying to classify image based on HoG features
My question is: given approximately 500 images (the Caltech "Cars 2001" dataset) with 48 HoG features each, what possible reasons can there be for the boosting to terminate early? What could cause a perfect fit, or a problem with the boosted sample weights, and how can such problems be solved? The specific algorithm I'm using is SAMME, a multiclass Adaboost classifier. I'm using Python 2.7 on Anaconda.
When I checked certain variables during the classification of my dataset, setting the n_estimators parameter to be 600, I found that:
discrete_test_errors: consists of 1 item instead of being an array of 600 values
discrete_estimator_errors: again a single value instead of an array of 600 values
real_test_errors: just one item again instead of 600
discrete_estimator_weights: array([1.])
n_trees_discrete and n_trees_real: 1 instead of 600
Problem solved. My training labels array had no negatives; it consisted only of labels for one class. Of course the training would terminate immediately!
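A quick sanity check on the labels before fitting would have caught this. Here is a minimal sketch using only the standard library; check_labels is a made-up helper, not part of any boosting library.

```python
from collections import Counter

def check_labels(labels, min_classes=2):
    """Count samples per class and fail early if there are too few classes."""
    counts = Counter(labels)
    if len(counts) < min_classes:
        raise ValueError(
            "need at least %d classes, found %d: %s"
            % (min_classes, len(counts), dict(counts))
        )
    return counts

# Hypothetical label array with both positives (1) and negatives (0).
counts = check_labels([0, 1, 1, 0, 1])
print(counts)
```

Calling check_labels on an array that contains only one class raises a ValueError instead of letting the boosting run stop silently after one estimator.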
In the super-resolution sample (gpu/super_resolution.cpp, built with the vc11 compiler), the following line:
//Ptr<SuperResolution> superRes;
superRes->nextFrame(result);
results in the following error (tried with multiple test videos):
http://i.imgbox.com/abwNaL3z.jpg
and if I change the optical flow method to simple, it takes forever to run (I stopped it after 30 min on an i7 2600k)
Any idea?
The BTV super-resolution algorithm was designed for small input videos, and it uses a lot of memory for internal buffers. Your video has a large resolution [768 x 576] and you upscale it by a factor of 4. Try reducing the scale factor, the temporal radius, or the input resolution (for example, upscale only part of the frame).
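To get a feel for the numbers, here is a bit of arithmetic on the frame sizes mentioned above. It only counts output pixels; how many internal buffers the algorithm actually allocates is not something this sketch knows. The helper name upscaled_size is made up.

```python
def upscaled_size(width, height, scale):
    """Size of the output frame after upscaling by an integer factor."""
    return width * scale, height * scale

w, h = upscaled_size(768, 576, 4)
ratio = (w * h) / (768 * 576)
print(w, h, ratio)  # 3072 2304 16.0
```

With a scale factor of 2 the output would be 1536 x 1152, which is 4 times the input pixel count instead of 16 times.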
Some details about my problem:
I'm trying to implement a corner detector in OpenCV (a different algorithm from the built-in ones: Canny, Harris, etc.).
I've got a matrix filled with the response values. The larger the response value, the higher the probability that a corner was detected.
My problem is that several corners are detected in the neighborhood of a point (where there is really only one). I need to reduce the number of falsely detected corners.
Exact problem:
I need to walk through the matrix with a kernel, find the maximum value within each kernel window, keep that maximum, and set the other values in the window to zero.
Are there built-in OpenCV functions to do this?
This is how I would do it:
Create a kernel; it defines a pixel's neighbourhood.
Create a new image by dilating your image using this kernel. This dilated image contains the maximum neighbourhood value for every point.
Do an equality comparison between these two arrays. Wherever they are equal is a valid neighbourhood maximum, and is set to 255 in the comparison array.
Multiply the comparison array, and the original array together (scaling appropriately).
This is your final array, containing only neighbourhood maxima.
This is illustrated by these zoomed in images:
9 pixel by 9 pixel original image:
After processing with a 5 by 5 pixel kernel, only the local neighbourhood maxima remain (i.e. maxima separated by more than 2 pixels from a pixel with a greater value):
There is one caveat. If two nearby maxima have the same value then they will both be present in the final image.
Here is some Python code that does it; it should be very easy to convert to C++:
import cv
im = cv.LoadImage('fish2.png',cv.CV_LOAD_IMAGE_GRAYSCALE)
maxed = cv.CreateImage((im.width, im.height), cv.IPL_DEPTH_8U, 1)
comp = cv.CreateImage((im.width, im.height), cv.IPL_DEPTH_8U, 1)
#Create a 5*5 kernel anchored at 2,2
kernel = cv.CreateStructuringElementEx(5, 5, 2, 2, cv.CV_SHAPE_RECT)
cv.Dilate(im, maxed, element=kernel, iterations=1)
cv.Cmp(im, maxed, comp, cv.CV_CMP_EQ)
cv.Mul(im, comp, im, 1/255.0)
cv.ShowImage("local max only", im)
cv.WaitKey(0)
I didn't realise until now, but this is what @sansuiso suggested in his/her answer.
This is possibly better illustrated with this image, before:
after processing with a 5 by 5 kernel:
solid regions are due to the shared local maxima values.
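For readers without the legacy cv module, the dilate-and-compare idea from this answer can also be sketched in plain Python on a list of lists. This is an illustration of the technique only, not OpenCV code; the function name local_maxima and the small response matrix are made up, and the window is simply clipped at the image border.

```python
def local_maxima(img, radius=2):
    """Keep pixels equal to the maximum of their (2*radius+1)^2 window; zero the rest."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # "Dilation": the maximum over the neighbourhood, clipped at the border.
            window_max = max(
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            # "Compare and multiply": keep the pixel only if it is that maximum.
            if img[y][x] == window_max:
                out[y][x] = img[y][x]
    return out

# Hypothetical corner response values; radius=1 means a 3 by 3 kernel.
response = [
    [1, 2, 1, 0, 0],
    [2, 9, 3, 0, 0],
    [1, 3, 2, 0, 5],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 0],
]
result = local_maxima(response, radius=1)
for row in result:
    print(row)
```

The same caveat applies as above: two equal maxima inside one window both survive.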
I would suggest a 2-step procedure (more efficient approaches may exist) that uses OpenCV built-in functions:
Step 1: morphological dilation with a square kernel (corresponding to your neighborhood). This step gives you another image in which each pixel value is replaced by the maximum value inside the kernel.
Step 2: test whether the cornerness value of each pixel of the original response image is equal to the max value given by the dilation step. If not, then there is a better corner in the neighborhood.
If you are looking for some built-in functionality, FilterEngine will help you make a custom filter (kernel).
http://docs.opencv.org/modules/imgproc/doc/filtering.html#filterengine
Also, I would recommend some kind of noise reduction, usually blur, before all processing. That is unless you really want the image raw.
I'm trying to use SVM with the Weka framework, so I'm using LibSVM. I'm new to SVM, and in the guide on the LibSVM site I read that it is possible to find optimal parameters for SVM (cost and gamma) using grid search. So I chose GridSearch in Weka and got bad classification results (TN rate around 1%). How should I interpret these results? If I get bad results even with the optimal parameters, is there no chance of getting a better classification? What I mean is: does grid search give me the best results that I can obtain using SVM?
My dataset consists of 1124 instances (89% negative class, 11% positive class) with 31 attributes (2 of them nominal, the others numeric). I'm using 10-fold cross-validation on the whole dataset to test the model.
I tried GridSearch (I normalized each attribute to values between 0 and 1 and did no feature selection, but I changed the class values from 0 and 1 to 1 and -1 according to SVM theory, though I don't know if that is useful) with these parameters: cost from 1 to 18 with a step of 1.0 and gamma from -5 to 10 with a step of 1.0. The results are sensitivity 93.6% and specificity 64.8%, but the computation takes around 1 hour to complete!
I'd like to get better results compared with a decision tree. Using feature selection (Info Gain ranking) + SMOTE oversampling + cost-sensitive learning I obtained sensitivity 91% and specificity 80%. Is there a way to tune SVM without trying every possible combination of values for cost and gamma?
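One way to avoid a dense grid is the coarse-then-fine search that the LibSVM practical guide describes: search exponentially spaced values of cost and gamma first, then refine around the best point. The sketch below shows only the search logic. The evaluate callable is a hypothetical stand-in for whatever cross-validated score you compute (in Weka or elsewhere), and toy_score is an invented function with a known optimum so that the example runs on its own.

```python
import itertools

def frange(start, stop, step):
    """Inclusive range of evenly spaced values."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]

def grid_search(evaluate, c_exps, g_exps):
    """Return (best_score, log2_C, log2_gamma) over the grid of exponents."""
    return max((evaluate(c, g), c, g) for c, g in itertools.product(c_exps, g_exps))

def coarse_to_fine(evaluate):
    # Coarse pass: C = 2^-5 .. 2^15 and gamma = 2^-15 .. 2^3, exponent step 2.
    _, c0, g0 = grid_search(evaluate, frange(-5, 15, 2), frange(-15, 3, 2))
    # Fine pass: exponent step 0.5 in a small window around the coarse optimum.
    return grid_search(
        evaluate, frange(c0 - 2, c0 + 2, 0.5), frange(g0 - 2, g0 + 2, 0.5)
    )

def toy_score(log2_c, log2_g):
    """Hypothetical score surface with its peak at log2(C)=3.5, log2(gamma)=-6.5."""
    return -((log2_c - 3.5) ** 2 + (log2_g + 6.5) ** 2)

best_score, best_c, best_g = coarse_to_fine(toy_score)
print(best_c, best_g)
```

With 89% negatives, plain accuracy is probably not the score you want evaluate to return; something that reflects both sensitivity and specificity would match the numbers reported in the question better.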