Using OpenCV Cascade - Working solely with the haartraining XML file - C++

I'm trying to implement the Viola-Jones face detection algorithm on the CUDA platform (I'm aware that OpenCV already does that; I'm doing this for my school).
My first phase is to implement the algorithm on the CPU.
I'm using the OpenCV library. I know OpenCV already knows how to do face detection, but in order to understand it I would like to get back to basics and do it my own way.
I created the integral sum representation and the squared sum integral representation using OpenCV functions.
I iterated through the cascade: through the stages, classifiers and rects. I normalized each window, calculated the sum of each classifier and compared it to the threshold. Sadly, it seems like I'm missing something, because I can't detect faces.
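For reference, this is roughly how the two integral images can be built with OpenCV's cv::integral (a minimal sketch; the helper name and the single-channel 8-bit assumption are mine, not part of the code below):
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
// Build the integral and squared-integral images used to normalize each detection window.
// Note that cv::integral outputs are one row and one column larger than the input image.
void buildIntegralImages(const cv::Mat &gray, cv::Mat &sum, cv::Mat &sqsum)
{
    CV_Assert(gray.type() == CV_8UC1);
    cv::integral(gray, sum, sqsum, CV_64F, CV_64F);
}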
It seems like I need to get a better understanding of the cascade XML file.
Here is an example:
<!-- tree 158 -->
<_>
<!-- root node -->
<feature>
<rects>
<_>3 6 2 2 -1.</_>
<_>3 6 1 1 2.</_>
<_>4 7 1 1 2.</_></rects>
<tilted>0</tilted></feature>
<threshold>2.3729570675641298e-003</threshold>
<left_val>0.4750812947750092</left_val>
<right_val>0.7060170769691467</right_val></_></_>
<_>
<!-- tree 159 -->
<_>
<!-- root node -->
<feature>
<rects>
<_>16 6 3 2 -1.</_>
<_>16 7 3 1 2.</_></rects>
<tilted>0</tilted></feature>
<threshold>-1.4541699783876538e-003</threshold>
<left_val>0.3811730146408081</left_val>
<right_val>0.5330739021301270</right_val></_></_></trees>
<stage_threshold>79.2490768432617190</stage_threshold>
<parent>16</parent>
<next>-1</next></_>
<_>
I'd like to understand: what is the meaning of left_val and right_val? What is the meaning of the parent and next values? How do I calculate each classifier's normalized sum? Is there anything I'm doing wrong here?
See my code attached.
int RunHaarClassifierCascadeSum(CascadeClassifier * face_cascade, CvMat* image , CvMat* sum , CvMat* sqsum,
CvMat* tilted,CvSize *scaningWindowSize, int iteratorRow, int iteratorCol )
{
// Normalize the current scanning window - Detection window
// Variance(x) = E(x^2) - (E(x))^2 = detectionWindowSquareExpectancy - detectionWindowExpectancy^2
// Expectancy(x) = E(x) = sum_of_pixels / size_of_window
double detectionWindowTotalSize = scaningWindowSize->height * scaningWindowSize->width;
// calculate the detection Window Expectancy , e.g the E(x)
double sumDetectionWindowPoint1,sumDetectionWindowPoint2,sumDetectionWindowPoint3,sumDetectionWindowPoint4; // ______________________
sumDetectionWindowPoint1 = cvGetReal2D(sum,iteratorRow,iteratorCol); // |R1 R2|
sumDetectionWindowPoint2 = cvGetReal2D(sum,iteratorRow+scaningWindowSize->width,iteratorCol); // | | Sum = R4-R2-R3+R1
sumDetectionWindowPoint3 = cvGetReal2D(sum,iteratorRow,iteratorCol+scaningWindowSize->height); // |R3________________R4|
sumDetectionWindowPoint4 = cvGetReal2D(sum,iteratorRow+scaningWindowSize->width,iteratorCol+scaningWindowSize->height);
double detectionWindowSum = calculateSum(sumDetectionWindowPoint1,sumDetectionWindowPoint2,sumDetectionWindowPoint3,sumDetectionWindowPoint4);
const double detectionWindowExpectancy = detectionWindowSum / detectionWindowTotalSize; // E(x)
// calculate the Square detection Window Expectancy , e.g the E(x^2)
double squareSumDetectionWindowPoint1,squareSumDetectionWindowPoint2,squareSumDetectionWindowPoint3,squareSumDetectionWindowPoint4; // ______________________
squareSumDetectionWindowPoint1 = cvGetReal2D(sqsum,iteratorRow,iteratorCol); // |R1 R2|
squareSumDetectionWindowPoint2 = cvGetReal2D(sqsum,iteratorRow+scaningWindowSize->width,iteratorCol); // | | Sum = R4-R2-R3+R1
squareSumDetectionWindowPoint3 = cvGetReal2D(sqsum,iteratorRow,iteratorCol+scaningWindowSize->height); // |R3________________R4|
squareSumDetectionWindowPoint4 = cvGetReal2D(sqsum,iteratorRow+scaningWindowSize->width,iteratorCol+scaningWindowSize->height);
double detectionWindowSquareSum = calculateSum(squareSumDetectionWindowPoint1,squareSumDetectionWindowPoint2,squareSumDetectionWindowPoint3,squareSumDetectionWindowPoint4);
const double detectionWindowSquareExpectancy = detectionWindowSquareSum / detectionWindowTotalSize; // E(x^2)
const double detectionWindowVariance = detectionWindowSquareExpectancy - std::pow(detectionWindowExpectancy,2); // Variance(x) = E(x^2) - (E(x))^2
const double detectionWindowStandardDeviation = std::sqrt(detectionWindowVariance);
if (detectionWindowVariance<=0)
return -1 ; // Error
// Normalize the cascade window to the normal scale window
double normalizeScaleWidth = double(scaningWindowSize->width / face_cascade->oldCascade->orig_window_size.width);
double normalizeScaleHeight = double(scaningWindowSize->height / face_cascade->oldCascade->orig_window_size.height);
// Calculate the cascade for each one of the windows
for( int stageIterator=0; stageIterator< face_cascade->oldCascade->count; stageIterator++ ) // Stage iterator
{
CvHaarStageClassifier* pCvHaarStageClassifier = face_cascade->oldCascade->stage_classifier + stageIterator;
for (int CvHaarStageClassifierIterator=0;CvHaarStageClassifierIterator<pCvHaarStageClassifier->count;CvHaarStageClassifierIterator++) // Classifier iterator
{
CvHaarClassifier* classifier = pCvHaarStageClassifier->classifier + CvHaarStageClassifierIterator;
float classifierSum=0.;
for( int CvHaarClassifierIterator = 0; CvHaarClassifierIterator < classifier->count;CvHaarClassifierIterator++ ) // Feature iterator
{
CvHaarFeature * pCvHaarFeature = classifier->haar_feature;
// Remark
if (pCvHaarFeature->tilted==1)
break;
// Remark
for( int CvHaarFeatureIterator = 0; CvHaarFeatureIterator< CV_HAAR_FEATURE_MAX; CvHaarFeatureIterator++ ) // 3 Features iterator
{
CvRect * currentRect = &(pCvHaarFeature->rect[CvHaarFeatureIterator].r);
// Normalize the rect to the scaling window scale
CvRect normalizeRec;
normalizeRec.x = (int)(currentRect->x*normalizeScaleWidth);
normalizeRec.y = (int)(currentRect->y*normalizeScaleHeight);
normalizeRec.width = (int)(currentRect->width*normalizeScaleWidth);
normalizeRec.height = (int)(currentRect->height*normalizeScaleHeight);
double sumRectPoint1,sumRectPoint2,sumRectPoint3,sumRectPoint4; // ______________________
sumRectPoint1 = cvGetReal2D(sum,normalizeRec.x,normalizeRec.y); // |R1 R2|
sumRectPoint2 = cvGetReal2D(sum,normalizeRec.x+normalizeRec.width,normalizeRec.y); // | | Sum = R4-R2-R3+R1
sumRectPoint3 = cvGetReal2D(sum,normalizeRec.x,normalizeRec.y+normalizeRec.height); // |R3________________R4|
sumRectPoint4 = cvGetReal2D(sum,normalizeRec.x+normalizeRec.width,normalizeRec.y+normalizeRec.height);
double nonNormalizeRect = calculateSum(sumRectPoint1,sumRectPoint2,sumRectPoint3,sumRectPoint4); //
double sumMean = detectionWindowExpectancy*(normalizeRec.width*normalizeRec.height); // sigma(Pi) = normalizeRect = (sigma(Pi- rect) - sigma(mean)) / detectionWindowStandardDeviation
double normalizeRect = (nonNormalizeRect - sumMean)/detectionWindowStandardDeviation; //
classifierSum += (normalizeRect*(pCvHaarFeature->rect[CvHaarFeatureIterator].weight));
}
}
// if (classifierSum > (*(classifier->threshold)) )
// return 0; // That's not a face !
if (classifierSum > ((*(classifier->threshold))*detectionWindowStandardDeviation) )
return -stageIterator; // That's not a face ! , failed on stage number
}
}
return 1; // That's a face
}

You need to make some big changes. First of all, classifier->threshold is a threshold for each feature. classifier->alpha points to an array made of 2 elements - left_val and right_val (to my understanding). You should put something like this after the classifier loop:
a = classifier->alpha[0]
b = classifier->alpha[1]
t = *(classifier->threshold)
stage_sum += classifierSum < t ? a : b
Then compare stage_sum with CvHaarStageClassifier::threshold, which is the stage threshold; loop through the stage_classifiers[i]. If the window passes all of them, then it's a face!
'parent' and 'next' are useless here if you use haarcascade_frontalface_alt.xml - it is just a stump-based cascade and not a tree-based one.
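Putting that together, the end of your classifier loop might look roughly like this (only a sketch, not tested; classifierSum is the value your code already computes, and stage_sum would be declared as a double before the classifier loop):
// after the feature/rect loops have produced classifierSum for this weak classifier:
double a = classifier->alpha[0];      // left_val in the XML
double b = classifier->alpha[1];      // right_val in the XML
double t = *(classifier->threshold);  // per-feature threshold, not the stage threshold
stage_sum += (classifierSum < t) ? a : b;
Then, after the classifier loop, the stage decision becomes:
if (stage_sum < pCvHaarStageClassifier->threshold)
    return -stageIterator;  // rejected at this stage - not a face
Only a window that passes every stage this way should be reported as a face.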

Related

Backpropagation() to file

I want to log the backpropagation process to a file in as much detail as possible.
I've tried adding an ofstream attribute to the Matrix class so that its methods could append their results to a file, but then most of the matrices in the SimpleNeuralNetwork class were marked with errors (for example _weightMatrices).
class SimpleNeuralNetwork
{
public:
std::vector<uint32_t> _topology;
std::vector<Matrix2D<float>> _weightMatrices;
std::vector<Matrix2D<float>> _valueMatrices;
std::vector<Matrix2D<float>> _biasMatrices;
float _learningRate;
public:
// topology defines the no.of neurons for each layer
// learning rate defines how much modification should be done in each backwords propagation i.e. training
SimpleNeuralNetwork(std::vector<uint32_t> topology,float learningRate = 0.1f)
:_topology(topology),
_weightMatrices({}),
_valueMatrices({}),
_biasMatrices({}),
_learningRate(learningRate)
{
// constructor body (weight/value/bias matrix setup) omitted in the question
}
bool backPropagate(std::vector<float> targetOutput)
{
if(targetOutput.size() != _topology.back())
return false;
// determine the simple error
// error = target - output
Matrix2D<float> errors(targetOutput.size(), 1);
errors._vals = targetOutput;
errors = errors.add(_valueMatrices.back().negetive());
// back propagating the error from output layer to input layer
// and adjusting weights of weight matrices and bias matrics
for(int32_t i = _weightMatrices.size() - 1; i >= 0; i--)
{
//calculating errrors for previous layer
Matrix2D<float> prevErrors = errors.multiply(_weightMatrices[i].transpose());
//calculating gradient i.e. delta weight (dw)
//dw = lr * error * d/dx(activated value)
Matrix2D<float> dOutputs = _valueMatrices[i + 1].applyFunction(DSigmoid);
Matrix2D<float> gradients = errors.multiplyElements(dOutputs);
gradients = gradients.multiplyScaler(_learningRate);
Matrix2D<float> weightGradients = _valueMatrices[i].transpose().multiply(gradients);
//adjusting bias and weight
_biasMatrices[i] = _biasMatrices[i].add(gradients);
_weightMatrices[i] = _weightMatrices[i].add(weightGradients);
errors = prevErrors;
}
return true;
}
};
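A likely reason the ofstream attribute broke the build is that std::ofstream is not copyable, so a Matrix class holding one (and every std::vector<Matrix2D<float>> of it) loses its copy operations. One alternative that keeps Matrix2D untouched is a small free logging helper that backPropagate writes to; this is only a sketch and assumes Matrix2D exposes its values through the public _vals vector used above:
#include <fstream>
#include <string>
// Append one matrix's raw values to the log file, prefixed by a label.
template <typename T>
void dumpMatrix(std::ofstream &log, const std::string &label, const Matrix2D<T> &m)
{
    log << label << ":";
    for (T v : m._vals)
        log << ' ' << v;
    log << '\n';
}
backPropagate could then take a std::ofstream& parameter (or the network could hold a pointer to one) and call, for example, dumpMatrix(log, "gradients layer " + std::to_string(i), gradients); inside the loop.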

ScalableTSDFVolume Integrate from TUM-RGBD Dataset

I am using Open3D 0.15 and C++11 on Ubuntu 18.04.
The main function I'm interested in is the ScalableTSDFVolume Integrate() function, using the TUM RGBD dataset (the xyz set, to be exact), based on the IntegrateRGBD example from the Open3D repo.
Since the TUM RGBD dataset does not provide an association file that matches the RGBD images with the trajectory info, I've created my own small piece of code that matches the timestamps of the TUM dataset's image data to the trajectory information, and converts each 7-dimensional [x y z rx ry rz rw] trajectory entry into an Eigen::Matrix4d, using the same equations that Open3D's FileTUM.cpp uses:
do
{
// Read the timestamp first
gt >> p_gt.timestamp;
double poseArr[7];
// push the remaining 7 numbers to the poseArr
for (int i = 0; i < 7; i++)
gt >> poseArr[i];
// copy paste of the tum trajectory reader
Eigen::Matrix4d transform;
transform.setIdentity();
transform.topLeftCorner<3, 3>() =
Eigen::Quaterniond(poseArr[6], poseArr[3], poseArr[4], poseArr[5]).toRotationMatrix();
transform.topRightCorner<3, 1>() = Eigen::Vector3d(poseArr[0], poseArr[1], poseArr[2]);
p_gt.pose = transform.inverse();
gtF.push_back(p_gt);
} while (std::getline(gt, line));
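The timestamp matching itself is just a nearest-timestamp search over the trajectory entries; roughly like this (a sketch only - GtEntry stands in for whatever struct p_gt actually is in my code):
#include <cmath>
#include <limits>
#include <vector>
// Return the index of the trajectory entry whose timestamp is closest to the image timestamp.
size_t findClosestPose(const std::vector<GtEntry> &gtF, double imageStamp)
{
    size_t best = 0;
    double bestDiff = std::numeric_limits<double>::max();
    for (size_t i = 0; i < gtF.size(); i++) {
        double diff = std::abs(gtF[i].timestamp - imageStamp);
        if (diff < bestDiff) { bestDiff = diff; best = i; }
    }
    return best;
}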
The code runs fine, but the issue is when I try to integrate multiple frames into the same volume and extract its pointcloud or mesh.
I can tell that the RGBD information is being fed into the program correctly, by extracting the mesh at the very first frame:
first frame mesh extraction
But there is a significant artifact when I try to extract the mesh when more frames are integrated, like this:
30 frames mesh extraction
From my previous experience, this probably has to do with the transformation matrices not being in the correct axis convention. If anyone has tried to use the TUM dataset with Open3D and encountered the same problem, I would greatly appreciate any info on this.
Edit:
For reference, this is the modified code I'm using for the reconstruction.
int main(int argc, char *argv[]) {
using namespace open3d;
std::string filebase("/home/geometry/Documents/rgbd_dataset_freiburg1_xyz");
VirtualSensor::CameraParameters kinect{ 525.0,525.0,319.5,239.5,5000};
VirtualSensor::CameraParameters camPar = kinect;
VirtualSensor v1(filebase,camPar);
bool save_pointcloud = true;
bool save_mesh = true;
bool save_voxel = false;
int every_k_frames = 50;
double length = 4.0;
double uLength = 6.0;
int resolution = 512;
double sdf_trunc_percentage = 0.01;
int verbose = 2;
utility::SetVerbosityLevel((utility::VerbosityLevel)verbose);
auto camera_intrinsic = camera::PinholeCameraIntrinsic(640, 480, 525.0, 525.0, 319.5, 239.5);
int index = 0;
int save_index = 0;
int pairSize = 30;
// initialise TSDF
pipelines::integration::ScalableTSDFVolume volume(
length / (double)resolution, length * sdf_trunc_percentage,
pipelines::integration::TSDFVolumeColorType::RGB8);
//pipelines::integration::UniformTSDFVolume uVolume(uLength, resolution, uLength*sdf_trunc_percentage, pipelines::integration::TSDFVolumeColorType::RGB8);
utility::FPSTimer timer("Process RGBD stream",
pairSize);
geometry::Image depth, color;
// start loop
for(int i = 0; i < pairSize; i++){
utility::LogInfo("Processing frame {:d} ...", index);
io::ReadImage(v1.GetDepthPath(i), depth);
io::ReadImage(v1.GetColorPath(i), color);
auto rgbd = geometry::RGBDImage::CreateFromColorAndDepth(
color, depth, 5000.0, 6.0, false);
if (index == 0 ||
(every_k_frames > 0 && index % every_k_frames == 0))
volume.Reset();
volume.Integrate(*rgbd,
camera_intrinsic, // intrinsic never changes
v1.GetCounterGT(i)); // get the groundtruth pose from my class
index++;
// saving mesh/pc logic
if (index == pairSize ||
(every_k_frames > 0 && index % every_k_frames == 0)) {
utility::LogInfo("Saving fragment {:d} ...", save_index);
std::string save_index_str = std::to_string(save_index);
if (save_pointcloud) {
utility::LogInfo("Saving pointcloud {:d} ...", save_index);
auto pcd = volume.ExtractPointCloud();
io::WritePointCloud("pointcloud_" + save_index_str + ".ply",
*pcd);
}
if (save_mesh) {
utility::LogInfo("Saving mesh {:d} ...", save_index);
auto mesh = volume.ExtractTriangleMesh();
io::WriteTriangleMesh("mesh_" + save_index_str + ".ply",
*mesh);
}
if (save_voxel) {
utility::LogInfo("Saving voxel {:d} ...", save_index);
auto voxel = volume.ExtractVoxelPointCloud();
io::WritePointCloud("voxel_" + save_index_str + ".ply",
*voxel);
}
save_index++;
}
timer.Signal();
}
return 0;
}

Face Recognition using webcam

I have been using OpenCV libraries for a while; I used and understood a piece of code that detects [only one face] and recognizes it. The main functions are below.
This is the recognition function:
// Find the most likely person based on a detection. Returns the index, and stores the confidence value into pConfidence.
int findNearestNeighbor(float * projectedTestFace, float *pConfidence)
{
double leastDistSq = DBL_MAX;
int i, iTrain, iNearest = 0;
for(iTrain=0; iTrain<nTrainFaces; iTrain++)
{
double distSq=0;
for(i=0; i<nEigens; i++)
{
float d_i = projectedTestFace[i] - projectedTrainFaceMat->data.fl[iTrain*nEigens + i];
#ifdef USE_MAHALANOBIS_DISTANCE
distSq += d_i*d_i / eigenValMat->data.fl[i]; // Mahalanobis distance (might give better results than Euclidean distance)
#else
distSq += d_i*d_i; // Euclidean distance.
#endif
}
if(distSq < leastDistSq)
{
leastDistSq = distSq;
iNearest = iTrain;
}
}
// Return the confidence level based on the Euclidean distance,
// so that similar images should give a confidence between 0.5 to 1.0,
// and very different images should give a confidence between 0.0 to 0.5.
*pConfidence = 1.0f - sqrt( leastDistSq / (float)(nTrainFaces * nEigens) ) / 255.0f;
// Return the found index.
return iNearest;
}
After that, it returns the value of iNearest to the following code:
// Check which person it is most likely to be.
iNearest = findNearestNeighbor(projectedTestFace, &confidence);
nearest = trainPersonNumMat->data.i[iNearest];
printf("Most likely person in camera: '%s' (confidence=%f).\n", personNames[nearest-1].c_str(), confidence);
How can I make it recognize an unknown person as unknown and stop comparing them with the faces in my DB?
Also, how can I detect multiple faces at once?
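Going by the comment in findNearestNeighbor (similar images should score between 0.5 and 1.0, very different ones between 0.0 and 0.5), one simple approach is to reject low-confidence matches as unknown; the cutoff below is only a guess that would need tuning against your own database:
// Treat a low-confidence nearest neighbour as "unknown" (threshold value is a guess to tune).
const float UNKNOWN_THRESHOLD = 0.6f;
iNearest = findNearestNeighbor(projectedTestFace, &confidence);
if (confidence < UNKNOWN_THRESHOLD)
{
    printf("Most likely an unknown person (confidence=%f).\n", confidence);
}
else
{
    nearest = trainPersonNumMat->data.i[iNearest];
    printf("Most likely person in camera: '%s' (confidence=%f).\n",
           personNames[nearest-1].c_str(), confidence);
}
For multiple faces, CascadeClassifier::detectMultiScale fills a std::vector<cv::Rect> with one rectangle per detected face; you can then crop each rectangle and run the same projection/recognition step on each crop.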

Implementing De Boors algorithm for finding points on a B-spline

I've been working on this for several weeks but have been unable to get my algorithm working properly, and I'm at my wits' end. Here's an illustration of what I have achieved:
If everything were working, I would expect a perfect circle/oval at the end.
My sample points (in white) are recalculated every time a new control point (in yellow) is added. At 4 control points everything looks perfect; when I add a 5th on top of the 1st things still look alright, but on the 6th it starts to go off to the side, and on the 7th it jumps up to the origin!
Below I'll post my code, where calculateWeightForPointI contains the actual algorithm. And for reference, here is the information I'm trying to follow. I'd be so grateful if someone could take a look for me.
void updateCurve(const std::vector<glm::vec3>& controls, std::vector<glm::vec3>& samples)
{
int subCurveOrder = 4; // = k = I want to break my curve into to cubics
// De boor 1st attempt
if(controls.size() >= subCurveOrder)
{
createKnotVector(subCurveOrder, controls.size());
samples.clear();
for(int steps=0; steps<=20; steps++)
{
// use steps to get a 0-1 range value for progression along the curve
// then get that value into the range [k-1, n+1]
// k-1 = subCurveOrder-1
// n+1 = always the number of total control points
float t = ( steps / 20.0f ) * ( controls.size() - (subCurveOrder-1) ) + subCurveOrder-1;
glm::vec3 newPoint(0,0,0);
for(int i=1; i <= controls.size(); i++)
{
float weightForControl = calculateWeightForPointI(i, subCurveOrder, controls.size(), t);
newPoint += weightForControl * controls.at(i-1);
}
samples.push_back(newPoint);
}
}
}
//i = the weight we're looking for, i should go from 1 to n+1, where n+1 is equal to the total number of control points.
//k = curve order = power/degree +1. eg, to break whole curve into cubics use a curve order of 4
//cps = number of total control points
//t = current step/interp value
float calculateWeightForPointI( int i, int k, int cps, float t )
{
//test if we've reached the bottom of the recursive call
if( k == 1 )
{
if( t >= knot(i) && t < knot(i+1) )
return 1;
else
return 0;
}
float numeratorA = ( t - knot(i) );
float denominatorA = ( knot(i + k-1) - knot(i) );
float numeratorB = ( knot(i + k) - t );
float denominatorB = ( knot(i + k) - knot(i + 1) );
float subweightA = 0;
float subweightB = 0;
if( denominatorA != 0 )
subweightA = numeratorA / denominatorA * calculateWeightForPointI(i, k-1, cps, t);
if( denominatorB != 0 )
subweightB = numeratorB / denominatorB * calculateWeightForPointI(i+1, k-1, cps, t);
return subweightA + subweightB;
}
//returns the knot value at the passed in index
//if i = 1 and we want Xi then we have to remember to index with i-1
float knot(int indexForKnot)
{
// When getting the index for the knot function i remember to subtract 1 from i because of the difference caused by us counting from i=1 to n+1 and indexing a vector from 0
return knotVector.at(indexForKnot-1);
}
//calculate the whole knot vector
void createKnotVector(int curveOrderK, int numControlPoints)
{
int knotSize = curveOrderK + numControlPoints;
for(int count = 0; count < knotSize; count++)
{
knotVector.push_back(count);
}
}
Your algorithm seems to work for any inputs I tried it on. Your problem might be that a control point is not where it is supposed to be, or that the control points haven't been initialized properly. It looks like there are two control points half the height below the bottom-left corner.
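A quick way to check that from your side (just a diagnostic sketch; debugControls is a made-up helper) is to print the control points and verify that the basis weights still sum to about 1 for a t inside the valid range - a stray control point shows up immediately in the printout, while a weight sum far from 1 would point back at the algorithm:
#include <cstdio>
#include <vector>
#include <glm/glm.hpp>
// Dump the control points and check the partition-of-unity property at a given t.
void debugControls(const std::vector<glm::vec3> &controls, int subCurveOrder, float t)
{
    for (size_t i = 0; i < controls.size(); i++)
        printf("control %zu: (%f, %f, %f)\n", i, controls[i].x, controls[i].y, controls[i].z);
    float weightSum = 0.0f;
    for (int i = 1; i <= (int)controls.size(); i++)
        weightSum += calculateWeightForPointI(i, subCurveOrder, (int)controls.size(), t);
    printf("weight sum at t = %f: %f (should be close to 1)\n", t, weightSum);
}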

How to determine Scale of Line Graph based on Pixels/Height?

I have a problem, due to my terrible math abilities: I cannot figure out how to scale a graph based on the maximum and minimum values so that the whole graph fits into the graphing area (400x420) without parts of it going off the screen (the values come from an equation given by the user).
Let's say I have this code, and it automatically draws squares and then the line graph based on these values. What is the formula (what do I multiply by) to scale it so that it fits into the small graphing area?
vector<int> m_x;
vector<int> m_y; // gets automatically filled by user equation or values
int HeightInPixels = 420;// Graphing area size!!
int WidthInPixels = 400;
int best_max_y = GetMaxOfVector(m_y);
int best_min_y = GetMinOfVector(m_y);
m_row = 0;
m_col = 0;
y_magnitude = (HeightInPixels/(best_max_y+best_min_y)); // probably won't work
x_magnitude = (WidthInPixels/(int)m_x.size());
m_col = m_row = best_max_y; // number of vertical/horizontal lines to draw
////x_magnitude = (WidthInPixels/(int)m_x.size())/2; Doesn't work well
////y_magnitude = (HeightInPixels/(int)m_y.size())/2; Doesn't work well
ready = true; // we have values, graph it
Invalidate(); // uses WM_PAINT
////////////////////////////////////////////
/// Construction of Graph layout on WM_PAINT, before painting line graph
///////////////////////////////////////////
CPen pSilver(PS_SOLID, 1, RGB(150, 150, 150) ); // silver
CPen pDarkSilver(PS_SOLID, 2, RGB(120, 120, 120) ); // dark silver
dc.SelectObject( pSilver ); // silver color
CPoint pt( 620, 620 ); // origin
int left_side = 310;
int top_side = 30;
int bottom_side = 450;
int right_side = 710; // create a rectangle border
dc.Rectangle(left_side,top_side,right_side,bottom_side);
int origin = 310;
int xshift = 30;
int yshift = 30;
// draw scaled rows and columns
for(int r = 1; r <= colrow; r++){ // draw rows
pt.x = left_side;
pt.y = (ymagnitude)*r+top_side;
dc.MoveTo( pt );
pt.x = right_side;
dc.LineTo( pt );
for(int c = 1; c <= colrow; c++){
pt.x = left_side+c*(magnitude);
pt.y = top_side;
dc.MoveTo(pt);
pt.y = bottom_side;
dc.LineTo(pt);
} // draw columns
}
// grab the center of the graph on x and y dimension
int top_center = ((right_side-left_side)/2)+left_side;
int bottom_center = ((bottom_side-top_side)/2)+top_side;
You are using ax^2 + bx + c (a quadratic equation). You will get a list of (X,Y) values entered by the user.
Let us say the 5 points you get are:
(1,1)
(2,4)
(4,1)
(5,6)
(6,7)
So, here your best_max_y will be 7 and best_min_y will be 1.
Now, your total graph area is:
Dx = right_side - left_side //here, 400 (710 - 310)
Dy = bottom_side - top_side //here, 420 (450 - 30)
So, you can calculate x_magnitude and y_magnitude using the following equations:
x_magnitude = WidthInPixels / Dx;
y_magnitude = HeightInPixels / Dy;
What I did was determine how many points I had going in the x and y directions, and then divide that by the x and y dimensions, then divide that by 3, as I wanted each minimum point to be three pixels so that it could be seen.
The trick then is that you have to aggregate the data so that you are showing several points as one point; it may be the average of them, but that depends on what you are displaying.
Without knowing more about what you are doing, it is hard to make a suggestion.
For this part, subtract, don't add: best_max_y + best_min_y should be best_max_y - best_min_y, as you want the difference (the range of the data).
The only other thing would be to divide y_magnitude and x_magnitude by 3. That was an arbitrary number I came up with, just so the users could see the points; you may find some other number works better.
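Putting these suggestions together, a common way to map a data value onto the fixed 400x420 drawing rectangle looks like this (a sketch only; the variable names follow the question's code, and the guard against a zero range is an added assumption):
// Map data coordinates to pixel coordinates inside the graph rectangle.
double y_range = double(best_max_y - best_min_y);         // subtract, as suggested above
if (y_range == 0) y_range = 1;                             // avoid dividing by zero for flat data
double y_scale = HeightInPixels / y_range;                 // pixels per data unit vertically
double x_scale = WidthInPixels / double(m_x.size() - 1);   // spread the samples across the width
for (size_t i = 0; i < m_y.size(); i++)
{
    int px = left_side + int(i * x_scale);
    int py = bottom_side - int((m_y[i] - best_min_y) * y_scale);
    // MoveTo/LineTo at (px, py) to draw the scaled line graph
}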