Is there a more detailed reference on Cascade Classifiers in OpenCV? - c++

I'm trying to modify OpenCV's Haar cascade classifier for a specific purpose and have looked through its source code from top to bottom. But I'm stuck on understanding the very final part, which calculates the feature value in line 361 of cascadedetect.hpp:
return optfeaturesPtr[featureIdx].calc(pwin) * varianceNormFactor;
and its sub-parts: (a) calculating each rectangle's weighted sum in lines 398-407 of cascadedetect.hpp
inline float HaarEvaluator::OptFeature::calc( const int* ptr ) const
{
    float ret = weight[0] * CALC_SUM_OFS(ofs[0], ptr) +
                weight[1] * CALC_SUM_OFS(ofs[1], ptr);

    if( weight[2] != 0.0f )
        ret += weight[2] * CALC_SUM_OFS(ofs[2], ptr);

    return ret;
}
and (b) calculating varianceNormFactor in lines 691-697 and line 701 of cascadedetect.cpp:
pwin = &sbuf.at<int>(pt) + s.layer_ofs;
const int* pq = (const int*)(pwin + sqofs);
int valsum = CALC_SUM_OFS(nofs, pwin);
unsigned valsqsum = (unsigned)(CALC_SUM_OFS(nofs, pq));

double area = normrect.area();
double nf = area * valsqsum - (double)valsum * valsum;
...
// line 701:
varianceNormFactor = (float)(1./nf);
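For orientation, CALC_SUM_OFS computes a rectangle sum over an integral image from four corner lookups, with the corner positions precomputed as flat offsets from the window pointer. A minimal sketch of the underlying identity (hypothetical names, not OpenCV's actual macro):

// Sum of pixels in the rectangle with corners (x1, y1) and (x2, y2),
// given an integral image ii where ii[y*stride + x] holds the sum of
// all pixels above and to the left of (x, y).
inline int rectSum(const int* ii, int stride, int x1, int y1, int x2, int y2)
{
    return ii[y2 * stride + x2] - ii[y2 * stride + x1]
         - ii[y1 * stride + x2] + ii[y1 * stride + x1];
}

In calc() above, weight[0..2] blend two or three such rectangle sums into the raw Haar feature value, which is then scaled by varianceNormFactor.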
From my limited knowledge I guess pwin represents the actual window currently being processed, but (Q1) what does pq mean? I couldn't find any line that assigns a value to the sbuf variable (which is somehow connected to everything above).
According to Rainer Lienhart's paper, which was mentioned by OpenCV's team, they used this equation for lighting correction. But in this paper they say we can calculate σ just by looking at 4 values in the integral image of each pixel's square. (Q2) Aren't we supposed to subtract the average from each pixel BEFORE taking the sum of squares in order to calculate the standard deviation, or does σ in this paper represent something different?
And from the source I think the equation they used must be similar to this instead. (Q3) So if possible, where can I get the maths behind this code, or detailed reference material for this part? I've read OpenCV's reference manual but didn't find anything about it.

(Q1) What does pq mean:
I think it may stand for "pointer to square":
OpenCV may calculate the square of all pixels and append it to a buffer B.
So when doing normalization, it can quickly refer to it by:
the offset of the current window in B, which is pwin;
the offset of the square value for each pixel within the current window in B, based on pwin, which is sqofs.
(Q2) But aren't we supposed to subtract the average from each pixel BEFORE taking the sum of squares in order to calculate the standard deviation, or does σ in this paper represent something different?
I have checked it on OpenCV 3.4.4 and, yes, it does NOT subtract the mean.
I guess this is because the Haar pattern is symmetric: e.g., for Horizontal 2, one region is subtracted from a region with the same area, so the effect of the mean sums to zero and we can ignore it.
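For completeness, the standard variance identity gives the same conclusion without a symmetry argument (this is textbook algebra, not something taken from the OpenCV sources). With N = area pixels p_i in the normalization window:

\[
\sigma^2 = \frac{1}{N}\sum_i p_i^2 - \Big(\frac{1}{N}\sum_i p_i\Big)^2
\quad\Longrightarrow\quad
N^2\sigma^2 = N\sum_i p_i^2 - \Big(\sum_i p_i\Big)^2
\]

With Σ p_i = valsum (from the plain integral image) and Σ p_i² = valsqsum (from the squared integral image), the right-hand side is exactly nf = area * valsqsum - valsum * valsum from the code, i.e. nf = area² · σ². The mean never has to be subtracted per pixel; it is absorbed into the (Σ p_i)² term, so both sums come straight from the two integral images.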

Related

OpenCV estimateRigidTransform()

After reading several posts about getting the 2D transformation of 2D points from one image to another, estimateRigidTransform() seems to be the recommendation. I'm trying to use it. I modified the source code to change the RANSAC parameters, because they're hardcoded and the hardcoded values are not very good (the source code for this function is in lkpyramid.cpp). I have read up on how RANSAC works, and am trying to understand the steps in estimateRigidTransform().
// choose random 3 non-complanar points from A & B
...
// additional check for non-complanar vectors
a[0] = pA[idx[0]];
a[1] = pA[idx[1]];
a[2] = pA[idx[2]];
b[0] = pB[idx[0]];
b[1] = pB[idx[1]];
b[2] = pB[idx[2]];
double dax1 = a[1].x - a[0].x, day1 = a[1].y - a[0].y;
double dax2 = a[2].x - a[0].x, day2 = a[2].y - a[0].y;
double dbx1 = b[1].x - b[0].x, dby1 = b[1].y - b[0].y;
double dbx2 = b[2].x - b[0].x, dby2 = b[2].y - b[0].y;
const double eps = 0.01;
if( fabs(dax1*day2 - day1*dax2) < eps*std::sqrt(dax1*dax1+day1*day1)*std::sqrt(dax2*dax2+day2*day2) ||
    fabs(dbx1*dby2 - dby1*dbx2) < eps*std::sqrt(dbx1*dbx1+dby1*dby1)*std::sqrt(dbx2*dbx2+dby2*dby2) )
    continue;
Is it a typo that it uses non-coplanar vectors? I mean the 2D points are all on the same plane, right?
My second question is: what is that if condition doing? I know that the left-hand side (which gives twice the area of the triangle) would be zero or near zero if the points are collinear, and the right-hand side is the product of the lengths of two sides of the triangle.
Collinearity is preserved under affine transformations (such as the one you are probably estimating), but these transformations also capture changes in point of view (as if you rotated the object in a 3D world). However, collinear points stay collinear under such a transformation, so for the algorithm there may not be a unique solution. Look at the pictures:
imagine selecting the 3 center points of each black square in the first row of the first image, then mapping them to the same centers in the next image. It may generate a mapping to that solution, but also a mapping to a zoomed version of the first one. The same may happen with the third one, except that this time it may map to a zoomed-out version of the first one (without any other change). However, if the points are not collinear, for example the centers of 3 corner squares, it will find a unique mapping.
I hope this helps you clarify your doubts. If not, leave a comment.
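To make the check itself concrete: the left-hand side is |v1 × v2| = |v1| · |v2| · sin θ, so the condition rejects sampled triples whose triangle is nearly degenerate (sin θ < eps), i.e. nearly collinear points. A hedged standalone version of the same test (names are mine):

#include <cmath>

struct Pt { double x, y; };

// True if p0, p1, p2 are (nearly) collinear. |v1 x v2| = |v1|*|v2|*sin(theta),
// so comparing the cross product against eps*|v1|*|v2| bounds the angle
// between the two edge vectors rather than the raw triangle area.
bool nearlyCollinear(const Pt& p0, const Pt& p1, const Pt& p2, double eps = 0.01)
{
    const double dx1 = p1.x - p0.x, dy1 = p1.y - p0.y;
    const double dx2 = p2.x - p0.x, dy2 = p2.y - p0.y;
    return std::fabs(dx1 * dy2 - dy1 * dx2)
           < eps * std::sqrt(dx1 * dx1 + dy1 * dy1)
                 * std::sqrt(dx2 * dx2 + dy2 * dy2);
}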

Starting from a source, find the next point closest to an objective on a grid in C++

I have an NxN grid with 2 points, the source and destination. I need to move step by step from the source to the destination (which is also moving). How do I determine what the next point is to move to?
One way is to assess all 8 neighboring points and see which yields the lowest Euclidean distance to the destination. However, I was hoping there is a cool (mathematical) trick which will yield more elegant results.
Your problem statement allows moving diagonally, which is faster (since it moves both horizontally and vertically in a single step): this solution will always do that unless the current point has the same x or y coordinate as the target.
#include <cstdlib>   // std::abs
#include <utility>   // std::pair

using Position = std::pair<int, int>;

Position move(Position const &current, Position const &target) {
    // horizontal and vertical distances
    const int dx = target.first - current.first;
    const int dy = target.second - current.second;
    // horizontal and vertical steps, each in {-1, 0, +1}
    const int sx = dx ? dx / std::abs(dx) : 0;
    const int sy = dy ? dy / std::abs(dy) : 0;
    return { current.first + sx, current.second + sy };
}
I'm not sure if this counts as a cool mathematical trick though; it just depends on knowing that:
dx = target.x - current.x is positive if you should move in the positive x-direction, negative if you should go in the negative direction, and zero if you should go straight up/down
dx/abs(dx) keeps the sign and removes the magnitude, so it's always one of -1, 0, +1 (with the ternary guarding against division by zero)
I suppose the answer to your question is Bresenham's line algorithm. It builds the sequence of integer grid points between the start and end points. Even if you don't use it directly, you can adapt its ideas to your problem.
For more information see https://www.cs.helsinki.fi/group/goa/mallinnus/lines/bresenh.html
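For concreteness, a minimal integer-only sketch of the classic error-accumulating form (my own code, adapted from the usual textbook presentation; since your destination moves, you would recompute the line each step and take only its first point):

#include <cstdlib>
#include <vector>

struct Cell { int x, y; };

// All grid cells on the Bresenham line from a to b, endpoints included.
std::vector<Cell> bresenhamLine(Cell a, Cell b)
{
    std::vector<Cell> cells;
    const int dx = std::abs(b.x - a.x), sx = a.x < b.x ? 1 : -1;
    const int dy = -std::abs(b.y - a.y), sy = a.y < b.y ? 1 : -1;
    int err = dx + dy; // combined error term for both axes
    for (;;) {
        cells.push_back(a);
        if (a.x == b.x && a.y == b.y)
            break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; a.x += sx; } // step in x
        if (e2 <= dx) { err += dx; a.y += sy; } // step in y
    }
    return cells;
}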
I would simply use some vector math: take dest minus source as a vector, and then calculate the angle between that vector and some reference vector, e.g. <1, 0>, with standard methods.
Then you can simply divide the circle into 8 (or 4 if you prefer) sectors and determine in which sector your vector lies from the angle you obtained.
See euclidean space for how to calculate the angle between two vectors.
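A hedged sketch of that idea (names are mine): quantize the angle of the difference vector into one of 8 sectors and look up the corresponding unit step. It produces the same steps as the sign-based solution above.

#include <cmath>
#include <utility>

using Position = std::pair<int, int>;

// Pick one of 8 unit steps by quantizing the angle of (target - current).
// Assumes current != target.
Position stepTowards(const Position& current, const Position& target)
{
    const double pi = std::acos(-1.0);
    const double angle = std::atan2(double(target.second - current.second),
                                    double(target.first - current.first));
    // each sector is pi/4 wide; round to the nearest sector center
    const int sector = (int(std::lround(angle / (pi / 4.0))) + 8) % 8;
    static const int sx[8] = { 1, 1, 0, -1, -1, -1,  0,  1 };
    static const int sy[8] = { 0, 1, 1,  1,  0, -1, -1, -1 };
    return { current.first + sx[sector], current.second + sy[sector] };
}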

Given camera matrices, how to find point correspondences using OpenCV?

I'm following this tutorial, which uses Features2D + Homography. If I have a known camera matrix for each image, how can I optimize the result? I tried some images, but it didn't work well.
//Edit
After reading some materials, I think I should rectify the two images first. But the rectification is not perfect, so a vertical line on image 1 generally corresponds to a vertical band on image 2. Are there any good algorithms?
I'm not sure if I understand your problem. Do you want to find corresponding points between the images, or do you want to improve the correctness of your matches by using the camera intrinsics?
In principle, in order to use camera geometry for finding matches, you would need the fundamental or essential matrix, depending on whether you know the camera intrinsics (i.e. have a calibrated camera). That means you would need an estimate of the relative rotation and translation of the cameras. Then, by computing the epipolar lines corresponding to the features found in one image, you would search along those lines in the second image to find the best match. However, I think it would be better to simply rely on automatic feature matching. Given the fundamental/essential matrix, you could try your luck with correctMatches, which will move the correspondences such that the reprojection error is minimised.
Tips for better matches
To increase the stability and saliency of automatic matches, it usually pays to
Adjust the parameters of the feature detector
Try different detection algorithms
Perform a ratio test to filter out those keypoints which have a very similar second-best match and are therefore unstable. This is done like this:
Mat descriptors_1, descriptors_2; // obtained from feature detector
BFMatcher matcher;
vector<DMatch> matches;

matcher = BFMatcher(NORM_L2, false); // norm depends on feature detector
vector<vector<DMatch>> match_candidates;
const float ratio = 0.8; // or something
matcher.knnMatch(descriptors_1, descriptors_2, match_candidates, 2);
for (size_t i = 0; i < match_candidates.size(); i++)
{
    // keep the best match only if it clearly beats the second-best one
    if (match_candidates[i][0].distance < ratio * match_candidates[i][1].distance)
        matches.push_back(match_candidates[i][0]);
}
A more involved way of filtering would be to compute the reprojection error for each keypoint in the first frame. This means computing the corresponding epipolar line in the second image and then checking how far its supposed matching point is from that line. Throwing away those points whose distance exceeds some threshold would remove the matches which are incompatible with the epipolar geometry (which I assume would be known). Computing the error can be done like this (I honestly do not remember where I took this code from and I may have modified it a bit; also the SO editor is buggy when code is inside lists, sorry for the bad formatting):
double computeReprojectionError(vector<Point2f>& imgpts1, vector<Point2f>& imgpts2, Mat& inlier_mask, const Mat& F)
{
    double err = 0;
    vector<Vec3f> lines[2];
    int npt = sum(inlier_mask)[0];

    // strip outliers so validation is constrained to the correspondences
    // which were used to estimate F
    vector<Point2f> imgpts1_copy(npt),
                    imgpts2_copy(npt);
    int c = 0;
    for (int k = 0; k < inlier_mask.size().height; k++)
    {
        if (inlier_mask.at<uchar>(k, 0) == 1)
        {
            imgpts1_copy[c] = imgpts1[k];
            imgpts2_copy[c] = imgpts2[k];
            c++;
        }
    }

    Mat imgpt[2] = { Mat(imgpts1_copy), Mat(imgpts2_copy) };
    computeCorrespondEpilines(imgpt[0], 1, F, lines[0]);
    computeCorrespondEpilines(imgpt[1], 2, F, lines[1]);
    for (int j = 0; j < npt; j++)
    {
        // error is computed as the distance between a point u_l = (x,y) and the
        // epipolar line of its corresponding point u_r in the second image plus
        // the reverse, so err_ij = d(u_l, F^T * u_r) + d(u_r, F * u_l)
        Point2f u_l = imgpts1_copy[j], // for the purpose of this function, we imagine imgpts1
                                       // to be the "left" image and imgpts2 the "right" one;
                                       // it doesn't make a difference
                u_r = imgpts2_copy[j];
        float a2 = lines[1][j][0], // epipolar line in the first image: a2*x + b2*y + c2 = 0
              b2 = lines[1][j][1],
              c2 = lines[1][j][2];
        float norm_factor2 = sqrt(pow(a2, 2) + pow(b2, 2));
        float a1 = lines[0][j][0], // epipolar line in the second image
              b1 = lines[0][j][1],
              c1 = lines[0][j][2];
        float norm_factor1 = sqrt(pow(a1, 2) + pow(b1, 2));

        double errij =
            fabs(u_l.x * a2 + u_l.y * b2 + c2) / norm_factor2 +
            fabs(u_r.x * a1 + u_r.y * b1 + c1) / norm_factor1; // distance of (x,y) to line (a,b,c): |ax + by + c| / sqrt(a^2 + b^2)
        err += errij; // at this point, apply threshold and mark bad matches
    }
    return err / npt;
}
The point is: grab the fundamental matrix, use it to compute epilines for all the points, and then compute the distance (the lines are given in a parametric form, so you need to do some algebra to get the distance). This is somewhat similar in outcome to what findFundamentalMat with the RANSAC method does. It returns a mask wherein for each match there is either a 1, meaning that it was used to estimate the matrix, or a 0 if it was thrown out. But estimating the fundamental matrix like this will probably be less accurate than using chessboards.
EDIT: Looks like oarfish beat me to it, but I'll leave this here.
The fundamental matrix (F) defines a mapping from a point in the left image to a line in the right image on which the corresponding point must lie, assuming perfect calibration. This is the epipolar line, i.e. the line through the point in the left image and the two epipoles of the stereo camera pair. For references, see these lecture notes and this chapter of the HZ book.
Given a set of point correspondences in the left and right images, (p_L, p_R), from SURF (or any other feature matcher), and given F, the constraint from epipolar geometry of the stereo pair says that p_R should lie on the epipolar line projected by p_L onto the right image, i.e. p_R^T * F * p_L = 0.
In practice, calibration errors from noise as well as erroneous feature matches lead to a non-zero value.
However, using this idea, you can then perform outlier removal by rejecting those feature matches for which this equation is greater than a certain threshold value, i.e. reject (p_L, p_R) if and only if the distance of p_R from the epipolar line F * p_L exceeds the threshold: d(p_R, F * p_L) > threshold.
When selecting this threshold, keep in mind that it is the distance in image space of a point from an epipolar line that you are willing to tolerate, which in some sense is your epipolar error tolerance.
Degenerate case: To visually imagine what this means, let us assume that the stereo pair differ only in a pure X-translation. Then the epipolar lines are horizontal. This means that you can connect the feature matched point pairs by a line and reject those pairs whose line slope is not close to zero. The equation above is a generalization of this idea to arbitrary stereo rotation and translation, which is accounted for by the matrix F.
Your specific images: It looks like your feature matches are sparse. I suggest instead to use a dense feature matching approach so that after outlier removal, you are still left with a sufficient number of good-quality matches. I'm not sure which dense feature matcher is already implemented in OpenCV, but I suggest starting here.
Given your pictures, you are trying to do stereo matching.
This page will be helpful. The rectification you want can be done using stereoCalibrate then stereoRectify.
The expected result is shown in the OpenCV documentation.
In order to find the fundamental matrix, you need correct correspondences, but in order to get good correspondences, you need a good estimate of the fundamental matrix. This might sound like an impossible chicken-and-egg problem, but there are well-established methods to do this; RANSAC.
It randomly selects a small set of correspondences, uses those to calculate a fundamental matrix (using the 7- or 8-point algorithm), and then tests how many of the other correspondences comply with this matrix (using the method described by scribbleink for measuring the distance between point and epipolar line). It keeps testing new combinations of correspondences for a certain number of iterations and selects the one with the most inliers.
This is already implemented in OpenCV as cv::findFundamentalMat (http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#findfundamentalmat). Select the method CV_FM_RANSAC to use RANSAC to remove bad correspondences. It will output a list of all the inlier correspondences.
The requirement for this is that the points do not all lie on the same plane.
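A minimal usage sketch (the wrapper and variable names are mine; the findFundamentalMat call itself is as documented above):

#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Estimate F with RANSAC and keep only the inlier correspondences.
// points1/points2 are matched keypoint locations, same length and order.
cv::Mat filterWithRansac(std::vector<cv::Point2f>& points1,
                         std::vector<cv::Point2f>& points2)
{
    std::vector<uchar> inlier_mask;
    cv::Mat F = cv::findFundamentalMat(points1, points2, CV_FM_RANSAC,
                                       3.0,  // max point-to-epiline distance (px)
                                       0.99, // RANSAC confidence
                                       inlier_mask);
    std::vector<cv::Point2f> in1, in2;
    for (size_t i = 0; i < inlier_mask.size(); ++i) {
        if (inlier_mask[i]) {
            in1.push_back(points1[i]);
            in2.push_back(points2[i]);
        }
    }
    points1.swap(in1);
    points2.swap(in2);
    return F;
}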

Curvature Scale Space corner detection algorithm. Arc Length Parameter?

I'm studying the CSS algorithm and I can't get the hang of the concept of the 'arc length parameter'.
According to the literature, the planar curve is Gamma(u) = (x(u), y(u)), and they say this u is the arc length parameter; apparently, the Gaussian kernel g is also parameterized by this u.
Stop me if I got something wrong, but aren't x and y the location of the pixel? How can they be represented by another parameter?
I had no idea when I first saw it in the literature, so I looked up the code, and apparently got puzzled even more.
Here is the portion of the code:
void getGaussianDerivs(double sigma, int M, vector<double>& gaussian,
                       vector<double>& dg, vector<double>& d2g) {
    int L = (M - 1) / 2;
    double sigma_sq = sigma * sigma;
    double sigma_quad = sigma_sq * sigma_sq;
    dg.resize(M); d2g.resize(M); gaussian.resize(M);

    Mat_<double> g = getGaussianKernel(M, sigma, CV_64F);
    for (double i = -L; i < L + 1.0; i += 1.0) {
        int idx = (int)(i + L);
        gaussian[idx] = g(idx);
        // from http://www.cedar.buffalo.edu/~srihari/CSE555/Normal2.pdf
        dg[idx]  = (-i / sigma_sq) * g(idx);
        d2g[idx] = (-sigma_sq + i * i) / sigma_quad * g(idx);
    }
}
So, it seems the code uses a simple 1D Gaussian kernel of aperture size M and computes its 1st and 2nd derivatives. As far as I know, a 1D Gaussian kernel has a parameter x, which is a horizontal coordinate, and sigma, which is scale. It seems like the 'arc length parameter u' is equivalent to the variable x. That doesn't make sense to me, because later in the code these kernels are convolved directly with the x and y sequences of the contour.
What is this u?
PS. Since I replied to the fellow who tried to answer my question, I think I should modify my question, so here we go.
What I'm confused about is how this parameter u is implemented in code. I think I understood the full code above (of course, I inserted only a portion of it), but the problem is, I have no idea what it would look like for the 'improved' version of the algorithm. It says it uses an 'affine length parameter' instead of this 'arc length parameter', and I'm not sure how to implement that concept in code.
According to the literature, the main difference between the arc length parameter and the affine length parameter is the sampling interval: the arc length parameter uses 1 for the vertical and horizontal directions and the square root of 2 for the diagonal direction. That makes sense, since the portion of the code above uses a for loop with step 1 to compute the 1st and 2nd derivatives of the 1D Gaussian, and it directly inserts the interval of 1. But how would it work with a different, varying interval? Is it possible that I can't use a for loop for it?
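To make the sampling concrete, here is a small hedged sketch (my own helper, not from the CSS code) that accumulates the arc length parameter u along an 8-connected contour with exactly those intervals. The affine length variant would accumulate a different, data-dependent increment per step, which is why a plain unit-step for loop over pixel indices no longer suffices: you accumulate u first and then resample x(u) and y(u) at equal increments of u before the Gaussian convolution.

#include <cmath>
#include <cstdlib>
#include <vector>
#include <opencv2/core/core.hpp>

// Cumulative arc length u along an 8-connected contour: step 1 for
// horizontal/vertical moves, sqrt(2) for diagonal moves.
std::vector<double> cumulativeArcLength(const std::vector<cv::Point>& contour)
{
    std::vector<double> u(contour.size(), 0.0);
    for (size_t i = 1; i < contour.size(); ++i) {
        const int dx = std::abs(contour[i].x - contour[i - 1].x);
        const int dy = std::abs(contour[i].y - contour[i - 1].y);
        u[i] = u[i - 1] + ((dx != 0 && dy != 0) ? std::sqrt(2.0) : 1.0);
    }
    return u;
}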

Help with understanding this algorithm

I would like to implement the Ramer–Douglas–Peucker algorithm in C++.
The pseudo code looks like this:
function DouglasPeucker(PointList[], epsilon)
    //Find the point with the maximum distance
    dmax = 0
    index = 0
    for i = 2 to (length(PointList) - 1)
        d = OrthogonalDistance(PointList[i], Line(PointList[1], PointList[end]))
        if d > dmax
            index = i
            dmax = d
        end
    end

    //If max distance is greater than epsilon, recursively simplify
    if dmax >= epsilon
        //Recursive call
        recResults1[] = DouglasPeucker(PointList[1...index], epsilon)
        recResults2[] = DouglasPeucker(PointList[index...end], epsilon)

        // Build the result list
        ResultList[] = {recResults1[1...end-1] recResults2[1...end]}
    else
        ResultList[] = {PointList[1], PointList[end]}
    end

    //Return the result
    return ResultList[]
end
Here is my understanding so far.
It is a recursive function taking in an array of points and a distance threshold.
Then it iterates through the current points to find the point with the maximum distance.
I got a bit lost with the OrthogonalDistance function. How do I compute this? I've never seen a distance function take a line segment as a parameter.
I think aside from this I should be alright. I will just use std::vector for the arrays, and std::copy and then push or pop according to what the algorithm says.
Thanks
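For reference, a minimal C++ sketch of the pseudocode above (assuming a simple Point struct, at least two input points, and distinct endpoints; the distance helper is the point-to-line formula discussed in the answers below):

#include <cmath>
#include <vector>

struct Point { double x, y; };

// Perpendicular distance from p to the infinite line through a and b.
double orthogonalDistance(const Point& p, const Point& a, const Point& b)
{
    const double dx = b.x - a.x, dy = b.y - a.y;
    // |cross product| / |ab| is the point-to-line distance
    return std::fabs(dx * (a.y - p.y) - dy * (a.x - p.x))
           / std::sqrt(dx * dx + dy * dy);
}

// Appends the retained interior points of pts[first..last] to out;
// the endpoints themselves are appended by the caller.
void douglasPeucker(const std::vector<Point>& pts, size_t first, size_t last,
                    double epsilon, std::vector<Point>& out)
{
    // find the interior point with the maximum distance to the chord
    double dmax = 0.0;
    size_t index = first;
    for (size_t i = first + 1; i < last; ++i) {
        const double d = orthogonalDistance(pts[i], pts[first], pts[last]);
        if (d > dmax) { dmax = d; index = i; }
    }
    if (dmax >= epsilon && index > first) {
        // split at the farthest point and simplify both halves
        douglasPeucker(pts, first, index, epsilon, out);
        out.push_back(pts[index]);
        douglasPeucker(pts, index, last, epsilon, out);
    }
    // else: all interior points are dropped
}

std::vector<Point> simplify(const std::vector<Point>& pts, double epsilon)
{
    std::vector<Point> out;
    out.push_back(pts.front());
    douglasPeucker(pts, 0, pts.size() - 1, epsilon, out);
    out.push_back(pts.back());
    return out;
}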
The OrthogonalDistance is the distance between your point and the point on the line which is the orthogonal projection of that point onto the line.
The distance from a point to a line is usually computed as:
abs(a*x0 + b*y0 + c) / sqrt(a*a + b*b)
where x0 and y0 are the coordinates of the external point and a, b, c are the coefficients of the equation of your line (ax + by + c = 0).
That's what I remember from school, a long time ago.
The orthogonal distance from a point P to a line L is defined by the distance between point P and point P2, where P2 is the orthogonal projection of P on line L.
How you compute this value depends on the dimension of the space you work in, but if it's 2D, you should be able to figure it out by drawing an example on a piece of paper!
Look at the topcoder tutorial and the
double linePointDist(point A, point B, point C, bool isSegment);
method in it; it is what you're looking for.
A brief description of the math required can be found here. Just realize that you can swap the word "Orthogonal" for "Perpendicular" when dealing with 2D. The site linked has instructions for lines defined by two points as well as lines defined by slope-intercept form.
The short version is reproduced here:
If the line is represented in the general form ax + by + c = 0, and the point is represented by (x0, y0), then the function that will give the orthogonal distance is:
abs(a*x0 + b*y0 + c)/sqrt(a*a + b*b)
It's not clear to me whether you want the distance of the point to the (infinite) line through the two points, or the distance to the line segment defined by the points, but I suspect it's the latter.
Consider the somewhat contrived example of the points (0,0), (1,0) and (10, t), where t is small. The distance of (10, t) from the line through the first two points (i.e. the x axis) is t, while the distance of (10, t) from the line segment with end points (0,0) and (1,0) is hypot(9, t) ≈ 9. So if you were using distance to the line, there is a danger that the algorithm above would not split at (10, t).
The method mentioned by jethro above handles both lines and line segments.
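For completeness, a hedged sketch of the segment version (names are mine): clamp the projection parameter to [0, 1] so points beyond either endpoint measure their distance to that endpoint instead of to the infinite line.

#include <algorithm>
#include <cmath>

struct Pt { double x, y; };

// Distance from p to the segment ab (not the infinite line through a and b).
double pointSegmentDistance(const Pt& p, const Pt& a, const Pt& b)
{
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len2 = dx * dx + dy * dy;
    if (len2 == 0.0) // degenerate segment: a == b
        return std::hypot(p.x - a.x, p.y - a.y);
    // projection parameter of p onto the line, clamped to the segment
    double t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2;
    t = std::max(0.0, std::min(1.0, t));
    return std::hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}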