Speeding up self-similarity in an image

Speeding up self-similarity in an image - c++

I'm writing a program that will generate images. One measurement that I want is the amount of "self-similarity" in the image. I wrote the following code that looks for the countBest-th best matches for each sizeWindow * sizeWindow window in the picture:
double Pattern::selfSimilar(int sizeWindow, int countBest) {
std::vector<int> *pvecount;
double similarity;
int match;
int x1;
int x2;
int xWindow;
int y1;
int y2;
int yWindow;
similarity = 0.0;
// (x1, y1) is the original that's looking for matches.
for (x1 = 0; x1 < k_maxX - sizeWindow; x1++) {
for (y1 = 0; y1 < k_maxY - sizeWindow; y1++) {
pvecount = new std::vector<int>();
// (x2, y2) is the possible match.
for (x2 = 0; x2 < k_maxX - sizeWindow; x2++) {
for (y2 = 0; y2 < k_maxY - sizeWindow; y2++) {
// Testing...
match = 0;
for (xWindow = 0; xWindow < sizeWindow; xWindow++) {
for (yWindow = 0; yWindow < sizeWindow; yWindow++) {
if (m_color[x1 + xWindow][y1 + yWindow] == m_color[x2 + xWindow][y2 + yWindow]) {
match++;
}
}
}
pvecount->push_back(match);
}
}
nth_element(pvecount->begin(), pvecount->end()-countBest, pvecount->end());
similarity += (1.0 / ((k_maxX - sizeWindow) * (k_maxY - sizeWindow))) *
(*(pvecount->end()-countBest) / (double) (sizeWindow * sizeWindow));
delete pvecount;
}
}
return similarity;
}
The good news is that the algorithm does what I want it to: it will return a value from 0.0 to 1.0 about how 'self-similar' a picture is.
The bad news -- as I'm sure that you've already noted -- is that the algorithm is extremely slow. It takes (k_maxX - sizeWindow) * (k_maxY - sizeWindow) * (k_maxX - sizeWindow) * (k_maxY - sizeWindow) * sizeWindow * sizeWindow steps for a run.
Some typical values for the variables:
k_maxX = 1280
k_maxY = 1024
sizeWindow = between 5 and 25
countBest = 3, 4, or 5
m_color[x][y] is defined as short m_color[k_maxX][k_maxY] with values between 0 and 3 (but may increase in the future.)
Right now, I'm not worried about the memory footprint taken by pvecount. Later, I can use a sorted data set that doesn't add another element when it's smaller than countBest. I am only worried about algorithm speed.
How can I speed this up?

Ok, first, this approach is not stable at all. If you add random noise to your image, it will greatly decrease the similarity between the two images. More importantly, from an image processing standpoint, it's not efficient or particularly good. I suggest another approach; for example, using a wavelet-based approach. If you performed a 2d DWT on your image for a few levels and compared the scaling coefficients, you would probably get better results. Plus, the discrete wavelet transform is O(n).
The downside is that wavelets are an advanced mathematical topic. There are some good OpenCourseWare notes on wavelets and filterbanks here.

Your problem strongly reminds me of the calculations that have to be done for motion compensation in video compression. Maybe you should take a closer look what's done in that area.
As rlbond already pointed out, counting the number of points in a window where the colors exactly match isn't what's normally done in comparing pictures. A conceptually simpler method than using discrete cosine or wavelet transformations is to add the squares of the differences
diff = (m_color[x1 + xWindow][y1 + yWindow] - m_color[x2 + xWindow][y2 + yWindow]);
sum += diff*diff;
and use sum instead of match as criterion for similarity (now smaller means better).
Back to what you really asked: I think it is possible to cut down the running time by the factor 2/sizeWindow (maybe squared?), but it is a little bit messy. It's based on the fact that certain pairs of squares you compare stay almost the same when incrementing y1 by 1. If the offsets xOff = x2-x1 and yOff = y2-y1 are the same, only the top (rsp. bottom) vertical stripes of the squares are no longer (rsp. now, but not before) matched. If you keep the values you calculate for match in a two-dimensional array indexed by the offsets xOff = x2-x1 and yOff = y2-y1, then can calculate the new value for match[xOff][yOff] for y1 increased by 1 and x1 staying the same by 2*sizeWindow comparisons:
for (int x = x1; x < x1 + sizeWindow; x++) {
if (m_color[x][y1] == m_color[x + xOff][y1 + yOff]) {
match[xOff][yOff]--; // top stripes no longer compared
}
if (m_color[x][y1+sizeWindow] == m_color[x + xOff][y1 + sizeWindow + yOff]) {
match[xOff][yOff]++; // bottom stripe compared not, but wasn't before
}
}
(as the possible values for yOff changed - by incrementing y1 - from the interval [y2 - y1, k_maxY - sizeWindow - y1 - 1] to the interval [y2 - y1 - 1, k_maxY - sizeWindow - y1 - 2] you can discard the matches with second index yOff = k_maxY - sizeWindow - y1 - 1 and have to calculate the matches with second index yOff = y2 - y1 - 1 differently). Maybe you can also keep the values by how much you increase/decrease match[][] during the loop in an array to get another 2/sizeWindow speed-up.

Related

Reverse engineering - Is this a cheap 3D distance function?

I am reverse engineering a game from 1999 and I came across a function which looks to be checking if the player is within range of a 3d point for the triggering of audio sources. The decompiler mangles the code pretty bad but I think I understand it.
// Position Y delta
v1 = * (float * )(this + 16) - LocalPlayerZoneEntry - > y;
// Position X delta
v2 = * (float * )(this + 20) - LocalPlayerZoneEntry - > x;
// Absolute value
if (v1 < 0.0)
v1 = -v1;
// Absolute value
if (v2 < 0.0)
v2 = -v2;
// What is going on here?
if (v1 <= v2)
v1 = v1 * 0.5;
else
v2 = v2 * 0.5;
// Z position delta
v3 = * (float * )(this + 24) - LocalPlayerZoneEntry - > z;
// Absolute value
if (v3 < 0.0)
v3 = -v3;
result = v3 + v2 + v1;
// Radius
if (result > * (float * )(this + 28))
return 0.0;
return result;
Interestingly enough, when in game, it seemed like the triggering was pretty inconsistent and would sometimes be quite a bit off depending on from which side I approached the trigger.
Does anyone have any idea if this was a common algorithm used back in the day?
Note: The types were all added by me so they may be incorrect. I assume that this is a function of type bool.

The best way to visualize a distance function (a metric) is to plot its unit sphere (the set of points at unit distance from origin -- the metric in question is norm induced).
First rewrite it in a more mathematical form:
N(x,y,z) = 0.5*|x| + |y| + |z| when |x| <= |y|
= |x| + 0.5*|y| + |z| otherwise
Let's do that for 2d (assume that z = 0). The absolute values make the function symmetric in the four quadrants. The |x| <= |y| condition makes it symmetric in all the eight sectors. Let's focus on the sector x > 0, y > 0, x <= y. We want to find the curve when N(x,y,0) = 1. For that sector it reduces to 0.5x + y = 1, or y = 1 - 0.5x. We can go and plot that line. For when x > 0, y > 0, x > y, we get x = 1 - 0.5y. Plotting it all gives the following unit 'circle':
For comparison, here is an Euclidean unit circle overlaid:
In the third dimension it behaves like a taxicab metric, effectively giving you a 'diamond' shaped sphere:
So yes, it is a cheap distance function, though it lacks rotational symmetries.

Distance between 2 hexagons on hexagon grid

I have a hexagon grid:
with template type coordinates T. How I can calculate distance between two hexagons?
For example:
dist((3,3), (5,5)) = 3
dist((1,2), (1,4)) = 2

First apply the transform (y, x) |-> (u, v) = (x, y + floor(x / 2)).
Now the facial adjacency looks like
0 1 2 3
0*-*-*-*
|\|\|\|
1*-*-*-*
|\|\|\|
2*-*-*-*
Let the points be (u1, v1) and (u2, v2). Let du = u2 - u1 and dv = v2 - v1. The distance is
if du and dv have the same sign: max(|du|, |dv|), by using the diagonals
if du and dv have different signs: |du| + |dv|, because the diagonals are unproductive
In Python:
def dist(p1, p2):
y1, x1 = p1
y2, x2 = p2
du = x2 - x1
dv = (y2 + x2 // 2) - (y1 + x1 // 2)
return max(abs(du), abs(dv)) if ((du >= 0 and dv >= 0) or (du < 0 and dv < 0)) else abs(du) + abs(dv)

Posting here after I saw a blog post of mine had gotten referral traffic from another answer here. It got voted down, rightly so, because it was incorrect; but it was a mischaracterization of the solution put forth in my post.
Your 'squiggly' axis - in terms of your x coordinate being displaced every other row - is going to cause you all sorts of headaches with trying to determine distances or doing pathfinding later on, if this is for a game of some sort. Hexagon grids lend themselves to three axes naturally, and a 'squared off' grid of hexagons will optimally have some negative coordinates, which allows for simpler math around distances.
Here's a grid with (x,y) mapped out, with x increasing to the lower right, and y increasing upwards.
By straightening things out, the third axis becomes obvious.
The neat thing about this, is that the three coordinates become interlinked - the sum of all three coordinates will always be 0.
With such a consistent coordinate system, the atomic distance between any two hexes is the largest change between the three coordinates, or:
d = max( abs(x1 - x2), abs(y1 -y2), abs( (-x1 + -y1) - (-x2 + -y2) )
Pretty straightforward. But you must fix your grid first!

The correct explicit formula for the distance, with your coordinate system, is given by:
d((x1,y1),(x2,y2)) = max( abs(x1 - x2),
abs((y1 + floor(x1/2)) - (y2 + floor(x2/2)))
)

Here is what a did:
Taking one cell as center (it is easy to see if you choose 0,0), cells at distance dY form a big hexagon (with “radius” dY). One vertices of this hexagon is (dY2,dY). If dX<=dY2 the path is a zig-zag to the ram of the big hexagon with a distance dY. If not, then the path is the “diagonal” to the vertices, plus an vertical path from the vertices to the second cell, with add dX-dY2 cells.
Maybe better to understand: led:
dX = abs(x1 - x2);
dY = abs(y1 - y2);
dY2= floor((abs(y1 - y2) + (y1+1)%2 ) / 2);
Then:
d = d((x1,y1),(x2,y2))
= dX < dY2 ? dY : dY + dX-dY2 + y1%2 * dY%2

First, you need to transform your coordinates to a "mathematical" coordinate system. Every two columns you shift your coordinates by 1 unit in the y-direction. The "mathamatical" coordinates (s, t) can be calculated from your coordinates (u,v) as follows:
s = u + floor(v/2)
t = v
If you call one side of your hexagons a, the basis vectors of your coordinate system are (0, -sqrt(3)a) and (3a/2, sqrt(3)a/2). To find the minimum distance between your points, you need to calculate the manhattan distance in your coordinate system, which is given by |s1-s2|+|t1-t2| where s and t are the coordinates in your system. The manhattan distance only covers walking in the direction of your basis vectors so it only covers walking like that: |/ but not walking like that: |\. You need to transform your vectors into another coordinate system with basis vectors (0, -sqrt(3)a) and (3a/2, -sqrt(3)a/2). The coordinates in this system are given by s'=s-t and t'=t so the manhattan distance in this coordinate system is given by |s1'-s2'|+|t1'-t2'|. The distance you are looking for is the minimum of the two calculated manhattan distances. Your code would look like this:
struct point
{
int u;
int v;
}
int dist(point const & p, point const & q)
{
int const ps = p.u + (p.v / 2); // integer division!
int const pt = p.v;
int const qs = q.u + (q.v / 2);
int const qt = q.v;
int const dist1 = abs(ps - qs) + abs(pt - qt);
int const dist2 = abs((ps - pt) - (qs - qt)) + abs(pt - qt);
return std::min(dist1, dist2);
}

(odd-r)(without z, only x,y)
I saw some problems with realizations above. Sorry, I didn't check it all but. But maybe my solution will be helpful for someone and maybe it's a bad and not optimized solution.
The main idea to go by diagonal and then by horizontal. But for that we need to note:
1) For example, we have 0;3 (x1=0;y1=3) and to go to the y2=6 we can handle within 6 steps to each point (0-6;6)
so: 0-left_border , 6-right_border
2)Calculate some offsets
#include <iostream>
#include <cmath>
int main()
{
//while(true){
int x1,y1,x2,y2;
std::cin>>x1>>y1;
std::cin>>x2>>y2;
int diff_y=y2-y1; //only up-> bottom no need abs
int left_x,right_x;
int path;
if( y1>y2 ) { // if Down->Up then swap
int temp_y=y1;
y1=y2;
y2=temp_y;
//
int temp_x=x1;
x1=x2;
x2=temp_x;
} // so now we have Up->Down
// Note that it's odd-r horizontal layout
//OF - Offset Line (y%2==1)
//NOF -Not Offset Line (y%2==0)
if( y1%2==1 && y2%2==0 ){ //OF ->NOF
left_x = x1 - ( (y2 - y1 + 1)/2 -1 ); //UP->DOWN no need abs
right_x = x1 + (y2 - y1 + 1)/2; //UP->DOWN no need abs
}
else if( y1%2==0 && y2%2==1 ){ // OF->NOF
left_x = x1 - (y2 - y1 + 1)/2; //UP->DOWN no need abs
right_x = x1 + ( (y2 - y1 + 1)/2 -1 ); //UP->DOWN no need abs
}
else{
left_x = x1 - (y2 - y1 + 1)/2; //UP->DOWN no need abs
right_x = x1 + (y2 - y1 + 1)/2; //UP->DOWN no need abs
}
/////////////////////////////////////////////////////////////
if( x2>=left_x && x2<=right_x ){
path = y2 - y1;
}
else {
int min_1 = std::abs( left_x - x2 );
int min_2 = std::abs( right_x - x2 );
path = y2 - y1 + std::min(min_1, min_2);
}
std::cout<<"Path: "<<path<<"\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n";
//}
return 0;
}

I believe the answer you seek is:
d((x1,y1),(x2,y2))=max(abs(x1-x2),abs(y1-y2));
You can find a good explanation on hexagonal grid coordinate-system/distances here:
http://keekerdc.com/2011/03/hexagon-grids-coordinate-systems-and-distance-calculations/

Collision detection using the Pythagorean theorem is being unreliable?

Objects can often times pass through each other? Additionally when calculating momentum, occasionally the sprites will form blobs upon collision, moving together instead of bouncing off.
The code does work for most collisions, but it often fails. Any ideas?
xV = X Velocity. yV = Y Velocity. Every frame this velocity values are added to the X and Y positions of the quad.
bool Quad::IsTouching(Quad &q)
{
float distance = 0;
float combinedRadius = (size/2) + (q.GetSize()/2);
distance = sqrt(pow(q.GetX() - GetX(), 2) + pow(q.GetY() - GetY(), 2));
if(distance < combinedRadius)
{
return true;
}
return false;
}
void Quad::Collide(Quad &q)
{
float mX, mY, mX2, mY2, mXTmp, mYTmp;
mX = mass * xV;
mY = mass * yV;
mXTmp = mX;
mYTmp = mY;
mX2 = q.GetMass() * q.GetxV();
mY2 = q.GetMass() * q.GetyV();
mX = mX2;
mY = mY2;
mX2 = mXTmp;
mY2 = mYTmp;
xV = mX/mass;
yV = mY/mass;
q.SetxV(mX2/q.GetMass());
q.SetyV(mY2/q.GetMass());
}

I had the same issue and here is a quick video I made to demonstrate the problem.
The method to solve this is to calculate the exact time of the collision, so the particles would move the remaining time of the iteration/time-step with the new velocity. To do this you would have to check whether the will be a collision before updating the position, so: sqrt((x1 - x2 + dt * (vx1 - vx2))^2 + (y1 - y2 + dt * (vy1 - vy2))^2) <= distance.
You might also be able to get away with a simpler solution, in which you move both object slightly so that they aren't colliding anymore. This would yield a creator inaccuracy but does needs less calculations:
dx = x1 - x2;
dy = y1 - y2;
d = sqrt(dx^2 + dy^2);
D = r1 + r2;
if(d < D)
{
s = (d - D) / (2 * d);
x1 = x1 + s * dx;
x2 = x2 - s * dx;
y1 = y1 + s * dy;
y2 = y2 - s * dy;
}

What type of collisions are you referring to? Elastic or inelastic? For an elastic collision, the code would fail, and you would have to create an additional property to prevent the two objects from sticking together on contact. You would also have to ensure, with a loop or if statement, that if one object is crossing the position of another object at the same time as the other object, that the two will separate with an angle proportional to the collision speed. Use the appropriate physics formulae.

As a deduced, potential, issue (there are no values to the velocities, sizes, etc. supplied so I can't say for sure), you are not accounting that the quads are exactly touching. That is, distance == combinedRadius. Therefore, when this is true the check fails then the objects continue moving on the next tick...right through each other.
Change your check to distance <= combinedRadius. In addition, you may simply be getting a tunneling effect because the objects are small enough and moving fast enough that on each tick they pass through each other. There are multiple ways to fix this some of which are: impose a maximum velocity and a minimum size; increase your frame rate for physics checks; use continuous collision checks versus discrete checks: see wikipedia article on subject

Speed of quadratic curve

I'm writing a 2d game and I have birds in a camera-space. I want to make them fly. So, I generate 3 ~random points. First one is left-upper side, second: middle-bottom, third: right-upper.
As a result I have 180deg rotated triangle.
To move a bird through the curve's path I have a t-parameter which is increased in each frame (render loop) by some delta.
The problem is that in different curves birds have different speed. If the triangle is "wide" (1) they are more slowly, if it's stretched by Y-coordinate (2), the speed is very fast.
But I want to make speed equal at different curves. It's logically, that I have to change delta which is appended each frame for each curve.
I've tried to solve it like this:
Find the ~length of the curve by summing length of 2 vectors: P1P2 and P2P3.
Than I've defined the speed for 1 virtual meter per frame. A little pseudocode:
float pixelsInMeter = 92.f; // One virtual meter equals to this number of pixels
float length = len(P1P2) + len(P2P3)
float speed = 0.0003f; // m/frame
// (length * speed) / etalon_length
float speedForTheCurve = toPixels( (toMeters(length) * speed) / 1.f);
// ...
// Each frame code:
t += speedForTheCurve;
Vector2 newPos = BezierQuadratic(t, P1, P2, P3);
But birds anyway have different speed. What's wrong? Or maybe there is a better way.

The Bezier function you're using is a parametrized function with bounds [0...1]. You're mucking with the step-size, which is why you're getting crazy speeds. Generally speaking, the distance d is the dependent variable in the equation, which says to me that their speeds will be different based on the length of the curve.
Since speed is your dependent variable, we're going to vectorize your function by computing the step-size.
Check out this pseudocode:
P1 = (x1, y1)
P2 = (x2, y2)
P3 = (x3, y3)
int vec[100][2]
int getPoint(int p1, int p2, float stepSize) {
return p1 + (p2 - p1)*stepSize;
}
for (float i = 0.0; i < 1.0; i += 0.01 ) {
int newX = getPoint(getPoint(x1, x2, i), getPoint(x2, x3, i), i);
int newY = getPoint(getPoint(y1, y2, i), getPoint(y2, y3, i), i);
vec[iter++][0] = newX;
vec[iter][1] = newY;
}
You can get the delta values by performing a first difference but I don't think that's necessary. As long as you move all the birds the appropriate distance based on the step iteration they will all move different distances but they will start and end their trajectories identically.
From your equation, we can compute the pixel delta step size:
int pixelsToMove = toMeter(sqrt((x2 - x1)^2 + (y2 - y1)^2))/pixelsInMeter;
Which will give you the appropriate amount of pixels to move the bird. That way they'll all move different step sizes, but their speeds will be different. Does that make sense?
Or, try something like this (much harder):
Obtain the actual quadratic function of the three points you chose.
Integrate the quadratic between two xy rectangular coordinate
Convert computed length into pixels or whatever you're using
Obtain dependent variable speed so all curves finish at the same time.
Let's start with quadratic stuff:
y = Ax^2 + Bx + C where A != 0, so since you have three points, you will need three equations. Using algebra, you can solve for the contants:
A = (y3 - y2)/((x3 - x2)(x3 - x1)) - (y1 - y2)/((x1 - x2)(x3 - x1))
B = (y1 - y2 + A(x2^2 - x1^2))/(x1 - x2)
C = y1 - Ax1^2 - Bx1
Then you can use the formula above to obtain a closed-form arc length. Check this website out, wolfram will integrate it for you and you just have to type it:
Closed form solution for quadradic integration
Now that you've computed the arc length, convert actualArcLength to the speed or whatever unit you're using:
float speedForTheCurve = toPixels( (toMeters(actualArcLength) * speed) / 1.f);

How to fit the 2D scatter data with a line with C++

I used to work with MATLAB, and for the question I raised I can use p = polyfit(x,y,1) to estimate the best fit line for the scatter data in a plate. I was wondering which resources I can rely on to implement the line fitting algorithm with C++. I understand there are a lot of algorithms for this subject, and for me I expect the algorithm should be fast and meantime it can obtain the comparable accuracy of polyfit function in MATLAB.

This page describes the algorithm easier than Wikipedia, without extra steps to calculate the means etc. : http://faculty.cs.niu.edu/~hutchins/csci230/best-fit.htm . Almost quoted from there, in C++ it's:
#include <vector>
#include <cmath>
struct Point {
double _x, _y;
};
struct Line {
double _slope, _yInt;
double getYforX(double x) {
return _slope*x + _yInt;
}
// Construct line from points
bool fitPoints(const std::vector<Point> &pts) {
int nPoints = pts.size();
if( nPoints < 2 ) {
// Fail: infinitely many lines passing through this single point
return false;
}
double sumX=0, sumY=0, sumXY=0, sumX2=0;
for(int i=0; i<nPoints; i++) {
sumX += pts[i]._x;
sumY += pts[i]._y;
sumXY += pts[i]._x * pts[i]._y;
sumX2 += pts[i]._x * pts[i]._x;
}
double xMean = sumX / nPoints;
double yMean = sumY / nPoints;
double denominator = sumX2 - sumX * xMean;
// You can tune the eps (1e-7) below for your specific task
if( std::fabs(denominator) < 1e-7 ) {
// Fail: it seems a vertical line
return false;
}
_slope = (sumXY - sumX * yMean) / denominator;
_yInt = yMean - _slope * xMean;
return true;
}
};
Please, be aware that both this algorithm and the algorithm from Wikipedia ( http://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line ) fail in case the "best" description of points is a vertical line. They fail because they use
y = k*x + b
line equation which intrinsically is not capable to describe vertical lines. If you want to cover also the cases when data points are "best" described by vertical lines, you need a line fitting algorithm which uses
A*x + B*y + C = 0
line equation. You can still modify the current algorithm to produce that equation:
y = k*x + b <=>
y - k*x - b = 0 <=>
B=1, A=-k, C=-b
In terms of the above code:
B=1, A=-_slope, C=-_yInt
And in "then" block of the if checking for denominator equal to 0, instead of // Fail: it seems a vertical line, produce the following line equation:
x = xMean <=>
x - xMean = 0 <=>
A=1, B=0, C=-xMean
I've just noticed that the original article I was referring to has been deleted. And this web page proposes a little different formula for line fitting: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html
double denominator = sumX2 - 2 * sumX * xMean + nPoints * xMean * xMean;
...
_slope = (sumXY - sumY*xMean - sumX * yMean + nPoints * xMean * yMean) / denominator;
The formulas are identical because nPoints*xMean == sumX and nPoints*xMean*yMean == sumX * yMean == sumY * xMean.

I would suggest coding it from scratch. It is a very simple implementation in C++. You can code up both the intercept and gradient for least-squares fit (the same method as polyfit) from your data directly from the formulas here
http://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line
These are closed form formulas that you can easily evaluate yourself using loops. If you were using higher degree fits then I would suggest a matrix library or more sophisticated algorithms but for simple linear regression as you describe above this is all you need. Matrices and linear algebra routines would be overkill for such a problem (in my opinion).

Equation of line is Ax + By + C=0.
So it can be easily( when B is not so close to zero ) convert to y = (-A/B)*x + (-C/B)
typedef double scalar_type;
typedef std::array< scalar_type, 2 > point_type;
typedef std::vector< point_type > cloud_type;
bool fit( scalar_type & A, scalar_type & B, scalar_type & C, cloud_type const& cloud )
{
if( cloud.size() < 2 ){ return false; }
scalar_type X=0, Y=0, XY=0, X2=0, Y2=0;
for( auto const& point: cloud )
{ // Do all calculation symmetric regarding X and Y
X += point[0];
Y += point[1];
XY += point[0] * point[1];
X2 += point[0] * point[0];
Y2 += point[1] * point[1];
}
X /= cloud.size();
Y /= cloud.size();
XY /= cloud.size();
X2 /= cloud.size();
Y2 /= cloud.size();
A = - ( XY - X * Y ); //!< Common for both solution
scalar_type Bx = X2 - X * X;
scalar_type By = Y2 - Y * Y;
if( fabs( Bx ) < fabs( By ) ) //!< Test verticality/horizontality
{ // Line is more Vertical.
B = By;
std::swap(A,B);
}
else
{ // Line is more Horizontal.
// Classical solution, when we expect more horizontal-like line
B = Bx;
}
C = - ( A * X + B * Y );
//Optional normalization:
// scalar_type D = sqrt( A*A + B*B );
// A /= D;
// B /= D;
// C /= D;
return true;
}

You can also use or go over this implementation there is also documentation here.

Fitting a Line can be acomplished in different ways.
Least Square means minimizing the sum of the squared distance.
But you could take another cost function as example the (not squared) distance. But normaly you use the squred distance (Least Square).
There is also a possibility to define the distance in different ways. Normaly you just use the "y"-axis for the distance. But you could also use the total/orthogonal distance. There the distance is calculated in x- and y-direction. This can be a better fit if you have also errors in x direction (let it be the time of measurment) and you didn't start the measurment on the exact time you saved in the data. For Least Square and Total Least Square Line fit exist algorithms in closed form. So if you fitted with one of those you will get the line with the minimal sum of the squared distance to the datapoints. You can't fit a better line in the sence of your defenition. You could just change the definition as examples taking another cost function or defining distance in another way.
There is a lot of stuff about fitting models into data you could think of, but normaly they all use the "Least Square Line Fit" and you should be fine most times. But if you have a special case it can be necessary to think about what your doing. Taking Least Square done in maybe a few minutes. Thinking about what Method fits you best to the problem envolves understanding the math, which can take indefinit time :-).

Note: This answer is NOT AN ANSWER TO THIS QUESTION but to this one "Line closest to a set of points" that has been flagged as "duplicate" of this one (incorrectly in my opinion), no way to add new answers to it.
The question asks for:
Find the line whose distance from all the points is minimum ? By
distance I mean the shortest distance between the point and the line.
The most usual interpretation of distance "between the point and the line" is the euclidean distance and the most common interpretation of "from all points" is the sum of distances (in absolute or squared value).
When the target is minimize the sum of squared euclidean distances, the linear regression (LST) is not the algorithm to use. In addition, linear regression can not result in a vertical line. The algorithm to be used is the "total least squares". See by example wikipedia for the problem description and this answer in math stack exchange for details about the formulation.

to fit a line y=param[0]x+param[1] simply do this:
// loop over data:
{
sum_x += x[i];
sum_y += y[i];
sum_xy += x[i] * y[i];
sum_x2 += x[i] * x[i];
}
// means
double mean_x = sum_x / ninliers;
double mean_y = sum_y / ninliers;
float varx = sum_x2 - sum_x * mean_x;
float cov = sum_xy - sum_x * mean_y;
// check for zero varx
param[0] = cov / varx;
param[1] = mean_y - param[0] * mean_x;
More on the topic http://easycalculation.com/statistics/learn-regression.php
(formulas are the same, they just multiplied and divided by N, a sample sz.). If you want to fit plane to 3D data use a similar approach -
http://www.mymathforum.com/viewtopic.php?f=13&t=8793
Disclaimer: all quadratic fits are linear and optimal in a sense that they reduce the noise in parameters. However, you might interested in the reducing noise in the data instead. You might also want to ignore outliers since they can bia s your solutions greatly. Both problems can be solved with RANSAC. See my post at:

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Speeding up self-similarity in an image - c++

Related

Reverse engineering - Is this a cheap 3D distance function?

Distance between 2 hexagons on hexagon grid

Collision detection using the Pythagorean theorem is being unreliable?

Speed of quadratic curve

How to fit the 2D scatter data with a line with C++

Categories

Resources