C++ floating point comparison - c++

Suppose you have a rectangle whose bottom-left point is 0,0 and whose upper-right point is 100,100.
Now two lines intersect the rectangle. I have to find the coordinates of the intersection point, and I have done that. The problem is that I can't tell whether it is inside the rectangle or not. I used a double comparison, but I think it is giving me the wrong answer. Suppose the intersection point is ( x , y ). I used this check: if( x >= 0.0 && x <= 100.0 && y >= 0.0 && y <= 100.0 ). What should I do?
//this function generates line
line genline( int x1 , int y1 , int x2 , int y2 ){
line l ;
l.A = y2 - y1 ;
l.B = x1 - x2 ;
l.C = l.A * x1 + l.B * y1 ;
return l ;
}
//this function checks intersection
bool intersect( line m ,line n ) {
int det = m.A * n.B - m.B * n.A ;
if( det == 0 ){
return false ;
}
else {
double x = ( n.B * m.C - m.B * n.C ) / ( det * 1.0 ) ;
double y = ( m.A * n.C - n.A * m.C ) / ( det * 1.0 ) ;
if( x >= 0.0 && x <= L && y >= 0.0 && y <= W ) { return true ; }
else{ return false ; }
}
}
EDIT:
Both lines extend to infinity.

Your math looks right. By the way, if a line intersects something, it is always inside that something.

Checking to see if a point is inside a rectangle is relatively easy. However, the challenge is to find the intersection between two line segments. There are a large number of corner cases to that problem, and the limited accuracy of floating point numbers plays a huge role here.
Your algorithm seems to be overly simplistic. For a deeper discussion about this topic you can look at this and this. This two-part article investigates the problem of finding the intersection of two lines using floating point numbers. Notice that they are about MATLAB, not C++, though that does not change the problem and the algorithms are easily translatable to any language.
Depending on the application, even with clever tricks floating point representation might simply not cut it for some geometry problems. CGAL is a C++ library dedicated to computational geometry that deals with these kinds of problems. When necessary it uses arbitrary precision arithmetic to handle degenerate cases.

When you're dealing with floating point (float or double), testing for equality is naïve and will fail in edge cases. Every comparison you make should be in reference to "epsilon", an extremely small quantity that doesn't matter. If two numbers are within epsilon of each other, then they are considered equal.
For example, instead of "if(a == b)", you need:
#include <cmath>
bool isEqual(double a, double b, double epsilon = 1.E-10)
{
    return std::fabs(a - b) <= epsilon;
}
Pick a suitable value for epsilon depending on your problem domain.
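Applied to the rectangle test from the question, the same idea might look like the sketch below. The helper names inRange and insideRect and the default epsilon of 1e-9 are assumptions made for illustration; they are not from the original post, and the tolerance should be tuned to the scale of your coordinates.

```cpp
#include <cassert>

// Hypothetical helper: true when v lies in [lo, hi], allowing a slack of eps
// at both ends so that a value such as -1e-12 still counts as "on the edge".
bool inRange(double v, double lo, double hi, double eps = 1e-9) {
    assert(eps >= 0.0 && lo <= hi);
    return v >= lo - eps && v <= hi + eps;
}

// Hypothetical helper: is (x, y) inside the axis-aligned rectangle
// [0, width] x [0, height], up to the tolerance eps?
bool insideRect(double x, double y, double width, double height,
                double eps = 1e-9) {
    return inRange(x, 0.0, width, eps) && inRange(y, 0.0, height, eps);
}
```

With this, an intersection computed as (100.000000000001, -0.000000000001) is accepted for the 100 by 100 rectangle, while (100.1, 50) is still rejected.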

Related

What is the easiest way to round a floating point according to its sign

I'm using SFML library for a game application with moving objects. In the game, I have a ball object and I need to make sure that the ball's offset on the Y-axis each move is at least 0.1 (or -0.1)
Here's the code I'm using now:
if (offset.y < 0.0 && offset.y > -0.1) offset.y = -0.1;
if (offset.y > 0.0 && offset.y < 0.1) offset.y = 0.1;
Is there an easier/prettier way to accomplish that?
Edit:
As pointed out by the comments, code should include 0.0 case
if (offset.y < 0.0 && offset.y > -0.1) offset.y = -0.1;
if (offset.y >= 0.0 && offset.y < 0.1) offset.y = 0.1;
Just use std::abs() and std::copysign():
if (std::abs(offset.y) < .1)
offset.y = std::copysign(.1, offset.y);
Preserving the sign of zero any other way is difficult.
Though, consider whether you cannot use an integral model instead; .1 cannot be exactly represented as binary floating point.
What would be "easier or prettier" is highly opinion based, I'm afraid, but you could start by wrapping the logic into a function with a meaningful name (hopefully better than mine).
double at_least(double min_value, double x)
{
if (x < 0.0 && x > -min_value)
return -min_value;
if (x >= 0.0 && x < min_value)
return min_value;
return x;
}
Then, you could experiment some other alternatives:
#include <cmath>
#include <algorithm>
double at_least(double min_value, double x)
{
return std::copysign(std::max(min_value, std::abs(x)), x);
}
Or this one
double at_least(double min_value, double x)
{
if (x < 0.0 )
return std::min(-min_value, x);
else
return std::max(min_value, x);
}
After testing the correctness of each one, you could also profile them, if performance is important for your task. See e.g. these quick benchmarks:
http://quick-bench.com/AXt8U9vKg-75XXOFMyCK0g14RyQ
http://quick-bench.com/_nfoT0BKsAvh6QDzAWc-cX3_KYU
If you only need 1 digit after decimal point, you can use a scaled integer. To convert it to a floating point number you multiply it by 0.1. To convert a floating point number to the integer you multiply it by 10 and round to the nearest integer.
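A minimal sketch of that scaled-integer idea follows. The function names and the choice of long as the integer type are assumptions for illustration. Note that in this model every non-zero integer already has a magnitude of at least one tenth, so only zero needs special handling, and the sign of a value that rounds to zero is lost.

```cpp
#include <cassert>
#include <cmath>

// Assumption: one integer unit stands for 0.1 in the original scale.
constexpr double kStep = 0.1;

// Convert a floating-point offset to tenths, rounding to the nearest integer.
long toTenths(double value) {
    assert(std::isfinite(value));
    return std::lround(value * 10.0);
}

// Convert tenths back to a floating-point value.
double fromTenths(long tenths) {
    return tenths * kStep;
}

// Enforce a minimum magnitude of one tenth. Zero is pushed to +1 here,
// which is an arbitrary choice for this sketch.
long atLeastOneTenth(long tenths) {
    return tenths == 0 ? 1 : tenths;
}
```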
I think it's good enough. It looks clear.
You can play with functions from cmath such as abs and signbit, or wrap the logic in a new function, but I can't see any point in doing that.

2 Intersection Lines

I have this code to find the status of 2 lines. There are 3 cases:
1- the lines intersect in one point
2- they are parallel
3- a special case of parallel: they are identical [same line]
Here is my code, but I still can't understand these two parts of it:
if(!D && (Dx || Dy))
puts("NONE"); // the lines are parallel
if(!D && !Dx && !Dy)
puts("LINE"); // they are the same line
Why is it that when the determinant is zero and Dx != 0 or Dy != 0 the lines are parallel, and when D = 0 and Dx = 0 and Dy = 0 they are the same line?
I know that when the determinant equals zero there is no unique solution, but I can't understand the part with !Dx and !Dy and Dx || Dy.
Here is the full code
#include <iostream>
#include <cstdio> // scanf, puts, printf
using namespace std;
struct point{
int x , y;
};
struct segment{
point s , e;
};
int main(){
int n;
scanf("%d" , &n);
puts("INTERSECTING LINES OUTPUT");
while(n--){
segment a , b;
scanf("%d%d%d%d" , &a.s.x , &a.s.y , &a.e.x , &a.e.y);
scanf("%d%d%d%d" , &b.s.x , &b.s.y , &b.e.x , &b.e.y);
double a1 , b1 , c1 , a2 , b2 , c2 , D , Dx , Dy;
a1 = a.e.y - a.s.y;
b1 = a.s.x - a.e.x;
c1 = a1 * (a.s.x) + b1 * (a.s.y);
a2 = b.e.y - b.s.y;
b2 = b.s.x - b.e.x;
c2 = a2 * (b.s.x) + b2 * (b.s.y);
D = a1 * b2 - a2 * b1;
Dx = c1 * b2 - c2 * b1;
Dy = a1 * c2 - a2 * c1;
if(!D && (Dx || Dy))
puts("NONE"); // the lines are parallel
else if(!D && !Dx && !Dy)
puts("LINE"); // they are the same line
else printf("POINT %.2f %.2f\n" , (double)Dx / D , (double) Dy / D);
}
return 0;
}
One way to think about this, roughly speaking, is that when D==0 and Dx==0 and Dy==0, then the "coordinates of the intersection point" Dx/D and Dy/D are indeterminate forms 0/0 which could be any number. That means that the lines intersect in a whole bunch of points, which is only possible if they are the same line.
On the other hand, if D==0 and Dx!=0 or Dy!=0 (or both), then the value of Dx/D or Dy/D (or both) is infinity. In other words, the lines intersect at infinity (only), which is another way of saying that the lines are parallel and not coincident.
Testing if(D) for a double is a bad idea, for a couple of reasons. 1) It's not clear. Understanding that expression requires some fairly detailed knowledge of type conversions in the language that frankly not every programmer has. 2) Rounding errors and other numerical instability issues could interfere with the calculation, so that instead of D==0 you have some tiny D not equal to 0. Then !D is considered false, even though it would be true without the rounding error. Instead you should test whether abs(D) < some tolerance which is slightly bigger than 0.
Finally your program is vulnerable to overflow issues: if the two lines are close in slope, but otherwise completely reasonable, the intersection point may be extremely large. This is a situation that makes sense in a purely mathematical context, but not so much sense on a computer. Instead, a better question to ask would be whether the intersection lies within the line segments defined by the points. There is a discussion of that problem on the Wikipedia page.
The feedback that worries about floating point arithmetic is on point (Ha!). Since the input is integral, though, I might suggest doing the math with integer types.
The questions you are asking boil down to seeing if two fractions are equivalent. I suggest the following helper class:
struct ratio {
int dx;
int dy;
ratio(int dxIn, int dyIn) :
dx(dxIn), dy(dyIn) {
}
bool isEquivalent(ratio rhs) const {
if ((dx == 0 && dy == 0) || (rhs.dx == 0 && rhs.dy == 0))
return (dx == 0 && dy == 0) && (rhs.dx == 0 && rhs.dy == 0);
return dx * rhs.dy == dy * rhs.dx;
}
};
Each line segment is converted into (a,b,c) coordinates for the extended line which form the line equation a*x+b*y+c=0. What you need to know, is that the vector (a,b) is orthogonal to the line.
The quantity D = a1*b2 - a2*b1 is the cross product between the two orthogonal vectors. As you know the cross product of two parallel vectors is 0. So if that is true, the two lines are either coincident or parallel.
The intersection point is defined as (Dx/D,Dy/D) where Dx=b1*c2-b2*c1 and Dy=a2*c1-a1*c2, so when all three are zero (Dx=Dy=D=0) the intersection point is undefined, meaning the lines are coincident. Otherwise, if only D=0, the intersection point is at infinity and the lines are parallel.
The rest is just typical confusing C syntax.
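Since the input is integral, the three-way classification can be sketched entirely in integer arithmetic, which sidesteps the tolerance question. The names Relation, Line, throughPoints and classify are hypothetical, and the sketch assumes the coordinates are small enough that the products fit in long long. It uses the question's convention a*x + b*y = c.

```cpp
#include <cassert>

enum class Relation { Point, Parallel, Same };

// Line in the question's form: a*x + b*y = c.
struct Line { long long a, b, c; };

// Build the line through two distinct integer points.
Line throughPoints(long long x1, long long y1, long long x2, long long y2) {
    assert(x1 != x2 || y1 != y2);  // a single point does not define a line
    const long long a = y2 - y1;
    const long long b = x1 - x2;
    return {a, b, a * x1 + b * y1};
}

// Classify two lines using the same determinants as the question's code.
Relation classify(const Line& m, const Line& n) {
    const long long d  = m.a * n.b - n.a * m.b;
    const long long dx = m.c * n.b - n.c * m.b;
    const long long dy = m.a * n.c - n.a * m.c;
    if (d != 0) return Relation::Point;               // unique intersection
    if (dx == 0 && dy == 0) return Relation::Same;    // 0/0: every point matches
    return Relation::Parallel;                        // nonzero/0: no solution
}
```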

What is wrong with my intersection checking algorithm?

I know there are many sites which explain how to check for an intersection of two lines, but I find it utterly boring to just copy and paste code for such a simple mathematical task. It frustrates me all the more that I cannot get my code to work. I know questions with "What is wrong in my code?" are stupid, but I don't know what the hell is wrong with my mathematics / code. My code is also documented nicely (except for admittedly bad variable naming), so I guess there should be someone who is interested in the math behind it:
bool segment::checkforIntersection(QPointF a, QPointF b) { //line 1: a+bx, line 2: c+dx, note that a and c are called offset and bx and dx are called gradients in this code
QPointF bx = b-a;
double firstGradient = bx.y() / bx.x(); //gradient of line 1
//now we have to calculate the offset of line 1: we have b from a+bx. Since QPointF a is on that line, it is:
//a + b * a.x = a.y with a as free variable, which yields a = a.y - b*a.x.
//One could also use the second point b for this calculation.
double firstOffset = a.y() - firstGradient * a.x();
double secondGradient, secondOffset;
for (int i = 0; i < poscount-3; i++) { //we dont check with the last line, because that could be the same line, as the one that emited intersection checking
QPointF c = pos[i];
QPointF d = pos[i+1];
QPointF dx = d-c;
secondGradient = dx.y() / dx.x(); //same formula as above
secondOffset = c.y() - secondGradient * c.x();
//a+bx=c+dx <=> a-c = (d-b)x <=> (a-c)/(d-b) = x
double x = (firstOffset - secondOffset) / (secondGradient - firstGradient);
//we have to check, if those lines intersect with a x \in [a.x,b.x] and x \in [c.x,d.x]. If this is the case, we have a collision
if (x >= a.x() && x <= b.x() && x >= c.x() && x <= d.x()) {
return true;
}
}
return false;
}
So what this does, it has 4 points a, b, c, d (line 1: a--b, line 2: c--d) (ignore the for loop) which have an absolute x and y value. First it calculates the gradient of the lines by calculating deltay/deltax. Then it calculates the offset by using the fact that point a (or c respectively) are on the lines. This way we transformed the 4 points into mathematical representation of these lines as equation a+bx, whereas a x of 0 means that we are at the first point (a / c) and a x of 1 means that we are on the second point (b/d). Next we calculate the intersection of those two lines (basic algebra). After that we check if the intersection's x value is valid. To my understanding this is all correct. Does anyone see the error?
This was empirically checked to be incorrect. The code does not give any false Positives (says there is an intersection, when there isn't), but it gives false Negatives (says there is no intersection, when there actually is). So when it says there is an Intersection it is correct, however if it says there is no intersection, you cannot always believe my algorithm.
Again, I checked online, but the algorithms are different (with some orientation tricks and something), I just wanted to come up with my own algorithm, I would be so glad if someone could help. :)
Edit: Here is a minimal reproducible non-working example, this time without Qt but with C++ only:
#include <iostream>
#include <math.h>
using namespace std;
class Point {
private:
double xval, yval;
public:
// Constructor uses default arguments to allow calling with zero, one,
// or two values.
Point(double x = 0.0, double y = 0.0) {
xval = x;
yval = y;
}
// Extractors.
double x() { return xval; }
double y() { return yval; }
Point sub(Point b)
{
return Point(xval - b.xval, yval - b.yval);
}
};
bool checkforIntersection(Point a, Point b, Point c, Point d) { //line 1: a+bx, line 2: c+dx, note that a and c are called offset and bx and dx are called gradients in this code
Point bx = b.sub(a);
double firstGradient = bx.y() / bx.x(); //gradient of line 1
//now we have to calculate the offset of line 1: we have b from a+bx. Since Point a is on that line, it is:
//a + b * a.x = a.y with a as free variable, which yields a = a.y - b*a.x.
//One could also use the second point b for this calculation.
double firstOffset = a.y() - firstGradient * a.x();
double secondGradient, secondOffset;
Point dx = d.sub(c);
secondGradient = dx.y() / dx.x(); //same formula as above
secondOffset = c.y() - secondGradient * c.x();
//a+bx=c+dx <=> a-c = (d-b)x <=> (a-c)/(d-b) = x
double x = (firstOffset - secondOffset) / (secondGradient - firstGradient);
//we have to check, if those lines intersect with a x \in [a.x,b.x] and x \in [c.x,d.x]. If this is the case, we have a collision
if (x >= a.x() && x <= b.x() && x >= c.x() && x <= d.x()) {
return true;
}
return false;
}
int main(int argc, char const *argv[]) {
if (checkforIntersection(Point(310.374,835.171),Point(290.434,802.354), Point(333.847,807.232), Point(301.03,827.172)) == true) {
cout << "These lines do intersect so I should be printed out\n";
} else {
cout << "The algorithm does not work, so instead I do get printed out\n";
}
return 0;
}
So as example I took the points ~ (310,835) -- (290,802), and (333,807) -- (301,827). These lines do intersect:
\documentclass[crop,tikz]{standalone}
\begin{document}
\begin{tikzpicture}[x=.1cm,y=.1cm]
\draw (310,835) -- (290,802);
\draw (333,807) -- (301,827);
\end{tikzpicture}
\end{document}
Proof of intersection
However when running the above C++ code, it says that they do not intersect
(you may call me a pedant, but the terminology is important)
If you want to see if the line segments intersect, then rely on the parametric representation of your two segments, solve the system in the two parameters and see if the solutions for both parameters fall into the [0,1] range.
Parametric representation for segment [a, b], component-wise
{x, y}(t) = {(1-t)*ax+t*bx, (1-t)*ay+t*by} with t in the [0,1] range
Quick check - at t=0 you get a, at t=1 you get b, the expression is linear in t, so there you have it.
So, your (a,b) (c,d) intersection problem becomes:
// for some t1 and t2, the x coordinate is the same
(1-t1)*ax+t1*bx=(1-t2)*cx+t2*dx;
(1-t1)*ay+t1*by=(1-t2)*cy+t2*dy; // and so is the y coordinate
Solve the system in t1 and t2. If t1 is in the [0,1] range, then the intersection lies between a and b, the same goes for t2 in respect with c and d.
The following cases are left as an exercise for the reader: study what effect each has on the system above and what checks a robust algorithm should implement:
segment degeneracy - coincident ends for one or both segments
collinear segments with non-empty overlap. Particular case: a single point of overlap (necessarily, that point will be one of the ends)
collinear segments with no overlap
parallel segments
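The parametric system above can be solved with two cross products. The sketch below is one possible way to do it, not a complete robust algorithm: the names Vec2 and segmentsIntersect are hypothetical, the absolute tolerance eps is an assumption, and parallel, collinear and degenerate inputs are simply reported as "no single crossing point", so the cases listed above still need their own handling.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Do segments [a,b] and [c,d] cross in exactly one point?
// Solves a + t1*(b-a) = c + t2*(d-c) and checks that t1 and t2 are in [0,1].
bool segmentsIntersect(Vec2 a, Vec2 b, Vec2 c, Vec2 d, double eps = 1e-12) {
    assert(eps >= 0.0);
    const double rx = b.x - a.x, ry = b.y - a.y;  // direction of [a,b]
    const double sx = d.x - c.x, sy = d.y - c.y;  // direction of [c,d]
    const double denom = rx * sy - ry * sx;       // cross product r x s
    if (std::fabs(denom) <= eps) return false;    // parallel or degenerate
    const double qx = c.x - a.x, qy = c.y - a.y;  // vector from a to c
    const double t1 = (qx * sy - qy * sx) / denom;
    const double t2 = (qx * ry - qy * rx) / denom;
    return t1 >= 0.0 && t1 <= 1.0 && t2 >= 0.0 && t2 <= 1.0;
}
```

Because the check is done on t1 and t2 rather than on x, the order of the end points does not matter, unlike the x >= a.x() && x <= b.x() test in the question, which can only succeed when a.x() <= b.x().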
First it calculates the gradient of the lines by calculating deltay/deltax.
And what happens when deltax is very close to zero?
Look, what you are doing is exposing yourself to ill-conditioned situations - always fear divisions and straight comparison with 0.0 when it comes to computational geometry.
Alternative:
two lines will intersect if they are not parallel
two distinct lines will be parallel if their definition vectors will have a zero cross-product.
Cross-product of your (a,b) x (c,d) = (ax-bx)*(cy-dy)-(ay-by)*(cx-dx) - if this is close enough to zero, for all practical purposes there's no intersection between your lines (the intersection is so far away it doesn't matter).
Now, what remains to be said:
there will need to be an "are those lines distinct?" test before going into computing the cross-product. Even more, you will need to treat degenerate cases (one or both of the lines are reduced to a point by coincident ends - like a==b and/or c==d)
the "close enough to zero" test is ambiguous if you don't normalize your definition vectors - imagine a 1 lightsecond-length vector defining the first line and a 1 parsec-length vector for the other (What test for 'proximity to zero' should you use in this case?) To normalize the two vectors, just apply a division ... (a division you say? I'm already shaking with fear) ... mmm.. I was saying to divide the resulting cross-product by hypot(ax-bx, ay-by)*hypot(cx-dx,cy-dy) (do you see why the degeneracy cases need to be treated in advance?)
after the normalization, once again, what would be a good 'proximity to zero' test for the resulting cross-product? Well, I think I can go on with the analysis for another hour or so (e.g. how far the intersection would be when compared with the extent of your {a,b,c,d} polygon), but... since the cross-product of two unit vectors (after normalization) is sin(angle-between-versors), you may use your common sense and say 'if the angle is less than 1 degree, will this be good enough to consider the two lines parallel? No? What about 1 arcsecond?'
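The normalized test described above might be sketched as follows. The function name nearlyParallel and the maxAngleRad parameter are assumptions for illustration; the sketch expects an angle between 0 and pi/2 and requires the caller to have ruled out degenerate (zero-length) segments beforehand.

```cpp
#include <cassert>
#include <cmath>

// True when the directions a->b and c->d differ by less than maxAngleRad.
// Assumes 0 <= maxAngleRad <= pi/2 and non-degenerate segments.
bool nearlyParallel(double ax, double ay, double bx, double by,
                    double cx, double cy, double dx, double dy,
                    double maxAngleRad) {
    const double ux = bx - ax, uy = by - ay;
    const double vx = dx - cx, vy = dy - cy;
    const double lenU = std::hypot(ux, uy);
    const double lenV = std::hypot(vx, vy);
    assert(lenU > 0.0 && lenV > 0.0);  // degenerate cases handled in advance
    // Cross product of the two unit vectors equals sin(angle between them).
    const double sine = (ux * vy - uy * vx) / (lenU * lenV);
    return std::fabs(sine) < std::sin(maxAngleRad);
}
```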

See if a point lies on a line(vector)

I have currently the following line in my program. I have two other whole number variables, x and y.
I wish to see if this new point(x, y) is on this line. I have been looking at the following thread:
Given a start and end point, and a distance, calculate a point along a line
I've come up with the following:
if(x >= x1 && x <= x2 && (y >= y1 && y <= y2 || y <= y1 && y >= y2))
{
float vx = x2 - x1;
float vy = y2 - y1;
float mag = sqrt(vx*vx + vy*vy);
// need to get the unit vector (direction)
float dvx = vx/mag; // this would be the unit vector (direction) x for the line
float dvy = vy/mag; // this would be the unit vector (direction) y for the line
float vcx = x - x1;
float vcy = y - y1;
float magc = sqrt(vcx*vcx + vcy*vcy);
// need to get the unit vector (direction)
float dvcx = vcx/magc; // this would be the unit vector (direction) x for the point
float dvcy = vcy/magc; // this would be the unit vector (direction) y for the point
// I was thinking of comparing the direction of the two vectors, if they are the same then the point must lie on the line?
if(dvcx == dvx && dvcy == dvy)
{
// the point is on the line!
}
}
It doesn't seem to be working, or is this idea whack?
Floating point numbers have a limited precision, so you'll get rounding errors from the calculations, with the result that values that should mathematically be equal will end up slightly different.
You'll need to compare with a small tolerance for error:
if (std::abs(dvcx-dvx) < tolerance && std::abs(dvcy-dvy) < tolerance)
{
// the point is (more or less) on the line!
}
The hard part is choosing that tolerance. If you can't accept any errors, then you'll need to use something other than fixed-precision floating point values - perhaps integers, with the calculations rearranged to avoid division and other inexact operations.
In any case, you can do this more simply, without anything like a square root. You want to find out if the two vectors are parallel; they are if the vector product is zero or, equivalently, if they have equal tangents. So you just need
if (vx * vcy == vy * vcx) // might still need a tolerance for floating-point
{
// the point is on the line!
}
If your inputs are integers, small enough that the multiplication won't overflow, then there's no need for floating-point arithmetic at all.
An efficient way to solve this problem is to use the signed area of a triangle. When the signed area of the triangle created by points {x1,y1}, {x2,y2}, and {x,y} is near-zero, you can consider {x,y} to be on the line. As others have mentioned, picking a good tolerance value is an important part of this if you are using floating point values.
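// xy is assumed to be a 2D vector type with members x, y and operator-;
// tolerance is a small positive constant chosen for the problem's scale.
bool isPointOnLine (xy p1, xy p2, xy p3) // returns true if p3 is on line p1, p2
{
    xy va = p1 - p2;
    xy vb = p3 - p2;
    double area = va.x * vb.y - va.y * vb.x; // twice the signed triangle area
    return std::fabs (area) < tolerance;
}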
This will let you know if {x,y} lies on the line, but it will not determine if {x,y} is contained by the line segment. To do that, you would also need to check {x,y} against the bounds of the line segment.
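Since the question works with whole-number coordinates, both checks (collinearity and segment bounds) can be done exactly with integers. This is only a sketch under that assumption: the name onSegment is hypothetical, and the products must fit in long long.

```cpp
#include <algorithm>
#include <cassert>

// Is (x, y) on the closed segment from (x1, y1) to (x2, y2)?
bool onSegment(long long x1, long long y1, long long x2, long long y2,
               long long x, long long y) {
    // Collinearity: cross product of (p2 - p1) and (p - p1) must be exactly 0.
    const long long cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1);
    if (cross != 0) return false;
    // Bounds: the point must fall inside the segment's bounding box.
    return x >= std::min(x1, x2) && x <= std::max(x1, x2) &&
           y >= std::min(y1, y2) && y <= std::max(y1, y2);
}

// Usage sketch.
void onSegmentDemo() {
    assert(onSegment(0, 0, 4, 2, 2, 1));   // midpoint of the segment
    assert(!onSegment(0, 0, 4, 2, 6, 3));  // collinear, but past the end
}
```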
First you need to calculate the equation of your line. Then see if this equation holds true for the values of x and y that you have. To calculate the equation of your line, you need to work out where it crosses the y-axis and what its gradient is. The equation will be of the form y=mx+c where m is the gradient and c is the 'intercept' (where the line crosses the y-axis).
For float values, don't use == but instead test for small difference:
if (fabs(dvcx-dvx) < delta && fabs(dvcy-dvy) < delta)
Also, you don't really need the unit vector, just the tangent:
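float originalTangent = float(y2 - y1) / (x2 - x1); // cast first: integer division would truncate
float newTangent = float(y - y1) / (x - x1);
if (fabs(newTangent - originalTangent) < delta) { ... }
(delta should be some small number that depends on the accuracy you are expecting. Note that this breaks down when x2 == x1 or x == x1, because of the division by zero.)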
Given that (x, y) is actually a point, the job seems a bit simpler than you're making it.
You probably want to start by checking for a perfectly horizontal or vertical line. In those cases, you just check whether x falls between x1 and x2 (or y between y1 and y2 for vertical).
Otherwise you can use linear interpolation on x and see if it gives you the correct value for y (within some possible tolerance for rounding). For this, you'd do something like:
slope = double(y2-y1)/(x2-x1);
if (fabs(y1 + slope * (x - x1) - y) < tolerance)
// (x,y) is on the line
else
// (x,y) isn't on the line

How to fit the 2D scatter data with a line with C++

I used to work with MATLAB, where I can use p = polyfit(x,y,1) to estimate the best-fit line for scatter data in a plane. I was wondering which resources I can rely on to implement the line fitting algorithm in C++. I understand there are a lot of algorithms for this subject; I expect the algorithm to be fast and, at the same time, to reach accuracy comparable to the polyfit function in MATLAB.
This page describes the algorithm easier than Wikipedia, without extra steps to calculate the means etc. : http://faculty.cs.niu.edu/~hutchins/csci230/best-fit.htm . Almost quoted from there, in C++ it's:
#include <vector>
#include <cmath>
struct Point {
double _x, _y;
};
struct Line {
double _slope, _yInt;
double getYforX(double x) {
return _slope*x + _yInt;
}
// Construct line from points
bool fitPoints(const std::vector<Point> &pts) {
int nPoints = pts.size();
if( nPoints < 2 ) {
// Fail: infinitely many lines passing through this single point
return false;
}
double sumX=0, sumY=0, sumXY=0, sumX2=0;
for(int i=0; i<nPoints; i++) {
sumX += pts[i]._x;
sumY += pts[i]._y;
sumXY += pts[i]._x * pts[i]._y;
sumX2 += pts[i]._x * pts[i]._x;
}
double xMean = sumX / nPoints;
double yMean = sumY / nPoints;
double denominator = sumX2 - sumX * xMean;
// You can tune the eps (1e-7) below for your specific task
if( std::fabs(denominator) < 1e-7 ) {
// Fail: it seems a vertical line
return false;
}
_slope = (sumXY - sumX * yMean) / denominator;
_yInt = yMean - _slope * xMean;
return true;
}
};
Please, be aware that both this algorithm and the algorithm from Wikipedia ( http://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line ) fail in case the "best" description of points is a vertical line. They fail because they use
y = k*x + b
line equation which intrinsically is not capable to describe vertical lines. If you want to cover also the cases when data points are "best" described by vertical lines, you need a line fitting algorithm which uses
A*x + B*y + C = 0
line equation. You can still modify the current algorithm to produce that equation:
y = k*x + b <=>
y - k*x - b = 0 <=>
B=1, A=-k, C=-b
In terms of the above code:
B=1, A=-_slope, C=-_yInt
And in "then" block of the if checking for denominator equal to 0, instead of // Fail: it seems a vertical line, produce the following line equation:
x = xMean <=>
x - xMean = 0 <=>
A=1, B=0, C=-xMean
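Putting those two branches together, a variant of the fit that always returns the A*x + B*y + C = 0 form might look like the sketch below. The names GeneralLine and fitGeneralLine are hypothetical, and the 1e-7 threshold is carried over from the code above. Note that a cloud of identical points also lands in the "vertical" branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double _x, _y; };
struct GeneralLine { double A, B, C; };  // A*x + B*y + C = 0

bool fitGeneralLine(const std::vector<Point>& pts, GeneralLine& out,
                    double eps = 1e-7) {
    assert(eps >= 0.0);
    const std::size_t n = pts.size();
    if (n < 2) return false;  // not enough points to define a line
    double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0;
    for (const Point& p : pts) {
        sumX += p._x;
        sumY += p._y;
        sumXY += p._x * p._y;
        sumX2 += p._x * p._x;
    }
    const double xMean = sumX / n;
    const double yMean = sumY / n;
    const double denominator = sumX2 - sumX * xMean;
    if (std::fabs(denominator) < eps) {
        out = {1.0, 0.0, -xMean};  // vertical line x = xMean
        return true;
    }
    const double slope = (sumXY - sumX * yMean) / denominator;
    const double yInt = yMean - slope * xMean;
    out = {-slope, 1.0, -yInt};    // y = slope*x + yInt, rearranged
    return true;
}
```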
I've just noticed that the original article I was referring to has been deleted. This web page proposes a slightly different formula for line fitting: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html
double denominator = sumX2 - 2 * sumX * xMean + nPoints * xMean * xMean;
...
_slope = (sumXY - sumY*xMean - sumX * yMean + nPoints * xMean * yMean) / denominator;
The formulas are identical because nPoints*xMean == sumX and nPoints*xMean*yMean == sumX * yMean == sumY * xMean.
I would suggest coding it from scratch. It is a very simple implementation in C++. You can code up both the intercept and gradient for least-squares fit (the same method as polyfit) from your data directly from the formulas here
http://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line
These are closed form formulas that you can easily evaluate yourself using loops. If you were using higher degree fits then I would suggest a matrix library or more sophisticated algorithms but for simple linear regression as you describe above this is all you need. Matrices and linear algebra routines would be overkill for such a problem (in my opinion).
The equation of a line is Ax + By + C = 0.
So it can easily be converted (when B is not too close to zero) to y = (-A/B)*x + (-C/B)
#include <array>
#include <cmath>
#include <utility>
#include <vector>
typedef double scalar_type;
typedef std::array< scalar_type, 2 > point_type;
typedef std::vector< point_type > cloud_type;
bool fit( scalar_type & A, scalar_type & B, scalar_type & C, cloud_type const& cloud )
{
if( cloud.size() < 2 ){ return false; }
scalar_type X=0, Y=0, XY=0, X2=0, Y2=0;
for( auto const& point: cloud )
{ // Do all calculation symmetric regarding X and Y
X += point[0];
Y += point[1];
XY += point[0] * point[1];
X2 += point[0] * point[0];
Y2 += point[1] * point[1];
}
X /= cloud.size();
Y /= cloud.size();
XY /= cloud.size();
X2 /= cloud.size();
Y2 /= cloud.size();
A = - ( XY - X * Y ); //!< Common for both solution
scalar_type Bx = X2 - X * X;
scalar_type By = Y2 - Y * Y;
if( fabs( Bx ) < fabs( By ) ) //!< Test verticality/horizontality
{ // Line is more Vertical.
B = By;
std::swap(A,B);
}
else
{ // Line is more Horizontal.
// Classical solution, when we expect more horizontal-like line
B = Bx;
}
C = - ( A * X + B * Y );
//Optional normalization:
// scalar_type D = sqrt( A*A + B*B );
// A /= D;
// B /= D;
// C /= D;
return true;
}
You can also use or go over this implementation; there is also documentation here.
Fitting a line can be accomplished in different ways.
Least squares means minimizing the sum of the squared distances.
You could also take another cost function, for example the (not squared) distance, but normally you use the squared distance (least squares).
It is also possible to define the distance in different ways. Normally you just use the distance along the "y"-axis, but you could also use the total/orthogonal distance, where the distance is calculated in both the x- and y-direction. This can be a better fit if you also have errors in the x direction (let it be the time of measurement) and you didn't start the measurement at the exact time you saved in the data. Closed-form algorithms exist for both the least squares and the total least squares line fit. So if you fit with one of those you will get the line with the minimal sum of squared distances to the data points. You can't fit a better line in the sense of your definition. You could only change the definition, for example by taking another cost function or defining distance in another way.
There is a lot you could think about when fitting models to data, but normally everyone uses the least squares line fit and you should be fine most of the time. If you have a special case, though, it can be necessary to think about what you're doing. A least squares fit is done in maybe a few minutes. Thinking about which method fits your problem best involves understanding the math, which can take indefinite time :-).
Note: This answer is NOT AN ANSWER TO THIS QUESTION but to this one "Line closest to a set of points" that has been flagged as "duplicate" of this one (incorrectly in my opinion), no way to add new answers to it.
The question asks for:
Find the line whose distance from all the points is minimum ? By
distance I mean the shortest distance between the point and the line.
The most usual interpretation of distance "between the point and the line" is the Euclidean distance, and the most common interpretation of "from all points" is the sum of distances (in absolute or squared value).
When the target is to minimize the sum of squared Euclidean distances, linear regression (LST) is not the algorithm to use. In addition, linear regression cannot result in a vertical line. The algorithm to use is "total least squares". See, for example, Wikipedia for the problem description and this answer on Math Stack Exchange for details about the formulation.
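For the 2D case, one well-known closed form for total least squares is to pass the line through the centroid and align it with the principal axis of the centered points, whose angle is 0.5 * atan2(2*Sxy, Sxx - Syy). The sketch below assumes that formulation; the names Pt, TlsLine, fitTotalLeastSquares and distanceTo are hypothetical. If the scatter is perfectly isotropic (Sxy = 0 and Sxx = Syy) the direction is not unique and the result is arbitrary.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };
struct TlsLine { double A, B, C; };  // A*x + B*y + C = 0 with A*A + B*B = 1

bool fitTotalLeastSquares(const std::vector<Pt>& pts, TlsLine& out) {
    if (pts.size() < 2) return false;
    double mx = 0, my = 0;
    for (const Pt& p : pts) { mx += p.x; my += p.y; }
    mx /= pts.size();
    my /= pts.size();
    double sxx = 0, sxy = 0, syy = 0;  // centered second moments
    for (const Pt& p : pts) {
        const double dx = p.x - mx, dy = p.y - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    if (sxx == 0.0 && syy == 0.0) return false;  // all points coincide
    // Angle of the direction with the largest spread (principal axis).
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    const double A = -std::sin(theta), B = std::cos(theta);  // unit normal
    out = {A, B, -(A * mx + B * my)};  // line passes through the centroid
    return true;
}

// Orthogonal distance from p to the line (valid because the normal is unit).
double distanceTo(const TlsLine& l, Pt p) {
    return std::fabs(l.A * p.x + l.B * p.y + l.C);
}

// Usage sketch: a vertical cloud is fitted without any special case.
void tlsDemo() {
    TlsLine l{};
    assert(fitTotalLeastSquares({{2.0, 0.0}, {2.0, 1.0}, {2.0, 5.0}}, l));
    assert(distanceTo(l, {2.0, 3.0}) < 1e-9);
}
```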
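To fit a line y = param[0]*x + param[1], simply do this:
// x, y: arrays holding ninliers samples; param: output {slope, intercept}
double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x2 = 0;
// loop over data:
for (int i = 0; i < ninliers; ++i)
{
    sum_x += x[i];
    sum_y += y[i];
    sum_xy += x[i] * y[i];
    sum_x2 += x[i] * x[i];
}
// means
double mean_x = sum_x / ninliers;
double mean_y = sum_y / ninliers;
double varx = sum_x2 - sum_x * mean_x;
double cov = sum_xy - sum_x * mean_y;
// check for zero varx (all x equal: the line is vertical)
if (varx != 0.0)
{
    param[0] = cov / varx;
    param[1] = mean_y - param[0] * mean_x;
}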
More on the topic http://easycalculation.com/statistics/learn-regression.php
(the formulas are the same, they are just multiplied and divided by N, the sample size). If you want to fit a plane to 3D data, use a similar approach -
http://www.mymathforum.com/viewtopic.php?f=13&t=8793
Disclaimer: all quadratic fits are linear and optimal in the sense that they reduce the noise in the parameters. However, you might be interested in reducing the noise in the data instead. You might also want to ignore outliers, since they can bias your solutions greatly. Both problems can be solved with RANSAC. See my post at: