Value type preventing calculation? - c++

I am writing a program for class that simply calculates distance between two coordinate points (x,y).
differenceofx1 = x1 - x2;
differenceofy1 = y1 - y2;
squareofx1 = differenceofx1 * differenceofx1;
squareofy1 = differenceofy1 * differenceofy1;
distance1 = sqrt(squareofx1 - squareofy1);
When I calculate the distance, it works. However there are some situations such as the result being a square root of a non-square number, or the difference of x1 and x2 / y1 and y2 being negative due to the input order, that it just gives a distance of 0.00000 when the distance is clearly more than 0. I am using double for all the variables, should I use float instead for the negative possibility or does double do the same job? I set the precision to 8 as well but I don't understand why it wouldn't calculate properly?
I am sorry for the simplicity of the question, I am a bit more than a beginner.

You are using the distance formula wrong
it should be
distance1 = sqrt(squareofx1 + squareofy1);
instead of
distance1 = sqrt(squareofx1 - squareofy1);
due to the wrong formula if squareofx1 is less than squareofy1 you get an error as sqrt of a negative number is not possible in case of real coordinates.

Firstly, your formula is incorrect change it to distance1 = sqrt(squareofx1 + squareofy1) as #fefe mentioned. Btw All your calculation can be represented in one line of code:
distance1 = sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2));
No need for variables like differenceofx1, differenceofy1, squareofx1, squareofy1 unless you are using the results stored in these variables again in your program.
Secondly, Double give you more precision than float. If you need precision more than 6-7 places after decimal use Double else float works too. Read more about Float vs Double

Related

How to approximate Euclidean distance on the integer plane, without overflow?

I'm working on a platform that has only integer arithmetic. The application uses geographic information, and I'm representing points by (x, y) coordinates where x and y are distances measured in meters.
As an approximation, I want to compute the Euclidean distance between two points. But to do this I have to square distances, and with 32-bit integers, the largest distance I can represent is 32 kilometers. Not good.
My needs are more on the order of 1000 kilometers. But I'd like to be able to resolve distances on a scale smaller than 30 meters.
Hence my question: how can I compute Euclidean distance, using only integer arithmetic, without overflow, on distances whose squares don't fit in a single word?
ETA: I would like to be able to compute distances, but I might settle for being able to compare them.
Perhaps comparing the octagonal distance approximation would be sufficient?
Slightly more up to date is this article on fast approximate distance functions.
I would recommend to use fixed point calculation using integers and then the distance approximation is already not too complicated.
fixed point calculation
distance approximation
Fast Approximate Distance Functions by Rafael Baptista
First step is to choose some fixed point representation for our needs:
For example in case we need a number range for 1000km with 1m resolution we can use 20bits that would be 2^20 = 1,048,576. So we have around 10bits for fractions.
Then we need to implement the approximation we choose:
For example in case we select the following approximation:
h ≈ b (1 + 0.337 (a/b)) = b + 0.337 a AND assuming 0 ≤ a ≤ b
We will implement as follows:
int32_t dx = (x1 > x2 ? x1 - x2 : x2 - x1);
int32_t dy = (y1 > y2 ? y1 - y2 : y2 - y1);
int32_t a = dx > dy ? dy : dx;
int32_t b = dx > dy ? dx : dy;
int32_t h = b + (345 * a >> 10); /* 345.088 = 0.337 * 2^10 */
About overflow:
Adding two <+20.0> positive numbers will result a maximum of <+21.0> number. That is Ok.
The multiplication is also safe while we use numbers in a range of -1..1. In this case the result will also remain in the same range. In our case <+20.0> * <+0.10> will result <+20.10> numbers. That we convert back to <+20.0>.
There is one step here we need to pay attention. During the multiplication we will get temporary a number with <+20.10> that is already near to our 32bits limit.
Exact calculation
We can also calculate the exact distance using the following consideration:
h = b sqrt(1 + (a/b)^2) AND assuming 0 < b ≤ a
In tis case we also need to calculate the square root:
square root
In case the a/b still significantly larger than one or too large to calculate the square of it, we can simplify the calculation to:
h = a
See the implementation here
I would leave the square root out of play, so that I can approximate the Euclidean distance. However, when comparing distances, this approach gives you 100% accuracy, since the comparison would be the same if you squared the distances.
I am pretty sure about that, since I had use that approach when searching for nearest neighbours in high dimensional spaces. You can check my code and the theory in kd-GeRaF.

fastest way to compute angle with x-axis

What is the fastest way to calculate angle between a line and the x-axis?
I need to define a function, which is Injective at the PI:2PI interval (I have to angle between point which is the uppermost point and any point below).
PointType * top = UPPERMOST_POINT;
PointType * targ = TARGET_POINT;
double targetVectorX = targ->x - top->x;
double targetVectorY = targ->y - top->y;
first try
//#1
double magnitudeTarVec = sqrt(targetVectorX*targetVectorX + targetVectorY*targetVectorY);
angle = tarX / magTar;
second try
//#2 slower
angle = atan2(targetVectorY, targetVectorX);
I do not need the angle directly (radians or degrees), just any value is fine as far as by comparing these values of this kind from 2 points I can tell which angle is bigger. (for example angle in example one is between -1 and 1 (it is cosine argument))
Check for y being zero as atan2 does; then The quotient x/y will be plenty fine. (assuming I understand you correctly).
I just wrote Fastest way to sort vectors by angle without actually computing that angle about the general question of finding functions monotonic in the angle, without any code or connections to C++ or the likes. Based on the currently accepted answer there I'd now suggest
double angle = copysign( // magnitude of first argument with sign of second
1. - targetVectorX/(fabs(targetVectorX) + fabs(targetVectorY)),
targetVectorY);
The great benefit compared to the currently accepted answer here is the fact that you won't have to worry about infinite values, since all non-zero vectors (i.e. targetVectorX and targetVectorY are not both equal to zero at the same time) will result in finite pseudoangle values. The resulting pseudoangles will be in the range [−2 … 2] for real angles [−π … π], so the signs and the discontinuity are just like you'd get them from atan2.

Why does division yield a vastly different result than multiplication by a fraction in floating points

I understand why floating point numbers can't be compared, and know about the mantissa and exponent binary representation, but I'm no expert and today I came across something I don't get:
Namely lets say you have something like:
float denominator, numerator, resultone, resulttwo;
resultone = numerator / denominator;
float buff = 1 / denominator;
resulttwo = numerator * buff;
To my knowledge different flops can yield different results and this is not unusual. But in some edge cases these two results seem to be vastly different. To be more specific in my GLSL code calculating the Beckmann facet slope distribution for the Cook-Torrance lighitng model:
float a = 1 / (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
float b = clampedCosHalfNormal * clampedCosHalfNormal - 1.0;
float c = facetSlopeRMS * facetSlopeRMS * clampedCosHalfNormal * clampedCosHalfNormal;
facetSlopeDistribution = a * exp(b/c);
yields very very different results to
float a = (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
facetDlopeDistribution = exp(b/c) / a;
Why does it? The second form of the expression is problematic.
If I say try to add the second form of the expression to a color I get blacks, even though the expression should always evaluate to a positive number. Am I getting an infinity? A NaN? if so why?
I didn't go through your mathematics in detail, but you must be aware that small errors get pumped up easily by all these powers and exponentials. You should try and substitute all variables var with var + e(var) (on paper, yes) and derive an expression for the total error - without simplifying in between steps, because that's where the error comes from!
This is also a very common problem in computational fluid dynamics, where you can observe things like 'numeric diffusion' if your grid isn't properly aligned with the simulated flow.
So get a clear grip on where the biggest errors come from, and rewrite equations where possible to minimize the numeric error.
edit: to clarify, an example
Say you have some variable x and an expression y=exp(x). The error in x is denoted e(x) and is small compared to x (say e(x)/x < 0.0001, but note that this depends on the type you are using). Then you could say that
e(y) = y(x+e(x)) - y(x)
e(y) ~ dy/dx * e(x) (for small e(x))
e(y) = exp(x) * e(x)
So there's a magnification of the absolute error of exp(x), meaning that around x=0 there's really no issue (not a surprise, since at that point the slope of exp(x) equals that of x) , but for big x you will notice this.
The relative error would then be
e(y)/y = e(y)/exp(x) = e(x)
whilst the relative error in x was
e(x)/x
so you added a factor of x to the relative error.

Precision of cos(atan2(y,x)) versus using complex <double>, C++

I'm writing some coordinate transformations (more specifically the Joukoswky Transform, Wikipedia Joukowsky Transform), and I'm interested in performance, but of course precision. I'm trying to do the coordinate transformations in two ways:
1) Calculating the real and complex parts in separate, using double precision, as below:
double r2 = chi.x*chi.x + chi.y*chi.y;
//double sq = pow(r2,-0.5*n) + pow(r2,0.5*n); //slow!!!
double sq = sqrt(r2); //way faster!
double co = cos(atan2(chi.y,chi.x));
double si = sin(atan2(chi.y,chi.x));
Z.x = 0.5*(co*sq + co/sq);
Z.y = 0.5*si*sq;
where chi and Z are simple structures with double x and y as members.
2) Using complex :
Z = 0.5 * (chi + (1.0 / chi));
Where Z and chi are complex . There interesting part is that indeed the case 1) is faster (about 20%), but the precision is bad, giving error in the third decimal number after the comma after the inverse transform, while the complex gives back the exact number.
So, the problem is on the cos(atan2), sin(atan2)? But if it is, how the complex handles that?
EDIT: Just figured out that this was not exactly the question that I had in mind. I have to do the general transformation, as
Z = 1/2*(chi^n + (1/chi)^n), and so far the code above was the way I've figured to do it. More precisely,
double sq = pow(sqrt(r2),n); //way faster!
double co = cos(n*atan2(chi.y,chi.x));
double si = sin(n*atan2(chi.y,chi.x));
Z.x = 0.5*(co*sq + co/sq);
Z.y = 0.5*(si*sq - sq/si);
Also correcting the bug on Z.y.
Given r = sqrt(x*x+y*y):
cos(atan2(y,x)) == x/r
sin(atan2(y,x)) == y/r
Calculating it this way should be more accurate and faster.
When you plug these values into the formulas for Z.x and Z.y, the square root will cancel out as well, so you'll be left with only basic arithmetic operations.
I think that in 1) it should be
Z.y = 0.5*(si*sq - si/sq);
If you want really good performance you may want to go back to first principles and observe that
1/(a+ib) = (a-ib)/(a*a+b*b)
No sqrt(), atan2() or cos() or sin().

Most accurate line intersection ordinate computation with floats?

I'm computing the ordinate y of a point on a line at a given abscissa x. The line is defined by its two end points coordinates (x0,y0)(x1,y1). End points coordinates are floats and the computation must be done in float precision for use in GPU.
The maths, and thus the naive implementation, are trivial.
Let t = (x - x0)/(x1 - x0), then y = (1 - t) * y0 + t * y1 = y0 + t * (y1 - y0).
The problem is when x1 - x0 is small. The result will introduce cancellation error. When combined with the one of x - x0, in the division I expect a significant error in t.
The question is if there exist another way to determine y with a better accuracy ?
i.e. should I compute (x - x0)*(y1 - y0) first, and divide by (x1 - x0) after ?
The difference y1 - y0 will always be big.
To a large degree, your underlying problem is fundamental. When (x1-x0) is small, it means there are only a few bits in the mantissa of x1 and x0 which differ. And by extension, there are only a limted number of floats between x0 and x1. E.g. if only the lower 4 bits of the mantissa differ, there are at most 14 values between them.
In your best algorithm, the t term represents these lower bits. And to continue or example, if x0 and x1 differ by 4 bits, then t can take on only 16 values either. The calculation of these possible values is fairly robust. Whether you're calculating 3E0/14E0 or 3E-12/14E-12, the result is going to be close to the mathematical value of 3/14.
Your formula has the additional advantage of having y0 <= y <= y1, since 0 <= t <= 1
(I'm assuming that you know enough about float representations, and therefore "(x1-x0) is small" really means "small, relative to the values of x1 and x0 themselves". A difference of 1E-1 is small when x0=1E3 but large if x0=1E-6 )
You may have a look at Qt's "QLine" (if I remember it right) sources; they have implemented an intersection determination algorithm taken from one the "Graphics Gems" books (the reference must be in the code comments, the book was on EDonkey a couple of years ago), which, in turn, has some guarantees on applicability for a given screen resolution when calculations are performed with given bit-width (they use fixed-point arithmetics if I'm not wrong).
If you have the possibility to do it, you can introduce two cases in your computation, depending on abs(x1-x0) < abs(y1-y0). In the vertical case abs(x1-x0) < abs(y1-y0), compute x from y instead of y from x.
EDIT. Another possibility would be to obtain the result bit by bit using a variant of dichotomic search. This will be slower, but may improve the result in extreme cases.
// Input is X
xmin = min(x0,x1);
xmax = max(x0,x1);
ymin = min(y0,y1);
ymax = max(y0,y1);
for (int i=0;i<20;i++) // get 20 bits in result
{
xmid = (xmin+xmax)*0.5;
ymid = (ymin+ymax)*0.5;
if ( x < xmid ) { xmax = xmid; ymax = ymid; } // first half
else { xmin = xmid; ymin = ymid; } // second half
}
// Output is some value in [ymin,ymax]
Y = ymin;
I have implemented a benchmark program to compare the effect of the different expression.
I computed y using double precision and then compute y using single precision with different expressions.
Here are the expression tested:
inline double getYDbl( double x, double x0, double y0, double x1, double y1 )
{
double const t = (x - x0)/(x1 - x0);
return y0 + t*(y1 - y0);
}
inline float getYFlt1( float x, float x0, float y0, float x1, float y1 )
{
double const t = (x - x0)/(x1 - x0);
return y0 + t*(y1 - y0);
}
inline float getYFlt2( float x, float x0, float y0, float x1, float y1 )
{
double const t = (x - x0)*(y1 - y0);
return y0 + t/(x1 - x0);
}
inline float getYFlt3( float x, float x0, float y0, float x1, float y1 )
{
double const t = (y1 - y0)/(x1 - x0);
return y0 + t*(x - x0);
}
inline float getYFlt4( float x, float x0, float y0, float x1, float y1 )
{
double const t = (x1 - x0)/(y1 - y0);
return y0 + (x - x0)/t;
}
I computed the average and stdDev of the difference between the double precision result and single precision result.
The result is that there is none on the average over 1000 and 10K random value sets. I used icc compiler with and without optimization as well as g++.
Note that I had to use the isnan() function to filter out bogus values. I suspect these result from underflow in the difference or division.
I don't know if the compilers rearrange the expression.
Anyway, the conclusion from this test is that the above rearrangements of the expression have no effect on the computation precision. The error remains the same (on average).
Check if the distance between x0 and x1 is small, i.e. fabs(x1 - x0) < eps. Then the line is parallell to the y axis of the coordinate system, i.e. you can't calculuate the y values of that line depending on x. You have infinite many y values and therefore you have to treat this case differently.
If your source data is already a float then you already have fundamental inaccuracy.
To explain further, imagine if you were doing this graphically. You have a 2D sheet of graph paper, and 2 point marked.
Case 1: Those points are very accurate, and have been marked with a very sharp pencil. Its easy to draw the line joining them, and easy to then get y given x (or vice versa).
Case 2: These point have been marked with a big fat felt tip pen, like a bingo marker. Clearly the line you draw will be less accurate. Do you go through the centre of the spots? The top edge? The bottom edge? Top of one, bottom of the other? Clearly there are many different options. If the two dots are close to each other then the variation will be even greater.
Floats have a certain level of inaccuracy inherent in them, due to the way they represent numbers, ergo they correspond more to case 2 than case 1 (which one could suggest is the equivalent of using an arbitrary precision librray). No algorithm in the world can compensate for that. Imprecise data in, Imprecise data out
How about computing something like:
t = sign * power2 ( sqrt (abs(x - x0))/ sqrt (abs(x1 - x0)))
The idea is to use a mathematical equivalent formula in which low (x1-x0) has less effect.
(not sure if the one I wrote matches this criteria)
As MSalters said, the problem is already in the original data.
Interpolation / extrapolation requires the slope, which already has low accuracy in the given conditions (worst for very short line segments far away from the origin).
Choice of algorithm canot regain this accuracy loss. My gut feeling is that the different evaluation order will not change things, as the error is introduced by the subtractions, not the devision.
Idea:
If you have more accurate data when the lines are generated, you can change the representation from ((x0, y0), (x1, y1)) to (x0,y0, angle, length). You could store angle or slope, slope has a pole, but angle requires trig functions... ugly.
Of course that won't work if you need the end point frequently, and you have so many lines that you can't store additional data, I have no idea. But maybe there is another representation that works well for your needs.
doubles have enough resolution in most situations, but that would double the working set too.