How to approximate Euclidean distance on the integer plane, without overflow?

How to approximate Euclidean distance on the integer plane, without overflow? - euclidean-distance

I'm working on a platform that has only integer arithmetic. The application uses geographic information, and I'm representing points by (x, y) coordinates where x and y are distances measured in meters.
As an approximation, I want to compute the Euclidean distance between two points. But to do this I have to square distances, and with 32-bit integers, the largest distance I can represent is 32 kilometers. Not good.
My needs are more on the order of 1000 kilometers. But I'd like to be able to resolve distances on a scale smaller than 30 meters.
Hence my question: how can I compute Euclidean distance, using only integer arithmetic, without overflow, on distances whose squares don't fit in a single word?
ETA: I would like to be able to compute distances, but I might settle for being able to compare them.

Perhaps comparing the octagonal distance approximation would be sufficient?
Slightly more up to date is this article on fast approximate distance functions.

I would recommend to use fixed point calculation using integers and then the distance approximation is already not too complicated.
fixed point calculation
distance approximation
Fast Approximate Distance Functions by Rafael Baptista
First step is to choose some fixed point representation for our needs:
For example in case we need a number range for 1000km with 1m resolution we can use 20bits that would be 2^20 = 1,048,576. So we have around 10bits for fractions.
Then we need to implement the approximation we choose:
For example in case we select the following approximation:
h ≈ b (1 + 0.337 (a/b)) = b + 0.337 a AND assuming 0 ≤ a ≤ b
We will implement as follows:
int32_t dx = (x1 > x2 ? x1 - x2 : x2 - x1);
int32_t dy = (y1 > y2 ? y1 - y2 : y2 - y1);
int32_t a = dx > dy ? dy : dx;
int32_t b = dx > dy ? dx : dy;
int32_t h = b + (345 * a >> 10); /* 345.088 = 0.337 * 2^10 */
About overflow:
Adding two <+20.0> positive numbers will result a maximum of <+21.0> number. That is Ok.
The multiplication is also safe while we use numbers in a range of -1..1. In this case the result will also remain in the same range. In our case <+20.0> * <+0.10> will result <+20.10> numbers. That we convert back to <+20.0>.
There is one step here we need to pay attention. During the multiplication we will get temporary a number with <+20.10> that is already near to our 32bits limit.
Exact calculation
We can also calculate the exact distance using the following consideration:
h = b sqrt(1 + (a/b)^2) AND assuming 0 < b ≤ a
In tis case we also need to calculate the square root:
square root
In case the a/b still significantly larger than one or too large to calculate the square of it, we can simplify the calculation to:
h = a
See the implementation here

I would leave the square root out of play, so that I can approximate the Euclidean distance. However, when comparing distances, this approach gives you 100% accuracy, since the comparison would be the same if you squared the distances.
I am pretty sure about that, since I had use that approach when searching for nearest neighbours in high dimensional spaces. You can check my code and the theory in kd-GeRaF.

Related

Value type preventing calculation?

I am writing a program for class that simply calculates distance between two coordinate points (x,y).
differenceofx1 = x1 - x2;
differenceofy1 = y1 - y2;
squareofx1 = differenceofx1 * differenceofx1;
squareofy1 = differenceofy1 * differenceofy1;
distance1 = sqrt(squareofx1 - squareofy1);
When I calculate the distance, it works. However there are some situations such as the result being a square root of a non-square number, or the difference of x1 and x2 / y1 and y2 being negative due to the input order, that it just gives a distance of 0.00000 when the distance is clearly more than 0. I am using double for all the variables, should I use float instead for the negative possibility or does double do the same job? I set the precision to 8 as well but I don't understand why it wouldn't calculate properly?
I am sorry for the simplicity of the question, I am a bit more than a beginner.

You are using the distance formula wrong
it should be
distance1 = sqrt(squareofx1 + squareofy1);
instead of
distance1 = sqrt(squareofx1 - squareofy1);
due to the wrong formula if squareofx1 is less than squareofy1 you get an error as sqrt of a negative number is not possible in case of real coordinates.

Firstly, your formula is incorrect change it to distance1 = sqrt(squareofx1 + squareofy1) as #fefe mentioned. Btw All your calculation can be represented in one line of code:
distance1 = sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2));
No need for variables like differenceofx1, differenceofy1, squareofx1, squareofy1 unless you are using the results stored in these variables again in your program.
Secondly, Double give you more precision than float. If you need precision more than 6-7 places after decimal use Double else float works too. Read more about Float vs Double

Oh where has my precision gone with OpenMesh vector arithmetic?

Using doubles I would expect to have about 15 decimal points of precision. I know that many decimal numbers are not exactly representable in floating point notation, so I would get an approximation for 1/3 for example. However, using a double I would expect an approximation that was correct to about 15 decimal points. I would also expect to retain that level of accuracy when doing arithmetic.
However, in the following example, I try to calculate the area of a triangle using Heron's formula and OpenMesh::Vec3d which are backed by OpenMesh::VectorDataT<double,3> and end up with a result that is only accurate to 5 decimal points.
The correct result is area = 8.19922e-8, but I'm getting area=8.1992238711962083e-8. Any ideas where this is coming from?
The suggestion that this might result from the instability in Heron's Formula is a good one, but unfortunately is not the case in this example. I have added code which calculates the stable variation on Heron for those who might be interested. In this example, u.norm()>v.norm()>w.norm().
#include <OpenMesh/Core/Mesh/PolyMesh_ArrayKernelT.hh>
int main()
{
//triangle vertices
OpenMesh::Vec3d x(0.051051, 0.057411, 0.001355);
OpenMesh::Vec3d y(0.050981, 0.057337, -0.000678);
OpenMesh::Vec3d z(0.050949, 0.057303, 0.0);
//edge vectors
OpenMesh::Vec3d u = x-y;
OpenMesh::Vec3d v = x-z;
OpenMesh::Vec3d w = y-z;
//Heron's Formula
double semiP = (u.norm() + v.norm() + w.norm())/2.0;
double area = sqrt(semiP * (semiP - u.norm()) * (semiP - v.norm()) * (semiP - w.norm()) );
//Heron's Formula for small angles
double areaSmall = sqrt((u.norm() + (v.norm()+w.norm()))*(w.norm()-(u.norm()-v.norm()))*(w.norm()+(u.norm()-v.norm()))*(u.norm()+(v.norm()-w.norm())))/4.0;
}

Heron's formula is numerically unstable. If you have a very "flat" triangle with small angles, the sum of the two small sides is almost the long side, so one of the terms gets very small. If, for example, a and b are the small sides,
(s - c)
will be very small, because
s = (a + b + c)/2
is nearly equal to c.
The wikipedia article about herons formula mentions a stable alternative:
Arrange the sides such that a > b > c and use
A = 1/4*sqrt((a + (b + c))*(c - (a - b))*(c + (a - b))*(a + (b - c)))

To 75 decimal places, the correct area of your triangle is
0.000000081992238711963087279421583920293974467992093148008322378721298327364.
If I replace the nine double constants you have with their decimal equivalents, I get
0.000000081992238711965902754749500279615357792172906541206211853522524016959
It would appear that you are not getting what you're expecting because you're expecting something unreasonable.

Any calculation involving subtraction will result in a loss of precision, if the values are at all close to each other. How many significant digits do you expect from this subtraction?
1.23456789012345
- 1.23456789000000
----------------
0.00000000012345
Both operands have 15 digits of precision, but the result only has 5.

Calculate the 3d distance between point and plane, C++

I'm using:
D = |ax + by + cz + d| / |n| where n is the normal to plane; a, b, c, d are the coefficients of the equation of the plane; x, y, z are the coordinates of the point from the plane.
To calculate the distance from a 3d point to a 3d plane. The issue that I'm having is that the distances in question are extremely small and this is causing the result( a double ) to be represented in scientific notation, which is not handled correctly in if statements. For example:
if( dist == 0 )
{
//Execute this
}
If dist is any scientific number the code inside the if statement is executed, even though dist is not 0. My question is, is there anyway the scientific number can be converted back into fixed notation to make it usable in if statements similar to these??
Im using VisualStudio 2010, C++.

Normally you would use some tolerance value to compare floating-point numbers:
#define EPSILON (1e-6)
// dist == 0.0?
if (dist < EPSILON) {
// ...
}
Or to compare to any other floating point v:
// dist == v?
if (fabs(dist - v) < EPSILON) {
// ...
}
Sure, you have to choose EPSILON according to your problem.

dist is not represented in scientific notation (unless you are storing it as a string) that's just how it is printed. As another minor point, it's usually a good idea to compare to a value or the same type. 0 is an integer, 0.0 is a double.
From what I can see from some quick tests, in order for you to be seeing dist == 0 as true, it would actually have to be zero. That means you have all the numbers down to DBL_MIN, which is 2.2250738585072014e-308 for a 64 bit IEEE754 fpu. More likely your maths is wrong, and it is actually zero. Check your numerator before you do the division.
What on earth is physically that size? Well if you are specifying the diameter of an electron in units of "the diameter of the universe", then that's only 3.2×10^-42. I'm not sure there is an easy way to visualize just how small doubles can be. I tried 1 / number of atoms in the universe and it still wasn't small enough.

Very fast 3D distance check?

Is there a way to do a quick and dirty 3D distance check where the results are rough, but it is very very fast? I need to do depth sorting. I use STL sort like this:
bool sortfunc(CBox* a, CBox* b)
{
return a->Get3dDistance(Player.center,a->center) <
b->Get3dDistance(Player.center,b->center);
}
float CBox::Get3dDistance( Vec3 c1, Vec3 c2 )
{
//(Dx*Dx+Dy*Dy+Dz*Dz)^.5
float dx = c2.x - c1.x;
float dy = c2.y - c1.y;
float dz = c2.z - c1.z;
return sqrt((float)(dx * dx + dy * dy + dz * dz));
}
Is there possibly a way to do it without a square root or possibly without multiplication?

You can leave out the square root because for all positive (or really, non-negative) numbers x and y, if sqrt(x) < sqrt(y) then x < y. Since you're summing squares of real numbers, the square of every real number is non-negative, and the sum of any positive numbers is positive, the square root condition holds.
You cannot eliminate the multiplication, however, without changing the algorithm. Here's a counterexample: if x is (3, 1, 1) and y is (4, 0, 0), |x| < |y| because sqrt(1*1+1*1+3*3) < sqrt(4*4+0*0+0*0) and 1*1+1*1+3*3 < 4*4+0*0+0*0, but 1+1+3 > 4+0+0.
Since modern CPUs can compute a dot product faster than they can actually load the operands from memory, it's unlikely that you would have anything to gain by eliminating the multiply anyway (I think the newest CPUs have a special instruction that can compute a dot product every 3 cycles!).
I would not consider changing the algorithm without doing some profiling first. Your choice of algorithm will heavily depend on the size of your dataset (does it fit in cache?), how often you have to run it, and what you do with the results (collision detection? proximity? occlusion?).

What I usually do is first filter by Manhattan distance
float CBox::Within3DManhattanDistance( Vec3 c1, Vec3 c2, float distance )
{
float dx = abs(c2.x - c1.x);
float dy = abs(c2.y - c1.y);
float dz = abs(c2.z - c1.z);
if (dx > distance) return 0; // too far in x direction
if (dy > distance) return 0; // too far in y direction
if (dz > distance) return 0; // too far in z direction
return 1; // we're within the cube
}
Actually you can optimize this further if you know more about your environment. For example, in an environment where there is a ground like a flight simulator or a first person shooter game, the horizontal axis is very much larger than the vertical axis. In such an environment, if two objects are far apart they are very likely separated more by the x and y axis rather than the z axis (in a first person shooter most objects share the same z axis). So if you first compare x and y you can return early from the function and avoid doing extra calculations:
float CBox::Within3DManhattanDistance( Vec3 c1, Vec3 c2, float distance )
{
float dx = abs(c2.x - c1.x);
if (dx > distance) return 0; // too far in x direction
float dy = abs(c2.y - c1.y);
if (dy > distance) return 0; // too far in y direction
// since x and y distance are likely to be larger than
// z distance most of the time we don't need to execute
// the code below:
float dz = abs(c2.z - c1.z);
if (dz > distance) return 0; // too far in z direction
return 1; // we're within the cube
}
Sorry, I didn't realize the function is used for sorting. You can still use Manhattan distance to get a very rough first sort:
float CBox::ManhattanDistance( Vec3 c1, Vec3 c2 )
{
float dx = abs(c2.x - c1.x);
float dy = abs(c2.y - c1.y);
float dz = abs(c2.z - c1.z);
return dx+dy+dz;
}
After the rough first sort you can then take the topmost results, say the top 10 closest players, and re-sort using proper distance calculations.

Here's an equation that might help you get rid of both sqrt and multiply:
max(|dx|, |dy|, |dz|) <= distance(dx,dy,dz) <= |dx| + |dy| + |dz|
This gets you a range estimate for the distance which pins it down to within a factor of 3 (the upper and lower bounds can differ by at most 3x). You can then sort on, say, the lower number. You then need to process the array until you reach an object which is 3x farther away than the first obscuring object. You are then guaranteed to not find any object that is closer later in the array.
By the way, sorting is overkill here. A more efficient way would be to make a series of buckets with different distance estimates, say [1-3], [3-9], [9-27], .... Then put each element in a bucket. Process the buckets from smallest to largest until you reach an obscuring object. Process 1 additional bucket just to be sure.
By the way, floating point multiply is pretty fast nowadays. I'm not sure you gain much by converting it to absolute value.

I'm disappointed that the great old mathematical tricks seem to be getting lost. Here is the answer you're asking for. Source is Paul Hsieh's excellent web site: http://www.azillionmonkeys.com/qed/sqroot.html . Note that you don't care about distance; you will do fine for your sort with square of distance, which will be much faster.
In 2D, we can get a crude approximation of the distance metric without a square root with the formula:
distanceapprox (x, y) =
which will deviate from the true answer by at most about 8%. A similar derivation for 3 dimensions leads to:
distanceapprox (x, y, z) =
with a maximum error of about 16%.
However, something that should be pointed out, is that often the distance is only required for comparison purposes. For example, in the classical mandelbrot set (z←z2+c) calculation, the magnitude of a complex number is typically compared to a boundary radius length of 2. In these cases, one can simply drop the square root, by essentially squaring both sides of the comparison (since distances are always non-negative). That is to say:
√(Δx2+Δy2) < d is equivalent to Δx2+Δy2 < d2, if d ≥ 0
I should also mention that Chapter 13.2 of Richard G. Lyons's "Understanding Digital Signal Processing" has an incredible collection of 2D distance algorithms (a.k.a complex number magnitude approximations). As one example:
Max = x > y ? x : y;
Min = x < y ? x : y;
if ( Min < 0.04142135Max )
|V| = 0.99 * Max + 0.197 * Min;
else
|V| = 0.84 * Max + 0.561 * Min;
which has a maximum error of 1.0% from the actual distance. The penalty of course is that you're doing a couple branches; but even the "most accepted" answer to this question has at least three branches in it.
If you're serious about doing a super fast distance estimate to a specific precision, you could do so by writing your own simplified fsqrt() estimate using the same basic method as the compiler vendors do, but at a lower precision, by doing a fixed number of iterations. For example, you can eliminate the special case handling for extremely small or large numbers, and/or also reduce the number of Newton-Rapheson iterations. This was the key strategy underlying the so-called "Quake 3" fast inverse square root implementation -- it's the classic Newton algorithm with exactly one iteration.
Do not assume that your fsqrt() implementation is slow without benchmarking it and/or reading the sources. Most modern fsqrt() library implementations are branchless and really damned fast. Here for example is an old IBM floating point fsqrt implementation. Premature optimization is, and always will be, the root of all evil.

Note that for 2 (non-negative) distances A and B, if sqrt(A) < sqrt(B), then A < B. Create a specialized version of Get3DDistance() (GetSqrOf3DDistance()) that does not call sqrt() that would be used only for the sortfunc().

If you worry about performance, you should also take care of the way you send your arguments:
float Get3dDistance( Vec3 c1, Vec3 c2 );
implies two copies of Vec3 structure. Use references instead:
float Get3dDistance( Vec3 const & c1, Vec3 const & c2 );

You could compare squares of distances instead of the actual distances, since d2 = (x1-x2)2 + (y1-y2)2+ (z1-z2)2. It doesn't get rid of the multiplication, but it does eliminate the square root operation.

How often are the input vectors updated and how often are they sorted? Depending on your design, it might be quite efficient to extend the "Vec3" class with a pre-calculated distance and sort on that instead. Especially relevant if your implementation allows you to use vectorized operations.
Other than that, see the flipcode.com article on approximating distance functions for a discussion on yet another approach.

Depending slightly on the number of points that you are being used to compare with, what is below is pretty much guaranteed to be the get the list of points in approximate order assuming all points change at all iteration.
1) Rewrite the array into a single list of Manhattan distances with
out[ i ] = abs( posn[ i ].x - player.x ) + abs( posn[ i ].y - player.y ) + abs( posn[ i ].z - player.z );
2) Now you can use radix sort on floating point numbers to order them.
Note that in practice this is going to be a lot faster than sorting the list of 3d positions because it significantly reduces the memory bandwidth requirements in the sort operation which all of the time is going to be spend and in which unpredictable accesses and writes are going to occur. This will run on O(N) time.
If many of the points are stationary at each direction there are far faster algorithms like using KD-Trees, although implementation is quite a bit more complex and it is much harder to get good memory access patterns.

If this is simply a value for sorting, then you can swap the sqrt() for a abs(). If you need to compare distances against set values, get the square of that value.
E.g. instead of checking sqrt(...) against a, you can compare abs(...) against a*a.

You may want to consider caching the distance between the player and the object as you calculate it, and then use that in your sortfunc. This would depend upon how many times your sort function looks at each object, so you might have to profile to be sure.
What I'm getting at is that your sort function might do something like this:
compare(a,b);
compare(a,c);
compare(a,d);
and you would calculate the distance between the player and 'a' every time.
As others have mentioned, you can leave out the sqrt in this case.

If you could center your coordinates around the player, use spherical coordinates? Then you could sort by the radius.
That's a big if, though.

If your operation happens a lot, it might be worth to put it into some 3D data structure. You probably need the distance sorting to decide which object is visible, or some similar task. In order of complexity you can use:
Uniform (cubic) subdivision
Divide the used space into cells, and assign the objects to the cells. Fast access to element, neighbours are trivial, but empty cells take up a lot of space.
Quadtree
Given a threshold, divide used space recursively into four quads until less then threshold number of object is inside. Logarithmic access element if objects don't stack upon each other, neighbours are not hard to find, space efficient solution.
Octree
Same as Quadtree, but divides into 8, optimal even if objects are above each other.
Kd tree
Given some heuristic cost function, and a threshold, split space into two halves with a plane where the cost function is minimal. (Eg.: same amount of objects at each side.) Repeat recursively until threshold reached. Always logarithmic, neighbours are harder to get, space efficient (and works in all dimensions).
Using any of the above data structures, you can start from a position, and go from neighbour to neighbour to list the objects in increasing distance. You can stop at desired cut distance. You can also skip cells that cannot be seen from the camera.
For the distance check, you can do one of the above mentioned routines, but ultimately they wont scale well with increasing number of objects. These can be used to display data that takes hundreds of gigabytes of hard disc space.

Calculate the gradient for an histogram in c++

I calculated the histogram(a simple 1d array) for an 3D grayscale Image.
Now I would like to calculate the gradient for the this histogram at each point. So this would actually mean I have to calculate the gradient for a 1D function at certain points. However I do not have a function. So how can I calculate it with concrete x and y values?
For the sake of simplicity could you probably explain this to me on an example histogram - for example with the following values (x is the intensity, and y the frequency of this intensity):
x1 = 1; y1 = 3
x2 = 2; y2 = 6
x3 = 3; y3 = 8
x4 = 4; y4 = 5
x5 = 5; y5 = 9
x6 = 6; y6 = 12
x7 = 7; y7 = 5
x8 = 8; y8 = 3
x9 = 9; y9 = 5
x10 = 10; y10 = 2
I know that this is also a math problem, but since I need to solve it in c++ I though you could help me here.
Thank you for your advice
Marc

I think you can calculate your gradient using the same approach used in image border detection (which is a gradient calculus). If your histogram is in a vector you can calculate an approximation of the gradient as*:
for each point in the histogram compute
gradient[x] = (hist[x+1] - hist[x])
This is a very simple way to do it, but I'm not sure if is the most accurate.
approximation because you are working with discrete data instead of continuous
Edited:
Other operators will may emphasize small differences (small gradients will became more emphasized). Roberts algorithm derives from the derivative calculus:
lim delta -> 0 = f(x + delta) - f(x) / delta
delta tends infinitely to 0 (in order to avoid 0 division) but is never zero. As in computer's memory this is impossible, the smallest we can get of delta is 1 (because 1 is the smallest distance from to points in an image (or histogram)).
Substituting
lim delta -> 0 to lim delta -> 1
we get
f(x + 1) - f(x) / 1 = f(x + 1) - f(x) => vet[x+1] - vet[x]

Two generally approaches here:
a discrete approximation to the derivative
take the real derivative of a fitted function
In the first case try:
g = (y_(i+1) - y_(i-1))/2*dx
at all the points except the ends, or one of
g_left-end = (y_(i+1) - y_i)/dx
g_right-end = (y_i - y_(i-1))/dx
where dx is the spacing between x points. (Unlike the equally correct definition Andres suggested, this one is symmetric. Whether it matters or not depends on you use case.)
In the second case, fit a spline to your data[*], and ask the spline library the derivative at the point you want.
[*] Use a library! Do not implement this yourself unless this is a learning project. I'd use ROOT because I already have it on my machine, but it is a pretty heavy package just to get a spline...
Finally, if you data is noisy, you ma want to smooth it before doing slope detection. That was you avoid chasing the noise, and only look at large scale slopes.

Take some squared paper and draw on it your histogram. Draw also vertical and horizontal axes through the 0,0 point of your histogram.
Take a straight edge and, at each point you are interested in, rotate the straight edge until it accords with your idea of what the gradient at that point is. It is most important that you do this, your definition of gradient is the one you want.
Once the straight edge is at the angle you desire draw a line at that angle.
Drop perpendiculars from any 2 points on the line you just drew. It will be easier to take the following step if the horizontal distance between the 2 points you choose is about 25% or more of the width of your histogram. From the same 2 points draw horizontal lines to intersect the vertical axis of your histogram.
Your lines now define an x-distance and a y-distance, ie the length of the horizontal/ vertical (respectively) axes marked out by their intersections with the perpendiculars/horizontal lines. The gradient you want is the y-distance divided by the x-distance.
Now, to translate this into code is very straightforward, apart from step 2. You have to define what the criteria are for determining what the gradient at any point on the histogram is. Simple choices include:
a) at each point, set down your straight edge to pass through the point and the next one to its right;
b) at each point, set down your straight edge to pass through the point and the next one to its left;
c) at each point, set down your straight edge to pass through the point to the left and the point to the right.
You may want to investigate more complex choices such as fitting a curve (such as a quadratic or higher-order polynomial) through a number of points on your histogram and using the derivative of that to represent the gradient.
Until you understand the question on paper avoid coding in C++ or anything else. Once you do understand it, coding should be trivial.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js