Oh where has my precision gone with OpenMesh vector arithmetic? - c++

Using doubles I would expect to have about 15 significant decimal digits of precision. I know that many decimal numbers are not exactly representable in floating point notation, so I would get an approximation for 1/3, for example. However, using a double I would expect an approximation that was correct to about 15 significant digits. I would also expect to retain that level of accuracy when doing arithmetic.
However, in the following example, I try to calculate the area of a triangle using Heron's formula and OpenMesh::Vec3d (which is backed by OpenMesh::VectorDataT<double,3>), and I end up with a result that is only accurate to 5 significant digits.
The correct result is area = 8.19922e-8, but I'm getting area=8.1992238711962083e-8. Any ideas where this is coming from?
The suggestion that this might result from the instability in Heron's Formula is a good one, but unfortunately is not the case in this example. I have added code which calculates the stable variation on Heron for those who might be interested. In this example, u.norm()>v.norm()>w.norm().
#include <OpenMesh/Core/Mesh/PolyMesh_ArrayKernelT.hh>
#include <cmath>
#include <cstdio>

int main()
{
    //triangle vertices
    OpenMesh::Vec3d x(0.051051, 0.057411, 0.001355);
    OpenMesh::Vec3d y(0.050981, 0.057337, -0.000678);
    OpenMesh::Vec3d z(0.050949, 0.057303, 0.0);
    //edge vectors
    OpenMesh::Vec3d u = x-y;
    OpenMesh::Vec3d v = x-z;
    OpenMesh::Vec3d w = y-z;
    //Heron's Formula
    double semiP = (u.norm() + v.norm() + w.norm())/2.0;
    double area = std::sqrt(semiP * (semiP - u.norm()) * (semiP - v.norm()) * (semiP - w.norm()) );
    //Heron's Formula for small angles (requires u.norm() >= v.norm() >= w.norm())
    double areaSmall = std::sqrt((u.norm() + (v.norm()+w.norm()))*(w.norm()-(u.norm()-v.norm()))*(w.norm()+(u.norm()-v.norm()))*(u.norm()+(v.norm()-w.norm())))/4.0;
    std::printf("area      = %.17g\nareaSmall = %.17g\n", area, areaSmall);
    return 0;
}

Heron's formula is numerically unstable. If you have a very "flat" triangle with small angles, the sum of the two small sides is almost the long side, so one of the terms gets very small. If, for example, a and b are the small sides, (s - c) will be very small, because s = (a + b + c)/2 is nearly equal to c.
The Wikipedia article about Heron's formula mentions a numerically stable alternative. Arrange the sides such that a ≥ b ≥ c and use
A = 1/4*sqrt((a + (b + c))*(c - (a - b))*(c + (a - b))*(a + (b - c)))
The parentheses are part of the method and must not be rearranged.
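A minimal C++ sketch comparing the textbook formula, the stable variant above, and half the length of the cross product u × v (a third method, not from the question, that never forms the side lengths). It uses a small hypothetical Vec3 struct instead of OpenMesh::Vec3d so it compiles on its own. For the question's triangle the three results agree to well within one part in 10^9, which supports the asker's point that the instability of Heron's formula is not what limits the result here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical stand-in for OpenMesh::Vec3d so the sketch compiles on its own.
struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 p, Vec3 q) { return {p.x - q.x, p.y - q.y, p.z - q.z}; }
double norm(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Textbook Heron: s - a loses digits when s is close to one of the sides.
double heronNaive(double a, double b, double c)
{
    double s = (a + b + c) / 2.0;
    return std::sqrt(s * (s - a) * (s - b) * (s - c));
}

// Stable Heron: sort so that a >= b >= c and keep the parentheses exactly as written.
double heronStable(double a, double b, double c)
{
    double sides[3] = {a, b, c};
    std::sort(sides, sides + 3, std::greater<double>());
    a = sides[0];
    b = sides[1];
    c = sides[2];
    assert(a >= b && b >= c);
    return 0.25 * std::sqrt((a + (b + c)) * (c - (a - b)) * (c + (a - b)) * (a + (b - c)));
}

// Area as half the length of the cross product of two edges from the same vertex.
double crossArea(Vec3 u, Vec3 v)
{
    Vec3 n{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    return 0.5 * norm(n);
}
```

With u = x − y and v = x − z from the question, crossArea(u, v) is the triangle's area, since both edges start at vertex x.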

To 75 decimal places, the correct area of your triangle is
If I replace the nine double constants you have with their decimal equivalents, I get
It would appear that you are not getting what you're expecting because you're expecting something unreasonable.

Any calculation involving subtraction will lose precision if the values are close to each other. How many significant digits do you expect from this subtraction?
- 1.23456789000000
Both operands have 15 digits of precision, but the result only has 5.
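One way to quantify this is a rough error bound: each operand is stored with a relative error of at most DBL_EPSILON/2, so the absolute error of a − b is about DBL_EPSILON·max(|a|, |b|), and dividing by |a − b| gives the relative error of the result. The sketch below uses a hypothetical first operand of 1.23456789012345 (only the second operand appears above). For that pair the bound comes out near 2·10^-6, i.e. roughly 5 to 6 trustworthy digits.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Rough bound on the relative error of (a - b) caused only by rounding a and b
// to double (each off by at most DBL_EPSILON / 2, relative). The subtraction
// itself is exact when the operands are within a factor of two (Sterbenz lemma).
double cancellationBound(double a, double b)
{
    assert(a != b);
    return DBL_EPSILON * std::fmax(std::fabs(a), std::fabs(b)) / std::fabs(a - b);
}
```

The closer a and b are, the larger |a − b| shrinks relative to the operands, and the bound grows accordingly.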


c++ half library has lower precision for positive numbers

I know that I am using a capability not built into C++; however, this library seems to be so commonly used that I am surprised to see this error pop up.
For those of you who do not know the library, it can be found here. Essentially, it adds support for 16-bit (lower precision) floating point numbers.
My problem is that the precision of half floats appears to diminish for positive numbers.
In this code, I am generating a bunch of points to be rendered to the screen. {xs1, ys1} holds the sigmoid calculated in float precision. {xs3, ys3} holds the same values cast to half precision.
vector<float> xs1, ys1, xs3, ys3;
int res = 200000;
for (int i = 0; i < res; i++) {
    float perc = float(i) / float(res);
    float fx = ((perc - 0.5) * 2.0)*8.0;
    half hx = half(fx);
    float fy = MFunctions::sigmoid(fx);
    half hy = half(fy);
    xs1.push_back(fx);
    ys1.push_back(fy);
    xs3.push_back(float(hx));
    ys3.push_back(float(hy));
}
Here are the results (zoomed-in portions of the generated graph, with a window width of 2.2 and a window height of 0.02 units):
In the float-precision graph {xs1, ys1}, both corners of the sigmoid function are smooth.
However, in the half-precision graph {xs3, ys3}, the corner on the positive x axis shows a stepping effect, while the corner on the negative x axis shows a lower-resolution but smooth curve.
I am not sure why this is happening since the only difference between positive and negative numbers should be a sign bit.
Is there something wrong that I am doing or is this a flaw in the half library?
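Sigmoid function output values lie in (0, 1), so what you see is normal. A half float has a fixed number of mantissa bits, so the gap between neighbouring representable values grows with the magnitude of the value: in the bottom picture the values are around 1, so the precision is much lower than around 0. The sign of x is not the issue; the magnitude of the output is.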
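To put numbers on this: IEEE 754 half precision stores 10 mantissa bits, so between 2^(e−1) and 2^e adjacent values are 2^(e−11) apart, with a fixed spacing of 2^−24 in the subnormal range below 2^−14. The helper below is a standalone sketch that computes this spacing in plain C++ without the half library (halfSpacing is a hypothetical name). Near y = 0.99 it gives 2^−11 ≈ 0.00049, so a window 0.02 units tall holds only about 41 distinct output levels, while near y = 0.01 the spacing is 2^−17, 64 times finer.

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent IEEE 754 binary16 ("half") values near v.
double halfSpacing(double v)
{
    const double mag = std::fabs(v);
    assert(mag <= 65504.0);              // largest finite half
    if (mag < std::ldexp(1.0, -14))      // subnormal range (and zero)
        return std::ldexp(1.0, -24);
    int e = 0;
    std::frexp(mag, &e);                 // mag = m * 2^e with m in [0.5, 1)
    return std::ldexp(1.0, e - 11);      // 10 stored mantissa bits
}
```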

How to approximate Euclidean distance on the integer plane, without overflow?

I'm working on a platform that has only integer arithmetic. The application uses geographic information, and I'm representing points by (x, y) coordinates where x and y are distances measured in meters.
As an approximation, I want to compute the Euclidean distance between two points. But to do this I have to square distances, and with 32-bit integers, the largest distance I can represent is 32 kilometers. Not good.
My needs are more on the order of 1000 kilometers. But I'd like to be able to resolve distances on a scale smaller than 30 meters.
Hence my question: how can I compute Euclidean distance, using only integer arithmetic, without overflow, on distances whose squares don't fit in a single word?
ETA: I would like to be able to compute distances, but I might settle for being able to compare them.
Perhaps comparing the octagonal distance approximation would be sufficient?
Slightly more up to date is this article on fast approximate distance functions.
I would recommend using fixed-point calculation with integers; then the distance approximation is not too complicated.
fixed point calculation
distance approximation
Fast Approximate Distance Functions by Rafael Baptista
The first step is to choose a fixed-point representation that fits our needs.
For example, if we need a range of 1000 km with 1 m resolution, we can use 20 bits, since 2^20 = 1,048,576. That leaves around 10 bits of a 32-bit integer for fractions.
Then we need to implement the approximation we choose.
For example, if we select the following approximation:
h ≈ b (1 + 0.337 (a/b)) = b + 0.337 a, assuming 0 ≤ a ≤ b
we can implement it as follows:
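#include <cstdint>
int32_t approx_distance(int32_t x1, int32_t y1, int32_t x2, int32_t y2)
{
    int32_t dx = (x1 > x2 ? x1 - x2 : x2 - x1);
    int32_t dy = (y1 > y2 ? y1 - y2 : y2 - y1);
    int32_t a = dx > dy ? dy : dx;   /* smaller component */
    int32_t b = dx > dy ? dx : dy;   /* larger component */
    int32_t h = b + (345 * a >> 10); /* 345.088 = 0.337 * 2^10 */
    return h;
}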
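How good is b + 0.337·a? The following offline check (double precision on a desktop machine, not code for the integer-only target) samples t = a/b over [0, 1] and records the worst relative over- and under-estimate against the exact √(a² + b²); ErrorRange and approximationError are illustrative names. Analytically, the largest overestimate is √(1 + 0.337²) − 1 ≈ 5.5% at a/b = 0.337, and the largest underestimate is 1 − 1.337/√2 ≈ 5.5% at a = b, so at 1000 km the error can reach tens of kilometres.

```cpp
#include <cassert>
#include <cmath>

struct ErrorRange { double maxOver, maxUnder; };

// Worst relative over- and under-estimate of h = b + k*a against sqrt(a^2 + b^2)
// for 0 <= a <= b, sampled at `samples` evenly spaced ratios t = a/b (with b = 1).
ErrorRange approximationError(double k, int samples)
{
    assert(samples >= 2);
    ErrorRange r{0.0, 0.0};
    for (int i = 0; i < samples; ++i) {
        double t = double(i) / (samples - 1);
        double ratio = (1.0 + k * t) / std::sqrt(1.0 + t * t);
        r.maxOver = std::fmax(r.maxOver, ratio - 1.0);
        r.maxUnder = std::fmax(r.maxUnder, 1.0 - ratio);
    }
    return r;
}
```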
About overflow (here <+m.n> denotes a non-negative number with m integer bits and n fractional bits):
Adding two positive <+20.0> numbers gives at most a <+21.0> number. That is OK.
The multiplication is also safe as long as one factor is in the range -1..1, because the result then stays within the range of the other factor. In our case <+20.0> * <+0.10> gives a <+20.10> number, which we shift back to <+20.0>.
There is one step where we need to pay attention: during the multiplication we temporarily get a <+20.10> number, which at 30 bits is already close to our 32-bit limit.
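The overflow argument can also be checked at compile time. This sketch assumes, as above, that a and b are non-negative and below 2^20; it uses 64-bit constants only to evaluate the bound, while the runtime code stays in int32_t. The worst-case product 345 · (2^20 − 1) = 361,758,375 needs 29 bits, so it fits in a signed 32-bit integer with a little over two bits to spare.

```cpp
#include <cassert>
#include <cstdint>

// Worst case of h = b + (345 * a >> 10), assuming a and b are in [0, 2^20).
constexpr int64_t kMaxComponent = (int64_t{1} << 20) - 1;   // 1,048,575
constexpr int64_t kWorstProduct = 345 * kMaxComponent;       // largest 345 * a
constexpr int64_t kWorstResult = kMaxComponent + (kWorstProduct >> 10);

static_assert(kWorstProduct <= INT32_MAX, "345 * a must fit in int32_t");
static_assert(kWorstResult <= INT32_MAX, "h must fit in int32_t");
```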
Exact calculation
We can also calculate the exact distance using the following consideration:
h = b sqrt(1 + (a/b)^2), assuming 0 < b ≤ a
In this case we also need to calculate the square root:
square root
If a/b is still significantly larger than one, or too large to square, we can simplify the calculation to:
h = a
See the implementation here
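A sketch of the exact approach using only 32-bit unsigned arithmetic, under these assumptions: components are below 2^20, the ratio is kept in fixed point with 11 fractional bits, and the square root is a standard bit-by-bit integer square root. One design change from the text: it factors out the larger component, h = max · sqrt(1 + (min/max)^2), so the ratio never exceeds 1 and the h = a fallback is not needed. isqrt32 and fixed_hypot are hypothetical names. Because the ratio has only 11 fractional bits, the result is rounded down, with an error of roughly 0.1% of the distance plus up to one unit; more precision would need wider intermediates.

```cpp
#include <cassert>
#include <cstdint>

// Bit-by-bit integer square root: floor(sqrt(n)) using 32-bit operations only.
uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;  // largest power of four in 32 bits
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

// Euclidean length of (dx, dy) as larger * sqrt(1 + r^2), r = smaller / larger <= 1.
// Assumes dx, dy < 2^20; every intermediate stays below 2^32.
uint32_t fixed_hypot(uint32_t dx, uint32_t dy)
{
    assert(dx < (1u << 20) && dy < (1u << 20));
    uint32_t larger = dx > dy ? dx : dy;
    uint32_t smaller = dx > dy ? dy : dx;
    if (larger == 0)
        return 0;
    uint32_t r = (smaller << 11) / larger;    // ratio in Q11: 0..2048
    uint32_t onePlusR2 = (1u << 22) + r * r;  // 1 + r^2 in Q22: at most 2^23
    uint32_t s = isqrt32(onePlusR2);          // sqrt of Q22 is Q11: 2048..2896
    return (larger * s) >> 11;                // larger * s < 2^20 * 2897 < 2^32
}
```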
I would leave the square root out of play, so that I can approximate the Euclidean distance. However, when comparing distances, this approach gives you 100% accuracy, since the comparison gives the same result whether or not you square the (non-negative) distances.
I am pretty sure about that, since I have used that approach when searching for nearest neighbours in high-dimensional spaces. You can check my code and the theory in kd-GeRaF.
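If only comparisons are needed, squared distances can be compared exactly even on a 32-bit-only platform by carrying each square in two 32-bit words. The sketch below is one way to do that (a swapped-in technique, not the answerer's code): it builds a 64-bit product from four 16×16-bit partial products, adds with an explicit carry, and compares high words first. The Wide struct and the function names are hypothetical; it assumes components below 2^31 so each sum of squares stays below 2^64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: an unsigned 64-bit value held in two 32-bit words.
struct Wide { uint32_t hi, lo; };

// Full 32 x 32 -> 64-bit product from 16-bit halves; each partial product fits in 32 bits.
Wide mulWide(uint32_t a, uint32_t b)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;
    uint32_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    // Coefficient of 2^16: carry out of p00 plus low halves of the cross terms (< 3 * 2^16).
    uint32_t mid = (p00 >> 16) + (p01 & 0xFFFFu) + (p10 & 0xFFFFu);
    uint32_t lo = (p00 & 0xFFFFu) | (mid << 16);
    uint32_t hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);
    return {hi, lo};
}

// 64-bit addition with an explicit carry out of the low word.
Wide addWide(Wide x, Wide y)
{
    uint32_t lo = x.lo + y.lo;             // wraps modulo 2^32
    uint32_t carry = lo < x.lo ? 1u : 0u;  // wrap-around means a carry
    return {x.hi + y.hi + carry, lo};
}

bool lessWide(Wide x, Wide y)
{
    return x.hi < y.hi || (x.hi == y.hi && x.lo < y.lo);
}

// True if (dx1, dy1) is strictly shorter than (dx2, dy2), compared exactly via squared lengths.
bool shorter(uint32_t dx1, uint32_t dy1, uint32_t dx2, uint32_t dy2)
{
    const uint32_t limit = 1u << 31;
    assert(dx1 < limit && dy1 < limit && dx2 < limit && dy2 < limit);
    Wide d1 = addWide(mulWide(dx1, dx1), mulWide(dy1, dy1));
    Wide d2 = addWide(mulWide(dx2, dx2), mulWide(dy2, dy2));
    return lessWide(d1, d2);
}
```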

The result of own double precision cos() implemention in a shader is NaN, but works well on the CPU. What is going wrong?

As I said, I want to implement my own double-precision cos() function in a GLSL compute shader, because there is only a built-in version for float.
This is my code:
double faculty[41]; //values are calculated at the beginning of main()

double myCOS(double x)
{
    double sum, tempExp, sign;
    sum = 1.0;
    tempExp = 1.0;
    sign = -1.0;
    for (int i = 1; i <= 30; i++) {
        tempExp *= x;
        if (i % 2 == 0) {
            sum = sum + (sign * (tempExp / faculty[i]));
            sign *= -1.0;
        }
    }
    return sum;
}
The result of this code is that sum turns out to be NaN on the shader, but on the CPU the algorithm works well.
I tried to debug this code too and I got the following information:
faculty[i] is positive and not zero for all entries
tempExp is positive in each step
none of the other variables are NaN during each step
the first time sum is NaN is at the step with i=4
And now my question: what exactly can go wrong if each variable is a number and nothing is divided by zero, especially when the algorithm works on the CPU?
Let me guess:
First you determined the problem is in the loop, and you use only the following operations: +, *, /.
The rules for generating NaN from these operations are:
The divisions 0/0 and ±∞/±∞
The multiplications 0×±∞ and ±∞×0
The additions ∞ + (−∞), (−∞) + ∞ and equivalent subtractions
You ruled out the possibility of 0/0 and ±∞/±∞ by stating that faculty[] is correctly initialized.
The variable sign is always 1.0 or -1.0, so it cannot generate the NaN through the * operation.
What remains is the + operation, if tempExp ever becomes ±∞.
So probably x is too large on entry to your function and tempExp overflows to ±∞, which makes sum ±∞ too. At the next term you then trigger the NaN-generating operation ∞ + (−∞). This is because you multiply one side of the addition by sign, and sign switches between positive and negative for each term.
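The mechanism can be reproduced on the CPU with this C++ port of the shader loop (it builds its own factorial table instead of the shader's faculty[] array; the name naiveCos is illustrative). With x = 1e20, x^16 overflows to +inf, sum becomes +inf, and the next even term adds −inf, producing NaN. Whether your shader hits exactly this case depends on the values of x you pass in; for moderately large x the 30-term series stays finite but is still far from cos(x).

```cpp
#include <cassert>
#include <cmath>

// C++ port of the shader's myCOS loop with a locally built factorial table.
double naiveCos(double x)
{
    double faculty[31];
    faculty[0] = 1.0;
    for (int i = 1; i <= 30; ++i)
        faculty[i] = faculty[i - 1] * i;

    double sum = 1.0, tempExp = 1.0, sign = -1.0;
    for (int i = 1; i <= 30; ++i) {
        tempExp *= x;  // x^i: becomes +-inf once |x|^i exceeds DBL_MAX
        if (i % 2 == 0) {
            // Once tempExp is infinite, successive terms are +inf and -inf,
            // and inf + (-inf) yields NaN.
            sum += sign * (tempExp / faculty[i]);
            sign = -sign;
        }
    }
    return sum;
}
```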
You're trying to approximate cos(x) around 0.0. So you should use the properties of the cos() function to reduce your input value to a value near 0.0. Ideally in the range [0, pi/4]. For instance, remove multiples of 2*pi, and get the values of cos() in [pi/4, pi/2] by computing sin(x) around 0.0 and so on.
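A minimal C++ sketch of that idea, simplified: it reduces only to [−π, π] with std::remainder (which computes x − n·2π exactly for the double nearest 2π) and then sums the Taylor series with each term derived from the previous one, so no factorial table or large power of x is ever formed. It is not the [0, π/4] sin/cos split described above, and the name reducedCos is hypothetical. For very large x the result stays finite but, as the next answer explains, loses its meaning because 2π is not represented exactly.

```cpp
#include <cassert>
#include <cmath>

// cos(x) via range reduction to [-pi, pi] followed by a Taylor series.
double reducedCos(double x)
{
    assert(std::isfinite(x));
    const double twoPi = 6.283185307179586;     // 2*pi rounded to double
    const double r = std::remainder(x, twoPi);  // exact x - n*twoPi, |r| <= pi
    const double r2 = r * r;
    double term = 1.0, sum = 1.0;
    for (int k = 1; k <= 20; ++k) {
        // term_k = (-1)^k * r^(2k) / (2k)!, computed from term_{k-1}
        term *= -r2 / ((2.0 * k - 1.0) * (2.0 * k));
        sum += term;
    }
    return sum;
}
```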
What can go dramatically wrong is a loss of precision. cos(x) is usually implemented by range reduction followed by a dedicated implementation for the range [0, pi/2]. Range reduction uses cos(x+2*pi) = cos(x). But this range reduction isn't perfect. For starters, pi cannot be represented exactly in finite-precision arithmetic.
Now what happens if you try something as absurd as cos(1<<30) ? It's quite possible that the range reduction algorithm introduces an error in x that's larger than 2*pi, in which case the outcome is meaningless. Returning NaN in such cases is reasonable.

Calculate the 3d distance between point and plane, C++

I'm using
D = |ax + by + cz + d| / |n|, where n = (a, b, c) is the normal to the plane, a, b, c, d are the coefficients of the plane equation, and x, y, z are the coordinates of the point,
to calculate the distance from a 3D point to a 3D plane. The issue that I'm having is that the distances in question are extremely small, and this is causing the result (a double) to be represented in scientific notation, which is not handled correctly in if statements. For example:
if (dist == 0) {
    //Execute this
}
If dist is any number shown in scientific notation, the code inside the if statement is executed, even though dist is not 0. My question is: is there any way the scientific number can be converted back into fixed notation to make it usable in if statements like these?
I'm using Visual Studio 2010, C++.
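For reference, the formula above in C++. This sketch (pointPlaneDistance is a hypothetical name) returns an ordinary double: a value such as 1e-300 is still a normal number that compares unequal to zero, and "scientific notation" only describes how it is printed.

```cpp
#include <cassert>
#include <cmath>

// Distance from point (x, y, z) to the plane a*x + b*y + c*z + d = 0,
// i.e. |a*x + b*y + c*z + d| / |n| with normal n = (a, b, c).
double pointPlaneDistance(double a, double b, double c, double d,
                          double x, double y, double z)
{
    const double nLen = std::sqrt(a * a + b * b + c * c);
    assert(nLen > 0.0);  // (a, b, c) must be a non-zero normal
    return std::fabs(a * x + b * y + c * z + d) / nLen;
}
```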
Normally you would use some tolerance value to compare floating-point numbers:
#define EPSILON (1e-6)
// dist == 0.0?
if (dist < EPSILON) {
    // ...
}
Or to compare to any other floating point value v:
// dist == v?
if (fabs(dist - v) < EPSILON) {
    // ...
}
Of course, you have to choose EPSILON according to your problem.
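A common refinement, not part of this answer, combines an absolute tolerance (for values near zero) with a relative one (for large values), so a single EPSILON does not have to fit every magnitude. The function name nearlyEqual and the tolerances in the usage below are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// True if x and y differ by at most absTol, or by at most relTol times the
// larger magnitude. absTol handles values near zero; relTol handles large ones.
bool nearlyEqual(double x, double y, double absTol, double relTol)
{
    assert(absTol >= 0.0 && relTol >= 0.0);
    const double diff = std::fabs(x - y);
    if (diff <= absTol)
        return true;
    return diff <= relTol * std::fmax(std::fabs(x), std::fabs(y));
}
```

For the distance check above, a call might look like nearlyEqual(dist, 0.0, 1e-9, 1e-9).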
dist is not represented in scientific notation (unless you are storing it as a string); that's just how it is printed. As another minor point, it's usually a good idea to compare to a value of the same type: 0 is an integer, 0.0 is a double.
From what I can see from some quick tests, for dist == 0 to be true, dist would actually have to be zero: every nonzero double, down to DBL_MIN (2.2250738585072014e-308 for a 64-bit IEEE 754 FPU) and the even smaller subnormals below it, compares unequal to zero. More likely your maths is wrong, and the result is actually zero. Check your numerator before you do the division.
What on earth is physically that size? Well if you are specifying the diameter of an electron in units of "the diameter of the universe", then that's only 3.2×10^-42. I'm not sure there is an easy way to visualize just how small doubles can be. I tried 1 / number of atoms in the universe and it still wasn't small enough.

Why does division yield a vastly different result than multiplication by a fraction in floating points

I understand why floating point numbers can't be compared for exact equality, and I know about the mantissa and exponent binary representation, but I'm no expert, and today I came across something I don't get.
Namely, let's say you have something like:
float denominator, numerator, resultone, resulttwo;
// ... numerator and denominator are assigned somewhere
resultone = numerator / denominator;
float buff = 1 / denominator;
resulttwo = numerator * buff;
To my knowledge, different floating-point operations can yield slightly different results, and this is not unusual. But in some edge cases these two results seem to be vastly different. To be more specific, in my GLSL code calculating the Beckmann facet slope distribution for the Cook-Torrance lighting model:
float a = 1 / (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
float b = clampedCosHalfNormal * clampedCosHalfNormal - 1.0;
float c = facetSlopeRMS * facetSlopeRMS * clampedCosHalfNormal * clampedCosHalfNormal;
facetSlopeDistribution = a * exp(b/c);
yields very very different results to
float a = (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
facetSlopeDistribution = exp(b/c) / a;
Why is that? The second form of the expression is the problematic one.
If I, say, try to add the second form of the expression to a color, I get black, even though the expression should always evaluate to a positive number. Am I getting an infinity? A NaN? If so, why?
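One concrete way the two forms can diverge is overflow of the intermediate reciprocal. This C++ sketch assumes IEEE 754 single precision with subnormals enabled (typical CPU defaults without fast-math options; GPUs often flush subnormals to zero, which changes the details). It shows that 1.0f / d overflows to +inf once d is below about 2.9e-39, after which n * (1/d) is inf, or NaN when n is 0, while n / d stays finite. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// resultone-style: a single rounding of n / d.
float viaDivision(float n, float d) { return n / d; }

// resulttwo-style: the reciprocal is formed first and can overflow on its own.
float viaReciprocal(float n, float d)
{
    float buff = 1.0f / d;  // +inf if d is below about 2.9e-39 (1 / FLT_MAX)
    return n * buff;        // positive n times inf is inf; 0 times inf is NaN
}
```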
I didn't go through your mathematics in detail, but you must be aware that small errors are easily amplified by all these powers and exponentials. You should try to substitute every variable var with var + e(var) (on paper, yes) and derive an expression for the total error, without simplifying in between steps, because that's where the error comes from!
This is also a very common problem in computational fluid dynamics, where you can observe things like 'numeric diffusion' if your grid isn't properly aligned with the simulated flow.
So get a clear grip on where the biggest errors come from, and rewrite equations where possible to minimize the numeric error.
edit: to clarify, an example
Say you have some variable x and an expression y=exp(x). The error in x is denoted e(x) and is small compared to x (say e(x)/x < 0.0001, but note that this depends on the type you are using). Then you could say that
e(y) = y(x+e(x)) - y(x)
e(y) ~ dy/dx * e(x) (for small e(x))
e(y) = exp(x) * e(x)
So there's a magnification of the absolute error of exp(x), meaning that around x=0 there's really no issue (not a surprise, since at that point the slope of exp(x) equals that of x) , but for big x you will notice this.
The relative error would then be
e(y)/y = e(y)/exp(x) = e(x)
whilst the relative error in x was
e(x)/x
so the relative error has been multiplied by a factor of x.
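A quick numerical check of this result, as a C++ sketch (the function name and the perturbation size of 1e-7, roughly float's machine epsilon, are assumptions): perturb x by a relative amount d, measure the relative change in exp(x), and divide by d. The ratio should come out close to x.

```cpp
#include <cassert>
#include <cmath>

// (relative change of exp(x)) / (relative change of x) for x -> x * (1 + d).
// The derivation above predicts a value of about x.
double expErrorAmplification(double x, double d)
{
    assert(d != 0.0);
    const double y = std::exp(x);
    const double yPerturbed = std::exp(x * (1.0 + d));
    return ((yPerturbed - y) / y) / d;
}
```

For x = 50 the relative error grows about fifty-fold, which is why large arguments to exp() in a shader deserve a second look.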