What is the optimum epsilon/dx value to use within the finite difference method? - c++

double MyClass::dx = ?????;
double MyClass::f(double x)
{
    return 3.0*x*x*x - 2.0*x*x + x - 5.0;
}
double MyClass::fp(double x) // derivative of f(x), that is f'(x)
{
    return (f(x + dx) - f(x)) / dx;
}
When using the finite difference method for numerical differentiation, it is critical to choose an optimum dx value. Mathematically, dx should be as small as possible. However, I'm not sure whether it would be correct to choose the smallest positive normal double-precision number (i.e., 2.2250738585072014 × 10^-308).
Is there an optimal numeric interval or exact value to choose a dx in to make the calculation error as small as possible?
(I'm using a 64-bit compiler. I will run my program on an Intel i5 processor.)

Choosing the smallest possible value is almost certainly wrong: if dx were that small, then f(x + dx) would be exactly equal to f(x) due to rounding.
So you have a tradeoff: Choose dx too small, and you lose precision to rounding errors. Choose it too large, and your result will be imprecise due to changes in the derivative as x changes.
To judge the numeric errors, consider (f(x + dx) - f(x)) / f(x)¹ mathematically. The numerator denotes the difference you want to compute, while the denominator denotes the magnitude of the numbers you're dealing with. If that fraction is about 2^-k, then you can expect to lose roughly k bits of precision to cancellation in your result.
If you know your function, you can compute what error you'd get from choosing dx too large. You can then balance things, so that the error incurred from this is about the same as the error incurred from rounding. But if you know the function, you might be better off providing a function that directly computes the derivative, like in your example with the polynomial f.
The Wikipedia section that pogorskiy pointed out suggests a value of sqrt(ε)x, or approximately 1.5e-8 * x. Without any more detailed knowledge about the function, such a rule of thumb will provide a reasonable default. Also note that that same section suggests not dividing by dx, but instead by (x + dx) - x, as this takes rounding errors incurred by computing x + dx into account. But I guess that whole article is full of suggestions you might use.
¹ This formula really should divide by f(x), not by dx, even though a past editor thought differently. I'm attempting to compare the number of significant bits remaining after the division, not the slope of the tangent.
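As a rough sketch of that rule of thumb applied to the question's f (the guard for x == 0 and all names here are mine, not from the question):
#include <cfloat>
#include <cmath>

double f(double x) { return 3.0*x*x*x - 2.0*x*x + x - 5.0; }

double fp(double x)
{
    double dx = std::sqrt(DBL_EPSILON) * std::fabs(x); // ~1.5e-8 * x
    if (dx == 0.0) dx = std::sqrt(DBL_EPSILON);        // guard for x == 0
    volatile double xpdx = x + dx;                     // force x + dx to be rounded
    return (f(xpdx) - f(x)) / (xpdx - x);              // divide by (x + dx) - x
}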

Why not just use the power rule to find the derivative? You'll get an exact answer:
f(x) = 3x^3 - 2x^2 + x - 5
f'(x) = 9x^2 - 4x + 1
Therefore:
f(x) = 3.0 * x * x * x - 2.0 * x * x + x - 5.0
fp(x) = 9.0 * x * x - 4.0 * x + 1.0

Related

How to increase accuracy of floating point second derivative calculation?

I've written a simple program to calculate the first and second derivative of a function, using function pointers. My program computes the correct answers (more or less), but for some functions, the accuracy is less than I would like.
This is the function I am differentiating:
float f1(float x) {
    return (x * x);
}
These are the derivative functions, using the central finite difference method:
// Function for calculating the first derivative.
float first_dx(float (*fx)(float), float x) {
    float h = 0.001;
    float dfdx;
    dfdx = (fx(x + h) - fx(x - h)) / (2 * h);
    return dfdx;
}

// Function for calculating the second derivative.
float second_dx(float (*fx)(float), float x) {
    float h = 0.001;
    float d2fdx2;
    d2fdx2 = (fx(x - h) - 2 * fx(x) + fx(x + h)) / (h * h);
    return d2fdx2;
}
Main function (pc appears to be an mbed-style serial port object):
int main() {
    pc.baud(9600);
    float x = 2.0;
    pc.printf("**** Function Pointers ****\r\n");
    pc.printf("Value of f(%f): %f\r\n", x, f1(x));
    pc.printf("First derivative: %f\r\n", first_dx(f1, x));
    pc.printf("Second derivative: %f\r\n\r\n", second_dx(f1, x));
}
This is the output from the program:
**** Function Pointers ****
Value of f(2.000000): 4.000000
First derivative: 3.999948
Second derivative: 1.430511
I'm happy with the accuracy of the first derivative, but I believe the second derivative is too far off (it should be equal to ~2.0).
I have a basic understanding of how floating point numbers are represented and why they are sometimes inaccurate, but how can I make this second derivative result more accurate? Could I be using something better than the central finite difference method, or is there a way I can get better results with the current method?
The accuracy can be increased by choosing a type with more precision. float is typically an IEEE-754 32-bit number, giving you ~7.22 decimal digits of precision.
What you want is the 64-bit counterpart: double, with ~15.95 decimal digits of accuracy.
That should be sufficient for your calculation; however, it is worth mentioning that Boost's multiprecision library offers a quadruple-precision (128-bit) floating point type.
Finally, the GNU Multiple Precision Arithmetic Library offers types with arbitrary precision.
Go analytical. ;-) Probably not an option given "with the current method".
Use double instead of float (see the sketch after this list).
Vary the epsilon (h), and combine the results in some way. For example you could try 0.00001, 0.000001, 0.0000001 and average them. In fact, you'd want the result with the smallest h that doesn't overflow/underflow. But it's not clear how to detect overflow and underflow.
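For what it's worth, a sketch of the same central differences in double (my own code, not from the answers above); the step sizes are just common rules of thumb, roughly eps^(1/3) for the first derivative and eps^(1/4) for the second:
#include <cstdio>

double f1(double x) { return x * x; }

double first_dx(double (*fx)(double), double x) {
    const double h = 1e-5;   // roughly cbrt(DBL_EPSILON)
    return (fx(x + h) - fx(x - h)) / (2 * h);
}

double second_dx(double (*fx)(double), double x) {
    const double h = 1e-4;   // roughly pow(DBL_EPSILON, 0.25)
    return (fx(x - h) - 2 * fx(x) + fx(x + h)) / (h * h);
}

int main() {
    double x = 2.0;
    std::printf("First derivative: %.10f\n", first_dx(f1, x));
    std::printf("Second derivative: %.10f\n", second_dx(f1, x));
}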

How to approximate Euclidean distance on the integer plane, without overflow?

I'm working on a platform that has only integer arithmetic. The application uses geographic information, and I'm representing points by (x, y) coordinates where x and y are distances measured in meters.
As an approximation, I want to compute the Euclidean distance between two points. But to do this I have to square distances, and with 32-bit integers, the largest distance I can represent is 32 kilometers. Not good.
My needs are more on the order of 1000 kilometers. But I'd like to be able to resolve distances on a scale smaller than 30 meters.
Hence my question: how can I compute Euclidean distance, using only integer arithmetic, without overflow, on distances whose squares don't fit in a single word?
ETA: I would like to be able to compute distances, but I might settle for being able to compare them.
Perhaps comparing the octagonal distance approximation would be sufficient?
Slightly more up to date is this article on fast approximate distance functions.
I would recommend using fixed-point calculation with integers; the distance approximation (Fast Approximate Distance Functions by Rafael Baptista, the article linked above) is then not too complicated.
The first step is to choose a fixed-point representation for our needs.
For example, if we need a range of 1000 km with 1 m resolution, we can use 20 bits for the integer part (2^20 = 1,048,576), which leaves around 10 bits for the fraction in a 32-bit word.
Then we need to implement the approximation we chose. For example, suppose we select:
h ≈ b (1 + 0.337 (a/b)) = b + 0.337 a, assuming 0 ≤ a ≤ b
We would implement it as follows:
int32_t dx = (x1 > x2 ? x1 - x2 : x2 - x1);
int32_t dy = (y1 > y2 ? y1 - y2 : y2 - y1);
int32_t a = dx > dy ? dy : dx;
int32_t b = dx > dy ? dx : dy;
int32_t h = b + (345 * a >> 10); /* 345.088 = 0.337 * 2^10 */
About overflow:
Adding two positive <+20.0> numbers results in at most a <+21.0> number. That is OK.
The multiplication is also safe while one factor stays in the range -1..1, since the result then remains in the range of the other factor. In our case <+20.0> * <+0.10> results in <+20.10> numbers, which we convert back to <+20.0>.
There is one step where we need to pay attention: during the multiplication we temporarily get a <+20.10> number, which is already close to the 32-bit limit.
Exact calculation
We can also calculate the exact distance using the following consideration:
h = b sqrt(1 + (a/b)^2) AND assuming 0 < b ≤ a
In this case we also need to calculate a square root.
If a/b is significantly larger than one, or too large to square safely, we can simplify the calculation to:
h = a
See the implementation here
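Separately from that linked implementation: if 64-bit intermediates are available (an assumption on my part; the question only rules out floating point), a sketch of the exact route could look like this:
#include <cstdint>

// Integer square root by bisection. Assumes coordinates within ~±2^20 m,
// as in the question, so dx*dx + dy*dy stays below 2^42.
static uint32_t isqrt(uint64_t n) {
    uint64_t lo = 0, hi = UINT64_C(1) << 21;  // distances fit below 2^21
    while (lo < hi) {
        uint64_t mid = (lo + hi + 1) / 2;
        if (mid * mid <= n) lo = mid; else hi = mid - 1;
    }
    return (uint32_t)lo;
}

uint32_t distance(int32_t x1, int32_t y1, int32_t x2, int32_t y2) {
    int64_t dx = (int64_t)x1 - x2;
    int64_t dy = (int64_t)y1 - y2;
    return isqrt((uint64_t)(dx * dx + dy * dy));
}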
I would leave the square root out of play and work with the squared Euclidean distance. When comparing distances, this approach is 100% accurate, since the comparison comes out the same whether you compare the distances or their squares.
I am pretty sure about that, since I have used this approach when searching for nearest neighbours in high-dimensional spaces. You can check my code and the theory in kd-GeRaF.
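A minimal sketch of that comparison trick, again assuming 64-bit intermediates exist on the platform and coordinates stay within the question's ~2^20 m scale:
#include <cstdint>

// True if p1-p2 is strictly shorter than p3-p4; no square root needed.
bool closer(int32_t x1, int32_t y1, int32_t x2, int32_t y2,
            int32_t x3, int32_t y3, int32_t x4, int32_t y4) {
    int64_t dx1 = (int64_t)x1 - x2, dy1 = (int64_t)y1 - y2;
    int64_t dx2 = (int64_t)x3 - x4, dy2 = (int64_t)y3 - y4;
    return dx1 * dx1 + dy1 * dy1 < dx2 * dx2 + dy2 * dy2;
}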

sin and cos are slow, is there an alternative?

My game needs to move by a certain angle. To do this I get the direction vector for the angle via sin and cos. Unfortunately, sin and cos are my bottleneck. I'm sure I do not need this much precision. Is there an alternative to C's sin & cos and to a look-up table that is decently precise but very fast?
I had found this:
float Skeleton::fastSin( float x )
{
    const float B = 4.0f/pi;
    const float C = -4.0f/(pi*pi);
    float y = B * x + C * x * abs(x);
    const float P = 0.225f;
    return P * (y * abs(y) - y) + y;
}
Unfortunately, this does not seem to work. I get significantly different behavior when I use this sin rather than C sin.
Thanks
A lookup table is the standard solution. You could also use two lookup tables, one for degrees and one for tenths of degrees, and utilize sin(A + B) = sin(A)cos(B) + cos(A)sin(B).
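A sketch of that two-table idea (the table resolutions of 1 degree and 0.1 degree, and all names, are mine):
#include <cmath>

static float sinDeg[360], cosDeg[360]; // whole degrees
static float sinTen[10],  cosTen[10];  // tenths of a degree

void initTables() {
    const float d2r = 3.14159265f / 180.0f;
    for (int i = 0; i < 360; ++i) {
        sinDeg[i] = std::sin(i * d2r);
        cosDeg[i] = std::cos(i * d2r);
    }
    for (int i = 0; i < 10; ++i) {
        sinTen[i] = std::sin(i * 0.1f * d2r);
        cosTen[i] = std::cos(i * 0.1f * d2r);
    }
}

float fastSinTenths(int tenths) {         // angle in tenths of a degree, 0..3599
    int a = tenths / 10, b = tenths % 10;
    return sinDeg[a] * cosTen[b] + cosDeg[a] * sinTen[b]; // sin(A + B)
}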
For your fastSin(), you should check its documentation to see what range it's valid on. The units you're using for your game could be too big or too small and scaling them to fit within that function's expected range could make it work better.
EDIT:
Someone else mentioned getting it into the desired range by subtracting PI, but apparently there's a function called fmod for doing modulus division on floats/doubles, so this should do it:
#include <iostream>
#include <cmath>

float fastSin( float x ){
    x = std::fmod(x + M_PI, 2 * M_PI);  // wrap; result keeps the sign of x + M_PI
    if (x < 0) x += 2 * M_PI;           // so fix up negative inputs...
    x -= M_PI;                          // ...leaving -M_PI <= x < M_PI
    const float B = 4.0f/M_PI;
    const float C = -4.0f/(M_PI*M_PI);
    float y = B * x + C * x * std::abs(x);
    const float P = 0.225f;
    return P * (y * std::abs(y) - y) + y;
}
int main() {
    std::cout << fastSin(100.0) << '\n' << std::sin(100.0) << std::endl;
}
I have no idea how expensive fmod is though, so I'm going to try a quick benchmark next.
Benchmark Results
I compiled this with -O2 and ran the result with the Unix time program:
#include <cmath>
#include <iostream>

const int REPETITIONS = 100000000; // value is arbitrary; just large enough to time

int main() {
    float a = 0;
    for (int i = 0; i < REPETITIONS; i++) {
        a += sin(i); // or fastSin(i);
    }
    std::cout << a << std::endl;
}
The result is that sin is about 1.8x slower (if fastSin takes 5 seconds, sin takes 9). The accuracy also seemed to be pretty good.
If you chose to go this route, make sure to compile with optimization on (-O2 in gcc).
I know this is already an old topic, but for people who have the same question, here is a tip.
A lot of the time in 2D and 3D rotation, all vectors are rotated by the same fixed angle. Instead of calling cos() or sin() in every cycle of the loop, create variables before the loop that hold the values of cos(angle) and sin(angle), and use those inside the loop. This way each function only has to be called once.
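For example, a sketch of that hoisting (the struct and names are mine):
#include <cmath>

struct Vec2 { float x, y; };

void rotateAll(Vec2* v, int n, float angle) {
    const float c = std::cos(angle); // computed once, outside the loop
    const float s = std::sin(angle);
    for (int i = 0; i < n; ++i) {
        float x = v[i].x, y = v[i].y;
        v[i].x = x * c - y * s;
        v[i].y = x * s + y * c;
    }
}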
If you rephrase the return in fastSin as
return (1-P) * y + P * (y * abs(y))
and rewrite y as (for x > 0)
y = 4 * x * (pi - x) / (pi * pi)
you can see that y is a parabolic approximation to sin(x), chosen so that it passes through (0,0), (pi/2,1) and (pi,0), and so that it is symmetrical about x = pi/2.
Thus we can only expect our function to be a good approximation from 0 to pi. If we want values outside that range we can use the 2-pi periodicity of sin(x) and that sin(x+pi) = -sin(x).
The y*abs(y) is a "correction term" which also passes through those three points. (I'm not sure why y*abs(y) is used rather than just y*y since y is positive in the 0-pi range).
This form of overall approximation function guarantees that a linear blend of the two functions y and y*y, (1-P)*y + P * y*y will also pass through (0,0), (pi/2,1) and (pi,0).
We might expect y to be a decent approximation to sin(x), but the hope is that by picking a good value for P we get a better approximation.
One question is "How was P chosen?". Personally, I'd choose the P that produced the least RMS error over the [0, pi/2] interval, that is, the P minimizing
E(P) = integral from 0 to pi/2 of (sin(x) - ((1-P)*y + P*y^2))^2 dx
(I'm not sure that's how this P was chosen, though.) Since E is quadratic in P, setting dE/dP = 0 gives a linear equation that can be rearranged and solved for P.
E = (16 π^5 p^2 - (96 π^5 + 100800 π^2 - 967680)p + 651 π^5 - 20160 π^2)/(1260 π^4)
which has a minimum of
min(E) = -11612160/π^9 + 2419200/π^7 - 126000/π^5 - 2304/π^4 + 224/π^2 + (169 π)/420
≈ 5.582129689596371e-07
at
p = 3 + 30240/π^5 - 3150/π^3
≈ 0.2248391013559825
Which is pretty close to the specified P=0.225.
You can raise the accuracy of the approximation by adding an additional correction term, giving a form something like return (1-a-b)*y + a*y*abs(y) + b*y*y*abs(y). I would find a and b in the same way as above, this time giving a system of two linear equations in a and b to solve, rather than a single equation in p. I'm not going to do the derivation as it is tedious and the conversion to latex images is painful... ;)
NOTE: When answering another question I thought of another valid choice for P.
The problem is that using reflection to extend the curve into (-pi,0) leaves a kink in the curve at x=0. However, I suspect we can choose P such that the kink becomes smooth.
To do this take the left and right derivatives at x=0 and ensure they are equal. This gives an equation for P.
You can compute a table S of 256 values, from sin(0) to sin(2*pi). Then, to evaluate sin(x), bring x back into [0, 2*pi] and pick the two table values S[a], S[b] whose angles bracket x. From these, linear interpolation should give you a fair approximation (see the sketch below).
Memory-saving trick: you actually only need to store [0, pi/2] and can use the symmetries of sin(x) for the rest.
Enhancement trick: linear interpolation can be a problem because of its non-smooth derivative; human eyes are good at spotting such glitches in animation and graphics. Use cubic interpolation then.
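A sketch of the 256-entry table with linear interpolation (names mine; the extra table entry avoids a wrap-around check):
#include <cmath>

static const int N = 256;
static float S[N + 1];                   // sin over [0, 2*pi], one extra entry

void initSinTable() {
    for (int i = 0; i <= N; ++i)
        S[i] = std::sin(2.0 * M_PI * i / N);
}

float lerpSin(float x) {
    const float twoPi = 2.0f * (float)M_PI;
    x = std::fmod(x, twoPi);
    if (x < 0) x += twoPi;               // bring x into [0, 2*pi)
    float t = x * (N / twoPi);           // fractional table index
    int a = (int)t;
    float frac = t - a;
    return S[a] + frac * (S[a + 1] - S[a]); // interpolate between S[a] and S[a+1]
}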
What about
x * (0.0174532925199433 - 8.650935142277599e-7 * x^2)
for degrees, and
x * (1 - 0.162716259904269 * x^2)
for radians, on [-45, 45] and [-pi/4, pi/4] respectively?
This (i.e. the fastSin function) is approximating the sine function using a parabola. I suspect it's only good for values between -π and +π. Fortunately, you can keep adding or subtracting 2π until you get into this range.
You can use this approximation; the solution uses a quadratic curve:
http://www.starming.com/index.php?action=plugin&v=wave&ajax=iframe&iframe=fullviewonepost&mid=56&tid=4825

Why does division yield a vastly different result than multiplication by a fraction in floating points

I understand why floating point numbers can't be reliably compared for equality, and know about the mantissa and exponent binary representation, but I'm no expert and today I came across something I don't get:
Namely lets say you have something like:
float denominator, numerator, resultone, resulttwo;
resultone = numerator / denominator;
float buff = 1 / denominator;
resulttwo = numerator * buff;
To my knowledge, different orderings of floating-point operations can yield different results, and this is not unusual. But in some edge cases these two results seem to be vastly different. To be more specific, in my GLSL code calculating the Beckmann facet slope distribution for the Cook-Torrance lighting model:
float a = 1 / (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
float b = clampedCosHalfNormal * clampedCosHalfNormal - 1.0;
float c = facetSlopeRMS * facetSlopeRMS * clampedCosHalfNormal * clampedCosHalfNormal;
facetSlopeDistribution = a * exp(b/c);
yields very very different results to
float a = (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
facetSlopeDistribution = exp(b/c) / a;
Why does it? It's the second form of the expression that is problematic.
If I say try to add the second form of the expression to a color I get blacks, even though the expression should always evaluate to a positive number. Am I getting an infinity? A NaN? if so why?
I didn't go through your mathematics in detail, but you must be aware that small errors get pumped up easily by all these powers and exponentials. You should try and substitute all variables var with var + e(var) (on paper, yes) and derive an expression for the total error - without simplifying in between steps, because that's where the error comes from!
This is also a very common problem in computational fluid dynamics, where you can observe things like 'numeric diffusion' if your grid isn't properly aligned with the simulated flow.
So get a clear grip on where the biggest errors come from, and rewrite equations where possible to minimize the numeric error.
Edit: to clarify, an example.
Say you have some variable x and an expression y = exp(x). The error in x is denoted e(x) and is small compared to x (say e(x)/x < 0.0001, but note that this depends on the type you are using). Then you could say that
e(y) = y(x+e(x)) - y(x)
e(y) ~ dy/dx * e(x) (for small e(x))
e(y) = exp(x) * e(x)
So the absolute error is magnified by a factor of exp(x), meaning that around x = 0 there's really no issue (not a surprise, since at that point the slope of exp(x) equals that of x), but for big x you will notice this.
The relative error would then be
e(y)/y = e(y)/exp(x) = e(x)
whilst the relative error in x was
e(x)/x
so you added a factor of x to the relative error.
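To see the seed of such an error in isolation, here is a tiny demo of the division-versus-reciprocal difference from the question (the values are arbitrary; the difference is typically an ulp or two, which powers and exponentials can then blow up):
#include <cstdio>

int main() {
    float numerator = 3.0f, denominator = 0.007f;
    float resultone = numerator / denominator; // one rounding step
    float buff = 1.0f / denominator;           // rounds the reciprocal...
    float resulttwo = numerator * buff;        // ...then rounds the product
    std::printf("%.9g\n%.9g\ndiff: %g\n",
                resultone, resulttwo, resultone - resulttwo);
}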

Create sine lookup table in C++

How can I rewrite the following pseudocode in C++?
real array sine_table[-1000..1000]
for x from -1000 to 1000
sine_table[x] := sine(pi * x / 1000)
I need to create a sine_table lookup table.
You can reduce the size of your table to 25% of the original by only storing values for the first quadrant, i.e. for x in [0,pi/2].
To do that your lookup routine just needs to map all values of x to the first quadrant using simple trig identities:
sin(x) = - sin(-x), to map from quadrant IV to I
sin(x) = sin(pi - x), to map from quadrant II to I
To map from quadrant III to I, apply both identities, i.e. sin(x) = - sin (pi + x)
Whether this strategy helps depends on how much memory usage matters in your case. But it seems wasteful to store four times as many values as you need just to avoid a comparison and subtraction or two during lookup.
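A sketch of that quadrant mapping in code (the table size and names are mine; x is assumed already reduced to [-pi, pi]):
#include <cmath>

static const int N = 1001;                     // table covers [0, pi/2] only
static double quarter[N];

void initQuarterTable() {
    for (int i = 0; i < N; ++i)
        quarter[i] = std::sin((M_PI / 2) * i / (N - 1));
}

double tableSin(double x) {                    // expects -pi <= x <= pi
    double sign = 1.0;
    if (x < 0)        { x = -x; sign = -1.0; } // sin(x) = -sin(-x)
    if (x > M_PI / 2) { x = M_PI - x; }        // sin(x) = sin(pi - x)
    int i = (int)(x / (M_PI / 2) * (N - 1) + 0.5); // nearest table entry
    return sign * quarter[i];
}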
I second Jeremy's recommendation to measure whether building a table is better than just using std::sin(). Even with the original large table, you'll have to spend cycles during each table lookup to convert the argument to the closest increment of pi/1000, and you'll lose some accuracy in the process.
If you're really trying to trade accuracy for speed, you might try approximating the sin() function using just the first few terms of the Taylor series expansion.
sin(x) = x - x^3/3! + x^5/5! ..., where ^ represents raising to a power and ! represents the factorial.
Of course, for efficiency, you should precompute the factorials and make use of the lower powers of x to compute higher ones, e.g. use x^3 when computing x^5.
One final point: the truncated Taylor series above is more accurate for values closer to zero, so it's still worthwhile to map to the first or fourth quadrant before computing the approximate sine.
Addendum:
Yet one more potential improvement based on two observations:
1. You can compute any trig function if you can compute both the sine and cosine in the first octant [0,pi/4]
2. The Taylor series expansion centered at zero is more accurate near zero
So if you decide to use a truncated Taylor series, then you can improve accuracy (or use fewer terms for similar accuracy) by mapping to either the sine or cosine to get the angle in the range [0,pi/4] using identities like sin(x) = cos(pi/2-x) and cos(x) = sin(pi/2-x) in addition to the ones above (for example, if x > pi/4 once you've mapped to the first quadrant.)
Or if you decide to use a table lookup for both the sine and cosine, you could get by with two smaller tables that only covered the range [0,pi/4] at the expense of another possible comparison and subtraction on lookup to map to the smaller range. Then you could either use less memory for the tables, or use the same memory but provide finer granularity and accuracy.
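To make the addendum concrete, here is a sketch of a truncated-series sine with the reduction toward [0, pi/4] described above (plain Taylor coefficients; all names are mine):
#include <cmath>

static double sinPoly(double x) {  // x - x^3/3! + x^5/5! - x^7/7!, nested form
    double x2 = x * x;
    return x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42)));
}
static double cosPoly(double x) {  // 1 - x^2/2! + x^4/4! - x^6/6!, nested form
    double x2 = x * x;
    return 1 - x2 / 2 * (1 - x2 / 12 * (1 - x2 / 30));
}

double approxSin(double x) {
    x = std::remainder(x, 2 * M_PI);             // reduce to [-pi, pi]
    double sign = 1.0;
    if (x < 0)        { x = -x; sign = -1.0; }   // sin(x) = -sin(-x)
    if (x > M_PI / 2) { x = M_PI - x; }          // sin(x) = sin(pi - x)
    // in [0, pi/4] use the sine series; in (pi/4, pi/2] use cos(pi/2 - x)
    return sign * (x > M_PI / 4 ? cosPoly(M_PI / 2 - x) : sinPoly(x));
}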
#include <cmath>

const double PI = 3.14159265358979323846;
long double sine_table[2001];

void init_sine_table()
{
    for (int index = 0; index < 2001; index++)
    {
        sine_table[index] = std::sin(PI * (index - 1000) / 1000.0);
    }
}
One more point: calling trigonometric functions is pricey. If you want to prepare a lookup table for sine with a constant step, you may save calculation time at the expense of some potential precision loss.
Consider your minimal step is "a". That is, you need sin(a), sin(2a), sin(3a), ...
Then you may do the following trick: first calculate sin(a) and cos(a). Then for every consecutive step use the following trigonometric identities:
sin([n+1] * a) = sin(n*a) * cos(a) + cos(n*a) * sin(a)
cos([n+1] * a) = cos(n*a) * cos(a) - sin(n*a) * sin(a)
The drawback of this method is that the round-off error accumulates as the procedure goes on.
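A sketch of that recurrence (one sin and one cos call in total; as noted, round-off accumulates as n grows):
#include <cmath>

void fillSinTable(double* table, int n, double a) {
    const double sa = std::sin(a), ca = std::cos(a); // the only two libm calls
    double s = 0.0, c = 1.0;                         // sin(0), cos(0)
    for (int i = 0; i < n; ++i) {
        table[i] = s;                                // table[i] = sin(i*a)
        double s1 = s * ca + c * sa;                 // sin((i+1)*a)
        double c1 = c * ca - s * sa;                 // cos((i+1)*a)
        s = s1; c = c1;
    }
}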
#include <cmath>

const double PI = 3.14159265358979323846;
double sine_table[1000];

void init_table()
{
    for (int i = 1; i <= 1000; i++)
    {
        sine_table[i - 1] = std::sin(PI * i / 1000.0); // sin(pi/1000) ... sin(pi)
    }
}

// argument is the angle expressed as a multiple of pi/1000
double getSineValue(int multipleOfPi) {
    if (multipleOfPi == 0) return 0.0;
    int sign = 1;
    if (multipleOfPi < 0) {
        sign = -1;
    }
    return sign * sine_table[sign * multipleOfPi - 1]; // sin(-x) = -sin(x)
}
You can reduce the array length to 500 by using the trick sin(pi/2 ± angle) = cos(angle).
So store sin and cos from 0 to pi/4.
I don't remember the numbers off the top of my head, but it increased the speed of my program.
You'll want the std::sin() function from <cmath>.
Another approximation, from a book or something (this appears to be stream code from the SynthMaker forum linked below, not C++):
streamin ramp;
streamout sine;
float x,rect,k,i,j;
x = ramp -0.5;
rect = x * (1 - x < 0 & 2);
k = (rect + 0.42493299) *(rect -0.5) * (rect - 0.92493302) ;
i = 0.436501 + (rect * (rect + 1.05802));
j = 1.21551 + (rect * (rect - 2.0580201));
sine = i*j*k*60.252201*x;
full discussion here:
http://synthmaker.co.uk/forum/viewtopic.php?f=4&t=6457&st=0&sk=t&sd=a
I presume you know that division is a lot slower than multiplication by a decimal number: /5 is always slower than *0.2.
It's just an approximation.
Also:
streamin ramp;
streamin x; // 1.5 = Saw 3.142 = Sin 4.5 = SawSin
streamout sine;
float saw,saw2;
saw = (ramp * 2 - 1) * x;
saw2 = saw * saw;
sine = -0.166667 + saw2 * (0.00833333 + saw2 * (-0.000198409 + saw2 * (2.7526e-006+saw2 * -2.39e-008)));
sine = saw * (1+ saw2 * sine);