Variable grouping providing different answers in optimized code - c++

I've been attempting to unit test a C++ class I've written for Geodetic transforms.
I've noticed that a trivial grouping change of three variables greatly influences the error in the function.
EDIT : Here is the entire function for a compilable example:
Assume latitude, longitude and altitude are zero. Earth::a = 6378137 and Earth::b = 6356752.3 I'm working on getting benchmark numbers, something came up at work today and I had to do that instead.
void Geodesy::Geocentric2EFG(double latitude, double longitude, double altitude, double *E, double *F, double *G) {
double a2 = pow<double>(Earth::a, 2);
double b2 = pow<double>(Earth::b, 2);
double radius = sqrt((a2 * b2)/(a2 * pow<double>(sin(latitude), 2) + b2 * pow<double>(cos(longitude), 2)));
radius += altitude;
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
*G = radius * sin(latitude);
return;
}
Where all values are defined as double including those in Earth. The pow<T>() function is a recursive template function defined by:
template <typename T>
static inline T pow(const T &base, unsigned const exponent) {
return (exponent == 0) ? 1 : (base * pow(base, exponent - 1));
}
The code in question:
*E = radius * cos(latitude) * cos(longitude);
*F = radius * cos(latitude) * sin(longitude);
produces different results than:
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
What is the compiler doing in gcc with optimization level 3 to make these results 1e-2 different?

You have different rounding as floating point cannot represent all numbers:
a * b * c; is (a * b) * c which may differ than a * (b * c).
You may have similar issues with addition too.
example with addition:
10e10f + 1.f == 10e10f
so (1.f + 10e10f) - 10e10f == 10e10f - 10e10f == 0.f
whereas 1.f + (10e10f - 10e10f) == 1.f - 0.f == 1.f.

Related

Efficient floating point scaling in C++

I'm working on my fast (and accurate) sin implementation in C++, and I have a problem regarding the efficient angle scaling into the +- pi/2 range.
My sin function for +-pi/2 using Taylor series is the following
(Note: FLOAT is a macro expanded to float or double just for the benchmark)
/**
* Sin for 'small' angles, accurate on [-pi/2, pi/2], fairly accurate on [-pi, pi]
*/
// To switch between float and double
#define FLOAT float
FLOAT
my_sin_small(FLOAT x)
{
constexpr FLOAT C1 = 1. / (7. * 6. * 5. * 4. * 3. * 2.);
constexpr FLOAT C2 = -1. / (5. * 4. * 3. * 2.);
constexpr FLOAT C3 = 1. / (3. * 2.);
constexpr FLOAT C4 = -1.;
// Correction for sin(pi/2) = 1, due to the ignored taylor terms
constexpr FLOAT corr = -1. / 0.9998431013994987;
const FLOAT x2 = x * x;
return corr * x * (x2 * (x2 * (x2 * C1 + C2) + C3) + C4);
}
So far so good... The problem comes when I try to scale an arbitrary angle into the +-pi/2 range. My current solution is:
FLOAT
my_sin(FLOAT x)
{
constexpr FLOAT pi = 3.141592653589793238462;
constexpr FLOAT rpi = 1 / pi;
// convert to +-pi/2 range
int n = std::nearbyint(x * rpi);
FLOAT xbar = (n * pi - x) * (2 * (n & 1) - 1);
// (2 * (n % 2) - 1) is a sign correction (see below)
return my_sin_small(xbar);
};
I made a benchmark, and I'm losing a lot for the +-pi/2 scaling.
Tricking with int(angle/pi + 0.5) is a nope since it is limited to the int precision, also requires +- branching, and i try to avoid branches...
What should I try to improve the performance for this scaling? I'm out of ideas.
Benchmark results for float. (In the benchmark the angle could be out of the validity range for my_sin_small, but for the bench I don't care about that...):
Benchmark results for double.
Sign correction for xbar in my_sin():
Algo accuracy compared to python sin() function:
Candidate improvements
Convert the radians x to rotations by dividing by 2*pi.
Retain only the fraction so we have an angle (-1.0 ... 1.0). This simplifies the OP's modulo step to a simple "drop the whole number" step instead. Going forward with different angle units simply involves a co-efficient set change. No need to scale back to radians.
For positive values, subtract 0.5 so we have (-0.5 ... 0.5) and then flip the sign. This centers the possible values about 0.0 and makes for better convergence of the approximating polynomial as compared to the math sine function. For negative values - see below.
Call my_sin_small1() that uses this (-0.5 ... 0.5) rotations range rather than [-pi ... +pi] radians.
In my_sin_small1(), fold constants together to drop the corr * step.
Rather than use the truncated Taylor's series, use a more optimal set. IMO, this will provide better answers, especially near +/-pi.
Notes: No int to/from float code. With more analysis, possible to get a better set of coefficients that fix my_sin(+/-pi) closer to 0.0. This is just a quick set of code to demo less FP steps and good potential results.
C like code for OP to port to C++
FLOAT my_sin_small1(FLOAT x) {
static const FLOAT A1 = -5.64744881E+01;
static const FLOAT A2 = +7.81017968E+01;
static const FLOAT A3 = -4.11145353E+01;
static const FLOAT A4 = +6.27923581E+00;
const FLOAT x2 = x * x;
return x * (x2 * (x2 * (x2 * A1 + A2) + A3) + A4);
}
FLOAT my_sin1(FLOAT x) {
static const FLOAT pi = 3.141592653589793238462;
static const FLOAT pi2i = 1/(pi * 2);
x *= pi2i;
FLOAT xfraction = 0.5f - (x - truncf(x));
return my_sin_small1(xfraction);
}
For negative values, use -my_sin1(-x) or like code to flip the sign - or add 0.5 in the above minus 0.5 step.
Test
#include <math.h>
#include <stdio.h>
int main(void) {
for (int d = 0; d <= 360; d += 20) {
FLOAT x = d / 180.0 * M_PI;
FLOAT y = my_sin1(x);
printf("%12.6f %11.8f %11.8f\n", x, sin(x), y);
}
}
Output
0.000000 0.00000000 -0.00022483
0.349066 0.34202013 0.34221691
0.698132 0.64278759 0.64255589
1.047198 0.86602542 0.86590189
1.396263 0.98480775 0.98496443
1.745329 0.98480775 0.98501128
2.094395 0.86602537 0.86603642
2.443461 0.64278762 0.64260530
2.792527 0.34202022 0.34183803
3.141593 -0.00000009 0.00000000
3.490659 -0.34202016 -0.34183764
3.839724 -0.64278757 -0.64260519
4.188790 -0.86602546 -0.86603653
4.537856 -0.98480776 -0.98501128
4.886922 -0.98480776 -0.98496443
5.235988 -0.86602545 -0.86590189
5.585053 -0.64278773 -0.64255613
5.934119 -0.34202036 -0.34221727
6.283185 0.00000017 -0.00022483
Alternate code below makes for better results near 0.0, yet might cost a tad more time. OP seems more inclined to speed.
FLOAT xfraction = 0.5f - (x - truncf(x));
// vs.
FLOAT xfraction = x - truncf(x);
if (x >= 0.5f) x -= 1.0f;
[Edit]
Below is a better set with about 10% reduced error.
-56.0833765f
77.92947047f
-41.0936875f
6.278635918f
Yet another approach:
Spend more time (code) to reduce the range to ±pi/4 (±45 degrees), then possible to use only 3 or 2 terms of a polynomial that is like the usually Taylors series.
float sin_quick_small(float x) {
const float x2 = x * x;
#if 0
// max error about 7e-7
static const FLOAT A2 = +0.00811656036940792f;
static const FLOAT A3 = -0.166597759850666f;
static const FLOAT A4 = +0.999994132743861f;
return x * (x2 * (x2 * A2 + A3) + A4);
#else
// max error about 0.00016
static const FLOAT A3 = -0.160343346851626f;
static const FLOAT A4 = +0.999031566686144f;
return x * (x2 * A3 + A4);
#endif
}
float cos_quick_small(float x) {
return cosf(x); // TBD code.
}
float sin_quick(float x) {
if (x < 0.0) {
return -sin_quick(-x);
}
int quo;
float x90 = remquof(fabsf(x), 3.141592653589793238462f / 2, &quo);
switch (quo % 4) {
case 0:
return sin_quick_small(x90);
case 1:
return cos_quick_small(x90);
case 2:
return sin_quick_small(-x90);
case 3:
return -cos_quick_small(x90);
}
return 0.0;
}
int main() {
float max_x = 0.0;
float max_error = 0.0;
for (int d = -45; d <= 45; d += 1) {
FLOAT x = d / 180.0 * M_PI;
FLOAT y = sin_quick(x);
double err = fabs(y - sin(x));
if (err > max_error) {
max_x = x;
max_error = err;
}
printf("%12.6f %11.8f %11.8f err:%11.8f\n", x, sin(x), y, err);
}
printf("x:%.6f err:%.6f\n", max_x, max_error);
return 0;
}

B-spline curve with graphics.h c++

I am trying to draw a curve with B-spline. I did my research about what is B-spline and how can I use it in a program algorithm. After all that stuff, I finally find a code to make this right in Stack Overflow. I made some changes on this code and try to use in my program. It works but I have two problems with that.
Firstly the curve is in the right shape but not in the right position. It's like 20-40 pixel different than should to be.
Secondly in my function last part, I am dividing the two result of x and y to a number but it (divide number) seems like have to change for all the circumstances.
And finally, it's working for 6 coordinates as you can see.
How can I bind the number of coordinates to divide the number and fix the flip at the spline?
PS: I need to write that code with C
Here is my code's functions :
1-That's my B-spline calculate function:
void BSplineCurve(const Dot& point1,
const Dot& point2,
const Dot& point3,
const Dot& point4,
Dot& result,
const double t)
{
const double t2 = t * t;
const double t3 = t2 * t;
const double mt = 1.0 - t;
const double mt3 = mt * mt * mt;
const double bi3 = mt3;
const double bi2 = 3 * t3 - 6 * t2 + 4;
const double bi1 = -3 * t3 + 3 * t2 + 3 * t + 1;
const double bi = t3;
result.x = point1.x * bi3 + point2.x * bi2 + point3.x * bi1 + point4.x * bi;
result.x /= 4;
result.y = point1.y * bi3 + point2.y * bi2 + point3.y * bi1 + point4.y * bi;
result.y /= 4;
}
2- That's my Draw Function :
Dot points[6] = {ControlPoint1, ControlPoint2, ControlPoint3, ControlPoint4, ControlPoint5,
ControlPoint6};
for(double t = 5.9999;t > 2.0; t -= 0.001)
{
const int start = static_cast<int>(t)+1;
BSplineCurve(points[start -3 ],
points[start - 2],
points[start - 1],
points[start ],
DrawCurve,
start - t);
Draw1Dot(DrawCurve,points[0],distanceToEdges);}
3- And finally my Draw pixel function :
void Draw1Dot(Dot Koor, Dot mesafe, int ortala)
{
putpixel(mesafe.x + Koor.x + ortala, mesafe.y + Koor.y + ortala, 3);
}
Can you help me understand what I'm doing wrong?

Fast approximate float division

On modern processors, float division is a good order of magnitude slower than float multiplication (when measured by reciprocal throughput).
I'm wondering if there are any algorithms out there for computating a fast approximation to x/y, given certain assumptions and tolerance levels. For example, if you assume that 0<x<y, and are willing to accept any output that is within 10% of the true value, are there algorithms faster than the built-in FDIV operation?
I hope that this helps because this is probably as close as your going to get to what you are looking for.
__inline__ double __attribute__((const)) divide( double y, double x ) {
// calculates y/x
union {
double dbl;
unsigned long long ull;
} u;
u.dbl = x; // x = x
u.ull = ( 0xbfcdd6a18f6a6f52ULL - u.ull ) >> (unsigned char)1;
// pow( x, -0.5 )
u.dbl *= u.dbl; // pow( pow(x,-0.5), 2 ) = pow( x, -1 ) = 1.0/x
return u.dbl * y; // (1.0/x) * y = y/x
}
See also:
Another post about reciprocal approximation.
The Wikipedia page.
FDIV is usually exceptionally slower than FMUL just b/c it can't be piped like multiplication and requires multiple clk cycles for iterative convergence HW seeking process.
Easiest way is to simply recognize that division is nothing more than the multiplication of the dividend y and the inverse of the divisor x. The not so straight forward part is remembering a float value x = m * 2 ^ e & its inverse x^-1 = (1/m)*2^(-e) = (2/m)*2^(-e-1) = p * 2^q approximating this new mantissa p = 2/m = 3-x, for 1<=m<2. This gives a rough piece-wise linear approximation of the inverse function, however we can do a lot better by using an iterative Newton Root Finding Method to improve that approximation.
let w = f(x) = 1/x, the inverse of this function f(x) is found by solving for x in terms of w or x = f^(-1)(w) = 1/w. To improve the output with the root finding method we must first create a function whose zero reflects the desired output, i.e. g(w) = 1/w - x, d/dw(g(w)) = -1/w^2.
w[n+1]= w[n] - g(w[n])/g'(w[n]) = w[n] + w[n]^2 * (1/w[n] - x) = w[n] * (2 - x*w[n])
w[n+1] = w[n] * (2 - x*w[n]), when w[n]=1/x, w[n+1]=1/x*(2-x*1/x)=1/x
These components then add to get the final piece of code:
float inv_fast(float x) {
union { float f; int i; } v;
float w, sx;
int m;
sx = (x < 0) ? -1:1;
x = sx * x;
v.i = (int)(0x7EF127EA - *(uint32_t *)&x);
w = x * v.f;
// Efficient Iterative Approximation Improvement in horner polynomial form.
v.f = v.f * (2 - w); // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
// v.f = v.f * ( 4 + w * (-6 + w * (4 - w))); // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
// v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w))))))); // Third Iteration, Err = +-6.8e-8 * 2^(-flr(log2(x)))
return v.f * sx;
}

Fast Inverse Square Root on x64

I found on net Fast Inverse Square Root on http://en.wikipedia.org/wiki/Fast_inverse_square_root . Does it work properly on x64 ?
Did anyone use and serious test ?
Originally Fast Inverse Square Root was written for a 32-bit float, so as long as you operate on IEEE-754 floating point representation, there is no way x64 architecture will affect the result.
Note that for "double" precision floating point (64-bit) you should use another constant:
...the "magic number" for 64 bit IEEE754 size type double ... was shown to be exactly 0x5fe6eb50c7b537a9
Here is an implementation for double precision floats:
#include <cstdint>
double invsqrtQuake( double number )
{
double y = number;
double x2 = y * 0.5;
std::int64_t i = *(std::int64_t *) &y;
// The magic number is for doubles is from https://cs.uwaterloo.ca/~m32rober/rsqrt.pdf
i = 0x5fe6eb50c7b537a9 - (i >> 1);
y = *(double *) &i;
y = y * (1.5 - (x2 * y * y)); // 1st iteration
// y = y * ( 1.5 - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
I did a few tests and it seems to work fine
Yes, it works if using the correct magic number and corresponding integer type. In addition to the answers above, here's a C++11 implementation that works for both double and float. Conditionals should optimise out at compile time.
template <typename T, char iterations = 2> inline T inv_sqrt(T x) {
static_assert(std::is_floating_point<T>::value, "T must be floating point");
static_assert(iterations == 1 or iterations == 2, "itarations must equal 1 or 2");
typedef typename std::conditional<sizeof(T) == 8, std::int64_t, std::int32_t>::type Tint;
T y = x;
T x2 = y * 0.5;
Tint i = *(Tint *)&y;
i = (sizeof(T) == 8 ? 0x5fe6eb50c7b537a9 : 0x5f3759df) - (i >> 1);
y = *(T *)&i;
y = y * (1.5 - (x2 * y * y));
if (iterations == 2)
y = y * (1.5 - (x2 * y * y));
return y;
}
As for testing, I use the following doctest in my project:
#ifdef DOCTEST_LIBRARY_INCLUDED
TEST_CASE_TEMPLATE("inv_sqrt", T, double, float) {
std::vector<T> vals = {0.23, 3.3, 10.2, 100.45, 512.06};
for (auto x : vals)
CHECK(inv_sqrt<T>(x) == doctest::Approx(1.0 / std::sqrt(x)));
}
#endif

Calculating distances but the result is - 2147483648

Below is the code to calculate the distance
// creating array of cities
double x[] = {21.0,12.0,15.0,3.0,7.0,30.0};
double y[] = {17.0,10.0,4.0,2.0,3.0,1.0};
// distance function - C = sqrt of A squared + B squared
One issue is that the order of operations is messing you up (multiplication is done before subtraction)
Change
(x[c1] - x[c2] * x[c1] - x[c2]) + (y[c1] - y[c2] * y[c1] - y[c2])
to
((x[c1] - x[c2]) * (x[c1] - x[c2])) + ((y[c1] - y[c2]) * (y[c1] - y[c2]))
I would also recommend, just for clarity, doing some of those calculations on separate lines (clearly that's a style choice that I prefer, and I'm sure some would disagree). It should make no difference to the compiler though
double deltaX = x[c1] - x[c2];
double deltaY = y[c1] - y[c2];
double distance = sqrt(deltaX * deltaX + deltaY * deltaY);
In my opinion that makes for more maintainable (and less error prone, as in this instance) code. Note that, as rewritten, the order of operations does not require extra parentheses.
Remember operator precedence: a - b * c - d means a - (b * c) - d.
Do you want
(x[c1] - (x[c2] * x[c1]) - x[c2])
or
((x[c1] - x[c2]) * (x[c1] - x[c2]))
(x[c1] - x[c2] * x[c1] - x[c2]) will be similar to (x[c1] - (x[c2] * x[c1]) - x[c2]) because * has higher precedence than -.
I am going to go ahead and fix a couple of issues:
// creating array of cities
double x[] = {21.0,12.0,15.0,3.0,7.0,30.0};
double y[] = {17.0,10.0,4.0,2.0,3.0,1.0};
// distance function - C = sqrt of A squared + B squared
double dist(int c1, int c2) {
double z = sqrt (
((x[c1] - x[c2]) * (x[c1] - x[c2])) + ((y[c1] - y[c2]) * (y[c1] - y[c2])));
return z;
}
void main()
{
int a[] = {1, 2, 3, 4, 5, 6};
execute(a, 0, sizeof(a)/sizeof(int));
int x;
printf("Type in a number \n");
scanf("%d", &x);
int y;
printf("Type in a number \n");
scanf("%d", &y);
double z = dist (x,y);
cout << "The result is " << z;
}
This fixes the unused return value, and also fixes the order of operation, and incorrect variable type of int.