Wrong conversion using macro - C++

So I have a weird problem with a GNU GCC (C/C++) macro defined as follows:
#define PI 3.14159265359
#define DEG_TO_RAD(a) (a * PI / 180.0)
#define ARCSEC_TO_DEG(a) (a / 3600.0)
#define ARCSEC_TO_RAD(a) DEG_TO_RAD( ARCSEC_TO_DEG( a ) )
The macro, as you can tell, is simply converting a value in seconds of arc to radians. However, depending on where the macro is applied, I get a different result:
double xi2 = ARCSEC_TO_RAD( 2306.2181 * c + 0.30188 * c2 + 0.017998 * c3);
double xi = 2306.2181 * c + 0.30188 * c2 + 0.017998 * c3;
printf("c = %.10f; xi = %.10f = %.10f = %.10f; ",
c, xi, ARCSEC_TO_RAD(xi), xi2);
This outputs:
c = 0.1899931554; xi = 438.1766743152 = 0.0021243405 = 7.6476237313;
Where's the silly error...?

Going step by step,
ARCSEC_TO_RAD( 2306.2181 * c + 0.30188 * c2 + 0.017998 * c3);
will expand to
DEG_TO_RAD( ARCSEC_TO_DEG(2306.2181 * c + 0.30188 * c2 + 0.017998 * c3))
DEG_TO_RAD( (2306.2181 * c + 0.30188 * c2 + 0.017998 * c3 / 3600.0))
((2306.2181 * c + 0.30188 * c2 + 0.017998 * c3 / 3600.0) * PI / 180.0)
Now the regular order of operations kicks in here, so 2306.2181 * c + 0.30188 * c2 + 0.017998 * c3 will not be divided by 3600.0 as a whole; only the term 0.017998 * c3 will. The old-school C solution is to place parentheses around the macro parameters in the macro definitions.
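To see the precedence problem in isolation, here is a small sketch (the `_BAD`/`_FIXED` names and helper functions are mine, not from the original code) comparing the unparenthesized macro with a parenthesized one, using the question's printed value of c:

```cpp
#include <cmath>

#define PI 3.14159265359
// Buggy: the argument is substituted without parentheses,
// so only the last term ends up divided by 3600.0
#define ARCSEC_TO_RAD_BAD(a) ((a / 3600.0) * PI / 180.0)
// Fixed: every use of the argument is parenthesized
#define ARCSEC_TO_RAD_FIXED(a) (((a) / 3600.0) * PI / 180.0)

// Reproduces the wrong result from the question (~7.6476)
double xi_bad(double c) {
    double c2 = c * c, c3 = c2 * c;
    return ARCSEC_TO_RAD_BAD(2306.2181 * c + 0.30188 * c2 + 0.017998 * c3);
}

// Divides the whole sum by 3600.0, as intended (~0.0021243405)
double xi_good(double c) {
    double c2 = c * c, c3 = c2 * c;
    return ARCSEC_TO_RAD_FIXED(2306.2181 * c + 0.30188 * c2 + 0.017998 * c3);
}
```

With c = 0.1899931554 the fixed macro reproduces the 0.0021243405 from the question's output, and the buggy one reproduces 7.6476237313.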
The modern C and C++ solution is to use functions. Mark the functions inline if you need to satisfy the ODR, but the compiler will likely decide on its own whether to expand them inline.
This question is tagged C++, so here's the C++ solution:
#include <iostream>

constexpr double PI = 3.14159265359;
/* or
#include <cmath>
const double PI = std::acos(-1);
but I'm not certain you can properly constexpr this */

double DEG_TO_RAD(double a)
{
    return a * PI / 180.0;
}

double ARCSEC_TO_DEG(double a)
{
    return a / 3600.0;
}

double ARCSEC_TO_RAD(double a)
{
    return DEG_TO_RAD( ARCSEC_TO_DEG( a ) );
}

int main ()
{
    double c = 10;
    double c2 = 20;
    double c3 = 30;
    std::cout << ARCSEC_TO_RAD(2306.2181 * c + 0.30188 * c2 + 0.017998 * c3) << std::endl;
}
In C++11 or more recent, constexpr can be added to let these former macros be evaluated at compile time should it be necessary.
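For instance, a sketch of the constexpr form (the lower-case function names are mine):

```cpp
constexpr double PI = 3.14159265359;

// Same converters as above, but usable in constant expressions
constexpr double deg_to_rad(double a)    { return a * PI / 180.0; }
constexpr double arcsec_to_deg(double a) { return a / 3600.0; }
constexpr double arcsec_to_rad(double a) { return deg_to_rad(arcsec_to_deg(a)); }

// Evaluated entirely at compile time: 180 degrees = 648000 arcsec = pi radians
static_assert(arcsec_to_rad(3600.0 * 180.0) > 3.14159 &&
              arcsec_to_rad(3600.0 * 180.0) < 3.14160,
              "180 degrees should be pi radians");
```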

I strongly recommend using functions (maybe inline) instead of macros,
but if for some reason you can't, a workaround is to add parentheses around the received arguments:
#define PI 3.14159265359
#define DEG_TO_RAD(a) ((a) * PI / 180.0)
#define ARCSEC_TO_DEG(a) ((a) / 3600.0)
#define ARCSEC_TO_RAD(a) DEG_TO_RAD( ARCSEC_TO_DEG( (a) ) )
// In the last one the parentheses are not strictly necessary, but it is good practice to always parenthesize macro arguments
This prevents errors related to operator precedence when the macro is expanded.

Related

Efficient floating point scaling in C++

I'm working on my fast (and accurate) sin implementation in C++, and I have a problem regarding efficient scaling of the angle into the +-pi/2 range.
My sin function for +-pi/2 using Taylor series is the following
(Note: FLOAT is a macro expanded to float or double just for the benchmark)
/**
* Sin for 'small' angles, accurate on [-pi/2, pi/2], fairly accurate on [-pi, pi]
*/
// To switch between float and double
#define FLOAT float
FLOAT
my_sin_small(FLOAT x)
{
    constexpr FLOAT C1 = 1. / (7. * 6. * 5. * 4. * 3. * 2.);
    constexpr FLOAT C2 = -1. / (5. * 4. * 3. * 2.);
    constexpr FLOAT C3 = 1. / (3. * 2.);
    constexpr FLOAT C4 = -1.;
    // Correction for sin(pi/2) = 1, due to the ignored Taylor terms
    constexpr FLOAT corr = -1. / 0.9998431013994987;
    const FLOAT x2 = x * x;
    return corr * x * (x2 * (x2 * (x2 * C1 + C2) + C3) + C4);
}
So far so good... The problem comes when I try to scale an arbitrary angle into the +-pi/2 range. My current solution is:
FLOAT
my_sin(FLOAT x)
{
    constexpr FLOAT pi = 3.141592653589793238462;
    constexpr FLOAT rpi = 1 / pi;
    // convert to +-pi/2 range
    int n = std::nearbyint(x * rpi);
    FLOAT xbar = (n * pi - x) * (2 * (n & 1) - 1);
    // (2 * (n & 1) - 1) is a sign correction (see below)
    return my_sin_small(xbar);
}
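For reference, the two pieces above combine into a compilable check against the library sine (a sketch with FLOAT fixed to float; the functions are reproduced from the question):

```cpp
#include <cmath>

using FLOAT = float;

// Taylor-based sine, accurate on [-pi/2, pi/2] (from the question)
FLOAT my_sin_small(FLOAT x) {
    constexpr FLOAT C1 = 1. / (7. * 6. * 5. * 4. * 3. * 2.);
    constexpr FLOAT C2 = -1. / (5. * 4. * 3. * 2.);
    constexpr FLOAT C3 = 1. / (3. * 2.);
    constexpr FLOAT C4 = -1.;
    constexpr FLOAT corr = -1. / 0.9998431013994987;
    const FLOAT x2 = x * x;
    return corr * x * (x2 * (x2 * (x2 * C1 + C2) + C3) + C4);
}

// Range reduction into [-pi/2, pi/2] with odd/even sign correction
FLOAT my_sin(FLOAT x) {
    constexpr FLOAT pi = 3.141592653589793238462;
    constexpr FLOAT rpi = 1 / pi;
    int n = std::nearbyint(x * rpi);               // nearest multiple of pi
    FLOAT xbar = (n * pi - x) * (2 * (n & 1) - 1); // folded argument
    return my_sin_small(xbar);
}
```

Comparing my_sin against std::sin at a few points outside [-pi/2, pi/2] confirms the reduction and sign correction behave as intended.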
I made a benchmark, and I'm losing a lot for the +-pi/2 scaling.
Tricking with int(angle/pi + 0.5) is a no-go since it is limited to int precision; it also requires +/- branching, and I try to avoid branches...
What should I try to improve the performance for this scaling? I'm out of ideas.
Benchmark results for float (in the benchmark the angle can fall outside the validity range of my_sin_small, but for the bench I don't care about that): [benchmark plot omitted]
Benchmark results for double: [benchmark plot omitted]
Sign correction for xbar in my_sin(): [figure omitted]
Algorithm accuracy compared to Python's sin() function: [accuracy plot omitted]
Candidate improvements
Convert the radians x to rotations by dividing by 2*pi.
Retain only the fraction so we have an angle in (-1.0 ... 1.0). This simplifies the OP's modulo step to a simple "drop the whole number" step. Going forward, using different angle units simply involves a coefficient-set change; there is no need to scale back to radians.
For positive values, subtract 0.5 so we have (-0.5 ... 0.5) and then flip the sign. This centers the possible values about 0.0 and makes for better convergence of the approximating polynomial as compared to the math sine function. For negative values - see below.
Call my_sin_small1() that uses this (-0.5 ... 0.5) rotations range rather than [-pi ... +pi] radians.
In my_sin_small1(), fold constants together to drop the corr * step.
Rather than use the truncated Taylor series, use a more optimal coefficient set. IMO, this will provide better answers, especially near +/-pi.
Notes: No int to/from float code. With more analysis, possible to get a better set of coefficients that fix my_sin(+/-pi) closer to 0.0. This is just a quick set of code to demo less FP steps and good potential results.
C like code for OP to port to C++
FLOAT my_sin_small1(FLOAT x) {
    static const FLOAT A1 = -5.64744881E+01;
    static const FLOAT A2 = +7.81017968E+01;
    static const FLOAT A3 = -4.11145353E+01;
    static const FLOAT A4 = +6.27923581E+00;
    const FLOAT x2 = x * x;
    return x * (x2 * (x2 * (x2 * A1 + A2) + A3) + A4);
}

FLOAT my_sin1(FLOAT x) {
    static const FLOAT pi = 3.141592653589793238462;
    static const FLOAT pi2i = 1 / (pi * 2);
    x *= pi2i;
    FLOAT xfraction = 0.5f - (x - truncf(x));
    return my_sin_small1(xfraction);
}
For negative values, use -my_sin1(-x) or like code to flip the sign - or add 0.5 in the above minus 0.5 step.
Test
#include <math.h>
#include <stdio.h>
int main(void) {
    for (int d = 0; d <= 360; d += 20) {
        FLOAT x = d / 180.0 * M_PI;
        FLOAT y = my_sin1(x);
        printf("%12.6f %11.8f %11.8f\n", x, sin(x), y);
    }
}
Output
0.000000 0.00000000 -0.00022483
0.349066 0.34202013 0.34221691
0.698132 0.64278759 0.64255589
1.047198 0.86602542 0.86590189
1.396263 0.98480775 0.98496443
1.745329 0.98480775 0.98501128
2.094395 0.86602537 0.86603642
2.443461 0.64278762 0.64260530
2.792527 0.34202022 0.34183803
3.141593 -0.00000009 0.00000000
3.490659 -0.34202016 -0.34183764
3.839724 -0.64278757 -0.64260519
4.188790 -0.86602546 -0.86603653
4.537856 -0.98480776 -0.98501128
4.886922 -0.98480776 -0.98496443
5.235988 -0.86602545 -0.86590189
5.585053 -0.64278773 -0.64255613
5.934119 -0.34202036 -0.34221727
6.283185 0.00000017 -0.00022483
Alternate code below makes for better results near 0.0, yet might cost a tad more time. OP seems more inclined toward speed.
FLOAT xfraction = 0.5f - (x - truncf(x));
// vs.
FLOAT xfraction = x - truncf(x);
if (xfraction >= 0.5f) xfraction -= 1.0f;
[Edit]
Below is a better set with about 10% reduced error.
-56.0833765f
77.92947047f
-41.0936875f
6.278635918f
Yet another approach:
Spend more time (code) to reduce the range to ±pi/4 (±45 degrees); then it is possible to use only 3 or 2 terms of a polynomial that is like the usual Taylor series.
float sin_quick_small(float x) {
    const float x2 = x * x;
#if 0
    // max error about 7e-7
    static const FLOAT A2 = +0.00811656036940792f;
    static const FLOAT A3 = -0.166597759850666f;
    static const FLOAT A4 = +0.999994132743861f;
    return x * (x2 * (x2 * A2 + A3) + A4);
#else
    // max error about 0.00016
    static const FLOAT A3 = -0.160343346851626f;
    static const FLOAT A4 = +0.999031566686144f;
    return x * (x2 * A3 + A4);
#endif
}
float cos_quick_small(float x) {
    return cosf(x); // TBD code.
}

float sin_quick(float x) {
    if (x < 0.0) {
        return -sin_quick(-x);
    }
    int quo;
    float x90 = remquof(fabsf(x), 3.141592653589793238462f / 2, &quo);
    switch (quo % 4) {
        case 0:
            return sin_quick_small(x90);
        case 1:
            return cos_quick_small(x90);
        case 2:
            return sin_quick_small(-x90);
        case 3:
            return -cos_quick_small(x90);
    }
    return 0.0;
}
int main() {
    float max_x = 0.0;
    float max_error = 0.0;
    for (int d = -45; d <= 45; d += 1) {
        FLOAT x = d / 180.0 * M_PI;
        FLOAT y = sin_quick(x);
        double err = fabs(y - sin(x));
        if (err > max_error) {
            max_x = x;
            max_error = err;
        }
        printf("%12.6f %11.8f %11.8f err:%11.8f\n", x, sin(x), y, err);
    }
    printf("x:%.6f err:%.6f\n", max_x, max_error);
    return 0;
}

How to avoid overflow in expression (A - B * C) / D?

I need to compute an expression of the form (A - B * C) / D, where the operands are declared as: signed long long int A, B, C, D; Each number can be really big (not overflowing its type). While B * C could overflow, the full expression (A - B * C) / D would still fit. How can I compute it correctly?
For example: in the equation A*x + B*y = C, the product A * x may overflow, but y = (C - A * x) / B could still fit. There is no need to use BigInteger or the double data type.
You can transform the equation to do the division first while accounting for the remainders:
Assume / is integer division and everything is infinite precision:
x == x / y * y + x % y
(A - B * C) / D
((A/D * D + (A%D)) - (B/D * D + (B%D)) * (C/D * D + (C%D))) / D
(A/D * D - B/D * D * C/D * D - (B/D * D * (C%D) + (B%D) * C/D * D) + (A%D) - (B%D) * (C%D)) / D
(A/D * D - B/D * D * C/D * D) / D - (B/D * D * (C%D) + (B%D) * C/D * D) / D + ((A%D) - (B%D) * (C%D)) / D
(A/D - B/D * C/D * D) - (B/D * (C%D) + (B%D) * C/D) + ((A%D) - (B%D) * (C%D)) / D
A/D - B/D * C - (B%D) * C/D + ((A%D) - (B%D) * (C%D)) / D
Assuming D is neither too small nor too big, x / D and x % D are small, and we can do this:
using T = signed long long int;

T compute(T a, T b, T c, T d) {
    T a1 = a / d, a2 = a % d;
    T b1 = b / d, b2 = b % d;
    T c1 = c / d, c2 = c % d;
    // b * c == (b1 * c + b2 * c1) * d + b2 * c2
    T m1 = b1 * c, m2 = b2 * c1, m3 = b2 * c2;
    T s1 = a1 - m1 - m2, s2 = a2 - m3;
    return s1 + s2 / d;
}
The critical part is the multiplications for m1 and m2. The range of numbers b and c that overflow while the result would still have fit is rather small, I believe.
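A quick sanity check of the divide-first idea (a sketch; the example values are mine, and it assumes the GCC/Clang __int128 extension purely to produce an exact reference value):

```cpp
using T = long long;

// Split-and-divide-first version of (a - b * c) / d
T compute(T a, T b, T c, T d) {
    T a1 = a / d, a2 = a % d;
    T b1 = b / d, b2 = b % d;
    T c1 = c / d, c2 = c % d;
    // b * c == (b1 * c + b2 * c1) * d + b2 * c2
    T m1 = b1 * c, m2 = b2 * c1, m3 = b2 * c2;
    T s1 = a1 - m1 - m2, s2 = a2 - m3;
    return s1 + s2 / d;
}

// Exact reference via 128-bit arithmetic (GCC/Clang extension)
T reference(T a, T b, T c, T d) {
    return (T)(((__int128)a - (__int128)b * c) / d);
}
```

In the test below, b * c is about 1.6e19, which overflows a 64-bit signed integer, yet the final quotient fits comfortably.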
I think you could change the order of the operations so it will look like:
A/D - (B/D)*C
Note, however, that with integer division this is only exact when D divides A and B evenly; otherwise the discarded remainders change the result.
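For what it's worth, a tiny example (values mine) showing that dividing first can change the result when the remainders are not zero:

```cpp
using T = long long;

// True value of (a - b * c) / d, computed directly (inputs small enough not to overflow)
T exact(T a, T b, T c, T d) { return (a - b * c) / d; }

// Reordered version that divides before subtracting
T reordered(T a, T b, T c, T d) { return a / d - (b / d) * c; }
```

With a = 7, b = 3, c = 5, d = 2 the exact value is (7 - 15) / 2 = -4, but the reordered form gives 7/2 - (3/2)*5 = 3 - 5 = -2.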
Since you mentioned gcd, let's try that as an alternative answer:
using T = signed long long int;

T gcd(T a, T b); // e.g. std::gcd, or Euclid's algorithm

T compute(T a, T b, T c, T d) {
    // use gcd for (b * c) / d
    T db = gcd(b, d);
    T dc = gcd(c, d / db);
    T dm = db * dc;
    // use gcd on a for (a - b * c) / dm
    T da = gcd(a, dm);
    // split da for use on (b * c) / da
    db = gcd(b, da);
    dc = da / db;
    T dr = d / da;
    return ((a / da) - (b / db) * (c / dc)) / dr;
}
Only works if d has enough factors in common with a, b and c.
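Here is a sketch of that approach in use, with a plain Euclid gcd filled in (the example values are mine, chosen so every division is exact even though b * c overflows long long):

```cpp
using T = long long;

// Euclid's algorithm, made safe for negative inputs
T gcd(T a, T b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { T t = a % b; a = b; b = t; }
    return a;
}

// Cancel common factors of d against b, c, and a before multiplying
T compute(T a, T b, T c, T d) {
    T db = gcd(b, d);
    T dc = gcd(c, d / db);
    T dm = db * dc;
    T da = gcd(a, dm);
    db = gcd(b, da);
    dc = da / db;
    T dr = d / da;
    return ((a / da) - (b / db) * (c / dc)) / dr;
}
```

With a = 1e18, b = 6e9, c = 8e9, d = 1.2e10, the product b * c = 4.8e19 would overflow, but after cancellation every intermediate fits and the result matches (1e18 - 4.8e19) / 1.2e10.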

warning: '*' in boolean context, suggest '&&' instead [-Wint-in-bool-context]?

I've got this code (which is pretty much all float):
#define sincf(x) (x == 0.0f) ? (1.0f) : (sinf(M_PI * x) / (M_PI * x))
// ...
for (int i = 0; i < num_taps; i++)
    proto[i] = 2.0f * f * sincf(2.0f * f * (i - m / 2.0f));
// ...
Why does gcc say warning: '*' in boolean context, suggest '&&' instead [-Wint-in-bool-context] for the second *?
f is float
proto is float*
i and m are int
After the macro has been substituted, this part
2.0f * f * sincf(2.0f * f * (i - m / 2.0f));
becomes
2.0f * f * (2.0f * f * (i - m / 2.0f) == 0.0f) ? ...
and according to operator precedence, the multiplication 2.0f * f * condition will be done before checking if the condition is true (with ?). Like so:
(2.0f * f * (2.0f * f * (i - m / 2.0f) == 0.0f)) ? ...
The quick fix:
#define sincf(x) (((x) == 0.0f) ? (1.0f) : (sinf(M_PI * (x)) / (M_PI * (x))))
(x) == 0.0f will rarely be true but since it's only used to avoid division by zero, that's probably fine.
Now, this could easily be rewritten as a function and I suggest doing that. Example:
template<class T>
T sinc(T x) {
    if(x == T{}) return {1}; // avoid division by zero
    auto pix = static_cast<T>(M_PI) * x;
    return std::sin(pix) / pix;
}
One could also cast x to double if T is an integral type. Here's a C++20 version of that:
#include <concepts> // std::integral
#include <numbers> // std::numbers::pi_v
template<class T>
T sinc(T x) {
    if(x == T{}) return 1; // avoid division by zero
    // C++20 added some constants to the standard library:
    auto pix = std::numbers::pi_v<T> * x;
    return std::sin(pix) / pix;
}

double sinc(std::integral auto x) {
    return sinc<double>(x);
}
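A quick usage sketch of the function form (using the pre-C++20 template variant, since M_PI needs no extra headers on POSIX systems): sinc(0) hits the guard, and sinc(0.5) should be sin(pi/2) / (pi/2) = 2/pi:

```cpp
#include <cmath>

// Normalized sinc as a function template instead of a macro
template<class T>
T sinc(T x) {
    if (x == T{}) return T{1};          // avoid division by zero
    T pix = static_cast<T>(M_PI) * x;   // argument evaluated exactly once
    return std::sin(pix) / pix;
}
```

Unlike the macro, the argument is evaluated once, so sinc(f(x)) never calls f three times.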

C++ What is wrong with this version of the quadratic formula?

My book asks me the following question (this is for CompSci 1):
What is wrong with this version of the quadratic formula?
x1 = (-b - sqrt(b * b - 4 * a * c)) / 2 * a;
x2 = (-b + sqrt(b * b - 4 * a * c)) / 2 * a;
The equation your code is translating is:
x = (-b ± sqrt(b^2 - 4ac)) / 2 * a
which of course is not the solution for quadratic equations. You want a solution for this equation:
x = (-b ± sqrt(b^2 - 4ac)) / (2a)
What's the difference? In the first one you compute the numerator, then divide it by two, then multiply the result by a; that's what your code is doing. In the second one you compute the numerator, then you compute the denominator, and finally you divide them.
So with additional variables:
num1 = -b - sqrt(b * b - 4 * a * c);
num2 = -b + sqrt(b * b - 4 * a * c);
den = 2 * a;
x1 = num1 / den;
x2 = num2 / den;
which can of course be written as:
x1 = (-b - sqrt(b * b - 4 * a * c)) / (2 * a);
x2 = (-b + sqrt(b * b - 4 * a * c)) / (2 * a);
Where you have to add those parentheses in order to force the denominator to be computed before the division. As suggested in the comment by #atru.
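A tiny numeric check (values mine): for 2x^2 - 8x + 6 = 0 the roots are 1 and 3, which only the parenthesized version reproduces:

```cpp
#include <cmath>

// Buggy: "/ 2 * a" divides by 2 and then multiplies by a
double root_bad(double a, double b, double c, int sign) {
    return (-b + sign * std::sqrt(b * b - 4 * a * c)) / 2 * a;
}

// Fixed: "/ (2 * a)" divides by the full denominator
double root_good(double a, double b, double c, int sign) {
    return (-b + sign * std::sqrt(b * b - 4 * a * c)) / (2 * a);
}
```

With a = 2, b = -8, c = 6 the discriminant is 16, so the fixed version gives (8 ∓ 4) / 4 = 1 and 3, while the buggy one gives (8 ∓ 4) / 2 * 2 = 4 and 12.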

Variable grouping providing different answers in optimized code

I've been attempting to unit test a C++ class I've written for Geodetic transforms.
I've noticed that a trivial grouping change of three variables greatly influences the error in the function.
EDIT: Here is the entire function for a compilable example. Assume latitude, longitude and altitude are zero, Earth::a = 6378137, and Earth::b = 6356752.3. I'm working on getting benchmark numbers, but something came up at work today and I had to do that instead.
void Geodesy::Geocentric2EFG(double latitude, double longitude, double altitude, double *E, double *F, double *G) {
    double a2 = pow<double>(Earth::a, 2);
    double b2 = pow<double>(Earth::b, 2);
    double radius = sqrt((a2 * b2) / (a2 * pow<double>(sin(latitude), 2) + b2 * pow<double>(cos(longitude), 2)));
    radius += altitude;
    *E = radius * (cos(latitude) * cos(longitude));
    *F = radius * (cos(latitude) * sin(longitude));
    *G = radius * sin(latitude);
    return;
}
Where all values are defined as double including those in Earth. The pow<T>() function is a recursive template function defined by:
template <typename T>
static inline T pow(const T &base, unsigned const exponent) {
    return (exponent == 0) ? 1 : (base * pow(base, exponent - 1));
}
The code in question:
*E = radius * cos(latitude) * cos(longitude);
*F = radius * cos(latitude) * sin(longitude);
produces different results than:
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
What is the compiler doing in gcc with optimization level 3 to make these results 1e-2 different?
You get different rounding because floating point cannot represent all numbers:
a * b * c is (a * b) * c, which may differ from a * (b * c).
You may have similar issues with addition too.
An example with addition:
10e10f + 1.f == 10e10f
so (1.f + 10e10f) - 10e10f == 10e10f - 10e10f == 0.f,
whereas 1.f + (10e10f - 10e10f) == 1.f + 0.f == 1.f.
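The addition example above, as compilable assertions (1.f is far below half an ulp of 10e10f, so it vanishes when it is added first):

```cpp
// Grouping matters: float carries about 7 significant decimal digits,
// so adding 1.f to 10e10f cannot change the stored value.
float sum_left()  { return (1.f + 10e10f) - 10e10f; } // 1.f is absorbed -> 0.f
float sum_right() { return 1.f + (10e10f - 10e10f); } // 1.f survives    -> 1.f
```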