C Language: Initializing float variable with a calculation - c++

I haven't done a lot of floating point math programming in any language let alone in C.
I'm writing a temperature conversion program as an exercise and have a question about floating point numbers. I have a code frag as listed below. In both cases Temp1 and Temp2 are 0.0 when P_FahrenheitTemp is <> 32.0. However, if I use the CF3 factor in the calculation the LOC VERRKKKS!!! :-)
This seems intuitively obvious to me but... Is this compiler dependent or is a cast operator necessary on the initialization? BTW, I'm writing this code on an IBM iSeries platform using the C/C++ compiler which strictly adheres to ANSI and ISO standards.
Thank you in advance for any info!
Martin Kuester
#define CF3 5/9;
float Conv2Celsius(float P_FahrenheitTemp)
{
float Temp1, Temp2, Temp3;
float ConvAdj = 32.0;
float CF1 = 0.555556;
float CF2 = 5/9;
//[°C] = ([°F] - 32) × 5/9
Temp1 = (P_FahrenheitTemp - ConvAdj) * CF1;
Temp2 = (P_FahrenheitTemp - ConvAdj) * CF2;
Temp3 = (P_FahrenheitTemp - ConvAdj) * CF3;
return(Temperature);
}

Let us look closer.
float CF1 = 0.555556;
Temp1 = (P_FahrenheitTemp - ConvAdj) * CF1;
// same as
Temp1 = (P_FahrenheitTemp - ConvAdj) * (float) 0.555556;
float CF2 = 5/9;
Temp2 = (P_FahrenheitTemp - ConvAdj) * CF2;
// same as
float CF2 = 0; // 5/9 is integer division
Temp2 = (P_FahrenheitTemp - ConvAdj) * 0;
#define CF3 5/9
Temp3 = (P_FahrenheitTemp - ConvAdj) * CF3;
// same as
Temp3 = (P_FahrenheitTemp - ConvAdj) * 5 / 9;
// same as
Temp3 = (P_FahrenheitTemp - ConvAdj) * 5.0f / 9;
// ^--- float multiplication -------^
// same as
Temp3 = (P_FahrenheitTemp - ConvAdj) * 5.0f / 9.0f;
// ^--- float division -----------------------^
Temp3 "VERRKKKS" because CF3 is not evaluated as the value 5/9. Instead, the macro is substituted as text into the line of code, so the expression multiplies by 5 and then divides by 9.
Temp3 is correct and the best of the three.
Temp1 is almost correct, but multiplying by (float) 0.555556 is not as precise as multiplying by 5.0f/9.0f.
Temp2 is wrong: the answer is always 0, even when it should not be.
The question says: "In both cases Temp1 and Temp2 are 0.0 when P_FahrenheitTemp is <> 32.0."
Temp1 is not 0.0.
To set aside the minor additional error in the constant, use at least 9 significant digits with float and an f suffix.
//float CF1 = 0.555556;
float CF1 = 0.555555556f;
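The effect of the extra digits can be checked directly. The following is only an illustrative sketch: the helper names are made up for this example, and it assumes float is IEEE 754 single precision.

```c
/* Hypothetical helpers: each returns one candidate conversion factor as a float. */
float cf_six_digits(void)  { return 0.555556f; }     /* the 6-digit constant from the question */
float cf_nine_digits(void) { return 0.555555556f; }  /* 9 significant digits */
float cf_quotient(void)    { return 5.0f / 9.0f; }   /* float division */

/* Returns 1 when the two floats compare equal, 0 otherwise. */
int same_float(float a, float b) { return a == b; }
```

Under that assumption, same_float(cf_nine_digits(), cf_quotient()) should be 1, while same_float(cf_six_digits(), cf_quotient()) should be 0, because the 6-digit constant is several float steps away from 5/9.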
Suggested replacement
float Conv2Celsius(float P_FahrenheitTemp) {
float ConvAdj = 32.0f;
float CF = 5.0f/9.0f; // or 0.555555556f
//[°C] = ([°F] - 32) × 5/9
return (P_FahrenheitTemp - ConvAdj) * CF;
}
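Because the macro is plain text substitution, whether it "works" depends on where it lands in the expression. Here is a minimal sketch of that, with hypothetical macro and function names (CF3_RAW, CF3_SAFE and the scale_* helpers are not from the question). The stray semicolon from the question's #define is left out, since it would break any use in the middle of an expression.

```c
/* Unparenthesized integer form, as in the question (minus the semicolon). */
#define CF3_RAW 5/9
/* Parenthesized floating-point form; the name is hypothetical. */
#define CF3_SAFE (5.0f / 9.0f)

/* Expands to delta * 5 / 9: float multiply, then float divide. */
float scale_raw_last(float delta)  { return delta * CF3_RAW; }

/* Expands to 5 / 9 * delta: integer division happens first, so the result is 0. */
float scale_raw_first(float delta) { return CF3_RAW * delta; }

/* Operand order no longer matters: the macro is a single parenthesized float value. */
float scale_safe(float delta)      { return CF3_SAFE * delta; }
```

For a difference of 180 degrees Fahrenheit, scale_raw_last gives 100, scale_raw_first gives 0, and scale_safe gives approximately 100. Parenthesizing the macro and using floating-point literals removes the dependence on operand order.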

In C, the line float CF2 = 5/9; is processed as follows:
The right side of the assignment, 5/9, is evaluated first. Both operands are integers, so the compiler performs integer division and the result is an int. The fractional part of the true result 0.555556 is truncated, leaving 0.
That 0 is then converted to float and assigned to CF2.
What to do?
There are several options: float CF2 = 5.0/9; or float CF2 = (float)5/9; or even float CF2 = 5./9;
The same applies to CF3.
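The options above can be compared side by side. This is a small sketch with hypothetical function names, not code from the question:

```c
/* Each function returns 5/9 written a different way. */
float ratio_int(void)        { return 5 / 9; }        /* int / int gives 0, then converted to 0.0f */
float ratio_double_lit(void) { return 5.0 / 9; }      /* double / int: 9 is converted, double division */
float ratio_cast(void)       { return (float)5 / 9; } /* the cast applies to 5 only, then float / int */
float ratio_float_lit(void)  { return 5.0f / 9.0f; }  /* float / float */

/* Fahrenheit to Celsius using the float-literal form. */
float fahrenheit_to_celsius(float fahrenheit)
{
    return (fahrenheit - 32.0f) * (5.0f / 9.0f);
}
```

Only ratio_int returns 0; the other three return roughly 0.5556. Note that (float)5/9 works because the cast binds more tightly than the division, whereas (float)(5/9) would still be 0.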

Related

Fast approximate float division

On modern processors, float division is a good order of magnitude slower than float multiplication (when measured by reciprocal throughput).
I'm wondering if there are any algorithms out there for computing a fast approximation to x/y, given certain assumptions and tolerance levels. For example, if you assume that 0<x<y, and are willing to accept any output that is within 10% of the true value, are there algorithms faster than the built-in FDIV operation?
I hope that this helps, because this is probably as close as you're going to get to what you are looking for.
__inline__ double __attribute__((const)) divide( double y, double x ) {
// calculates y/x
union {
double dbl;
unsigned long long ull;
} u;
u.dbl = x; // x = x
u.ull = ( 0xbfcdd6a18f6a6f52ULL - u.ull ) >> (unsigned char)1;
// pow( x, -0.5 )
u.dbl *= u.dbl; // pow( pow(x,-0.5), 2 ) = pow( x, -1 ) = 1.0/x
return u.dbl * y; // (1.0/x) * y = y/x
}
See also:
Another post about reciprocal approximation.
The Wikipedia page.
FDIV is usually much slower than FMUL because it cannot be pipelined like multiplication and needs multiple clock cycles for an iterative convergence process in hardware.
The easiest way is to recognize that division is nothing more than multiplying the dividend y by the inverse of the divisor x. The less straightforward part is remembering that a float value x = m * 2^e has the inverse x^-1 = (1/m)*2^(-e) = (2/m)*2^(-e-1) = p * 2^q, and approximating this new mantissa as p = 2/m ≈ 3 - m, for 1<=m<2. This gives a rough piece-wise linear approximation of the inverse function; however, we can do a lot better by using an iterative Newton root-finding method to improve that approximation.
Let w = f(x) = 1/x. The inverse of f is found by solving for x in terms of w: x = f^(-1)(w) = 1/w. To improve the output with the root-finding method, we first need a function whose zero is the desired output, i.e. g(w) = 1/w - x, with derivative g'(w) = -1/w^2.
w[n+1]= w[n] - g(w[n])/g'(w[n]) = w[n] + w[n]^2 * (1/w[n] - x) = w[n] * (2 - x*w[n])
w[n+1] = w[n] * (2 - x*w[n]), when w[n]=1/x, w[n+1]=1/x*(2-x*1/x)=1/x
Combining these components gives the final piece of code:
#include <stdint.h>
float inv_fast(float x) {
union { float f; uint32_t i; } v;
float w, sx;
sx = (x < 0) ? -1:1;
x = sx * x;
v.f = x;
v.i = 0x7EF127EAu - v.i; // initial guess from the bit pattern of x (the union avoids a pointer-cast aliasing violation)
w = x * v.f;
// Efficient Iterative Approximation Improvement in horner polynomial form.
v.f = v.f * (2 - w); // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
// v.f = v.f * ( 4 + w * (-6 + w * (4 - w))); // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
// v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w))))))); // Third Iteration, Err = +-6.8e-8 * 2^(-flr(log2(x)))
return v.f * sx;
}
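The Newton update w[n+1] = w[n] * (2 - x*w[n]) can also be tried on its own, separately from the bit-level initial guess. The sketch below is an assumption-laden illustration: the function name and the choice of double are mine, and the caller must supply a starting guess w0 with 0 < x*w0 < 2, otherwise the iteration does not converge.

```c
/* Refines an initial guess w0 for 1/x with the Newton update
   w[n+1] = w[n] * (2 - x * w[n]).
   With e = 1 - x*w, each step squares the error term: e_next = e * e. */
double newton_reciprocal(double x, double w0, int iterations)
{
    double w = w0;
    for (int n = 0; n < iterations; ++n) {
        w = w * (2.0 - x * w);
    }
    return w;
}
```

For example, with x = 3 and w0 = 0.3 the error term 1 - x*w shrinks as 0.1, 0.01, 1e-4, 1e-8, and so on, which is why one or two iterations are already enough in inv_fast. Starting exactly at 1/x, as with newton_reciprocal(4.0, 0.25, 3), leaves the value unchanged.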

Variable grouping providing different answers in optimized code

I've been attempting to unit test a C++ class I've written for Geodetic transforms.
I've noticed that a trivial grouping change of three variables greatly influences the error in the function.
EDIT : Here is the entire function for a compilable example:
Assume latitude, longitude and altitude are zero, with Earth::a = 6378137 and Earth::b = 6356752.3. I'm working on getting benchmark numbers; something came up at work today and I had to do that instead.
void Geodesy::Geocentric2EFG(double latitude, double longitude, double altitude, double *E, double *F, double *G) {
double a2 = pow<double>(Earth::a, 2);
double b2 = pow<double>(Earth::b, 2);
double radius = sqrt((a2 * b2)/(a2 * pow<double>(sin(latitude), 2) + b2 * pow<double>(cos(longitude), 2)));
radius += altitude;
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
*G = radius * sin(latitude);
return;
}
Where all values are defined as double including those in Earth. The pow<T>() function is a recursive template function defined by:
template <typename T>
static inline T pow(const T &base, unsigned const exponent) {
return (exponent == 0) ? 1 : (base * pow(base, exponent - 1));
}
The code in question:
*E = radius * cos(latitude) * cos(longitude);
*F = radius * cos(latitude) * sin(longitude);
produces different results than:
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
What is the compiler doing in gcc with optimization level 3 to make these results 1e-2 different?
The two groupings round differently, because floating point cannot represent every number exactly:
a * b * c; is (a * b) * c, which may differ from a * (b * c).
You may have similar issues with addition too.
example with addition:
10e10f + 1.f == 10e10f
so (1.f + 10e10f) - 10e10f == 10e10f - 10e10f == 0.f
whereas 1.f + (10e10f - 10e10f) == 1.f + 0.f == 1.f.
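The addition example can be reproduced with a short sketch. The function names are hypothetical. It assumes IEEE 754 single-precision float and a compiler that does not reassociate floating-point operations (so no -ffast-math or similar).

```c
/* Adds left to right: (a + b) + c. Storing the intermediate in a float
   variable rounds it to float precision before the second addition. */
float sum_left(float a, float b, float c)
{
    float t = a + b;
    return t + c;
}

/* Adds with the right pair grouped first: a + (b + c). */
float sum_right(float a, float b, float c)
{
    float t = b + c;
    return a + t;
}
```

Under those assumptions, sum_left(1.f, 10e10f, -10e10f) gives 0 because the 1 is lost when it is added to 10e10f, while sum_right(1.f, 10e10f, -10e10f) gives 1 because the large values cancel first.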

"double" does not print decimals

I was wondering why, in this program, "pi_estimated" doesn't print as a number with decimal places even though the variable is declared as a "double". Instead, it prints an integer.
double get_pi(double required_accuracy)
{
double pi_estimation=0.0;
int x,y;
double p=0.0,q=0.0,r=0.0;
int D=0;
for(int N=1;N<=1e2;N++)
{
x = rand()%100;
p = (x/50.0 - 1.0)/100.0;
y = rand()%100;
q = (y/50.0 - 1.0)/100.0;
r = p*p + q*q;
if((sqrt(r))<1.0)
{
D++;
pi_estimation = 4.0*(double (D/N));
}
if(double (4/(N+1)) < (required_accuracy*pi_estimation/100.0))
{
cout<<pi_estimation<<endl;
return (pi_estimation);
}
}
}
int main()
{
double pi_approx=0.0, a, actual_accuracy=0.0;
for(a=0.1;a>=1e-14;a/=10)
{
pi_approx = get_pi(a);
actual_accuracy = (fabs((pi_approx - M_PI)/(M_PI)))*100.0;
cout<<actual_accuracy<<endl;
}
}
This line is the culprit:
pi_estimation = 4.0*(double (D/N));
Since D and N are both ints, D/N is an int. Casting the int to a double cannot magically make decimals appear out of nowhere.
Here's the line, fixed:
pi_estimation = 4.0 * (((double) D) / N);
You could also multiply first, so you don't need so many parens:
pi_estimation = 4.0 * D / N;
D is multiplied by 4.0 first, so the product is a double, because double * int yields double. That double is then divided by N. Since (x * y) / z equals x * (y / z) mathematically, the two expressions are equivalent up to floating-point rounding.
The problem is here:
pi_estimation = 4.0*(double (D/N));
D and N are both integers, so D/N is an integer that you are casting to a double and then multiplying by 4.0.
You want to do this:
pi_estimation = 4.0 * (static_cast<double>(D) / N);
Since D and N are both integral types, D/N is performed in integer arithmetic; the cast to double happens too late as precision is lost prior to the cast.
One fix is to write 4.0 * D / N. This will ensure that everything is calculated in floating point. (Since * and / have the same precedence and group left to right, 4.0 * D is evaluated first, so you don't need to write (double).)
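The difference between casting after the division and promoting before it can be seen in isolation. This is a minimal C sketch with hypothetical function and parameter names. The same integer-division pattern also appears in the question's 4/(N+1) test, which would need the same treatment.

```c
/* Integer division happens first; the cast only sees the truncated int. */
double ratio_truncated(int hits, int trials)
{
    return 4.0 * (double)(hits / trials);
}

/* 4.0 * hits is evaluated first and is a double, so the division is floating point. */
double ratio_floating(int hits, int trials)
{
    return 4.0 * hits / trials;
}
```

With 78 hits out of 100 trials, ratio_truncated returns 0 and ratio_floating returns about 3.12. ratio_truncated only returns a nonzero value once hits reaches trials.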

Calculating distances but the result is - 2147483648

Below is the code to calculate the distance
// creating array of cities
double x[] = {21.0,12.0,15.0,3.0,7.0,30.0};
double y[] = {17.0,10.0,4.0,2.0,3.0,1.0};
// distance function - C = sqrt of A squared + B squared
One issue is that the order of operations is tripping you up (multiplication is done before subtraction).
Change
(x[c1] - x[c2] * x[c1] - x[c2]) + (y[c1] - y[c2] * y[c1] - y[c2])
to
((x[c1] - x[c2]) * (x[c1] - x[c2])) + ((y[c1] - y[c2]) * (y[c1] - y[c2]))
I would also recommend, just for clarity, doing some of those calculations on separate lines (clearly that's a style choice that I prefer, and I'm sure some would disagree). It should make no difference to the compiler, though.
double deltaX = x[c1] - x[c2];
double deltaY = y[c1] - y[c2];
double distance = sqrt(deltaX * deltaX + deltaY * deltaY);
In my opinion that makes for more maintainable (and less error prone, as in this instance) code. Note that, as rewritten, the order of operations does not require extra parentheses.
Remember operator precedence: a - b * c - d means a - (b * c) - d.
Do you want
(x[c1] - (x[c2] * x[c1]) - x[c2])
or
((x[c1] - x[c2]) * (x[c1] - x[c2]))
(x[c1] - x[c2] * x[c1] - x[c2]) is equivalent to (x[c1] - (x[c2] * x[c1]) - x[c2]) because * has higher precedence than -.
I am going to go ahead and fix a couple of issues:
// creating array of cities
double x[] = {21.0,12.0,15.0,3.0,7.0,30.0};
double y[] = {17.0,10.0,4.0,2.0,3.0,1.0};
// distance function - C = sqrt of A squared + B squared
double dist(int c1, int c2) {
double z = sqrt (
((x[c1] - x[c2]) * (x[c1] - x[c2])) + ((y[c1] - y[c2]) * (y[c1] - y[c2])));
return z;
}
int main()
{
int a[] = {1, 2, 3, 4, 5, 6};
execute(a, 0, sizeof(a)/sizeof(int));
int x;
printf("Type in a number \n");
scanf("%d", &x);
int y;
printf("Type in a number \n");
scanf("%d", &y);
double z = dist (x,y);
cout << "The result is " << z;
}
This fixes the unused return value, the order of operations, and the incorrect int variable type.
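To see how far apart the two groupings are, the sketch below evaluates both forms before any square root is taken. The function names are hypothetical, and the coordinates are passed in directly rather than read from the x and y arrays.

```c
/* The expression from the question: * binds tighter than -, so this is
   (ax - bx*ax - bx) + (ay - by*ay - by), not a sum of squares. */
double sum_without_parens(double ax, double ay, double bx, double by)
{
    return (ax - bx * ax - bx) + (ay - by * ay - by);
}

/* Squared Euclidean distance, with the differences computed first. */
double squared_distance(double ax, double ay, double bx, double by)
{
    double dx = ax - bx;
    double dy = ay - by;
    return dx * dx + dy * dy;
}
```

For the first two cities, (21, 17) and (12, 10), the unparenthesized form gives -406 while the squared distance is 130. Passing a negative value such as -406 to sqrt yields NaN rather than a distance, so the parenthesized form (or the separate delta variables) is what should be handed to sqrt.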

C++ Question on the pow function

I'm trying to get this expression to work. I'm pretty sure it's not the parentheses, because I counted all of them. Perhaps there's something I'm doing wrong involving the parameters of pow (x,y).
double calculatePeriodicPayment()
{
periodicPaymentcalc = (loan * ((interestRate / yearlyPayment))) / (1-((pow ((1+(interestRate / yearlyPayment)))),(-(yearlyPayment * numOfYearLoan))));
return periodicPaymentcalc;
}
Notice how much easier it is to figure out what the function is doing if you break each step up into pieces:
(I find it even easier if your variables match the source material, so I'll name my variables after the ones Wikipedia uses.)
// amortization calculator
// uses annuity formula (http://en.wikipedia.org/wiki/Amortization_calculator)
// A = (P x i) / (1 - pow(1 + i,-n))
// Where:
// A = periodic payment amount
// P = amount of principal
// i = periodic interest rate
// n = total number of payments
double calculatePeriodicPayment()
{
const double P = loan;
const double i = interestRate / yearlyPayment;
const double n = yearlyPayment * numOfYearLoan;
const double A = (P * i) / (1 - pow(1.0 + i, -n));
return A;
}
It's much easier to confirm that the logic of this function does what it should this way.
If you're curious, substituting my variable names in, your parentheses problem is as follows:
const double A = (P * i) / (1 - pow(1 + i)), -n; // <- this is how you have it
const double A = (P * i) / (1 - pow(1 + i, -n)); // <- this is how it should be
With this grouping, you're only passing one argument to pow, which is why the compiler says no overloaded function takes 1 arguments.
Edit: You mentioned I used more variables. However, your compiler will use temporary variables much like I did. Your complex statement will be broken up into pieces, and may look something like this:
double calculatePeriodicPayment()
{
const double temp1 = interestRate / yearlyPayment;
const double temp2 = loan * temp1;
const double temp3 = interestRate / yearlyPayment;
const double temp4 = 1.0 + temp3;
const double temp5 = yearlyPayment * numOfYearLoan;
const double temp6 = -temp5;
const double temp7 = pow(temp4, temp6);
const double temp8 = 1 - temp7;
const double temp9 = temp2 / temp8;
periodicPaymentcalc = temp9;
return periodicPaymentcalc;
}
Mine will also be broken up, and will look like:
double calculatePeriodicPayment()
{
const double P = loan;
const double i = interestRate / yearlyPayment;
const double n = yearlyPayment * numOfYearLoan;
const double temp1 = P * i;
const double temp2 = 1.0 + i;
const double temp3 = -n;
const double temp4 = pow(temp2, temp3);
const double temp5 = 1 - temp4;
const double temp6 = temp1 / temp5;
const double A = temp6;
return A;
}
Perhaps there are some optimizations that the compiler will use, such as noticing that your function uses interestRate / yearlyPayment twice and using the same temporary for both places, but there's no guarantee this will happen. Notice that we use pretty much the same number of variables in both of our functions. I just used more named variables, and fewer unnamed temporaries.
There's a misplaced bracket. Here's a fixed version:
periodicPaymentcalc = (loan * ((interestRate / yearlyPayment))) / (1 - ((pow ((1+(interestRate / yearlyPayment)),(-(yearlyPayment * numOfYearLoan))))));
Use an editor that highlights matching brackets to avoid this kind of error. Or simply create temporary variables to hold intermediate values.
periodicPaymentcalc = (loan * interestRate / yearlyPayment) /
(1.0 - pow (1.0 + interestRate / yearlyPayment, -yearlyPayment * numOfYearLoan));
Try that. I removed all the redundant parentheses too, as well as changing all literals to doubles, just for good measure.
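A version that takes its inputs as parameters instead of reading the globals makes the formula easy to check against known values. The following C sketch is an illustration only: the function names are hypothetical, and it assumes the number of payments is a whole number, so that pow(1 + i, -n) can be swapped for a plain loop. With a fractional n, pow from the math library would still be needed.

```c
/* Hypothetical stand-in for pow(1 + rate, -payments) when payments is a
   whole number: divides by (1 + rate) once per payment. */
double discount_factor(double rate, unsigned payments)
{
    double factor = 1.0;
    for (unsigned k = 0; k < payments; ++k) {
        factor /= (1.0 + rate);
    }
    return factor;
}

/* Annuity formula A = (P * i) / (1 - (1 + i)^(-n)).
   Assumes rate > 0 and payments > 0; otherwise the denominator is zero. */
double periodic_payment(double principal, double rate, unsigned payments)
{
    return (principal * rate) / (1.0 - discount_factor(rate, payments));
}
```

A quick sanity check is the single-payment case, where the formula reduces to P * (1 + i): periodic_payment(1000.0, 0.01, 1) should be very close to 1010. For 12 payments at 1% per period on a principal of 1000, the payment comes out at roughly 88.85.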