installment - float error C2296 - c++

I have this function:
float ObliczRate(float fKwotaKredytu, float fOprocentowanie, int iIloscRat)
{
/*
Rata = K * y^n * (y-1) / (y^n-1);
y = 1 + (r / 12)
*/
float fRata, float fY;
fY = 1 + (fOprocentowanie / 12); // obliczanie stałej kredytu
fRata = fKwotaKredytu * fY^iIloscRat * (fY - 1) / (fY^iIloscRat - 1); // obliczanie raty stałej ze wzoru
return fRata;
}
And i have error: "error C2296: '^' : illegal, left operand has type 'float'" only on "(fY^iIloscRat - 1)". What's wrong with this?

It's because ^ is not an exponential operator, but the bitwise XOR operator. You want the std::pow function.

In C++ the operator ^ has a meaning of bitwise XOR operator not the power operation. You will have to use the pow function defined in the cmath header here.

In C++ you have to use a function pow to do a power operation.
The signature looks like this:
float pow( float base, float exp );
You can find it here

Please use the pow function for this purpose. Below is the link to the example
http://www.cplusplus.com/reference/cmath/pow/

Related

What is the correct way to use C++ style casts to perform an expression at a desired precision?

Given the following:
int a = 10, b = 5, c = 3, d = 1;
int x = 3, y = 2, z = 2;
return (float) a/x + b/y + c/z + d;
This presumably casts our precision to float and then performs our sequence of divisions at floating point precision.
What is the correct way to update this using C++ style casts?
Should this really be rewritten as:
return static_cast<float>(a) / static_cast<float>(b) + ... ?
Start by correcting your code:
(float) a/x + b/y + c/z + d
produces 7.33333, while the correct result is 8.33333. Why? because b/y and c/z divisions are done in ints (demo).
The reason the result is incorrect is that division takes precedence over addition: your program needs to divide b by y and c by z before adding them to the result of division of a by x, which is float.
You need to cast one of the division operands to get this to work correctly. C cast works fine, but if you would rather use C++-style cast, here is how you can do it:
return static_cast<float>(a) / b + static_cast<float>(b) / y +
static_cast<float>(c) / z + d;
/ has higher precedence than +, so b/y will be performed in int, not in float.
The correct way to perform each division in float is to cast at least one operand to float:
static_cast<float>(a)/x + static_cast<float>(b)/y + static_cast<float>(c)/z + d
This is clearer than the equivalent C expression:
(float) a/x + (float) b/y + (float) c/z + d
Here one requires knowledge of precedence to realise that the cast to float binds tighter than the division.
return (float) a/x + b/y + c/z + d;
is not correct if you want to return the float value of sum of all divisions. In above expression only a/x is float division and rest of them are int division (because of heiger precedence of / operator than +) which will result in value truncation. Better to stick with
return (double)a/x + (double)b/y + (double)c/z + d;
int a = 10, b = 5, c = 3, d = 1;
int x = 3, y = 2, z = 2;
return (float) a/x + b/y + c/z + d;
This presumably casts our precision to float and then performs our sequence of divisions at floating point precision.
No, it casts a to float and so a/x is performed as a floating point divide, but b/y and c/z are integer divides. Afterwards, the sums are computed after converting the integer division results to float.
This is because casts are simply another operator, and they have higher precedence than + and /. Dividing float by an int or adding a float to an int causes the ints to be automatically converted to floats.
If you want floating point division then you need to insert casts so that they are applied prior to the divisions, and then the other values get automatically promoted.
return (float) a/x + (float) b/y + (float) c/z + d;
Casting using C++ syntax is exactly the same, except the syntax won't let you get confused about what's actually being cast:
return static_cast<float>(a)/x + static_cast<float>(b)/y + static_cast<float>(c)/z + d;
You can also use constructor syntax, which also has the benefit of clearly showing what's cast:
return float(a)/x + float(b)/y + float(c)/z + d;
Or you can simply use temporary variables:
float af = a, bf = b, cf = c;
return af/x + bf/y + cf/z + d;
The cast is only necessary with division operation. And you can lighten syntax this way:
return 1.0*a/x + 1.0*b/y + 1.0*c/z + d;
This will compute the result as double type, that gets automatically casted to float if the function returns this type.

Variable grouping providing different answers in optimized code

I've been attempting to unit test a C++ class I've written for Geodetic transforms.
I've noticed that a trivial grouping change of three variables greatly influences the error in the function.
EDIT : Here is the entire function for a compilable example:
Assume latitude, longitude and altitude are zero. Earth::a = 6378137 and Earth::b = 6356752.3 I'm working on getting benchmark numbers, something came up at work today and I had to do that instead.
void Geodesy::Geocentric2EFG(double latitude, double longitude, double altitude, double *E, double *F, double *G) {
double a2 = pow<double>(Earth::a, 2);
double b2 = pow<double>(Earth::b, 2);
double radius = sqrt((a2 * b2)/(a2 * pow<double>(sin(latitude), 2) + b2 * pow<double>(cos(longitude), 2)));
radius += altitude;
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
*G = radius * sin(latitude);
return;
}
Where all values are defined as double including those in Earth. The pow<T>() function is a recursive template function defined by:
template <typename T>
static inline T pow(const T &base, unsigned const exponent) {
return (exponent == 0) ? 1 : (base * pow(base, exponent - 1));
}
The code in question:
*E = radius * cos(latitude) * cos(longitude);
*F = radius * cos(latitude) * sin(longitude);
produces different results than:
*E = radius * (cos(latitude) * cos(longitude));
*F = radius * (cos(latitude) * sin(longitude));
What is the compiler doing in gcc with optimization level 3 to make these results 1e-2 different?
You have different rounding as floating point cannot represent all numbers:
a * b * c; is (a * b) * c which may differ than a * (b * c).
You may have similar issues with addition too.
example with addition:
10e10f + 1.f == 10e10f
so (1.f + 10e10f) - 10e10f == 10e10f - 10e10f == 0.f
whereas 1.f + (10e10f - 10e10f) == 1.f - 0.f == 1.f.

C++ Functions - Error: '0' cannot be used as a function

I'm new to functions and trying to understand what I've done wrong. My build message spits out the error: '0' cannot be used as a function and highlights the line return ((5 / 9)(fahrenheit - 32)); within the function. Thanks in advance for any advice.
#include <iostream>
using namespace std;
double celsiusFunction(double fahrenheit);
int main()
{
double fahrenheitTemp;
fahrenheitTemp = celsiusFunction(99);
cout << fahrenheitTemp;
return 0;
}
double celsiusFunction(double fahrenheit)
{
return ((5 / 9)(fahrenheit - 32));
}
5 / 9 is 0, because both are integers and thus it's evaluated in integer arithmetic. Do this instead: 5.0 / 9.0 to get floating results.
You're not multiplying in the return statement, so the compiler interprets the second parentheses as a funciton call (that is, calling 5 / 9 with arguments fahrenheit - 32). This is of course nonsense. Do this:
return (5.0 / 9.0) * (fahrenheit - 32.0);
((5 / 9)(fahrenheit - 32))
\_____/\_______________/
1 2
2 is interpreted as a function call on 1. You forgot the multiplication:
((5 / 9) * (fahrenheit - 32))
You've forgotten the * operator
You should change return ((5 / 9)(fahrenheit - 32)); to
return ((5 / 9)*(fahrenheit - 32));
Add * after (5/9).
Because of missing * you are getting the error.

Numerical precision for difference of squares

in my code I often compute things like the following piece (here C code for simplicity):
float cos_theta = /* some simple operations; no cosf call! */;
float sin_theta = sqrtf(1.0f - cos_theta * cos_theta); // Option 1
For this example ignore that the argument of the square root might be negative due to imprecisions. I fixed that with additional fdimf call. However, I wondered if the following is more precise:
float sin_theta = sqrtf((1.0f + cos_theta) * (1.0f - cos_theta)); // Option 2
cos_theta is between -1 and +1 so for each choice there will be situations where I subtract similar numbers and thus will loose precision, right? What is the most precise and why?
The most precise way with floats is likely to compute both sin and cos using a single x87 instruction, fsincos.
However, if you need to do the computation manually, it's best to group arguments with similar magnitudes. This means the second option is more precise, especially when cos_theta is close to 0, where precision matters the most.
As the article
What Every Computer Scientist Should Know About Floating-Point Arithmetic notes:
The expression x2 - y2 is another formula that exhibits catastrophic
cancellation. It is more accurate to evaluate it as (x - y)(x + y).
Edit: it's more complicated than this. Although the above is generally true, (x - y)(x + y) is slightly less accurate when x and y are of very different magnitudes, as the footnote to the statement explains:
In this case, (x - y)(x + y) has three rounding errors, but x2 - y2 has only two since the rounding error committed when computing the smaller of x2 and y2 does not affect the final subtraction.
In other words, taking x - y, x + y, and the product (x - y)(x + y) each introduce rounding errors (3 steps of rounding error). x2, y2, and the subtraction x2 - y2 also each introduce rounding errors, but the rounding error obtained by squaring a relatively small number (the smaller of x and y) is so negligible that there are effectively only two steps of rounding error, making the difference of squares more precise.
So option 1 is actually going to be more precise. This is confirmed by dev.brutus's Java test.
I wrote small test. It calcutates expected value with double precision. Then it calculates an error with your options. The first option is better:
Algorithm: FloatTest$1
option 1 error = 3.802792362162126
option 2 error = 4.333273185303996
Algorithm: FloatTest$2
option 1 error = 3.802792362167937
option 2 error = 4.333273185305868
The Java code:
import org.junit.Test;
public class FloatTest {
#Test
public void test() {
testImpl(new ExpectedAlgorithm() {
public double te(double cos_theta) {
return Math.sqrt(1.0f - cos_theta * cos_theta);
}
});
testImpl(new ExpectedAlgorithm() {
public double te(double cos_theta) {
return Math.sqrt((1.0f + cos_theta) * (1.0f - cos_theta));
}
});
}
public void testImpl(ExpectedAlgorithm ea) {
double delta1 = 0;
double delta2 = 0;
for (double cos_theta = -1; cos_theta <= 1; cos_theta += 1e-8) {
double[] delta = delta(cos_theta, ea);
delta1 += delta[0];
delta2 += delta[1];
}
System.out.println("Algorithm: " + ea.getClass().getName());
System.out.println("option 1 error = " + delta1);
System.out.println("option 2 error = " + delta2);
}
private double[] delta(double cos_theta, ExpectedAlgorithm ea) {
double expected = ea.te(cos_theta);
double delta1 = Math.abs(expected - t1((float) cos_theta));
double delta2 = Math.abs(expected - t2((float) cos_theta));
return new double[]{delta1, delta2};
}
private double t1(float cos_theta) {
return Math.sqrt(1.0f - cos_theta * cos_theta);
}
private double t2(float cos_theta) {
return Math.sqrt((1.0f + cos_theta) * (1.0f - cos_theta));
}
interface ExpectedAlgorithm {
double te(double cos_theta);
}
}
The correct way to reason about numerical precision of some expression is to:
Measure the result discrepancy relative to the correct value in ULPs (Unit in the last place), introduced in 1960. by W. H. Kahan. You can find C, Python & Mathematica implementations here, and learn more on the topic here.
Discriminate between two or more expressions based on the worst case they produce, not average absolute error as done in other answers or by some other arbitrary metric. This is how numerical approximation polynomials are constructed (Remez algorithm), how standard library methods' implementations are analysed (e.g. Intel atan2), etc...
With that in mind, version_1: sqrt(1 - x * x) and version_2: sqrt((1 - x) * (1 + x)) produce significantly different outcomes. As presented in the plot below, version_1 demonstrates catastrophic performance for x close to 1 with error > 1_000_000 ulps, while on the other hand error of version_2 is well behaved.
That is why I always recommend using version_2, i.e. exploiting the square difference formula.
Python 3.6 code that produces square_diff_error.csv file:
from fractions import Fraction
from math import exp, fabs, sqrt
from random import random
from struct import pack, unpack
def ulp(x):
"""
Computing ULP of input double precision number x exploiting
lexicographic ordering property of positive IEEE-754 numbers.
The implementation correctly handles the special cases:
- ulp(NaN) = NaN
- ulp(-Inf) = Inf
- ulp(Inf) = Inf
Author: Hrvoje Abraham
Date: 11.12.2015
Revisions: 15.08.2017
26.11.2017
MIT License https://opensource.org/licenses/MIT
:param x: (float) float ULP will be calculated for
:returns: (float) the input float number ULP value
"""
# setting sign bit to 0, e.g. -0.0 becomes 0.0
t = abs(x)
# converting IEEE-754 64-bit format bit content to unsigned integer
ll = unpack('Q', pack('d', t))[0]
# computing first smaller integer, bigger in a case of ll=0 (t=0.0)
near_ll = abs(ll - 1)
# converting back to float, its value will be float nearest to t
near_t = unpack('d', pack('Q', near_ll))[0]
# abs takes care of case t=0.0
return abs(t - near_t)
with open('e:/square_diff_error.csv', 'w') as f:
for _ in range(100_000):
# nonlinear distribution of x in [0, 1] to produce more cases close to 1
k = 10
x = (exp(k) - exp(k * random())) / (exp(k) - 1)
fx = Fraction(x)
correct = sqrt(float(Fraction(1) - fx * fx))
version1 = sqrt(1.0 - x * x)
version2 = sqrt((1.0 - x) * (1.0 + x))
err1 = fabs(version1 - correct) / ulp(correct)
err2 = fabs(version2 - correct) / ulp(correct)
f.write(f'{x},{err1},{err2}\n')
Mathematica code that produces the final plot:
data = Import["e:/square_diff_error.csv"];
err1 = {1 - #[[1]], #[[2]]} & /# data;
err2 = {1 - #[[1]], #[[3]]} & /# data;
ListLogLogPlot[{err1, err2}, PlotRange -> All, Axes -> False, Frame -> True,
FrameLabel -> {"1-x", "error [ULPs]"}, LabelStyle -> {FontSize -> 20}]
As an aside, you will always have a problem when theta is small, because the cosine is flat around theta = 0. If theta is between -0.0001 and 0.0001 then cos(theta) in float is exactly one, so your sin_theta will be exactly zero.
To answer your question, when cos_theta is close to one (corresponding to a small theta), your second computation is clearly more accurate. This is shown by the following program, that lists the absolute and relative errors for both computations for various values of cos_theta. The errors are computed by comparing against a value which is computed with 200 bits of precision, using GNU MP library, and then converted to a float.
#include <math.h>
#include <stdio.h>
#include <gmp.h>
int main()
{
int i;
printf("cos_theta abs (1) rel (1) abs (2) rel (2)\n\n");
for (i = -14; i < 0; ++i) {
float x = 1 - pow(10, i/2.0);
float approx1 = sqrt(1 - x * x);
float approx2 = sqrt((1 - x) * (1 + x));
/* Use GNU MultiPrecision Library to get 'exact' answer */
mpf_t tmp1, tmp2;
mpf_init2(tmp1, 200); /* use 200 bits precision */
mpf_init2(tmp2, 200);
mpf_set_d(tmp1, x);
mpf_mul(tmp2, tmp1, tmp1); /* tmp2 = x * x */
mpf_neg(tmp1, tmp2); /* tmp1 = -x * x */
mpf_add_ui(tmp2, tmp1, 1); /* tmp2 = 1 - x * x */
mpf_sqrt(tmp1, tmp2); /* tmp1 = sqrt(1 - x * x) */
float exact = mpf_get_d(tmp1);
printf("%.8f %.3e %.3e %.3e %.3e\n", x,
fabs(approx1 - exact), fabs((approx1 - exact) / exact),
fabs(approx2 - exact), fabs((approx2 - exact) / exact));
/* printf("%.10f %.8f %.8f %.8f\n", x, exact, approx1, approx2); */
}
return 0;
}
Output:
cos_theta abs (1) rel (1) abs (2) rel (2)
0.99999988 2.910e-11 5.960e-08 0.000e+00 0.000e+00
0.99999970 5.821e-11 7.539e-08 0.000e+00 0.000e+00
0.99999899 3.492e-10 2.453e-07 1.164e-10 8.178e-08
0.99999684 2.095e-09 8.337e-07 0.000e+00 0.000e+00
0.99998999 1.118e-08 2.497e-06 0.000e+00 0.000e+00
0.99996835 6.240e-08 7.843e-06 9.313e-10 1.171e-07
0.99989998 3.530e-07 2.496e-05 0.000e+00 0.000e+00
0.99968380 3.818e-07 1.519e-05 0.000e+00 0.000e+00
0.99900001 1.490e-07 3.333e-06 0.000e+00 0.000e+00
0.99683774 8.941e-08 1.125e-06 7.451e-09 9.376e-08
0.99000001 5.960e-08 4.225e-07 0.000e+00 0.000e+00
0.96837723 1.490e-08 5.973e-08 0.000e+00 0.000e+00
0.89999998 2.980e-08 6.837e-08 0.000e+00 0.000e+00
0.68377221 5.960e-08 8.168e-08 5.960e-08 8.168e-08
When cos_theta is not close to one, then the accuracy of both methods is very close to each other and to round-off error.
[Edited for major think-o] It looks to me like option 2 will be better, because for a number like 0.000001 for example option 1 will return the sine as 1 while option will return a number just smaller than 1.
No difference in my option since (1-x) preserves the precision not effecting the carried bit. Then for (1+x) the same is true. Then the only thing effecting the carry bit precision is the multiplication. So in both cases there is one single multiplication, so they are both as likely to give the same carry bit error.

Magic numbers in C++ implementation of Excel NORMDIST function

Whilst looking for a C++ implementation of Excel's NORMDIST (cumulative)
function I found this on a website:
static double normdist(double x, double mean, double standard_dev)
{
double res;
double x=(x - mean) / standard_dev;
if (x == 0)
{
res=0.5;
}
else
{
double oor2pi = 1/(sqrt(double(2) * 3.14159265358979323846));
double t = 1 / (double(1) + 0.2316419 * fabs(x));
t *= oor2pi * exp(-0.5 * x * x)
* (0.31938153 + t
* (-0.356563782 + t
* (1.781477937 + t
* (-1.821255978 + t * 1.330274429))));
if (x >= 0)
{
res = double(1) - t;
}
else
{
res = t;
}
}
return res;
}
My limited maths knowledge made me think about Taylor series, but I am unable to determine where these numbers come from:
0.2316419,
0.31938153,
-0.356563782,
1.781477937,
-1.821255978,
1.330274429
Can anyone suggest where they come from, and how they can be derived?
Check out Numerical Recipes, chapter 6.2.2. The approximation is standard. Recall that
NormCdf(x) = 0.5 * (1 + erf(x / sqrt(2)))
erf(x) = 2 / (sqrt(pi)) integral(e^(-t^2) dt, t = 0..x)
and write erf as
1 - erf x ~= t * exp(-x^2 + P(t))
for positive x, where
t = 2 / (2 + x)
and since t is between 0 and 1, you can find P by Chebyshev approximation once and for all (Numerical Recipes, section 5.8). You don't use Taylor expansion: you want the approximation to be good in the whole real line, what Taylor expansion cannot guarantee. Chebyshev approximation is the best polynomial approximation in the L^2 norm, which is a good substitute to the very difficult to find minimax polynomial (= best polynomial approximation in the sup norm).
The version here is slightly different. Instead, one writes
1 - erf x = t * exp(-x^2) * P(t)
but the procedure is similar, and normCdf is computed directly, instead of erf.
In particular and very similarly 'the implementation' that you are using differs somewhat from the one that handles in the text, because it is of the form b*exp(-a*z^2)*y(t) but it´s also a Chevishev approx. to the erfc(x) function as you can see in this paper of Schonfelder(1978)[http://www.ams.org/journals/mcom/1978-32-144/S0025-5718-1978-0494846-8/S0025-5718-1978-0494846-8.pdf ]
Also in Numerical Recipes 3rd edition, at the final of the chapter 6.2.2 they provide a C implementation very accurate of the type t*exp(-z^2 + c0 + c1*t+ c2t^2 + c3*t^3 + ... + c9t^9)