Problem with pow giving me an infinite answer - c++

I'm trying to compute a discount factor given a rate and the number of payments to be made per year.
#include <iostream>
using namespace std;
int main()
{
uint32_t due_dates_per_year = 2;
double t = 1;
double rate = 0.05;
double df = pow(1 + rate / due_dates_per_year, -due_dates_per_year * t);
cout << df;
return 0;
}
The output is "inf" but it should be a little less than 1.
Does anyone know what's going on?

The problem is in the exponent (i.e. the second parameter) of std::pow. The expression
-due_dates_per_year * t
is grouped as (-due_dates_per_year) * t (that's how the C++ grammar parses the expression). The unary negation of the unsigned type produces a large (and also unsigned) number which explains the result.
Other than using a signed type for due_dates_per_year, the rearrangement
double df = std::pow(1 + rate / due_dates_per_year, -t * due_dates_per_year);
is a fix.

Related

Simpson's Composite Rule giving too large values for when n is very large

Using Simpson's Composite Rule to calculate the integral from 2 to 1,000 of 1/ln(x), however when using a large n (usually around 500,000), I start to get results that vary from the value my calculator and other sources give me (176.5644). For example, when n = 10,000,000, it gives me a value of 184.1495. Wondering why this is, since as n gets larger, the accuracy is supposed to increase and not decrease.
#include <iostream>
#include <cmath>
// the function f(x)
float f(float x)
{
return (float) 1 / std::log(x);
}
float my_simpson(float a, float b, long int n)
{
if (n % 2 == 1) n += 1; // since n has to be even
float area, h = (b-a)/n;
float x, y, z;
for (int i = 1; i <= n/2; i++)
{
x = a + (2*i - 2)*h;
y = a + (2*i - 1)*h;
z = a + 2*i*h;
area += f(x) + 4*f(y) + f(z);
}
return area*h/3;
}
int main()
{
std::cout.precision(20);
int upperBound = 1'000;
int subsplits = 1'000'000;
float approx = my_simpson(2, upperBound, subsplits);
std::cout << "Output: " << approx << std::endl;
return 0;
}
Update: Switched from floats to doubles and works much better now! Thank you!
Unlike a real (in mathematical sense) number, a float has a limited precision.
A typical IEEE 754 32-bit (single precision) floating-point number binary representation dedicates only 24 bits (one of which is implicit) to the mantissa and that translates in roughly less than 8 decimal significant digits (please take this as a gross semplification).
A double on the other end, has 53 significand bits, making it more accurate and (usually) the first choice for numerical computations, these days.
since as n gets larger, the accuracy is supposed to increase and not decrease.
Unfortunately, that's not how it works. There's a sweat spot, but after that the accumulation of rounding errors prevales and the results diverge from their expected values.
In OP's case, this calculation
area += f(x) + 4*f(y) + f(z);
introduces (and accumulates) rounding errors, due to the fact that area becomes much greater than f(x) + 4*f(y) + f(z) (e.g 224678.937 vs. 0.3606823). The bigger n is, the sooner this gets relevant, making the result diverging from the real one.
As mentioned in the comments, another issue (undefined behavior) is that area isn't initialized (to zero).

How does Cpp work with large numbers in calculations?

I have a code that tries to solve an integral of a function in a given interval numerically, using the method of Trapezoidal Rule (see the formula in Trapezoid method ), now, for the function sin(x) in the interval [-pi/2.0,pi/2.0], the integral is waited to be zero.
In this case, I take the number of partitions 'n' equal to 4. The problem is that when I have pi with 20 decimal places it is zero, with 14 decimal places it is 8.72e^(-17), then with 11 decimal places, it is zero, with 8 decimal places it is 8.72e^(-17), with 3 decimal places it is zero. I mean, the integral is zero or a number near zero for different approximations of pi, but it doesn't have a clear trend.
I would appreciate your help in understanding why this happens. (I did run it in Dev-C++).
#include <iostream>
#include <math.h>
using namespace std;
#define pi 3.14159265358979323846
//Pi: 3.14159265358979323846
double func(double x){
return sin(x);
}
int main() {
double x0 = -pi/2.0, xf = pi/2.0;
int n = 4;
double delta_x = (xf-x0)/(n*1.0);
double sum = (func(x0)+func(xf))/2.0;
double integral;
for (int k = 1; k<n; k++){
// cout<<"func: "<<func(x0+(k*delta_x))<<" "<<"last sum: "<<sum<<endl;
sum = sum + func(x0+(k*delta_x));
// cout<<"func + last sum= "<<sum<<endl;
}
integral = delta_x*sum;
cout<<"The value for the integral is: "<<integral<<endl;
return 0;
}
OP is integrating y=sin(x) from -a to +a. The various tests use different values of a, all near pi/2.
The approach uses a linear summation of values near -1.0, down to 0 and then up to near 1.0.
This summation is sensitive to calculation error with the last terms as the final math sum is expected to be 0.0. Since the start/end a varies, the error varies.
A more stable result would be had adding the extreme f = sin(f(k)) values first. e.g. sum += sin(f(k=1)), then sum += sin(f(k=3)), then sum += sin(f(k=2)) rather than k=1,2,3. In particular the formation of term x=f(k=3) is likely a bit off from the negative of its x=f(k=1) earlier term, further compounding the issue.
Welcome to the world or numerical analysis.
Problem exists if code used all float or all long double, just different degrees.
Problem is not due to using an inexact value of pi (Exact value is impossible with FP as pi is irrational and all finite FP are rational).
Much is due to the formation of x. Could try the below to form the x symmetrically about 0.0. Compare exactly x generated this way to x the original way.
x = (x0-x1)/2 + ((k - n/2)*delta_x)
Print out the exact values computed for deeper understanding.
printf("x:%a y:%a\n", x0+(k*delta_x), func(x0+(k*delta_x)));

Fastest way to find complex roots

Using a + i b = sqrt(a*a + b*b) * exp(i arctan2(a,b)) I arrive at the following way to compute complex roots. However, I heard that trigonometric functions rather use up performance so I wonder if there is a better way in vanilla c++ (no external libaries).
Example: Let u+iv = sqrt(a+i b)
#include <math.h>
#include <iostream>
int main(){
double a = -1.;
double b = 0;
double r = sqrt(sqrt(a*a+b*b));
double phi = 0.5 * atan2(b, a);
double u = r * cos(phi);
double v = r * sin(phi);
std::cout << u << std::endl;
std::cout << v << "i" << std::endl;
}
This is just meant as a MWE, so it's not written in a class or method.
Yes there is! I'm going to link a good explanation of the process here, but it looks like this can be accomplished by only calculating the magnitude of the original number and subtracting out the real portion of the original number and finally taking the square root of that to find the imaginary part of the square root. The real part can be found by dividing the imaginary part of the original number by 2 * the imaginary part of the root to get your final answer.
https://www.qc.edu.hk/math/Advanced%20Level/Finding%20the%20square%20root%20of%20a%20complex%20number.htm
Let me know if you need more help with the code but this requires no trig functions.

Function for calculating Pi using taylor series in c++

So I'm at a loss on why my code isn't working, essentially the function I am writing calculates an estimate for Pi using the taylor series, it just crashes whenever I try run the program.
here is my code
#include <iostream>
#include <math.h>
#include <stdlib.h>
using namespace std;
double get_pi(double accuracy)
{
double estimate_of_pi, latest_term, estimated_error;
int sign = -1;
int n;
estimate_of_pi = 0;
n = 0;
do
{
sign = -sign;
estimated_error = 4 * abs(1.0 / (2*n + 1.0)); //equation for error
latest_term = 4 * (1.0 *(2.0 * n + 1.0)); //calculation for latest term in series
estimate_of_pi = estimate_of_pi + latest_term; //adding latest term to estimate of pi
n = n + 1; //changing value of n for next run of the loop
}
while(abs(latest_term)< estimated_error);
return get_pi(accuracy);
}
int main()
{
cout << get_pi(100);
}
the logic behind the code is the following:
define all variables
set estimate of pi to be 0
calculate a term from the taylor series and calculate the error in
this term
it then adds the latest term to the estimate of pi
the program should then work out the next term in the series and the error in it and add it to the estimate of pi, until the condition in the while statement is satisfied
Thanks for any help I might get
There are several errors in your function. See my comments with lines starting with "//NOTE:".
double get_pi(double accuracy)
{
double estimate_of_pi, latest_term, estimated_error;
int sign = -1;
int n;
estimate_of_pi = 0;
n = 0;
do
{
sign = -sign;
//NOTE: This is an unnecessary line.
estimated_error = 4 * abs(1.0 / (2*n + 1.0)); //equation for error
//NOTE: You have encoded the formula incorrectly.
// The RHS needs to be "sign*4 * (1.0 /(2.0 * n + 1.0))"
// ^^^^ ^
latest_term = 4 * (1.0 *(2.0 * n + 1.0)); //calculation for latest term in series
estimate_of_pi = estimate_of_pi + latest_term; //adding latest term to estimate of pi
n = n + 1; //changing value of n for next run of the loop
}
//NOTE: The comparison is wrong.
// The conditional needs to be "fabs(latest_term) > estimated_error"
// ^^^^ ^^^
while(abs(latest_term)< estimated_error);
//NOTE: You are calling the function again.
// This leads to infinite recursion.
// It needs to be "return estimate_of_pi;"
return get_pi(accuracy);
}
Also, the function call in main is wrong. It needs to be:
get_pi(0.001)
to indicate that if the absolute value of the term is less then 0.001, the function can return.
Here's an updated version of the function that works for me.
double get_pi(double accuracy)
{
double estimate_of_pi, latest_term;
int sign = -1;
int n;
estimate_of_pi = 0;
n = 0;
do
{
sign = -sign;
latest_term = sign * 4 * (1.0 /(2.0 * n + 1.0)); //calculation for latest term in series
estimate_of_pi += latest_term; //adding latest term to estimate of pi
++n; //changing value of n for next run of the loop
}
while(fabs(latest_term) > accuracy);
return estimate_of_pi;
}
Your return statement may be the cause.
Try returning "estimate_of_pi" instead of get_pi(accuracy).
Your break condition can be rewritten as
2*n + 1 < 1/(2*n + 1) => (2*n + 1)^2 < 1
and this will never be true for any positive n. Thus your loop will never end. After fixing this you should change the return statement to
return estimated_error;
You currently are calling the function recursively without an end (assuming you fixed the stop condition).
Moreoever you have a sign and the parameter accuracy that you do not use at all in the calculation.
My advice for such iterations would be to always break on some maximum number of iterations. In this case you know it converges (assuming you fix the maths), but in general you can never be sure that your iteration converges.

Is there any "standard" way to calculate the numerical gradient?

I am trying to calculate the numerical gradient of a smooth function in c++. And the parameter value could vary from zero to a very large number(maybe 1e10 to 1e20?)
I used the function f(x,y) = 10*x^3 + y^3 as a testbench, but I found that if x or y is too large, I can't get correct gradient.
Here is my code to calculate the graidient:
#include <iostream>
#include <cmath>
#include <cassert>
using namespace std;
double f(double x, double y)
{
// black box expensive function
return 10 * pow(x, 3) + pow(y, 3);
}
int main()
{
// double x = -5897182590.8347721;
// double y = 269857217.0017581;
double x = 1.13041e+19;
double y = -5.49756e+14;
const double epsi = 1e-4;
double f1 = f(x, y);
double f2 = f(x, y+epsi);
double f3 = f(x, y-epsi);
cout << f1 << endl;
cout << f2 << endl;
cout << f3 << endl;
cout << f1 - f2 << endl; // 0
cout << f2 - f3 << endl; // 0
return 0;
}
If I use the above code to calculate the gradient, the gradient would be zero!
The testbench function, 10*x^3 + y^3, is just a demo, the real problem I need to solve is actually a black box function.
So, is there any "standard" way to calculate the numerical gradient?
In the first place, you should use the central difference scheme, which is more accurate (by cancellation of one more term of the Taylor develoment).
(f(x + h) - f(x - h)) / 2h
rather than
(f(x + h) - f(x)) / h
Then the choice of h is critical and using a fixed constant is the worst thing you can do. Because for small x, h will be too large so that the approximation formula no more works, and for large x, h will be too small, resulting in severe truncation error.
A much better choice is to take a relative value, h = x√ε, where ε is the machine epsilon (1 ulp), which gives a good tradeoff.
(f(x(1 + √ε)) - f(x(1 - √ε))) / 2x√ε
Beware that when x = 0, a relative value cannot work and you need to fall back to a constant. But then, nothing tells you which to use !
You need to consider the precision needed.
At first glance, since |y| = 5.49756e14 and epsi = 1e-4, you need at least ⌈log2(5.49756e14)-log2(1e-4)⌉ = 63 bits of significand precision (that is the number of bits used to encode the digits of your number, also known as mantissa) for y and y+epsi to be considered different.
The double-precision floating-point format only has 53 bits of significand precision (assuming it is 8 bytes). So, currently, f1, f2 and f3 are exactly the same because y, y+epsi and y-epsi are equal.
Now, let's consider the limit : y = 1e20, and the result of your function, 10x^3 + y^3. Let's ignore x for now, so let's take f = y^3. Now we can calculate the precision needed for f(y) and f(y+epsi) to be different : f(y) = 1e60 and f(epsi) = 1e-12. This gives a minimum significand precision of ⌈log2(1e60)-log2(1e-12)⌉ = 240 bits.
Even if you were to use the long double type, assuming it is 16 bytes, your results would not differ : f1, f2 and f3 would still be equal, even though y and y+epsi would not.
If we take x into account, the maximum value of f would be 11e60 (with x = y = 1e20). So the upper limit on precision is ⌈log2(11e60)-log2(1e-12)⌉ = 243 bits, or at least 31 bytes.
One way to solve your problem is to use another type, maybe a bignum used as fixed-point.
Another way is to rethink your problem and deal with it differently. Ultimately, what you want is f1 - f2. You can try to decompose f(y+epsi). Again, if you ignore x, f(y+epsi) = (y+epsi)^3 = y^3 + 3*y^2*epsi + 3*y*epsi^2 + epsi^3. So f(y+epsi) - f(y) = 3*y^2*epsi + 3*y*epsi^2 + epsi^3.
The only way to calculate gradient is calculus.
Gradient is a vector:
g(x, y) = Df/Dx i + Df/Dy j
where (i, j) are unit vectors in x and y directions, respectively.
One way to approximate derivatives is first order differences:
Df/Dx ~ (f(x2, y)-f(x1, y))/(x2-x1)
and
Df/Dy ~ (f(x, y2)-f(x, y1))/(y2-y1)
That doesn't look like what you're doing.
You have a closed form expression:
g(x, y) = 30*x^2 i + 3*y^2 j
You can plug in values for (x, y) and calculate the gradient exactly at any point. Compare that to your differences and see how well your approximation is doing.
How you implement it numerically is your responsibility. (10^19)^3 = 10^57, right?
What is the size of double on your machine? Is it a 64 bit IEEE double precision floating point number?
Use
dx = (1+abs(x))*eps, dfdx = (f(x+dx,y) - f(x,y)) / dx
dy = (1+abs(y))*eps, dfdy = (f(x,y+dy) - f(x,y)) / dy
to get meaningful step sizes for large arguments.
Use eps = 1e-8 for one-sided difference formulas, eps = 1e-5 for central difference quotients.
Explore automatic differentiation (see autodiff.org) for derivatives without difference quotients and thus much smaller numerical errors.
We can examine the behaviour of the error in the derivative using the following program - it calculates the 1-sided derivative and the central difference based derivative using a varying step size. Here I'm using x and y ~ 10^10, which is smaller than what you were using, but should illustrate the same point.
#include <iostream>
#include <cmath>
#include <cassert>
using namespace std;
double f(double x, double y) {
return 10 * pow(x, 3) + pow(y, 3);
}
double f_x(double x, double y) {
return 3 * 10 * pow(x,2);
}
double f_y(double x, double y) {
return 3 * pow(y,2);
}
int main()
{
// double x = -5897182590.8347721;
// double y = 269857217.0017581;
double x = 1.13041e+10;
double y = -5.49756e+10;
//double x = 10.1;
//double y = -5.2;
double epsi = 1e8;
for(int i=0; i<60; ++i) {
double dfx_n = (f(x+epsi,y) - f(x,y))/epsi;
double dfx_cd = (f(x+epsi,y) - f(x-epsi,y))/(2*epsi);
double dfx = f_x(x,y);
cout<<epsi<<" "<<fabs(dfx-dfx_n)<<" "<<fabs(dfx - dfx_cd)<<std::endl;
epsi/=1.5;
}
return 0;
}
The output shows that a 1-sided difference gets us an optimal error of about 1.37034e+13 at a step length of about 100.0. Note that while this error looks large, as a relative error it is 3.5746632302764072e-09 (since the exact value is 3.833e+21)
In comparison the 2-sided difference gets an optimal error of about 1.89493e+10 with a step size of about 45109.3. This is three-orders of magnitude better, (with a much larger step-size).
How can we work out the step size? The link in the comments of Yves Daosts answer gives us a ballpark value:
h=x_c sqrt(eps) for 1-Sided, and h=x_c cbrt(eps) for 2-Sided.
But either way, if the required step size for decent accuracy at x ~ 10^10 is 100.0, the required step size with x ~ 10^20 is going to be 10^10 larger too. So the problem is simply that your step size is way too small.
This can be verified by increasing the starting step-size in the above code and resetting the x/y values to the original values.
Then expected derivative is O(1e39), best 1-sided error of about O(1e31) occurs near a step length of 5.9e10, best 2-sided error of about O(1e29) occurs near a step length of 6.1e13.
As numerical differentiation is ill conditioned (which means a small error could alter your result significantly) you should consider to use Cauchy's integral formula. This way you can calculate the n-th derivative with an integral. This will lead to less problems with considering accuracy and stability.