Divide by zero prevention - c++

What is 1.#INF and why does casting to a float or double prevent a division by 0 from crashing?
Also, any great ideas of how to prevent division by 0? (Like any macro or template)?
int nQuota = 0;
int nZero = 3 / nQuota; //crash
cout << nZero << endl;
float fZero = 2 / nQuota; //crash
cout << fZero << endl;
if I use instead:
int nZero = 3 / (float)nQuota;
cout << nZero << endl;
//Output = -2147483648
float fZero = 2 / (float)nQuota;
cout << fZero << endl;
//Output = 1.#INF

1.#INF is positive infinity. You will get it when you divide a positive float by zero (if you divide the float zero itself by zero, then the result will be "not a number").
On the other hand, if you divide an integer by zero, the program will crash.
The reason float fZero = 2 / nQuota; crashes is because both operands of the / operator are integers, so the division is performed on integers. It doesn't matter that you then store the result in a float; C++ has no notion of target typing.
Why positive infinity cast to an integer is the smallest integer, I have no idea.

Why does using (float) or (double) prevent a division by 0 from crashing?
It doesn't necessarily. The standard is amazingly sparse when it comes to floating point. Most systems use the IEEE floating point standard nowadays, and that says that the default action for division by zero is to return ±infinity rather than crash. You can make it crash by enabling the appropriate floating point exceptions.
Note well: The only thing the floating point exception model and the C++ exception model have in common is the word "exception". On every machine I work on, a floating point exception does not throw a C++ exception.
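For reference, here is a minimal sketch (not part of the original answer) of how the standard <cfenv> facilities can be used to detect that a floating point division by zero happened; actually trapping on it is platform specific (e.g. feenableexcept() on glibc) and is not shown:

#include <cfenv>
#include <iostream>

// Strictly, inspecting the floating point environment requires this pragma;
// some compilers ignore or do not implement it.
#pragma STDC FENV_ACCESS ON

int main()
{
    std::feclearexcept(FE_ALL_EXCEPT);      // clear any stale exception flags

    volatile double num = 2.0, den = 0.0;   // volatile so the division is not folded away
    double q = num / den;                   // IEEE default behaviour: q becomes +infinity

    if (std::fetestexcept(FE_DIVBYZERO))
        std::cout << "division by zero was flagged, q = " << q << '\n';
}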
Also, any great ideas of how to prevent division by 0?
Simple answer: Don't do it.
This is one of those "Doctor, doctor it hurts when I do this!" kinds of situations. So don't do it!
Make sure that the divisor is not zero.
Do sanity checks on divisors that are user inputs. Always filter your user inputs for sanity. A user input value of zero when the number is supposed to be in the millions will cause all kinds of havoc besides overflow. Do sanity checks on intermediate values.
Enable floating point exceptions.
Making the default behavior to allow errors (and these are almost always errors) to go through unchecked was IMHO a big mistake on the part of the standards committee. Use the default and those infinities and not-a-numbers will eventually turn everything into an Inf or a NaN.
The default should have been to stop floating point errors in their tracks, with an option to allow things like 1.0/0.0 and 0.0/0.0 to take place. That isn't the case, so you have to enable those traps. Do that and you can oftentimes find the cause of the problem in short order.
Write custom divide, custom multiply, custom square root, custom sine, ... functions.
This unfortunately is the route that many safety critical software systems must take. It is a royal pain. Option #1 is out because it's just wishful thinking. Option #3 is out because the system cannot be allowed to crash. Option #2 is still a good idea, but it doesn't always work because bad data always has a way of sneaking in. It's Murphy's law.
BTW, the problem is a bit worse than just division by zero. 10^200 / 10^-200 will also overflow.
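As a sketch of the "custom divide" idea (and of the macro/template the question asked about), something along these lines could work; checked_divide is a made-up name, not a standard facility, and it requires C++17 for if constexpr:

#include <cmath>
#include <limits>
#include <type_traits>

// Hypothetical helper: returns a fallback value instead of dividing when the
// divisor is zero (or, for floating point, too small to be meaningful).
template <typename T>
T checked_divide(T numerator, T divisor, T fallback = T{})
{
    if constexpr (std::is_floating_point_v<T>) {
        if (std::fabs(divisor) < std::numeric_limits<T>::min())  // zero or subnormal
            return fallback;
    } else {
        if (divisor == 0)
            return fallback;
    }
    return numerator / divisor;
}

// Usage:
//   int   a = checked_divide(3, 0);        // 0 instead of a crash
//   float b = checked_divide(2.0f, 0.0f);  // 0.0f instead of +infinity

Note that silently substituting a fallback hides the error; depending on the application, asserting or throwing might be the better choice.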

You usually check to make sure you aren't dividing by zero. The code below isn't particularly useful unless nQuota has a legitimate value, but it does prevent crashes.
int nQuota = 0;
int nZero = 0;
float fZero = 0;

if (nQuota)
    nZero = 3 / nQuota;
cout << nZero << endl;

if (nQuota)
    fZero = 2 / nQuota;
cout << fZero << endl;

Related

How to add two large double precision numbers in c++

I have the following piece of code
#include <iostream>
#include <iomanip>

int main()
{
    double x = 7033753.49999141693115234375;
    double y = 7033753.499991415999829769134521484375;
    double z = (x + y) / 2.0;

    std::cout << "y is " << std::setprecision(40) << y << "\n";
    std::cout << "x is " << std::setprecision(40) << x << "\n";
    std::cout << "z is " << std::setprecision(40) << z << "\n";
    return 0;
}
When the above code is run I get,
y is 7033753.499991415999829769134521484375
x is 7033753.49999141693115234375
z is 7033753.49999141693115234375
When I do the same in Wolfram Alpha the value of z is completely different
z = 7033753.4999914164654910564422607421875 #Wolfram answer
I am familiar with floating point precision and that large numbers away from zero cannot be exactly represented. Is that what is happening here? Is there any way in C++ where I can get the same answer as Wolfram without any performance penalty?
large numbers away from zero cannot be exactly represented. Is that what is happening here?
Yes.
Note that there are also infinitely many rational numbers near zero that cannot be represented. But the distance between representable values does grow exponentially in larger value ranges.
Is there any way in C++ where I can get the same answer as Wolfram ...
You can potentially get the same answer by using long double. My system produces exactly the same result as Wolfram. Note that precision of long double varies between systems even among systems that conform to IEEE 754 standard.
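For example, a sketch of the same computation done in long double; note the L suffix, without which the literals would be rounded to double before the long double variables are ever initialised, and whether this matches Wolfram depends on how wide long double is on your platform:

#include <iostream>
#include <iomanip>

int main()
{
    long double x = 7033753.49999141693115234375L;
    long double y = 7033753.499991415999829769134521484375L;
    long double z = (x + y) / 2.0L;
    std::cout << "z is " << std::setprecision(40) << z << "\n";
    // On x86 Linux, where long double is 80-bit extended precision, this prints
    // the exact midpoint 7033753.4999914164654910564422607421875; on platforms
    // where long double is the same as double, it will not.
    return 0;
}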
More generally though, if you need results that are accurate to many significant digits, then don't use finite precision math.
... without any performance penalty?
No. Precision comes with a cost.
Just telling IOStreams to print to 40 significant decimal figures of precision doesn't mean that the value you're outputting actually has that much precision.
A typical double takes you up to 17 significant decimal figures (ish); beyond that, what you see is completely arbitrary.
Per eerorika's answer, it looks like the Wolfram Alpha answer is also falling foul of this, albeit possibly with some different precision limit than yours.
You can try a different approach like a "bignum" library, or limit yourself to the precision afforded by the types that you've chosen.

Generic way of handling fused-multiply-add floating-point inaccuracies

Yesterday I was tracking a bug in my project, which - after several hours - I've narrowed down to a piece of code which more or less was doing something like this:
#include <iostream>
#include <cmath>
#include <cassert>

volatile float r = -0.979541123;
volatile float alpha = 0.375402451;

int main()
{
    float sx = r * cosf(alpha); // -0.911326
    float sy = r * sinf(alpha); // -0.359146
    float ex = r * cosf(alpha); // -0.911326
    float ey = r * sinf(alpha); // -0.359146
    float mx = ex - sx;         // should be 0
    float my = ey - sy;         // should be 0
    float distance = sqrtf(mx * mx + my * my) * 57.2958f; // should be 0, gives 1.34925e-06

    // std::cout << "sv: {" << sx << ", " << sy << "}" << std::endl;
    // std::cout << "ev: {" << ex << ", " << ey << "}" << std::endl;
    // std::cout << "mv: {" << mx << ", " << my << "}" << std::endl;
    std::cout << "distance: " << distance << std::endl;

    assert(distance == 0.f);
    // assert(sx == ex && sy == ey);
    // assert(mx == 0.f && my == 0.f);
}
After compilation and execution:
$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out
distance: 1.34925e-06
a.out: vfma.cpp:23: int main(): Assertion `distance == 0.f' failed.
Aborted (core dumped)
From my point of view something is wrong, as I've asked for 2 subtractions of two bitwise-identical pairs (I expected to get two zeroes), then squaring them (two zeroes again) and adding them together (zero).
It turns out that the root cause of the problem is the use of the fused-multiply-add operation, which somewhere along the line makes the result inexact (from my point of view). Generally I have nothing against this optimization, as it promises to give results which are more exact, but in this case 1.34925e-06 is really far from the 0 that I was expecting.
The test case is very "fragile" - if you enable more prints or more asserts, it stops asserting, because the compiler doesn't use fused-multiply-add anymore. For example, if I uncomment all lines:
$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out
sv: {-0.911326, -0.359146}
ev: {-0.911326, -0.359146}
mv: {0, 0}
distance: 0
As I've considered this to be a bug in the compiler, I've reported that, but it got closed with the explanation that this is correct behaviour.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79436
So I'm wondering - how should one code such calculations to avoid the problem? I was thinking about a generic solution, but something better than:
mx = ex != sx ? ex - sx : 0.f;
I would like to fix or improve my code - if there's anything to fix/improve - instead of setting -ffp-contract=off for my whole project, as fused-multiply-add is used internally in the compiler libraries anyway (I see a lot of that in sinf() and cosf()), so it would be a "partial work-around", not a solution... I would also like to avoid solutions like "don't use floating-point" (;
In general no: this is exactly the price you pay for using -ffp-contract=fast (coincidentally, it is precisely this example that William Kahan notes among the problems with automatic contraction).
Theoretically, if you were using C (not C++), and your compiler supported the C99 pragmas (gcc does not), you could use
#pragma STDC FP_CONTRACT OFF
// non-contracted code
#pragma STDC FP_CONTRACT ON
Interestingly, thanks to fma, the floats mx and my give you the rounding error that was made when multiplying r and cos:
fma(r, cos, -r*cos) = theoretical(r*cos) - float(r*cos)
So the result you get indicates how far the computed (sx, sy) is from the theoretical (sx, sy), due to the multiplication of floats (but not accounting for the rounding errors in the computation of cos and sin themselves).
So the question is how can your program rely on a difference (ex-sx,ey-sy) which is within the uncertainty interval related to floating point rounding?
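That identity is easy to check directly with std::fma, which computes its result with a single rounding; a small sketch (assuming the platform provides a correctly rounded fma):

#include <cmath>
#include <cstdio>

int main()
{
    volatile float r     = -0.979541123f;
    volatile float alpha = 0.375402451f;

    float c   = std::cos(alpha);
    float p   = r * c;               // the rounded product float(r*c)
    float err = std::fma(r, c, -p);  // exact r*c minus float(r*c): the rounding error of the product
    std::printf("product = %.9g, rounding error = %g\n", p, err);
}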
I can see this question has been around for a while, but in case other folks come across it looking for an answer I figured I'd mention a couple of points..
First, it's difficult to tell exactly without analyzing the resulting assembly code, but I suspect the reason that the FMA gives a result that is so far outside expectations is not just the FMA itself, but also that you are assuming all of the calculations are being done in the order you have specified them, but with optimizing C/C++ compilers this is often not the case. This is also likely why uncommenting the print statements changes the results.
If mx and my were being calculated as the comments suggest, then even if the final mx*mx + my*my were done with an FMA, it would still result in the expected 0 result. The problem is that since none of the sx/sy/ex/ey/mx/my variables are used by anything else, there is a good possibility that the compiler never actually evaluates them as independent variables at all, and simply mushes all the math together into a big mass of multiplications, adds, and subtracts to calculate distance in one single step, which can then be represented any number of different ways in machine code (in any order, potentially with multiple FMAs, etc) however it figures it will get the best performance for that one big calculation.
However, if something else (like a print statement) references mx and my, then it's much more likely the compiler will calculate them separately, before calculating distance as a second step. In that case, the math does work out the way the comments suggest, and even an FMA in the final distance calculation doesn't change the results (because the inputs are all exactly 0).
The Answer
But that doesn't actually answer the real question. In answer to that, the most robust (and generally recommended) way to avoid this sort of problem in general is: Never assume floating point operations will ever produce an exact number, even if that number is 0. This means that, in general, it's a bad idea to ever use == to compare floating point numbers. Instead, you should choose a small number (often referred to as epsilon), which is larger than any possible/likely accumulated error, but still smaller than any significant result (for example, if you know that the distances you care about are only really significant to a couple of decimal places, then you could choose EPSILON = 0.01, which will mean "any difference less than 0.01 we'll consider to be the same as zero"). Then, instead of saying:
assert(distance == 0.f);
you would say:
assert(distance < EPSILON);
(the exact value for your epsilon will likely depend on the application, and may even be different for different types of calculations, of course)
Likewise, instead of saying something like if (a == b) for floating point numbers, you would instead say something like if (abs(a - b) < EPSILON), etc.
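A minimal sketch of that idea; the EPSILON value here is an arbitrary, application-specific choice, not a universally correct constant:

#include <cmath>

constexpr float EPSILON = 1e-5f;

bool nearly_zero(float a)
{
    return std::fabs(a) < EPSILON;
}

bool nearly_equal(float a, float b)
{
    return std::fabs(a - b) < EPSILON;
}

// e.g. instead of  assert(distance == 0.f);
// you would write  assert(nearly_zero(distance));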
Another way to reduce (but not necessarily eliminate) this problem is to implement "fail-fast" logic in your application. For example, in the above code, instead of going all the way through and calculating distance and then seeing if it's 0 at the end, you could "short-circuit" some of the math by testing if (fabs(mx) < EPSILON && fabs(my) < EPSILON) before you even get to the point of calculating distance, and skip the rest if they're both effectively zero (since you know the result will be zero in that case). The quicker you catch the situation, the less opportunity there is for errors to accumulate (and sometimes you can also avoid doing some more costly calculations in cases where you don't need them).

What is wrong with this while-loop (C++)?

I was pretty sure I was doing this the right way, but apparently I'm not. This while loop keeps running infinitely once it reaches 0. It keeps outputting "Monthly Payment: $0.00" and "Your loan after that payment is: $0.00" over and over again. What am I doing wrong?
while (loan_balance ! = 0)
{
monthly_payment = loan_balance / 20;
loan_balance = loan_balance - monthly_payment;
cout << "Monthly Payment: $" << monthly_payment << ends;
cout << "Your loan balance after that payment is: $" << loan_balance << endl;
}
If loan_balance is a floating point type (float or double), then loan_balance != 0 (where 0 is converted to 0.0) will likely never become false unless loan_balance is explicitly set to exactly 0. So it should be compared to a small threshold instead, e.g.
while (loan_balance >= 1e-4)
Also, the not-equal operator is !=; written with a space, ! = doesn't compile.
loan_balance is probably a float or double; it might decrease to be something close to but not quite 0. Change your comparison to "> 0".
0 is a literal of type int.
0.0 is a literal of type double (and 0.0f of type float).
So you're saying
while(loan_balance != 0)
do stuff
The compiler in return says: "loan_balance, being a double/float, will likely never be exactly 0, so I'll just keep doing stuff."
Keep in mind: integers aren't floats/doubles.
Your loan_balance is most likely never actually going to be 0 exactly. I am pretty sure you want your loan_balance in float or double.
double loan_balance = 23000.00;
double monthly_payment = 0.0;

while (loan_balance > 0.000)
{
    monthly_payment = loan_balance / 20.0;
    loan_balance = loan_balance - monthly_payment;
    cout << "Monthly Payment: $" << monthly_payment << ends;
    cout << "Your loan balance after that payment is: $" << loan_balance << endl;
}
You're facing a problem with both precision and rounding.
When you exponentially decrease a number, it will converge to zero. If it is a limited precision floating point number, there will be a point at which it is so small that it cannot be distinguished from zero by any possible representation. So, a loop like
double d=1.0; // or some other finite value
while(d!=0.0)
d*=ratio; // ratio is some finite value so that fabs(ratio)<1.0
would normally finish in a finite number of iterations.
However, depending on the values of d and ratio, especially when d approaches zero in the subnormal range (implying fewer significant bits) and fabs(ratio) is close to 1.0 (slow convergence), the selected rounding mode can cause d*ratio to be rounded back to d. When that happens, the loop above will never end.
In machines/compilers that support IEC 60559, you should be able to test and set the floating point rounding mode, using fegetround() and fesetround() (declared in <fenv.h>). It is likely that the default rounding on your system is to the nearest. For the loop above to converge more quickly (or at all), it would be better to cause rounding to be toward 0.
Note, however, that it comes with a price: depending on the application, changing the rounding mode might be undesirable on the grounds of precision/accuracy (OTOH, if you're already working in the subnormal range, your precision is probably not that good anymore anyway).
But the original question's convergence problem is still a little bit more complicated, because it is done with a two-step operation. So there is one rounding in the division operation, and another one in the subtraction. To increase the chance and speed of convergence, the subtraction should take as much of loan_balance's value as possible, so maybe rounding up would be better in this case. In fact, when I added fesetround(FE_UPWARD) to the OP's original code (together with setting the initial value of loan_balance to 1.0), it converged after 14466 iterations. (Note, however, that this is just a guess, and the effect on the original code might have been just a special condition. A more in-depth analysis would be necessary to qualify it, and such an analysis would have to take into account different values of ratio, and the relative magnitudes of the minuend and subtrahend.)
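For reference, switching and restoring the rounding mode with the <fenv.h> functions mentioned above looks roughly like this; it is a sketch, and both support for FE_UPWARD and its effect on the loop depend on the platform:

#include <cfenv>
#include <iostream>

#pragma STDC FENV_ACCESS ON   // may be ignored by some compilers

int main()
{
    const int old_mode = std::fegetround();    // remember the current rounding mode

    if (std::fesetround(FE_UPWARD) != 0)       // fesetround returns non-zero on failure
        std::cout << "FE_UPWARD is not supported here\n";

    // ... run the loan_balance loop here ...

    std::fesetround(old_mode);                 // restore the previous mode
}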

Why does it show nan?

OK, so I am writing a program where I am trying to get the result of the right side to be equivalent to the left side with 0.0001% accuracy:
sin x = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + ...
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

long int fact(long int n)
{
    if (n == 1 || n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

int main()
{
    int n = 1, counts = 0; // for sin
    cout << "Enter value for sin" << endl;
    long double x, value, next = 0, accuracy = 0.0001;
    cin >> x;
    value = sin(x);
    do
    {
        if (counts % 2 == 0)
            next = next + (pow(x, n) / fact(n));
        else
            next = next - (pow(x, n) / fact(n));
        counts++;
        n = n + 2;
    } while ((fabs(next - value)) > 0);
    cout << "The value of sin " << x << " is " << next << endl;
}
and let's say I enter 45 for x.
I get the result
The value of sin 45 is nan.
Can anyone help me figure out where I went wrong?
First, your while condition should be
while((fabs(next - value)) > accuracy)
and fact should return long double.
When you change that, it still won't work for a value of 45. The reason is that this Taylor series converges too slowly for large values.
The error term of the Taylor approximation is bounded by max|f^(k+1)| * |x - a|^(k+1) / (k+1)!.
Here k is the number of iterations, a = 0, and the function is sin. In order for the condition to become false, 45^(k+1)/(k+1)! times the absolute value of some sin or cos (whichever derivative appears there; it is between 0 and 1) should be less than 0.0001.
Even at k = 50 that bound is still very large (we should expect an error of around 1.3*10^18), which means we will do more than 50 iterations for sure.
If you keep iterating, 45^n and n! will eventually overflow, and dividing them will give you infinity/infinity = NaN.
In your original version the fact value doesn't fit in a long int (it overflows, eventually reaching 0), the division by 0 then gives you infinity, and subtracting one infinity from another gives you NaN.
I quote from here in regard to pow:
Return value
If no errors occur, base raised to the power of exp (or iexp), i.e. base^exp, is returned.
If a domain error occurs, an implementation-defined value is returned (NaN where supported).
If a pole error or a range error due to overflow occurs, ±HUGE_VAL, ±HUGE_VALF, or ±HUGE_VALL is returned.
If a range error occurs due to underflow, the correct result (after rounding) is returned.
Reading further:
Error handling
...
except where specified above, if any argument is NaN, NaN is returned
So basically, since n is increasing and you have many loop iterations, pow ends up producing NaN (the compiler you use obviously supports that). The rest is arithmetic: you are calculating with overflowing values.
I believe you are trying to approximate sin(x) by using its Taylor series. I am not sure if that is the way to go.
Maybe you can try to stop the loop as soon as you hit NaN and not update the variable next and simply output that. That's the closest you can get I believe with your algorithm.
If the choice of 45 implies you think the input is in degrees, you should rethink that and likely should reduce mod 2 Pi.
First fix two bugs:
long double fact(long int n)
...
}while((fabs(next - value))> accuracy);
The return value of fact will overflow quickly if it is long int, and it will overflow eventually even for long double. When you compare to 0 instead of accuracy, the answer is never correct enough, so only NaN can stop the while loop.
Because of rounding error, you still never converge (while pow is giving values bigger than fact, you are computing differences between big numbers, which accumulates significant rounding error that is never removed). So you might instead stop by computing long double m = pow(x,n)/fact(n); before increasing n in each step of the loop and use:
}while(m > accuracy*.5);
At that point, either the answer has the specified accuracy or the remaining error is dominated by rounding error and iterating further won't help.
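Putting this answer's suggestions together, the corrected program might look roughly like this (a sketch only; as noted above, fact still loses precision and eventually overflows for very large n, and the series itself remains a poor way to evaluate sin for large arguments):

#include <iostream>
#include <cmath>

// fact returns long double, so large factorials lose precision gracefully
// instead of overflowing a long int.
long double fact(long int n)
{
    return (n <= 1) ? 1.0L : n * fact(n - 1);
}

int main()
{
    long double x, next = 0.0L, accuracy = 0.0001L, m;
    int n = 1, counts = 0;

    std::cout << "Enter value for sin" << std::endl;
    std::cin >> x;

    do
    {
        m = std::pow(x, n) / fact(n);          // magnitude of the current term
        next += (counts % 2 == 0) ? m : -m;    // alternate the sign of the terms
        ++counts;
        n += 2;
    } while (m > accuracy * 0.5L);

    std::cout << "The value of sin " << x << " is " << next
              << " (library sin: " << std::sin(x) << ")" << std::endl;
}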
If you had compiled your code with any reasonable level of warnings enabled, you would have immediately seen that you are not using the variable accuracy. This and the fact that your fact function returns a long int are but a small part of your problem. You will never get a good result for sin(45) using your algorithm even if you correct those issues.
The problem is that with x=45, the terms in the Taylor expansion of sin(x) won't start decreasing until n=45. This is a big problem because 45^45/45! is a very large number, 2428380447472097974305091567498407675884664058685302734375 / 1171023117375434566685446533210657783808, or roughly 2*10^18. Your algorithm initially adds and subtracts huge numbers that only start decreasing after 20+ additions/subtractions, with the eventual hope that the result will be somewhere between -1 and +1. That is an unrealizable hope given an input value of 45 and a native floating point type.
You could use some BigNum type (the internet is chock-full of them) with your algorithm, but that's extreme overkill when you only want four place accuracy. Alternatively, you could take advantage of the cyclical nature of sin(x), sin(x+2*pi)=sin(x). An input value of 45 is equivalent to 1.017702849742894661522992634... (modulo 2*pi). Your algorithm works quite nicely for an input of 1.017702849742894661522992634.
You can do much better than that, but taking the input value modulo 2*pi is the first step toward a reasonable algorithm for computing sine and cosine. Even better, you can use the fact that sin(x+pi) = -sin(x). This lets you reduce the range from -infinity to +infinity down to 0 to pi. Even better, you can use the fact that between 0 and pi, sin(x) is symmetric about pi/2. You can do even better than that. The implementations of the trigonometric functions take extreme advantage of these behaviors, but they typically do not use Taylor approximations.
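A sketch of that first step, argument reduction plus a running term, so no powers or factorials are ever formed explicitly; this is illustrative only, not how production math libraries actually implement sine:

#include <cmath>
#include <cstdio>

// Reduce the argument to [-pi, pi] with std::remainder, then sum the Taylor
// series using a running term: term_k = term_{k-1} * (-x*x) / ((2k)*(2k+1)).
double my_sin(double x)
{
    const double two_pi = 6.283185307179586476925286766559;
    x = std::remainder(x, two_pi);            // x is now in [-pi, pi]

    double term = x;                          // current term x^(2k+1)/(2k+1)!
    double sum  = x;
    for (int k = 1; k < 20; ++k)
    {
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum  += term;
        if (std::fabs(term) < 1e-12)          // good enough for a demonstration
            break;
    }
    return sum;
}

int main()
{
    std::printf("my_sin(45)   = %.9f\n", my_sin(45.0));
    std::printf("std::sin(45) = %.9f\n", std::sin(45.0));
}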

C++ Modulus returning wrong answer

Here is my code :
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
    int n, i, num, m, k = 0;
    cout << "Enter a number :\n";
    cin >> num;
    n = log10(num);
    while (n > 0) {
        i = pow(10, n);
        m = num / i;
        k = k + pow(m, 3);
        num = num % i;
        --n;
        cout << m << endl;
        cout << num << endl;
    }
    k = k + pow(num, 3);
    return 0;
}
When I input 111 it gives me this
1
12
1
2
I am using Code::Blocks. I don't know what is wrong.
Whenever I use pow expecting an integer result, I add .5 so I use (int)(pow(10,m)+.5) instead of letting the compiler automatically convert pow(10,m) to an int.
I have read many places telling me others have done exhaustive tests of some of the situations in which I add that .5 and found zero cases where it makes a difference. But accurately identifying the conditions in which it isn't needed can be quite hard. Using it when it isn't needed does no real harm.
If it makes a difference, it is a difference you want. If it doesn't make a difference, it had a tiny cost.
In the posted code, I would adjust every call to pow that way, not just the one I used as an example.
There is no equally easy fix for your use of log10, but it may be subject to the same problem. Since you expect a non-integer answer and want that non-integer answer truncated down to an integer, adding .5 would be very wrong. So you may need to find some more complicated workaround for the fundamental problem of working with floating point. I'm not certain, but assuming 32-bit integers, I think adding 1e-10 to the result of log10 before converting to int is never enough to change log10(10^n - 1) into log10(10^n), yet always enough to correct the error that might have done the reverse.
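A sketch of the +0.5 adjustment applied to this situation; std::lround from <cmath> does the same round-to-nearest for the positive values involved here and is arguably clearer:

#include <cmath>
#include <iostream>

int main()
{
    for (int n = 0; n <= 6; ++n)
    {
        double p = std::pow(10, n);
        int truncated = static_cast<int>(p);        // may come out as 99, 999, ... on some implementations
        int rounded   = static_cast<int>(p + 0.5);  // the +0.5 fix described above
        long lrounded = std::lround(p);             // same idea, named explicitly
        std::cout << n << ": " << truncated << ' ' << rounded << ' ' << lrounded << '\n';
    }
}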
pow does floating-point exponentiation.
Floating point functions and operations are inexact, you cannot ever rely on them to give you the exact value that they would appear to compute, unless you are an expert on the fine details of IEEE floating point representations and the guarantees given by your library functions.
(and furthermore, floating-point numbers might even be incapable of representing the integers you want exactly)
This is particularly problematic when you convert the result to an integer, because the result is truncated toward zero: int x = 0.999999; sets x == 0, not x == 1. Even the tiniest error in the wrong direction completely spoils the result.
You could round to the nearest integer, but that has problems too; e.g. with sufficiently large numbers, your floating point numbers might not have enough precision to be near the result you want. Or if you do enough operations (or unstable operations) with the floating point numbers, the errors can accumulate to the point you get the wrong nearest integer.
If you want to do exact, integer arithmetic, then you should use functions that do so. e.g. write your own ipow function that computes integer exponentiation without any floating-point operations at all.
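For instance, an integer power function along those lines might look like this; a sketch only, and overflow of the long long result is not checked:

// Exact integer exponentiation by repeated squaring: no floating point
// involved, so no truncation surprises.
long long ipow(long long base, unsigned exp)
{
    long long result = 1;
    while (exp > 0)
    {
        if (exp & 1)          // odd exponent: fold the current base into the result
            result *= base;
        exp >>= 1;
        if (exp > 0)
            base *= base;     // square the base for the next bit of the exponent
    }
    return result;
}

// e.g. ipow(10, 2) == 100, exactly, every time.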