How close to division by zero can I get?

I want to avoid dividing by zero, so I have an if statement:
float number;
//........
if (number > 0.000000000000001)
    number = 1 / number;
How small a value can I safely use in place of 0.000000000000001?

Just use:
if (number > 0)
    number = 1 / number;
Note the difference between > and >=: if number > 0, then it is definitely not 0.
If number can be negative, you can instead use:
if (number != 0)
    number = 1 / number;
Note that, as others have mentioned in the comments, checking that number is not 0 will not prevent your result from being Inf or -Inf.
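To see that caveat concretely, here is a minimal sketch (the value 1e-45f is just an arbitrary tiny positive float, chosen for illustration):
#include <cmath>
#include <cstdio>

// A positive, nonzero float passes the checks above, yet its
// reciprocal still overflows to +Inf.
int main() {
    float number = 1e-45f;                        // subnormal, but greater than 0
    float r = 1.0f / number;                      // exceeds FLT_MAX -> +Inf
    std::printf("isinf: %d\n", std::isinf(r));    // prints: isinf: 1
}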

The number in the if condition depends on what you want to do with the result. In IEEE 754, which is used by (almost?) all C implementations, dividing a nonzero value by 0 is well defined: you get positive or negative infinity (and 0/0 yields NaN).
If your goal is to avoid +/-Infinity, then the number in the if condition will depend upon the numerator. When the numerator is 1, you can use DBL_MIN or FLT_MIN from float.h.
If your goal is to avoid huge numbers after the division, you can do the division first and then check whether fabs(number) is bigger than some threshold, taking whatever action is needed.
There is no single correct answer to your question.
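As a sketch of the numerator-is-1 case (safe_reciprocal and its fallback value are illustrative assumptions, not a fixed recipe): any float with magnitude at least FLT_MIN has a reciprocal no larger than 1/FLT_MIN, which is finite.
#include <cfloat>
#include <cmath>

// Guard a reciprocal against +/-Inf when the numerator is 1.
float safe_reciprocal(float number) {
    if (std::fabs(number) >= FLT_MIN)   // a normal (non-subnormal, nonzero) value
        return 1.0f / number;           // finite: |result| <= 1/FLT_MIN < FLT_MAX
    return 0.0f;                        // fallback; the right value is application-specific
}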

You can simply check:
if (number > 0)
I can't understand why you need the lower limit.

For a numeric type T, std::numeric_limits gives you everything you need. For example, you could do this to make sure that anything above min_invertible has a finite reciprocal:
#include <limits>

float max_float = std::numeric_limits<float>::max();
float min_float = std::numeric_limits<float>::min(); // or denorm_min()
// If max * min > 1, then 1/min_float is finite and min_float is a safe bound;
// otherwise fall back on 1/max_float, the smallest value with a finite reciprocal.
float min_invertible = (max_float * min_float > 1.0f) ? min_float : 1.0f / max_float;

You can't decently check up front. DBL_MAX / 0.5 is effectively a division by zero; the result is the same infinity you'd get from any other division by (almost) zero.
There is a simple solution: just check the result. std::isinf(result) will tell you whether the result overflowed, and IEEE 754 tells you that division cannot produce infinity in other cases. (Well, except for INF/x. That's not really producing infinity but merely preserving it.)
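A minimal sketch of that divide-then-check approach (the isnan test is an addition covering 0/0, which yields NaN rather than infinity):
#include <cmath>

// Divide first, inspect the result afterwards.
double checked_divide(double num, double den, bool& ok) {
    double result = num / den;                        // may be +/-Inf or NaN
    ok = !std::isinf(result) && !std::isnan(result);  // flags overflow and 0/0
    return result;
}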

Your risk of producing an unhelpful result through overflow or underflow depends on both numerator and denominator.
A safety check which takes that into consideration is:
if (den == 0.0 || log2(num) - log2(den) >= log2(FLT_MAX))
    /* expect overflow */ ;
else
    return num / den;
but you might want to shave a small amount off log2(FLT_MAX) to leave wiggle-room for subsequent arithmetic and round-off.
You can do something similar with frexp, which would work for negative values as well:
int max, n, d;
frexp(FLT_MAX, &max);   // exponent of the largest finite float
frexp(num, &n);         // num == mantissa * 2^n, mantissa in [0.5, 1)
frexp(den, &d);
if (den == 0.0 || n - d > max)
    /* might overflow */ ;
else
    return num / den;
This avoids the work of computing the logarithm, which might be more efficient if the compiler can find a suitable way of doing it, but it's not as accurate.

With IEEE 32-bit floats, the smallest possible value greater than 0 is 2^-149.
If you're using IEEE 64-bit, the smallest possible value is 2^-1074.
That said, (x > 0) is probably the better test.
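Both limits are exposed by the standard library, so you can verify them directly (assuming IEEE 754 float and double):
#include <cstdio>
#include <limits>

int main() {
    // 2^-149 for IEEE binary32, 2^-1074 for IEEE binary64
    std::printf("%g\n", std::numeric_limits<float>::denorm_min());   // ~1.4e-45
    std::printf("%g\n", std::numeric_limits<double>::denorm_min());  // ~4.9e-324
}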


Display of Double Precision Floating Points vs. Their Comparison

Preamble
I am looking into a system developed to be used by people who don't understand floating point arithmetic. For this reason, the implementation of comparison for floating point numbers is not exposed to the people using the system. Currently, comparisons of floating point numbers occur like this (and this cannot change due to legacy reasons):
// If either number is not finite, do default comparison
if (!IsFinite(num1) || !IsFinite(num2)) {
    output = (num1 == num2);
} else {
    // Get exponents of both numbers to determine epsilon for comparison
    // (reads the exponent field from the high 32 bits of a little-endian IEEE 754 double)
    tmp = (OSINT32*)&num1 + 1;
    exp1 = (((*tmp) >> 20) & 0x07ff) - 1023;
    tmp = (OSINT32*)&num2 + 1;
    exp2 = (((*tmp) >> 20) & 0x07ff) - 1023;
    // Check if exponent is the same
    if (exp1 != exp2) {
        output = false;
    } else {
        // Calculate epsilon based on the magic number 47 (presumably determined experimentally)
        epsilon = pow(2.0, exp1 - 47);
        output = (fabs(num2 - num1) <= epsilon);
    }
}
The crux of it is: we calculate the epsilon based on the exponent of the number to stop users of the interface from making floating point comparison mistakes. A BIG NOTE: this is for people who are not software programmers, so when they do pow(sqrt(2), 2) == 2 they don't get a big surprise. Maybe this is not the best idea, but as I said, it cannot be changed.
The Problem
We are having trouble figuring out how to display numbers to the user. In the past they simply displayed the number to 15 significant digits. But this results in problems of the following type:
>> SHOW 4.1 MOD 1
>> 0.099999999999999996
>> SHOW (4.1 MOD 1) == 0.1
>> TRUE
The comparison calls this correct because of the generated epsilon. But the printing of the number is confusing for people: how is 0.099999999999999996 equal to 0.1? We need a way to show the number such that it represents the shortest number of significant digits to which a number compared to it would be TRUE. So for 0.099999999999999996 this would be 0.1, and for 0.569999999992724327 it would be 0.569999999992725.
Is this possible?
You could calculate (num - pow(2.0, exp - 47)) and (num + pow(2.0, exp - 47)), convert both to strings, and search for the shortest decimal within that range.
The exact value of a double is mantissa * pow(2.0, exp - 52) with a 53-bit integer mantissa, so if you add/subtract pow(2.0, exp - 47) you change the mantissa by 2^5, which should be exactly representable without rounding errors (except in corner cases where the mantissa under/overflows, i.e. if it is <= pow(2, 5) or >= pow(2, 53) - pow(2, 5); you might want to check for these*).
Then you have two strings; search for the first position where the digits differ and cut the string off there. There are a lot of rounding cases, though, especially when you want not just any correct number in the range but the one closest to the input (which might not be needed). For example, if you get "1.23" and "1.24", you might even want to output "1.235".
This also shows that your example is wrong. The epsilon for 0.569999999992724327 is (to maximal precision) 0.000000000000003552713678800500929355621337890625. The range is 0.569999999992720773889232077635824680328369140625 to 0.569999999992727879316589678637683391571044921875, which would be cut off at 0.569999999992725 (or 0.569999999992723 if you prefer that rounding).
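A rough sketch of the bracketing idea (shortest_in_range is a hypothetical helper; frexp's exponent convention is off by one from the raw IEEE exponent used above, hence exp - 48, and rounding of the final digit is omitted for brevity):
#include <cmath>
#include <cstdio>
#include <string>

// Print both ends of the equality range and keep the digits up to the
// first position where they differ.
std::string shortest_in_range(double num) {
    int exp;
    std::frexp(num, &exp);                   // num == m * 2^exp, m in [0.5, 1)
    double eps = std::ldexp(1.0, exp - 48);  // matches pow(2.0, ieee_exp - 47)
    char lo[64], hi[64];
    std::snprintf(lo, sizeof lo, "%.25f", num - eps);
    std::snprintf(hi, sizeof hi, "%.25f", num + eps);
    std::string out;
    for (int i = 0; lo[i] != '\0' && lo[i] == hi[i]; ++i)
        out += lo[i];                        // common prefix of the two bounds
    return out;
}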
An easier-to-implement sledgehammer method would be to output the number to maximal precision, cut one digit off, convert it back to double, and check whether it still compares equal. Then continue cutting until the comparison fails. (This could be improved with a binary search.)
* They should still be exactly representable, but your comparison method will behave very oddly. Consider num1 == 1 and num2 == 1 - pow(2.0, -53) = 0.99999999999999988897769753748434595763683319091796875. Their difference, 0.00000000000000011102230246251565404236316680908203125, is below your epsilon of 0.000000000000003552713678800500929355621337890625, but the comparison will say they differ, because they have different exponents.
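A sketch of that sledgehammer method, searching up from the shortest representation instead of cutting down (equivalent in effect); compare_eq stands in for the legacy epsilon comparison shown in the question, and 17 significant digits is enough to round-trip any double:
#include <cstdio>
#include <cstdlib>
#include <string>

bool compare_eq(double a, double b);   // assumed: the legacy comparison

// Return the shortest decimal string that still compares equal to num.
std::string shortest_display(double num) {
    char buf[64];
    for (int prec = 1; prec <= 17; ++prec) {
        std::snprintf(buf, sizeof buf, "%.*g", prec, num);
        if (compare_eq(std::strtod(buf, nullptr), num))
            return buf;                // shortest string that passes
    }
    return buf;                        // full-precision fallback
}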
Yes, it's possible.
#include <cmath>
#include <iomanip>
#include <iostream>

double a = std::fmod(4.1, 1);
std::cerr << std::setprecision(0) << a << "\n";
std::cerr << std::setprecision(10) << a << "\n";
std::cerr << std::setprecision(20) << a << "\n";
produces:
0.1
0.1
0.099999999999999644729
I think you just need to determine what level of display precision corresponds to your epsilon value.
We need a way to show the number such that it represents the shortest number of significant digits to which a number compared to it would be TRUE.
Can't you just do it the brute-force-ish way?
#include <iomanip>
#include <iostream>
#include <sstream>

const int MAX_PRECISION = 17;  // 17 significant digits round-trip any IEEE double
float num = 0.09999999;
for (int precision = 0; precision < MAX_PRECISION; ++precision) {
    std::stringstream str;
    float tmp = 0;
    str << std::fixed << std::setprecision(precision) << num;
    str >> tmp;
    if (num == tmp) {
        std::cout << std::fixed << std::setprecision(precision) << num;
        break;
    }
}
It is not possible to avoid confusing users given the constraints you've specified. For one thing, 0.0999999999999996447 compares equal to 0.1, and 0.1000000000000003664 compares equal to 0.1, but 0.0999999999999996447 does not compare equal to 0.1000000000000003664. For another, 2.00000000000001421 compares equal to 2.0, but 1.999999999999999778 does not compare equal to 2.0 even though it's much closer to 2.0 than 2.00000000000001421 is.
Enjoy.

Can I trust a real-to-int conversion of the result of ceil()?

Suppose I have some code such as:
float a, b = ...; // both positive
int s1 = ceil(sqrt(a/b));
int s2 = ceil(sqrt(a/b)) + 0.1;
Is it ever possible that s1 != s2? My concern is when a/b is a perfect square. For example, with a=100.0 and b=4.0, sqrt(a/b) should produce 5.00000, but what if it instead produces 4.99999?
Similar question: is there a chance that 100.0/4.0 evaluates to say 5.00001 and then ceil will round it up to 6.00000?
I'd prefer to do this in integer math but the sqrt kinda screws that plan.
EDIT: suggestions on how to better implement this would be appreciated too! The a and b values are integer values, so actual code is more like: ceil(sqrt(float(a)/b))
EDIT: Based on levis501's answer, I think I will do this:
float a, b = ...; // both positive
int s = sqrt(a/b);
while (s*s*b < a) ++s;
Thank you all!
I don't think it's possible. Regardless of the value of sqrt(a/b), what it produces is some value N that we use as:
int s1 = ceil(N);
int s2 = ceil(N) + 0.1;
Since ceil always produces an integer value (albeit represented as a double), we will always have some value X, for which the first produces X.0 and the second X.1. Conversion to int will always truncate that .1, so both will result in X.
It might seem like there would be an exception if X were so large that X.1 overflowed the range of double. I don't see how this could be possible, though. Except close to 0 (where overflow isn't a concern), the square root of a number is always smaller than the input number. Therefore, before ceil(N) + 0.1 could overflow, the a/b used as input to sqrt(a/b) would have to have overflowed already.
You may want to write an explicit function for your case. e.g.:
/* return the smallest positive integer whose square is at least x */
int isqrt(double x) {
    int y1 = ceil(sqrt(x));
    int y2 = y1 - 1;
    if ((y2 * y2) >= x) return y2;  // sqrt rounded up just past an exact root
    return y1;
}
This will handle the odd case where the square root of your ratio a/b lands within double's rounding error of an integer.
Equality of floating point numbers is indeed an issue, but IMHO not when we deal with integer values.
If you have the case of 100.0/4.0, it will evaluate to exactly 25.0, since 25.0 is exactly representable as a float, as opposed to e.g. 25.1.
Yes, it's entirely possible that s1 != s2. Why is that a problem, though?
It seems natural enough that s1 != (s1 + 0.1).
BTW, if you would prefer to have 5.00001 rounded to 5.00000 instead of 6.00000, use rint instead of ceil.
And to answer the actual question (in your comment) - you can use sqrt to get a starting point and then just find the correct square using integer arithmetic.
int min_dimension_greater_than(int items, int buckets)
{
    double target = double(items) / buckets;
    int min_square = ceil(target);
    int dim = floor(sqrt(target));
    int square = dim * dim;
    while (square < min_square) {
        dim += 1;
        square = dim * dim;
    }
    return dim;
}
And yes, this can be improved a lot, it's just a quick sketch.
s1 will always equal s2.
The C and C++ standards do not say much about the accuracy of math routines. Taken literally, it is impossible for the standard to be implemented, since the C standard says sqrt(x) returns the square root of x, but the square root of two cannot be exactly represented in floating point.
Implementing routines with good performance that always return a correctly rounded result (in round-to-nearest mode, this means the result is the representable floating-point number that is nearest to the exact result, with ties resolved in favor of a low zero bit) is a difficult research problem. Good math libraries target accuracy less than 1 ULP (so one of the two nearest representable numbers is returned), perhaps something slightly more than .5 ULP. (An ULP is the Unit of Least Precision, the value of the low bit given a particular value in the exponent field.) Some math libraries may be significantly worse than this. You would have to ask your vendor or check the documentation for more information.
So sqrt may be slightly off. If the exact square root is an integer (within the range in which integers are exactly representable in floating-point) and the library guarantees errors are less than 1 ULP, then the result of sqrt must be exactly correct, because any result other than the exact result is at least 1 ULP away.
Similarly, if the library guarantees errors are less than 1 ULP, then ceil must return the exact result, again because the exact result is representable and any other result would be at least 1 ULP away. Additionally, the nature of ceil is such that I would expect any reasonable math library to always return an integer, even if the rest of the library were not high quality.
As for overflow cases, if ceil(x) were beyond the range where all integers are exactly representable, then ceil(x)+.1 is closer to ceil(x) than it is to any other representable number, so the rounded result of adding .1 to ceil(x) should be ceil(x) in any system implementing the floating-point standard (IEEE 754). That is provided you are in the default rounding mode, which is round-to-nearest. It is possible to change the rounding mode to something like round-toward-infinity, which could cause ceil(x)+.1 to be an integer higher than ceil(x).

Is div function useful (stdlib.h)? [duplicate]

There is a function called div in C and C++ (stdlib.h):
div_t div(int numer, int denom);
typedef struct _div_t
{
    int quot;
    int rem;
} div_t;
But C and C++ have the / and % operators.
My question is: when there are / and % operators, is the div function useful?
Yes, it is: it calculates the quotient and remainder in one operation.
Aside from that, the same behaviour can be achieved with / and % (and a decent optimizer will optimize them into a single div anyway).
To sum it up: if you care about squeezing out the last bits of performance, this may be your function of choice, especially if the optimizer on your platform is not so advanced; this is often the case on embedded platforms. Otherwise, use whichever way you find more readable.
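For example, a minimal use of std::div, where one call yields both results:
#include <cstdio>
#include <cstdlib>

int main() {
    std::div_t r = std::div(17, 5);                   // quotient and remainder together
    std::printf("quot=%d rem=%d\n", r.quot, r.rem);   // prints: quot=3 rem=2
}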
The div() function returns a structure which contains the quotient and remainder of the division of the first parameter (the numerator) by the second (the denominator). There are four variants:
div_t div(int, int)
ldiv_t ldiv(long, long)
lldiv_t lldiv(long long, long long)
imaxdiv_t imaxdiv(intmax_t, intmax_t) (intmax_t represents the biggest integer type available on the system)
The div_t structure looks like this:
typedef struct
{
int quot; /* Quotient. */
int rem; /* Remainder. */
} div_t;
The implementation simply uses the / and % operators, so it's not exactly a very complicated or necessary function, but it is part of the C standard (as defined by ISO 9899:201x).
See the implementation in GNU libc:
/* Return the `div_t' representation of NUMER over DENOM. */
div_t
div (numer, denom)
     int numer, denom;
{
  div_t result;

  result.quot = numer / denom;
  result.rem = numer % denom;

  /* The ANSI standard says that |QUOT| <= |NUMER / DENOM|, where
     NUMER / DENOM is to be computed in infinite precision.  In
     other words, we should always truncate the quotient towards
     zero, never -infinity.  Machine division and remainder may
     work either way when one or both of NUMER or DENOM is
     negative.  If only one is negative and QUOT has been
     truncated towards -infinity, REM will have the same sign as
     DENOM and the opposite sign of NUMER; if both are negative
     and QUOT has been truncated towards -infinity, REM will be
     positive (will have the opposite sign of NUMER).  These are
     considered `wrong'.  If both NUMER and DENOM are positive,
     RESULT will always be positive.  This all boils down to: if
     NUMER >= 0, but REM < 0, we got the wrong answer.  In that
     case, to get the right answer, add 1 to QUOT and subtract
     DENOM from REM. */
  if (numer >= 0 && result.rem < 0)
    {
      ++result.quot;
      result.rem -= denom;
    }

  return result;
}
The semantics of div() are different from the semantics of % and /, which is important in some cases.
That is why the following code is in the implementation shown in psYchotic's answer:
if (numer >= 0 && result.rem < 0)
{
++result.quot;
result.rem -= denom;
}
With negative operands, pre-C99 % could round either way, whereas div() always truncates the quotient toward zero, so its remainder is fully specified (it takes the sign of the numerator).
Check the Wikipedia entry, particularly: "div always rounds towards 0, unlike ordinary integer division in C, where rounding for negative numbers is implementation-dependent."
div() filled a pre-C99 need: portability
Pre C99, the rounding direction of the quotient of a / b with a negative operand was implementation dependent. With div(), the rounding direction is not optional but specified to be toward 0. div() provided uniform portable division. A secondary use was the potential efficiency when code needed to calculate both the quotient and remainder.
With C99 and later, div() and / specify the same rounding direction, and with better compilers optimizing nearby a/b and a%b into a single division, the need has diminished.
This was the compelling reason for div(), and it explains the absence of udiv_t udiv(unsigned numer, unsigned denom) from the C spec: the issue of implementation-dependent results of a/b with negative operands does not exist for unsigned types, even pre-C99.
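A small demonstration of the guaranteed rounding direction with negative operands (under C99/C++11 rules, / now gives the same results):
#include <cstdio>
#include <cstdlib>

int main() {
    std::div_t r = std::div(-7, 2);
    // div() truncates toward zero, so quot is -3 (not -4) and rem is -1.
    std::printf("quot=%d rem=%d\n", r.quot, r.rem);   // prints: quot=-3 rem=-1
}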
Probably because on many processors the div instruction produces both values, and you can't always count on the compiler to recognize that adjacent / and % operators on the same inputs can be coalesced into one operation.
It costs less time if you need both values: the CPU computes both the quotient and the remainder when performing a division. If you write / once and % once, the division may be performed twice.

C++ Should this be easier?

long-time listener, first-time caller. I am relatively new to programming and was looking back at some of the code I wrote for an old lab. Is there an easier way to tell if a double is evenly divisible by an integer?
double num (/*whatever*/);
int divisor (/*an integer*/);
bool bananas = true;
if (floor(num) != num || static_cast<int>(num) % divisor != 0) {
    bananas = false;
}
if (bananas == true) {
    //do stuff;
}
The question is strange, and the checks are as well. The problem is that it makes little sense to speak about the divisibility of a floating point number, because floating point numbers are represented imprecisely in binary, and divisibility is about exactness.
I encourage you to read this article by David Goldberg: What Every Computer Scientist Should Know About Floating Point Arithmetic. It is a bit long-winded, so you may appreciate this website instead: The Floating-Point Guide.
The truth is that floor(num) == num is a strange piece of code:
num is a double
floor(num) returns a double holding an integral value
The trouble is that this does not check what you really wanted. For example, suppose (for the sake of the example) that 5 could not be represented exactly as a double; then instead of storing 5, the computer would store 4.999999999999.
double num = 5; // 4.999999999999999
double floored = floor(num); // 4.0
assert(num != floored);
In general exact comparisons are meaningless for floating point numbers, because of rounding errors.
If you insist on using floor, I suggest floor(num + 0.5), which is better, though slightly biased. A better rounding method is banker's rounding, because it is unbiased, and the article references others if you wish. Note that banker's rounding is what rint and nearbyint perform under the default rounding mode, while round rounds halfway cases away from zero.
As for your question, first you need a double-aware modulo: fmod. Then you need to remember the avoid-exact-comparisons bit.
A first (naive) attempt:
// divisor is deemed non-zero
// epsilon is a constant
double mod = fmod(num, divisor); // divisor will be converted to a double
if (mod <= epsilon) { }
Unfortunately, it fails one important test: the magnitude of mod depends on the magnitude of divisor, so if divisor is smaller than epsilon to begin with, the test will always be true.
A second attempt:
// divisor is deemed non-zero
double const epsilon = divisor / 1000.0;
double mod = fmod(num, divisor);
if (mod <= epsilon) { }
Better, but not quite there: mod and epsilon are signed! Yes, it's a bizarre modulo: the sign of mod is the sign of num.
A third attempt:
// divisor is deemed non-zero
double const eps = fabs(divisor / 1000.0);
double mod = fabs(fmod(num, divisor));
if (mod <= eps) { }
Much better.
It should also work fairly well if divisor comes from an integer, as there won't be precision issues... or at least not too many.
EDIT: fourth attempt, by @ybungalobill
The previous attempt does not deal well with situations where num/divisor errs on the wrong side. Like 1.999/1.000 --> 0.999: that is nearly the divisor, so we should indicate equality, yet the check fails.
// divisor is deemed non-zero
mod = fabs(fmod(num/divisor, 1));
if (mod <= 0.001 || fabs(1 - mod) <= 0.001) { }
Looks like a never-ending task, eh?
There is still cause for troubles though.
double has limited precision, that is, a limited number of representable digits (15-17 significant decimal digits for an IEEE double). This precision might be insufficient to represent an integer exactly:
Integer n = 12345678901234567890;
double d = n; // 1.234567890123457 * 10^19
This truncation means it is impossible to map the value back to its original. It should not cause any issue between double and int: for example, on my platform double is 8 bytes and int is 4 bytes, so it would work; but changing double to float or int to long could violate this assumption, oh hell!
Are you sure you really need floating point, by the way ?
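Putting the attempts above together, a consolidated sketch (roughly_divisible is illustrative, and the 0.001 tolerance is just the example value from the fourth attempt, not a universal constant):
#include <cmath>

// num is treated as divisible by divisor when num/divisor is within a
// relative tolerance of a whole number.
bool roughly_divisible(double num, double divisor) {
    if (divisor == 0.0) return false;                       // avoid division by zero
    double mod = std::fabs(std::fmod(num / divisor, 1.0));  // fractional part, in [0, 1)
    return mod <= 0.001 || 1.0 - mod <= 0.001;              // close to 0 or close to 1
}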
Based on the above comments, I believe you can do this...
double num (/*whatever*/);
int divisor (/*an integer*/);
if (fmod(num, divisor) == 0) {
    //do stuff;
}
I haven't checked it but why not do this?
if (floor(num) == num && !(static_cast<int>(num) % divisor)) {
// do stuff...
}

Handling overflow when casting doubles to integers in C

Today, I noticed that when I cast a double that is greater than the maximum possible integer to an integer, I get -2147483648. Similarly, when I cast a double that is less than the minimum possible integer, I also get -2147483648.
Is this behavior defined for all platforms?
What is the best way to detect this under/overflow? Is putting if statements for min and max int before the cast the best solution?
When casting floats to integers, overflow causes undefined behavior. From the C99 spec, section 6.3.1.4 Real floating and integer:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
You have to check the range manually, but don't use code like:
// DON'T use code like this!
if (my_double > INT_MAX || my_double < INT_MIN)
printf("Overflow!");
INT_MAX is an integer constant that may not have an exact floating-point representation. When comparing to a float, it may be rounded to the nearest higher or nearest lower representable floating point value (this is implementation-defined). With 64-bit integers, for example, INT_MAX is 2^63 - 1 which will typically be rounded to 2^63, so the check essentially becomes my_double > INT_MAX + 1. This won't detect an overflow if my_double equals 2^63.
For example with gcc 4.9.1 on Linux, the following program
#include <math.h>
#include <stdint.h>
#include <stdio.h>
int main() {
double d = pow(2, 63);
int64_t i = INT64_MAX;
printf("%f > %lld is %s\n", d, i, d > i ? "true" : "false");
return 0;
}
prints
9223372036854775808.000000 > 9223372036854775807 is false
It's hard to get this right if you don't know the limits and internal representation of the integer and double types beforehand. But if you convert from double to int64_t, for example, you can use floating point constants that are exact doubles (assuming two's complement and IEEE doubles):
if (!(my_double >= -9223372036854775808.0 // -2^63
&& my_double < 9223372036854775808.0) // 2^63
) {
// Handle overflow.
}
The construct !(A && B) also handles NaNs correctly. A portable, safe, but slightly inaccurate version for ints is:
if (!(my_double > INT_MIN && my_double < INT_MAX)) {
// Handle overflow.
}
This errs on the side of caution and will falsely reject values that equal INT_MIN or INT_MAX. But for most applications, this should be fine.
limits.h has constants for the maximum and minimum possible values of the integer data types; you can check your double variable before casting, like:
if (my_double > nextafter(INT_MAX, 0) || my_double < nextafter(INT_MIN, 0))
printf("Overflow!");
else
my_int = (int)my_double;
EDIT: nextafter() will solve the problem mentioned by nwellnhof
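A compilable sketch of that check (to_int_checked is a hypothetical wrapper; it also rejects NaN explicitly, which would otherwise slip past both comparisons):
#include <climits>
#include <cmath>
#include <stdexcept>

// Compare against the exactly representable doubles just inside the
// int range, then cast.
int to_int_checked(double my_double) {
    if (std::isnan(my_double) ||
        my_double > std::nextafter((double)INT_MAX, 0.0) ||
        my_double < std::nextafter((double)INT_MIN, 0.0))
        throw std::range_error("double out of int range");
    return (int)my_double;
}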
To answer your question: the behaviour when you cast out-of-range floats is undefined or implementation-specific.
Speaking from experience: I've worked on a MIPS64 system that didn't implement these kinds of casts at all. Instead of doing something deterministic, the CPU threw a CPU exception. The exception handler that ought to emulate the cast returned without doing anything to the result.
I ended up with random integers. Guess how long it took to trace a bug back to this cause. :-)
You'd better do the range check yourself if you aren't sure the number can't go out of the valid range.
A portable way for C++ is to use the SafeInt class:
http://www.codeplex.com/SafeInt
The implementation allows normal addition/subtraction/etc. on a C++ number type, including casts. It will throw an exception whenever an overflow scenario is detected.
SafeInt<int> s1 = INT_MAX;
SafeInt<int> s2 = 42;
SafeInt<int> s3 = s1 + s2; // throws
I highly advise using this class in any place where overflow is an important scenario. It makes silent overflow very difficult. In cases where there is a recovery path for an overflow, simply catch the SafeIntException and recover as appropriate.
SafeInt now works on GCC as well as Visual Studio
What is the best way to detect this under/overflow?
Compare the truncated double to exact limits near INT_MIN,INT_MAX.
The trick is to convert limits based on INT_MIN/INT_MAX into exact double values. A double may not represent INT_MAX exactly, as the number of bits in an int may exceed that floating point type's precision.*1 In that case, the conversion of INT_MAX to double suffers from rounding. The number just after INT_MAX, however, is a power of 2 and is certainly representable as a double: 2.0*(INT_MAX/2 + 1) generates the whole number one greater than INT_MAX.
The same applies to INT_MIN on non-2s-complement machines.
INT_MAX is always a power-of-2 - 1.
INT_MIN is always:
-INT_MAX (not 2's complement) or
-INT_MAX-1 (2's complement)
int double_to_int(double x) {
x = trunc(x);
if (x >= 2.0*(INT_MAX/2 + 1)) Handle_Overflow();
#if -INT_MAX == INT_MIN
if (x <= 2.0*(INT_MIN/2 - 1)) Handle_Underflow();
#else
// Fixed 2022
// if (x < INT_MIN) Handle_Underflow();
if (x - INT_MIN < -1.0) Handle_Underflow();
#endif
return (int) x;
}
To detect NaN and not use trunc()
#define DBL_INT_MAXP1 (2.0*(INT_MAX/2+1))
#define DBL_INT_MINM1 (2.0*(INT_MIN/2-1))
int double_to_int(double x) {
if (x < DBL_INT_MAXP1) {
#if -INT_MAX == INT_MIN
if (x > DBL_INT_MINM1) {
return (int) x;
}
#else
if (ceil(x) >= INT_MIN) {
return (int) x;
}
#endif
Handle_Underflow();
} else if (x > 0) {
Handle_Overflow();
} else {
Handle_NaN();
}
}
[Edit 2022] Corner error corrected after 6 years.
double values in the range (INT_MIN - 1.0 ... INT_MIN) (non-inclusive end-points) convert well to int. Prior code failed those.
*1 This also applies to INT_MIN - 1 when int precision exceeds double precision. Although this is rare, the issue readily applies to long long. Consider the difference between:
if (x < LLONG_MIN - 1.0) Handle_Underflow(); // Bad
if (x - LLONG_MIN < -1.0) Handle_Underflow();// Good
With 2's complement, some_int_type_MIN is a (negative) power-of-2 and exactly converts to a double. Thus x - LLONG_MIN is exact in the range of concern while LLONG_MIN - 1.0 may suffer precision loss in the subtraction.
We ran into the same problem. For example:
double d = 9223372036854775807L;
int i = (int)d;
On Linux/Windows this gives i = -2147483648, but on AIX 5.3 it gives i = 2147483647.
If the double is outside the range of int:
Linux/Windows always returns INT_MIN.
AIX returns INT_MAX if the double is positive, and INT_MIN if it is negative.
Another option is to use boost::numeric_cast which allows for arbitrary conversion between numerical types. It detects loss of range when a numeric type is converted, and throws an exception if the range cannot be preserved.
The website referenced above also provides a small example which should give a quick overview of how this template can be used.
Of course, this isn't plain C anymore ;-)
I am not sure about this, but I think it may be possible to "turn on" floating point exceptions for under/overflow. Take a look at Dealing with Floating-point Exceptions in MSVC7\8; it might give you an alternative to if/else checks.
I can't tell you for certain whether it is defined for all platforms, but that is pretty much what has happened on every platform I've used. Except, in my experience, it wraps around: that is, if the value of the double is INT_MAX + 2, then the result of the cast ends up being INT_MIN + 2.
As for the best way to handle it, I'm really not sure. I've run up against the issue myself and have yet to find an elegant way to deal with it. I'm sure someone will respond who can help us both there.