I have a question about the code snippet below:
long l = 9223372036854775807L;
float f = static_cast<float>(l);
The long value cannot be represented exactly in a float, according to IEEE 754.
My question is: how is the lossy conversion handled?
Is the nearest floating point representation taken?
Is the next smaller/bigger representation taken?
Or is another approach taken?
I'm aware of this question:
What happens in the background when converting int to float, but it does not answer my question.
C++ defines the conversion like this (quoting the latest standard draft):
[conv.fpint]
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type.
The result is exact if possible.
If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value.
[ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating-point type.
— end note
]
If the value being converted is outside the range of values that can be represented, the behavior is undefined.
If the source type is bool, the value false is converted to zero and the value true is converted to one.
The IEEE 754 standard defines conversion like this:
5.4.1 Arithmetic operations
It shall be possible to convert from all supported signed and unsigned integer formats to all supported arithmetic formats. Integral values are converted exactly from integer formats to floating-point formats whenever the value is representable in both formats. If the converted value is not exactly representable in the destination format, the result is determined according to the applicable rounding-direction attribute, and an inexact or floating-point overflow exception arises as specified in Clause 7, just as with arithmetic operations. The signs of integer zeros are preserved. Integer zeros without signs are converted to +0. The preferred exponent is 0.
Rounding modes are specified as:
4.3.1 Rounding-direction attributes to nearest
roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered.
roundTiesToAway, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with larger magnitude shall be delivered.
4.3.2 Directed rounding attributes
roundTowardPositive, the result shall be the format’s floating-point number (possibly +∞) closest to and no less than the infinitely precise result
roundTowardNegative, the result shall be the format’s floating-point number (possibly −∞) closest to and no greater than the infinitely precise result
roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result.
4.3.3 Rounding attribute requirements
The roundTiesToEven rounding-direction attribute shall be the default rounding-direction attribute for results in binary formats.
So by default, your first suggestion (round to nearest) applies, unless another rounding mode has been selected.
The C++ standard library inherits <cfenv> from the C standard. This header offers macros, functions and types for interacting with the floating point environment, including the rounding modes.
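For illustration, here is a minimal sketch of both points for the value in the question (this assumes an IEEE-754 float, a compiler that performs the conversion at run time, and hardware that honors the dynamic rounding mode; strictly, changing the mode requires #pragma STDC FENV_ACCESS ON, which not every compiler supports):

#include <cfenv>
#include <cstdio>

int main() {
    volatile long l = 9223372036854775807L; // 2^63 - 1 needs 63 bits; float has 24
    float f = static_cast<float>(l);        // default roundTiesToEven
    printf("%.1f\n", (double)f);            // 9223372036854775808.0, i.e. 2^63
    std::fesetround(FE_DOWNWARD);           // switch to roundTowardNegative
    volatile float g = static_cast<float>(l);
    printf("%.1f\n", (double)g);            // 9223371487098961920.0, i.e. 2^63 - 2^39
    std::fesetround(FE_TONEAREST);          // restore the default
}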
See here:
A prvalue of integer or unscoped enumeration type can be converted to a prvalue of any floating-point type. If the value cannot be represented correctly, it is implementation defined whether the closest higher or the closest lower representable value will be selected, although if IEEE arithmetic is supported, rounding defaults to nearest. If the value cannot fit into the destination type, the behavior is undefined. If the source type is bool, the value false is converted to zero, and the value true is converted to one.
As for the rounding rules of IEEE 754, there are five of them. I couldn't find any information on which ones are used in which situation, though. It looks like it's up to the implementation; however, you can set the rounding mode in a C++ program as described here.
Related
Is there any definition of how floating-point values evaluated at compile time are rounded in C or C++, e.g. when I have double d = 1.0 / 3.0;? I.e., what kind of rounding is done at compile time?
And is there a definition of the default rounding mode for a thread at runtime (C99's / C++11's fegetround() / fesetround())?
And is rounding to integer values also covered by those configuration parameters? I'm aware of nearbyint(), but that is specified to be bound to the rounding mode that can be set by fesetround(). What I'm concerned about is direct casting to an integer.
In both specs (C17 and C++20), compile-time rounding is implementation-defined.
In the C++ spec, this is specified in lex.fcon, which says
If the scaled value is not in the range of representable values for its type, the program is ill-formed. Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
The C spec has similar language (quote taken from N2176, C17 final draft):
the result is either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner.
It also recommends that translation-time conversion should match the execution-time conversion performed by library functions (like strtod), but this is not required. See the description of representable values.
Conversions of floating-point values to integers are specified in both standards to truncate (round toward zero by discarding the fractional part).
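As a small sketch of both points (assuming a typical implementation where the translation-time conversion and strtod both round to nearest, which is recommended but not required):

#include <cstdio>
#include <cstdlib>

int main() {
    double lit = 0.1;                         // rounded at translation time
    double run = std::strtod("0.1", nullptr); // rounded at execution time
    printf("%d\n", lit == run);               // 1 if both conversions round alike
    printf("%d\n", static_cast<int>(-2.9));   // -2: float-to-int truncates toward zero
}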
What are the guarantees regarding conversions from integral to floating-point types in a C++ implementation supporting IEEE-754 FP arithmetic?
Specifically, is it always well-defined behaviour to convert any integral value to any floating-point type, possibly resulting in a value of ±inf? Or are there situations in which this would result in undefined behaviour?
(Note, I am not asking about exact conversion, just if performing the conversion is always legal from the point of view of the language standard)
IEC 60559 (the current successor standard to IEEE 754) makes integer-to-float conversion well-defined in all cases, as discussed in Franck's answer, but it is the language standard that has the final word on the subject.
In the base standard, C++11 section 4.9 "Floating-integral conversions", paragraph 2, makes out-of-range integer-to-floating-point conversions undefined behavior. (Quotation is from document N3337, which is the closest approximation to the official 2011 C++ standard that is publicly available at no charge.)
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating point type. The result is exact if possible. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value. [ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. — end note ] If the value being converted is outside the range of values that can be represented, the behavior is undefined. If the source type is bool, the value false is converted to zero and the value true is converted to one.
Emphasis mine. The C standard says the same thing in different words (section 6.3.1.4 paragraph 2).
The C++ standard does not discuss what it would mean for an implementation of C++ to supply IEC 60559-conformant floating-point arithmetic. However, the C standard (closest approximation to C11 available online at no charge is N1570) does discuss this in its Annex F, and C++ implementors do tend to turn to C for guidance when C++ leaves something unspecified. There is no explicit discussion of integer to floating point conversion in Annex F, but there is this sentence in F.1p1:
Since negative and positive infinity are representable in IEC 60559 formats, all real numbers lie within the range of representable values.
Putting that sentence together with 6.3.1.4p2 suggests to me that the C committee meant for integer-to-floating conversion to produce ±Inf when the integer's magnitude is outside the range of representable finite numbers. And that interpretation is consistent with the IEC 60559-specified behavior of conversions, so we can be reasonably confident that that's what an implementation of C that claimed to conform to Annex F would do.
However, applying any interpretation of the C standard to C++ is risky at best; C++ has not been defined as a superset of C for a very long time.
If your implementation of C++ predefines the macro __STDC_IEC_559__ and/or documents compliance with IEC 60559 in some way and you don't use the "be sloppy about floating-point math in the name of speed" compilation mode (which may be on by default), you can probably rely on out-of-range conversions to produce ±Inf. Otherwise, it's UB.
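One practical note (a sketch, not normative; the limits assume a 64-bit unsigned long long and IEEE single precision): with the standard integer types the out-of-range case cannot actually arise for float, because even ULLONG_MAX (about 1.8e19) is far below FLT_MAX (about 3.4e38). Only wider extension types, such as GCC/Clang's __int128, can exceed float's range.

#include <cfloat>
#include <climits>

// The UB case needs an integer whose magnitude exceeds float's finite range;
// no standard integer type comes anywhere near FLT_MAX.
static_assert(static_cast<long double>(ULLONG_MAX) < FLT_MAX,
              "every unsigned long long is within float's finite range");

int main() {}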
In section 7.4, the IEEE-754 (2008) standard says this conversion is well-defined behavior. But that is a property of IEEE-754; C/C++ implementations are free to respect it or not (see zwol's answer).
The overflow exception shall be signaled if and only if the destination format's largest finite number is exceeded in magnitude by what would have been the rounded floating-point result were the exponent range unbounded. The default result shall be determined by the rounding-direction attribute and the sign of the intermediate result as follows:
a) roundTiesToEven and roundTiesToAway carry all overflows to ∞ with the sign of the intermediate result.
b) roundTowardZero carries all overflows to the format's largest finite number with the sign of the intermediate result.
c) roundTowardNegative carries positive overflows to the format's largest finite number, and carries negative overflows to −∞.
d) roundTowardPositive carries negative overflows to the format's most negative finite number, and carries positive overflows to +∞.
All four of these cases provide a deterministic result for the conversion from an integral to a floating-point type.
All other cases (no overflow) are also well defined, with deterministic results given by the IEEE-754 standard.
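To actually see those overflow cases you need an integer wider than 64 bits. A hedged sketch (this assumes the GCC/Clang __int128 extension and hardware whose conversion honors the dynamic rounding mode; note that the C++ standard itself makes this conversion undefined, so the output below is the IEEE-754 behavior, not a language guarantee):

#include <cfenv>
#include <cstdio>

int main() {
    volatile unsigned __int128 big = ~(unsigned __int128)0; // 2^128 - 1 > FLT_MAX

    float f1 = static_cast<float>(big); // roundTiesToEven overflows to inf (case a)
    printf("%g\n", (double)f1);

    std::fesetround(FE_TOWARDZERO);     // roundTowardZero clamps to FLT_MAX (case b)
    volatile float f2 = static_cast<float>(big);
    printf("%g\n", (double)f2);
    std::fesetround(FE_TONEAREST);
}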
Basically:
float nanf = std::numeric_limits<float>::signaling_NaN();
double nand = nanf;
assert(std::isnan(nand));
can the assert fire?
Also, what if I were assigning a double NaN to a float?
From N3337:
4.6 Floating point promotion [conv.fpprom]
1 A prvalue of type float can be converted to a prvalue of type double. The value is unchanged.
,
4.8 Floating point conversions [conv.double]
1 A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
and
3.9.1 Fundamental types [basic.fundamental]
8 There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
Now we should confirm that NaN is in fact a valid value for a floating point type. The definition for isnan refers back to the C Standard. From N1570:
7.12.3.4 The isnan macro
2 The isnan macro determines whether its argument value is a NaN.
So to summarise: yes, going from float to double preserves NaN-ness. Going from double to float is perhaps a little iffier, but since double supports NaN, we conclude from the "subset of values" wording that NaN must be preserved by this conversion as well.
(What the word "value" actually means seems to be somewhat ill-defined.)
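A minimal sketch of both directions (using quiet NaNs; under IEEE-754 default exception handling, a signaling NaN would additionally raise the invalid operation exception and come out as a quiet NaN):

#include <cassert>
#include <cmath>
#include <limits>

int main() {
    float fn = std::numeric_limits<float>::quiet_NaN();
    double dn = fn;                      // promotion: value unchanged, still NaN
    assert(std::isnan(dn));

    double dn2 = std::numeric_limits<double>::quiet_NaN();
    float fn2 = static_cast<float>(dn2); // conversion: still NaN on IEEE platforms
    assert(std::isnan(fn2));
}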
C++ doesn't require adherence to IEEE-754; however, for platforms that do follow the standard, clause 6.2 governs the behavior for quiet NaNs:
For an operation with quiet NaN inputs, other than maximum and minimum operations, if a floating-point result is to be delivered the result shall be a quiet NaN ...
and signaling NaNs:
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
Signaling NaNs shall be reserved operands that, under default exception handling, signal the invalid operation exception (see 7.2) for every general-computational and signaling-computational operation ...
BoBTFish's answer sounds pretty convincing to me. Now, I don't have normative information on this subject myself, but I want to provide an alternative answer based on a bit of deduction:
It is generally true that:
Any arithmetic expression involving a NaN must return NaN.
An arithmetic expression might involve casting from/to float and double, maybe multiple times.
I don't see how the compiler could meet both requirements without preserving NaN during a float/double cast.
As I interpret it, MSDN's definition of numeric_limits<T>::is_exact is almost always false:
[all] calculations done on [this] type are free of rounding errors.
And IBM's definition is almost always true: (Or a circular definition, depending on how you read it)
a type that has exact representations for all its values
What I'm certain of is that I could store a 2 in both a double and a long and they would both be represented exactly.
I could then divide them both by 10 and neither would hold the mathematical result exactly.
Given any numeric data type T, what is the correct way to define std::numeric_limits<T>::is_exact?
Edit:
I've posted what I think is an accurate answer to this question from details supplied in many answers. This answer is not a contender for the bounty.
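For concreteness, a minimal sketch of the divide-by-10 example above (printed digits assume IEEE double):

#include <cstdio>

int main() {
    double d = 2; // 2 is representable exactly in both types
    long   l = 2;
    d /= 10;      // d now holds the double nearest to 0.2, not 0.2 itself
    l /= 10;      // l now holds 0, the truncated integer quotient
    printf("%.17g %ld\n", d, l); // 0.20000000000000001 0
}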
The definition in the standard (see NPE's answer) isn't very exact, is it? Instead, it's circular and vague.
Given that the IEC floating point standard has a concept of "inexact" numbers (and an inexact exception when a computation yields an inexact number), I suspect that this is the origin of the name is_exact. Note that of the standard types, is_exact is false only for float, double, and long double.
The intent is to indicate whether the type exactly represents all of the numbers of the underlying mathematical type. For integral types, the underlying mathematical type is some finite subset of the integers. Since each integral type exactly represents each and every one of the members of the subset of the integers targeted by that type, is_exact is true for all of the integral types. For floating point types, the underlying mathematical type is some finite range subset of the real numbers. (An example of a finite range subset is "all real numbers between 0 and 1".) There's no way to represent even a finite range subset of the reals exactly; almost all of them are uncomputable. The IEC/IEEE format makes matters even worse. With that format, computers can't even represent a finite range subset of the rational numbers exactly (let alone a finite range subset of the computable numbers).
I suspect that the origin of the term is_exact is the long-standing concept of "inexact" numbers in various floating point representation models. Perhaps a better name would have been is_complete.
Addendum
The numeric types defined by the language aren't the be-all and end-all of representations of "numbers". A fixed point representation is essentially the integers, so they too would be exact (no holes in the representation). Representing the rationals as a pair of standard integral types (e.g., int/int) would not be exact, but a class that represented the rationals as a Bignum pair would, at least theoretically, be "exact".
What about the reals? There's no way to represent the reals exactly because almost all of the reals are not computable. The best we could possibly do with computers is the computable numbers. That would require representing a number as some algorithm. While this might be useful theoretically, from a practical standpoint, it's not that useful at all.
Second Addendum
The place to start is with the standard. Both C++03 and C++11 define is_exact as being
True if the type uses an exact representation.
That is both vague and circular. It's meaningless. Not quite so meaningless is that integer types (char, short, int, long, etc.) are "exact" by fiat:
All integer types are exact, ...
What about other arithmetic types? The first thing to note is that the only other arithmetic types are the floating point types float, double, and long double (3.9.1/8):
There are three floating point types: float, double, and long double. ... The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types.
The meaning of the floating point types in C++ is markedly murky. Compare with Fortran:
A real datum is a processor approximation to the value of a real number.
Compare with ISO/IEC 10967-1, Language independent arithmetic (which the C++ standards reference in footnotes, but never as a normative reference):
A floating point type F shall be a finite subset of ℝ.
C++, on the other hand, is silent with regard to what the floating point types are supposed to represent. As far as I can tell, an implementation could get away with making float a synonym for int, double a synonym for long, and long double a synonym for long long.
Once more from the standards on is_exact:
... but not all exact types are integer. For example, rational and fixed-exponent representations are exact but not integer.
This obviously doesn't apply to user-developed extensions for the simple reason that users are not allowed to define std::whatever<MyType>. Do that and you're invoking undefined behavior. This final clause can only pertain to implementations that
Define float, double, and long double in some peculiar way, or
Provide some non-standard rational or fixed point type as an arithmetic type and decide to provide a std::numeric_limits<non_standard_type> for these non-standard extensions.
I suggest that is_exact is true iff all literals of that type have their exact value. So is_exact is false for the floating types because the value of literal 0.1 is not exactly 0.1.
Per Christian Rau's comment, we can instead define is_exact to be true when the results of the four arithmetic operations between any two values of the type are either out of range or can be represented exactly, using the definitions of the operations for that type (i.e., truncating integer division, unsigned wraparound). With this definition you can cavil that floating-point operations are defined to produce the nearest representable value. Don't :-)
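A quick sketch of the literal-based definition (the printed digits assume IEEE double):

#include <cstdio>

int main() {
    // The literal 0.1 is stored as the nearest double, which is not exactly 0.1:
    printf("%.20f\n", 0.1); // 0.10000000000000000555
}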
The problem of exactness is not restricted to C, so let's look further.
Leaving aside discussion about the drafting of standards, inexactness has to apply to mathematical operations that require rounding to represent the result in the same type. For example, Scheme has this kind of definition of exactness/inexactness by means of exact operations and exact literal constants; see R5RS §6, standard procedures, at http://www.schemers.org/Documents/Standards/R5RS/HTML
For the case of double x = 0.1 we can either consider 0.1 a well-defined double literal, or, as in Scheme, consider the literal an inexact constant formed by an inexact compile-time operation (rounding to the nearest double the result of the operation 1/10, which is well defined in Q). Either way, we always end up at operations.
Let's concentrate on +; the others can be defined mathematically by means of + and group properties.
A possible definition of inexactness could then be:
If there exists any pair of values (a,b) of a type such that a+b-a-b != 0,
then this type is inexact (in the sense that the + operation is inexact).
For every floating point representation we know of (trivial cases of NaN and inf apart), such a pair obviously exists, so we can say that float operations are inexact.
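A sketch of such a pair for float (the volatile merely keeps the compiler from folding the sum at higher precision):

#include <cstdio>

int main() {
    float a = 1.0f;
    float b = 1e-8f;          // far below half an ulp of 1.0f (~6e-8)
    volatile float s = a + b; // rounds to exactly 1.0f
    printf("%g\n", (double)(s - a - b)); // -1e-08, not 0: + is inexact for float
}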
For a well-defined unsigned arithmetic model, + is exact.
For signed int, we have the problem of UB in case of overflow, so there is no guarantee of exactness... unless we refine the rule to cope with this broken arithmetic model:
If there exists any pair (a,b) such that (a+b) is well defined
and a+b-a-b != 0,
then the + operation is inexact.
The well-definedness above could help us extend this to other operations too, but it's not really necessary.
We would then have to consider the case of / as false polymorphism rather than inexactness
(/ being defined as the quotient of Euclidean division for int).
Of course, this is not an official rule; the validity of this answer is limited to rational argument.
The definition given in the C++ standard seems fairly unambiguous:
static constexpr bool is_exact;
True if the type uses an exact representation. All integer types are exact, but not all exact types are integer. For example, rational and fixed-exponent representations are exact but not integer.
Meaningful for all specializations.
In C++ the int type is used to represent a mathematical integer type (i.e. one of the set of {..., -1, 0, 1, ...}). Due to the practical limitation of implementation, the language defines the minimum range of values that should be held by that type, and all valid values in that range must be represented without ambiguity on all known architectures.
The standard also defines types that are used to hold floating point numbers, each with their own range of valid values. What you won't find is the list of valid floating point numbers. Again, due to practical limitations the standard allows for approximations of these types. Many people try to say that only numbers that can be represented by the IEEE floating point standard are exact values for those types, but that's not part of the standard. Though it is true that the implementation of the language on binary computers has a standard for how double and float are represented, there is nothing in the language that says it has to be implemented on a binary computer. In other words float isn't defined by the IEEE standard, the IEEE standard is just an acceptable implementation. As such, if there were an implementation that could hold any value in the range of values that define double and float without rounding rules or estimation, you could say that is_exact is true for that platform.
Strictly speaking, T can't be your only argument to tell whether a type "is_exact", but we can infer some of the other arguments. Because you're probably using a binary computer with standard hardware and any publicly available C++ compiler, when you assign a double the value of .1 (which is in the acceptable range for the floating point types), that's not the number the computer will use in calculations with that variable. It uses the closest approximation as defined by the IEEE standard. Granted, if you compare a literal with itself the comparison should return true, because the IEEE standard is pretty explicit. But we know that computers don't have infinite precision, and therefore calculations that we expect to have a value of .1 won't necessarily end up with the same approximate representation that the literal value has. Enter the dreaded epsilon comparison.
To practically answer your question, I would say that for any type which requires an epsilon comparison to test for approximate equality, is_exact should return false. If strict comparison is sufficient for that type, it should return true.
std::numeric_limits<T>::is_exact should be false if and only if T's definition allows values that may be unstorable.
C++ considers any floating point literal to be a valid value for its type. And implementations are allowed to decide which values have exact stored representation.
So for every real number in the allowed range (such as 2.0 or 0.2), C++ always promises that the number is a valid double and never promises that the value can be stored exactly.
This means that two assumptions made in the question - while true for the ubiquitous IEEE floating point standard - are incorrect for the C++ definition:
I'm certain that I could store a 2 in a double exactly.
I could then divide [it] by 10 and [the double would not] hold the
mathematical result exactly.
Do the underlying bits just get "reinterpreted" as a floating point value? Or is there a run-time conversion to produce the nearest floating point value?
Is endianness a factor on any platforms (i.e., endianness of floats differs from ints)?
How do different width types behave (e.g., int to float vs. int to double)?
What does the language standard guarantee about the safety of such casts/conversions? By cast, I mean a static_cast or C-style cast.
What about the inverse float to int conversion (or double to int)? If a float holds a small magnitude value (e.g., 2), does the bit pattern have the same meaning when interpreted as an int?
Do the underlying bits just get "reinterpreted" as a floating point value?
No, the value is converted according to the rules in the standard.
is there a run-time conversion to produce the nearest floating point value?
Yes there's a run-time conversion.
For floating point -> integer, the value is truncated, provided that the source value is in range of the integer type. If it is not, behaviour is undefined. At least I think that it's the source value, not the result, that matters. I'd have to look it up to be sure. The boundary case if the target type is char, say, would be CHAR_MAX + 0.5. I think it's undefined to cast that to char, but as I say I'm not certain.
For integer -> floating point, the result is the exact same value if possible, or else is one of the two floating point values either side of the integer value. Not necessarily the nearer of the two.
Is endianness a factor on any platforms (i.e., endianness of floats differs from ints)?
No, never. The conversions are defined in terms of values, not storage representations.
How do different width types behave (e.g., int to float vs. int to double)?
All that matters is the ranges and precisions of the types. Assuming 32 bit ints and IEEE 32 bit floats, it's possible for an int->float conversion to be imprecise. Assuming also 64 bit IEEE doubles, it is not possible for an int->double conversion to be imprecise, because all int values can be exactly represented as a double.
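A minimal sketch of that difference (results assume IEEE single and double precision):

#include <cstdio>

int main() {
    int big = 2147483647;                // INT_MAX needs 31 significant bits
    float  f = static_cast<float>(big);  // float has 24: rounds up to 2^31
    double d = static_cast<double>(big); // double has 53: exact
    printf("%.1f\n", (double)f);         // 2147483648.0
    printf("%.1f\n", d);                 // 2147483647.0
}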
What does the language standard guarantee about the safety of such casts/conversions? By cast, I mean a static_cast or C-style cast.
As indicated above, it's safe except in the case where a floating point value is converted to an integer type, and the value is outside the range of the destination type.
If a float holds a small magnitude value (e.g., 2), does the bit pattern have the same meaning when interpreted as an int?
No, it does not. The IEEE 32 bit representation of 2 is 0x40000000.
For reference, this is what ISO/IEC 14882:2003 says:
4.9 Floating-integral conversions
An rvalue of a floating point type can be converted to an rvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [ Note: If the destination type is bool, see 4.12. ]
An rvalue of an integer type or of an enumeration type can be converted to an rvalue of a floating point type. The result is exact if possible. Otherwise, it is an implementation-defined choice of either the next lower or higher representable value. [ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. ] If the source type is bool, the value false is converted to zero and the value true is converted to one.
Reference: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Other highly valuable references on the subject of fast float to int conversions:
Know your FPU
Let's Go To The (Floating) Point
Know your FPU: Fixing Floating Fast
Have a good read!
There are normally run-time conversions, as the bit representations are not generally compatible (with the exception that binary 0 is normally both 0 and 0.0). The C and C++ standards deal only with value, not representation, and specify generally reasonable behavior. Remember that a large int value will not normally be exactly representable in a float, and a large float value cannot be represented by an int.
Therefore:
All conversions are by value, not bit patterns. Don't worry about the bit patterns.
Don't worry about endianness, either, since that's a matter of bitwise representation.
Converting int to float can lose precision if the integer value is large in absolute value; it is less likely to with double, since double is more precise, and can represent many more exact numbers. (The details depend on what representations the system is actually using.)
The language definitions say nothing about bit patterns.
Converting from float to int is also a matter of values, not bit patterns. An exact floating-point 2.0 will convert to an integral 2 because that's how the implementation is set up, not because of bit patterns.
When you convert an integer to a float, you are not liable to lose any precision unless you are dealing with extremely large integers.
When you convert a float to an int, the fractional part is simply discarded; the value is truncated toward zero (this matches floor() only for non-negative values).
For more information on floating point read: http://www.lahey.com/float.htm
The IEEE single-precision format has 24 bits of mantissa, 8 bits of exponent, and a sign bit. The internal floating-point registers in Intel microprocessors such as the Pentium have 64 bits of mantissa, 15 bits of exponent and a sign bit. This allows intermediate calculations to be performed with much less loss of precision than many other implementations. The down side of this is that, depending upon how intermediate values are kept in registers, calculations that look the same can give different results.
So if your integer needs more than 24 significant bits (23 stored bits plus the implicit leading bit), then you are likely to lose some precision in the conversion.
Reinterpreted? The term "reinterpretation" usually refers to raw memory reinterpretation. It is, of course, impossible to meaningfully reinterpret an integer value as a floating-point value (and vice versa) since their physical representations are generally completely different.
When you cast the types, a run-time conversion is being performed (as opposed to reinterpretation). The conversion is normally not just conceptual, it requires an actual run-time effort, because of the difference in physical representation. There are no language-defined relationships between the bit patterns of source and target values. Endianness plays no role in it either.
When you convert an integer value to a floating-point type, the original value is converted exactly if it can be represented exactly by the target type. Otherwise, the value will be changed by the conversion process.
When you convert a floating-point value to an integer type, the fractional part is simply discarded (i.e., the number is rounded toward zero, not to the nearest value). If the result does not fit into the target integer type, the behavior is undefined.
Note also, that floating-point to integer conversions (and the reverse) are standard conversions and formally require no explicit cast whatsoever. People might sometimes use an explicit cast to suppress compiler warnings.
If you cast the value itself, it will get converted (so in a float -> int conversion 3.14 becomes 3).
But if you cast the pointer, then you will actually 'reinterpret' the underlying bits. So if you do something like this:
double d = 3.14;
int x = *reinterpret_cast<int *>(&d); // undefined behavior: breaks strict aliasing
                                      // and reads only 4 of the double's 8 bytes
x will have a 'random' value that is based on the representation of floating point.
Converting FP to integral type is nontrivial and not even completely defined.
Typically your FPU implements a hardware instruction to convert from IEEE format to int. That instruction might take parameters (implemented in hardware) controlling rounding. Your ABI probably specifies round-to-nearest-even. If you're on X86+SSE, it's "probably not too slow," but I can't find a reference with one Google search.
As with anything FP, there are corner cases. It would be nice if infinity were mapped to (TYPE)_MAX but that is alas not typically the case — the result of int x = INFINITY; is undefined.
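A short sketch contrasting the cast (which always truncates) with lrint (which honors the current rounding mode, round-to-nearest-even by default):

#include <cmath>
#include <cstdio>

int main() {
    printf("%d\n", static_cast<int>(2.7)); // 2: the cast truncates regardless of mode
    printf("%ld\n", std::lrint(2.7));      // 3: lrint uses the current rounding mode
    // int x = static_cast<int>(INFINITY); // undefined: inf is outside int's range
}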