Integer to float conversions with IEEE FP - c++

What are the guarantees regarding conversions from integral to floating-point types in a C++ implementation supporting IEEE-754 FP arithmetic?
Specifically, is it always well-defined behaviour to convert any integral value to any floating-point type, possibly resulting in a value of +-inf? Or are there situations in which this would result in undefined behaviour?
(Note, I am not asking about exact conversion, just if performing the conversion is always legal from the point of view of the language standard)

IEC 60559 (the current successor standard to IEEE 754) makes integer-to-float conversion well-defined in all cases, as discussed in Franck's answer, but it is the language standard that has the final word on the subject.
In the base standard, C++11 section 4.9 "Floating-integral conversions", paragraph 2, makes out-of-range integer-to-floating-point conversions undefined behavior. (Quotation is from document N3337, which is the closest approximation to the official 2011 C++ standard that is publicly available at no charge.)
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating point type. The result is exact if possible. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value. [ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. — end note ] If the value being converted is outside the range of values that can be represented, the behavior is undefined. If the source type is bool, the value false is converted to zero and the value true is converted to one.
Emphasis mine. The C standard says the same thing in different words (section 6.3.1.4 paragraph 2).
The C++ standard does not discuss what it would mean for an implementation of C++ to supply IEC 60559-conformant floating-point arithmetic. However, the C standard (closest approximation to C11 available online at no charge is N1570) does discuss this in its Annex F, and C++ implementors do tend to turn to C for guidance when C++ leaves something unspecified. There is no explicit discussion of integer to floating point conversion in Annex F, but there is this sentence in F.1p1:
Since negative and positive infinity are representable in IEC 60559 formats, all real numbers lie within the range of representable values.
Putting that sentence together with 6.3.1.4p2 suggests to me that the C committee meant for integer-to-floating conversion to produce ±Inf when the integer's magnitude is outside the range of representable finite numbers. And that interpretation is consistent with the IEC 60559-specified behavior of conversions, so we can be reasonably confident that that's what an implementation of C that claimed to conform to Annex F would do.
However, applying any interpretation of the C standard to C++ is risky at best; C++ has not been defined as a superset of C for a very long time.
If your implementation of C++ predefines the macro __STDC_IEC_559__ and/or documents compliance with IEC 60559 in some way and you don't use the "be sloppy about floating-point math in the name of speed" compilation mode (which may be on by default), you can probably rely on out-of-range conversions to produce ±Inf. Otherwise, it's UB.
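As a quick check, here is a minimal sketch assuming a GCC/Clang-style compiler (unsigned __int128 is a compiler extension; no standard integer type exceeds float's finite range, so a wider type is needed to make the conversion genuinely out of range):
#include <iostream>
#include <limits>

int main() {
    // Does the implementation claim IEC 60559 (IEEE 754) floating point?
    std::cout << std::boolalpha << "float is IEC 60559: "
              << std::numeric_limits<float>::is_iec559 << '\n';
#ifdef __STDC_IEC_559__
    std::cout << "__STDC_IEC_559__ is defined\n";
#endif

#ifdef __SIZEOF_INT128__
    // 2^128 - 1 exceeds FLT_MAX (2^128 - 2^104), so this conversion is
    // out of range: UB per the C++ standard, but +inf under IEC 60559
    // with the default round-to-nearest attribute.
    unsigned __int128 huge = ~static_cast<unsigned __int128>(0);
    std::cout << "out-of-range conversion: " << static_cast<float>(huge) << '\n';
#endif
}
On an IEC 60559 platform this prints true and inf for the last line; a conforming implementation without Annex F semantics could do anything at all there.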

Section 7.4 of the IEEE-754 (2008) standard makes this behavior well-defined. But that is a guarantee of IEEE-754 itself; C/C++ implementations are free to respect it or not (see zwol's answer).
The overflow exception shall be signaled if and only if the destination format’s largest finite number is exceeded in magnitude by what would have been the rounded floating-point result were the exponent range unbounded. The default result shall be determined by the rounding-direction attribute and the sign of the intermediate result as follows:
a) roundTiesToEven and roundTiesToAway carry all overflows to ∞ with the sign of the intermediate result.
b) roundTowardZero carries all overflows to the format’s largest finite number with the sign of the intermediate result.
c) roundTowardNegative carries positive overflows to the format’s largest finite number, and carries negative overflows to −∞.
d) roundTowardPositive carries negative overflows to the format’s most negative finite number, and carries positive overflows to +∞.
All four cases provide a deterministic result for the conversion from an integral to a floating-point type.
All other cases (no overflow) are also well-defined, with deterministic results given by the IEEE-754 standard.
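To make the four cases concrete, here is a sketch assuming an IEC 60559 platform whose arithmetic honors the <cfenv> dynamic rounding mode at run time (strictly conforming code also needs the FENV_ACCESS pragma, and some compilers want a flag such as -frounding-math). Overflow in float multiplication follows the same table as conversion overflow:
#include <cfenv>
#include <cfloat>
#include <cstdio>

int main() {
    const int modes[] = {FE_TONEAREST, FE_TOWARDZERO, FE_DOWNWARD, FE_UPWARD};
    const char* names[] = {"roundTiesToEven", "roundTowardZero",
                           "roundTowardNegative", "roundTowardPositive"};
    for (int i = 0; i < 4; ++i) {
        std::fesetround(modes[i]);
        volatile float big = FLT_MAX;  // volatile keeps the ops at run time
        std::printf("%-19s  +overflow: %g  -overflow: %g\n",
                    names[i], big * 2.0f, big * -2.0f);
    }
}
Expected output per the table above: inf/-inf for roundTiesToEven, +FLT_MAX/-FLT_MAX for roundTowardZero, FLT_MAX/-inf for roundTowardNegative, and inf/-FLT_MAX for roundTowardPositive.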

Related

How is the precision loss from integer to float defined in C++?

I've a question to the code snippet below:
long l=9223372036854775807L;
float f=static_cast<float>(l);
The long value cannot be represented exactly according to IEEE 754.
My question is how the lossy conversion is handled:
Is the nearest floating point representation taken?
Is the next smaller/bigger representation taken?
Or is another approach taken?
I'm aware of the question what happens at background when convert int to float, but it does not answer my question.
C++ defines the conversion like this (quoting latest standard draft):
[conv.fpint]
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type.
The result is exact if possible.
If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value.
[ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating-point type.
— end note
]
If the value being converted is outside the range of values that can be represented, the behavior is undefined.
If the source type is bool, the value false is converted to zero and the value true is converted to one.
The IEEE 754 standard defines conversion like this:
5.4.1 Arithmetic operations
It shall be possible to convert from all supported signed and unsigned integer formats to all supported arithmetic formats. Integral values are converted exactly from integer formats to floating-point formats whenever the value is representable in both formats. If the converted value is not exactly representable in the destination format, the result is determined according to the applicable rounding-direction attribute, and an inexact or floating-point overflow exception arises as specified in Clause 7, just as with arithmetic operations. The signs of integer zeros are preserved. Integer zeros without signs are converted to +0. The preferred exponent is 0.
Rounding modes are specified as:
4.3.1 Rounding-direction attributes to nearest
roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered.
roundTiesToAway, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with larger magnitude shall be delivered.
4.3.2 Directed rounding attributes
roundTowardPositive, the result shall be the format’s floating-point number (possibly +∞) closest to and no less than the infinitely precise result
roundTowardNegative, the result shall be the format’s floating-point number (possibly −∞) closest to and no greater than the infinitely precise result
roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result.
4.3.3 Rounding attribute requirements
The roundTiesToEven rounding-direction attribute shall be the default rounding-direction attribute for results in binary formats.
So by default your suggestion 1 (round to nearest) applies, unless another rounding mode has been selected.
The C++ standard library inherits <cfenv> from the C standard. This header offers macros, functions and types for interacting with the floating point environment, including the rounding modes.
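As a rough illustration on the questioner's value, here is a sketch assuming IEC 60559 binary32 float and hardware whose integer-to-float conversion honors the dynamic rounding mode (the volatile discourages compile-time constant folding; strictly conforming code would also need the FENV_ACCESS pragma):
#include <cfenv>
#include <cstdio>

int main() {
    volatile long l = 9223372036854775807L; // 2^63 - 1, not a float value

    std::fesetround(FE_TONEAREST);          // the default
    std::printf("%.1f\n", static_cast<double>(static_cast<float>(l)));
    // 9223372036854775808.0, i.e. rounds up to 2^63

    std::fesetround(FE_TOWARDZERO);
    std::printf("%.1f\n", static_cast<double>(static_cast<float>(l)));
    // 9223371487098961920.0, i.e. rounds down to 2^63 - 2^39
}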
See here:
A prvalue of integer or unscoped enumeration type can be converted to
a prvalue of any floating-point type. If the value cannot be
represented correctly, it is implementation defined whether the
closest higher or the closest lower representable value will be
selected, although if IEEE arithmetic is supported, rounding defaults
to nearest. If the value cannot fit into the destination type, the
behavior is undefined. If the source type is bool, the value false is
converted to zero, and the value true is converted to one.
As for the rounding rules of IEEE 754, there are five of them. I couldn't find any information on which ones are used in which situation, though. It looks like it's up to the implementation; however, you can set the rounding mode in a C++ program as described here.

The behaviour of floating point division by zero

Consider
#include <iostream>
int main()
{
    double a = 1.0 / 0;
    double b = -1.0 / 0;
    double c = 0.0 / 0;
    std::cout << a << b << c; // to stop compilers from optimising out the code.
}
I have always thought that a will be +Inf, b will be -Inf, and c will be NaN. But I also hear rumours that strictly speaking the behaviour of floating point division by zero is undefined and therefore the above code cannot be considered portable C++. (That theoretically obliterates the integrity of my million line plus code stack. Oops.)
Who's correct?
Note I'm happy with implementation defined, but I'm talking about cat-eating, demon-sneezing undefined behaviour here.
The C++ standard does not mandate the IEEE 754 standard, because that depends mostly on the hardware architecture.
If the hardware and the compiler correctly implement the IEEE 754 standard, the division will produce the expected INF, -INF and NaN; otherwise... it depends.
Undefined means the compiler implementation decides, and many variables factor into that, like the hardware architecture, code generation efficiency, compiler developer laziness, etc.
Source:
The C++ standard state that a division by 0.0 is undefined
C++ Standard 5.6.4
... If the second operand of / or % is zero the behavior is undefined
C++ Standard 18.3.2.4
static constexpr bool is_iec559;
True if and only if the type adheres to the IEC 559 standard.
Meaningful for all floating point types.
C++ detection of IEEE754:
The standard library includes a template to detect if IEEE754 is supported or not:
static constexpr bool is_iec559;
#include <limits>
bool isFloatIeee754 = std::numeric_limits<float>::is_iec559; // a data member, not a function
What if IEEE754 is not supported?
It depends; usually a division by 0 triggers a hardware exception and makes the application terminate.
Quoting cppreference:
If the second operand is zero, the behavior is undefined, except that if floating-point division is taking place and the type supports IEEE floating-point arithmetic (see std::numeric_limits::is_iec559), then:
if one operand is NaN, the result is NaN
dividing a non-zero number by ±0.0 gives the correctly-signed infinity and FE_DIVBYZERO is raised
dividing 0.0 by 0.0 gives NaN and FE_INVALID is raised
We are talking about floating-point division here, so it is actually implementation-defined whether double division by zero is undefined.
If std::numeric_limits<double>::is_iec559 is true, and it is "usually true", then the behaviour is well-defined and produces the expected results.
A pretty safe bet would be to plop down a:
static_assert(std::numeric_limits<double>::is_iec559, "Please use IEEE754, you weirdo");
... near your code.
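Putting the cppreference list and that static_assert together, here is a minimal sketch (assuming is_iec559 holds, as it does on mainstream hardware):
#include <cfenv>
#include <iostream>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "Please use IEEE754, you weirdo");

int main() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double zero = 0.0; // volatile: keep the divisions at run time
    std::cout << 1.0 / zero << ' '     // inf, raises FE_DIVBYZERO
              << -1.0 / zero << ' '    // -inf
              << zero / zero << '\n';  // nan (sign may vary), raises FE_INVALID
    std::cout << !!std::fetestexcept(FE_DIVBYZERO)
              << !!std::fetestexcept(FE_INVALID) << '\n'; // prints 11
}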
Division by zero, both integer and floating point, is undefined behavior per [expr.mul]p4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division
of the first expression by the second. If the second operand of / or % is zero the behavior is undefined. ...
An implementation can, however, optionally support Annex F, which has well-defined semantics for floating point division by zero.
We can see from the clang bug report "clang sanitizer regards IEC 60559 floating-point division by zero as undefined" that even though the macro __STDC_IEC_559__ is defined, it is being defined by the system headers; clang, at least, does not support Annex F, and so division by zero remains undefined behavior for clang:
Annex F of the C standard (IEC 60559 / IEEE 754 support) defines the
floating-point division by zero, but clang (3.3 and 3.4 Debian snapshot)
regards it as undefined. This is incorrect:
Support for Annex F is optional, and we do not support it.
#if __STDC_IEC_559__
This macro is being defined by your system headers, not by us; this is
a bug in your system headers. (FWIW, GCC does not fully support Annex
F either, IIRC, so it's not even a Clang-specific bug.)
That bug report and two other bug reports, "UBSan: Floating point division by zero is not undefined" and "clang should support Annex F of ISO C (IEC 60559 / IEEE 754)", indicate that gcc conforms to Annex F with respect to floating point divide by zero.
Though I agree that it isn't up to the C library to define __STDC_IEC_559__ unconditionally, the problem is specific to clang. GCC does not fully support Annex F, but at least its intent is to support it by default, and the division is well-defined under it if the rounding mode isn't changed. Nowadays, not supporting IEEE 754 (at least the basic features like the handling of division by zero) is regarded as bad behavior.
This is further supported by the gcc wiki page Semantics of Floating Point Math in GCC, which indicates that -fno-signaling-nans is the default, in agreement with the gcc optimization options documentation, which says:
The default is -fno-signaling-nans.
It is interesting to note that UBSan for clang includes float-divide-by-zero under -fsanitize=undefined by default, while gcc does not:
Detect floating-point division by zero. Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs.
See it live for clang and live for gcc.
Division by 0 is undefined behavior.
From section 5.6 of the C++ standard (C++11):
The binary / operator yields the quotient, and the binary % operator
yields the remainder from the division of the first expression by the
second. If the second operand of / or % is zero the behavior is
undefined. For integral operands the / operator yields the algebraic
quotient with any fractional part discarded; if the quotient a/b is
representable in the type of the result, (a/b)*b + a%b is equal to a .
No distinction is made between integer and floating point operands for the / operator. The standard only states that dividing by zero is undefined without regard to the operands.
In [expr]/4 we have
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
Emphasis mine
So per the standard this is undefined behavior. The standard does go on to say that some of these cases are actually handled by implementations and are configurable. It won't say the behavior is implementation-defined, but it does let you know that implementations define some of this behavior.
As to the submitter's question 'Who's correct?', it is perfectly OK to say that both answers are correct. The fact that the C standard describes the behavior as 'undefined' DOES NOT dictate what the underlying hardware actually does; it merely means that if you want your program to be meaningful according to the standard you -may not assume- that the hardware actually implements that operation. But if you happen to be running on hardware that implements the IEEE standard, you will find the operation is in fact implemented, with the results as stipulated by the IEEE standard.
This also depends on the floating point environment.
cppreference has details:
http://en.cppreference.com/w/cpp/numeric/fenv
(no examples though).
This should be available in most desktop/server C++11 and C99 environments. There are also platform-specific variations that predate the standardization of all this.
I would expect that enabling exceptions makes the code run more slowly, so probably for this reason most platforms that I know of disable exceptions by default.

Is maximum float + x defined behavior?

I did a quick test using the following:
float x = std::numeric_limits<float>::max();
x += 0.1;
that resulted in x == std::numeric_limits<float>::max() so it didn't get any bigger than the limit.
Is this guaranteed behavior across compilers and platforms though? What about HLSL?
Is this guaranteed behavior across compilers and platforms though?
No, the behavior is undefined. The standard says (emphasis mine):
5 Expressions....
If during the evaluation of an expression, the result is not mathematically defined or not in the range of
representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++
ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all
floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
As user2079303 mentioned, in practice we can be less restricted:
the behavior is not undefined if std::numeric_limits<float>::has_infinity, which is often true. In that case, the result is merely unspecified.
The value of std::numeric_limits<T>::max() is defined to be the maximum finite value representable by type T (see 18.3.2.4 [numeric.limits.members] paragraph 4). Thus, the question actually becomes multiple subquestions:
Is it possible to create a value bigger than std::numeric_limits<T>::max(), i.e., is there an infinity?
If so, which value needs to be added to std::numeric_limits<T>::max() to get the infinity?
If not, is the behavior defined?
C++ does not specify the floating point format and different formats may disagree on what the result is. In particular, I don't think floating point formats need to define a value for infinity. For example, IBM Floating Points do not have an infinity. On the other hand the IEEE 754 does have an infinity representation.
Overflow of arithmetic types may be undefined behavior (see 5 [expr] paragraph 4), and I don't see any exclusion for floating point types. Thus, the behavior would be undefined if there is no infinity. At least it can be tested whether a type does have an infinity (see 18.3.2.3 [numeric.limits] paragraph 35), in which case the operation can't overflow.
If there is an infinity, I think adding any value to std::numeric_limits<T>::max() would get you infinity. However, determining whether that is indeed the case would require digging through the respective floating point specification. I could imagine that IEEE 754 might ignore additions if the value is too small to be relevant, as is the case for adding 0.1 to std::numeric_limits<T>::max(). I could also imagine that it decides that it always overflows to infinity.
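For what it's worth, here is a minimal sketch assuming IEC 60559 binary32 float with the default round-to-nearest mode; under that model the addition of 0.1 is indeed "ignored" (the exact sum rounds back to max), while adding half an ulp of max or more overflows to infinity:
#include <cmath>
#include <iostream>
#include <limits>

int main() {
    const float m = std::numeric_limits<float>::max(); // 2^128 - 2^104

    // The exact sum m + 0.1 is far closer to m than to 2^128, so
    // round-to-nearest returns m: no overflow, no infinity.
    std::cout << (m + 0.1f == m) << '\n';                       // prints 1

    // Adding 2^127 pushes the exact sum past the overflow threshold
    // (2^128 - 2^103), so the result rounds to +infinity.
    std::cout << std::isinf(m + std::ldexp(1.0f, 127)) << '\n'; // prints 1
}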

std::numeric_limits::is_exact ... what is a usable definition?

As I interpret it, MSDN's definition of numeric_limits::is_exact is almost always false:
[all] calculations done on [this] type are free of rounding errors.
And IBM's definition is almost always true: (Or a circular definition, depending on how you read it)
a type that has exact representations for all its values
What I'm certain of is that I could store a 2 in both a double and a long and they would both be represented exactly.
I could then divide them both by 10 and neither would hold the mathematical result exactly.
Given any numeric data type T, what is the correct way to define std::numeric_limits<T>::is_exact?
Edit:
I've posted what I think is an accurate answer to this question from details supplied in many answers. This answer is not a contender for the bounty.
The definition in the standard (see NPE's answer) isn't very exact, is it? Instead, it's circular and vague.
Given that the IEC floating point standard has a concept of "inexact" numbers (and an inexact exception when a computation yields an inexact number), I suspect that this is the origin of the name is_exact. Note that of the standard types, is_exact is false only for float, double, and long double.
The intent is to indicate whether the type exactly represents all of the numbers of the underlying mathematical type. For integral types, the underlying mathematical type is some finite subset of the integers. Since each integral types exactly represents each and every one of the members of the subset of the integers targeted by that type, is_exact is true for all of the integral types. For floating point types, the underlying mathematical type is some finite range subset of the real numbers. (An example of a finite range subset is "all real numbers between 0 and 1".) There's no way to represent even a finite range subset of the reals exactly; almost all are uncomputable. The IEC/IEEE format makes matters even worse. With that format, computers can't even represent a finite range subset of the rational numbers exactly (let alone a finite range subset of the computable numbers).
I suspect that the origin of the term is_exact is the long-standing concept of "inexact" numbers in various floating point representation models. Perhaps a better name would have been is_complete.
Addendum
The numeric types defined by the language aren't the be-all and end-all of representations of "numbers". A fixed point representation is essentially the integers, so they too would be exact (no holes in the representation). Representing the rationals as a pair of standard integral types (e.g., int/int) would not be exact, but a class that represented the rationals as a Bignum pair would, at least theoretically, be "exact".
What about the reals? There's no way to represent the reals exactly because almost all of the reals are not computable. The best we could possibly do with computers is the computable numbers. That would require representing a number as some algorithm. While this might be useful theoretically, from a practical standpoint, it's not that useful at all.
Second Addendum
The place to start is with the standard. Both C++03 and C++11 define is_exact as being
True if the type uses an exact representation.
That is both vague and circular. It's meaningless. Not quite so meaningless is that integer types (char, short, int, long, etc.) are "exact" by fiat:
All integer types are exact, ...
What about other arithmetic types? The first thing to note is that the only other arithmetic types are the floating point types float, double, and long double (3.9.1/8):
There are three floating point types: float, double, and long double. ... The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types.
The meaning of the floating point types in C++ is markedly murky. Compare with Fortran:
A real datum is a processor approximation to the value of a real number.
Compare with ISO/IEC 10967-1, Language independent arithmetic (which the C++ standards reference in footnotes, but never as a normative reference):
A floating point type F shall be a finite subset of ℝ.
C++, on the other hand, is silent with regard to what the floating point types are supposed to represent. As far as I can tell, an implementation could get away with making float a synonym for int, double a synonym for long, and long double a synonym for long long.
Once more from the standards on is_exact:
... but not all exact types are integer. For example, rational and fixed-exponent representations are exact but not integer.
This obviously doesn't apply to user-developed extensions for the simple reason that users are not allowed to define std::whatever<MyType>. Do that and you're invoking undefined behavior. This final clause can only pertain to implementations that
Define float, double, and long double in some peculiar way, or
Provide some non-standard rational or fixed point type as an arithmetic type and decide to provide a std::numeric_limits<non_standard_type> for these non-standard extensions.
I suggest that is_exact is true iff all literals of that type have their exact value. So is_exact is false for the floating types because the value of literal 0.1 is not exactly 0.1.
Per Christian Rau's comment, we can instead define is_exact to be true when the results of the four arithmetic operations between any two values of the type are either out of range or can be represented exactly, using the definitions of the operations for that type (i.e., truncating integer division, unsigned wraparound). With this definition you can cavil that floating-point operations are defined to produce the nearest representable value. Don't :-)
The problem of exactness is not restricted to C, so let's look further.
Germane discussion about the wording of standards apart, inexactness has to apply to mathematical operations that require rounding to represent the result in the same type. For example, Scheme has this kind of definition of exactness/inexactness, by means of exact operations and exact literal constants; see R5RS §6, standard procedures, at http://www.schemers.org/Documents/Standards/R5RS/HTML
For the case of double x = 0.1, we either consider that 0.1 is a well-defined double literal, or, as in Scheme, that the literal is an inexact constant formed by an inexact compile-time operation (rounding to the nearest double the result of the operation 1/10, which is well defined in ℚ). Either way, we always end up at operations.
Let's concentrate on +; the others can be defined mathematically by means of + and group properties.
A possible definition of inexactness could then be:
If there exists any pair of values (a,b) of a type such that a+b-a-b != 0,
then this type is inexact (in the sense that + operation is inexact).
For every floating point representation we know of (leaving aside the trivial cases of NaN and Inf), such a pair obviously exists, so we can say that float operations are inexact.
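A concrete instance of such a pair, as a minimal sketch assuming IEC 60559 binary32 float (the values are arbitrary; any two magnitudes separated by more than 2^24 would do):
#include <iostream>

int main() {
    float a = 1.0e8f; // exactly representable (a multiple of its ulp, 8)
    float b = 1.0f;
    // a + b rounds back to a (1e8 + 1 is not a binary32 value), so the
    // whole expression evaluates to -1 rather than the mathematical 0.
    std::cout << a + b - a - b << '\n'; // prints -1
}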
For a well-defined unsigned arithmetic model (wraparound modulo 2^n), + is exact.
For signed int, we have the problem of UB in case of overflow, so there is no guarantee of exactness... unless we refine the rule to cope with this broken arithmetic model:
If there exists any pair (a,b) such that (a+b) is well defined
and a+b-a-b != 0,
then the + operation is inexact.
The well-definedness condition above could help us extend this to other operations as well, but it's not really necessary.
We would then have to consider the case of / as false polymorphism rather than inexactness (/ being defined as the quotient of Euclidean division for int).
Of course, this is not an official rule; the validity of this answer is limited to the effort of rational thinking.
The definition given in the C++ standard seems fairly unambiguous:
static constexpr bool is_exact;
True if the type uses an exact representation. All integer types are exact, but not all exact types are
integer. For example, rational and fixed-exponent representations are exact but not integer.
Meaningful for all specializations.
In C++ the int type is used to represent a mathematical integer type (i.e. one of the set of {..., -1, 0, 1, ...}). Due to the practical limitation of implementation, the language defines the minimum range of values that should be held by that type, and all valid values in that range must be represented without ambiguity on all known architectures.
The standard also defines types that are used to hold floating point numbers, each with their own range of valid values. What you won't find is the list of valid floating point numbers. Again, due to practical limitations the standard allows for approximations of these types. Many people try to say that only numbers that can be represented by the IEEE floating point standard are exact values for those types, but that's not part of the standard. Though it is true that the implementation of the language on binary computers has a standard for how double and float are represented, there is nothing in the language that says it has to be implemented on a binary computer. In other words, float isn't defined by the IEEE standard; the IEEE standard is just an acceptable implementation. As such, if there were an implementation that could hold any value in the range of values that define double and float without rounding rules or estimation, you could say that is_exact is true for that platform.
Strictly speaking, T can't be your only argument to tell whether a type "is_exact", but we can infer some of the other arguments. Because you're probably using a binary computer with standard hardware and any publicly available C++ compiler, when you assign a double the value of .1 (which is in the acceptable range for the floating point types), that's not the number the computer will use in calculations with that variable. It uses the closest approximation as defined by the IEEE standard. Granted, if you compare a literal with itself your compiler should return true, because the IEEE standard is pretty explicit. We know that computers don't have infinite precision and therefore calculations that we expect to have a value of .1 won't necessarily end up with the same approximate representation that the literal value has. Enter the dreaded epsilon comparison.
To practically answer your question, I would say that for any type which requires an epsilon comparison to test for approximate equality, is_exact should return false. If strict comparison is sufficient for that type, it should return true.
std::numeric_limits<T>::is_exact should be false if and only if T's definition allows values that may be unstorable.
C++ considers any floating point literal to be a valid value for its type. And implementations are allowed to decide which values have exact stored representation.
So for every real number in the allowed range (such as 2.0 or 0.2), C++ always promises that the number is a valid double and never promises that the value can be stored exactly.
This means that two assumptions made in the question - while true for the ubiquitous IEEE floating point standard - are incorrect for the C++ definition:
I'm certain that I could store a 2 in a double exactly.
I could then divide [it] by 10 and [the double would not] hold the
mathematical result exactly.
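To see the distinction the question draws, a minimal sketch (assuming IEC 60559 binary64 double, where 2.0 happens to be exactly storable but 0.2 is not):
#include <cstdio>

int main() {
    double two = 2.0;   // a small power of two: stored exactly
    double fifth = 0.2; // rounded to the nearest binary64 value
    std::printf("%.17g\n", two);   // 2
    std::printf("%.17g\n", fifth); // 0.20000000000000001
}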

Division by zero: Undefined Behavior or Implementation Defined in C and/or C++?

Regarding division by zero, the standards say:
C99 6.5.5p5 - The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined.
C++03 5.6.4 - The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined.
If we were to take the above paragraphs at face value, the answer is clearly Undefined Behavior for both languages. However, if we look further down in the C99 standard we see the following paragraph which appears to be contradictory(1):
C99 7.12p4 - The macro INFINITY expands to a constant expression of type float representing positive or unsigned infinity, if available;
Do the standards have some sort of golden rule where Undefined Behavior cannot be superseded by a (potentially) contradictory statement? Barring that, I don't think it's unreasonable to conclude that if your implementation defines the INFINITY macro, division by zero is defined to be such. However, if your implementation does not define such a macro, the behavior is Undefined.
I'm curious what the consensus is (if any) on this matter for each of the two languages. Would the answer change if we are talking about integer division int i = 1 / 0 versus floating point division float i = 1.0 / 0.0 ?
Note (1) The C++03 standard talks about the <cmath> library which includes the INFINITY macro.
I don't see any contradiction. Division by zero is undefined, period. There is no mention of "... unless INFINITY is defined" anywhere in the quoted text.
Note that nowhere in mathematics is it defined that 1 / 0 = infinity. One might interpret it that way, but it is a personal, "shortcut" style interpretation, rather than a sound fact.
1 / 0 is not infinity; only lim(x→0+) 1/x = ∞.
This was not a math purist question, but a C/C++ question.
According to the IEEE 754 standard, which all modern C compilers / FPUs use, we have
3.0 / 0.0 = INF
0.0 / 0.0 = NaN
-3.0 / 0.0 = -INF
The FPU will have a status flag that you can set to generate an exception if so desired, but this is not the norm.
INF can be quite useful for avoiding branching when INF is a useful result. See the discussion here:
http://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
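Regarding that status flag: by default IEEE 754 division by zero merely raises the FE_DIVBYZERO flag and continues with INF. The sketch below prints the default result and then, as a non-standard aside, unmasks the trap with feenableexcept, a glibc extension that is not available on all platforms:
#include <cfenv>
#include <cstdio>
#if defined(__GLIBC__)
#  include <fenv.h> // feenableexcept is a GNU extension
#endif

int main() {
    volatile double zero = 0.0;      // volatile keeps the division at run time
    std::printf("%g\n", 3.0 / zero); // inf: the default is non-stop mode

#if defined(__GLIBC__)
    feenableexcept(FE_DIVBYZERO);    // unmask the trap
    std::printf("%g\n", 3.0 / zero); // now raises SIGFPE instead of printing
#endif
}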
Why would it?
That doesn't make sense mathematically; it's not as if 1/x is defined as ∞ in mathematics in general. Also, you would need at least two more cases: -1/x and 0/x can't also equal ∞.
See division by zero in general, and the section about computer arithmetic in particular.
Implementations which define __STDC_IEC_559__ are required to abide by the requirements given in Annex F, which in turn requires floating-point semantics consistent with IEC 60559. The Standard imposes no requirements on the behavior of floating-point division by zero on implementations which do not define __STDC_IEC_559__, but does for those which do define it. In cases where IEC 60559 specifies a behavior but the C Standard does not, a compiler which defines __STDC_IEC_559__ is required by the C Standard to behave as described in the IEC standard.
As defined by IEC 60559 (or the US standard IEEE-754), division of zero by zero yields NaN, division of a floating-point number by positive zero or literal constant zero yields an INF value with the same sign as the dividend, and division of a floating-point number by negative zero yields an INF with the opposite sign.
I've only got the C99 draft. In §7.12/4 it says:
The macro INFINITY expands to a constant expression of type float representing positive or unsigned infinity, if available; else to a positive constant of type float that overflows at translation time.
Note that INFINITY can be defined in terms of floating-point overflow, not necessarily divide-by-zero.
For the INFINITY macro: there is an explicit encoding for +/- infinity in the IEEE 754 standard: all exponent bits set and all fraction bits cleared (if a fraction bit is set, the value represents NaN).
With my compiler, (int) INFINITY == -2147483648, so an expression like int i = 1/0 would definitely produce wrong results if INFINITY were returned.
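A small sketch of that encoding, assuming a 32-bit IEC 60559 float (the type punning is done with memcpy to stay within defined behavior):
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

int main() {
    float inf = std::numeric_limits<float>::infinity();
    std::uint32_t bits;
    std::memcpy(&bits, &inf, sizeof bits);
    // 0x7F800000: sign 0, exponent all ones, fraction all zeros
    std::printf("0x%08X\n", bits);
}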
Bottom line, C99 (as per your quotes) does not say anything about INFINITY in the context of "implementation-defined". Secondly, what you quoted does not show inconsistent meaning of "undefined behavior".
[Quoting Wikipedia's Undefined Behavior page] "In C and C++, implementation-defined behavior is also used, where the language standard does not specify the behavior, but the implementation must choose a behavior and needs to document and observe the rules it chose."
More precisely, the standard means "implementation-defined" only where it uses those exact words with respect to the statement made, since "implementation-defined" is a specific term defined by the standard. The quote of C99 7.12p4 does not mention "implementation-defined".
[From C99 std (late draft)] "undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"
Note there is "no requirement" imposed for undefined behavior!
[C99 ..] "implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made"
[C99 ..] "unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance"
Documentation is a requirement for implementation-defined behavior.