std::numeric_limits::is_exact ... what is a usable definition? - c++

As I interpret it, MSDN's definition of numeric_limits::is_exact is almost always false:
[all] calculations done on [this] type are free of rounding errors.
And IBM's definition is almost always true: (Or a circular definition, depending on how you read it)
a type that has exact representations for all its values
What I'm certain of is that I could store a 2 in both a double and a long and they would both be represented exactly.
I could then divide them both by 10 and neither would hold the mathematical result exactly.
Given any numeric data type T, what is the correct way to define std::numeric_limits<T>::is_exact?
Edit:
I've posted what I think is an accurate answer to this question from details supplied in many answers. This answer is not a contender for the bounty.

The definition in the standard (see NPE's answer) isn't very exact, is it? Instead, it's circular and vague.
Given that the IEC floating point standard has a concept of "inexact" numbers (and an inexact exception when a computation yields an inexact number), I suspect that this is the origin of the name is_exact. Note that of the standard types, is_exact is false only for float, double, and long double.
The intent is to indicate whether the type exactly represents all of the numbers of the underlying mathematical type. For integral types, the underlying mathematical type is some finite subset of the integers. Since each integral types exactly represents each and every one of the members of the subset of the integers targeted by that type, is_exact is true for all of the integral types. For floating point types, the underlying mathematical type is some finite range subset of the real numbers. (An example of a finite range subset is "all real numbers between 0 and 1".) There's no way to represent even a finite range subset of the reals exactly; almost all are uncomputable. The IEC/IEEE format makes matters even worse. With that format, computers can't even represent a finite range subset of the rational numbers exactly (let alone a finite range subset of the computable numbers).
As noted above, the term is_exact presumably originates from the long-standing concept of "inexact" numbers in various floating point representation models. Perhaps a better name would have been is_complete.
Addendum
The numeric types defined by the language aren't the be-all and end-all of representations of "numbers". A fixed point representation is essentially the integers, so they too would be exact (no holes in the representation). Representing the rationals as a pair of standard integral types (e.g., int/int) would not be exact, but a class that represented the rationals as a Bignum pair would, at least theoretically, be "exact".
What about the reals? There's no way to represent the reals exactly because almost all of the reals are not computable. The best we could possibly do with computers is the computable numbers. That would require representing a number as some algorithm. While this might be useful theoretically, from a practical standpoint, it's not that useful at all.
Second Addendum
The place to start is with the standard. Both C++03 and C++11 define is_exact as being
True if the type uses an exact representation.
That is both vague and circular. It's meaningless. Not quite so meaningless is that integer types (char, short, int, long, etc.) are "exact" by fiat:
All integer types are exact, ...
What about other arithmetic types? The first thing to note is that the only other arithmetic types are the floating point types float, double, and long double (3.9.1/8):
There are three floating point types: float, double, and long double. ... The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types.
The meaning of the floating point types in C++ is markedly murky. Compare with Fortran:
A real datum is a processor approximation to the value of a real number.
Compare with ISO/IEC 10967-1, Language independent arithmetic (which the C++ standards reference in footnotes, but never as a normative reference):
A floating point type F shall be a finite subset of ℝ.
C++, on the other hand, is silent with regard to what the floating point types are supposed to represent. As far as I can tell, an implementation could get away with making float a synonym for int, double a synonym for long, and long double a synonym for long long.
Once more from the standards on is_exact:
... but not all exact types are integer. For example, rational and fixed-exponent representations are exact but not integer.
This seems not to apply to user-developed extensions, but note that specializing std::numeric_limits for a program-defined type is one of the few additions to namespace std that is explicitly allowed. So this final clause can pertain to such user specializations, or to implementations that
Define float, double, and long double in some peculiar way, or
Provide some non-standard rational or fixed point type as an arithmetic type and decide to provide a std::numeric_limits<non_standard_type> for these non-standard extensions.

I suggest that is_exact is true iff all literals of that type have their exact value. So is_exact is false for the floating types because the value of literal 0.1 is not exactly 0.1.
Per Christian Rau's comment, we can instead define is_exact to be true when the results of the four arithmetic operations between any two values of the type are either out of range or can be represented exactly, using the definitions of the operations for that type (i.e., truncating integer division, unsigned wraparound). With this definition you can cavil that floating-point operations are defined to produce the nearest representable value. Don't :-)

The problem of exactness is not restricted to C++, so let's look further.
Setting aside discussion about the wording of standards, inexactness has to apply to mathematical operations that require rounding to represent the result in the same type. For example, Scheme defines exactness/inexactness in exactly this way, by means of exact operations and exact literal constants; see R5RS §6, standard procedures, from http://www.schemers.org/Documents/Standards/R5RS/HTML
For the case of double x=0.1, we can either consider that 0.1 is a well-defined double literal, or, as in Scheme, that the literal is an inexact constant formed by an inexact compile-time operation (rounding to the nearest double the result of the operation 1/10, which is well defined in ℚ). Either way, we always end up at operations.
Let's concentrate on +; the others can be defined mathematically by means of + and group properties.
A possible definition of inexactness could then be:
If there exists any pair of values (a,b) of a type such that a+b-a-b != 0,
then this type is inexact (in the sense that + operation is inexact).
For every floating point representation we know of (setting aside the trivial cases of NaN and inf), such a pair obviously exists, so we can say that float (operations) are inexact.
For well defined unsigned arithmetic model, + is exact.
For signed int, we have the problem of UB in case of overflow, so there is no guarantee of exactness... unless we refine the rule to cope with this broken arithmetic model:
If there exists any pair (a,b) such that (a+b) is well defined
and a+b-a-b != 0,
then the + operation is inexact.
The above well-definedness could help us extend the rule to other operations as well, but it's not really necessary.
We would then have to consider the case of / as false polymorphism rather than inexactness
(/ being defined as the quotient of Euclidean division for int).
Of course, this is not an official rule; the validity of this answer is limited to the effort of rational thinking.

The definition given in the C++ standard seems fairly unambiguous:
static constexpr bool is_exact;
True if the type uses an exact representation. All integer types are exact, but not all exact types are
integer. For example, rational and fixed-exponent representations are exact but not integer.
Meaningful for all specializations.

In C++ the int type is used to represent a mathematical integer type (i.e. one of the set of {..., -1, 0, 1, ...}). Due to the practical limitation of implementation, the language defines the minimum range of values that should be held by that type, and all valid values in that range must be represented without ambiguity on all known architectures.
The standard also defines types that are used to hold floating point numbers, each with their own range of valid values. What you won't find is a list of valid floating point numbers. Again, due to practical limitations, the standard allows for approximations of these types.
Many people try to say that only numbers that can be represented by the IEEE floating point standard are exact values for those types, but that's not part of the standard. Though it is true that implementations of the language on binary computers have a standard for how double and float are represented, there is nothing in the language that says it has to be implemented on a binary computer. In other words, float isn't defined by the IEEE standard; the IEEE standard is just an acceptable implementation.
As such, if there were an implementation that could hold any value in the range of values that define double and float without rounding rules or estimation, you could say that is_exact is true for that platform.
Strictly speaking, T can't be your only argument to tell whether a type "is_exact", but we can infer some of the other arguments. Because you're probably using a binary computer with standard hardware and any publicly available C++ compiler, when you assign a double the value of .1 (which is in the acceptable range for the floating point types), that's not the number the computer will use in calculations with that variable. It uses the closest approximation as defined by the IEEE standard. Granted, if you compare a literal with itself your compiler should return true, because the IEEE standard is pretty explicit. We know that computers don't have infinite precision and therefore calculations that we expect to have a value of .1 won't necessarily end up with the same approximate representation that the literal value has. Enter the dreaded epsilon comparison.
To practically answer your question, I would say that for any type which requires an epsilon comparison to test for approximate equality, is_exact should return false. If strict comparison is sufficient for that type, it should return true.

std::numeric_limits<T>::is_exact should be false if and only if T's definition allows values that may be unstorable.
C++ considers any floating point literal to be a valid value for its type. And implementations are allowed to decide which values have exact stored representation.
So for every real number in the allowed range (such as 2.0 or 0.2), C++ always promises that the number is a valid double and never promises that the value can be stored exactly.
This means that two assumptions made in the question - while true for the ubiquitous IEEE floating point standard - are incorrect for the C++ definition:
I'm certain that I could store a 2 in a double exactly.
I could then divide [it] by 10 and [the double would not] hold the
mathematical result exactly.

Related

Is there a function that can convert every double to a unique uint64_t, maintaining precision and ORDER? (Why can't I find one?)

My understanding is that
Doubles in C++ are (at least conceptually) encoded as double-precision IEEE 754-encoded floating point numbers.
IEEE 754 says that such numbers can be represented with 64 bits.
So I should expect there exists a function f that can map every double to a unique uint64_t, and that the order should be maintained -- namely, for all doubles lhs and rhs, (lhs < rhs) == (f(lhs) < f(rhs)), except when lhs or rhs is NaN.
I haven't been able to find such a function in a library or StackOverflow answer, even though such a function is probably useful to avoid instantiating an extra template for doubles in sort algorithms where double is rare as a sort-key.
I know that simply dividing by EPSILON would not work because the precision actually decreases as the numbers get larger (and improves as numbers get very close to zero); I haven't quite worked out the exact details of that scaling, though.
Surely there exists such a function in principle.
Have I not found it because it cannot be written in standard C++? That it would be too slow? That it's not as useful to people as I think?
If the representation of an IEEE-754 64-bit float is viewed as an integer, nonnegative values have the same order as the corresponding floating-point values; negative values, being stored in sign-magnitude form, sort in reverse, so a small bit-level adjustment (flip all the bits of a negative pattern, flip just the sign bit of a nonnegative one) is needed to get a total order. Beyond that, the only adjustment involved is the mental one to see the pattern of bits as representing either a floating-point value or an integer value. In the CPU that’s easy: you have 64 bits of data stored in memory, and if you apply a floating-point operation to those bits you’re doing floating-point operations and if you apply integer operations to those bits you’re doing integer operations.
In C++ the type of the data determines the type of operations you can do. To apply floating-point operations to a 64-bit data object that object has to be a floating-point type. To apply integral operations it has to be an integral type.
To convert bit patterns from floating-point to integral:
#include <cstdint>   // std::int64_t
#include <cstring>   // std::memcpy

std::int64_t to_int(double d) {
    std::int64_t res;
    std::memcpy(&res, &d, sizeof res);   // copy the bit pattern; well-defined, unlike a pointer cast
    return res;
}
Converting in the other direction is left as an exercise for the reader.

Z3 - Floating point arithmetic API function Z3_mk_fpa_to_ubv

I am playing around with Z3-4.6.0 C++ for the first time. Sorry for the noob questions.
My question has 2 parts.
If I have a floating point number, and I use Z3_mk_fpa_to_ubv(...) function to create an unsigned bit-vector.
How much precision is lost?
If the precision is not lost, can I use this new unsigned bit-vector as a regular bit-vector and apply all operations defined for it for e.g., Z3_mk_bvadd(....)?
I know I can use Z3_mk_fpa_to_ieee_bv(....) for a graceful, IEEE-754-compliant conversion. Afterwards I can add, sub, etc. the bit-vectors.
Just being curious.
Thank you very much.
I'm afraid you're misinterpreting the role of these functions. A good reference to keep open while working with SMTLib floats is: http://smtlib.cs.uiowa.edu/papers/BTRW15.pdf
mk_fpa_to_ubv
This function corresponds to the FPToUInt function in the cited paper, where its definition is given. (The NaN choice in that definition is misleading: it should be read as "undefined.")
Note that the precision loss can be huge here, depending on what the FP value is and the bit-width of the vector. Imagine converting a double-precision floating point value to an 8-bit word: you're collapsing values in the range ±2.23×10^−308 to ±1.80×10^308 onto a mere 256 different values. This means a large number of conversions will simply lose almost all of their information.
You should think of this as "casting" in C-like languages:
unsigned char c;
double f;
c = (unsigned char) f;
This is the essence of conversion from double-precision to an unsigned byte, which will suffer major precision loss. Conversely, if you convert to a really large bit-vector (say one that has a thousand bits), then your conversion will still lose precision per the rounding mode, though you'll be able to cover all the integer values in the range precisely. So, it really depends on what BV-type you convert to and the rounding mode you choose.
mk_fpa_to_ieee_bv
This function has nothing to do with "preserving" the value. So asking "precision loss" here is irrelevant. What it does is that it gives you the underlying bit-vector representation of the floating-point value, per the IEEE-754 spec. The wikipedia article has a good discussion on this representation: https://en.wikipedia.org/wiki/Double-precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64
In particular, if you interpret the output of this function as a two's complement integer value, you'll get a completely irrelevant value that has nothing to do with the value of the floating-point number itself. (Also, this conversion is not unique since NaN has multiple corresponding bit-vector patterns.)
Summary
Long story short, conversions from floats to bit-vectors will suffer from precision loss not only due to losing the "fractional" part due to rounding, but also due to the limited range, unless you pick a very-large bit-vector size. The IEEE-754 representation conversion does not preserve value, and thus doing arithmetic on values converted via this function is more or less meaningless.

How are the sizes of non-IEEE754 floating-point types constrained?

How are the sizes of non IEEE754 floating-point types float, double, and long double constrained?
I know that each floating-point type must be able to represent all values from a smaller type, which implies sizeof(float) <= sizeof(double) <= sizeof(long double).
From what I can tell, the float.h/cfloat minimums require sizeof(float)*CHAR_BIT>=32, sizeof(double)*CHAR_BIT>=64, and sizeof(long double)*CHAR_BIT>=64.
Are there other constraints? If so, what are they, and do any imply a maximum on these sizes?
I think the question is about constraints on the representable values. There are only fairly basic constraints, which are not explicitly spelled out in the C++ standard but are spelled out in the C standard in section 5.2.4.2.2 ("Characteristics of floating types <float.h>"), paragraph 11 (I'm merely quoting the values I consider interesting in this context):
The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign:
FLT_DECIMAL_DIG 6
DBL_DECIMAL_DIG 10
LDBL_DECIMAL_DIG 10
FLT_MIN_10_EXP -37
DBL_MIN_10_EXP -37
LDBL_MIN_10_EXP -37
FLT_MAX_10_EXP +37
DBL_MAX_10_EXP +37
LDBL_MAX_10_EXP +37
FLT_MAX 1E+37
DBL_MAX 1E+37
LDBL_MAX 1E+37
FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9
This pretty much says that float is likely to be smaller than double, that double and long double can be the same thing, and that all of them can be fairly far off from the constraints of IEEE-754.
From N3337:
3.9.1.8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.
The C standard is also relevant here, so here is what it (N1570) has to say about floating point types:
6.2.5.10
There are three real floating types, designated as float, double, and long
double.42) The set of values of the type float is a subset of the set of values of the
type double; the set of values of the type double is a subset of the set of values of the type long double.
42) See ‘‘future language directions’’ (6.11.1).
6.11.1.1 Floating types
Future standardization may include additional floating-point types, including those with
greater range, precision, or both than long double.
So as far as I can tell, floating point is almost all implementation-defined. For good reason: floating point is implemented by the CPU. The standard cannot make any guarantees about how big or small the various floating point types will be; if it did, it might simply become incompatible with newer processors.
The float.h and cfloat headers are using their ability within the standard to define the implementation. The sizes you gave are not part of the standard.
So no, there are no other constraints.* And no, there are no implied maximum sizes.
* This isn't strictly true. There is a lot of other information defined in N1570 section 5.2.4.2.2, but nothing that restricts floating point values in the way you're asking.

Floating-point comparison of constant assignment

When comparing doubles for equality, we need to give a tolerance level, because floating-point computation might introduce errors. For example:
double x;
double y;
x = f();
y = g();
if (fabs(x - y) < epsilon) {
    // they are equal!
} else {
    // they are not!
}
However, if I simply assign a constant value, without any computation, do I still need to check the epsilon?
double x = 1;
double y = 1;
if (x == y) {
    // they are equal!
} else {
    // no they are not!
}
Is == comparison good enough? Or I need to do fabs(x-y)<epsilon again? Is it possible to introduce error in assigning? Am I too paranoid?
How about casting (double x = static_cast<double>(100))? Is that gonna introduce floating-point error as well?
I am using C++ on Linux, but if it differs by language, I would like to understand that as well.
Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get the value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now I'd be very worried about an implementation that chose a different close match based on the same input token but it is possible.
(a) Actually, that paragraph can be read in two ways, either the implementation is free to choose either the closest higher or closest lower value regardless of which is actually the closest, or it must choose the closest to the desired value.
If the latter, it doesn't change this answer, however, since all you have to do is hardcode a floating point literal exactly at the midpoint of two representable values and the implementation is once again free to choose either.
For example, it might alternate between the next higher and next lower for the same reason banker's rounding is applied - to reduce the cumulative errors.
No if you assign literals they should be the same :)
Also if you start with the same value and do the same operations, they should be the same.
Floating point values are non-exact, but the operations should produce consistent results :)
Both cases are ultimately subject to implementation defined representations.
Storage of floating point values and their representations take on many forms - load by address or constant? optimized out by fast math? what is the register width? is it stored in an SSE register? Many variations exist.
If you need precise behavior and portability, do not rely on this implementation defined behavior.
IEEE-754, which is a standard common implementations of floating point numbers abide to, requires floating-point operations to produce a result that is the nearest representable value to an infinitely-precise result. Thus the only imprecision that you will face is rounding after each operation you perform, as well as propagation of rounding errors from the operations performed earlier in the chain. Floats are not per se inexact. And by the way, epsilon can and should be computed, you can consult any numerics book on that.
Floating point numbers can represent integers precisely up to the length of their mantissa. So, for example, if you cast from a 32-bit int to a double it will always be exact, but casting into a float will no longer be exact for very large integers.
There is one major example of extensive use of floating point numbers as a substitute for integers: the Lua scripting language, which has no built-in integer type and uses floating-point numbers extensively for logic, flow control, etc. The performance and storage penalty of using floating-point numbers turns out to be smaller than the penalty of resolving multiple types at run time, and it makes the implementation lighter. Lua has been used extensively not only on PCs, but also on game consoles.
Now, many compilers have an optional switch that disables IEEE-754 compatibility. Then compromises are made. Denormalized numbers (very very small numbers where the exponent has reached smallest possible value) are often treated as zero, and approximations in implementation of power, logarithm, sqrt, and 1/(x^2) can be made, but addition/subtraction, comparison and multiplication should retain their properties for numbers which can be exactly represented.
The easy answer: For constants == is ok.
There are two exceptions which you should be aware of:
First exception:
0.0 == -0.0
There is a negative zero which compares equal under the IEEE 754 standard. This means
1/INFINITY == 1/-INFINITY which breaks f(x) == f(y) => x == y
Second exception:
NaN != NaN
This is a special property of NaN (Not a Number) which allows you to find out whether a number is a NaN on systems that do not have a test function available (yes, that happens).

What happens in C++ when an integer type is cast to a floating point type or vice-versa?

Do the underlying bits just get "reinterpreted" as a floating point value? Or is there a run-time conversion to produce the nearest floating point value?
Is endianness a factor on any platforms (i.e., endianness of floats differs from ints)?
How do different width types behave (e.g., int to float vs. int to double)?
What does the language standard guarantee about the safety of such casts/conversions? By cast, I mean a static_cast or C-style cast.
What about the inverse float to int conversion (or double to int)? If a float holds a small magnitude value (e.g., 2), does the bit pattern have the same meaning when interpreted as an int?
Do the underlying bits just get "reinterpreted" as a floating point value?
No, the value is converted according to the rules in the standard.
is there a run-time conversion to produce the nearest floating point value?
Yes there's a run-time conversion.
For floating point -> integer, the value is truncated, provided that the truncated value is in range of the integer type. If it is not, behaviour is undefined. Note that it is the truncated value, not the source value, that matters: casting CHAR_MAX + 0.5 to char is well-defined (it truncates to CHAR_MAX), whereas casting CHAR_MAX + 1.0 is undefined.
For integer -> floating point, the result is the exact same value if possible, or else is one of the two floating point values either side of the integer value. Not necessarily the nearer of the two.
Is endianness a factor on any platforms (i.e., endianness of floats differs from ints)?
No, never. The conversions are defined in terms of values, not storage representations.
How do different width types behave (e.g., int to float vs. int to double)?
All that matters is the ranges and precisions of the types. Assuming 32 bit ints and IEEE 32 bit floats, it's possible for an int->float conversion to be imprecise. Assuming also 64 bit IEEE doubles, it is not possible for an int->double conversion to be imprecise, because all int values can be exactly represented as a double.
What does the language standard guarantee about the safety of such casts/conversions? By cast, I mean a static_cast or C-style cast.
As indicated above, it's safe except in the case where a floating point value is converted to an integer type, and the value is outside the range of the destination type.
If a float holds a small magnitude value (e.g., 2), does the bit pattern have the same meaning when interpreted as an int?
No, it does not. The IEEE 32 bit representation of 2 is 0x40000000.
For reference, this is what ISO-IEC 14882-2003 says
4.9 Floating-integral conversions
An rvalue of a floating point type can be converted to an rvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [Note: If the destination type is bool, see 4.12. ]
An rvalue of an integer type or of an enumeration type can be converted to an rvalue of a floating point type. The result is exact if possible. Otherwise, it is an implementation-defined choice of either the next lower or higher representable value. [Note: loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. ] If the source type is bool, the value false is converted to zero and the value true is converted to one.
Reference: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Other highly valuable references on the subject of fast float to int conversions:
Know your FPU
Let's Go To The (Floating) Point
Know your FPU: Fixing Floating Fast
Have a good read!
There are normally run-time conversions, as the bit representations are not generally compatible (with the exception that binary 0 is normally both 0 and 0.0). The C and C++ standards deal only with value, not representation, and specify generally reasonable behavior. Remember that a large int value will not normally be exactly representable in a float, and a large float value cannot be represented by an int.
Therefore:
All conversions are by value, not bit patterns. Don't worry about the bit patterns.
Don't worry about endianness, either, since that's a matter of bitwise representation.
Converting int to float can lose precision if the integer value is large in absolute value; it is less likely to with double, since double is more precise, and can represent many more exact numbers. (The details depend on what representations the system is actually using.)
The language definitions say nothing about bit patterns.
Converting from float to int is also a matter of values, not bit patterns. An exact floating-point 2.0 will convert to an integral 2 because that's how the implementation is set up, not because of bit patterns.
When you convert an integer to a float, you are not liable to lose any precision unless you are dealing with extremely large integers.
When you convert a float to an int you are essentially performing a truncation toward zero (like trunc(), not floor()): the bits after the decimal point are simply dropped, which matters for negative values.
For more information on floating point read: http://www.lahey.com/float.htm
The IEEE single-precision format has 24 bits of mantissa, 8 bits of exponent, and a sign bit. The internal floating-point registers in Intel microprocessors such as the Pentium have 64 bits of mantissa, 15 bits of exponent and a sign bit. This allows intermediate calculations to be performed with much less loss of precision than many other implementations. The down side of this is that, depending upon how intermediate values are kept in registers, calculations that look the same can give different results.
So if your integer needs more than 24 significant bits (23 stored bits plus the hidden leading bit), you are likely to lose some precision in the conversion to single precision.
Reinterpreted? The term "reinterpretation" usually refers to raw memory reinterpretation. It is, of course, impossible to meaningfully reinterpret an integer value as a floating-point value (and vice versa) since their physical representations are generally completely different.
When you cast the types, a run-time conversion is being performed (as opposed to reinterpretation). The conversion is normally not just conceptual, it requires an actual run-time effort, because of the difference in physical representation. There are no language-defined relationships between the bit patterns of source and target values. Endianness plays no role in it either.
When you convert an integer value to a floating-point type, the original value is converted exactly if it can be represented exactly by the target type. Otherwise, the value will be changed by the conversion process.
When you convert a floating-point value to integer type, the fractional part is simply discarded (i.e. not the nearest value is taken, but the number is rounded towards zero). If the result does not fit into the target integer type, the behavior is undefined.
Note also, that floating-point to integer conversions (and the reverse) are standard conversions and formally require no explicit cast whatsoever. People might sometimes use an explicit cast to suppress compiler warnings.
If you cast the value itself, it will get converted (so in a float -> int conversion 3.14 becomes 3).
But if you cast the pointer, then you will actually 'reinterpret' the underlying bits. So if you do something like this:
double d = 3.14;
int x = *reinterpret_cast<int *>(&d);    // formally undefined: violates strict aliasing
x will have a 'random' value that is based on the representation of the floating point number (and only part of it, since int is typically half the size of double). Note that this is formally undefined behavior; std::memcpy into an integer of matching size is the well-defined way to inspect the bits.
Converting FP to integral type is nontrivial and not even completely defined.
Typically your FPU implements a hardware instruction to convert from IEEE format to int. That instruction might take parameters (implemented in hardware) controlling rounding; note that while the ABI's default rounding mode for arithmetic is usually round-to-nearest-even, the language mandates truncation for this conversion. If you're on x86 with SSE, it's "probably not too slow," since SSE provides a truncating convert instruction.
As with anything FP, there are corner cases. It would be nice if infinity were mapped to the type's maximum value (e.g. INT_MAX), but that is alas not typically the case: the result of int x = INFINITY; is undefined.