Is it possible to specify a hexadecimal number in LLVM IR code? - llvm

For example:
error: floating point constant invalid for type
%3 = and i8 0x80, %2
^

From a scan of the IR reference manual, it looks like hexadecimal literals are reserved for representing floating-point numbers that cannot be written exactly in a reasonable number of decimal digits.
That explains why your error message complains about an invalid floating point constant: the parser treats 0x80 as a floating-point literal. Integer constants in LLVM IR are written in decimal, so the equivalent of your instruction would be %3 = and i8 -128, %2 (-128 being the value of the byte 0x80 interpreted as a signed i8).

Related

Are all 32 bit floats exactly represented as f64?

Can all IEEE 754 32 bit floating point numbers be represented exactly by a 64 bit floating point number? Stated another way, is a cast from f32 to f64 ever rounded?
Can all IEEE 754 32 bit floating point numbers be represented exactly by a 64 bit floating point number?
Yes. All numeric values of binary32 are in binary64.
Stated another way, is a cast from f32 to f64 ever rounded?
Not usually. Various languages like C allow intermediate 32-bit FP calculations to employ wider math, so a cast may narrow (round) results. Yet if the value was truly an f32, no rounding error would occur going to f64.
Aside:
The Not-a-number payload of a binary32 is 23 bits, and that is fully encodable in a binary64, yet the detailed meaning of those payload bits is implementation-dependent.

Why hexadecimal floating constants in C++17?

C++17 adds hexadecimal floating constants (floating-point literals). Why? How about a couple of examples showing the benefits?
Floating point numbers are stored in x86/x64 processors in base 2, not base 10: https://en.wikipedia.org/wiki/Double-precision_floating-point_format . Because of that, many decimal floating point numbers cannot be represented exactly; e.g. decimal 0.1 could be stored as something like 0.1000000000000003 or 0.0999999999999997 - whichever has a base-2 representation close enough to decimal 0.1. Because of that inexactness, printing a floating-point number in decimal and then parsing it back may produce a slightly different number than the one that was stored in memory before printing.
For some applications such errors are unacceptable: they want to parse back exactly the same binary floating-point number as the one that existed before printing (e.g. one application exports floating-point data and another imports it). For that, one can export and import doubles in hexadecimal format. Because 16 is a power of 2, binary floating-point numbers can be represented exactly in hexadecimal.
printf and scanf have been extended with the %a format specifier, which allows printing and parsing hexadecimal floating point numbers. MSVC++, however, does not support the %a specifier for scanf yet:
The a and A specifiers (see printf Type Field Characters) are not available with scanf.
To print a double in full precision in hexadecimal format, specify 13 hexadecimal digits after the point, corresponding to 13*4 = 52 bits:
double x = 0.1;
printf("%.13a\n", x);
See more details on hexadecimal floating point with code and examples (note that at least for MSVC++ 2013, a plain %a in printf prints 6 hexadecimal digits after the point, not 13 - this is stated at the end of the article).
Specifically for constants, as asked in the question, hexadecimal constants are convenient for testing an application on exact hard-coded floating-point inputs. E.g. your bug may be reproducible for 0.1000000000000003 but not for 0.0999999999999997, so you need a hard-coded hexadecimal value to pin down which representation of decimal 0.1 you mean.
The two main reasons to use hex floats over decimals are accuracy and speed.
The algorithms for accurately converting between decimal constants and the underlying binary format of floating point numbers are surprisingly complicated, and even nowadays conversion errors still occasionally arise.
Converting between hexadecimal and binary is a much simpler endeavour, and guaranteed to be exact. An example use case is when it is critical that you use a specific floating point number, and not one either side (e.g. for implementations of special functions such as exp). This simplicity also makes the conversion much faster (it doesn't require any intermediate "bignum" arithmetic): in some cases I've seen 3x speed up for read/write operations for hex float vs decimals.

C++ Type of variables - value

I am a beginner, but I think there are some important things I should learn as soon as possible.
So I have this code:
float fl=8.28888888888888888888883E-5;
cout<<"The value = "<<fl<<endl;
But the .exe file, after running, shows:
8.2888887845911086e-005
I expected the number to be cut off at the limit of the type and the rest to be zero, but instead I see digits that look random. Maybe it prints digits from memory after the variable?
Could you explain to me how this works?
I expected the number to be cut off at the limit of the type and the rest to be zero
Yes, this is exactly what happens, but it happens in binary. This program will show it by using the hexadecimal printing format %a:
#include <stdio.h>

int main(void) {
    float fl = 8.28888888888888888888883E-5;
    printf("%a\n%a\n", 8.28888888888888888888883E-5, fl);
    return 0;
}
It shows:
0x1.5ba94449649e2p-14
0x1.5ba944p-14
In these results, 0x1.5ba94449649e2p-14 is the hexadecimal representation of the double closest to 8.28888888888888888888883E-5, and 0x1.5ba944p-14 is the representation of that number converted to float. As you can see, the conversion simply truncated the last digits (in this case; the conversion is done according to the rounding mode, and when the rounding goes up instead of down, it changes one or more of the last digits).
When you then look at the value in decimal, the fact that float and double are binary floating-point formats on your computer means that the decimal representation contains extra digits.
I expected the number to be cut off at the limit of the type and the rest to be zero
That is what happens internally. Excess bits beyond what the type can store are lost.
But that's in the binary representation. When you convert it to decimal, you can get trailing non-zero digits.
Example:
0b0.00100 is 0.125 in decimal
What you're seeing results from the fact that most decimal fractions cannot be represented exactly in binary floating point. Because of this, a float is stored as the nearest value that can be represented in memory. A float usually has 24 bits for the significand, which translates to about 6 reliable decimal digits (this is implementation-defined, though, so you shouldn't rely on it). When you print more than 6 decimal digits, you start seeing the digits of the nearest representable value rather than of the number you wrote - they look random, but they are not garbage from memory.
So to recap, the problem you encountered is caused by the fact that base-10 decimal numbers cannot in general be represented exactly in binary; instead the closest representable number is stored, and that number is then used.
Each floating-point type has a limited precision; digits printed beyond that precision are meaningless. You have to know these limits and take them into account when you write code. You can find the limits of each type in your implementation's documentation or via std::numeric_limits.

Weird bug with floats in if-statement

So in my C++ code I have the following line of code for debugging purposes:
if(float1 != float2)
{
std::cout<<float1<<" "<<float2<<std::endl;
}
What's happening is that the program enters the if-statement... but when I print the two float values, they look the same. But if they were the same, it should skip the if-statement entirely. So I'm really confused about why this is happening.
The floats may just have very similar values. By default, the I/O library prints floating point values with only six significant digits. You can see the full difference by raising the precision with the precision member function of std::cout:
if(float1 != float2)
{
std::cout.precision(9);
std::cout<<float1<<" "<<float2<<std::endl;
}
Now you should see the difference. The value 9 is the number of base-10 digits needed to uniquely represent any IEEE 754 32-bit float (see @EricPostpischil's comment below).
Floating-point values are typically stored in computer memory in binary format, while values you print through cout are represented in decimal format. The conversion from binary floating-point representation to decimal representation can be lossy, depending on your conversion settings. This immediately means that what you print is not necessarily exactly the same as what is actually stored in memory, and it explains why the direct comparison between float1 and float2 might say they are different while the decimal printout looks identical.

What happens in C++ when an integer type is cast to a floating point type or vice-versa?

Do the underlying bits just get "reinterpreted" as a floating point value? Or is there a run-time conversion to produce the nearest floating point value?
Is endianness a factor on any platforms (i.e., endianness of floats differs from ints)?
How do different width types behave (e.g., int to float vs. int to double)?
What does the language standard guarantee about the safety of such casts/conversions? By cast, I mean a static_cast or C-style cast.
What about the inverse float to int conversion (or double to int)? If a float holds a small magnitude value (e.g., 2), does the bit pattern have the same meaning when interpreted as an int?
Do the underlying bits just get "reinterpreted" as a floating point value?
No, the value is converted according to the rules in the standard.
is there a run-time conversion to produce the nearest floating point value?
Yes there's a run-time conversion.
For floating point -> integer, the value is truncated, and behaviour is undefined if the truncated value cannot be represented in the destination type. Note that it is the truncated value, not the source value, that matters: converting CHAR_MAX + 0.5 to char is well defined (it truncates to CHAR_MAX), but converting CHAR_MAX + 1.5 to char is undefined.
For integer -> floating point, the result is the exact same value if possible, or else is one of the two floating point values either side of the integer value. Not necessarily the nearer of the two.
Is endianness a factor on any platforms (i.e., endianness of floats differs from ints)?
No, never. The conversions are defined in terms of values, not storage representations.
How do different width types behave (e.g., int to float vs. int to double)?
All that matters is the ranges and precisions of the types. Assuming 32 bit ints and IEEE 32 bit floats, it's possible for an int->float conversion to be imprecise. Assuming also 64 bit IEEE doubles, it is not possible for an int->double conversion to be imprecise, because all int values can be exactly represented as a double.
What does the language standard guarantee about the safety of such casts/conversions? By cast, I mean a static_cast or C-style cast.
As indicated above, it's safe except in the case where a floating point value is converted to an integer type, and the value is outside the range of the destination type.
If a float holds a small magnitude value (e.g., 2), does the bit pattern have the same meaning when interpreted as an int?
No, it does not. The IEEE 32 bit representation of 2 is 0x40000000.
For reference, this is what ISO-IEC 14882-2003 says
4.9 Floating-integral conversions
An rvalue of a floating point type can be converted to an rvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [Note:If the destination type is `bool, see 4.12. ]
An rvalue of an integer type or of an enumeration type can be converted to an rvalue of a floating point type. The result is exact if possible. Otherwise, it is an implementation-defined choice of either the next lower or higher representable value. [Note:loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. ] If the source type is bool, the value falseis converted to zero and the value true is converted to one.
Reference: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Other highly valuable references on the subject of fast float to int conversions:
Know your FPU
Let's Go To The (Floating) Point
Know your FPU: Fixing Floating Fast
Have a good read!
There are normally run-time conversions, as the bit representations are not generally compatible (with the exception that binary 0 is normally both 0 and 0.0). The C and C++ standards deal only with value, not representation, and specify generally reasonable behavior. Remember that a large int value will not normally be exactly representable in a float, and a large float value cannot be represented by an int.
Therefore:
All conversions are by value, not bit patterns. Don't worry about the bit patterns.
Don't worry about endianness, either, since that's a matter of bitwise representation.
Converting int to float can lose precision if the integer value is large in absolute value; it is less likely to with double, since double is more precise, and can represent many more exact numbers. (The details depend on what representations the system is actually using.)
The language definitions say nothing about bit patterns.
Converting from float to int is also a matter of values, not bit patterns. An exact floating-point 2.0 will convert to an integral 2 because that's how the implementation is set up, not because of bit patterns.
When you convert an integer to a float, you are not liable to lose any precision unless you are dealing with extremely large integers.
When you convert a float to an int, the conversion truncates toward zero - it simply drops the fractional part. (For negative values this is not the same as floor(): floor(-3.7) is -4, but (int)-3.7 is -3.)
For more information on floating point read: http://www.lahey.com/float.htm
The IEEE single-precision format has 24 bits of mantissa, 8 bits of exponent, and a sign bit. The internal floating-point registers in Intel microprocessors such as the Pentium have 64 bits of mantissa, 15 bits of exponent and a sign bit. This allows intermediate calculations to be performed with much less loss of precision than many other implementations. The down side of this is that, depending upon how intermediate values are kept in registers, calculations that look the same can give different results.
So if your integer needs more than 24 significant bits, you are likely to lose some precision in the conversion.
Reinterpreted? The term "reinterpretation" usually refers to raw memory reinterpretation. It is, of course, impossible to meaningfully reinterpret an integer value as a floating-point value (and vice versa) since their physical representations are generally completely different.
When you cast the types, a run-time conversion is being performed (as opposed to reinterpretation). The conversion is normally not just conceptual, it requires an actual run-time effort, because of the difference in physical representation. There are no language-defined relationships between the bit patterns of source and target values. Endianness plays no role in it either.
When you convert an integer value to a floating-point type, the original value is converted exactly if it can be represented exactly by the target type. Otherwise, the value will be changed by the conversion process.
When you convert a floating-point value to integer type, the fractional part is simply discarded (i.e. not the nearest value is taken, but the number is rounded towards zero). If the result does not fit into the target integer type, the behavior is undefined.
Note also, that floating-point to integer conversions (and the reverse) are standard conversions and formally require no explicit cast whatsoever. People might sometimes use an explicit cast to suppress compiler warnings.
If you cast the value itself, it will get converted (so in a float -> int conversion 3.14 becomes 3).
But if you cast the pointer, then you will actually 'reinterpret' the underlying bits. So if you do something like this:
double d = 3.14;
int x = *reinterpret_cast<int *>(&d);
x will have a 'random' value that is based on the representation of the floating point number. (Note that this violates the strict-aliasing rule and reads only half of the double's bytes; std::memcpy into an integer of matching size is the well-defined way to inspect the representation.)
Converting FP to integral type is nontrivial and not even completely defined.
Typically your FPU implements a hardware instruction to convert from IEEE format to int. That instruction might take parameters (implemented in hardware) controlling rounding. Your ABI probably specifies round-to-nearest-even. If you're on X86+SSE, it's "probably not too slow," but I can't find a reference with one Google search.
As with anything FP, there are corner cases. It would be nice if infinity were mapped to the destination type's maximum value (e.g. INT_MAX), but that is alas not typically the case - the result of int x = INFINITY; is undefined.