Lets say I have an integer:
unsigned long long int data = 4599331010119547059;
Now I want to convert this data to a double. I basically want to change the type, but keep the bits exactly as they were. For the given example, the float value is 0.31415926536.
How can I do that in C++? I saw some methods using Union but many advised against using this approach.
Since C++20, you can use std::bit_cast:
std::bit_cast<double>(data)
Prior to C++20, you can use std::memcpy:
double d;
static_assert(sizeof d == sizeof data);
std::memcpy(&d, &data, sizeof d);
Note that result will vary depending on floating point representation (IEEE-754 is ubiquitous though) as well as whether floating point and integer types have the same endianness.
Taking the question on its face value (assuming you have a valid reason to do this!) this is the only proper way of doing this in current C++ standard:
int i = get_int();
float x;
static_assert(sizeof(float) == sizeof(int), "!!!");
memcpy(&x, &i, sizeof(x));
You can use reinterpret_cast:
float f = reinterpret_cast<float&>(data);
For your value, I don't get 0.314... but that's how you could do it.
Related
I was investigating the structure of floating-point numbers, and I've found that most of compilers use IEEE 754 standard to store floating point numbers.
And when I tried to do:
float a=0x3f520000; //have to be equal to 0.8203125 in IEEE 754
printf("value of 'a' is: %X [%x] %f\n",(int)a,(int)a, a);
it produces the result:
value of 'a' is: 3F520000 [3f520000] 1062338560.000000
but if I try:
int b=0x3f520000;
float* c = (float*)&b;
printf("value of 'c' is: %X [%x] %f\r\n", *(int*)c, *(int*)c, c[0]);
it gives:
value of 'c' is: 3F520000 [3f520000] 0.820313
The second try gave me the right answer. What is it wrong with the first try? And why does the result differ from that when I cast int to float via pointer?
The difference is that the first converts the value (0x3f520000 is the integer 1062338560), and is equivalent to this:
float a = 1062338560;
printf("value of 'a' is: %X [%x] %f\n",(int)a,(int)a, a);
The second reinterprets the representation of the int - 111111010100100000000000000000 - as being the representation of a float instead.
(It's also undefined, so you shouldn't expect it to do anything in particular.)
[Note: This answer assumes C, not C++, which have different rules]
With
float a=0x3f520000;
you take the integer value 1062338560 and the compiler will convert it to 1062338560.0f.
If you want hexadecimal floating point constant you must use exponent-format using the letter p. As in 0x1.a40010c6f7a0bp-1 (which is the hexadecimal notation for 0.820313).
What happens with
int b=0x3f520000;
float* c = (float*)&b;
is that you break strict aliasing and tell the compiler that c is pointing to a floating-point value (the strict aliasing break is because b isn't a floating point value). The compiler will then reinterpret the bits in *c as a float value.
0x3f520000 is an integer constant. When assigned to a float, the integer is converted.
Some more proper example of how to convert in the second case:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
int main() {
uint32_t fi = 0x3f520000;
float f;
memcpy(&f, &fi, sizeof(f));
printf("%.7g\n", f);
}
it prints:
0.8203125
so that is what you expected.
The approach I used is memcpy that is the safest for all compilers and best choice for modern compilers (GCC since approx. 4.6, Clang since 3.x) that interpret memcpy as "bit cast" in such case and optimize it in a efficient and safe way (at least in "hosted" mode). That's still safe for older compilers, but not nessessarily efficient in the same way; some can prefer cast through union or ever through different pointer type. On dangers of that ways, see here or generally search "type punning and strict aliasing".
(Also, there could be some weird platforms that suffer from endianness issue that integer endianness differs from float one; ones that have byte different than 8 bits, and so on. I don't consider them here.)
UPDATE: I was starting answering to initial version of the question. Yep, bit casting and value conversion will give principally different results. That's how float numbers work.
Is there a way i could initialize a float type variable with hexadecimal number? what i want to do is say i have single precision representation for 4 which is 0x40800000 in hex. I want to do something like float a = 0x40800000 in which case it takes the hex value as integer. What can i do to make it treat as floating point number?
One option is to use type punning via a union. This is defined behaviour in C since C99 (previously this was implementation defined).
union {
float f;
uint32_t u;
} un;
un.u = 0x40800000;
float a = un.f;
As you tagged this C++, you could also use reinterpret_cast.
uint32_t u = 0x40800000;
float a = *reinterpret_cast<float*>(&u);
Before doing either of these, you should also confirm that they're the same size:
assert(sizeof(float) == sizeof(uint32_t));
You can do this if you introduce a temporary integer type variable, cast it to a floating point type and dereference it. You must be careful about the sizes of the types involved, and know that they may change. With my compiler, for example, this works:
unsigned i = 0x40800000;
float a = *(float*)&i;
printf("%f\n", a);
// output 4.00000
I'm not sure how you're getting your the value "0x40800000".
If that's coming in as an int you can just do:
const auto foo = 0x40800000;
auto a = *(float*)&foo;
If that's coming in as a string you can do:
float a;
sscanf("0x40800000", "0x%x", &a);
I am trying to create a float from a hexadecimal representation I got from here. For the representation of 32.50002, the site shows the IEEE 754 hexadecimal representation as 0x42020005.
In my code, I have this: float f = 0x42020005;. However, when I print the value, I get 1.10E+9 instead of 32.50002. Why is this?
I am using Microsoft Visual C++ 2010.
When you assign a value to a float variable via =, you don’t assign its internal representation, you assign its value. 0x42020005 in decimal is 1107427333, and that’s the value you are assigning.
The underlying representation of a float cannot be retrieved in a platform independent way. However, making some assumptions (namely, that the float is in fact using IEEE 754 format), we can trick a bit:
float f;
uint32_t rep = 0x42020005;
std::memcpy(&f, &rep, sizeof f);
Will give the desired result.
0x42020005 actually is int value of 1107427333.
You can try out this code. Should work... Use union:
union IntFloat {
uint32_t i;
float f;
};
and call it when you need to convert the value.
union IntFloat val;
val.i = 0x42020005;
printf("%f\n", val.f);
0x42020005 is an int with value of 1107427333.
float f = 0x42020005; is equal with
float f = 1107427333;
I have an unsigned long long (or uint64_t) value and want to convert it to a double. The double shall have the same bit pattern as the long value. This way I can set the bits of the double "by hand".
unsigned long long bits = 1ULL;
double result = /* some magic here */ bits;
I am looking for a way to do this.
The portable way to do this is with memcpy (you may also be able to conditionally do it with reinterpret_cast or a union, but those aren't certain to be portable because they violate the letter of the strict-alias rules):
// First, static assert that the sizes are the same
memcpy(&result, &bits, sizeof(bits));
But before you do make sure you know exactly what you're doing and what floating point representation is being used (although IEEE754 is a popular/common choice). You'll want to avoid all kinds of problem values like infinity, NaN, and denormal numbers.
Beware of union and reinterpret_cast<double*>(&bits), for both of these methods are UB. Pretty much all you can do is memcpy.
since C++ 20 we have std::bit_cast() to to such conversions
example:
double d = 1.5;
uint64_t i = std::bit_cast<uint64_t>(d); //use the same bits in an integer
double dd = std::bit_cast<double>(i); //back to floating point again
The following uses a void pointer.
unsigned long long bits = 1ULL;
void* tempPtr=(void*)&bits;
double result = *(double*)tempPtr;
I have a double number, I want to represent it in IEEE 754 64-bit binary string.
Currently i'm using a code like this:
double noToConvert;
unsigned long* valueRef = reinterpret_cast<unsigned long*>(&noToConvert);
bitset<64> lessSignificative(*valueRef);
bitset<64> mostSignificative(*(++valueRef));
mostSignificative <<= 32;
mostSignificative |= lessSignificative;
RowVectorXd binArray = RowVectorXd::Zero(mostSignificative.size());
for(unsigned int i = 0; i <mostSignificative.size();i++)
{
(mostSignificative[i] == 0) ? (binArray(i) = 0) : (binArray(i) = 1);
}
The above code just works fine without any problem. But If you see, i'm using reinterpret_cast and using unsigned long. So, this code is very much compiler dependent. Could anyone show me how to write a code that is platform independent and without using any libraries. i'm ok, if we use the standard libraries and even bitset, but i dont want to use any machine or compiler dependent code.
Thanks in advance.
If you're willing to assume that double is the IEEE-754 double type:
#include <cstdint>
#include <cstring>
uint64_t getRepresentation(const double number) {
uint64_t representation;
memcpy(&representation, &number, sizeof representation);
}
If you don't even want to make that assumption:
#include <cstring>
char *getRepresentation(const double number) {
char *representation = new char[sizeof number];
memcpy(representation, &number, sizeof number);
return representation;
}
Why not use the union?
bitset<64> binarize(unsigned long* input){
union binarizeUnion
{
unsigned long* intVal;
bitset<64> bits;
} binTransfer;
binTransfer.intVal=input;
return (binTransfer.bits);
}
The simplest way to get this is to memcpy the double into an array of char:
char double_as_char[sizeof(double)];
memcpy(double_as_char, &noToConvert, sizeof(double_as_char));
and then extract the bits from double_as_char. C and C++ define that in the standard as legal.
Now, if you want to actually extract the various components of a double, you can use the following:
sign= noToConvert<=-0.0f;
int exponent;
double normalized_mantissa= frexp(noToConvert, &exponent);
unsigned long long mantissa= normalized_mantissa * (1ull << 53);
Since the value returned by frexp is in [0.5, 1), you need to shift it one extra bit to get all the bits in the mantissa as an integer. Then you just need to map that into the binary represenation you want, although you'll have to adjust the exponent to include the implicit bias as well.
The function print_raw_double_binary() in my article Displaying the Raw Fields of a Floating-Point Number should be close to what you want. You'd probably want to replace the casting of double to int with a union, since the former violates "strict aliasing" (although even use of a union to access something different than what is stored is technically illegal).