float f = 12.5;
unsigned int _f = *reinterpret_cast<int*>(&f);
std::cout << _f << std::endl; // result: 1095237632
Can some explain me how such casting works? And what is represented by _f?
EDIT
So this number I got 1095237632 after converting to binary is 0b1000001010010000000000000000000 and this binary number is 12.5 in IEEE-754. Do I get it right?
Let's look at two functions. One of them casts float to int it regularly, and one of them reinterpret casts it using reinterpret_cast:
int cast_to_int(float f) {
// Rounds f to an int, rounding towards 0
return (int)f;
}
int reinterpret_cast_to_int(float i) {
// Just copies the bits
return *reinterpret_cast<int*>(&i);
}
So what actually happens? Let's look at the assembly:
cast_to_int(float):
cvttss2si eax, xmm0 // We cast to an int
ret
reinterpret_cast_to_int(float):
movd eax, xmm0 // We directly copy the bits
ret
In the first case, there's an assembly instruction that performs the conversion:
cast_to_int(0.7) -> 0
cast_to_int(1.0) -> 1
cast_to_int(1.5) -> 1
cast_to_int(2.1) -> 2
In the second case, reinterpret_cast just directly transforms the underlying representation of the bits. It's not actually doing anything, and the function just copyies the input register to the output register.
Under the hood, floats have a very different bit representation than ints, and that's why you're getting weird numbers.
Nobody can explain (*), because it does not work.
From cppreference:
Unlike static_cast, but like const_cast, the reinterpret_cast
expression does not compile to any CPU instructions (except when
converting between integers and pointers or on obscure architectures
where pointer representation depends on its type). It is purely a
compile-time directive which instructs the compiler to treat
expression as if it had the type new_type.
Only the following conversions can be done with reinterpret_cast,
except when such conversions would cast away constness or volatility.
And then follows a list of rules covering what reinterpret casts are allowed. Casting type A to a completely unrelated type B is not among them and your code exhibits undefined behaviour.
(*) Strictly speaking not true. You are treating a float as an int and if you look at their representation on your hardware and if you inspect the output of your compiler you can work out why you get the value you get, though undefined behaviour is undefined and its not worth entering details unless you are willing to write nonportable code.
As the comments below say, the usage of reinterpret_cast ist not safe for unrelated types. So don't use it this way.
First, assigning int to uint should be avoided.
For x86/x64 systemns, float is usually represented as 4 bytes, like uint32_t, and they have mostely the same alignment.
Many compilers allow the following:
uint32_t _f = *reinterpret_cast(&f);
But this leads to undefined behavior due to some reasons (thanks for the comments):
- optimizations
- alignment
- ...
Use memcpy instead.
If alignment is the same and the values are stored in memory, the following effect desribes what happens when used reinterpret_cast:
The memory location of the 4 bytes float is &f. With a reinterpret cast to uint32_t, this memory is reinterpretet as an uint32. The dereferenced value _f contains the same bytes as the float f, but interpreted as uint32.
You could cast it back and get the original value 12.5:
float f = 12.5;
uint32_t _f = *reinterpret_cast<uint32_t*>(&f);
float _fnew = *reinterpret_cast<float*>(&_f);
std::cout << _fnew << std::endl; // result: 12.5
reinterpret_cast only reinterpretes memory locations (addresses).
If the alignment is not the same or the values are stored in registers, are optimized an so on, the cast can lead to undefined values.
Related
I saw this cast:
const std::uint32_t tmp = (some number);
float ToFloat = *(float*)(&tmp);
But when casting, the wrong number gets into the float. Moreover, if I use a static cast, then it is cast correctly.
Why is this happening?
How exactly does static_cast work?
*(float*)(&Tmp) means that we are trying to dereference the pointer to the float, which is located at the address &tmp. Is it right?
Why is this happening?
The program has undefined behavior because you read from an int through the eyes of a float. This is not allowed in C++.
How exactly does static_cast work?
In this case, it does the same conversion that float ToFloat = Tmp; would do. That is, it converts the int to float and assigns it to ToFloat.
*(float*)(&Tmp) means that we are trying to dereference the pointer to the float, which is located at the address &Tmp. Is it right?
There is no float at &Tmp, only an int. You however tell the compiler that the pointer is pointing at a float and dereference that pointer. The bit patterns of ints and floats are very different so not only does it have undefined behavior, you will very unlikely get the correct result.
Storing Integers
Integers are stored in their binary (base 2) representation. int32_t occupies 32 bits.
Storing Floats
In most of the C++ compilers use a standard called IEEE 754, to store floats and doubles. Which differs a lot from how integers are stored.
You can check how numbers are stored yourself using some sort of a IEEE-754 Floating Point Converter.
Example
Let us consider a simple example of number 5.
const uint Tmp = 5;
Tmp would be stored as 0x00000005 in memory, therefore &Tmp would point to the same.
float ToFloat = *(float*)(&Tmp);
Casting it into a float pointer would treat 0x00000005 to be in IEEE 754 format, having value 7.00649232162e-45.
static_cast
The static cast performs conversions between compatible types using safe implicit conversions.
Note that you are converting from int* to float* in your question which is not a safe conversion. You can convert from int to float safely.
const uint32_t Tmp = 5;
float ToFloat = (float)Tmp; // 5
More about static_cast conversion.
What this code attempts to do is to convert a 32-bit representation of a floating-point number into an actual floating-point value. An IEEE 754 binary floating-point value is (most of the time) a representation of a number of the form (−1)s × m × 2e. For a given floating-point format, a certain number of bits within a machine word is allocated for each of s (the sign bit), m (the mantissa or significand bits) and e (the exponent bits). Those bits can be manipulated directly within the bit representation in order to construct any possible floating-point value, bypassing the usual abstractions of ordinary arithmetic operations provided by the language. The converter mentioned in Aryan’s answer is a pretty good way to explore how this works.
Unfortunately, this code is not standard-conforming C++ (or, in fact, C), and that’s thanks to a little something called strict aliasing. In short: if a variable has been declared as being of integer type, it can only be accessed via pointers to that same integer type. Accessing it via a pointer to a different type is not allowed.
A more-or-less officially-sanctioned version would be:
inline static float float_from_bits(std::uint32_t x) {
float result;
static_assert(sizeof(result) == sizeof(x),
"float must be 32 bits wide");
std::memcpy(&result, &x, sizeof(result));
return result;
}
Or, in C++20, simply std::bit_cast<float>(x). This is still not quite guaranteed to return expected results by the standard, but at least it doesn’t violate that particular rule.
A direct static_cast would instead convert the integer into a floating-point value denoting (approximately) the same quantity. This is not a direct bit conversion: performing it entails computing the appropriate values of s, m and e and putting them in the appropriate place according to the number format.
cppreference/reinterpret_cast conversion/Explanation says
Unlike static_cast, but like const_cast, the reinterpret_cast expression does not compile to any CPU instructions (except when converting between integers and pointers or on obscure architectures where pointer representation depends on its type). It is purely a compile-time directive which instructs the compiler to treat expression as if it had the type new-type. Only the following conversions can be done with reinterpret_cast, except when such conversions would cast away constness or volatility.
... 5) Any object pointer type T1* can be converted to another object pointer type cv T2*. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules (see below)
I thought that the reinterpret_cast guaranteed the same bit pattern, and is always compile-time statement . But in the above quote, there is exception on obscure architectures where pointer representation depends on its type, and the conversion of any object pointer type T1 to another object pointer type T2 is exactly equivalent to static_cast<cv T2>(static_cast<cv void*>(expr) ). for example,
int a = 10;
void* b = static_cast<void*>(&a); // OK.
static_cast<int*>(b) = 20; // OK. back to the original type.
void* b2 = reinterpret_cast<void*>(&a); // char* b2 = static_cast<void*>(static_cast<void*>(&a) );
static_cast<int*>(b2) = 30; // also OK? because the resulting pointer is equivalent to b, so can be back to the original type.
Is b resolved in run-time(can the bit pattern be changed)?. If so, is there difference between reinterpret_cast and static_cast when do any pointer-to-pointer conversion?.
Changes to the bit-pattern of a pointer aren't really quite a rare as implied, nor is the hardware necessarily quite a obscure as implied. The most common situation involves alignment requirements. A fair number of architectures require "natural alignment". That is, an object with a size of N bits also requires N-bit alignment (e.g., a 32-bit object requires 32-bit alignment).
For example:
// address with all the bits set in the least significant byte
char *x = (char *)0xabcdef;
long *y = reinterpret_cast<long *>(x);
long z = *y;
std::cout << (void *)y;
On an x86 machine, y will usually contain exactly the bit pattern you requested (because x86 imposes few alignment requirements at the hardware level).
But on something like a SPARC, MIPS or Alpha, attempting to dereference a pointer to long with the three least significant bits set will generate a processor fault. That leaves the compiler with a choice: generate a pointer that can't be dereferenced, or clear some of the bits in the pointer. At least a few choose the latter. It's certainly not guaranteed. Some will leave the value as-is, so dereferencing it just produces a processor fault. But others try to make it into a legitimate pointer to a long by zeroing the three (or so) least significant bits.
While studying the behavior of casts in C++, I discovered that reinterpret_cast-ing from float* to int* only works for 0:
float x = 0.0f;
printf("%i\n", *reinterpret_cast<int*>(&x));
prints 0, whereas
float x = 1.0f;
printf("%i\n", *reinterpret_cast<int*>(&x));
prints 1065353216.
Why? I expected the latter to print 1 just like static_cast would.
A reinterpret_cast says to reinterpret the bits of its operand. In your example, the operand, &x, is the address of a float, so it is a float *. reinterpret_cast<int *> asks to reinterpret these bits as an int *, so you get an int *. If pointers to int and pointers to float have the same representation in your C++ implementation, this may1 work to give you an int * that points to the memory of the float.
However, the reinterpret_cast does not change the bits that are pointed to. They still have their same values. In the case of a float with value zero, the bits used to represent it are all zeros. When you access these through a dereferenced int *, they are read and interpreted as an int. Bits that are all zero represent an int value of zero.
In the case of a float with value one, the bits used to represent it in your C++ implementation are, using hexadecimal to show them, 3f80000016. This is because the exponent field of the format is stored with an offset, so there are some non-zero bits to show the value of the exponent. (That is part of how the floating-point format is encoded. Conceptually, 1.0f is represented as + 1.000000000000000000000002 • 20. Then the + sign and the bits after the “1.” are stored literally as zero bits. However, the exponent is stored by adding 127 and storing the result as an eight-bit integer. So an exponent of 0 is stored as 127. The represented value of the exponent is zero, but the bits that represent it are not zero.) When you access these bits through a dereferenced int *, they are read and interpreted as an int. These bits represent an int value of 1065353216 (which equals 3f80000016).
Footnote
1 The C++ standard does not guarantee this, and what actually happens is dependent on other factors.
In both cases the behaviour of the program is undefined, because you access an object through a glvalue that doesn't refer to an object of same or compatible type.
What you have observed is one possible behaviour. The behaviour could have been different, but there is no guarantee of that it would have been, and it wasn't. Whether you expected one result or another is not guaranteed to have effect on the behaviour.
I expected the latter to print 1 just like static_cast would.
It is unreasonable to expect reinterpret_cast to behave as static_cast would. They are wildly different and one can not be substituted for the other. Using static_cast to convert the pointers would make the program ill-formed.
rainterpret_cast should not be used unless one knows what it does, and knows that its use is correct. The practical use cases are rare.
Here are a few examples that have well defined behaviour, and are guaranteed to print 1:
int i = x;
printf("%i\n", i);
printf("%i\n", static_cast<int>(x));
printf("%g\n", x);
printf("%.0f\n", x);
Given that we've concluded that behaviour is undefined, there is no need for further analysis.
But we can consider why the behaviour may have happened to be what we observed. It is however important to understand that these considerations will not be useful in controlling what the result will be while the behaviour is undefined.
The binary representations of 32 bit IEEE 754 floating point number for 1.0f and +0.0f happen to be:
0b00111111100000000000000000000000
0b00000000000000000000000000000000
Which also happen to be the binary representation of the integer 1065353216 and 0. Is it a coincidence that the output of the programs were these specific integers whose binary representation match the representation of the float value? Could be in theory, but it probably isn't.
float has a different representation than int, so that you cannot treat float representation as int. That's undefined behaviour in C++.
It so happens that on modern architectures a 0-bit pattern represents any fundamental type with value of 0 (so that one can memset with zeroes a float or double, or integer types, or pointer types and get 0-valued object, that's what that calloc function does). This is why that cast-and-dereference "works" for 0 value, but that is still undefined behaviour. The C++ standard doesn't require a 0-bit pattern to represent a 0 floating point value, neither it requires any specific representation of floating point numbers.
A conversion of float to int is implicit and no cast is required.
A solution:
float x = 1.0f;
int x2 = x;
printf("%i\n", x2);
// or
printf("%i\n", static_cast<int>(x));
void* is a useful feature of C and derivative languages. For example, it's possible to use void* to store objective-C object pointers in a C++ class.
I was working on a type conversion framework recently and due to time constraints was a little lazy - so I used void*... That's how this question came up:
Why can I typecast int to void*, but not float to void* ?
BOOL is not a C++ type. It's probably typedef or defined somewhere, and in these cases, it would be the same as int. Windows, for example, has this in Windef.h:
typedef int BOOL;
so your question reduces to, why can you typecast int to void*, but not float to void*?
int to void* is ok but generally not recommended (and some compilers will warn about it) because they are inherently the same in representation. A pointer is basically an integer that points to an address in memory.
float to void* is not ok because the interpretation of the float value and the actual bits representing it are different. For example, if you do:
float x = 1.0;
what it does is it sets the 32 bit memory to 00 00 80 3f (the actual representation of the float value 1.0 in IEEE single precision). When you cast a float to a void*, the interpretation is ambiguous. Do you mean the pointer that points to location 1 in memory? or do you mean the pointer that points to location 3f800000 (assuming little endian) in memory?
Of course, if you are sure which of the two cases you want, there is always a way to get around the problem. For example:
void* u = (void*)((int)x); // first case
void* u = (void*)(((unsigned short*)(&x))[0] | (((unsigned int)((unsigned short*)(&x))[1]) << 16)); // second case
Pointers are usually represented internally by the machine as integers. C allows you to cast back and forth between pointer type and integer type. (A pointer value may be converted to an integer large enough to hold it, and back.)
Using void* to hold integer values in unconventional. It's not guaranteed by the language to work, but if you want to be sloppy and constrain yourself to Intel and other commonplace platforms, it will basically scrape by.
Effectively what you're doing is using void* as a generic container of however many bytes are used by the machine for pointers. This differs between 32-bit and 64-bit machines. So converting long long to void* would lose bits on a 32-bit platform.
As for floating-point numbers, the intention of (void*) 10.5f is ambiguous. Do you want to round 10.5 to an integer, then convert that to a nonsense pointer? No, you want the bit-pattern used by the FPU to be placed into a nonsense pointer. This can be accomplished by assigning float f = 10.5f; void *vp = * (uint32_t*) &f;, but be warned that this is just nonsense: pointers aren't generic storage for bits.
The best generic storage for bits is char arrays, by the way. The language standards guarantee that memory can be manipulated through char*. But you have to mind data alignment requirements.
Standard says that 752 An integer may be converted to any pointer type. Doesn't say anything about pointer-float conversion.
Considering any of you want you transfer float value as void *, there is a workaround using type punning.
Here is an example;
struct mfloat {
union {
float fvalue;
int ivalue;
};
};
void print_float(void *data)
{
struct mfloat mf;
mf.ivalue = (int)data;
printf("%.2f\n", mf.fvalue);
}
struct mfloat mf;
mf.fvalue = 1.99f;
print_float((void *)(mf.ivalue));
we have used union to cast our float value(fvalue) as an integer(ivalue) to void*, and vice versa
The question is based on a false premise, namely that void * is somehow a "generic" or "catch-all" type in C or C++. It is not. It is a generic object pointer type, meaning that it can safely store pointers to any type of data, but it cannot itself contain any type of data.
You could use a void * pointer to generically manipulate data of any type by allocating sufficient memory to hold an object of any given type, then using a void * pointer to point to it. In some cases you could also use a union, which is of course designed to be able to contain objects of multiple types.
Now, because pointers can be thought of as integers (and indeed, on conventionally-addressed architectures, typically are integers) it is possible and in some circles fashionable to stuff an integer into a pointer. Some library API's have even documented and supported this usage — one notable example was X Windows.
Conversions between pointers and integers are implementation-defined, and these days typically draw warnings, and so typically require an explicit cast, not so much to force the conversion as simply to silence the warning. For example, both the code fragments below print 77, but the first one probably draws compiler warnings.
/* fragment 1: */
int i = 77;
void *p = i;
int j = p;
printf("%d\n", j);
/* fragment 2: */
int i = 77;
void *p = (void *)(uintptr_t)i;
int j = (int)p;
printf("%d\n", j);
In both cases, we are not really using the void * pointer p as a pointer at all: we are merely using it as a vessel for some bits. This relies on the fact that on a conventionally-addressed architecture, the implementation-defined behavior of a pointer/integer conversion is the obvious one, which to an assembly-language programmer or an old-school C programmer doesn't seem like a "conversion" at all. And if you can stuff an int into a pointer, it's not surprising if you can stuff in other integral types, like bool, as well.
But what about trying to stuff a floating-point value into a pointer? That's considerably more problematic. Stuffing an integer value into a pointer, though implementation-defined, makes perfect sense if you're doing bare-metal programming: you're taking the numeric value of the integer, and using it as a memory address. But what would it mean to try to stuff a floating-point value into a pointer?
It's so meaningless that the C Standard doesn't even label it "undefined".
It's so meaningless that a typical compiler won't even attempt it.
And if you think about it, it's not even obvious what it should do.
Would you want to use the numeric value, or the bit pattern, as the thing to try to stuff into the pointer? Stuffing in the numeric value is closer to how floating-point-to-integer conversions work, but you'd lose your fractional part. Using the bit pattern is what you'd probably want, but accessing the bit pattern of a floating-point value is never something that C makes easy, as generations of programmers who have attempted things like
uint32_t hexval = (uint32_t)3.0;
have discovered.
Nevertheless, if you were bound and determined to store a floating-point value in a void * pointer, you could probably accomplish it, using sufficiently brute-force casts, although the results are probably both undefined and machine-dependent. (That is, I think there's a strict aliasing violation here, and if pointers are bigger than floats, as of course they are on a 64-bit architecture, I think this will probably only work if the architecture is little-endian.)
float f = 77.75;
void *p = (void *)(uintptr_t)*(uint32_t *)&f;
float f2 = *(float *)&p;
printf("%f\n", f2);
dmr help me, this actually does print 77.75 on my machine.
How do type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) k;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of casting from a possibly more narrow type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
This cast causes a conversion to happen. No loss of data happens, since 10 is guaranteed to be stored by an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
The value 10.123 is truncated to 10. Here, it does cause lost of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
This actually requires more attention. First, there is a deprecated conversion from a string literal to char*. But let's ignore that here. (see here). More importantly, what does happen if you cast to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note the semantics of that cast is the same as using reinterpret_cast in this case, since that is the only C++ cast being able to do that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int uf1 = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpeler. That is just a different label. At bit level, p and up are identical. The compiler just needs to remember that *p requires sign-extension while *up does not.
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.