define a float variable a, convert a to float & and int &, what does this mean? After the converting , a is a reference of itself? And why the two result is different?
#include <iostream>
using namespace std;
int
main(void)
{
float a = 1.0;
cout << (float &)a <<endl;
cout << (int &)a << endl;
return 0;
}
thinkpad ~ # ./a.out
1
1065353216
cout << (float &)a <<endl;
cout << (int &)a << endl;
The first one treats the bits in a like it's a float. The second one treats the bits in a like it's an int. The bits for float 1.0 just happen to be the bits for integer 1065353216.
It's basically the equivalent of:
float a = 1.0;
int* b = (int*) &a;
cout << a << endl;
cout << *b << endl;
(int &) a casts a to a reference to an integer. In other words, an integer reference to a. (Which, as I said, treats the contents of a as an integer.)
Edit: I'm looking around now to see if this is valid. I suspect that it's not. It's depends on the type being less than or equal to the actual size.
It means undefined behavior:-).
Seriously, it is a form of type punning. a is a float, but a is also a block of memory (typically four bytes) with bits in it. (float&)a means to treat that block of memory as if it were a float (in other words, what it actually is); (int&)a means to treat it as an int. Formally, accessing an object (such as a) through an lvalue expression with a type other than the actual type of the object is undefined behavior, unless the type is a character type. Practically, if the two types have the same size, I would expect the results to be a reinterpretation of the bit pattern.
In the case of a float, the bit pattern contains bits for the sign, an exponent and a mantissa. Typically, the exponent will use some excess-n notation, and only 0.0 will have 0 as an exponent. (Some representations, including the one used on PCs, will not store the high order bit of the mantissa, since in a normalized form in base 2, it must always be 1. In such cases, the stored mantissa for 1.0 will have all bits 0.) Also typically (and I don't know of any exceptions here), the exponent will be stored in the high order bits. The result is when you "type pun" a floating point value to a an integer of the same size, the value will be fairly large, regardless of the floating point value.
The values are different because interpreting a float as an int & (reference to int) throws the doors wide open. a is not an int, so pretty much anything could actually happen when you do that. As it happens, looking at that float like it's an int gives you 1065353216, but depending on the underlying machine architecture it could be 42 or an elephant in a pink tutu or even crash.
Note that this is not the same as casting to an int, which understands how to convert from float to int. Casting to int & just looks at bits in memory without understanding what the original meaning is.
Related
In an attempt to verify (using VS2012) a book's claim (2nd sentence) that
When we assign an integral value to an object of floating-point type, the fractional part is zero.
Precision may be lost if the integer has more bits than the floating-point object can accommodate.
I wrote the following wee prog:
#include <iostream>
#include <iomanip>
using std::cout;
using std::setprecision;
int main()
{
long long i = 4611686018427387905; // 2^62 + 2^0
float f = i;
std::streamsize prec = cout.precision();
cout << i << " " << setprecision(20) << f << setprecision(prec) << std::endl;
return 0;
}
The output is
4611686018427387905 4611686018427387900
I expected output of the form
4611686018427387905 4611690000000000000
How is a 4-byte float able to retain so much info about an 8-byte integer? Is there a value for i that actually demonstrates the claim?
Floats don't store their data in base 10, they store it in base 2. Thus, 4611690000000000000 isn't actually a very round number. It's binary representation is:
100000000000000000000111001111100001000001110001010000000000000.
As you can see, that would take a lot of data to precisely record. The number that's actually printed, however, has the following binary representation:
11111111111111111111111111111111111111111111111111111111111100
As you can see, that's a much rounder number, and the fact that it's off by 4 from a power of two is likely due to rounding in the convert-to-base-10 algorithm.
As an example of a number that won't fit in a float properly, try the number you expected:
4611690000000000000
You'll notice that that will come out very differently.
The float retains so much information because you're working with a number that is so close to a power of 2.
The float format stores numbers in basically binary scientific notation. In your case, it gets stored as something like
1.0000000...[61 zeroes]...00000001 * 2^62.
The float format can't store 62 decimal places, so the final 1 gets cut off... but we're left with 2^62, which is almost exactly equal to the number you're trying to store.
I'm bad at manufacturing examples, but CERT isn't; you can view an example of what happens with bungled number conversions here. Note that the example is in Java, but C++ uses the same floating point types; additionally, the first example is a conversion between a 4-byte int and a 4-byte float, but this further proves your point (there's less integer information that needs to be stored than there is in your example, yet it still fails).
I have such code:
double x = 100.1;
double y;
int *p;
p = (int *) &x;
y = *p;
cout << "y value is: " << y << endl;
I understand that it's wrong and we get 4 bytes instead of our 8 bytes. I also know floating numbers have some tricky representation at memory (but don't know how), but this script gives me such output:
y value is: 1.71799e+09
As I understand it's also value with floating point. By my pointer is int and I also expect int result. Why it's not? What I should read to understand?
By my pointer is int and I also expect int result. Why it's not?
Because y is a double, not an int. Now, if you mean to ask why doesn't y have an integral value, since it should be the result of the conversion of an int (which must* have an integral value) to double, take a look at the output of the following:
std::cout << "y value is: " << std::fixed << y << '\n';
The output you should see is:
y value is: 1717986918.000000
So, y does have an integral value, you simply printed it out using scientific notation. And if you wonder why y has this particular integral value, it's because that's the value *p had.
std::cout << "*p is: " << *p << '\n';
*p is: 1717986918
If you were to look at a memory dump of x with the value of 100.1, you would see:
66 66 66 66 66 06 59 40
And when you access this as an int, you get 66 66 66 66, which, converted to decimal, is 1717986918.
One other thing to note is that none of this is guaranteed by C++. In fact your program is not actually legal, because you're violating the aliasing rules when you access the double x through a pointer to int. Your program could be made legal and get the same results, but C++ doesn't specify the representation of floating point values, so the particular integral value could legally be different.
What I should read to understand?
Here's an article that plays with the representation of floats: http://ridiculousfish.com/blog/posts/float.html
C++ doesn't specify how floats are represented, but most C++ implementations use IEEE-754, so you might also take a look at that specification. Here's the wikipedia page to start you on that: https://en.wikipedia.org/wiki/IEEE_floating_point
To learn about C++'s aliasing rules you can find and read the spec. strict aliasing is covered in ยง3.10/10, IIRC. There are also plenty of questions and answers about aliasing here on SO.
1.71799e+09 is the floating point representation of the integer 1,717,99x,xxx.
By my pointer is int and I also expect int result. Why it's not?
Your pointer is int *, and you are getting an int result.
But that's not what you're displaying. You're displaying y which is a double.
1.71799e+09 is the double representation of the int you got.
When you do
y = *p;
The variable y takes the int pointed to by p, and then converts it to a double.
Suppose sizeof(double) is 8 and sizeof(int) is 4
Doing x = 100; you put the double value 100 in x on 8 bytes. Then when you do y = *p; you would only use 4 of the 8 bytes, as an int value, then convert that number to a double.
The binary format of a double and of an int are different. E.g. 100 as double is not represented the same way as 100 int (let alone the size difference, e.g. 8 vs 4).
See this link
double x = 100.1;
double y;
int *p;
p = (int *) &x;
// Set a pointer of type integer to point at a floating point. Better C++ style
// would be p = reinterpret_cast<int *>(&x);
// which explains more clearly what you are doing - reinterpreting the pointer as
// a different type.
y = *p;
// *p is an integer. So y is set to the value of *p, which is some crazy
// integer value formed from the interpretation of a double as integer.
cout << "y value is: " << y << endl;
See comments in the code above. You are making a pointer point to data of a different type, then wondering why the binary digits transformed don't represent what it originally represented...
(Technically, this code is in violation of the rules about aliasing - basically, you are using the same piece of memory for two different purposes, and the compiler is not obliged to "do the right thing" in this case - I suspect it does something of what you can expect in this particular case - if what you expect is that you get an integer value represented by the four first bytes of a double, that is).
If you want y to actually contain the integer representation of the number in x, you want:
y = (int)x;
There is no sensible way we can use a pointer to do this conversion.
You're triggering undefined behavoir when you run the following line:
p = (int *) &x;
Once you've triggered undefined behavoir the rest of your question is fairly pointless as reasoning about undefined behavoir doesn't make sense (although looking at the rest of your question, it may be that this bit of undefined behavoir doesn't affect what you're really asking. I'm not sure though.)
Read C++ pointers and data types documentation
http://www.cplusplus.com/doc/tutorial/pointers/
http://www.cplusplus.com/doc/tutorial/variables/
I have the followig code and I want to know why I have the following output:
#include <iostream>
int main() {
double nValue = 5;
void *pVoid = &nValue;
short *pInt = static_cast<short*>(pVoid);
std::cout << *pInt << std::endl;
return 0;
}
And it outputs me '0'. I want to know why is this happening. Thank you!
You have UB (Undefined Behaviour), as you're violating pointer aliasing rules. This means anything can happen.
For example, the compiler has all rights to expect a short* will never refer to a double object, so it can pretty much interpret *pInt however it wants.
Or it's possible that the compiler interprets the code literally, and it just so happens that on your platform, the binary representation of 5.0 starts with two (or sizeof(short)) bytes of zeroes.
You are casting memory block with double data to short. Since you store a small value in that block it is not stored in first bits. Thus first bits are zero. But it rely on internal double representation and short size, so it is not guaranteed to be the same on different platforms
You're trying to interpret the contents of a double as a short. This invokes undefined behavior - your program is free to do anything.
pVoid points to a bit pattern that represents a double. The type information is lost for the compiler once you use a void*. When casting the void* to short*, you claim that that the bit pattern pointed to is a short, which it isn't. The representation of a short and a double in memory are completely different. When you dereference pInt, the memory at that location happens to be 0. The compiler no longer knows at this point that the value's type is really double, so no implicit conversion is possible, if that's what you were expecting.
Just for fun, most probably the double representation of 5 on your machine is this
*double = (5)
0000000000000000000000000000000000000000000000000000000000000101
^^^^^^^^^^^^^^^^
*short = (0)
This should show you why this is happeneing
int main() {
double nValue = 5;
short nValue_short = 5;
std::bitset<sizeof(double)*8> bit_double(nValue);
std::bitset<sizeof(short)*8> bit_short(nValue_short);
std::cout << bit_double << std::endl;
std::cout << bit_short << std::endl;
return 0;
}
i run this code in c++:
#include <iostream>
using namespace std;
int main()
{
float f = 7.0;
short s = *(short *)&f;
cout << sizeof(float) << endl
<< sizeof(short) << endl
<< s << endl;
return 0;
}
i get the following out pot:
4
2
0
but, in a lecture given in Stanford university, Professor Jerry Cain says he is sure the out pot well not be 0.
the lecture is can be fond here. he says that around the 48 minute.
is he wrong, or that some standard change since? or is there a difference between platforms?
I'm using g++ to compile my code.
EDIT: in the next lecture he does mention "big endian" and "small endian" and says that they well affect the result.
static void bitPrint(float f)
{
assert(sizeof(int) == sizeof(float));
int *data = reinterpret_cast<int*>(&f);
for (int i = 0; i < sizeof(int) * 8; ++i)
{
int bit = (1 << i) & *data;
if (bit) bit = 1;
cout << bit;
}
cout << endl;
}
int main()
{
float f = 7.0;
bitPrint(f);
return 0;
}
This program prints 00000000000000000000011100000010
Since the sizeof(short) == 2 on your platform you get the first 2 bytes which are both zeros
Note that since size of types and possibly float implementation (not sure about this) are implementation defined different output can be seen on different platforms.
Well, let's see. First you write a float into the memory. It occupies 4 bytes, and it's value is 7. A float in the memory looks something like "sign bit -> exponent bits -> mantissa bits". I'm not sure how many bits are there for each part exactly, probably that depends on your platform.
Since the float's value is 7, it only occupies some of the least-significant bits on the right (I assume big-endian).
Your short pointer points to the beginning of the float, which means to the most significant bit. Since the value is greater than 0, the sign bit is zero. Since the float value is far on the right, we can say that those two most significant bytes are filled with zeros.
Now, provided that a size of short is 2, which means we will only take two bytes out of float's 4 bytes, we get our 0.
I believe though, that this result is rather UB and can differ on different platforms, compilers, etc.
Accessing data through a pointer to a different type than it was stored as gives (except in a few special cases) undefined behavour.
Firstly it's platform dependent how the data it stored so different systems may well give different values, and secondly the compiler might well generate code that doesn't even see the value you'd expect as it's allowed to do anything it likes when you do this (It's undefined behavour due to the strict aliases rules).
Having said that there are probably reasons why the number you are seeing is valid, but you can't rely on it unless you specifically know your platform will do what you expect, it's not guarenteed by the standard.
He's "pretty" sure it's not zero, he says that explicitly.
However, given that the representation of a short can be big-endian or little-endian, I wouldn't be so certain. In any case, this is a throwaway line at the end of a fifty-minute lecture so we can forgive him a little. It may be he came back in the next lecture with a clarification.
You would need to examine the underlying bits at (at least) a byte-by-byte level to understand what's going on.
Here are the goals I'm trying to achieve:
I need to pack 32 bit IEEE floats into 30 bits.
I want to do this by decreasing the size of mantissa by 2 bits.
The operation itself should be as fast as possible.
I'm aware that some precision will be lost, and this is acceptable.
It would be an advantage, if this operation would not ruin special cases like SNaN, QNaN, infinities, etc. But I'm ready to sacrifice this over speed.
I guess this questions consists of two parts:
1) Can I just simply clear the least significant bits of mantissa? I've tried this, and so far it works, but maybe I'm asking for trouble... Something like:
float f;
int packed = (*(int*)&f) & ~3;
// later
f = *(float*)&packed;
2) If there are cases where 1) will fail, then what would be the fastest way to achieve this?
Thanks in advance
You actually violate the strict aliasing rules (section 3.10 of the C++ standard) with these reinterpret casts. This will probably blow up in your face when you turn on the compiler optimizations.
C++ standard, section 3.10 paragraph 15 says:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Specifically, 3.10/15 doesn't allow us to access a float object via an lvalue of type unsigned int. I actually got bitten myself by this. The program I wrote stopped working after turning on optimizations. Apparently, GCC didn't expect an lvalue of type float to alias an lvalue of type int which is a fair assumption by 3.10/15. The instructions got shuffled around by the optimizer under the as-if rule exploiting 3.10/15 and it stopped working.
Under the following assumptions
float really corresponds to a 32bit IEEE-float,
sizeof(float)==sizeof(int)
unsigned int has no padding bits or trap representations
you should be able to do it like this:
/// returns a 30 bit number
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return r >> 2;
}
float unpack_float(unsigned int x) {
x <<= 2;
float r;
std::memcpy(&r,&x,sizeof r);
return r;
}
This doesn't suffer from the "3.10-violation" and is typically very fast. At least GCC treats memcpy as an intrinsic function. In case you don't need the functions to work with NaNs, infinities or numbers with extremely high magnitude you can even improve accuracy by replacing "r >> 2" with "(r+1) >> 2":
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return (r+1) >> 2;
}
This works even if it changes the exponent due to a mantissa overflow because the IEEE-754 coding maps consecutive floating point values to consecutive integers (ignoring +/- zero). This mapping actually approximates a logarithm quite well.
Blindly dropping the 2 LSBs of the float may fail for small number of unusual NaN encodings.
A NaN is encoded as exponent=255, mantissa!=0, but IEEE-754 doesn't say anything about which mantiassa values should be used. If the mantissa value is <= 3, you could turn a NaN into an infinity!
You should encapsulate it in a struct, so that you don't accidentally mix the usage of the tagged float with regular "unsigned int":
#include <iostream>
using namespace std;
struct TypedFloat {
private:
union {
unsigned int raw : 32;
struct {
unsigned int num : 30;
unsigned int type : 2;
};
};
public:
TypedFloat(unsigned int type=0) : num(0), type(type) {}
operator float() const {
unsigned int tmp = num << 2;
return reinterpret_cast<float&>(tmp);
}
void operator=(float newnum) {
num = reinterpret_cast<int&>(newnum) >> 2;
}
unsigned int getType() const {
return type;
}
void setType(unsigned int type) {
this->type = type;
}
};
int main() {
const unsigned int TYPE_A = 1;
TypedFloat a(TYPE_A);
a = 3.4;
cout << a + 5.4 << endl;
float b = a;
cout << a << endl;
cout << b << endl;
cout << a.getType() << endl;
return 0;
}
I can't guarantee its portability though.
How much precision do you need? If 16-bit float is enough (sufficient for some types of graphics), then ILM's 16-bit float ("half"), part of OpenEXR is great, obeys all kinds of rules (http://www.openexr.com/), and you'll have plenty of space left over after you pack it into a struct.
On the other hand, if you know the approximate range of values they're going to take, you should consider fixed point. They're more useful than most people realize.
I can't select any of the answers as the definite one, because most of them have valid information, but not quite what I was looking for. So I'll just summarize my conclusions.
The method for conversion I've posted in my question's part 1) is clearly wrong by C++ standard, so other methods to extract float's bits should be used.
And most important... as far as I understand from reading the responses and other sources about IEEE754 floats, it's ok to drop the least significant bits from mantissa. It will mostly affect only precision, with one exception: sNaN. Since sNaN is represented by exponent set to 255, and mantissa != 0, there can be situation where mantissa would be <= 3, and dropping last two bits would convert sNaN to +/-Infinity. But since sNaN are not generated during floating point operations on CPU, its safe under controlled environment.