Binary representation of a double - c++

I was bored and wanted to see what the binary representation of doubles looks like. However, I noticed something weird on Windows. The following lines of code demonstrate it:
double number = 1;
unsigned long num = *(unsigned long *) &number;
cout << num << endl;
On my Macbook, this gives me a nonzero number. On my Windows machine it gives me 0.
I was expecting that it would give me a non zero number, since the binary representation of 1.0 as a double should not be all zeros. However, I am not really sure if what I am trying to do is well defined behavior.
My question is, is the code above just stupid and wrong? And, is there a way I can print out the binary representation of a double?
Thanks.

The double 1.0 is 3ff0 0000 0000 0000 in hex. On Windows, long is a 4-byte integer, so on little-endian hardware you are reading only the low half, the 0000 0000 part.

If your compiler supports it (GCC does), you can use a union. Note that type punning through a union is undefined behavior according to the C++ standard, so this relies on a compiler extension:
#include <iostream>
int main() {
union {
unsigned long long num;
double fp;
} pun;
pun.fp = 1.0;
std::cout << std::hex << pun.num << std::endl;
}
The output is
3ff0000000000000
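A portable alternative to the union is to copy the bytes with std::memcpy, which is well defined for trivially copyable types. The following is a minimal sketch, assuming a 64-bit IEEE 754 double; the helper names double_bits and double_bits_string are made up for illustration. With C++20, std::bit_cast<std::uint64_t>(value) from <bit> does the same job in one call.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies the object representation of a double
// into a 64-bit unsigned integer without violating strict aliasing.
std::uint64_t double_bits(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: renders the 64 bits as '0'/'1' characters,
// most significant bit first.
std::string double_bits_string(double value) {
    return std::bitset<64>(double_bits(value)).to_string();
}
```

Printing double_bits(1.0) with std::hex should give the same 3ff0000000000000 as the union version, on both Windows and macOS, because std::uint64_t has the same width everywhere.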


Casting from long double to unsigned long long appears broken in the MSVC C++ compiler

Consider the following code:
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
long double test = 0xFFFFFFFFFFFFFFFF;
cout << "1: " << test << endl;
unsigned long long test2 = test;
cout << "2: " << test2 << endl;
cout << "3: " << (unsigned long long)test << endl;
return 0;
}
Compiling this code with GCC g++ (7.5.0) and running produces the following output as expected:
1: 1.84467e+19
2: 18446744073709551615
3: 18446744073709551615
However compiling this with the Microsoft Visual C++ compiler (16.8.31019.35, both 64-bit and 32-bit) and running produces the following output:
1: 1.84467e+19
2: 9223372036854775808
3: 9223372036854775808
When casting a value to an unsigned long long, the MSVC compiler won't give a value larger than the max of a (signed) long long.
Am I doing something wrong? 
Am I running into a compiler limitation that I do not know about?
Does anyone know of a possible workaround to this problem?
Because an MSVC long double is really just a double (as pointed out by @drescherjm in the comments), it does not have enough precision to hold the exact value of 0xFFFFFFFFFFFFFFFF. When this value is stored in the long double, it gets "rounded" to a value that is larger than 0xFFFFFFFFFFFFFFFF. This then causes undefined behaviour when converting to an unsigned long long.
You are seeing undefined behaviour because, as pointed out in the comments, a long double is the same as a double in MSVC and the 'converted' value of your 0xFFFFFFFFFFFFFFFF (or ULLONG_MAX) actually gets 'rounded' to a slightly (but significantly) larger value, as can be seen in the following code:
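#include <iomanip>
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
long double test = 0xFFFFFFFFFFFFFFFF;
cout << 0xFFFFFFFFFFFFFFFFuLL << endl;
cout << fixed << setprecision(16); // set the format only; no endl, so no blank line
cout << test << endl;
return 0;
}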
Output:
18446744073709551615
18446744073709551616.0000000000000000
Thus, when converting that floating-point value back to an unsigned long long, you are falling foul of the conversion rules specified in this Microsoft document:
For conversion to unsigned long or unsigned long long, the result of converting an out-of-range value may be some value other than the
highest or lowest representable value. Whether the result is a
sentinel or saturated value or not depends on the compiler options and
target architecture. Future compiler releases may return a saturated
or sentinel value instead.
This UB can be further 'verified' (for want of a better term) by switching to the clang-cl compiler that can be used from within Visual Studio. For your original code, this then gives 0 for the values on both the "2" and "3" output lines.
Assuming that the clang (LLVM) compiler is not bound by the aforementioned "Microsoft Rules," we can, instead, fall back on the C++ Standard:
7.10 Floating-integral conversions      [conv.fpint]
1     A prvalue of a floating-point type
can be converted to a prvalue of an integer type. The conversion
truncates; that is, the fractional part is discarded. The behavior is
undefined if the truncated value cannot be represented in the
destination type.
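One possible workaround is to test the range before converting, so the undefined case is never reached. This is only a sketch: to_ull_checked is a hypothetical helper name, it needs C++17 for std::optional, and it assumes that 2^N (N being the bit width of unsigned long long) is exactly representable in long double, which holds for any binary floating-point format.

```cpp
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper: converts only when the truncated value is
// representable in unsigned long long; otherwise returns an empty optional.
std::optional<unsigned long long> to_ull_checked(long double value) {
    // 2^N, the first value that is too large for unsigned long long.
    const long double limit =
        std::ldexp(1.0L, std::numeric_limits<unsigned long long>::digits);
    // Written so that NaN fails the test as well.
    if (!(value > -1.0L && value < limit)) {
        return std::nullopt;
    }
    return static_cast<unsigned long long>(value);
}
```

With MSVC's 64-bit long double, 0xFFFFFFFFFFFFFFFF rounds up to 2^64, so this helper would report failure instead of returning a compiler-dependent value.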

conversion of integers into binary in c++

As we know, each value is stored in binary form inside memory. So, in C++, will these two values have different binary numbers when stored inside memory ?
unsigned int a = 90;
signed int b = 90;
So, in C++, will these two values have different binary numbers when stored inside memory ?
The C++ language doesn't specify whether they do. Ultimately, the binary representation is dictated by the hardware, so the answer technically depends on that.
That said, I haven't encountered hardware and C++ implementation where identically valued signed and unsigned variants of an integer didn't have identical binary representation. As such, I would find it surprising if the binary representations were different.
Sidenote: Since "byte" is the smallest addressable unit of memory in C++, there isn't a way in the language to observe a directional order of individual bits in memory.
Consider the value 63. In binary it is 111111 and in hex it is 3f.
Because char is special in C++, and any object can be viewed as a sequence of bytes, you can directly look at the binary representation:
#include <cstddef>
#include <iomanip>
#include <iostream>
int main()
{
unsigned int a = 63;
signed int b = 63;
std::cout << std::hex;
// unsigned char avoids sign extension when a byte is 0x80 or above
unsigned char* a_bin = reinterpret_cast<unsigned char*>(&a);
for (std::size_t i = 0; i < sizeof(unsigned int); ++i)
std::cout << std::setw(4) << std::setfill('0') << static_cast<unsigned>(*(a_bin+i)) << " ";
std::cout << "\n";
unsigned char* b_bin = reinterpret_cast<unsigned char*>(&b);
for (std::size_t i = 0; i < sizeof(signed int); ++i)
std::cout << std::setw(4) << std::setfill('0') << static_cast<unsigned>(*(b_bin+i)) << " ";
}
Unfortunately, there is no std::bin I/O manipulator, so I used std::hex (it is sticky). The reinterpret_cast is OK because of the aforementioned special rules for char. Because std::cout << has a special overload that prints characters, but we want to see numerical values, another cast is needed. The output of the above is:
003f 0000 0000 0000
003f 0000 0000 0000
As already mentioned in a comment, the byte order is implementation-defined. Moreover, I have to admit that I am not aware of the exact details of what the standard has to say about this. Be careful with assumptions about byte representation, especially when transferring objects between two programs or over a wire. You would typically use some form of de-/serialization, so that you are in control of the byte representation being transferred.
TL;DR: Typically yes, in general you need to carefully consider what the C++ standard mandates, and I am not aware of signed and unsigned being guaranteed to have same byte representations.
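Instead of comparing the printed bytes by eye, the check can be done directly with std::memcmp. This is a small sketch with a made-up helper name, same_bytes; the result it gives for equal values is what one would expect on mainstream implementations, not something the answer above claims the standard guarantees.

```cpp
#include <cstring>

// Hypothetical helper: returns true when the two objects have
// byte-for-byte identical storage on this implementation.
bool same_bytes(unsigned int a, signed int b) {
    static_assert(sizeof(unsigned int) == sizeof(signed int),
                  "signed and unsigned int are expected to have the same size");
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```

Calling same_bytes(90u, 90) answers the original question for the machine the program runs on.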

c++ print number in hexadecimal right after floor function

I've noticed some weird behaviour in C++ which I don't understand.
I'm trying to print a truncated double in hexadecimal representation.
The output of this code is 17, which is the decimal representation:
double a = 17.123;
cout << hex << floor(a) << '\n';
while the output of this code is 11, which is what I want:
double a = 17.123;
long long aASll = floor(a);
cout << hex << aASll << '\n';
As a double can hold really big numbers, I'm afraid of getting wrong output when storing the truncated number in a long long variable. Any suggestions or improvements?
Quoting CPPreference's documentation page for std::hex (and friends)
Modifies the default numeric base for integer I/O.
This suggests that std::hex does not have any effect on floating point inputs. The best you are going to get is
cout << hex << static_cast<long long>(floor(a)) << '\n';
or a function that does the same.
uintmax_t from <cstdint> may be useful to get the largest available integer if the values are always positive. After all, what is a negative hex number?
Since a double value can easily exceed the maximum resolution of available integers, this won't cover the whole range. If the floored values exceed what can fit in an integer type, you are going to have to do the conversion by hand or use a big integer library.
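To make the range limitation explicit, the conversion can be guarded so that values that do not fit are reported rather than converted. The sketch below is an illustration only: floor_to_hex is a hypothetical name, negative results are simply rejected, and std::optional requires C++17.

```cpp
#include <cmath>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: floors the value and formats it as hexadecimal,
// or returns an empty optional when the result is negative, NaN, or too
// large for unsigned long long.
std::optional<std::string> floor_to_hex(double value) {
    const double floored = std::floor(value);
    // 2^64: values at or above this are rejected. unsigned long long is
    // guaranteed to be at least 64 bits wide, so everything below fits.
    const double limit = std::ldexp(1.0, 64);
    if (!(floored >= 0.0 && floored < limit)) {
        return std::nullopt;
    }
    std::ostringstream out;
    out << std::hex << static_cast<unsigned long long>(floored);
    return out.str();
}
```

For the value in the question, floor_to_hex(17.123) yields the string "11"; for something like 1e30 it returns an empty optional instead of invoking undefined behaviour.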
Side note: std::hexfloat does something very different and does not work correctly in all compilers due to some poor wording in the current Standard that has since been hammered out and should be corrected in the next revision.
Just write your own version of floor and have it return an integral value. For example:
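#include <cmath>
#include <iostream>
using namespace std;
// Floors d and converts the result to an integer type,
// so that stream manipulators such as hex apply to it.
long long floorAsLongLong(double d)
{
return (long long)floor(d);
}
int main() {
double a = 17.123;
cout << hex << floorAsLongLong(a) << endl;
}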

Bit representation of float using an int pointer

I have the following exercise:
Implement a function void float_to_bits(float x) which prints the bit
representation of x. Hint: Casting a float to an int truncates the
fractional part, but no information is lost casting a float pointer to
an int pointer.
Now, I know that a float is represented by a sign-bit, some bits for its mantissa, some bits for the basis and some bits for the exponent. It depends on my system how many bits are used.
The problem we are facing here is that our number basically has two parts. Let's consider 8.7 the bit representation of this number would be (to my understanding) the following: 1000.0111
Now, floats are stored with a leading zero, so 8.7 would become 0.87*10^1
So I somehow have to get all the information out of my memory. I don't really see how I should do that. What is that hint supposed to tell me? What's the difference between an integer pointer and a float pointer?
Currently I have this:
void float_to_bits() {
float a = 4.2345678f;
int* b;
b = (int*)(&a);
*b = a;
std::cout << *(b) << "\n";
}
But I really don't get the bigger picture behind the hint here. How do I get the mantissa, the exponent, the sign and the basis? I also tried playing around with the bit-wise operators >>, <<. But I just don't see how this should help me here, since they won't change the pointers position. It's useful to get e.g. the bit representation of an integer but that's about it, no idea what use it'd be here.
The hint your teacher gave is misleading: casting a pointer between different types and reading through it is at best implementation-defined, and in C++ it violates the strict aliasing rule. However, memcpy(...)ing an object to a suitably sized array of unsigned char is well defined. The content of the resulting array can then be decomposed into bits. Here is a quick hack to represent the bits using hexadecimal values:
#include <iostream>
#include <iomanip>
#include <cstring>
int main() {
float f = 8.7f;
unsigned char bytes[sizeof(float)];
// copy the object representation of f into a byte array
std::memcpy(bytes, &f, sizeof(float));
std::cout << std::hex << std::setfill('0');
// bytes are printed in memory order, so the result depends on endianness
for (int b: bytes) {
std::cout << std::setw(2) << b;
}
std::cout << '\n';
}
Note that IEEE 754 binary floating points do not store the full significand (the standard doesn’t use mantissa as a term) except for denormalized values: the 32 bit floats store
1 bit for the sign
8 bits for the exponent
23 bits for the normalized significand with the non-zero high bit being implied
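The three fields listed above can be pulled apart with shifts and masks once the bits are in an integer. The following is a sketch under the assumption that float is a 32-bit IEEE 754 binary32 value; FloatFields and split_float are hypothetical names chosen for this example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical container for the three fields of a binary32 float.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 stored significand bits (implied leading 1 not included)
};

// Hypothetical helper: copies the float's bits into an integer with
// memcpy and separates the fields.
FloatFields split_float(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For 8.7f this gives sign 0 and a biased exponent of 130, that is 2^3, which matches 8.7 = 1.0875 * 2^3.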
The hint tells you how to get the float's bits into an integer without going through a value conversion.
When you assign a floating-point value to an integer, the processor removes the fractional part: int i = (int) 4.502f; results in i = 4.
But when you make an int pointer (int*) point to a float's location, no conversion is made, not even when you read the value through the int*.
To show the representation, I like seeing hex numbers, which is why my first example was given in hex (each hexadecimal digit represents 4 binary digits).
It is also possible to print in binary, and there are many ways to do so (I like this one best!).
An annotated code example follows:
#include <iostream>
#include <bitset>
using namespace std;
int main()
{
float a = 4.2345678f; // allocate space for a float, call it 'a', and store the floating-point value 4.2345678f in it.
unsigned int* b; // allocate space for a pointer (an address), call it 'b' (hint to the compiler: this will point to an integer)
b = (unsigned int*)(&a); // GREAT, exactly what you needed! Take the float 'a' and get its address with '&'.
// By default this is an address pointing at a float (float*), so you correctly cast it to (int*).
// Bottom line: set 'b' to the address of 'a', but treat it as the address of an int!
// The hint implied that this won't cause a type conversion:
// int someInt = a; // would give someInt = 4, and so does your line below:
// *b = a; // <<<< this was your error.
// 1st, it isn't required, as 'b' already points to the address of 'a' and so sees its value.
// 2nd, it sets the value pointed to by 'b' to 'a' (including the conversion to int, giving 4);
// the value in 'a' actually changes too because of this instruction.
cout << a << " in binary " << bitset<32>(*b) << endl;
cout << "Sign " << bitset<1>(*b >> 31) << endl; // 1 bit (31)
cout << "Exp " << bitset<8>(*b >> 23) << endl; // 8 bits (23-30)
cout << "Mantissa " << bitset<23>(*b) << endl; // 23 bits (0-22)
}

shifting the binary numbers in c++

#include <iostream>
int main()
{
using namespace std;
int number, result;
cout << "Enter a number: ";
cin >> number;
result = number << 1;
cout << "Result after bitshifting: " << result << endl;
}
If the user inputs 12, the program outputs 24.
In a binary representation, 12 is 0b1100. However, the result the program prints is 24 in decimal, not 8 (0b1000).
Why does this happen? How can I get the result I expect?
Why does the program output 24?
You are right, 12 is 0b1100 in its binary representation. That being said, it is also 0b001100 if you want. In this case, bitshifting to the left gives you 0b011000, which is 24. The program produces the expected result.
Where does this stop?
You are using an int variable. Its size is typically 4 bytes (32 bits) on common 32-bit and 64-bit targets. However, it is a bad idea to rely on int's size. Use the fixed-width types from <cstdint> (stdint.h in C) when you need variables of a specific size.
A word of warning for bitshifting over signed types
Before C++20, using the << bitshift operator on negative values is undefined behavior, and the behaviour of >> on negative values is implementation-defined. In your case, I would recommend using an unsigned int (or just unsigned, which is the same), because int is signed.
How to get the result you expect?
If you know the size (in bits) of the number the user inputs, you can use a bitmask using the & (bitwise AND) operator. e.g.
result = (number << 1) & 0b1111; // 0xF would also do the same
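The masking idea can be generalized to any bit width. Below is a minimal sketch using an unsigned 32-bit type, as recommended above; shift_left_masked and its width parameter are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: shifts left by one and keeps only the lowest
// `width` bits, emulating a register that is `width` bits wide.
std::uint32_t shift_left_masked(std::uint32_t value, unsigned width) {
    // Build a mask with `width` low bits set. Shifting a 32-bit value by
    // 32 or more is undefined, so that case is handled separately.
    const std::uint32_t mask =
        (width >= 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << width) - 1u);
    return (value << 1) & mask;
}
```

With a width of 4, shift_left_masked(12, 4) gives 8 (0b1000); with a width of 6 it gives 24, which matches what the original program printed.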