Consider this code:
#include <iostream>

int main() {
    double k = ~0.0;
    std::cout << k << "\n";
}
It doesn't compile. I want to get a double value with all the bits set, which would be a NaN. Why doesn't this code work, and how do I flip all the bits of a double?
Regarding the code in the original question:
The 0 here is the int literal 0. ~0 is an int with value -1. You are initializing k with the int -1. The conversion from int to double doesn't change the numerical value (but does change the bit pattern), and then you print out the resulting double (which is still representing -1).
Now, for the current question: You can't apply bitwise NOT to a double. It's just not an allowed operation, precisely because it tends not to do anything useful to floating point values. It exists for built in integral types (plus anything with operator~) only.
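To illustrate (a minimal example of my own), the operator compiles for integral operands but is rejected for a double:

#include <iostream>

int main() {
    int i = ~0;          // OK: bitwise NOT of an int, yields -1
    // double d = ~0.0;  // error: '~' requires an integral (or enum) operand
    std::cout << i << "\n";
}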
If you would like to flip all the bits in an object, the standard conformant way is to do something like this:
#include <cstddef>   // std::byte, std::size_t
#include <iostream>
#include <memory>    // std::addressof

void flip_bits(auto &x) {
    // iterate through the bytes of x and flip all of them
    std::byte *p = reinterpret_cast<std::byte*>(std::addressof(x));
    for (std::size_t i = 0; i < sizeof(x); i++) p[i] = ~p[i];
}
Then
int main() {
    double x = 0;
    flip_bits(x);
    std::cout << x << "\n";
}
may (will usually) print some variation of nan (dependent on how your implementation actually represents double, of course).
Example on Godbolt
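If your implementation supports C++20, std::bit_cast offers another way to do the same bit-level reinterpretation without any pointer casts. A minimal sketch (my addition), assuming double and std::uint64_t are both 64 bits:

#include <bit>       // std::bit_cast (C++20)
#include <cstdint>
#include <iostream>

int main() {
    // flip every bit of a double by round-tripping through a 64-bit integer
    auto bits = std::bit_cast<std::uint64_t>(0.0);
    double d = std::bit_cast<double>(~bits);
    std::cout << d << "\n";  // typically prints -nan on IEEE 754 implementations
}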
// the numeric constant ~0 is an integer
int foo = ~0;
std::cout << foo << '\n'; //< prints -1
// now the int value -1 is converted to a double
double k = foo;
If you want to invert all of the bits, you can use a union with a uint64_t.
#include <iostream>
#include <cstdint>

int main() {
    union {
        double k;
        std::uint64_t u;
    } double_to_uint64;

    double_to_uint64.u = ~0ULL;
    std::cout << double_to_uint64.k;
}
This will typically print -nan (strictly speaking, reading a union member other than the one last written is undefined behavior in C++, although most compilers support this kind of type punning).
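If you want to stay within defined behavior, a minimal sketch using std::memcpy (my addition, assuming double and std::uint64_t have the same size) achieves the same result:

#include <cstdint>
#include <cstring>   // std::memcpy
#include <iostream>

int main() {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");
    std::uint64_t u = ~0ULL;        // all bits set
    double k;
    std::memcpy(&k, &u, sizeof k);  // copy the bit pattern into the double
    std::cout << k << "\n";         // typically prints -nan
}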
Related
Please take a look at this simple program:
#include <iostream>
#include <vector>

using namespace std;

int main() {
    vector<int> a;
    std::cout << "vector size " << a.size() << std::endl;
    int b = -1;
    if (b < a.size())
        std::cout << "Less";
    else
        std::cout << "Greater";
    return 0;
}
I'm confused by the fact that it outputs "Greater", even though it's obvious that -1 is less than 0. I understand that the size method returns an unsigned value, but the comparison is still between -1 and 0. So what's going on? Can anyone explain this?
Because the size of a vector is an unsigned integral type. You are comparing an unsigned type with a signed one, and the negative signed integer is converted to unsigned (on a two's complement machine this simply reinterprets the bit pattern). That corresponds to a very large unsigned value.
This code sample shows the same behaviour that you are seeing:
#include <iostream>

int main()
{
    std::cout << std::boolalpha;
    unsigned int a = 0;
    int b = -1;
    std::cout << (b < a) << "\n";
}
output:
false
The signature for vector::size() is:
size_type size() const noexcept;
size_type is an unsigned integral type. When an unsigned and a signed integer are compared, the signed one is converted to unsigned. Here, -1 is negative, so it wraps around, effectively yielding the maximum value representable by size_type. Hence it compares greater than zero.
An unsigned -1 is a larger value than zero: the high bit, which a signed type uses to indicate a negative value, is instead used by the unsigned type to extend its range of representable numbers, so it is no longer a sign bit. The comparison is therefore performed as (unsigned int)-1 < 0, which is false.
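If you actually need to compare a possibly negative int against a container size, a minimal sketch (my addition) using C++20's std::cmp_less compares the mathematical values correctly:

#include <iostream>
#include <utility>   // std::cmp_less (C++20)
#include <vector>

int main() {
    std::vector<int> a;
    int b = -1;

    // std::cmp_less compares the mathematical values, ignoring signedness
    if (std::cmp_less(b, a.size()))
        std::cout << "Less\n";      // now prints "Less", as expected
    else
        std::cout << "Greater\n";
}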
My code is this
// using_a_union.cpp
#include <stdio.h>
union NumericType
{
    int iValue;
    long lValue;
    double dValue;
};

int main()
{
    union NumericType Values = { 10 }; // iValue = 10
    printf("%d\n", Values.iValue);
    Values.dValue = 3.1416;
    printf("%d\n", Values.iValue); // garbage value
}
Why do I get a garbage value when I try to print Values.iValue after doing Values.dValue = 3.1416?
I thought the memory layout would be like this. What happens to Values.iValue and Values.lValue when I assign something to Values.dValue?
In a union, all of the data members overlap. You can only use one data member of a union at a time.
iValue, lValue, and dValue all occupy the same space.
As soon as you write to dValue, the iValue and lValue members are no longer usable: only dValue is usable.
Edit: To address the comments below: You cannot write to one data member of a union and then read from another data member. To do so results in undefined behavior. (There's one important exception: you can reinterpret any object in both C and C++ as an array of char. There are other minor exceptions, like being able to reinterpret a signed integer as an unsigned integer.) You can find more in both the C Standard (C99 6.5/6-7) and the C++ Standard (C++03 3.10, if I recall correctly).
Might this "work" in practice some of the time? Yes. But unless your compiler expressly states that such reinterpretation is guaranteed to be work correctly and specifies the behavior that it guarantees, you cannot rely on it.
Because floating point numbers are represented differently than integers are.
All of those variables occupy the same area of memory (with the double obviously occupying more). If you try to read the first four bytes of that double as an int, you are not going to get back what you think. You are dealing with the raw memory layout here, and you need to know how these types are represented.
EDIT: I should have also added (as James has already pointed out) that writing to one variable in a union and then reading from another does invoke undefined behavior and should be avoided (unless you are re-interpreting the data as an array of char).
Well, let's just look at a simpler example first. Ed's answer describes the floating-point part, but how about we examine how ints and chars are stored first!
Here's an example I just coded up:
#include "stdafx.h"
#include <iostream>
using namespace std;
union Color {
int value;
struct {
unsigned char R, G, B, A;
};
};
int _tmain(int argc, _TCHAR* argv[])
{
Color c;
c.value = 0xFFCC0000;
cout << (int)c.R << ", " << (int)c.G << ", " << (int)c.B << ", " << (int)c.A << endl;
getchar();
return 0;
}
What would you expect the output to be?
255, 204, 0, 0
Right?
If an int is 32 bits, and each of the chars is 8 bits, then R should correspond to the left-most byte, G to the second one, and so forth.
But that's wrong. At least on my machine/compiler, it appears ints are stored in reverse byte order. I get,
0, 0, 204, 255
So to make this give the output we'd expect (or the output I would have expected anyway), we have to change the struct to A,B,G,R. This has to do with endianness.
Anyway, I'm not an expert on this stuff, just something I stumbled upon when trying to decode some binaries. The point is, floats aren't necessarily encoded the way you'd expect either... you have to understand how they're stored internally to understand why you're getting that output.
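If you want to check which byte order your own machine uses, here is a minimal sketch (my addition, not part of the original example) that inspects the first byte of a known integer value:

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    std::uint32_t value = 0x01020304;
    unsigned char first;
    std::memcpy(&first, &value, 1);  // examine the lowest-addressed byte

    if (first == 0x04)
        std::cout << "little-endian (least significant byte first)\n";
    else if (first == 0x01)
        std::cout << "big-endian (most significant byte first)\n";
    else
        std::cout << "unusual byte order\n";
}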
You've done this:
union NumericType Values = { 10 }; // iValue = 10
printf("%d\n", Values.iValue);
Values.dValue = 3.1416;
The way a compiler uses memory for this union is similar to declaring a variable of the member type with the largest size and alignment (any of them, if several tie), then reinterpret_cast-ing whenever one of the other types in the union is written or accessed, as in:
double dValue; // creates a variable with the alignment & space
               // of "union NumericType Values"
*reinterpret_cast<int*>(&dValue) = 10;            // separate step equiv. to = { 10 }
printf("%d\n", *reinterpret_cast<int*>(&dValue)); // print as int
dValue = 3.1416;                                  // assign as double
printf("%d\n", *reinterpret_cast<int*>(&dValue)); // now print as int
The problem is that in setting dValue to 3.1416 you've completely overwritten the bits that used to hold the number 10. The new value may appear to be garbage, but it's simply the result of interpreting the first sizeof(int) bytes of the double 3.1416 as an int, trusting there to be a useful int value there.
If you want the two things to be independent - so setting the double doesn't affect the earlier-stored int - then you should use a struct/class.
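For completeness, here is a minimal sketch (my addition) of the struct alternative just mentioned, where each member has its own storage and writing one does not clobber the others:

#include <cstdio>

// A struct gives every member its own storage, unlike a union.
struct NumericType {
    int iValue;
    long lValue;
    double dValue;
};

int main() {
    NumericType values = { 10, 0L, 0.0 };  // iValue = 10
    values.dValue = 3.1416;                // does not touch iValue
    std::printf("%d\n", values.iValue);    // still prints 10
}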
It may help you to consider this program:
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint8_t
#include <iostream>

void print_bits(std::ostream& os, const void* pv, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        std::uint8_t byte = static_cast<const std::uint8_t*>(pv)[i];
        for (int j = 0; j < 8; ++j)
            os << ((byte & (128 >> j)) ? '1' : '0');
        os << ' ';
    }
}

union X
{
    int i;
    double d;
};

int main()
{
    X x = { 10 };
    print_bits(std::cout, &x, sizeof x);
    std::cout << '\n';
    x.d = 3.1416;
    print_bits(std::cout, &x, sizeof x);
    std::cout << '\n';
}
Which, for me, produced this output:
00001010 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10100111 11101000 01001000 00101110 11111111 00100001 00001001 01000000
Crucially, the first half of each line shows the 32 bits that are used for iValue: note the binary 1010 in the least significant byte (on the left on an Intel CPU like mine) is 10 decimal. Writing 3.1416 changes the entire 64 bits to a pattern representing 3.1416 (see http://en.wikipedia.org/wiki/Double_precision_floating-point_format). The old 1010 pattern is overwritten, clobbered, an electromagnetic memory no more.
What are the different techniques for converting data of type float to an integer in C++?
#include <iostream>
using namespace std;

struct database {
    int id, age;
    float salary;
};

int main() {
    struct database employee;
    employee.id = 1;
    employee.age = 23;
    employee.salary = 45678.90;
    /*
       How can I print this value as an integer
       (without changing the salary data type in the declaration part)?
    */
    cout << endl << employee.id << endl << employee.age << endl << employee.salary << endl;
    return 0;
}
What you are looking for is 'type casting'. Typecasting (putting the type you want in parentheses in front of the value) tells the compiler you know what you are doing and are cool with it. The old way, inherited from C, is as follows.
float var_a = 9.99;
int var_b = (int)var_a;
If you had only tried to write
int var_b = var_a;
you would likely have got a warning about implicitly (automatically) converting a float to an int, as you lose the decimal part.
This is referred to as the old way because C++ offers a superior alternative, 'static cast', which provides a much safer way of converting from one type to another. The equivalent method would be (and the way you should do it):
float var_x = 9.99;
int var_y = static_cast<int>(var_x);
This method may look a bit more long-winded, but it provides much better handling for situations such as accidentally requesting a 'static cast' on a type that cannot be converted. For more information on why you should be using static_cast, see this question.
The normal way is:
float f = 3.4;
int n = static_cast<int>(f);
The range of some floating-point types exceeds that of int.
This example shows a safe conversion of any floating-point type to int, using the function int safeFloatToInt(const FloatType &num):
#include <iostream>
#include <limits>
using namespace std;

template <class FloatType>
int safeFloatToInt(const FloatType &num) {
    // if int has fewer digits than FloatType's significand, do a range check
    if (numeric_limits<int>::digits < numeric_limits<FloatType>::digits) {
        // check if the value is within int's range
        if ((num < static_cast<FloatType>(numeric_limits<int>::max())) &&
            (num > static_cast<FloatType>(numeric_limits<int>::min()))) {
            return static_cast<int>(num); // safe to cast
        } else {
            cerr << "Unsafe conversion of value:" << num << endl;
            // NaN is not defined for int; return the largest int value
            return numeric_limits<int>::max();
        }
    } else {
        // it is safe to cast
        return static_cast<int>(num);
    }
}

int main() {
    double a = 2251799813685240.0;
    float b = 43.0;
    double c = 23333.0;

    // unsafe cast
    cout << safeFloatToInt(a) << endl;
    cout << safeFloatToInt(b) << endl;
    cout << safeFloatToInt(c) << endl;
    return 0;
}
Result:
Unsafe conversion of value:2.2518e+15
2147483647
43
23333
For most cases, the rounding functions from <cmath> work well (std::lround, returning long, for float; std::llround, returning long long, for double and long double):
long a{ std::lround(1.5f) }; //2l
long long b{ std::llround(std::floor(1.5)) }; //1ll
Check out the Boost NumericConversion library. It will allow you to explicitly control how you want to deal with issues like overflow handling and truncation.
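As a rough illustration (my sketch, assuming Boost is available; the library offers further policy customization not shown here), boost::numeric_cast throws when the value does not fit in the target type:

#include <boost/numeric/conversion/cast.hpp>
#include <iostream>

int main() {
    try {
        // throws if the value does not fit in the target type
        int ok = boost::numeric_cast<int>(23333.0);
        std::cout << ok << "\n";
        int bad = boost::numeric_cast<int>(1e20);   // out of int's range
        std::cout << bad << "\n";
    } catch (const std::exception& e) {
        std::cout << "conversion failed: " << e.what() << "\n";
    }
}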
I believe you can do this using a cast:
float f_val = 3.6f;
int i_val = (int) f_val;
The easiest technique is to just assign a float to an int, for example:
int i;
float f;
f = 34.0098;
i = f;
This will truncate everything after the decimal point; alternatively, you can round your float number first.
One thing I want to add: sometimes there can be precision loss, so you may want to add a small epsilon value before converting. This helps when a value that is mathematically a whole number is stored as something slightly below it (for example 2.9999999999 instead of 3), which would otherwise truncate to 2:
int someint = (somedouble+epsilon);
This is one way to convert an IEEE 754 float to a 32-bit integer when you can't use floating-point operations. It also has a scaler parameter so more digits can be included in the result; useful values for scaler are 1, 10 and 100.
#define EXPONENT_LENGTH 8
#define MANTISSA_LENGTH 23

// convert a float (given as its raw bits) to int without floating-point operations
int ownFloatToInt(int floatBits, int scaler) {
    int sign = (floatBits >> (EXPONENT_LENGTH + MANTISSA_LENGTH)) & 1;
    int exponent = (floatBits >> MANTISSA_LENGTH) & ((1 << EXPONENT_LENGTH) - 1);
    int mantissa = (floatBits & ((1 << MANTISSA_LENGTH) - 1)) | (1 << MANTISSA_LENGTH);
    int result = mantissa * scaler; // possible overflow

    exponent -= ((1 << (EXPONENT_LENGTH - 1)) - 1); // exponent bias
    exponent -= MANTISSA_LENGTH; // adjust exponent for shifting the mantissa

    if (exponent <= -(int)sizeof(result) * 8) {
        return 0; // underflow
    }
    if (exponent > 0) {
        result <<= exponent; // possible overflow
    } else {
        result >>= -exponent;
    }
    if (sign) result = -result; // handle sign
    return result;
}