printing the bits of float - c++

I know. I know. This question has been answered before, but I have a slightly different and a bit more specific question.
My goal is, as the title suggest, to cout the 32 bits sequence of a float.
And the solution provided in the previous questions is to use a union.
union ufloat{
float f;
uint32_t u;
};
This is all good and well. And I have managed to print the bits of a floating number.
union ufloat uf;
uf.f = numeric_limits<float>::max();
cout << bitset<32>(uf.f) << "\n"; // This give me 0x00000000, which is wrong.
cout << bitset<32>(uf.u) << "\n"; // This give me the correct bit sequence.
My question is why does bitset<32>(uf.f) doesn't work, but bitset<32>(uf.u) does?
The green-ticked anwser of this question
Obtaining bit representation of a float in C says something about "type-punning", and I presume it's got something to do with that. But I am not sure how exactly.
Can someone please clearify? Thanks

The constructor of std::bitset you are calling is:
constexpr bitset( unsigned long long val ) noexcept;
When you do bitset<32>(uf.f), you are converting a float to an unsigned long long value. For numeric_limits<float>::max(), that's undefined behavior because it's outside the range of the destination type, and in your case it happened to produce zero.
When you do bitset<32>(uf.u), you are again relying on undefined behavior, and in this case it happens to do what you want: convert a uint32_t that contains the bits of a float to an unsigned long long value.
In C++, what you should do instead is use memcpy:
uint32_t u;
std::memcpy(&u, &uf.f, sizeof(u));
std::cout << std::bitset<32>(u) << "\n";

This question is tagged C++, but the linked one is about C.
Those are different languages, with different rules. In particular, as mentioned in the comments by Yksisarvinen:
Type punning is a way to read bytes of memory as if they were different type than they really are. This is possible to do through union in C, but in C++ reading anything but the last written to member of union is Undefined Behaviour.
Since C++20, we can use std::bit_cast.
auto f{ std::numeric_limits<float>::max() }; // -> 3.40282e+38
auto u{ std::bit_cast<unsigned>(f) }; // -> 7f7fffff
See e.g. https://godbolt.org/z/d9T3G6qGx.

Related

How to iterate over every bit of a type in C++

I wanted to write the Digital Search Tree in C++ using templates. To do that given a type T and data of type T I have to iterate over bits of this data. Doing this on integers is easy, one can just shift the number to the right an appropriate number of positions and "&" the number with 1, like it was described for example here How to get nth bit values . The problem starts when one tries to do get i'th bit from the templated data. I wrote something like this
#include <iostream>
template<typename T>
bool getIthBit (T data, unsigned int bit) {
return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}
int main() {
uint32_t a = 16;
for (int i = 0; i < 32; i++) {
std::cout << getIthBit (a, i);
}
std::cout << std::endl;
}
Which works, but I am not exactly sure if it is not undefined behavior. The problem with this is that to iterate over all bits of the data, one has to know how many of them are, which is hard for struct data types because of padding. For example here
#include <iostream>
struct s {
uint32_t i;
char c;
};
int main() {
std::cout << sizeof (s) << std::endl;
}
The actual data has 5 bytes, but the output of the program says it has 8. I don't know how to get the actual size of the data, or if it is at all possible. A question about this was asked here How to check the size of struct w/o padding? , but the answers are just "don't".
It's easy to know know how many bits there are in a type. There's exactly CHAR_BIT * sizeof(T). sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits - that are part of the type - are padding.
I recommend not attempting to support types that have padding as keys of your DST.
Following trick might work for finding padding bits of trivially copyable classes:
Use std::memset to set all bits of the object to 0.
For each sub object with no sub objects of their own, set all bits to 1 using std::memset.
For each sub object with their own sub objects, perform the previous and this step recursively.
Check which bits stayed 0.
I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-classes that have padding, and the described trick won't detect those. long double is typical example; I don't know if there are others. This probably won't detect unused bits of integers that underlie bitfields.
So, there are a lot of caveats, but it should work in your example case:
s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);
std::memset(&sobj.c, -1, sizeof sobj.c);
std::cout << "non-padding bits:\n";
unsigned long long ull;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C++17.
So if an object has unique representations, it is safe to assume that every bit is significant.
There's no standard way to know if non-unique representation caused by padding bytes/bits like in struct { long long a; char b; }, or by equivalent representations¹. And no standard way to know padding bits/bytes offsets.
Note that "actual size" concept may be misleading, as padding can be in the middle, like in struct { char a; long long b; }
Internally compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compiler may use other name, another approach, or not expose atomic<T>::compare_exchange_* internals to this level.
¹ like multiple NaN floating point values

Type casting struct to integer and vice versa in C++

So, I've seen this thread Type casting struct to integer c++ about how to cast between integers and structs (bitfields) and undoubtly, writing a proper conversion function or overloading the relevant casting operators is the way to go for any cases where there is an operating system involved.
However, when writing firmware for a small embedded system where only one flash image is run, the case might be different insofar, as security isn't so much of a concern while performance is.
Since I can test whether the code works properly (meaning the bits of a bitfield are arranged the way I would expect them to be) each time when compiling my code, the answer might be different here.
So, my question is, whether there is a 'proper' way to convert between bitfield and unsigned int that does compile to no operations in g++ (maybe shifts will get optimised away when the compiler knows the bits are arranged correctly in memory).
This is an excerpt from the original question:
struct {
int part1 : 10;
int part2 : 6;
int part3 : 16;
} word;
I can then set part2 to be equal to whatever value is requested, and set the other parts as 0.
word.part1 = 0;
word.part2 = 9;
word.part3 = 0;
I now want to take that struct, and convert it into a single 32 bit integer. I do have it compiling by forcing the casting, but it does not seem like a very elegant or secure way of converting the data.
int x = *reinterpret_cast<int*>(&word);
EDIT:
Now, quite some time later, I have learned some things:
1) Type punning (changing the interpretation of data) by means of pointer casting is, undefined behaviour since C99 and C++98. These language changes introduced strict aliasing rules (They allow the compiler to reason that data is only accessed through pointers of compatible type) to allow for better optimisations. In effect, the compiler will not need to keep the ordering between accesses (or do the off-type access at all). For most cases, this does not seem to present a [immediate] problem, but when using higher optimisation settings (for gcc that is -O which includes -fstrict-aliasing) this will become a problem.
For examples see https://blog.regehr.org/archives/959
2) Using unions for type punning also seems to involve undefined behaviour in C++ but not C (See https://stackoverflow.com/a/25672839/4360539), however GCC (and probably others) does explicitly allow it: (See https://gcc.gnu.org/bugs/#nonbugs).
3) The only really reliable way of doing type punning in C++ seems to be using memcpy to copy the data to a new location and perform whatever is to be done and then to use another memcpy to return the changes. I did read somewhere on SO, that GCC (or most compilers probably) should be able to optimise the memcpy to a mere register copy for register-sized data types, but I cannot find it again.
So probably the best thing to do here is to use the union if you can be sure the code is compiled by a compiler supporting type punning through a union. For the other cases, further investigation would be needed how the compiler treats bigger data structures and memcpy and if this really involves copying back and forth, probably sticking with bitwise operations is the best idea.
union {
struct {
int part1: 10;
int part2: 6;
int part3: 16;
} parts;
int whole;
} word;
Then just use word.whole.
I had the same problem. I am guessing this is not very relevant today. But this is how I solved it:
#include <iostream>
struct PACKED{
int x:10;
int y:10;
int z:12;
PACKED operator=(int num )
{
*( int* )this = num;
return *this;
}
operator int()
{
int *x;
x = (int*)this;
return *x;
}
} __attribute__((packed));
int main(void) {
std::cout << "size: " << sizeof(PACKED) << std::endl;
PACKED bf;
bf = 0xFFF00000;
std::cout << "Values ( x, y, z ) = " << bf.x << " " << bf.y << " " << bf.z << std::endl;
int testint;
testint = bf;
std::cout << "As integer: " << testint << std::endl;
return 0;
}
This now fits on a int, and is assignable by standard ints. However I do not know how portable this solution is. The output of this is then:
size: 4
Values ( x, y, z ) = 0 0 -1
As integer: -1048576

Pointers typecast

I have the followig code and I want to know why I have the following output:
#include <iostream>
int main() {
double nValue = 5;
void *pVoid = &nValue;
short *pInt = static_cast<short*>(pVoid);
std::cout << *pInt << std::endl;
return 0;
}
And it outputs me '0'. I want to know why is this happening. Thank you!
You have UB (Undefined Behaviour), as you're violating pointer aliasing rules. This means anything can happen.
For example, the compiler has all rights to expect a short* will never refer to a double object, so it can pretty much interpret *pInt however it wants.
Or it's possible that the compiler interprets the code literally, and it just so happens that on your platform, the binary representation of 5.0 starts with two (or sizeof(short)) bytes of zeroes.
You are casting memory block with double data to short. Since you store a small value in that block it is not stored in first bits. Thus first bits are zero. But it rely on internal double representation and short size, so it is not guaranteed to be the same on different platforms
You're trying to interpret the contents of a double as a short. This invokes undefined behavior - your program is free to do anything.
pVoid points to a bit pattern that represents a double. The type information is lost for the compiler once you use a void*. When casting the void* to short*, you claim that that the bit pattern pointed to is a short, which it isn't. The representation of a short and a double in memory are completely different. When you dereference pInt, the memory at that location happens to be 0. The compiler no longer knows at this point that the value's type is really double, so no implicit conversion is possible, if that's what you were expecting.
Just for fun, most probably the double representation of 5 on your machine is this
*double = (5)
0000000000000000000000000000000000000000000000000000000000000101
^^^^^^^^^^^^^^^^
*short = (0)
This should show you why this is happeneing
int main() {
double nValue = 5;
short nValue_short = 5;
std::bitset<sizeof(double)*8> bit_double(nValue);
std::bitset<sizeof(short)*8> bit_short(nValue_short);
std::cout << bit_double << std::endl;
std::cout << bit_short << std::endl;
return 0;
}

How does one safely static_cast between unsigned int and int?

I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:
int hex_str_to_int(const string hexStr)
{
stringstream strm;
strm << hex << hexStr;
unsigned int val = 0;
strm >> val;
return static_cast<int>(val);
}
doesn't work for my compiler if val > MAX_INT (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.
Here's what I do know:
I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
sizeof(int) will be at least 4 on every architecture my code will run on.
Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.
This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?
Update:
Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?
Quoting the C++03 standard, §4.7/3 (Integral Conversions):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.
While there are ways to do this using casts and conversions, most rely on undefined behavior that happen to have well-defined behaviors on some machines / with some compilers. Instead of relying on undefined behavior, copy the data:
int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
You can negate an unsigned twos-complement number by taking the complement and adding one. So let's do that for negatives:
if (val < 0x80000000) // positive values need no conversion
return val;
if (val == 0x80000000) // Complement-and-addition will overflow, so special case this
return -0x80000000; // aka INT_MIN
else
return -(int)(~val + 1);
This assumes that your ints are represented with 32-bit twos-complement representation (or have similar range). It does not rely on any undefined behavior related to signed integer overflow (note that the behavior of unsigned integer overflow is well-defined - although that should not happen here either!).
Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are no twos-complement, you may have overflow issues on certain values (for example, on a ones-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-twos-complement machines are very rare today, so this is unlikely to be a problem.
Here's another solution that worked for me:
if (val <= INT_MAX) {
return static_cast<int>(val);
}
else {
int ret = static_cast<int>(val & ~INT_MIN);
return ret | INT_MIN;
}
If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.
C++20 will have std::bit_cast that copies bits verbatim:
#include <bit>
#include <cassert>
#include <iostream>
int main()
{
int i = -42;
auto u = std::bit_cast<unsigned>(i);
// Prints 4294967254 on two's compliment platforms where int is 32 bits
std::cout << u << "\n";
auto roundtripped = std::bit_cast<int>(u);
assert(roundtripped == i);
std::cout << roundtripped << "\n"; // Prints -42
return 0;
}
cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.
unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1
Сontrariwise:
int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones

Packing 32bit floats into 30 bits (c++)

Here are the goals I'm trying to achieve:
I need to pack 32 bit IEEE floats into 30 bits.
I want to do this by decreasing the size of mantissa by 2 bits.
The operation itself should be as fast as possible.
I'm aware that some precision will be lost, and this is acceptable.
It would be an advantage, if this operation would not ruin special cases like SNaN, QNaN, infinities, etc. But I'm ready to sacrifice this over speed.
I guess this questions consists of two parts:
1) Can I just simply clear the least significant bits of mantissa? I've tried this, and so far it works, but maybe I'm asking for trouble... Something like:
float f;
int packed = (*(int*)&f) & ~3;
// later
f = *(float*)&packed;
2) If there are cases where 1) will fail, then what would be the fastest way to achieve this?
Thanks in advance
You actually violate the strict aliasing rules (section 3.10 of the C++ standard) with these reinterpret casts. This will probably blow up in your face when you turn on the compiler optimizations.
C++ standard, section 3.10 paragraph 15 says:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Specifically, 3.10/15 doesn't allow us to access a float object via an lvalue of type unsigned int. I actually got bitten myself by this. The program I wrote stopped working after turning on optimizations. Apparently, GCC didn't expect an lvalue of type float to alias an lvalue of type int which is a fair assumption by 3.10/15. The instructions got shuffled around by the optimizer under the as-if rule exploiting 3.10/15 and it stopped working.
Under the following assumptions
float really corresponds to a 32bit IEEE-float,
sizeof(float)==sizeof(int)
unsigned int has no padding bits or trap representations
you should be able to do it like this:
/// returns a 30 bit number
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return r >> 2;
}
float unpack_float(unsigned int x) {
x <<= 2;
float r;
std::memcpy(&r,&x,sizeof r);
return r;
}
This doesn't suffer from the "3.10-violation" and is typically very fast. At least GCC treats memcpy as an intrinsic function. In case you don't need the functions to work with NaNs, infinities or numbers with extremely high magnitude you can even improve accuracy by replacing "r >> 2" with "(r+1) >> 2":
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return (r+1) >> 2;
}
This works even if it changes the exponent due to a mantissa overflow because the IEEE-754 coding maps consecutive floating point values to consecutive integers (ignoring +/- zero). This mapping actually approximates a logarithm quite well.
Blindly dropping the 2 LSBs of the float may fail for small number of unusual NaN encodings.
A NaN is encoded as exponent=255, mantissa!=0, but IEEE-754 doesn't say anything about which mantiassa values should be used. If the mantissa value is <= 3, you could turn a NaN into an infinity!
You should encapsulate it in a struct, so that you don't accidentally mix the usage of the tagged float with regular "unsigned int":
#include <iostream>
using namespace std;
struct TypedFloat {
private:
union {
unsigned int raw : 32;
struct {
unsigned int num : 30;
unsigned int type : 2;
};
};
public:
TypedFloat(unsigned int type=0) : num(0), type(type) {}
operator float() const {
unsigned int tmp = num << 2;
return reinterpret_cast<float&>(tmp);
}
void operator=(float newnum) {
num = reinterpret_cast<int&>(newnum) >> 2;
}
unsigned int getType() const {
return type;
}
void setType(unsigned int type) {
this->type = type;
}
};
int main() {
const unsigned int TYPE_A = 1;
TypedFloat a(TYPE_A);
a = 3.4;
cout << a + 5.4 << endl;
float b = a;
cout << a << endl;
cout << b << endl;
cout << a.getType() << endl;
return 0;
}
I can't guarantee its portability though.
How much precision do you need? If 16-bit float is enough (sufficient for some types of graphics), then ILM's 16-bit float ("half"), part of OpenEXR is great, obeys all kinds of rules (http://www.openexr.com/), and you'll have plenty of space left over after you pack it into a struct.
On the other hand, if you know the approximate range of values they're going to take, you should consider fixed point. They're more useful than most people realize.
I can't select any of the answers as the definite one, because most of them have valid information, but not quite what I was looking for. So I'll just summarize my conclusions.
The method for conversion I've posted in my question's part 1) is clearly wrong by C++ standard, so other methods to extract float's bits should be used.
And most important... as far as I understand from reading the responses and other sources about IEEE754 floats, it's ok to drop the least significant bits from mantissa. It will mostly affect only precision, with one exception: sNaN. Since sNaN is represented by exponent set to 255, and mantissa != 0, there can be situation where mantissa would be <= 3, and dropping last two bits would convert sNaN to +/-Infinity. But since sNaN are not generated during floating point operations on CPU, its safe under controlled environment.