Mapping a structure to uint64_t - c++

What is the right way to convert the structure below to uint64_t?
struct Data
{
uint64_t sign : 1;
uint64_t exp : 4;
uint64_t man : 8;
};
static_assert(sizeof(Data) == sizeof(uint64_t));
The obvious one is
Data data;
uint64_t n = *(reinterpret_cast<const uint64_t*>(&data));
but it cannot be used in a constexpr context, and GCC produces the following warning:
dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
EDIT1:
The value of the resulting uint64_t may differ between compilers, but the Data structure should have the same value when I convert it back from uint64_t.
So, more precisely, I need:
Data data;
uint64_t n = convert(data);
Data data2 = convert_back(n);
static_assert(data == data2);

You can std::memcpy the structure into a uint64_t. std::memcpy is a legal way to perform type punning; it is allowed here because your structure is trivially copyable.
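As a minimal sketch of the std::memcpy approach, the block below uses the names convert and convert_back from the question; roundTripExample is a hypothetical helper added only for illustration. The integer it produces depends on the compiler's bit-field layout, so only the round trip is checked.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Data {
    std::uint64_t sign : 1;
    std::uint64_t exp  : 4;
    std::uint64_t man  : 8;
};
static_assert(sizeof(Data) == sizeof(std::uint64_t), "Data must be 64 bits wide");
static_assert(std::is_trivially_copyable<Data>::value,
              "memcpy requires a trivially copyable type");

// Copy the object representation of Data into an integer.
inline std::uint64_t convert(const Data& d) {
    std::uint64_t n = 0;
    std::memcpy(&n, &d, sizeof n);
    return n;
}

// Copy the integer's bytes back into a Data object.
inline Data convert_back(std::uint64_t n) {
    Data d{};
    std::memcpy(&d, &n, sizeof d);
    return d;
}

// Hypothetical helper: the fields survive the round trip,
// whatever the intermediate integer happens to be.
inline void roundTripExample() {
    Data d{};
    d.sign = 1;
    d.exp = 9;
    d.man = 200;
    const Data back = convert_back(convert(d));
    assert(back.sign == 1 && back.exp == 9 && back.man == 200);
}
```

Because std::memcpy is not constexpr, this version only works at run time.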
In C++20 there is also std::bit_cast.
Unlike std::memcpy it's constexpr, but to make it work at compile-time I had to add uint64_t : 51; at the end of the struct.
Interestingly, Clang (unlike GCC and MSVC) refused to perform it at compile-time:
note: constexpr bit_cast involving bit-field is not yet supported

The conversion is not "obvious" because the exact layout of Data is implementation-defined (see bit field). Moreover, your cast via pointers breaks strict aliasing, as the warning suggests: there is no uint64_t stored at the address of data.
You can convert it like this:
constexpr uint64_t Data_2_uint(const Data& d){
return d.sign + (d.exp << 1) + (d.man << 5);
}
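The question also asks for the reverse conversion and a compile-time round trip. The sketch below assumes C++14 and the same bit positions as Data_2_uint above; uint_2_Data and same_fields are hypothetical names, not part of the original answer.

```cpp
#include <cstdint>

struct Data {
    std::uint64_t sign : 1;
    std::uint64_t exp  : 4;
    std::uint64_t man  : 8;
};

// Pack: sign in bit 0, exp in bits 1-4, man in bits 5-12.
constexpr std::uint64_t Data_2_uint(const Data& d) {
    return static_cast<std::uint64_t>(d.sign)
         | (static_cast<std::uint64_t>(d.exp) << 1)
         | (static_cast<std::uint64_t>(d.man) << 5);
}

// Unpack: the inverse of Data_2_uint (hypothetical name).
constexpr Data uint_2_Data(std::uint64_t n) {
    Data d{};
    d.sign = n & 0x1u;
    d.exp  = (n >> 1) & 0xFu;
    d.man  = (n >> 5) & 0xFFu;
    return d;
}

// Field-wise comparison, since Data has no operator== (hypothetical name).
constexpr bool same_fields(const Data& a, const Data& b) {
    return a.sign == b.sign && a.exp == b.exp && a.man == b.man;
}

constexpr Data sample{1, 9, 200};
static_assert(Data_2_uint(sample) == (1u | (9u << 1) | (200u << 5)),
              "fields land at the documented bit positions");
static_assert(same_fields(uint_2_Data(Data_2_uint(sample)), sample),
              "round trip preserves every field");
```

Since the bit positions are chosen by the code rather than by the compiler, the resulting integer is the same on every implementation.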

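If you are looking for an implicit conversion, you may use a union to access the same data under a different alias. Note that anonymous structs are a compiler extension in C++, and that reading a union member other than the one last written is undefined behavior in standard C++.
union Data {
struct {
uint64_t sign : 1;
uint64_t exp : 4;
uint64_t man : 8;
};
uint64_t data_as_uint64;
};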
To make sure you don't have memory representation problems with padding of the fields, you should use #pragma pack, and set the packing to 1
#pragma pack(push, 1)
union Data {
struct {
uint64_t sign : 1;
uint64_t exp : 4;
uint64_t man : 8;
};
uint64_t data_as_uint64;
};
#pragma pack(pop)
Notice that #pragma pack is compiler dependent, and you need to make sure your compiler supports it.

Related

Type Punning via constexpr union

I am maintaining an old code base, that is using a union of an integer type with a bit-field struct for type-punning. My compiler is VS2017. As an example, the code is similar to the following:
struct FlagsType
{
unsigned flag0 : 1;
unsigned flag1 : 1;
unsigned flag2 : 1;
};
union FlagsTypeUnion
{
unsigned flagsAsInt;
FlagsType flags;
};
bool isBitSet(unsigned flagNum, FlagsTypeUnion flags)
{
return ((1u << flagNum) & flags.flagsAsInt);
}
This code has a number of undefined behavior issues. Namely, it is hotly debated whether type punning is defined behavior or not, but on top of that, the implementation of packing bit-fields is implementation-defined. To address these issues, I would like to add static-assert statements to validate that the VS implementation enables using this type of approach. However, when I tried to add the following code, I get error C2131: expression did not evaluate to a constant.
union FlagsTypeUnion
{
unsigned flagsAsInt;
FlagsType flags;
constexpr FlagsTypeUnion(unsigned const f = 0) : flagsAsInt{ f } {}
};
static_assert(FlagsTypeUnion{ 1 }.flags.flag0,
"The code currently assumes bit-fields are packed from LSB to MSB");
Is there any way to add compile-time checks to verify the type-punning and bit-packing code works as the runtime code is assuming? Unfortunately, this code is spread throughout the code base, so changing the structures isn't really feasible.
You might use std::bit_cast (C++20):
struct FlagsType
{
unsigned flag0 : 1;
unsigned flag1 : 1;
unsigned flag2 : 1;
unsigned padding : 32 - 3; // Needed for gcc
};
static_assert(std::is_trivially_constructible_v<FlagsType>);
constexpr FlagsType makeFlagsType(bool flag0, bool flag1, bool flag2)
{
FlagsType res{};
res.flag0 = flag0;
res.flag1 = flag1;
res.flag2 = flag2;
return res;
}
static_assert(std::bit_cast<unsigned>(makeFlagsType(true, false, false)) == 1);
Clang does not support this (yet), though.
GCC requires the padding bits to be added explicitly for the constexpr check.
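Where C++20 is not available (the question mentions VS2017), one fallback is a run-time check performed once at startup. The sketch below is only an illustration of that idea: bitFieldsPackedFromLsb is a hypothetical name, and the check uses std::memcpy, so it cannot be evaluated in a static_assert.

```cpp
#include <cassert>
#include <cstring>

struct FlagsType {
    unsigned flag0 : 1;
    unsigned flag1 : 1;
    unsigned flag2 : 1;
};
static_assert(sizeof(FlagsType) == sizeof(unsigned),
              "the union-based code assumes both members have the same size");

// Hypothetical helper: returns true if flagN occupies bit N of the integer.
inline bool bitFieldsPackedFromLsb() {
    for (unsigned bit = 0; bit < 3; ++bit) {
        const unsigned raw = 1u << bit;
        FlagsType f;
        std::memcpy(&f, &raw, sizeof f);  // well-defined alternative to the union
        const unsigned seen[3] = { f.flag0, f.flag1, f.flag2 };
        for (unsigned i = 0; i < 3; ++i) {
            if (seen[i] != (i == bit ? 1u : 0u)) {
                return false;
            }
        }
    }
    return true;
}

// Typical use, once at program start:
//     assert(bitFieldsPackedFromLsb());
```

This verifies the bit-packing assumption on the platform the program actually runs on, but it says nothing about whether reading the inactive union member is well-defined.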

Is it possible to copy 4 uint8_t into 4 int16 in one instruction?

I wonder if it is possible to copy four uint8_t values stored in one uint32_t into proper places in uint64_t as fast as possible. I am looking for equivalent of:
union
{
struct {uint8_t a; uint8_t b; uint8_t c; uint8_t d;};
uint32_t whole;
} x32;
union
{
struct {int16_t a; int16_t b; int16_t c; int16_t d;};
uint64_t whole;
} x64;
x64.a=x32.a;
x64.b=x32.b;
x64.c=x32.c;
x64.d=x32.d;
The problem is: I cannot use MMX/SSE.
No. There's no other way to move the data and zero-extend it like you're doing.
Type punning through a union is not supported by the C++ standard. Instead, use ORs and shifts to compose the value. Correct code is more important than fast but broken code.
uint8_t a,b,c,d;
uint64_t whole;
whole = a | (uint64_t (b) << 1*16) | (uint64_t (c) << 2*16) | (uint64_t (d) << 3*16);
No, it is not possible: hardware is unlikely to provide such a (very specific) instruction.
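The shift-and-OR approach can be wrapped in a function that takes the packed uint32_t directly. The sketch below assumes C++14 and that a is the least significant byte of the 32-bit value; widenBytes and widenBytesSpread are hypothetical names. The second function is a different, well-known mask-and-shift technique (not one proposed in the answers) that spreads all four bytes in two steps instead of handling them one at a time.

```cpp
#include <cstdint>

// Straightforward version: move each byte into its own 16-bit lane.
constexpr std::uint64_t widenBytes(std::uint32_t v) {
    return  (static_cast<std::uint64_t>(v)       & 0xFFu)
         | ((static_cast<std::uint64_t>(v >> 8)  & 0xFFu) << 16)
         | ((static_cast<std::uint64_t>(v >> 16) & 0xFFu) << 32)
         | ((static_cast<std::uint64_t>(v >> 24) & 0xFFu) << 48);
}

// Spread version: first separate the two 16-bit halves, then the bytes.
constexpr std::uint64_t widenBytesSpread(std::uint32_t v) {
    std::uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFull;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFull;
    return x;
}

static_assert(widenBytes(0x04030201u) == 0x0004000300020001ull,
              "each byte is zero-extended into a 16-bit lane");
static_assert(widenBytesSpread(0x04030201u) == 0x0004000300020001ull,
              "both versions agree");
```

Neither version is a single instruction; whether one is faster than the other depends on the compiler and target, so measure before choosing.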

Getting entire value from bitfields

I wish to create a Block struct for use in a voxel game I am building (just background context), however I have run into issues with my saving and loading.
I can either represent a block as a single Uint16 and shift the bits to get the different elements such as blockID and health, or I can use a bitfield such as the one below:
struct Block
{
Uint16 id : 8;
Uint16 health : 6;
Uint16 visible : 1;
Uint16 structural : 1;
};
With the first method, when I wish to save the Block data I can simply convert the value of the Uint16 into a hex value and write it to a file. With loading I can simply read and convert the number back, then go back to reading the individual bits with manual bit shifting.
My issue is that I cannot figure out how to get the whole value of the Uint16 I am using with the bitfields method, which means I cannot save the block data as a single hex value.
So, the question is how do I go about getting the actual single Uint16 stored inside my block struct that is made up from the different bit fields. If it is not possible then that is fine, as I have already stated my manual bit shifting approach works just fine. I just wanted to profile to see which method of storing and modifying data is faster really.
If I have missed out a key detail or there is any extra information you need to help me out here, by all means do ask.
A union is probably the cleanest way:
#include <iostream>
typedef unsigned short Uint16;
struct S {
Uint16 id : 8;
Uint16 health : 6;
Uint16 visible : 1;
Uint16 structural : 1;
};
union U {
Uint16 asInt;
S asStruct;
};
int main() {
U u;
u.asStruct.id = 0xAB;
u.asStruct.health = 0xF;
u.asStruct.visible = 1;
u.asStruct.structural = 1;
std::cout << std::hex << u.asInt << std::endl;
}
This prints out cfab.
Update:
After further consideration and reading more deeply about this, I have decided that any kind of type punning is bad. Instead, I would recommend just biting the bullet and explicitly doing the bit-twiddling to construct your value for serialization:
#include <iostream>
typedef unsigned short Uint16;
struct Block
{
Uint16 id : 8;
Uint16 health : 6;
Uint16 visible : 1;
Uint16 structural : 1;
operator Uint16() {
return structural | visible << 1 | health << 2 | id << 8;
}
};
int main() {
Block b{0xAB, 0xF, 1, 1};
std::cout << std::hex << Uint16(b) << std::endl;
}
This has the further bonus that it prints ab3f, with the fields appearing from most to least significant bit in the same order as the initializer.
If you are worried about performance, instead of using the operator member function you could have a function that the compiler optimizes away:
...
constexpr Uint16 serialize(const Block& b) {
return b.structural | b.visible << 1 | b.health << 2 | b.id << 8;
}
int main() {
Block b{0xAB, 0xF, 1, 1};
std::cout << std::hex << serialize(b) << std::endl;
}
And finally if speed is more important than memory, I would recommend getting rid of the bit fields:
struct Block
{
Uint16 id;
Uint16 health;
Uint16 visible;
Uint16 structural;
};
There are at least two methods for what you want: bit shifting and casting.
Bit Shifting
You can build a uint16_t from your structure by shifting the bit fields into a uint16_t:
uint16_t halfword;
struct Bit_Fields my_struct;
halfword = my_struct.id << 8;
halfword = halfword | (my_struct.health << 2);
halfword = halfword | (my_struct.visible << 1);
halfword = halfword | (my_struct.structural);
Casting
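Another method is to copy the bytes of the structure into a uint16_t. A direct cast such as (uint16_t) my_struct does not compile, because a structure cannot be converted to an integer, so use memcpy (from <string.h>) instead:
uint16_t halfword;
struct Bit_Fields my_struct;
memcpy(&halfword, &my_struct, sizeof halfword);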
Endianness
One issue of concern is endianness, the byte ordering of multi-byte values. This may affect where the bits lie within the 16-bit unit.
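One way to sidestep byte-order differences when saving to a file is to split the 16-bit value into bytes in a fixed order yourself. The sketch below assumes C++14 and picks big-endian order arbitrarily; toBigEndian and fromBigEndian are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Split a 16-bit value into two bytes, most significant byte first.
constexpr std::array<std::uint8_t, 2> toBigEndian(std::uint16_t v) {
    return {{ static_cast<std::uint8_t>(v >> 8),
              static_cast<std::uint8_t>(v & 0xFF) }};
}

// Rebuild the 16-bit value from two bytes stored in that order.
constexpr std::uint16_t fromBigEndian(const std::array<std::uint8_t, 2>& bytes) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}

constexpr auto encoded = toBigEndian(0xABCD);
static_assert(encoded[0] == 0xAB && encoded[1] == 0xCD,
              "high byte is stored first");
static_assert(fromBigEndian(encoded) == 0xABCD,
              "decoding restores the original value");
```

Because the shifts operate on values rather than on memory, the result is the same regardless of the byte order of the machine running the code.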
Living on the edge (of undefined-behavior)..
The naive solution would be to reinterpret_cast a reference to the object to the underlying type of your bit-field, abusing the fact that the first non-static data-member of a standard-layout class is located at the same address as the object itself.
struct A {
uint16_t id : 8;
uint16_t health : 6;
uint16_t visible : 1;
uint16_t structural : 1;
};
A a { 0, 0, 0, 1 };
uint16_t x = reinterpret_cast<uint16_t const&> (a);
The above might look accurate, and it will often (not always) yield the expected result - but it suffers from two big problems:
The allocation of bit-fields within an object is implementation-defined, and;
the class type must be standard-layout.
There is nothing saying that the bit-fields will, physically, be stored in the order you declare them, and even if that was the case a compiler might insert padding between every bit-field (as this is allowed).
To sum things up: how bit-fields end up in memory is highly implementation-defined, and reasoning about the behavior requires you to look into your implementation's documentation on the matter.
What about using a union?
Accessing inactive union member - undefined?
Recommendation
Stick with the bit-fiddling approach, unless you can absolutely prove that every implementation on which the code is run handles it the way you would want it to.
What does the standard (N4296) say?
9.6p1 Bit-fields [class.bit]
[...] Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. [...]
9.2p20 Classes [class]
If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member. [...]
This isn't a good usage of bit fields (and really, there are very few).
There is no guarantee that the order of your bit fields will be the same as the order they're declared in; it could change between builds of your application.
You'll have to manually store your members in a uint16_t using the shift and bitwise-or operators. As a general rule, you should never just dump or blindly copy data when dealing with external storage; you should manually serialize/deserialize it, to ensure it's in the format you expect.
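A minimal sketch of such manual serialization and deserialization follows, assuming C++14. The bit positions are chosen here for illustration (id in bits 8-15, health in bits 2-7, visible in bit 1, structural in bit 0), and serialize and deserialize are hypothetical names.

```cpp
#include <cstdint>

struct Block {
    std::uint16_t id         : 8;
    std::uint16_t health     : 6;
    std::uint16_t visible    : 1;
    std::uint16_t structural : 1;
};

// Pack the fields at fixed bit positions, independent of compiler layout.
constexpr std::uint16_t serialize(const Block& b) {
    return static_cast<std::uint16_t>(
        (b.id << 8) | (b.health << 2) | (b.visible << 1) | b.structural);
}

// Extract the fields from the same bit positions.
constexpr Block deserialize(std::uint16_t v) {
    Block b{};
    b.id         = static_cast<std::uint16_t>(v >> 8);
    b.health     = static_cast<std::uint16_t>((v >> 2) & 0x3F);
    b.visible    = static_cast<std::uint16_t>((v >> 1) & 0x1);
    b.structural = static_cast<std::uint16_t>(v & 0x1);
    return b;
}

static_assert(serialize(Block{0xAB, 0xF, 1, 1}) == 0xAB3F,
              "fields land at the chosen bit positions");
static_assert(serialize(deserialize(0xAB3F)) == 0xAB3F,
              "round trip preserves the value");
```

The struct can keep its bit-fields for in-memory use; only the file format is pinned down by the shifts.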
You can use union:
typedef union
{
struct
{
Uint16 id : 8;
Uint16 health : 6;
Uint16 visible : 1;
Uint16 structural : 1;
} Bits;
Uint16 Val;
} TMyStruct;

Specifying bit size of array elements in a struct

Now I have a struct looking like this:
struct Struct {
uint8_t val1 : 2;
uint8_t val2 : 2;
uint8_t val3 : 2;
uint8_t val4 : 2;
} __attribute__((packed));
Is there a way to make all the vals a single array? The point is not space taken, but the location of all the values: I need them to be in memory without padding, and each occupying 2 bits. It's not important to have array, any other data structure with simple access by index will be ok, and not matter if it's plain C or C++. Read/write performance is important - it should be same (similar to) as simple bit operations, which are used now for indexed access.
Update:
What I want exactly can be described as
struct Struct {
uint8_t val[4] : 2;
} __attribute__((packed));
No, C only supports bitfields as structure members, and you cannot have arrays of them. I don't think you can do:
struct twobit {
uint8_t val : 2;
} __attribute__((packed));
and then do:
struct twobit array[32];
and expect array to consist of 32 2-bit integers, i.e. 8 bytes. A single char in memory cannot contain parts of different structs, I think. I don't have the paragraph and verse handy right now though.
You're going to have to do it yourself, typically using macros and/or inline functions to do the indexing.
You have to manually do the bit stuff that's going on right now:
constexpr uint8_t get_mask(const uint8_t n)
{
return ~(((uint8_t)0x3)<<(2*n));
}
struct Struct2
{
uint8_t val;
inline void set_val(uint8_t v,uint8_t n)
{
val = (val&get_mask(n))|(v<<(2*n));
}
inline uint8_t get_val(uint8_t n)
{
return (val&~get_mask(n))>>(2*n);
}
//note, return type only, assignment WONT work.
inline uint8_t operator[](uint8_t n)
{
return get_val(n);
}
};
Note that you may be able to get better performance if you use actual assembly instructions.
Also note that, (almost) no matter what, a uint8_t [4] will have better performance than this, and a processor aligned type (uint32_t) may have even better performance.
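If assignment through operator[] is wanted, one common technique is to return a small proxy object instead of a plain value. The sketch below is an assumption-laden illustration rather than part of the answer above: TwoBitArray, Ref, raw and twoBitArrayExample are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: four 2-bit values packed into one byte.
class TwoBitArray {
public:
    // Proxy standing in for a "reference" to one 2-bit slot.
    class Ref {
    public:
        Ref(std::uint8_t& byte, unsigned shift) : byte_(byte), shift_(shift) {}
        Ref(const Ref&) = default;

        // Write: clear the slot, then store the low two bits of v.
        Ref& operator=(std::uint8_t v) {
            byte_ = static_cast<std::uint8_t>(
                (byte_ & ~(0x3u << shift_)) | ((v & 0x3u) << shift_));
            return *this;
        }
        // Copy the value (not the binding) from another slot.
        Ref& operator=(const Ref& other) {
            return *this = static_cast<std::uint8_t>(other);
        }
        // Read: extract the slot's two bits.
        operator std::uint8_t() const {
            return static_cast<std::uint8_t>((byte_ >> shift_) & 0x3u);
        }

    private:
        std::uint8_t& byte_;
        unsigned shift_;
    };

    Ref operator[](unsigned n) { return Ref(bits_, 2 * n); }
    std::uint8_t operator[](unsigned n) const {
        return static_cast<std::uint8_t>((bits_ >> (2 * n)) & 0x3u);
    }
    std::uint8_t raw() const { return bits_; }

private:
    std::uint8_t bits_ = 0;
};

// Hypothetical usage: slot n occupies bits 2n and 2n+1 of the byte.
inline void twoBitArrayExample() {
    TwoBitArray a;
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    assert(a.raw() == 0x39);
    a[3] = a[1];
    assert(a.raw() == 0xB9);
}
```

The proxy adds no storage to the array itself, and with optimization enabled compilers can usually inline it down to the same mask-and-shift operations.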

Warning: cast increases required alignment

I've recently been working on a platform for which a legacy codebase issues a large number of "cast increases required alignment to N" warnings, where N is the size of the target of the cast.
struct Message
{
int32_t id;
int32_t type;
int8_t data[16];
};
int32_t GetMessageInt(const Message& m)
{
return *reinterpret_cast<const int32_t*>(&m.data[0]);
}
Hopefully it's obvious that a "real" implementation would be a bit more complex, but the basic point is that I've got data coming from somewhere, I know that it's aligned (because I need the id and type to be aligned), and yet I get the message that the cast is increasing the alignment, in the example case, to 4.
Now I know that I can suppress the warning with an argument to the compiler, and I know that I can cast the bit inside the parentheses to void* first, but I don't really want to go through every bit of code that needs this sort of manipulation (there's a lot because we load a lot of data off of disk, and that data comes in as char buffers so that we can easily pointer-advance), but can anyone give me any other thoughts on this problem? I mean, to me it seems like such an important and common option that you wouldn't want to warn, and if there is actually the possibility of doing it wrong then suppressing the warning isn't going to help. Finally, can't the compiler know as I do how the object in question is actually aligned in the structure, so it should be able to not worry about the alignment on that particular object unless it got bumped a byte or two?
One possible alternative might be:
int32_t GetMessageInt(const Message& m)
{
int32_t value;
memcpy(&value, &(m.data[0]), sizeof(int32_t));
return value;
}
For x86 architecture, the alignment isn't going to matter that much, it's more a performance issue that isn't really relevant for the code you have provided. For other architectures (eg MIPS) misaligned accesses cause CPU exceptions.
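Since the question mentions many call sites that read values out of char buffers, the memcpy approach can be factored into one small template. This is only a sketch: loadUnaligned and loadUnalignedExample are hypothetical names, and the value is read in the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: read a T from an address with no alignment guarantee.
template <typename T>
T loadUnaligned(const void* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be read with memcpy");
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}

// Hypothetical usage: read an int32_t stored at an odd offset in a byte buffer.
inline void loadUnalignedExample() {
    std::int8_t buffer[8] = {};
    const std::int32_t written = 0x12345678;
    std::memcpy(buffer + 1, &written, sizeof written);  // odd offset on purpose
    assert(loadUnaligned<std::int32_t>(buffer + 1) == written);
}
```

No pointer to int32_t is ever formed, so there is no cast for the compiler to warn about, and mainstream compilers typically turn a fixed-size memcpy like this into a plain load where the target allows it.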
OK, here's another alternative:
struct Message
{
int32_t id;
int32_t type;
union
{
int8_t data[16];
int32_t data_as_int32[16 * sizeof(int8_t) / sizeof(int32_t)];
// Others as required
};
};
int32_t GetMessageInt(const Message& m)
{
return m.data_as_int32[0];
}
Here's a variation on the above that includes the suggestions from cpstubing06:
template <size_t N>
struct Message
{
int32_t id;
int32_t type;
union
{
int8_t data[N];
int32_t data_as_int32[N * sizeof(int8_t) / sizeof(int32_t)];
// Others as required
};
static_assert((N * sizeof(int8_t) % sizeof(int32_t)) == 0,
"N is not a multiple of sizeof(int32_t)");
};
int32_t GetMessageInt(const Message<16>& m)
{
return m.data_as_int32[0];
}
// Runtime size checks
template <size_t N>
void CheckSize()
{
assert(sizeof(Message<N>) == N * sizeof(int8_t) + 2 * sizeof(int32_t));
}
void CheckSizes()
{
CheckSize<8>();
CheckSize<16>();
// Others as required
}
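Because sizeof is a constant expression, the size checks could also be made at compile time instead of at run time. The sketch below assumes C++11 or later and repeats the Message template so that it is self-contained; hasExpectedSize is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

template <std::size_t N>
struct Message {
    std::int32_t id;
    std::int32_t type;
    union {
        std::int8_t  data[N];
        std::int32_t data_as_int32[N / sizeof(std::int32_t)];
    };
    static_assert(N % sizeof(std::int32_t) == 0,
                  "N is not a multiple of sizeof(int32_t)");
};

// Hypothetical helper: true if Message<N> has no unexpected padding.
template <std::size_t N>
constexpr bool hasExpectedSize() {
    return sizeof(Message<N>) == N + 2 * sizeof(std::int32_t);
}

static_assert(hasExpectedSize<8>(),  "unexpected padding in Message<8>");
static_assert(hasExpectedSize<16>(), "unexpected padding in Message<16>");
```

A failed check then stops the build rather than surfacing only when CheckSizes happens to be called.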