Warning: cast increases required alignment

Warning: cast increases required alignment - c++

I'm recently working on this platform for which a legacy codebase issues a large number of "cast increases required alignment to N" warnings, where N is the size of the target of the cast.
struct Message
{
int32_t id;
int32_t type;
int8_t data[16];
};
int32_t GetMessageInt(const Message& m)
{
return *reinterpret_cast<int32_t*>(&data[0]);
}
Hopefully it's obvious that a "real" implementation would be a bit more complex, but the basic point is that I've got data coming from somewhere, I know that it's aligned (because I need the id and type to be aligned), and yet I get the message that the cast is increasing the alignment, in the example case, to 4.
Now I know that I can suppress the warning with an argument to the compiler, and I know that I can cast the bit inside the parentheses to void* first, but I don't really want to go through every bit of code that needs this sort of manipulation (there's a lot because we load a lot of data off of disk, and that data comes in as char buffers so that we can easily pointer-advance), but can anyone give me any other thoughts on this problem? I mean, to me it seems like such an important and common option that you wouldn't want to warn, and if there is actually the possibility of doing it wrong then suppressing the warning isn't going to help. Finally, can't the compiler know as I do how the object in question is actually aligned in the structure, so it should be able to not worry about the alignment on that particular object unless it got bumped a byte or two?

One possible alternative might be:
int32_t GetMessageInt(const Message& m)
{
int32_t value;
memcpy(&value, &(data[0]), sizeof(int32_t));
return value;
}
For x86 architecture, the alignment isn't going to matter that much, it's more a performance issue that isn't really relevant for the code you have provided. For other architectures (eg MIPS) misaligned accesses cause CPU exceptions.
OK, here's another alternative:
struct Message
{
int32_t id;
int32_t type;
union
{
int8_t data[16];
int32_t data_as_int32[16 * sizeof(int8_t) / sizeof(int32_t)];
// Others as required
};
};
int32_t GetMessageInt(const Message& m)
{
return m.data_as_int32[0];
}
Here's variation on the above that includes the suggestions from cpstubing06:
template <size_t N>
struct Message
{
int32_t id;
int32_t type;
union
{
int8_t data[N];
int32_t data_as_int32[N * sizeof(int8_t) / sizeof(int32_t)];
// Others as required
};
static_assert((N * sizeof(int8_t) % sizeof(int32_t)) == 0,
"N is not a multiple of sizeof(int32_t)");
};
int32_t GetMessageInt(const Message<16>& m)
{
return m.data_as_int32[0];
}
// Runtime size checks
template <size_t N>
void CheckSize()
{
assert(sizeof(Message<N>) == N * sizeof(int8_t) + 2 * sizeof(int32_t));
}
void CheckSizes()
{
CheckSize<8>();
CheckSize<16>();
// Others as required
}

Related

Reserving a bit for discriminating the type of a union in C++

I currently have code that looks like this:
union {
struct {
void* buffer;
uint64_t n : 63;
uint64_t flag : 1;
} a;
struct {
unsigned char buffer[15];
unsigned char n : 7;
unsigned char flag : 1;
} b;
} data;
It is part of an attempted implementation of a data structure that does small-size optimization. Although it works on my machine with the compiler I am using, I am aware that there is no guarantee that the two flag bits from each of the structs actually end up in the same bit. Even if they did, it would still technically be undefined behavior to read it from the struct that wasn't most recently written. I would like to use this bit to discriminate between which of the two types is currently stored.
Is there a safe and portable way to achieve the same thing without increasing the size of the union? For our purpose, it can not be larger than 16 bytes.
If not, could it be achieved by sacrificing an entire byte (of n in the first struct and of buffer in the second), instead of a bit?

Is there any portable way to ensure a struct is defined without padding bytes using c++11? [duplicate]

This question already has answers here:
Compile-time check to make sure that there is no padding anywhere in a struct
(4 answers)
Closed 3 years ago.
Lets consider the following task:
My C++ module as part of an embedded system receives 8 bytes of data, like: uint8_t data[8].
The value of the first byte determines the layout of the rest (20-30 different). In order to get the data effectively, I would create different structs for each layout and put each to a union and read the data directly from the address of my input through a pointer like this:
struct Interpretation_1 {
uint8_t multiplexer;
uint8_t timestamp;
uint32_t position;
uint16_t speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
...
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";
The problem I have is, the compiler can insert padding bytes to the interpretation structs and this kills my idea. I know I can use
with gcc: struct MyStruct{} __attribute__((__packed__));
with MSVC: I can use #pragma pack(push, 1) MyStruct{}; #pragma pack(pop)
with clang: ? (I could check it)
But is there any portable way to achieve this? I know c++11 has e.g. alignas for alignment control, but can I use it for this? I have to use c++11 but I would be just interested if there is a better solution with later version of c++.

But is there any portable way to achieve this?
No, there is no (standard) way to "make" a type that would have padding to not have padding in C++. All objects are aligned at least as much as their type requires and if that alignment doesn't match with the previous sub objects, then there will be padding and that is unavoidable.
Furthermore, there is another problem: You're accessing through a reinterpreted pointed that doesn't point to an object of compatible type. The behaviour of the program is undefined.
We can conclude that classes are not generally useful for representing arbitrary binary data. The packed structures are non-standard, and they also aren't compatible across different systems with different representations for integers (byte endianness).
There is a way to check whether a type contains padding: Compare the size of the sub objects to the size of the complete object, and do this recursively to each member. If the sizes don't match, then there is padding. This is quite tricky however because C++ has minimal reflection capabilities, so you need to resort either hard coding or meta programming.
Given such check, you can make the compilation fail on systems where the assumption doesn't hold.
Another handy tool is std::has_unique_object_representations (since C++17) which will always be false for all types that have padding. But note that it will also be false for types that contain floats for example. Only types that return true can be meaningfully compared for equality with std::memcmp.

Reading from unaligned memory is undefined behavior in C++. In other words, the compiler is allowed to assume that every uint32_t is located at a alignof(uint32_t)-byte boundary and every uint16_t is located at a alignof(uint16_t)-byte boundary. This means that if you somehow manage to pack your bytes portably, doing interpreter->movement.position will still trigger undefined behaviour.
(In practice, on most architectures, unaligned memory access will still work, but albeit incur a performance penalty.)
You could, however, write a wrapper, like how std::vector<bool>::operator[] works:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>
template <typename T>
struct unaligned_wrapper {
static_assert(std::is_trivial<T>::value);
std::aligned_storage_t<sizeof(T), 1> buf;
operator T() const noexcept {
T ret;
memcpy(&ret, &buf, sizeof(T));
return ret;
}
unaligned_wrapper& operator=(T t) noexcept {
memcpy(&buf, &t, sizeof(T));
return *this;
}
};
struct Interpretation_1 {
unaligned_wrapper<uint8_t> multiplexer;
unaligned_wrapper<uint8_t> timestamp;
unaligned_wrapper<uint32_t> position;
unaligned_wrapper<uint16_t> speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
int main(){
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << interpreter->movement.position << "\n";
}
This would ensure that every read or write to the unaligned integer is transformed to a bytewise memcpy, which does not have any alignment requirement. There might be a performance penalty for this on architectures with the ability to access unaligned memory quickly, but it would work on any conforming compiler.

How can I refer to sub-elements in a particular variable?

My problem is as follows: I have a 64 bit variable, of type uint64_t (so I know it's specified to be at least 64 bits wide).
I want to be able to refer to different parts of it, for example breaking it down into two uint32_ts, four uint16_ts or eight uint8_ts. Is there a standards compliant way to do it that doesn't rely on undefined behavior?
My approach is as follows:
class Buffer
{
uint64_t m_64BitBuffer;
public:
uint64_t & R64() { return m_64BitBuffer; }
uint32_t & R32(R32::Part part) { return *(reinterpret_cast<uint32_t*>(&m_64BitBuffer)+part); }
uint16_t & R16(R16::Part part) { return *(reinterpret_cast<uint16_t*>(&m_64BitBuffer)+part); }
uint8_t & R8(R8::Part part) { return *(reinterpret_cast<uint8_t*>(&m_64BitBuffer)+part); }
};
Where R32::Part, R16::Part and R8::Part are enums that define values between 0 and 1, 0 and 3 and 0 and 7 respectively.
I imagine this should be ok. There should be no issues with alignment, for example. I'd like to know if I'm breaking any rules, and if so, how to do this properly.

Type-punning through a union is allowed by some compilers, so you could simply have the following anonymous union member:
union {
uint64_t val;
struct { uint32_t as32[2]; };
struct { uint16_t as16[4]; };
struct { uint8_t as8[8]; };
} u;
Access to each part is as easy as reading from the appropriate member.

Specifying bit size of array elements in a struct

Now I have a struct looking like this:
struct Struct {
uint8_t val1 : 2;
uint8_t val2 : 2;
uint8_t val3 : 2;
uint8_t val4 : 2;
} __attribute__((packed));
Is there a way to make all the vals a single array? The point is not space taken, but the location of all the values: I need them to be in memory without padding, and each occupying 2 bits. It's not important to have array, any other data structure with simple access by index will be ok, and not matter if it's plain C or C++. Read/write performance is important - it should be same (similar to) as simple bit operations, which are used now for indexed access.
Update:
What I want exactly can be described as
struct Struct {
uint8_t val[4] : 2;
} __attribute__((packed));

No, C only supports bitfields as structure members, and you cannot have arrays of them. I don't think you can do:
struct twobit {
uint8_t val : 2;
} __attribute__((packed));
and then do:
struct twobit array[32];
and expect array to consist of 32 2-bit integers, i.e. 8 bytes. A single char in memory cannot contain parts of different structs, I think. I don't have the paragraph and verse handy right now though.
You're going to have to do it yourself, typically using macros and/or inline functions to do the indexing.

You have to manually do the bit stuff that's going on right now:
constexpr uint8_t get_mask(const uint8_t n)
{
return ~(((uint8_t)0x3)<<(2*n));
}
struct Struct2
{
uint8_t val;
inline void set_val(uint8_t v,uint8_t n)
{
val = (val&get_mask(n))|(v<<(2*n));
}
inline uint8_t get_val(uint8_t n)
{
return (val&~get_mask(n))>>(2*n);
}
//note, return type only, assignment WONT work.
inline uint8_t operator[](uint8_t n)
{
return get_val(n);
}
};
Note that you may be able to get better performance if you use actual assembly commands.
Also note that, (almost) no matter what, a uint8_t [4] will have better performance than this, and a processor aligned type (uint32_t) may have even better performance.

When is it worthwhile to use bit fields?

Is it worthwhile using C's bit-field implementation? If so, when is it ever used?
I was looking through some emulator code and it looks like the registers for the chips are not being implemented using bit fields.
Is this something that is avoided for performance reasons (or some other reason)?
Are there still times when bit-fields are used? (ie firmware to put on actual chips, etc)

Bit-fields are typically only used when there's a need to map structure fields to specific bit slices, where some hardware will be interpreting the raw bits. An example might be assembling an IP packet header. I can't see a compelling reason for an emulator to model a register using bit-fields, as it's never going to touch real hardware!
Whilst bit-fields can lead to neat syntax, they're pretty platform-dependent, and therefore non-portable. A more portable, but yet more verbose, approach is to use direct bitwise manipulation, using shifts and bit-masks.
If you use bit-fields for anything other than assembling (or disassembling) structures at some physical interface, performance may suffer. This is because every time you read or write from a bit-field, the compiler will have to generate code to do the masking and shifting, which will burn cycles.

One use for bitfields which hasn't yet been mentioned is that unsigned bitfields provide arithmetic modulo a power-of-two "for free". For example, given:
struct { unsigned x:10; } foo;
arithmetic on foo.x will be performed modulo 210 = 1024.
(The same can be achieved directly by using bitwise & operations, of course - but sometimes it might lead to clearer code to have the compiler do it for you).

FWIW, and looking only at the relative performance question - a bodgy benchmark:
#include <time.h>
#include <iostream>
struct A
{
void a(unsigned n) { a_ = n; }
void b(unsigned n) { b_ = n; }
void c(unsigned n) { c_ = n; }
void d(unsigned n) { d_ = n; }
unsigned a() { return a_; }
unsigned b() { return b_; }
unsigned c() { return c_; }
unsigned d() { return d_; }
volatile unsigned a_:1,
b_:5,
c_:2,
d_:8;
};
struct B
{
void a(unsigned n) { a_ = n; }
void b(unsigned n) { b_ = n; }
void c(unsigned n) { c_ = n; }
void d(unsigned n) { d_ = n; }
unsigned a() { return a_; }
unsigned b() { return b_; }
unsigned c() { return c_; }
unsigned d() { return d_; }
volatile unsigned a_, b_, c_, d_;
};
struct C
{
void a(unsigned n) { x_ &= ~0x01; x_ |= n; }
void b(unsigned n) { x_ &= ~0x3E; x_ |= n << 1; }
void c(unsigned n) { x_ &= ~0xC0; x_ |= n << 6; }
void d(unsigned n) { x_ &= ~0xFF00; x_ |= n << 8; }
unsigned a() const { return x_ & 0x01; }
unsigned b() const { return (x_ & 0x3E) >> 1; }
unsigned c() const { return (x_ & 0xC0) >> 6; }
unsigned d() const { return (x_ & 0xFF00) >> 8; }
volatile unsigned x_;
};
struct Timer
{
Timer() { get(&start_tp); }
double elapsed() const {
struct timespec end_tp;
get(&end_tp);
return (end_tp.tv_sec - start_tp.tv_sec) +
(1E-9 * end_tp.tv_nsec - 1E-9 * start_tp.tv_nsec);
}
private:
static void get(struct timespec* p_tp) {
if (clock_gettime(CLOCK_REALTIME, p_tp) != 0)
{
std::cerr << "clock_gettime() error\n";
exit(EXIT_FAILURE);
}
}
struct timespec start_tp;
};
template <typename T>
unsigned f()
{
int n = 0;
Timer timer;
T t;
for (int i = 0; i < 10000000; ++i)
{
t.a(i & 0x01);
t.b(i & 0x1F);
t.c(i & 0x03);
t.d(i & 0xFF);
n += t.a() + t.b() + t.c() + t.d();
}
std::cout << timer.elapsed() << '\n';
return n;
}
int main()
{
std::cout << "bitfields: " << f<A>() << '\n';
std::cout << "separate ints: " << f<B>() << '\n';
std::cout << "explicit and/or/shift: " << f<C>() << '\n';
}
Output on my test machine (numbers vary by ~20% run to run):
bitfields: 0.140586
1449991808
separate ints: 0.039374
1449991808
explicit and/or/shift: 0.252723
1449991808
Suggests that with g++ -O3 on a pretty recent Athlon, bitfields are worse than a few times slower than separate ints, and this particular and/or/bitshift implementation's at least twice as bad again ("worse" as other operations like memory read/writes are emphasised by the volatility above, and there's loop overhead etc, so the differences are understated in the results).
If you're dealing in hundreds of megabytes of structs that can be mainly bitfields or mainly distinct ints, the caching issues may become dominant - so benchmark in your system.
update from 2021 with an AMD Ryzen 9 3900X and -O2 -march=native:
bitfields: 0.0224893
1449991808
separate ints: 0.0288447
1449991808
explicit and/or/shift: 0.0190325
1449991808
Here we see everything has changed massively, the main implication being - benchmark with the systems you care about.
UPDATE: user2188211 attempted an edit which was rejected but usefully illustrated how bitfields become faster as the amount of data increases: "when iterating over a vector of a few million elements in [a modified version of] the above code, such that the variables do not reside in cache or registers, the bitfield code may be the fastest."
template <typename T>
unsigned f()
{
int n = 0;
Timer timer;
std::vector<T> ts(1024 * 1024 * 16);
for (size_t i = 0, idx = 0; i < 10000000; ++i)
{
T& t = ts[idx];
t.a(i & 0x01);
t.b(i & 0x1F);
t.c(i & 0x03);
t.d(i & 0xFF);
n += t.a() + t.b() + t.c() + t.d();
idx++;
if (idx >= ts.size()) {
idx = 0;
}
}
std::cout << timer.elapsed() << '\n';
return n;
}
Results on from an example run (g++ -03, Core2Duo):
0.19016
bitfields: 1449991808
0.342756
separate ints: 1449991808
0.215243
explicit and/or/shift: 1449991808
Of course, timing's all relative and which way you implement these fields may not matter at all in the context of your system.

I've seen/used bit fields in two situations: Computer Games and Hardware Interfaces. The hardware use is pretty straight forward: the hardware expects data in a certain bit format you can either define manually or through pre-defined library structures. It depends on the specific library whether they use bit fields or just bit manipulation.
In the "old days" computers games used bit fields frequently to make the most use of computer/disk memory as possible. For example, for a NPC definition in a RPG you might find (made up example):
struct charinfo_t
{
unsigned int Strength : 7; // 0-100
unsigned int Agility : 7;
unsigned int Endurance: 7;
unsigned int Speed : 7;
unsigned int Charisma : 7;
unsigned int HitPoints : 10; //0-1000
unsigned int MaxHitPoints : 10;
//etc...
};
You don't see it so much in more modern games/software as the space savings has gotten proportionally worse as computers get more memory. Saving a 1MB of memory when your computer only has 16MB is a big deal but not so much when you have 4GB.

The primary purpose of bit-fields is to provide a way to save memory in massively instantiated aggregate data structures by achieving tighter packing of data.
The whole idea is to take advantage of situations where you have several fields in some struct type, which don't need the entire width (and range) of some standard data type. This provides you with the opportunity to pack several of such fields in one allocation unit, thus reducing the overall size of the struct type. And extreme example would be boolean fields, which can be represented by individual bits (with, say, 32 of them being packable into a single unsigned int allocation unit).
Obviously, this only makes sense in situation where the pros of the reduced memory consumption outweigh the cons of slower access to values stored in bit-fields. However, such situations arise quite often, which makes bit-fields an absolutely indispensable language feature. This should answer your question about the modern use of bit-fields: not only they are used, they are essentially mandatory in any practically meaningful code oriented on processing large amounts of homogeneous data (like large graphs, for one example), because their memory-saving benefits greatly outweigh any individual-access performance penalties.
In a way, bit-fields in their purpose are very similar to such things as "small" arithmetic types: signed/unsigned char, short, float. In the actual data-processing code one would not normally use any types smaller than int or double (with few exceptions). Arithmetic types like signed/unsigned char, short, float exist just to serve as "storage" types: as memory-saving compact members of struct types in situations where their range (or precision) is known to be sufficient. Bit-fields is just another step in the same direction, that trades a bit more performance for much greater memory-saving benefits.
So, that gives us a rather clear set of conditions under which it is worthwhile to employ bit-fields:
Struct type contains multiple fields that can be packed into a smaller number of bits.
The program instantiates a large number of objects of that struct type.
If the conditions are met, you declare all bit-packable fields contiguously (typically at the end of the struct type), assign them their appropriate bit-widths (and, usually, take some steps to ensure that the bit-widths are appropriate). In most cases it makes sense to play around with ordering of these fields to achieve the best packing and/or performance.
There's also a weird secondary use of bit-fields: using them for mapping bit groups in various externally-specified representations, like hardware registers, floating-point formats, file formats etc. This has never been intended as a proper use of bit-fields, even though for some unexplained reason this kind of bit-field abuse continues to pop-up in real-life code. Just don't do this.

One use for bit fields used to be to mirror hardware registers when writing embedded code. However, since the bit order is platform-dependent, they don't work if the hardware orders its bits different from the processor. That said, I can't think of a use for bit fields any more. You're better off implementing a bit manipulation library that can be ported across platforms.

Bit fields were used in the olden days to save program memory.
They degrade performance because registers can not work with them so they have to be converted to integers to do anything with them. They tend to lead to more complex code that is unportable and harder to understand (since you have to mask and unmask things all the time to actually use the values.)
Check out the source for http://www.nethack.org/ to see pre ansi c in all its bitfield glory!

In the 70s I used bit fields to control hardware on a trs80. The display/keyboard/cassette/disks were all memory mapped devices. Individual bits controlled various things.
A bit controlled 32 column vs 64 column display.
Bit 0 in that same memory cell was the cassette serial data in/out.
As I recall, the disk drive control had a number of them. There were 4 bytes in total. I think there was a 2 bit drive select. But it was a long time ago. It was kind of impressive back then in that there were at least two different c compilers for the platform.
The other observation is that bit fields really are platform specific. There is no expectation that a program with bit fields should port to another platform.

In modern code, there's really only one reason to use bitfields: to control the space requirements of a bool or an enum type, within a struct/class. For instance (C++):
enum token_code { TK_a, TK_b, TK_c, ... /* less than 255 codes */ };
struct token {
token_code code : 8;
bool number_unsigned : 1;
bool is_keyword : 1;
/* etc */
};
IMO there's basically no reason not to use :1 bitfields for bool, as modern compilers will generate very efficient code for it. In C, though, make sure your bool typedef is either the C99 _Bool or failing that an unsigned int, because a signed 1-bit field can hold only the values 0 and -1 (unless you somehow have a non-twos-complement machine).
With enumeration types, always use a size that corresponds to the size of one of the primitive integer types (8/16/32/64 bits, on normal CPUs) to avoid inefficient code generation (repeated read-modify-write cycles, usually).
Using bitfields to line up a structure with some externally-defined data format (packet headers, memory-mapped I/O registers) is commonly suggested, but I actually consider it a bad practice, because C doesn't give you enough control over endianness, padding, and (for I/O regs) exactly what assembly sequences get emitted. Have a look at Ada's representation clauses sometime if you want to see how much C is missing in this area.

Boost.Thread uses bitfields in its shared_mutex, on Windows at least:
struct state_data
{
unsigned shared_count:11,
shared_waiting:11,
exclusive:1,
upgrade:1,
exclusive_waiting:7,
exclusive_waiting_blocked:1;
};

An alternative to consider is to specify bit field structures with a dummy structure (never instantiated) where each byte represents a bit:
struct Bf_format
{
char field1[5];
char field2[9];
char field3[18];
};
With this approach sizeof gives the width of the bit field, and offsetof give the offset of the bit field. At least in the case of GNU gcc, compiler optimization of bit-wise operations (with constant shifts and masks) seems to have gotten to rough parity with (base language) bit fields.
I have written a C++ header file (using this approach) which allows structures of bit fields to be defined and used in a performant, much more portable, much more flexible way: https://github.com/wkaras/C-plus-plus-library-bit-fields . So, unless you are stuck using C, I think there would rarely be a good reason to use the base language facility for bit fields.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js