I had a discussion this morning with a colleague regarding the correctness of a "coding trick" to detect endianness.
The trick was:
bool is_big_endian()
{
    union
    {
        int i;
        char c[sizeof(int)];
    } foo;
    foo.i = 1;
    return (foo.c[0] == 1);
}
To me, it seems that this usage of a union is incorrect because setting one member of the union and reading another is not well-defined. But I have to admit that this is just a feeling and I lack actual proof to strengthen my point.
Is this trick correct? Who is right here?
Your code is not portable. It might work on some compilers or it might not.
You are right about the behaviour being undefined when you try to access an inactive member of the union, as is the case in the code given.
§9.5/1:
In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.
So foo.c[0] == 1 is incorrect because c is not the active member at that point. Feel free to correct me if you think I am wrong.
Don't do this; better to use something like the following:
#include <arpa/inet.h>
//#include <winsock2.h> // <-- for Windows use this instead
#include <stdint.h>

bool is_big_endian() {
    uint32_t i = 1;
    return i == htonl(i);
}
Explanation:
The htonl function converts a u_long from host to TCP/IP network byte order (which is big-endian).
References:
http://linux.die.net/man/3/htonl
http://msdn.microsoft.com/de-de/library/ms738556%28v=vs.85%29.aspx
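For intuition, here is a small illustration of what the comparison sees on each byte order; it is a sketch only, assuming a POSIX system for <arpa/inet.h> (use winsock2.h on Windows as noted above):
#include <arpa/inet.h>
#include <cstdint>
#include <iostream>

int main() {
    // On a little-endian host htonl(1) yields 0x01000000, so the test fails;
    // on a big-endian host htonl is the identity, so the test succeeds.
    uint32_t i = 1;
    std::cout << std::hex << std::showbase
              << "i = " << i << ", htonl(i) = " << htonl(i)
              << ", big-endian: " << std::boolalpha << (i == htonl(i)) << "\n";
}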
You're correct that that code doesn't have well-defined behavior. Here's how to do it portably:
#include <cstring>

bool is_big_endian()
{
    static unsigned const i = 1u;
    char c[sizeof(unsigned)] = { };
    std::memcpy(c, &i, sizeof(c));
    return !c[0];
}

// or, alternatively

bool is_big_endian()
{
    static unsigned const i = 1u;
    return !*static_cast<char const*>(static_cast<void const*>(&i));
}
The function in the question should be named is_little_endian, since foo.c[0] == 1 holds on little-endian machines. I think you can use this union trick in practice. Or also a cast to char.
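For reference, a minimal sketch of the cast-to-char variant mentioned here (reading an object's bytes through a char pointer is allowed by the aliasing rules):
bool is_little_endian()
{
    int i = 1;
    // Inspect the first byte of i's object representation.
    return *reinterpret_cast<char*>(&i) == 1;
}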
The code has undefined behavior, although some (most?) compilers will define it, at least in limited cases.
The intent of the standard is that reinterpret_cast be used for this. This intent isn't well expressed, however, since the standard can't really define the behavior; there is no desire to define it when the hardware won't support it (e.g. because of alignment issues). And it's also clear that you can't just reinterpret_cast between two arbitrary types and expect it to work.
From a quality of implementation point of view, I would expect both the union trick and reinterpret_cast to work, if the union or the reinterpret_cast is in the same functional block; the union should work as long as the compiler can see that the ultimate type is a union (although I've used compilers where this wasn't the case).
Related
I was scrolling through some posts and I read about something called the strict aliasing rule. It looked awfully close to some code I've seen in a club project, relevant snippet below.
LibSerial::DataBuffer dataBuffer;
size_t BUFFER_SIZE = sizeof(WrappedPacket);

while(true) {
    serial_port.Read(dataBuffer, sizeof(WrappedPacket));
    uint8_t *rawDataBuffer = dataBuffer.data();

    //this part
    auto *wrappedPacket = (WrappedPacket *) rawDataBuffer;
    ...
the struct definitions are:
typedef struct __attribute__((__packed__)) TeensyData {
    int16_t adc0, adc1, adc2, adc3, adc4, adc5, adc6, adc7, adc8, adc9, adc10, adc11;
    int32_t loadCell0;
    double tc0, tc1, tc2, tc3, tc4, tc5, tc6, tc7;
} TeensyData;

typedef struct __attribute__((__packed__)) WrappedPacket {
    TeensyData dataPacket;
    uint16_t packetCRC;
} WrappedPacket;
Hopefully it's pretty obvious that I'm new to C++. So 1) is this a violation of the rule? and 2) if it is, what alternative solutions are there?
Yeah, it's a violation. The rule lets you use raw byte access (char*, unsigned char*, std::byte*) to access other data types. It doesn't let you use other data types to access arrays of raw bytes.
The solution is memcpy:
WrappedPacket wpkt;
std::memcpy(&wpkt, dataBuffer.data(), sizeof wpkt);
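Put together with the question's snippet, the read loop might then look like this sketch (serial_port, dataBuffer and WrappedPacket are taken from the code above, and std::memcpy needs <cstring>; everything else is assumed):
while (true) {
    serial_port.Read(dataBuffer, sizeof(WrappedPacket));

    // Copy the received bytes into a real WrappedPacket object instead of
    // casting the byte pointer; memcpy from a byte buffer into a trivially
    // copyable struct is well defined.
    WrappedPacket wpkt;
    std::memcpy(&wpkt, dataBuffer.data(), sizeof wpkt);

    // ... use wpkt.dataPacket and wpkt.packetCRC ...
}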
Yes, this is a strict aliasing violation and UB. But I wouldn't worry too much about it.
I wouldn't want to memcpy a big struct solely for formal correctness, in the hope that the compiler optimizes it out. But I would add std::launder just in case¹:
auto *wrappedPacket = std::launder(reinterpret_cast<WrappedPacket *>(rawDataBuffer));
Here's my reasoning:
.Read most probably boils down to an opaque library call, so the compiler must optimize under the assumption that it does something that makes the code legal. For example, it could apply placement-new to the provided buffer, with the right type.
If we assume that it's the case, then std::launder would "bless" the pointer to point to the said imaginary object, fixing the UB.
We, programmers, know that this assumption is false, but the compiler doesn't.
I'm not sure if C++20 implicit lifetimes relax the rules here and make launder unnecessary. I would keep it.
¹ launder doesn't fix UB here. But it does reduce the probability of it blowing up in your face.
Assume I have the following class that is more or less a generic mechanism to store arbitrary values/types in the same class (this is in the context of a database)
class RawStruct {
private:
    uint16_t value1;   /// value[0..1]
    uint16_t value2;   /// value[2..3]
    uint16_t value3;   /// value[4..5]
    uint16_t value4;   /// value[6..7]
    uint8_t flags;     ///
    uint8_t padding;   /// unused.

    INLINE_HOST_DEVICE RawStruct(const int64_t value) : RawStruct(toFlags(XMLDatatype::xsd_long), (uint64_t)value) {}
    INLINE_HOST_DEVICE RawStruct(uint8_t flags, uint64_t value) { setValue(flags, value); }

    INLINE_HOST_ONLY void setValue(uint8_t flags, uint64_t value) {
        *((uint64_t*)&this->value1) = value;
        /*
        *((uint16_t*)&this->value1) = (value);
        *((uint16_t*)&this->value2) = (value) >> 16;
        *((uint16_t*)&this->value3) = (value) >> 32;
        *((uint16_t*)&this->value4) = (value) >> 48;
        */
        this->flags = flags;
        this->padding = 0;
    }
Now some code that invokes the above:
int value=42;
char* data = (char*)malloc(6*sizeof(RawStruct));
RawStruct* rsArray = (RawStruct*)data;
rsArray[0] = RawStruct((long) value);
Now my problem: when compiling a complex piece of code that more or less behaves like the invocation above, I get an incorrect value stored in the RawStruct array. This only happens in our testing on the ARM platform with optimizations (-O3). On x86 all our tests pass. They also pass on ARM in debug mode (-g, no optimizations).
My question would be: are we violating some rule inside that setValue method by looking at the four uint16_t-s as one big uint64_t?
Or am I perhaps hitting some incorrect code generation by the compiler?
The code works fine if I comment out the current cast/assignment and replace it with the four individual assignments that are commented out above (inside the setValue() helper).
The line
*((uint64_t*)&this->value1) = value;
has undefined behavior. &this->value1 is a pointer to a uint16_t. You cannot use a pointer of type uint64_t* to write to that uint16_t object.
It violates the aliasing rules and even if they weren't a thing, the C++ object model wouldn't have any concept of writing to multiple consecutive objects in one access like this.
The alignment requirements of the types might also be a problem. this->value1 may not be aligned correctly for a uint64_t, causing undefined behavior as well (at least in general, probably not with the malloc).
And even further, there is no guarantee that there won't be padding between the members in general, although you would probably know if this was the case based on the target ABI.
If you find yourself using C-style casts in C++, rethink what you are doing. If it can't be done with a C++-style static_cast instead, it is very likely leading to UB (and even static_cast allows some potentially dangerous casts).
For example, (uint64_t)value can also be written as static_cast<uint64_t>(value) and will work, while static_cast<uint64_t*>(&this->value1) will fail to compile, which should be considered a warning that there is danger in the cast. (C++-style would require reinterpret_cast here.)
Also, malloc in C++ code is very likely wrong and leading to UB as well. It should be new (or rather std::unique_ptr or std::vector or some equivalent) instead. (In your particular example here it turns out to be fine, but if your class wasn't that trivial, it wouldn't.)
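For illustration, a minimal sketch of a well-defined setValue along those lines, using std::memcpy instead of the reinterpreted store; it keeps the question's member layout (minus the macros) and assumes value1..value4 are contiguous, which the static_assert checks:
#include <cstddef>
#include <cstdint>
#include <cstring>

struct RawStruct {
    uint16_t value1, value2, value3, value4;
    uint8_t flags;
    uint8_t padding;

    void setValue(uint8_t f, uint64_t value) {
        // The four uint16_t members must occupy the first 8 bytes.
        static_assert(offsetof(RawStruct, flags) == 4 * sizeof(uint16_t),
                      "value1..value4 are expected to be contiguous");
        // Copy the 64-bit value byte-wise over the four members; no
        // uint64_t lvalue is ever formed, so no aliasing rule is violated.
        std::memcpy(&value1, &value, sizeof(value));
        flags = f;
        padding = 0;
    }
};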
Let's consider the following task:
My C++ module, as part of an embedded system, receives 8 bytes of data, like: uint8_t data[8].
The value of the first byte determines the layout of the rest (20-30 different layouts). In order to get at the data effectively, I would create a different struct for each layout, put each into a union, and read the data directly from the address of my input through a pointer, like this:
struct Interpretation_1 {
    uint8_t multiplexer;
    uint8_t timestamp;
    uint32_t position;
    uint16_t speed;
};

// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )

union DataInterpreter {
    Interpretation_1 movement;
    //Interpretation_2 temperatures;
    //etc...
};

...

uint8_t exampleData[8] {1u, 10u, 20u, 0u, 0u, 0u, 5u, 0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";
The problem I have is that the compiler can insert padding bytes into the interpretation structs, and this kills my idea. I know I can use
with gcc: struct MyStruct{} __attribute__((__packed__));
with MSVC: #pragma pack(push, 1) struct MyStruct{}; #pragma pack(pop)
with clang: ? (I could check it)
But is there any portable way to achieve this? I know C++11 has e.g. alignas for alignment control, but can I use it for this? I have to use C++11, but I would also be interested if there is a better solution in a later version of C++.
But is there any portable way to achieve this?
No, there is no (standard) way to force a type that would otherwise have padding to not have padding in C++. All objects are aligned at least as much as their type requires, and if that alignment doesn't match up with the preceding subobjects, then there will be padding, and that is unavoidable.
Furthermore, there is another problem: you're accessing through a reinterpreted pointer that doesn't point to an object of a compatible type. The behaviour of the program is undefined.
We can conclude that classes are not generally useful for representing arbitrary binary data. The packed structures are non-standard, and they also aren't compatible across different systems with different representations for integers (byte endianness).
There is a way to check whether a type contains padding: compare the sizes of the subobjects to the size of the complete object, and do this recursively for each member. If the sizes don't match, then there is padding. This is quite tricky, however, because C++ has minimal reflection capabilities, so you need to resort to either hard-coding or metaprogramming.
Given such a check, you can make the compilation fail on systems where the assumption doesn't hold.
Another handy tool is std::has_unique_object_representations (since C++17) which will always be false for all types that have padding. But note that it will also be false for types that contain floats for example. Only types that return true can be meaningfully compared for equality with std::memcmp.
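A minimal sketch of the hard-coded variant of that check, plus the C++17 trait, using the question's Interpretation_1 (on a typical ABI both assertions will fire, which is exactly the point: the build fails where the layout assumption doesn't hold):
#include <cstdint>
#include <type_traits>

struct Interpretation_1 {
    uint8_t multiplexer;
    uint8_t timestamp;
    uint32_t position;
    uint16_t speed;
};

// Hard-coded size check: if the member sizes don't add up to the size of
// the whole struct, there is padding somewhere.
static_assert(sizeof(uint8_t) + sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t)
                  == sizeof(Interpretation_1),
              "Interpretation_1 contains padding on this platform");

// C++17 trait: false for any type with padding (and also for some
// padding-free types, e.g. ones containing floats).
static_assert(std::has_unique_object_representations<Interpretation_1>::value,
              "Interpretation_1 has padding or a non-unique representation");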
Reading from unaligned memory is undefined behaviour in C++. In other words, the compiler is allowed to assume that every uint32_t is located at an alignof(uint32_t)-byte boundary and every uint16_t is located at an alignof(uint16_t)-byte boundary. This means that even if you somehow manage to pack your bytes portably, doing interpreter->movement.position will still trigger undefined behaviour.
(In practice, on most architectures, unaligned memory access will still work, albeit with a performance penalty.)
You could, however, write a wrapper, like how std::vector<bool>::operator[] works:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>

template <typename T>
struct unaligned_wrapper {
    static_assert(std::is_trivial<T>::value, "T must be trivial");
    std::aligned_storage_t<sizeof(T), 1> buf;
    operator T() const noexcept {
        T ret;
        std::memcpy(&ret, &buf, sizeof(T));
        return ret;
    }
    unaligned_wrapper& operator=(T t) noexcept {
        std::memcpy(&buf, &t, sizeof(T));
        return *this;
    }
};

struct Interpretation_1 {
    unaligned_wrapper<uint8_t> multiplexer;
    unaligned_wrapper<uint8_t> timestamp;
    unaligned_wrapper<uint32_t> position;
    unaligned_wrapper<uint16_t> speed;
};

// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )

union DataInterpreter {
    Interpretation_1 movement;
    //Interpretation_2 temperatures;
    //etc...
};

int main(){
    uint8_t exampleData[8] {1u, 10u, 20u, 0u, 0u, 0u, 5u, 0u};
    DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
    std::cout << "position: " << interpreter->movement.position << "\n";
}
This would ensure that every read or write to the unaligned integer is transformed to a bytewise memcpy, which does not have any alignment requirement. There might be a performance penalty for this on architectures with the ability to access unaligned memory quickly, but it would work on any conforming compiler.
There was a similar question here, but the user in that question seemed to have a much larger array, or vector. If I have:
bool boolArray[4];
And I want to check if all elements are false, I can check [ 0 ], [ 1 ] , [ 2 ] and [ 3 ] either separately, or I can loop through it. Since (as far as I know) false should have value 0 and anything other than 0 is true, I thought about simply doing:
if ( *(int*) boolArray) { }
This works, but I realize that it relies on bool being one byte and int being four bytes. If I cast to (std::uint32_t) would it be OK, or is it still a bad idea? I just happen to have 3 or 4 bools in an array and was wondering if this is safe, and if not if there is a better way to do it.
Also, in the case I end up with more than 4 bools but less than 8 can I do the same thing with a std::uint64_t or unsigned long long or something?
As πάντα ῥεῖ noticed in comments, std::bitset is probably the best way to deal with that in UB-free manner.
std::bitset<4> boolArray {};
if(boolArray.any()) {
    //do the thing
}
If you want to stick to arrays, you could use std::any_of (from <algorithm>), but this requires a (possibly peculiar to the reader) functor which just returns its argument:
bool boolArray[4];
if(std::any_of(std::begin(boolArray), std::end(boolArray), [](bool b){ return b; })) {
    //do the thing
}
Type-punning 4 bools to int might be a bad idea - you cannot be sure of the size of each of the types. It probably will work on most architectures, but std::bitset is guaranteed to work everywhere, under any circumstances.
Several answers have already explained good alternatives, particularly std::bitset and std::any_of(). I am writing separately to point out that, unless you know something we don't, it is not safe to type pun between bool and int in this fashion, for several reasons:
int might not be four bytes, as multiple answers have pointed out.
M.M points out in the comments that bool might not be one byte. I'm not aware of any real-world architectures in which this has ever been the case, but it is nevertheless spec-legal. It (probably) can't be smaller than a byte unless the compiler is doing some very elaborate hide-the-ball chicanery with its memory model, and a multi-byte bool seems rather useless. Note however that a byte need not be 8 bits in the first place.
int can have trap representations. That is, it is legal for certain bit patterns to cause undefined behavior when they are cast to int. This is rare on modern architectures, but might arise on (for example) ia64, or any system with signed zeros.
Regardless of whether you have to worry about any of the above, your code violates the strict aliasing rule, so compilers are free to "optimize" it under the assumption that the bools and the int are entirely separate objects with non-overlapping lifetimes. For example, the compiler might decide that the code which initializes the bool array is a dead store and eliminate it, because the bools "must have" ceased to exist* at some point before you dereferenced the pointer. More complicated situations can also arise relating to register reuse and load/store reordering. All of these infelicities are expressly permitted by the C++ standard, which says the behavior is undefined when you engage in this kind of type punning.
You should use one of the alternative solutions provided by the other answers.
* It is legal (with some qualifications, particularly regarding alignment) to reuse the memory pointed to by boolArray by casting it to int and storing an integer, although if you actually want to do this, you must then pass boolArray through std::launder if you want to read the resulting int later. Regardless, the compiler is entitled to assume that you have done this once it sees the read, even if you don't call launder.
You can use std::bitset<N>::any:
Any returns true if any of the bits are set to true, otherwise false.
#include <iostream>
#include <bitset>

int main ()
{
    std::bitset<4> foo;
    // modify foo here

    if (foo.any())
        std::cout << foo << " has " << foo.count() << " bits set.\n";
    else
        std::cout << foo << " has no bits set.\n";

    return 0;
}
If you want to return true when all of the bits are set, or when none are, you can use std::bitset<N>::all or std::bitset<N>::none respectively.
The standard library has what you need in the form of the std::all_of, std::any_of, std::none_of algorithms.
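For the question's "all four are false" check, std::none_of expresses the intent directly; a minimal sketch:
#include <algorithm>
#include <iterator>

bool all_false(const bool (&bs)[4]) {
    // True exactly when no element is true.
    return std::none_of(std::begin(bs), std::end(bs), [](bool b) { return b; });
}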
...And for the obligatory "roll your own" answer, we can provide a simple "or"-like function for any array bool[N], like so:
template<size_t N>
constexpr bool or_all(const bool (&bs)[N]) {
    for (bool b : bs) {
        if (b) { return b; }
    }
    return false;
}
Or more concisely,
template<size_t N>
constexpr bool or_all(const bool (&bs)[N]) {
    for (bool b : bs) { if (b) { return b; } }
    return false;
}
This also has the benefit of both short-circuiting like ||, and being optimised out entirely if calculable at compile time.
Apart from that, if you want to examine the original idea of type-punning bool[N] to some other type to simplify observation, I would very much recommend that you don't do that; view it as char[N2] instead, where N2 == (sizeof(bool) * N). This would allow you to provide a simple representation viewer that can automatically scale to the viewed object's actual size, allow iteration over its individual bytes, and allow you to more easily determine whether the representation matches specific values (such as, e.g., zero or non-zero). I'm not entirely sure off the top of my head whether such examination would invoke any UB, but I can say for certain that any such type's construction cannot be a viable constant expression, due to requiring a reinterpret_cast to char* or unsigned char* or similar (either explicitly, or in std::memcpy()), and thus couldn't as easily be optimised out.
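A minimal sketch of that byte-wise view; it inspects the object representation through unsigned char*, which the aliasing rules allow (whether all-zero bytes means "all false" is an assumption about how the implementation represents bool):
#include <cstddef>

template <typename T, std::size_t N>
bool all_bytes_zero(const T (&arr)[N]) {
    // View the array as its underlying bytes (sizeof(arr) == sizeof(T) * N).
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(arr);
    for (std::size_t i = 0; i < sizeof(arr); ++i) {
        if (bytes[i] != 0) return false;
    }
    return true;
}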
There have previously been some great answers on memory alignment, but I feel they don't completely answer some questions.
E.g.:
What is data alignment? Why and when should I be worried when typecasting pointers in C?
What is aligned memory allocation?
I have an example program:
#include <iostream>
#include <vector>
#include <cstring>

int32_t cast_1(int offset) {
    std::vector<char> x = {1, 2, 3, 4, 5};
    return reinterpret_cast<int32_t*>(x.data() + offset)[0];
}

int32_t cast_2(int offset) {
    std::vector<char> x = {1, 2, 3, 4, 5};
    int32_t y;
    std::memcpy(reinterpret_cast<char*>(&y), x.data() + offset, 4);
    return y;
}

int main() {
    std::cout << cast_1(1) << std::endl;
    std::cout << cast_2(1) << std::endl;
    return 0;
}
The cast_1 function triggers a UBSan alignment error (as expected) but cast_2 does not. However, cast_2 looks much less readable to me (it requires 3 lines), while cast_1 is perfectly clear about its intent, even though it is UB.
Questions:
1) Why is cast_1 UB, when the intent is perfectly clear? I understand that there may be performance issues with alignment.
2) Is cast_2 a correct approach to fixing the UB of cast_1?
1) Why is cast_1 UB?
Because the language rules say so. Multiple rules in fact.
The offset where you access the object does not meet the alignment requirements of int32_t (except on systems where the alignment requirement is 1). No objects can be created without conforming to the alignment requirement of the type.
A char object may not be accessed through an int32_t pointer.
2) Is cast_2 a correct approach to fixing the UB of cast_1?
cast_2 has well-defined behaviour. The reinterpret_cast in that function is redundant, and it is bad practice to use magic constants (use sizeof instead).
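For illustration, a sketch of cast_2 with those two points addressed (no redundant cast, sizeof instead of the constant 4):
#include <cstdint>
#include <cstring>
#include <vector>

int32_t cast_2(int offset) {
    std::vector<char> x = {1, 2, 3, 4, 5};
    int32_t y;
    // memcpy from the (possibly misaligned) source into a properly aligned
    // int32_t; no pointer of the wrong type is ever dereferenced.
    std::memcpy(&y, x.data() + offset, sizeof y);
    return y;
}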
WRT the first question: it would be trivial for the compiler to handle that for you, true. All it would have to do is pessimize every other non-char load in the program.
The alignment rules were written precisely so that the compiler can generate code that performs well on the many platforms where aligned memory access is a fast native operation and misaligned access is the equivalent of your memcpy. Except where it could prove alignment, the compiler would have to handle every load the slow and safe way.