Type punning and unions - c++

So, there are a few questions on SO about this subject, but I haven't quite found something that exactly answers the question I have in mind. First some background:
I would like to have a uint32_t field, which I can also access as an array of bytes.
So the first thing that comes to mind is:
union U {
uint32_t u32;
uint8_t bytes[sizeof(uint32_t)];
};
Which allows me to do this:
// "works", but is UB as far as I understand
U u;
u.u32 = 0x11223344;
u.bytes[0] = 0x55;
OK, so undefined behavior (UB) is bad, therefore we don't want to do that. Similarly casts are UB and can sometimes be even worse due to alignment concerns (though not in this case because I'm using a char sized object for my array).
// "works", but is UB as far as I understand
uint32_t v = 0x11223344;
auto p = reinterpret_cast<uint8_t *>(&v);
p[0] = 0x55;
Once again, UB is bad, therefore we don't want to do that.
Some say that this is OK if we use a char* instead of a uint8_t*:
// "works", but maybe is UB?
uint32_t v = 0x11223344;
auto p = reinterpret_cast<char *>(&v);
p[0] = 0x55;
But I am honestly not sure about it... So getting creative.
So, I think I remember it being legal (as far as I know) to read the contents of a void* cast to a char* (this allows things like std::memcpy to not be UB). So maybe we can kinda play with this:
uint8_t get_byte(const void *p, size_t n) {
auto ptr = static_cast<const char *>(p);
return ptr[n];
}
void set_byte(void *p, size_t index, uint8_t v) {
auto ptr = static_cast<char *>(p);
ptr[index] = v;
}
// "works", is this UB?
uint32_t v = 0x11223344;
uint8_t v1 = get_byte(&v, 0); // read
set_byte(&v, 0, 0x55); // write
So my questions are:
Is the final example I came up with UB?
If it is, what is the "right" way to do this? I really hope the "correct" way isn't a memcpy to and from a byte array. That would be ridiculous.
(BONUS): Suppose I want my get_byte to return a reference (for example, to implement operator[]). Is it safe to use uint8_t instead of literal char when reading the contents of a void *?
NOTE: I understand the concerns regarding endian and portability. They are not a problem for my use case. I think that it is acceptable for the result to be an "unspecified value" (in that it is compiler specific which byte it will read). My question is really focused on the UB aspects ("nasal demons" and similar).

Why not create a class for that ?
Something like:
class MyInt32 {
public:
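std::uint32_t asInt32() const {
// Widen each byte before shifting so the shift never reaches a signed int's sign bit.
return static_cast<std::uint32_t>(b[0])
| (static_cast<std::uint32_t>(b[1]) << 8)
| (static_cast<std::uint32_t>(b[2]) << 16)
| (static_cast<std::uint32_t>(b[3]) << 24);
}
void setInt32(std::uint32_t i) {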
b[0] = (i & 0xFF);
b[1] = ((i >> 8) & 0xFF);
b[2] = ((i >> 16) & 0xFF);
b[3] = ((i >> 24) & 0xFF);
}
const std::array<std::uint8_t, 4u>& asInt8() const { return b; }
std::array<std::uint8_t, 4u>& asInt8() { return b; }
void setInt8s(const std::array<std::uint8_t, 4u>& a) { b = a; }
private:
std::array<std::uint8_t, 4u> b;
};
This way you don't have UB, you don't break the aliasing rules, and you manage endianness as you want.

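The final example is perfectly legitimate (as long as the type is a POD). As for the bonus question: uint8_t is not guaranteed to be a character type, so the aliasing exemption is not guaranteed to apply to it. Don't use it for this.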
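As a minimal sketch of the point above, the helpers below go through unsigned char rather than uint8_t, relying on the usual reading of the aliasing rules that lets character types inspect any object. The name byte_ref is hypothetical and not part of the question.

```cpp
#include <cstddef>
#include <cstdint>

// Read byte n of any object through an unsigned char pointer.
// unsigned char is one of the types that may alias any object.
inline unsigned char get_byte(const void* p, std::size_t n) {
    return static_cast<const unsigned char*>(p)[n];
}

// Hypothetical helper: return a reference to byte n so it can be assigned to,
// for example from an operator[] implementation.
inline unsigned char& byte_ref(void* p, std::size_t n) {
    return static_cast<unsigned char*>(p)[n];
}
```

Which byte of the integer index 0 refers to still depends on the platform's byte order, as the question already accepts.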

Related

Casting from basic type to non-POD struct

This question is whether casting a byte stream into a non-POD structure is undefined behaviour.
The data is produced somewhere like this, packing four 4-bit values into each short:
std::vector<unsigned short> v(50);
for (int k = 0; k < 50; k++)
{
v[k] = (k%8) | (k+1)%8 << 4 | (k+2)%8 << 8 | (k+2)%8 << 12;
}
Then the data is passed to a second module as a byte stream and parsed in the receiving module.
struct Data
{
Data(unsigned short a, unsigned short b, unsigned short c, unsigned short d)
{
value = (a & 0xF) | (b & 0xF) << 4 | (c & 0xF) << 8 | (d & 0xF) << 12;
}
Data(unsigned short v) : value(v) {}
short getA() const
{
return value & 0xF;
}
short getB() const
{
return (value >> 4) & 0xF;
}
short getC() const
{
return (value >> 8) & 0xF;
}
short getD() const
{
return (value >> 12) & 0xF;
}
private:
unsigned short value;
};
...
unsigned char* data = ...;
Data* ptr = reinterpret_cast<Data*>(data);
ptr[0].getA();
Is this undefined behaviour? The size of the struct Data is the same as short, but does the presence of a constructor and member functions make it UB? If I remove the constructors, will it then be fine?
You have to go through a constructor to construct an object. You can't simply cast a byte stream and then pretend that it results in a usable object.
For a trivially copyable type you can create the object first, then copy in bytes, but otherwise a constructor call is needed.
To answer your question: Yes, that would be UB.
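To illustrate the "create the object first, then copy in bytes" route, here is a minimal sketch. Packed is a hypothetical stand-in for the question's Data, stripped of its constructors so that a vector of it can be default-filled. parse is likewise an illustrative name, and the sketch assumes sizeof(Packed) equals sizeof(unsigned short).

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for Data: one packed 16-bit field, trivially copyable.
struct Packed {
    unsigned short value;
    short getA() const { return value & 0xF; }
    short getD() const { return (value >> 12) & 0xF; }
};
static_assert(std::is_trivially_copyable<Packed>::value,
              "copying raw bytes into Packed requires a trivially copyable type");

// Create real Packed objects first, then copy the raw bytes into them.
// Trailing bytes that do not fill a whole Packed are ignored.
inline std::vector<Packed> parse(const unsigned char* data, std::size_t bytes) {
    std::vector<Packed> out(bytes / sizeof(Packed));
    if (!out.empty())
        std::memcpy(out.data(), data, out.size() * sizeof(Packed));
    return out;
}
```

The objects in out exist before any bytes are copied, so no pointer to the raw stream is ever dereferenced as a Packed.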
Yes it is. But you can find an easy alternative:
std::vector<unsigned short> in(50);
std::vector<Data> out{begin(in), end(in)};
(void) out[0].getA();
The construction of out will construct Data objects in-place from each element of in. And you can trust your compiler to optimize away any memory copy ;)

The safe and standard-compliant way of accessing array of integral type as an array of another unrelated integral type?

Here's what I need to do. I'm sure it's a routine and recognizable coding task for many C++ developers out there:
void processAsUint16(const char* memory, size_t size) {
auto uint16_ptr = (const uint16_t*)memory;
for (size_t i = 0, n = size/sizeof(uint16_t); i < n; ++i) {
std::cout << uint16_ptr[i]; // Some processing of the other unrelated type
}
}
Problem: I'm developing with an IDE that integrates clang static code analysis, and every way of casting I tried, short of memcpy (which I don't want to resort to) is either discouraged or strongly discouraged. For example, reinterpret_cast is simply banned by the CPP Core Guidelines. C-style cast is discouraged. static_cast cannot be used here.
What's the right way of doing this that avoids type aliasing problems and other kinds of undefined behavior?
You use memcpy:
void processAsUint16(const char* memory, size_t size) {
for (size_t i = 0; i + sizeof(uint16_t) <= size; i += sizeof(uint16_t)) { // stop before a trailing partial element
uint16_t x;
memcpy(&x, memory + i, sizeof(x));
// do something with x
}
}
uint16_t is trivially copyable, so this is fine.
Or, in C++20, with std::bit_cast (which awkwardly has to go through an array first):
void processAsUint16(const char* memory, size_t size) {
for (size_t i = 0; i + sizeof(uint16_t) <= size; i += sizeof(uint16_t)) { // stop before a trailing partial element
alignas(uint16_t) char buf[sizeof(uint16_t)];
memcpy(buf, memory + i, sizeof(buf));
auto x = std::bit_cast<uint16_t>(buf);
// do something with x
}
}
Practically speaking, compilers will just "do the right thing" if you just reinterpret_cast, even if it's undefined behavior. Perhaps something like std::bless will give us a more direct, non-copying, mechanism of doing this, but until then...
My preference would be to treat the array of char as a sequence of octets in a defined order. This obviously doesn't work if the order can actually be either one depending on the target architecture, but in practice a memory buffer like this usually comes from a file or a network connection.
void processAsUint16(const char* memory, size_t size) {
for (size_t i = 0; i + 1 < size; i += 2) { // stop before a trailing odd byte
const unsigned char lo = memory[i];
const unsigned char hi = memory[i+1];
const uint16_t x = lo + hi*256; // or "lo | hi << 8"
// do something with x
}
}
Note that we do not use sizeof(uint16_t) here. memory is a sequence of octets, so even if CHAR_BIT is 16, two chars are needed to hold a uint16_t.
This can be a little bit cleaner if memory can be declared as unsigned char - no need for the definition of lo and hi.
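A minimal sketch of that unsigned char variant, assuming the octets arrive low byte first. The name decodeLe16 and the choice to return a vector are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode consecutive little-endian octet pairs into 16-bit values.
// A trailing odd byte is ignored.
inline std::vector<std::uint16_t> decodeLe16(const unsigned char* memory, std::size_t size) {
    std::vector<std::uint16_t> out;
    out.reserve(size / 2);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        // Both operands promote to int, so the shift cannot overflow.
        out.push_back(static_cast<std::uint16_t>(memory[i] | (memory[i + 1] << 8)));
    }
    return out;
}
```

Because the byte order is spelled out in the arithmetic, the result is the same on little-endian and big-endian hosts.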

uint32_t pointer to the same location as uint8_t pointer

#include <iostream>
int main(){
uint8_t memory[1024];
memory[0] = 1;
memory[1] = 1;
uint32_t *test = memory;
//is it possible to get a value for *test that would be in this example 257?
}
I want to create a uint32_t pointer to the same address as the uint8_t pointer. Is this possible without using new(address)? I don't want to lose the information at the address. I know pointers are just addresses, and therefore I should be able to just set the uint32_t pointer to the same address.
This code produces an error:
invalid conversion from 'uint8_t*' to 'uint32_t*' in initialization
This would be a violation of so-called Strict Aliasing Rule, so it can not be done. Sad, but true.
Use memcpy to copy the data. In many cases compilers optimize the copy away and generate the same code as they would for the cast, but in a standard-conforming way.
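A minimal sketch of the memcpy approach for the question's buffer. load_u32 is a hypothetical helper name, and the caller is assumed to guarantee that at least four bytes are readable.

```cpp
#include <cstdint>
#include <cstring>

// Copy the first four bytes of a byte buffer into a real uint32_t object.
// The numeric result depends on the host's byte order.
inline std::uint32_t load_u32(const std::uint8_t* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

With memory[0] = 1 and memory[1] = 1 and the rest zero, this yields 257 on a little-endian machine and 0x01010000 on a big-endian one.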
As already mentioned you cannot convert uint8_t * to uint32_t * due to strict aliasing rule, you can convert uint32_t * to unsigned char * though:
#include <cstdint>
#include <iostream>
int main(){
uint32_t test[1024/4] = {}; // initialize it!
auto memory = reinterpret_cast<unsigned char *>( test );
memory[0] = 1;
memory[1] = 1;
std::cout << test[0] << std::endl;
}
This code is not portable because of endianness, but at least it does not have UB.
This question completely ignores the concept of endianness. In your example the lower and upper bytes hold the same value, so swapping the byte order makes no difference. When the bytes differ, the number will be unexpectedly wrong on a machine with the other byte order.
As such, there's no portable way to use the resulting number.
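You can do that with a union. As mentioned above, you have to be aware of the endianness of the target device, though in most cases it will be little-endian. There is also some controversy about using unions this way: in C++, reading a union member other than the one most recently written is formally undefined behavior, even though compilers such as GCC document it as supported. Still, it gets the job done, and for some uses that is good enough.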
#include <cstdint>
#include <iostream>
int main(){
union {
uint8_t memory[1024] = {};
uint32_t test[1024/4];
};
memory[0] = 1;
memory[1] = 1;
std::cout << test[0]; // 257
}
uint32_t *test =(uint32_t*) memory;
The (uint32_t*) cast states that the memory pointed to by test should be read as uint32_t.

Can I prevent breaking anti-aliasing rules using this technique?

If I recall correctly, it would be undefined behavior to write to FastKey::key and then read from FastKey::keyValue:
struct Key {
std::array<uint8_t, 6> MACAddress;
uint16_t EtherType;
};
union FastKey {
Key key;
uint64_t keyValue;
};
However, I have been told that if I add char array to the union then the UB is cleared:
union FastKey {
Key key;
uint64_t keyValue;
char fixUB[sizeof(Key)];
};
Is this true?
Edit
As usual my understanding was wrong. With the new information I gathered, I think that I can get the key as a uint64_t value like this:
struct Key {
std::array<uint8_t, 6> MACAddress;
uint16_t EtherType;
};
union FastKey {
Key key;
unsigned char data[sizeof(Key)];
};
inline uint64_t GetKeyValue(FastKey fastKey)
{
uint64_t key = 0;
key |= uint64_t(fastKey.data[0]) << 56; // uint64_t, not size_t: size_t may be only 32 bits wide
key |= uint64_t(fastKey.data[1]) << 48;
key |= uint64_t(fastKey.data[2]) << 40;
key |= uint64_t(fastKey.data[3]) << 32;
key |= uint64_t(fastKey.data[4]) << 24;
key |= uint64_t(fastKey.data[5]) << 16;
key |= uint64_t(fastKey.data[6]) << 8;
key |= uint64_t(fastKey.data[7]) << 0;
return key;
}
I suspect that this will be equally fast as the original version. Feel free to correct me.
Update
@Steve Jessop I implemented a quick benchmark to test the performance of memcpy vs. my solution. I'm not a benchmarking expert, so there may be stupid errors in the code that lead to wrong results. However, if the code is right, then it would seem that memcpy is much slower.
Note: It seems the benchmark is wrong, because the measured time for the fast key is always zero. I'll see if I can fix it.
No, reading a uint64_t if you have a Key object there is still UB. What isn't UB is to read a char, because there's an exception for char in the aliasing rules. Adding the array doesn't propagate the exception to the other types.
The version in the edit seems fine (though I'd use unsigned char), but now it is more complex than just using a reinterpret_cast from Key* to unsigned char* or a memcpy.
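For comparison, a minimal sketch of the memcpy alternative mentioned above. It assumes Key has no padding, which the static_assert checks, and that the caller only needs a value to compare or hash, since the numeric result depends on the host's byte order.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

struct Key {
    std::array<std::uint8_t, 6> MACAddress;
    std::uint16_t EtherType;
};
static_assert(sizeof(Key) == sizeof(std::uint64_t), "Key must be exactly 8 bytes");

// Copy the object representation of Key into a uint64_t.
// No union and no pointer cast is involved, so no aliasing rule is at stake.
inline std::uint64_t GetKeyValue(const Key& key) {
    std::uint64_t value;
    std::memcpy(&value, &key, sizeof(value));
    return value;
}
```

Unlike the shift-based version in the edit, this does not fix a byte order, so the two functions generally return different numbers for the same Key.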

Casting void pointers, depending on data (C++)

Basically, what I want to do is cast a void pointer to a different datatype depending on some variable. For example (the 'cast' variable is just something to get my point across):
void* ptr = some data;
int temp = some data;
int i = 0;
...
if(temp == 32) cast = (uint32*)
else if(temp == 16) cast = (uint16*)
else cast = (uint8*)
i = someArray[*((cast)ptr)];
Is there anything in C++ that can do something like this (since you can't actually assign a variable to be just (uint32*) or something similar)? I apologize if this isn't clear, any help would be greatly appreciated.
The "correct" way:
union MyUnion
{
uint32 asUint32;
uint16 asUint16;
uint8 asUint8;
};
uint32 to_index(int size, MyUnion* ptr)
{
if (size == 32) return ptr->asUint32;
if (size == 16) return ptr->asUint16;
return ptr->asUint8; // any other size is treated as 8 bits, like the question's final else
}
i = someArray[to_index(temp, static_cast<MyUnion*>(ptr))];
[update: fixed dumb typo]
Clearly, boost::variant is the way to go. It already stores a type tag, which makes it impossible to cast to the wrong type, and the compiler helps enforce this. Here is how it works:
typedef boost::variant<uint32_t*, uint16_t*, uint8_t*> v_type;
// this will get a 32bit value, regardless of what is contained. Never overflows
struct PromotingVisitor : boost::static_visitor<uint32_t> {
template<typename T> uint32_t operator()(T* t) const { return *t; }
};
v_type v(some_ptr); // may be either of the three pointers
// automatically figures out what pointer is stored, calls operator() with
// the correct type, and returns the result as an uint32_t.
int i = someArray[boost::apply_visitor(PromotingVisitor(), v)];
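The same idea can be sketched with std::variant and std::visit from the C++17 standard library, which were not available when this answer was written. PtrVariant and load_promoted are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <variant>

// One variant that can hold any of the three pointer types.
using PtrVariant = std::variant<std::uint32_t*, std::uint16_t*, std::uint8_t*>;

// Dereference whichever pointer is stored and widen the result to 32 bits.
inline std::uint32_t load_promoted(const PtrVariant& v) {
    return std::visit([](auto* p) -> std::uint32_t { return *p; }, v);
}
```

The variant remembers which pointer type it was built from, so the caller no longer needs to pass a separate width such as temp.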
A cleaner solution:
uint32 to_index(int temp, void* ptr) {
if (temp == 32) return *((uint32*)ptr);
if (temp == 16) return *((uint16*)ptr);
if (temp == 8) return *((uint8*)ptr);
assert(0); // unsupported width
return 0; // not reached when assertions are enabled
}
i = someArray[to_index(temp,ptr)];
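If the data behind the void pointer might not be suitably aligned, or was not originally stored as the type being read, a variation that copies the bytes out is one option. This is only a sketch: load_as is a hypothetical helper, and throwing on an unsupported width is an assumption, not something the answer above specifies.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Copy sizeof(T) bytes from ptr into a real T object and widen it to 32 bits.
template <typename T>
inline std::uint32_t load_as(const void* ptr) {
    T value;
    std::memcpy(&value, ptr, sizeof(T));
    return value;
}

// Pick the width at run time, as the question's temp variable does.
inline std::uint32_t to_index(int bits, const void* ptr) {
    switch (bits) {
        case 32: return load_as<std::uint32_t>(ptr);
        case 16: return load_as<std::uint16_t>(ptr);
        case 8:  return load_as<std::uint8_t>(ptr);
        default: throw std::invalid_argument("unsupported width");
    }
}
```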
It sounds like maybe you're after a union or, if you're using Visual Studio, a _variant_t. Or maybe typeid would be helpful? (To be honest, I'm not quite sure exactly what you're trying to do.)
As far as the casts, you can cast just about anything to anything -- that's what makes C++ dangerous (and powerful if you're really careful).
Also, note that pointer values are 32-bit or 64-bit on most platforms, so you couldn't store a uint64 in a void* on a 32-bit platform.
Finally, maybe this is what you want:
void* p = whatever;
uint32 x = (uint32)p;
or maybe
uint32 source = 6;
void* p = &source;
uint32 dest = *((uint32*)p);
If you were locked into using a void ptr, and absolutely needed to call [] with different types:
template <typename cast_to>
inline
int get_int_helper(someArray_t someArray, void* ptr) {
return someArray[*static_cast<cast_to*>(ptr)];
}
int get_int(someArray_t someArray, void* ptr, int temp) {
switch ( temp ) {
case 32: return get_int_helper<uint32>(someArray,ptr);
case 16: return get_int_helper<uint16>(someArray,ptr);
default: return get_int_helper<uint8>(someArray,ptr);
}
}
However, as others have pointed out, there are probably better ways to do it. Most likely, whatever array you have doesn't have multiple operator[] overloads, so it doesn't need the different types. In addition, you could use boost::variant to hold a discriminated union of the types, so you wouldn't have to pass temp around.
It seems you want to store the "cast" function that takes a void* and produces an unsigned integer. So, make it a function:
std::map<int, boost::function<unsigned(void*)> > casts; // boost::function takes a function type, not a function pointer type
template <typename T> unsigned cast(void* v) { return *(T*)v; }
casts[32] = cast<uint32>;
casts[16] = cast<uint16>;
casts[8] = cast<uint8>;
casts[128] = MySpecialCastFromDouble;
void* foo = getFoo();
unsigned bar = casts[16](foo);