Real case that will break when accessing non-active union members?

According to cppreference and Purpose of Unions in C and C++, the code below is UB:
// convert char[8] to uint64_t
uint64_t convert(char c[8]) {
union{
uint64_t v;
char c[8];
} u;
for(int i = 0; i < 8; i++) {
u.c[i] = c[i];
}
return u.v;
}
// another example
union U {
uint64_t v;
struct{
uint32_t l;
uint32_t h;
}d;
};
uint64_t setlow(uint64_t v, uint32_t l) {
U u{v};
u.d.l = l;
return u.v;
}
However, even though it's UB, this kind of usage is very convenient, and I find that it actually works with most compilers (GCC/Clang). So I want to know: is there any compiler or implementation in practice that will make the code above break?
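For comparison, both functions in the question can be written without reading an inactive union member. The following is a minimal sketch, not taken from the question or its answers: `convert` copies the bytes with `std::memcpy` (so the result is in host byte order, like the union version), and `setlow` uses masking, which unlike the union version does not depend on byte order. The `const char*` parameter type is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret 8 bytes as a uint64_t in the host's byte order.
// A fixed-size memcpy like this is usually compiled to a plain load.
inline std::uint64_t convert(const char* c) {
    static_assert(sizeof(std::uint64_t) == 8, "uint64_t must be 8 bytes");
    assert(c != nullptr);
    std::uint64_t v;
    std::memcpy(&v, c, sizeof v);
    return v;
}

// Replace the low 32 bits of v with l, independent of byte order.
inline std::uint64_t setlow(std::uint64_t v, std::uint32_t l) {
    return (v & 0xFFFFFFFF00000000ull) | l;
}
```

With this sketch, `setlow(0x1122334455667788, 0xAABBCCDD)` yields `0x11223344AABBCCDD` on any platform, whereas the union version writes whichever half of `v` happens to overlap `d.l`.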

The reason the language hasn’t blessed the compiler extension (based on questionable C wording) is that it doesn’t fit with the rest of the language. Consider:
union U {
float f;
int i;
int operator()();
};
static int putget(float &f,int &i) {
i=0;
f=1;
return i;
}
int U::operator()() {return putget(f,i);}
With the function definition immediately available, GCC and Clang elide the store to i, and yet they return 0 as if that store had not only happened but wasn’t dead.

Related

Dynamically allocate memory to arrays in a union

I'm using union to fill some message fields in a char type message buffer. If the length of the message is constant, it works correctly. See the simplified code sample below.
The problem is that my message can have variable length. Specifically, N will be decided at runtime. Is there a way to keep using unions by dynamically allocating memory for buf?
I'm exploring smart pointers but haven't had any luck so far.
const int N = 4;
struct repeating_group_t {
uint8_t field1;
uint8_t field2;
}rpt_group;
struct message_t
{
union
{
char buf[2 + 2*N];
struct {
uint8_t header;
uint8_t block_len;
std::array<repeating_group_t, N> group;
};
};
};
int main()
{
message_t msg;
msg.header = 0x32;
msg.block_len = 8;
for (auto i = 0; i < N; i++)
{
msg.group[i].field1 = i;
msg.group[i].field2 = 10*i;
}
// msg.buf is correctly filled
return 0;
}

As said in the comments, use std::vector.
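int main() {
// before C++17 use char
std::vector<std::byte> v;
// std::byte has no implicit conversion from integers, so convert explicitly
v.push_back(std::byte{0x32});
v.push_back(std::byte{8});
for (auto i = 0; i < N; i++) {
v.push_back(static_cast<std::byte>(i));
const uint16_t a = 10 * i;
// store uint16_t in big endian: high byte first
v.push_back(static_cast<std::byte>(a >> 8));
v.push_back(static_cast<std::byte>(a & 0xff));
}
}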
For custom datatypes, you could provide your own stream-like or container-like class and overload operator<< or another custom function of your choice for your datatypes.
struct Message{
std::vector<std::byte> v;
Message& push8(uint8_t t) { ... }
// push 16 bits little endian
Message& push16le(uint16_t t) { ... }
// push 16 bits big endian
Message& push16be(uint16_t t) { ... }
// etc
Message& push(const Repeating_group& t) {
v.push_back(static_cast<std::byte>(t.field1));
v.push_back(static_cast<std::byte>(t.field2));
return *this; // return the Message so calls can be chained
}
// etc.
};
int main(){
Message v;
v.push8(0x32).push8(8);
for (...) {
v.push(Repeating_group(i, i * 10));
}
}

You can't have N evaluated at runtime, because both a C array (your buf) and std::array carry their size in their type.
Also, using a union for (de)serialization is not good practice: the size and layout of your structure depend on the alignment requirements of the machine it is compiled for, among other things. You could add a packed attribute to work around that, but you would still have plenty of platform dependency problems.
Regarding variable length - you'd need to write custom (de)serializer that will understand and store/read that size information to recreate that container on the other end.
Where do you want to pass these messages?
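A custom (de)serializer of the kind described above could look roughly like the sketch below. It assumes the wire layout from the question (one header byte, one `block_len` byte holding the number of group bytes, then two bytes per group); the names `Message`, `RepeatingGroup`, `encode`, and `decode` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RepeatingGroup {            // mirrors repeating_group_t from the question
    std::uint8_t field1;
    std::uint8_t field2;
};

struct Message {                   // hypothetical in-memory form
    std::uint8_t header;
    std::vector<RepeatingGroup> groups;   // length known only at runtime
};

// Assumed wire layout: header, block_len (bytes of group data), then the groups.
inline std::vector<std::uint8_t> encode(const Message& m) {
    assert(m.groups.size() * 2 <= 255);   // block_len is a single byte
    std::vector<std::uint8_t> out;
    out.reserve(2 + 2 * m.groups.size());
    out.push_back(m.header);
    out.push_back(static_cast<std::uint8_t>(2 * m.groups.size()));
    for (const RepeatingGroup& g : m.groups) {
        out.push_back(g.field1);
        out.push_back(g.field2);
    }
    return out;
}

// Returns false if the buffer is truncated or block_len is inconsistent.
inline bool decode(const std::vector<std::uint8_t>& in, Message& m) {
    if (in.size() < 2) return false;
    const std::size_t block_len = in[1];
    if (block_len % 2 != 0 || in.size() != 2 + block_len) return false;
    m.header = in[0];
    m.groups.clear();
    for (std::size_t i = 2; i < in.size(); i += 2) {
        m.groups.push_back(RepeatingGroup{in[i], in[i + 1]});
    }
    return true;
}
```

Because every field is written byte by byte, the encoded buffer does not depend on struct padding or alignment, and the receiver recovers the group count from `block_len` instead of a compile-time N.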

Concise way to initialize a block of memory with magic numbers

A few examples of what I'm referring to:
typedef struct SOME_STRUCT {
unsigned int x1;
unsigned int x2;
unsigned int x3;
unsigned int x4;
// What I expected would work, but doesn't; the 2nd parameter gets
// turned into an 8-bit quantity at some point within memset
SOME_STRUCT() { memset( this, 0xFEEDFACE, sizeof( *this ) ); }
// Something that worked, but seems hokey/hackish
SOME_STRUCT() {
unsigned int *me = (unsigned int *)this;
for( int ii = 0; ii < sizeof(*this)/sizeof(*me); ++ii ) {
me[ii] = 0xFEEDFACE;
}
}
// The far-more-verbose-but-C++-way-of-doing-it
// This works, but doesn't lend itself very well
// to being a drop-in way to pull this off on
// any struct.
SOME_STRUCT() : x1( 0xFEEDFACE )
, x2( 0XFEEDFACE )
, x3( 0XFEEDFACE )
, x4( 0XFEEDFACE ) {}
// This would work, but I figured there would be a standard
// function that would alleviate the need to do it myself
SOME_STRUCT() { my_memset( this, 0xFEEDFACE, sizeof(*this) ); }
} SOME_STRUCT;
I can't use valgrind here, and my options are limited as far as various debugging libraries I have access to -- which is why I'm doing it myself for this one-off case.
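The `my_memset` mentioned in the last option is not shown in the question. One possible shape for it, as a sketch only: copy the 32-bit pattern into the destination word by word with `std::memcpy`, which avoids casting the object to `unsigned int*`. The names `fill_pattern32` and `fill_magic` are hypothetical, and the pattern lands in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Fill `size` bytes at `dst` with a repeating 32-bit pattern (host byte order).
// A trailing partial word receives the leading bytes of the pattern.
inline void fill_pattern32(void* dst, std::uint32_t pattern, std::size_t size) {
    assert(dst != nullptr || size == 0);
    unsigned char* p = static_cast<unsigned char*>(dst);
    while (size >= sizeof pattern) {
        std::memcpy(p, &pattern, sizeof pattern);
        p += sizeof pattern;
        size -= sizeof pattern;
    }
    if (size > 0) {
        std::memcpy(p, &pattern, size);   // 1 to 3 leftover bytes
    }
}

// Convenience wrapper that refuses types for which a raw byte fill is unsafe.
template <typename T>
void fill_magic(T& obj, std::uint32_t pattern) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte fill is only safe for trivially copyable types");
    fill_pattern32(&obj, pattern, sizeof obj);
}
```

Calling `fill_magic(s, 0xFEEDFACE)` on a struct of four 32-bit unsigned members sets each member to `0xFEEDFACE`; if the struct has padding, the padding bytes are overwritten with pattern bytes as well.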

Here’s a partial example of using std::generate() safely:
#include <algorithm>
struct Wizard {
size_t i;
static unsigned char magic[4];
Wizard() : i(0) {}
unsigned char operator()() {
size_t j = i++;
i %= sizeof(magic); // Wrap the index so that j never runs past the end of magic.
return magic[j];
}
};
unsigned char Wizard::magic[4] = {0xDE,0xAD,0xBE,0xEF};
std::generate(reinterpret_cast<unsigned char*>(this),
reinterpret_cast<unsigned char*>(this) + sizeof(*this),
Wizard());
(Of course, the endianness may or may not be right, depending on how you’re looking and what you’re expecting to see when you do!)

I would declare this constructor:
SOME_STRUCT( unsigned int magic) : x1 (magic), x2 (magic), x3 (magic), x4 (magic) {}
This is very similar to your third option, and seems to be the natural C++ way of doing it.

A point not made by others is this:
I think it is unsafe to do this for non-POD types. Ironically, adding the initialization to a constructor makes the type non-POD. Therefore I propose a freestanding function that checks for POD-ness statically (the sample uses C++0x type_traits, but you could use Boost as well):
#include <iostream>
#include <type_traits>
template <typename T>
typename std::enable_if<std::is_pod<T>::value>::type FeedFace(T& v) // returns void; only exists for POD types
{
static const unsigned char MAGIC[] = { 0xFE, 0xED, 0xFA, 0xCE };
unsigned char *me = reinterpret_cast<unsigned char *>(&v);
for( size_t ii = 0; ii < sizeof(T); ++ii )
me[ii] = MAGIC[ii % sizeof(MAGIC)];
}
struct Pod { char data[37]; };
struct NonPod : Pod { virtual ~NonPod() { } };
int main()
{
Pod pod;
FeedFace(pod);
NonPod nonpod;
// FeedFace(nonpod); // fails to compile (no matching function call)
return 0;
}

I assume this allows for nasty hacky stuff, like this:
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;
int main(void)
{
struct SOME_STRUCT {
unsigned int x1;
unsigned int x2;
unsigned int x3;
unsigned int x4;
} foo;
fill(reinterpret_cast<unsigned int *>(&foo),
reinterpret_cast<unsigned int *>(&foo) + sizeof(foo) / sizeof(unsigned int),
(unsigned int)0xDEADBEEF);
cout << foo.x1 << endl;
cout << foo.x2 << endl;
cout << foo.x3 << endl;
cout << foo.x4 << endl;
return (0);
}
Basically abusing std::fill() with pointer casts.

You could reinterpret_cast this to a char* and then use std::generate with a generator (a function object, not a predicate) that rotates through the byte values you care about. If I get time later I'll try to sketch the code.
Also have you considered for example an LD_PRELOAD memory checking malloc library?

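Here's another hacky method. Be aware that it only produces the pattern if the copy runs forward byte by byte so that the value propagates; memmove is specified to behave as if it copied through a temporary buffer, so only x2 is guaranteed to receive the value.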
SOME_STRUCT() {
x1 = 0xFEEDFACE;
memmove(&(this->x2), this, sizeof(*this)-sizeof(x1));
}

Even if your memset() attempt did work, it makes an assumption about the structure packing and is therefore not guaranteed to be correct. There is no programmatic way to iterate through the members of a struct and assign them in C or C++. You will therefore need to be content with assigning the members individually. Having said that, if you feel that you are comfortable with the memory layout of the structure and don't need to worry about portable code, you can just as easily initialize it with a for loop.
unsigned int i, *ar = (unsigned int *)&my_struct;
for (i = 0; i < sizeof(my_struct) / sizeof(unsigned int); i++) {
ar[i] = 0xdeadbeef;
}

Alternative array representation

I'm facing a problem in C++ for which I currently don't have an elegant solution. I'm receiving data in the following format:
typedef struct {
int x;
int y;
int z;
}Data3D;
vector<Data3D> v; // the way data is received (can be modified)
But the functions that do the computations receive parameters like this:
Compute(int *x, int *y, int *z, unsigned nPoints)
{...}
Is there a way to modify the way data is received Data3D so that the memory representation would change from:
XYZXYZXYZ
to
XXXYYYZZZ
What I'm looking for is some way of populating a data structure in a similar way we populate an array but that has the representation above (XXXYYYZZZ). Any custom data structures are welcome.
So I want to write something like (in the above example):
v[0].x = 1;
v[0].y = 2;
v[0].z = 0;
v[1].x = 6;
v[1].y = 7;
v[1].z = 5;
and to have the memory representation below
1,6...2,7....0,5
1,6 is the beginning of the x array
2,7 is the beginning of the y array
0,5 is the beginning of the z array
I know that this can be solved by using a temporary array but I'm interested to know if there are other methods for doing this.
Thanks,
Iulian
LATER EDIT:
Since there are some solutions that change only the declaration of Compute function without changing its code - this should be taken into account also. See the answers related to the solution that involves using an iterator.

Iterator-based solution
An elegant solution would be to make Compute() accept iterators instead of pointers. The iterators you provide will have an adequate ++ operator (see boost::iterator for an easy way to build them)
Compute(MyIterator x, MyIterator y, MyIterator z);
There are normally very few changes to make to the function body, since *x, x[i] or ++x will be handled by MyIterator to point to the right memory location.
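As a rough illustration of the iterator idea, the sketch below uses a pointer-to-member so that one iterator type can walk x, y, or z across a contiguous `Data3D` array. `MemberIter` is a hypothetical name, and the `Compute` body shown is only a stand-in (it sums all coordinates), since the real function body is not given in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Data3D { int x; int y; int z; };

// Hypothetical iterator: visits one int member of consecutive Data3D elements.
class MemberIter {
public:
    MemberIter(Data3D* p, int Data3D::* member) : p_(p), member_(member) {
        assert(member != nullptr);
    }
    int& operator*() const { return p_->*member_; }
    int& operator[](std::ptrdiff_t i) const { return p_[i].*member_; }
    MemberIter& operator++() { ++p_; return *this; }
    MemberIter operator++(int) { MemberIter old(*this); ++p_; return old; }
private:
    Data3D* p_;
    int Data3D::* member_;
};

// Stand-in for Compute: the body uses only *, [] and ++, as it would with int*.
template <typename It>
long Compute(It x, It y, It z, unsigned nPoints) {
    long sum = 0;
    for (unsigned i = 0; i < nPoints; ++i) {
        sum += x[i] + *y + *z;
        ++y;
        z++;
    }
    return sum;
}
```

The same template accepts plain `int*` arguments, so existing callers that already hold separate arrays keep working, while `Compute(MemberIter(v.data(), &Data3D::x), MemberIter(v.data(), &Data3D::y), MemberIter(v.data(), &Data3D::z), n)` reads straight from the interleaved vector.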
Quick'n Dirty solution
A less elegant but more straightforward solution is to hold your Data in the following struct
typedef struct {
std::vector<int> x;
std::vector<int> y;
std::vector<int> z;
}DataArray3D;
When receiving the data fill your struct like
void Receive(const Data3D& data, DataArray3D& array)
{
array.x.push_back(data.x);
array.y.push_back(data.y);
array.z.push_back(data.z);
}
and call Compute like this (Compute itself is unchanged)
Compute(&array.x[0], &array.y[0], &array.z[0], array.x.size());

You could of course change your Compute function.
I assume that all operations done on your int* in Compute are dereference and increment operations.
I did not test it, but you could pass in a structure like this:
struct IntIterator
{
int* m_currentPos;
IntIterator(int* startPos):m_currentPos(startPos){};
IntIterator& operator++()
{
m_currentPos += 3;
return *this;
}
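IntIterator operator++(int)
{
IntIterator old(*this); // postfix returns the value before the increment
m_currentPos += 3;
return old;
}
int& operator*() // return a reference so that *it = value works like it does for int*
{
return *m_currentPos;
}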
int& operator[](const int index)
{
return m_currentPos[index*3];
}
};
And initialize it with this
std::vector<Data3D> v;
IntIterator it(&v[0].x);
Now all you need to do is change the type of your Compute function's arguments, and it should work. Note that the stride of 3 assumes that Data3D has no padding between its members. If, of course, other pointer arithmetic is used, then it gets more complex.

Reasonably elegant would be (not compiled/tested):
struct TempReprPoints
{
TempReprPoints(size_t size)
{
x.reserve(size); y.reserve(size); z.reserve(size);
}
TempReprPoints(const vector<Data3D> &v)
{
x.reserve(v.size()); y.reserve(v.size()); z.reserve(v.size());
for (size_t i = 0; i < v.size(); ++i ) push_back(v[i]);
}
void push_back(const Data3D& data)
{
x.push_back(data.x); y.push_back(data.y); z.push_back(data.z);
}
int* getX() { return &x[0]; }
int* getY() { return &y[0]; }
int* getZ() { return &z[0]; }
size_t size() { return x.size(); }
std::vector<int> x;
std::vector<int> y;
std::vector<int> z;
};
So you can fill it with a loop or even try to make the std::back_inserter work with it.
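Making std::back_inserter work with such a type needs little more than a `value_type` typedef next to `push_back`. The sketch below shows this on a reduced container; `PointColumns` and `to_columns` are hypothetical names, not part of the answer above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct Data3D { int x; int y; int z; };

// Reduced structure-of-arrays container with what std::back_inserter needs:
// a value_type typedef and a push_back member.
struct PointColumns {
    typedef Data3D value_type;
    void push_back(const Data3D& d) {
        x.push_back(d.x);
        y.push_back(d.y);
        z.push_back(d.z);
    }
    std::vector<int> x, y, z;
};

// Split an interleaved vector into three contiguous columns.
inline PointColumns to_columns(const std::vector<Data3D>& v) {
    PointColumns cols;
    std::copy(v.begin(), v.end(), std::back_inserter(cols));
    assert(cols.x.size() == v.size());
    return cols;
}
```

After `PointColumns c = to_columns(v);`, the pointers `c.x.data()`, `c.y.data()`, and `c.z.data()` can be handed to the unchanged Compute function.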

In order to get the syntax you want, you could use something like this.
struct Foo {
vector<int> x;
vector<int> y;
vector<int> z;
struct FooAccessor {
FooAccessor(Foo & f, int i) : x(f.x[i]), y(f.y[i]), z(f.z[i]) {}
int &x, &y, &z;
};
FooAccessor operator[](int i) {
return FooAccessor(*this, i);
}
};
int main() {
Foo f;
f.x.resize(10);
f.y.resize(10);
f.z.resize(10);
f[0].x = 1;
f[1].y = 2;
f[2].z = 3;
for (size_t p = 0; p < 10; ++p) {
cout << f.x[p] << "," << f.y[p] << "," << f.z[p] << endl;
}
}
I'd consider this an ugly solution - changing the way you access your data would likely be "better".

float bits and strict aliasing

I am trying to extract the bits from a float without invoking undefined behavior. Here is my first attempt:
unsigned foo(float x)
{
unsigned* u = (unsigned*)&x;
return *u;
}
As I understand it, this is not guaranteed to work due to strict aliasing rules, right? Does it work if I take an intermediate step with a character pointer?
unsigned bar(float x)
{
char* c = (char*)&x;
unsigned* u = (unsigned*)c;
return *u;
}
Or do I have to extract the individual bytes myself?
unsigned baz(float x)
{
unsigned char* c = (unsigned char*)&x;
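return (unsigned)c[0] | (unsigned)c[1] << 8 | (unsigned)c[2] << 16 | (unsigned)c[3] << 24; // cast first so the shifts are done on unsigned, not on promoted int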
}
Of course this has the disadvantage of depending on endianness, but I could live with that.
The union hack is definitely undefined behavior, right?
unsigned uni(float x)
{
union { float f; unsigned u; };
f = x;
return u;
}
Just for completeness, here is a reference version of foo. Also undefined behavior, right?
unsigned ref(float x)
{
return (unsigned&)x;
}
So, is it possible to extract the bits from a float (assuming both are 32 bits wide, of course)?
EDIT: And here is the memcpy version as proposed by Goz. Since many compilers do not support static_assert yet, I have replaced static_assert with some template metaprogramming:
template <bool, typename T>
struct requirement;
template <typename T>
struct requirement<true, T>
{
typedef T type;
};
unsigned bits(float x)
{
requirement<sizeof(unsigned)==sizeof(float), unsigned>::type u;
memcpy(&u, &x, sizeof u);
return u;
}

About the only way to truly avoid any issues is to memcpy.
unsigned int FloatToInt( float f )
{
static_assert( sizeof( float ) == sizeof( unsigned int ), "Sizes must match" );
unsigned int ret;
memcpy( &ret, &f, sizeof( float ) );
return ret;
}
Because you are memcpying a fixed amount the compiler will optimise it out.
That said the union method is VERY widely supported.
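The memcpy approach works in both directions. The following sketch adds the inverse conversion and one small use of the extracted bits; it assumes a 32-bit IEEE 754 float (checked at compile time) and that float and integer share the same byte order, which holds on mainstream platforms. The function names are hypothetical.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float assumed");

// Copy the object representation of f into an integer.
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Inverse of float_to_bits.
inline float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// True if the sign bit is set, including for -0.0f, which compares equal to 0.0f.
inline bool sign_bit_set(float f) {
    return (float_to_bits(f) >> 31) != 0;
}
```

Under these assumptions, `assert(float_to_bits(1.0f) == 0x3F800000u)` holds, and `sign_bit_set(-0.0f)` is true even though `-0.0f == 0.0f`.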

The union hack is definitely undefined behavior, right?
Yes and no. According to the standard, it is definitely undefined behavior. But it is such a commonly used trick that GCC and MSVC and, as far as I know, every other popular compiler explicitly guarantee that it is safe and will work as expected.

The following does not violate the aliasing rule, because it has no use of lvalues accessing different types anywhere
template<typename B, typename A>
B noalias_cast(A a) {
union N {
A a;
B b;
N(A a):a(a) { }
};
return N(a).b;
}
unsigned bar(float x) {
return noalias_cast<unsigned>(x);
}

If you really want to be agnostic about the size of the float type and just return the raw bits, do something like this:
void float_to_bytes(char *buffer, float f) {
union {
float x;
char b[sizeof(float)];
};
x = f;
memcpy(buffer, b, sizeof(float));
}
Then call it like so:
float a = 12345.6789;
char buffer[sizeof(float)];
float_to_bytes(buffer, a);
This technique will, of course, produce output specific to your machine's byte ordering.
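If the bytes need to come out in a fixed order regardless of the machine, one option is to extract the bit pattern first and then emit it with shifts. This is a sketch under the assumption of a 32-bit IEEE 754 float; `float_to_be_bytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Write the bit pattern of f into buffer[0..3], most significant byte first.
inline void float_to_be_bytes(unsigned char* buffer, float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    assert(buffer != nullptr);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    buffer[0] = static_cast<unsigned char>(bits >> 24);
    buffer[1] = static_cast<unsigned char>((bits >> 16) & 0xFF);
    buffer[2] = static_cast<unsigned char>((bits >> 8) & 0xFF);
    buffer[3] = static_cast<unsigned char>(bits & 0xFF);
}
```

Because the shifts operate on the integer value rather than on memory, the output for `1.0f` is `3F 80 00 00` on both little-endian and big-endian hosts.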

C++ union array and vars?

There's no way to do something like this in C++, is there?
union {
{
Scalar x, y;
}
Scalar v[2];
};
Where x == v[0] and y == v[1]?

Since you are using C++ and not C, and since they are of the same type, why not just make x a reference to v[0] and y a reference to v[1]?

How about
union {
struct {
int x;
int y;
};
int v[2];
};
edit:
union a {
struct b { int first, second; } bee;
int v[2];
};
Ugly, but that's more accurate

Try this:
template<class T>
struct U1
{
U1();
T v[2];
T& x;
T& y;
};
template<class T>
U1<T>::U1()
:x(v[0])
,y(v[1])
{}
int main()
{
U1<int> data;
data.x = 1;
data.y = 2;
}

I've used something like this before. I'm not sure it's 100% OK by the standard, but it seems to be OK with any compilers I've needed to use it on.
struct Vec2
{
float x;
float y;
float& operator[](int i) { return *(&x+i); }
};
You can add bounds checking etc. to operator[] if you want (you probably should), and you can provide a const version of operator[] too.
If you're concerned about padding (and don't want to add the appropriate platform specific bits to force the struct to be unpadded) then you can use:
struct Vec2
{
float x;
float y;
float& operator[](int i) {
assert(i>=0);
assert(i<2);
return (i==0)?x:y;
}
const float& operator[](int i) const {
assert(i>=0);
assert(i<2);
return (i==0)?x:y;
}
};

I was looking for a similar thing and eventually came up with a solution.
I was looking to have a data storage object that I could use as both an array of values and as individual values (for end-user flexibility in writing Arduino libraries).
Here is what I came up with:
class data{
float _array[3];
public:
float& X = _array[0];
float& Y = _array[1];
float& Z = _array[2];
float& operator[](int index){
if (index >= 3) return _array[0]; //Make this action whatever you want...
return _array[index];
}
float* operator&(){return _array;}
};
int main(){
data Test_Vector;
Test_Vector[0] = 1.23; Test_Vector[1] = 2.34; Test_Vector[2] = 3.45;
cout<<"Member X = "<<Test_Vector.X;
cout<<"Member Y = "<<Test_Vector.Y;
cout<<"Member Z = "<<Test_Vector.Z;
float* vector_array = &Test_Vector;
cout<<"Array = {"<<vector_array[0]<<", "<<vector_array[1]<<", "<<vector_array[2]<<"}";
}
Thanks to operator overloading, we can use the data object as if it were an array, and we can use it for pass-by-reference in function calls (just like an array)!
If someone with more C++ experience has a better way of applying this end product, I would love to see it!
EDIT: Changed up the code to be more cross-platform friendly

Given your example:
union
{
struct
{
Scalar x, y;
};
Scalar v[2];
};
As others have noted, in general, the standard does not guarantee that there will be no padding between x and y, and actually compilers inserting padding in structures is pretty common behavior.
On the other hand, with solutions like:
struct U
{
U(); // declared here so the out-of-class definition below is valid
int v[2];
int& x;
int& y;
};
U::U()
: x(v[0])
, y(v[1])
{}
what I mainly don't like is that I have to mention x and y twice. With more than a few elements (say 10), this becomes much less readable and harder to maintain: if you want to change the order of x and y, you have to change the indexes in the constructor too (not strictly mandatory, but otherwise the order in memory wouldn't match the order of the fields, which is not recommended). Also, U can no longer be a POD, since it needs a user-defined constructor. And finally, the x and y references consume additional memory.
Hence, the (acceptable for me) compromise I've come up with is:
struct Point
{
enum CoordType
{
X,
Y,
COUNT
};
int coords[CoordType::COUNT];
};
typedef Point::CoordType PtCoord;
With this you can then do:
Point p;
for ( int i = 0; i < PtCoord::COUNT; i++ )
p.coords[i] = 100;
std::cout << p.coords[PtCoord::X] << " " << p.coords[PtCoord::Y] << std::endl;
// 100 100
A bit sophisticated but I prefer this over the references suggestion.
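A further variation on the same trade-off, offered only as a sketch and not taken from the answer above: keep the array as the only storage and expose the names through accessor functions. This avoids the extra memory of reference members and leaves the type trivially copyable, at the cost of writing `p.x()` instead of `p.x`. The `Vec2` shown here is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// The array is the only storage; x() and y() are named views of its elements.
struct Vec2 {
    int v[2];
    int& x() { return v[0]; }
    int& y() { return v[1]; }
    const int& x() const { return v[0]; }
    const int& y() const { return v[1]; }
    int& operator[](int i) { assert(i >= 0 && i < 2); return v[i]; }
    const int& operator[](int i) const { assert(i >= 0 && i < 2); return v[i]; }
};

// Expected to hold on common compilers: no storage beyond the array itself.
static_assert(sizeof(Vec2) == sizeof(int[2]), "no storage beyond the array");
static_assert(std::is_trivially_copyable<Vec2>::value, "still copyable with memcpy");
```

Usage is `Vec2 p = {{0, 0}}; p.x() = 1; p[1] = 2;`, after which `p.y()` reads 2.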

Depending on what "Scalar" is, yes, you can do that in C++. The syntax is almost exactly (maybe even exactly exactly, but I'm rusty on unions) what you wrote in your example. It's the same as C, except there are restrictions on the types that can be in the unions (IIRC they must have a default constructor). Here's the relevant Wikipedia article.

Anonymous unions are standard C++, and anonymous structs are a widely supported compiler extension (GCC, Clang, and MSVC accept them); both export their members to the enclosing scope, so you can do this:
typedef int Scalar;
struct Vector
{
union
{
struct
{
Scalar x, y;
};
Scalar v[2];
};
};
