C++ Undefined behaviour with unions

I was just reading about anonymous structs, how they aren't standard, and how a common use case for them is undefined behaviour...
This is the basic case:
struct Point {
union {
struct {
float x, y;
};
float v[2];
};
};
So writing to x and then reading from v[0] would be undefined, in that you would expect them to be the same but they may not be.
I'm not sure whether this is in the standard, but what about unions whose members have the same type?
union{ float a; float b; };
Is it undefined to write to a and then read from b?
That is to say, does the standard say anything about the binary representation of arrays and sequential variables of the same type?

The standard says that reading from any element in a union other
than the last one written is undefined behavior. In theory, the
compiler could generate code which somehow kept track of the
reads and writes, and triggered a signal if you violated the
rule (even if the two are the same type). A compiler could also
use the fact for some sort of optimization: if you write to a
(or x), it can assume that you do not read b (or v[0])
when optimizing.
In practice, every compiler I know supports this, if the union
is clearly visible, and there are cases in many (most? all?)
compilers where even legal use will fail if the union is not
visible, e.g.:
union U { int i; float f; };
int f( int* pi, float* pf ) { int r = *pi; *pf = 3.14159f; return r; }
// ...
U u;
u.i = 1;
std::cout << f( &u.i, &u.f );
I've actually seen this fail with g++, although according to the
standard, it is perfectly legal.
Also, even if the compiler supports writing to Point::x and
reading from Point::v[0], there's no guarantee that Point::y
and Point::v[1] even have the same physical address.

The standard requires that in a union "[e]ach data member is allocated as if it were the sole member of a struct." (9.5)
It also requires that struct { float x, y; } and float v[2] have the same internal representation (9.2), and thus you could safely reinterpret cast one as the other.
Taken together these two rules guarantee that the union you describe will function provided that it is genuinely written to memory. However, because the standard only requires that the last data member written be valid it's theoretically possible to have an implementation that fails if the union is only used as a local variable. I'd be amazed if that ever actually happens, however.
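The layout assumption this answer relies on can at least be checked at compile time. The following is a minimal sketch, assuming a named struct XY (a hypothetical stand-in for the anonymous struct in the question). It only documents the layout; it does not make reading an inactive union member well-defined.

```cpp
#include <cstddef>  // offsetof

// Hypothetical named equivalent of the anonymous struct { float x, y; }.
struct XY { float x, y; };

// Compile-time checks that XY is laid out like float[2] on this
// implementation. If any of these fail, the x/v[0], y/v[1] aliasing
// would not even line up in memory.
static_assert(sizeof(XY) == sizeof(float[2]), "unexpected padding in XY");
static_assert(offsetof(XY, x) == 0, "x is not at offset 0");
static_assert(offsetof(XY, y) == sizeof(float), "y does not directly follow x");
```

If the assertions hold, the point made in the earlier answer about Point::y and Point::v[1] having different addresses does not apply on that particular implementation.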

I did not get why you have used float v[2];
The simple union for a point structure can be defined as:
union{
struct {
float a;
float b;
};
} Point;
You can access the values in the union as:
Point.a = 10.5;
Point.b = 12.2; //example

Related

Type punning two types from different third party libraries without union

I've read that using unions for type punning is actually undefined behavior in C++ and I was wondering how would you type pun instead?
As an example I used a union to type pun two types from two third party libraries with identical layout like this (libAQuaternion and libBQuaternion are the types of those third party libraries which I can't change):
struct libAQuaternion {
double x, y, z, w;
};
void libAFunc(libAQuaternion &p) {
p.x = p.y = p.z = p.w = 1.;
}
struct libBQuaternion {
double x, y, z, w;
};
void libBFunc(libBQuaternion &p) {
p.x = p.y = p.z = p.w = 2.;
}
union myQuat {
libAQuaternion a;
libBQuaternion b;
};
int main() {
myQuat q;
libAFunc(q.a);
libBFunc(q.b);
}
What would be the standard conforming best solution to this?

What would be the standard conforming best solution to this?
Write a function to convert from one quaternion to the other.
libBQuaternion convert(const libAQuaternion &Quat) {
return{Quat.x, Quat.y, Quat.z, Quat.w};
}
libAQuaternion convert(const libBQuaternion &Quat) {
return{Quat.x, Quat.y, Quat.z, Quat.w};
}
// or template if you want to
template<typename T, typename U>
T convertTo(U &&Quat) {
return{Quat.x, Quat.y, Quat.z, Quat.w};
}
Any optimizer should be able to optimize this away completely, so there should be no performance penalty.
But this will be a problem if there is a function taking one such class by lvalue ref. You would need to create a new object of the appropriate class, pass that and then reassign the correct values to the original struct. I guess you could make a function for this, but IMO the cleanest way would be to change the signature of the function to take the individual values by lvalue ref, but that is not always possible.
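The copy-in, call, copy-back approach described above could be wrapped in a helper along these lines. This is only a sketch: callAsB and callAsBDemo are hypothetical names, and the two structs and libBFunc are stand-ins copied from the question.

```cpp
#include <cassert>

// Stand-ins for the third-party types and function from the question.
struct libAQuaternion { double x, y, z, w; };
struct libBQuaternion { double x, y, z, w; };
void libBFunc(libBQuaternion &p) { p.x = p.y = p.z = p.w = 2.; }

// Hypothetical helper: run a function that takes libBQuaternion& on a
// libAQuaternion by converting into a temporary and converting back.
template <typename Func>
void callAsB(libAQuaternion &a, Func &&func) {
    libBQuaternion tmp{a.x, a.y, a.z, a.w};          // convert A -> B
    func(tmp);                                       // callee mutates the temporary
    a = libAQuaternion{tmp.x, tmp.y, tmp.z, tmp.w};  // convert B -> A
}

// Small usage check.
inline void callAsBDemo() {
    libAQuaternion q{1., 0., 0., 0.};
    callAsB(q, libBFunc);
    assert(q.x == 2. && q.y == 2. && q.z == 2. && q.w == 2.);
}
```

No object is ever accessed through the wrong type here, so the strict-aliasing question does not arise; the cost is two small copies that an optimizer can often remove.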
There is just no way of doing type punning in C++ without invoking UB.

C++ doesn't allow type punning. Most of the time.
What you wrote is perfectly legal, but there is one potential hazard.
The two quaternions are standard layout classes and their common initial sequence is their entirety. It is thus legal to read a member of one through the other in a union:
myQuat q = libAQuaternion{1, 0, 0, 0};
std::cout << q.b.x; // legal
We then note that the quaternions can only be written to either by a builtin/trivial assignment or by placement new.
Using a builtin/trivial assignment on an inactive member (or on one of its members, recursively) implicitly begins the inactive member's lifetime.
Using placement new will also begin the member's lifetime.
Thus, upon writing to a quaternion, either its lifetime has already begun or it will begin.
Reading and writing are together called accessing, and the strict-aliasing rule forbids accessing an object through another type, which is what forbids type punning. But since we have just shown that accessing either quaternion is well-defined, this is the one exception where type punning is indeed legal.
The one hazard is when the library function writes only partially to the quaternion:
void someFunc(libBQuaternion& q)
{
q.x = 1;
}
myQuat q = libAQuaternion{1, 0, 0, 0};
someFunc(q.b);
std::cout << q.a.y; // UB
Unfortunately, q.a.y is uninitialized and therefore reading it is undefined behaviour.
However, given all the previous rules, and given that the reason for having uninitialized variables is efficiency, it is quite unlikely that compilers will take advantage of the UB and "miscompile".

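#include <cstring> // std::memcpy
template<class D, class S>
D bitcpy( S const* s ){
static_assert( sizeof(S)>=sizeof(D), "source must be at least as large as destination" );
D r;
std::memcpy( &r, s, sizeof(D) ); // copy the bytes of *s into a fresh D
return r;
}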
union myQuat {
libAQuaternion a;
libBQuaternion b;
void AsA(){
a = bitcpy<libAQuaternion>(&b);
}
void AsB(){
b = bitcpy<libBQuaternion>(&a);
}
};
You have to keep track of whether there is an A or a B in the union. To switch, call AsA or AsB; this is a no-op at runtime.
I believe the union rules allow activating members via assignment; if I misunderstand the standard or the situation (these are POD types, right?) things get a bit trickier. But I think that doesn't apply here.

The safest way to type pun is to memcpy from type A to type B. On many platforms memcpy is an intrinsic, meaning it is implemented by the compiler and may therefore well be optimized away.
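A minimal sketch of the memcpy approach, applied to the two quaternion types from the question. The helper name punCopy is hypothetical, and the static_asserts encode the assumptions that both types have the same size and are trivially copyable.

```cpp
#include <cassert>
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Stand-ins for the third-party types from the question.
struct libAQuaternion { double x, y, z, w; };
struct libBQuaternion { double x, y, z, w; };

// Hypothetical helper: build a To whose bytes are a copy of `from`.
template <typename To, typename From>
To punCopy(const From &from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "memcpy requires trivially copyable types");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // byte-wise copy, no aliasing access
    return to;
}

// Small usage check.
inline void punCopyDemo() {
    libAQuaternion a{1., 2., 3., 4.};
    libBQuaternion b = punCopy<libBQuaternion>(a);
    assert(b.x == 1. && b.y == 2. && b.z == 3. && b.w == 4.);
}
```

In C++20, std::bit_cast from the header <bit> provides the same operation as a library function.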

C++ - unions containing arrays

I have a C++ union which contains an array, and I would like to access each element of the array using a set of unique identifiers. (This may seem like a strange thing to want. In my application I have a union which contains pointers to cells in 8 directions, which represent how some object can move between cells. Sometimes it is convenient to write algorithms which work with array indices; however, that is not convenient for an end user, who would prefer to work with named identifiers rather than less obvious indices.)
Example:
union vector
{
double x;
double y;
double data[2];
};
I believe that x and y "are the same thing", so really one would have to write:
struct v
{
double x, y;
};
union vector
{
v data_v_format;
double data_arr_format[2];
};
Which you then use as:
vector v1;
v1.data_arr_format[0] = v1.data_v_format.y; // copy y component to x
Unfortunately this adds an ugly layer of syntax to the union. Is there any way to accomplish the original task as specified by the syntax:
union vector
{
double x;
double y;
double data[2];
};
Where x is equivalent to data[0] and y is equivalent to data[1]?
I could write a class to do this, where the "logically named identifiers become functions, returning a single component of the array" - but is there a better way?

Anyway, even if you find a way, reading from an inactive union field, i.e. any field other than the last one written, is UB. This actually means that the often-seen example of converting an IP address between 4 octets and an int using a union is illegal.
You can use accessors:
struct vec
{
double data[2];
double& x() {return data[0];}
double& y() {return data[1];}
};
Alternatively, you can look into property implementations in C++. These create a proxy object, accesses to which are redirected to specific array elements.
Yet another way is to use references, but this will increase the size of your struct (typically by one pointer per reference):
struct vec
{
double data[2];
double& x = data[0];
double& y = data[1];
};
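Besides the size increase, the reference-member variant has two further consequences that are easy to miss. The sketch below uses the hypothetical names vecRef and vecAcc for the two variants above: reference members delete the implicit copy assignment operator, and the implicit copy constructor rebinds the copy's references to the original's array.

```cpp
#include <cassert>
#include <type_traits>  // std::is_copy_assignable

struct vecRef {  // reference-member variant
    double data[2];
    double& x = data[0];
    double& y = data[1];
};

struct vecAcc {  // accessor variant
    double data[2];
    double& x() { return data[0]; }
    double& y() { return data[1]; }
};

// A class with reference members has a deleted implicit copy assignment.
static_assert(!std::is_copy_assignable<vecRef>::value, "vecRef is not assignable");
static_assert(std::is_copy_assignable<vecAcc>::value, "vecAcc is assignable");

inline void copyHazardDemo() {
    vecRef a;
    a.data[0] = 1.0;
    a.data[1] = 2.0;
    vecRef b = a;  // implicit copy: b.x refers to a.data[0], not b.data[0]
    b.x = 5.0;
    assert(a.data[0] == 5.0);  // the original was modified
    assert(b.data[0] == 1.0);  // the copy's own array was not
}
```

The accessor variant has neither problem, which is one reason to prefer it unless a user-defined copy constructor and assignment operator are added.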

Although not allowed in (standard) C++, in C (since C11) you can use an anonymous struct:
// not standard C++
union vector {
struct {
double x;
double y;
};
double arr[2];
};
Anonymous structs are also supported by some C++ compilers (including GNU, MSVC and Clang) as an extension to the language. In standard C++, you'll need to settle for unnamed struct:
union vector {
struct {
double x;
double y;
} data;
double arr[2];
};
This is essentially the same as your example, so you need the ugly layer of syntax v.data.x and so on. This is just simpler since you don't need to name the inner struct; you only need to name the member that is an instance of the struct.
A word about your comment:
v1.data_arr_format[0] = v1.data_v_format.y; // copy y component to x
You comment that you copy y to x. Do realize that reading v1.data_v_format.x after writing to v1.data_arr_format technically has undefined behaviour.
I grant you that the struct probably has no padding at all, since double probably doesn't have a higher alignment requirement than its size, and it therefore probably has the same representation as the array. So on most implementations, this type punning would probably work as intended, even if that's not guaranteed by the standard.

Is it legal to use address of one field of a union to access another field?

Consider following code:
union U
{
int a;
float b;
};
int main()
{
U u;
int *p = &u.a;
*(float *)p = 1.0f; // <-- this line
}
We all know that the addresses of union fields are usually the same, but I'm not sure whether it is well-defined behavior to do something like this.
So, the question is: is it legal and well-defined behavior to cast and dereference a pointer to a union field like in the code above?
P.S. I know that it's more C than C++, but I'm trying to understand if it's legal in C++, not C.

All members of a union must reside at the same address, that is guaranteed by the standard. What you are doing is indeed well-defined behavior, but it shall be noted that you cannot read from an inactive member of a union using the same approach.
Accessing inactive union member - undefined behavior?
Note: Do not use c-style casts, prefer reinterpret_cast in this case.
As long as all you do is write to the other data member of the union, the behavior is well-defined; but as stated, this changes which member is considered the active member of the union, meaning that you can later only read from the member you just wrote to.
union U {
int a;
float b;
};
int main () {
U u;
int *p = &u.a;
*reinterpret_cast<float*> (p) = 1.0f; // ok, well-defined
}
Note: There is an exception to the above rule when it comes to layout-compatible types.
The question can be rephrased into the following snippet which is semantically equivalent to a boiled down version of the "problem".
#include <type_traits>
#include <algorithm>
#include <cassert>
int main () {
using union_storage_t = std::aligned_storage<
std::max ( sizeof(int), sizeof(float)),
std::max (alignof(int), alignof(float))
>::type;
union_storage_t u;
int * p1 = reinterpret_cast< int*> (&u);
float * p2 = reinterpret_cast<float*> (p1);
float * p3 = reinterpret_cast<float*> (&u);
assert (p2 == p3); // will never fire
}
What does the Standard (n3797) say?
9.5/1 Unions [class.union]
In a union, at most one of the non-static data members can be
active at any time, that is, the value of at most one of the
non-static data members can be stored in a union at any time.
[...] The size of a union is sufficient to contain the largest of
its non-static data members. Each non-static data member is
allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
Note: The wording in C++11 (n3337) was underspecified, even though the intent has always been that of C++14.

Yes, it is legal. Using explicit casts, you can do almost anything.
As other comments have stated, all members in a union start at the same address / location so casting a pointer to a different member is pointless.
The assembly language will be the same. You want to make the code easy to read so I don't recommend the practice. It is confusing and there is no benefit.
Also, I recommend a "type" field so that you know when the data is in float format versus int format.
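The "type" field suggested above might look like the following sketch. The struct name Tagged, the enum Kind, and the member functions are all hypothetical; the idea is only that every write records which member is active and every read checks it.

```cpp
#include <cassert>

// Hypothetical tagged wrapper around the int/float union from the question.
struct Tagged {
    enum class Kind { Int, Float };
    Kind kind;
    union { int a; float b; };  // anonymous union (standard C++)

    // Each setter writes the member and records it as the active one.
    void setInt(int v)     { a = v; kind = Kind::Int; }
    void setFloat(float v) { b = v; kind = Kind::Float; }

    // Each getter checks the tag before reading, so a read of the
    // inactive member is caught in debug builds instead of being silent UB.
    int   asInt() const   { assert(kind == Kind::Int);   return a; }
    float asFloat() const { assert(kind == Kind::Float); return b; }
};
```

Since C++17, std::variant<int, float> offers the same bookkeeping as a library type.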

What are 'partially overlapping objects'?

I was just going through all the possible Undefined Behaviours in this thread, and one of them is
The result of assigning to partially overlapping objects
I wondered if anyone could give me a definition of what "partially overlapping objects" are and an example in code of how that could possibly be created?

As pointed out in other answers, a union is the most obvious way to arrange this.
This is an even clearer example of how partially overlapping objects might arise with the built in assignment operator. This example would not otherwise exhibit UB if it were not for the partially overlapping object restrictions.
union Y {
int n;
short s;
};
void test() {
Y y;
y.s = 3; // s is the active member of the union
y.n = y.s; // Although it is valid to read .s and then write to .n
// changing the active member of the union, .n and .s are
// not of the same type and partially overlap
}
You can get potential partial overlap even with objects of the same type. Consider this example in the case where short is strictly larger than char on an implementation that adds no padding to X.
struct X {
char c;
short n;
};
union Y {
X x;
short s;
};
void test() {
Y y;
y.s = 3; // s is the active member of the union
y.x.n = y.s; // Although it is valid to read .s and then write to .x
// changing the active member of the union, it may be
// that .s and .x.n partially overlap, hence UB.
}

A union is a good example for that.
You can create a structure of memory with overlapping members.
for example (from MSDN):
union DATATYPE // Declare union type
{
char ch;
int i;
long l;
float f;
double d;
} var1;
Now if you assign to the char member, all the other members are undefined. That's because they share the same memory block, and you've only set an actual value for part of it:
DATATYPE blah;
blah.ch = 4;
If you then try to access blah.i, blah.d or blah.f, they will have an undefined value (because only the first byte, which is the char, had its value set).

This refers to the problem of pointer aliasing, which is forbidden in C++ to give compilers an easier time optimizing. A good explanation of the problem can be found in this thread

Maybe he means the strict aliasing rule? An object in memory should not overlap with an object of another type.
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"

The canonical example is using memcpy:
char *s = malloc(100);
int i;
for(i=0; i != 100;++i) s[i] = i; /* just populate it with some data */
char *t = s + 10; /* s and t may overlap since s[10+i] = t[i] */
memcpy(t, s, 20); /* if you copy more than 10 bytes, the ranges overlap and the behavior is undefined */
The reason this use of memcpy is undefined behavior is that there is no required algorithm for performing the copy. memmove was introduced as a safe alternative for this circumstance.
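For comparison, here is a sketch of the same copy done with memmove, which is specified to behave as if the source bytes were first copied to a temporary buffer. The function name shiftDemo and the fixed-size array parameter are illustrative choices, not part of the original example.

```cpp
#include <cassert>
#include <cstring>  // std::memmove

// Fill buf with 0..99, then copy the first 20 bytes to offset 10.
// Source [0, 20) and destination [10, 30) overlap, which memmove allows.
inline void shiftDemo(char (&buf)[100]) {
    for (int i = 0; i != 100; ++i) buf[i] = static_cast<char>(i);
    char *t = buf + 10;
    std::memmove(t, buf, 20);
    assert(buf[10] == 0);   // destination starts with the old buf[0]
    assert(buf[29] == 19);  // and ends with the old buf[19]
}
```

Bytes outside the destination range, such as buf[9] and buf[30], keep their original values.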

Casting between unrelated congruent classes

Suppose I have two classes with identical members from two different libraries:
namespace A {
struct Point3D {
float x,y,z;
};
}
namespace B {
struct Point3D {
float x,y,z;
};
}
When I tried cross-casting, it worked:
A::Point3D pa = {3,4,5};
B::Point3D* pb = (B::Point3D*)&pa;
cout << pb->x << " " << pb->y << " " << pb->z << endl;
Under which circumstances is this guaranteed to work? Always? Please note that it would be highly undesirable to edit an external library to add an alignment pragma or something like that. I'm using g++ 4.3.2 on Ubuntu 8.10.

If the structs you are using are just data and no inheritance is used I think it should always work.
As long as they are POD it should be ok.
http://en.wikipedia.org/wiki/Plain_old_data_structures
According to the standard (1.8.5):
"Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of POD type (3.9) shall occupy contiguous bytes of storage."
If they occupy contiguous bytes of storage and they are the same struct with a different name, a cast should succeed.

If two POD structs start with the same sequence of members, the standard guarantees that you'll be able to access them freely through a union. You can store an A::Point3D in a union, and then read from the B::Point3D member, as long as you're only touching the members that are part of the initial common sequence. (so if one struct contained int, int, int, float, and the other contained int, int, int, int, you'd only be allowed to access the three first ints).
So that seems like one guaranteed way in which your code should work.
It also means the cast should work, but I'm not sure if this is stated explicitly in the standard.
Of course all this assumes that both structs are compiled with the same compiler to ensure identical ABI.
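The union route described in this answer can be sketched as follows. The union name PointAB and the function readXThroughB are hypothetical; the two Point3D structs are the ones from the question. Note that this illustrates only the common-initial-sequence guarantee for unions, not the pointer cast.

```cpp
#include <cassert>

namespace A { struct Point3D { float x, y, z; }; }
namespace B { struct Point3D { float x, y, z; }; }

// Hypothetical union holding both layout-compatible structs.
union PointAB {
    A::Point3D a;
    B::Point3D b;
};

// Store an A::Point3D, then inspect it through the B::Point3D member.
// Both are standard-layout structs whose common initial sequence is
// all three floats, so reading b.x while a is active is permitted.
inline float readXThroughB(const A::Point3D &pa) {
    PointAB u = {pa};  // initializes the first member, a
    return u.b.x;
}

inline void commonSequenceDemo() {
    A::Point3D pa = {3, 4, 5};
    assert(readXThroughB(pa) == 3.0f);
}
```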

This line should be:
B::Point3D* pb = (B::Point3D*)&pa;
Note the &. I think what you are doing is a reinterpret_cast between two pointers. In fact, you can reinterpret_cast any pointer type to another one, regardless of the types of the two pointers. But this is unsafe and not portable.
For example,
int x = 5;
double* y = reinterpret_cast<double*>(&x);
You are just using the C-style cast, so the second line is actually equivalent to:
double* z = (double*)&x;
I just hate C-style casts because you can't tell the purpose of the cast at a glance :)
Under which circumstances is this guaranteed to work?
This is not real casting between types. For example,
int i = 5;
float* f = reinterpret_cast<float*>(&i);
Now f points to the same place as &i. So, no conversion is done. When you dereference f, you will get a float with the same binary representation as the integer i. It is four bytes on my machine.

The following is pretty safe:
namespace A {
struct Point3D {
float x,y,z;
};
}
namespace B {
typedef A::Point3D Point3D;
}
int main() {
A::Point3D a;
B::Point3D* b = &a;
return 0;
}

I know exactly when it wouldn't work:
the two structs have different alignment;
they are compiled with different RTTI options;
maybe something else...
