C++ - unions containing arrays

C++ - unions containing arrays - c++

If I have a C++ union which contains an array. I would like to access each element of the array using a set of unique identifiers. (This may seem like a strange thing to want. In my application I have a union which contains pointers to cells in 8 directions, which represent how some object can move between cells. Sometimes it is convenient to write algorithms which work with indexes of arrays, however that is not convenient for an end user who would prefer to work with named identifiers rather than less obvious indices.)
Example:
union vector
{
double x;
double y;
double data[2];
}
I believe that x and y "are the same thing", so really one would have to:
struct v
{
double x, y;
}
union vector
{
v data_v_format;
double data_arr_format[2];
}
Which you then use:
vector v1;
v1.data_arr_format[0] = v1.data_v_format.y; // copy y component to x
Unfortunately this adds an ugly layer of syntax to the union. Is there any way to accomplish the original task as specified by the syntax:
union vector
{
double x;
double y;
double data[2];
}
Where x is equivalent to data[0] and y is equivalent to data[1]?
I could write a class to do this, where the "logically named identifiers become functions, returning a single component of the array" - but is there a better way?

Anyway, even if you will find a way, reading from inactive union field, i.e. reading not from the last one being written into, is UB. This actually means that often seen example of converting IP between 4 octets and int using union is illegal.
You can use accessors:
struct vec
{
double data[2];
double& x() {return data[0];}
double& y() {return data[1];}
};
Alternatively you can look into property implementation in C++. It would create a proxy object, accesses to which will be redirected to specific array elements.
Yet another way is to use references, but this will increase size of your struct (+pointer size per reference):
struct vec
{
double data[2];
double& x = data[0];
double& y = data[1];
};

Although not allowed in (standard) C++, in C (since C11) you can use an anonymous struct:
// not standard C++
union vector {
struct {
double x;
double y;
};
double arr[2];
};
Anonymous structs are also supported by some C++ compilers (including GNU, MSVC and Clang) as an extension to the language. In standard C++, you'll need to settle for unnamed struct:
union vector {
struct {
double x;
double y;
} data;
double arr[2];
};
This is essentially the same as your example, so you need the ugly layer of syntax v.data.x and so on. This is just simpler since you don't need to name the inner struct; You only need to name the member that is an instance of the struct.
A word about your comment:
v1.data_arr_format[0] = v1.data_v_format.y; // copy y component to x
You comment that you copy y to x. Do realize that reading v1.x after writing to v1.data_arr_format has technically undefined behaviour.
I give you that the struct probably has no padding at all since double probably doesn't have higher alignment requirement that it's size and therefore probably has same representation as the array. So on most implementations, this type punning would probably work as intended, even if that's not guaranteed by the standard.

Related

Type punning two types from different third party libraries without union

I've read that using unions for type punning is actually undefined behavior in C++ and I was wondering how would you type pun instead?
As an example I used a union to type pun two types from two third party libraries with identical layout like this (libAQuaternion and libBQuaternion are the types of those third party libraries which I can't change):
struct libAQuaternion {
double x, y, z, w;
};
void libAFunc(libAQuaternion &p) {
p.x = p.y = p.z = p.w = 1.;
}
struct libBQuaternion {
double x, y, z, w;
};
void libBFunc(libBQuaternion &p) {
p.x = p.y = p.z = p.w = 2.;
}
union myQuat {
libAQuaternion a;
libBQuaternion b;
};
int main() {
myQuat q;
libAFunc(q.a);
libBFunc(q.b);
}
What would be the standard conforming best solution to this?

What be the standard conforming best solution to this?
Write a function to convert from one quaternion to the other.
libBQuaternion convert(const libAQuaternion &Quat) {
return{Quat.x, Quat.y, Quat.z, Quat.w};
}
libAQuaternion convert(const libBQuaternion &Quat) {
return{Quat.x, Quat.y, Quat.z, Quat.w};
}
// or template if you want to
template<typename T, typename U>
T convertTo(U &&Quat) {
return{Quat.x, Quat.y, Quat.z, Quat.w};
}
Any optimizer should be able to optimize this away completely, so there should be no performance penalty.
But this will be a problem if there is a function taking one such class by lvalue ref. You would need to create a new object of the appropriate class, pass that and then reassign the correct values to the original struct. I guess you could make a function for this, but IMO the cleanest way would be to change the signature of the function to take the individual values by lvalue ref, but that is not always possible.
There is just no way of doing type punning in C++ without invoking UB.

C++ doesn't allow type punning. Most of the time.
What you wrote is perfectly legal, but there is one potential hazard.
The two quaternions are standard layout classes and their common initial sequence is their entirety. It is thus legal to read the member of the other through a union
myQuat q = libAQuaternion{1, 0, 0, 0};
std::cout << q.b.x; // legal
We then note that the quaternions can only be written to by either a builtin/trivial assignment or by placement new.
Using a builtin/trivial assignment on an inactive member (or its member, recursively), causes the implicit beginning of the inactive member's lifetime.
Using a placement new will also begin the member's lifetime.
Thus upon writing to a quaternion, either its lifetime already begun, or it will begin.
Reading and writing is together called accessing, and the strict-aliasing rule forbids accessing something with another type, which is what forbids type-punning. But since we just proved that accessing either quaternions is well-defined, this is the one exception where type-punning is indeed legal.
The one hazard is when the library function writes partially to the quaternion
void someFunc(libBQuaternion& q)
{
q.x = 1;
}
myQuat q = libAQuaternion{1, 0, 0, 0};
someFunc(q.b);
std::cout << q.a.y; // UB
Unfortunately, q.a.y is uninitialized and therefore reading it is undefined behaviour.
However, given all the previous rules and the reason behind having uninitialized variables is efficiency, it is quite unlikely compilers will take advantage of the UB and "miscompile".

template<class D, class S>
D bitcpy( S const* s ){
static_assert( sizeof(S)>=sizeof(D) );
D r;
memcpy( &r, s, sizeof(D) );
return r;
}
union myQuat {
libAQuaternion a;
libBQuaternion b;
void AsA(){
a = bitcpy<libAQuaternion>(&b);
}
void AsB(){
b = bitcpy<libBQuaternion>(&a);
}
};
You have to keep track of if there is an A or a B in the union. To switch call AsA or AsB; this is a noop at runtime.
I believe the union rules allow activating members via assignment; if I misunderstand the standard or the situation (these are pod types right?) things get a bit trickier. But I think that doesn't apply here.

The safest way to type punning is to use memcpy from type A to type B. On many platforms memcpy is an intrinsic meaning it is implemented by the compiler and therefore may well be optimized away.

Struct with one member or just typedef

While im coding I declare structs or classes because they are based on real world objects/ideas/concepts.
But often those structs/classes only have one single member. So I was wondering if it makes any difference, if I simpy make a typedef.
And then I'm not sure if that's correct, because typedefs are not 'objects' in my opinion.
So should I do:
struct Y { int x; }
or just:
typedef int Y;
Does it make any difference? Is my image of structs being objects and typedefs being something else correct?

There's a huge difference. With the typedef all the following are valid:
Y y = 14.3;
y += 7;
y = 1 + y << 3;
std::cout << y;
double d = y;
With the struct none are, unless you choose to expose those operations. The question is, do you have something that is an int, with no restrictions, or an abstraction that is in some way based on an integer value, and has its own constraints or invariant?

There are several aspects you should take into consideration:
Primitive types operations
typedef, unlike struct/class, is just another name for the type, an alias.
Therfore, if you choose to use a typedef for a primitive, you'll be able to perform all the primitive types operations (like addition, subtraction and so on):
Y y = 4;
y+=5;
printf("Y is %d", y);
On the opposite, if you choose to use struct/class you will lose all this functionality and you'll have to implement these operators yourself.
Maintainability and extensibility
Primitive types, unlike structs can neither be complemented nor inherited.
Therefore, if you plan to inherit from your type, or complement it with additional fields, you should prefer class over typedef.
class X : public Y { int x; }
Using typeid
As I mentioned, typedef is an alias for the type.
Therefore, if you want your type to get a different id than the primitive type, you definitely should choose struct/class:
typedef int Y;
typeid(Y)==typeid(int)
In this case the expression will be true.
struct Y { int x; }
typeid(Y)==typeid(int)
In this case the expression will be false.

In most cases, it does not really matter one way or the other. Where it does matter is if you need to overload a function based on Y parameters, or if you want to add custom methods to (or override operators on) Y. A typedef is just an alias, so typedef int Y is just a normal int as far as the compiler is concerned, whereas struct Y is its own distinct type.

C++ Undefined behaviour with unions

Was just reading about some anonymous structures and how it is isn't standard and some general use case for it is undefined behaviour...
This is the basic case:
struct Point {
union {
struct {
float x, y;
};
float v[2];
};
};
So writing to x and then reading from v[0] would be undefined in that you would expect them to be the same but it may not be so.
Not sure if this is in the standard but unions of the same type...
union{ float a; float b; };
Is it undefined to write to a and then read from b ?
That is to say does the standard say anything about binary representation of arrays and sequential variables of the same type.

The standard says that reading from any element in a union other
than the last one written is undefined behavior. In theory, the
compiler could generate code which somehow kept track of the
reads and writes, and triggered a signal if you violated the
rule (even if the two are the same type). A compiler could also
use the fact for some sort of optimization: if you write to a
(or x), it can assume that you do not read b (or v[0])
when optimizing.
In practice, every compiler I know supports this, if the union
is clearly visible, and there are cases in many (most?, all?)
where even legal use will fail if the union is not visible
(e.g.:
union U { int i; float f; };
int f( int* pi, int* pf ) { int r = *pi; *pf = 3.14159; return r; }
// ...
U u;
u.i = 1;
std::cout << f( &u.i, &u.f );
I've actually seen this fail with g++, although according to the
standard, it is perfectly legal.)
Also, even if the compiler supports writing to Point::x and
reading from Point::v[0], there's no guarantee that Point::y
and Point::v[1] even have the same physical address.

The standard requires that in a union "[e]ach data member is allocated as if it were the sole member of a struct." (9.5)
It also requires that struct { float x, y; } and float v[2] must have the same internal representation (9.2) and thus you could safely reinterpret cast one as the other
Taken together these two rules guarantee that the union you describe will function provided that it is genuinely written to memory. However, because the standard only requires that the last data member written be valid it's theoretically possible to have an implementation that fails if the union is only used as a local variable. I'd be amazed if that ever actually happens, however.

I did not get why you have used float v[2];
The simple union for a point structure can be defined as:
union{
struct {
float a;
float b;
};
} Point;
You can access the values in unioin as:
Point.a = 10.5;
point.b = 12.2; //example

C++ member layout

Let's we have a simple structure (POD).
struct xyz
{
float x, y, z;
};
May I assume that following code is OK? May I assume there is no any gaps? What the standard says? Is it true for PODs? Is it true for classes?
xyz v;
float* p = &v.x;
p[0] = 1.0f;
p[1] = 2.0f; // Is it ok?
p[2] = 3.0f; // Is it ok?

The answer here is a bit tricky. The C++ standard says that POD data types will have C layout compatability guarantees (Reference). According to section 9.2 of the C spec the members of a struct will be laid out in sequential order if
There is no accessibility modifier difference
No alignment issues with the data type
So yes this solution will work as long as the type float has a compatible alignment on the current platform (it's the platform word size). So this should work for 32 bit processors but my guess is that it would fail for 64 bit ones. Essentially anywhere that sizeof(void*) is different than sizeof(float)

This is not guaranteed by the standard, and will not work on many systems. The reasons are:
The compiler may align struct members as appropriate for the target platform, which may mean 32-bit alignment, 64-bit alignment, or anything else.
The size of the float might be 32 bits, or 64 bits. There's no guarantee that it's the same as the struct member alignment.
This means that p[1] might be at the same location as xyz.y, or it might overlap partially, or not at all.

No, it is not OK to do so except for the first field.
From the C++ standards:
9.2 Class members
A pointer to a POD-struct object,
suitably converted using a
reinterpret_cast, points to its
initial member (or if that member is a
bit-field, then to the unit in which
it resides) and vice versa. [Note:
There might therefore be unnamed
padding within a POD-struct object,
but not at its beginning, as necessary
to achieve appropriate alignment.

Depends on the hardware. The standard explicitly allows POD classes to have unspecified and unpredictable padding. I noted this on the C++ Wikipedia page and grabbed the footnote with the spec reference for you.
^ a b ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §9.2 Class members [class.mem] para. 17
In practical terms, however, on common hardware and compilers it will be fine.

When in doubt, change the data structure to suit the application:
struct xyz
{
float p[3];
};
For readability you may want to consider:
struct xyz
{
enum { x_index = 0, y_index, z_index, MAX_FLOATS};
float p[MAX_FLOATS];
float X(void) const {return p[x_index];}
float X(const float& new_x) {p[x_index] = new_x;}
float Y(void) const {return p[y_index];}
float Y(const float& new_y) {p[y_index] = new_y;}
float Z(void) const {return p[z_index];}
float Z(const float& new_z) {p[z_index] = new_z;}
};
Perhaps even add some more encapsulation:
struct Functor
{
virtual void operator()(const float& f) = 0;
};
struct xyz
{
void for_each(Functor& ftor)
{
ftor(p[0]);
ftor(p[1]);
ftor(p[2]);
return;
}
private:
float p[3];
}
In general, if a data structure needs to be treated in two or more different ways, perhaps the data structure needs to be redesigned; or the code.

The standard requires that the order of arrangement in memory match the order of definition, but allows arbitrary padding between them. If you have an access specifier (public:, private: or protected:) between members, even the guarantee about order is lost.
Edit: in the specific case of all three members being of the same primitive type (i.e. not themselves structs or anything like that) you stand a pretty fair chance -- for primitive types, the object's size and alignment requirements are often the same, so it works out.
OTOH, this is only by accident, and tends to be more of a weakness than a strength; the code is wrong, so ideally it would fail immediately instead of appearing to work, right up to the day that you're giving a demo for the owner of the company that's going to be your most important customer, at which time it will (of course) fail in the most heinous possible fashion...

No, you may not assume that there are no gaps. You may check for you architecture, and if there aren't and you don't care about portability, it will be OK.
But imagine a 64-bit architecture with 32-bit floats. The compiler may align the struct's floats on 64-bit boundaries, and your
p[1]
will give you junk, and
p[2]
will give you what you think your getting from
p[1]
&c.
However, you compiler may give you some way to pack the structure. It still wouldn't be "standard"---the standard provides no such thing, and different compilers provide very incompatible ways of doing this--- but it is likely to be more portable.

Lets take a look at Doom III source code:
class idVec4 {
public:
float x;
float y;
float z;
float w;
...
const float * ToFloatPtr( void ) const;
float * ToFloatPtr( void );
...
}
ID_INLINE const float *idVec4::ToFloatPtr( void ) const {
return &x;
}
ID_INLINE float *idVec4::ToFloatPtr( void ) {
return &x;
}
It works on many systems.

Your code is OK (so long as it only ever handles data generated in the same environment). The structure will be laid out in memory as declared if it is POD. However, in general, there is a gotcha you need to be aware of: the compiler will insert padding into the structure to ensure each member's alignment requirements are obeyed.
Had your example been
struct xyz
{
float x;
bool y;
float z;
};
then z would have began 8 bytes into the structure and sizeof(xyz) would have been 12 as floats are (usually) 4 byte aligned.
Similarly, in the case
struct xyz
{
float x;
bool y;
};
sizeof(xyz) == 8, to ensure ((xyz*)ptr)+1 returns a pointer that obeys x's alignment requirements.
Since alignment requirements / type sizes may vary between compilers / platforms, such code is not in general portable.

As others have pointed out the alignment is not guaranteed by the spec. Many say it is hardware dependent, but actually it is also compiler dependent. Hardware may support many different formats. I remember that the PPC compiler support pragmas for how to "pack" the data. You could pack it on 'native' boundaries or force it to 32 bit boundaries, etc.
It would be nice to understand what you are trying to do. If you are trying to 'parse' input data, you are better off with a real parser. If you are going to serialize, then write a real serializer. If you are trying to twiddle bits such as for a driver, then the device spec should give you a specific memory map to write to. Then you can write your POD structure, specify the correct alignment pragmas (if supported) and move on.

structure packing (eg #pragma pack in MSVC) http://msdn.microsoft.com/en-us/library/aa273913%28v=vs.60%29.aspx
variable alignment
(eg __declspec(align( in MSVC) http://msdn.microsoft.com/en-us/library/83ythb65.aspx
are two factors that can wreck your assumptions. floats are usually 4 bytes wide, so it's rare to misalign such large variables. But it's still easy to break your code.
This issue is most visible when binary reading header struct with shorts (like BMP or TGA) - forgetting pack 1 causes a disaster.

I assume you want a struct to keep your coordinates accessed as members (.x, .y and .z) but you still want them to be accessed, let's say, an OpenGL way (as if it was an array).
You can try implementing the [] operator of the struct so it can be accessed as an array. Something like:
struct xyz
{
float x, y, z;
float& operator[] (unsigned int i)
{
switch (i)
{
case 0:
return x;
break;
case 1:
return y;
break;
case 2:
return z;
break;
default:
throw std::exception
break;
}
}
};

Casting between unrelated congruent classes

Suppose I have two classes with identical members from two different libraries:
namespace A {
struct Point3D {
float x,y,z;
};
}
namespace B {
struct Point3D {
float x,y,z;
};
}
When I try cross-casting, it worked:
A::Point3D pa = {3,4,5};
B::Point3D* pb = (B::Point3D*)&pa;
cout << pb->x << " " << pb->y << " " << pb->z << endl;
Under which circumstances is this guaranteed to work? Always? Please note that it would be highly undesirable to edit an external library to add an alignment pragma or something like that. I'm using g++ 4.3.2 on Ubuntu 8.10.

If the structs you are using are just data and no inheritance is used I think it should always work.
As long as they are POD it should be ok.
http://en.wikipedia.org/wiki/Plain_old_data_structures
According to the standard(1.8.5)
"Unless it is a bit-ﬁeld (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of
storage. Base class subobjects may have zero size. An object of POD5)
type (3.9) shall occupy contiguous bytes of
storage."
If they occupy contiguous bytes of storage and they are the same struct with different name, a cast should succeed

If two POD structs start with the same sequence of members, the standard guarantees that you'll be able to access them freely through a union. You can store an A::Point3D in a union, and then read from the B::Point3D member, as long as you're only touching the members that are part of the initial common sequence. (so if one struct contained int, int, int, float, and the other contained int, int, int, int, you'd only be allowed to access the three first ints).
So that seems like one guaranteed way in which your code should work.
It also means the cast should work, but I'm not sure if this is stated explicitly in the standard.
Of course all this assumes that both structs are compiled with the same compiler to ensure identical ABI.

This line should be :
B::Point3D* pb = (B::Point3D*)&pa;
Note the &. I think what you are doing is a reinterpret_cast between two pointers. In fact you can reinterpret_cast any pointer type to another one, regardless of the type of the two pointers. But this unsafe, and not portable.
For example,
int x = 5;
double* y = reinterpret_cast<double*>(&x);
You are just going with the C-Style, So the second line is actually equal to:
double* z = (double*)&x;
I just hate the C-Style when casting because you can't tell the purpose of the cast from one look :)
Under which circumstances is this
guaranteed to work?
This is not real casting between types. For example,
int i = 5;
float* f = reinterpret_cast<float*>(&i);
Now f points to the same place that i points to. So, no conversion is done. When you dereference f, you will get the a float with the same binary representation of the integer i. It is four bytes on my machine.

The following is pretty safe:
namespace A {
struct Point3D {
float x,y,z;
};
}
namespace B {
typedef A::Point3D Point3D;
}
int main() {
A::Point3D a;
B::Point3D* b = &a;
return 0;
}

I know exactly it wouldn't work:
both struct has differen alignment;
compiled with different RTTI options
may be some else...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ - unions containing arrays - c++

Related

Type punning two types from different third party libraries without union

Struct with one member or just typedef

C++ Undefined behaviour with unions

C++ member layout

Casting between unrelated congruent classes

Categories

Resources