Union correct usage - c++

My understanding of a union is all its values are allocated in the same memory address and the memory space is as large as the largest member of the union. But I don't understand how we would actually use them.
This is a code where using a union is preferable according to The C++ Programming Language.
enum Type { str, num };
struct Entry {
char* name;
Type t;
char* s; // use s if t==str
int i; // use i if t==num
};
void f(Entry* p)
{
if (p->t == str)
cout << p->s;
// ...
}
After this Bjarne says:
The members s and i can never be used at the same time, so space is wasted. It can be easily recovered by specifying that both should be members of a union, like this:
union Value {
char* s;
int i;
};
The language doesn’t keep track of which kind of value is held by a union, so the programmer must do that:
struct Entry {
char* name;
Type t;
Value v; // use v.s if t==str; use v.i if t==num
};
void f(Entry* p)
{
if (p->t == str)
cout << p->v.s;
// ...
}
Can anyone explain the resulting union code further? what will actually happen if we transform this into a union?

Let's say you have a 32-bit machine, with 32-bit integers and pointers. Your struct might then look like this:
[0-3] name
[4-7] type
[8-11] string
[12-15] integer
That's 16 bytes, but since type (t in your code) determines which field is valid, we never need to actually store the string and integer fields at the same time. So we can change the code:
struct Entry {
char* name;
Type t;
union {
char* s; // use s if t==str
int i; // use i if t==num
} u;
};
Now the layout is:
[0-3] name
[4-7] type
[8-11] string
[8-11] integer
In C++, whatever you assigned to most recently is the "valid" member of the union, but there is no way to know which one that is intrinsically, so you must store it yourself. This technique is often called a "discriminated union", the "discriminator" being the type field.
So the second struct takes 12 bytes instead of 16. If you're storing lots of them, or if they come from a network or disk, you might care about this. Otherwise, it's not really important.
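The layout difference described above can be checked with a small sketch. The names EntryPlain and describe are made up for illustration, and the exact sizes depend on the platform, so the code only relies on the union version being no larger than the plain one.

```cpp
#include <cassert>
#include <string>

enum Type { str, num };

// Without a union: both payload fields are always present.
struct EntryPlain {
    const char* name;
    Type t;
    const char* s;
    int i;
};

// With a union: s and i share the same storage.
struct Entry {
    const char* name;
    Type t;
    union {
        const char* s; // valid when t == str
        int i;         // valid when t == num
    } u;
};

static_assert(sizeof(Entry) <= sizeof(EntryPlain), "the union should not make the struct larger");

// Hypothetical helper: reads only the member selected by the discriminator t.
std::string describe(const Entry& e) {
    assert(e.name != nullptr);
    if (e.t == str) return std::string(e.name) + "=" + e.u.s;
    return std::string(e.name) + "=" + std::to_string(e.u.i);
}
```

With 32-bit pointers and ints the two sizes would be 16 and 12 bytes as in the answer; on a typical 64-bit platform they are more likely 32 and 24 because of alignment padding.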

For below union,
union mix_types {
int l;
struct {
short hi;
short lo;
} s;
char c[4];
} mix;
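the memory layout would be as follows: l, s and c all start at the same address and overlap. Assuming a 4-byte int and a 2-byte short, s.hi and s.lo together cover the same four bytes as l and as c[0] to c[3].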

Another way you can use unions is to access the same data using different types. An example is the DirectX matrix structure,
typedef struct _D3DMATRIX {
union {
struct {
float _11, _12, _13, _14;
float _21, _22, _23, _24;
float _31, _32, _33, _34;
float _41, _42, _43, _44;
};
float m[4][4];
};
} D3DMATRIX;
Now you can do,
D3DMATRIX d;
d._11 = 20;
// Now the value of m[0][0] is 20
assert(d._11 == d.m[0][0]);

Related

How to initialize struct with an array?

Is there a better way of initializing a struct with an array than doing the following?
struct Parameters
{
double distance;
double radius;
double strength;
long distanceX;
long distanceY;
long clickX;
long clickY;
};
void calculate(double dParameters[], long lParameters[])
{
Parameters param =
{
dParameters[0],
dParameters[1],
dParameters[2],
lParameters[0],
lParameters[1],
lParameters[2],
lParameters[3]
};
}
I thought of assigning pointers:
void calculate(double dParameters[], long lParameters[])
{
Parameters param;
(double*)(&param.distance) = &dParameters[0];
(long*)(&param.distanceX) = &lParameters[0];
}
But I am not sure if it is valid in c++.
If you know the layout of the struct, and have carefully chosen to put all members of like type in order without anything between them, then you could use memcpy().
memcpy(&param.distance, dParameters, sizeof(*dParameters) * 3);
memcpy(&param.distanceX, lParameters, sizeof(*lParameters) * 4);
This is rather fragile code, as distance must be the first of exactly three double members in a row (and distanceX the first of exactly four long members), or you'll get corrupted data, and nothing will verify this at compile time.
It could be improved with offsetof to get and/or verify the length. Such as:
void calculate(double dParameters[], size_t n_dParameters, long lParameters[], size_t n_lParameters)
{
Parameters param;
// Bytes covered by the doubles: from the start of distance to the end of strength.
const size_t dBytes = offsetof(Parameters, strength) + sizeof(param.strength) - offsetof(Parameters, distance);
assert(dBytes == sizeof(*dParameters) * n_dParameters);
memcpy(&param.distance, dParameters, dBytes);
// Bytes covered by the longs: from the start of distanceX to the end of clickY.
const size_t lBytes = offsetof(Parameters, clickY) + sizeof(param.clickY) - offsetof(Parameters, distanceX);
assert(lBytes == sizeof(*lParameters) * n_lParameters);
memcpy(&param.distanceX, lParameters, lBytes);
}
Historically, gcc has not been great at optimizing struct initialization, such as using the equivalent of memcpy() or memset() when it would be possible and beneficial. If your struct had a hundred fields, this might actually be useful.
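As a rough sketch of how the layout assumption could be verified at compile time rather than at run time, static_assert can be combined with offsetof. The function name fromArrays and the fixed-size array parameters are assumptions made for this example, not part of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Parameters {
    double distance;
    double radius;
    double strength;
    long distanceX;
    long distanceY;
    long clickX;
    long clickY;
};

// The build fails if the doubles or the longs are not laid out back to back.
static_assert(offsetof(Parameters, strength) + sizeof(double) - offsetof(Parameters, distance)
                  == 3 * sizeof(double), "doubles are not contiguous");
static_assert(offsetof(Parameters, clickY) + sizeof(long) - offsetof(Parameters, distanceX)
                  == 4 * sizeof(long), "longs are not contiguous");

// Hypothetical helper: copies both arrays into the struct's storage.
Parameters fromArrays(const double (&d)[3], const long (&l)[4]) {
    Parameters p;
    char* base = reinterpret_cast<char*>(&p);
    std::memcpy(base + offsetof(Parameters, distance), d, sizeof d);
    std::memcpy(base + offsetof(Parameters, distanceX), l, sizeof l);
    return p;
}
```

Taking the arrays by reference to a fixed size means a caller passing an array of the wrong length gets a compile error instead of corrupted data.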
Another technique would be to use a union to define both an array version and an individual field version of your struct.
struct ParametersArrays {
double doubles[3];
long longs[4];
};
union ParametersUnion {
struct Parameters params;
struct ParametersArrays arrays;
};
ParametersUnion u;
memcpy(u.arrays.doubles, dParameters, sizeof(u.arrays.doubles));
memcpy(u.arrays.longs, lParameters, sizeof(u.arrays.longs));
Parameters& p = u.params; // Now you can use p
Note that using more than one member of a union like this is not strictly legal in C++, but it is in C, and most/all C++ compilers will compile it as expected.
Your second example is illegal, but chances are that the optimizer (knowing the actual layout) does implement it like that.

What is the use case for an anonymous union type

I saw a question with the following code:
union
{
float dollars;
int yens;
}price;
price is a variable whose type does not have a name.
What is such an unnamed type useful for? Lambda expressions?
Is this valid in both C and C++?
The fact that the type does not have a name has very little effect on the use of the price variable. All it means is that you cannot (easily) create another object of this type.
This construct makes the most sense if price is a local variable inside a function. If you only ever want one object of this type, you don't need to name the type, so why bother. It doesn't differ at all from:
union SomeNameIPromiseNotToUseAnywhereAndWhichDoesntConflictWithAnything
{
float dollars;
int yens;
} price;
Notice that in C++11 and beyond, you can actually create another object:
decltype(price) anotherPrice;
In C++, it is valid. The code defines a local variable called price, which can either store an integer value in yens or a float value in dollars.
Without seeing how it is used, I can only conclude that the variable is a local/temporary variable (and probably, in a function that attempts to do too much).
Example:
union
{
float dollars;
int yens;
} price;
if(currency != "USD")
price.yens = ConvertToYEN(fullPrice);
else
price.dollars = GetUpdatedPriceInUSD(abc, currency);
if(currency == "YEN")
std::cout << "Using price in yens: " << price.yens << "\n";
I have used unions in the past as mechanisms for handling storage formats and translating between them.
For example, it could be that the program includes code for storing amounts in a file in float format and that the storage function accepts/returns a float. Later, it is discovered we need to use an integer, so we simply use the union to access the data in the format we know it to be. For example:
price.dollars = load_from_file();
if (yen_flag) {
// use price.yens
} else {
// use price.dollars
}
It is also commonly used for implementation independent storage of ints.
union {
int int_val;
char as_bytes[4];
} int_store;
Sorry if there are any syntax errors, it's been a while ...
In C according to this link
https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html#Unnamed-Fields
You can just access the members of the union as price.dollars and price.yens, because price is already a variable of the union type and there is no need to create a new object of the same type.
union
{
float dollars;
int yens;
}price;
int main(void) {
price.dollars = 90.5;
printf("%f\n",price.dollars);
price.yens = 20;
printf("%d\n",price.yens);
return 0;
}
I saw code like this in a lot of mathematical libraries for 2D/3D calculation:
struct Matrix3x3
{
union
{
struct
{
float m00 , m01 , m02;
float m10 , m11 , m12;
float m20 , m21 , m22;
};
float m[ 3 ][ 3 ];
};
};
see also this Q
iirc, I read somewhere that using such methods leads to violating the strict aliasing rule
struct FooBar
{
union
{
Foo foo;
char dummy[128];
};
Bar bar;
};
I've seen people use nameless unions to control the offset of struct members.
There's no trivial way to align a struct member to a certain boundary, but there are ways to align the beginning of the struct to arbitrary boundaries, then pad the member to the next boundary.

C++ understanding Unions and Structs

I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand well, unions are for saving space, so sizeof(my_union_t) = MAX of the variables in it. What are the advantages of using the statement above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
};
Won't be the space allocated for both of them the same?
And how can I initialize varA,varB... from my_union?
Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming that the struct in the union packs the floats with no gaps between them!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is, access the structure members in the normal long-winded way not via the data[] array.
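If the packing assumption is going to be relied on anyway, it can at least be checked at compile time. The following is only a sketch: Vec4 and component are names invented here, and the accessor shows one way to get indexed access without overlaying an array on the struct at all.

```cpp
#include <cassert>
#include <cstddef>

struct Vec4 {
    float varA, varB, varC, varD;
};

// Compile-time checks: the build fails if the compiler inserted padding.
static_assert(sizeof(Vec4) == 4 * sizeof(float), "unexpected padding in Vec4");
static_assert(offsetof(Vec4, varD) == 3 * sizeof(float), "unexpected member offset");

// Hypothetical accessor: indexed access that does not depend on a union overlay.
float& component(Vec4& v, std::size_t n) {
    assert(n < 4);
    switch (n) {
        case 0: return v.varA;
        case 1: return v.varB;
        case 2: return v.varC;
        default: return v.varD;
    }
}
```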
Unions are mostly not for saving space but for implementing sum types (for that, you put the union in some struct or class that also has a discriminating field holding the run-time tag). Also, I suggest using a recent C++ standard, at least C++11, since it has better support for unions (e.g. it more easily permits unions of objects and their construction or initialization).
The advantage of using your union is to be able to index the n-th floating point (with 0 <= n <= 3) as u.data[n]
To assign a union field in some variable declared my_union u; just code e.g. u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14;
A good example of well deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
#include <new>
#include <string>
#include <utility>
class IntOrString {
bool isint;
union {
int num; // active when isint is true
std::string str; // active when isint is false
};
public:
IntOrString(int n = 0) : isint(true), num(n) {}
IntOrString(std::string s) : isint(false), str(std::move(s)) {}
// str is not constructed yet in the copy/move constructors, so use placement new
IntOrString(const IntOrString& o) : isint(o.isint)
{ if (isint) num = o.num; else new (&str) std::string(o.str); }
IntOrString(IntOrString&& p) : isint(p.isint)
{ if (isint) num = p.num; else new (&str) std::string(std::move(p.str)); }
~IntOrString() { if (!isint) str.~basic_string(); }
void set(int n)
{ if (!isint) str.~basic_string(); isint = true; num = n; }
void set(std::string s)
{ if (isint) { new (&str) std::string(std::move(s)); isint = false; } else str = std::move(s); }
bool is_int() const { return isint; }
int as_int() const { return isint ? num : 0; }
std::string as_string() const { return isint ? std::string() : str; }
};
Notice the explicit calls of destructor of str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>)
See also std::optional in future versions of C++ (which conceptually is a tagged union with void)
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You'd better use unions in an idiomatic C++ way (see this for general advice).
I think the best way to understand unions is just to give 2 common practical examples.
The first example is working with images. Imagine you have an RGB image that is arranged in a long buffer.
What most people would do, is represent the buffer as a char* and then loop it by 3's to get the R,G,B.
What you could do instead, is make a little union, and use that to loop over the image buffer:
union RGB
{
char raw[3];
struct
{
char R;
char G;
char B;
} colors;
};
RGB* pixel = reinterpret_cast<RGB*>(buffer); // buffer is the char* image buffer
// pixel->colors.R == the red value of the first pixel; pixel[1] is the next pixel
Another very useful use for unions is using registers and bitfields.
Lets say you have a 32 bit value, that represents some HW register, or something.
Sometimes, to save space, you can split the 32 bits into bit fields, but you also want the whole representation of that register as a 32 bit type.
This obviously saves bit shift calculation that a lot of programmers use for no reason at all.
union MySpecialRegister
{
uint32_t reg; // "register" is a keyword, so it cannot be used as a member name
struct
{
unsigned int firstField : 5;
unsigned int somethingInTheMiddle : 21; // the widths must add up to 32 bits
unsigned int lastField : 6;
} data;
};
// Now you can read the raw register into the reg field
// then you can read the fields using the inner data struct
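For comparison, here is a sketch of the shift-and-mask equivalent. It is worth having at hand because the order in which bit-fields are allocated within a word is implementation-defined, so the union overlay is only reliable for a known compiler and target. The function name extractBits is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns `width` bits of `reg`, starting at bit `lowBit`
// (bit 0 is the least significant bit).
constexpr std::uint32_t extractBits(std::uint32_t reg, unsigned lowBit, unsigned width) {
    return (reg >> lowBit) & (width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u));
}

// Example: the low 5 bits of 0x34 (binary 0011 0100) are 1 0100, i.e. 0x14.
static_assert(extractBits(0x34u, 0, 5) == 0x14u, "low five bits");
```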
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data, you might find that useful for things like writing a function to tell you if you are on a big endian or little endian CPU.
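A minimal sketch of such an endianness check follows. Note the swapped-in technique: instead of reading an inactive union member, which C++ does not formally allow, it copies the bytes out with std::memcpy. The function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the byte stored at the lowest address of a 32-bit value.
unsigned char firstByte(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

// On a little-endian CPU the least significant byte is stored first.
bool isLittleEndian() {
    return firstByte(1u) == 1;
}
```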
No, it is not for saving space. It is for ability to represent some binary data as various data types.
for example
#include <iostream>
#include <stdint.h>
union Foo{
int x;
struct
{
unsigned char b0, b1, b2, b3;
} y;
char z[sizeof(int)];
};
int main()
{
Foo bar;
bar.x = 100;
std::cout << std::hex; // to show number in hexadec repr;
for(size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)bar.z[i] << " "; // int is just to show values as numbers, not a characters
}
return 0;
}
Output: 0x64 0x0 0x0 0x0. The same values are stored in bar.y, not as an array but as structure members. That is because my machine is little-endian. If it were big-endian, the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
It's useful, for example, when you need to read a binary file that was written on a machine with a different endianness than yours. You can just access values as a byte array and swap those bytes as you wish.
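The byte swap itself can also be written with shifts alone, which sidesteps the question of how the value is laid out in memory. A small sketch, with swapBytes32 as a made-up name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 32-bit value.
std::uint32_t swapBytes32(std::uint32_t v) {
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}
```

For example, swapBytes32(0x11223344u) yields 0x44332211u, and applying the function twice gives back the original value.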
Also, it is useful when you have to deal with bit fields, but that's a whole different story :)
First of all: Avoid unions where the access goes to the same memory but to different types!
Unions do not save space at all. They only define multiple names for the same memory area! And you can only store one of the elements in a union at any one time.
if you have
union X
{
int x;
char y[4];
};
you can store an int OR 4 chars but not both! The general problem is, that nobody knows which data is actually stored in a union. If you store a int and read the chars, the compiler will not check that and also there is no runtime check. A solution is often to provide an additional data element in a struct to a union which contains the actual stored data type as an enum.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
};
But in c++ you always should use classes or structs which can derive from a maybe empty parent class like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can define pointers to Base which can actually hold an Int_Type or a Char_Type for you. With virtual functions you can access the members in an object-oriented way.
As mentioned already from Basile's answer, a useful case can be the access via different names to the same type.
union X
{
struct
{
float a;
float b;
} data;
float arr[2];
};
which allows different ways of accessing the same data with the same type. Using different types stored in the same memory should be avoided altogether!

How to have a C++ stack with more than one data type?

Here's the problem:
I am currently trying to create a simple stack-based programming language (Reverse Polish Notation, FORTH style) as a component of a larger project. I have hit a snag, though.
There is no problem with creating a stack in C++ (by using std::vector<>) that would contain one type of element (I could use the syntax std::vector<double> Stack, for instance).
However, a programming language needs to be able to hold multiple data types, such as ints, doubles, strings, and 3D vectors (as in physics vectors with X, Y, and Z components), just to name some simple things.
So, is there a construct in C++ that I could use as a stack that would be able to store more than one kind of primitive type/object/struct?
Sure, one way is to use a tagged union:
enum Type { INTEGER, DOUBLE, /* ... */ };
union Data {
uint64_t as_integer;
double as_double;
// ...
};
struct Value {
Type type;
Data data;
};
The storage for as_integer, as_double, etc. will be overlapped, so a Value structure will take up two words of storage, and your stack will have type std::vector<Value>. Then you access members of data according to the value of type:
void sub(std::vector<Value>& stack) {
// In reality you would probably factor this pattern into a function.
auto b = stack.back();
stack.pop_back();
assert(b.type == INTEGER);
auto a = stack.back();
stack.pop_back();
assert(a.type == INTEGER);
Value result;
result.type = INTEGER;
result.data.as_integer = a.data.as_integer - b.data.as_integer;
stack.push_back(result);
}
Of course, Forths are usually untyped, meaning that the stack consists of words only (std::vector<uint64_t>) and the interpretation of a data value is up to the word operating on it. In that case, you would pun via a union or reinterpret_cast to the appropriate type in the implementation of each word:
void subDouble(std::vector<Data>& stack) {
// Note that this has no type safety guarantees anymore.
double b = stack.back().as_double;
stack.pop_back();
double a = stack.back().as_double;
stack.pop_back();
Data result;
result.as_double = a - b;
stack.push_back(result);
}
void subDouble(std::vector<uint64_t>& stack) {
double b = reinterpret_cast<double&>(stack.back());
stack.pop_back();
double a = reinterpret_cast<double&>(stack.back());
stack.pop_back();
double result = a - b;
stack.push_back(reinterpret_cast<uint64_t&>(result));
}
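A variant of the untyped-stack word that avoids both the union and reinterpret_cast is to copy the bits with std::memcpy, which compilers typically optimize to a plain register move. This is only a sketch; the helper names wordToDouble and doubleToWord are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "double must fit exactly in one stack word");

// Hypothetical helpers: reinterpret the bits of a stack word as a double and back.
double wordToDouble(std::uint64_t w) {
    double d;
    std::memcpy(&d, &w, sizeof d);
    return d;
}

std::uint64_t doubleToWord(double d) {
    std::uint64_t w;
    std::memcpy(&w, &d, sizeof w);
    return w;
}

// Pops b, then a, and pushes a - b, treating both words as doubles.
void subDouble(std::vector<std::uint64_t>& stack) {
    assert(stack.size() >= 2);
    const double b = wordToDouble(stack.back());
    stack.pop_back();
    const double a = wordToDouble(stack.back());
    stack.pop_back();
    stack.push_back(doubleToWord(a - b));
}
```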
Alternatively, you can store not values but pointers to instances of a class Value from which other value types such as Integer or Double would derive:
struct Value {};
struct Integer : Value { uint64_t value; };
struct Double : Value { double value; };
// ...
Your stack would have type std::vector<unique_ptr<Value>> or std::vector<Value*>. Then you needn’t worry about different value sizes, at the cost of making wrapper structures and allocating instances of them at runtime.
I would suggest using inheritance. Make a common base class for the objects you need to store, and make a vector of pointers to the base type. Then store all the inheriting objects in this vector.
Since c++ is an object-orientated language, you might just use inheritance. Here is a quick example taken from http://www.cplusplus.com/forum/general/17754/ and extended:
#include <iostream>
#include <vector>
using namespace std;
// abstract base class
class Animal
{
public:
// pure virtual method
virtual void speak() = 0;
// virtual destructor
virtual ~Animal() {}
};
// derived class 1
class Dog : public Animal
{
public:
// polymorphic implementation of speak
virtual void speak() { cout << "Ruff!"; }
};
// derived class 2
class Cat : public Animal
{
public:
// polymorphic implementation of speak
virtual void speak() { cout << "Meow!"; }
};
int main( int argc, char* args[] )
{
// container of base class pointers
vector<Animal*> barn;
// dynamically allocate an Animal instance and add it to the container
barn.push_back( new Dog() );
barn.push_back( new Cat() );
// invoke the speak method of the first Animal in the container
barn.front()->speak();
// invoke all speak methods and free the allocated memory
for( vector<Animal*>::iterator i = barn.begin(); i != barn.end(); ++i )
{
(*i)->speak();
delete *i;
}
// empty the container
barn.clear();
return 0;
}
The solution for storing different types is a tagged union
enum Type { INT, STRING, DOUBLE, POINT2D, VECTOR, OBJECT /* ... */ };
union Data {
int int_val;
double double_val;
struct { int x, y; } point2D;
struct { int v3, v2, v1, v0; }; // you can even use unnamed structs (a common compiler extension in C++)
// ...
};
struct StackElem {
Type type;
Data data;
};
In C++ it's even better to use std::variant (or boost::variant in older C++ standards), which might use a tagged union under the hood
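As a rough illustration of the std::variant route (C++17), a stack of variants might look like the following. The alias Value, the chosen set of alternatives, and the word subInteger are assumptions for this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// The variant stores the type tag alongside the data.
using Value = std::variant<std::int64_t, double, std::string>;

// Hypothetical word: pops two integers and pushes their difference.
// std::get throws std::bad_variant_access if the element holds another type.
void subInteger(std::vector<Value>& stack) {
    if (stack.size() < 2) throw std::runtime_error("stack underflow");
    const std::int64_t b = std::get<std::int64_t>(stack.back());
    stack.pop_back();
    const std::int64_t a = std::get<std::int64_t>(stack.back());
    stack.pop_back();
    stack.push_back(Value{a - b});
}
```

Compared with the hand-written tagged union, the type check happens inside std::get, so a mismatched operand raises an exception instead of silently reading the wrong member.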
However there's no need to use a single stack for all when using the reverse Polish notation. You can use a value stack and a separate operator stack. For every operator on the operator stack you pop the corresponding number of parameters from the value stack. That'll make things easier and save memory since you can use a small char array for operators (unless you need more than 255 operators), and no memory wasted for saving the type as well as the bigger-than-needed data field in the struct like above. That means you don't need a type OPERATOR in the Type enum
You can use a double type stack for all numeric types because a double can contain all int type's range without loss of precision. That's what implemented in Javascript and Lua. If the operator needs more than 1 parameter then just push/pop all of them just like what a compiler does when evaluating a function. You don't need to worry about int operations anymore, just do everything in double, unless there are specific int operators. But you may need different operators for different types, for example + for double addition, p or something like that for vector addition. However if you need 64-bit int then a separate integer type is needed
For example if you need to add 2 3D vectors, push 3 dimensions of the first vector, then the other. When you pop out a vector operator from the operator stack, pop 3 dimensions of the 2 vectors from value stack. After doing the math on it, push resulting 3 dimensions to stack. No need for a vector type.
If you don't want to store int as double then you can use NaN-boxing (or nunboxing/punboxing) like Firefox's JS engine, in which, if the value is an int, the upper 16 of the 64 bits are 1s; otherwise it's a double (or a pointer, which you probably wouldn't use). Another way is a type tag in the 3 lower bits, as in old Firefox JS engines. In this case it's a little bit complicated but you can use the same operator for every type. For more information about this read Using the extra 16 bits in 64-bit pointers
You can even use a byte array for storing all data types and read the correct number of bytes specified by the operator. For example if the operator indicated that the next operand must be an int, just read 4 bytes. If it's a string, read the 4 bytes of string length first then the string content from the stack. If it's a 2D point of int read 4 bytes of x and 4 bytes of y. If it's a double read 8 bytes, etc. This is the most space efficient way, but obviously it must be traded by speed

C: Where is union practically used?

I have an example in which the alignment of a type is guaranteed, union max_align. I am looking for an even simpler example in which a union is used practically, to explain to my friend.
I usually use unions when parsing text. I use something like this:
typedef enum DataType { INTEGER, FLOAT_POINT, STRING } DataType ;
typedef union DataValue
{
int v_int;
float v_float;
char* v_string;
}DataValue;
typedef struct DataNode
{
DataType type;
DataValue value;
}DataNode;
void myfunct()
{
long long temp;
DataNode inputData;
inputData.type= read_some_input(&temp);
switch(inputData.type)
{
case INTEGER: inputData.value.v_int = (int)temp; break;
case FLOAT_POINT: inputData.value.v_float = (float)temp; break;
case STRING: inputData.value.v_string = (char*)temp; break;
}
}
void printDataNode(DataNode* ptr)
{
printf("I am a ");
switch(ptr->type){
case INTEGER: printf("Integer with value %d", ptr->value.v_int); break;
case FLOAT_POINT: printf("Float with value %f", ptr->value.v_float); break;
case STRING: printf("String with value %s", ptr->value.v_string); break;
}
}
If you want to see how unions are used HEAVILY, check any code using flex/bison. For example see splint, it contains TONS of unions.
I've typically used unions where you want to have different views of the data
e.g. a 32-bit colour value where you want both the 32-bit val and the red,green,blue and alpha components
struct rgba
{
unsigned char r;
unsigned char g;
unsigned char b;
unsigned char a;
};
union
{
unsigned int val;
struct rgba components;
}colorval32;
NB You could also achieve the same thing with bit-masking and shifting i.e
#define GETR(val) ((val&0xFF000000) >> 24)
but I find the union approach more elegant
For accessing registers or I/O ports bytewise as well as bitwise by mapping that particular port to memory, see the example below:
typedef union
{
unsigned int a;
struct {
unsigned bit0 : 1,
bit1 : 1,
bit2 : 1,
bit3 : 1,
bit4 : 1,
bit5 : 1,
bit6 : 1,
bit7 : 1,
bit8 : 1,
bit9 : 1,
bit10 : 1,
bit11 : 1,
bit12 : 1,
bit13 : 1,
bit14 : 1,
bit15 : 1
} bits;
} IOREG;
# define PORTA (*(IOREG *) 0x3B)
...
unsigned int i = PORTA.a;//read bytewise
int j = PORTA.bits.bit0;//read bitwise
...
PORTA.bits.bit0 = 1;//write operation
In the Windows world, unions are commonly used to implement tagged variants, which are (or were, before .NET?) one standard way of passing data between COM objects.
The idea is that a union type can provide a single natural interface for passing arbitrary data between two objects. Some COM object could pass you a variant (e.g. type VARIANT or _variant_t) which could contain either a double, float, int, or whatever.
If you have to deal with COM objects in Windows C++ code, you'll see variant types all over the place.
VARIANTs, SAFEARRAYs, and BSTRs, Oh My!
Boost variant
struct cat_info
{
int legs;
int tailLen;
};
struct fish_info
{
bool hasSpikes;
};
union animal_data
{
fish_info fish;
cat_info cat;
};
struct animal
{
char* name;
int animal_type;
animal_data data;
};
Unions are useful if you have different kinds of messages, in which case the intermediate levels don't have to know the exact type. Only the sender and receiver need to parse the actual message. Any other levels only really need to know the size and possibly sender and/or receiver info.
SDL uses an union for representing events: http://www.libsdl.org/cgi/docwiki.cgi/SDL_Event.
do you mean something like this ?
union {
long long a;
unsigned char b[sizeof(long long)];
} long_long_to_single_bytes;
ADDED:
I have recently used this on our AIX machine to transform the 64-bit machine identifier into a byte array.
std::string getHardwareUUID(void) {
#ifdef AIX
struct xutsname m; // aix specific struct to hold the 64bit machine id
unamex(&m); // aix specific call to get the 64bit machine id
long_long_to_single_bytes.a = m.longnid;
return convertToHexString(long_long_to_single_bytes.b, sizeof(long long));
#else // Windows or Linux or Solaris or ...
... get a 6byte ethernet MAC address somehow and put it into mac_buf
return convertToHexString(mac_buf, 6);
#endif
}
Here is another example where a union could be useful.
(not my own idea, I have found this on a document discussing
c++ optimizations)
begin-quote
.... Unions can also be used to save space, e.g.
first the non-union approach:
void F3(bool useInt) {
if (useInt) {
int a[1000];
F1(a); // call a function which expects an array of int as parameter
}
else {
float b[1000];
F2(b); // call a function which expects an array of float as parameter
}
}
Here it is possible to use the same memory area for a and b because their live ranges do
not overlap. You can save a lot of cpu-cache space by joining a and b in a union:
void F3(bool useInt) {
union {
int a[1000];
float b[1000];
};
if (useInt) {
F1(a); // call a function which expects an array of int as parameter
}
else {
F2(b); // call a function which expects an array of float as parameter
}
}
Using a union is not a safe programming practice, of course, because you will get no
warning from the compiler if the uses of a and b overlap. You should use this method only
for big objects that take a lot of cache space. ...
end-quote
I've sometimes used unions this way
//Define type of structure
typedef enum { ANALOG, BOOLEAN, UNKNOWN } typeValue_t;
//Define the union
typedef struct {
typeValue_t typeValue;
/*On this structure you will access the correct type of
data according to its type*/
union {
float ParamAnalog;
char ParamBool;
};
} Value_t;
Then you could declare arrays of different kinds of values, storing the data more or less efficiently, and perform some "polymorphic" operations like:
void printValue ( Value_t value ) {
switch (value.typeValue) {
case BOOLEAN:
printf("Boolean: %c\n", value.ParamBool?'T':'F');
break;
case ANALOG:
printf("Analog: %f\n", value.ParamAnalog);
break;
case UNKNOWN:
printf("Error, value UNKNOWN\n");
break;
}
}
When reading serialized data that needs to be coerced into specific types.
When returning semantic values from lex to yacc. (yylval)
When implementing a polymorphic type, especially one that reads a DSL or general language
When implementing a dispatcher that specifically calls functions intended to take different types.
Recently I think I saw a union used in vector programming. Vector programming is used in Intel MMX technology, GPU hardware, IBM's Cell Broadband Engine, and others.
A vector may correspond to a 128-bit register. It is very commonly used on SIMD architectures. Since the hardware has 128-bit registers, you can store 4 single-precision floating-point values in a register/variable. An easy way to construct, convert, and extract individual elements of a vector is to use a union.
typedef union {
vector4f vec; // processor-specific built-in type
struct { // human-friendly access for transformations, etc
float x;
float y;
float z;
float w;
};
struct { // human-friendly access for color processing, lighting, etc
float r;
float g;
float b;
float a;
};
float arr[4]; // yet another convenience access
} Vector4f;
int main()
{
Vector4f position, normal, color;
// human-friendly access
position.x = 12.3f;
position.y = 2.f;
position.z = 3.f;
position.w = 1.f;
// computer friendly access
//some_processor_specific_operation(position.vec,normal.vec,color.vec);
return 0;
}
If you go into PlayStation 3 multi-core programming or graphics programming, there is a good chance you'll run into more of this.
I know I'm a bit late to the party, but as a practical example the Variant datatype in VBScript is, I believe, implemented as a Union. The following code is a simplified example taken from an article otherwise found here
struct tagVARIANT
{
struct // vt and the reserved words sit next to the value union; they do not overlap it
{
VARTYPE vt;
WORD wReserved1;
WORD wReserved2;
WORD wReserved3;
union
{
LONG lVal;
BYTE bVal;
SHORT iVal;
FLOAT fltVal;
DOUBLE dblVal;
VARIANT_BOOL boolVal;
DATE date;
BSTR bstrVal;
SAFEARRAY *parray;
VARIANT *pvarVal;
};
};
};
The actual implementation (as the article states) is found in the oaidl.h C header file.
Example:
When using different socket types, but you want a common type to refer to them.
One more example: to save doing casts.
typedef union {
unsigned int int_v; // assumes a 32-bit unsigned int and an IEEE 754 float
float float_v;
} int_float;
void foo(float v) {
int_float i;
i.float_v = v;
printf("sign=%u exp=%d fraction=%u", (i.int_v>>31)&1, (int)((i.int_v>>23)&0xff)-127, i.int_v&((1u<<23)-1));
}
instead of:
void foo(float v) {
unsigned int i = *((unsigned int*)&v);
printf("sign=%u exp=%d fraction=%u", (i>>31)&1, (int)((i>>23)&0xff)-127, i&((1u<<23)-1));
}
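In C++, where reading a union member other than the one last written is not formally allowed, the same decomposition can be sketched with std::memcpy instead. The struct FloatParts and the function decompose are names invented here, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct FloatParts {
    unsigned sign;          // 1 bit
    int exponent;           // 8 bits, with the bias of 127 removed
    std::uint32_t fraction; // 23 bits
};

// Hypothetical helper: splits a float into its sign, exponent and fraction fields.
FloatParts decompose(float v) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    FloatParts parts;
    parts.sign = static_cast<unsigned>(bits >> 31);
    parts.exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;
    parts.fraction = bits & 0x7FFFFFu;
    return parts;
}
```

For instance, -2.5f is -1.25 times 2 to the power 1, so it decomposes into sign 1, exponent 1 and fraction 0x200000.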
For convenience, I use unions to let me use the same class to store xyzw and rgba values
#ifndef VERTEX4DH
#define VERTEX4DH
struct Vertex4d{
union {
double x;
double r;
};
union {
double y;
double g;
};
union {
double z;
double b;
};
union {
double w;
double a;
};
Vertex4d(double x=0, double y=0,double z=0,double w=0) : x(x), y(y),z(z),w(w){}
};
#endif
Many examples of unions can be found in <X11/Xlib.h>. Few others are in some IP stacks (in BSD <netinet/ip.h> for instance).
As a general rule, protocol implementations use union construct.
Unions can also be useful when type punning, which is desirable in a select few places (such as some techniques for floating-point comparison algorithms).