Practical use of Anonymous union in real world C++ programing - c++

I know that we can access anonymous unions without creating it's object(without dot),
but could anybody please explain,what is the use of anonymous unions in real world c++ programing?

I have mostly used unions to store multiple different types of elements in the same contiguous storage without resorting to dynamic polymorphism. Thus, every element of my union is a struct describing the data for the corresponding node type. Using an anonymous union mostly gives a more convenient notation, i.e. instead of object.union_member.struct_member, I can just write object.struct_member, since there is no other member of that name anyways.
A recent example where I used them would be a rooted (mostly binary) tree which has different kinds of nodes:
struct multitree_node {
multitree_node_type type;
...
union {
node_type_1 n1;
node_type_2 n2;
...
};
};
Using this type tag type I am able to determine which element of the union to use. All of the structs node_type_x have roughly the same size, which is why I used the union in the first place (no unused storage).
With C++17, you would be able to do this using std::variant, but for now, using anonymous unions are a convenient way of implementing such 'polymorphic' types without virtual functions.

Here's a real-world example:
struct Point3D {
union {
struct {
float x, y, z;
};
struct {
float c[3];
};
};
};
Point3D p;
You can access p's x/y/z coordinates with p.x, p.y, p.z. This is convenient.
But sometimes you want to access point as a float[3] array. You can use p.c for that.
Note: Using this construct is Undefined Behavior by the standard. But, it works on all compilers I've met so far. So, if you want to use such a construct, be aware, that this may broke some day.

I actually remembered a use case I came across a while back. You know bit-fields? The standard makes very little guarantees about their layout in memory. If you want to pack binary data into an integer of a specific size, you are usually better off doing bit-wise arithmetic yourself.
However, with unions and the common initial sequence guarantee, you can put all the boilerplate behind member access syntax. So your code will look like it's using a bit-field, but will in fact just be packing bits into a predictable memory location.
Here's a Live Example
#include <cstdint>
#include <type_traits>
#include <climits>
#include <iostream>
template<typename UInt, std::size_t Pos, std::size_t Width>
struct BitField {
static_assert(std::is_integral<UInt>::value && std::is_unsigned<UInt>::value,
"To avoid UB, only unsigned integral type are supported");
static_assert(Width > 0 && Pos < sizeof(UInt) * CHAR_BIT && Width < sizeof(UInt) * CHAR_BIT - Pos,
"Position and/or width cannot be supported");
UInt mem;
BitField& operator=(UInt val) {
if((val & ((UInt(1) << Width) - 1)) == val) {
mem &= ~(((UInt(1) << Width) - 1) << Pos);
mem |= val << Pos;
}
// Should probably handle the error somehow
return *this;
}
operator UInt() {
return (mem >> Pos) & Width;
}
};
struct MyColor {
union {
std::uint32_t raw;
BitField<std::uint32_t, 0, 8> r;
BitField<std::uint32_t, 8, 8> g;
BitField<std::uint32_t, 16, 8> b;
};
MyColor() : raw(0) {}
};
int main() {
MyColor c;
c.r = 0xF;
c.g = 0xA;
c.b = 0xD;
std::cout << std::hex << c.raw;
}

Related

concatenating uint16_t and uint32_t values for hashing

I am trying concatenating (not adding) 2 uint16_t struct members and 2 uint32_t struct members and assigning the result to const void *p for the purpose of hashing. The struct and concatenation function that I am trying to implement is as follows.
struct xyz {
....
uint32_t a;
uint32_t b;
....
uint16_t c;
uint16_t d;
....
}
const void *p=concatenation(xyz.a,xyz.b,xyz.c,xyz.d)
Edited:
I have to use pre-defined hash functions. The most suitable hash function for my task seems to be this.
uint32_t hash(const uint32_t p[], size_t n)
{
//Returns the hash of the 'n' 32-bit words at 'p'
}
or
uint32_t hash64(const uint64_t p[], size_t n)
{
//Returns the hash of the 'n' 64-bit words at 'p'
}
for the purpose of hashing
In this case, I'd rather prefer providing a custom hash function – or specialise std::hash for. For use with standard templates, this might look like this:
namespace std // any extension of std namespace is UB
// sole exception: specialising templates, which we are going to do
{
template <>
struct hash<xyz>
{
size_t operator()(xyz const& i) const
{
// TODO: need to calculate the value from a, b, c, and d appropriately
return 0;
};
};
// if xyz is polymorphic, you might need to operate on pointers
// no problem either:
template <>
struct hash<xyz*>
{
size_t operator()(xyz const* i) const
{
return hash<xyz>()(*i);
// or if hash value is type dependent:
return i->hash(); // custom virtual hash member function needs to be implented
}
}
// now you can have
std::unordered_set<xyz> someSet;
void demo()
{
someSet.insert(xyz());
}
(Untested code, in case of errors please fix yourself.)
A list of hashing algorithms which might be used can be found at wikipedia.
If you want the value to fit into a pointer, the full value can be 32 bits on x86 or 64 bits on x64. I'm going to assume you are compiling for 64 bit machines.
This means you can only fit 2 uint16 and one uint32, or 2 uint32s.
Either way, you would shift the values into a uint64 (c | (d << 16) | (c << 32)) and then convert that value to a void*.
Edit: for clarification, you cannot fit all the structs members bit shifted one after another into a single pointer. You need a minimum of 96 bits to hold the packed struct which means at least two 64 bit pointers.
There are a few things to consider:
Does that hash value need to be portable across systems? If it does, then you will need to be careful to order the bytes the same way on different systems. If not, then the implementation can be simpler.
Do you want to hash every member of the class, and the class has no padding, and no value of a member should be hashed equally to another different value?
If both of these simplifications apply, then your function is fast and easy to implement but violating that precondition will break the hash. If not, then you must serialise the the data into a buffer, which practically means that you cannot simply return a pointer.
Here is a super simple implementation for the case that you don't need portability, and you hash all members, and there is no padding:
xyz example;
static_assert(std::has_unique_object_representations_v<xyz>);
const void* p = &example;
Note that this doesn't work with (IEEE-754) float members due to peculiarities of NaN.
A more robust solution that can produce hashes that are portable across systems is to use a general purpose serialisation scheme, and hash the serialised result. There is no standard serialisation functionality in C++.
void* has problems like: Who owns the memory? What's the type you are going to reinterpret the pointer as?
A more typed solution would be to use std::array of std::byte then you at least know that you're looking at an array of raw bytes and nothing else:
#include <cstdint>
#include <array>
#include <cstddef>
#include <cstring>
auto concat(std::uint32_t a, std::uint32_t b, std::uint16_t c, std::uint16_t d) {
std::array<std::byte, sizeof a + sizeof b + sizeof c + sizeof d> res;
std::byte* p = res.data();
std::memcpy(p, &a, sizeof a);
std::memcpy(p += sizeof a, &b, sizeof b);
std::memcpy(p += sizeof b, &c, sizeof c);
std::memcpy(p += sizeof c, &d, sizeof d);
return res;
}
int main() {
std::uint32_t a = 1, b = 0;
std::uint16_t c = 1, d = 0;
auto res = concat(a, b, c, d);
return 0;
}

C++ and dynamically typed languages

Today I talked to a friend about the differences between statically and dynamically typed languages (more info about the difference between static and dynamic typed languages in this SO question). After that, I was wondering what kind of trick can be used in C++ to emulate such dynamic behavior.
In C++, as in other statically typed languages, the variable type is specified at compile time. For example, let's say I have to read from a file a big amount of numbers, which are in the majority of the cases quite small, small enough to fit in an unsigned short type. Here comes the tricky thing, a small amount of these values are much bigger, bigger enough to need an unsigned long long to be stored.
Since I assume I'm going to do calculations with all of them I want all of them stored in the same container in consecutive positions of memory in the same order than I read them from the input file.. The naive approach would be to store them in a vector of type unsigned long long, but this means having typically up to 4 times extra space of what is actually needed (unsigned short 2 bytes, unsigned long long 8 bytes).
In dynamically typed languages, the type of a variable is interpreted at runtime and coerced to a type where it fits. How can I achieve something similar in C++?
My first idea is to do that by pointers, depending on its size I will store the number with the appropriate type. This has the obvious drawback of having to also store the pointer, but since I assume I'm going to store them in the heap anyway, I don't think it matters.
I'm totally sure that many of you can give me way better solutions than this ...
#include <iostream>
#include <vector>
#include <limits>
#include <sstream>
#include <fstream>
int main() {
std::ifstream f ("input_file");
if (f.is_open()) {
std::vector<void*> v;
unsigned long long int num;
while(f >> num) {
if (num > std::numeric_limits<unsigned short>::max()) {
v.push_back(new unsigned long long int(num));
}
else {
v.push_back(new unsigned short(num));
}
}
for (auto i: v) {
delete i;
}
f.close();
}
}
Edit 1:
The question is not about saving memory, I know in dynamically typed languages the necessary space to store the numbers in the example is going to be way more than in C++, but the question is not about that, it's about emulating a dynamically typed language with some c++ mechanism.
Options include...
Discriminated union
The code specifies a set of distinct, supported types T0, T1, T2, T3..., and - conceptually - creates a management type to
struct X
{
enum { F0, F1, F2, F3... } type_;
union { T0 t0_; T1 t1_; T2 t2_; T3 t3_; ... };
};
Because there are limitations on the types that can be placed into unions, and if they're bypassed using placement-new care needs to be taken to ensure adequate alignment and correct destructor invocation, a generalised implementation becomes more complicated, and it's normally better to use boost::variant<>. Note that the type_ field requires some space, the union will be at least as large as the largest of sizeof t0_, sizeof t1_..., and padding may be required.
std::type_info
It's also possible to have a templated constructor and assignment operator that call typeid and record the std::type_info, allowing future operations like "recover-the-value-if-it's-of-a-specific-type". The easiest way to pick up this behaviour is to use boost::any.
Run-time polymorphism
You can create a base type with virtual destructor and whatever functions you need (e.g. virtual void output(std::ostream&)), then derive a class for each of short and long long. Store pointers to the base class.
Custom solutions
In your particular scenario, you've only got a few large numbers: you could do something like reserve one of the short values to be a sentinel indicating that the actual value at this position can be recreated by bitwise shifting and ORing of the following 4 values. For example...
10 299 32767 0 0 192 3929 38
...could encode:
10
299
// 32767 is a sentinel indicating next 4 values encode long long
(0 << 48) + (0 << 32) + (192 << 16) + 3929
38
The concept here is similar to UTF-8 encoding for international character sets. This will be very space efficient, but it suits forward iteration, not random access indexing a la [123].
You could create a class for storing dynamic values:
enum class dyn_type {
none_type,
integer_type,
fp_type,
string_type,
boolean_type,
array_type,
// ...
};
class dyn {
dyn_type type_ = dyn_type::none_type;
// Unrestricted union:
union {
std::int64_t integer_value_;
double fp_value_;
std::string string_value_;
bool boolean_value_;
std::vector<dyn> array_value_;
};
public:
// Constructors
dyn()
{
type_ = dyn_type::none_type;
}
dyn(std::nullptr_t) : dyn() {}
dyn(bool value)
{
type_ = dyn_type::boolean_type;
boolean_value_ = value;
}
dyn(std::int32_t value)
{
type_ = dyn_type::integer_type;
integer_value_ = value;
}
dyn(std::int64_t value)
{
type_ = dyn_type::integer_type;
integer_value_ = value;
}
dyn(double value)
{
type_ = dyn_type::fp_type;
fp_value_ = value;
}
dyn(const char* value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(value);
}
dyn(std::string const& value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(value);
}
dyn(std::string&& value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(std::move(value));
}
// ....
// Clear
void clear()
{
switch(type_) {
case dyn_type::string_type:
string_value_.std::string::~string();
break;
//...
}
type_ = dyn_type::none_type;
}
~dyn()
{
this->clear();
}
// Copy:
dyn(dyn const&);
dyn& operator=(dyn const&);
// Move:
dyn(dyn&&);
dyn& operator=(dyn&&);
// Assign:
dyn& operator=(std::nullptr_t);
dyn& operator=(std::int64_t);
dyn& operator=(double);
dyn& operator=(bool);
// Operators:
dyn operator+(dyn const&) const;
dyn& operator+=(dyn const&);
// ...
// Query
dyn_type type() const { return type_; }
std::string& string_value()
{
assert(type_ == dyn_type::string_type);
return string_value_;
}
// ....
// Conversion
explicit operator bool() const
{
switch(type_) {
case dyn_type::none_type:
return true;
case dyn_type::integer_type:
return integer_value_ != 0;
case dyn_type::fp_type:
return fp_value_ != 0.0;
case dyn_type::boolean_type:
return boolean_value_;
// ...
}
}
// ...
};
Used with:
std::vector<dyn> xs;
xs.push_back(3);
xs.push_back(2.0);
xs.push_back("foo");
xs.push_back(false);
An easy way to get dynamic language behavior in C++ is to use a dynamic language engine, e.g. for Javascript.
Or, for example, the Boost library provides an interface to Python.
Possibly that will deal with a collection of numbers in a more efficient way than you could do yourself, but still it's extremely inefficient compared to just using an appropriate single common type in C++.
The normal way of dynamic typing in C++ is a boost::variant or a boost::any.
But in many cases you don't want to do that. C++ is a great statically typed language and it's just not your best use case to try to force it to be dynamically typed (especially not to save memory use). Use an actual dynamically typed language instead as it is very likely better optimized (and easier to read) for that use case.

C++ understanding Unions and Structs

I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand well, unions are for saving space, so sizeof(my_union_t) = MAX of the variables in it. What are the advantages of using the statement above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
};
Won't be the space allocated for both of them the same?
And how can I initialize varA,varB... from my_union?
Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming the the struct in the union is packing the floats with no gaps between then!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is, access the structure members in the normal long-winded way not via the data[] array.
Union are not mostly for saving space, but to implement sum types (for that, you'll put the union in some struct or class having also a discriminating field which would keep the run-time tag). Also, I suggest you to use a recent standard of C++, at least C++11 since it has better support of unions (e.g. permits more easily union of objects and their construction or initialization).
The advantage of using your union is to be able to index the n-th floating point (with 0 <= n <= 3) as u.data[n]
To assign a union field in some variable declared my_union u; just code e.g. u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14;
A good example of well deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
class IntOrString {
bool isint;
union {
int num; // when isint is true
str::string str; // when isint is false
};
public:
IntOrString(int n=0) : isint(true), num(n) {};
IntOrString(std::string s) : isint(false), str(s) {};
IntOrString(const IntOrString& o): isint(o.isint)
{ if (isint) num = o.num; else str = o.str); };
IntOrString(IntOrString&&p) : isint(p.isint)
{ if (isint) num = std::move (p.num);
else str = std::move (p.str); };
~IntOrString() { if (isint) num=0; else str->~std::string(); };
void set (int n)
{ if (!isint) str->~std::string(); isint=true; num=n; };
void set (std::string s) { str = s; isint=false; };
bool is_int() const { return isint; };
int as_int() const { return (isint?num:0; };
const std::string as_string() const { return (isint?"":str;};
};
Notice the explicit calls of destructor of str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>)
See also std::optional in future versions of C++ (which conceptually is a tagged union with void)
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You'll better use union-s in a C++ idiomatic way (see this for general advices).
I think the best way to understand unions is to just to give 2 common practical examples.
The first example is working with images. Imagine you have and RGB image that is arranged in a long buffer.
What most people would do, is represent the buffer as a char* and then loop it by 3's to get the R,G,B.
What you could do instead, is make a little union, and use that to loop over the image buffer:
union RGB
{
char raw[3];
struct
{
char R;
char G;
char B;
} colors;
}
RGB* pixel = buffer[0];
///pixel.colors.R == The red color in the first pixel.
Another very useful use for unions is using registers and bitfields.
Lets say you have a 32 bit value, that represents some HW register, or something.
Sometimes, to save space, you can split the 32 bits into bit fields, but you also want the whole representation of that register as a 32 bit type.
This obviously saves bit shift calculation that a lot of programmers use for no reason at all.
union MySpecialRegister
{
uint32_t register;
struct
{
unsigned int firstField : 5;
unsigned int somethingInTheMiddle : 25;
unsigned int lastField : 6;
} data;
}
// Now you can read the raw register into the register field
// then you can read the fields using the inner data struct
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data, you might find that useful for things like writing a function to tell you if you are on a big endian or little endian CPU.
No, it is not for saving space. It is for ability to represent some binary data as various data types.
for example
#include <iostream>
#include <stdint.h>
union Foo{
int x;
struct y
{
unsigned char b0, b1, b2, b3;
};
char z[sizeof(int)];
};
int main()
{
Foo bar;
bar.x = 100;
std::cout << std::hex; // to show number in hexadec repr;
for(size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)bar.z[i] << " "; // int is just to show values as numbers, not a characters
}
return 0;
}
output: 0x64 0x0 0x0 0x0 The same values are stored in struct bar.y, but not in array but in sturcture members. Its because my machine have a little endiannes. If it were big, than the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
its usefull, for example, when you need to read some binary file, that was written on a machine with different endianess than yours. You can just access values as bytearray and swap those bytes as you wish.
Also, it is usefull when you have to deal with bit fields, but its a whole different story :)
First of all: Avoid unions where the access goes to the same memory but to different types!
Unions did not save space at all. The only define multiple names on the same memory area! And you can only store one of the elements in one time in a union.
if you have
union X
{
int x;
char y[4];
};
you can store an int OR 4 chars but not both! The general problem is, that nobody knows which data is actually stored in a union. If you store a int and read the chars, the compiler will not check that and also there is no runtime check. A solution is often to provide an additional data element in a struct to a union which contains the actual stored data type as an enum.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
}
But in c++ you always should use classes or structs which can derive from a maybe empty parent class like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can device pointers to base which actually can hold a Int or a Char Type for you. With virtual functions you can access the members in a object oriented way of programming.
As mentioned already from Basile's answer, a useful case can be the access via different names to the same type.
union X
{
struct data
{
float a;
float b;
};
float arr[2];
};
which allows different access ways to the same data with the same type. Using different types which are stored in the same memory should be avoided at all!

Can I cast an array of POD which has floats to float*?

Consider the following:
#include <vector>
using namespace std;
struct Vec2
{
float m_x;
float m_y;
};
vector<Vec2> myArray;
int main()
{
myArray.resize(100);
for (int i = 0; i < 100; ++i)
{
myArray[i].m_x = (float)(i);
myArray[i].m_y = (float)(i);
}
float* raw;
raw = reinterpret_cast<float*>(&(myArray[0]));
}
Is raw guaranteed to have 200 contiguous floats with the correct values? That is, does the standard guarantee this?
EDIT: If the above is guaranteed, and if Vec2 has some functions (non-virtual) and a constructor, is the guarantee still there?
NOTE: I realize this is dangerous, in my particular case I have no
choice as I am working with a 3rd party library.
I realize this is dangerous, in my particular case I have no choice as I am working with a 3rd party library.
You may add compile time check of structure size:
live demo
struct Vec2
{
float a;
float b;
};
int main()
{
int assert_s[ sizeof(Vec2) == 2*sizeof(float) ? 1 : -1 ];
}
It would increase your confidence of your approach (which is still unsafe due to reinterpret_cast, as mentioned).
raw = reinterpret_cast(&(myArray[0]));
ISO C++98 9.2/17:
A pointer to a POD struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]
And finally, runtime check of corresponding addresses would make such solution rather safe. It can be done during unit-tests or even at every start of program (on small test array).
Putting it all together:
live demo
#include <vector>
#include <cassert>
using namespace std;
struct Vec2
{
float a;
float b;
};
int main()
{
int assert_s[ sizeof(Vec2) == 2*sizeof(float) ? 1 : -1 ];
typedef vector<Vec2> Vector;
Vector v(32);
float *first=static_cast<float*>(static_cast<void*>(&v[0]));
for(Vector::size_type i,size=v.size();i!=size;++i)
{
assert((first+i*2) == (&(v[i].a)));
assert((first+i*2+1) == (&(v[i].b)));
}
assert(false != false);
}
No, this is not safe, because the compiler is free to insert padding between or after the two floats in the structure, and so the floats of the structure may not be contiguous.
If you still want to try it, you can add compile time checks to add more surety that it will work:
static_assert(sizeof(Vec2) == sizeof(float) * 2, "Vec2 struct is too big!");
static_assert(offsetof(Vec2, b) == sizeof(float), "Vec2::b at the wrong offset!");
The only guarantee that a reinterpret_cast gives is, that you get the original object when you reinterpret_cast the casted object back to the original data type.
Especially, raw is not guaranteed to have 200 contiguous floats with the correct values.

C: Where is union practically used?

I have a example with me where in which the alignment of a type is guaranteed, union max_align . I am looking for a even simpler example in which union is used practically, to explain my friend.
I usually use unions when parsing text. I use something like this:
typedef enum DataType { INTEGER, FLOAT_POINT, STRING } DataType ;
typedef union DataValue
{
int v_int;
float v_float;
char* v_string;
}DataValue;
typedef struct DataNode
{
DataType type;
DataValue value;
}DataNode;
void myfunct()
{
long long temp;
DataNode inputData;
inputData.type= read_some_input(&temp);
switch(inputData.type)
{
case INTEGER: inputData.value.v_int = (int)temp; break;
case FLOAT_POINT: inputData.value.v_float = (float)temp; break;
case STRING: inputData.value.v_string = (char*)temp; break;
}
}
void printDataNode(DataNode* ptr)
{
printf("I am a ");
switch(ptr->type){
case INTEGER: printf("Integer with value %d", ptr->value.v_int); break;
case FLOAT_POINT: printf("Float with value %f", ptr->value.v_float); break;
case STRING: printf("String with value %s", ptr->value.v_string); break;
}
}
If you want to see how unions are used HEAVILY, check any code using flex/bison. For example see splint, it contains TONS of unions.
I've typically used unions where you want to have different views of the data
e.g. a 32-bit colour value where you want both the 32-bit val and the red,green,blue and alpha components
struct rgba
{
unsigned char r;
unsigned char g;
unsigned char b;
unsigned char a;
};
union
{
unsigned int val;
struct rgba components;
}colorval32;
NB You could also achieve the same thing with bit-masking and shifting i.e
#define GETR(val) ((val&0xFF000000) >> 24)
but I find the union approach more elegant
For accessing registers or I/O ports bytewise as well as bitwise by mapping that particular port to memory, see the example below:
typedef Union
{
unsigned int a;
struct {
unsigned bit0 : 1,
bit1 : 1,
bit2 : 1,
bit3 : 1,
bit4 : 1,
bit5 : 1,
bit6 : 1,
bit7 : 1,
bit8 : 1,
bit9 : 1,
bit10 : 1,
bit11 : 1,
bit12 : 1,
bit13 : 1,
bit14 : 1,
bit15 : 1
} bits;
} IOREG;
# define PORTA (*(IOREG *) 0x3B)
...
unsigned int i = PORTA.a;//read bytewise
int j = PORTA.bits.bit0;//read bitwise
...
PORTA.bits.bit0 = 1;//write operation
In the Windows world, unions are commonly used to implement tagged variants, which are (or were, before .NET?) one standard way of passing data between COM objects.
The idea is that a union type can provide a single natural interface for passing arbitrary data between two objects. Some COM object could pass you a variant (e.g. type VARIANT or _variant_t) which could contain either a double, float, int, or whatever.
If you have to deal with COM objects in Windows C++ code, you'll see variant types all over the place.
VARIANTs, SAFEARRAYs, and BSTRs, Oh My!
Boost variant
struct cat_info
{
int legs;
int tailLen;
};
struct fish_info
{
bool hasSpikes;
};
union
{
fish_info fish;
cat_info cat;
} animal_data;
struct animal
{
char* name;
int animal_type;
animal_data data;
};
Unions are useful if you have different kinds of messages, in which case you don't have to know in any intermediate levels the exact type. Only the sender and receiver need to parse the message actual message. Any other levels only really need to know the size and possibly sender and/or receiver info.
SDL uses an union for representing events: http://www.libsdl.org/cgi/docwiki.cgi/SDL_Event.
do you mean something like this ?
union {
long long a;
unsigned char b[sizeof(long long)];
} long_long_to_single_bytes;
ADDED:
I have recently used this on our AIX machine to transform the 64bit machine-indentifier into a byte-array.
std::string getHardwareUUID(void) {
#ifdef AIX
struct xutsname m; // aix specific struct to hold the 64bit machine id
unamex(&b); // aix specific call to get the 64bit machine id
long_long_to_single_bytes.a = m.longnid;
return convertToHexString(long_long_to_single_bytes.b, sizeof(long long));
#else // Windows or Linux or Solaris or ...
... get a 6byte ethernet MAC address somehow and put it into mac_buf
return convertToHexString(mac_buf, 6);
#endif
Here is another example where a union could be useful.
(not my own idea, I have found this on a document discussing
c++ optimizations)
begin-quote
.... Unions can also be used to save space, e.g.
first the non-union approach:
void F3(bool useInt) {
if (y) {
int a[1000];
F1(a); // call a function which expects an array of int as parameter
}
else {
float b[1000];
F2(b); // call a function which expects an array of float as parameter
}
}
Here it is possible to use the same memory area for a and b because their live ranges do
not overlap. You can save a lot of cpu-cache space by joining a and b in a union:
void F3(bool useInt) {
union {
int a[1000];
float b[1000];
};
if (y) {
F1(a); // call a function which expects an array of int as parameter
}
else {
F2(b); // call a function which expects an array of float as parameter
}
}
Using a union is not a safe programming practice, of course, because you will get no
warning from the compiler if the uses of a and b overlap. You should use this method only
for big objects that take a lot of cache space. ...
end-qoute
I've used sometimes unions this way
//Define type of structure
typedef enum { ANALOG, BOOLEAN, UNKNOWN } typeValue_t;
//Define the union
typedef struct {
typeValue_t typeValue;
/*On this structure you will access the correct type of
data according to its type*/
union {
float ParamAnalog;
char ParamBool;
};
} Value_t;
Then you could declare arrays of different kind of values, storing more or less efficiently the data, and make some "polimorph" operations like:
void printValue ( Value_t value ) {
switch (value.typeValue) {
case BOOL:
printf("Bolean: %c\n", value.ParamBool?'T':'F');
break;
case ANALOG:
printf("Analog: %f\n", value.ParamAnalog);
break;
case UNKNOWN:
printf("Error, value UNKNOWN\n");
break;
}
}
When reading serialized data that needs to be coerced into specific types.
When returning semantic values from lex to yacc. (yylval)
When implementing a polymorphic type, especially one that reads a DSL or general language
When implementing a dispatcher that specifically calls functions intended to take different types.
Recently I think I saw some union used in vector programming. vector programming is used in intel MMX technology, GPU hardware, IBM's Cell Broadband Engine, and others.
a vector may correspond to a 128 bit register. It is very commonly used for SIMD architecture. since the hardware has 128-bit registers, you can store 4 single-precision-floating points in a register/variable. an easy way to construct, convert, extract individual elements of a vector is to use the union.
typedef union {
vector4f vec; // processor-specific built-in type
struct { // human-friendly access for transformations, etc
float x;
float y;
float z;
float w;
};
struct { // human-friendly access for color processing, lighting, etc
float r;
float g;
float b;
float a;
};
float arr[4]; // yet another convenience access
} Vector4f;
int main()
{
Vector4f position, normal, color;
// human-friendly access
position.x = 12.3f;
position.y = 2.f;
position.z = 3.f;
position.w = 1.f;
// computer friendly access
//some_processor_specific_operation(position.vec,normal.vec,color.vec);
return 0;
}
if you take a path in PlayStation 3 Multi-core Programming, or graphics programming, a good chance you'll face more of these stuffs.
I know I'm a bit late to the party, but as a practical example the Variant datatype in VBScript is, I believe, implemented as a Union. The following code is a simplified example taken from an article otherwise found here
struct tagVARIANT
{
union
{
VARTYPE vt;
WORD wReserved1;
WORD wReserved2;
WORD wReserved3;
union
{
LONG lVal;
BYTE bVal;
SHORT iVal;
FLOAT fltVal;
DOUBLE dblVal;
VARIANT_BOOL boolVal;
DATE date;
BSTR bstrVal;
SAFEARRAY *parray;
VARIANT *pvarVal;
};
};
};
The actual implementation (as the article states) is found in the oaidl.h C header file.
Example:
When using different socket types, but you want a comon type to refer.
Another example more: to save doing castings.
typedef union {
long int_v;
float float_v;
} int_float;
void foo(float v) {
int_float i;
i.float_v = v;
printf("sign=%d exp=%d fraction=%d", (i.int_v>>31)&1, ((i.int_v>>22)&0xff)-128, i.int_v&((1<<22)-1));
}
instead of:
void foo(float v) {
long i = *((long*)&v);
printf("sign=%d exp=%d fraction=%d", (i>>31)&1, ((i>>22)&0xff)-128, i&((1<<22)-1));
}
For convenience, I use unions to let me use the same class to store xyzw and rgba values
#ifndef VERTEX4DH
#define VERTEX4DH
struct Vertex4d{
union {
double x;
double r;
};
union {
double y;
double g;
};
union {
double z;
double b;
};
union {
double w;
double a;
};
Vertex4d(double x=0, double y=0,double z=0,double w=0) : x(x), y(y),z(z),w(w){}
};
#endif
Many examples of unions can be found in <X11/Xlib.h>. Few others are in some IP stacks (in BSD <netinet/ip.h> for instance).
As a general rule, protocol implementations use union construct.
Unions can also be useful when type punning, which is desirable in a select few places (such as some techniques for floating-point comparison algorithms).