How to assign multiple values to a struct at once? - C++

I can do this on initialization for a struct Foo:
Foo foo = {bunch, of, things, initialized};
but, I can't do this:
Foo foo;
foo = {bunch, of, things, initialized};
So, two questions:
Why can't I do the latter? Is the former a special syntax that is only allowed at initialization?
How can I do something similar to the second example, i.e. assign a bunch of struct members in a single line of code after the struct has already been declared? I'm trying to avoid having to do this for large structs with many members:
Foo foo;
foo.a = 1;
foo.b = 2;
foo.c = 3;
//... ad infinitum

Try this:
Foo foo;
foo = (Foo){bunch, of, things, initialized};
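This works in C99 and later, where (Foo){...} is a compound literal. It is not standard C++, but GCC accepts it in C++ as an extension.
Update: In modern versions of C (C99 and later; compound literals are not part of standard C++), you can also use a compound literal with designated initializers, which looks like this: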
foo = (Foo){ .bunch = 4, .of = 2, .things = 77, .initialized = 8 };
The name right after the "." should be the name of the structure member you wish to initialize. These initializers can appear in any order, and any member that is not specified explicitly will get initialized to zero.
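For C++ specifically, since C++11 the assignment from the question, foo = {...};, does compile for aggregates: a braced list on the right of = becomes the argument to Foo's implicitly declared assignment operator, so a temporary Foo is built from it. C++20 also accepts designated initializers, but unlike C the designators must follow declaration order. A minimal sketch, assuming a hypothetical four-int Foo:

```cpp
// Hypothetical aggregate standing in for the question's Foo.
struct Foo {
    int bunch;
    int of;
    int things;
    int initialized;
};

// C++11: the braced list initializes a temporary Foo, which is then
// passed to Foo's implicitly declared assignment operator.
void reset(Foo& foo) {
    foo = {1, 2, 3, 4};
}

#if __cplusplus >= 202002L
// C++20: designators allowed, but only in declaration order;
// omitted members are value-initialized (zero for int).
void reset_designated(Foo& foo) {
    foo = Foo{.bunch = 4, .things = 77};
}
#endif
```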

The first is an aggregate initializer - you can read up on those and tagged initializers at this solution:
What is tagged structure initialization syntax?
It is a special initialization syntax, and you can't do something similar after your struct has been initialized. What you can do is provide a member (or non-member) function that takes your series of values as parameters and assigns them to the members. That lets you accomplish this after the structure is initialized, in a way that is equally concise (after you've written the function the first time, of course!).
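A minimal sketch of that suggestion, assuming a hypothetical three-member Foo; the names set and set_foo are illustrative only:

```cpp
// Hypothetical struct with a helper that assigns all members in one call.
struct Foo {
    int a;
    int b;
    int c;

    // Member-function version: assigns every member, in declaration order.
    void set(int a_, int b_, int c_) {
        a = a_;
        b = b_;
        c = c_;
    }
};

// Non-member version taking a pointer, closer to what you would write in C.
void set_foo(Foo* foo, int a, int b, int c) {
    foo->a = a;
    foo->b = b;
    foo->c = c;
}
```

Usage is then a single line per update, e.g. f.set(1, 2, 3); or set_foo(&f, 1, 2, 3);.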

In C++11 you can perform multiple assignment with std::tie (declared in the <tuple> header):
struct foo {
int a, b, c;
} f;
std::tie(f.a, f.b, f.c) = std::make_tuple(1, 2, 3);
If your right-hand expression has a fixed size and you only need some of its elements, you can use the std::ignore placeholder with std::tie:
std::tie(std::ignore, f.b, std::ignore) = some_tuple; // only f.b modified
If you find the std::tie(f.a, f.b, f.c) syntax too cluttered, you could add a member function that returns that tuple of references:
struct foo {
int a, b, c;
auto members() -> decltype(std::tie(a, b, c)) {
return std::tie(a, b, c);
}
} f;
f.members() = std::make_tuple(1, 2, 3);
All this, of course, assumes that your struct is not constructible from such a sequence of values; if it is (or if you can overload the assignment operator), you could simply write
f = foo(1, 2, 3);
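If you are free to change the struct, that last option might look like this sketch, with hypothetical constructors added for illustration. Note that once a constructor is user-declared, foo is no longer an aggregate, so foo f = {1, 2, 3}; also goes through the constructor:

```cpp
// The answer's struct foo with hypothetical constructors added.
struct foo {
    int a, b, c;

    foo() : a(0), b(0), c(0) {}                          // keeps `foo f;` valid
    foo(int a_, int b_, int c_) : a(a_), b(b_), c(c_) {} // enables foo(1, 2, 3)
};

void example(foo& f) {
    f = foo(1, 2, 3);  // build a temporary, then move-assign it into f
}
```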

Memory Footprint - Here is an interesting i386 addition.
After much hassle, I found that using optimization and memcpy generates the smallest footprint on i386 with GCC and C99. I am using -O3 here. GCC treats memcpy as a builtin and can expand a small fixed-size copy inline, and this example makes use of that (the memcpy call is actually compiled out here).
Do this by:
#include <string.h> //memcpy
Foo foo; //some global variable
void setStructVal (void) {
const Foo FOO_ASSIGN_VAL = { //this goes into .rodata
.bunch = 1,
.of = 2,
.things = 3,
.initialized = 4
};
memcpy(&foo, &FOO_ASSIGN_VAL, sizeof(Foo)); //destination first, then source
return;
}
Result:
(.rodata) FOO_ASSIGN_VAL is stored in .rodata
(.text) a sequence of *movl FOO_ASSIGN_VAL, %registers* occur
(.text) a sequence of movl %registers, foo occur
Example:
Say Foo is a 48-field struct of uint8_t values, aligned in memory.
(IDEAL) On a 32-bit machine, this COULD be as quick as 12 MOVL instructions storing immediates to foo's address. For me this is 12*10 == 120 bytes of .text.
(ACTUAL) However, using the answer by AUTO will likely generate 48 MOVB instructions in .text. For me this is 48*7 == 336 bytes of .text!!
(SMALLEST*) Use the memcpy version above. If alignment is taken care of:
FOO_ASSIGN_VAL is placed in .rodata (48 bytes),
12 MOVL into registers,
12 MOVL out of registers are used in .text: (24*10) == 240 bytes.
For me, then, this is a total of 288 bytes.
So, for me at least with my i386 code,
- Ideal: 120 bytes
- Direct: 336 bytes
- Smallest: 288 bytes
*Smallest here means 'smallest footprint I know of'. It also executes faster than the direct method (24 instructions vs 48). Of course, the IDEAL version is fastest and smallest, but I still can't figure out how to get it.
-Justin
*Does anyone know how to get the 'IDEAL' implementation above? It is annoying the hell out of me!!
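A C++ variant of the same idea, as a sketch: a plain struct assignment from a static const template copies the same bytes as the memcpy call and is type-checked. Whether your compiler emits the same MOVL sequence for it is something to check with objdump rather than assume. Foo here is a hypothetical 48-byte struct, with an array standing in for the 48 named uint8_t fields:

```cpp
#include <cstdint>

// Hypothetical stand-in for the answer's 48-field struct of uint8_t values.
struct Foo {
    std::uint8_t field[48];
};

Foo foo;  // the global being (re)assigned

void setStructVal() {
    // static const: built once (typically placed in .rodata),
    // not reconstructed on every call.
    static const Foo FOO_ASSIGN_VAL = {{1, 2, 3, 4}};  // remaining bytes are zero
    foo = FOO_ASSIGN_VAL;  // ordinary struct copy, same effect as the memcpy
}
```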

If you don't care too much about efficiency, you could double assign: i.e. create a new instance of the structure using aggregate initialization, and then copy it over:
struct Foo foo;
{
struct Foo tmp = {bunch, of, things, initialized}; /* names starting with two underscores, like __tmp__, are reserved */
foo = tmp;
}
Make sure you keep the portion wrapped in {}s so as to discard the unnecessary temporary variable as soon as it's no longer necessary.
Note this isn't as efficient as making, e.g., a 'set' function in the struct (if c++) or out of the struct, accepting a struct pointer (if C). But if you need a quick, preferably temporary, alternative to writing element-by-element assignment, this might do.

If you care about efficiency, you can define a union of the same size as your structure, with a member type you can assign all at once.
To assign values by element, use the struct member of your union; to assign the whole data at once, use the other member.
#include <stdint.h>
typedef union
{
struct
{
char a;
char b;
} Foo;
uint16_t whole; // same size as the two-char struct
} MyUnion;
MyUnion u;
u.Foo.a = 0x23; // assign by element
u.Foo.b = 0x45; // assign by element
u.whole = 0x6789; // assign at once
Be careful about your memory layout: is "a" the MSB or the LSB of "whole"? That depends on the machine's endianness. Also note that reading u.Foo.a back after assigning u.whole is fine in C but undefined behavior in standard C++.
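To find out which way round it is on a given machine, you can inspect the object representation with memcpy, which is well-defined in both C and C++. A minimal sketch; lsb_first is a hypothetical name:

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the least significant byte of a multi-byte integer is
// stored first in memory (little-endian, e.g. x86). On such machines the
// union's first struct member `a` aliases the LSB of `whole`.
bool lsb_first() {
    const std::uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // copy the raw bytes
    return bytes[0] == 0x02;
}
```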

Related

Clean syntax for filling Bitfield structs from function parameters

I must send some data over a network where the packet fields are not byte-aligned. All packets are 8 bytes long, and an example packet type might look like this:
union Packed
{
struct
{
uint64_t a : 5;
uint64_t b : 10;
bool c : 1;
};
uint64_t raw;
};
So the first 5 bits are field a, the next 10 bits are field b, and the last bit is field c. Now I need a send function that can send this and possibly other packet types. The function should accept the fields as parameters. The low-level send function accepts a single uint64_t.
Edit: As pointed out in the comments, it is not safe to read from raw after writing to a, b or c. To be clear: this is also something I would like to change, but I included it at the top because the union is used in all of my attempts.
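For comparison, a sketch that builds raw with explicit shifts and masks instead of reading the union, which sidesteps the read-after-write problem mentioned in the edit. It assumes the layout GCC and Clang use for this union on x86-64 Linux (a in bits 0 to 4, b in bits 5 to 14, c in bit 15); pack is a hypothetical name. On its own it does not produce the compile-time overflow warning required below, since it simply masks off excess bits:

```cpp
#include <cstdint>

// Packs the fields into the 64-bit word with explicit shifts.
// Assumed layout: a = bits 0-4, b = bits 5-14, c = bit 15.
constexpr std::uint64_t pack(std::uint64_t a, std::uint64_t b, bool c) {
    return (a & 0x1Fu)                              // keep 5 bits of a
         | ((b & 0x3FFu) << 5)                      // keep 10 bits of b
         | (static_cast<std::uint64_t>(c) << 15);   // 1 bit for c
}

static_assert(pack(31, 1023, true) == 0xFFFF, "all 16 field bits set");
```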
My requirements are:
An overflow should be detected at compile time
The syntax should not be too bulky (subjective, but I will show what I want)
My first attempt
void send_struct(const Packed& packed)
{
raw_send(packed.raw);
}
int main()
{
send_struct({1000, 2, true}); // conversion changes value
Packed packed = {1000, 2, true}; // conversion changes value
send_struct(packed);
}
The warnings are generated, which satisfies my first requirement, but I don't like the syntax: the curly braces look superfluous, and manually creating a struct first is cumbersome.
With some warnings enabled I even have to use two layers of curly braces, because the struct is nested inside the union.
Second attempt
template <typename ...Args>
void send_var(Args... args)
{
Packed packed {args...};
raw_send(packed.raw);
}
int main()
{
send_var(1000u, 2u, true);
}
Here, I like the syntax, but no warnings are generated, presumably because the bit width is lost somewhere.
Third attempt
struct A
{
uint64_t data : 5;
};
struct B
{
uint64_t data : 10;
};
void send_separate(A a, B b, bool c)
{
Packed packed {a.data, b.data, c};
raw_send(packed.raw);
}
int main()
{
send_separate({1000u}, {2u}, true); // conversion changes value
send_separate(1000u, 2u, true); // compile error
}
The first usage is ugly again (too many curly braces), and the second one does not compile, because the structs cannot be implicitly constructed from a single value.
Question
How can I implement a function and a safe packet definition such that the following call compiles and shows a warning, because the value 1000 does not fit into 5 bits?
send(1000u, 2u, true);
I actually only care about the call site. The function and union definitions may be more complicated.
Edit 2
Using variables for the function parameters should work, too.
uint64_t a, b;
send(a, b, true); // compiles, but may generate a warning
send(a & 0x1f, b & 0x3ff, true); // compiles preferably without a warning
The software will be used on Linux only and is compiled with gcc or clang using at least these warnings: -Wall -Wextra -pedantic, plus the flag that allows anonymous structs and unions.

Suppose I declare an int but don't initialize it; what value is it? Can someone clear this up for me?

Can someone help me get a better understanding of creating variables in C++? I'll state my understanding and then you can correct me.
int x;
Not sure what that does besides declaring that x is an integer on the stack.
int x = 5;
Creates a new variable x on the stack and sets it equal to 5. So empty space was found on the stack and then used to house that variable.
int* px = new int;
Creates an anonymous variable on the heap. px is the memory address of the variable. Its value is 0 because, well, the bits are all off at that memory address.
int* px = new int;
*px = 5;
Same thing as before, except that the value of the integer at memory address px is set to 5. (Does this happen in 1 step? Or does the program create an integer with value 0 on the heap and then set it to 5?)
I know that everything I wrote above probably sounds naive, but I really am trying to understand this stuff.
Others have answered this question from the point of view of how the C++ standard works. My only additional comment there would be with global or static variables. So if you have
int bar ()
{
static int x;
return x;
}
then x doesn't live on the stack. It will be initialised to zero at the "start of time" (this is done in a function called crt0, at least with GCC: look up "BSS" segments for more information) and bar will return zero.
I'd massively recommend looking at the assembled code to see how a compiler actually treats what you write. For example, consider this tiny snippet:
int foo (int a)
{
int x, y;
x = 3;
y = a;
return x + y;
}
I made sure to use the values of x and y (by returning their sum) to ensure the compiler didn't just elide them completely. If you stick that code in a file called tmp.cc and then compile it with
$ g++ -O2 -c -o tmp.o tmp.cc
then ask for the disassembled code with objdump, you get:
$ objdump -d tmp.o
tmp.o: file format elf32-i386
Disassembly of section .text:
00000000 <_Z3fooi>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 83 c0 03 add $0x3,%eax
7: c3 ret
Whoah! What happened to x and y? Well, the point is that the C and C++ standards merely require the compiler to generate code that has the same behaviour as what your program asks for. In fact, this program loads 32 bits from the stack (this is the contents of a, a fact dictated by the ABI on my particular platform) and sticks it in the eax register. Then it adds three and returns. Another important fact about the ABI on my laptop (and probably yours too) is that the return value of a function sits in eax. Notice, the function didn't allocate any memory on the stack at all!
In fact, I also put bar (from above) in my tmp.cc. Here's the resulting code:
00000010 <_Z3barv>:
10: 31 c0 xor %eax,%eax
12: c3 ret
"Huh, what happened to x?", I hear you say :-) Well, the compiler spotted that nothing in the code required x to actually exist, and it always had the value zero. So the function basically got transformed into
int bar ()
{
return 0;
}
Magic!
When a new variable is created, it does not have a defined value. It can be anything, pretty much depending on what was in that piece of stack or heap before. With int x;, compilers will typically warn you (e.g. GCC and Clang with -Wall) if you try to use the value without setting it to something first. E.g. int y = x; will trigger a warning unless you give x an explicit value first.
Creating an int on the heap works pretty much the same way: int *p = new int; default-initializes the int, which for a built-in type does nothing, leaving the value of *p up to chance until you set it to something explicit. If you want to make sure your heap value is initialized, use int *p = new int(5); to specify the value, or new int() to get zero.
Unless you initialize an int variable to zero explicitly, it is pretty much never initialized for you unless it is a global, namespace-scope, or static (class or function-local) variable.
In VS2010 specifically (other compilers may treat it differently), an int is not given a default value of 0. You can see this by trying to print an uninitialized int. Memory the size of an int is allocated, but it is not initialized (just junk).
In both of your cases, the memory is allocated FIRST, and then the value is set. If a value is not set, you have an uninitialized piece of memory with "junk data" inside of it; you may get a compiler warning, and reading it is undefined behavior, so the program can misbehave when it runs.
Yes, it has an address in memory, but there is no valid (known) data in it unless you specifically set it. It could very well hold anything left over in memory that the compiler treats as available to be overwritten. Since it is unknown and not reliable, it is considered junk, which is why compilers warn you about it.
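A small sketch contrasting the initialization forms discussed here; the names are illustrative. The key detail is the empty parentheses: new int leaves the value indeterminate, while new int() value-initializes it to zero.

```cpp
#include <memory>

int global_counter;   // static storage duration: zero-initialized before main

int read_static() {
    static int x;     // also zero-initialized; typically lives in .bss, not on the stack
    return x;
}

// new int    -> default-initialized: indeterminate value, do not read it
// new int()  -> value-initialized: guaranteed 0
// new int(5) -> direct-initialized: 5
std::unique_ptr<int> make_zero() { return std::unique_ptr<int>(new int()); }
std::unique_ptr<int> make_five() { return std::unique_ptr<int>(new int(5)); }
```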
Compilers WILL set static int and global int to 0.
EDIT: Due to Peter Schneider's comment.

Why does gcc/clang use two 128bit xmm registers to pass a single value?

So I stumbled upon something which I'd like to understand, as it's causing me headaches. I have the following code:
#include <stdio.h>
#include <smmintrin.h>
typedef union {
struct { float x, y, z, w; } v;
__m128 m;
} vec;
vec __attribute__((noinline)) square(vec a)
{
vec x = { .m = _mm_mul_ps(a.m, a.m) };
return x;
}
int main(int argc, char *argv[])
{
float f = 4.9;
vec a = (vec){f, f, f, f};
vec res = square(a); // ?
printf("%f %f %f %f\n", res.v.x, res.v.y, res.v.z, res.v.w);
return 0;
}
Now, in my mind, the call to square in main should put the value of a in xmm0 so that the square function can do mulps xmm0, xmm0 and be done with it.
This is not what happens when I compile with clang or gcc. Instead, the first 8 bytes of a are put in xmm0 and the next 8 bytes in xmm1, making the square function a lot more complicated as it needs to patch things back up.
Any idea why?
NOTE: This is with -O3 optimization.
After further research, it seems like it has to do with the union type. If the function takes a plain __m128, the generated code expects the value in a single register (xmm0). But given that both should fit in xmm0, I don't see why the value is split across two half-used registers when the vec type is used.
The compiler is just trying to follow the calling convention as specified by the System V Application Binary Interface AMD64 Architecture Processor Supplement, section 3.2.3 Parameter Passing.
The relevant points are:
We first define a number of classes to classify arguments. The classes are corresponding to AMD64 register classes and defined as:
SSE The class consists of types that fit into a vector register.
SSEUP The class consists of types that fit into a vector register and can be passed and returned in the upper bytes of it.
The size of each argument gets rounded up to eightbytes.
The basic types are assigned their natural classes:
Arguments of types float, double, _Decimal32, _Decimal64 and __m64 are in class SSE.
The classification of aggregate (structures and arrays) and union types works as follows:
If the size of the aggregate exceeds a single eightbyte, each is classified separately.
Applying the above rules means that the x, y and z, w pairs of the embedded struct get separately classified as SSE class, which in turn means they must be passed in two separate registers. The presence of the m member in this case doesn't have any effect, you can even delete it.
EDIT: on a second read through, I'm less certain why this is happening, but I'm more certain that this is where it is happening. I don't think this answer is right, but I'll leave it up as it may be helpful.
Speaking only for clang:
It seems like this is an issue that is just an unfortunate side effect of a compiler heuristic.
From a brief look at clang (file CGRecordLayoutBuilder.cpp, function CGRecordLowering::lowerUnion) it looks like llvm doesn't internally represent union types as such, and the types of a function don't get changed depending on the uses within the function.
clang looks at your function and sees that it needs 16 bytes worth of arguments for the type signature, then uses a heuristic to pick which type it thinks is best. It favors a { double, double } interpretation over a <4 x float> (which would give it the most efficiency in your case) because doubles are more lenient with respect to alignment.
I'm no expert on clang internals, so I could be very wrong, but it doesn't look like there's a particularly nice way around this one. If you want the optimized version you may have to use pointer casting instead of unions to get it.
The code I suspect is causing the problem:
void CGRecordLowering::lowerUnion() {
...
// Conditionally update our storage type if we've got a new "better" one.
if (!StorageType ||
getAlignment(FieldType) > getAlignment(StorageType) ||
(getAlignment(FieldType) == getAlignment(StorageType) &&
getSize(FieldType) > getSize(StorageType)))
StorageType = FieldType;
...
}
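The asker observes that a plain __m128 parameter is passed in a single register, and the same ABI document classifies __m128 as SSE (low half) followed by SSEUP (high half), which is what allows that. A sketch of that workaround, keeping the union for member access elsewhere and converting at the call boundary (e.g. square_m128(a.m)). It assumes an x86-64 System V target with SSE; square_m128 and square_first_lane are hypothetical names:

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Takes and returns the raw vector type, so the argument travels in one xmm register.
__attribute__((noinline)) __m128 square_m128(__m128 a) {
    return _mm_mul_ps(a, a);
}

// Convenience wrapper for testing: squares four copies of f, returns lane 0.
float square_first_lane(float f) {
    float out[4];
    _mm_storeu_ps(out, square_m128(_mm_set1_ps(f)));
    return out[0];
}
```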

What's the major difference between "union" and "struct" in C? [duplicate]

Possible Duplicate:
Difference between a Structure and a Union in C
I understand what a struct is, but I am a bit confused about the difference between a union and a struct. A union is like shared memory; what exactly does that mean?
With a union, all members share the same memory. With a struct, they do not share memory, so a different space in memory is allocated to each member of the struct.
For example:
union foo
{
int x;
int y;
};
union foo f;
f.x = 10;
printf("%d\n", f.y);
Here, we assign the value of 10 to f.x. Then we output the value of f.y, which is also 10 since x and y share the same memory. Note that since all members of a union share the same memory, the compiler must allocate enough memory to fit the largest member of the union. So a union containing a char and a long would need enough space to fit the long.
But if we use a struct:
struct foo
{
int x;
int y;
};
struct foo f;
f.x = 10;
f.y = 20;
printf("%d %d\n", f.x, f.y);
We assign 10 to x and 20 to y, and then print them both out. We see that x is 10 and y is 20, because x and y do not share the same memory.
EDIT: Also take note of Gman's comment above. The union example I provided is for demonstration purposes only. In practice, you shouldn't write to one data member of a union and then read another. Usually this simply makes the compiler reinterpret the bit pattern as another type, but you may get unexpected results: in C++ it is undefined behavior, and in C the result depends on how the types are represented.
I've used unions to convert bytes to and from other types. I find it easier than bit-shifting.
union intConverter {
int intValue;
struct {
unsigned char hi; // which byte is really "hi" depends on the target's endianness
unsigned char lo;
} byteValue;
};
union intConverter cv;
cv.intValue = 1100;
printf("%X %X\n", cv.byteValue.hi, cv.byteValue.lo);
Here int is 16 bits wide (this was used on a microcontroller).
Each member of a union shares the same memory. That means if you change one, you change the others. And if the members are of different types, this can have unpredictable results. (not exactly unpredictable, but hard to predict unless you are aware of the underlying bit patterns that make up the data members).
It may be more useful to have an uncontrived example of what this is good for. (I say "uncontrived" because most bit-banging uses of union are extremely treacherous. Bit-banging unions taken from big-endian to little-endian hardware break in the most (initially) mystifying ways.) (Of course, I've written bit-banging unions to tear apart floating point numbers to implement orders-of-magnitude-faster-than-the-library math functions. I just add assertions about which members are supposed to have the same addresses.)
struct option1 { int type; /* other members */ };
struct option2 { int type; /* other members */ };
struct option3 { int type; /* other members */ };
union combo {
int type; // overlaps the leading int type member of each struct
struct option1 o1;
struct option2 o2;
struct option3 o3;
};
// ...
void foo(union combo *in) {
switch (in->type) {
case 1: { struct option1 *bar = &in->o1; /* then process an option1 type of request */ break; }
case 2: { struct option2 *bar = &in->o2; /* then process an option2 type of request */ break; }
case 3: { struct option3 *bar = &in->o3; /* then process an option3 type of request */ break; }
}
}
This kind of construction is very common in X Window System programming (Xlib's XEvent union works this way) and in other situations where one wishes to write a function that can receive many different types of messages (with different argument and layout requirements).
I suppose one way you can think of a union is that it is a set of aliases of varying type to a block of memory, where each member of the union is an "alias" with a given type. Each alias refers to the same address in memory. How the bits at that address are interpreted is determined by the alias's type.
The amount of memory the union occupies is always equal to or possibly larger than the largest sized "member" of the union (due to alignment restrictions).
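To see the "possibly larger" case, here is an illustrative union (the name Padded is hypothetical) whose largest member is 5 bytes but whose int member imposes alignment. On common platforms with a 4-byte-aligned int its size comes out as 8; the static_asserts only check what the language guarantees:

```cpp
// Largest member is 5 bytes, but the int member requires alignment,
// so sizeof is rounded up to a multiple of the union's alignment.
union Padded {
    char text[5];
    int number;
};

static_assert(sizeof(Padded) >= sizeof(char[5]), "fits the largest member");
static_assert(sizeof(Padded) % alignof(Padded) == 0, "size is a multiple of alignment");
```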
Run this program and find out the output.
#include <stdio.h>
int main()
{
union _testUnion
{
long long x;
long long y;
} testUnion;
struct _testStruct
{
long long x;
long long y;
}testStruct;
printf("Sizeof Union %zu\n", sizeof(testUnion));
printf("Sizeof Struct %zu\n", sizeof(testStruct));
return 0;
}
You will find that the size of the struct is twice that of the union. This is because the union has allocated space for only one variable, while the struct has allocated space for two.
Most answers here are correct. A union is essentially a way to access the same data in different ways (for example, you can access/interpret 4 bytes of memory as 1 integer or as 4 characters). Structs, as you know, are straightforward: a collection of different, separate objects, each with its own memory.
Usually you need unions at a much later stage in programming than structs.

When would anyone use a union? Is it a remnant from the C-only days?

I have learned but don't really get unions. Every C or C++ text I go through introduces them (sometimes in passing), but they tend to give very few practical examples of why or where to use them. When would unions be useful in a modern (or even legacy) case? My only two guesses would be programming microprocessors when you have very limited space to work with, or when you're developing an API (or something similar) and you want to force the end user to have only one instance of several objects/types at one time. Are these two guesses even close to right?
Unions are usually used in the company of a discriminator: a variable indicating which of the fields of the union is valid. For example, let's say you want to create your own Variant type:
struct my_variant_t {
int type;
union {
char char_value;
short short_value;
int int_value;
long long_value;
float float_value;
double double_value;
void* ptr_value;
};
};
Then you would use it such as:
/* construct a new float variant instance */
void init_float(struct my_variant_t* v, float initial_value) {
v->type = VAR_FLOAT;
v->float_value = initial_value;
}
/* Increments the value of the variant by the given int */
void inc_variant_by_int(struct my_variant_t* v, int n) {
switch (v->type) {
case VAR_FLOAT:
v->float_value += n;
break;
case VAR_INT:
v->int_value += n;
break;
...
}
}
This is actually a pretty common idiom, especially in Visual Basic internals.
For a real example, see SDL's SDL_Event union. There is a type field at the top of the union, and the same field is repeated in every SDL_*Event struct. To handle an event correctly, you first check the value of the type field.
The benefits are simple: there is one single data type to handle all event types without using unnecessary memory.
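For reference, a compilable version of the variant sketch above, with a hypothetical enum supplying the VAR_* tags the answer assumes and the union trimmed to two members:

```cpp
// Hypothetical tags; the original sketch uses VAR_FLOAT and VAR_INT without defining them.
enum variant_type { VAR_INT, VAR_FLOAT };

struct my_variant_t {
    variant_type type;   // discriminator: which union member is active
    union {
        int int_value;
        float float_value;
    };                   // anonymous union (standard in C++, C11 in C)
};

void init_float(my_variant_t* v, float initial_value) {
    v->type = VAR_FLOAT;
    v->float_value = initial_value;
}

void init_int(my_variant_t* v, int initial_value) {
    v->type = VAR_INT;
    v->int_value = initial_value;
}

// Only touches the member named by the tag.
void inc_variant_by_int(my_variant_t* v, int n) {
    switch (v->type) {
    case VAR_FLOAT:
        v->float_value += n;
        break;
    case VAR_INT:
        v->int_value += n;
        break;
    }
}
```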
I find C++ unions pretty cool. It seems that people usually only think of the use case where one wants to change the value of a union instance "in place" (which, it seems, serves only to save memory or perform doubtful conversions).
In fact, unions can be of great power as a software engineering tool, even when you never change the value of any union instance.
Use case 1: the chameleon
With unions, you can regroup a number of arbitrary classes under one denomination, which isn't without similarities with the case of a base class and its derived classes. What changes, however, is what you can and can't do with a given union instance:
struct Batman { /* ... */ };
struct BaseballBat { /* ... */ };
union Bat
{
Batman brucewayne;
BaseballBat club;
};
ReturnType1 f(void)
{
BaseballBat bb = {/* */};
Bat b;
b.club = bb;
// do something with b.club
}
ReturnType2 g(Bat& b)
{
// do something with b, but how do we know what's inside?
}
Bat returnsBat(void);
ReturnType3 h(void)
{
Bat b = returnsBat();
// do something with b, but how do we know what's inside?
}
It appears that the programmer has to be certain of the type of the content of a given union instance when he wants to use it. It is the case in function f above. However, if a function were to receive a union instance as a passed argument, as is the case with g above, then it wouldn't know what to do with it. The same applies to functions returning a union instance, see h: how does the caller know what's inside?
If a union instance never gets passed as an argument or as a return value, then it's bound to have a very monotonous life, with spikes of excitement when the programmer chooses to change its content:
Batman bm = {/* */};
BaseballBat bb = {/* */};
Bat b;
b.brucewayne = bm;
// stuff
b.club = bb;
And that's the most (un)popular use case of unions. Another use case is when a union instance comes along with something that tells you its type.
Use case 2: "Nice to meet you, I'm object, from Class"
Suppose a programmer elected to always pair up a union instance with a type descriptor (I'll leave it to the reader's discretion to imagine an implementation for one such object). This defeats the purpose of the union itself if what the programmer wants is to save memory and the size of the type descriptor is not negligible with respect to that of the union. But let's suppose it's crucial that the union instance can be passed as an argument or as a return value with the callee or caller not knowing what's inside.
Then the programmer has to write a switch control flow statement to tell Bruce Wayne apart from a wooden stick, or something equivalent. It's not too bad when there are only two types of contents in the union but obviously, the union doesn't scale anymore.
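In C++17 and later, std::variant packages exactly this pairing of a union with a type descriptor, and std::holds_alternative / std::get do the "which one is it?" check (std::visit can replace the switch). A sketch with stand-in definitions for Batman and BaseballBat; their members are invented for illustration:

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the answer's types.
struct Batman { std::string name; };
struct BaseballBat { int length_cm; };

// The variant stores the active type's index alongside the storage.
using Bat = std::variant<Batman, BaseballBat>;

// A callee can now ask what it was handed instead of guessing.
std::string describe(const Bat& b) {
    if (std::holds_alternative<Batman>(b))
        return "Batman: " + std::get<Batman>(b).name;
    return "bat, " + std::to_string(std::get<BaseballBat>(b).length_cm) + " cm";
}
```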
Use case 3:
As the authors of a recommendation for the ISO C++ Standard put it back in 2008,
Many important problem domains require either large numbers of objects or limited memory resources. In these situations conserving space is very important, and a union is often a perfect way to do that. In fact, a common use case is the situation where a union never changes its active member during its lifetime. It can be constructed, copied, and destructed as if it were a struct containing only one member. A typical application of this would be to create a heterogeneous collection of unrelated types which are not dynamically allocated (perhaps they are in-place constructed in a map, or members of an array).
And now, an example:
The situation in plain English: an object of class A can have objects of any class among B1, ..., Bn, and at most one of each type, with n being a pretty big number, say at least 10.
We don't want to add fields (data members) to A like so:
private:
B1 b1;
// ...
Bn bn;
because n might vary (we might want to add Bx classes to the mix), and because this would cause a mess with constructors and because A objects would take up a lot of space.
We could use a wacky container of void* pointers to Bx objects with casts to retrieve them, but that's fugly and so C-style... but more importantly that would leave us with the lifetimes of many dynamically allocated objects to manage.
Instead, what can be done is this:
union Bee
{
B1 b1;
// ...
Bn bn;
};
enum BeesTypes { TYPE_B1, ..., TYPE_BN };
class A
{
private:
std::unordered_map<int, Bee> data; // C++11, otherwise use std::map
public:
Bee get(int); // the implementation is obvious: get from the unordered map
};
Then, to get the content of a union instance from data, you use a.get(TYPE_B2).b2 and the likes, where a is a class A instance.
This is all the more powerful since unions are unrestricted in C++11 (they may contain members with non-trivial constructors and destructors). See the proposal quoted above for details.
One example is in the embedded realm, where each bit of a register may mean something different. For example, a union of an 8-bit integer and a structure with 8 separate 1-bit bitfields allows you to either change one bit or the entire byte.
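A sketch of such a register union; the register name and bit assignments are hypothetical. It relies on two things ISO C++ does not guarantee: bit-fields being allocated starting from the least significant bit (the case for GCC and Clang on common little-endian targets), and reading a union member other than the one last written, which GCC documents as allowed. In C the second point is fine.

```cpp
#include <cstdint>

// Hypothetical 8-bit status register with a whole-byte view and a per-bit view.
union StatusReg {
    std::uint8_t raw;              // change the entire byte
    struct {
        std::uint8_t ready  : 1;   // bit 0 (assuming LSB-first allocation)
        std::uint8_t error  : 1;   // bit 1
        std::uint8_t unused : 6;   // bits 2..7
    } bits;                        // change one bit
};
```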
Herb Sutter wrote in GOTW about six years ago, with emphasis added:
"But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see... ."
And for a less useful example, see the long but inconclusive question gcc, strict-aliasing, and casting through a union.
Well, one example use case I can think of is this:
typedef union
{
struct
{
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
};
uint32_t x;
} some32bittype;
You can then access the 8-bit separate parts of that 32-bit block of data; however, prepare to potentially be bitten by endianness.
This is just one hypothetical example, but whenever you want to split data in a field into component parts like this, you could use a union.
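That said, there is also a method which is endian-safe:
uint32_t x = 0x11223344; // example value
uint8_t a = (x & 0xFF000000) >> 24; // most significant byte: 0x11 on any machine
This works regardless of endianness, because shifts and masks operate on the numeric value rather than on the order of the bytes in memory.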
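The shift-and-mask approach can be wrapped up as a pair of helpers; a sketch with hypothetical names byte_of and from_bytes:

```cpp
#include <cstdint>

// Byte n of x (0 = least significant, valid for n < 4). Shifts act on the
// value, so the result is the same on big- and little-endian machines.
constexpr std::uint8_t byte_of(std::uint32_t x, unsigned n) {
    return static_cast<std::uint8_t>((x >> (8 * n)) & 0xFFu);
}

// Reassemble a 32-bit value from its bytes, most significant first.
constexpr std::uint32_t from_bytes(std::uint8_t b3, std::uint8_t b2,
                                   std::uint8_t b1, std::uint8_t b0) {
    return (std::uint32_t{b3} << 24) | (std::uint32_t{b2} << 16) |
           (std::uint32_t{b1} << 8) | std::uint32_t{b0};
}
```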
Some uses for unions:
Provide a general endianness interface to an unknown external host.
Manipulate foreign CPU architecture floating point data, such as accepting VAX G_FLOATS from a network link and converting them to IEEE 754 long reals for processing.
Provide straightforward bit twiddling access to a higher-level type.
union {
unsigned char byte_v[16];
long double ld_v;
} u;
With this declaration, it is simple to display the hex byte values of a long double, change the exponent's sign, determine if it is a denormal value, or implement long double arithmetic for a CPU which does not support it, etc.
Saving storage space when fields are dependent on certain values:
class person {
std::string name;
char gender; // M = male, F = female, O = other
union {
date vasectomized; // for males
int pregnancies; // for females
} gender_specific_data;
};
Grep the include files for use with your compiler. You'll find dozens to hundreds of uses of union:
[wally@zenetfedora ~]$ cd /usr/include
[wally@zenetfedora include]$ grep -w union *
a.out.h: union
argp.h: parsing options, getopt is called with the union of all the argp
bfd.h: union
bfd.h: union
bfd.h:union internal_auxent;
bfd.h: (bfd *, struct bfd_symbol *, int, union internal_auxent *);
bfd.h: union {
bfd.h: /* The value of the symbol. This really should be a union of a
bfd.h: union
bfd.h: union
bfdlink.h: /* A union of information depending upon the type. */
bfdlink.h: union
bfdlink.h: this field. This field is present in all of the union element
bfdlink.h: the union; this structure is a major space user in the
bfdlink.h: union
bfdlink.h: union
curses.h: union
db_cxx.h:// 4201: nameless struct/union
elf.h: union
elf.h: union
elf.h: union
elf.h: union
elf.h:typedef union
_G_config.h:typedef union
gcrypt.h: union
gcrypt.h: union
gcrypt.h: union
gmp-i386.h: union {
ieee754.h:union ieee754_float
ieee754.h:union ieee754_double
ieee754.h:union ieee854_long_double
ifaddrs.h: union
jpeglib.h: union {
ldap.h: union mod_vals_u {
ncurses.h: union
newt.h: union {
obstack.h: union
pi-file.h: union {
resolv.h: union {
signal.h:extern int sigqueue (__pid_t __pid, int __sig, __const union sigval __val)
stdlib.h:/* Lots of hair to allow traditional BSD use of `union wait'
stdlib.h: (__extension__ (((union { __typeof(status) __in; int __i; }) \
stdlib.h:/* This is the type of the argument to `wait'. The funky union
stdlib.h: causes redeclarations with either `int *' or `union wait *' to be
stdlib.h:typedef union
stdlib.h: union wait *__uptr;
stdlib.h: } __WAIT_STATUS __attribute__ ((__transparent_union__));
thread_db.h: union
thread_db.h: union
tiffio.h: union {
wchar.h: union
xf86drm.h:typedef union _drmVBlank {
Unions are useful when dealing with byte-level (low level) data.
One of my recent uses was modeling IP addresses, which looks like this:
// Composite structure for IP address storage
union
{
// IPv4 # 32-bit identifier
// Padded 12-bytes for IPv6 compatibility
union
{
struct
{
unsigned char _reserved[12];
unsigned char _IpBytes[4];
} _Raw;
struct
{
unsigned char _reserved[12];
unsigned char _o1;
unsigned char _o2;
unsigned char _o3;
unsigned char _o4;
} _Octet;
} _IPv4;
// IPv6 # 128-bit identifier
// Next generation internet addressing
union
{
struct
{
unsigned char _IpBytes[16];
} _Raw;
struct
{
unsigned short _w1;
unsigned short _w2;
unsigned short _w3;
unsigned short _w4;
unsigned short _w5;
unsigned short _w6;
unsigned short _w7;
unsigned short _w8;
} _Word;
} _IPv6;
} _IP;
Unions provide a crude form of polymorphism in C: paired with a type tag, one variable can hold any of several types.
An example when I've used a union:
class Vector
{
union
{
double _coord[3];
struct // anonymous struct: a common compiler extension in C++
{
double _x;
double _y;
double _z;
};
};
// ...
};
This lets me access the data either as an array or as named elements.
I've used unions to give different names to the same value. In image processing, it can get confusing whether I'm working with columns, width, or the size in the X direction. To alleviate this problem, I use a union so I know which names go together.
union { // left-to-right dimension
uint32_t m_width;
uint32_t m_sizeX;
uint32_t m_columns;
};
union { // top-to-bottom dimension
uint32_t m_height;
uint32_t m_sizeY;
uint32_t m_rows;
};
The union keyword, while still used in C++03[1], is mostly a remnant of the C days. The most glaring issue is that it only works with POD types[1].
The idea of the union, however, is still present, and indeed the Boost libraries feature a union-like class:
boost::variant<std::string, Foo, Bar>
Which has most of the benefits of the union (if not all) and adds:
ability to correctly use non-POD types
static type safety
In practice, it has been shown to be equivalent to a union combined with an enum, and benchmarked as being just as fast (whereas boost::any belongs more to the realm of dynamic_cast, since it uses RTTI).
[1] Unions were upgraded in C++11 (unrestricted unions) and can now contain objects with non-trivial constructors and destructors, although the user has to invoke the destructor manually (on the currently active union member). It's still much easier to use variants.
A brilliant use of unions is memory alignment, which I found in the PCL (Point Cloud Library) source code. A single data structure in the API can target two architectures: CPUs with SSE support as well as CPUs without it. For example, the data structure for PointXYZ is:
typedef union
{
float data[4];
struct
{
float x;
float y;
float z;
};
} PointXYZ;
The 3 floats are padded with an additional float for SSE alignment.
So for
PointXYZ point;
the user can access, say, the x coordinate as either point.data[0] or point.x (depending on SSE support).
More details on similar usage are in the PCL documentation on PointT types.
From the Wikipedia article on unions:
The primary usefulness of a union is to conserve space, since it provides a way of letting many different types be stored in the same space. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.
One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values.
In the earliest days of C (e.g. as documented in 1974), all structures shared a common namespace for their members. Each member name was associated with a type and an offset; if "wd_woozle" was an "int" at offset 12, then given a pointer p of any structure type, p->wd_woozle would be equivalent to *(int*)(((char*)p)+12). The language required that all members of all structure types have unique names, except that it explicitly allowed reuse of member names in cases where every struct in which they were used treated them as part of a common initial sequence.
The fact that structure types could be used promiscuously made it possible to have structures behave as though they contained overlapping fields. For example, given definitions:
struct float1 { float f0;};
struct byte4 { char b0,b1,b2,b3; }; /* Unsigned didn't exist yet */
code could declare a structure of type "float1" and then use "members" b0...b3 to access the individual bytes therein. When the language was changed so that each structure would receive a separate namespace for its members, code which relied upon the ability to access things multiple ways would break. The value of separating out namespaces for different structure types was sufficient to require that such code be changed to accommodate it, but the value of such techniques was sufficient to justify extending the language to continue supporting it.
Code which had been written to exploit the ability to access the storage within a struct float1 as though it were a struct byte4 could be made to work in the new language by adding a declaration: union f1b4 { struct float1 ff; struct byte4 bb; };, declaring objects as type union f1b4; rather than struct float1, and replacing accesses to f0, b0, b1, etc. with ff.f0, bb.b0, bb.b1, etc. While there are better ways such code could have been supported, the union approach was at least somewhat workable, at least with C89-era interpretations of the aliasing rules.
Let's say you have n different types of configurations (each just a set of variables defining parameters). Using an enumeration of the configuration types, you can define a structure that holds the ID of the configuration type along with a union of all the different configuration types.
This way, whatever code receives the configuration can use the ID to determine how to interpret the configuration data, and if the configurations are large, you are not forced to waste space on parallel structures for every possible type.
The strict aliasing rule in the C standard has given the already important unions an additional use:
you can use unions to do type punning without violating the C standard.
This program has unspecified behavior (because I have assumed that float and unsigned int have the same size) but not undefined behavior (see here).
#include <stdio.h>
union float_uint
{
float f;
unsigned int ui;
};
int main()
{
float v = 241;
union float_uint fui = {.f = v};
//May trigger UNSPECIFIED BEHAVIOR but not UNDEFINED BEHAVIOR
printf("Your IEEE 754 float sir: %08x\n", fui.ui);
//This is UNDEFINED BEHAVIOR as it violates the Strict Aliasing Rule
unsigned int* pp = (unsigned int*) &v;
printf("Your IEEE 754 float, again, sir: %08x\n", *pp);
return 0;
}
I would like to add one good practical example of using a union: implementing a formula calculator/interpreter, or using something like one inside a computation (for example, when you want parts of your formulas to be modifiable at run time, as when solving equations numerically).
So you may want to define numbers/constants of different types (integer, floating-point, even complex numbers) like this:
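struct Number{
enum NumType{INT32, FLOAT, DOUBLE, COMPLEX}; NumType num_t; // float/double are keywords, so they can't be enumerators
union{int ival; float fval; double dval; ComplexNumber cmplx_val;}; // ComplexNumber: must be a POD type before C++11
};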
So you save memory and, more importantly, you avoid dynamic allocation of what may be a huge number of small objects (if you use a lot of run-time defined numbers), compared with implementations based on class inheritance/polymorphism. More interestingly, you can still use the power of C++ polymorphism with this type of struct (if you're a fan of double dispatch, for example ;). Just add a "dummy" interface pointer to the parent class of all number types as a field of this struct, pointing to this instance instead of, or in addition to, the raw type, or use good old C function pointers.
struct NumberBase
{
virtual ~NumberBase() {}
virtual NumberBase* Add(const NumberBase& n);
// ...
};
struct Number: NumberBase
{
union{int ival; float fval; double dval; ComplexNumber cmplx_val;};
NumberBase* num_t; // this object, viewed as the type matching the active member
void Set(int a);
// ...
};
struct NumberInt: Number
{
//implement methods assuming Number's union contains int
NumberBase* Add(const NumberBase& n);
// ...
};
struct NumberDouble: Number
{
//implement methods assuming Number's union contains double
NumberBase* Add(const NumberBase& n);
// ...
};
//etc. for all number types, or use templates
void Number::Set(int a)
{
ival=a;
//still kind of a hack (formally undefined behavior); relies on derived classes of Number not adding any fields
num_t = static_cast<NumberInt*>(this);
}
So you can use polymorphism instead of type checks with switch(type), with a memory-efficient implementation (no dynamic allocation of small objects), if you need it, of course.
From http://cplus.about.com/od/learningc/ss/lowlevel_9.htm:
The uses of union are few and far between. On most computers, the size of a pointer and an int are usually the same; this is because both usually fit into a register in the CPU. So if you want to do a quick and dirty cast of a pointer to an int or the other way, declare a union.
union intptr { int i; int * p; };
union intptr x; x.i = 1000;
/* puts 90 at location 1000 */
*(x.p)=90;
Another use of a union is in a command or message protocol where messages of different sizes are sent and received. Each message type will hold different information, but each will have a fixed part (probably a struct) and a variable part. This is how you might implement it:
struct head { int id; int response; int size; };
struct msgstring50 { struct head fixed; char message[50]; };
struct msgstring80 { struct head fixed; char message[80]; };
struct msgint10 { struct head fixed; int message[10]; };
struct msgack { struct head fixed; int ok; };
union messagetype {
struct msgstring50 m50;
struct msgstring80 m80;
struct msgint10 i10;
struct msgack ack;
};
In practice, although the unions are the same size, it makes sense to only send the meaningful data and not wasted space. A msgack is just 16 bytes in size while a msgstring80 is 92 bytes. So when a messagetype variable is initialized, it has its size field set according to which type it is. This can then be used by other functions to transfer the correct number of bytes.
Unions provide a way to manipulate different kinds of data in a single area of storage, without embedding any machine-dependent information in the program.
They are analogous to variant records in Pascal.
As an example such as might be found in a compiler symbol table manager, suppose that a constant may be an int, a float, or a character pointer. The value of a particular constant must be stored in a variable of the proper type, yet it is most convenient for table management if the value occupies the same amount of storage and is stored in the same place regardless of its type. This is the purpose of a union: a single variable that can legitimately hold any one of several types. The syntax is based on structures:
union u_tag {
int ival;
float fval;
char *sval;
} u;
The variable u will be large enough to hold the largest of the three types; the specific size is implementation-dependent. Any of these types may be assigned to u and then used in expressions, so long as the usage is consistent.