C++ Memory alignment - should we care? [duplicate]


This question already has answers here:
Does alignment really matter for performance in C++11?
(2 answers)
Closed 3 years ago.
Consider working on an x64 Windows operating system with the following type alignments:
As far as I understand, it is very bad to do something like this:
struct X_chaotic
{
bool flag1;
double d1;
bool flag2;
double d2;
bool flag3;
double d3;
//... and so on ...
};
According to C++ Alignment, Cache Line and Best Practice
and Data structure alignment,
it should be better/faster and much more compact to write this:
struct X_alignOrder
{
double d1;
double d2;
double d3;
//... all other doubles ...
bool flag1;
bool flag2;
bool flag3;
//... all other bools ...
};
The members are declared in the order of the alignment size, starting with the highest alignment.
Is it safe to say it is a good idea to order the declaration of the data members by alignment size? Would you say it is best practice? Or does it make no difference?
(I heard that the compiler can not rearrange the defined order, due to the C++ standard, and this even holds for all data members declared in access specifier blocks of a class)
Because I never read about this, neither in Scott Meyers' books nor in Bjarne Stroustrup's books, I wonder if I should start reordering the data declarations by alignment for my day-to-day work.

This is more complicated than it may seem.
By ordering your members according to alignment needs you'll save some padding bytes and the total size will be smaller. This may be important to you if memory is tight or if this means the type can fit in a single cache line rather than two or three.
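To see the size effect concretely, here is a minimal sketch that compares the two layouts from the question. The exact numbers depend on the platform; on common 64-bit ABIs, where double is 8 bytes with 8-byte alignment, the interleaved struct typically takes 48 bytes and the ordered one 32. The helper names `padding_saved` and `check_layouts` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Layout from the question: every bool is followed by a double, so the
// compiler typically pads after each bool to align the next double.
struct X_chaotic {
    bool flag1;
    double d1;
    bool flag2;
    double d2;
    bool flag3;
    double d3;
};

// Same members, sorted by descending alignment requirement.
struct X_alignOrder {
    double d1;
    double d2;
    double d3;
    bool flag1;
    bool flag2;
    bool flag3;
};

// Hypothetical helper: bytes saved per object by reordering.
constexpr std::size_t padding_saved() {
    return sizeof(X_chaotic) - sizeof(X_alignOrder);
}

// Self-check; call it from main() to verify the claim on your platform.
inline void check_layouts() {
    assert(sizeof(X_alignOrder) <= sizeof(X_chaotic));
    // An object's size is a multiple of its alignment, so arrays stay aligned.
    assert(sizeof(X_alignOrder) % alignof(X_alignOrder) == 0);
}
```

Printing sizeof for both structs on your own compiler is the quickest way to find out whether reordering buys you anything.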
On the other hand, members that you often access together and that used to sit close to each other would previously have been pulled into cache together by the CPU's prefetcher, but may no longer be after you reorganize the class. In that case you could be saving memory but sacrificing runtime performance.
Performance here may also vary greatly across different CPUs and different compilers/compiler options.
You'll need to run some benchmarks in your actual environment to see what performs the best for you.
Also keep in mind that reshuffling your member variables changes the order of initialization, which can be important if members depend on each other (foo initializes bar, so foo needs to be initialized first, etc).
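The initialization-order point can be shown with a small sketch. `Tracer`, `FooThenBar`, `BarThenFoo` and `init_order` are hypothetical names used only for this illustration; the two structs hold the same members and differ only in declaration order.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: appends a tag to a log when it is constructed.
struct Tracer {
    Tracer(std::string& log, char tag) { log += tag; }
};

struct FooThenBar {
    Tracer foo;  // declared first, so initialized first
    Tracer bar;
    explicit FooThenBar(std::string& log) : foo(log, 'f'), bar(log, 'b') {}
};

// Same members after a "reshuffle": bar is now declared first.
struct BarThenFoo {
    Tracer bar;
    Tracer foo;
    explicit BarThenFoo(std::string& log) : bar(log, 'b'), foo(log, 'f') {}
};

// Returns the order in which the members of T were initialized.
template <typename T>
std::string init_order() {
    std::string log;
    T object(log);
    (void)object;
    return log;
}

inline void check_init_order() {
    assert(init_order<FooThenBar>() == "fb");
    assert(init_order<BarThenFoo>() == "bf");
}
```

Members are always initialized in declaration order, whatever order the constructor's initializer list uses, so a member that reads another member during initialization can break silently after a reorder.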

Yes. In theory, the alignment of your data structures matters if you are concerned about the performance. It is good programming practice as well.
Most of the time, the alignment of your data structure is set based on the widest member of the 'struct'. Normally, your compiler takes care of it for you. The behaviour, however, can be different for C++ and C when it comes to inserting leading padding.
You can use the offsetof macro to get the byte offset of a given struct member as a size_t. This is ANSI C, though.
#include <stdio.h>
#include <stddef.h>
typedef struct Test_t {
char *p;
char c;
int i;
long l;
} Test;
int main(){
printf("offsetof(Test,p) = %zu\n", offsetof(Test,p));
printf("offsetof(Test,c) = %zu\n", offsetof(Test,c));
printf("offsetof(Test,i) = %zu\n", offsetof(Test,i));
printf("offsetof(Test,l) = %zu\n", offsetof(Test,l));
return 0;
}
On a typical 64-bit platform (8-byte pointers, 4-byte int) this will print
offsetof(Test,p) = 0
offsetof(Test,c) = 8
offsetof(Test,i) = 12
offsetof(Test,l) = 16

Related

How can I change the byte orders of a struct?

How can I change the order of a packed structure in C or C++?
struct myStruct {
uint32_t A;
uint16_t B1;
uint16_t B2;
} __attribute__((packed));
The address 0x0 of the structure (or the LSB) is A.
My app communicates with hardware and the structure in hardware is defined like this:
struct packed {
logic [31:0] A;
logic [15:0] B1;
logic [15:0] B2;
} myStruct;
But in SystemVerilog the "address 0x0" or more accurately the LSB of the structure is the LSB of B2 = B2[0].
The order is reversed.
To stay consistent and to avoid changing the hardware part, I'd like to invert the "endianness" of the whole C/C++ structure.
I could just reverse all the fields:
struct myStruct {
uint16_t B2;
uint16_t B1;
uint32_t A;
} __attribute__((packed));
but it's error-prone and not so convenient.
For datatype, both SystemVerilog and Intel CPUs are little-endian, that's not an issue.
How can I do it?

How can I change the byte orders of a struct?
You cannot change the order of bytes within members. And you cannot change the memory order of the members in relation to other members to be different from the order of their declaration.
But, you can change the declaration order of members which is what determines their memory order. The first member is always in lowest memory position, second is after that and so on.
If correct order of members can be known based on the verilog source, then ideally the C struct definition should be generated with meta-programming to ensure matching order.
it's error-prone
Relying on particular memory order is error-prone indeed.
It is possible to rely only on the known memory order of the source data (presumably an array of bytes) without relying on the memory order of the members at all:
unsigned char* data = read_hardware();
myStruct s;
s.B2 = data[0] << 0u
| data[1] << 8u;
s.B1 = data[2] << 0u
| data[3] << 8u;
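// Widen each byte before shifting: data[7] << 24u on a promoted int could overflow.
s.A = (uint32_t)data[4] << 0u
| (uint32_t)data[5] << 8u
| (uint32_t)data[6] << 16u
| (uint32_t)data[7] << 24u;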
This relies neither on the memory layout of the members nor on the endianness of the CPU. It relies only on the order of the source data (assumed to be little-endian in this case).
If possible, this function should also ideally be generated based on the verilog source.
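As a rough sketch of what such a generated function pair might look like, here is the decode step wrapped in helpers, plus the matching encode step. The helper names (`load_le16`, `load_le32`, `decode`, `encode`) are assumptions for illustration, and the byte order (B2 in the lowest bytes, A in the highest) is taken from the question.

```cpp
#include <cassert>
#include <cstdint>

// Plain (unpacked) struct: the code below never depends on its memory layout.
struct myStruct {
    std::uint32_t A;
    std::uint16_t B1;
    std::uint16_t B2;
};

// Hypothetical helpers: read little-endian integers from a byte buffer.
inline std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t load_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Decode the 8-byte hardware image: B2 in the lowest bytes, A in the highest.
inline myStruct decode(const unsigned char* data) {
    myStruct s;
    s.B2 = load_le16(data + 0);
    s.B1 = load_le16(data + 2);
    s.A  = load_le32(data + 4);
    return s;
}

// Encode back into the same byte order.
inline void encode(const myStruct& s, unsigned char* out) {
    out[0] = static_cast<unsigned char>(s.B2 & 0xFF);
    out[1] = static_cast<unsigned char>(s.B2 >> 8);
    out[2] = static_cast<unsigned char>(s.B1 & 0xFF);
    out[3] = static_cast<unsigned char>(s.B1 >> 8);
    for (int i = 0; i < 4; ++i) {
        out[4 + i] = static_cast<unsigned char>((s.A >> (8 * i)) & 0xFF);
    }
}

inline void check_round_trip() {
    const unsigned char wire[8] = {0x34, 0x12, 0x78, 0x56, 0xEF, 0xBE, 0xAD, 0xDE};
    const myStruct s = decode(wire);
    assert(s.B2 == 0x1234 && s.B1 == 0x5678 && s.A == 0xDEADBEEFu);
    unsigned char out[8] = {};
    encode(s, out);
    for (int i = 0; i < 8; ++i) {
        assert(out[i] == wire[i]);
    }
}
```

Because the struct is never overlaid on the raw bytes, it does not need to be packed at all.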

How can I change the order of a packed structure in C or C++?
C specifies that the members of a struct are laid out in memory in the order in which they are declared, with the address of the first-declared, when converted to the appropriate pointer type, being equal to the address of the overall struct. At least for struct types expressible in C, such as yours, conforming C++ implementations will follow the same member-order rule. Those implementations that support packed structure layout as an extension are pretty consistent in what they mean by that: packed structure layouts will have no padding between members, and the overall size is the sum of the sizes of the members. And no other effects.
I am not aware of any implementation that provides an extension allowing members to be ordered differently than declaration order, and who would bother to implement that? The order of members is well-defined. If you want a different order, then the solution is to change the declaration order of the members.
If Verilog indeed orders the members differently (to which I cannot speak) then I think you're just going to need to make peace with that. Implement it as you need to do or as otherwise makes the most sense, document on both sides, and move on. I'm inclined to think that the number of people who ever notice that the declaration order differs in the two languages will be very small. As long as the appropriate documentation is present, those that do notice won't be inclined to think there's an error.

I just looked at what AMD does in its open source drivers to handle endianness.
First of all they detect if the system is big endian/little endian using cmake.
#if !defined (__GFX10_GB_REG_H__)
#define __GFX10_GB_REG_H__
/*
* gfx10_gb_reg.h
*
* Register Spec Release: 1.0
*
*/
//
// Make sure the necessary endian defines are there.
//
#if defined(LITTLEENDIAN_CPU)
#elif defined(BIGENDIAN_CPU)
#else
#error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
#endif
union GB_ADDR_CONFIG
{
struct
{
#if defined(LITTLEENDIAN_CPU)
unsigned int NUM_PIPES : 3;
unsigned int PIPE_INTERLEAVE_SIZE : 3;
unsigned int MAX_COMPRESSED_FRAGS : 2;
unsigned int NUM_PKRS : 3;
unsigned int : 21;
#elif defined(BIGENDIAN_CPU)
unsigned int : 21;
unsigned int NUM_PKRS : 3;
unsigned int MAX_COMPRESSED_FRAGS : 2;
unsigned int PIPE_INTERLEAVE_SIZE : 3;
unsigned int NUM_PIPES : 3;
#endif
} bitfields, bits;
unsigned int u32All;
int i32All;
float f32All;
};
#endif
Yes there is some code duplication as mentioned above. But I'm not aware of a universally better solution either.

Independent of the endian issue, I wouldn't recommend C++ bit fields for this kind of purpose, or any purpose in which you actually need explicit control of bit alignment. A long time ago, the decision to put performance over portability ruined this possibility. Alignment of bit fields (and structs in general for that matter) is not well defined in C++, making bit fields useless for many purposes. IMO it would be better to let programmers make such decisions for optimization, or to implement a strictly portable (non-machine-dependent) packed keyword. If this means the compiler has to emit code that combines multiple shift-and operations once in a while, so be it.
As far as I know, the only general solution for this kind of thing is to add a layer that implements bit fields explicitly using shift-and logic. Of course this will likely ruin performance because you really want the conditionals to be handled at compile time, which is ironic because performance is what motivated this situation in the first place.
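A minimal sketch of such a shift-and layer, with field positions fixed at compile time through template parameters. The `Field` template is a made-up helper, not part of any AMD header, and the bit positions assume the little-endian branch of GB_ADDR_CONFIG with bit fields allocated from the least significant bit upward, which is what common little-endian ABIs do but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstdint>

// One field of a 32-bit register, described by bit position and width.
template <unsigned Shift, unsigned Width>
struct Field {
    static_assert(Width > 0 && Shift + Width <= 32, "field must fit in 32 bits");

    // Mask with Width one-bits starting at bit Shift.
    static constexpr std::uint32_t mask() {
        return (0xFFFFFFFFu >> (32u - Width)) << Shift;
    }
    // Extract the field value from a register word.
    static constexpr std::uint32_t get(std::uint32_t reg) {
        return (reg & mask()) >> Shift;
    }
    // Return reg with the field replaced by value (excess bits are dropped).
    static constexpr std::uint32_t set(std::uint32_t reg, std::uint32_t value) {
        return (reg & ~mask()) | ((value << Shift) & mask());
    }
};

// Assumed positions, read off the little-endian branch of the union.
typedef Field<0, 3> NUM_PIPES;
typedef Field<3, 3> PIPE_INTERLEAVE_SIZE;
typedef Field<6, 2> MAX_COMPRESSED_FRAGS;
typedef Field<8, 3> NUM_PKRS;

inline void check_fields() {
    std::uint32_t reg = 0;
    reg = PIPE_INTERLEAVE_SIZE::set(reg, 5);  // 5 << 3
    reg = NUM_PKRS::set(reg, 7);              // 7 << 8
    assert(reg == 0x728u);
    assert(PIPE_INTERLEAVE_SIZE::get(reg) == 5u);
    assert(NUM_PIPES::get(reg) == 0u);
}
```

Since shift and mask are compile-time constants here, an optimizing compiler can usually reduce each access to a couple of instructions, though that is worth confirming in the generated code rather than assuming.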

Is there any portable way to ensure a struct is defined without padding bytes using c++11? [duplicate]

This question already has answers here:
Compile-time check to make sure that there is no padding anywhere in a struct
(4 answers)
Closed 3 years ago.
Let's consider the following task:
My C++ module as part of an embedded system receives 8 bytes of data, like: uint8_t data[8].
The value of the first byte determines the layout of the rest (20-30 different). In order to get the data effectively, I would create different structs for each layout and put each to a union and read the data directly from the address of my input through a pointer like this:
struct Interpretation_1 {
uint8_t multiplexer;
uint8_t timestamp;
uint32_t position;
uint16_t speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
...
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";
The problem I have is, the compiler can insert padding bytes to the interpretation structs and this kills my idea. I know I can use
with gcc: struct MyStruct{} __attribute__((__packed__));
with MSVC: I can use #pragma pack(push, 1) MyStruct{}; #pragma pack(pop)
with clang: ? (I could check it)
But is there any portable way to achieve this? I know c++11 has e.g. alignas for alignment control, but can I use it for this? I have to use c++11 but I would be just interested if there is a better solution with later version of c++.

But is there any portable way to achieve this?
No, there is no (standard) way to "make" a type that would have padding to not have padding in C++. All objects are aligned at least as much as their type requires and if that alignment doesn't match with the previous sub objects, then there will be padding and that is unavoidable.
Furthermore, there is another problem: you're accessing through a reinterpreted pointer that doesn't point to an object of a compatible type. The behaviour of the program is undefined.
We can conclude that classes are not generally useful for representing arbitrary binary data. The packed structures are non-standard, and they also aren't compatible across different systems with different representations for integers (byte endianness).
There is a way to check whether a type contains padding: compare the summed sizes of the sub objects to the size of the complete object, and do this recursively for each member. If the sizes don't match, then there is padding. This is quite tricky however because C++ has minimal reflection capabilities, so you need to resort to either hard-coding or metaprogramming.
Given such check, you can make the compilation fail on systems where the assumption doesn't hold.
Another handy tool is std::has_unique_object_representations (since C++17) which will always be false for all types that have padding. But note that it will also be false for types that contain floats for example. Only types that return true can be meaningfully compared for equality with std::memcmp.
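The size-comparison check can be hard-coded in C++11 roughly like this. `size_sum` and `has_no_padding` are hypothetical helpers, the member type list has to be kept in sync with the struct by hand, and `Interpretation_1_sorted` exists only to show a case that passes; it does not match the wire layout from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum of sizeof over a list of types (C++11, no fold expressions).
template <typename... Ts>
struct size_sum;

template <>
struct size_sum<> {
    static constexpr std::size_t value = 0;
};

template <typename T, typename... Ts>
struct size_sum<T, Ts...> {
    static constexpr std::size_t value = sizeof(T) + size_sum<Ts...>::value;
};

// True when S is exactly as large as its listed members, i.e. has no padding.
// Nested structs need the same check applied to them separately.
template <typename S, typename... Members>
constexpr bool has_no_padding() {
    return sizeof(S) == size_sum<Members...>::value;
}

struct Interpretation_1 {
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
    std::uint32_t position;
    std::uint16_t speed;
};

// Hypothetical variant with the widest member first.
struct Interpretation_1_sorted {
    std::uint32_t position;
    std::uint16_t speed;
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
};

inline void check_padding() {
    constexpr bool sorted_ok =
        has_no_padding<Interpretation_1_sorted, std::uint32_t, std::uint16_t,
                       std::uint8_t, std::uint8_t>();
    constexpr bool original_ok =
        has_no_padding<Interpretation_1, std::uint8_t, std::uint8_t,
                       std::uint32_t, std::uint16_t>();
    // Expected where uint32_t needs 4-byte alignment (most platforms).
    assert(sorted_ok);
    assert(!original_ok);
}
```

In real code you would put the result into a static_assert, so that a platform where the layout assumption fails is rejected at compile time.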

Reading from unaligned memory is undefined behavior in C++. In other words, the compiler is allowed to assume that every uint32_t is located at an alignof(uint32_t)-byte boundary and every uint16_t at an alignof(uint16_t)-byte boundary. This means that even if you somehow manage to pack your bytes portably, doing interpreter->movement.position will still trigger undefined behaviour.
(In practice, on most architectures, unaligned memory access will still work, albeit with a performance penalty.)
You could, however, write a wrapper, like how std::vector<bool>::operator[] works:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>
template <typename T>
struct unaligned_wrapper {
// Two-argument static_assert and a plain byte array keep this valid C++11.
static_assert(std::is_trivial<T>::value, "T must be a trivial type");
unsigned char buf[sizeof(T)]; // alignment 1, so no padding is introduced
operator T() const noexcept {
T ret;
std::memcpy(&ret, buf, sizeof(T));
return ret;
}
unaligned_wrapper& operator=(T t) noexcept {
std::memcpy(buf, &t, sizeof(T));
return *this;
}
};
struct Interpretation_1 {
unaligned_wrapper<uint8_t> multiplexer;
unaligned_wrapper<uint8_t> timestamp;
unaligned_wrapper<uint32_t> position;
unaligned_wrapper<uint16_t> speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
int main(){
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << interpreter->movement.position << "\n";
}
This would ensure that every read or write to the unaligned integer is transformed to a bytewise memcpy, which does not have any alignment requirement. There might be a performance penalty for this on architectures with the ability to access unaligned memory quickly, but it would work on any conforming compiler.
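If you would rather avoid the reinterpret_cast altogether, one alternative is to copy each field out of the byte buffer at its known offset. This is only a sketch: `read_at`, `Movement` and `decode_movement` are made-up names, the offsets (0, 1, 2, 6) are read off the question's Interpretation_1, and the values come out in host byte order, so an endianness conversion would still be needed if the sender's byte order differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies sizeof(T) bytes starting at buf + offset into a T.
// memcpy has no alignment requirement, so any offset is fine.
template <typename T>
T read_at(const std::uint8_t* buf, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    T value;
    std::memcpy(&value, buf + offset, sizeof(T));
    return value;
}

// Hypothetical decoded form; its own layout and padding do not matter.
struct Movement {
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
    std::uint32_t position;
    std::uint16_t speed;
};

inline Movement decode_movement(const std::uint8_t (&data)[8]) {
    Movement m;
    m.multiplexer = read_at<std::uint8_t>(data, 0);
    m.timestamp   = read_at<std::uint8_t>(data, 1);
    m.position    = read_at<std::uint32_t>(data, 2);  // host byte order
    m.speed       = read_at<std::uint16_t>(data, 6);  // host byte order
    return m;
}

inline void check_decode() {
    const std::uint8_t data[8] = {1u, 10u, 20u, 0u, 0u, 0u, 5u, 0u};
    const Movement m = decode_movement(data);
    assert(m.multiplexer == 1u);
    assert(m.timestamp == 10u);
    std::uint32_t expected_position;
    std::memcpy(&expected_position, data + 2, sizeof expected_position);
    assert(m.position == expected_position);
}
```

The first byte can then drive a switch that picks the matching decode function, which replaces the union of layouts from the question.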

Cluster member variables declaration by their type useful or not?

Please have a look at the following code sample, executed on a Windows-32 system using Visual Studio 2010:
#include <iostream>
using namespace std;
class LogicallyClustered
{
bool _fA;
int _nA;
char _cA;
bool _fB;
int _nB;
char _cB;
};
class TypeClustered
{
bool _fA;
bool _fB;
char _cA;
char _cB;
int _nA;
int _nB;
};
int main(int argc, char* argv[])
{
cout << sizeof(LogicallyClustered) << endl; // 20
cout << sizeof(TypeClustered) << endl; // 12
return 0;
}
Question 1
The sizes of the two classes differ because the compiler inserts padding bytes to achieve an optimized memory alignment of the variables. Is this correct?
Question 2
Why is the memory footprint smaller if I cluster the variables by type as in class TypeClustered?
Question 3
Is it a good rule of thumb to always cluster member variables according to their type?
Should I also sort them according to their size ascending (bool, char, int, double...)?
EDIT
Additional Question 4
A smaller memory footprint will improve data cache efficiency, since more objects can be cached and you avoid full memory accesses into "slow" RAM. So could the ordering and grouping of the member declarations be considered a (small) but easy-to-achieve performance optimization?

1) Absolutely correct.
2) It's not smaller because they are grouped, but because of the way they are ordered and grouped. For example, if you declare 4 chars one after the other, they can be packed into 4 bytes. If you declare one char and immediately one int, 3 padding bytes will be inserted as the int will need to be aligned to 4 bytes.
3) No! You should group members in a class so that the class becomes more readable.
Important note: this is all platform/compiler specific. Don't take it literally.
Another note: on some platforms there is also a small performance increase for accessing members that reside in the first n (varies) bytes of a class instance. So declaring frequently accessed members at the beginning of a class can result in a small speed increase. However, this too shouldn't be a criterion. I'm just stating a fact, but in no way recommend you do this.

You are right, the size differs because the compiler inserts padding bytes in class LogicallyClustered. The compiler should use a memory layout like this:
class LogicallyClustered
{
// class starts well aligned
bool _fA;
// 3 bytes padding (int needs to be aligned)
int _nA;
char _cA;
bool _fB;
// 2 bytes padding (int needs to be aligned)
int _nB;
char _cB;
// 3 bytes padding (so next class object in an array would be aligned)
};
Your class TypeClustered does not need any padding bytes because all elements are aligned. bool and char do not need alignment, int needs to be aligned on 4 byte boundary.
Regarding question 3 I would say (as often :-)) "It depends.". If you are in an environment where memory footprint does not matter very much I would rather sort logically to make the code more readable. If you are in an environment where every byte counts you might consider moving around the members for optimal usage of space.
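If you want to see where the padding goes on your own compiler, offsetof gives the answer directly. The sketch below uses struct copies of the question's classes, since offsetof needs accessible members; the helper names are made up, and the concrete numbers (8 padding bytes versus 0) assume a 1-byte bool and a 4-byte, 4-byte-aligned int.

```cpp
#include <cassert>
#include <cstddef>

// Struct copies of the question's classes, with public members.
struct LogicallyClustered {
    bool _fA;
    int _nA;
    char _cA;
    bool _fB;
    int _nB;
    char _cB;
};

struct TypeClustered {
    bool _fA;
    bool _fB;
    char _cA;
    char _cB;
    int _nA;
    int _nB;
};

// Both structs hold the same members, so the payload size is identical.
constexpr std::size_t kMemberBytes =
    2 * sizeof(bool) + 2 * sizeof(char) + 2 * sizeof(int);

// Padding bytes = object size minus the sum of the member sizes.
constexpr std::size_t padding_logical() {
    return sizeof(LogicallyClustered) - kMemberBytes;
}
constexpr std::size_t padding_typed() {
    return sizeof(TypeClustered) - kMemberBytes;
}

inline void check_offsets() {
    // Each int lands on a multiple of its alignment, whatever precedes it.
    assert(offsetof(LogicallyClustered, _nA) % alignof(int) == 0);
    assert(offsetof(LogicallyClustered, _nB) % alignof(int) == 0);
    assert(offsetof(TypeClustered, _nA) % alignof(int) == 0);
    assert(padding_typed() <= padding_logical());
}
```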

Unless there are extreme memory footprint restrictions, cluster them logically, which improves code readability and ease of maintenance.

Unless you actually have problems of space (i.e. very, very large
vectors with such structures), don't worry about it. Otherwise: padding
is added for alignment: on most machines, for example, a double will
be aligned on an 8 byte boundary. Regrouping all members according to
type, with the types requiring the most alignment at the start will
result in the smallest memory footprint.

Q1: Yes
Q2: Depends on the size of bool (which is AFAIK compiler-dependent). Assuming it is 1 byte (like char), the first 4 members together use 4 bytes, which is as much as is used by one integer. Therefore, the compiler does not need to insert alignment padding in front of the integers.
Q3: If you want to order by type, size-descending is a better idea. However, that kind of clustering impedes readability. If you want to avoid padding under all circumstances, just make sure that every variable which needs more memory than 1 byte starts at an alignment boundary.
The alignment boundary, however, differs from architecture to architecture. That is (besides the possibly different sizes of int) why the same struct may have different sizes on different architectures. It is generally safe to start every member x at an offset that is a multiple of sizeof(x). For example, in
struct {
char a;
char b;
char c;
int d;
};
Without padding, the int d would start at offset 3, which is not a multiple of sizeof(int) (= 4 on x86/x64), so the compiler inserts a padding byte before it. You should probably move it to the front. It is, however, not necessary to strictly cluster by type.
Some compilers also offer the possibility to completely omit padding, e.g. __attribute__((packed)) in g++. This, however, may slow down your program, because reading an int might then actually need two memory accesses.

When would anyone use a union? Is it a remnant from the C-only days?

I have learned but don't really get unions. Every C or C++ text I go through introduces them (sometimes in passing), but they tend to give very few practical examples of why or where to use them. When would unions be useful in a modern (or even legacy) case? My only two guesses would be programming microprocessors when you have very limited space to work with, or when you're developing an API (or something similar) and you want to force the end user to have only one instance of several objects/types at one time. Are these two guesses even close to right?

Unions are usually used in the company of a discriminator: a variable indicating which of the fields of the union is valid. For example, let's say you want to create your own Variant type:
struct my_variant_t {
int type;
union {
char char_value;
short short_value;
int int_value;
long long_value;
float float_value;
double double_value;
void* ptr_value;
};
};
Then you would use it such as:
/* construct a new float variant instance */
void init_float(struct my_variant_t* v, float initial_value) {
v->type = VAR_FLOAT;
v->float_value = initial_value;
}
/* Increments the value of the variant by the given int */
void inc_variant_by_int(struct my_variant_t* v, int n) {
switch (v->type) {
case VAR_FLOAT:
v->float_value += n;
break;
case VAR_INT:
v->int_value += n;
break;
...
}
}
This is actually a pretty common idiom, especially in Visual Basic internals.
For a real example see SDL's SDL_Event union. (actual source code here). There is a type field at the top of the union, and the same field is repeated on every SDL_*Event struct. Then, to handle the correct event you need to check the value of the type field.
The benefits are simple: there is one single data type to handle all event types without using unnecessary memory.
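For completeness, here is a compilable, trimmed-down version of the variant idea. The snippets above never define the VAR_* constants, so the `variant_type` enum is an assumption, as are `init_int` and the bool return value of `inc_variant_by_int`.

```cpp
#include <cassert>

// Hypothetical discriminator values; the snippets above leave them undefined.
enum variant_type { VAR_INT, VAR_FLOAT };

struct my_variant_t {
    variant_type type;  // says which union member is currently valid
    union {
        int int_value;
        float float_value;
    };
};

inline void init_int(my_variant_t* v, int initial_value) {
    v->type = VAR_INT;
    v->int_value = initial_value;
}

inline void init_float(my_variant_t* v, float initial_value) {
    v->type = VAR_FLOAT;
    v->float_value = initial_value;
}

// Increments the stored value by n; returns false for an unknown tag.
inline bool inc_variant_by_int(my_variant_t* v, int n) {
    switch (v->type) {
    case VAR_INT:
        v->int_value += n;
        return true;
    case VAR_FLOAT:
        v->float_value += static_cast<float>(n);
        return true;
    }
    return false;
}

inline void check_variant() {
    my_variant_t v;
    init_int(&v, 5);
    assert(inc_variant_by_int(&v, 3) && v.int_value == 8);
    init_float(&v, 1.5f);
    assert(inc_variant_by_int(&v, 2) && v.float_value == 3.5f);
}
```

The important discipline is that every read goes through the tag first; reading a member other than the one last written is exactly what the discriminator is there to prevent.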

I find C++ unions pretty cool. It seems that people usually only think of the use case where one wants to change the value of a union instance "in place" (which, it seems, serves only to save memory or perform doubtful conversions).
In fact, unions can be of great power as a software engineering tool, even when you never change the value of any union instance.
Use case 1: the chameleon
With unions, you can regroup a number of arbitrary classes under one denomination, which isn't without similarities with the case of a base class and its derived classes. What changes, however, is what you can and can't do with a given union instance:
struct Batman;
struct BaseballBat;
union Bat
{
Batman brucewayne;
BaseballBat club;
};
ReturnType1 f(void)
{
BaseballBat bb = {/* */};
Bat b;
b.club = bb;
// do something with b.club
}
ReturnType2 g(Bat& b)
{
// do something with b, but how do we know what's inside?
}
Bat returnsBat(void);
ReturnType3 h(void)
{
Bat b = returnsBat();
// do something with b, but how do we know what's inside?
}
It appears that the programmer has to be certain of the type of the content of a given union instance when he wants to use it. It is the case in function f above. However, if a function were to receive a union instance as a passed argument, as is the case with g above, then it wouldn't know what to do with it. The same applies to functions returning a union instance, see h: how does the caller know what's inside?
If a union instance never gets passed as an argument or as a return value, then it's bound to have a very monotonous life, with spikes of excitement when the programmer chooses to change its content:
Batman bm = {/* */};
BaseballBat bb = {/* */};
Bat b;
b.brucewayne = bm;
// stuff
b.club = bb;
And that's the most (un)popular use case of unions. Another use case is when a union instance comes along with something that tells you its type.
Use case 2: "Nice to meet you, I'm object, from Class"
Suppose a programmer elected to always pair up a union instance with a type descriptor (I'll leave it to the reader's discretion to imagine an implementation for one such object). This defeats the purpose of the union itself if what the programmer wants is to save memory and that the size of the type descriptor is not negligible with respect to that of the union. But let's suppose that it's crucial that the union instance could be passed as an argument or as a return value with the callee or caller not knowing what's inside.
Then the programmer has to write a switch control flow statement to tell Bruce Wayne apart from a wooden stick, or something equivalent. It's not too bad when there are only two types of contents in the union but obviously, the union doesn't scale anymore.
Use case 3:
As the authors of a recommendation for the ISO C++ Standard put it back in 2008,
Many important problem domains require either large numbers of objects or limited memory
resources. In these situations conserving space is very important, and a union is often a perfect way to do that. In fact, a common use case is the situation where a union never changes its active member during its lifetime. It can be constructed, copied, and destructed as if it were a struct containing only one member. A typical application of this would be to create a heterogeneous collection of unrelated types which are not dynamically allocated (perhaps they are in-place constructed in a map, or members of an array).
And now, an example, with a UML class diagram:
The situation in plain English: an object of class A can have objects of any class among B1, ..., Bn, and at most one of each type, with n being a pretty big number, say at least 10.
We don't want to add fields (data members) to A like so:
private:
B1 b1;
.
.
.
Bn bn;
because n might vary (we might want to add Bx classes to the mix), and because this would cause a mess with constructors and because A objects would take up a lot of space.
We could use a wacky container of void* pointers to Bx objects with casts to retrieve them, but that's fugly and so C-style... but more importantly that would leave us with the lifetimes of many dynamically allocated objects to manage.
Instead, what can be done is this:
union Bee
{
B1 b1;
.
.
.
Bn bn;
};
enum BeesTypes { TYPE_B1, ..., TYPE_BN };
class A
{
private:
std::unordered_map<int, Bee> data; // C++11, otherwise use std::map
public:
Bee get(int); // the implementation is obvious: get from the unordered map
};
Then, to get the content of a union instance from data, you use a.get(TYPE_B2).b2 and the likes, where a is a class A instance.
This is all the more powerful since unions are unrestricted in C++11. See the document linked to above or this article for details.

One example is in the embedded realm, where each bit of a register may mean something different. For example, a union of an 8-bit integer and a structure with 8 separate 1-bit bitfields allows you to either change one bit or the entire byte.

Herb Sutter wrote in GOTW about six years ago, with emphasis added:
"But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see... ."
And for a less useful example, see the long but inconclusive question gcc, strict-aliasing, and casting through a union.

Well, one example use case I can think of is this:
typedef union
{
struct
{
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
};
uint32_t x;
} some32bittype;
You can then access the 8-bit separate parts of that 32-bit block of data; however, prepare to potentially be bitten by endianness.
This is just one hypothetical example, but whenever you want to split data in a field into component parts like this, you could use a union.
That said, there is also a method which is endian-safe:
uint32_t x;
uint8_t a = (x & 0xFF000000) >> 24;
This works because shifts and masks operate on the integer's value rather than on its byte layout in memory, so the result is the same on any endianness.
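The shift-and-mask approach generalizes to any byte of the word. A small sketch, with `byte_of` as a made-up helper name and index 0 meaning the least significant byte:

```cpp
#include <cassert>
#include <cstdint>

// Returns byte `index` of x, where index 0 is the least significant byte.
// Assumes index < 4.
constexpr std::uint8_t byte_of(std::uint32_t x, unsigned index) {
    return static_cast<std::uint8_t>((x >> (8u * index)) & 0xFFu);
}

inline void check_byte_of() {
    const std::uint32_t x = 0x11223344u;
    assert(byte_of(x, 3) == 0x11);  // same result as (x & 0xFF000000) >> 24
    assert(byte_of(x, 0) == 0x44);
}
```

Unlike the union version, this gives the same answer on little-endian and big-endian hosts.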

Some uses for unions:
Provide a general endianness interface to an unknown external host.
Manipulate foreign CPU architecture floating point data, such as accepting VAX G_FLOATS from a network link and converting them to IEEE 754 long reals for processing.
Provide straightforward bit twiddling access to a higher-level type.
union {
unsigned char byte_v[16];
long double ld_v;
}
With this declaration, it is simple to display the hex byte values of a long double, change the exponent's sign, determine if it is a denormal value, or implement long double arithmetic for a CPU which does not support it, etc.
Saving storage space when fields are dependent on certain values:
class person {
string name;
char gender; // M = male, F = female, O = other
union {
date vasectomized; // for males
int pregnancies; // for females
} gender_specific_data;
}
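The bit-twiddling use case above can also be written without a union. The sketch below swaps in two things that are not in the original: it uses std::memcpy instead of union access (reading a union member other than the one last written is not guaranteed to work in C++), and it uses double instead of long double because the binary64 layout is standardized. All function names are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 binary64 doubles");

// Copies the object representation of d into a 64-bit integer.
inline std::uint64_t bits_of(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Bit 63 is the sign bit.
inline bool sign_bit(double d) {
    return (bits_of(d) >> 63) != 0;
}

// Bits 52..62 hold the biased 11-bit exponent.
inline unsigned exponent_field(double d) {
    return static_cast<unsigned>((bits_of(d) >> 52) & 0x7FFu);
}

// Denormal (subnormal): exponent field is zero, fraction is non-zero.
inline bool is_denormal(double d) {
    return exponent_field(d) == 0 &&
           (bits_of(d) & 0x000FFFFFFFFFFFFFull) != 0;
}

inline void check_bits() {
    assert(bits_of(1.0) == 0x3FF0000000000000ull);
    assert(sign_bit(-2.0) && !sign_bit(2.0));
    assert(exponent_field(1.0) == 1023u);
    assert(!is_denormal(0.0) && !is_denormal(1.0));
}
```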
Grep the include files for use with your compiler. You'll find dozens to hundreds of uses of union:
[wally@zenetfedora ~]$ cd /usr/include
[wally@zenetfedora include]$ grep -w union *
a.out.h: union
argp.h: parsing options, getopt is called with the union of all the argp
bfd.h: union
bfd.h: union
bfd.h:union internal_auxent;
bfd.h: (bfd *, struct bfd_symbol *, int, union internal_auxent *);
bfd.h: union {
bfd.h: /* The value of the symbol. This really should be a union of a
bfd.h: union
bfd.h: union
bfdlink.h: /* A union of information depending upon the type. */
bfdlink.h: union
bfdlink.h: this field. This field is present in all of the union element
bfdlink.h: the union; this structure is a major space user in the
bfdlink.h: union
bfdlink.h: union
curses.h: union
db_cxx.h:// 4201: nameless struct/union
elf.h: union
elf.h: union
elf.h: union
elf.h: union
elf.h:typedef union
_G_config.h:typedef union
gcrypt.h: union
gcrypt.h: union
gcrypt.h: union
gmp-i386.h: union {
ieee754.h:union ieee754_float
ieee754.h:union ieee754_double
ieee754.h:union ieee854_long_double
ifaddrs.h: union
jpeglib.h: union {
ldap.h: union mod_vals_u {
ncurses.h: union
newt.h: union {
obstack.h: union
pi-file.h: union {
resolv.h: union {
signal.h:extern int sigqueue (__pid_t __pid, int __sig, __const union sigval __val)
stdlib.h:/* Lots of hair to allow traditional BSD use of `union wait'
stdlib.h: (__extension__ (((union { __typeof(status) __in; int __i; }) \
stdlib.h:/* This is the type of the argument to `wait'. The funky union
stdlib.h: causes redeclarations with either `int *' or `union wait *' to be
stdlib.h:typedef union
stdlib.h: union wait *__uptr;
stdlib.h: } __WAIT_STATUS __attribute__ ((__transparent_union__));
thread_db.h: union
thread_db.h: union
tiffio.h: union {
wchar.h: union
xf86drm.h:typedef union _drmVBlank {

Unions are useful when dealing with byte-level (low level) data.
One of my recent uses was modeling IP addresses, which looks like this:
// Composite structure for IP address storage
union
{
// IPv4 # 32-bit identifier
// Padded 12-bytes for IPv6 compatibility
union
{
struct
{
unsigned char _reserved[12];
unsigned char _IpBytes[4];
} _Raw;
struct
{
unsigned char _reserved[12];
unsigned char _o1;
unsigned char _o2;
unsigned char _o3;
unsigned char _o4;
} _Octet;
} _IPv4;
// IPv6 # 128-bit identifier
// Next generation internet addressing
union
{
struct
{
unsigned char _IpBytes[16];
} _Raw;
struct
{
unsigned short _w1;
unsigned short _w2;
unsigned short _w3;
unsigned short _w4;
unsigned short _w5;
unsigned short _w6;
unsigned short _w7;
unsigned short _w8;
} _Word;
} _IPv6;
} _IP;

Unions provide polymorphism in C.

An example when I've used a union:
class Vector
{
union
{
double _coord[3];
struct
{
double _x;
double _y;
double _z;
};
};
...
}
this allows me to access my data as an array or the elements.
I've used a union to have different names refer to the same value. In image processing, whether I was working on columns or width or the size in the X direction, it can become confusing. To alleviate this problem, I use a union so I know which descriptions go together.
union { // dimension from left to right
uint32_t m_width;
uint32_t m_sizeX;
uint32_t m_columns;
};
union { // dimension from top to bottom
uint32_t m_height;
uint32_t m_sizeY;
uint32_t m_rows;
};

The union keyword, while still used in C++03 [1], is mostly a remnant of the C days. The most glaring issue is that it only works with POD types [1].
The idea of the union, however, is still present, and indeed the Boost libraries feature a union-like class:
boost::variant<std::string, Foo, Bar>
Which has most of the benefits of the union (if not all) and adds:
ability to correctly use non-POD types
static type safety
In practice, it has been demonstrated that it was equivalent to a combination of union + enum, and benchmarked that it was as fast (while boost::any is more of the realm of dynamic_cast, since it uses RTTI).
[1] Unions were upgraded in C++11 (unrestricted unions), and can now contain objects with destructors, although the user has to invoke the destructor manually (on the currently active union member). It's still much easier to use variants.

A brilliant use of a union is memory alignment, which I found in the PCL (Point Cloud Library) source code. A single data structure in the API can target two architectures: CPUs with SSE support as well as CPUs without it. For example, the data structure for PointXYZ is
typedef union
{
float data[4];
struct
{
float x;
float y;
float z;
};
} PointXYZ;
The 3 floats are padded with an additional float for SSE alignment.
So for
PointXYZ point;
The user can either access point.data[0] or point.x (depending on the SSE support) for accessing say, the x coordinate.
More details on similar uses can be found in the PCL documentation on PointT types.

From the Wikipedia article on unions:
The primary usefulness of a union is
to conserve space, since it provides a
way of letting many different types be
stored in the same space. Unions also
provide crude polymorphism. However,
there is no checking of types, so it
is up to the programmer to be sure
that the proper fields are accessed in
different contexts. The relevant field
of a union variable is typically
determined by the state of other
variables, possibly in an enclosing
struct.
One common C programming idiom uses
unions to perform what C++ calls a
reinterpret_cast, by assigning to one
field of a union and reading from
another, as is done in code which
depends on the raw representation of
the values.

In the earliest days of C (e.g. as documented in 1974), all structures shared a common namespace for their members. Each member name was associated with a type and an offset; if "wd_woozle" was an "int" at offset 12, then given a pointer p of any structure type, p->wd_woozle would be equivalent to *(int*)(((char*)p)+12). The language required that all members of all structure types have unique names, except that it explicitly allowed reuse of member names in cases where every struct where they were used treated them as a common initial sequence.
The fact that structure types could be used promiscuously made it possible to have structures behave as though they contained overlapping fields. For example, given definitions:
struct float1 { float f0;};
struct byte4 { char b0,b1,b2,b3; }; /* Unsigned didn't exist yet */
code could declare a structure of type "float1" and then use "members" b0...b3 to access the individual bytes therein. When the language was changed so that each structure would receive a separate namespace for its members, code which relied upon the ability to access things multiple ways would break. The value of separating out namespaces for different structure types was sufficient to require that such code be changed to accommodate it, but the value of such techniques was sufficient to justify extending the language to continue supporting them.
Code which had been written to exploit the ability to access the storage within a struct float1 as though it were a struct byte4 could be made to work in the new language by adding a declaration: union f1b4 { struct float1 ff; struct byte4 bb; };, declaring objects as type union f1b4; rather than struct float1, and replacing accesses to f0, b0, b1, etc. with ff.f0, bb.b0, bb.b1, etc. While there are better ways such code could have been supported, the union approach was at least somewhat workable, at least with C89-era interpretations of the aliasing rules.

Let's say you have n different types of configurations (each just a set of variables defining parameters). By using an enumeration of the configuration types, you can define a structure that has the ID of the configuration type, along with a union of all the different types of configurations.
This way, wherever you pass the configuration can use the ID to determine how to interpret the configuration data, but if the configurations were huge you would not be forced to have parallel structures for each potential type wasting space.
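A minimal sketch of that layout in C follows. The configuration kinds, field names, and the helper config_primary_value are all hypothetical and exist only to show the ID-plus-union pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration kinds; the names are made up for illustration. */
enum config_kind { CONFIG_SERIAL, CONFIG_NETWORK };

struct serial_config  { int baud_rate; int parity; };
struct network_config { unsigned char ipv4[4]; unsigned short port; };

struct config {
    enum config_kind kind;   /* the ID: says which union member is valid */
    union {
        struct serial_config  serial;
        struct network_config network;
    } u;
};

/* Dispatches on the ID to decide how to interpret the union. */
int config_primary_value(const struct config *c)
{
    assert(c != NULL);
    switch (c->kind) {
    case CONFIG_SERIAL:  return c->u.serial.baud_rate;
    case CONFIG_NETWORK: return c->u.network.port;
    }
    return -1; /* unknown kind */
}
```

The union is as large as its largest member, so struct config costs one ID plus the biggest configuration rather than the sum of all of them.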

The already considerable importance of unions has been boosted further by the strict aliasing rule in recent versions of the C standard.
You can use unions to do type punning without violating the C standard.
This program has unspecified behavior (because I have assumed that float and unsigned int have the same length) but not undefined behavior (see here).
#include <stdio.h>
union float_uint
{
float f;
unsigned int ui;
};
int main()
{
float v = 241;
union float_uint fui = {.f = v};
//May trigger UNSPECIFIED BEHAVIOR but not UNDEFINED BEHAVIOR
printf("Your IEEE 754 float sir: %08x\n", fui.ui);
//This is UNDEFINED BEHAVIOR as it violates the Strict Aliasing Rule
unsigned int* pp = (unsigned int*) &v;
printf("Your IEEE 754 float, again, sir: %08x\n", *pp);
return 0;
}

I would like to add one good practical example of using a union: implementing a formula calculator/interpreter, or using something like one in a computation (for example, when parts of your formulas must be modifiable at run time, as when solving an equation numerically).
So you may want to define numbers/constants of different types(integer, floating-point, even complex numbers) like this:
struct Number{
enum NumType{int32, float32, float64, complex}; NumType num_t; // float and double are keywords, so they cannot be enumerator names
union{int ival; float fval; double dval; ComplexNumber cmplx_val;};
};
So you save memory and, more importantly, you avoid dynamic allocation of what may be an extreme quantity of small objects (if you use a lot of run-time defined numbers), compared to implementations through class inheritance/polymorphism. But what's more interesting, you can still use the power of C++ polymorphism (if you're a fan of double dispatch, for example ;) with this type of struct. Just add a "dummy" interface pointer to the parent class of all number types as a field of this struct, pointing to this instance instead of/in addition to the raw type tag, or use good old C function pointers.
struct NumberBase
{
virtual NumberBase Add(NumberBase n);
...
};
struct NumberInt; // defined below, after Number
struct Number: NumberBase{
union{int ival; float fval; double dval; ComplexNumber cmplx_val;};
NumberBase* num_t;
void Set(int a); // defined below, once NumberInt is a complete type
};
struct NumberInt: Number
{
//implement methods assuming Number's union contains int
NumberBase Add(NumberBase n);
...
};
struct NumberDouble: Number
{
//implement methods assuming Number's union contains double
NumberBase Add(NumberBase n);
...
};
//etc. for all number types, or use templates
inline void Number::Set(int a)
{
ival=a;
//still kind of a hack: relies on derived classes of Number not adding any fields,
//and the downcast is formally undefined behavior because *this is not a NumberInt
num_t = static_cast<NumberInt*>(this);
}
so you can use polymorphism instead of type checks with switch(type) - with memory-efficient implementation(no dynamic allocation of small objects) - if you need it, of course.

From http://cplus.about.com/od/learningc/ss/lowlevel_9.htm:
The uses of union are few and far between. On most computers, the size
of a pointer and an int are usually the same- this is because both
usually fit into a register in the CPU. So if you want to do a quick
and dirty cast of a pointer to an int or the other way, declare a
union.
union intptr { int i; int * p; };
union intptr x; x.i = 1000;
/* puts 90 at location 1000 */
*(x.p)=90;
Another use of a union is in a command or message protocol where
different size messages are sent and received. Each message type will
hold different information but each will have a fixed part (probably a
struct) and a variable part bit. This is how you might implement it..
struct head { int id; int response; int size; };
struct msgstring50 { struct head fixed; char message[50]; };
struct msgstring80 { struct head fixed; char message[80]; };
struct msgint10 { struct head fixed; int message[10]; };
struct msgack { struct head fixed; int ok; };
union messagetype { struct msgstring50 m50; struct msgstring80 m80; struct msgint10 i10; struct msgack ack; };
In practice, although the unions are the same size, it makes sense to
only send the meaningful data and not wasted space. A msgack is just
16 bytes in size while a msgstring80 is 92 bytes. So when a
messagetype variable is initialized, it has its size field set
according to which type it is. This can then be used by other
functions to transfer the correct number of bytes.
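As a rough sketch of how the size field could drive a transfer, the following C code initializes an acknowledgement and reports how many bytes are meaningful. The helpers init_ack and bytes_to_send are hypothetical names, not part of the quoted article, and only two of the message types are repeated to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct head { int id; int response; int size; };
struct msgstring80 { struct head fixed; char message[80]; };
struct msgack      { struct head fixed; int ok; };

union messagetype {
    struct msgstring80 m80;
    struct msgack      ack;
};

/* Fill in an acknowledgement; size records only the bytes that matter. */
void init_ack(union messagetype *m, int id, int ok)
{
    assert(m != NULL);
    memset(m, 0, sizeof *m);
    m->ack.fixed.id = id;
    m->ack.fixed.size = (int)sizeof(struct msgack);
    m->ack.ok = ok;
}

/* Number of bytes a sender would put on the wire for this message. */
size_t bytes_to_send(const union messagetype *m)
{
    /* Every member starts with struct head, so the header can be read
       through any of them (common initial sequence). */
    return (size_t)m->ack.fixed.size;
}
```

A sender would then transmit bytes_to_send(&m) bytes starting at &m instead of sizeof(union messagetype).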

Unions provide a way to manipulate different kinds of data in a single area of storage without embedding any machine-dependent information in the program.
They are analogous to variant records in Pascal.
As an example such as might be found in a compiler symbol table manager, suppose that a
constant may be an int, a float, or a character pointer. The value of a particular constant
must be stored in a variable of the proper type, yet it is most convenient for table management if the value occupies the same amount of storage and is stored in the same place regardless of its type. This is the purpose of a union - a single variable that can legitimately hold any of one of several types. The syntax is based on structures:
union u_tag {
int ival;
float fval;
char *sval;
} u;
The variable u will be large enough to hold the largest of the three types; the specific size is implementation-dependent. Any of these types may be assigned to u and then used in
expressions, so long as the usage is consistent.

Algorithm for determining Alignment of elements in C/C++ structs

Okay, allow me to re-ask the question, as none of the answers got at what I was really interested in (apologies if wholesale editing of the question like this is a faux pas).
A few points:
This is offline analysis with a different compiler than the one I'm testing, so SIZEOF() or similar won't work for what I'm doing.
I know it's implementation-defined, but I happen to know the implementation that is of interest to me, which is below.
Let's make a function called pack, which takes as input an integer, called alignment, and a tuple of integers, called elements. It outputs another integer, called size.
The function works as follows:
int pack (int alignment, int[] elements)
{
total_size = 0;
foreach( element in elements )
{
while( total_size % min(alignment, element) != 0 ) { ++total_size; }
total_size += element;
}
while( total_size % alignment != 0 ) { ++total_size; }
return total_size;
}
I think what I want to ask is "what is the inverse of this function?", but I'm not sure whether inversion is the correct term--I don't remember ever dealing with inversions of functions with multiple inputs, so I could just be using a term that doesn't apply.
Something like what I want (sort of) exists; here I provide pseudo code for a function we'll call determine_align. The function is a little naive, though, as it just calls pack over and over again with different inputs until it gets an answer it expects (or fails).
int determine_align(int total_size, int[] elements)
{
for(packing = 1,2,4,...,64) // expected answers.
{
size_at_cur_packing = pack(packing, elements);
if(total_size == size_at_cur_packing)
{
return packing;
}
}
return unknown;
}
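For reference, the two pseudocode functions can be transcribed into C roughly as follows. This is a direct translation of the question's scheme, not a claim about how any particular compiler lays out structs; the element count parameter and the use of 0 for "unknown" are choices made here for illustration.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Each element is aligned to min(alignment, element size), and the total
   is finally padded to a multiple of `alignment`, as in the pseudocode. */
size_t pack(size_t alignment, const size_t *elements, size_t count)
{
    assert(alignment > 0);
    size_t total_size = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(elements[i] > 0);
        size_t align = min_size(alignment, elements[i]);
        while (total_size % align != 0)
            ++total_size;
        total_size += elements[i];
    }
    while (total_size % alignment != 0)
        ++total_size;
    return total_size;
}

/* Brute-force search over the expected packings 1, 2, 4, ..., 64.
   Returns 0 when no candidate reproduces the observed size. */
size_t determine_align(size_t actual_size, const size_t *elements, size_t count)
{
    for (size_t packing = 1; packing <= 64; packing *= 2) {
        if (pack(packing, elements, count) == actual_size)
            return packing;
    }
    return 0;
}
```

For element sizes {1, 8, 1} and an observed size of 24, determine_align returns 8. When several packings produce the same size, the smallest one is returned, so the answer is not always unique.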
So the question is, is there a better implementation of determine_align?
Thanks,

Alignment of struct members in C/C++ is entirely implementation-defined. There are a few guarantees there, but I don't see how they would help you.
Thus, there's no generic way to do what you want. In the context of a particular implementation, you should refer to the documentation of that implementation that covers this (if it is covered).

When choosing how to pack members into a struct an implementation doesn't have to follow the sort of scheme that you describe in your algorithm although it is a common one. (i.e. minimum of sizeof type being aligned and preferred machine alignment size.)
You don't have to compare overall size of a struct to determine the padding that has been applied to individual struct members, though. The standard macro offsetof will give the byte offset from the start of the struct of any individual struct member.
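As a small illustration of the offsetof approach, the sketch below computes the padding around the members of a made-up struct. The struct sample and both helper functions are hypothetical, and the actual numbers depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct used only to illustrate the layout query. */
struct sample {
    char   flag;
    double value;
    char   tail;
};

/* Padding inserted by the compiler immediately before `value`. */
size_t padding_before_value(void)
{
    return offsetof(struct sample, value)
         - (offsetof(struct sample, flag) + sizeof(char));
}

/* Padding after the last member. */
size_t padding_at_end(void)
{
    return sizeof(struct sample)
         - (offsetof(struct sample, tail) + sizeof(char));
}
```

Because these are compile-time constants, this only helps when the code can be built with the compiler under study, which the question says is not the case for its offline analysis.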

I let the compiler do the alignment for me.
In gcc,
typedef struct _foo
{
u8 v1 __attribute__((aligned(4)));
u16 v2 __attribute__((aligned(4)));
u32 v3 __attribute__((aligned(8)));
u8 v4 __attribute__((aligned(4)));
} foo;
Edit: Note that sizeof(foo) will return the correct value including any padding.
Edit2: And offsetof(foo, v2) also works. Given these two functions/macros, you can figure out everything you need to know about the layout of the struct in memory.

I'm honestly not sure what you're trying to do, and I'm probably completely misunderstanding what you're looking for, but if you want to simply determine what the alignment requirement of a struct is, the following macro might be helpful:
#define ALIGNMENT_OF( t ) offsetof( struct { char x; t test; }, test )
To determine the alignment of your foo structure, you can do:
ALIGNMENT_OF( foo);
If this isn't what you're ultimately trying to do, the macro might still help in whatever algorithm you do come up with.

You need to pad based on the alignment of the next field and then pad the last element based on the maximum alignment you've seen in the struct. Note that the actual alignment of a field is the minimum of its natural alignment and the packing for that struct. I.e., if you have a struct packed at 4 bytes, a double will be aligned to 4 bytes, even though its natural alignment is 8.
You can make your inner loop faster by replacing it with total_size += (a - total_size % a) % a; where a = min(packing, element.size). You can optimize it further if packing and element.size are powers of two: total_size = (total_size + a - 1) & ~(a - 1);
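The rule described in this answer can be sketched in C as below. The function names are hypothetical, and the sketch assumes every field is a scalar whose natural alignment equals its size and is a power of two, which will not hold for arrays or nested structs.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (any positive align). */
size_t round_up(size_t offset, size_t align)
{
    assert(align > 0);
    return (offset + align - 1) / align * align;
}

/* Same result when align is a power of two, using a mask instead of a division. */
size_t round_up_pow2(size_t offset, size_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Each field is aligned to min(packing, field size); the end of the struct
   is padded to the largest field alignment seen.
   Assumption: natural alignment == size, both powers of two. */
size_t struct_size(size_t packing, const size_t *sizes, size_t count)
{
    size_t total = 0, max_align = 1;
    for (size_t i = 0; i < count; ++i) {
        size_t align = sizes[i] < packing ? sizes[i] : packing;
        if (align > max_align)
            max_align = align;
        total = round_up_pow2(total, align) + sizes[i];
    }
    return round_up_pow2(total, max_align);
}
```

Note the difference from padding the end to the packing value itself: three chars under a packing of 16 give a size of 3 here, not 16.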

If the problem is just that you want to guarantee a particular alignment, that is easy. For a particular alignment=2^n:
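void* p = malloc( sizeof( _foo ) + alignment - 1 );
// round up via an integer type (uintptr_t from <stdint.h>); the mask is ~(alignment - 1)
p = (void*) ( ( (uintptr_t)(p) + alignment - 1 ) & ~(uintptr_t)( alignment - 1 ) );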
I've neglected to save the original p returned from malloc. If you intend to free this memory, you need to save that pointer somewhere.
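One common way to keep the original pointer is to stash it just in front of the aligned block. The sketch below does that; the function names are made up for this example, and it assumes alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates size bytes aligned to `alignment` (a power of two).
   The pointer returned by malloc is stored immediately before the
   aligned block so that aligned_free_sketch() can recover it. */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Leave room for the saved pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Frees a block obtained from aligned_malloc_sketch(). */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Where they are available, C11 aligned_alloc or POSIX posix_memalign provide aligned allocations without this bookkeeping.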

I'm not sure what you want to achieve here. As Pavel Minaev said, alignment is handled by a compiler which in turn is constrained by a platform's Application Binary Interface for data that is made accessible to code compiled by a different compiler. The following paper discusses the problem in the context of a compiler that needs to implement calling conventions:
Christian Lindig and Norman Ramsey. Declarative Composition of Stack Frames. In Evelyn Duesterwald, editor, Proc. of the 14th International Conference on Compiler Construction, Springer, LNCS 2985, 2004.
