Comparing enums to integers - c++

I've read that you shouldn't trust on the underlying implementation of an enum on being either signed or unsigned. From this I have concluded that you should always cast the enum value to the type that it's being compared against. Like this:
enum MyEnum { MY_ENUM_VALUE = 0 };
int i = 1;
if (i > static_cast<int>(MY_ENUM_VALUE))
{
// do stuff
}
unsigned int u = 2;
if (u > static_cast<unsigned int>(MY_ENUM_VALUE))
{
// do more stuff
}
Is this the best practice?
Edit: Does the situation change if the enum is anonymous?

An enum is an integer so you can compare it against any other integer, and even floats. The compiler will automatically convert both integers to the largest, or the enum to a double before the compare.
Now, if your enumeration is not supposed to represent a number per se, you may want to consider creating a class instead:
enum class some_name { MY_ENUM_VALUE, ... };
int i;
if(i == static_cast<int>(some_name::MY_ENUM_VALUE))
{
...
}
In that case you need a cast because an enum class is not viewed as an integer by default. This helps quite a bit to avoid bugs in case you were to misuse an enum value...
Update: also, you can now specify the type of integer of an enum. This was available in older compilers too, but it was often not working quite right (in my own experience).
enum class some_name : uint8_t { ... };
That means the enumeration uses uint8_t to store those values. Practical if you are using enumeration values in a structure used to send data over a network or save in a binary file where you need to know the exact size of the data.
When not specified, the type defaults to int.
As brought up by others, if the point of using enum is just to declare numbers, then using constexpr is probably better.
constexpr int MY_CONSTANT_VALUE = 0;
This has the same effect, only the type of MY_CONSTANT_VALUE is now an int. You could go a little further and use typedef as in:
typedef int my_type_t;
constexpr my_type_t MY_CONSTANT_VALUE = 0;
I often use enum even if I'm to use a single value when the value is not generally considered an integer. There is no set in stone rule in this case.

Short answer: Yes
enum is signed int type, but they get implicitly cast into unsigned int. Your compiler might give a warning without explicit casting, but its still very commonly used. however you should explicitly cast to make it clear to maintainers.
And of course, explicit cast will be must when its a strongly typed enum.

Best practice is not to write
int i = 1;
if (i > static_cast<int>(MY_ENUM_VALUE))
{
// do stuff
}
instead write
MyEnumValue i = MY_ENUM_VALUE ;
...
if ( i > MY_ENUM_VALUE ) {..}
But if - as in your example - you only have one value in your enum it is better to declare it as a constant instead of an enum.

Related

Weird Data Types - C++

I know this sounds like a silly question, but I would like to know if it is possible in any way to make a custom variable size like this rather than using plain 8, 16, 32 and 64 bit integers:
uint15_t var; //as an example
I did some research online and I found nothing that works on all sizes (only stuff like 12 bit and 24 bit). (I was also wondering if bitfields would work with other data sizes too).
And yes, I know you could use a 16 bit integer to store a 15 bit value, but I just wanted to know if it is possible to do something like this.
How can I make and implement custom integer sizes?
Inside a struct or class, can use the bitfields feature to declare integers of the size you want for a member-variable:
unsigned int var : 15;
... it won't be very CPU-efficient (since the compiler will have to generate bitwise-operations on most accesses to that variable) but it will give you the desired 15-bit behavior.
To be able to use your bitfield int as a normal int but still get the behavior of a 15 bit int you can do it like this :
#include <cassert>
#include <utility>
template<typename type_t, std::size_t N>
struct bits_t final
{
bits_t() = default;
~bits_t() = default;
// explicitly implicit so it can convert automatically from underlying type
bits_t(const type_t& value) :
m_value{ value }
{
};
// implicit conversion back to underlying type
operator type_t ()
{
return m_value;
}
private:
type_t m_value : N;
};
int main()
{
bits_t<int, 15> value;
value = 16383; // 0x3FFF
assert(value == 16383);
// overflow now at bit 15 :)
value = 16384; // 0x4000, 16th bit is set.
assert(value == -16384);
return 0;
}
bitfields feature will do the trick ...
uint32_t customInt : 15;
You can try using bitfields, like many people mentioned. However, bitfields don't have a proper type. If you want to make your arbitrary-sized integers object-oriented, you can stuff the bitfield into a template:
template <int size> struct my_uint
{
uint32_t value: size;
};
typedef my_uint<13> uint13_t; // some people use "using" syntax to do this
typedef my_uint<14> uint14_t;
typedef my_uint<15> uint15_t;
However, now you lost arithmetic operators, and you have to implement (overload) them yourself. You have to ask yourself many questions about what you really want to do with these new types:
Do you want to overload operators like +, *, etc? Which ones?
Do you want to support arrays?
What is the maximum size you want to support? In my example, it's 32.
Do you want to support implicit constructors, e.g. uint15_t(uint32_t)?
How to support overflow?
There is no way to make your new types behave like built-in types - you can come close but cannot quite do it. That is, if you write a big program where you work with uint15_t and later you decide to switch to uint16_t, there will be subtle changes caused by uint16_t being a built-in type (e.g. consider rules about implicit conversions).

Type safe numeric constants in C++

Let's say you start off coding a project, and certain types are integers, i.e. you code might look like this:
typedef int NumericType;
NumericType a = 0;
NumericType b = 100;
Down the road you change CountingType to a special class, that doesn't have an implicit cast from integer for safety reasons. Now, you have to go through all of your code and update places where you used constants. I've started to go through my code to change it to this:
typedef specialType NumericType;
NumericType a = static_cast<NumericType>(0);
NumericType b = static_cast<NumericType>(100);
This way if I change the NumericType again in the future, I'd have less code to change. I've started to wonder if this is one of those rules that I should start following in general, in that I should always static_cast constants.
I've started to do that anytime that I use format strings in C++ so that if I change types, I have some chance of getting a warning for my sprintf_s calls. i.e.
sprintf_s(buffer, 10, "Bob %d", static_cast<int>(bob));
Is there another pattern for handling constants and typing?
I spoke to some other developers in my group and another solution, which I like, is to use a code like this:
typedef int NumericType;
NumericType a = NumericType(0);
NumericType b = NumericType(100);
The benefit of this is that if I switch the type for NumericType, I just have to make sure that it can be constructed with int, which is different than letting it be convertable to int. i.e. this code works when only the typedef is updated, as long as specialType has a constructor that takes an int or a long.
typedef specialType NumericType;
NumericType a = NumericType(0);
NumericType b = NumericType(100);

C++ enum flags vs bitset

What are pros/cons of usage bitsets over enum flags?
namespace Flag {
enum State {
Read = 1 << 0,
Write = 1 << 1,
Binary = 1 << 2,
};
}
namespace Plain {
enum State {
Read,
Write,
Binary,
Count
};
}
int main()
{
{
unsigned int state = Flag::Read | Flag::Binary;
std::cout << state << std::endl;
state |= Flag::Write;
state &= ~(Flag::Read | Flag::Binary);
std::cout << state << std::endl;
} {
std::bitset<Plain::Count> state;
state.set(Plain::Read);
state.set(Plain::Binary);
std::cout << state.to_ulong() << std::endl;
state.flip();
std::cout << state.to_ulong() << std::endl;
}
return 0;
}
As I can see so far, bitsets have more convinient set/clear/flip functions to deal with, but enum-flags usage is a more wide-spreaded approach.
What are possible downsides of bitsets and what and when should I use in my daily code?
Both std::bitset and c-style enum have important downsides for managing flags. First, let's consider the following example code :
namespace Flag {
enum State {
Read = 1 << 0,
Write = 1 << 1,
Binary = 1 << 2,
};
}
namespace Plain {
enum State {
Read,
Write,
Binary,
Count
};
}
void f(int);
void g(int);
void g(Flag::State);
void h(std::bitset<sizeof(Flag::State)>);
namespace system1 {
Flag::State getFlags();
}
namespace system2 {
Plain::State getFlags();
}
int main()
{
f(Flag::Read); // Flag::Read is implicitly converted to `int`, losing type safety
f(Plain::Read); // Plain::Read is also implicitly converted to `int`
auto state = Flag::Read | Flag::Write; // type is not `Flag::State` as one could expect, it is `int` instead
g(state); // This function calls the `int` overload rather than the `Flag::State` overload
auto system1State = system1::getFlags();
auto system2State = system2::getFlags();
if (system1State == system2State) {} // Compiles properly, but semantics are broken, `Flag::State`
std::bitset<sizeof(Flag::State)> flagSet; // Notice that the type of bitset only indicates the amount of bits, there's no type safety here either
std::bitset<sizeof(Plain::State)> plainSet;
// f(flagSet); bitset doesn't implicitly convert to `int`, so this wouldn't compile which is slightly better than c-style `enum`
flagSet.set(Flag::Read); // No type safety, which means that bitset
flagSet.reset(Plain::Read); // is willing to accept values from any enumeration
h(flagSet); // Both kinds of sets can be
h(plainSet); // passed to the same function
}
Even though you may think those problems are easy to spot on simple examples, they end up creeping up in every code base that builds flags on top of c-style enum and std::bitset.
So what can you do for better type safety? First, C++11's scoped enumeration is an improvement for type safety. But it hinders convenience a lot. Part of the solution is to use template-generated bitwise operators for scoped enums. Here is a great blog post which explains how it works and also provides working code : https://www.justsoftwaresolutions.co.uk/cplusplus/using-enum-classes-as-bitfields.html
Now let's see what this would look like :
enum class FlagState {
Read = 1 << 0,
Write = 1 << 1,
Binary = 1 << 2,
};
template<>
struct enable_bitmask_operators<FlagState>{
static const bool enable=true;
};
enum class PlainState {
Read,
Write,
Binary,
Count
};
void f(int);
void g(int);
void g(FlagState);
FlagState h();
namespace system1 {
FlagState getFlags();
}
namespace system2 {
PlainState getFlags();
}
int main()
{
f(FlagState::Read); // Compile error, FlagState is not an `int`
f(PlainState::Read); // Compile error, PlainState is not an `int`
auto state = FlagState::Read | FlagState::Write; // type is `FlagState` as one could expect
g(state); // This function calls the `FlagState` overload
auto system1State = system1::getFlags();
auto system2State = system2::getFlags();
if (system1State == system2State) {} // Compile error, there is no `operator==(FlagState, PlainState)`
auto someFlag = h();
if (someFlag == FlagState::Read) {} // This compiles fine, but this is another type of recurring bug
}
The last line of this example shows one problem that still cannot be caught at compile time. In some cases, comparing for equality may be what's really desired. But most of the time, what is really meant is if ((someFlag & FlagState::Read) == FlagState::Read).
In order to solve this problem, we must differentiate the type of an enumerator from the type of a bitmask. Here's an article which details an improvement on the partial solution I referred to earlier : https://dalzhim.github.io/2017/08/11/Improving-the-enum-class-bitmask/
Disclaimer : I'm the author of this later article.
When using the template-generated bitwise operators from the last article, you will get all of the benefits we demonstrated in the last piece of code, while also catching the mask == enumerator bug.
Some observations:
std::bitset< N > supports an arbitrary number of bits (e.g., more than 64 bits), whereas underlying integral types of enums are restricted to 64 bits;
std::bitset< N > can implicitly (depending on the std implementation) use the underlying integral type with the minimal size fitting the requested number of bits, whereas underlying integral types for enums need to be explicitly declared (otherwise, int will be used as the default underlying integral type);
std::bitset< N > represents a generic sequence of N bits, whereas scoped enums provide type safety that can be exploited for method overloading;
If std::bitset< N > is used as a bit mask, a typical implementation depends on an additional enum type for indexing (!= masking) purposes;
Note that the latter two observations can be combined to define a strong std::bitset type for convenience:
typename< Enum E, std::size_t N >
class BitSet : public std::bitset< N >
{
...
[[nodiscard]]
constexpr bool operator[](E pos) const;
...
};
and if the code supports some reflection to obtain the number of explicit enum values, then the number of bits can be deduced directly from the enum type.
scoped enum types do not have bitwise operator overloads (which can easily be defined once using SFINAE or concepts for all scoped and unscoped enum types, but need to be included before use) and unsoped enum types will decay to the underlying integral type;
bitwise operator overloads for enum types, require less boilerplate than std::bitset< N > (e.g., auto flags = Depth | Stencil;);
enum types support both signed and unsigned underlying integral types, whereas std::bitset< N > internally uses unsigned integral types (shift operators).
FWIIW, in my own code I mostly use std::bitset (and eastl::bitvector) as private bit/bool containers for setting/getting single bits/bools. For masking operations, I prefer scoped enum types with explicitly defined underlying types and bitwise operator overloads.
Do you compile with optimization on? It is very unlikely that there is a 24x speed factor.
To me, bitset is superior, because it manages space for you:
can be extended as much as wanted. If you have a lot of flags, you may run out of space in the int/long long version.
may take less space, if you only use just several flags (it can fit in an unsigned char/unsigned short - I'm not sure that implementations apply this optimization, though)
(Ad mode on)
You can get both: a convenient interface and max performance. And type-safety as well. https://github.com/oliora/bitmask

When would anyone use a union? Is it a remnant from the C-only days?

I have learned but don't really get unions. Every C or C++ text I go through introduces them (sometimes in passing), but they tend to give very few practical examples of why or where to use them. When would unions be useful in a modern (or even legacy) case? My only two guesses would be programming microprocessors when you have very limited space to work with, or when you're developing an API (or something similar) and you want to force the end user to have only one instance of several objects/types at one time. Are these two guesses even close to right?
Unions are usually used with the company of a discriminator: a variable indicating which of the fields of the union is valid. For example, let's say you want to create your own Variant type:
struct my_variant_t {
int type;
union {
char char_value;
short short_value;
int int_value;
long long_value;
float float_value;
double double_value;
void* ptr_value;
};
};
Then you would use it such as:
/* construct a new float variant instance */
void init_float(struct my_variant_t* v, float initial_value) {
v->type = VAR_FLOAT;
v->float_value = initial_value;
}
/* Increments the value of the variant by the given int */
void inc_variant_by_int(struct my_variant_t* v, int n) {
switch (v->type) {
case VAR_FLOAT:
v->float_value += n;
break;
case VAR_INT:
v->int_value += n;
break;
...
}
}
This is actually a pretty common idiom, specially on Visual Basic internals.
For a real example see SDL's SDL_Event union. (actual source code here). There is a type field at the top of the union, and the same field is repeated on every SDL_*Event struct. Then, to handle the correct event you need to check the value of the type field.
The benefits are simple: there is one single data type to handle all event types without using unnecessary memory.
I find C++ unions pretty cool. It seems that people usually only think of the use case where one wants to change the value of a union instance "in place" (which, it seems, serves only to save memory or perform doubtful conversions).
In fact, unions can be of great power as a software engineering tool, even when you never change the value of any union instance.
Use case 1: the chameleon
With unions, you can regroup a number of arbitrary classes under one denomination, which isn't without similarities with the case of a base class and its derived classes. What changes, however, is what you can and can't do with a given union instance:
struct Batman;
struct BaseballBat;
union Bat
{
Batman brucewayne;
BaseballBat club;
};
ReturnType1 f(void)
{
BaseballBat bb = {/* */};
Bat b;
b.club = bb;
// do something with b.club
}
ReturnType2 g(Bat& b)
{
// do something with b, but how do we know what's inside?
}
Bat returnsBat(void);
ReturnType3 h(void)
{
Bat b = returnsBat();
// do something with b, but how do we know what's inside?
}
It appears that the programmer has to be certain of the type of the content of a given union instance when he wants to use it. It is the case in function f above. However, if a function were to receive a union instance as a passed argument, as is the case with g above, then it wouldn't know what to do with it. The same applies to functions returning a union instance, see h: how does the caller know what's inside?
If a union instance never gets passed as an argument or as a return value, then it's bound to have a very monotonous life, with spikes of excitement when the programmer chooses to change its content:
Batman bm = {/* */};
Baseball bb = {/* */};
Bat b;
b.brucewayne = bm;
// stuff
b.club = bb;
And that's the most (un)popular use case of unions. Another use case is when a union instance comes along with something that tells you its type.
Use case 2: "Nice to meet you, I'm object, from Class"
Suppose a programmer elected to always pair up a union instance with a type descriptor (I'll leave it to the reader's discretion to imagine an implementation for one such object). This defeats the purpose of the union itself if what the programmer wants is to save memory and that the size of the type descriptor is not negligible with respect to that of the union. But let's suppose that it's crucial that the union instance could be passed as an argument or as a return value with the callee or caller not knowing what's inside.
Then the programmer has to write a switch control flow statement to tell Bruce Wayne apart from a wooden stick, or something equivalent. It's not too bad when there are only two types of contents in the union but obviously, the union doesn't scale anymore.
Use case 3:
As the authors of a recommendation for the ISO C++ Standard put it back in 2008,
Many important problem domains require either large numbers of objects or limited memory
resources. In these situations conserving space is very important, and a union is often a perfect way to do that. In fact, a common use case is the situation where a union never changes its active member during its lifetime. It can be constructed, copied, and destructed as if it were a struct containing only one member. A typical application of this would be to create a heterogeneous collection of unrelated types which are not dynamically allocated (perhaps they are in-place constructed in a map, or members of an array).
And now, an example, with a UML class diagram:
The situation in plain English: an object of class A can have objects of any class among B1, ..., Bn, and at most one of each type, with n being a pretty big number, say at least 10.
We don't want to add fields (data members) to A like so:
private:
B1 b1;
.
.
.
Bn bn;
because n might vary (we might want to add Bx classes to the mix), and because this would cause a mess with constructors and because A objects would take up a lot of space.
We could use a wacky container of void* pointers to Bx objects with casts to retrieve them, but that's fugly and so C-style... but more importantly that would leave us with the lifetimes of many dynamically allocated objects to manage.
Instead, what can be done is this:
union Bee
{
B1 b1;
.
.
.
Bn bn;
};
enum BeesTypes { TYPE_B1, ..., TYPE_BN };
class A
{
private:
std::unordered_map<int, Bee> data; // C++11, otherwise use std::map
public:
Bee get(int); // the implementation is obvious: get from the unordered map
};
Then, to get the content of a union instance from data, you use a.get(TYPE_B2).b2 and the likes, where a is a class A instance.
This is all the more powerful since unions are unrestricted in C++11. See the document linked to above or this article for details.
One example is in the embedded realm, where each bit of a register may mean something different. For example, a union of an 8-bit integer and a structure with 8 separate 1-bit bitfields allows you to either change one bit or the entire byte.
Herb Sutter wrote in GOTW about six years ago, with emphasis added:
"But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see... ."
And for a less useful example, see the long but inconclusive question gcc, strict-aliasing, and casting through a union.
Well, one example use case I can think of is this:
typedef union
{
struct
{
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
};
uint32_t x;
} some32bittype;
You can then access the 8-bit separate parts of that 32-bit block of data; however, prepare to potentially be bitten by endianness.
This is just one hypothetical example, but whenever you want to split data in a field into component parts like this, you could use a union.
That said, there is also a method which is endian-safe:
uint32_t x;
uint8_t a = (x & 0xFF000000) >> 24;
For example, since that binary operation will be converted by the compiler to the correct endianness.
Some uses for unions:
Provide a general endianness interface to an unknown external host.
Manipulate foreign CPU architecture floating point data, such as accepting VAX G_FLOATS from a network link and converting them to IEEE 754 long reals for processing.
Provide straightforward bit twiddling access to a higher-level type.
union {
unsigned char byte_v[16];
long double ld_v;
}
With this declaration, it is simple to display the hex byte values of a long double, change the exponent's sign, determine if it is a denormal value, or implement long double arithmetic for a CPU which does not support it, etc.
Saving storage space when fields are dependent on certain values:
class person {
string name;
char gender; // M = male, F = female, O = other
union {
date vasectomized; // for males
int pregnancies; // for females
} gender_specific_data;
}
Grep the include files for use with your compiler. You'll find dozens to hundreds of uses of union:
[wally#zenetfedora ~]$ cd /usr/include
[wally#zenetfedora include]$ grep -w union *
a.out.h: union
argp.h: parsing options, getopt is called with the union of all the argp
bfd.h: union
bfd.h: union
bfd.h:union internal_auxent;
bfd.h: (bfd *, struct bfd_symbol *, int, union internal_auxent *);
bfd.h: union {
bfd.h: /* The value of the symbol. This really should be a union of a
bfd.h: union
bfd.h: union
bfdlink.h: /* A union of information depending upon the type. */
bfdlink.h: union
bfdlink.h: this field. This field is present in all of the union element
bfdlink.h: the union; this structure is a major space user in the
bfdlink.h: union
bfdlink.h: union
curses.h: union
db_cxx.h:// 4201: nameless struct/union
elf.h: union
elf.h: union
elf.h: union
elf.h: union
elf.h:typedef union
_G_config.h:typedef union
gcrypt.h: union
gcrypt.h: union
gcrypt.h: union
gmp-i386.h: union {
ieee754.h:union ieee754_float
ieee754.h:union ieee754_double
ieee754.h:union ieee854_long_double
ifaddrs.h: union
jpeglib.h: union {
ldap.h: union mod_vals_u {
ncurses.h: union
newt.h: union {
obstack.h: union
pi-file.h: union {
resolv.h: union {
signal.h:extern int sigqueue (__pid_t __pid, int __sig, __const union sigval __val)
stdlib.h:/* Lots of hair to allow traditional BSD use of `union wait'
stdlib.h: (__extension__ (((union { __typeof(status) __in; int __i; }) \
stdlib.h:/* This is the type of the argument to `wait'. The funky union
stdlib.h: causes redeclarations with either `int *' or `union wait *' to be
stdlib.h:typedef union
stdlib.h: union wait *__uptr;
stdlib.h: } __WAIT_STATUS __attribute__ ((__transparent_union__));
thread_db.h: union
thread_db.h: union
tiffio.h: union {
wchar.h: union
xf86drm.h:typedef union _drmVBlank {
Unions are useful when dealing with byte-level (low level) data.
One of my recent usage was on IP address modeling which looks like below :
// Composite structure for IP address storage
union
{
// IPv4 # 32-bit identifier
// Padded 12-bytes for IPv6 compatibility
union
{
struct
{
unsigned char _reserved[12];
unsigned char _IpBytes[4];
} _Raw;
struct
{
unsigned char _reserved[12];
unsigned char _o1;
unsigned char _o2;
unsigned char _o3;
unsigned char _o4;
} _Octet;
} _IPv4;
// IPv6 # 128-bit identifier
// Next generation internet addressing
union
{
struct
{
unsigned char _IpBytes[16];
} _Raw;
struct
{
unsigned short _w1;
unsigned short _w2;
unsigned short _w3;
unsigned short _w4;
unsigned short _w5;
unsigned short _w6;
unsigned short _w7;
unsigned short _w8;
} _Word;
} _IPv6;
} _IP;
Unions provide polymorphism in C.
An example when I've used a union:
class Vector
{
union
{
double _coord[3];
struct
{
double _x;
double _y;
double _z;
};
};
...
}
this allows me to access my data as an array or the elements.
I've used a union to have the different terms point to the same value. In image processing, whether I was working on columns or width or the size in the X direction, it can become confusing. To alleve this problem, I use a union so I know which descriptions go together.
union { // dimension from left to right // union for the left to right dimension
uint32_t m_width;
uint32_t m_sizeX;
uint32_t m_columns;
};
union { // dimension from top to bottom // union for the top to bottom dimension
uint32_t m_height;
uint32_t m_sizeY;
uint32_t m_rows;
};
The union keyword, while still used in C++031, is mostly a remnant of the C days. The most glaring issue is that it only works with POD1.
The idea of the union, however, is still present, and indeed the Boost libraries feature a union-like class:
boost::variant<std::string, Foo, Bar>
Which has most of the benefits of the union (if not all) and adds:
ability to correctly use non-POD types
static type safety
In practice, it has been demonstrated that it was equivalent to a combination of union + enum, and benchmarked that it was as fast (while boost::any is more of the realm of dynamic_cast, since it uses RTTI).
1Unions were upgraded in C++11 (unrestricted unions), and can now contain objects with destructors, although the user has to invoke the destructor manually (on the currently active union member). It's still much easier to use variants.
A brilliant usage of union is memory alignment, which I found in the PCL(Point Cloud Library) source code. The single data structure in the API can target two architectures: CPU with SSE support as well as the CPU without SSE support. For eg: the data structure for PointXYZ is
typedef union
{
float data[4];
struct
{
float x;
float y;
float z;
};
} PointXYZ;
The 3 floats are padded with an additional float for SSE alignment.
So for
PointXYZ point;
The user can either access point.data[0] or point.x (depending on the SSE support) for accessing say, the x coordinate.
More similar better usage details are on following link: PCL documentation PointT types
From the Wikipedia article on unions:
The primary usefulness of a union is
to conserve space, since it provides a
way of letting many different types be
stored in the same space. Unions also
provide crude polymorphism. However,
there is no checking of types, so it
is up to the programmer to be sure
that the proper fields are accessed in
different contexts. The relevant field
of a union variable is typically
determined by the state of other
variables, possibly in an enclosing
struct.
One common C programming idiom uses
unions to perform what C++ calls a
reinterpret_cast, by assigning to one
field of a union and reading from
another, as is done in code which
depends on the raw representation of
the values.
In the earliest days of C (e.g. as documented in 1974), all structures shared a common namespace for their members. Each member name was associated with a type and an offset; if "wd_woozle" was an "int" at offset 12, then given a pointer p of any structure type, p->wd_woozle would be equivalent to *(int*)(((char*)p)+12). The language required that all members of all structures types have unique names except that it explicitly allowed reuse of member names in cases where every struct where they were used treated them as a common initial sequence.
The fact that structure types could be used promiscuously made it possible to have structures behave as though they contained overlapping fields. For example, given definitions:
struct float1 { float f0;};
struct byte4 { char b0,b1,b2,b3; }; /* Unsigned didn't exist yet */
code could declare a structure of type "float1" and then use "members" b0...b3 to access the individual bytes therein. When the language was changed so that each structure would receive a separate namespace for its members, code which relied upon the ability to access things multiple ways would break. The values of separating out namespaces for different structure types was sufficient to require that such code be changed to accommodate it, but the value of such techniques was sufficient to justify extending the language to continue supporting it.
Code which had been written to exploit the ability to access the storage within a struct float1 as though it were a struct byte4 could be made to work in the new language by adding a declaration: union f1b4 { struct float1 ff; struct byte4 bb; };, declaring objects as type union f1b4; rather than struct float1, and replacing accesses to f0, b0, b1, etc. with ff.f0, bb.b0, bb.b1, etc. While there are better ways such code could have been supported, the union approach was at least somewhat workable, at least with C89-era interpretations of the aliasing rules.
Lets say you have n different types of configurations (just being a set of variables defining parameters). By using an enumeration of the configuration types, you can define a structure that has the ID of the configuration type, along with a union of all the different types of configurations.
This way, wherever you pass the configuration can use the ID to determine how to interpret the configuration data, but if the configurations were huge you would not be forced to have parallel structures for each potential type wasting space.
One recent boost on the, already elevated, importance of the unions has been given by the Strict Aliasing Rule introduced in recent version of C standard.
You can use unions do to type-punning without violating the C standard.
This program has unspecified behavior (because I have assumed that float and unsigned int have the same length) but not undefined behavior (see here).
#include <stdio.h>
union float_uint
{
float f;
unsigned int ui;
};
int main()
{
float v = 241;
union float_uint fui = {.f = v};
//May trigger UNSPECIFIED BEHAVIOR but not UNDEFINED BEHAVIOR
printf("Your IEEE 754 float sir: %08x\n", fui.ui);
//This is UNDEFINED BEHAVIOR as it violates the Strict Aliasing Rule
unsigned int* pp = (unsigned int*) &v;
printf("Your IEEE 754 float, again, sir: %08x\n", *pp);
return 0;
}
I would like to add one good practical example for using union - implementing formula calculator/interpreter or using some kind of it in computation(for example, you want to use modificable during run-time parts of your computing formulas - solving equation numerically - just for example).
So you may want to define numbers/constants of different types(integer, floating-point, even complex numbers) like this:
struct Number{
enum NumType{int32, float, double, complex}; NumType num_t;
union{int ival; float fval; double dval; ComplexNumber cmplx_val}
}
So you're saving memory and what is more important - you avoid any dynamic allocations for probably extreme quantity(if you use a lot of run-time defined numbers) of small objects(compared to implementations through class inheritance/polymorphism). But what's more interesting, you still can use power of C++ polymorphism(if you're fan of double dispatching, for example ;) with this type of struct. Just add "dummy" interface pointer to parent class of all number types as a field of this struct, pointing to this instance instead of/in addition to raw type, or use good old C function pointers.
struct NumberBase
{
virtual Add(NumberBase n);
...
}
struct NumberInt: Number
{
//implement methods assuming Number's union contains int
NumberBase Add(NumberBase n);
...
}
struct NumberDouble: Number
{
//implement methods assuming Number's union contains double
NumberBase Add(NumberBase n);
...
}
//e.t.c. for all number types/or use templates
struct Number: NumberBase{
union{int ival; float fval; double dval; ComplexNumber cmplx_val;}
NumberBase* num_t;
Set(int a)
{
ival=a;
//still kind of hack, hope it works because derived classes of Number dont add any fields
num_t = static_cast<NumberInt>(this);
}
}
so you can use polymorphism instead of type checks with switch(type) - with memory-efficient implementation(no dynamic allocation of small objects) - if you need it, of course.
From http://cplus.about.com/od/learningc/ss/lowlevel_9.htm:
The uses of union are few and far between. On most computers, the size
of a pointer and an int are usually the same- this is because both
usually fit into a register in the CPU. So if you want to do a quick
and dirty cast of a pointer to an int or the other way, declare a
union.
union intptr { int i; int * p; };
union intptr x; x.i = 1000;
/* puts 90 at location 1000 */
*(x.p)=90;
Another use of a union is in a command or message protocol where
different size messages are sent and received. Each message type will
hold different information but each will have a fixed part (probably a
struct) and a variable part bit. This is how you might implement it..
struct head { int id; int response; int size; }; struct msgstring50 { struct head fixed; char message[50]; } struct
struct msgstring80 { struct head fixed; char message[80]; }
struct msgint10 { struct head fixed; int message[10]; } struct
msgack { struct head fixed; int ok; } union messagetype {
struct msgstring50 m50; struct msgstring80 m80; struct msgint10
i10; struct msgack ack; }
In practice, although the unions are the same size, it makes sense to
only send the meaningful data and not wasted space. A msgack is just
16 bytes in size while a msgstring80 is 92 bytes. So when a
messagetype variable is initialized, it has its size field set
according to which type it is. This can then be used by other
functions to transfer the correct number of bytes.
Unions provide a way to manipulate different kind of data in a single area of storage without embedding any machine independent information in the program
They are analogous to variant records in pascal
As an example such as might be found in a compiler symbol table manager, suppose that a
constant may be an int, a float, or a character pointer. The value of a particular constant
must be stored in a variable of the proper type, yet it is most convenient for table management if the value occupies the same amount of storage and is stored in the same place regardless of its type. This is the purpose of a union - a single variable that can legitimately hold any of one of several types. The syntax is based on structures:
union u_tag {
int ival;
float fval;
char *sval;
} u;
The variable u will be large enough to hold the largest of the three types; the specific size is implementation-dependent. Any of these types may be assigned to u and then used in
expressions, so long as the usage is consistent

Is using enum for integer bit oriented operations in C++ reliable/safe?

Consider the following (simplified) code:
enum eTestMode
{
TM_BASIC = 1, // 1 << 0
TM_ADV_1 = 1 << 1,
TM_ADV_2 = 1 << 2
};
...
int m_iTestMode; // a "bit field"
bool isSet( eTestMode tsm )
{
return ( (m_iTestMode & tsm) == tsm );
}
void setTestMode( eTestMode tsm )
{
m_iTestMode |= tsm;
}
Is this reliable, safe and/or good practice? Or is there a better way of achieving what i want to do apart from using const ints instead of enum? I would really prefer enums, but code reliability is more important than readability.
I can't see anything bad in that design.
However, keep in mind that enum types can hold unspecified values. Depending on who uses your functions, you might want to check first that the value of tsm is a valid enumeration value.
Since enums are integer values, one could do something like:
eTestMode tsm = static_cast<eTestMode>(17); // We consider here that 17 is not a valid value for your enumeration.
However, doing this is ugly and you might just consider that doing so results in undefined behavior.
There is no problem. You can even use an variable of eTestMode (and defines bit manipulation for that type) as it is guaranteed to hold all possible values in that case.
See also
What is the size of an enum in C?
For some compilers (e.g. VC++) this non-standard width specifier can be used:
enum eTestMode : unsigned __int32
{
TM_BASIC = 1, // 1 << 0
TM_ADV_1 = 1 << 1,
TM_ADV_2 = 1 << 2
};
Using enums for representing bit patterns, masks and flags is not always a good idea because enums generally promote to signed integer type, while for bit-based operation unsigned types are almost always preferable.