A well supported alternative to c++ unions? - c++

I think that unions can be perfect for what i have in mind and especially when I consider that my code should run on a really heterogeneous family of machines, especially low-powered machine, what is bugging me is the fact that the people who creates compilers doesn't seem to care too much about introducing and offering a good union support, for example this table is practical empty when it comes to Unrestricted Unions support, and this is a real unpleasant view for my projects.
There are alternatives to union that can at least mimic the same properties ?

Union is well supported on most compilers, what's not well supported are unions that contains members that have non trivial constructor (unrestricted unions). In practice, you'd almost always want a custom constructor when creating unions, so not having unrestricted union is more of a matter of inconvenience.
Alternatively, you can always use void pointer pointing to a malloc-ed memory with sufficient size for the largest member. The drawback is you'd need explicit type casting.

One popular alternative to union is Boost.Variant.
The types you use in it have to be copy-constructible.
Update:
C++17 introduced std::variant. Its requirements are based on Boost.Variant. It is modernized to take into account the features in C++17. It does not carry the overhead of being compatible with C++98 compilers (like Boost). It comes for free with standard library. If available it is better to use it as alternative to unions.

You can always do essentially the same thing using explicit casts:
struct YourUnion {
char x[the size of your largest object];
A &field1() { return *(A*)&x[0]; }
int &field2() { return *(int*)&x[0]; }
};
YourUnion y;
new(&y.field1()) A(); // construct A
y.field1().~A(); // destruct A
y.field2() = 1;
// ...
(This is zero overhead compared to, e.g., Boost.Variant.)
EDIT: more union-like and without templates.

Related

Easy way to implement small buffer optimization for arbitrary type erasure (like in std::function.)

I tend to use type erasure technique quite a bit.
It typically looks like this:
class YetAnotherTypeErasure
{
public:
// interface redirected to pImpl
private:
// Adapting function
template ...
friend YetAnotherTypeErasure make_YetAnotherTypeErasure (...);
class Interface {...};
template <typename Adaptee>
class Concrete final : public Interface {
// redirecting Interface to Adaptee
};
std::unique_ptr<Interface> pImpl_; // always on the heap
};
std::function does something similar, but it has a small buffer optimization, so if Concrete<Adaptee> is smaller than smth and has nothrow move operations, it will be stored in it. Is there some generic library solution to do it fairly easy? For enforcing small buffer only storing at compile time? Maybe something has been proposed for standardisation?
I know nothing about the small buffer optimization required by the standard or any proposal, though it is often allowed or encouraged.
Note that some (conditionally) non-throwing requirements on such types effectively require the optimization in practice because alternatives (like non-throwing allocation from emergency buffers) seem insane here.
On the other hand, you can just make your own solution from scratch, based on the standard library (e.g. std::aligned_storage). This may still verbose from the view of users, but not too hard.
Actually I implemented (not proposed then) any with such optimization and some related utilities several years ago. Lately, libstdc++'s implementation of std::experimental::any used the technique almost exactly as this (however, __ prefixed internal names are certainly not good for ordinary library users).
My implementation now uses some common helpers to deal with the storage. These helpers do ease to implement the type erasure storage strategy (at least fit for something similar to any enough). But I am still interested in more general high-level solution to simplify the interface redirecting.
There is also a function implementation based directly on the any implementation above. They support move-only types and sane allocator interface, while std ones not. The function implementation has better performance than std::function in libstdc++ in some cases, thanks to the (partially no-op) default initialization of the underlying any object.
I found a reasonably nice solution for everyday code - use std::function
With tiny library support to help with const correctness,
the code get's down to 20 lines:
https://gcc.godbolt.org/z/GtewFI
I think C++20 polymorphic_value comes closest to what we can do in modern c++: wg21.link/p0201
Basically it's like std::any but all of your types have to inherit the same interface.
It is Semiregular, they decided to drop equality.
This has some overhead: one vptr in the class itself and a separate dispatch mechanism in the polymorphic value. It also has a pointer like interface instead of a value like.
However, considering how easy it is to use it comparing to writing your own type_erased adapter, I'd say for most use-cases would be more than good enough.

Tagged unions (aka variant) in C++ with the same type multiple times

I need to create an union, but 2 members of the union would have the same type, thus I need a way to identify them. For example in OCaml :
type A =
| B of int
| C of float
| D of float
Boost.Variant doesn't seem to support this case, is there a known library which supports that ?
If you want to do this, I think your best option is to wrap the same-but-different-types into a struct which then lets the boost variant visit the proper one:
struct Speed
{
float val_;
};
struct Darkness
{
float val_;
};
You might be able to use BOOST_STRONG_TYPEDEF to do this automatically but I'm not sure it's guaranteed to generate types legal for use in a union (although it would probably be fine in a variant).
You cannot at the moment but C++17's implementation of std::variant fortunately allows it:
A variant is permitted to hold the same type more than once, and to hold differently cv-qualified versions of the same type.
Unlike with the boost version, you can get values by index, something like this (not tested):
// Construct a variant with the second value set.
variant<string, string, string> s(std::in_place_index<1>, "Hello");
// Get the second value.
string first = std::get<1>(s);
Michael Park has written a C++14 implementation of C++17's std::variant.
The c++ code here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/boost/composite_storage/pack/container_one_of_maybe.hpp
is truly a tagged union in that it can contain duplicate types. One nice feature
is the tags can be enumerations; hence, the tags can have meaningful names.
Unfortunately, the compile time cost is pretty bad, I guess, because the implementation
uses recursive inheritance. OTOH, maybe compilers will eventually figure out a way
to lessen the compile time cost.
OTOH, if you want to stick with boost::variant, you could wrap the types,
as Mark B suggested. However, instead of Mark B's descriptive class names,
which require some thought, you could use fusion::pair<mpl::int_<tag>,T_tag>
where T_tag is the tag-th element in the source fusion::vector. IOW:
variant
< fusion::pair<mpl::int_<1>,T1>
, fusion::pair<mpl::int_<2>,T2>
...
, fusion::pair<mpl::int_<n>,Tn>
>
As the fusion docs:
http://www.boost.org/doc/libs/1_55_0/libs/fusion/doc/html/fusion/support/pair.html
say, fusion::pair only allocates space for the 2nd template argument; hence,
this should not take any more room than boost::variant<T1,T2,...,Tn>.
HTH.
-regards,
Larry

Why isn't std::initializer_list a language built-in?

Why isn't std::initializer_list a core-language built-in?
It seems to me that it's quite an important feature of C++11 and yet it doesn't have its own reserved keyword (or something alike).
Instead, initializer_list it's just a template class from the standard library that has a special, implicit mapping from the new braced-init-list {...} syntax that's handled by the compiler.
At first thought, this solution is quite hacky.
Is this the way new additions to the C++ language will be now implemented: by implicit roles of some template classes and not by the core language?
Please consider these examples:
widget<int> w = {1,2,3}; //this is how we want to use a class
why was a new class chosen:
widget( std::initializer_list<T> init )
instead of using something similar to any of these ideas:
widget( T[] init, int length ) // (1)
widget( T... init ) // (2)
widget( std::vector<T> init ) // (3)
a classic array, you could probably add const here and there
three dots already exist in the language (var-args, now variadic templates), why not re-use the syntax (and make it feel built-in)
just an existing container, could add const and &
All of them are already a part of the language. I only wrote my 3 first ideas, I am sure that there are many other approaches.
There were already examples of "core" language features that returned types defined in the std namespace. typeid returns std::type_info and (stretching a point perhaps) sizeof returns std::size_t.
In the former case, you already need to include a standard header in order to use this so-called "core language" feature.
Now, for initializer lists it happens that no keyword is needed to generate the object, the syntax is context-sensitive curly braces. Aside from that it's the same as type_info. Personally I don't think the absence of a keyword makes it "more hacky". Slightly more surprising, perhaps, but remember that the objective was to allow the same braced-initializer syntax that was already allowed for aggregates.
So yes, you can probably expect more of this design principle in future:
if more occasions arise where it is possible to introduce new features without new keywords then the committee will take them.
if new features require complex types, then those types will be placed in std rather than as builtins.
Hence:
if a new feature requires a complex type and can be introduced without new keywords then you'll get what you have here, which is "core language" syntax with no new keywords and that uses library types from std.
What it comes down to, I think, is that there is no absolute division in C++ between the "core language" and the standard libraries. They're different chapters in the standard but each references the other, and it has always been so.
There is another approach in C++11, which is that lambdas introduce objects that have anonymous types generated by the compiler. Because they have no names they aren't in a namespace at all, certainly not in std. That's not a suitable approach for initializer lists, though, because you use the type name when you write the constructor that accepts one.
The C++ Standard Committee seems to prefer not to add new keywords, probably because that increases the risk of breaking existing code (legacy code could use that keyword as the name of a variable, a class, or whatever else).
Moreover, it seems to me that defining std::initializer_list as a templated container is quite an elegant choice: if it was a keyword, how would you access its underlying type? How would you iterate through it? You would need a bunch of new operators as well, and that would just force you to remember more names and more keywords to do the same things you can do with standard containers.
Treating an std::initializer_list as any other container gives you the opportunity of writing generic code that works with any of those things.
UPDATE:
Then why introduce a new type, instead of using some combination of existing? (from the comments)
To begin with, all others containers have methods for adding, removing, and emplacing elements, which are not desirable for a compiler-generated collection. The only exception is std::array<>, which wraps a fixed-size C-style array and would therefore remain the only reasonable candidate.
However, as Nicol Bolas correctly points out in the comments, another, fundamental difference between std::initializer_list and all other standard containers (including std::array<>) is that the latter ones have value semantics, while std::initializer_list has reference semantics. Copying an std::initializer_list, for instance, won't cause a copy of the elements it contains.
Moreover (once again, courtesy of Nicol Bolas), having a special container for brace-initialization lists allows overloading on the way the user is performing initialization.
This is nothing new. For example, for (i : some_container) relies on existence of specific methods or standalone functions in some_container class. C# even relies even more on its .NET libraries. Actually, I think, that this is quite an elegant solution, because you can make your classes compatible with some language structures without complicating language specification.
This is indeed nothing new and how many have pointed out, this practice was there in C++ and is there, say, in C#.
Andrei Alexandrescu has mentioned a good point about this though: You may think of it as a part of imaginary "core" namespace, then it'll make more sense.
So, it's actually something like: core::initializer_list, core::size_t, core::begin(), core::end() and so on. This is just an unfortunate coincidence that std namespace has some core language constructs inside it.
Not only can it work completely in the standard library. Inclusion into the standard library does not mean that the compiler can not play clever tricks.
While it may not be able to in all cases, it may very well say: this type is well known, or a simple type, lets ignore the initializer_list and just have a memory image of what the initialized value should be.
In other words int i {5}; can be equivalent to int i(5); or int i=5; or even intwrapper iw {5}; Where intwrapper is a simple wrapper class over an int with a trivial constructor taking an initializer_list
It's not part of the core language because it can be implemented entirely in the library, just line operator new and operator delete. What advantage would there be in making compilers more complicated to build it in?

C++11 and [17.5.2.1.3] Bitmask Types

The Standard allows one to choose between an integer type, an enum, and a std::bitset.
Why would a library implementor use one over the other given these choices?
Case in point, llvm's libcxx appears to use a combination of (at least) two of these implementation options:
ctype_base::mask is implemented using an integer type:
<__locale>
regex_constants::syntax_option_type is implemented using an enum + overloaded operators:
<regex>
The gcc project's libstdc++ uses all three:
ios_base::fmtflags is implemented using an enum + overloaded operators: <bits/ios_base.h>
regex_constants::syntax_option_type is implemented using an integer type,
regex_constants::match_flag_type is implemented using a std::bitset
Both: <bits/regex_constants.h>
AFAIK, gdb cannot "detect" the bitfieldness of any of these three choices so there would not be a difference wrt enhanced debugging.
The enum solution and integer type solution should always use the same space. std::bitset does not seem to make the guarantee that sizeof(std::bitset<32>) == std::uint32_t so I don't see what is particularly appealing about std::bitset.
The enum solution seems slightly less type safe because the combinations of the masks does not generate an enumerator.
Strictly speaking, the aforementioned is with respect to n3376 and not FDIS (as I do not have access to FDIS).
Any available enlightenment in this area would be appreciated.
The really surprising thing is that the standard restricts it to just three alternatives. Why shouldn't a class type be acceptable? Anyway…
Integral types are the simplest alternative, but they lack type safety. Very old legacy code will tend to use these as they are also the oldest.
Enumeration types are safe but cumbersome, and until C++11 they tended to be fixed to the size and range of int.
std::bitset may be have somewhat more type safety in that bitset<5> and bitset<6> are different types, and addition is disallowed, but otherwise is unsafe much like an integral type. This wouldn't be an issue if they had allowed types derived from std::bitset<N>.
Clearly enums are the ideal alternative, but experience has proven that the type safety is really unnecessary. So they threw implementers a bone and allowed them to take easier routes. The short answer, then, is that laziness leads implementers to choose int or bitset.
It is a little odd that types derived from bitset aren't allowed, but really that's a minor thing.
The main specification that clause provides is the set of operations defined over these types (i.e., the bitwise operators).
My preference is to use an enum, but there are sometimes valid reasons to use an integer. Usually ctype_base::mask interacts with the native OS headers, with a mapping from ctype_base::mask to the <ctype.h> implementation-defined constants such as _CTYPE_L and _CTYPE_U used for isupper and islower etc. Using an integer might make it easier to use ctype_base::mask directly with native OS APIs.
I don't know why libstdc++'s <regex> uses a std::bitset. When that code was committed I made a mental note to replace the integer types with an enumeration at some point, but <regex> is not a priority for me to work on.
Why would the standard allow different ways of implementing the library? And the answer is: Why not?
As you have seen, all three options are obviously used in some implementations. The standard doesn't want to make existing implementations non-conforming, if that can be avoided.
One reason to use a bitset could be that its size fits better than an enum or an integer. Not all systems even have a std::uint32_t. Maybe a bitset<24> will work better there?

Unions as Base Class

The standard defines that Unions cannot be used as Base class, but is there any specific reasoning for this? As far as I understand Unions can have constructors, destructors, also member variables, and methods to operate on those varibales. In short a Union can encapsulate a datatype and state which might be accessed through member functions. Thus it in most common terms qualifies for being a class and if it can act as a class then why is it restricted from acting as a base class?
Edit: Though the answers try to explain the reasoning I still do not understand how Union as a Derived class is worst than when Union as just a class. So in hope of getting more concrete answer and reasoning I will push this one for a bounty. No offence to the already posted answers, Thanks for those!
Tony Park gave an answer which is pretty close to the truth. The C++ committee basically didn't think it was worth the effort to make unions a strong part of C++, similarly to the treatment of arrays as legacy stuff we had to inherit from C but didn't really want.
Unions have problems: if we allow non-POD types in unions, how do they get constructed? It can certainly be done, but not necessarily safely, and any consideration would require committee resources. And the final result would be less than satisfactory, because what is really required in a sane language is discriminated unions, and bare C unions could never be elevated to discriminated unions in way compatible with C (that I can imagine, anyhow).
To elaborate on the technical issues: since you can wrap a POD-component only union in a struct without losing anything, there's no advantage allowing unions as bases. With POD-only union components, there's no problem with explicit constructors simply assigning one of the components, nor with using a bitblit (memcpy) for compiler generated copy constructor (or assignment).
Such unions, however, aren't useful enough to bother with except to retain them so existing C code can be considered valid C++. These POD-only unions are broken in C++ because they fail to retain a vital invariant they possess in C: any data type can be used as a component type.
To make unions useful, we must allow constructable types as members. This is significant because it is not acceptable to merely assign a component in a constructor body, either of the union itself, or any enclosing struct: you cannot, for example, assign a string to an uninitialised string component.
It follows one must invent some rules for initialising union component with mem-initialisers, for example:
union X { string a; string b; X(string q) : a(q) {} };
But now the question is: what is the rule? Normally the rule is you must initialise every member and base of a class, if you do not do so explicitly, the default constructor is used for the remainder, and if one type which is not explicitly initialised does not have a default constructor, it's an error [Exception: copy constructors, the default is the member copy constructor].
Clearly this rule can't work for unions: the rule has to be instead: if the union has at least one non-POD member, you must explicitly initialise exactly one member in a constructor. In this case, no default constructor, copy constructor, assignment operator, or destructor will be generated and if any of these members are actually used, they must be explicitly supplied.
So now the question becomes: how would you write, say, a copy constructor? It is, of course quite possible to do and get right if you design your union the way, say, X-Windows event unions are designed: with the discriminant tag in each component, but you will have to use placement operator new to do it, and you will have to break the rule I wrote above which appeared at first glance to be correct!
What about default constructor? If you don't have one of those, you can't declare an uninitialised variable.
There are other cases where you can determine the component externally and use placement new to manage a union externally, but that isn't a copy constructor. The fact is, if you have N components you'd need N constructors, and C++ has a broken idea that constructors use the class name, which leaves you rather short of names and forces you to use phantom types to allow overloading to choose the right constructor .. and you can't do that for the copy constructor since its signature is fixed.
Ok, so are there alternatives? Probably, yes, but they're not so easy to dream up, and harder to convince over 100 people that it's worthwhile to think about in a three day meeting crammed with other issues.
It is a pity the committee did not implement the rule above: unions are mandatory for aligning arbitrary data and external management of the components is not really that hard to do manually, and trivial and completely safe when the code is generated by a suitable algorithm, in other words, the rule is mandatory if you want to use C++ as a compiler target language and still generate readable, portable code. Such unions with constructable members have many uses but the most important one is to represent the stack frame of a function containing nested blocks: each block has local data in a struct, and each struct is a union component, there is no need for any constructors or such, the compiler will just use placement new. The union provides alignment and size, and cast free component access.
[And there is no other conforming way to get the right alignment!]
Therefore the answer to your question is: you're asking the wrong question. There's no advantage to POD-only unions being bases, and they certainly can't be derived classes because then they wouldn't be PODs. To make them useful, some time is required to understand why one should follow the principle used everywhere else in C++: missing bits aren't an error unless you try to use them.
Union is a type that can be used as any one of its members depending on which member has been set - only that member can be later read.
When you derive from a type the derived type inherits the base type - the derived type can be used wherever the base type could be. If you could derive from a union the derived class could be used (not implicitly, but explicitly through naming the member) wherever any of the union members could be used, but among those members only one member could be legally accessed. The problem is the data on which member has been set is not stored in the union.
To avoid this subtle yet dangerous contradiction that in fact subverts a type system deriving from a union is not allowed.
Bjarne Stroustrup said 'there seems little reason for it' in The Annotated C++ Reference Manual.
The title asks why unions can't be a base class, but the question appears to be about unions as a derived class. So, which is it?
There's no technical reason why unions can't be a base class; it's just not allowed. A reasonable interpretation would be to think of the union as a struct whose members happen to potentially overlap in memory, and consider the derived class as a class that inherits from this (rather odd) struct. If you need that functionality, you can usually persuade most compilers to accept an anonymous union as a member of a struct. Here's an example, that's suitable for use as a base class. (And there's an anonymous struct in the union for good measure.)
struct V3 {
union {
struct {
float x,y,z;
};
float f[3];
};
};
The rationale for unions as a derived class is probably simpler: the result wouldn't be a union. Unions would have to be the union of all their members, and all of their bases. That's fair enough, and might open up some interesting template possibilities, but you'd have a number of limitations (all bases and members would have to be POD -- and would you be able to inherit twice, because a derived type is inherently non-POD?), this type of inheritance would be different from the other type the language sports (OK, not that this has stopped C++ before) and it's sort of redundant anyway -- the existing union functionality would do just as well.
Stroustrup says this in the D&E book:
As with void *, programmers should know that unions ... are inherently dangerous, should be avoided wherever possible, and should be handled with special care when actually needed.
(The elision doesn't change the meaning.)
So I imagine the decision is arbitrary, and he just saw no reason to change the union functionality (it works fine as-is with the C subset of C++), and so didn't design any integration with the new C++ features. And when the wind changed, it got stuck that way.
I think you got the answer yourself in your comments on EJP's answer.
I think unions are only included in C++ at all in order to be backwards compatible with C. I guess unions seemed like a good idea in 1970, on systems with tiny memory spaces. By the time C++ came along I imagine unions were already looking less useful.
Given that unions are pretty dangerous anyway, and not terribly useful, the vast new opportunities for creating bugs that inheriting from unions would create probably just didn't seem like a good idea :-)
Here's my guess for C++ 03.
As per $9.5/1, In C++ 03, Unions can not have virtual functions. The whole point of a meaningful derivation is to be able to override behaviors in the derived class. If a union cannot have virtual functions, that means that there is no point in deriving from a union.
Hence the rule.
You can inherit the data layout of a union using the anonymous union feature from C++11.
#include <cstddef>
template <size_t N,typename T>
struct VecData {
union {
struct {
float x;
float y;
float z;
};
float a[N];
};
};
template <size_t N, typename T>
class Vec : public VecData<N,T> {
//methods..
};
In general its almost always better not work with unions directly but enclose them within a struct or class. Then you can base your inheritance off the struct outer layer and use unions within if you need to.