Is Boost Variant essentially a union in C/C++? - c++

I'm wondering what the differences are between a Boost Variant and a union data type in C/C++. I know that all members of a union share the same memory location, and that the union is as large as its largest member, e.g.
union space {
char CHAR;
float FLOAT;
int INTEGER;
}S;
should occupy 4 bytes of memory, since int and float are the largest members and are the same size (4 bytes on typical platforms). Are there similarities and differences in other ways between Boost Variant and union data types?
I also know that a Boost Variant can take any data type and that it allows data type "polymorphism" (correct me if I'm misusing an OOP term). Is a union data type therefore a kind of polymorphism as well?
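The size claim in the question can be checked at compile time. The following is only a sketch; the names Space and store_and_load_int are invented for illustration, and the exact size of the union depends on the platform:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Same members as the union in the question; the names are illustrative.
union Space {
    char  c;
    float f;
    int   i;
};

// A union is at least as large as its largest member...
static_assert(sizeof(Space) >= std::max({sizeof(char), sizeof(float), sizeof(int)}),
              "union must be able to hold its largest member");
// ...and at least as strictly aligned as its most-aligned member.
static_assert(alignof(Space) >= std::max({alignof(char), alignof(float), alignof(int)}),
              "union must satisfy the alignment of every member");

// The union does not record which member was written last; the caller has to
// remember it. Writing and reading the same member, as here, is well defined.
inline int store_and_load_int(int value) {
    Space s;
    s.i = value;
    return s.i;
}
```

Reading a member other than the one most recently written is where the undefined behaviour usually creeps in, because nothing in the union itself can tell you which member is live.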

The primary difference is that Boost's Variant knows which type is stored in it, so you can't make mistakes or get UB from misusing a Variant in the same way you can a union. This also permits Variant to take non-POD (i.e. actually useful) types. Variant also has a few extra tricks like permitting visitors and recursive variants.
The best guide to using unions is "Don't, because it's almost impossible to put them to good use without invoking UB". This does not apply to Variant, so it's a lot safer to recommend.

Boost variant emulates a union but it does not use a union in its implementation. Instead it uses aligned storage and placement new.
It is polymorphic in the sense that if you apply a visitor object on a variant then it will pick the right overload for you. This selection must happen at runtime, but the object code for this is unrolled at compile time. So it's quite fast.
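To make "aligned storage and placement new" concrete, here is a rough sketch of a hand-written two-alternative variant. It is not how Boost.Variant is actually written; the class IntOrString and its members are hypothetical, and copying is disabled to keep the example short (C++17 assumed for std::launder and std::destroy_at):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical two-alternative "variant" holding either an int or a std::string.
class IntOrString {
public:
    explicit IntOrString(int v) : is_string_(false) {
        ::new (static_cast<void*>(storage_)) int(v);  // placement new into the buffer
    }
    explicit IntOrString(std::string v) : is_string_(true) {
        ::new (static_cast<void*>(storage_)) std::string(std::move(v));
    }
    IntOrString(const IntOrString&) = delete;
    IntOrString& operator=(const IntOrString&) = delete;

    ~IntOrString() {
        // Only the std::string alternative has a non-trivial destructor.
        if (is_string_) std::destroy_at(&as_string());
    }

    bool is_string() const { return is_string_; }

    int& as_int() {
        assert(!is_string_);  // the stored tag is what a bare union lacks
        return *std::launder(reinterpret_cast<int*>(storage_));
    }
    std::string& as_string() {
        assert(is_string_);
        return *std::launder(reinterpret_cast<std::string*>(storage_));
    }

private:
    static constexpr std::size_t kSize =
        sizeof(std::string) > sizeof(int) ? sizeof(std::string) : sizeof(int);
    // Raw buffer, large enough and suitably aligned for either alternative.
    alignas(std::string) alignas(int) unsigned char storage_[kSize];
    bool is_string_;
};
```

The discriminator is_string_ is the piece that lets the class call the right destructor and reject wrong accesses, which is why non-POD alternatives are manageable here.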

Related

C++ std::variant vs std::any

C++17 introduces std::variant and std::any, both able to store values of different types in a single object. To me they seem somewhat similar (are they?).
std::variant also restricts the types it can hold. Beyond that restriction, why should we prefer std::variant over std::any, which is simpler to use?
The more things you check at compile time the fewer runtime bugs you have.
variant guarantees that it contains one of a list of types (plus valueless by exception). It provides a way for you to guarantee that code operating on it considers every case in the variant with std::visit; even every case for a pair of variants (or more).
any does not. With any the best you can do is "if the type isn't exactly what I ask for, some code won't run".
variant exists in automatic storage. any may use the free store; this means any has performance and noexcept(false) issues that variant does not.
Checking for which of N types is in it is O(N) for an any -- for variant it is O(1).
any is a dressed-up void*. variant is a dressed-up union.
any cannot store types that are not copy-constructible, such as move-only types. variant can.
The type of variant is documentation for the reader of your code.
Passing a variant<Msg1, Msg2, Msg3> through an API makes the operation obvious; passing an any there means understanding the API requires reliable documentation or reading the implementation source.
Anyone who has been frustrated by statically typeless languages will understand the dangers of any.
Now this doesn't mean any is bad; it just doesn't solve the same problems as variant. As a copyable object for type erasure purposes, it can be great. Runtime dynamic typing has its place; but that place is not "everywhere" but rather "where you cannot avoid it".
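A small sketch of the contrast described above. The alias Value and the functions describe and describe_any are made up for illustration:

```cpp
#include <any>
#include <cassert>
#include <string>
#include <variant>

using Value = std::variant<int, std::string>;

// One overload per alternative. If an alternative had no matching overload,
// the std::visit call below would fail to compile.
struct Describe {
    std::string operator()(int i) const { return "int:" + std::to_string(i); }
    std::string operator()(const std::string& s) const { return "string:" + s; }
};

inline std::string describe(const Value& v) {
    return std::visit(Describe{}, v);
}

// With std::any the code has to guess types one by one; anything it did not
// guess is only discovered at run time.
inline std::string describe_any(const std::any& a) {
    if (const int* i = std::any_cast<int>(&a)) return "int:" + std::to_string(*i);
    if (const std::string* s = std::any_cast<std::string>(&a)) return "string:" + *s;
    return "unknown";
}
```

Passing a double to describe_any compiles and silently lands in the "unknown" branch, whereas a double cannot be put into Value in the first place.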
The difference is that the objects are stored within the memory allocated by std::variant:
cppreference.com - std::variant
As with unions, if a variant holds a value of some object type T, the object representation of T is allocated directly within the object representation of the variant itself. Variant is not allowed to allocate additional (dynamic) memory.
std::any offers no such guarantee and may allocate dynamically.
Because of that, a std::variant needs storage only for the std::variant object itself, and it can live entirely on the stack.
In addition to never using additional heap memory, variant has one other advantage:
You can std::visit a variant, but not any.

"Non-pointer POD" type in C++

Is there a term for a class/struct that is both trivial and standard-layout but also has no pointer members?
Basically I'd like to refer to "really" plain-old-data types. Data that I can grab from memory and store on disk, and read back into memory for later processing because it is nothing more than a collection of ints, characters, enums, etc.
Is there a way to test at compile time if a type is a "really" plain-old-data type?
related:
What are POD types in C++?
What are Aggregates and PODs and how/why are they special?
This can depend on the semantics of the structure. I could imagine a struct whose int fields are keys into some volatile temporary data store (or cache). You still shouldn't serialize those, but you need internal knowledge about that struct to be able to tell¹.
In general, C++ lacks features for generic serialization. Detecting pointers automatically is just the tip of the iceberg (though possibly a fairly accurate heuristic in general), and it is also impossible to do generically: C++ still has no reflection, and thus no way to check "every member" for some condition.
The realistic approaches could be:
preprocessing the class sources before build to scan for pointers
declaring all structs that are to be serialized with some macros that track the types
the regular template check could be implemented for a set of known names for fields
All of those have their limitations, though, and together with my earlier reservations, I'm not sure how practical they'd be.
¹ This of course goes both ways; pointers could be used to store relative offsets, and thus be perfectly serializable.
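For reference, the part that can be tested at compile time is "trivial and standard-layout". A sketch using the standard type traits; is_plain_data_v and the three structs are hypothetical names, and the second assertion shows that a pointer member goes undetected:

```cpp
#include <type_traits>

// Closest standard approximation of "plain old data":
// trivial and standard-layout. It says nothing about pointer members.
template <class T>
inline constexpr bool is_plain_data_v =
    std::is_trivial_v<T> && std::is_standard_layout_v<T>;

struct Sample      { int id; char tag; };            // hypothetical
struct WithPointer { int id; const char* name; };    // hypothetical
struct WithVirtual { virtual ~WithVirtual() = default; int id; };

static_assert(is_plain_data_v<Sample>, "plain aggregate passes");
static_assert(is_plain_data_v<WithPointer>,
              "also passes: the traits cannot see the pointer member");
static_assert(!is_plain_data_v<WithVirtual>, "virtual functions are rejected");
```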

Where to use std::variant over union?

Please explain the difference between union and std::variant, and why std::variant was introduced into the standard. In what situations should we use std::variant over the old-school union?
Generally speaking, you should prefer variant unless one of the following comes up:
You're cheating. You're doing type-punning or other things that are UB but you're hoping your compiler won't break your code.
You're doing some of the pseudo-punnery that C++ unions are allowed to do: conversion between layout-compatible types or between common initial sequences.
You explicitly need layout compatibility. variant<Ts...> is not required to have any particular layout; unions of standard layout types are standard layout.
You need low-level support for in-place switching of objects. Using a memory buffer for such things doesn't provide the trivial copying guarantees that you could get out of a union.
The basic difference between the two is that variant knows which type it stores, while union expects you to keep track of that externally. So if you try to access the wrong item in a variant, you get an exception (from std::get) or a nullptr (from std::get_if). By contrast, doing so with a union is merely undefined behavior.
union is a lower-level tool, and thus should only be used when you absolutely need that lower-level.
variant also has machinery for doing visitation, which means you get to avoid having a bunch of if statements where you ask "if it is type X, do this. If it is type Y, do that, etc".
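The "exception or nullptr" behaviour can be sketched as follows. The alias Item and the helper functions are invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <variant>

using Item = std::variant<int, std::string>;

// std::get on the wrong alternative throws std::bad_variant_access.
inline bool wrong_get_throws(const Item& v) {
    try {
        (void)std::get<std::string>(v);
    } catch (const std::bad_variant_access&) {
        return true;
    }
    return false;
}

// std::get_if returns a null pointer instead of throwing.
inline bool holds_int(const Item& v) {
    return std::get_if<int>(&v) != nullptr;
}
```

With a plain union, the equivalent wrong access would compile and run without any diagnostic, which is exactly the case the answer calls undefined behavior.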

How to type-pun Boost quantity arrays to the underlying type?

I'm building a dynamic animation & rendering system and I would like to use Boost.Units for representing physical quantities to get the nice dimensional safety. However, I will have to pass arrays of quantities around to functions which know nothing about Boost, such as:
OpenGL buffer-filling commands. These simply take a const void * and expect to find an array of either float or double values when dereferencing it. They read the data.
Linear algebra functions (such as gemm or gesv) from different implementations of BLAS and LAPACK. These generally take either a float * or double * to a given array. They both read and write to the data.
I know that boost::units::quantity<U, T> has a const T& value() member which gives direct reference access to the contained T value. I have also verified that a boost::units::quantity<U, T> is a standard-layout struct with exactly one non-static data member, of type T.
So, let's assume that for a boost::units::quantity<U, T> q, the following holds:
static_cast<const void*>(&q) == static_cast<const void*>(&q.value())
sizeof(q) == sizeof(T)
My question is: given an array boost::units::quantity<U, T> a[100];, is it safe to:
Pass &a[0].value() to a function which expects to read an array of 100 objects of type T at the address?
Pass reinterpret_cast<T*>(&a[0]) to a function which will write 100 sequential values of type T at the address?
I am well aware this is probably Undefined Behaviour, but right now I have to follow the "Practicality beats purity"(1) principle. Even if this is UB, is it one which will do the expected thing, or will it bite in unforeseen ways? Since this might be compiler-specific: I need this for modern MSVC (from VS 2015).
And if this is not safe, is there a way to actually do this safely? With "this" referring to "using Boost.Units with OpenGL and with number crunchers which only have a C interface," without unnecessarily copying data.
(1) Adapted from the Zen of Python.
Yes, this looks like something you can do.
There's one thing you didn't mention and should be added to the list of conditions to check, though: the alignment of the wrapped quantity type should match that of the underlying type (see alignof).
So, in practice I'd write code like this only with a number of static_asserts¹ that guard the assumptions that make the re-interpretation valid.
If you add the assertion that T is the same as remove_cv_t<remove_reference_t<decltype(q.value())>>, this should be reliable.
With these precautions in place there should not be UB, just IB (implementation-defined behaviour) due to the semantics of reinterpret_cast on your particular platform.
¹ and perhaps the debug assert that &q.value() == &q
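The guards described above might look like the sketch below. Wrapped<T> is a hypothetical stand-in for boost::units::quantity<U, T> so that the example needs no Boost headers; layout_matches_v, sum_raw and sum_wrapped are likewise invented. The assertions only check the layout assumptions; they do not make the array reinterpretation blessed by the standard:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for boost::units::quantity<U, T>: a standard-layout
// wrapper with exactly one non-static data member of type T.
template <class T>
class Wrapped {
public:
    constexpr explicit Wrapped(T v = T()) : value_(v) {}
    constexpr const T& value() const { return value_; }
private:
    T value_;
};

// The assumptions that the reinterpretation relies on, collected in one place.
template <class Q, class T>
inline constexpr bool layout_matches_v =
    std::is_standard_layout_v<Q> &&
    sizeof(Q) == sizeof(T) &&
    alignof(Q) == alignof(T) &&
    std::is_same_v<T, std::remove_cv_t<std::remove_reference_t<
                          decltype(std::declval<const Q&>().value())>>>;

static_assert(layout_matches_v<Wrapped<double>, double>,
              "Wrapped<double> must be layout-identical to double");

// Stands in for a C API that reads n doubles through a plain pointer.
inline double sum_raw(const double* p, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Hands the wrapper array to the raw-pointer function without copying.
inline double sum_wrapped(const Wrapped<double>* a, std::size_t n) {
    // Debug check that the value sits at the start of the wrapper.
    assert(static_cast<const void*>(a) == static_cast<const void*>(&a[0].value()));
    return sum_raw(&a[0].value(), n);
}
```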

Requirements on standard library allocator pointer types

I am trying to write a quadtree sparse matrix class. In short, a quadtree_matrix<T> is either the zero matrix or a quadruple (ne, nw, se, sw) of quadtree_matrix<T>.
I'd like eventually to test different allocation schemes since this will probably impact the performance of linear algebra operations. So I will also template quadtree_matrix on a standard allocator type, so that I can reuse existing allocators.
I will have to allocate two different kinds of data: either a T, or a node, which contains four pointers (to either T or node). For all the algorithms I will consider, I know for sure what kind of data to expect because I know what are the sizes of the submatrices I am facing at any point of the algorithm (I don't even need to store these sizes).
I will of course be using two different allocators: this is ok, since allocator types provide the rebind template and a template copy constructor (and are intended to be used as value types, as the get_allocator members of standard containers suggest by returning a copy).
The problem is that allocator member functions use a certain pointer type, which is not required to be a vanilla pointer. Some allocators (boost interprocess allocators) use this feature extensively.
If the allocator pointer types were garden variety pointers, I would have no problems: at the very least, I could use pointers to void and reinterpret_cast them to the right type (either node* or T*). I could also use a union (probably better).
As far as I know, there is no requirement on the PODness of the allocator::pointer types. They are only required to be random access iterators.
Now, my question is:
Given an allocator class template A<T> (or its equivalent A::rebind<T>::other), is there any guarantee on:
The ability to static cast A<T>::pointer to A<U>::pointer provided U is an accessible base of T ?
The ability to static cast A<T>::pointer to A<U>::pointer provided T is an accessible base of U and the "runtime type" (whatever this means in this context) of the castee is U ?
The type A<void>::pointer (if this makes sense) ?
Or is there a solution to my problem I didn't think about ?
The tables in 20.1.5/2 clearly indicate that the type of A<T>::pointer must be "pointer to T". Since those pointer types are normally convertible, your 1 and 2 are true. It follows then that A<void>::pointer must be void*.
EDIT:
There's also explicit wording in 20.1.5/4 (it applies to what standard containers may assume about allocators):
The typedef members pointer, const_pointer, size_type, and difference_type are required to be T*, T const*, size_t, and ptrdiff_t, respectively.
No, not really.
There is a requirement that A<T>::pointer is convertible to A<T>::const_pointer and A<T>::void_pointer but that is about all I can find.
A<void>::pointer is likely to be void*, unless you have some fancy special memory.
Note that even if a union is usable, I would still not use one, especially because here you could probably benefit from some form of automatic memory management (in order for your container not to leak), which requires fancy classes.
I would therefore recommend a two-step approach:
Write a small smart pointer that uses a given allocator to perform the destruction (instead of delete)
Use boost::variant on your pointers
This way you have both automatic memory management and compactness.
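A rough sketch of that two-step approach, with two substitutions stated plainly: std::unique_ptr with a custom deleter plays the role of the "small smart pointer", and std::variant (C++17) stands in for boost::variant. AllocDeleter, AllocPtr, allocate_unique, Leaf, Node and Child are all hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

// Deleter that destroys and deallocates through an allocator instead of delete.
template <class Alloc>
struct AllocDeleter {
    using Traits  = std::allocator_traits<Alloc>;
    using pointer = typename Traits::pointer;  // std::unique_ptr picks this up
    Alloc alloc;
    void operator()(pointer p) {
        Traits::destroy(alloc, std::addressof(*p));
        Traits::deallocate(alloc, p, 1);
    }
};

template <class T, class Alloc>
using AllocPtr = std::unique_ptr<T, AllocDeleter<Alloc>>;

// Allocates and constructs one T through the allocator.
template <class T, class Alloc, class... Args>
AllocPtr<T, Alloc> allocate_unique(Alloc alloc, Args&&... args) {
    using Traits = std::allocator_traits<Alloc>;
    auto p = Traits::allocate(alloc, 1);
    try {
        Traits::construct(alloc, std::addressof(*p), std::forward<Args>(args)...);
    } catch (...) {
        Traits::deallocate(alloc, p, 1);  // do not leak if construction throws
        throw;
    }
    return AllocPtr<T, Alloc>(p, AllocDeleter<Alloc>{alloc});
}

// Hypothetical quadtree pieces: a child is empty (zero matrix), a leaf, or a node.
struct Node;
using Leaf    = double;
using LeafPtr = AllocPtr<Leaf, std::allocator<Leaf>>;
using NodePtr = AllocPtr<Node, std::allocator<Node>>;
using Child   = std::variant<std::monostate, LeafPtr, NodePtr>;

struct Node {
    Child ne, nw, se, sw;
};
```

The variant records whether a child is a leaf or a node, so no cast from a void-like pointer is needed, and destroying the root releases the whole tree through the allocators.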