Is boost::variant rocket science? (And should I therefore avoid it for simple problems?) - c++

OK, so I have this tiny little corner of my code where I'd like my function return either of (int, double, CString) to clean up the code a bit.
So I think: No problem to write a little union-like wrapper struct with three members etc. But wait! Haven't I read of boost::variant? Wouldn't this be exactly what I need? This would save me from messing around with a wrapper struct myself! (Note that I already have the boost library available in my project.)
So I fire up my browser, navigate to Chapter 28. Boost.Variant and lo and behold:
The variant class template is a safe, generic, stack-based discriminated union container, offering a simple solution for manipulating an object from a heterogeneous set of types [...]
Great! Exactly what I need!
But then it goes on:
Boost.Variant vs. Boost.Any
Boost.Any makes little use of template metaprogramming techniques (avoiding potentially hard-to-read error messages and significant compile-time processor and memory demands).
[...]
Troubleshooting
"Internal heap limit reached" -- Microsoft Visual C++ -- The compiler option /ZmNNN can increase the memory allocation limit. The NNN is a scaling percentage (i.e., 100 denotes the default limit). (Try /Zm200.)
[...]
Uh oh. So using boost::variant may significantly increase compile-time and generate hard-to-read error messages. What if someone moves my use of boost::variant to a common header, will our project suddenly take lots longer to compile? Am I introducing an (unnecessarily) complex type?
Should I use boost::variant for my simple tiny problem?

Generally, use boost::variant if you do want a discriminated union (any is for unknown types -- think of it as some kind of equivalent to how void* is used in C).
Some advantages include exception handling, potential usage of less space than the sum of the type sizes, type discriminated "visiting". Basically, stuff you'd want to perform on the discriminated union.
However, for boost::variant to be efficient, at least one of the types used must be "easily" constructed (read the documentation for more details on what "easily" means).

Boost.variant is not that complex, IMHO. Yes, it is template based, but it doesn't use any really complex feature of C++. I've used quite a bit and no problem at all. I think in your case it would help better describing what your code is doing.
Another way of thinking is transforming what that function returns into a more semantically rich structure/class that allows interpreting which inner element is interesting, but that depends on your design.

This kind of boost element comes from functional programming, where you have variants around every corner.
It should be a way to have a type-safe approach to returning a kind of value that can be of many precise types. This means that is useful to solve your problem BUT you should consider if it's really what you need to do.
The added value compared to other approaches that tries to solve the same problem should be the type-safety (you won't be able to place whatever you want inside a variant without noticing, in opposition to a void*)

I don't use it because, to me, it's a symptom of bad design.
Either your method should return an object that implements a determinated interface or it should be split in more than one method. Design should be reviewed, anyway.

Related

What is the purpose of boost::fusion?

Ive spent the day reading notes and watching a video on boost::fusion and I really don't get some aspects to it.
Take for example, the boost::fusion::has_key<S> function. What is the purpose of having this in boost::fusion? Is the idea that we just try and move as much programming as possible to happen at compile-time? So pretty much any boost::fusion function is the same as the run-time version, except it now evaluates at compile time? (and we assume doing more at compile-time is good?).
Related to boost::fusion, i'm also a bit confused why metafunctions always return types. Why is this?
Another way to look at boost::fusion is to think of it as "poor man introspection" library. The original motivation for boost::fusion comes from the direction of boost::spirit parser/generator framework, in particular the need to support what is called "parser attributes".
Imagine, you've got a CSV string to parse:
aaaa, 1.1
The type, this string parses into, can be described as "tuple of string and double". We can define such tuples in "plain" C++, either with old school structs (struct { string a; double b; } or newer tuple<string, double>). The only thing we miss is some sort of adapter, which will allow to pass tuples (and some other types) of arbitrary composition to a unified parser interface and expect it to make sense of it without passing any out of band information (such as string parsing templates used by scanf).
That's where boost::fusion comes into play. The most straightforward way to construct a "fusion sequence" is to adapt a normal struct:
struct a {
string s;
double d;
};
BOOST_FUSION_ADAPT_STRUCT(a, (string, s)(double, d))
The "ADAPT_STRUCT" macro adds the necessary information for parser framework (in this example) to be able to "iterate" over members of struct a to the tune of the following questions:
I just parsed a string. Can I assign it to first member of struct a?
I just parsed a double. Can I assign it to second member of struct a?
Are there any other members in struct a or should I stop parsing?
Obviously, this basic example can be further extended (and boost::fusion supplies the capability) to address much more complex cases:
Variants - let's say parser can encounter either sting or double and wants to assign it to the right member of struct a. BOOST_FUSION_ADAPT_ASSOC_STRUCT comes to the rescue (now our parser can ask questions like "which member of struct a is of type double?").
Transformations - our parser can be designed to accept certain types as parameters but the rest of the programs had changed quite a bit. Yet, fusion metafunctions can be conveniently used to adapt new types to old realities (or vice versa).
The rest of boost::fusion functionality naturally follows from the above basics. fusion really shines when there's a need for conversion (in either direction) of "loose IO data" to strongly typed/structured data C++ programs operate upon (if efficiency is of concern). It is the enabling factor behind spirit::qi and spirit::karma being such an efficient (probably the fastest) I/O frameworks .
Fusion is there as a bridge between compile-time and run-time containers and algorithms. You may or may not want to move some of your processing to compile-time, but if you do want to then Fusion might help. I don't think it has a specific manifesto to move as much as possible to compile-time, although I may be wrong.
Meta-functions return types because template meta-programming wasn't invented on purpose. It was discovered more-or-less by accident that C++ templates can be used as a compile-time programming language. A meta-function is a mapping from template arguments to instantiations of a template. As of C++03 there were are two kinds of template (class- and function-), therefore a meta-function has to "return" either a class or a function. Classes are more useful than functions, since you can put values etc. in their static data members.
C++11 adds another kind of template (for typedefs), but that is kind of irrelevant to meta-programming. More importantly for compile-time programming, C++11 adds constexpr functions. They're properly designed for the purpose and they return values just like normal functions. Of course, their input is not a type, so they can't be mappings from types to something else in the way that templates can. So in that sense they lack the "meta-" part of meta-programming. They're "just" compile-time evaluation of normal C++ functions, not meta-functions.

How is is_standard_layout useful?

From what I understand, standard layout allows three things:
Empty base class optimization
Backwards compatibility with C with certain pointer casts
Use of offsetof
Now, included in the library is the is_standard_layout predicate metafunction, but I can't see much use for it in generic code as those C features I listed above seem extremely rare to need checking in generic code. The only thing I can think of is using it inside static_assert, but that is only to make code more robust and isn't required.
How is is_standard_layout useful? Are there any things which would be impossible without it, thus requiring it in the standard library?
General response
It is a way of validating assumptions. You wouldn't want to write code that assumes standard layout if that wasn't the case.
C++11 provides a bunch of utilities like this. They are particularly valuable for writing generic code (templates) where you would otherwise have to trust the client code to not make any mistakes.
Notes specific to is_standard_layout
It looks to me like the (pseudo code) definition of is_pod would roughly be...
// note: applied recursively to all members
bool is_pod(T) { return is_standard_layout(T) && is_trivial(T); }
So, you need to know is_standard_layout in order to implement is_pod. Given that, we might as well expose is_standard_layout as a tool available to library developers. Also of note: if you have a use-case for is_pod, you might want to consider the possibility that is_standard_layout might actually be a better (more accurate) choice in that case, since POD is essentially a subset of standard layout.
I get the feeling that they added every conceivable variant of type evaluation, regardless of any obvious value, just in case someone might encounter a need sometime before the next standard comes out. I doubt if piling on these "extra" type properties adds a significant additional burden to compiler developers.
There is a nice discussion of standard layout here: Why is C++11's POD "standard layout" definition the way it is?
There is also a lot of good detail at cppreference.com: Non-static data members

Creating serializeable unique compile-time identifiers for arbitrary UDT's

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
for example:
unique_id<my_type>::value == 0 // true
unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement this with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo are used, then the unique values are obviously not compile-time constants and extra overhead is present. An ad-hoc solution to this problem would be, instantiating all of the unique_id's in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is not different than using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals, unfortunately, as far as I know, no compiler supports them at this moment. The syntax would be 'my_type'_id; 'other_type'_id; with udl's.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.
A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language into nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.
Persistence of data is a very interesting problem.
My first question would be: do you really want serialization ? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
// static detection
template <typename T>
size_t unique_id()
{
static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
return id;
}
// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order of initialization issue, in case this value is required before main is entered
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic reference is pretty hard (some framework handle it)
and then there is the issue of moving information from one to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and proposes the awesome Google Proto Buffer library:
Backward and Forward compatibility baked-in
Several format outputs -> human readable (for debug) or binary
Several languages can read/write the same messages (C++, Java, Python)
Pretty sure that you will have to implement your own extension to make this happen, I've not seen nor heard of any such construct for compile-time. MSVC offers __COUNTER__ for the preprocessor but I know of no template equivalent.

How to explain C++ templates to junior developers?

One could break the question into two: how to read and to write templated code.
It is very easy to say, "it you want an array of doubles, write std::vector<double>", but it won't teach them how the templates work.
I'd probably try to demonstrate the power of templates, by demonstrating the annoyance of not using them.
A good demonstration would be to write something simple like a stack of doubles (hand-written, not STL), with methods push, pop, and foldTopTwo, which pops off and adds together the top two values in the stack, and pushes the new value back on.
Then tell them to do the same for ints (or whatever, just some different numeric type).
Then show them how, by writing this stack as a template, you can significantly reduce the number of lines of code, and all of that horrible duplication.
There is a saying: "If you can't explain it, you don't understand it."
You can break it down further: How to write code that uses templated code, and how to write code that provides a templated service to others.
The basic explanation is that templates generated code based on a template. That is the source of the term "meta programming". It is programming how programming should be done.
The essential complexity of a vector is not that it is a vector of doubles (or type T), but that it is a vector. The basic structure is the same and templates separate that which is consistent from that which is not.
Further explanation depends on how much of that makes sense to you!
IMHO it is best to explain them as (very) fancy macros. They just work at higher level than C-style text substitution macros.
I found it very instructive to look at duck-typed languages. It doesn't matter, there, what type of argument you give a function, as long as they offer the right interface.
Templates allows to do the same thing: you can take any type, as long as the right interface is present. The additional benefit over duck-typing is, that the interface is checked at compile-time.
Present them as advanced macros. It's a programming language on its own that is executed during compliation.
I would get them to implement something themselves, then experiment with different variations until they understand it. Learning by doing is almost always the better option with programming.
For example, get them to make a template which compares two values and returns the higher one. Then have them see has passing ints or doubles or whatever still allows it to work. Then get them to tweak the the code / copy it and have it return the minimum value. Again, experiment with variations - will the template allow them to pass an int and a double, or will it complain?
From there, you can have them pass in arrays of whatever type (int, double etc), and have it sort the array from highest to lowest, again encouraging experimentation. From there, start to move into templated class definitions, using the same kind of ideas but on a larger scale. This is pretty much how I learnt about templates, ending up with complex array manipulation classes for generic types.
When I was teaching myself C++ I used this site a lot. It explains templates in depth and very well. I would recommend having them read that and try implementing something simple.
For a shorter explanation: Templates are frameworks for complicated constructs that act on data without having to know what that data is. Give them some examples of a simple template (like a linked-list) and walk through how the template is used to generate the final class.
You can tell that a template is a half-written source with parameters to be filled while instatiating the template.

Valid use of typedef?

I have a char (ie. byte) buffer that I'm sending over the network. At some point in the future I might want to switch the buffer to a different type like unsigned char or short. I've been thinking about doing something like this:
typedef char bufferElementType;
And whenever I do anything with a buffer element I declare it as bufferElementType rather than char. That way I could switch to another type by changing this typedef (of course it wouldn't be that simple, but it would at least be easy to identify the places that need to be modified... there'll be a bufferElementType nearby).
Is this a valid / good use of typedef? Is it not worth the trouble? Is it going to give me a headache at some point in the future? Is it going to make maintainance programmers hate me?
I've read through When Should I Use Typedef In C++, but no one really covered this.
It is a great (and normal) usage. You have to be careful, though, that, for example, the type you select meet the same signed/unsigned criteria, or that they respond similarly to operators. Then it would be easier to change the type afterwards.
Another option is to use templates to avoid fixing the type till the moment you're compiling. A class that is defined as:
template <typename CharType>
class Whatever
{
CharType aChar;
...
};
is able to work with any char type you select, while it responds to all the operators in the same way.
Another advantage of typedefs is that, if used wisely, they can increase readability. As a really dumb example, a Meter and a Degree can both be doubles, but you'd like to differentiate between them. Using a typedef is onc quick & easy solution to make errors more visible.
Note: a more robust solution to the above example would have been to create different types for a meter and a degree. Thus, the compiler can enforce things itself. This requires a bit of work, which doesn't always pay off, however. Using typedefs is a quick & easy way to make errors visible, as described in the article linked above.
Yes, this is the perfect usage for typedef, at least in C.
For C++ it may be argued that templates are a better idea (as Diego Sevilla has suggested) but they have their drawbacks. (Extra work if everything using the data type is not already wrapped in a few classes, slower compilation times and a more complex source file structure, etc.)
It also makes sense to combine the two approaches, that is, give a typedef name to a template parameter.
Note that as you're sending data over a network, char and other integer types may not be interchangeable (e.g. due to endian-ness). In that case, using a templated class with specialized functions might make more sense. (send<char> sends the byte, send<short> converts it to network byte order first)
Yet another solution would be to create a "BufferElementType" class with helper methods (convertToNetworkOrderBytes()) but I'll bet that would be an overkill for you.