How to store variant data in C++

I'm in the process of creating a class that stores metadata about a particular data source. The metadata is structured in a tree, very similar to how XML is structured. The metadata values can be integer, decimal, or string values.
I'm curious if there is a good way in C++ to store variant data for a situation like this. I'd like for the variant to use standard libraries, so I'm avoiding the COM, OLE, and SQL VARIANT types that are available.
My current solution looks something like this:
enum MetaValueType
{
    MetaChar,
    MetaString,
    MetaShort,
    MetaInt,
    MetaFloat,
    MetaDouble
};
union MetaUnion
{
    char cValue;
    short sValue;
    int iValue;
    float fValue;
    double dValue;
};
class MetaValue
{
    ...
private:
    MetaValueType ValueType;
    std::string StringValue;
    MetaUnion VariantValue;
};
The MetaValue class has various Get functions for obtaining the currently stored variant value, but it ends up making every query for a value a big block of if/else if statements to figure out which value I'm looking for.
I've also explored storing the value as only a string, and performing conversions to get different variant types out, but as far as I've seen this leads to a bunch of internal string parsing and error handling which isn't pretty, opens up a big old can of precision and data loss issues with floating point values, and still doesn't eliminate the query if/else if issue stated above.
Has anybody implemented or seen something that's cleaner to use for a C++ variant data type using standard libraries?

As of C++17, there’s std::variant.
If you can’t use that yet, you might want Boost.Variant. A similar, but distinct, type for modelling polymorphism is provided by std::any (and, pre-C++17, Boost.Any).
Just as an additional pointer, you can look for “type erasure”.
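To make this concrete for the metadata case in the question, here is a minimal sketch assuming C++17, with the question's MetaValue reduced to int, double and std::string alternatives. std::variant stores which alternative is active, and std::visit replaces the if/else-if chain by calling the lambda that matches the stored type. The `overloaded` helper is a widely used idiom, not part of the standard library, and `describe` is a hypothetical function for illustration.

```cpp
#include <string>
#include <variant>

// Hypothetical replacement for the question's MetaValueType enum + MetaUnion +
// separate StringValue: std::variant tracks the active alternative itself.
using MetaValue = std::variant<int, double, std::string>;

// Common helper (not in the standard library) that merges several lambdas
// into one visitor with an overloaded operator().
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// std::visit dispatches to the lambda matching the stored type.
std::string describe(const MetaValue& v) {
    return std::visit(overloaded{
        [](int i) { return "int " + std::to_string(i); },
        [](double d) { return "double " + std::to_string(d); },
        [](const std::string& s) { return "string " + s; },
    }, v);
}
```

Simple Get-style queries map onto std::holds_alternative<int>(v) and std::get<int>(v).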

While Konrad's answer (using an existing standardized solution) is certainly preferable to writing your own bug-prone version, the boost variant has some overheads, especially in copy construction and memory.
A common customized approach is the following modified Factory Pattern:
Create a Base interface for a generic object that also encapsulates the object type (either as an enum or, preferably, using typeid).
Now implement the interface using a template Derived class.
Create a factory class with a templatized create function with signature:
template <typename T> Base * Factory::create ();
This internally creates a Derived<T> object on the heap and returns it as a Base * (an implicit upcast; no dynamic_cast is needed). Specialize this for each class you want to support.
Finally, define a Variant wrapper that contains this Base * pointer and defines template get and set functions. Utility functions like getType(), isEmpty(), assignment and equality operators, etc., can be implemented here as appropriate.
Depending on the utility functions and the factory implementation, supported classes will need to support some basic functions like assignment or copy construction.
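A minimal sketch of these steps, assuming C++14 or later. The names Base, Derived, Variant, set, get, getType and isEmpty are hypothetical. As a simplification, std::make_unique stands in for the separate Factory::create step, and a virtual clone() gives the wrapper copy construction, which is why stored types must be copyable.

```cpp
#include <memory>
#include <typeinfo>
#include <utility>

// Interface for any stored object: reports its type and can copy itself.
struct Base {
    virtual ~Base() = default;
    virtual const std::type_info& type() const = 0;
    virtual std::unique_ptr<Base> clone() const = 0;
};

// Template implementation of the interface for one concrete type T.
template <typename T>
struct Derived : Base {
    T value;
    explicit Derived(T v) : value(std::move(v)) {}
    const std::type_info& type() const override { return typeid(T); }
    std::unique_ptr<Base> clone() const override {
        return std::make_unique<Derived<T>>(value);
    }
};

// Wrapper that owns a Base pointer and exposes typed get/set.
class Variant {
public:
    Variant() = default;
    Variant(const Variant& other) : ptr_(other.ptr_ ? other.ptr_->clone() : nullptr) {}
    Variant& operator=(const Variant& other) {
        ptr_ = other.ptr_ ? other.ptr_->clone() : nullptr;  // clone first: self-assignment safe
        return *this;
    }
    Variant(Variant&&) noexcept = default;
    Variant& operator=(Variant&&) noexcept = default;

    template <typename T>
    void set(T value) { ptr_ = std::make_unique<Derived<T>>(std::move(value)); }

    // Throws std::bad_cast if empty or if T is not the stored type.
    template <typename T>
    const T& get() const {
        if (!ptr_ || ptr_->type() != typeid(T)) throw std::bad_cast();
        return static_cast<const Derived<T>*>(ptr_.get())->value;
    }

    bool isEmpty() const { return ptr_ == nullptr; }
    const std::type_info& getType() const { return ptr_ ? ptr_->type() : typeid(void); }

private:
    std::unique_ptr<Base> ptr_;
};
```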

You can also go down to a more C-ish solution, which would have a raw byte buffer the size of a double on your system (accessed, for example, through a void*), plus an enum for which type you're using. It's reasonably clean, but definitely a solution for someone who feels wholly comfortable with the raw bytes of the system.
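A sketch of that raw-bytes approach, assuming only int and double payloads; RawType, RawValue and the make/as helpers are hypothetical names. The buffer is aligned and sized for the largest type, and std::memcpy copies bytes in and out, which avoids the aliasing problems of casting the buffer pointer directly.

```cpp
#include <cstring>

// Tag recording which type the raw bytes currently hold.
enum class RawType { Int, Double };

// Byte buffer large and aligned enough for the biggest payload, plus the tag.
struct RawValue {
    RawType type;
    alignas(double) unsigned char bytes[sizeof(double)];
};

inline RawValue makeInt(int i) {
    static_assert(sizeof(int) <= sizeof(double), "buffer too small for int");
    RawValue r{};
    r.type = RawType::Int;
    std::memcpy(r.bytes, &i, sizeof i);
    return r;
}

inline RawValue makeDouble(double d) {
    RawValue r{};
    r.type = RawType::Double;
    std::memcpy(r.bytes, &d, sizeof d);
    return r;
}

// The caller must check r.type first; reading the wrong type yields garbage.
inline int asInt(const RawValue& r) {
    int i;
    std::memcpy(&i, r.bytes, sizeof i);
    return i;
}

inline double asDouble(const RawValue& r) {
    double d;
    std::memcpy(&d, r.bytes, sizeof d);
    return d;
}
```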

C++17 now has std::variant which is exactly what you're looking for.
std::variant
The class template std::variant represents a type-safe union. An instance of std::variant at any given time either holds a value of one of its alternative types, or in the case of error - no value (this state is hard to achieve, see valueless_by_exception).
As with unions, if a variant holds a value of some object type T, the object representation of T is allocated directly within the object representation of the variant itself. Variant is not allowed to allocate additional (dynamic) memory.
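A short sketch of the "type-safe" part in practice, assuming C++17; Value, intOrNull and doubleOr are hypothetical names. std::get_if returns nullptr when a different alternative is active, while std::get throws std::bad_variant_access instead of reinterpreting the bytes as a raw union would.

```cpp
#include <string>
#include <variant>

using Value = std::variant<int, double, std::string>;

// Pointer to the stored int, or nullptr if another alternative is active.
const int* intOrNull(const Value& v) {
    return std::get_if<int>(&v);
}

// std::get<double> throws std::bad_variant_access on a type mismatch;
// here the exception is turned into a caller-supplied fallback.
double doubleOr(const Value& v, double fallback) {
    try {
        return std::get<double>(v);
    } catch (const std::bad_variant_access&) {
        return fallback;
    }
}
```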

Although this question was answered long ago, for the record I would like to mention that QVariant in the Qt libraries also does this.
Because C++ forbids unions from including types that have non-default constructors or destructors, most interesting Qt classes cannot be used in unions. Without QVariant, this would be a problem for QObject::property() and for database work, etc.
A QVariant object holds a single value of a single type() at a time. (Some type()s are multi-valued, for example a string list.) You can find out what type, T, the variant holds, convert it to a different type using convert(), get its value using one of the toT() functions (e.g., toSize()) and check whether the type can be converted to a particular type using canConvert().

Related

how to declare variable using typeinfo.name C++

I love coding, and generally do so in Python due to its simplicity and power.
However, for some time critical programs/tasks, I use C++.
Therefore, to get best of both worlds, I am making a Pythonesque list in C++.
AIM: I would like to be able to add any variable or value of any data type, including classes user has defined.
To do this, I have a structure item with a char * value, a char * type and an int size.
My List has an array of these item * s.
Now, I have taken the variable in a template function:
template<class T> item * encode(const T& var);
and declared a pointer to an item: item * i = new item;
Then I stored the values of these variables as raw C-style char arrays.
For example, 14675 in binary is 0000 0000 0000 0000 0011 1001 0101 0011
Therefore, I have dynamically created space, like so:
i->size = sizeof(var);
i->value = new char[i->size]; //4 in this case
and copied each bit of var into the corresponding bit of value.
I have also stored their types as
i->type = typeid(var).name();
So far so good!
Now, I am stuck with auto decode(item * i) -> decltype(/*What goes here???*/)
How do I specify the return type of the function?
Is there any possible way?
Preferably using the i->type?
Or, should I make changes in the basic design of this process?
Thanks in advance!
Answering your question
I would like to be able to add any variable or value of any data type, including classes user has defined.
Without cooperation from the user that’s impossible in C++.
Remember that C++ types are a compile-time concept only. They do not exist at runtime. The only type information available at runtime is the thin layer of RTTI provided by typeid(). Runtime duck-typing like in Python is not possible.
You can create a container of arbitrary objects quite easily.
std::vector<std::any> v; // requires C++17
However the user of that container has to know what index contains what type:
if (v[0].type() == typeid(ArbitraryUserType)) {
const auto& item = std::any_cast<ArbitraryUserType>(v[0]);
// work on item ...
}
Because of the compile-time nature of types you as the library writer cannot perform that any_cast. It has to be spelled out in the user’s source code.
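A self-contained version of the snippet above, assuming C++17. ArbitraryUserType and nameAt are hypothetical stand-ins. The point is that the caller, not the container, names the expected type: the pointer form of std::any_cast returns nullptr on a mismatch, whereas the reference form throws std::bad_any_cast.

```cpp
#include <any>
#include <cstddef>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical user type standing in for "ArbitraryUserType".
struct ArbitraryUserType {
    std::string name;
};

// The expected type is spelled out here, in the user's code.
std::string nameAt(const std::vector<std::any>& v, std::size_t i) {
    if (const auto* item = std::any_cast<ArbitraryUserType>(&v[i]))
        return item->name;
    return "<not an ArbitraryUserType>";
}
```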
In general, don’t try to shoehorn a pythonic mindset into C++. It never ends well, especially when you try to circumvent one of the most basic foundations of C++: its powerful static type system.
Notes:
Without C++17 you could use boost::any.
If you know the list of possible types at compile time, std::vector<std::variant<Type1, Type2, etc>> is a good alternative. With any, the user is fully responsible for keeping track of their types. Because all type checks happen at runtime, the compiler cannot help. Variant, on the other hand, brings back a large chunk of the compile-time safety. And again there’s boost::variant as a non-C++17 alternative.
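A sketch of the variant alternative, assuming C++17 and a hypothetical closed set of int, double and std::string. Each branch is selected with if constexpr, and the final static_assert makes the build fail if a new alternative is added to Element without being handled, which is the compile-time safety described above.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// The closed set of element types is fixed at compile time.
using Element = std::variant<int, double, std::string>;

// Dependent false, so the static_assert only fires when that branch is instantiated.
template <class> inline constexpr bool always_false = false;

struct Totals { double sum = 0; int strings = 0; };

// Sums the numeric elements and counts the strings.
Totals summarize(const std::vector<Element>& items) {
    Totals t;
    for (const auto& e : items) {
        std::visit([&t](const auto& x) {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, int> || std::is_same_v<T, double>)
                t.sum += x;
            else if constexpr (std::is_same_v<T, std::string>)
                ++t.strings;
            else
                static_assert(always_false<T>, "unhandled alternative");
        }, e);
    }
    return t;
}
```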
Notes on your encoding approach
Basically you’re trying to serialize (encode) and deserialize (decode) arbitrary types. Without cooperation from those types, that’s not possible.
Your approach only works for trivial types that can be copied bit by bit. C++ even has a type trait for that: std::is_trivially_copyable. In the end you support fundamental types and C-style structs of those, but nothing else.
Imagine the T for your encode() function was std::string. Simply put a std::string contains a pointer to a separately allocated piece of memory where the actual string data is stored. The string object itself is just a managing wrapper for that pointer. encode() only serializes the wrapper object, but not the pointed-to memory block with the actual data.
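One way to make the bitwise approach at least fail loudly is to restrict it with std::is_trivially_copyable, as a sketch assuming C++17; this encode/decode pair is a hypothetical rewrite, not the asker's code. Passing std::string then becomes a compile error instead of silently copying only the wrapper, and decode still needs the caller to name T.

```cpp
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// std::string owns a pointer to separately allocated storage, so it is rejected.
static_assert(!std::is_trivially_copyable_v<std::string>);

// Bitwise encoding, accepted only for trivially copyable types.
template <class T>
std::vector<unsigned char> encode(const T& var) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "bitwise encoding only works for trivially copyable types");
    std::vector<unsigned char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &var, sizeof(T));
    return bytes;
}

// The type cannot be recovered from the bytes: the caller must supply T.
template <class T>
T decode(const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable_v<T> && std::is_default_constructible_v<T>,
                  "bitwise decoding needs a trivially copyable, default-constructible type");
    if (bytes.size() != sizeof(T))
        throw std::invalid_argument("byte count does not match sizeof(T)");
    T var;
    std::memcpy(&var, bytes.data(), sizeof(T));
    return var;
}
```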
Even if during deserialization you could instantiate arbitrary types from a stream of bits, the stream is not complete. What you’d have to implement is a C++ version of Python’s copy.deepcopy, which is impossible without cooperation from each type. Have a look at a C++ serialization library (take Cereal as a straightforward example) to see how that cooperation can look in practice.
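To show what "cooperation" means without depending on a library, here is a hand-rolled sketch; save, load, Person and roundTrip are hypothetical names and this is not Cereal's API. Each type supplies overloads that know where its data actually lives: for std::string that means writing a length prefix followed by the pointed-to characters. The length is written in host byte order, so the format is not portable across architectures.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// std::string cooperates: length prefix, then the separately allocated characters.
inline void save(std::ostream& out, const std::string& s) {
    std::uint64_t n = s.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(s.data(), static_cast<std::streamsize>(n));
}

inline void load(std::istream& in, std::string& s) {
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    s.resize(static_cast<std::size_t>(n));
    in.read(s.data(), static_cast<std::streamsize>(n));
}

// A user type cooperates by forwarding to its members' overloads.
struct Person {
    std::string name;
    std::string city;
};

inline void save(std::ostream& out, const Person& p) { save(out, p.name); save(out, p.city); }
inline void load(std::istream& in, Person& p) { load(in, p.name); load(in, p.city); }

// Round trip through an in-memory stream.
inline Person roundTrip(const Person& p) {
    std::stringstream ss;
    save(ss, p);
    Person out;
    load(ss, out);
    return out;
}
```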

How to deal with a generic object in C++

What is the proper way to deal with a generic value in C++11 or is it OK to use (void *)?
Basically, I am parsing json, and the node value can either be String, Integer, Double, Date, etc.
In C, just using void * is OK (not safe, but ok), and in C# we use Object. But what is the proper way in C++11 to do this? Do I have to build a wrapper class, or is there an easier way?
You can make a base class for the various types, or use a "discriminated union" class such as Boost.Variant which holds a known set of types and remembers which one it is holding.
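A minimal sketch of the base-class option for parsed JSON values, assuming C++14; JsonNode, JsonString, JsonInt, JsonDouble and JsonArray are hypothetical names, and only three value kinds are shown. Each kind is a derived class, and callers work through virtual functions (or dynamic_cast when they need the concrete type).

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common base for every kind of JSON value.
struct JsonNode {
    virtual ~JsonNode() = default;
    virtual std::string toString() const = 0;
};

struct JsonString : JsonNode {
    std::string value;
    explicit JsonString(std::string v) : value(std::move(v)) {}
    std::string toString() const override { return "\"" + value + "\""; }
};

struct JsonInt : JsonNode {
    long long value;
    explicit JsonInt(long long v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
};

struct JsonDouble : JsonNode {
    double value;
    explicit JsonDouble(double v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
};

// Heterogeneous list of nodes, e.g. the elements of a JSON array.
using JsonArray = std::vector<std::unique_ptr<JsonNode>>;
```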

Is there a way to have a function return a type?

I have a data class (struct actually) with two variables: a void pointer and a string containing the type of the object being pointed to.
struct data {
    void* index;
    std::string type;
    data(): index(0), type("null") {}
    data(void* index, std::string type): index(index), type(type) {}
};
Now I need to use the object being pointed to, by casting the void pointer to a type that is specified by the string, so I thought of using an std::map with strings and functions.
std::unordered_map<std::string, function> cast;
The problem is that the functions must always have the exact same return-type and can't return a type itself.
Edit:
Because I use the data class as a return-type and as arguments, templates won't suffice.
(also added some code to show what I mean)
data somefunction(data a){
//do stuff
return data();}
Currently, I use functions like this to do the trick, but I thought it could be done more easily:
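void functionforstring(data a){
    // dynamic_cast cannot be applied to void*; static_cast back to the original type is required.
    // function() stands for some member function of the pointed-to type.
    static_cast<std::string*>(a.index)->function();
}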
Neither thing is possible in C++:
Functions cannot return types (that is to say, types are not values).
Code cannot operate on objects whose type it doesn't know at compile-time (that is to say, C++ is statically typed). Of course there is dynamic polymorphism via virtual functions, but even with that, the type of the pointer you use to call them is known at compile time by the calling code.
So the operation you want, "convert to the pointer type indicated by a string" is not possible. If it were possible, then the result would be a pointer whose type is not known at compile time, and that cannot be.
There's nothing you could do with this "pointer of type unknown at compile time", that you can't do using the void* you started with. void* pretty much already is what C++ has in place of a pointer to unknown type.
While it's not possible to return a type from a function, you could use typeid to get information about the object, and use the string returned by typeid(*obj).name() as an argument to your constructor.
Keep in mind that this string would be implementation defined, so you would have to generate this string at runtime for every type that you might possibly use in the program in order to make your unordered_map useful.
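A variation on this idea that sidesteps the implementation-defined name() strings is to key the map by std::type_index, as a sketch assuming C++17; Dispatcher, Handler, on and call are hypothetical names. The cast back from void* happens only inside each registered handler, where the type is known at compile time, so the caller records typeid(obj) instead of a hand-written string.

```cpp
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Every handler has the same signature, so they fit in one map.
using Handler = std::function<std::string(void*)>;

struct Dispatcher {
    std::unordered_map<std::type_index, Handler> handlers;

    // Registers a handler for T; the static_cast is written where T is known.
    template <class T, class F>
    void on(F f) {
        handlers[std::type_index(typeid(T))] = [f](void* p) { return f(*static_cast<T*>(p)); };
    }

    // Looks up the handler for the recorded type and calls it.
    std::string call(void* p, const std::type_info& type) const {
        auto it = handlers.find(std::type_index(type));
        return it == handlers.end() ? std::string("no handler") : it->second(p);
    }
};
```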
There is almost certainly a much simpler and more idiomatic way to accomplish your goal in C++, however. Perhaps if you explained more about the goals of the program, someone might be able to suggest an alternative approach.

What is the purpose of boost::fusion?

I've spent the day reading notes and watching a video on boost::fusion and I really don't get some aspects of it.
Take for example, the boost::fusion::has_key<S> function. What is the purpose of having this in boost::fusion? Is the idea that we just try and move as much programming as possible to happen at compile-time? So pretty much any boost::fusion function is the same as the run-time version, except it now evaluates at compile time? (and we assume doing more at compile-time is good?).
Related to boost::fusion, I'm also a bit confused about why metafunctions always return types. Why is this?
Another way to look at boost::fusion is to think of it as "poor man introspection" library. The original motivation for boost::fusion comes from the direction of boost::spirit parser/generator framework, in particular the need to support what is called "parser attributes".
Imagine, you've got a CSV string to parse:
aaaa, 1.1
The type this string parses into can be described as "tuple of string and double". We can define such tuples in "plain" C++, either with old-school structs (struct { string a; double b; }) or the newer tuple<string, double>. The only thing we miss is some sort of adapter that lets tuples (and some other types) of arbitrary composition be passed to a unified parser interface, which can then make sense of them without any out-of-band information (such as the format strings used by scanf).
That's where boost::fusion comes into play. The most straightforward way to construct a "fusion sequence" is to adapt a normal struct:
struct a {
string s;
double d;
};
BOOST_FUSION_ADAPT_STRUCT(a, (string, s)(double, d))
The "ADAPT_STRUCT" macro adds the necessary information for the parser framework (in this example) to be able to "iterate" over the members of struct a, answering questions such as:
I just parsed a string. Can I assign it to first member of struct a?
I just parsed a double. Can I assign it to second member of struct a?
Are there any other members in struct a or should I stop parsing?
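Boost.Fusion itself needs Boost, so as an analogy in standard C++17 (this is not Fusion's API), here is how a std::tuple lets generic code walk its members in order and pick the right assignment by each member's static type. A plain struct cannot be iterated this way without something like BOOST_FUSION_ADAPT_STRUCT, which is why the sketch uses a tuple; assignField and parseRecord are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <tuple>

// "I just parsed a string. Can I assign it here?" The overload is chosen by field type.
inline void assignField(std::istringstream& in, std::string& field) {
    std::getline(in >> std::ws, field, ',');
}

inline void assignField(std::istringstream& in, double& field) {
    in >> field;
    in.ignore(1);  // skip the following comma, if any
}

// Visits every member of the tuple in order via std::apply and a comma fold.
template <class... Ts>
std::tuple<Ts...> parseRecord(const std::string& text) {
    std::tuple<Ts...> record;
    std::istringstream in(text);
    std::apply([&in](auto&... fields) { (assignField(in, fields), ...); }, record);
    return record;
}
```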
Obviously, this basic example can be further extended (and boost::fusion supplies the capability) to address much more complex cases:
Variants - let's say the parser can encounter either a string or a double and wants to assign it to the right member of struct a. BOOST_FUSION_ADAPT_ASSOC_STRUCT comes to the rescue (now our parser can ask questions like "which member of struct a is of type double?").
Transformations - our parser may be designed to accept certain types as parameters while the rest of the program has changed quite a bit. Yet fusion metafunctions can be conveniently used to adapt new types to old realities (or vice versa).
The rest of boost::fusion functionality naturally follows from the above basics. Fusion really shines when there's a need for conversion (in either direction) between "loose" I/O data and the strongly typed, structured data C++ programs operate on, especially when efficiency is a concern. It is the enabling factor behind spirit::qi and spirit::karma being such efficient (probably the fastest) I/O frameworks.
Fusion is there as a bridge between compile-time and run-time containers and algorithms. You may or may not want to move some of your processing to compile-time, but if you do want to then Fusion might help. I don't think it has a specific manifesto to move as much as possible to compile-time, although I may be wrong.
Meta-functions return types because template meta-programming wasn't invented on purpose. It was discovered more-or-less by accident that C++ templates can be used as a compile-time programming language. A meta-function is a mapping from template arguments to instantiations of a template. As of C++03 there were two kinds of template (class and function), therefore a meta-function has to "return" either a class or a function. Classes are more useful than functions, since you can put values etc. in their static data members.
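A small illustration of both "return" styles in classic class-template form; add_pointer_twice and factorial are hypothetical examples, and only static_assert requires C++11. The result type is the nested typedef, and a value is returned through a static data member, with recursion ended by an explicit specialization.

```cpp
#include <type_traits>

// Metafunction whose "return value" is a type: the nested typedef.
template <class T>
struct add_pointer_twice {
    typedef T** type;
};

// Metafunction whose "return value" is a number: a static data member.
template <int N>
struct factorial {
    static const int value = N * factorial<N - 1>::value;
};

// Explicit specialization terminates the compile-time recursion.
template <>
struct factorial<0> {
    static const int value = 1;
};

static_assert(std::is_same<add_pointer_twice<int>::type, int**>::value, "type result");
static_assert(factorial<5>::value == 120, "value result");
```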
C++11 adds another kind of template (alias templates, i.e. templated typedefs), but that is kind of irrelevant to meta-programming. More importantly for compile-time programming, C++11 adds constexpr functions. They're properly designed for the purpose and they return values just like normal functions. Of course, their input is not a type, so they can't be mappings from types to something else in the way that templates can. So in that sense they lack the "meta-" part of meta-programming. They're "just" compile-time evaluation of normal C++ functions, not meta-functions.
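For contrast, a sketch of the two C++11 features mentioned, using hypothetical names: a constexpr function takes and returns ordinary values and can run at compile time or at runtime, while an alias template maps a type to a type but carries no computation of its own.

```cpp
#include <type_traits>

// C++11 constexpr function: a single return statement, ordinary value in and out.
constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Alias template: the C++11 "template for typedefs".
template <class T>
using pointer_to = T*;

static_assert(factorial(5) == 120, "evaluated at compile time");
static_assert(std::is_same<pointer_to<int>, int*>::value, "alias template");
```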

Tagged unions (aka variant) in C++ with the same type multiple times

I need to create a union, but two members of the union would have the same type, so I need a way to identify them. For example in OCaml:
type A =
| B of int
| C of float
| D of float
Boost.Variant doesn't seem to support this case, is there a known library which supports that ?
If you want to do this, I think your best option is to wrap the same-but-different-types into a struct which then lets the boost variant visit the proper one:
struct Speed
{
float val_;
};
struct Darkness
{
float val_;
};
You might be able to use BOOST_STRONG_TYPEDEF to do this automatically but I'm not sure it's guaranteed to generate types legal for use in a union (although it would probably be fine in a variant).
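The wrapper approach carries over directly to C++17's std::variant, used here instead of boost::variant so the sketch needs no Boost. The alias A mirrors the OCaml type from the question, and label is a hypothetical helper; the wrapper type, not the float payload, tells the cases apart.

```cpp
#include <string>
#include <variant>

// Same payload type, distinct C++ types.
struct Speed    { float val_; };
struct Darkness { float val_; };

// Mirrors: type A = B of int | C of float | D of float
using A = std::variant<int, Speed, Darkness>;

std::string label(const A& a) {
    if (std::holds_alternative<Speed>(a)) return "C";
    if (std::holds_alternative<Darkness>(a)) return "D";
    return "B";
}
```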
You cannot at the moment, but C++17's std::variant fortunately allows it:
A variant is permitted to hold the same type more than once, and to hold differently cv-qualified versions of the same type.
Unlike with the Boost version, you can get values by index, something like this (not tested):
// Construct a variant with the second value set.
std::variant<std::string, std::string, std::string> s(std::in_place_index<1>, "Hello");
// Get the second value.
std::string second = std::get<1>(s);
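Applied to the OCaml type from the question, a sketch assuming C++17; the enum Tag and describe are hypothetical names giving the indices readable labels. With float repeated, plain assignment from a float is ambiguous, so construction and reassignment go through std::in_place_index and emplace<I>, and index() reports which case is active.

```cpp
#include <cstddef>
#include <string>
#include <variant>

// type A = B of int | C of float | D of float; the index is the tag.
using A = std::variant<int, float, float>;

// Readable names for the indices.
enum Tag : std::size_t { B = 0, C = 1, D = 2 };

std::string describe(const A& a) {
    switch (a.index()) {
        case B: return "B " + std::to_string(std::get<B>(a));
        case C: return "C " + std::to_string(std::get<C>(a));
        case D: return "D " + std::to_string(std::get<D>(a));
    }
    return "valueless";  // only after an exception during assignment
}
```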
Michael Park has written a C++14 implementation of C++17's std::variant.
The C++ code here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/boost/composite_storage/pack/container_one_of_maybe.hpp
is truly a tagged union in that it can contain duplicate types. One nice feature is that the tags can be enumerations; hence, the tags can have meaningful names.
Unfortunately, the compile-time cost is pretty bad, I guess, because the implementation uses recursive inheritance. OTOH, maybe compilers will eventually figure out a way to lessen the compile-time cost.
OTOH, if you want to stick with boost::variant, you could wrap the types, as Mark B suggested. However, instead of Mark B's descriptive class names, which require some thought, you could use fusion::pair<mpl::int_<tag>,T_tag> where T_tag is the tag-th element in the source fusion::vector. IOW:
variant
< fusion::pair<mpl::int_<1>,T1>
, fusion::pair<mpl::int_<2>,T2>
...
, fusion::pair<mpl::int_<n>,Tn>
>
As the fusion docs at
http://www.boost.org/doc/libs/1_55_0/libs/fusion/doc/html/fusion/support/pair.html
say, fusion::pair only allocates space for the second template argument; hence, this should not take any more room than boost::variant<T1,T2,...,Tn>.
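The same idea can be sketched in standard C++17 without Fusion or MPL, as a swapped-in analogue rather than Larry's exact code: a hypothetical tagged<N, T> template plays the role of fusion::pair<mpl::int_<N>, T>, making repeated payload types distinct while storing only the T. getTagged is a hypothetical accessor.

```cpp
#include <variant>

// Integer tag plus payload; only the payload occupies storage.
template <int N, class T>
struct tagged {
    T value;
};

// Two float alternatives that stay distinguishable by type.
using A = std::variant<tagged<1, int>, tagged<2, float>, tagged<3, float>>;

// Reads the payload of alternative N; throws std::bad_variant_access otherwise.
template <int N, class T, class... Alts>
const T& getTagged(const std::variant<Alts...>& v) {
    return std::get<tagged<N, T>>(v).value;
}
```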
HTH.
-regards,
Larry