I need to parse and store a somewhat (but not too) complex stream and need to store the parsed result somehow. The stream essentially contains name-value pairs with values possibly being of different type for different names. Basically, I end up with a map of key (always string) to a pair <type, value>.
I started with something like this:
typedef enum ValidType {STRING, INT, FLOAT, BINARY} ValidType;
map<string, pair<ValidType, void*>> Data;
However I really dislike void* and storing pointers. Of course, I can always store the value as binary data (vector<char> for example), in which case the map would end up being
map<string, pair<ValidType, vector<char>>> Data;
Yet, in this case I would have to parse the binary data every time I need the actual value, which would be quite expensive in terms of performance.
Considering that I am not too worried about memory footprint (the amount of data is not large), but I am concerned about performance, what would be the right way to store such data?
Ideally, I'd like to avoid using boost, as that would increase the size of the final app by a factor of 3 if not more and I need to minimise that.
You're looking for a discriminated (or tagged) union.
Boost.Variant is one example, and Boost.Any is another. Are you so sure Boost will increase your final app size by a factor of 3? I would have thought variant was header-only, in which case you don't need to link any libraries.
If you really can't use Boost, implementing a simple discriminated union isn't so hard (a general and fully-correct one is another matter), and at least you know what to search for now.
For completeness, a naive discriminated union might look like:
class DU
{
public:
enum TypeTag { None, Int, Double };
class DUTypeError {};
private:
TypeTag type_;
union {
int i;
double d;
} data_;
void typecheck(TypeTag tt) const { if(type_ != tt) throw DUTypeError(); }
public:
DU() : type_(None) {}
DU(DU const &other) : type_(other.type_), data_(other.data_) {}
DU& operator= (DU const &other) {
type_=other.type_; data_=other.data_; return *this;
}
TypeTag type() const { return type_; }
bool istype(TypeTag tt) const { return type_ == tt; }
#define CONVERSIONS(TYPE, ENUM, MEMBER) \
explicit DU(TYPE val) : type_(ENUM) { data_.MEMBER = val; } \
operator TYPE & () { typecheck(ENUM); return data_.MEMBER; } \
operator TYPE const & () const { typecheck(ENUM); return data_.MEMBER; } \
DU& operator=(TYPE val) { type_ = ENUM; data_.MEMBER = val; return *this; }
CONVERSIONS(int, Int, i)
CONVERSIONS(double, Double, d)
};
Now, there are several drawbacks:
you can't store non-POD types in the union
adding a type means modifying the enum, and the union, and remembering to add a new CONVERSIONS line (it would be even worse without the macro)
you can't use the visitor pattern with this (or, you'd have to write your own dispatcher for it), which means lots of switch statements in the client code
every one of these switches may also need updating if you add a type
if you did write a visitor dispatch, that needs updating if you add a type, and so may every visitor
you need to manually reproduce something like the built-in C++ type-conversion rules if you want to do anything like arithmetic with these (ie, operator double could promote an Int instead of only handling Double ... but only if you hand-roll every operator)
I haven't implemented operator== precisely because it needs a switch. You can't just memcmp the two unions if the types match, because identical 32-bit integers could still compare different if the extra space required for the double holds a different bit pattern
Some of these issues can be addressed if you care about them, but it's all more work. Hence my preference for not re-inventing this particular wheel if it can be avoided.
Since your data types are fixed what about something like this...
Have something like a std::vector for each type of value.
And your map would have as the second value of the pair the index to the data.
std::vector<int> vInt;
std::vector<float> vFloat;
.
.
.
map<std::string, std::pair<ValidType, int>> Data;
You can implement a multi-type map by leveraging the nifty features of std::tuple in C++11, which allows access by a type key. You can wrap this to create access by arbitrary keys. An in-depth explanation of this (and quite an interesting read) is available here:
https://jguegant.github.io/blogs/tech/thread-safe-multi-type-map.html
The modern C++ features provide create ways to solve old problems.
Related
I need to find the most frequent element in an array of custom structs. There is no custom ID to them just matching properties.
I was thinking of sorting my vector by frequency but I have no clue how to do that.
I'm assuming by frequency you mean the number of times an identical structure appears in the array.
You probably want to make a hash function (or overload std::hash<> for your type) for your custom struct. Then iterate over your array, incrementing the value on an unordered_map<mytype, int> for every struct in the array. This will give you the frequency in the value field. Something like the below would work:
std::array<mytype> elements;
std::unordered_map<mytype, int> freq;
mytype most_frequent;
int max_frequency = 0;
for (const mytype &el : elements) {
freq[el]++;
if (freq[el] > max_frequency) {
most_frequent = el;
}
}
For this to work, the map will need to be able to create a hash for the above function. By default, it tries to use std::hash<>. You are expressly allowed by the standard to specialize this template in the standard namespace for your own types. You could do this as follows:
struct mytype {
std::string name;
double value;
};
namespace std {
template <> struct hash<mytype> {
size_t operator()(const mytype &t) const noexcept {
// Use standard library hash implementations of member variable types
return hash<string>()(t.name) ^ hash<double>()(t.value)
}
}
}
The primary goal is to ensure that any two variables that do not contain exactly the same values will generate a different hash. The above XORs the results of the standard library's hash function for each type together, which according to Mark Nelson is probably as good as the individual hashing algorithms XOR'd together. An alternative algorithm suggested by cppreference's hash reference is the Fowler-Noll-Vo hash function.
Look at std::sort and the example provided in the ref, where you actually pass your own comparator to do the trick you want (in your case, use the frequencies). Of course, a lambda function can be used too, if you wish.
I need to parse and store a somewhat (but not too) complex stream and need to store the parsed result somehow. The stream essentially contains name-value pairs with values possibly being of different type for different names. Basically, I end up with a map of key (always string) to a pair <type, value>.
I started with something like this:
typedef enum ValidType {STRING, INT, FLOAT, BINARY} ValidType;
map<string, pair<ValidType, void*>> Data;
However I really dislike void* and storing pointers. Of course, I can always store the value as binary data (vector<char> for example), in which case the map would end up being
map<string, pair<ValidType, vector<char>>> Data;
Yet, in this case I would have to parse the binary data every time I need the actual value, which would be quite expensive in terms of performance.
Considering that I am not too worried about memory footprint (the amount of data is not large), but I am concerned about performance, what would be the right way to store such data?
Ideally, I'd like to avoid using boost, as that would increase the size of the final app by a factor of 3 if not more and I need to minimise that.
You're looking for a discriminated (or tagged) union.
Boost.Variant is one example, and Boost.Any is another. Are you so sure Boost will increase your final app size by a factor of 3? I would have thought variant was header-only, in which case you don't need to link any libraries.
If you really can't use Boost, implementing a simple discriminated union isn't so hard (a general and fully-correct one is another matter), and at least you know what to search for now.
For completeness, a naive discriminated union might look like:
class DU
{
public:
enum TypeTag { None, Int, Double };
class DUTypeError {};
private:
TypeTag type_;
union {
int i;
double d;
} data_;
void typecheck(TypeTag tt) const { if(type_ != tt) throw DUTypeError(); }
public:
DU() : type_(None) {}
DU(DU const &other) : type_(other.type_), data_(other.data_) {}
DU& operator= (DU const &other) {
type_=other.type_; data_=other.data_; return *this;
}
TypeTag type() const { return type_; }
bool istype(TypeTag tt) const { return type_ == tt; }
#define CONVERSIONS(TYPE, ENUM, MEMBER) \
explicit DU(TYPE val) : type_(ENUM) { data_.MEMBER = val; } \
operator TYPE & () { typecheck(ENUM); return data_.MEMBER; } \
operator TYPE const & () const { typecheck(ENUM); return data_.MEMBER; } \
DU& operator=(TYPE val) { type_ = ENUM; data_.MEMBER = val; return *this; }
CONVERSIONS(int, Int, i)
CONVERSIONS(double, Double, d)
};
Now, there are several drawbacks:
you can't store non-POD types in the union
adding a type means modifying the enum, and the union, and remembering to add a new CONVERSIONS line (it would be even worse without the macro)
you can't use the visitor pattern with this (or, you'd have to write your own dispatcher for it), which means lots of switch statements in the client code
every one of these switches may also need updating if you add a type
if you did write a visitor dispatch, that needs updating if you add a type, and so may every visitor
you need to manually reproduce something like the built-in C++ type-conversion rules if you want to do anything like arithmetic with these (ie, operator double could promote an Int instead of only handling Double ... but only if you hand-roll every operator)
I haven't implemented operator== precisely because it needs a switch. You can't just memcmp the two unions if the types match, because identical 32-bit integers could still compare different if the extra space required for the double holds a different bit pattern
Some of these issues can be addressed if you care about them, but it's all more work. Hence my preference for not re-inventing this particular wheel if it can be avoided.
Since your data types are fixed what about something like this...
Have something like a std::vector for each type of value.
And your map would have as the second value of the pair the index to the data.
std::vector<int> vInt;
std::vector<float> vFloat;
.
.
.
map<std::string, std::pair<ValidType, int>> Data;
You can implement a multi-type map by leveraging the nifty features of std::tuple in C++11, which allows access by a type key. You can wrap this to create access by arbitrary keys. An in-depth explanation of this (and quite an interesting read) is available here:
https://jguegant.github.io/blogs/tech/thread-safe-multi-type-map.html
The modern C++ features provide create ways to solve old problems.
This is a question on what people think the best way to lay out my class structure for my problem. I am doing some numerical analysis and require certain "elements" to be integrated. So I have created a class called "BoundaryElement" like so
class BoundaryElement
{
/* private members */
public:
integrate(Point& pt);
};
The key function is 'integrate' which I need to evaluate for a whole variety of different points. What happens is that, depending on the point, I need to use a different number of integration points and weights, which are basically vectors of numbers. To find these, I have a class like so:
class GaussPtsWts
{
int numPts;
double* gaussPts;
double* gaussWts;
public:
GaussPtsWts(const int n);
GaussPtsWts(const GaussPtsWts& rhs);
~GaussPtsWts();
GaussPtsWts& operator=(const GaussPtsWts& rhs);
inline double gwt(const unsigned int i)
{
return gaussWts[i];
}
inline double gpt(const unsigned int i)
{
return gaussPts[i];
}
inline int numberGPs()
{
return numGPs;
}
};
Using this, I could theoretically create a GaussPtsWts instance for every call to the integrate function. But I know that I maybe using the same number of gauss points many times , and so I would like to cache this data. I'm not very confident on how this might be done - potentially a std::map which is a static member of the BoundaryElement class? If people could shed any light on this I would be very grateful. Thanks!
I had a similar issue once and used a map (as you suggested). What I would do is change the GaussPtsWts to contain the map:
typedef std::map<int, std::vector<std::pair<double, double>>> map_type;
Here I've taken your two arrays of the points and weights and put them into a single vector of pairs - which should apply if I remember my quadrature correctly. Feel free to make a small structure of the point and weight to make it more readable.
Then I'd create a single instance of the GaussPtsWts and store a reference to it in each BoundaryElement. Or perhaps a shared_ptr depending on how you like it. You'd also need to record how many points you are using.
When you ask for a weight, you might have something like this:
double gwt(const unsigned int numGPs, const unsigned int i)
{
map_type::const_iterator found = themap.find(numGPs);
if(found == themap.end())
calculatePoints(numGPs);
return themap[numGPs][i].first;
}
Alternatively you could mess around with templates with an integer parameter:
template <int N>
class GaussPtsWts...
I am working on an application with a message based / asynchronous agent-like architecture.
There will be a few dozen distinct message types, each represented by C++ types.
class message_a
{
long long identifier;
double some_value;
class something_else;
...//many more data members
}
Is it possible to write a macro/meta-program that would allow calculating the number of data members within the class at compile time?
//eg:
class message_b
{
long long identifier;
char foobar;
}
bitset<message_b::count_members> thebits;
I am not familiar with C++ meta programming, but could boost::mpl::vector allow me to accomplish this type of calculation?
as others already suggested, you need Boost.Fusion and its BOOST_FUSION_DEFINE_STRUCT. You'll need to define your struct once using unused but simple syntax. As result you receive required count_members (usually named as size) and much more flexibility than just that.
Your examples:
Definition:
BOOST_FUSION_DEFINE_STRUCT(
(), message_a,
(long long, identifier),
(double, some_value)
)
usage:
message_a a;
size_t count_members = message_a::size;
No, there is no way in C++ to know the names of all members or how many members are actually there.
You could store all types in a mpl::vector along in your classes but then you face the problem of how to turn them into members with appropriate names (which you cannot achieve without some macro hackery).
Using std::tuple instead of PODs is a solution that generally works but makes for incredible messy code when you actually work with the tuple (no named variables) unless you convert it at some point or have a wrapper that forwards accessors onto the tuple member.
class message {
public:
// ctors
const int& foo() const { return std::get<0>(data); }
// continue boiler plate with const overloads etc
static std::size_t nun_members() { return std::tuple_size<data>::value; }
private:
std::tuple<int, long long, foo> data;
};
A solution with Boost.PP and MPL:
#include <boost/mpl/vector.hpp>
#include <boost/mpl/at.hpp>
#include <boost/preprocessor.hpp>
#include <boost/preprocessor/arithmetic/inc.hpp>
struct Foo {
typedef boost::mpl::vector<int, double, long long> types;
// corresponding type names here
#define SEQ (foo)(bar)(baz)
#define MACRO(r, data, i, elem) boost::mpl::at< types, boost::mpl::int_<i> >::type elem;
BOOST_PP_SEQ_FOR_EACH_I(MACRO, 0, SEQ)
};
int main() {
Foo a;
a.foo;
}
I didn't test it so there could be bugs.
There are several answers simply saying that it is not possible, and if you hadn't linked to magic_get I would've agreed with them. But magic_get shows, to my amazement, that it actually is possible in some cases. This goes to show that proving that something is not possible is harder than proving that something is possible!
The short answer to your question would be to use the facilities in magic_get directly rather than reimplement them yourself. After all, even looking at the pre-Boost version of the code, it's not exactly clear how it works. At one point in the comments it mentions something about constructor arguments; I suspect this is the key, because it is possible to count the arguments to a regular function, so perhaps it is counting the number of arguments needed to brace-initialise the struct. This indicates that it may only be possible with plain old structs rather than objects with your own methods.
Despite all this, I would suggest using a reflection library as others have suggested. A good one that I often recommend is Google's protobuf library, which has reflection and serialisation along with multi-language support. However, it is intended only for data-only objects (like plain old structs but with vectors and strings).
Plain structs do not support counting members, but boost::fusion offers a good way to declare a struct that is count- and iteratable.
Something like this might get you closer:
struct Foo {
Foo() : a(boost::get<0>(values)), b(boost::get<1>(values)) {}
int &a;
float &b;
typedef boost::tuple<int,float> values_t;
values_t values;
};
If your types respect some properties ("SimpleAggregate"), you might use magic_get (which is now boost_pfr) (from C++14/C++17).
So you will have something like:
class message_b
{
public;
long long identifier;
char foobar;
};
static_assert(boost::pfr::tuple_size<message_b>::value == 2);
As most programmers I admire and try to follow the principles of Literate programming, but in C++ I routinely find myself using std::pair, for a gazillion common tasks. But std::pair is, IMHO, a vile enemy of literate programming...
My point is when I come back to code I've written a day or two ago, and I see manipulations of a std::pair (typically as an iterator) I wonder to myself "what did iter->first and iter->second mean???".
I'm guessing others have the same doubts when looking at their std::pair code, so I was wondering, has anyone come up with some good solutions to recover literacy when using std::pair?
std::pair is a good way to make a "local" and essentially anonymous type with essentially anonymous columns; if you're using a certain pair over so large a lexical space that you need to name the type and columns, I'd use a plain struct instead.
How about this:
struct MyPair : public std::pair < int, std::string >
{
const int& keyInt() { return first; }
void keyInt( const int& keyInt ) { first = keyInt; }
const std::string& valueString() { return second; }
void valueString( const std::string& valueString ) { second = valueString; }
};
It's a bit verbose, however using this in your code might make things a little easier to read, eg:
std::vector < MyPair > listPairs;
std::vector < MyPair >::iterator iterPair( listPairs.begin() );
if ( iterPair->keyInt() == 123 )
iterPair->valueString( "hello" );
Other than this, I can't see any silver bullet that's going to make things much clearer.
typedef std::pair<bool, int> IsPresent_Value;
typedef std::pair<double, int> Price_Quantity;
...you get the point.
You can create two pairs of getters (const and non) that will merely return a reference to first and second, but will be much more readable. For instance:
string& GetField(pair& p) { return p.first; }
int& GetValue(pair& p) { return p.second; }
Will let you get the field and value members from a given pair without having to remember which member holds what.
If you expect to use this a lot, you could also create a macro that will generate those getters for you, given the names and types: MAKE_PAIR_GETTERS(Field, string, Value, int) or so. Making the getters straightforward will probably allow the compiler to optimize them away, so they'll add no overhead at runtime; and using the macro will make it a snap to create those getters for whatever use you make of pairs.
You could use boost tuples, but they don't really alter the underlying issue: Do your really want to access each part of the pair/tuple with a small integral type, or do you want more 'literate' code. See this question I posted a while back.
However, boost::optional is a useful tool which I've found replaces quite a few of the cases where pairs/tuples are touted as ther answer.
Recently I've found myself using boost::tuple as a replacement for std::pair. You can define enumerators for each member and so it's obvious what each member is:
typedef boost::tuple<int, int> KeyValueTuple;
enum {
KEY
, VALUE
};
void foo (KeyValueTuple & p) {
p.get<KEY> () = 0;
p.get<VALUE> () = 0;
}
void bar (int key, int value)
{
foo (boost:tie (key, value));
}
BTW, comments welcome on if there is a hidden cost to using this approach.
EDIT: Remove names from global scope.
Just a quick comment regarding global namespace. In general I would use:
struct KeyValueTraits
{
typedef boost::tuple<int, int> Type;
enum {
KEY
, VALUE
};
};
void foo (KeyValueTuple::Type & p) {
p.get<KeyValueTuple::KEY> () = 0;
p.get<KeyValueTuple::VALUE> () = 0;
}
It does look to be the case that boost::fusion does tie the identity and value closer together.
As Alex mentioned, std::pair is very convenient but when it gets confusing create a structure and use it in the same way, have a look at std::pair code, it's not that complex.
I don't like std::pair as used in std::map either, map entries should have had members key and value.
I even used boost::MIC to avoid this. However, boost::MIC also comes with a cost.
Also, returning a std::pair results in less than readable code:
if (cntnr.insert(newEntry).second) { ... }
???
I also found that std::pair is commonly used by the lazy programmers who needed 2 values but didn't think why these values where needed together.