Sorting a vector of custom structs based on frequency

I need to find the most frequent element in an array of custom structs. The structs have no unique ID, only matching properties.
I was thinking of sorting my vector by frequency, but I have no idea how to do that.

I'm assuming that by frequency you mean the number of times an identical struct appears in the array.
You probably want to write a hash function for your custom struct (or specialize std::hash<> for your type). Then iterate over your array, incrementing the value in an unordered_map<mytype, int> for every struct in the array. This gives you each struct's frequency in the value field. Something like the following would work:
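#include <unordered_map>
#include <vector>

std::vector<mytype> elements;           // the input (std::array would also need a size)
std::unordered_map<mytype, int> freq;   // struct -> number of occurrences
mytype most_frequent{};
int max_frequency = 0;
for (const mytype &el : elements) {
    int count = ++freq[el];             // occurrences of el seen so far
    if (count > max_frequency) {        // new leader: remember it and its count
        max_frequency = count;
        most_frequent = el;
    }
}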
For this to work, the map must be able to hash mytype and to compare two mytype values with operator==. For hashing it uses std::hash<mytype> by default. The standard expressly allows you to specialize this template in namespace std for your own types. You could do this as follows:
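#include <functional>
#include <string>

struct mytype {
    std::string name;
    double value;
};

// unordered_map also needs to compare keys for equality
bool operator==(const mytype &a, const mytype &b) {
    return a.name == b.name && a.value == b.value;
}

namespace std {
template <> struct hash<mytype> {
    size_t operator()(const mytype &t) const noexcept {
        // Use standard library hash implementations of member variable types
        return hash<string>()(t.name) ^ hash<double>()(t.value);
    }
};
}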
The hard requirement is that equal values produce equal hashes. The practical goal is that values that differ are unlikely to produce the same hash. The above XORs together the results of the standard library's hash function for each member type. According to Mark Nelson, this is probably about as good as the individual hash functions being combined. An alternative algorithm suggested by cppreference's hash reference is the Fowler-Noll-Vo hash function.
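Putting these pieces together, a minimal compilable sketch might look like the following. It assumes the mytype definition shown above and keeps the answer's XOR-based hash. It also wraps the counting loop in a hypothetical most_frequent() helper. The helper name and the tie-breaking behaviour are illustrative choices, not part of the original answer.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct mytype {
    std::string name;
    double value;
};

// unordered_map compares keys with operator== after hashing them.
bool operator==(const mytype &a, const mytype &b) {
    return a.name == b.name && a.value == b.value;
}

namespace std {
template <> struct hash<mytype> {
    size_t operator()(const mytype &t) const noexcept {
        // Combine the standard hashes of the members (plain XOR, as above).
        return hash<string>()(t.name) ^ hash<double>()(t.value);
    }
};
}

// Hypothetical helper: returns the element that occurs most often. On ties,
// the one that first reached the highest count wins.
// Precondition: elements is non-empty.
mytype most_frequent(const std::vector<mytype> &elements) {
    std::unordered_map<mytype, int> freq;
    mytype best = elements.front();
    int best_count = 0;
    for (const mytype &el : elements) {
        int count = ++freq[el];   // occurrences of el so far
        if (count > best_count) {
            best_count = count;
            best = el;
        }
    }
    return best;
}
```

For example, given {"a", 1.0}, {"b", 2.0}, {"a", 1.0}, {"c", 3.0}, {"a", 1.0}, it returns the {"a", 1.0} struct. The helper requires a non-empty vector because it calls front().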

Look at std::sort and the example on its reference page, where you pass your own comparator to get the ordering you want (in your case, one that compares frequencies). You can of course use a lambda as the comparator.
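A sketch of that comparator approach, assuming the frequencies are first counted in an unordered_map as in the previous answer. sort_by_frequency is a hypothetical helper name. Falling back to operator< on ties is a design choice that keeps equal elements adjacent. With mytype you would also have to define an operator<.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical helper: sorts v so that the most frequent values come first.
// Assumes T has operator< and a std::hash specialization (int, std::string, ...).
template <typename T>
void sort_by_frequency(std::vector<T> &v) {
    std::unordered_map<T, int> freq;
    for (const T &x : v) ++freq[x];            // first pass: count occurrences

    std::sort(v.begin(), v.end(), [&freq](const T &a, const T &b) {
        int fa = freq.at(a), fb = freq.at(b);
        if (fa != fb) return fa > fb;            // higher frequency first
        return a < b;                            // tie-break keeps groups together
    });
}
```

For instance, {3, 1, 2, 3, 2, 3} becomes {3, 3, 3, 2, 2, 1}. If you only need the single most frequent element, the counting loop alone is cheaper than a full sort.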

Related

Using an unordered_map with arrays as keys

I don't understand why I can't have an unordered_map with an array<int,3> as the key type:
#include <array>
#include <unordered_map>
using namespace std;
int main() {
    array<int,3> key = {0,1,2};
    unordered_map< array<int,3> , int > test;
    test[key] = 2;
    return 0;
}
I get a long error, the most pertinent part being
main.cpp:11:9: error: no match for ‘operator[]’ (operand types are ‘std::unordered_map<std::array<int, 3ul>, int>’ and ‘std::array<int, 3ul>’)
test[key] = 2;
^
Are arrays not eligible as keys because they lack some requirement?
You have to implement a hash. A hash table hashes each key to find the bucket the element belongs in. C++ doesn't automatically know how to hash every type. In this case, the standard library provides no hash for an array of 3 integers. You can implement a simple hash struct like this:
#include <array>
#include <cstddef>
#include <functional>

struct ArrayHasher {
    std::size_t operator()(const std::array<int, 3>& a) const {
        std::size_t h = 0;
        for (auto e : a) {
            h ^= std::hash<int>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};
And then use it:
unordered_map< array<int,3> , int, ArrayHasher > test;
Edit: I changed the function for combining hashes from a naive XOR to the one Boost uses for this purpose: http://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html. This should be robust enough for real use.
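For reference, the asker's program with this fix applied could look like the sketch below. The ArrayMap alias is introduced here only for brevity. The hasher is unchanged from the answer.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Same combining formula as ArrayHasher above (boost::hash_combine style).
struct ArrayHasher {
    std::size_t operator()(const std::array<int, 3> &a) const {
        std::size_t h = 0;
        for (auto e : a) {
            h ^= std::hash<int>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// The asker's map, now with the hasher passed as the third template argument.
using ArrayMap = std::unordered_map<std::array<int, 3>, int, ArrayHasher>;
```

With this in place, test[key] = 2 compiles. std::array already provides the operator== that unordered_map needs for comparing keys, so only the hash was missing.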
Why?
As mentioned in http://www.cplusplus.com/reference/unordered_map/unordered_map/
Internally, the elements in the unordered_map are not sorted in any
particular order with respect to either their key or mapped values,
but organized into buckets depending on their hash values to allow for
fast access to individual elements directly by their key values (with
a constant average time complexity on average).
In your case, that means we need to hash a std::array, and standard C++ does not provide a hash for it.
How to get around it?
If you want to map an array to a value, you must supply your own hash function and pass it to unordered_map as its third template argument. See std::hash for what a hasher must provide: http://en.cppreference.com/w/cpp/utility/hash. The question "C++ how to insert array into hash set?" may also help.
A workaround
If you are free to use Boost, boost::hash can hash arrays and many other types. It is built on the hash_combine function, which you can find in http://www.boost.org/doc/libs/1_49_0/boost/functional/hash/hash.hpp.
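If Boost is not an option, a hash_combine-style helper is short enough to write by hand. The sketch below is not Boost's own code. hash_combine and ArrayHash are hypothetical names, and the mixing formula is the one used in ArrayHasher in the first answer. Because operator() is a template, one hasher covers any std::array<T, N> whose element type has a std::hash.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hand-written stand-in for boost::hash_combine (same mixing formula as
// ArrayHasher in the first answer). Not the Boost function itself.
template <typename T>
void hash_combine(std::size_t &seed, const T &v) {
    seed ^= std::hash<T>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical generic hasher for any std::array<T, N> with hashable elements.
struct ArrayHash {
    template <typename T, std::size_t N>
    std::size_t operator()(const std::array<T, N> &a) const {
        std::size_t seed = 0;
        for (const T &e : a) hash_combine(seed, e);   // fold each element in
        return seed;
    }
};
```

It is used exactly like ArrayHasher, for example std::unordered_map<std::array<int, 3>, int, ArrayHash>.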
The relevant error is
error: no match for call to '(const std::hash<std::array<int, 3ul> >) (const std::array<int, 3ul>&)'
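The unordered_map needs a hash of the key, and by default it looks for a specialization of std::hash to compute it. You might think of extending namespace std with a suitable std::hash specialization. However, the standard only allows that when the specialization depends on a user-defined type, which std::array<int, 3> is not. Passing a custom hasher as the third template argument of unordered_map is the portable alternative.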
Compiling with MSVC 14 gives the following error:
"The C++ Standard doesn't provide a hash for this type."
This is fairly self-explanatory.

Find in Vector of a Struct

My program has the following struct:
struct data
{
    int integers; //input of integers
    int times;    //number of times of appearance
};
and there is a vector of this struct
std::vector<data> inputs;
Then I read an integer, current_int, from a file:
std::fstream openFile("input.txt");
int current_int; //current_int is the value I want to look for in my vector of structs (specifically in the integers member)
openFile >> current_int;
I want to check whether current_int is already stored in my vector inputs.
From what I've read, you search a vector with an iterator like this:
it = std::find(inputs.begin(), inputs.end(), current_int);
but will this work when the value is inside a struct? Please help.
There are two variants of find:
find() searches for a plain value. In your case you have a vector of data, so the value passed to find() has to be a data.
find_if() takes a predicate and returns the first position where the predicate returns true.
Using the latter, you can easily match one field of your struct:
auto it = std::find_if(inputs.begin(), inputs.end(),
[current_int] (const data& d) {
return d.integers == current_int;
});
Note that the above uses a C++11 lambda function. Doing this in earlier versions of C++ requires you to create a functor instead.
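Since the struct also has a times field, the asker presumably wants to count appearances. The following is a minimal sketch of that, assuming the data struct from the question. record() is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

struct data {
    int integers;  // the value read from the file
    int times;     // how many times it has appeared
};

// Hypothetical helper: records one occurrence of value in inputs, appending
// a new entry with times = 1 if the value is not stored yet.
void record(std::vector<data> &inputs, int value) {
    auto it = std::find_if(inputs.begin(), inputs.end(),
                           [value](const data &d) { return d.integers == value; });
    if (it != inputs.end()) {
        ++it->times;                    // already stored: bump its count
    } else {
        inputs.push_back({value, 1});   // first time we see this value
    }
}
```

In the reading loop this would be while (openFile >> current_int) record(inputs, current_int);. Note that find_if is a linear search. For large inputs, a std::map<int, int> or std::unordered_map<int, int> from value to count is simpler and faster.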

Store different data types in map - with info on type

I need to parse and store a somewhat (but not too) complex stream and need to store the parsed result somehow. The stream essentially contains name-value pairs with values possibly being of different type for different names. Basically, I end up with a map of key (always string) to a pair <type, value>.
I started with something like this:
typedef enum ValidType {STRING, INT, FLOAT, BINARY} ValidType;
map<string, pair<ValidType, void*>> Data;
However I really dislike void* and storing pointers. Of course, I can always store the value as binary data (vector<char> for example), in which case the map would end up being
map<string, pair<ValidType, vector<char>>> Data;
Yet, in this case I would have to parse the binary data every time I need the actual value, which would be quite expensive in terms of performance.
Considering that I am not too worried about memory footprint (the amount of data is not large), but I am concerned about performance, what would be the right way to store such data?
Ideally, I'd like to avoid using Boost, as that would increase the size of the final app by a factor of 3 if not more and I need to minimise that.
You're looking for a discriminated (or tagged) union.
Boost.Variant is one example, and Boost.Any is another. Are you so sure Boost will increase your final app size by a factor of 3? I would have thought variant was header-only, in which case you don't need to link any libraries.
If you really can't use Boost, implementing a simple discriminated union isn't so hard (a general and fully-correct one is another matter), and at least you know what to search for now.
For completeness, a naive discriminated union might look like:
class DU
{
public:
    enum TypeTag { None, Int, Double };
    class DUTypeError {};
private:
    TypeTag type_;
    union {
        int i;
        double d;
    } data_;
    void typecheck(TypeTag tt) const { if (type_ != tt) throw DUTypeError(); }
public:
    DU() : type_(None) {}
    DU(DU const &other) : type_(other.type_), data_(other.data_) {}
    DU& operator= (DU const &other) {
        type_ = other.type_; data_ = other.data_; return *this;
    }
    TypeTag type() const { return type_; }
    bool istype(TypeTag tt) const { return type_ == tt; }
#define CONVERSIONS(TYPE, ENUM, MEMBER) \
    explicit DU(TYPE val) : type_(ENUM) { data_.MEMBER = val; } \
    operator TYPE & () { typecheck(ENUM); return data_.MEMBER; } \
    operator TYPE const & () const { typecheck(ENUM); return data_.MEMBER; } \
    DU& operator=(TYPE val) { type_ = ENUM; data_.MEMBER = val; return *this; }
    CONVERSIONS(int, Int, i)
    CONVERSIONS(double, Double, d)
};
Now, there are several drawbacks:
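you can't store non-POD types in the union (C++11 unrestricted unions relax this, but then you must construct and destroy the active member by hand)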
adding a type means modifying the enum, and the union, and remembering to add a new CONVERSIONS line (it would be even worse without the macro)
you can't use the visitor pattern with this (or, you'd have to write your own dispatcher for it), which means lots of switch statements in the client code
every one of these switches may also need updating if you add a type
if you did write a visitor dispatch, that needs updating if you add a type, and so may every visitor
you need to manually reproduce something like the built-in C++ type-conversion rules if you want to do anything like arithmetic with these (i.e., operator double could promote an Int instead of only handling Double ... but only if you hand-roll every operator)
I haven't implemented operator== precisely because it needs a switch. You can't just memcmp the two unions when the types match, because identical 32-bit integers could still compare different if the extra space required for the double holds a different bit pattern.
Some of these issues can be addressed if you care about them, but it's all more work. Hence my preference for not re-inventing this particular wheel if it can be avoided.
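If C++17 is available, std::variant is the standard-library counterpart of Boost.Variant and needs no Boost at all. Below is a minimal sketch, assuming one alternative per ValidType from the question, with BINARY stored as std::vector<char>. get_int_or is a hypothetical helper showing a type-checked read.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// One alternative per ValidType in the question (BINARY as raw bytes).
// Requires C++17.
using Value = std::variant<std::string, int, float, std::vector<char>>;

std::map<std::string, Value> Data;

// Hypothetical helper: returns the int stored under key, or fallback if the
// key is missing or holds a different type.
int get_int_or(const std::map<std::string, Value> &m, const std::string &key, int fallback) {
    auto it = m.find(key);
    if (it == m.end()) return fallback;
    if (const int *p = std::get_if<int>(&it->second)) return *p;   // type check
    return fallback;
}
```

The variant records which alternative it holds (index(), std::holds_alternative), so the separate ValidType tag becomes unnecessary. std::visit provides the visitor dispatch that the hand-written union lacks.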
Since your data types are fixed, how about something like this:
Keep a std::vector for each type of value.
Your map then stores, as the second element of the pair, the index of the value in the matching vector.
std::vector<int> vInt;
std::vector<float> vFloat;
// ... one vector per remaining ValidType
map<std::string, std::pair<ValidType, int>> Data;
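Here is a fuller sketch of this index-based layout, assuming the ValidType enum from the question. Store and its member functions are hypothetical names. Only INT and FLOAT are filled in to keep it short.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum ValidType { STRING, INT, FLOAT, BINARY };

// Hypothetical class: one storage vector per type, and a map from key to
// (type, index into that type's vector).
class Store {
public:
    void setInt(const std::string &key, int v) {
        vInt.push_back(v);
        Data[key] = {INT, static_cast<int>(vInt.size()) - 1};
    }
    void setFloat(const std::string &key, float v) {
        vFloat.push_back(v);
        Data[key] = {FLOAT, static_cast<int>(vFloat.size()) - 1};
    }
    // at() throws std::out_of_range if the key is missing.
    ValidType typeOf(const std::string &key) const { return Data.at(key).first; }
    // Callers should check typeOf() first; the index is only valid for its own vector.
    int getInt(const std::string &key) const { return vInt[Data.at(key).second]; }
    float getFloat(const std::string &key) const { return vFloat[Data.at(key).second]; }

private:
    std::vector<int> vInt;
    std::vector<float> vFloat;
    std::map<std::string, std::pair<ValidType, int>> Data;
};
```

One consequence of this design is that overwriting a key appends a new value and leaves the old one unreachable in its vector. That is fine for parse-once data but wasteful if values are updated often.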
You can implement a multi-type map by leveraging std::tuple, which (since C++14) lets you access an element by its type with std::get<T>. You can wrap this to provide access by arbitrary keys. An in-depth explanation (and quite an interesting read) is available here:
https://jguegant.github.io/blogs/tech/thread-safe-multi-type-map.html
Modern C++ features provide great ways to solve old problems.
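The article's design is more elaborate (it is thread-safe), but the core idea fits in a few lines. Keep one std::map per value type inside a std::tuple and pick the right map with std::get<T>. MultiTypeMap is a hypothetical name, and this sketch does no locking.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical multi-type map: one std::map<std::string, T> per type in Ts...,
// selected at compile time with std::get<T> on the tuple (C++14 and later).
template <typename... Ts>
class MultiTypeMap {
public:
    template <typename T>
    void set(const std::string &key, T value) {
        std::get<std::map<std::string, T>>(maps_)[key] = std::move(value);
    }

    // Returns a pointer to the stored value, or nullptr if key has no T value.
    template <typename T>
    const T *get(const std::string &key) const {
        const auto &m = std::get<std::map<std::string, T>>(maps_);
        auto it = m.find(key);
        return it == m.end() ? nullptr : &it->second;
    }

private:
    std::tuple<std::map<std::string, Ts>...> maps_;
};
```

Because the map is selected by type at compile time, asking for a type that is not in the Ts... list fails to compile rather than failing at run time.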

C++ check if element is std::vector

I would like to iterate through a vector and check whether its elements are vectors or strings. I also need a way to pass different vectors to a function.
Something like this:
using namespace std;
string toCustomString(<some vector> vec) {
    string ret = "";
    for (size_t i = 0; i < vec.size(); ++i) {
        if (vec[i] == %vector%)              // pseudocode: "is vec[i] a vector?"
            ret += toCustomString(vec[i]);
        else                                 // if type of vec[i] is string
            ret += "foo" + vec[i] + "bar";
    }
    return ret;
}
First, I need to know how to correctly check whether vec[i] is a std::vector.
Then I need to know how to define the parameter so that the function accepts any kind of (multidimensional) vector.
std::vector can only contain one type - that is the T in std::vector<T>, which can be accessed with the member value_type.
What you probably are looking for is template specialization:
template<typename T>
string toCustomString(std::vector<T> vec) {
    // general case
}
template<>
string toCustomString<std::string>(std::vector<std::string> vec) {
    // strings
}
(if you want to partially specialize it over all vectors then you'll need to lift it to a struct)
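Lifting it to a struct could look like the sketch below. A class template can be partially specialized for std::vector<T>, which a function template cannot. CustomStringer is a hypothetical name, and the "foo"/"bar" wrapping is taken from the question.

```cpp
#include <string>
#include <vector>

// Primary template: declared only, so unsupported types fail to compile.
template <typename T>
struct CustomStringer;

// Full specialization for the leaf case: a plain string.
template <>
struct CustomStringer<std::string> {
    static std::string apply(const std::string &s) { return "foo" + s + "bar"; }
};

// Partial specialization for every std::vector<T>: recurse into the elements.
template <typename T>
struct CustomStringer<std::vector<T>> {
    static std::string apply(const std::vector<T> &vec) {
        std::string ret;
        for (const T &e : vec) ret += CustomStringer<T>::apply(e);
        return ret;
    }
};

// Convenience function that deduces T and forwards to the struct.
template <typename T>
std::string toCustomString(const T &value) {
    return CustomStringer<T>::apply(value);
}
```

Any type without a matching specialization fails to compile, because the primary template is only declared, never defined.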
If you really want to store both strings and vectors in the same vector, look at Boost.Variant and Boost.Any.
Generally, your <some vector> vec would have type either vector<string> or vector<vector<string>>, for example.
In order to declare the variable, you need its type, and its type also specifies exactly what it stores.
Now, you can work around this using Boost.Variant (or roll your own discriminated union), like so:
typedef boost::variant<std::string, std::vector<std::string>> Vec_of_StringOrVec;
but Dirk Holsopple is right that this isn't idiomatic C++, and you may be better off looking for a different approach.
As everyone says, vectors in C++ only hold one type. There's no need or point in checking the type of each element in turn, which is just as well because there's no way to do that. What you do instead is overload the function on the type of the argument. Something like this:
string toCustomString(const string &str) {
    return "foo" + str + "bar";
}
template <typename T>
string toCustomString(const std::vector<T> &vec) {
    string ret;
    for (size_t i = 0; i < vec.size(); ++i)
        ret += toCustomString(vec[i]);
    return ret;
}
Now, if someone passes a vector<string> into toCustomString then the call toCustomString(vec[i]) will select the toCustomString(const string &str) overload.
If someone passes a vector<int> into toCustomString then the code won't compile, because there is (currently) no toCustomString(int) overload[*].
If someone passes a vector<vector<string>> to toCustomString then toCustomString(vec[i]) will pass a vector<string>, see above.
In all three cases, different toCustomString functions are called. In the first case it's toCustomString<string>(const vector<string>&), which is a different instantiation of the toCustomString template from the one in the third case, toCustomString<vector<string>>(const vector<vector<string>>&). The middle case tries to instantiate toCustomString<int>, but fails because toCustomString(vec[i]) doesn't match any function it knows about.
All of this is determined at compile time. The point of templates is to create multiple functions (or classes) with particular differences between them. In this case the difference is the type of vector passed in.
[*] This seems in line with your claim that vec[i] must be either a vector or a string, not any third option. If you wanted for example the return value for a vector<something_else> to be empty, then you could add a catch-all template:
template <typename T>
string toCustomString(const T &) {
    return string();
}
and of course you can add more overloads for any other types you want to handle.
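Assembled into one compilable unit, the overloads from this answer might look like the sketch below. One ordering detail matters: the catch-all is declared before the vector template. The unqualified call inside the template only finds functions declared before it, plus those found by argument-dependent lookup, which finds nothing for int.

```cpp
#include <string>
#include <vector>

// Leaf case: a plain string.
std::string toCustomString(const std::string &str) {
    return "foo" + str + "bar";
}

// Catch-all for any other type. It is declared before the vector template so
// the call inside that template can find it (e.g. for int elements).
template <typename T>
std::string toCustomString(const T &) {
    return std::string();
}

// Recursive case: any std::vector<T>. It is more specialized than the
// catch-all, so overload resolution prefers it for vectors.
template <typename T>
std::string toCustomString(const std::vector<T> &vec) {
    std::string ret;
    for (const T &e : vec)
        ret += toCustomString(e);
    return ret;
}
```

A string literal such as toCustomString("x") picks the catch-all, because a const char array is not a std::string. Pass std::string("x") to get "fooxbar".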