Proper way to use large amount of known constant variables - c++

The program receives a vector that represents a character.
It then compares the received vector with all the known vectors that represents characters.
I'm not sure how should I use the known vectors.
A few options I thought of:
1) Using global variables:
vector<int> charA{1,2,3,4,5};
vector<int> charB{5,3,7,1};
...
vector<int> charZ{3,2,5,6,8,9,0}
char getLetter(const vector<int> &input){
if(compareVec(input,charA) return 'A';
if(compareVec(input,charB) return 'B';
....
if(compareVec(input,charZ) return 'Z';
}
2) Declaring all variables in function:
char getLetter(const vector<int> &input){
vector<int> charA{1,2,3,4,5};
vector<int> charB{5,3,7,1};
...
vector<int> charZ{3,2,5,6,8,9,0}
if(compareVec(input,charA) return 'A';
if(compareVec(input,charB) return 'B';
....
if(compareVec(input,charZ) return 'Z';
}
3) Passing the variables
char getLetter(const vector<int> &input, vector<int> charA,
vector<int> charB... , vecotr<int> charZ){
if(compareVec(input,charA) return 'A';
if(compareVec(input,charB) return 'B';
....
if(compareVec(input,charZ) return 'Z';
}

This sounds like an application for a perfect hash generator (link to GNU gperf).
To quote the documentation
gperf is a perfect hash function generator written in C++. It
transforms an n element user-specified keyword set W into a perfect
hash function F. F uniquely maps keywords in W onto the range 0..k,
where k >= n-1. If k = n-1 then F is a minimal perfect hash function.
gperf generates a 0..k element static lookup table and a pair of C
functions. These functions determine whether a given character string
s occurs in W, using at most one probe into the lookup table.
If this is not a suitable solution then I'd recommend using function statics. You want to avoid function locals as this will badly affect performance, and globals will pollute your namespace.
So something like
char getLetter(const vector<int> &input){
static vector<int> charA{1,2,3,4,5};
static vector<int> charB{5,3,7,1};

Giving you snippet, I'd go for:
char getLetter(const vector<int> &input)
{
struct
{
char result;
std::vector<char> data;
} const data[]=
{
{ 'A', {1,2,3,4,5}, },
{ 'B', {5,3,7,1}, },
...
};
for(auto const & probe : data)
{
if (comparevec(input, probe.data))
return probe.result;
}
// input does not match any of the given values
throw "That's not the input I'm looking for!";
}
For 40 such pairs, if this is not called in a tight inner loop, the linear search is good enough.
Alternatives:
use a std::map<std::vector<char>, char> to map valid values to results, and turn compareVec into a functor suitable as key-comaprison for the map, and initialize it the same way.
as above, but use a std::unordered_map.
use gperf, as suggested by #PaulFloyd above

I would start by suggesting that you hash or represent the numbers in their binary collection so that you are not comparing vectors each time as that would prove very costly. That said, your question is about how to make a dictionary, so whether you improve your keys as I suggested or not, I'd prefer the use of a map:
map<vector<int>, char, function<bool(const vector<int>&, const vector<int>&)>> dictionary([](const auto& lhs, const auto& rhs){
const auto its = mismatch(cbegin(lhs), cend(lhs), cbegin(rhs), cend(rhs));
return its.second != cend(rhs) && (its.first == cend(lhs) || *its.first < *its.second);
});
If possible dictionary should be constructed constant with an initializer_list containing all mappings and the comparator. If mappings must be looked up before you are guaranteed to have finished all letters then you obviously can't construct constant. Either way this map should be a private member of the class responsible for translating strings. Adding and mapping should be public functions of the class.
Live Example

Related

map comparator for pair of objects in c++

I want to use a map to count pairs of objects based on member input vectors. If there is a better data structure for this purpose, please tell me.
My program returns a list of int vectors. Each int vector is the output of a comparison between two int vectors ( a pair of int vectors). It is, however, possible, that the output of the comparison differs, though the two int vectors are the same (maybe in different order). I want to store how many different outputs (int vectors) each pair of int vectors has produced.
Assuming that I can access the int vector of my object with .inp()
Two pairs (a1,b1) and (a2,b2) should be considered equal, when (a1.inp() == a2.inp() && b2.inp() == b1.inp()) or (a1.inp() == b2.inp() and b1.inp() == a2.inp()).
This answer says:
The keys in a map a and b are equivalent by definition when neither a
< b nor b < a is true.
class SomeClass
{
vector <int> m_inputs;
public:
//constructor, setter...
vector<int> inp() {return m_inputs};
}
typedef pair < SomeClass, SomeClass > InputsPair;
typedef map < InputsPair, size_t, MyPairComparator > InputsPairCounter;
So the question is, how can I define equivalency of two pairs with a map comparator. I tried to concatenate the two vectors of a pair, but that leads to (010,1) == (01,01), which is not what I want.
struct MyPairComparator
{
bool operator() (const InputsPair & pair1, const InputsPair pair2) const
{
vector<int> itrc1 = pair1.first->inp();
vector<int> itrc2 = pair1.second->inp();
vector<int> itrc3 = pair2.first->inp();
vector<int> itrc4 = pair2.second->inp();
// ?
return itrc1 < itrc3;
}
};
I want to use a map to count pairs of input vectors. If there is a better data structure for this purpose, please tell me.
Using std::unordered_map can be considered instead due to 2 reasons:
if hash implemented properly it could be faster than std::map
you only need to implement hash and operator== instead of operator<, and operator== is trivial in this case
Details on how implement hash for std::vector can be found here. In your case possible solution could be to join both vectors into one, sort it and then use that method to calculate the hash. This is straightforward solution, but can produce to many hash collisions and lead to worse performance. To suggest better alternative would require knowledge of the data used.
As I understand, you want:
struct MyPairComparator
{
bool operator() (const InputsPair& lhs, const InputsPair pair2) const
{
return std::minmax(std::get<0>(lhs), std::get<1>(lhs))
< std::minmax(std::get<0>(rhs), std::get<1>(rhs));
}
};
we order the pair {a, b} so that a < b, then we use regular comparison.

C++ using an unordered key combination for a map lookup

I want to create an unordered_map, where the key is a combination of two integers. As the key values order shall be ignored when comparing, I thought of using an unordered_set as key like this:
#include <unordered_set>
#include <unordered_map>
using namespace std;
int main ()
{
unordered_set<int> key_set1 = {21, 42};
unordered_map<unordered_set<int>, char> map;
map[key_set1] = 'a';
...
unordered_set<int> key_set2 = {42, 21};
if(map[key_set2] == map[key_set2])
success();
}
On compile time it looks like some problem with the hash function:
error: no match for call to ‘(const std::hash<std::unordered_set<int> >) (const std::unordered_set<int>&)’
noexcept(declval<const _Hash&>()(declval<const _Key&>()))>
How can I solve this? Or is there a better way/data structure?
There is no predefined hash function for an unordered_set so you have to implement your own; there's documentation for that here http://en.cppreference.com/w/cpp/utility/hash.
Basically you'd need:
// custom specialization of std::hash can be injected in namespace std
namespace std
{
template<> struct hash<unordered_set<int>>
{
std::size_t operator()(unordered_set<int> const& s) const
{
std::size_t hash = 0;
for (auto && i : s) hash ^= std::hash<int>()(i);
return hash;
}
};
}
Now xor isn't the recommended way to combine hash functions, but it should work in this case specifically because it's both unordered and set. Because it's unordered you need a function that's commutative. The recommended hash combiners don't have this property as you usually want "abc" to hash differently than "bca". Secondly the fact that it's a set insures that you won't have any duplicate elements. This saves your hash function from failing because x ^ x == 0.
I should also mention that you want to define this in the cpp file so you don't expose this specific hash implementation on std types to everyone.
The problem is that unordered_set is not built for being used as a key in an unordered container.
If you always use exactly two ints, it would be more economical for you to use a pair of ints as a key, and add a function that makes a properly ordered pair from two integers:
pair<int,int> unordered_key(int a, int b) {
return a<b?make_pair(a, b):make_pair(b, a);
}
As pointed out earlier to use std::pair directly as a key you would need to explicitly define a hash function for it. If you want to avoid that, you can just do a bit-wise combination of 2 unsigned integers into 1:
uint64_t makeKey(uint32_t a, uint32_t b)
{
return a < b ? (static_cast<uint64_t>(a) << 32) + b : (static_cast<uint64_t>(b) << 32) + a;
}
int main ()
{
auto key_set1 = makeKey(21, 42);
unordered_map<uint64_t, char> map;
map[key_set1] = 'a';
//...
auto key_set2 = makeKey(42, 21);
if(map[key_set1] == map[key_set2])
std::cout << "success" << std::endl;
}
Since the order is not important here, you can use std::pair with a customized factory to force the order of the two integers:
std::pair<int, int> make_my_pair(int x, int y) {
return std::make_pair(std::min(x, y), std::max(x, y));
}
Of course this is only going to work if you use make_my_pair consistently.
Alternatively you can define your own key class that has a similar property.

hash function for a vector of pair<int, int>

I'm trying to implement an unordered_map for a vector< pair < int,int> >. Since there's no such default hash function, I tried to imagine a function of my own :
struct ObjectHasher
{
std::size_t operator()(const Object& k) const
{
std::string h_string("");
for (auto i = k.vec.begin(); i != k.vec.end(); ++i)
{
h_string.push_back(97+i->first);
h_string.push_back(47); // '-'
h_string.push_back(97+i->second);
h_string.push_back(43); // '+'
}
return std::hash<std::string>()(h_string);
}
};
The main idea is to change the list of integers, say ( (97, 98), (105, 107) ) into a formatted string like "a-b+i-k" and to compute its hash thanks to hash < string >(). I choosed the 97, 48 and 43 numbers only to allow the hash string to be easily displayed in a terminal during my tests.
I know this kind of function might be a very naive idea since a good hash function should be fast and strong against collisions. Well, if the integers given to push_back() are greater than 255 I don't know what might happen... So, what do you think of the following questions :
(1) is my function ok for big integers ?
(2) is my function ok for all environments/platforms ?
(3) is my function too slow to be a hash function ?
(4) ... do you have anything better ?
All you need is a function to "hash in" an integer. You can steal such a function from boost:
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
std::hash<T> hasher;
seed ^= std::hash<T>(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
Now your function is trivial:
struct ObjectHasher
{
std::size_t operator()(const Object& k) const
{
std::size_t hash = 0;
for (auto i = k.vec.begin(); i != k.vec.end(); ++i)
{
hash_combine(hash, i->first);
hash_combine(hash, i->second);
}
return hash;
}
};
This function is is probably very slow compared to other hash functions since it uses dynamic memory allocation. Also std::hash<std::string> Is not a very good hash function since it is very general. It's probably better to XOR all ints and use std::hash<int>.
This is a perfectly valid solution. All a hash function needs is a sequence of bytes and by concatenating your elements together as a string you are providing a unique byte representation of the map.
Of course this could become unruly if your map contains a large number of items.

How to use std::string as key in stxxl::map

I am trying to use std::string as a key in the stxxl::map
The insertion was fine for small number of strings about 10-100.
But while trying to insert large number of strings about 100000 in it, I am getting segmentation fault.
The code is as follows:
struct CompareGreaterString {
bool operator () (const std::string& a, const std::string& b) const {
return a > b;
}
static std::string max_value() {
return "";
}
};
// template parameter <KeyType, DataType, CompareType, RawNodeSize, RawLeafSize, PDAllocStrategy (optional)>
typedef stxxl::map<std::string, unsigned int, CompareGreaterString, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> name_map;
name_map strMap((name_map::node_block_type::raw_size)*3, (name_map::leaf_block_type::raw_size)*3);
for (unsigned int i = 0; i < 1000000; i++) { /// Inserting 1 million strings
std::stringstream strStream;
strStream << (i);
Console::println("Inserting: " + strStream.str());
strMap[strStream.str()]=i;
}
In here I am unable to identify why I am unable to insert more number of strings. I am getting segmentation fault exactly while inserting "1377". Plus I am able to add any number of integers as key. I feel that the variable size of string might be causing this trouble.
Also I am unable to understand what to return for max_value of the string. I simply returned a blank string.
According to documentation:
CompareType must also provide a static max_value method, that returns a value of type KeyType that is larger than any key stored in map
Because empty string happens to compare as smaller than any other string, it breaks this precondition and may thus cause unspecified behaviour.
Here's a max_value that should work. MAX_KEY_LEN is just an integer which is larger or equal to the length of the longest possible string key that the map can have.
struct CompareGreaterString {
// ...
static std::string max_value() {
return std::string(MAX_KEY_LEN, std::numeric_limits<unsigned char>::max());
}
};
I have finally found the solution to my problem with great help from Timo bingmann, user2079303 and Martin Ba. Thank you.
I would like to share it with you.
Firstly stxxl supports POD only. That means it stores fixed sized structures only. Hence std::string cannot be a key. stxxl::map worked for about 100-1000 strings because they were contained in the physical memory itself. When more strings are inserted it has to write on disk which is internally causing some problems.
Hence we need to use a fixed string using char[] as follows:
static const int MAX_KEY_LEN = 16;
class FixedString {
public:
char charStr[MAX_KEY_LEN];
bool operator< (const FixedString& fixedString) const {
return std::lexicographical_compare(charStr, charStr+MAX_KEY_LEN,
fixedString.charStr, fixedString.charStr+MAX_KEY_LEN);
}
bool operator==(const FixedString& fixedString) const {
return std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
}
bool operator!=(const FixedString& fixedString) const {
return !std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
}
};
struct comp_type : public std::less<FixedString> {
static FixedString max_value()
{
FixedString s;
std::fill(s.charStr, s.charStr+MAX_KEY_LEN, 0x7f);
return s;
}
};
Please note that all the operators mainly((), ==, !=) need to be overriden for all the stxxl::map functions to work
Now we may define fixed_name_map for map as follows:
typedef stxxl::map<FixedString, unsigned int, comp_type, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> fixed_name_map;
fixed_name_map myFixedMap((fixed_name_map::node_block_type::raw_size)*5, (fixed_name_map::leaf_block_type::raw_size)*5);
Now the program is compiling fine and is accepting about 10^8 strings without any problem.
also we can use myFixedMap like std::map itself. {for ex: myFixedMap[fixedString] = 10}
If you are using C++11, then as an alternative to the FixedString class you could use std::array<char, MAX_KEY_LEN>. It is an STL layer on top of an ordinary fixed-size C array, implementing comparisons and iterators as you are used to from std::string, but it's a POD type, so STXXL should support it.
Alternatively, you can use serialization_sort in TPIE. It can sort elements of type std::pair<std::string, unsigned int> just fine, so if all you need is to insert everything in bulk and then access it in bulk, this will be sufficient for your case (and probably faster depending on the exact case).

How to use a hash_map with case insensitive unicode string for key?

I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase) but in C++. This is roughly what I'm trying:
stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(stdext::pair<LPCWSTR, SomeStruct>(L"a string", struct));
someMap.find(L"a string")
someMap.find(L"A STRING")
The trouble is, neither find operation usually works (it returns someMap.end()). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
How can I get a dictionary-like structure that uses case-insensitive keys and can store my custom struct?
The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.
Scanning the docs, I think that what you have to do is this, which basically replaces the use of StringComparer.OrdinalIgnoreCase you had in your Dictionary:
struct my_hash_compare {
const size_t bucket_size = 4;
const size_t min_buckets = 8;
size_t operator()(const LPCWSTR &Key) const {
// implement a case-insensitive hash function here,
// or find something in the Windows libraries.
}
bool operator()(const LPCWSTR &Key1, const LPCWSTR &Key2) const {
// implement a case-insensitive comparison function here
return _wcsicmp(Key1, Key2) < 0;
// or something like that. There's warnings about
// locale plastered all over this function's docs.
}
};
I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==. That is, they might require that if my_hash_compare()(a,b) is false, and my_hash_compare()(b,a) is false, then a == b. Obviously that's not true for what I've written, in which case you're out of luck.
As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert -> lowercase -> uppercase, instead of just -> uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
Also as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert, in which case you should use wstring as the key.
There was some great information given here. I gathered bits and pieces from the answers and put this one together:
#include "stdafx.h"
#include "atlbase.h"
#include <map>
#include <wchar.h>
typedef std::pair<std::wstring, int> MyPair;
struct key_comparer
{
bool operator()(std::wstring a, std::wstring b) const
{
return _wcsicmp(a.c_str(), b.c_str()) < 0;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::map<std::wstring, int, key_comparer> mymap;
mymap.insert(MyPair(L"GHI",3));
mymap.insert(MyPair(L"DEF",2));
mymap.insert(MyPair(L"ABC",1));
std::map<std::wstring, int, key_comparer>::iterator iter;
iter = mymap.find(L"def");
if (iter == mymap.end()) {
printf("No match.\n");
} else {
printf("match: %i\n", iter->second);
}
return 0;
}
If you use an std::map instead of the non-standard hash_map, you can set the comparison function to be used when doing the binary search:
// Function object for case insensitive comparison
struct case_insensitive_compare
{
case_insensitive_compare() {}
// Function objects overloader operator()
// When used as a comparer, it should function as operator<(a,b)
bool operator()(const std::string& a, const std::string& b) const
{
return to_lower(a) < to_lower(b);
}
std::string to_lower(const std::string& a) const
{
std::string s(a);
std::for_each(s.begin(), s.end(), char_to_lower);
return s;
}
void char_to_lower(char& c) const
{
if (c >= 'A' && c <= 'Z')
c += ('a' - 'A');
}
};
// ...
std::map<std::string, std::string, case_insensitive_compare> someMap;
someMap["foo"] = "Hello, world!";
std::cout << someMap["FOO"] << endl; // Hello, world!
LPCWSTR is a pointer to a null-terminated array of unicode characters and probably not what you want in this case. Use the wstring specialization of basic_string instead.
For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.