C++ hash table without using the STL

I need to create a hash table that has a key as a string, and value as an int. I cannot use STL containers on my target. Is there a suitable hash table class for this purpose?

Here's a quick and dirty C-style hash table I just wrote. It compiles, but is untested locally; still, the idea is there for you to run with as needed. Its performance depends entirely on the keyToHash function. My version will not be high-performance, but again, it demonstrates how to do it.
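#include <assert.h>
#include <string.h>

static const int kMaxKeyLength = 31;
static const int kMaxKeyStringLength = kMaxKeyLength + 1; // + 1 for the terminating '\0'
struct HashEntry
{
    int value;
    char key[kMaxKeyStringLength];
    bool deleted; // tombstone set by hashRemove so later probes keep going
};
static const char kEmptyHash[] = "";
static const int kHashPowerofTwo = 10;
static const int kHashSize = 1 << kHashPowerofTwo;
static const int kHashMask = kHashSize - 1;
static const int kSmallPrimeNumber = 7;  // probe step; odd, so every slot of the table gets visited
static HashEntry hashTable[kHashSize];   // static storage starts zeroed: every slot empty

// True for a slot that has never held a key (a removed key leaves a tombstone instead).
static bool isNeverUsed(const HashEntry *entry)
{
    return !entry->deleted && !strcmp(entry->key, kEmptyHash);
}

unsigned keyToHash(const char key[])
{
    assert(strlen(key) <= (size_t)kMaxKeyLength);
    unsigned hashValue = 0;
    for (size_t i = 0; key[i] != '\0'; i++) // the loop condition was missing "i"
    {
        hashValue += (unsigned char)key[i];
    }
    return hashValue;
}

// Note: does not check whether key is already present.
bool hashAdd(const char key[], const int value)
{
    unsigned hashValue = keyToHash(key);
    for (int probes = 0; probes < kHashSize; probes++)
    {
        HashEntry *entry = &hashTable[hashValue & kHashMask];
        if (entry->deleted || isNeverUsed(entry)) // free slot: (re)use it
        {
            strcpy(entry->key, key);
            entry->value = value;
            entry->deleted = false;
            return true;
        }
        hashValue += kSmallPrimeNumber;
    }
    return false; // table is full
}

// Probes from the key's home slot until it finds the key or a never-used slot.
static HashEntry *hashLookup(const char key[])
{
    unsigned hashValue = keyToHash(key);
    for (int probes = 0; probes < kHashSize; probes++)
    {
        HashEntry *entry = &hashTable[hashValue & kHashMask];
        if (isNeverUsed(entry))
            return NULL;
        if (!entry->deleted && !strcmp(entry->key, key))
            return entry;
        hashValue += kSmallPrimeNumber; // without this, find/remove loop forever on a collision
    }
    return NULL;
}

bool hashFind(const char key[], int *value)
{
    HashEntry *entry = hashLookup(key);
    if (!entry)
        return false;
    *value = entry->value;
    return true;
}

bool hashRemove(const char key[])
{
    HashEntry *entry = hashLookup(key);
    if (!entry)
        return false;
    entry->value = 0;
    entry->key[0] = 0;
    entry->deleted = true; // clearing the key alone would cut probe chains passing through this slot
    return true;
}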
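Since the table's performance hinges on keyToHash, here is one possible drop-in replacement that is not part of the original answer: a minimal sketch of the 32-bit FNV-1a hash, assuming the rest of the table stays as above. The name keyToHashFnv1a is hypothetical; the constants are the published FNV-1a 32-bit offset basis and prime.

```cpp
#include <cassert>
#include <cstdint>

// FNV-1a, 32-bit variant: cheap, and it mixes every byte into the result,
// so anagrams such as "ab" and "ba" no longer collide as they do with a plain sum.
std::uint32_t keyToHashFnv1a(const char key[])
{
    std::uint32_t hash = 2166136261u;             // FNV 32-bit offset basis
    for (const char* p = key; *p != '\0'; ++p)
    {
        hash ^= static_cast<unsigned char>(*p);  // mix in one byte
        hash *= 16777619u;                       // FNV 32-bit prime
    }
    return hash;
}
```

Because hashAdd and friends only use the low kHashPowerofTwo bits (hashValue & kHashMask), a hash that spreads every input byte into those low bits matters more here than raw speed.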

In the case that you know your list of keys ahead of time (or some superset thereof), you can use a perfect hash function generator like gperf. gperf will spit out either C or C++ code.
(You may need to do some work to actually build a container, given the hash function, though.)
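To show what a perfect hash buys you, here is a hand-made one for a tiny, hypothetical key set ("red", "green", "blue"). gperf's generated code looks different, but the principle is the same: every known key maps to its own slot, so a lookup costs one hash plus one strcmp. The names Entry, kTable and lookup are illustrative only.

```cpp
#include <cassert>
#include <cstring>

// The three keys have distinct lengths (3, 5, 4), so strlen(key) % 3 sends them
// to distinct slots 0, 2, 1: a perfect (and minimal) hash for this key set.
struct Entry { const char* key; int value; };

static const Entry kTable[3] = {
    {"red", 1},    // strlen 3 -> slot 0
    {"blue", 3},   // strlen 4 -> slot 1
    {"green", 2},  // strlen 5 -> slot 2
};

// Returns true and stores the value if key is one of the known keys.
bool lookup(const char* key, int* value)
{
    const Entry& e = kTable[std::strlen(key) % 3];
    // Unknown keys can land on an occupied slot, so confirm with one comparison.
    if (std::strcmp(e.key, key) != 0)
        return false;
    *value = e.value;
    return true;
}
```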

You can use Boost's unordered associative container, boost::unordered_map, which is implemented as a hash table.

It's a moot point, since the STL (at the time of writing) has no hash table container; C++11 later added std::unordered_map. std::map would be the alternative, and for most purposes there is no reason not to use it. For uses that require a hash table, boost::unordered_map is the best choice (and I think it matches the hash table defined in the C++ TR1 library extensions). Some compilers, though I can't name them, may provide the TR1 hash table as std::tr1::unordered_map.
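Where a C++11 standard library is usable (which the question rules out for its target), the string-to-int table looks roughly like the sketch below; boost::unordered_map is designed to offer the same interface. The helper findValue is a hypothetical name for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Looks a key up without inserting it; operator[] would insert a 0 for a missing key.
bool findValue(const std::unordered_map<std::string, int>& table,
               const std::string& key, int* value)
{
    auto it = table.find(key);
    if (it == table.end())
        return false;
    *value = it->second;
    return true;
}
```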

You might want to check out GLib hash tables:
http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html

If you need maximum performance, use MCT's closed_hash_map or Google's dense_hash_map. The former is easier to use, the latter is more mature. Your use case sounds like it would benefit from closed hashing.

Related

Is it possible to get hash values as compile-time constants?

I thought I'd try selecting different options as strings by hashing them, but this doesn't work:
#include <type_traits>
#include <string>
inline void selectMenuOptionString(const std::string& str)
{
switch (std::hash<std::string>()(str))
{
case std::hash<std::string>()(std::string("Selection one")) : break;
// Expression must have a constant value
}
}
inline void selectMenuOptionString2(const std::string& str)
{
size_t selectionOneHash = std::hash<std::string>()(std::string("Selection one"));
switch (std::hash<std::string>()(str))
{
case selectionOneHash: // Expression must have a constant value
// The variable of selectionOneHash cannot be used as a constant
}
constexpr size_t hash = std::hash<int>()(6); // Expression must have a constant value
}
It seems I can't get hash values at compile time. From what I've read each different input should yield the same unique output every time, with a very low chance of collision. Given these properties couldn't the hash value be calculated at compile time? I don't know much at all about hashing, I usually use an unordered_map, but I wanted to try something new for learning's sake.
std::hash::operator() isn't constexpr, so you can't just use it. Instead, you'd have to write your own constexpr hash function. For example, the following is the FNV-1a hash algorithm (untested):
#include <cstddef>
template <typename Str>
constexpr std::size_t hashString(const Str& toHash)
{
    // For this example, I'm requiring size_t to be 64-bit, but you could
    // easily change the offset and prime used to the appropriate ones
    // based on sizeof(size_t).
    static_assert(sizeof(std::size_t) == 8, "expects a 64-bit size_t");
    // FNV-1a 64 bit algorithm
    std::size_t result = 0xcbf29ce484222325; // FNV offset basis
    for (char c : toHash) {
        result ^= static_cast<unsigned char>(c); // FNV hashes bytes; a plain char may be negative
        result *= 1099511628211ULL; // FNV prime
    }
    return result;
}
And then you can use it:
int selectMenuOptionString(const std::string& str)
{
switch (hashString(str))
{
case hashString(std::string_view("Selection one")): return 42;
default: return 0;
}
}
Note that if you wrote hashString("Selection one"), it would actually hash the null terminator as well, so you might want to have an overload to catch string literals, such as:
template <size_t N>
constexpr size_t hashString(char const (&toHash)[N])
{
return hashString(std::string_view(toHash));
}
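Putting the pieces together, here is a self-contained C++17 sketch of the FNV-1a approach. It deviates slightly from the answers: it takes a std::string_view (so string literals are hashed without their terminator and no extra overload is needed) and uses std::uint64_t instead of size_t so the 64-bit constants are valid on any platform. The names fnv1a64 and selectMenuOption are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// FNV-1a, 64-bit, usable in constant expressions (C++17).
constexpr std::uint64_t fnv1a64(std::string_view s)
{
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (char c : s)
    {
        h ^= static_cast<unsigned char>(c);   // hash bytes, not sign-extended chars
        h *= 0x100000001b3ULL;                // FNV prime (1099511628211)
    }
    return h;
}

// The case labels are computed by the compiler; the switch value at run time.
int selectMenuOption(const std::string& str)
{
    switch (fnv1a64(str))
    {
    case fnv1a64("Selection one"): return 1;
    case fnv1a64("Selection two"): return 2;
    default: return 0;
    }
}

// Checked at build time: hashing nothing leaves the offset basis unchanged.
static_assert(fnv1a64("") == 0xcbf29ce484222325ULL, "empty input yields the offset basis");
```

A matching hash does not prove the strings are equal, so if a false match would matter, also compare str against the literal inside the case. If two labels ever hashed to the same value, the compiler would reject the duplicate case, so label collisions cannot slip through silently.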
You'll need to implement your own hash function, because there's no suitable instantiation of std::hash that's constexpr. Here's a cheap-and-dirty one:
EDIT: In order not to be humiliated too badly by Justin's answer, I added a 32 bit branch.
constexpr size_t hash(const char *str) {
static_assert(sizeof(size_t) == 8 || sizeof(size_t) == 4);
size_t h = 0;
if constexpr(sizeof(size_t) == 8) {
h = 1125899906842597L; // prime
} else {
h = 4294967291L;
}
int i = 0;
while (str[i] != 0) {
h = 31 * h + str[i++];
}
return h;
}
I just wanted to add this because I think it's cool. The constexpr strlen I got from a question here: constexpr strlen
#include <iostream>
#include <string>
int constexpr strlength(const char* str)
{
return *str ? 1 + strlength(str + 1) : 0;
}
size_t constexpr Hash(const char *first)
{ // FNV-1a hash function
const size_t FNVoffsetBasis = 14695981039346656037ULL;
const size_t FNVprime = 1099511628211ULL;
const size_t count = strlength(first);
size_t val = FNVoffsetBasis;
for (size_t next = 0; next < count; ++next)
{
val ^= (unsigned char)first[next]; // FNV hashes bytes; casting a negative char straight to size_t would sign-extend it
val *= FNVprime;
}
return val;
}
inline void selectMenuOptionString(const std::string& str)
{
switch (Hash(str.c_str()))
{
case Hash("Selection one"): /*Do something*/ break;
case Hash("Selection two"): /*Do something*/ break;
}
}
int main()
{
static_assert(strlength("Hello") == 5, "String length not equal");
}
You can't get the hash of a runtime value at compile-time, no.
Even if you passed std::hash a constant expression, it is not defined to be able to do its hashing work at compile-time.
As far as I know (which isn't far), you'd have to come up with some monstrous template metahackery (or, worse, macros!) to do this. Personally, if your text input is known at build time, I'd just pregenerate a hash outside of the code, perhaps in some Python-driven pre-build step.

How to construct a hash function for a user-defined type?

For example, in the following struct:
1) editLine is a pointer to a data line which has CRLF,
2) nDisplayLine is the display line index of this editLine,
3) start is the offset in the display line,
4) len is the length of the text;
struct CacheKey {
const CEditLine* editLine;
int32 nDisplayLine;
int32 start;
int32 len;
friend bool operator==(const CacheKey& item1, const CacheKey& item2) {
return (item1.start == item2.start && item1.len == item2.len && item1.nDisplayLine == item2.nDisplayLine &&
item1.editLine == item2.editLine);
}
CacheKey() {
editLine = NULL;
nDisplayLine = 0;
start = 0;
len = 0;
}
CacheKey(const CEditLine* editLine, int32 dispLine, int32 start, int32 len) :
editLine(editLine), nDisplayLine(dispLine), start(start), len(len)
{
}
int hash() {
return (int)((unsigned char*)editLine - 0x10000) + nDisplayLine * nDisplayLine + start * 2 - len * 1000;
}
};
Now I need to put it into a std::unordered_map<int, CacheItem> cacheMap_
The problem is how to design the hash function for this structure; are there any guidelines?
How can I make sure the hash function is collision-free?
To create a hash function, you can use std::hash, which is defined for integers (and pointers). Then you can combine the results "as the Boost folks do" (because writing a good hash is non-trivial), as explained here: http://en.cppreference.com/w/cpp/utility/hash.
Here is a hash_combine function:
inline void hash_combine(std::size_t& seed, std::size_t v)
{
seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
So the "guideline" is more or less what is shown on cppreference.
You CAN'T be sure your hash function is collision-free. Collision-free means that you do not lose data (or that you restrict your class to a small set of possible values). If any int32 value is allowed for each field, a collision-free hash needs a monstrously big index: three int32 fields plus a pointer already need well over the 64 bits a std::size_t hash can hold on common platforms, and such an index won't fit in a small table. Let unordered_map take care of collisions, and combine std::hash values as explained above.
In your case, it will look something like
std::size_t hash() const
{
std::size_t h1 = std::hash<const CEditLine*>()(editLine); // editLine is a pointer to const
//Your int32 type is probably a typedef of a hashable type. Otherwise,
// you'll have to static_cast<> it to a type supported by std::hash.
std::size_t h2 = std::hash<int32>()(nDisplayLine);
std::size_t h3 = std::hash<int32>()(start);
std::size_t h4 = std::hash<int32>()(len);
std::size_t hash = 0;
hash_combine(hash, h1);
hash_combine(hash, h2);
hash_combine(hash, h3);
hash_combine(hash, h4);
return hash;
}
Then you can specialize std::hash for your class; std::unordered_map uses std::hash<Key> by default.
namespace std
{
template<>
struct hash<CacheKey>
{
public:
std::size_t operator()(CacheKey const& s) const
{
return s.hash();
}
};
}

Efficient way to convert int to string

I'm creating a game with a main loop. During one cycle of this loop, I have to convert an int value to a string about 50-100 times. So far I've been using this function:
std::string Util::intToString(int val)
{
std::ostringstream s;
s << val;
return s.str();
}
But it doesn't seem very efficient: the FPS dropped from ~120 (without this function) to ~95 (with it).
Is there any other way to convert int to string that would be much more efficient than my function?
The values are in the range 1-72; I don't have to deal with negatives.
Pre-create an array/vector of 73 string objects, and use an index to get your string. Returning a const reference will let you save on allocations/deallocations, too:
// Initialize smallNumbers to strings "0", "1", "2", ...
static vector<string> smallNumbers;
const string& smallIntToString(unsigned int val) {
return smallNumbers[val < smallNumbers.size() ? val : 0];
}
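The comment above leaves the initialization to the reader. One way to fill the table once, on first use, is sketched below (C++11 or later; std::to_string runs only while the table is built). Wrapping the table in a smallNumbers() function is an assumption made to avoid static-initialization-order problems.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds "0", "1", ..., "72" exactly once; the 0-72 range comes from the question.
static const std::vector<std::string>& smallNumbers()
{
    static const std::vector<std::string> table = [] {
        std::vector<std::string> v;
        v.reserve(73);
        for (int i = 0; i <= 72; ++i)
            v.push_back(std::to_string(i));
        return v;
    }();
    return table;
}

// No allocation per call: returns a reference into the prebuilt table.
// Out-of-range values fall back to "0", as in the answer's version.
const std::string& smallIntToString(unsigned int val)
{
    const std::vector<std::string>& t = smallNumbers();
    return t[val < t.size() ? val : 0];
}
```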
The standard std::to_string function might be useful.
However, in this case I wonder whether copying the returned string might be just as big a bottleneck. If so, you could pass the destination string to the function as a reference argument instead. That said, if you have std::to_string, your compiler is probably C++11-compatible and can use move semantics instead of copying.
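A sketch of the destination-by-reference idea, assuming a C++17 compiler. It swaps in std::to_chars from <charconv>, which the answer does not mention: it formats without streams or locales, and assigning into an existing std::string can typically reuse that string's buffer. intToString is a hypothetical name.

```cpp
#include <cassert>
#include <charconv>
#include <string>

// Writes val into out, letting the caller keep and reuse one string across calls.
void intToString(int val, std::string& out)
{
    char buf[24];  // enough for any int up to 64 bits, sign included
    std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, val);
    out.assign(buf, res.ptr);  // res.ptr points one past the last written char
}
```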
Yes: fall back on C functions, as explored in this previous answer:
namespace boost {
template<>
inline std::string lexical_cast(const int& arg)
{
char buffer[65]; // plenty: even a 64-bit long needs at most 21 chars in base 10, including '\0'
ltoa( arg, buffer, 10 );
return std::string( buffer ); // RVO will take place here
}
}//namespace boost
In theory, this new specialisation will take effect throughout the rest of the Translation Unit in which you defined it. ltoa is much faster (despite being non-standard) than constructing and using a stringstream.
However, I've experienced problems with name conflicts between instantiations of this specialisation, and instantiations of the original function template, between competing shared libraries.
In order to get around that, I actually just give this function a whole new name entirely:
template <typename T>
inline std::string fast_lexical_cast(const T& arg)
{
    return boost::lexical_cast<std::string>(arg);
}
// The specialization must use the same name as the primary template.
template <>
inline std::string fast_lexical_cast(const int& arg)
{
    char buffer[65];
    if (!ltoa(arg, buffer, 10)) {
        // bad_lexical_cast takes (source type, target type)
        boost::throw_exception(boost::bad_lexical_cast(
            typeid(int), typeid(std::string)
        ));
    }
    return std::string(buffer);
}
Usage: std::string myString = fast_lexical_cast(42);
Disclaimer: this modification is reverse-engineered from Kirill's original SO code, not the version that I created and put into production from my company codebase. I can't think right now, though, of any other significant modifications that I made to it.
Something like this:
std::string intToString(int val)
{
    const int size = 12; // room for a 32-bit int: sign plus up to 10 digits
    char buf[size+1];
    buf[size] = 0;
    int index = size;
    bool neg = false;
    if (val < 0) { // Obviously don't need this if val is always positive.
        neg = true;
        val = -val; // note: overflows for INT_MIN
    }
    do
    {
        buf[--index] = (val % 10) + '0';
        val /= 10;
    } while(val);
    if (neg)
    {
        buf[--index] = '-';
    }
    return std::string(&buf[index]);
}
I use this:
void append_uint_to_str(string & s, unsigned int i)
{
if(i > 9)
append_uint_to_str(s, i / 10);
s += '0' + i % 10;
}
If you want negative numbers, change the parameter type to int (an unsigned i can never be less than 0) and insert:
if(i < 0)
{
s += '-';
i = -i;
}
at the beginning of the function.
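A complete sketch of that signed variant follows. It keeps the unsigned function as is and adds a hypothetical append_int_to_str wrapper that computes the magnitude in unsigned arithmetic, so INT_MIN (whose negation overflows an int) is handled too.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Recursive digit emitter from the answer above (unsigned input).
void append_uint_to_str(std::string& s, unsigned int i)
{
    if (i > 9)
        append_uint_to_str(s, i / 10);
    s += static_cast<char>('0' + i % 10);
}

// Signed wrapper: the sign test needs a signed parameter, and unsigned
// negation (0u - x) is well-defined, so even INT_MIN gets the right magnitude.
void append_int_to_str(std::string& s, int i)
{
    unsigned int magnitude = static_cast<unsigned int>(i);
    if (i < 0)
    {
        s += '-';
        magnitude = 0u - magnitude;
    }
    append_uint_to_str(s, magnitude);
}
```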

boost::dynamic_bitset concat performance

I want to concatenate a big bitset with a smaller one in a way that won't kill performance. Currently my application spends 20% of its CPU time in just the following code:
boost::dynamic_bitset<> encode(const std::vector<char>& data)
{
boost::dynamic_bitset<> result;
std::for_each(data.begin(), data.end(), [&](unsigned char symbol)
{
for(size_t n = 0; n < codes_[symbol].size(); ++n)
result.push_back(codes_[symbol][n]); // codes_[symbol].size() averages ~5 bits
});
return result;
}
I have read this post, which proposes a solution that unfortunately will not work for me, because the destination bitset is very much larger than the source bitset.
Any ideas?
If this is not possible to do efficiently with boost::dynamic_bitset then I'm open for other suggestions.
This is because you keep using push_back(), when you actually already know the size in advance (it is the sum of the code lengths). This means lots of redundant copying and reallocating. You should resize it first. In addition, you don't have to push_back() every value; it should be possible to use some form of insert() (I don't actually know its exact interface, but I think append() is the name) to insert the whole target vector at once, which should be significantly better.
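A minimal sketch of the size-first idea, under stated assumptions: codes_ is modeled as a 256-entry std::vector of std::vector<bool> codes, and std::vector<bool> stands in for boost::dynamic_bitset so the example has no Boost dependency. dynamic_bitset also offers resize() and operator[], so the same two-pass structure applies to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Code = std::vector<bool>;  // assumption: one bit code per byte value

// Pass 1 sums the code lengths; pass 2 writes bits by index into a result
// that was allocated once at its final size, so nothing is reallocated.
std::vector<bool> encode(const std::vector<char>& data, const std::vector<Code>& codes)
{
    std::size_t total = 0;
    for (char c : data)
        total += codes[static_cast<unsigned char>(c)].size();

    std::vector<bool> result(total);
    std::size_t pos = 0;
    for (char c : data)
        for (bool bit : codes[static_cast<unsigned char>(c)])
            result[pos++] = bit;
    return result;
}
```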
In addition, you're leaving the dynamic_bitset as unsigned long, but as far as I can see, you're only actually inserting unsigned char into it. Changing that could make life easier for you.
I'm also curious what type codes_ is: if it's a map, you could replace it with a vector or, in fact, since it has at most 256 entries (one per unsigned char value), a static array.
I've tried using boost bitset in performance code before and been disappointed. I dug into it a bit, and concluded I'd be better off implementing my own bit-buffer class, although I forget the details of what convinced me boost's class was never going to be fast (I did get as far as inspecting the assembly produced).
I still don't know what the fastest way of building bit-buffers/bitsets/bitstreams or whatever you want to call them is. A colleague is trying to find out with this related question, but at time of writing it's still awaiting a good answer.
I wrote my own bitset class. I'd appreciate any suggestions for improvements. I will try to look into SSE and see if there is anything useful there.
With my very rough benchmark I got an 11x performance increase while appending 6 bits at a time.
#include <cassert>
#include <cstddef>
#include <vector>
class fast_bitset
{
public:
    typedef unsigned long block_type;
    static const size_t bits_per_block = sizeof(block_type)*8;
    fast_bitset()
        : is_open_(true)
        , blocks_(1)
        , space_(blocks_.size()*bits_per_block){}
    // Appends a closed bitset, one block at a time.
    void append(const fast_bitset& other)
    {
        assert(!other.is_open_);
        for(size_t n = 0; n + 1 < other.blocks_.size(); ++n)
            append(other.blocks_[n], bits_per_block);
        append(shr(other.blocks_.back(), other.space_), bits_per_block - other.space_);
    }
    // Appends the low n_bits of value; value must have no bits set above them.
    void append(block_type value, size_t n_bits)
    {
        assert(is_open_);
        assert(n_bits <= bits_per_block); // "<" rejected the full blocks passed by append(other)
        if(space_ < n_bits)
        {
            const size_t overflow = n_bits - space_; // bits that spill into a new block
            blocks_.back() = shl(blocks_.back(), space_) | shr(value, overflow);
            blocks_.push_back(value); // bits already stored above are shifted out later
            space_ = bits_per_block - overflow;
        }
        else
        {
            blocks_.back() = shl(blocks_.back(), n_bits) | value;
            space_ -= n_bits;
        }
    }
    void push_back(bool bit)
    {
        append(bit, 1);
    }
    bool operator[](size_t index) const
    {
        assert(!is_open_);
        // Shift a block_type, not an int: 1 << 63 on an int is undefined behavior.
        const block_type high_bit = block_type(1) << (bits_per_block-1);
        const size_t block_index = index / bits_per_block;
        const size_t bit_index = index % bits_per_block;
        return (blocks_[block_index] & (high_bit >> bit_index)) != 0;
    }
    // Left-aligns the last block; call this before reading.
    void close()
    {
        blocks_.back() = shl(blocks_.back(), space_);
        is_open_ = false;
    }
    size_t size() const
    {
        return blocks_.size()*bits_per_block-space_;
    }
    const std::vector<block_type>& blocks() const {return blocks_;}
    class reader
    {
    public:
        reader(const fast_bitset& bitset)
            : bitset_(bitset)
            , index_(0)
            , size_(bitset.size()){}
        bool next_bit(){return bitset_[index_++];}
        bool eof() const{return index_ >= size_;}
    private:
        const fast_bitset& bitset_;
        size_t index_;
        size_t size_;
    };
private:
    // Shifting by the full width of a type is undefined behavior (it happens here when
    // space_ or overflow equals bits_per_block), so these helpers return 0 instead.
    static block_type shl(block_type v, size_t n){return n >= bits_per_block ? 0 : v << n;}
    static block_type shr(block_type v, size_t n){return n >= bits_per_block ? 0 : v >> n;}
    bool is_open_;
    std::vector<block_type> blocks_;
    size_t space_;
};

What is the best way of comparing a string variable to a set of string constants?

An if statement looks too awkward, because I need to be able to increase the number of constants.
Sorry for misleading you with the word "constant"; it is not quite what I meant.
Add all your constants to a std::set; then you can check whether the set contains your string with:
std::set<std::string> myLookup;
//populate the set with your strings here
std::set<std::string>::size_type i;
i = myLookup.count(searchTerm);
if( i )
std::cout << "Found";
else
std::cout << "Not found";
Depends whether you care about performance.
If not, then the simplest code is probably to put the various strings in an array (or vector if you mean you want to increase the number of constants at run time). This will also be pretty fast for a small number of strings:
static const char *const strings[] = { "fee", "fie", "fo", "fum" };
static const int num_strings = sizeof(strings) / sizeof(char*);
Then either:
int main() {
const char *search = "foe";
bool match = false;
for (int i = 0; i < num_strings; ++i) {
if (std::strcmp(search, strings[i]) == 0) match = true;
}
}
Or:
struct stringequal {
const char *const lhs;
stringequal(const char *l) : lhs(l) {}
bool operator()(const char *rhs) {
return std::strcmp(lhs, rhs) == 0;
}
};
int main() {
const char *search = "foe";
std::find_if(strings, strings+num_strings, stringequal(search));
}
[Warning: I haven't tested the above code, and I've got the signatures wrong several times already...]
If you do care about performance, and there are a reasonable number of strings, then one quick option would be something like a Trie. But that's a lot of effort since there isn't one in the standard C++ library. You can get much of the benefit either using a sorted array/vector, searched with std::binary_search:
// These strings MUST be in ASCII-alphabetical order. Don't add "foo" to the end!
static const char *const strings[] = { "fee", "fie", "fo", "fum" };
static const int num_strings = sizeof(strings) / sizeof(char*);
bool stringcompare(const char *lhs, const char *rhs) {
return std::strcmp(lhs, rhs) < 0;
}
std::binary_search(strings, strings+num_strings, "foe", stringcompare);
... or use a std::set. But unless you're changing the set of strings at runtime, there is no advantage to using a set over a sorted array with binary search, and a set (or vector) has to be filled in with code whereas an array can be statically initialized. I think C++0x will improve things, with initializer lists for collections.
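With C++11 initializer lists, a std::set no longer needs code to fill it in. A minimal sketch, with a hypothetical isKnownCommand function and the same example strings; adding a constant is a one-line change, and lookup is O(log n).

```cpp
#include <cassert>
#include <set>
#include <string>

bool isKnownCommand(const std::string& s)
{
    // Built once, on the first call, from a C++11 initializer list.
    static const std::set<std::string> kCommands = {"fee", "fie", "fo", "fum"};
    return kCommands.count(s) != 0;
}
```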
Put the strings to be compared in a static vector or set and then use the std::find algorithm (for a set, its own find member function is faster).
The technically best solution is: build a 'perfect hash function' tailored to your set of string constants, so later there are no collisions during hashing.
const char * values[]= { "foo", "bar", ..., 0 };
bool IsValue( const std::string & s ) {
int i = 0;
while( values[i] ) {
if ( s == values[i] ) {
return true;
}
i++;
}
return false;
}
Or use a std::set.