Simple hashmap implementation in C++

I'm relatively new to C++. In Java, it's easy for me to instantiate and use a hashmap. I'd like to know how to do it in a simple way in C++, since I saw many different implementations and none of them looked simple to me.

Many older compilers ship a non-standard hash_map extension (for example stdext::hash_map in Visual Studio or __gnu_cxx::hash_map in GCC); since C++11 (formerly C++0x), the standard library provides it as std::unordered_map. The STL page on hash_map is a reasonable reference. If you use Visual Studio, Microsoft has a page on it.
If you want to use your class as the value, not as the key, then you don't need to do anything special. All primitive types (things like int, char, bool and even char *) should "just work" as keys in a hash_map. Be careful with char *, though: std::unordered_map hashes and compares the pointer value, not the characters it points to, so std::string is usually the safer key type. For anything else you will have to define your own hashing and equality functions and then write "functors" that wrap them in a class.
Assuming your class is called MyClass and you have already defined:
size_t MyClass::HashValue() const { /* something */ }
bool MyClass::Equals(const MyClass& other) const { /* something */ }
You will need to define two functors to wrap those methods in objects.
struct MyClassHash {
size_t operator()(const MyClass& p) const {
return p.HashValue();
}
};
struct MyClassEqual {
bool operator()(const MyClass& c1, const MyClass& c2) const {
return c1.Equals(c2);
}
};
And instantiate your hash_map/hash_set as:
hash_map<MyClass, DataType, MyClassHash, MyClassEqual> my_hash_map;
hash_set<MyClass, MyClassHash, MyClassEqual> my_hash_set;
Everything should work as expected after that.
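With a C++11 or later compiler, the same functor pattern carries over to std::unordered_map. The following is a minimal sketch, not the only way to do it: the key type Point and the way the two member hashes are combined are assumptions made for illustration and do not come from the question.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Functor that computes a hash value for a Point.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Simple combination of the member hashes; good enough for a sketch.
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

// Functor that decides when two keys are considered equal.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

typedef std::unordered_map<Point, std::string, PointHash, PointEqual> PointNames;

// Builds a small map and looks one key up again.
std::string lookup_demo() {
    PointNames names;
    names[Point{1, 2}] = "first";
    names[Point{3, 4}] = "second";
    return names.at(Point{3, 4});
}
```

Two keys that compare equal must also produce the same hash value, otherwise lookups can miss elements that are in the map.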

Using hashmaps in C++ is easy! It's like using the standard C++ map. You can use your compiler's/library's implementation of unordered_map, or the one provided by Boost or some other vendor. Here's a quick sample. You will find more if you follow the links you were given.
#include <unordered_map>
#include <string>
#include <iostream>
int main()
{
// With <unordered_map> (C++11) the container and its hasher live in namespace std, not std::tr1.
typedef std::unordered_map< std::string, int > hashmap;
hashmap numbers;
numbers["one"] = 1;
numbers["two"] = 2;
numbers["three"] = 3;
std::hash< std::string > hashfunc = numbers.hash_function();
for( hashmap::const_iterator i = numbers.begin(), e = numbers.end() ; i != e ; ++i ) {
std::cout << i->first << " -> " << i->second << " (hash = " << hashfunc( i->first ) << ")" << std::endl;
}
return 0;
}

Take a look at boost.unordered, and its data structure.

Try boost's unordered classes.

Check out Simple Hash Map (Hash Table) Implementation in C++ for a basic Hash Table with generic type key-value pairs and separate chaining strategy.
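To give an idea of what separate chaining means, here is a minimal sketch written for this thread; it is not the code from the linked article. The class name ChainedHashMap and its member names are made up, the bucket count is fixed (no rehashing), and it relies on std::hash and operator== being available for the key type.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Minimal separate-chaining hash table: a fixed number of buckets,
// each bucket holding a linked list of key/value pairs. No resizing.
template <typename K, typename V>
class ChainedHashMap {
public:
    // bucket_count must be greater than zero.
    explicit ChainedHashMap(std::size_t bucket_count = 16)
        : buckets_(bucket_count), size_(0) {}

    // Inserts the pair, or overwrites the value if the key already exists.
    void put(const K& key, const V& value) {
        std::list<std::pair<K, V> >& chain = buckets_[index_of(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.push_back(std::make_pair(key, value));
        ++size_;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const V* get(const K& key) const {
        const std::list<std::pair<K, V> >& chain = buckets_[index_of(key)];
        for (const auto& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Removes the key; returns true if an element was removed.
    bool remove(const K& key) {
        std::list<std::pair<K, V> >& chain = buckets_[index_of(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) {
                chain.erase(it);
                --size_;
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    // Maps a key to the index of the bucket it belongs to.
    std::size_t index_of(const K& key) const {
        return std::hash<K>()(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<K, V> > > buckets_;
    std::size_t size_;
};
```

Keys that hash to the same bucket simply share a list, so lookups degrade toward a linear search as the chains grow; a production container would grow the bucket array when the load factor gets too high.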

Related

Is there any way to hook insertion and deletion operations for the std containers?

Let's say, we are going to subclass the std::map and we need to catch all insertions and deletions to/from the container. For example, in order to save some application-specific information about the keys present in the container.
What's the easiest way to do this, if at all possible?
Probably the most obvious way to do this is to override all methods and operators that perform insertion and deletion. But I suspect it is easy to overlook something that way, isn't it?
There is no way to do that in the general case. Inheritance is not a good idea because std::map is not polymorphic and no virtual dispatch will happen when you use a pointer to a map. You might as well use a simple wrapper class at that point and save yourself a lot of hassle:
#include <iostream>
#include <map>
template <class Key, class Value>
struct Map {
private:
std::map<Key, Value> _data;
public:
template <class Y, class T>
void insert(Y &&key, T &&val) {
std::cout << "[" << key << "] = " << val << "\n";
_data.insert_or_assign(std::forward<Y>(key), std::forward<T>(val));
}
void remove(Key const &key) {
auto const it = _data.find(key);
if (it == _data.end())
return;
std::cout << "[" << key << "] -> removed\n";
_data.erase(it);
}
Value *get(Key const &key) {
auto const it = _data.find(key);
if (it == _data.end())
return nullptr;
return &it->second;
}
};
int main() {
Map<int, char const *> map;
map.insert(10, "hello");
map.insert(1, "world");
map.remove(1);
map.remove(10);
map.remove(999);
}
Short answer: No
C++ standard library data structures were not designed to support this use case. You may subclass and try to override but this will not work as you'd expect. In fact you'll get an error at compile time if you do it properly with the help of the keyword override. The problem is that std::map methods are not virtual so they don't support so called late binding. Functions that work with references and pointers to std::map will keep using std::map methods even in the case of passing instances of your std::map subclass.
Your only option is to create a completely new class your_map with a subset of the required methods of std::map and to delegate the job to an inner instance of std::map, as shown in Ayxan Haqverdili's answer. Unfortunately, this solution requires you to change the signatures of functions working with your code, replacing std::map & arguments with your_map &, which may not always be possible.
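The static binding problem can be shown in a few lines. This is only a sketch; CountingMap, erase_through_base and count_noticed_erasures are hypothetical names invented for the demonstration.

```cpp
#include <cstddef>
#include <map>

// Hypothetical subclass that tries to count erasures by hiding std::map::erase.
struct CountingMap : std::map<int, int> {
    int erase_calls = 0;

    // This hides the base-class overloads; it does not override anything,
    // because std::map::erase is not virtual.
    std::size_t erase(const int& key) {
        ++erase_calls;
        return std::map<int, int>::erase(key);
    }
};

// Code written against plain std::map knows nothing about CountingMap.
void erase_through_base(std::map<int, int>& m, int key) {
    m.erase(key);  // statically bound to std::map::erase
}

// Returns how many erasures the subclass noticed.
int count_noticed_erasures() {
    CountingMap m;
    m[1] = 10;
    m[2] = 20;
    m.erase(1);                // calls CountingMap::erase, counted
    erase_through_base(m, 2);  // calls std::map::erase, not counted
    return m.erase_calls;      // 1, although two elements were erased
}
```

The second erasure goes unnoticed because the call inside erase_through_base is resolved against the static type std::map<int, int>, which is exactly the situation described above.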

Is there any way to iterate through a struct?

I would like to iterate through a struct that is defined in another library whose source is not under my control. So any library that requires the struct to be defined with its own macros/adaptors, as in previous questions, is not usable here. The closest approach I found is boost::hana. However, it still requires filling in an adaptor before I can iterate through the struct. I attached an example here. Is there any way to automate BOOST_HANA_ADAPT_STRUCT so that I do not need to list all the struct member names (those structs have more than a hundred members in total)?
#include <iostream>
#include <string>
#include <boost/hana.hpp>
#include <typeinfo>
namespace hana=boost::hana;
struct adapt_test
{
std::string name;
int data;
};
BOOST_HANA_ADAPT_STRUCT(
adapt_test
, name
, data
);
auto names = hana::transform(hana::accessors<adapt_test>(), hana::first);
int main() {
hana::for_each(
names,
[] (auto item)
{
std::cout << hana::to<char const *>(item) << std::endl;
}
);
adapt_test s1{"a", 2};
hana::for_each(
s1,
[] (auto pair)
{
std::cout << hana::to<char const *>(hana::first(pair)) << "=" << hana::second(pair) << std::endl;
}
);
return 0;
}
You can use Boost Flat Reflection like:
struct adapt_test
{
std::string name;
int data;
};
adapt_test s1{"a", 2};
std::cout << boost::pfr::get<0>(s1) << std::endl;
std::cout << boost::pfr::get<1>(s1) << std::endl;
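// Pre-Boost releases of the library (magic_get) also had flat_for_each_field; Boost.PFR itself provides for_each_field.
boost::pfr::for_each_field(s1, [] (const auto& field) { std::cout << field << std::endl; } );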
P.S. Respect to @apolukhin for this library.
The basic answer to your question is no.
C++ does not expose identifiers as string literals (which could indeed be useful in some cases), and unfortunately there is no bridge between member names and strings.
Hopefully some future standard will bring this ability and relieve us from having to go through macros or code generation. Or it might work differently, for example by telling the compiler "please treat my struct A { int x, y; } as a pair", meaning that the types of first and second would be matched to the members x and y and the types built so that it works. That would be really useful for tuples as well: a kind of structured template matching.
Currently the best that can be done to my knowledge is to match structs to tuple without the names (as of C++17) because of the above limitation, such as with boost::hana or boost::fusion as you do.
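As an illustration of matching a struct to a tuple without naming its members, here is a small C++17 sketch based on structured bindings. The helper as_tuple2 is a hypothetical name and it is hand-written for exactly two members; libraries automate the member count, so treat this only as a picture of the idea.

```cpp
#include <string>
#include <tuple>

// Stand-in for a struct that comes from a third-party header.
struct adapt_test {
    std::string name;
    int data;
};

// Returns a tuple of references to the two members of s.
// The member names are never spelled out, but the member count (2) is fixed here.
template <class T>
auto as_tuple2(T& s) {
    auto& [a, b] = s;       // C++17 structured binding
    return std::tie(a, b);  // std::tuple of lvalue references
}
```

Because the tuple holds references, writing through std::get<1>(as_tuple2(s)) changes s.data; the names "name" and "data" themselves remain unavailable, which is the limitation discussed above.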

Does C++ have ordered hash?

Perl has a structure called "ordered hash" Tie::IxHash. One can use it as a hashtable/map. The entries are in the order of insertion.
Wonder if there is such a thing in C++.
Here is a sample Perl snippet:
use Tie::IxHash;
tie %food_color, "Tie::IxHash";
$food_color{Banana} = "Yellow";
$food_color{Apple} = "Green";
$food_color{Lemon} = "Yellow";
print "In insertion order, the foods are:\n";
foreach $food (keys %food_color) {
print " $food\n"; #will print the entries in order
}
Update 1
As @kerrek-sb pointed out, one can use the Boost Multi-index Containers Library. I just wonder whether it is possible to do it with the STL.
Yes and no. No, there's no container that's specifically intended to provide precisely the same functionality. But yes, you can do the same in a couple of different ways. If you expect to access the data primarily in the order inserted, then the obvious way to go would be a simple vector of pairs:
std::vector<std::pair<std::string, std::string>> food_colors;
food_colors.push_back({"banana", "yellow"});
food_colors.push_back({"apple", "green"});
food_colors.push_back({"lemon", "yellow"});
for (auto const &f : food_colors)
std::cout << f.first << ": " << f.second << "\n";
This preserves order by simply storing the items in order. If you need to access them by key, you can use std::find to do a linear search for a particular item. That minimizes extra memory used, at the expense of slow access by key if you get a lot of items.
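The lookup by key could look roughly like the sketch below. It uses std::find_if rather than plain std::find, because only the first element of each pair is compared; the names FoodColors and find_color are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > FoodColors;

// Returns a pointer to the color stored for `name`, or nullptr if it is absent.
// Linear search: the cost grows with the number of items.
const std::string* find_color(const FoodColors& items, const std::string& name) {
    FoodColors::const_iterator it = std::find_if(
        items.begin(), items.end(),
        [&name](const std::pair<std::string, std::string>& item) {
            return item.first == name;
        });
    return it == items.end() ? nullptr : &it->second;
}
```

Note that nothing stops the same key from being pushed twice into the vector; this sketch simply returns the first match.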
If you want faster access by key with a large number of items, you could use a Boost MultiIndex. If you really want to avoid that, you can create an index of your own pretty easily. To do this, you'd start by inserting your items into a std::unordered_map (or perhaps a std::map). This gives fast access by key, but no access in insertion order. It does, however, return an iterator to each item as it's inserted into the map. You can simply store those iterators in a vector to get access in the order of insertion. Although the principle of this is fairly simple, the code is a bit on the clumsy side, to put it nicely:
std::map<std::string, std::string> fruit;
std::vector<std::map<std::string, std::string>::iterator> in_order;
in_order.push_back(fruit.insert(std::make_pair("banana", "yellow")).first);
in_order.push_back(fruit.insert(std::make_pair("apple", "green")).first);
in_order.push_back(fruit.insert(std::make_pair("lemon", "yellow")).first);
This allows access either by key:
// ripen the apple:
fruit["apple"] = "red";
...or in insertion order:
for (auto i : in_order)
std::cout << i->first << ": " << i->second << "\n";
For the moment, I've shown the basic mechanism for doing this. If you wanted to use it much, you'd probably want to wrap that up into a nice class to hide some of the ugliness and keep things pretty and clean in normal use.
An associative container that remembers insertion order does not come with the C++ standard library, but it is straightforward to implement one using existing STL containers.
For example, a combination of std::unordered_map (for fast lookup) and std::list (to maintain key ordering) can be used to emulate an insertion-ordered map. Here is an example that demonstrates the idea:
#include <unordered_map>
#include <list>
#include <stdexcept>
template<typename K, typename V>
class InsOrderMap {
struct value_pos {
V value;
typename std::list<K>::iterator pos_iter;
value_pos(V value, typename std::list<K>::iterator pos_iter):
value(value), pos_iter(pos_iter) {}
};
std::list<K> order;
std::unordered_map<K, value_pos> map;
const value_pos& locate(K key) const {
auto iter = map.find(key);
if (iter == map.end())
throw std::out_of_range("key not found");
return iter->second;
}
public:
void set(K key, V value) {
auto iter = map.find(key);
if (iter != map.end()) {
// no order change, just update value
iter->second.value = value;
return;
}
order.push_back(key);
map.insert(std::make_pair(key, value_pos(value, --order.end())));
}
void erase(K key) {
order.erase(locate(key).pos_iter);
map.erase(key);
}
V operator[](K key) const {
return locate(key).value;
}
// iterate over the mapping with a function object
// (writing a real iterator is too much code for this example)
template<typename F>
void walk(F fn) const {
for (auto key: order)
fn(key, (*this)[key]);
}
};
// TEST
#include <string>
#include <iostream>
#include <cassert>
int main()
{
typedef InsOrderMap<std::string, std::string> IxHash;
IxHash food_color;
food_color.set("Banana", "Yellow");
food_color.set("Apple", "Green");
food_color.set("Lemon", "Yellow");
assert(food_color["Banana"] == std::string("Yellow"));
assert(food_color["Apple"] == std::string("Green"));
assert(food_color["Lemon"] == std::string("Yellow"));
auto print = [](std::string k, std::string v) {
std::cout << k << ' ' << v << std::endl;
};
food_color.walk(print);
food_color.erase("Apple");
std::cout << "-- without apple" << std::endl;
food_color.walk(print);
return 0;
}
Developing this code into a drop-in replacement for a full-fledged container such as std::map requires considerable effort.
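C++ has standard containers for this. An unordered map seems like what you are looking for, as long as you only need lookup by key: std::unordered_map makes no guarantee about iteration order, so by itself it does not preserve insertion order.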
std::unordered_map<std::string, std::string> mymap = {{"Banana", "Yellow"}, {"Orange", "orange"}};

C++ - string ID in debug, int in release

In my code, I am using string IDs. That is convenient while writing and debugging the code. You can imagine it like this:
MyCar * c = cars->GetCarID("my_car_1");
MyCar * c = cars->GetCarID(variable);
But this is slower in release, because of string - string comparison in GetCarID code.
I would like to have something like this
MyCar * c = cars->GetCarID(CREATE_ID("my_car_1"));
MyCar * c = cars->GetCarID(CREATE_ID(variable));
CREATE_ID: in debug, it would return the string as written in the code; in release, it would return an int hash or something like that.
How can I achieve this? Or how is this usually solved?
You can wrap your id in a class, something like:
class StringID
{
public:
StringID(const StringID&);
StringID(const char*);
StringID(const std::string&);
...
private:
#if DEBUG
std::string _idAsString;
#endif
int _id;
};
and define the common operations for your class, like operator<, operator== and so on. This way, in a release build you will have a wrapper over an int, and in a debug build the class will also contain the string to make it easier to debug.
For performance reasons you can make the id/hash computation constexpr and compute it at compile time for string literals (for more info please check this: Computing length of a C string at compile time. Is this really a constexpr?).
With this approach you can catch hash clashes by also checking the strings in debug mode, not only the hashes. As you probably already know, different strings can lead to the same hash (unlikely for common English words, but possible), and this will be very difficult to debug without knowing the string that generated the hash.
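A possible shape for such a class is sketched below. It makes several assumptions that are not in the answer: the hash is 32-bit FNV-1a (any constexpr hash would do), the debug/release switch is the standard NDEBUG macro rather than DEBUG, and the member names are invented. It needs C++14 for the loop inside the constexpr function.

```cpp
#include <cstdint>
#include <string>

// FNV-1a hash over a NUL-terminated string; usable at compile time (C++14).
constexpr std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    while (*s != '\0') {
        h ^= static_cast<unsigned char>(*s);
        h *= 16777619u;             // FNV prime
        ++s;
    }
    return h;
}

class StringID {
public:
    explicit StringID(const char* s)
        : id_(fnv1a(s))
#ifndef NDEBUG
        , text_(s)
#endif
    {}

    std::uint32_t value() const { return id_; }

    friend bool operator==(const StringID& a, const StringID& b) { return a.id_ == b.id_; }
    friend bool operator<(const StringID& a, const StringID& b) { return a.id_ < b.id_; }

#ifndef NDEBUG
    // Debug builds only: the original text, e.g. for logging or clash checks.
    const std::string& text() const { return text_; }
#endif

private:
    std::uint32_t id_;
#ifndef NDEBUG
    std::string text_;  // kept only in debug builds
#endif
};

// Evaluated by the compiler, so no hashing happens at run time for this check.
static_assert(fnv1a("") == 2166136261u, "empty string hashes to the offset basis");
```

In a debug build, a clash check could compare text() whenever two ids compare equal; that part is left out here to keep the sketch short.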
In debug mode, #define CREATE_ID(x) #x and use cars->GetCarID(CREATE_ID(my_car_1));. In release mode, #define CREATE_ID(x) x, add an enum { my_car_1, ... } and you still use cars->GetCarID(CREATE_ID(my_car_1));
Note that you never use CREATE_ID(variable) but you may use auto variable = CREATE_ID(my_car_id).
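Put together, the macro variant might look like this sketch. The use of NDEBUG as the debug/release switch, the typedef CarId, the enumerator my_car_2 and the helper same_car are assumptions added for the example.

```cpp
#include <string>

#ifndef NDEBUG
    // Debug: the identifier is stringized, so ids are readable strings.
    #define CREATE_ID(x) #x
    typedef std::string CarId;
#else
    // Release: the identifier names an enumerator, so ids are plain integers.
    #define CREATE_ID(x) x
    enum CarId { my_car_1, my_car_2 };
#endif

// Works in both configurations: string comparison in debug, integer comparison in release.
bool same_car(const CarId& a, const CarId& b) {
    return a == b;
}
```

One caveat of this scheme is that a misspelled identifier is only rejected by the compiler in the release configuration, because in debug it is silently turned into a string.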
You could use enums instead of strings. This way you have readable names for your integers:
enum ID {
my_car_1
, my_train_1
, my_g6_1
};
My approach would be to use enum class. This gives you a typesafe identifier that is a char or an int underneath. A simple code example would be:
#include <iostream>
enum class COLOR {
RED
, GREEN
};
std::ostream & operator << (std::ostream & o, const COLOR & a) {
switch(a) {
case COLOR::RED: o << "RED";break;
case COLOR::GREEN: o << "GREEN";break;
default: o << static_cast<int>(a);
}
return o;
}
int main() {
COLOR c = COLOR::RED;
std::cout << c << std::endl;
return 0;
}
The drawback is that you have to explicitly write out all identifiers twice: once in the enum and once in the operator.
One of the major advantages of enum class is that names are scoped and it does not allow stuff like:
std::string get(COLOR x);
...
get(3); //Compile time error
get(CAR::GOLF);//Compile time error
Judging by your comments on other answers, you can define your own identifier like this:
class MyId {
public:
int id;
static std::unordered_map<int,std::string> names;
};
std::ostream & operator << (std::ostream &o,const MyId & m) {
auto itr = MyId::names.find(m.id);
if(itr!= MyId::names.end()) {
o << itr->second;
} else {
o << "Unknown " << m.id;
}
return o;
}
It's typesafe, there's no more overhead than an int, and it is able to survive user input.
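For completeness, here is a sketch of how the pieces could fit together. The static member needs an out-of-class definition, which the snippet above leaves out; the registered names and the helper describe are assumptions made for the example.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

class MyId {
public:
    int id;
    static std::unordered_map<int, std::string> names;
};

// A static data member needs exactly one definition outside the class.
// The entries are example data.
std::unordered_map<int, std::string> MyId::names = {
    {1, "my_car_1"},
    {2, "my_train_1"},
};

std::ostream& operator<<(std::ostream& o, const MyId& m) {
    auto itr = MyId::names.find(m.id);
    if (itr != MyId::names.end()) {
        o << itr->second;            // known id: print its readable name
    } else {
        o << "Unknown " << m.id;     // unknown id: fall back to the number
    }
    return o;
}

// Hypothetical helper: formats an id the same way operator<< prints it.
std::string describe(const MyId& m) {
    std::ostringstream out;
    out << m;
    return out.str();
}
```

An id that arrives from user input and was never registered is still printable, which is what "survives user input" amounts to here.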

Is this the best way to do a "with" statement in C++?

Edit:
So this question was misinterpreted to such a ludicrous degree that it has no point anymore. I don't know how, since the question that I actually asked was whether my specific implementation of this—yes, known to be pointless, yes, not remotely resembling idiomatic C++—macro was as good as it could be, and whether it necessarily had to use auto, or if there was a suitable workaround instead. It was not supposed to generate this much attention, and certainly not a misunderstanding of this magnitude. It's pointless to ask respondents to edit their answers, I don't want anybody to lose reputation over this, and there's some good information floating around in here for potential future viewers, so I'm going to arbitrarily pick one of the lower-voted answers to evenly distribute the reputation involved. Move along, nothing to see here.
I saw this question and decided it might be fun to write a with statement in C++. The auto keyword makes this really easy, but is there a better way to do it, perhaps without using auto? I've elided certain bits of the code for brevity.
template<class T>
struct with_helper {
with_helper(T& v) : value(v), alive(true) {}
T* operator->() { return &value; }
T& operator*() { return value; }
T& value;
bool alive;
};
template<class T> struct with_helper<const T> { ... };
template<class T> with_helper<T> make_with_helper(T& value) { ... }
template<class T> with_helper<const T> make_with_helper(const T& value) { ... }
#define with(value) \
for (auto o = make_with_helper(value); o.alive; o.alive = false)
Here's an (updated) usage example with a more typical case that shows the use of with as it is found in other languages.
int main(int argc, char** argv) {
Object object;
with (object) {
o->member = 0;
o->method(1);
o->method(2);
o->method(3);
}
with (object.get_property("foo").perform_task(1, 2, 3).result()) {
std::cout
<< (*o)[0] << '\n'
<< (*o)[1] << '\n'
<< (*o)[2] << '\n';
}
return 0;
}
I chose o because it's an uncommon identifier, and its form gives the impression of a "generic thing". If you've got an idea for a better identifier or a more usable syntax altogether, then please do suggest it.
If you use auto, why use macros at all?
int main()
{
std::vector<int> vector_with_uncommonly_long_identifier;
{
auto& o = vector_with_uncommonly_long_identifier;
o.push_back(1);
o.push_back(2);
o.push_back(3);
}
const std::vector<int> constant_duplicate_of_vector_with_uncommonly_long_identifier
(vector_with_uncommonly_long_identifier);
{
const auto& o = constant_duplicate_of_vector_with_uncommonly_long_identifier;
std::cout
<< o[0] << '\n'
<< o[1] << '\n'
<< o[2] << '\n';
}
{
auto o = constant_duplicate_of_vector_with_uncommonly_long_identifier.size();
std::cout << o <<'\n';
}
}
EDIT: Without auto, just use typedef and references.
int main()
{
typedef std::vector<int> Vec;
Vec vector_with_uncommonly_long_identifier;
{
Vec& o = vector_with_uncommonly_long_identifier;
o.push_back(1);
o.push_back(2);
o.push_back(3);
}
}
Is this an attempt to bring VB syntax into C++?
with says "do all the things in the following block, by default referencing the object I've named", right? It executes a series of statements that make repeated reference to a single object or structure.
with(a)
.do
.domore
.doitall
So how does the example give you the same syntax?
To me, the reason to use a with is when there are multiple dereferences,
so rather than
book.sheet.table.col(a).row(2).setColour
book.sheet.table.col(a).row(2).setFont
book.sheet.table.col(a).row(2).setText
book.sheet.table.col(a).row(2).setBorder
you have
with( book.sheet.table.col(a).row(2) )
.setColour
.setFont
.setText
.setBorder
It seems just as easy, and it is more common syntax in C++, to write
cell& c = book.sheet.table.col(a).row(2);
c.setColour
c.setFont
c.setText
c.setBorder
For C++0x (which you're assuming):
int main() {
std::vector<int> vector_with_uncommonly_long_identifier;
{
auto& o = vector_with_uncommonly_long_identifier;
o.push_back(1);
o.push_back(2);
o.push_back(3);
}
}
Why not just use a good lambda?
auto func = [&](std::vector<int>& o) {
};
func(vector_with_a_truly_ridiculously_long_identifier);
The simple fact is that if your identifiers are so long that you can't type them out every time, you should use a reference, function, pointer, etc. to solve this problem, or better, refactor the name. Statements like this (e.g. using() in C#) have additional side effects (deterministic cleanup, in my example). Your statement in C++ has no notable actual benefits, since it doesn't invoke any additional behaviour compared to just writing the code out.
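If the lambda route is taken, a small helper function can stand in for the macro. This is only a sketch of that idea: the function template with and the demo function fill_demo are hypothetical names, and the helper does nothing beyond calling the lambda with a reference to the object.

```cpp
#include <utility>
#include <vector>

// Calls `body` with a reference to `object`. Also accepts temporaries,
// which stay alive until the call returns.
template <class T, class F>
void with(T&& object, F&& body) {
    std::forward<F>(body)(object);
}

// Fills a vector through the short name `o` and returns it.
std::vector<int> fill_demo() {
    std::vector<int> vector_with_uncommonly_long_identifier;
    with(vector_with_uncommonly_long_identifier, [](std::vector<int>& o) {
        o.push_back(1);
        o.push_back(2);
        o.push_back(3);
    });
    return vector_with_uncommonly_long_identifier;
}
```

Compared with a plain reference in a nested block, this adds no behaviour, which is in line with the point made above.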