Bidirectional iterators in unordered_map? - c++

The following minimal example:
#include <iostream>
#include <boost/unordered_map.hpp>
int main()
{
boost::unordered_map<int, int> m;
boost::unordered_map<int, int>::const_iterator i;
m.insert(std::make_pair(1, 2));
i = m.end();
--i;
std::cout << i->first << " -> " << i->second << std::endl;
return 0;
}
...fails to compile.
bidi.cxx: In function ‘int main()’:
bidi.cxx:13: error: no match for ‘operator--’ in ‘--i’
According to Boost's own documentation:
iterator, const_iterator are of at least the forward category.
It would appear that that's all they are. Why? What technical restriction does a hash-map impose that prevents iterators from being bidirectional?
(gcc version 4.1.2, Boost versions 1.40.0 and 1.43.0.)

There is no technical reason why an unordered_map can't have bidirectional iterators. The main reason is that it would add additional cost to the implementation, and the designers thought nobody would really need bidirectional iterators in a hash map. After all, there's no order in a hash, and so the order the iterator gives you is entirely arbitrary. What would traversing a fixed but arbitrary order backwards give you?
Normally, one would access an unordered_map on an element-by-element basis, or traverse the whole map. I've never done otherwise in Perl, myself. To do this, a forward iterator is necessary, and therefore there is one in there, and Boost guarantees it. To have bidirectional iterators, it would likely be necessary to include an additional pointer in each entry, which increases memory use and processing time.
I'm not coming up with a good, plausible, use case for bidirectional iterators here. If you can, you can ask the Boost maintainers to consider it, although you're almost certainly too late for C++0x.

Related

Should items with duplicate keys in unordered_multimap be kept in the order of their insertion?

One book mentioned that for std::unordered_multimap:
The order of the elements is undefined. The only guarantee is that
duplicates, which are possible because a multiset is used, are grouped
together in the order of their insertion.
But from the output of the example below, we can see that the print order is reverse from their insertion.
#include <string>
#include <unordered_map>
int main()
{
std::unordered_multimap<int, std::string> um;
um.insert( {1,"hello1.1"} );
um.insert( {1,"hello1.2"} );
um.insert( {1,"hello1.3"} );
for (auto &a: um){
cout << a.first << '\t' << a.second << endl;
}
}
Which when compiled and run produces this output (g++ 5.4.0):
1 hello1.3
1 hello1.2
1 hello1.1
updated: unordered_multiset has the same issue:
auto cmp = [](const pair<int,string> &p1, const pair<int,string> &p2)
{return p1.first == p2.first;};
auto hs = [](const pair<int,string> &p1){return std::hash<int>()(p1.first);};
unordered_multiset<pair<int, string>, decltype(hs), decltype(cmp)> us(0, hs, cmp);
us.insert({1,"hello1.1"});
us.insert({1,"hello1.2"});
us.insert({1,"hello1.3"});
for(auto &a:us){
cout<<a.first<<"\t"<<a.second<<endl;
}
output:
1 hello1.3
1 hello1.2
1 hello1.1
Here is what the standard says of the ordering [unord.req] / §6:
... In containers that support equivalent keys, elements with equivalent keys are adjacent to each other in the iteration order of the container. Thus, although the absolute order of elements in an unordered container is not specified, its elements are grouped into equivalent-key groups such that all elements of each group have equivalent keys. Mutating operations on unordered containers shall preserve the relative order of elements within each equivalent-key group unless otherwise specified.
So, to answer the question:
Should items with duplicate keys in unordered_multimap be kept in the order of their insertion?
No, there is no such requirement, or guarantee. If the book makes such claim about the standard, then it is not correct. If the book describes a particular implementation of std::unordered_multimap, then the description could be true for that implementation.
The requirements of the standard make an implementation using open addressing impractical. Therefore, compliant implementations typically use separate chaining of hash collisions, see How does C++ STL unordered_map resolve collisions?
Because equivalent keys - which necessarily collide - are (in practice, not explicitly required to be) stored in a separate linked list, the most efficient way to insert them, is in order of insertion (push_back) or in reverse (push_front). Only the latter is efficient if the separate chain is singly linked.

Iterating Streams in Reverse

I would like to use std::find_if to traverse the contents of an std::streambuf in reverse. This involves constructing an std::reverse_iterator from an std::istream_iterator or std::istreambuf_iterator. Unfortunately, trying to do this, as shown in the code sample below, results in a compilation error. How can I get this to work? If necessary, solutions using Boost would be great.
#include <cstddef>
#include <fstream>
#include <iterator>
template <class Iterator>
static std::reverse_iterator<Iterator>
make_reverse_iterator(Iterator i)
{
return std::reverse_iterator<Iterator>(i);
}
int main()
{
std::ifstream is{"foo.txt", std::ios::binary};
std::istreambuf_iterator<char> i{is};
auto r = make_reverse_iterator(i);
// Error =(
*r;
return EXIT_SUCCESS;
}
Here is the compilation error reported by g++-4.8.1:
In file included from /opt/local/include/gcc48/c++/bits/stl_algobase.h:67:0,
from /opt/local/include/gcc48/c++/bits/char_traits.h:39,
from /opt/local/include/gcc48/c++/ios:40,
from /opt/local/include/gcc48/c++/istream:38,
from /opt/local/include/gcc48/c++/fstream:38,
from ri.cpp:9:
/opt/local/include/gcc48/c++/bits/stl_iterator.h: In instantiation of 'std::reverse_iterator<_Iterator>::reference std::reverse_iterator<_Iterator>::operator*() const [with _Iterator = std::istream_iterator<char>; std::reverse_iterator<_Iterator>::reference = const char&]':
ri.cpp:24:3: required from here
/opt/local/include/gcc48/c++/bits/stl_iterator.h:163:10: error: no match for 'operator--' (operand type is 'std::istream_iterator<char>')
return *--__tmp;
^
Thanks for your help!
As far as I know input iterators (such as those of ifstreams) are not capable of going backwards which is why the reverse iterator is not available. Which makes sense because if you think about it, the forward of the reverse_iterator (i.e. operator ++) is the backwards of the normal iterator (i.e. operator --) and so if the normal iterator doesn't provide operator --, then it stands to reason that the reverse_iterator should not exist.
As I recall there are 3 types of iterators: forward, bidirectional and random access. Forward can only go in one direction (guess which :P), bidirectional can go forward and backwards by 1 and random access can go forwards and backwards by whatever increment.
As you can see random access iterators offer all the operations of bidirectional iterators (and more) who themselves offer all the operations of forward iterators (and more). Which means random access iterators can be used where forward iterators are require but not the other way around.
As you may have guessed from this explanation make_reverse_iterator most likely requires either bidirectional or random access iterators and ifstream most likely offers only forward which is why the template instantiation fails.

C++: Proper way to iterate over STL containers

In my game engine project, I make extensive use of the STL, mostly of the std::string and std::vector classes.
In many cases, I have to iterate through them. Right now, the way I'm doing it is:
for( unsigned int i = 0; i < theContainer.size(); i ++ )
{
}
Am I doing it the right way?
If not, why, and what should I do instead?
Is size() really executed every loop cycle with this implementation? Would the performance loss be negligible?
C++11 has a new container aware for loop syntax that can be used if your compiler supports the new standard.
#include <iostream>
#include <vector>
#include <string>
using namespace std;
int main()
{
vector<string> vs;
vs.push_back("One");
vs.push_back("Two");
vs.push_back("Three");
for (const auto &s : vs)
{
cout << s << endl;
}
return 0;
}
You might want to look at the standard algorithms.
For example
vector<mylass> myvec;
// some code where you add elements to your vector
for_each(myvec.begin(), myvec.end(), do_something_with_a_vector_element);
where do_something_with_a_vector_element is a function that does what goes in your loop
for example
void
do_something_with_a_vector_element(const myclass& element)
{
// I use my element here
}
The are lots of standard algorithms - see http://www.cplusplus.com/reference/algorithm/ - so most things are supported
STL containers support Iterators
vector<int> v;
for (vector<int>::iterator it = v.begin(); it!=v.end(); ++it) {
cout << *it << endl;
}
size() would be re-computed every iteration.
For random-access containers, it's not wrong.
But you can use iterators instead.
for (string::const_iterator it = theContainer.begin();
it != theContainer.end(); ++it) {
// do something with *it
}
There are some circumstances under which a compiler may optimize away the .size() (or .end() in the iterator case) calls (e.g. only const access, function is pure). But do not depend on it.
Usually the right way to "iterate" over a container is using "iterators". Something like
string myStr = "hello";
for(string::iterator i = myStr.begin(); i != myStr.end(); ++i){
cout << "Current character: " << *i << endl;
}
Of course, if you aren't going to modify each element, it's best to use string::const_iterator.
And yes, size() gets called every time, and it's O(n), so in many cases the performance loss will be noticeable and it's O(1), but it's a good practice to calculate the size prior to the loop than calling size every time.
No, this is not the correct way to do it. For a ::std::vector or a ::std::string it works fine, but the problem is that if you ever use anything else, it won't work so well. Additionally, it isn't idiomatic.
And, to answer your other question... The size function is probably inline. This means it likely just fetches a value from the internals of ::std::string or ::std::vector. The compiler will optimize this away and only fetch it once in most cases.
But, here is the idiomatic way:
for (::std::vector<Foo>::iterator i = theContainer.begin();
i != theContainer.end();
++i)
{
Foo &cur_element = *i;
// Do stuff
}
The ++i is very important. Again, for ::std:vector or ::std::string where the iterator is basically a pointer, it's not so important. But for more complicated data structures it is. i++ has to make a copy and create a temporary because the old value needs to stick around. ++i has no such issue. Get into the habit of always using ++i unless you have a compelling reason not to.
Lastly, theContainer.end() will also be generally optimized out of existence. But you can force things to be a little better by doing this:
const ::std::vector<Foo>::iterator theEnd = theContainer.end();
for (::std::vector<Foo>::iterator i = theContainer.begin(); i != theEnd; ++i)
{
Foo &cur_element = *i;
// Do stuff
}
Of course, C++0x simplifies all of this considerably with a new syntax for for loops:
for (Foo &i: theContainer)
{
// Do stuff with i
}
These will work on standard fix-sized arrays as well as any type that defines begin and end to return iterator-like things.
Native for-loop (especially index-based) - it's C-way, not C++-way.
Use BOOST_FOREACH for loops.
Compare, for container of integers:
typedef theContainer::const_iterator It;
for( It it = theContainer.begin(); it != theContainer.end(); ++it ) {
std::cout << *it << std::endl;
}
and
BOOST_FOREACH ( int i, theContainer ) {
std::cout << i << std::endl;
}
But this is not perfect way. If you can do your work without loop - you MUST do it without loop. For example, with algorithms and Boost.Phoenix:
boost::range::for_each( theContainer, std::cout << arg1 << std::endl );
I understand that these solutions bring additional dependencies in your code, but Boost is 'must-have' for modern C++.
You're doing it OK for vectors, although that doesn't translate into the right way for other containers.
The more general way is
for(std::vector<foo>::const_iterator i = theContainer.begin(); i != theContainer.end; ++i)
which is more typing than I really like, but will become a lot more reasonable with the redefinition of auto in the forthcoming Standard. This will work on all standard containers. Note that you refer to the individual foo as *i, and use &*i if you want its address.
In your loop, .size() is executed every time. However, it's constant time (Standard, 23.1/5) for all standard containers, so it won't slow you down much if at all. Addition: the Standard says "should" have constant complexity, so a particularly bad implementation could make it not constant. If you're using such a bad implementation, you've got other performance issues to worry about.

What is the best way to use a HashMap in C++?

I know that STL has a HashMap API, but I cannot find any good and thorough documentation with good examples regarding this.
Any good examples will be appreciated.
The standard library includes the ordered and the unordered map (std::map and std::unordered_map) containers. In an ordered map (std::map) the elements are sorted by the key, insert and access is in O(log n). Usually the standard library internally uses red black trees for ordered maps. But this is just an implementation detail. In an unordered map (std::unordered_map) insert and access is in O(1). It is just another name for a hashtable.
An example with (ordered) std::map:
#include <map>
#include <iostream>
#include <cassert>
int main(int argc, char **argv)
{
std::map<std::string, int> m;
m["hello"] = 23;
// check if key is present
if (m.find("world") != m.end())
std::cout << "map contains key world!\n";
// retrieve
std::cout << m["hello"] << '\n';
std::map<std::string, int>::iterator i = m.find("hello");
assert(i != m.end());
std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
return 0;
}
Output:
23
Key: hello Value: 23
If you need ordering in your container and are fine with the O(log n) runtime then just use std::map.
Otherwise, if you really need a hash-table (O(1) insert/access), check out std::unordered_map, which has a similar to std::map API (e.g. in the above example you just have to search and replace map with unordered_map).
The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).
Even before the C++11 release GCC supported unordered_map - in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:
#include <tr1/unordered_map>
std::tr1::unordered_map<std::string, int> m;
It is also part of boost, i.e. you can use the corresponding boost-header for better portability.
A hash_map is an older, unstandardized version of what for standardization purposes is called an unordered_map (originally in TR1, and included in the standard since C++11). As the name implies, it's different from std::map primarily in being unordered -- if, for example, you iterate through a map from begin() to end(), you get items in order by key1, but if you iterate through an unordered_map from begin() to end(), you get items in a more or less arbitrary order.
An unordered_map is normally expected to have constant complexity. That is, an insertion, lookup, etc., typically takes essentially a fixed amount of time, regardless of how many items are in the table. An std::map has complexity that's logarithmic on the number of items being stored -- which means the time to insert or retrieve an item grows, but quite slowly, as the map grows larger. For example, if it takes 1 microsecond to lookup one of 1 million items, then you can expect it to take around 2 microseconds to lookup one of 2 million items, 3 microseconds for one of 4 million items, 4 microseconds for one of 8 million items, etc.
From a practical viewpoint, that's not really the whole story though. By nature, a simple hash table has a fixed size. Adapting it to the variable-size requirements for a general purpose container is somewhat non-trivial. As a result, operations that (potentially) grow the table (e.g., insertion) are potentially relatively slow (that is, most are fairly fast, but periodically one will be much slower). Lookups, which cannot change the size of the table, are generally much faster. As a result, most hash-based tables tend to be at their best when you do a lot of lookups compared to the number of insertions. For situations where you insert a lot of data, then iterate through the table once to retrieve results (e.g., counting the number of unique words in a file) chances are that an std::map will be just as fast, and quite possibly even faster (but, again, the computational complexity is different, so that can also depend on the number of unique words in the file).
1 Where the order is defined by the third template parameter when you create the map, std::less<T> by default.
Here's a more complete and flexible example that doesn't omit necessary includes to generate compilation errors:
#include <iostream>
#include <unordered_map>
class Hashtable {
std::unordered_map<const void *, const void *> htmap;
public:
void put(const void *key, const void *value) {
htmap[key] = value;
}
const void *get(const void *key) {
return htmap[key];
}
};
int main() {
Hashtable ht;
ht.put("Bob", "Dylan");
int one = 1;
ht.put("one", &one);
std::cout << (char *)ht.get("Bob") << "; " << *(int *)ht.get("one");
}
Still not particularly useful for keys, unless they are predefined as pointers, because a matching value won't do! (However, since I normally use strings for keys, substituting "string" for "const void *" in the declaration of the key should resolve this problem.)
Evidence that std::unordered_map uses a hash map in GCC stdlibc++ 6.4
This was mentioned at: https://stackoverflow.com/a/3578247/895245 but in the following answer: What data structure is inside std::map in C++? I have given further evidence of such for the GCC stdlibc++ 6.4 implementation by:
GDB step debugging into the class
performance characteristic analysis
Here is a preview of the performance characteristic graph described in that answer:
How to use a custom class and hash function with unordered_map
This answer nails it: C++ unordered_map using a custom class type as the key
Excerpt: equality:
struct Key
{
std::string first;
std::string second;
int third;
bool operator==(const Key &other) const
{ return (first == other.first
&& second == other.second
&& third == other.third);
}
};
Hash function:
namespace std {
template <>
struct hash<Key>
{
std::size_t operator()(const Key& k) const
{
using std::size_t;
using std::hash;
using std::string;
// Compute individual hash values for first,
// second and third and combine them using XOR
// and bit shifting:
return ((hash<string>()(k.first)
^ (hash<string>()(k.second) << 1)) >> 1)
^ (hash<int>()(k.third) << 1);
}
};
}
For those of us trying to figure out how to hash our own classes whilst still using the standard template, there is a simple solution:
In your class you need to define an equality operator overload ==. If you don't know how to do this, GeeksforGeeks has a great tutorial https://www.geeksforgeeks.org/operator-overloading-c/
Under the standard namespace, declare a template struct called hash with your classname as the type (see below). I found a great blogpost that also shows an example of calculating hashes using XOR and bitshifting, but that's outside the scope of this question, but it also includes detailed instructions on how to accomplish using hash functions as well https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/
namespace std {
template<>
struct hash<my_type> {
size_t operator()(const my_type& k) {
// Do your hash function here
...
}
};
}
So then to implement a hashtable using your new hash function, you just have to create a std::map or std::unordered_map just like you would normally do and use my_type as the key, the standard library will automatically use the hash function you defined before (in step 2) to hash your keys.
#include <unordered_map>
int main() {
std::unordered_map<my_type, other_type> my_map;
}

I would like to see a hash_map example in C++

I don't know how to use the hash function in C++, but I know that we can use hash_map. Does g++ support that by simply including #include <hash_map>? What is a simple example using hash_map?
The current C++ standard does not have hash maps, but the coming C++0x standard does, and these are already supported by g++ in the shape of "unordered maps":
#include <unordered_map>
#include <iostream>
#include <string>
using namespace std;
int main() {
unordered_map <string, int> m;
m["foo"] = 42;
cout << m["foo"] << endl;
}
In order to get this compile, you need to tell g++ that you are using C++0x:
g++ -std=c++0x main.cpp
These maps work pretty much as std::map does, except that instead of providing a custom operator<() for your own types, you need to provide a custom hash function - suitable functions are provided for types like integers and strings.
#include <tr1/unordered_map> will get you next-standard C++ unique hash container. Usage:
std::tr1::unordered_map<std::string,int> my_map;
my_map["answer"] = 42;
printf( "The answer to life and everything is: %d\n", my_map["answer"] );
Wikipedia never lets down:
http://en.wikipedia.org/wiki/Hash_map_(C%2B%2B)
hash_map is a non-standard extension. unordered_map is part of std::tr1, and will be moved into the std namespace for C++0x. http://en.wikipedia.org/wiki/Unordered_map_%28C%2B%2B%29
The name accepted into TR1 (and the draft for the next standard) is std::unordered_map, so if you have that available, it's probably the one you want to use.
Other than that, using it is a lot like using std::map, with the proviso that when/if you traverse the items in an std::map, they come out in the order specified by operator<, but for an unordered_map, the order is generally meaningless.