Why is std::map a red-black tree and not a hash table? - c++

This seems very strange to me; I expected it to be a hash table.
I saw 3 reasons in the following answer (which may be correct, but I don't think they are the real reason).
Hash tables v self-balancing search trees
Although hashing might not be a trivial operation, I think that for most types it is pretty simple.
When you use a map, you expect something that gives you amortized O(1) insert, delete and find, not O(log n).
I agree that trees have better worst-case performance.
I think there is a bigger reason for that, but I can't figure it out.
In C#, for example, Dictionary is a hash table.

It's largely a historical accident. The standard containers (along with iterators and algorithms) were one of the very last additions before the feature set of the standard was frozen. As it happened, they didn't have what they considered an adequate definition of a hash-based map at the time, and there wasn't time to add it before features were frozen, so the original specification included only a tree-based map.
C++11 did add std::unordered_map (as well as std::unordered_set and multi versions of both), which is based on hashing.

The reason is that map is explicitly called out as an ordered container. It keeps the elements sorted and allows you to iterate in sorted order in linear time. A hashtable couldn't fulfill those requirements.
In C++11 they added std::unordered_map which is a hashtable implementation.
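To make the ordering guarantee concrete, here is a minimal sketch. The helper names (keys_in_iteration_order, iteration_order_demo) and the sample data are made up for illustration; only the container behavior comes from the standard.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collect the keys in whatever order the container's iterators visit them.
template <typename Map>
std::vector<std::string> keys_in_iteration_order(const Map& m) {
    std::vector<std::string> keys;
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}

// Usage example with made-up data.
inline void iteration_order_demo() {
    const std::map<std::string, int> tree{{"pear", 3}, {"apple", 1}, {"fig", 2}};
    const std::unordered_map<std::string, int> hash{{"pear", 3}, {"apple", 1}, {"fig", 2}};

    // The tree always yields keys sorted by operator<.
    assert((keys_in_iteration_order(tree) ==
            std::vector<std::string>{"apple", "fig", "pear"}));
    // The hash table yields the same keys, but in an unspecified order.
    assert(keys_in_iteration_order(hash).size() == 3);
}
```

Calling iteration_order_demo() from main should pass on any conforming implementation; the order you get back from the unordered_map may differ between compilers and even between runs.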

A hash table requires an additional hash function. The current implementation of map, which uses a tree, can work without an extra hash function by using operator<. Additionally, the map allows sorted access to elements, which may be beneficial for some applications. With C++11 we now have hash versions available as well, in the form of unordered_map and unordered_set.

Simple answer: because a hash table cannot satisfy the complexity requirements of iteration over a std::map.
Why does std::map hold these requirements? Unanswerable question. Historical factors contribute but, overall, that's just the way it is.
Hashes are available as std::unordered_map.
It doesn't really matter what the two are called, or what they're called in some other language.

Related

Why use an unordered container? (C++)

I have already tried searching for this but haven't found anything.
I am learning about STL containers and understand the pros and cons of sequential and associative containers. However, I am not sure why anyone would prefer an unordered container over an associative one, as surely it would not affect element insertion, lookup and removal.
Is it purely a performance thing, i.e. would it take more processing to insert into or remove from an associative container because it has to keep its elements sorted?
I don't know too much about the system side of things but in my head I feel like an unordered container would require more 'upkeep' than one that is automatically organised.
If anyone could shed some light it would be really appreciated.
Purely abstractly, consider the fact that an ordering of the elements is an extra "feature" that you have to pay for, so if you don't need it (like in lookup-only dictionary), then you shouldn't have to pay for it.
Technically this means that an unordered container can be implemented with expected lookup and insertion complexity O(1), rather than the O(log n) of ordered containers, by using hash tables.
On a tangentially related note, though, there is a massive practical advantage when using strings as keys: An ordered container has to perform full string comparison everywhere along the tree walk, while a hash container only performs a single hashing operation (which can even be "optimized" to only sample a fixed number of characters from very long strings), and often turns out to be a lot faster in practice.
If ordering is not a requirement, then the best thing to do is to try out both container types (whose interface is almost identical) and compare the performance in your usage profile.
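A rough sketch of how such a comparison could look, since the two interfaces are close enough to share one template. Everything here (LookupResult, time_lookups, make_keys, compare_containers, the "key_" workload) is a hypothetical harness, not a proper benchmark; real measurements should use your own keys and access pattern, with optimizations enabled.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct LookupResult {
    std::size_t hits;     // how many keys were found
    double milliseconds;  // wall-clock time spent in the lookup loop
};

// Fill a container of type Map with the keys, then time looking each key up again.
template <typename Map>
LookupResult time_lookups(const std::vector<std::string>& keys) {
    Map m;
    for (std::size_t i = 0; i < keys.size(); ++i) m.emplace(keys[i], static_cast<int>(i));

    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (const auto& k : keys) hits += m.count(k);
    const auto stop = std::chrono::steady_clock::now();

    return {hits, std::chrono::duration<double, std::milli>(stop - start).count()};
}

// Made-up workload: n distinct string keys sharing a prefix.
inline std::vector<std::string> make_keys(std::size_t n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) keys.push_back("key_" + std::to_string(i));
    return keys;
}

// Usage example: run the same workload through both containers.
inline void compare_containers(std::size_t n) {
    const auto keys = make_keys(n);
    const LookupResult tree = time_lookups<std::map<std::string, int>>(keys);
    const LookupResult hash = time_lookups<std::unordered_map<std::string, int>>(keys);
    assert(tree.hits == hash.hits);  // both containers must find every key
    std::printf("map: %.3f ms, unordered_map: %.3f ms\n", tree.milliseconds, hash.milliseconds);
}
```

The timings printed by compare_containers depend entirely on the machine, compiler and key set, so treat them as a starting point for your own measurements rather than a verdict.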
We use an unordered container when the ordering of the objects is not necessary and you care most about lookup performance, because an unordered container has the fastest search/insert: O(1) on average, rather than the O(log n) of the ordered associative containers or the O(n) of sequence containers.
not sure why anyone would prefer an unordered container over an associative one
Those features are not exclusive. A container may be associative, and if it is, separately may also be unordered.
If you are familiar with hash maps, that is the technology being leveraged by unordered containers. The standard library uses the term "unordered" instead of "hash" so as not to impose a specific technology when what is desired is just specific performance promises. (see comment)

Why did the STL choose a tree-based map instead of a hash-based map?

I'm wondering why the STL's map is based on a red-black tree.
I mean, hash-based map seems to be more efficient in inserting/deleting or even getting the value.
Are there any specific considerations?
The STL originally chose both. It had a hash table and the tree-based map.
However, when it was adopted into the standard, many parts were stripped away in order to simplify the task (it was easier to talk the committee into including a smaller library, and it required less work in terms of actually specifying their behavior).
So the hash table was skipped.
However, both data structures have their advantages. In particular, a binary tree allows the contents of the map to be ordered (you can iterate over the contents of a map in sorted order, or you can ask for all elements smaller than a specific element, for example), and I can only guess that this property was considered more important than the performance advantages of a hash map.
However, in C++11, std::unordered_map was added, which is the long-lost hash table. Its original omission was simply due to time pressure and, quite possibly, committee politics (keeping the library small to minimize resistance against it).

Choosing between std::map and std::unordered_map [duplicate]

Now that std has a real hash map in unordered_map, why (or when) would I still want to use the good old map over unordered_map on systems where it actually exists? Are there any obvious situations that I cannot immediately see?
As already mentioned, map lets you iterate over the elements in sorted order, but unordered_map does not. This is very important in many situations, for example displaying a collection (e.g. an address book). It also shows up in other, indirect ways, such as (1) starting to iterate from the iterator returned by find(), or (2) the existence of member functions like lower_bound().
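As a small illustration of the lower_bound() point, here is a sketch of a range query over an address book. The function name names_in_range and the data layout (name mapped to a phone string) are assumptions made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Return all names whose key falls in the half-open range [first, last),
// e.g. every address-book entry starting with a given letter.
std::vector<std::string> names_in_range(const std::map<std::string, std::string>& book,
                                        const std::string& first,
                                        const std::string& last) {
    assert(!(last < first));  // the range must not be reversed
    std::vector<std::string> out;
    // lower_bound finds the first key >= its argument in O(log n);
    // from there we simply walk forward in sorted order.
    for (auto it = book.lower_bound(first), end = book.lower_bound(last); it != end; ++it) {
        out.push_back(it->first);
    }
    return out;
}
```

With entries for Alice, Bob, Brenda and Carol, names_in_range(book, "B", "C") returns Bob and Brenda. An unordered_map has no equivalent; you would have to scan every element.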
Also, I think there is some difference in the worst case search complexity.
For map, it is O( lg N )
For unordered_map, it is O( N ) [This may happen when the hash function is not good leading to too many hash collisions.]
The same is applicable for worst case deletion complexity.
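The O(N) worst case is easy to provoke on purpose. The sketch below uses a deliberately useless hash (ConstantHash, a made-up name) so that every key collides, then measures the fullest bucket as a rough proxy for how many elements a single lookup may have to walk through.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Deliberately terrible hash (an illustration, not something to use):
// every key gets the same hash value and so lands in the same bucket.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Size of the fullest bucket in the container.
template <typename Set>
std::size_t largest_bucket(const Set& s) {
    std::size_t worst = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        if (s.bucket_size(b) > worst) worst = s.bucket_size(b);
    }
    return worst;
}

// Insert 0..n-1 using the given hash and report the fullest bucket.
template <typename Hash>
std::size_t largest_bucket_after_inserting(int n) {
    assert(n >= 0);
    std::unordered_set<int, Hash> s;
    for (int i = 0; i < n; ++i) s.insert(i);
    return largest_bucket(s);
}
```

With ConstantHash, all n elements end up in one bucket, so find() and erase() degrade to a linear scan; with std::hash<int> the elements spread out and buckets stay short.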
In addition to the answers above, you should also note that just because unordered_map is constant time (O(1)) doesn't mean that it's faster than map (of order log(N)). The constant may be bigger than log(N), especially since N is limited by 2^32 (or 2^64).
So in addition to the other answers (map maintains order and hash functions may be difficult) it may be that map is more performant.
For example in a program I ran for a blog post I saw that for VS10 std::unordered_map was slower than std::map (although boost::unordered_map was faster than both).
Note 3rd through 5th bars.
This is due to Google's Chandler Carruth in his CppCon 2014 lecture
std::map is (considered by many to be) not useful for performance-oriented work: If you want O(1)-amortized access, use a proper associative array (or for lack of one, std::unordered_map); if you want sorted sequential access, use something based on a vector.
Also, std::map is a balanced tree; and you have to traverse it, or re-balance it, incredibly often. These are cache-killer and cache-apocalypse operations respectively... so just say NO to std::map.
You might be interested in this SO question on efficient hash map implementations.
(PS - std::unordered_map is cache-unfriendly because it uses linked lists as buckets.)
I think it's obvious that you'd use std::map when you need to iterate across the items in the map in sorted order.
You might also use it when you'd prefer to write a comparison operator (which is intuitive) instead of a hash function (which is generally very unintuitive).
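To show the difference in effort, here is a sketch with a made-up key type (Person). For std::map one comparison operator is enough; for std::unordered_map you need equality plus a hash functor. The hash-combining formula below is one common idiom, chosen for illustration, and is not claimed to be a particularly strong hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical key type used only for illustration.
struct Person {
    std::string last;
    std::string first;
};

// All std::map needs: a strict weak ordering.
inline bool operator<(const Person& a, const Person& b) {
    return std::tie(a.last, a.first) < std::tie(b.last, b.first);
}

// std::unordered_map needs equality *and* a hash function.
inline bool operator==(const Person& a, const Person& b) {
    return a.last == b.last && a.first == b.first;
}

struct PersonHash {
    std::size_t operator()(const Person& p) const {
        // Simple way to mix two hashes into one.
        const std::size_t h1 = std::hash<std::string>{}(p.last);
        const std::size_t h2 = std::hash<std::string>{}(p.first);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

using OrderedPhoneBook = std::map<Person, int>;
using HashedPhoneBook = std::unordered_map<Person, int, PersonHash>;

// Usage example: once the key type is set up, both are used the same way.
inline void phone_book_demo() {
    OrderedPhoneBook tree;
    HashedPhoneBook hash;
    const Person ada{"Lovelace", "Ada"};
    tree[ada] = 1815;
    hash[ada] = 1815;
    assert(tree.at(ada) == hash.at(ada));
}
```

The ordering is a one-liner thanks to std::tie, whereas a good hash takes some thought: a poor one still works but costs you the O(1) behavior.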
Say you have very large keys, perhaps large strings. To create a hash value for a large string you need to go through the whole string from beginning to end, which takes time at least linear in the length of the key. However, when you search a binary tree using the key's < operator, each string comparison can return as soon as the first mismatch is found. This typically happens very early for large strings.
This reasoning can be applied to the find function of std::unordered_map and std::map. If the nature of the key is such that it takes longer to produce a hash (in the case of std::unordered_map) than it takes to find the location of an element using binary search (in the case of std::map), it should be faster to look up a key in the std::map. It's quite easy to think of scenarios where this would be the case, but I believe they would be quite rare in practice.

What is the difference between set and hashset in C++ STL?

When should I choose one over the other?
Are there any pointers that you would recommend for using the right STL containers?
hash_set is an extension that is not part of the C++ standard. Lookups should be O(1) rather than O(log n) for set, so it will be faster in most circumstances.
Another difference will be seen when you iterate through the containers. set will deliver the contents in sorted order, while hash_set will be essentially random (Thanks Lou Franco).
Edit: The C++11 update to the C++ standard introduced unordered_set which should be preferred instead of hash_set. The performance will be similar and is guaranteed by the standard. The "unordered" in the name stresses that iterating it will produce results in no particular order.
std::set is implemented as a binary search tree.
hash_set is implemented as a hash table.
The main issue here is that many people use std::set thinking it is a hash table with O(1) look-up, which it isn't. Its look-ups are really O(log(n)). Other than that, read about binary trees vs hash tables to get a better idea of the data structures.
Another thing to keep in mind is that with hash_set you have to provide the hash function, whereas a set only requires a comparison function ('<') which is easier to define (and predefined for native types).
I don't think anyone has answered the other part of the question yet.
The reason to use hash_set or unordered_set is the usually O(1) lookup time. I say usually because every so often, depending on implementation, a hash may have to be copied to a larger hash array, or a hash bucket may end up containing thousands of entries.
The reason to use a set is if you often need the largest or smallest member of a set. A hash has no order so there is no quick way to find the smallest item. A tree has order, so largest or smallest is very quick. O(log n) for a simple tree, O(1) if it holds pointers to the ends.
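A short sketch of that difference, with made-up helper names (smallest, largest, smallest_unordered). For std::set the extremes sit at the two ends of the sorted sequence; for std::unordered_set the only option is a linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <unordered_set>

// Smallest element of an ordered set: the first one in sorted order.
inline int smallest(const std::set<int>& s) {
    assert(!s.empty());
    return *s.begin();
}

// Largest element of an ordered set: the first one when walking backwards.
inline int largest(const std::set<int>& s) {
    assert(!s.empty());
    return *s.rbegin();
}

// A hash set has no order, so finding the minimum means looking at every element: O(n).
inline int smallest_unordered(const std::unordered_set<int>& s) {
    assert(!s.empty());
    return *std::min_element(s.begin(), s.end());
}
```

For a set holding 42, 7 and 19, smallest() gives 7 and largest() gives 42 without scanning; smallest_unordered() gives the same answer but has to visit all three elements.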
A hash_set would be implemented by a hash table, which has mostly O(1) operations, whereas a set is implemented by a tree of some sort (AVL, red black, etc.) which have O(log n) operations, but are in sorted order.
Edit: I had written that trees are O(n). That's completely wrong.

I don't understand std::tr1::unordered_map

I need an associative container that makes me index a certain object through a string, but that also keeps the order of insertion, so I can look for a specific object by its name or just iterate on it and retrieve objects in the same order I inserted them.
I think a hybrid of linked list and hash map should do the job. Before that, I tried std::tr1::unordered_map, thinking that it worked the way I described, but it doesn't. So could someone explain the meaning and behavior of unordered_map?
@wesc: I'm sure std::map is implemented by the STL, while I'm sure std::hash_map is NOT in the STL (I think older versions of Visual Studio put it in a namespace called stdext).
@cristopher: so, if I get it right, the difference is in the implementation (and thus performance), not in the way it behaves externally.
You've asked for the canonical reason why Boost::MultiIndex was made: list insertion order with fast lookup by key. Boost MultiIndex tutorial: list fast lookup
You need to index an associative container two ways:
Insertion order
String comparison
Try Boost.MultiIndex or Boost.Intrusive. I haven't used it this way but I think it's possible.
Boost documentation of unordered containers
The difference is in the method of how you generate the look up.
In the map/set containers the operator< is used to generate an ordered tree.
In the unordered containers, a hash function (key => index) is used.
See hashing for a description of how that works.
Sorry, read your last comment wrong. Yes, hash_map is not in STL, map is. But unordered_map and hash_map are the same from what I've been reading.
map -> log (n) insertion, retrieval, iteration is efficient (and ordered by key comparison)
hash_map/unordered_map -> constant time insertion and retrieval, iteration time is not guaranteed to be efficient
Neither of these will work for you by themselves, since the map orders things based on the key content, and not the insertion sequence (unless your key contains info about the insertion sequence in it).
You'll have to do either what you described (list + hash_map), or create a key type that has the insertion sequence number plus an appropriate comparison function.
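A minimal sketch of the list + hash map option, assuming string keys and int values. The class name InsertionOrderedMap and its interface are invented for the example; Boost.MultiIndex offers a far more complete solution.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical class: a list keeps insertion order, a hash map indexes into it.
class InsertionOrderedMap {
public:
    // Insert a new key or overwrite the value of an existing one.
    // An existing key keeps its original position.
    void set(const std::string& key, int value) {
        auto found = index_.find(key);
        if (found != index_.end()) {
            found->second->second = value;
            return;
        }
        items_.emplace_back(key, value);
        index_[key] = std::prev(items_.end());
        assert(items_.size() == index_.size());  // one index entry per item
    }

    // Pointer to the value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        auto found = index_.find(key);
        return found == index_.end() ? nullptr : &found->second->second;
    }

    // Keys in the order they were first inserted.
    std::vector<std::string> keys() const {
        std::vector<std::string> out;
        for (const auto& item : items_) out.push_back(item.first);
        return out;
    }

private:
    using Item = std::pair<std::string, int>;
    std::list<Item> items_;                                             // insertion order
    std::unordered_map<std::string, std::list<Item>::iterator> index_;  // key -> position in items_
};
```

This works because std::list iterators stay valid when other elements are added, so the hash map can safely store them. Lookup by name is O(1) on average, and iterating items_ returns the objects in insertion order.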
I think that an unordered_map and hash_map are more or less the same thing. The difference is that the STL doesn't officially have a hash_map (what you're using is probably a compiler specific thing), so unordered_map is the fix for that omission.
unordered_map is just that... unordered. You can't depend on it preserving any ordering on iteration.
You sure that std::hash_map exists in all STL implementations? SGI STL implements it, however GNU g++ doesn't have it (it's located in the __gnu_cxx namespace) as of 4.3.1 anyway. As far as I know, hash_map has always been non-standard, and now tr1 is fixing that.
@wesc: STL has std::map... so what's the difference with unordered_map? I don't think STL would implement twice the same thing and call it differently.