I have already tried searching for this but haven't found anything.
I am learning about STL containers and understand the pros and cons of sequential and associative containers. However, I am not sure why anyone would prefer an unordered container over an associative one, as surely it would not affect element insertion, lookup and removal.
Is it purely a performance thing, i.e. would it take more processing to insert into or remove from an associative container because it has to keep its elements sorted?
I don't know too much about the system side of things but in my head I feel like an unordered container would require more 'upkeep' than one that is automatically organised.
If anyone could shed some light it would be really appreciated.
Purely abstractly, consider the fact that an ordering of the elements is an extra "feature" that you have to pay for, so if you don't need it (like in lookup-only dictionary), then you shouldn't have to pay for it.
Technically this means that an unordered container can be implemented with expected lookup and insertion complexity O(1), rather than the O(log n) of ordered containers, by using hash tables.
On a tangentially related note, though, there is a massive practical advantage when using strings as keys: An ordered container has to perform full string comparison everywhere along the tree walk, while a hash container only performs a single hashing operation (which can even be "optimized" to only sample a fixed number of characters from very long strings), and often turns out to be a lot faster in practice.
If ordering is not a requirement, then the best thing to do is to try out both container types (whose interface is almost identical) and compare the performance in your usage profile.
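To make the "try both" advice concrete, here is a minimal sketch, assuming C++11 or later. count_words, OrderedCounts and HashedCounts are hypothetical names used only for illustration. Because the two containers share most of their interface, a single template can be instantiated with either one and then timed on your own data:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
// Works with any map-like type that supports operator[] on std::string keys.
template <typename Map>
Map count_words(const std::vector<std::string>& words) {
    Map counts;
    for (const std::string& w : words) {
        ++counts[w];  // the counter is value-initialized to 0 on first access
    }
    return counts;
}

// Only the type changes between the two variants; the calling code does not.
using OrderedCounts = std::map<std::string, std::size_t>;
using HashedCounts  = std::unordered_map<std::string, std::size_t>;
```

Calling count_words<OrderedCounts>(words) and count_words<HashedCounts>(words) on the same input gives the same counts; only the iteration order and the performance profile differ.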
We use an unordered container when the ordering of the objects is not necessary and lookup performance matters most: an unordered container offers average O(1) search and insertion, whereas an ordered associative container takes O(log n) and searching a sequence container takes O(n).
not sure why anyone would prefer an unordered container over an associative one
Those features are not exclusive. A container may be associative, and if it is, separately may also be unordered.
If you are familiar with hash maps, that is the technology being leveraged by unordered containers. The standard library uses the term "unordered" instead of "hash" so as not to impose a specific technology when what is desired is just specific performance promises. (see comment)
I have a set of pointers. In the first step, I insert data pointers, and in the second step, I iterate over the whole set and do something with the elements. The order is not important, I just need to avoid duplicates, which works fine with pointer comparison.
My question is, whether it might be advantageous to use an unordered set for the same purpose. Is insertion faster for an unordered set?
As Ami Tavory commented, if you don't need order, then it's usually best to go for unordered containers. The reason being that if order somehow improved performance, unordered containers would still be free to use it, and hence get the same or better complexity anyhow.
A downside of unordered collections is that they usually require a hash function for the key type. If it's too hard or expensive to make one, then containers which don't use hashes might be better.
In C++'s standard library, the insertion complexity for std::set is O(log(N)), whereas for std::unordered_set it's O(1) on average. Aside from that, there are probably fewer cache misses on average when using std::unordered_set.
At the end of the day though, this is just theory. You should try something that sounds good enough and profile it to see if it really is.
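For the pointer case in the question, a minimal sketch could look like this, assuming C++11. count_distinct is a hypothetical helper name. It relies on the fact that std::hash is specialized for pointer types, so no custom hash is needed:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns how many distinct addresses appear in `ptrs`.
std::size_t count_distinct(const std::vector<const int*>& ptrs) {
    std::unordered_set<const int*> seen;
    seen.reserve(ptrs.size());  // pre-size the table to avoid rehashing during the loop
    for (const int* p : ptrs) {
        seen.insert(p);  // inserting an address that is already present is a no-op
    }
    return seen.size();
}
```

Swapping std::unordered_set for std::set here needs no other change, which makes it easy to profile both.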
I want to know which data-structures are more efficient for iterating through their elements between std::set, std::map and std::unordered_set, std::unordered_map.
I searched through SO and I found this question. The answers either propose to copy the elements in a std::vector or to use Boost.Container, which IMHO don't answer my question.
My purpose is to keep in a container a big number of unique elements, that most of the time I want to iterate through them. Insertions and extractions are more rare. I want to avoid std::vector in combination with std::unique.
Let's consider set vs unordered_set.
The main difference here is the 'nature' of the iteration, that is the traversal of the set will give you the elements in order while traversing a range in an unordered set will give you a bunch of values in no particular order.
Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time that's needed to find elements it1 and it2, there can be no direct mapping from one case to the other, since the elements in between are not guaranteed to be the same even if you've used the same elements to construct the container.
There are cases however where something like this has meaning when e.g. you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider implementation mechanics :
Sets are usually implemented as red-black trees (a form of binary search tree). Like all binary search trees, they allow efficient in-order traversal (left, root, right) of their elements. That is, to traverse you pay the cost of pointer chasing (just like traversing a list).
Unordered sets, on the other hand, are hash tables, and to my knowledge the STL implementation uses hashing with chaining. That means (at a very high level) that the structure is a (contiguous) buffer where each element is the head of a chain (list) that contains the elements. The way the elements are laid out across those chains (buckets) and across the buffer will affect the traversal time, but you'll be chasing pointers once again, this time jumping through different lists as well. I don't think it'll vary significantly from the tree case, but it won't be any better for sure.
In any case micro tuning and benchmarking will give you the answer for your particular application.
The difference does not lie between the ordering or lack of one but in the backing container. If it's a contiguous memory it should be fast to iterate over, due to simple implementation of iterator and cache friendliness.
Unordered containers are usually stored as a vector of vectors (or something similar), while ordered containers are implemented using trees, though this is ultimately left to the implementation. This would suggest that iterating over the unordered version should be faster. However, I have seen implementations (which bent the rules a little, to be fair) with different behaviour.
Generally speaking, container performance is quite a complex topic and usually has to be tested in the actual application to get a reliable answer. There is plenty of implementation-defined behaviour that might affect the performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out to be a good option.
EDIT: As @TonyD said in a comment, there is a rule that disallows invalidating iterators when adding an element as long as max_load_factor() is not exceeded; this practically rules out backing containers that are contiguous in memory.
Thus, copying everything into a vector seems like an even more reasonable option. If you need to remove duplicates, a feasible option might be to use http://en.cppreference.com/w/cpp/algorithm/sort and have dupes easily ignored. I have heard that keeping a sorted array (or vector) with vector and sort is quite a common choice when you need a container that is sorted and is iterated over more often than it is modified.
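The sorted-vector idea can be sketched as follows. This is only an illustration under the assumption that the elements are plain ints; sorted_unique and contains_sorted are hypothetical helper names. It uses std::sort followed by std::unique and erase, then std::binary_search for lookups:

```cpp
#include <algorithm>
#include <vector>

// Sorts the values and drops adjacent duplicates, leaving a contiguous,
// sorted, duplicate-free vector that is cheap to iterate.
std::vector<int> sorted_unique(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Membership test on an already sorted vector in O(log n).
bool contains_sorted(const std::vector<int>& sorted, int x) {
    return std::binary_search(sorted.begin(), sorted.end(), x);
}
```

The trade-off is that every later insertion has to keep the vector sorted, which costs O(n) per element, so this fits best when modifications are rare compared with iteration.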
Iteration speed, from fastest to slowest, should be: set > map > unordered_set > unordered_map.
set is a little lighter than map, and both are ordered by the binary tree rule, so they should be faster to iterate than the unordered_ containers.
It is very strange to me that std::map is implemented as a tree; I expected it to be a hash table.
I saw three reasons in the following answer (which may be correct, but I don't think they are the real reason).
Hash tables v self-balancing search trees
Although hashing might not be a trivial operation, I think that for most types it is pretty simple.
When you use a map, you expect something that will give you amortized O(1) insert, delete and find, not O(log n).
I agree that trees have better worst-case performance.
I think that there is a bigger reason for that, but I can't figure it out.
In C#, for example, Dictionary is a hash table.
It's largely a historical accident. The standard containers (along with iterators and algorithms) were one of the very last additions before the feature set of the standard was frozen. As it happened, they didn't have what they considered an adequate definition of a hash-based map at the time, and there wasn't time to add it before features were frozen, so the original specification included only a tree-based map.
C++11 did add std::unordered_map (as well as std::unordered_set and multi versions of both), though, and those are based on hashing.
The reason is that map is explicitly called out as an ordered container. It keeps the elements sorted and allows you to iterate in sorted order in linear time. A hashtable couldn't fulfill those requirements.
In C++11 they added std::unordered_map which is a hashtable implementation.
A hash table requires an additional hash function. The current implementation of map, which uses a tree, can work without an extra hash function by using operator<. Additionally, the map allows sorted access to elements, which may be beneficial for some applications. With C++11 we now have the hash versions available in the form of unordered_map and unordered_set.
Simple answer: because a hash table cannot satisfy the complexity requirements of iteration over a std::map.
Why does std::map hold these requirements? Unanswerable question. Historical factors contribute but, overall, that's just the way it is.
Hashes are available as std::unordered_map.
It doesn't really matter what the two are called, or what they're called in some other language.
What are differences between ordered and unordered STL containers?
The main difference is that if you iterate through an ordered container by incrementing the iterator, you'll visit the elements of the container in the order of the keys.
That doesn't necessarily hold for unordered containers.
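A small sketch of that difference, assuming C++11; in_iteration_order is a hypothetical helper that simply copies elements out in the order the iterator visits them:

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Copies the elements out in whatever order iteration yields them.
template <typename Container>
std::vector<int> in_iteration_order(const Container& c) {
    return std::vector<int>(c.begin(), c.end());
}
```

For a std::set<int> built from 3, 1, 2 this yields 1, 2, 3. For a std::unordered_set<int> built from the same values it yields the same three elements, but the order is unspecified and may differ between standard library implementations.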
Ordered STL containers are based on a comparison. For example, std::set is typically implemented as a red-black tree. Unordered STL containers are based on hash algorithms and unordered_set is a hash table.
Unordered containers typically offer better algorithmic cost for operations like insertion, lookup and removal. However, their constant cost is quite high and hashing for custom types can be non-trivial in some cases. It's also impossible to iterate through an unordered container in a specific order.
Typically, I would use an ordered container for most uses, unless the performance of the container is identified to be a problem, because extending them for custom types is typically simpler.
The ordered and unordered apply to containers transitively.
What you are interested in is the Sequence, that is the order in which elements will appear when you iterate over (possibly a slice of) the container.
Sequences are more general though, so for this kind of concept I'll refer to Martin Broadhurst's copy of the SGI STL website (the mother of the current STL), where one can find a taxonomy of the different concepts that lurk behind the STL.
To begin with, the Sequence. What's interesting in the sequence is that there is no guarantee that traversing it twice, without altering it in the meantime, will yield the elements in the same order. This is the case for example for containers that implement some form of caching by moving to the beginning the last element seen. In this case, iteration will effectively reverse the container.
An Ordered Associative Container [1] is a container for which an ordering criterion has been fixed, and that guarantees that whenever you iterate over a slice of its elements you'll always encounter them ordered according to this criterion.
A Hashed Associative Container on the other hand is different. Instead of an ordering criterion it uses hashing. The SGI STL also specifies that it must use buckets, which is kind of restrictive. Here the iteration is basically unordered. You have absolutely no control over how the elements will come out, and the order might indeed not be identical from one run of the program to the next if some sort of randomness is applied to rehashing.
An Unordered container is the term they came up with for Boost and C++0x, because they did not want the name to clash with the already existing implementations of hash_set and hash_map. And thus, though not documented in the SGI STL, the Unordered kind approximates the previous Hashed kind.
What you really need to know: Ordered means the elements will come out sorted, while Unordered means that no kind of order (at all) is enforced. Order comes at a cost, so make sure to only pay for it when you need it. For example, Python's dict was unordered at the time of writing (since Python 3.7 it preserves insertion order, but it is still not sorted by key).
[1] I don't really like the term associative here. It's a bit misleading when one considers that a set is a model of this requirement, where an element is both key and value at once...
The unordered collections (tr1::unordered_map and tr1::unordered_set) retrieve values generally through a hash table implementation. This gives an amortized average look up of O(1).
The ordered collections (std::map and std::set) are node based. These collections retrieve values in O(log n) time.
First off: I assume you are talking about std::(map|set) vs the C++0x std::unordered_(map|set). Well, the obvious thing first: ordered containers keep their content, well, ordered. This means extra work is needed when you insert something (because you first have to find out where to insert it). However, you only need to specify how to compare two elements (i.e. whether one is less than the other), and for many built-in types even that is already provided. Unordered containers don't need to order their contents, so insertion and element access are faster, but for custom types you need to provide a good hash function and a function testing for equality, so it's more effort on your side.
One more thing somehow overlooked in the other replies: ordered containers require a strict weak ordering, supplied either by operator< or by a comparison (traits) class with operator(). As a result, iterating those containers follows the order defined by those functions. Unordered containers require only a hash function and an equality comparison (operator== or a predicate doing the same). Therefore, there is no ordering of elements in the container (other than the internal ordering of buckets).
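To illustrate what each kind of container asks of a custom key type, here is a minimal sketch assuming C++11. Employee, ById, IdHash and IdEqual are hypothetical names invented for this example:

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <unordered_set>

// Hypothetical key type used only for illustration.
struct Employee {
    std::string name;
    int id;
};

// Ordered containers need a strict weak ordering; here, order by id.
struct ById {
    bool operator()(const Employee& a, const Employee& b) const {
        return a.id < b.id;
    }
};

// Unordered containers need a hash and an equality predicate that agree:
// elements that compare equal must produce the same hash value.
struct IdHash {
    std::size_t operator()(const Employee& e) const {
        return std::hash<int>{}(e.id);
    }
};
struct IdEqual {
    bool operator()(const Employee& a, const Employee& b) const {
        return a.id == b.id;
    }
};

using OrderedStaff = std::set<Employee, ById>;
using HashedStaff  = std::unordered_set<Employee, IdHash, IdEqual>;
```

The ordered variant needs one functor, while the hashed variant needs two that must stay consistent with each other, which is the extra effort mentioned above.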
In addition to the other answers: an equality test of two unordered containers (the containers, not their elements) is difficult owing to their unordered nature. It is possible, but may be expensive.
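As a sketch of what that looks like in practice, assuming C++11: the standard unordered containers provide operator==, which compares contents regardless of iteration order. Its cost is linear in the number of elements on average but can be quadratic in the worst case. The helper names below are hypothetical:

```cpp
#include <unordered_set>

// Builds {0, 1, ..., n-1}, inserting in ascending order.
std::unordered_set<int> filled_ascending(int n) {
    std::unordered_set<int> s;
    for (int i = 0; i < n; ++i) s.insert(i);
    return s;
}

// Builds the same values, inserting in descending order.
std::unordered_set<int> filled_descending(int n) {
    std::unordered_set<int> s;
    for (int i = n - 1; i >= 0; --i) s.insert(i);
    return s;
}

// Compares contents only; the order in which elements were inserted
// (and the order in which they iterate) does not matter.
bool same_contents(const std::unordered_set<int>& a,
                   const std::unordered_set<int>& b) {
    return a == b;
}
```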
I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You would need large datasets and many lookups for O(log(n)) not to be fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). Among the standard containers, only the random-access sequences vector and deque guarantee O(1) for the [] operator.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
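For the dates example from the question, a minimal sketch could look like this. It assumes dates are encoded as YYYYMMDD integers purely for illustration, and uses std::set since only keys are needed; std::map offers the same find and upper_bound members. next_after and next_after_member are hypothetical helper names:

```cpp
#include <set>

// Dates are encoded as YYYYMMDD integers purely for illustration.
using Date = int;
using DateSet = std::set<Date>;

// Returns an iterator to the first valid date strictly after `d`,
// or valid.end() if there is none. Works even if `d` itself is not valid.
DateSet::const_iterator next_after(const DateSet& valid, Date d) {
    return valid.upper_bound(d);  // O(log n)
}

// "Map style" lookup of `d`, then a "list style" step to its successor.
// Returns valid.end() if `d` is not in the set or has no successor.
DateSet::const_iterator next_after_member(const DateSet& valid, Date d) {
    DateSet::const_iterator it = valid.find(d);  // O(log n)
    if (it != valid.end()) {
        ++it;  // step to the next element in sorted order
    }
    return it;
}
```

In both cases the caller compares the result against end() before dereferencing it.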
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
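A rough sketch of that list-plus-index arrangement, assuming C++11. LruKeys and its members are hypothetical names, and the index here is a std::unordered_map, although std::map would work the same way. It tracks only keys, to keep the example short:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// Minimal tracker of the most recently used keys (a sketch, not a full cache).
class LruKeys {
public:
    explicit LruKeys(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Marks `key` as most recently used, inserting it if needed.
    // Returns true if the key was already present.
    bool touch(int key) {
        auto found = index_.find(key);
        if (found != index_.end()) {
            // Move the existing node to the back of the same list;
            // the iterator stored in the index stays valid.
            order_.splice(order_.end(), order_, found->second);
            return true;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.front());  // evict the least recently used key
            order_.pop_front();
        }
        order_.push_back(key);
        index_[key] = std::prev(order_.end());
        return false;
    }

    bool contains(int key) const { return index_.count(key) != 0; }

    // Precondition: at least one key has been touched.
    int least_recent() const { return order_.front(); }

private:
    std::size_t capacity_;
    std::list<int> order_;  // front = least recent, back = most recent
    std::unordered_map<int, std::list<int>::iterator> index_;
};
```

The index gives fast lookup of a key's position, and the list gives constant-time reordering and eviction.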
How about this one: all dates are stored in a std::list<Date>, but you look them up with a helper structure, stdext::hash_map<Date, std::list<Date>::iterator>. Once you have the iterator into the list, access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.