is there, in the c++ "Standard Library", any "Associative" (i.e. "Key-Value") Container/Data Structure, that has the ability, to preserve order, by order of insertion?
I have seen several topics on this, however, it seems, most before C++11.
Some suggest using "boost::multi_index", but, if at all possible, I would "rather" use standard containers/structures.
I see that C++11 has several, apparently, "unordered" associative containers :link.
Are any of these, by some way, "configurable", such that they are only sorted by insertion order?
Thanks!
C
No.
You are mixing linear access with random. Not very good bed fellows.
Just use both a vector/list (i.e. order of insertion) along with a map using an index into the former.
No; such capability was apparently sacrificed in the name of performance.
The order of equivalent items is required to be preserved across operations including rehashes, but there's no way to specify the original order. You could, in theory, use std::rotate or the like to permute the objects into the desired order after each insertion. Obviously impractical, but it proves the lack of capability is a little arbitrary.
Your best bet is to keep the subsequences in inner containers. You may use an iterator adaptor to iterate over such a "deep" container as if it were a single sequence. Such a utility can probably be found in Boost.
No.
In unordered maps too, there are not stored according to the order of insertion.
You can use vector to keep the track of the key!
Related
Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?
Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?
No, it is not. It depends on many factors. For example if you seldom add new elements but iterate over container quite often it would be preferable to use std::vector and maintain uniqueness manually. There also could be other factors affecting your decision. But normally yes you may prefer std::unordered_set as it simplifies your program.
Not entirely. unordered_sets are not required to be contiguous containers; in the case where you'd frequently want to read all numerous values contained in the set, you may prefer std::vector on time-critic application.
std::unordered_set:
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.
But in the general case, I'd say Yes.
I generally prefer vector or map. (or in your case, std::set).
Hash tables can be faster than maps/sets (red-black trees), but red-black trees have guaranteed performance 100% of the time. And logarithmic performance is REALLY fast! A hash table kan kill performance when it starts rehashing.
std::vector is the workhorse of the STL and should be your default choice. Vector is very straightforward, and is very cache-friendly
This article by Matt Austern is related to this topic and it is worth reading:
Why you shouldn't use set (and what you should use instead) by Matt Austern
This thread is trying to identify conditions under which unordered_set is preferable over vectors. Similarly, in the above article, the author clearly identifies four conditions, which all need to be satisfied in order to prefer set over a custom but simpler data structure called sorted_vector (last section: What is set good for?). It will be interesting to clearly state a set of conditions for preferring unordered_set over vector.
also, the last paragraph of the article summarizes a useful rule to keep in mind:
Every component in the standard C++ library is there because it's useful for some purpose, but sometimes that purpose is narrowly defined and rare. As a general rule you should always use the simplest data structure that meets your needs. The more complicated a data structure, the more likely that it's not as widely useful as it might seem.
Of course yes. If you do not want duplicates, you have to use a key-aware container, and since unordered_* totally win over their tree-based counterparts, this is pretty much your only choice.
I am wondering in which uses cases, we can be interested by having an ordered associative container.
In other terms, why use std::map and no std::unorderd_map
You use an ordered map when you need to be able to iterate over the keys in an ordered fashion.
As Mat pointed out, it depends on whether you need to access your elements in sorted order. Also, map has guaranteed, worst-case logarithmic access time (in the number of elements), whereas unordered map has average constant access time.
The main reason for using map instead of unordered_map is that map is in standard and for sure you have it. Some companies have policy not to use definitions from tr1.
But you ask when order is important - it is needed for operations like lower_bound, upper_bound, set_union, set_intersection etc.
If the comparison between unordered containers is needed, the complexity
might be an issue.
It seems to be difficult to implement operator==/operator!= between
unordered containers efficiently.
N3290 ยง23.2.5/11 says
the complexity of operator== ... is
proportional to ... N^2 in the worst
case, where N is a.size().
while other containers have linear complexity.
Unordered_map use more memory than map. If there is a constraint in memory then its recommended to use map. So the operations become slower when using unordered_map
EDIT
Quoting from the wiki article
It is similar to the map class in the C++ standard library but has different
constraints. As its name implies, unlike the map class, the elements of an
unordered_map are not ordered. This is due to the use of hashing to store objects.
unordered_map can still be iterated through like a regular map.
I've got a situation where I want to use an associative container, and I chose to use a std::unordered_map, because it's perfectly feasible that this container could be used to hold millions or more of elements. But now I also need to iterate in order. I considered having the value types link to each other in a list, but now I'm going to have issues with memory management.
Should I change container, say to a std::map? Or just iterate once through my unordered_map, insert into a vector, and sort, then iterate? It's pretty unlikely that I will need to iterate in an ordered fashion repeatedly.
Well, you know the O() of the various operations of the two alternatives you've picked. You should pick based on that and do a cost/benefit analysis based on where you need the performance to happen and which container does best for THAT.
Of course, I couldn't possibly know enough to do that analysis for you.
You could use Boost.MultiIndex, specifying the unordered (hashed) index as well as an ordered one, on the same underlying object collection.
Possible issues with this - there is no natural mapping from an existing associative container model, and it might be overkill if you don't need the second index all the time.
I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You have to use large datasets and do many lookups for O(log(n)) not being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). The only container that gives you that for the [] operator is vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
How about this one: all dates are stored in std::list<Date>, but you look it up with helper structure stdext::hash_map<Date, std::list<Date>::iterator>. Once you have iterator for the list access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.
I'm fairly new to the STL, so I was wondering whether there are any dynamically sortable containers? At the moment my current thinking is to use a vector in conjunction with the various sort algorithms, but I'm not sure whether there's a more appropriate selection given the (presumably) linear complexity of inserting entries into a sorted vector.
To clarify "dynamically", I am looking for a container that I can modify the sorting order at runtime - e.g. sort it in an ascending order, then later re-sort in a descending order.
You'll want to look at std::map
std::map<keyType, valueType>
The map is sorted based on the < operator provided for keyType.
Or
std::set<valueType>
Also sorted on the < operator of the template argument, but does not allow duplicate elements.
There's
std::multiset<valueType>
which does the same thing as std::set but allows identical elements.
I highly reccomend "The C++ Standard Library" by Josuttis for more information. It is the most comprehensive overview of the std library, very readable, and chock full of obscure and not-so-obscure information.
Also, as mentioned by 17 of 26, Effective Stl by Meyers is worth a read.
If you know you're going to be sorting on a single value ascending and descending, then set is your friend. Use a reverse iterator when you want to "sort" in the opposite direction.
If your objects are complex and you're going to be sorting in many different ways based on the member fields within the objects, then you're probably better off with using a vector and sort. Try to do your inserts all at once, and then call sort once. If that isn't feasible, then deque may be a better option than the vector for large collections of objects.
I think that if you're interested in that level of optimization, you had better be profiling your code using actual data. (Which is probably the best advice anyone here can give: it may not matter that you call sort after each insert if you're only doing it once in a blue moon.)
It sounds like you want a multi-index container. This allows you to create a container and tell that container the various ways you may want to traverse the items in it. The container then keeps multiple lists of the items, and those lists are updated on each insert/delete.
If you really want to re-sort the container, you can call the std::sort function on any std::deque, std::vector, or even a simple C-style array. That function takes an optional third argument to determine how to sort the contents.
The stl provides no such container. You can define your own, backed by either a set/multiset or a vector, but you are going to have to re-sort every time the sorting function changes by either calling sort (for a vector) or by creating a new collection (for set/multiset).
If you just want to change from increasing sort order to decreasing sort order, you can use the reverse iterator on your container by calling rbegin() and rend() instead of begin() and end(). Both vector and set/multiset are reversible containers, so this would work for either.
std::set is basically a sorted container.
You should definitely use a set/map. Like hazzen says, you get O(log n) insert/find. You won't get this with a sorted vector; you can get O(log n) find using binary search, but insertion is O(n) because inserting (or deleting) an item may cause all existing items in the vector to be shifted.
It's not that simple. In my experience insert/delete is used less often than find. Advantage of sorted vector is that it takes less memory and is more cache-friendly. If happen to have version that is compatible with STL maps (like the one I linked before) it's easy to switch back and forth and use optimal container for every situation.
in theory an associative container (set, multiset, map, multimap) should be your best solution.
In practice it depends by the average number of the elements you are putting in.
for less than 100 elements a vector is probably the best solution due to:
- avoiding continuous allocation-deallocation
- cache friendly due to data locality
these advantages probably will outperform nevertheless continuous sorting.
Obviously it also depends on how many insertion-deletation you have to do. Are you going to do per-frame insertion/deletation?
More generally: are you talking about a performance-critical application?
remember to not prematurely optimize...
The answer is as always it depends.
set and multiset are appropriate for keeping items sorted but are generally optimised for a balanced set of add, remove and fetch. If you have manly lookup operations then a sorted vector may be more appropriate and then use lower_bound to lookup the element.
Also your second requirement of resorting in a different order at runtime will actually mean that set and multiset are not appropriate because the predicate cannot be modified a run time.
I would therefore recommend a sorted vector. But remember to pass the same predicate to lower_bound that you passed to the previous sort as the results will be undefined and most likely wrong if you pass the wrong predicate.
Set and multiset use an underlying binary tree; you can define the <= operator for your own use. These containers keep themselves sorted, so may not be the best choice if you are switching sort parameters. Vectors and lists are probably best if you are going to be resorting quite a bit; in general list has it's own sort (usually a mergesort) and you can use the stl binary search algorithm on vectors. If inserts will dominate, list outperforms vector.
STL maps and sets are both sorted containers.
I second Doug T's book recommendation - the Josuttis STL book is the best I've ever seen as both a learning and reference book.
Effective STL is also an excellent book for learning the inner details of STL and what you should and shouldn't do.
For "STL compatible" sorted vector see A. Alexandrescu's AssocVector from Loki.