Simple and efficient container in C++ with characteristics of map and list containers - c++

I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".

std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You have to use large datasets and do many lookups for O(log(n)) not being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.

If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.

Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.

According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). The only container that gives you that for the [] operator is vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.

having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.

You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.

How about this one: all dates are stored in std::list<Date>, but you look it up with helper structure stdext::hash_map<Date, std::list<Date>::iterator>. Once you have iterator for the list access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.

You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.

Related

Difference between a list and multiset in C++

In C++:
A list is a collection that can contain non-unique values in sequence
A multiset is a collection that can contain non-unique elements in sequence
What then is the specific difference between the two? Why would I use on over the other?
I've tried finding this information online but most references (e.g. cplusplus.com) talk about the two containers in different ways, such that the difference is not apparent.
From multiset:
std::multiset is an associative container that contains a sorted
set of objects
Search, insertion, and removal operations have logarithmic complexity.
From list:
std::list is a container that supports constant time insertion and removal of elements from anywhere in the container
Fast random access is not supported
Thus, if you want to have a faster search, use multiset.
For faster insertion and removal: use list.
The biggest difference is std::list is a linked list while std::multiset is a tree structure (typically an RB Tree). This means element access in a std::list has O(N) access while a std::multiset has O(logN).
This also means iterating a std::multiset from begin() to end() will give you sorted data while iterating a std::list will give you the order the data was inserted.
Its not that straightforward, but to be short:
If you need to query search by value many times, opt for std::multiset.
Otherwise std::list.
There are quite a few containers in standard library, they have different properties and implication on timing for access, insert and remove elements. You should choose one that is most applicable in particular case from whole group. Your definition of std::multiset and std::list is artificial and does not make any sense. If chair and horse both have legs it does not mean they are similar.

iterate ordered versus unordered containers

I want to know which data-structures are more efficient for iterating through their elements between std::set, std::map and std::unordered_set, std::unordered_map.
I searched through SO and I found this question. The answers either propose to copy the elements in a std::vector or to use Boost.Container, which IMHO don't answer my question.
My purpose is to keep in a container a big number of unique elements, that most of the time I want to iterate through them. Insertions and extractions are more rare. I want to avoid std::vector in combination with std::unique.
Lets consider set vs unordered_set.
The main difference here is the 'nature' of the iteration, that is the traversal of the set will give you the elements in order while traversing a range in an unordered set will give you a bunch of values in no particular order.
Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time that's needed to find elements it1 and it2 there can be no direct mapping from one case to another since the elements in between are not guarrandeed to be the same even if you've used the same elements to construct the container.
There are cases however where something like this has meaning when e.g. you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider implementation mechanics :
Sets are usually implemented like Red–black trees (a form of binary search trees). Like all binary search trees allow efficient in-order traversal (LRR: left root right) of their elements. That is to traverse you pay the cost of pointer chasing (just like traversing a list).
Unordered sets on the other hand are hash tables and to my knowledge the STL implementation uses hashing with chaining. That means (in a very very high level) that what's used for the structure is a (contiguous) buffer where each element is the head of a chain (list) that contains the elements. The way the elements are layed out across those chains (buckets) and across the buffer will affect the traversal time, however you'll be chasing pointers once again jumping through differents lists as well this time. I don't think it'll vary significantly from the tree case but won't be any better for sure.
In any case micro tuning and benchmarking will give you the answer for your particular application.
The difference does not lie between the ordering or lack of one but in the backing container. If it's a contiguous memory it should be fast to iterate over, due to simple implementation of iterator and cache friendliness.
Unordered containers are usually stored as a vector of vectors (or a similar thing), while ordered containers are implemented using trees, but it is left for implementation after all. This would suggest that iterating over unordered version should be waster. However this is left for implementation after all, and I saw implementations (which bent rules a little to be fair) with different behaviour.
Generally speaking, container performance is quite a complex topic and usually has to be tested in actual application to get reliable answer. There is plenty on implemention-defined stuff that might affect the performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out a good option.
EDIT: As #TonyD said in it's comment, there is a rule, that disallows invalidating iterators during addition of element when the max_load_factor() is not exceeded, this practically rules out backing containers which are contiguous in memory.
Thus, copying everything into a vector seems like even more reasonable option. If you need to remove duplicates, a feasible option might be to use http://en.cppreference.com/w/cpp/algorithm/sort and have dupes easily ignored. I have heard that using vector and sort to have a sorted array (or vector) is quite often a used option in case of need for a container that needs to be sorter and is being iterated over more often than modified.
iterate from fastest to slowest should be : set > map > unordered_set > unordered_map;
set is a little lighter than map, and they are ordered with binary tree rule, so should be faster than unordered_ containers.

what is the uses cases in which order in key value associative container is useful

I am wondering in which uses cases, we can be interested by having an ordered associative container.
In other terms, why use std::map and no std::unorderd_map
You use an ordered map when you need to be able to iterate over the keys in an ordered fashion.
As Mat pointed out, it depends on whether you need to access your elements in sorted order. Also, map has guaranteed, worst-case logarithmic access time (in the number of elements), whereas unordered map has average constant access time.
The main reason for using map instead of unordered_map is that map is in standard and for sure you have it. Some companies have policy not to use definitions from tr1.
But you ask when order is important - it is needed for operations like lower_bound, upper_bound, set_union, set_intersection etc.
If the comparison between unordered containers is needed, the complexity
might be an issue.
It seems to be difficult to implement operator==/operator!= between
unordered containers efficiently.
N3290 §23.2.5/11 says
the complexity of operator== ... is
proportional to ... N^2 in the worst
case, where N is a.size().
while other containers have linear complexity.
Unordered_map use more memory than map. If there is a constraint in memory then its recommended to use map. So the operations become slower when using unordered_map
EDIT
Quoting from the wiki article
It is similar to the map class in the C++ standard library but has different
constraints. As its name implies, unlike the map class, the elements of an
unordered_map are not ordered. This is due to the use of hashing to store objects.
unordered_map can still be iterated through like a regular map.

c++ std::map question about iterator order

I am a C++ newbie trying to use a map so I can get constant time lookups for the find() method.
The problem is that when I use an iterator to go over the elements in the map, elements do not appear in the same order that they were placed in the map.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
Please let me know.
Thanks,
jbu
edit: thanks for letting me know map::find() isn't constant time.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
No, that is not possible. In order to get an efficient lookup, the container will need to order the contents in a way that makes the efficient lookup possible. For std::map, that will be some type of sorted order; for std::unordered_map, that will be an order based on the hash of the key.
In either case, the order will be different then the order in which they were added.
First of all, std::map guarantees O(log n) lookup time. You might be thinking about std::tr1::unordered_map. But that by definitions sacrifices any ordering to get the constant-time lookup.
You'd have to spend some time on it, but I think you can bash boost::multi_index_container to do what you want.
What about using a vector for the keys in the original order and a map for the fast access to the data?
Something like this:
vector<string> keys;
map<string, Data*> values;
// obtaining values
...
keys.push_back("key-01");
values["key-01"] = new Data(...);
keys.push_back("key-02");
values["key-02"] = new Data(...);
...
// iterating over values in original order
vector<string>::const_iterator it;
for (it = keys.begin(); it != keys.end(); it++) {
Data* value = values[*it];
}
I'm going to actually... go backward.
If you want to preserve the order in which elements were inserted, or in general to control the order, you need a sequence that you will control:
std::vector (yes there are others, but by default use this one)
You can use the std::find algorithm (from <algorithm>) to search for a particular value in the vector: std::find(vec.begin(), vec.end(), value);.
Oh yes, it has linear complexity O(N), but for small collections it should not matter.
Otherwise, you can start looking up at Boost.MultiIndex as already suggested, but for a beginner you'll probably struggle a bit.
So, shirk the complexity issue for the moment, and come up with something that work. You'll worry about speed when you are more familiar with the language.
Items are ordered by operator< (by default) when applied to the key.
PS. std::map does not gurantee constant time look up.
It gurantees max complexity of O(ln(n))
First off, std::map isn't constant-time lookup. It's O(log n). Just thought I should set that straight.
Anyway, you have to specify your own comparison function if you want to use a different ordering. There isn't a built-in comparison function that can order by insertion time, but, if your object holds a timestamp field, you can arrange to set the timestamp at the time of insertion, and using a by-timestamp comparison.
Map is not meant for placing elements in some order - use vector for that.
If you want to find something in map you should "search" by the key using [the operator
If you need both: iteration and search by key see this topic
Yes you can create such a data structure, but not using the standard library... the reason is that standard containers can be nested but cannot be mixed.
There is no problem implementing for example a map data structure where all the nodes are also in a doubly linked list in order of insertion, or for example a map where all nodes are in an array. It seems to me that one of these structures could be what you're looking for (depending which operation you prefer to be fast), but neither of them is trivial to build using standard containers because every standard container (vector, list, set, ...) wants to be the one and only way to access contained elements.
For example I found useful in many cases to have nodes that were at the same time in multiple doubly-linked lists, but you cannot do that using std::list.

Dynamically sorted STL containers

I'm fairly new to the STL, so I was wondering whether there are any dynamically sortable containers? At the moment my current thinking is to use a vector in conjunction with the various sort algorithms, but I'm not sure whether there's a more appropriate selection given the (presumably) linear complexity of inserting entries into a sorted vector.
To clarify "dynamically", I am looking for a container that I can modify the sorting order at runtime - e.g. sort it in an ascending order, then later re-sort in a descending order.
You'll want to look at std::map
std::map<keyType, valueType>
The map is sorted based on the < operator provided for keyType.
Or
std::set<valueType>
Also sorted on the < operator of the template argument, but does not allow duplicate elements.
There's
std::multiset<valueType>
which does the same thing as std::set but allows identical elements.
I highly reccomend "The C++ Standard Library" by Josuttis for more information. It is the most comprehensive overview of the std library, very readable, and chock full of obscure and not-so-obscure information.
Also, as mentioned by 17 of 26, Effective Stl by Meyers is worth a read.
If you know you're going to be sorting on a single value ascending and descending, then set is your friend. Use a reverse iterator when you want to "sort" in the opposite direction.
If your objects are complex and you're going to be sorting in many different ways based on the member fields within the objects, then you're probably better off with using a vector and sort. Try to do your inserts all at once, and then call sort once. If that isn't feasible, then deque may be a better option than the vector for large collections of objects.
I think that if you're interested in that level of optimization, you had better be profiling your code using actual data. (Which is probably the best advice anyone here can give: it may not matter that you call sort after each insert if you're only doing it once in a blue moon.)
It sounds like you want a multi-index container. This allows you to create a container and tell that container the various ways you may want to traverse the items in it. The container then keeps multiple lists of the items, and those lists are updated on each insert/delete.
If you really want to re-sort the container, you can call the std::sort function on any std::deque, std::vector, or even a simple C-style array. That function takes an optional third argument to determine how to sort the contents.
The stl provides no such container. You can define your own, backed by either a set/multiset or a vector, but you are going to have to re-sort every time the sorting function changes by either calling sort (for a vector) or by creating a new collection (for set/multiset).
If you just want to change from increasing sort order to decreasing sort order, you can use the reverse iterator on your container by calling rbegin() and rend() instead of begin() and end(). Both vector and set/multiset are reversible containers, so this would work for either.
std::set is basically a sorted container.
You should definitely use a set/map. Like hazzen says, you get O(log n) insert/find. You won't get this with a sorted vector; you can get O(log n) find using binary search, but insertion is O(n) because inserting (or deleting) an item may cause all existing items in the vector to be shifted.
It's not that simple. In my experience insert/delete is used less often than find. Advantage of sorted vector is that it takes less memory and is more cache-friendly. If happen to have version that is compatible with STL maps (like the one I linked before) it's easy to switch back and forth and use optimal container for every situation.
in theory an associative container (set, multiset, map, multimap) should be your best solution.
In practice it depends by the average number of the elements you are putting in.
for less than 100 elements a vector is probably the best solution due to:
- avoiding continuous allocation-deallocation
- cache friendly due to data locality
these advantages probably will outperform nevertheless continuous sorting.
Obviously it also depends on how many insertion-deletation you have to do. Are you going to do per-frame insertion/deletation?
More generally: are you talking about a performance-critical application?
remember to not prematurely optimize...
The answer is as always it depends.
set and multiset are appropriate for keeping items sorted but are generally optimised for a balanced set of add, remove and fetch. If you have manly lookup operations then a sorted vector may be more appropriate and then use lower_bound to lookup the element.
Also your second requirement of resorting in a different order at runtime will actually mean that set and multiset are not appropriate because the predicate cannot be modified a run time.
I would therefore recommend a sorted vector. But remember to pass the same predicate to lower_bound that you passed to the previous sort as the results will be undefined and most likely wrong if you pass the wrong predicate.
Set and multiset use an underlying binary tree; you can define the <= operator for your own use. These containers keep themselves sorted, so may not be the best choice if you are switching sort parameters. Vectors and lists are probably best if you are going to be resorting quite a bit; in general list has it's own sort (usually a mergesort) and you can use the stl binary search algorithm on vectors. If inserts will dominate, list outperforms vector.
STL maps and sets are both sorted containers.
I second Doug T's book recommendation - the Josuttis STL book is the best I've ever seen as both a learning and reference book.
Effective STL is also an excellent book for learning the inner details of STL and what you should and shouldn't do.
For "STL compatible" sorted vector see A. Alexandrescu's AssocVector from Loki.