When are lists algorithmically faster than maps?

I think this is a valid question because if you use a map with integers as keys, you get a structure similar to a list. You can read elements in order with a for loop:
    for i in 1, ..., map.length():
        if i in map:
            doSomething(map[i])
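For reference, the loop above might look like this in C++, assuming a std::map<int, std::string>. The names visitInKeyOrder and visitByProbing are hypothetical, and collecting values into a vector stands in for doSomething. Iterating the map directly visits every key in ascending order, whereas probing 1..size() does one O(log n) search per index and never tries keys larger than size() in a sparse map.

```cpp
#include <map>
#include <string>
#include <vector>

// Visits every entry directly; std::map keeps keys sorted, so values come out
// in ascending key order. The returned vector stands in for doSomething().
std::vector<std::string> visitInKeyOrder(const std::map<int, std::string>& m) {
    std::vector<std::string> visited;
    for (const auto& kv : m) {
        visited.push_back(kv.second);
    }
    return visited;
}

// Literal translation of the question's loop: probe each index 1..m.size().
// Each probe is an O(log n) search, and keys above m.size() are never tried.
std::vector<std::string> visitByProbing(const std::map<int, std::string>& m) {
    std::vector<std::string> visited;
    for (int i = 1; i <= static_cast<int>(m.size()); ++i) {
        auto it = m.find(i);
        if (it != m.end()) {
            visited.push_back(it->second);
        }
    }
    return visited;
}
```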
Besides, inserting into a list and reading an element from it is O(n), while inserting and reading in a map is O(1).
In what case are lists faster than maps?
If we are concerned about how fast the code runs, in what cases are the lists not strictly worse than maps? Should we always use maps in that case?
Wouldn't a list implemented with a map be a better list?

I think that, fundamentally, a list is a particular implementation of a hash map. For this reason, in theoretical complexity terms, maps can only be better than lists.

Related

Fast string search?

I have a vector of strings and have to check whether each element in the vector is present in a given list of 5000 words.
Besides the mundane method of two nested loops, is there any faster way to do this in C++?
You should put the list of strings into an std::set. It's a data structure optimized for searching: checking whether a given element is in the set takes O(log n) comparisons, which is much faster than iterating over all entries.
If you are already using C++11, you can also use std::unordered_set, which is typically even faster for lookup (O(1) on average) because it's implemented as a hash table.
If this is for school or university, be prepared to explain how these data structures manage to be faster. When your instructor asks why you used them, "some guys on the internet told me" is unlikely to earn you a sticker in the class book.
You could put the list of words in an std::unordered_set. Then, for each element in the vector, you just test whether it is in the unordered_set, which takes O(1) on average. That gives an expected complexity of O(n); it is only expected because hash collisions can degrade a single lookup to O(m) in the worst case, where m is the number of words in the set.
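A minimal sketch of this approach, assuming both the vector and the 5000-word list are std::vector<std::string> (foundInDictionary is a hypothetical name). Building the set costs O(m) once; replacing std::unordered_set with std::set gives the O(log m)-per-lookup variant from the first answer.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the elements of `words` that appear in `dictionary`, in the order
// they occur in `words`. Each lookup is O(1) on average.
std::vector<std::string> foundInDictionary(const std::vector<std::string>& words,
                                           const std::vector<std::string>& dictionary) {
    // Build the hash set once from the dictionary.
    const std::unordered_set<std::string> dict(dictionary.begin(), dictionary.end());
    std::vector<std::string> found;
    for (const auto& w : words) {
        if (dict.count(w) != 0) {
            found.push_back(w);
        }
    }
    return found;
}
```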
You could sort the vector; then, provided your dictionary is sorted too, you can solve this with a single merge-style "loop" over both ranges, which is O(n + m), not counting the cost of the sort.
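A sketch of the sort-then-merge idea, assuming both inputs are std::vector<std::string> (countMatchesSorted is a hypothetical name) and that a word repeated in the vector should be counted each time. After sorting, a single pass with two indices compares the smallest unconsumed element of each range.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many elements of `words` occur in `dictionary` with one
// merge-style pass. Inputs are taken by value and sorted here.
std::size_t countMatchesSorted(std::vector<std::string> words,
                               std::vector<std::string> dictionary) {
    std::sort(words.begin(), words.end());
    std::sort(dictionary.begin(), dictionary.end());
    std::size_t i = 0, j = 0, matches = 0;
    while (i < words.size() && j < dictionary.size()) {
        if (words[i] < dictionary[j]) {
            ++i;  // word is smaller: it cannot be in the dictionary
        } else if (dictionary[j] < words[i]) {
            ++j;  // dictionary entry is smaller: skip it
        } else {
            ++matches;  // equal: count it and advance only the word index,
            ++i;        // so duplicate words are each counted
        }
    }
    return matches;
}
```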
So you have a vector of strings, with each string having one or more words, and you have a vector that's a dictionary, and you're supposed to determine which words in the vector of strings are also in the dictionary? The vector of strings is an annoyance, since you need to look at each word. I'd start by creating a new vector, splitting each string into words, and pushing each word into the new vector. Then sort the new vector and run it through the std::unique algorithm to eliminate duplicates. Then sort the dictionary. Then run both ranges through std::set_intersection to write the result.
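A sketch of that pipeline, assuming words are separated by whitespace only (no punctuation handling) and that the result should be the sorted, de-duplicated set of matching words; wordsInDictionary is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits each string into words, removes duplicates, and returns the words
// that also appear in `dictionary`, in sorted order.
std::vector<std::string> wordsInDictionary(const std::vector<std::string>& lines,
                                           std::vector<std::string> dictionary) {
    std::vector<std::string> words;
    for (const auto& line : lines) {
        std::istringstream in(line);
        std::string word;
        while (in >> word) {  // whitespace split only
            words.push_back(word);
        }
    }
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    std::sort(dictionary.begin(), dictionary.end());
    std::vector<std::string> result;
    std::set_intersection(words.begin(), words.end(),
                          dictionary.begin(), dictionary.end(),
                          std::back_inserter(result));
    return result;
}
```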

A vector or multimap dilemma

I have a dilemma: should I use a multimap<int, int> (key to value), or maintain a vector of vectors where the inner vector at each index holds all values for that int key?
I'm interested in which performs faster when looking up the values for a certain int key.
If you want a multimap and not just a map, the alternative will probably be a vector< list<int> > or something like that (a multimap, being generally implemented as an RB tree that allows multiple equivalent keys, is somewhat akin to a map with a list element type).
In general, a vector lookup is faster: it's O(1) for the array vs. O(log n) for the map (in both cases I'm not counting the search into the list/vector/set/whatever is used for the "multi" part). But to use the vector, you have to make it as big as the biggest int key you want to use; if your keys are sequential this is not a problem, but if your keys are sparse the multimap can be a better choice.
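A sketch of the two lookups being compared, assuming non-negative int keys (lookupVector and lookupMultimap are hypothetical names). The vector version indexes directly; the multimap version uses equal_range to get all values stored under one key.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Option A: vector indexed by key. O(1) lookup, but the outer vector needs
// at least maxKey + 1 slots, including slots for keys with no values.
std::vector<int> lookupVector(const std::vector<std::vector<int>>& byKey, int key) {
    if (key < 0 || static_cast<std::size_t>(key) >= byKey.size()) {
        return {};
    }
    return byKey[static_cast<std::size_t>(key)];
}

// Option B: std::multimap. equal_range locates the key in O(log n); walking
// the returned range then costs one step per stored value.
std::vector<int> lookupMultimap(const std::multimap<int, int>& mm, int key) {
    std::vector<int> values;
    auto range = mm.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        values.push_back(it->second);
    }
    return values;
}
```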
On the other hand, if you don't need ordered traversal, unordered_multimap (which is actually a hash table) could be the best of both worlds: you get array-like O(1) average lookup without having to keep an enormous, mostly empty array.
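With std::unordered_multimap the lookup looks almost the same; here is a sketch assuming int keys and values (lookupHashed is a hypothetical name). Equivalent keys are stored adjacently, so equal_range still returns one contiguous range, but the standard does not guarantee any particular order of the values within it.

```cpp
#include <unordered_map>
#include <vector>

// All values stored under `key`. Finding the key is O(1) on average and
// needs no storage for absent keys, even if the keys are very sparse.
std::vector<int> lookupHashed(const std::unordered_multimap<int, int>& umm, int key) {
    std::vector<int> values;
    auto range = umm.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        values.push_back(it->second);
    }
    return values;
}
```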
Forget which is "faster". You can profile it later, but don't obsess over this. Far more important is that one approach gives you sparse storage, and the other does not -- focus on this and decide which is the most appropriate for your problem.
I would say if your keys are sequential go with the vector, but if there are big holes in your keys then the map will be better (as you won't have to store "empty" records as in your vector), plus it will make it easier to count how many records you have etc.
Performance-wise, vectors are backed by contiguous arrays, so lookups are generally faster (a map has to follow several tree nodes to reach a key).
I would recommend map<int, vector<int>>, since once you have done the search in the map you have a vector with all the values.
Otherwise your solution will require a new search for each value.
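A sketch of that layout, assuming int keys and values (ValuesByKey, addValue, and valuesFor are hypothetical names): one find returns the whole vector of values for a key.

```cpp
#include <map>
#include <vector>

// One map entry per key; the mapped vector holds every value for that key.
using ValuesByKey = std::map<int, std::vector<int>>;

void addValue(ValuesByKey& index, int key, int value) {
    index[key].push_back(value);  // operator[] creates an empty vector on first use
}

// A single O(log n) search yields all values at once (nullptr if the key is absent).
const std::vector<int>* valuesFor(const ValuesByKey& index, int key) {
    auto it = index.find(key);
    return it == index.end() ? nullptr : &it->second;
}
```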
I suspect you are doing premature optimization. Optimize only after everything is working, and use a profiler to find what actually needs it. Don't waste time; use the container that fits your needs.

When should I use unordered_map and not std::map

I'm wondering in which case I should use unordered_map instead of std::map.
Should I use unordered_map every time I don't care about the order of the elements in the map?
map
Usually implemented using red-black tree.
Elements are sorted.
Relatively small memory usage (doesn't need additional memory for the hash-table).
Relatively fast lookup: O(log N).
unordered_map
Usually implemented using hash-table.
Elements are not sorted.
Requires additional memory to keep the hash-table.
Fast lookup, O(1) on average, but the constant factor depends on the hash function, which could be relatively slow. Also keep in mind the birthday problem: collisions become likely long before the table is full.
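A small sketch of the ordering difference, assuming int keys and string values (keysOf is a hypothetical helper): a std::map always iterates in ascending key order, while a std::unordered_map iterates in whatever order its buckets happen to produce.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Keys of a std::map in iteration order: always ascending.
std::vector<int> keysOf(const std::map<int, std::string>& m) {
    std::vector<int> keys;
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}

// Keys of a std::unordered_map in iteration order: depends on the hash
// function and bucket count, so callers must not rely on any order.
std::vector<int> keysOf(const std::unordered_map<int, std::string>& m) {
    std::vector<int> keys;
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}
```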
Compare hash table (unordered_map) vs. binary tree (map), remember your CS classes, and adjust accordingly.
The hash map usually has O(1) lookups; the map has O(log N). It can make a real difference if you need many fast lookups.
The map keeps the order of the elements, which is also useful sometimes.
map lets you iterate over the elements in sorted order, but unordered_map does not.
So use std::map when you need to iterate across the items in the map in sorted order.
The reason you'd choose one over the other is performance. Otherwise they'd only have created std::map, since it does more for you :)
Use std::map when you need elements to automatically be sorted. Use std::unordered_map other times.
See the SGI STL Complexity Specifications rationale.
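unordered_map is O(1) on average but has quite high constant overhead for lookup, insertion, and deletion; map is O(log(n)). Pick the complexity that best suits your needs. In addition, not all keys can be placed into both kinds of map: map needs a strict weak ordering (operator< or a comparator), while unordered_map needs a hash function and an equality comparison.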
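To illustrate the different key requirements, here is a sketch with a hypothetical Point key type: std::map only needs an ordering (PointLess), while std::unordered_map needs a hash (PointHash) and an equality predicate (PointEqual). The hash combination is a simple illustration, not a tuned hash.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <unordered_map>

// Hypothetical key type used to illustrate the requirements.
struct Point {
    int x;
    int y;
};

// std::map needs a strict weak ordering.
struct PointLess {
    bool operator()(const Point& a, const Point& b) const {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// std::unordered_map needs a hash function...
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>{}(p.x) * 31u + std::hash<int>{}(p.y);
    }
};

// ...and an equality comparison.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

using OrderedPoints = std::map<Point, int, PointLess>;
using HashedPoints = std::unordered_map<Point, int, PointHash, PointEqual>;
```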

Simple and efficient container in C++ with characteristics of map and list containers

I'm looking for a C++ container that combines the benefits of the map and list containers.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily (UPDATE: in a sort order based on the key or value)
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You would need large datasets and many lookups before O(log(n)) stops being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic lookup time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) average case and O(N) worst case, so perhaps that is what you mean, but they are not sorted, and they only became part of the standard library in C++11 (as std::unordered_map).
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you tried such a thing with a list, but it is definitely not O(1). The containers that give you O(1) for the [] operator are the random-access ones such as vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
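Applied to the business-dates example from the question, a sketch might look like this, assuming dates are encoded as YYYYMMDD integers (a hypothetical encoding chosen because it sorts chronologically) and C++17 for std::optional. find answers the "map style" question, and upper_bound answers the "list style" one.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical encoding: dates as YYYYMMDD integers, mapped to a label.
using Calendar = std::map<int, std::string>;

// "Map style": is this exact date listed? O(log n).
bool isListed(const Calendar& cal, int date) {
    return cal.find(date) != cal.end();
}

// "List style": the first listed date strictly after `date`, in O(log n).
// Works whether or not `date` itself is listed.
std::optional<int> nextListed(const Calendar& cal, int date) {
    auto it = cal.upper_bound(date);
    if (it == cal.end()) return std::nullopt;
    return it->first;
}
```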
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
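A minimal sketch of that list-plus-map-of-iterators pattern as an LRU cache, assuming int keys and values, a fixed capacity, and C++17 (LruCache is a hypothetical name). It uses std::unordered_map for the index; std::map would work the same way with O(log n) lookups. splice moves the node to the back without invalidating the stored iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// The list holds (key, value) pairs from least to most recently used;
// the index maps each key to its list node.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<int> get(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // Mark as most recently used; the iterator in the index stays valid.
        items_.splice(items_.end(), items_, it->second);
        return it->second->second;
    }

    void put(int key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            items_.splice(items_.end(), items_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (items_.size() == capacity_) {
            index_.erase(items_.front().first);  // evict least recently used
            items_.pop_front();
        }
        items_.emplace_back(key, value);
        index_[key] = std::prev(items_.end());
    }

private:
    std::size_t capacity_;
    std::list<std::pair<int, int>> items_;
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> index_;
};
```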
How about this: all dates are stored in a std::list<Date>, but you look them up with a helper structure, stdext::hash_map<Date, std::list<Date>::iterator>. Once you have an iterator into the list, accessing the next element is simple. Depending on your STL implementation, it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
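A sketch of the same idea with standard containers, assuming a hypothetical Date that is just a YYYYMMDD int, a list that is already sorted, and C++17 for std::optional; std::unordered_map plays the role of stdext::hash_map. Note that this only finds the successor of a date that is itself in the list; for arbitrary dates, a std::map with upper_bound is simpler.

```cpp
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical Date stand-in: a YYYYMMDD integer.
using Date = int;

class DateIndex {
public:
    // `sortedDates` must already be in chronological order.
    explicit DateIndex(std::list<Date> sortedDates) : dates_(std::move(sortedDates)) {
        for (auto it = dates_.begin(); it != dates_.end(); ++it) {
            where_[*it] = it;  // hash index from each date to its list node
        }
    }

    // O(1) average lookup of a listed date, then one step to its successor.
    std::optional<Date> nextAfter(Date d) const {
        auto found = where_.find(d);
        if (found == where_.end()) return std::nullopt;  // d is not listed
        auto next = std::next(found->second);
        if (next == dates_.end()) return std::nullopt;   // d is the last date
        return *next;
    }

private:
    std::list<Date> dates_;
    std::unordered_map<Date, std::list<Date>::const_iterator> where_;
};
```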
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.

std::list or std::multimap

Hey, I currently have a std::list of a struct that I made, and I sort this list with the std::list sort method every time I add a new object.
I want to know which would be faster for this, std::multimap or std::list,
since I'm iterating over the whole list every frame (I am making a game).
I'd like to hear your opinion on what I should use in this case.
std::multimap will probably be faster, as it is O(log n) per insertion, whereas an insert and sort of the list is O(n log n).
Depending on your usage pattern, you might be better off with sorted vectors. If you insert a whole bunch of items at once and then do a bunch of reads -- i.e. reads and writes aren't interleaved -- then you'll have better performance with vector, std::sort, and std::binary_search.
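A sketch of that batch pattern, assuming int elements for brevity (buildSorted and containsSorted are hypothetical helper names): sort once after all writes, then answer each read with std::binary_search in O(log n).

```cpp
#include <algorithm>
#include <vector>

// Write phase: append everything, then sort once (O(n log n) total).
std::vector<int> buildSorted(std::vector<int> items) {
    std::sort(items.begin(), items.end());
    return items;
}

// Read phase: O(log n) membership test on sorted, contiguous storage.
bool containsSorted(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}
```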
You might consider using the lower_bound algorithm to find where to insert into your list. http://stdcxx.apache.org/doc/stdlibref/lower-bound.html
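A minimal sketch of the lower_bound approach, assuming the list holds ints rather than the asker's struct (insertSorted is a hypothetical name). Because list iterators are not random access, lower_bound does O(log n) comparisons but still walks O(n) nodes, so each insert is linear overall, though it avoids re-sorting.

```cpp
#include <algorithm>
#include <list>

// Inserts `value` into an already sorted list, keeping it sorted.
// The search walks the list; the insertion itself is O(1).
void insertSorted(std::list<int>& sorted, int value) {
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    sorted.insert(pos, value);
}
```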
Edit: In light of Neil's comment, note that this will work with any sequence container (vector, deque, etc.)
If you do not need Key/Value pairs std::set or std::multiset is probably better than using std::multimap.
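A sketch of the multiset alternative, with a hypothetical Sprite struct ordered by a depth field: every insert keeps the container sorted in O(log n), equal depths are kept in insertion order, and the per-frame loop just iterates from begin() to end().

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical game object, drawn in order of depth.
struct Sprite {
    int depth;
    std::string name;
};

struct ByDepth {
    bool operator()(const Sprite& a, const Sprite& b) const { return a.depth < b.depth; }
};

// Each insert is O(log n) and keeps the container sorted.
using DrawList = std::multiset<Sprite, ByDepth>;

// Per-frame traversal in sorted order.
std::vector<std::string> drawOrder(const DrawList& sprites) {
    std::vector<std::string> names;
    for (const auto& s : sprites) names.push_back(s.name);
    return names;
}
```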
Reference for std::set:
http://www.cplusplus.com/reference/stl/set/
Reference for std::multiset:
http://www.cplusplus.com/reference/stl/multiset/
Edit: (seems like it was unclear before)
It is in general better to use a container like std::(multi)set or std::(multi)map than to use std::list and sort it every time an element is inserted, because std::list does not perform well at inserting elements in sorted position: finding the spot takes a linear walk, even though the insertion itself is O(1).
Generally speaking, iterating over a container is likely to take about as much time as iterating over another, so if you keep adding to a container and then iterating over it, it's mainly a question of picking a container that avoids constantly having to reallocate memory and inserts the way you want quickly.
Both list and multimap will avoid having to reallocate themselves simply from adding an element (like you could get with a vector), so it's primarily a question of how long it takes to insert. Adding to the end of a list will be O(1) while adding to a multimap will be O(log n). However, the multimap will insert the elements in sorted order, while if you want to have the list be sorted, you're going to have to either sort the list in O(n log n) or insert the element in a sorted manner with something like lower_bound which would be O(n). In either case, it will be far worse (in the worst case at least) to use the list.
Generally, if you're maintaining a container in sorted order and continually adding to it rather than creating it and sorting it once, sets and maps are more efficient since they're designed to be sorted. Of course, as always, if you really care about performance, profiling your specific application and seeing which works better is what you need to do. However, in this case, I'd say it's almost guaranteed that multimap will be faster (especially if you have many elements).