Which STL structure to use? - c++

I've need a solution, which will store a non unique key - values pairs. I don't want to repeat keys (space efficiency) and I want to focus on lookup speed (efficiency of inserting a new data is less important). I will use std::multimap here. But I will have to lookup for keys, which meets some range criteria.
The most complex example:
Key is a string, values are not important.
I'd like to find out all values, which keys starts with "Lol".
Or I'd like to find out all values, which keys "are between" "bar" and "foo".
Can I do it with a multimap? My second thought is to use sorted vector, which will point to vectors of values. Something like that:
std::vector<std::string, std::vector<T>> sorted_vec;
Then I can easily meet search criteria. But I really care about a performance of lookups. Is it a right approach?

Yes, you can use std::multimap.
And to complete your range bases query, you can use the two algorithms std::lower_bound and std::upper_bound in the header file algorithm.

Or I'd like to find out all values, which keys "are between" "bar" and "foo".
If you have Boost, I would recommend to use boost::flat_map (it just header-only library) - it maintains sorted vector inside. Generally, it has better lookup than std::map due to better compactness/locality.
Otherwise use
vector<pair<string,value>>
or
vector<pair<string,vector<value>>> //(for re-use of keys)
// instead of pair, you may use just struct if you concerned with lexicographical compare
plus std::sort
plus std::equal_range/lower_bound

You should use a Patricia trie.
There's one in libstdc++, though it's nonstandard.
If you're going to stick with standard containers, I think your sorted vector solution is best.

Related

Underlying storage and functionality of unordered_maps vs unordered_multimaps in C++?

I'm having a hard time wrapping my head around unordered_maps and unordered_multimaps because my test code isn't producing what I've been told to expect.
std::unordered_map<string, int> names;
names.insert(std::make_pair("Peter", 4));
names.insert(std::make_pair("George", 4));
names.insert(std::make_pair("George", 4));
When I iterate through this list, I get one instance of George first, then Peter.
1) It's my understanding unordered_maps do not allow multiple keys to map to one value, and that multimaps due. Is this true?
2) Why can Peter and George coexist at a value of 4? What is happening to the second George? And for that matter, why is George appearing first when I iterate from begin() to end() if this is unordered?
3) What is the underlying representation of an unordered map vs. unordered multimap?
4) Is there a way to insert keys into either map without providing a value? E.g. have the compiler create its own hash function that I don't need to worry about when I retrieve keys and look for collisions?
I'll make it short:
No. Multi... refers to keys. A (non-multi)map can't have multiple equivalent keys with differeny values, ie. per key there is at most one value. A multi map can. The same holds for the unordered versions.
Peter != George, which is why they have different key and may very well have the same value.
A hashmap.
Use sets.
In your example the second insertion for George using a (non-multi) is skipped as the same key was previously inserted.
You want to use unordered_multimap to have several keys that are the same.
Since this is unordered you can't really hope to have any particular order, because it depends on the hash function.
If you want order in which you insert things, you need to use std::vector. Even normal maps, which are supposed to be ordered imply the comparison order, and not the order in which you insert things, for example string "AB" comes before "BB", because "A" is less than "B".
To insert without providing a value you need a set, and not a map.
The underlying structure of "unordered_" things is hashtable.

A vector or multimap dilemma

I have a dilemma of whether to have a multimap <int key, int value> or maintain a vector containing a vector of all values corresponding to int key.
I'm interested in which performs faster when looking up the values for a certain int key.
If you want a multimap and not just a map, the alternative will probably be a vector< list<int> > or something like that (a multimap, being generally implemented as an RB tree that allows multiple equivalent keys, is somewhat akin to a map with a list element type).
In general, a vector lookup is faster: it's O(1) for the array vs O(log n) for the map (in both case I'm not counting the search into the list/vector/set/whatever is used for the "multi" part). But, to use the vector, you have to make it as big as the biggest int key you want to use; if your keys are sequential this is not a problem, but if your index is sparse the multimap can be a better choice.
On the other hand, if you don't need ordered traversal, unordered_multimap (which is actually a hash table) could be the best of both worlds: you get array-like O(1) lookup without having to keep an enormous empty array.
Forget which is "faster". You can profile it later, but don't obsess over this. Far more important is that one approach gives you sparse storage, and the other does not -- focus on this and decide which is the most appropriate for your problem.
I would say if your keys are sequential go with the vector, but if there are big holes in your keys then the map will be better (as you won't have to store "empty" records as in your vector), plus it will make it easier to count how many records you have etc.
Performance wise vectors are based on arrays so lookups are generally faster (as maps have to go through a few pieces of data to do a lookup).
I would recommend map<int, vector<int>>
Since once you have done the search in the map you have a vector with all the values.
Otherwise you solution will require a new search of each value
I guess you are doing premature optimization. It's not good because you should optimize only after everything is working with use of profilers. Don't waste time and use a specialized container for your needs.

Choosing between std::map and std::unordered_map [duplicate]

This question already has answers here:
Is there any advantage of using map over unordered_map in case of trivial keys?
(15 answers)
Closed 4 years ago.
Now that std has a real hash map in unordered_map, why (or when) would I still want to use the good old map over unordered_map on systems where it actually exists? Are there any obvious situations that I cannot immediately see?
As already mentioned, map allows to iterate over the elements in a sorted way, but unordered_map does not. This is very important in many situations, for example displaying a collection (e.g. address book). This also manifests in other indirect ways like: (1) Start iterating from the iterator returned by find(), or (2) existence of member functions like lower_bound().
Also, I think there is some difference in the worst case search complexity.
For map, it is O( lg N )
For unordered_map, it is O( N ) [This may happen when the hash function is not good leading to too many hash collisions.]
The same is applicable for worst case deletion complexity.
In addition to the answers above you should also note that just because unordered_map is constant speed (O(1)) doesn't mean that it's faster than map (of order log(N)). The constant may be bigger than log(N) especially since N is limited by 232 (or 264).
So in addition to the other answers (map maintains order and hash functions may be difficult) it may be that map is more performant.
For example in a program I ran for a blog post I saw that for VS10 std::unordered_map was slower than std::map (although boost::unordered_map was faster than both).
Note 3rd through 5th bars.
This is due to Google's Chandler Carruth in his CppCon 2014 lecture
std::map is (considered by many to be) not useful for performance-oriented work: If you want O(1)-amortized access, use a proper associative array (or for lack of one, std::unorderded_map); if you want sorted sequential access, use something based on a vector.
Also, std::map is a balanced tree; and you have to traverse it, or re-balance it, incredibly often. These are cache-killer and cache-apocalypse operations respectively... so just say NO to std::map.
You might be interested in this SO question on efficient hash map implementations.
(PS - std::unordered_map is cache-unfriendly because it uses linked lists as buckets.)
I think it's obvious that you'd use the std::map you need to iterate across items in the map in sorted order.
You might also use it when you'd prefer to write a comparison operator (which is intuitive) instead of a hash function (which is generally very unintuitive).
Say you have very large keys, perhaps large strings. To create a hash value for a large string you need to go through the whole string from beginning to end. It will take at least linear time to the length of the key. However, when you only search a binary tree using the > operator of the key each string comparison can return when the first mismatch is found. This is typically very early for large strings.
This reasoning can be applied to the find function of std::unordered_map and std::map. If the nature of the key is such that it takes longer to produce a hash (in the case of std::unordered_map) than it takes to find the location of an element using binary search (in the case of std::map), it should be faster to lookup a key in the std::map. It's quite easy to think of scenarios where this would be the case, but they would be quite rare in practice i believe.

Hash map, string compares, and std::map?

First off, I would like to make a few points I believe to be true. Please can these be verified?
A hash map stores strings by
converting them into an integer
somehow.
std::map is not a hash map, and if I'm using strings, I should consider using a hash map for memory issues?
String compares are not good to rely on.
If std::map is not a hash map and I should not be relying on string compares (basically, I have a map with strings as keys...I was told to look up using hash maps instead?), is there a hash map in the C++ STL? If not, how about Boost?
Secondly, Is a hash map worth it for [originally] an std::map< std::string, non-POD GameState >?
I think my point is getting across...I plan to have a store of different game states I can look up and register into a factory.
If any more info is needed, please ask.
Thanks for your time.
I don't believe most of your points are correct.
there is no hash map in the current standard. C++0x introduces unordered_map, who's implementation will be a hash table and your compiler probably already supports it.
std::map is implemented as a balanced tree, not a hash table. There are no "memory issues" when using either map type with strings, either as keys or data.
strings are not stored as numbers in either case - an unordered_map will use a hashing function to derive a numeric key from the string, but this is not stored.
my experience is that unordered_map is about twice the speed of map - they have basically the same interface, so you can try both with your own data - whenever you are interested in performance you should always perform tests your self with your own real data, rather than depending on the experience of others. Both map types will be somewhat sensitive to the length of the string key.
Assuming you have some class A, that you want to access via a string key, the maps would be declared as:
map <string, A> amap;
unordered_map <string, A> umap;
I made a benchmark that compares std::map with boost::unordered_map.
My conclusion was basically this: If you do not need map-specific things like equal_range, then always use boost::unordered_map.
The full benchmark can be found here
A hash map will typically have some integral representation of a string, yes.
std::map has a requirement to be sorted, so implementing it as a hash table is unlikely, and I've never seen it in practice.
Whether string comparisons are good or bad depends entirely on what you're doing, what data you're comparing, and how often. If the first letter differs then that's barely any different from an integer comparison, for example.
You want unordered_map (that's the Boost version - there is also a version in the TR1 standard library if your compiler has that).
Is it worth it for game states? Yes, but only because using an unordered_map is simple. You're prematurely worrying about optimisations at this stage. Save the worries over access patterns for things you're going to look up thousands of times a second (ie. when your profiler tells you that it's a problem).

c++ std::map question about iterator order

I am a C++ newbie trying to use a map so I can get constant time lookups for the find() method.
The problem is that when I use an iterator to go over the elements in the map, elements do not appear in the same order that they were placed in the map.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
Please let me know.
Thanks,
jbu
edit: thanks for letting me know map::find() isn't constant time.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
No, that is not possible. In order to get an efficient lookup, the container will need to order the contents in a way that makes the efficient lookup possible. For std::map, that will be some type of sorted order; for std::unordered_map, that will be an order based on the hash of the key.
In either case, the order will be different then the order in which they were added.
First of all, std::map guarantees O(log n) lookup time. You might be thinking about std::tr1::unordered_map. But that by definitions sacrifices any ordering to get the constant-time lookup.
You'd have to spend some time on it, but I think you can bash boost::multi_index_container to do what you want.
What about using a vector for the keys in the original order and a map for the fast access to the data?
Something like this:
vector<string> keys;
map<string, Data*> values;
// obtaining values
...
keys.push_back("key-01");
values["key-01"] = new Data(...);
keys.push_back("key-02");
values["key-02"] = new Data(...);
...
// iterating over values in original order
vector<string>::const_iterator it;
for (it = keys.begin(); it != keys.end(); it++) {
Data* value = values[*it];
}
I'm going to actually... go backward.
If you want to preserve the order in which elements were inserted, or in general to control the order, you need a sequence that you will control:
std::vector (yes there are others, but by default use this one)
You can use the std::find algorithm (from <algorithm>) to search for a particular value in the vector: std::find(vec.begin(), vec.end(), value);.
Oh yes, it has linear complexity O(N), but for small collections it should not matter.
Otherwise, you can start looking up at Boost.MultiIndex as already suggested, but for a beginner you'll probably struggle a bit.
So, shirk the complexity issue for the moment, and come up with something that work. You'll worry about speed when you are more familiar with the language.
Items are ordered by operator< (by default) when applied to the key.
PS. std::map does not gurantee constant time look up.
It gurantees max complexity of O(ln(n))
First off, std::map isn't constant-time lookup. It's O(log n). Just thought I should set that straight.
Anyway, you have to specify your own comparison function if you want to use a different ordering. There isn't a built-in comparison function that can order by insertion time, but, if your object holds a timestamp field, you can arrange to set the timestamp at the time of insertion, and using a by-timestamp comparison.
Map is not meant for placing elements in some order - use vector for that.
If you want to find something in map you should "search" by the key using [the operator
If you need both: iteration and search by key see this topic
Yes you can create such a data structure, but not using the standard library... the reason is that standard containers can be nested but cannot be mixed.
There is no problem implementing for example a map data structure where all the nodes are also in a doubly linked list in order of insertion, or for example a map where all nodes are in an array. It seems to me that one of these structures could be what you're looking for (depending which operation you prefer to be fast), but neither of them is trivial to build using standard containers because every standard container (vector, list, set, ...) wants to be the one and only way to access contained elements.
For example I found useful in many cases to have nodes that were at the same time in multiple doubly-linked lists, but you cannot do that using std::list.