Efficient C++ associative container with vector key - c++

I've constructed a map which has a vector as its key: map<vector<KeyT>, T> which I'm trying to optimize now.
An experiment with manually nested maps map<vector<KeyT>, map<KeyT,T> > where the first key is the original vector minus the last element and the second key is the last element shows a reasonable speed-up.
Now I'm wondering whether there exists a semi-standard implementation (like boost or similar) of an associative container where vector keys are implemented as such a hierarchical structure of containers.
Ideally, this would create as many layers as there are elements in the key vector, while keeping a uniform syntax for vectors of different length.

Are you sure you need to optimise it? std::string is basically like a std::vector and we happily use std::string as an array key!
Have you profiled your code? std::map doesn't copy its key/value pairs unneccesarily -- what exactly are you afraid of?
Are your vector keys of a fixed-size? std::tuple might help in that case.
If not, it might help to partition your containers according to the length of the key, although the effectiveness of schemes such as this are highly domain-dependent.

My first hunch is that you want to improve map lookup time by reducing the volume of the key. This is what hash functions are for. C++ tr1 and Boost have hash_maps by the name of unordered_map
I'll try to devise a small sample in some time here

Related

A vector or multimap dilemma

I have a dilemma of whether to have a multimap <int key, int value> or maintain a vector containing a vector of all values corresponding to int key.
I'm interested in which performs faster when looking up the values for a certain int key.
If you want a multimap and not just a map, the alternative will probably be a vector< list<int> > or something like that (a multimap, being generally implemented as an RB tree that allows multiple equivalent keys, is somewhat akin to a map with a list element type).
In general, a vector lookup is faster: it's O(1) for the array vs O(log n) for the map (in both case I'm not counting the search into the list/vector/set/whatever is used for the "multi" part). But, to use the vector, you have to make it as big as the biggest int key you want to use; if your keys are sequential this is not a problem, but if your index is sparse the multimap can be a better choice.
On the other hand, if you don't need ordered traversal, unordered_multimap (which is actually a hash table) could be the best of both worlds: you get array-like O(1) lookup without having to keep an enormous empty array.
Forget which is "faster". You can profile it later, but don't obsess over this. Far more important is that one approach gives you sparse storage, and the other does not -- focus on this and decide which is the most appropriate for your problem.
I would say if your keys are sequential go with the vector, but if there are big holes in your keys then the map will be better (as you won't have to store "empty" records as in your vector), plus it will make it easier to count how many records you have etc.
Performance wise vectors are based on arrays so lookups are generally faster (as maps have to go through a few pieces of data to do a lookup).
I would recommend map<int, vector<int>>
Since once you have done the search in the map you have a vector with all the values.
Otherwise you solution will require a new search of each value
I guess you are doing premature optimization. It's not good because you should optimize only after everything is working with use of profilers. Don't waste time and use a specialized container for your needs.

Dynamic array width id?

I need some sort of dynamic array in C++ where each element have their own id represented by an int.
The datatype needs these functions:
int Insert() - return ID
Delete(int ID)
Get(ID) - return Element
What datatype should I use? I'we looked at Vector and List, but can't seem to find any sort of ID. Also I'we looked at map and hastable, these may be usefull. I'm however not sure what to chose.
I would probably use a vector and free id list to handle deletions, then the index is the id. This is really fast to insert and get and fairly easy to manage (the only trick is the free list for deleted items).
Otherwise you probably want to use a map and just keep track of the lowest unused id and assign it upon insertion.
A std::map could work for you, which allows to associate a key to a value. The key would be your ID, but you should provide it yourself when adding an element to the map.
An hash table is a sort of basic mechanism that can be used to implement an unordered map. It corresponds to std::unordered_map.
It seems that the best container to use is unordered_map.
It is based on hash. You can insert, delete or searche for elements in O(n).
Currently unordered_map is not in STL. If you want to use STL container use std::map.
It is based on tree. Inserts, deletes and searches for elements in O(n*log(n)).
Still the container choice depends much on the usage intensity. For example, if you will find for elements rare, vector and list could be ok. These containers do not have find method, but <algorithm> library include it.
A vector gives constant-time random access, the "id" can simply be the offset (index) into the vector. A deque is similar, but doesn't store all items contiguously.
Either of these would be appropriate, if the ID values can start at 0 (or a known offset from 0 and increment monotonically). Over time if there are a large amount of removals, either vector or deque can become sparsely populated, which may be detrimental.
std::map doesn't have the problem of becoming sparsely populated, but look ups move from constant time to logarithmic time, which could impact performance.
boost::unordered_map may be the best yet, as the best case scenario as a hash table will likely have the best overall performance characteristics given the question. However, usage of the boost library may be necessary -- but there are also unordered container types in std::tr1 if available in your STL implementation.

Associating and iterating in C++

I've got a situation where I want to use an associative container, and I chose to use a std::unordered_map, because it's perfectly feasible that this container could be used to hold millions or more of elements. But now I also need to iterate in order. I considered having the value types link to each other in a list, but now I'm going to have issues with memory management.
Should I change container, say to a std::map? Or just iterate once through my unordered_map, insert into a vector, and sort, then iterate? It's pretty unlikely that I will need to iterate in an ordered fashion repeatedly.
Well, you know the O() of the various operations of the two alternatives you've picked. You should pick based on that and do a cost/benefit analysis based on where you need the performance to happen and which container does best for THAT.
Of course, I couldn't possibly know enough to do that analysis for you.
You could use Boost.MultiIndex, specifying the unordered (hashed) index as well as an ordered one, on the same underlying object collection.
Possible issues with this - there is no natural mapping from an existing associative container model, and it might be overkill if you don't need the second index all the time.

Hash map, string compares, and std::map?

First off, I would like to make a few points I believe to be true. Please can these be verified?
A hash map stores strings by
converting them into an integer
somehow.
std::map is not a hash map, and if I'm using strings, I should consider using a hash map for memory issues?
String compares are not good to rely on.
If std::map is not a hash map and I should not be relying on string compares (basically, I have a map with strings as keys...I was told to look up using hash maps instead?), is there a hash map in the C++ STL? If not, how about Boost?
Secondly, Is a hash map worth it for [originally] an std::map< std::string, non-POD GameState >?
I think my point is getting across...I plan to have a store of different game states I can look up and register into a factory.
If any more info is needed, please ask.
Thanks for your time.
I don't believe most of your points are correct.
there is no hash map in the current standard. C++0x introduces unordered_map, who's implementation will be a hash table and your compiler probably already supports it.
std::map is implemented as a balanced tree, not a hash table. There are no "memory issues" when using either map type with strings, either as keys or data.
strings are not stored as numbers in either case - an unordered_map will use a hashing function to derive a numeric key from the string, but this is not stored.
my experience is that unordered_map is about twice the speed of map - they have basically the same interface, so you can try both with your own data - whenever you are interested in performance you should always perform tests your self with your own real data, rather than depending on the experience of others. Both map types will be somewhat sensitive to the length of the string key.
Assuming you have some class A, that you want to access via a string key, the maps would be declared as:
map <string, A> amap;
unordered_map <string, A> umap;
I made a benchmark that compares std::map with boost::unordered_map.
My conclusion was basically this: If you do not need map-specific things like equal_range, then always use boost::unordered_map.
The full benchmark can be found here
A hash map will typically have some integral representation of a string, yes.
std::map has a requirement to be sorted, so implementing it as a hash table is unlikely, and I've never seen it in practice.
Whether string comparisons are good or bad depends entirely on what you're doing, what data you're comparing, and how often. If the first letter differs then that's barely any different from an integer comparison, for example.
You want unordered_map (that's the Boost version - there is also a version in the TR1 standard library if your compiler has that).
Is it worth it for game states? Yes, but only because using an unordered_map is simple. You're prematurely worrying about optimisations at this stage. Save the worries over access patterns for things you're going to look up thousands of times a second (ie. when your profiler tells you that it's a problem).

c++ std::map question about iterator order

I am a C++ newbie trying to use a map so I can get constant time lookups for the find() method.
The problem is that when I use an iterator to go over the elements in the map, elements do not appear in the same order that they were placed in the map.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
Please let me know.
Thanks,
jbu
edit: thanks for letting me know map::find() isn't constant time.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
No, that is not possible. In order to get an efficient lookup, the container will need to order the contents in a way that makes the efficient lookup possible. For std::map, that will be some type of sorted order; for std::unordered_map, that will be an order based on the hash of the key.
In either case, the order will be different then the order in which they were added.
First of all, std::map guarantees O(log n) lookup time. You might be thinking about std::tr1::unordered_map. But that by definitions sacrifices any ordering to get the constant-time lookup.
You'd have to spend some time on it, but I think you can bash boost::multi_index_container to do what you want.
What about using a vector for the keys in the original order and a map for the fast access to the data?
Something like this:
vector<string> keys;
map<string, Data*> values;
// obtaining values
...
keys.push_back("key-01");
values["key-01"] = new Data(...);
keys.push_back("key-02");
values["key-02"] = new Data(...);
...
// iterating over values in original order
vector<string>::const_iterator it;
for (it = keys.begin(); it != keys.end(); it++) {
Data* value = values[*it];
}
I'm going to actually... go backward.
If you want to preserve the order in which elements were inserted, or in general to control the order, you need a sequence that you will control:
std::vector (yes there are others, but by default use this one)
You can use the std::find algorithm (from <algorithm>) to search for a particular value in the vector: std::find(vec.begin(), vec.end(), value);.
Oh yes, it has linear complexity O(N), but for small collections it should not matter.
Otherwise, you can start looking up at Boost.MultiIndex as already suggested, but for a beginner you'll probably struggle a bit.
So, shirk the complexity issue for the moment, and come up with something that work. You'll worry about speed when you are more familiar with the language.
Items are ordered by operator< (by default) when applied to the key.
PS. std::map does not gurantee constant time look up.
It gurantees max complexity of O(ln(n))
First off, std::map isn't constant-time lookup. It's O(log n). Just thought I should set that straight.
Anyway, you have to specify your own comparison function if you want to use a different ordering. There isn't a built-in comparison function that can order by insertion time, but, if your object holds a timestamp field, you can arrange to set the timestamp at the time of insertion, and using a by-timestamp comparison.
Map is not meant for placing elements in some order - use vector for that.
If you want to find something in map you should "search" by the key using [the operator
If you need both: iteration and search by key see this topic
Yes you can create such a data structure, but not using the standard library... the reason is that standard containers can be nested but cannot be mixed.
There is no problem implementing for example a map data structure where all the nodes are also in a doubly linked list in order of insertion, or for example a map where all nodes are in an array. It seems to me that one of these structures could be what you're looking for (depending which operation you prefer to be fast), but neither of them is trivial to build using standard containers because every standard container (vector, list, set, ...) wants to be the one and only way to access contained elements.
For example I found useful in many cases to have nodes that were at the same time in multiple doubly-linked lists, but you cannot do that using std::list.