A vector or multimap dilemma - C++

I have a dilemma: should I use a multimap<int, int>, or maintain a vector of vectors, where the outer index is the int key and the inner vector holds all values for that key?
I'm interested in which performs faster when looking up the values for a certain int key.

If you want multimap-like behaviour and not just a map, the vector alternative will probably be a vector< list<int> > or something like that (a multimap, generally implemented as a red-black tree that allows multiple equivalent keys, is somewhat akin to a map whose mapped type is a list).
In general, the vector lookup is faster: it's O(1) for the array vs O(log n) for the map (in both cases I'm not counting the search within the list/vector/set/whatever is used for the "multi" part). But to use the vector, you have to make it as big as the biggest int key you want to use; if your keys are sequential this is not a problem, but if your keys are sparse the multimap can be a better choice.
On the other hand, if you don't need ordered traversal, unordered_multimap (which is actually a hash table) could be the best of both worlds: you get array-like O(1) lookup without having to keep an enormous empty array.
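To make the trade-off concrete, here is a small sketch of the three layouts discussed above (key values and container sizes are made up for illustration):

#include <map>
#include <unordered_map>
#include <vector>

int main() {
    // Dense, sequential keys: the key is simply the index into the outer vector.
    std::vector<std::vector<int>> dense(100);
    dense[42].push_back(7);                    // O(1) to reach the bucket for key 42

    // Sparse keys: a multimap stores only the keys that actually exist, in order.
    std::multimap<int, int> sparse;
    sparse.insert({1000000, 7});
    auto range = sparse.equal_range(1000000);  // O(log n) to find the key

    // No ordered traversal needed: unordered_multimap gives O(1) average lookup
    // without keeping an enormous, mostly empty array.
    std::unordered_multimap<int, int> hashed;
    hashed.insert({1000000, 7});
    auto hashed_range = hashed.equal_range(1000000);

    (void)range;
    (void)hashed_range;
    return 0;
}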

Forget which is "faster". You can profile it later, but don't obsess over this. Far more important is that one approach gives you sparse storage, and the other does not -- focus on this and decide which is the most appropriate for your problem.

I would say if your keys are sequential, go with the vector, but if there are big holes in your keys then the map will be better (you won't have to store "empty" records as in the vector), plus it will make it easier to count how many records you have, etc.
Performance-wise, vectors are backed by contiguous arrays, so lookups are generally faster (a map has to walk through several nodes to do a lookup).

I would recommend map<int, vector<int>>.
Once you have done the search in the map, you have a vector with all the values.
Otherwise your solution will require a separate search for each value.
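A minimal sketch of that approach (the key and values are illustrative):

#include <map>
#include <vector>

int main() {
    std::map<int, std::vector<int>> values_by_key;
    values_by_key[3].push_back(10);   // operator[] creates the vector on first use
    values_by_key[3].push_back(20);

    auto it = values_by_key.find(3);  // one O(log n) search...
    if (it != values_by_key.end()) {
        const std::vector<int>& all_values = it->second;  // ...yields every value at once
        (void)all_values;
    }
    return 0;
}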

This sounds like premature optimization: optimize only after everything is working, and with the help of a profiler. Don't waste time on this now; just pick the container that best matches your needs.

Related

Best STL container for fast lookups

I need to store a list of integers, and very quickly determine if an integer is already in the list. No duplicates will be stored.
I won't be inserting or deleting any values, but I will be appending values (which might be implemented as inserting).
What is the best STL container class for this? I found std::multimap on Google, but that seems to require a key/value pair, which I don't have.
I'm fairly new to STL. Can I get some recommendations?
Instead of a map, you can use a set when the value and the key aren't separate.
Instead of a multiset/-map, you can use the non-multi version which doesn't store duplicates.
Instead of a set, you have the std::unordered_set as an alternative. It may be faster for your use case.
There are other, less generic, data structures that can be used to represent sets of integers, and some of those may be more efficient depending on the use case. But those other data structures aren't necessarily provided for you by the standard library.
But I'm not clear on which has the fastest lookup.
Unordered set has better asymptotic complexity for lookups than the ordered one. Whether it is faster for your use case, you can find out by measuring.
not likely to exceed a thousand or so
In that case, asymptotic complexity is not necessarily relevant.
Especially for small-ish sets like this, a sorted vector can be quite efficient. Given that you "won't be inserting or deleting any values", the vector shouldn't have significant downsides either. The standard library doesn't provide a set container implemented internally using a sorted vector, but it does provide a vector container as well as all necessary algorithms.
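A minimal sketch of the sorted-vector idea, assuming the values are collected first and only queried afterwards:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> values = {5, 1, 9, 3};
    std::sort(values.begin(), values.end());             // sort once after filling

    bool present = std::binary_search(values.begin(),    // O(log n) lookup in
                                      values.end(), 9);  // contiguous, cache-friendly storage
    return present ? 0 : 1;
}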
I don't know how the containers compute hashes.
You can provide a hash function for the unordered container. By default it uses std::hash. std::hash uses an implementation defined hash function.
std::unordered_set<int> is a good choice for keeping track of duplicates of ints, since both insertion and lookup can be achieved in constant time on average.
Insertion
Inserting a new int into the collection can be achieved with the insert() member function:
std::unordered_set<int> s;
s.insert(7);
Lookup
Checking whether a given int is present in the collection can be done with the find() member function:
bool is_present(const std::unordered_set<int>& s, int value) {
    return s.find(value) != s.end();
}

C++ Data Structure that would be best to hold a large list of names

Can you share your thoughts on what the best STL data structure would be for storing a large list of names and performing searches on those names?
Edit:
The names are not unique, and the list can grow as new names can continuously be added to it. And by large I am talking about 1 million to 10 million names.
Since you want to search by name, you want a structure that supports fast lookup by key. That rules out plain vector, deque and list. Also, a vector/array is slow for random adds/inserts into a sorted sequence, because it has to shift items to make room for each inserted item. Adding to the end is very fast, though.
Consider std::map, std::unordered_map or std::unordered_multimap (or their siblings std::set, std::unordered_set and std::unordered_multiset if you are only storing keys).
If you are purely going to do unique, random access, I'd start with one of the unordered_* containers.
If you need to keep the names ordered, and need range searches, in-order iteration or other sorted operations, a tree-based container like std::map or std::set will iterate better than a hash-based container, because it keeps items logically adjacent to their predecessors and successors. Lookup is O(log N), which is still decent.
Prior to std::unordered_*, I used std::map to hold large numbers of objects for an object cache, and though there are containers with faster lookup, it scaled well enough for our uses. The newer unordered_map is a hashed structure, so it has O(1) average access time and should give you close to the best access times.
You could consider concatenating the names with a delimiter, but searching would take a hit: you would need to come up with an adapted binary search.
But you should try the obvious solution first, which is a hash map, called unordered_map in the standard library. See if that meets your needs. Searching should be plenty fast there, but at a cost of memory.
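Since the names are not unique, a rough sketch of that suggestion could use the multi variant (the sample names are made up):

#include <cstddef>
#include <string>
#include <unordered_set>

int main() {
    std::unordered_multiset<std::string> names;       // hash-based, duplicates allowed
    names.insert("Alice");
    names.insert("Alice");
    names.insert("Bob");

    std::size_t alice_count = names.count("Alice");   // average O(1), plus the duplicates
    bool bob_present = names.find("Bob") != names.end();
    return (alice_count == 2 && bob_present) ? 0 : 1;
}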

Random access to hash map values

What's the best way to design a container that supports random access to its values, while also supporting other operations, such as inserting key/value pairs and removing by key, with the best possible time performance?
One way to do this is to combine a hash map with an array, but if a hash map is used, what's the best way to do random access of its values, i.e., without generating a key?
If you're talking about data structures, and not existing language support - then you just have to design a data structure to support that.
You can do that, for example, by implementing a hash map which will hold, in addition, an array of pointers to its members. You can then translate random access operators to that array, and maintain it with every insertion or removal (that is of course a general idea, some implementation details omitted).
Some languages support traversing the data structure through iterators. Although looping over an iterator a random number of times is not really random access (performance-wise), it gives you the same result, just in more time.
I would say your question sounds like algorithms coursework. Why would you want to do this in real life? What problem are you trying to solve?
edit
In the comments you phrased the problem as:
what's the best way to design a container that supports random access
to its values, while also supporting other operations, such as
inserting key/value pairs and removing by key, with the best possible
performance?
My suggestions above hold, but the question is what the trade-off is. If "best performance" means time, then my suggestion with the array gives you that. If it means memory, then iterating over the tree gives you that; that's my other suggestion.
In general, when you come to a need to design a new data structure, you need to answer the following questions:
What are the operations required?
What is the time complexity required, for each operation?
What is the memory complexity required for the structure?
Which is more important, memory or time?
Sometimes you just can't do it in O(1) without additional memory. Sometimes you can do it in O(1) with additional O(n) memory, or get away with O(lg n) memory if you accept O(lg n) time. There are trade-offs, and you have to make the decision; I don't know your constraints.
So my first suggestion (combining a BST or hash with an array of pointers to its nodes) does all the operations of the BST (map) or hash with their standard complexity, and all the read operations of an array with its standard complexity (i.e., random access in O(1) time). Write operations on the array side have the complexity of the map/hash operations, and the additional memory footprint is O(n).
My second suggestion has no additional memory footprint, but the "random" access is pseudo-random: you just iterate to the position you want instead of accessing it directly. That makes random access O(n), with zero additional code and no wasted memory.
Name of the game? Trade-offs.
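A rough sketch of the first suggestion (hash map plus an array), assuming string keys and int values; all the names here are made up, and removal uses the usual swap-with-last trick to keep the array compact:

#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class RandomAccessMap {
    // key -> (value, index of that key inside keys_)
    std::unordered_map<std::string, std::pair<int, std::size_t>> by_key_;
    std::vector<std::string> keys_;   // dense array enabling O(1) access by position
public:
    void insert(const std::string& key, int value) {
        auto it = by_key_.find(key);
        if (it != by_key_.end()) { it->second.first = value; return; }
        by_key_[key] = {value, keys_.size()};
        keys_.push_back(key);
    }
    void erase(const std::string& key) {
        auto it = by_key_.find(key);
        if (it == by_key_.end()) return;
        std::size_t hole = it->second.second;
        keys_[hole] = keys_.back();              // move the last key into the hole...
        by_key_[keys_[hole]].second = hole;      // ...and fix its stored index
        keys_.pop_back();
        by_key_.erase(key);
    }
    std::size_t size() const { return keys_.size(); }
    int value_at(std::size_t i) const {          // access by position, O(1) average
        return by_key_.at(keys_[i]).first;
    }
    int random_value() const {                   // the index source here is only illustrative
        return value_at(std::rand() % keys_.size());
    }
};

Insertion and lookup by key keep the hash map's average O(1) cost, erase adds only a constant-time swap, and the extra memory is the O(n) array of keys, as described above.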
If you simply want to inspect the key portion of an unordered_map, use its iterators.
If you mean "without inserting a new element", then find() is the preferred method over operator[]:
auto it = mymap.find("joe");
if (it != mymap.end())
{
    make_phone_call(it->second);
}
This is particular to ordered and unordered maps, which are unique among the associative containers in providing an operator[] that inserts a default-constructed value when the key is not present.
For unordered maps, the lookup time is constant on average.
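A tiny illustration of that difference, with made-up names:

#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> phone_book;
    int joe_number = phone_book["joe"];    // operator[] inserts "joe" with value 0
    auto amy = phone_book.find("amy");     // find() only looks; nothing is inserted
    bool ok = phone_book.size() == 1 && joe_number == 0 && amy == phone_book.end();
    return ok ? 0 : 1;
}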

Dynamic array with id?

I need some sort of dynamic array in C++ where each element has its own ID represented by an int.
The datatype needs these functions:
int Insert() - return ID
Delete(int ID)
Get(ID) - return Element
What datatype should I use? I've looked at vector and list, but can't seem to find any sort of ID. I've also looked at map and hash table; these may be useful. I'm however not sure what to choose.
I would probably use a vector plus a free-id list to handle deletions; then the index is the ID. This is really fast to insert and get, and fairly easy to manage (the only trick is the free list for deleted items).
Otherwise you probably want to use a map and just keep track of the lowest unused id and assign it upon insertion.
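A rough sketch of the vector-plus-free-list idea, assuming the ID is simply the slot index (names and types are illustrative):

#include <cstddef>
#include <vector>

template <typename Element>
class IdVector {
    std::vector<Element> items_;
    std::vector<std::size_t> free_ids_;      // indices of deleted slots, ready for reuse
public:
    std::size_t Insert(const Element& e) {   // returns the new element's ID
        if (!free_ids_.empty()) {
            std::size_t id = free_ids_.back();   // recycle a hole left by Delete()
            free_ids_.pop_back();
            items_[id] = e;
            return id;
        }
        items_.push_back(e);                 // otherwise grow at the end
        return items_.size() - 1;
    }
    void Delete(std::size_t id) {
        free_ids_.push_back(id);             // the slot stays allocated until reused
    }
    Element& Get(std::size_t id) {           // O(1): the ID is the index
        return items_[id];
    }
};

Insert and Get are constant time; the map-based alternative mentioned above trades that for O(log n) operations but avoids keeping deleted slots around.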
A std::map could work for you, which allows to associate a key to a value. The key would be your ID, but you should provide it yourself when adding an element to the map.
A hash table is the sort of basic mechanism that can be used to implement an unordered map; it corresponds to std::unordered_map.
It seems that the best container to use is unordered_map.
It is hash-based: you can insert, delete or search for elements in O(1) on average.
If your standard library doesn't provide unordered_map (it was only standardized in C++11; before that it was available through TR1 or Boost), you can use std::map instead.
It is tree-based: inserts, deletes and searches are O(log n).
Still, the container choice depends a lot on how intensively it is used. For example, if you search for elements only rarely, a vector or list could be fine. These containers do not have a find member function, but the <algorithm> library provides std::find.
A vector gives constant-time random access, the "id" can simply be the offset (index) into the vector. A deque is similar, but doesn't store all items contiguously.
Either of these would be appropriate if the ID values can start at 0 (or at a known offset from 0) and increment monotonically. Over time, if there are a large number of removals, either vector or deque can become sparsely populated, which may be detrimental.
std::map doesn't have the problem of becoming sparsely populated, but look ups move from constant time to logarithmic time, which could impact performance.
boost::unordered_map may be the best yet: as a hash table it will likely have the best overall performance characteristics for the scenario in the question. However, this requires the Boost library -- though there are also unordered container types in std::tr1, if your standard library implementation provides them.

Efficient C++ associative container with vector key

I've constructed a map which has a vector as its key: map<vector<KeyT>, T> which I'm trying to optimize now.
An experiment with manually nested maps map<vector<KeyT>, map<KeyT,T> > where the first key is the original vector minus the last element and the second key is the last element shows a reasonable speed-up.
Now I'm wondering whether there exists a semi-standard implementation (like boost or similar) of an associative container where vector keys are implemented as such a hierarchical structure of containers.
Ideally, this would create as many layers as there are elements in the key vector, while keeping a uniform syntax for vectors of different length.
Are you sure you need to optimise it? std::string is basically like a std::vector, and we happily use std::string as a map key!
Have you profiled your code? std::map doesn't copy its key/value pairs unnecessarily -- what exactly are you afraid of?
Are your vector keys of a fixed size? std::tuple might help in that case.
If not, it might help to partition your containers according to the length of the key, although the effectiveness of schemes such as this are highly domain-dependent.
My first hunch is that you want to improve map lookup time by reducing the size of the key. This is what hash functions are for. C++ TR1 and Boost both provide hash maps under the name unordered_map.
I'll try to devise a small sample in some time here
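In the meantime, a rough sketch of the hashing idea, assuming int key elements and a hand-rolled hash combiner in the style of boost::hash_combine (the mixing constants are only illustrative):

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// std::hash has no specialization for std::vector<int>, so supply one.
struct VectorHash {
    std::size_t operator()(const std::vector<int>& v) const {
        std::size_t seed = v.size();
        for (int x : v)
            seed ^= std::hash<int>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

int main() {
    // Key equality still uses vector's operator==; only the hashing is custom.
    std::unordered_map<std::vector<int>, std::string, VectorHash> m;
    std::vector<int> key = {1, 2, 3};
    m[key] = "abc";
    auto it = m.find(key);   // average O(1), no per-element tree comparisons
    return it != m.end() ? 0 : 1;
}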