I'm looking for a C++ associative map container type on which I can perform multi-key lookups. The map needs constant-time lookups, but I don't care whether it's ordered or unordered. It just needs to be fast.
For example, I want to store a bunch of std::vector objects in a map with an int and a void* as the lookup keys. Both the int and the void* must match for my vector to be retrieved.
Does such a container exist already? Or am I going to have to roll my own? If so, how could I implement it? I've been trying to store a boost::unordered_map inside another boost::unordered_map, but I have not had any success with this method yet. Maybe I will continue pursuing it if there is no simpler way.
Constant-time lookup requires a hash map. You can use boost::unordered_map (or the TR1 version). The key would be the (int, void*) pair, hashed by combining the hashes of the int and the void pointer.
If you don't want to use Boost, you can try map< int, map<void*, vector<T> > >. Lookups, however, are O(log n) in the map sizes.
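A minimal sketch of the nested-map approach, assuming the vectors hold int (the question doesn't say) and using a hypothetical helper named lookup(). Both keys must match, and each level costs one O(log n) find().

```cpp
#include <map>
#include <vector>

// Outer map keyed by the int, inner map keyed by the pointer; both keys must match.
using NestedMap = std::map<int, std::map<void*, std::vector<int>>>;

// Hypothetical helper: returns a pointer to the stored vector,
// or nullptr if either key is missing.
std::vector<int>* lookup(NestedMap& m, int id, void* ptr) {
    auto outer = m.find(id);              // O(log n) in the outer map
    if (outer == m.end()) return nullptr;
    auto inner = outer->second.find(ptr); // O(log n) in the inner map
    if (inner == outer->second.end()) return nullptr;
    return &inner->second;
}
```

Entries can be created with m[id][ptr] = {...}; operator[] creates the inner map on demand.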
Since C++11 you can also use an std::unordered_map, as it seems to fit your requirements quite nicely:
Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.
In order to combine your int and void* into a single key, you can make use of std::pair.
To make the unordered map work with the pair, you have to specify a suitable hash function. In order to keep the following example short, I use a handcrafted combination of std::hash<> function calls inside a lambda expression. In case this function causes performance problems, you might want to create a more sophisticated hash.
You didn't specify the content of your vectors, so I chose int for the sake of simplicity. However, you can adapt the solution to any vector content. All in all your code could look as follows:
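#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>
using Key = std::pair<int, void*>;
auto hash = [](const Key & k) {
    return std::hash<int>()(k.first) * 31 + std::hash<void*>()(k.second);
};
// 8 is the initial bucket count; the lambda is passed in as the hash object.
std::unordered_map<Key, std::vector<int>, decltype(hash)> um(8, hash);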
You could use boost::multi_index.
(although I think what you actually want is to use a type that contains both the void* and the integer as the key to your map, and just to compare the raw data for both in order to provide the comparison operator for the map)
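As a sketch of that suggestion, here is a small key type with a comparison operator for std::map. The names Key, id and KeyedVectors, and the int payload, are hypothetical. The pointer is compared with std::less<void*>, which guarantees a total order even for unrelated pointers.

```cpp
#include <functional>
#include <map>
#include <vector>

// Hypothetical key type bundling both lookup values.
struct Key {
    int id;
    void* ptr;
};

// Strict weak ordering for std::map: compare id first, then the pointer.
bool operator<(const Key& a, const Key& b) {
    if (a.id != b.id) return a.id < b.id;
    return std::less<void*>()(a.ptr, b.ptr);  // total order even for unrelated pointers
}

using KeyedVectors = std::map<Key, std::vector<int>>;
```

For std::unordered_map you would instead provide operator== and a hash functor for Key.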
Related
Essentially, I'd like a container in which one element can be accessed by many keys. This could be done by defining some multi-key class to be used as the map's key type, but since such a solution doesn't allow modification of the keys of elements which have already been inserted, I'm unable to make new aliases for existing entries.
I appreciate that with std::map keys need to be constant for the purposes of ordering, but why should this limitation exist with std::unordered_map?
If need be, I suppose I could just use a map of pointers, but is there a better, more elegant solution?
Edit: Thanks for clearing that up, Andrei and Xeo. Nicol, any suggestions as to what container I should use?
Well, the reason why std::unordered_map does not let you modify the key is pretty much the same as the reason why other associative containers won't let you modify it: it would mess up the internal organization of that data structure.
In an unordered_map, the key is used to obtain a hash, and that hash tells the container in which bucket to place your element (and which bucket to retrieve it from, of course). If you modify the key, you modify the hash, and that means your element should be moved to a different bucket. That's like removing it and inserting it again, basically.
The whole idea of an associative container, on the other hand, is that any element is represented by one value which is fixed, so that its position in the container can be quickly computed as a function of that value. If multiple keys were allowed, which one would you use to quickly determine where the element is stored or is to be stored?
What you want is probably an ad-hoc data structure with complexity guarantees different from the ones of the Standard Library.
Personally, however, it seems to me like you are just looking for reference semantics, because you intend to share several views of one object. That would naturally lead me to the use of (smart) pointers, especially when I hear the word "alias". I suggest going for a map with shared_ptrs as values.
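A minimal sketch of that suggestion, assuming string keys and a placeholder Object type (both hypothetical), with a hypothetical add_alias() helper. An alias is just a new entry holding another shared_ptr to the same object, so no existing key is ever modified.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical payload type: any object you want to reach under several keys.
struct Object {
    int value;
};

using AliasMap = std::unordered_map<std::string, std::shared_ptr<Object>>;

// Makes `alias` refer to the same object as `existing`.
// Returns false if `existing` is unknown.
bool add_alias(AliasMap& m, const std::string& existing, const std::string& alias) {
    auto it = m.find(existing);
    if (it == m.end()) return false;
    // Copy the pointer first: m[alias] may rehash and invalidate `it`.
    std::shared_ptr<Object> target = it->second;
    m[alias] = target;  // shares ownership; the Object itself is not copied
    return true;
}
```

Erasing one alias leaves the object alive as long as some other key still refers to it.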
In associative containers (e.g. map, unordered_map), the value of key determines the position of the element in the data structure. In the non-multi associative containers, it is also necessary that the keys are unique.
If modifying the key were allowed, it would jeopardize these design invariants: in map, the placement of the element in the binary search tree; in unordered_map, the linking of the element to a hash bucket.
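If I understand OP's requirement, it might be achieved by writing a wrapper around the container's insert(), such as the following (shown for set-like containers such as std::set or std::unordered_set, where each element is its own key):
template <class Container>
typename Container::iterator insert_wrapper( Container & cont, typename Container::value_type const & elem ) {
    cont.erase( elem );                // remove the old copy, if any (no-op otherwise)
    return cont.insert( elem ).first;  // insert() returns pair<iterator, bool>
}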
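For a map, the same erase-then-reinsert idea can change an entry's key. This is a minimal C++11 sketch using a hypothetical helper named rekey(). Since C++17, std::unordered_map::extract() can do the same job without moving the value: you change the extracted node's key() and insert the node again.

```cpp
#include <unordered_map>
#include <utility>

// Hypothetical helper: moves the value stored under old_key to new_key.
// Keys cannot be modified in place, so this erases the old entry and inserts
// a new one, which places the value in the bucket for new_key.
// Returns false (and changes nothing) if old_key is absent or new_key is
// already present, which includes the case new_key == old_key.
template <class Map>
bool rekey(Map& m, const typename Map::key_type& old_key,
           const typename Map::key_type& new_key) {
    auto it = m.find(old_key);
    if (it == m.end() || m.count(new_key) != 0) return false;
    auto value = std::move(it->second);
    m.erase(it);
    m.emplace(new_key, std::move(value));
    return true;
}
```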
Would you find a map of references more tasteful than a map of pointers?
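#include <iostream>
#include <unordered_map>

int value = 6;
std::unordered_map<int, int&> m;  // the map stores references, not copies
for(int i=0; i<5; ++i)
    m.emplace(i, value);  // every key refers to the same int
value = 4;
for(auto const& i: m)
    std::cout<<i.second<<' ';  // prints 4 for every entry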
Of course, you have to store the actual values somewhere else, as the example shows, and they must outlive the map.
I need to implement a key-value data structure that searches for a unique key in O(log n) or O(1) AND gets the max value in O(1). I am thinking about
boost::bimap< unordered_set_of<key> ,multiset_of<value> >
Note that there is no duplicated key in my key-value data sets. However, two keys may have the same value. Therefore I used multiset to store values.
I need to insert/remove/update key-value pairs frequently.
How does that sound?
It depends on what you want to do. It seems you want to use it in some kind of get-the-maximum-value-in-a-loop construction.
Question 1: Do you access the elements by their keys as well?
If yes, I can think of two solutions for you:
use boost::bimap: an easy, well-tested solution with logarithmic runtimes.
create a custom container that contains an std::map (or, for even faster access by key, an std::unordered_map) and a heap (e.g. std::priority_queue<std::map<key, value>::iterator> with a custom comparator), and keeps them in sync. This is the hard way, but maybe faster. Most operations on both will be O(log n) (insert, get and pop max, get by key, remove), but the constant factors sometimes do matter. If you use std::unordered_map, access by key will be O(1).
You may want to write tests for the new container as well and optimize it for the operation you use the most.
If no, i.e. you only access the elements through the maximum value:
Question 2: Do you insert/remove/update elements randomly, or do you first put in all elements in one round and then remove them one by one?
for random inserts/removes/updates, use std::priority_queue<std::pair<value, key>>
if you put in the elements first and then remove them one by one, use an std::vector<std::pair<value, key>> and std::sort() it before the first remove operation. Do not actually remove the elements; just iterate over them.
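Both "if no" options can be sketched as follows, assuming int keys and values (hypothetical) and a hypothetical helper sorted_by_value_desc(). Storing the value first in the pair makes the default pair comparison order by value.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical key/value types for illustration.
using Key = int;
using Value = int;
using Entry = std::pair<Value, Key>;  // value first, so pairs order by value

// Random insert/remove: a max-heap of (value, key); top() is the maximum in O(1).
using MaxQueue = std::priority_queue<Entry>;

// Fill once: sort in descending order, then walk the vector front to back
// instead of removing elements.
std::vector<Entry> sorted_by_value_desc(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(), std::greater<Entry>());
    return entries;
}
```

Note that std::priority_queue only lets you remove the top element; removing or updating an arbitrary entry needs extra bookkeeping.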
You could build this using a std::map and a std::set.
One map to hold the actual values, i.e. std::map<key, value> m;. This is where your values are stored. Inserting elements into a map is O(log n).
A set of iterators pointing into the map; this set is sorted by the value of the respective map entry, i.e. std::set<std::map<key, value>::iterator, CompareBySecondField<std::map<key, value>::iterator> > s; with something like (untested):
template <class It>
struct CompareBySecondField {
    bool operator() ( const It &lhs, const It &rhs ) const {
        // Larger values first; ties broken by key so that equal values are not merged.
        if ( lhs->second != rhs->second )
            return lhs->second > rhs->second;
        return lhs->first < rhs->first;
    }
};
You can then get an iterator to the map entry with the largest value with *s.begin(), in constant time.
This is rather easy to build, but you have to make sure to update both containers whenever you add/remove elements.
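A fuller sketch of the map-plus-set idea that keeps both containers in sync, assuming int keys and values. The class name MaxMap and its member names are hypothetical. The comparator breaks ties by key because several keys may share a value. An entry is removed from the set before it is erased from the map, since the comparator dereferences the map iterators.

```cpp
#include <map>
#include <set>

// Hypothetical key/value types; keys are unique, values may repeat.
using Key = int;
using Value = int;
using Map = std::map<Key, Value>;
using It = Map::const_iterator;

// Orders map iterators by value, largest first; ties broken by key.
struct ByValueDesc {
    bool operator()(It lhs, It rhs) const {
        if (lhs->second != rhs->second) return lhs->second > rhs->second;
        return lhs->first < rhs->first;
    }
};

class MaxMap {
public:
    // Inserts or updates k -> v in O(log n).
    void insert_or_update(Key k, Value v) {
        erase(k);  // drop the old entry and its place in the index, if any
        It it = m_.emplace(k, v).first;
        by_value_.insert(it);
    }
    // Removes k in O(log n); returns false if it was absent.
    bool erase(Key k) {
        Map::iterator it = m_.find(k);
        if (it == m_.end()) return false;
        by_value_.erase(It(it));  // must happen while the map entry still exists
        m_.erase(it);
        return true;
    }
    // Key lookup in O(log n); returns nullptr if absent.
    const Value* find(Key k) const {
        It it = m_.find(k);
        return it == m_.end() ? nullptr : &it->second;
    }
    // Entry with the largest value in O(1); the container must not be empty.
    It max_entry() const { return *by_value_.begin(); }
    bool empty() const { return m_.empty(); }

private:
    Map m_;
    std::set<It, ByValueDesc> by_value_;
};
```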
I have a bunch of data full of duplicates and I want to eliminate the duplicates. You know, e.g. [1, 1, 3, 5, 5, 5, 7] becomes [1, 3, 5, 7].
It looks like I can use either std::map or std::set to handle this. However, I'm not sure whether it's faster to (a) simply insert all the values into the container, or (b) check whether they already exist in the container and only insert them if they don't. Are inserts very efficient? Even if there's a better way, can you suggest a fast way to do this?
Another question - if the data I'm storing in them isn't as trivial as integers, and instead is a custom class, how does the std::map manage to properly store (hash?) the data for fast access via operator[]?
std::map doesn't use hashing. std::unordered_map does, but that's C++11. std::map and std::set both use a comparator that you provide. The class templates have defaults for this comparator, which boils down to an operator< comparison, but you can provide your own.
If you don't need both a key and a value to be stored (looks like you don't) you should just use a std::set, as that's more appropriate.
The Standard doesn't say what data structures maps and sets use under the hood, only that certain operations have certain time complexities. In reality, most implementations I'm aware of use a tree.
It makes no difference complexity-wise whether you use operator[] or insert, but I would use insert or operator[] rather than a search followed by an insert if the item isn't found. The latter implies two separate searches to insert an item into the set.
An insert() on any of the associative containers does a find() to see if the object exists and then inserts the object. Simply inserting the elements into an std::set<T> should get rid of the duplicates reasonably efficiently.
Depending on the size of your set and the ratio of duplicates to unique values, it may be faster to put the objects into std::vector<T>, std::sort() them, and then use std::unique() together with std::vector<T>::erase() to get rid of the duplicates.
How often do you need to do this?
If you insert continually:
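#include <set>
#include <unordered_set>  // only needed for the hash-based alternative

std::set<int> store;
// std::unordered_set<int> store;  // hash-based alternative
int number = 42;  // some incoming value
if ( store.insert(number).second )
{
    // was not in store
}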
If you fill once:
#include <algorithm>
#include <vector>

std::vector<int> store;
int number = 42;  // some incoming value
store.push_back(number);  // repeat for every value
std::sort(store.begin(), store.end());
store.erase(std::unique(store.begin(), store.end()), store.end());
// elements are unique
Assuming the common implementation strategy for std::map and std::set, i.e. balanced binary search trees, both insertion and lookup have to do a tree traversal to find the spot where the key should be. So failed lookup followed by insertion would be roughly twice as slow as just inserting.
how does the std::map manage to properly store (hash?) the data for fast access via operator[]?
By means of a comparison function that you specify (or std::less, which works if you overload operator< on your custom type). In any case, std::map and std::set are not hash tables.
As far as I know, std::set and std::map are both implemented as red-black trees. Using only insertion would probably be faster than doing both, because a lookup followed by an insertion doubles the search time.
Also, map and set use operator<. As long as your class defines operator<, it can be used as a key.
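A minimal sketch of a custom class used as a std::set element, with a hypothetical Point type and unique_points() helper. No hashing is involved; the set only needs a strict weak ordering, and two elements a and b count as duplicates when neither a < b nor b < a.

```cpp
#include <set>
#include <vector>

// Hypothetical custom type used as a std::set element (or std::map key).
struct Point {
    int x;
    int y;
};

// Strict weak ordering: compare x first, then y.
bool operator<(const Point& a, const Point& b) {
    if (a.x != b.x) return a.x < b.x;
    return a.y < b.y;
}

// Inserting every element removes duplicates: insert() ignores an element
// that is equivalent to one already in the set.
std::set<Point> unique_points(const std::vector<Point>& v) {
    return std::set<Point>(v.begin(), v.end());
}
```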
I have a map<std::string, myStruct>. I wonder how to sort the items in the map by the int field myStruct.order and, if two or more myStruct objects have the same order, throw a list of the keys (strings) that make the map unsortable by that field. Is there any fancy way of doing this in C++03 (maybe with Boost)?
Boost provides complete (and complex) functionality in Boost.MultiIndex, but if I understand your requirements, it's overkill in this case. A fairly easy way would be to build a list of pointers to myStruct and sort it. Then you can easily check for duplicate keys, because they become adjacent.
The sort should use a comparator on const myStruct* that compares the order fields, e.g.
bool compare_orders(const myStruct* a, const myStruct* b) { return a->order < b->order; }
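A C++03-compatible sketch of that approach. One deviation is labeled here: it sorts map const_iterators instead of raw myStruct pointers so that the keys are still available for the error report. The names compare_entry_orders and sort_by_order are hypothetical. The caller can throw if the returned list is non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct myStruct {
    int order;
    // ... other fields
};

typedef std::map<std::string, myStruct> Map;
typedef Map::const_iterator Entry;

// Orders map entries by their myStruct::order field.
bool compare_entry_orders(Entry a, Entry b) { return a->second.order < b->second.order; }

// Fills `sorted` with iterators to the entries, ordered by myStruct::order, and
// returns the keys of all entries whose order value is shared with another entry.
std::vector<std::string> sort_by_order(const Map& m, std::vector<Entry>& sorted) {
    sorted.clear();
    for (Entry it = m.begin(); it != m.end(); ++it) sorted.push_back(it);
    std::sort(sorted.begin(), sorted.end(), compare_entry_orders);

    // After sorting, entries with equal order values are adjacent.
    std::vector<std::string> duplicates;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        bool same_as_prev = i > 0 && sorted[i - 1]->second.order == sorted[i]->second.order;
        bool same_as_next = i + 1 < sorted.size() && sorted[i + 1]->second.order == sorted[i]->second.order;
        if (same_as_prev || same_as_next) duplicates.push_back(sorted[i]->first);
    }
    return duplicates;
}
```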
What meaning is there in "sorting" a map? Sure, the map may be internally organized as a BST or something to make access faster, but conceptually the elements of the map are in no particular order.
Other than that, you CAN supply an ordering for the keys through map's third template argument (because even if you do organize the map in a particular way, it is by the order of the keys, not that of the values), but not for the values.
You can't do this with std::map: the first element of each pair (the key) is used for comparisons. In your case, the std::string is the key.
You can use std::set< std::pair< std::string, MyStruct > > and supply a comparator (the set's second template argument) that compares two std::pair< std::string, MyStruct > objects by their MyStruct parts.
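A sketch of the std::set option, assuming MyStruct has an int order field. ByOrder, OrderedItems and insert_all() are hypothetical names. Because the comparator looks only at order, the set treats two items with the same order as equivalent, so insert() reports the clash instead of silently storing both.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

struct MyStruct {
    int order;
};

typedef std::pair<std::string, MyStruct> Item;

// Compares only the MyStruct::order field of two items.
struct ByOrder {
    bool operator()(const Item& a, const Item& b) const {
        return a.second.order < b.second.order;
    }
};

typedef std::set<Item, ByOrder> OrderedItems;

// Inserts all items, kept sorted by MyStruct::order. Returns the keys of items
// that were rejected because an item with the same order was already present.
template <class InputIt>
std::vector<std::string> insert_all(OrderedItems& s, InputIt first, InputIt last) {
    std::vector<std::string> clashes;
    for (; first != last; ++first) {
        if (!s.insert(Item(first->first, first->second)).second)
            clashes.push_back(first->first);
    }
    return clashes;
}
```

The input range can be the original map's begin() and end(); only the key of the rejected item is reported, and the item it clashed with can be found with s.find().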
Or, you can change the std::map's definition, if it's possible/allowed/suitable/etc. It really depends on what you're trying to do and what you're allowed to do.
Or use some other container that keeps elements in insertion order (std::list, std::vector, etc.) and then sort it with std::sort or the container's own sort method, if it has one.
I agree with the other answers that it's not clear why you need to sort the map. But you can use a bidirectional map to let you view both myStruct and string as keys.
There's one in boost:
http://www.boost.org/doc/libs/1_48_0/libs/bimap/doc/html/index.html
There's an example here
http://www.boost.org/doc/libs/1_48_0/libs/bimap/doc/html/boost_bimap/examples/simple_bimap.html
I need a container of unique elements accessed by a triplet of ints, where each int can be over 1,000,000,000.
(Only a few of these elements will actually be filled, and the elements are themselves boost::unordered_maps.)
Is it faster to use a multi-index container like boost::multi_index (or maybe something else I don't know about), or just a boost::unordered_map with a composed string as the key?
Multi-index isn't what you want, you seem to want a single index whose type is a triple. (Unless you actually do want three independent indexes; if I misunderstood, leave a comment.)
Don't use strings, heavens no. Just use the triple as a key:
typedef std::tuple<int, int, int> key_type;
If you use an std::map<key_type, T>, you get logarithmic lookup, which may be sufficient, and you don't have to do any more work, because std::tuple already defines a lexicographic operator<.
If you want to use an std::unordered_map<key_type, T> (or the Boost version), you have to define a hash function. Boost already has one for tuples, I think, but C++11 doesn't; it's very easy to implement yourself based on hash_combine(), which you can just copy out of the Boost code.
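A minimal sketch of such a hash, assuming the classic hash_combine formula used by older Boost versions (newer Boost releases use a different mixing function). TripleHash and TripleMap are hypothetical names. std::tuple already provides the operator== that the unordered map needs.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <unordered_map>

typedef std::tuple<int, int, int> key_type;

// Mixes the hash of v into seed (classic boost::hash_combine formula).
template <class T>
void hash_combine(std::size_t& seed, const T& v) {
    seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash functor for the (int, int, int) key.
struct TripleHash {
    std::size_t operator()(const key_type& k) const {
        std::size_t seed = 0;
        hash_combine(seed, std::get<0>(k));
        hash_combine(seed, std::get<1>(k));
        hash_combine(seed, std::get<2>(k));
        return seed;
    }
};

// Map keyed by the triple; T could itself be an unordered_map.
template <class T>
using TripleMap = std::unordered_map<key_type, T, TripleHash>;
```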