Use 2 keys in std::map - c++

I was trying to create a std::map with 2 keys. I can do it with std::pair or create a struct and use it as the key.
In my software, there's a search function that is being called a lot. This function searches by the 1st or the 2nd key.
If I had about 1000 items in the map, I'm guessing it will take some time if I wanted to search it. So I thought that if I make another std::map that holds the 2nd key and the value is 1st key, then I can take the value and search in the other map to get the real value.
But my guess is that this will take more memory. What is the best option in this scenario?

This is an engineering decision that you'll have to resolve.
Multiple maps only makes sense (IMHO) if you know that the sets of key1 and key2 keys conflict. Otherwise, why not just insert both keys into the same map, each corresponding value referring to your object?
You don't want to duplicate your object, so you might put them in a vector, and put the vector index as the mapped value. Or use a map of key-to-pointer, etc.
1k items is not really that many, so I'm not concerned about the memory use here, but using a map instead of an unordered_map might be a concern (rb-tree vs. hash table).
Also, if you remove items from the map, you'll need to be able to remove both keys together, so be sure to account for that.

Brute force approach
You can store your items in std::vector and have two maps: the first with your first key and pointers (or indices) to vector items and the second with your second key and pointers (or indices) to vector items. The problem is to maintain all three containers when your set is modified.
Pointers vs. indices: Pointers are dangerous as pointed correctly in the comment, but simpler if you're going to delete items from the vector. Otherwise indices are safer.
Smart approach
You can use Boost.MultiIndex container that was designed for cases exactly like yours.

Related

Unordered_map produce secondary key

I'm using strings as a type of key for my unordered_map but is it possible that I could associate a secondary unique key, independent from the primary, so I could perform a find operation with the second key?
I was thinking that the key could be a hash number the internal hash algorithm came up with.
I thought of including an id (increasing by 1 each time) to the structure I'm saving but then again, I would have to look up for the key that is a string first.
Reason behind this: I want to make lists that enlist some of the elements in the unordered_map but saving strings in the list is very inefficient instead of saving int or long long. (I would prefer not to use pointers but rather a bookkeeping style of procedure).
You couldn't use a hash number the internal hash algorithm came up with because it could change the numbers due to the table growth in size. This is called rehashing. Also hashes are not guaranteed to be unique (they certainly won't be).
Keeping pointers to elements in your list will work just fine, since unordered_map doesn't invalidate pointers. But the deletion of the elements will be hard.
Boost has multi_index_container, which provides many useful database-like functions. It will be perfect for your task.
If you don't want to use Boost, you could use unordered_map with unique integer indices, and another unordered_map which keeps string->index pairs for searching by string keys. The deletion will also be hard, because either you will check all your lists each time you delete a record, or you will check if the record still exists each time you are traversing the list.

Underlying storage and functionality of unordered_maps vs unordered_multimaps in C++?

I'm having a hard time wrapping my head around unordered_maps and unordered_multimaps because my test code isn't producing what I've been told to expect.
std::unordered_map<string, int> names;
names.insert(std::make_pair("Peter", 4));
names.insert(std::make_pair("George", 4));
names.insert(std::make_pair("George", 4));
When I iterate through this list, I get one instance of George first, then Peter.
1) It's my understanding unordered_maps do not allow multiple keys to map to one value, and that multimaps due. Is this true?
2) Why can Peter and George coexist at a value of 4? What is happening to the second George? And for that matter, why is George appearing first when I iterate from begin() to end() if this is unordered?
3) What is the underlying representation of an unordered map vs. unordered multimap?
4) Is there a way to insert keys into either map without providing a value? E.g. have the compiler create its own hash function that I don't need to worry about when I retrieve keys and look for collisions?
I'll make it short:
No. Multi... refers to keys. A (non-multi)map can't have multiple equivalent keys with differeny values, ie. per key there is at most one value. A multi map can. The same holds for the unordered versions.
Peter != George, which is why they have different key and may very well have the same value.
A hashmap.
Use sets.
In your example the second insertion for George using a (non-multi) is skipped as the same key was previously inserted.
You want to use unordered_multimap to have several keys that are the same.
Since this is unordered you can't really hope to have any particular order, because it depends on the hash function.
If you want order in which you insert things, you need to use std::vector. Even normal maps, which are supposed to be ordered imply the comparison order, and not the order in which you insert things, for example string "AB" comes before "BB", because "A" is less than "B".
To insert without providing a value you need a set, and not a map.
The underlying structure of "unordered_" things is hashtable.

How to retrieve the elements from map in the order of insertion?

I have a map which stores <int, char *>. Now I want to retrieve the elements in the order they have been inserted. std::map returns the elements ordered by the key instead. Is it even possible?
If you are not concerned about the order based on the int (IOW you only need the order of insertion and you are not concerned with the normal key access), simply change it to a vector<pair<int, char*>>, which is by definition ordered by the insertion order (assuming you only insert at the end).
If you want to have two indices simultaneously, you need, well, Boost.MultiIndex or something similar. You'd probably need to keep a separate variable that would only count upwards (would be a steady counter), though, because you could use .size()+1 as new "insertion time key" only if you never erased anything from your map.
Now I want to retrieve the elements in the order they have been inserted. [...] Is it even possible?
No, not with std::map. std::map inserts the element pair into an already ordered tree structure (and after the insertion operation, the std::map has no way of knowing when each entry was added).
You can solve this in multiple ways:
use a std::vector<std::pair<int,char*>>. This will work, but not provide the automatic sorting that a map does.
use Boost stuff (#BartekBanachewicz suggested Boost.MultiIndex)
Use two containers and keep them synchronized: one with a sequential insert (e.g. std::vector) and one with indexing by key (e.g. std::map).
use/write a custom container yourself, so that supports both type of indexing (by key and insert order). Unless you have very specific requirements and use this a lot, you probably shouldn't need to do this.
A couple of options (other than those that Bartek suggested):
If you still want key-based access, you could use a map, along with a vector which contains all the keys, in insertion order. This gets inefficient if you want to delete elements later on, though.
You could build a linked list structure into your values: instead of the values being char*s, they're structs of a char* and the previously- and nextly-inserted keys**; a separate variable stores the head and tail of the list. You'll need to do the bookkeeping for that on your own, but it gives you efficient insertion and deletion. It's more or less what boost.multiindex will do.
** It would be nice to store map iterators instead, but this leads to circular definition problems.

Is there a linked hash set in C++?

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
set has different performance characteristics from a hash set.
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
To replicate LinkedHashSet from Java in C++, I think you will need two vanilla std::map (please note that you will get LinkedTreeSet rather than the real LinkedHashSet instead which will get O(log n) for insert and delete) for this to work.
One uses actual value as key and insertion order (usually int or long int) as value.
Another ones is the reverse, uses insertion order as key and actual value as value.
When you are going to insert, you use std::map::find in the first std::map to make sure that there is no identical object exists in it.
If there is already exists, ignore the new one.
If it does not, you map this object with the incremented insertion order to both std::map I mentioned before.
When you are going to iterate through this by order of insertion, you iterate through the second std::map since it will be sorted by insertion order (anything that falls into the std::map or std::set will be sorted automatically).
When you are going to remove an element from it, you use std::map::find to get the order of insertion. Using this order of insertion to remove the element from the second std::map and remove the object from the first one.
Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int).
In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, you won't be able to replace the second one with it though. The reason is that std::unordered map uses a hash code for indexing so that the index cannot be reliably sorted in this situation.
You might wanna know that std::map doesn't give you any sort of (log n) as in "null" lookup time. And using std::tr1::unordered is risky business because it destroys any ordering to get constant lookup time.
Try to bash a boost multi index container to be more freely about it.
The way you described your combination of std::set and std::vector sounds like what you should be doing, except by using std::unordered_set (equivalent to Java's HashSet) and std::list (doubly-linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).
The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.

Is there a data structure that doesn't allow duplicates and also maintains order of entry?

Duplicate: Choosing a STL container with uniqueness and which keeps insertion ordering
I'm looking for a data structure that acts like a set in that it doesn't allow duplicates to be inserted, but also knows the order in which the items were inserted. It would basically be a combination of a set and list/vector.
I would just use a list/vector and check for duplicates myself, but we need that duplicate verification to be fast as the size of the structure can get quite large.
Take a look at Boost.MultiIndex. You may have to write a wrapper over this.
A Boost.Bimap with the insertion order as an index should work (e.g. boost::bimap < size_t, Foo > ). If you are removing objects from the data structure, you will need to track the next insertion order value separately.
Writing your own class that wraps a vector and a set would seem the obvious solution - there is no C++ standard library container that does what you want.
Java has this in the form of an ordered set. I don't thing C++ has this, but it is not that difficult to implement yourself. What the Sun guys did with the Java class was to extend the hash table such that each item was simultaneously inserted into a hash table and kept in a double linked list. There is very little overhead in this, especially if you preallocate the items that are used to construct the linked list from.
If I where you, I would write a class that either used a private vector to store the items in or implement a hashtable in the class yourself. When any item is to be inserted into the set, check to see if it is in the hash table and optionally replace the item in there if such an item is in it. Then find the old item in the hash table, update the list to point to the new element and you are done.
To insert a new element you do the same, except you have to use a new element in the list - you can't reuse the old ones.
To delete an item, you reorder the list to point around it, and free the list element.
Note that it should be possible for you to get the part of the linked list where the element you are interested in is directly from the element so that you don't have to walk the chain each time you have to move or change an element.
If you anticipate having a lot of these items changed during the program run, you might want to keep a list of the list items, such that you can merely take the head of this list, rather than allocating memory each time you have to add a new element.
You might want to look at the dancing links algorithm.
I'd just use two data structures, one for order and one for identity. (One could point into the other if you store values, depending on which operation you want the fastest)
Sounds like a job for an OrderedDictionary.
Duplicate verification that's fast seems to be the critical part here. I'd use some type of a map/dictionary maybe, and keep track of the insertion order yourself as the actual data. So the key is the "data" you're shoving in (which is then hashed, and you don't allow duplicate keys), and put in the current size of the map as the "data". Of course this only works if you don't have any deletions. If you need that, just have an external variable you increment on every insertion, and the relative order will tell you when things were inserted.
Not necessarily pretty, but not that hard to implement either.
Assuming that you're talking ANSI C++ here, I'd either write my own or use composition and delegation to wrap a map for data storage and a vector of the keys for order of insertion. Depending on the characteristics of the data, you might be able to use the insertion index as your map key and avoid using the vector.