I'm implementing a multi-index map in C++11, which I want to be optimized for specific features. The problem I'm currently trying to solve, is to not store key elements more then once. But let me explain.
The problem arose from sorting histograms to overlay them in different combinations. The histograms had names, which could be split into tokens (properties).
Here are the features I want my property map to have:
Be able to loop over properties in any order;
Be able to return container with unique values for each property;
Accumulate properties' values in the order they arrive, but to be able to sort properties using a custom comparison operator after the map is filled;
I have a working implementation in C++11 using std::unordered_map with std::tuple as key_type. I'm accumulating property values as they arrive into a tuple of forward_lists. The intended use, is to iterate over the lists to compose keys.
The optimization I would like to introduce, is to only store properties' value in the lists, and not store them in tuples used as keys in the map. I'd like to maintain ability to have functions returning const references to lists of property values, instead of lists of some wrappers.
I know that boost::multi_index has similar functionality, but I don't need the overhead of sorting as the keys arrive. I'd like to have new property values stored sequentially, and only be sortable postfactum. I've also looked at boost::flyweight, but in the simplest approach, the lists will then be of flyweight<T> instead of T, and I'd like to not do that. (If that IS the best solution, I could definitely live with it.)
I know that lists are stable, i.e. once an element is created, its pointer and iterator remain valid, even after invoking list::sort(). Knowing that, can something be done to the map, to eliminate redundant copies of tuple elements? Could a custom map allocator help here?
Thanks for suggestions.
Have your map be from tuples of iterators to your prop containers.
Write a hash the dereferences the iterators and combines the result.
Replace the forward list prop containers with sets that first order on hash, then contents.
Do lookup by first finding in set, then doing lookup in hash.
If you need a different order for props, have another container of set iterators.
Related
I have a map which stores <int, char *>. Now I want to retrieve the elements in the order they have been inserted. std::map returns the elements ordered by the key instead. Is it even possible?
If you are not concerned about the order based on the int (IOW you only need the order of insertion and you are not concerned with the normal key access), simply change it to a vector<pair<int, char*>>, which is by definition ordered by the insertion order (assuming you only insert at the end).
If you want to have two indices simultaneously, you need, well, Boost.MultiIndex or something similar. You'd probably need to keep a separate variable that would only count upwards (would be a steady counter), though, because you could use .size()+1 as new "insertion time key" only if you never erased anything from your map.
Now I want to retrieve the elements in the order they have been inserted. [...] Is it even possible?
No, not with std::map. std::map inserts the element pair into an already ordered tree structure (and after the insertion operation, the std::map has no way of knowing when each entry was added).
You can solve this in multiple ways:
use a std::vector<std::pair<int,char*>>. This will work, but not provide the automatic sorting that a map does.
use Boost stuff (#BartekBanachewicz suggested Boost.MultiIndex)
Use two containers and keep them synchronized: one with a sequential insert (e.g. std::vector) and one with indexing by key (e.g. std::map).
use/write a custom container yourself, so that supports both type of indexing (by key and insert order). Unless you have very specific requirements and use this a lot, you probably shouldn't need to do this.
A couple of options (other than those that Bartek suggested):
If you still want key-based access, you could use a map, along with a vector which contains all the keys, in insertion order. This gets inefficient if you want to delete elements later on, though.
You could build a linked list structure into your values: instead of the values being char*s, they're structs of a char* and the previously- and nextly-inserted keys**; a separate variable stores the head and tail of the list. You'll need to do the bookkeeping for that on your own, but it gives you efficient insertion and deletion. It's more or less what boost.multiindex will do.
** It would be nice to store map iterators instead, but this leads to circular definition problems.
Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
set has different performance characteristics from a hash set.
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
To replicate LinkedHashSet from Java in C++, I think you will need two vanilla std::map (please note that you will get LinkedTreeSet rather than the real LinkedHashSet instead which will get O(log n) for insert and delete) for this to work.
One uses actual value as key and insertion order (usually int or long int) as value.
Another ones is the reverse, uses insertion order as key and actual value as value.
When you are going to insert, you use std::map::find in the first std::map to make sure that there is no identical object exists in it.
If there is already exists, ignore the new one.
If it does not, you map this object with the incremented insertion order to both std::map I mentioned before.
When you are going to iterate through this by order of insertion, you iterate through the second std::map since it will be sorted by insertion order (anything that falls into the std::map or std::set will be sorted automatically).
When you are going to remove an element from it, you use std::map::find to get the order of insertion. Using this order of insertion to remove the element from the second std::map and remove the object from the first one.
Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int).
In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, you won't be able to replace the second one with it though. The reason is that std::unordered map uses a hash code for indexing so that the index cannot be reliably sorted in this situation.
You might wanna know that std::map doesn't give you any sort of (log n) as in "null" lookup time. And using std::tr1::unordered is risky business because it destroys any ordering to get constant lookup time.
Try to bash a boost multi index container to be more freely about it.
The way you described your combination of std::set and std::vector sounds like what you should be doing, except by using std::unordered_set (equivalent to Java's HashSet) and std::list (doubly-linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).
The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.
I'm looking for a function in C++ that for swap the contents of a map ...
that is:
those that were the keys now become the items and those that the items were now the keys.
Can you tell me if there is something about this?
As Geoffroy said, std::map doesn't allow this behaviour. However, you might want to use a STL-like container Boost.Bimap - bidirectorial map.
A Bimap is a data structure that represents bidirectional relations between elements of two collections. The container is designed to work as two opposed STL maps. A bimap between a collection X and a collection Y can be viewed as a map from X to Y (this view will be called the left map view) or as a map from Y to X (known as the right map view).
There's no standard method / way to do this, you have to write your own function.
It's not something very hard to do, but first think about doing it in a different way.
If you have to invert your key/values, then you're code may be ill-though, you don't keep the logic of the container.
If you want more information, explain why you want to do this.
Insert the items in the map into a multimap - value first, key second, with appropriate comparison function that compares two values of the original map. Once all value-key items are inserted, the multimap will be sorted as intended. Job done!
we have a collection of key and value pairs. We are in the need for a container which can help us to retrieve the value o(1) but also remember the insertion order so that when we do iteration, we could iterate like a inserting order. Since the key is a string, we will not able to use a set or similar structure.
Currently we have defined our own collection class which contains a list, also a map and the values are stored into 2 different structure.
Are there any readily available implementation available?
Sounds like you need a Boost Multi-Index container.
Let's say I have a collection of Person objects, each of which looks like this:
class Person
{
string Name;
string UniqueID;
}
Now, the objects must be stored in a container which allows me to order them so that I can given item X easily locate item X+1 and X-1.
However, I also need fast access based on the UniqueID, as the collection will be large and a linear search won't cut it.
My current 'solution' is to use a std::list in conjunction with a std::map. The list holds the Persons (for ordered access) and the map is used to map UniqueID to a reference to the list item. Updating the 'container' typically involves updating both map and list.
It works, but I feel there should be a smarter way of doing it, maybe boost:bimap. Suggestions?
EDIT: There's some confusion about my requirement for "ordering". To explain, the objects are streamed in sequentially from a file, and the 'order' of items in the container should match the file order. The order is unrelated to the IDs.
boost:bimap is the most obvious choice. bimap is based on boost::multi_index, but bimap has simplified syntax. Personally I will prefer boost::multi_index over boost::bimap because it will allow to easily add more indices to the Person structure in the future.
There is no Standard Library container that does what you want - so you will have to use two containers or the Boost solution. If using two containers, I would normally prefer a vector or a deque over a list, in almost all circumstances.
Why not to use two maps , one having Person as Key and another one having UniqueId as Key, but that requires updating both of them.
you can create a callback function which updates both the maps whenever there is any change.