Insertion sort of a map - c++

I have a map (for example, of character to integer). I insert values into this map one by one. For example, here are four insertions:
1: A -> 1
2: B -> 2
3: C -> 3
4: D -> 1
I would like to sort the map keys based on their associated values. So, at each insertion, I would have the sorted output:
1: A(1)
2: B(2), A(1)
3: C(3), B(2), A(1)
4: C(3), B(2), A(1), D(1)
Furthermore, I would like to be able to overwrite existing mappings to keep the keys (characters) unique. So a fifth insertion:
5: A -> 27
Would cause the sorted output to be:
5: A(27), C(3), B(2), D(1)
One way I can do this is using a multimap. The key of the multimap would be the integer, and the values would be the characters. Each insertion into the multimap would first need to check if the character is already present in the multimap, and remove that mapping before performing the insertion. The multimap keeps the keys ordered, so that takes care of the sorting.
Is there a faster way to do this? Is there a different, more efficient container that I should be using?
EDIT
Here's the C++ STL multimap I'm using. It's handy because it keeps its elements internally ordered.
Here's a related question. I'm trying to avoid doing what's suggested in the accepted solution: creating another map.

What I would have done would be something like this. Imagine your map maps x to y. I would create a struct (class, whatever):
struct element
{
map_this_t x;
to_this_t y;
bool operator < (element &rhs)
{
return y < rhs.y; // sorted based on y
}
};
Then create a set<element> and insert into that. If you iterate over all the data in the set, you well get the data in the order you want. When inserting, first search and see if an element with x value as the one you want exists. If so, just update the y, otherwise insert a new element.

Generally a map is a shuffled data structure (at least for the associated values).
So, sorting elements of a map is meaningless.
Therefore you should use something like list or array to keep elements sorted.
I think the best solution for your issue is the simplest way: Store the elements in a list and sort it. Or you can use heaps.
UPDATE:
Maps are a kind of associative container that stores elements formed
by the combination of a key value and a mapped value.
In a map, the key value is generally used to uniquely identify the
element, while the mapped value is some sort of value associated to
this key. Types of key and mapped value may differ. For example, a
typical example of a map is a telephone guide where the name is the
key and the telephone number is the mapped value.
Internally, the elements in the map are sorted from lower to higher
key value following a specific strict weak ordering criterion set on
construction.
As associative containers, they are especially designed to be
efficient accessing its elements by their key (unlike sequence
containers, which are more efficient accessing elements by their
relative or absolute position) [1].
[1] http://www.cplusplus.com/reference/stl/map/

A map is sorted by it's keys, by definition.
The values cannot be used in the ordering predicate for a map, because the values are non-const! Changing them would invalidate the container invariants (ordering).
If you want the entries ordered by 'value', it is implicitely a key, and you likely require a
std::set<std::pair<key, value> >
Update Here is a working demo of that: https://ideone.com/SgbEN
instead. To be able to re-order by a different predicate in place, you'd need a list, vector or any other sequential container.
Edit
The list solution is far too inefficient, since I would have to sort after each insertion, and wouldn't have an effective way of keeping the keys unique.
The solution here is to use the
std::make_heap
std::push_heap
and std::sort_heap
algorithms. They work well with lists (as well as vectors for cheap value_types - or using c++0x move semantics)
If you cannot afford the sort_heap step before using the list as ordered: use
insert_point = std::lower_bound(lst.begin(), lst.end(), insertee);
st.insert(insertion_point, insertee);
to insert directly in ordered position. You can expect push_heap to be faster for many insertions, and lowerbound/insert to be faster for many accesses

just use a map <char,int> and your keys will remain unique an pairs of <char,int> will store sorted in it.
to sort the elements based on their associated values store pair<int,char> in a set too.

Related

Keep the order of unordered_map as we insert a new key

I inserted the elements to the unordered_map with this code:
myMap.insert(std::make_pair("A", 10));
myMap.insert(std::make_pair("B", 11));
myMap.insert(std::make_pair("C", 12));
myMap.insert(std::make_pair("D", 13));
But when I used this command to print the keys
for (const auto i : myMap)
{
cout << i.first << std::endl;
}
they are not in the same order as I inserted them.
Is it possible to keep the order?
No, it is not possible.
Usage of std::unordered_map doesn't give you any guarantee on element order.
If you want to keep elements sorted by map keys (as seems from your example) you should use std::map.
If you need to keep list of ordered pairs you can use std::vector<std::pair<std::string,int>>.
Not with an unordered associative data structure. However, other data structures preserve order, such as std::map which keeps the data sorted by their keys. If you search Stackoverflow a little, you will find many solutions for a data structure with fast key-based lookup and ordered access, e.g. using boost::multi_index.
If it is just about adding values to a container, and taking them out in the order of insertion, then you can go with something that models a queue, e.g. std::dequeue. Just push_back to add a new value, and pop_front to remove the oldest value. If there is no need to remove the values from the container then just go with a std::vector.
Remarkably common request without too many clean,simple solutions. But here they are:
The big library: https://www.boost.org/doc/libs/1_71_0/libs/multi_index/doc/tutorial/index.html Use unique index std::string (or is it char?) and sequenced index to retain the order.
Write it yourself using 2 STL containers: Use a combination of std::unordered_map (or std::map if you want sorted key order traverse as well) and a vector to retain the sequenced order. There are several ways to set this up, depending on the types on your keys/values. Usually keys in the map and the values in the vector. then the map is map<"key_type",int> where the int points to the element in the vector and therefore the value.
I might have a play with a simple template for a wrapper to tie the 2 STL containers together and post it here later...
I have put a proof of concept up for review here:
I went with std::list to store the order in the end, because I wanted efficient delete. But You might choose std::vector if you wanted random access by insertion order.
I used a a list of Pair pointers to avoid duplicate key storage.

Is there a linked hash set in C++?

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
set has different performance characteristics from a hash set.
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
To replicate LinkedHashSet from Java in C++, I think you will need two vanilla std::map (please note that you will get LinkedTreeSet rather than the real LinkedHashSet instead which will get O(log n) for insert and delete) for this to work.
One uses actual value as key and insertion order (usually int or long int) as value.
Another ones is the reverse, uses insertion order as key and actual value as value.
When you are going to insert, you use std::map::find in the first std::map to make sure that there is no identical object exists in it.
If there is already exists, ignore the new one.
If it does not, you map this object with the incremented insertion order to both std::map I mentioned before.
When you are going to iterate through this by order of insertion, you iterate through the second std::map since it will be sorted by insertion order (anything that falls into the std::map or std::set will be sorted automatically).
When you are going to remove an element from it, you use std::map::find to get the order of insertion. Using this order of insertion to remove the element from the second std::map and remove the object from the first one.
Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int).
In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, you won't be able to replace the second one with it though. The reason is that std::unordered map uses a hash code for indexing so that the index cannot be reliably sorted in this situation.
You might wanna know that std::map doesn't give you any sort of (log n) as in "null" lookup time. And using std::tr1::unordered is risky business because it destroys any ordering to get constant lookup time.
Try to bash a boost multi index container to be more freely about it.
The way you described your combination of std::set and std::vector sounds like what you should be doing, except by using std::unordered_set (equivalent to Java's HashSet) and std::list (doubly-linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).
The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.

Why retrieving elements from CMap is not ordered

In my application, I have a CMap of CString values. After adding the elements in the Map, if I retrieve the elements in some other place, am not getting the elements in the order of insertion.Suppose I retrieve the third element, I get the fifth like that. Is it a behavior of CMap. Why this happens?
You asked for "why", so here goes:
A Map provides for an efficient way to retrieve values by key. It does this by using a clever datastructure that is faster for this than a list or an array would be (where you have to search through the whole list before you know if an element is in there or not). There are trade-offs, such as increased memory usage, and the inability to do some other things (such as knowing in which order things were inserted).
There are two common ways to implement this
a hashmap, which puts keys into buckets by hash value.
a treemap, which arranges keys into a binary tree, according to how they are sorted
You can iterate over maps, but it will be according to how they are stored internally, either in key order (treemap) or completely unpredictable (hashmap). Your CMap seems to be a hashmap.
Either way, insertion order is not preserved. If you want that, you need an extra datastructure (such as a list).
How about read documentation to CMap?
http://msdn.microsoft.com/ru-ru/library/s897094z%28v=vs.71%29.aspx
It's unordered map really. How you retrieve elements? By GetStartPosition and GetNextAssoc? http://msdn.microsoft.com/ru-ru/library/d82fyybt%28v=vs.71%29.aspx read Remark here
Remarks
The iteration sequence is not predictable; therefore, the "first element in the map" has no special significance.
CMap is a dictionary collection class that maps unique keys to values. Once you have inserted a key-value pair (element) into the map, you can efficiently retrieve or delete the pair using the key to access it. You can also iterate over all the elements in the map.

push back for the map container

we got this map :
std::map <int, int> values;
would this function be the same as the push_back function of a Vector :
void PushBack(int value)
{
values[values.size()] = value;
}
since size returns the size of the container I thought it would be correct, according to the following scenario it is :
index 0 = 200
index 1 = 150
you want to push back 100, values.size() would return 2, right? so then, it would , just like the normal push_back go inside index 2, correct?
The whole point of maps is to lookup and store data based on a key that uniquely represents that data.
If you're doing this there's no point in using a map; you should choose another data structure that more appropriately addresses the design needs of the application.
Maps and vectors are very different.
The short version to the actual question you asked:
if all you do on your customized map is key-based lookup of already existing keys (the operator[]) and your push_back it may act like an inefficient drop-in replacement for vector where you only use the vector operator[] and push_back, yes.
The long version providing some background on why what you are doing is probably not actually what you want:
A map doesn't have an index, it has a key. A map is usually implemented as a red-black tree. Such a data structure allows for efficient lookup based on the key. You usually care about the key of a particular element, the key itself carries important information. Keys are usually not contiguous, and maps will not allocate space for keys that are not used in the map.
A vector is a contiguous block of memory. This allows for efficient indexed access. An index is not the same as a key: you generally don't care about which index a particular element gets, which index you do get depends on the order of insertion (they key is independent of the insertion order in in a map), indexes into vectors are always integer values, and you cannot have non-contiguous indexes.
If all you do in your map is your own custom push_back then to the outside it might appear to function like a vector in some contexts, and it might not in ways in other contexts (e.g. iterator invalidation).
Since you don't actually care about the key for the element that gets added in your example the choice of a map is pointless. Indexed lookup in the vector will be faster, and memory overhead will be smaller (though you could end up with memory fragmentation issues if you allocate very many objects, but that's a separate subject).
And finally, if you don't know which container class to use, vector and list are the places to start. Understand the differences between those two, and when you should use either of them, and then move on to the more advanced, specialized containers like map, set, their "multi" variants, and their "unordered" variants.
Unless you only use the map in a very special way, it won't be correct. Consider this scenario:
std::map<int, int> values;
values[1] = 42;
PushBack(7);
values now holds just one element, 7, at index 1.
The question is, of course, if you need 'push back', why use a map in the first place.
If you want push_back then consider using a std::vector. A map is an associative array, to do fast lookup by a specified type of key. They are not designed to do push_back like vector is.
Difficult to say what you want to achieve, and why you try to use map instead of vector, but better method could be:
void PushBack(int value)
{
int idx = 0;
if( values.size() ) idx = values.rbegin()->first + 1;
values[idx] = value;
}

key-value data structure to search key in O(1) and get Max value in O(1)

I need to implement a key-value data structure that search for a unique key in O(lgn) or O(1) AND get max value in O(1). I am thinking about
boost::bimap< unordered_set_of<key> ,multiset_of<value> >
Note that there is no duplicated key in my key-value data sets. However, two keys may have the same value. Therefore I used multiset to store values.
I need to insert/remove/update key-value pair frequently
How does it sounds ?
It depends on what you want to do. So it is clear that you want to use it for some get-the-maximum-values-in-an-iteration construction.
Question 1: Do you access the elements by their keys as well?
If yes, I can think of two solutions for you:
use boost::bimap - easy, tested solution with logarithmic runtimes.
create a custom container that contains an std::map (or for even faster by key access std::unordered_map) and a heap implementation (e.g. std::priority_queue<std::map<key, value>::iterator> + custom comparator) as well, keeps them in sync, etc. This is the hard way, but maybe faster. Most operations on both will be O(logn) (insert, get&pop max, get by key, remove) but the constant sometimes do matter. Although is you use std::unordered_map the access by key operation will be O(1).
You may want to write tests for the new container as well and optimize it for the operation you use the most.
If no, you really just access using elements using the maximum value
Question 2: do you insert/remove/update elements randomly or you first put in all elements in one round and then remove them one by one?
for random insert/remove/updates use std::priority_queue<std::pair<value, key>>
if you put in the elements first, and then remove them one-by-one, use and std::vector<std::pair<value, key>> and std::sort() it before the first remove operation. Do not actually remove the elements, just iterate on them.
You could build this using a std::map and a std::set.
One map to hold the actual values, i.e. std::map<key, value> m;. This is where your values are stored. Inserting elements into a map is O(log n).
A set of iterators pointing into the map; this set is sorted by the value of the respective map entry, i.e. std::set<std::map<key, value>::iterator, CompareBySecondField> s; with something like (untested):
template <class It>
struct CompareBySecondField : std::binary_function<It, It, bool> {
bool operator() ( const T &lhs, const T &rhs ) const {
return lhs->second > rhs->second;
}
};
You can then get an iterator to the map entry with the largest value using *s.begin();.
This is rather easy to build, but you have to make sure to update both containers whenever you add/remove elements.