sorting std::map by value - c++

Right now I have a map and I need to sort it by value (int), and then by key (string) if there is a tie. I know I would need to write a custom comparison function for this, but so far I haven't been able to make it work.
(I need to store my data in a map because the strings are words and the ints are their frequencies, and I will need to 'find' the pairs by searching the keys later.)

The std::map can only be sorted by key (string in your case).
If you need to sort it by value as well, you'd need to create a std::multimap with the int as key and the string as value, and populate it by iterating over the map.
Alternatively, you could create a vector<pair<int,string>>, populate it by iterating over the map, and then just use std::sort().
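A minimal sketch of the vector approach, assuming the words and counts live in a std::map<std::string, int> (the helper name sort_by_count is hypothetical). Storing each entry flipped as (count, word) lets the default std::pair ordering compare the count first and the word second, so std::sort needs no custom comparator:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copy a word -> count map into a vector of
// (count, word) pairs and sort it by count, then alphabetically by word.
std::vector<std::pair<int, std::string>>
sort_by_count(const std::map<std::string, int>& freq) {
    std::vector<std::pair<int, std::string>> items;
    items.reserve(freq.size());
    for (const auto& kv : freq)
        items.emplace_back(kv.second, kv.first);  // flip to (count, word)
    // std::pair's operator< compares .first, then .second on ties.
    std::sort(items.begin(), items.end());
    return items;
}
```

The original map is left untouched, so it can still be used for lookups by word.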

You can use a std::multiset<std::pair<int, std::string>>

With the information given, it's a bit of a guessing game, but unless you are shuffling massive amounts of data, this may do.
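#include <set>
#include <string>
using entry = std::pair<std::string, int>;
using CompareFunc = bool(*)(const entry&, const entry&);
using sortset = std::set<entry, CompareFunc>;
// themap is your std::map<std::string, int>.
// Order by value (second); on ties, order by key (first).
sortset bv(themap.begin(), themap.end(), [](const entry& a, const entry& b) {
return a.second != b.second ? a.second < b.second : a.first < b.first;
});
for (const auto& d : bv) {
// d.first is the word, d.second its count
}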

Related

How to safely "pop-front" from std::map without extra copy?

I have a std::map object.
std::map<std::string, std::string> m;
m.insert({ "abcd", "foo" });
m.insert({ "1234", "bar" });
and I want to get and remove the first element, like:
auto iter = m.begin();
auto [key, value] = std::move(*iter);
m.erase(iter);
do_something_with(key, value);
Is this considered safe?
(Moving from the iterator should make the key an empty string, which would leave m in an invalid state.)
You can use std::map::extract like this:
auto nh = m.extract(m.begin());
and then use the key and value like this:
do_something(nh.key(), nh.mapped());
This has the needed property that no extra copies are made.
Is this considered safe?
On the condition that the map isn't empty, yes.
However, note that the key will be copied, not moved. This is because the key of the map element is const.
It is possible to move from the key too, if you use the extract member function:
auto handle = m.extract(m.begin());
// if you need separate objects:
auto key = std::move(handle.key());
auto mapped = std::move(handle.mapped());
Use std::map::extract. Using the resulting node handle, move the key and value into your key/value variables.
Before std::map::extract (added in C++17), this isn't fully possible. extract was added to let you do this, along with similar operations such as splicing maps.
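Putting the extract answers together, a minimal pop-front sketch (C++17; the helper name pop_front is hypothetical, and the std::map<std::string, std::string> type is taken from the question). It checks for an empty map first, since the safety of extract(m.begin()) depends on the map not being empty:

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: remove the first (smallest-key) element of the map
// and return its key and value, moving both out of the node handle.
// Returns std::nullopt for an empty map, where extract(m.begin()) would be
// undefined behavior.
std::optional<std::pair<std::string, std::string>>
pop_front(std::map<std::string, std::string>& m) {
    if (m.empty())
        return std::nullopt;
    auto nh = m.extract(m.begin());  // unlinks the node without copying it
    // nh.key() is non-const on a node handle, so the key can be moved too.
    return std::make_pair(std::move(nh.key()), std::move(nh.mapped()));
}
```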

Loss of data while ordering an unordered_map c++

I have an unordered_map<string, int> freq and I order it by transforming it into a map<int, string> freq2. I use the following function to do that:
map<int, string> order(unordered_map<string, int> x) {
map <int, string> map;
for (auto it = x.begin(); it != x.end(); ++it) {
map.emplace(it->second, it->first);
}
return map;
}
The size of the unordered_map is 2355831 and the size of the returned map is 505, so as you can see the loss of data is quite big, and I have no idea why.
Any idea why this happens?
Thanks.
EDIT:
Thanks to all, you are all right: I have a lot of ints with the same value, which is why I lose the data (really stupid of me not to see it before).
Most likely this is because there are duplicates among the int values. Try replacing map<int, string> with multimap<int, string>.
The code itself looks fine. However, since you are mapping from string keys to integers, it may very well be that you have multiple keys with the same value.
From the documentation of emplace:
The insertion only takes place if no other element in the container has a key equivalent to the one being emplaced (keys in a map container are unique).
So if a lot of your entries in the first map have the same value (which is the key in the second map), then your dataset will decrease by a lot.
If you need to preserve those elements, then std::map is not the right container.
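A sketch of the question's order() function rewritten to use std::multimap, as suggested in the first answer, so that words sharing a count are all kept. It also takes the unordered_map by const reference to avoid copying millions of entries (a change from the original, which took it by value):

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Same idea as order() in the question, but a std::multimap keeps every
// (count, word) entry even when several words share a count.
std::multimap<int, std::string>
order(const std::unordered_map<std::string, int>& x) {
    std::multimap<int, std::string> result;
    for (const auto& kv : x)
        result.emplace(kv.second, kv.first);  // never rejected as a duplicate
    return result;
}
```

Within one count, words appear in the order they were inserted, which for an unordered_map source is unspecified. If ties should be alphabetical, a std::set<std::pair<int, std::string>> gives that order directly.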

Working with a vector of pair vectors?

I've been searching around Google, but I didn't find what I need. I'm trying to create a vector that allows me to add 3 (and later 4) variables, access them, and sort them.
I'm implementing the vector as follows for 3 variables:
std::vector<std::pair<std::string, std::pair<int, double> > > chromosomes;
To add information (variables), I'm doing:
chromosomes.emplace_back(dirp->d_name, std::make_pair(WSA, fault_percent));
How can I access each parameter and sort the elements based on WSA and fault coverage, the way I can use the members first and second with a vector of pairs?
And for 4 variables, it would be as follows?
std::vector<std::pair<std::pair<std::string, std::string>, std::pair<int, double> > > chromosomes;
chromosomes.emplace_back(std::make_pair(dirp->d_name, x), std::make_pair(WSA, fault_percent));
As suggested here, I think you should be using a vector of tuple<string, int, double>s or tuple<string, string, int, double>s respectively.
std::tuple defines operator<, which compares the elements lexicographically: it applies each element type's less-than operator, moving left to right. If a simple element-by-element comparison is sufficient, then all you need to do is call sort:
sort(chromosomes.begin(), chromosomes.end());
If tuple's operator< does not provide a sufficient comparison for your needs, sort provides an overload that takes a comparison lambda. Your lambda would need to do the following:
Take in 2 const references to the tuples
Return true if the first tuple is strictly smaller than the second tuple
Return false if the first tuple is greater than or equal to the second tuple
In the end your call would look something like this:
sort(chromosomes.begin(), chromosomes.end(), [](const auto& lhs, const auto& rhs) {
// Your comparison between the two goes here
});
If you're not familiar with tuples, you'll need to use the std::get function template to extract an element, either by index or, when the tuple contains no duplicate types, by type.
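A minimal sketch of the tuple approach, assuming the (name, WSA, fault_percent) layout from the question; the alias Chromosome and the function sort_by_wsa are hypothetical. It reads elements with std::get by index and compares two fields at once with std::tie, which builds tuples of references that use tuple's lexicographic operator<:

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Each chromosome as (name, WSA, fault_percent).
using Chromosome = std::tuple<std::string, int, double>;

// Sort by WSA (index 1), then by fault_percent (index 2) on ties.
void sort_by_wsa(std::vector<Chromosome>& chromosomes) {
    std::sort(chromosomes.begin(), chromosomes.end(),
              [](const Chromosome& lhs, const Chromosome& rhs) {
                  // std::tie builds tuples of references; their operator<
                  // compares element by element, left to right.
                  return std::tie(std::get<1>(lhs), std::get<2>(lhs)) <
                         std::tie(std::get<1>(rhs), std::get<2>(rhs));
              });
}
```

std::get also works by type when that type occurs only once in the tuple, for example std::get<std::string>(c) for the name.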
First, to access the different elements:
for (auto& x :chromosomes)
cout <<x.first<<": "<<x.second.first<<" "<<x.second.second<<endl;
Next, to sort the elements on WSA:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.second.first<y.second.first;});
If you want to sort on several criteria, for example WSA and fault_percent, you just have to change the lambda function for comparison:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.second.first<y.second.first
|| (x.second.first==y.second.first
&& x.second.second<y.second.second );});
Remark
What puzzles me, though, is why you want to use pairs of pairs, or even tuples, when you could use a clean struct, which would be easier to store and retrieve, and whose members are easier to access:
struct Chromosome {
string name;
int WSA;
double fault_percent;
};
vector <Chromosome> chromosomes;
It would be much more readable and maintainable this way:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.WSA<y.WSA
|| (x.WSA==y.WSA && x.fault_percent<y.fault_percent );});
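The struct version can go one step further: if the struct defines its own operator< (built here from std::tie, which is an assumption rather than part of the answer), a plain std::sort call needs no lambda at all. A minimal sketch, with the hypothetical helper sort_chromosomes:

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

struct Chromosome {
    std::string name;
    int WSA;
    double fault_percent;

    // Order by WSA, then by fault_percent; std::tie compares lexicographically.
    bool operator<(const Chromosome& other) const {
        return std::tie(WSA, fault_percent) <
               std::tie(other.WSA, other.fault_percent);
    }
};

// Hypothetical helper: std::sort picks up Chromosome::operator< by default.
void sort_chromosomes(std::vector<Chromosome>& chromosomes) {
    std::sort(chromosomes.begin(), chromosomes.end());
}
```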
It seems like you need a table-like data structure that allows sorting by multiple columns. C++ isn't the easiest language in which to manipulate table/matrix data structures, but here are a few links to help you get started.
An example Table class:
How to dynamically sort data by arbitrary column(s)
A vector/tuple solution, which is a slightly cleaner version of what you're currently working on:
sorting table in place using stl sort
A lengthy discussion of this problem, which might give you some additional ideas:
https://softwareengineering.stackexchange.com/questions/188130/what-is-the-best-way-to-store-a-table-in-c

How can I sort a std::map first by value, then by key?

I need to sort a std::map by value, then by key. The map contains data like the following:
1 realistically
8 really
4 reason
3 reasonable
1 reasonably
1 reassemble
1 reassembled
2 recognize
92 record
48 records
7 recs
I need to get the values in order, but the kicker is that the keys need to be in alphabetical order after the values are in order. How can I do this?
std::map will sort its elements by keys. It doesn't care about the values when sorting.
You can use std::vector<std::pair<K,V>>, then sort it by key using std::sort, followed by std::stable_sort by value:
std::vector<std::pair<K,V>> items;
//fill items
//sort by key using std::sort
std::sort(items.begin(), items.end(), key_comparer);
//sort by value using std::stable_sort
std::stable_sort(items.begin(), items.end(), value_comparer);
The first sort can use std::sort, which is O(n log n); the second must use std::stable_sort, which is O(n log n) when enough extra memory is available and O(n (log n)^2) in the worst case.
Note that while std::sort is chosen for performance reasons, std::stable_sort is needed for correct ordering: it keeps the order-by-key among items with equal values. (If items is filled from a std::map, it is already in key order, so the first sort can be skipped.)
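For the two-pass approach, the pass on the secondary criterion (the word) has to come first and the stable pass on the primary criterion (the count) last, because a stable sort only preserves the existing order among elements it considers equal. A minimal sketch using (word, count) pairs; the names Item and sort_value_then_key are hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Item = std::pair<std::string, int>;  // (word, count)

// Two-pass sort: secondary key first, primary key last.
void sort_value_then_key(std::vector<Item>& items) {
    // Pass 1: order by key (the word). Any sort will do.
    std::sort(items.begin(), items.end(),
              [](const Item& a, const Item& b) { return a.first < b.first; });
    // Pass 2: order by value (the count). stable_sort keeps the
    // alphabetical order from pass 1 among items with equal counts.
    std::stable_sort(items.begin(), items.end(),
                     [](const Item& a, const Item& b) { return a.second < b.second; });
}
```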
As @gsf noted in the comments, you could use only std::sort if you choose a comparer that compares the values first and, only if they're equal, compares the keys:
auto cmp = [](std::pair<K,V> const & a, std::pair<K,V> const & b)
{
return a.second != b.second? a.second < b.second : a.first < b.first;
};
std::sort(items.begin(), items.end(), cmp);
That should be efficient.
But wait, there is a better approach: store std::pair<V,K> instead of std::pair<K,V>, and then you don't need a comparer at all. The standard comparison for std::pair is enough, as it compares first (the value V) and then, on ties, second (the key K):
std::vector<std::pair<V,K>> items;
//...
std::sort(items.begin(), items.end());
That should work great.
You can use std::set instead of std::map.
You can store both the value and the key in a std::pair (value first), so the type of the container looks like this:
std::set< std::pair<int, std::string> > items;
std::set will then order its elements by the original values first and by the original keys second.
As explained in Nawaz's answer, you cannot sort your map by itself as you need it, because std::map sorts its elements based on the keys only. So, you need a different container, but if you have to stick to your map, then you can still copy its content (temporarily) into another data structure.
I think, the best solution is to use a std::set storing flipped key-value pairs as presented in ks1322's answer.
The std::set is sorted by default and the order of the pairs is exactly as you need it:
If lhs.first < rhs.first, returns true. Otherwise, if rhs.first < lhs.first, returns false. Otherwise, if lhs.second < rhs.second, returns true. Otherwise, returns false.
This way you don't need an additional sorting step and the resulting code is quite short:
#include <iomanip>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
int main() {
std::map<std::string, int> m; // Your original map.
m["realistically"] = 1;
m["really"] = 8;
m["reason"] = 4;
m["reasonable"] = 3;
m["reasonably"] = 1;
m["reassemble"] = 1;
m["reassembled"] = 1;
m["recognize"] = 2;
m["record"] = 92;
m["records"] = 48;
m["recs"] = 7;
std::set<std::pair<int, std::string>> s; // The new (temporary) container.
for (auto const &kv : m)
s.emplace(kv.second, kv.first); // Flip the pairs.
for (auto const &vk : s)
std::cout << std::setw(3) << vk.first << std::setw(15) << vk.second << std::endl;
}
Output:
1 realistically
1 reasonably
1 reassemble
1 reassembled
2 recognize
3 reasonable
4 reason
7 recs
8 really
48 records
92 record
Note: Since C++17 you can use range-based for loops together with structured bindings for iterating over a map.
As a result, the code for copying your map becomes even shorter and more readable:
for (auto const &[k, v] : m)
s.emplace(v, k); // Flip the pairs.
std::map already sorts its keys using a predicate you define, or std::less if you don't provide one. std::set also stores its items in the order defined by a comparator. However, neither set nor map allows duplicate keys. I would suggest defining a std::map<int, std::set<std::string>> if you want to accomplish this using your data structure alone. You should also realize that std::less for strings compares lexicographically by character code, not alphabetically (for example, "Zebra" sorts before "apple").
EDIT: The other two answers make a good point. I'm assuming that you want to order them into some other structure, or in order to print them out.
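A minimal sketch of the std::map<int, std::set<std::string>> suggestion, assuming the input is a std::map<std::string, int> of word counts (the helper name group_by_count is hypothetical). Iterating the result visits counts in ascending order and, within each count, the words in std::less order:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical helper: group words by count. The outer std::map keeps the
// counts ascending; each inner std::set keeps its words in std::less order.
std::map<int, std::set<std::string>>
group_by_count(const std::map<std::string, int>& freq) {
    std::map<int, std::set<std::string>> grouped;
    for (const auto& kv : freq)
        grouped[kv.second].insert(kv.first);  // creates the set on first use
    return grouped;
}
```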
"Best" can mean a number of different things. Do you mean "easiest," "fastest," "most efficient," "least code," "most readable?"
The most obvious approach is to loop through twice. On the first pass, order the values:
if(current_value > examined_value)
{
current_value = examined_value
(and then swap them, however you like)
}
Then on the second pass, alphabetize the words, but only if their values match.
if(current_value == examined_value)
{
(alphabetize the two)
}
Strictly speaking, this is a "bubble sort", which is slow (O(n^2) comparisons) because it repeatedly passes over the whole list, swapping adjacent elements that are out of order. The sort is finished when a pass gets through the whole list without making any swaps.
There are other sorting algorithms, but the principle would be the same: order by value, then alphabetize.
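For illustration only, a bubble-sort sketch over (word, count) pairs. Unlike the two passes described above, it folds both criteria into one comparison (count first, then word), which gives the same final order in a single sorting loop; the names Entry, comes_before, and bubble_sort are hypothetical. In real code, std::sort with the same comparison is the better choice:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, count)

// True if a belongs before b: lower count first, then alphabetical.
bool comes_before(const Entry& a, const Entry& b) {
    if (a.second != b.second)
        return a.second < b.second;
    return a.first < b.first;
}

// Bubble sort: repeatedly swap adjacent out-of-order entries until a full
// pass makes no swaps. O(n^2), shown for illustration only.
void bubble_sort(std::vector<Entry>& v) {
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = 1; i < v.size(); ++i) {
            if (comes_before(v[i], v[i - 1])) {
                std::swap(v[i], v[i - 1]);
                swapped = true;
            }
        }
    }
}
```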

predicate for a map from string to int

I have this small program that reads a line of input and prints the words in it, with their respective number of occurrences. I want to sort the elements in the map that stores these values according to their occurrences. I mean, the words that appear only once will be ordered at the beginning, then the words that appear twice, and so on. I know that the predicate should return a bool value, but I don't know what the parameters should be. Should they be two iterators to the map? If someone could explain this, it would be greatly appreciated. Thank you in advance.
#include<iostream>
#include<map>
#include<string>
using std::cout;
using std::cin;
using std::endl;
using std::string;
using std::map;
int main()
{
string s;
map<string,int> counters; //store each word & an associated counter
//read the input, keeping track of each word & how often we see it
while(cin>>s)
{
++counters[s];
}
//write the words & associated counts
for(map<string,int>::const_iterator iter = counters.begin();iter != counters.end();iter++)
{
cout<<iter->first<<"\t"<<iter->second<<endl;
}
return 0;
}
std::map is always sorted according to its key. You cannot sort the elements by their value.
You need to copy the contents to another data structure (for example std::vector<std::pair<string, int> >) which can be sorted.
Here is a predicate that can be used to sort such a vector. Note that sorting algorithms in C++ standard library need a "less than" predicate which basically says "is a smaller than b".
bool cmp(std::pair<string, int> const &a, std::pair<string, int> const &b) {
return a.second < b.second;
}
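A minimal sketch of how this predicate might be used with the question's counters map; the helper name by_occurrences is hypothetical. Copying the map into a vector and using std::stable_sort (rather than std::sort) keeps words with equal counts in the map's alphabetical order:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// The predicate from the answer: "is a's count smaller than b's?"
bool cmp(std::pair<std::string, int> const &a,
         std::pair<std::string, int> const &b) {
    return a.second < b.second;
}

// Hypothetical helper: copy the counters into a vector and sort by count.
std::vector<std::pair<std::string, int>>
by_occurrences(const std::map<std::string, int>& counters) {
    std::vector<std::pair<std::string, int>> words(counters.begin(), counters.end());
    // The vector starts in the map's alphabetical order; stable_sort keeps
    // that order among words with the same count.
    std::stable_sort(words.begin(), words.end(), cmp);
    return words;
}
```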
You can't re-sort a map; its order is predefined (by default, by std::less on the key type). The easiest solution to your problem would be to create a std::multimap<int, string>, insert your values there, and then loop over the multimap. It is ordered on its key type (int, the number of occurrences), which gives you the order you want without having to define a predicate.
You are not going to be able to do this with one pass with an std::map. It can only be sorted on one thing at a time, and you cannot change the key in-place. What I would recommend is to use the code you have now to maintain the counters map, then use std::max_element with a comparison function that compares the second field of each std::pair<string, int> in the map.
A map keeps its keys sorted, not its values; that's what makes the map efficient. You cannot sort it by occurrences without using another data structure (maybe a reversed index!).
As stated, it simply won't work -- a map always remains sorted by its key value, which would be the strings.
As others have noted, you can copy the data to some other structure, and sort by the value. Another possibility would be to use a Boost bimap instead. I've posted a demo of the basic idea previously.
You probably want to copy the map<string,int> into a vector<pair<string,int> > and then sort the vector on the int member.
You could do
struct PairLessSecond
{
template< typename P >
bool operator()( const P& pairLeft, const P& pairRight ) const
{
return pairLeft.second < pairRight.second;
}
};
You can probably also construct all this somehow using a lambda with a bind.
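Now copy the map's contents into a vector and sort it. The element type must be a non-const pair, because std::sort needs to assign elements and map<string,int>::value_type is pair<const string, int>:
std::vector< std::pair<std::string,int> > byCount( counters.begin(), counters.end() );
std::sort( byCount.begin(), byCount.end(), PairLessSecond() );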