Working with a vector of pair vectors? - c++

I've been searching Google but haven't found what I need. I'm trying to create a vector that lets me store 3 variables per element (later I'll need 4), access them, and sort them.
I'm implementing the vector as follows for 3 variables:
std::vector<std::pair<std::string, std::pair<int, double> > > chromosomes;
To add information (variables), I'm doing:
chromosomes.emplace_back(dirp->d_name, std::make_pair(WSA, fault_percent));
How can I access each parameter and sort the elements by WSA and fault coverage, the way I can use the members first and second with a vector of pairs?
And for 4 variables, would it be as follows?
std::vector<std::pair<std::pair<std::string, std::string>, std::pair<int, double> > > chromosomes;
chromosomes.emplace_back(std::make_pair(dirp->d_name, x), std::make_pair(WSA, fault_percent));

As suggested here I think you should be using a vector of tuple<string, int, double>s or tuple<string, string, int, double>s respectively.
There is an operator< defined for tuples which compares the elements left to right, using the less-than operator of each element type. If a simple comparison of each element is sufficient then all you'll need to do is call sort:
sort(chromosomes.begin(), chromosomes.end());
If the tuple operator< does not provide a sufficient comparison for your needs, sort provides an overload which takes a comparison lambda. Your lambda would need to do the following:
Take in 2 const references to the tuples
Return true if the first tuple is strictly smaller than the second tuple
Return false if the first tuple is greater or equal to the second tuple
In the end your call would look something like this:
sort(chromosomes.begin(), chromosomes.end(), [](const auto& lhs, const auto& rhs) {
// Your comparison between the two goes here
});
If you're not familiar with tuples: you'll need the std::get function template to extract an element, either by index or by type. Extracting by type only works when the tuple does not contain that type more than once.
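As a rough sketch of the tuple approach described above: the alias Chromosome and the function name sort_by_wsa_then_fault are made up for illustration, and the field order (name, WSA, fault_percent) is an assumption taken from the question. The comparator ignores the name and orders by WSA, then by fault_percent.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical alias: (name, WSA, fault_percent), mirroring the question's three values.
using Chromosome = std::tuple<std::string, int, double>;

// Sort by WSA ascending; ties are broken by fault_percent ascending.
void sort_by_wsa_then_fault(std::vector<Chromosome>& chromosomes) {
    std::sort(chromosomes.begin(), chromosomes.end(),
              [](const Chromosome& lhs, const Chromosome& rhs) {
                  // std::get<1> is the int (WSA), std::get<2> is the double (fault_percent).
                  // std::tie builds tuples of references, which compare left to right.
                  return std::tie(std::get<1>(lhs), std::get<2>(lhs)) <
                         std::tie(std::get<1>(rhs), std::get<2>(rhs));
              });
}
```

Elements would be added with chromosomes.emplace_back(name, WSA, fault_percent), and a single field read back with std::get<0>(chromosomes[i]) and so on.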

First to access to the different elements:
for (auto& x :chromosomes)
cout <<x.first<<": "<<x.second.first<<" "<<x.second.second<<endl;
Next, to sort the elements on WSA:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.second.first<y.second.first;});
If you want to sort on several criteria, for example WSA and fault_percent, you just have to change the lambda function for comparison:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.second.first<y.second.first
|| (x.second.first==y.second.first
&& x.second.second<y.second.second );});
Here is an online demo
Remark
Now what puzzles me, is why you want to use pairs of pairs or even tuples, when you could use a clean struct which would be easier to store/retrieve, and access its members:
struct Chromosome {
string name;
int WSA;
double fault_percent;
};
vector <Chromosome> chromosomes;
It would be much more readable and maintainable this way:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.WSA<y.WSA
|| (x.WSA==y.WSA && x.fault_percent<y.fault_percent );});
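One possible way to shorten the multi-criteria comparison for the struct version is std::tie, which compares the listed members left to right. This is only a sketch; the function names by_wsa_then_fault and sort_chromosomes are invented for the example.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

struct Chromosome {
    std::string name;
    int WSA;
    double fault_percent;
};

// Orders by WSA, then by fault_percent; std::tie replaces the hand-written || / && chain.
bool by_wsa_then_fault(const Chromosome& x, const Chromosome& y) {
    return std::tie(x.WSA, x.fault_percent) < std::tie(y.WSA, y.fault_percent);
}

void sort_chromosomes(std::vector<Chromosome>& chromosomes) {
    std::sort(chromosomes.begin(), chromosomes.end(), by_wsa_then_fault);
}
```

Adding a fourth member later only means adding it to the struct and, if it should take part in the ordering, to both std::tie calls.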

It seems like you need a table-like data structure that allows sorting by multiple columns. C++ isn't the easiest language to manipulate table/matrix data structures in, but here are a few links to help you get started.
An example Table class:
How to dynamically sort data by arbitrary column(s)
A vector/tuple solution, which is a slightly cleaner version of what you're currently working on:
sorting table in place using stl sort
A lengthy discussion of this problem, which might give you some additional ideas:
https://softwareengineering.stackexchange.com/questions/188130/what-is-the-best-way-to-store-a-table-in-c

Related

Better way to get common keys from 2 std::maps

I have references to 2 maps of type std::map<std::string, int>. I want to create a master list containing all the keys that both maps have in common. My current solution is as follows, but I am curious if there is a more efficient way of approaching this problem?
const std::map<std::string, int>& map1;
const std::map<std::string, int>& map2;
std::vector<std::string> shared_keys;
// only add to master list if both contain the string as a key
for (auto& entry : map1) {
if (map2.find(entry.first) != map2.end()) {
shared_keys.push_back(entry.first);
}
}
It would be nice if I could forgo the for loop entirely / do this as a "one-liner", but not sure how to accomplish that...
std::map is sorted, so you can just use std::set_intersection.
You'll need a custom comparator, since you're only comparing keys ... and then you need an adapter to only use the key in the output iterator ...
A one-liner is pushing it, unless you use something like the Boost.Iterator adapters. Rough sketch (untested):
template <typename K, typename V>
vector<K> map_key_intersection(map<K,V> const &a, map<K,V> const &b)
{
vector<K> result;
using Elem = typename map<K,V>::value_type;
set_intersection(a.begin(), a.end(),
b.begin(), b.end(),
boost::make_function_output_iterator(
[&result](Elem const &e) { result.push_back(e.first); }),
[](Elem const& a, Elem const& b) { return a.first < b.first; });
return result;
}
NB, there are several things wrong with this in practice, even apart from the fact that ranges are a better approach if you have access:
The std::map has more than two template parameters. So, add the Compare and Allocator params to your list.
What if they had different Compare types? Now we might not meet the requirements of set_intersection.
What if they have the same Compare type, but were constructed with a stateful comparator that does a different thing for each instance? Weird, but possible ... and we still don't meet the ordering constraint, but it's more expensive to check.
So, to be exactly correct, you should use eg. a.value_comp() instead of the bare operator<, but you also need to be reasonably sure that both maps use the same ordering. At least, you should add a comment to the effect that it's your client's problem if they don't.
You can use std::set_intersection, although as a one-liner you will also get the values from one of the maps (the first one).
std::vector<std::pair<const std::string, int>> shared;
std::set_intersection(map1.begin(), map1.end(), map2.begin(), map2.end(), std::back_inserter(shared), map1.value_comp());
With C++20's ranges library (or a similar library for earlier standards), you can intersect just the keys.
std::ranges::set_intersection(std::views::keys(map1), std::views::keys(map2), std::back_inserter(shared_keys));
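If neither Boost nor C++20 is available, the same idea can be written as a plain merge-style walk over the two sorted maps. The sketch below is not from the answers above; the name common_keys is made up, and it assumes both maps use the default comparator so that their key orderings agree.

```cpp
#include <map>
#include <string>
#include <vector>

// Collects the keys present in both maps by stepping through the two
// sorted sequences together. Assumes both maps order keys the same way
// (true when both use the default std::less).
template <typename K, typename V>
std::vector<K> common_keys(const std::map<K, V>& a, const std::map<K, V>& b) {
    std::vector<K> result;
    auto ia = a.begin();
    auto ib = b.begin();
    const auto less = a.key_comp();
    while (ia != a.end() && ib != b.end()) {
        if (less(ia->first, ib->first)) {
            ++ia;                       // key only in a so far
        } else if (less(ib->first, ia->first)) {
            ++ib;                       // key only in b so far
        } else {
            result.push_back(ia->first); // equivalent keys: shared
            ++ia;
            ++ib;
        }
    }
    return result;
}
```

This visits each element at most once, so it is linear in the combined size of the maps, whereas the find-based loop in the question does a logarithmic lookup per element of map1.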

How to implement something like std::copy_if but apply a function before inserting into a different container

Full disclosure, this may be a hammer and nail situation: trying to use STL algorithms when none are needed. I have seen a recurring pattern in some C++14 code I am working with. We have a container that we iterate through, and if the current element matches some condition, then we copy one of the element's fields to another container.
The pattern is something like:
for (auto it = std::begin(foo); it!=std::end(foo); ++it){
auto x = it->Some_member;
// Note: the check usually uses the field that would be added to the new container.
if(f(x) && g(x)){
bar.emplace_back(x);
}
}
The idea is almost an accumulate where the function being applied does not always return a value. I can only think of solutions that either
Require a function for accessing the member you want to accumulate and another function for checking the condition, i.e. How to combine std::copy_if and std::transform?
Are worse than the thing I want to replace.
Is this even a good idea?
A quite general solution to your issue would be the following (working example):
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
template<typename It, typename MemberType, typename Cond, typename Do>
void process_filtered(It begin, It end, MemberType iterator_traits<It>::value_type::*ptr, Cond condition, Do process)
{
for(It it = begin; it != end; ++it)
{
if(condition((*it).*ptr))
{
process((*it).*ptr);
}
}
}
struct Data
{
int x;
int y;
};
int main()
{
// thanks to iterator_traits, vector could also be an array;
// kudos to @Yakk-AdamNevraumont
vector<Data> lines{{1,2},{4,3},{5,6}};
// filter even numbers from Data::x and output them
process_filtered(std::begin(lines), std::end(lines), &Data::x, [](int n){return n % 2 == 0;}, [](int n){cout << n;});
// output is 4, the only x value that is even
return 0;
}
It does not use STL, that is right, but you merely pass an iterator pair, the member to lookup and two lambdas/functions to it that will first filter and second use the filtered output, respectively.
I like your general solutions but here you do not need to have a lambda that extracts the corresponding attribute.
Clearly, the code can be refined to work with const_iterator but for a general idea, I think, it should be helpful. You could also extend it to have a member function that returns a member attribute instead of a direct member attribute pointer, if you'd like to use this method for encapsulated classes.
Sure. There are a bunch of approaches.
1. Find a library with transform_if, like boost.
2. Find a library with transform_range, which takes a transformation and range or container and returns a range with the value transformed. Compose this with copy_if.
3. Find a library with filter_range like the above. Now, use std::transform with your filtered range.
4. Find one with both, and compose filtering and transforming in the appropriate order. Now your problem is just copying (std::copy or whatever).
5. Write your own back-inserter wrapper that transforms while inserting. Use that with std::copy_if.
6. Write your own range adapters, like 2, 3, and/or 4.
7. Write transform_if.
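For the last option, a hand-written transform_if might look roughly like the sketch below. It is not a standard algorithm; the parameter order, the Data struct, and the helper even_xs are assumptions made for illustration. Matching the loop in the question, it projects the element first and then tests the projected value.

```cpp
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the standard library): project each
// element, then write the projected value to out if the predicate accepts it.
template <typename InputIt, typename OutputIt, typename Proj, typename Pred>
OutputIt transform_if(InputIt first, InputIt last, OutputIt out, Proj proj, Pred pred) {
    for (; first != last; ++first) {
        auto&& value = proj(*first);
        if (pred(value)) {
            *out++ = value;
        }
    }
    return out;
}

struct Data {
    int x;
    int y;
};

// Example use: collect the even x fields of a sequence of Data.
std::vector<int> even_xs(const std::vector<Data>& lines) {
    std::vector<int> bar;
    transform_if(lines.begin(), lines.end(), std::back_inserter(bar),
                 [](const Data& d) { return d.x; },
                 [](int n) { return n % 2 == 0; });
    return bar;
}
```

Whether this reads better than the original for loop is a judgment call; it mostly pays off when the same filter-and-project pattern appears in many places.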

sorting std::map by value

Right now I have a map and I need to sort it by value(int), and then by key(string) if there is a tie. I know I would need to write a customized comparison function for this, however so far I haven't been able to make it work.
(I need to store my data in a map since the strings are words and the ints are their frequencies, and I will need to 'find' the pairs by searching the keys later)
The std::map can only be sorted by key (string in your case).
If you need to sort it by value as well, you'd need to create a std::multimap with the int as key and the string as value, and populate it by iterating over the map.
Alternatively, you could also create a vector<pair<int,string>> that you populate by iteration over the map and just use std::sort().
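A minimal sketch of the vector alternative, assuming a word-to-frequency map like the one in the question (the function name sorted_by_count is made up). Because the count is stored first in each pair, the default pair comparison already orders by count and then by word, so no custom comparator is needed.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Copies the map into a vector of (count, word) pairs and sorts it.
// std::pair's operator< compares the count first, then the word on ties.
std::vector<std::pair<int, std::string>>
sorted_by_count(const std::map<std::string, int>& freq) {
    std::vector<std::pair<int, std::string>> entries;
    entries.reserve(freq.size());
    for (const auto& kv : freq) {
        entries.emplace_back(kv.second, kv.first);
    }
    std::sort(entries.begin(), entries.end());
    return entries;
}
```

The original map stays untouched, so lookups by word keep working while the vector provides the sorted view.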
You can use a std::multiset<std::pair<int, std::string>>
With the information given, it's a bit of a guessing game, but unless you are shuffling massive amounts of data, this may do.
using entry = std::pair<std::string, int>;
using CompareFunc = bool(*)(const entry&, const entry&);
using sortset = std::set<entry, CompareFunc>;
sortset bv(themap.begin(), themap.end(), [](const entry& a, const entry& b){ return a.second!=b.second ? a.second<b.second : a.first<b.first; });
for(const auto& d : bv) {
//
}

std::sort to sort an array and a list of index?

I have a function that takes two vectors of the same size as parameters :
void mysort(std::vector<double>& data, std::vector<unsigned int>& index)
{
// For example :
// The data vector contains : 9.8 1.2 10.5 -4.3
// The index vector contains : 0 1 2 3
// The goal is to obtain for the data : -4.3 1.2 9.8 10.5
// The goal is to obtain for the index : 3 1 0 2
// Using std::sort and minimizing copies
}
How to solve that problem minimizing the number of required copies ?
An obvious way would be to make a single vector of std::pair<double, unsigned int> and specify the comparator by [](std::pair<double, unsigned int> x, std::pair<double, unsigned int> y){return x.first < y.first;} and then to copy the results in the two original vectors but it would not be efficient.
Note : the signature of the function is fixed, and I cannot pass a single vector of std::pair.
Inside the function, make a vector positions = [0, 1, 2, 3, ...].
Sort positions with the comparator [&data](int x, int y){ return data[x] < data[y]; }.
Then iterate over positions, doing result.push_back(index[*it]);
This assumes the values in index can be arbitrary. If index is guaranteed to already be [0, 1, 2, ...] as in your example, then you don't need to make the positions array; just use index in its place and skip the last copy.
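The positions idea could be sketched as follows, keeping the fixed signature from the question. This version handles arbitrary contents in index, at the cost of one temporary vector of positions plus one temporary copy of each input; the local names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

void mysort(std::vector<double>& data, std::vector<unsigned int>& index) {
    assert(data.size() == index.size());

    // positions = 0, 1, 2, ... then sorted by the data value each one refers to.
    std::vector<std::size_t> positions(data.size());
    std::iota(positions.begin(), positions.end(), std::size_t{0});
    std::sort(positions.begin(), positions.end(),
              [&data](std::size_t x, std::size_t y) { return data[x] < data[y]; });

    // Apply the resulting permutation to both vectors.
    std::vector<double> sorted_data;
    std::vector<unsigned int> sorted_index;
    sorted_data.reserve(data.size());
    sorted_index.reserve(index.size());
    for (std::size_t p : positions) {
        sorted_data.push_back(data[p]);
        sorted_index.push_back(index[p]);
    }
    data.swap(sorted_data);
    index.swap(sorted_index);
}
```

With the question's input (data 9.8 1.2 10.5 -4.3 and index 0 1 2 3) this yields data -4.3 1.2 9.8 10.5 and index 3 1 0 2.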
http://www.boost.org/doc/libs/1_52_0/libs/iterator/doc/index.html#iterator-facade-and-adaptor
Write an iterator over std::pair<double&, unsigned int&> that actually wraps a pair of iterators into each vector. The only tricky part is making sure that std::sort realizes that the result is a random access iterator.
If you can't use boost, just write the equivalent yourself.
Before doing this, determine if it is worth your bother. A zip, sort and unzip is easier to write, and programmer time can be exchanged for performance in lots of spots: until you know where it is optimally spent, maybe you should just do a good-enough job and then benchmark where you need to speed things up.
You can use a custom iterator class, which iterates over both vectors in parallel. Its internal members would consist of
Two references (or pointers), one for each vector
An index indicating the current position
The value type of the iterator should be a pair<double, unsigned>. This is because std::sort will not only swap items, but in some cases also temporarily store single values. I wrote more details about this in section 3 of this question.
The reference type has to be some class which again holds references to both vectors and a current index. So you might make the reference type the same as the iterator type, if you are careful. The operator= of the reference type must allow assignment from the value type. And the swap function should be specialized for this reference, to allow swapping such list items in place, by swapping for both lists separately.
You can use a functor class to hold a reference to the value array and use it as the comparator to sort the index array. Then copy the values to a new value array and swap the contents.
struct Comparator
{
Comparator(const std::vector<double> & data) : m_data(data) {}
// Compare two indices by the data values they refer to
bool operator()(unsigned int left, unsigned int right) const { return m_data[left] < m_data[right]; }
const std::vector<double> & m_data;
};
void mysort(std::vector<double>& data, std::vector<unsigned int>& index)
{
std::sort(index.begin(), index.end(), Comparator(data));
std::vector<double> result;
result.reserve(data.size());
for (std::vector<unsigned int>::iterator it = index.begin(), e = index.end(); it != e; ++it)
result.push_back(data[*it]);
data.swap(result);
}
This should do it:
std::sort(index.begin(), index.end(), [&data](unsigned i1, unsigned i2)->bool
{ return data[i1]<data[i2]; });
std::sort(data.begin(), data.end());

predicate for a map from string to int

I have this small program that reads a line of input and prints the words in it, with their respective number of occurrences. I want to sort the elements in the map that stores these values according to their occurrences. I mean, the words that only appear once will be ordered to be at the beginning, then the words that appear twice, and so on. I know that the predicate should return a bool value, but I don't know what the parameters should be. Should it be two iterators to the map? If someone could explain this, it would be greatly appreciated. Thank you in advance.
#include<iostream>
#include<map>
#include<string>
using std::cout;
using std::cin;
using std::endl;
using std::string;
using std::map;
int main()
{
string s;
map<string,int> counters; //store each word & an associated counter
//read the input, keeping track of each word & how often we see it
while(cin>>s)
{
++counters[s];
}
//write the words & associated counts
for(map<string,int>::const_iterator iter = counters.begin();iter != counters.end();iter++)
{
cout<<iter->first<<"\t"<<iter->second<<endl;
}
return 0;
}
std::map is always sorted according to its key. You cannot sort the elements by their value.
You need to copy the contents to another data structure (for example std::vector<std::pair<string, int> >) which can be sorted.
Here is a predicate that can be used to sort such a vector. Note that sorting algorithms in C++ standard library need a "less than" predicate which basically says "is a smaller than b".
bool cmp(std::pair<string, int> const &a, std::pair<string, int> const &b) {
return a.second < b.second;
}
You can't re-sort a map; its order is predefined (by default, from std::less on the key type). The easiest solution for your problem would be to create a std::multimap<int, string> and insert your values there, then just loop over the multimap, which will be ordered on the key type (int, the number of occurrences). That gives you the order that you want without having to define a predicate.
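A short sketch of the multimap suggestion, assuming the counters map from the question has already been filled (the function name by_count is invented for the example):

```cpp
#include <map>
#include <string>

// Builds a count -> word index from the word -> count map.
// Iterating the result visits words in ascending order of count.
std::multimap<int, std::string> by_count(const std::map<std::string, int>& counters) {
    std::multimap<int, std::string> result;
    for (const auto& kv : counters) {
        result.emplace(kv.second, kv.first);
    }
    return result;
}
```

Printing is then a plain loop over the multimap, writing entry.second (the word) and entry.first (the count). Words with equal counts keep the order in which they were inserted, which here is the alphabetical order of the source map.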
You are not going to be able to do this with one pass with an std::map. It can only be sorted on one thing at a time, and you cannot change the key in-place. What I would recommend is to use the code you have now to maintain the counters map, then use std::max_element with a comparison function that compares the second field of each std::pair<string, int> in the map.
A map has its keys sorted, not its values. That's what makes the map efficient. You cannot sort it by occurrences without using another data structure (maybe a reversed index!)
As stated, it simply won't work -- a map always remains sorted by its key value, which would be the strings.
As others have noted, you can copy the data to some other structure, and sort by the value. Another possibility would be to use a Boost bimap instead. I've posted a demo of the basic idea previously.
You probably want to transform map<string,int> to vector<pair<string, int> > then sort the vector on the int member. (The key must not be const in the vector's element type, since std::sort has to assign elements.)
You could do
struct PairLessSecond
{
template< typename P >
bool operator()( const P& pairLeft, const P& pairRight ) const
{
return pairLeft.second < pairRight.second;
}
};
You can probably also construct all this somehow using a lambda with a bind.
Now
std::vector< std::pair<std::string,int> > byCount( counters.begin(), counters.end() );
std::sort( byCount.begin(), byCount.end(), PairLessSecond() );