C++ Armadillo: Get the ranks of the elements in a vector - c++

Suppose I have a vector, and I want to get the ranks of the elements if they were sorted.
So if I have the vector:
0.5
1.5
3.5
0.1
and I need returned the ranks of each element:
2
3
4
1
Is there a way to do this in Armadillo? This is different than the previous post since we are getting the ranks and not the indices before sorting.

Here, check this out:
#include<iostream>
#include<vector> // std:: vector
#include <algorithm> // std::sort
#include <map> // std::map
using namespace std;
int main() {
vector<double> myVector = { 0.5, 1.5, 3.5, 0.1 };
vector<double> Sorted = myVector;
std::sort(Sorted.begin(), Sorted.end());
map<double, int> myMap;
for (int i = 0; i < Sorted.size() ; i++)
{
myMap.insert(make_pair(Sorted[i],i));
}
for (int i = 0; i < myVector.size() ; i++)
{
auto it = myMap.find(myVector[i]);
cout << it->second + 1 << endl;
}
return 0;
}
Output:
2
3
4
1

Here is my code using the STL to get the rankings in a concise way:
#include <algorithm> // std::sort
#include <numeric> // std::iota
#include <vector>
using namespace std;
template <typename T>
vector<size_t> calRank(const vector<T> & var) {
vector<size_t> result(var.size());
//sorted index
vector<size_t> indx(var.size());
iota(indx.begin(),indx.end(),0);
//compare indices as size_t (not int) so that long vectors cannot overflow
sort(indx.begin(),indx.end(),[&var](size_t i1, size_t i2){return var[i1]<var[i2];});
//return ranking (1-based)
for(size_t iter=0;iter<var.size();++iter){
result[indx[iter]]=iter+1;
}
return result;
}

Here is a solution in pure Armadillo code, using the function arma::sort_index().
The function arma::sort_index() calculates the permutation index to sort a given vector into ascending order.
Applying the function arma::sort_index() twice, as in arma::sort_index(arma::sort_index(x)) for a vector x, calculates the inverse permutation index, which maps the vector from ascending order back into its original unsorted order. The zero-based ranks of the elements are equal to this inverse permutation index, so adding 1 gives the usual one-based ranks.
Below is Armadillo code wrapped in RcppArmadillo that defines the function calc_ranks(). The function calc_ranks() calculates the ranks of the elements of a vector, and it can be called from R.
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// Define the function calc_ranks(), to calculate
// the ranks of the elements of a vector.
//
// [[Rcpp::export]]
arma::uvec calc_ranks(const arma::vec& da_ta) {
return (arma::sort_index(arma::sort_index(da_ta)) + 1);
} // end calc_ranks
The above code can be saved to the file calc_ranks.cpp, so it can be compiled in R using the function Rcpp::sourceCpp().
Below is R code to test the function calc_ranks() (after it's been compiled in R):
# Compile Rcpp functions
Rcpp::sourceCpp(file="C:/Develop/R/Rcpp/calc_ranks.cpp")
# Create a vector of random data
da_ta <- runif(7)
# Calculate the ranks of the elements
calc_ranks(da_ta)
# Compare with the R function rank()
all.equal(rank(da_ta), drop(calc_ranks(da_ta)))
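
The "sort index of the sort index" idea can also be checked without Armadillo. The following is a minimal sketch in plain C++, assuming a hypothetical helper argsort() that plays the role of arma::sort_index(); the names argsort and ranks_by_double_argsort are made up for illustration and are not part of any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for arma::sort_index(): returns the indices that
// would arrange v in ascending order.
template <typename T>
std::vector<std::size_t> argsort(const std::vector<T>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // stable_sort keeps equal elements in their original relative order
    std::stable_sort(idx.begin(), idx.end(),
                     [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// Applying argsort twice yields the inverse permutation, i.e. the zero-based
// ranks; adding 1 gives one-based ranks.
template <typename T>
std::vector<std::size_t> ranks_by_double_argsort(const std::vector<T>& v) {
    std::vector<std::size_t> r = argsort(argsort(v));
    for (std::size_t& x : r) x += 1;
    return r;
}
```

For the vector {0.5, 1.5, 3.5, 0.1} from the question, the first argsort gives {3, 0, 1, 2} and the second gives {1, 2, 3, 0}, so the ranks are {2, 3, 4, 1}. In this sketch tied values receive consecutive ranks in order of appearance, whereas R's rank() averages tied ranks by default, so the two only agree when there are no ties.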

Related

C++ vector sorting and mapping from unsorted to sorted elements

I have to perform the following task. Take a std::vector<float>, sort the elements in descending order and have an indexing that maps the unsorted elements to the sorted ones. Please note that the order really matters: I need a map that, given the i-th element in the unsorted vector, tells me where this element is found in the sorted one. The reverse mapping has already been achieved in a pretty smart way (through C++ lambdas), e.g. here: C++ sorting and keeping track of indexes. Nevertheless, I was not able to find an equally smart way to perform the "inverse" task. I would like to find a fast way, since this kind of mapping has to be performed many times and the vector is large.
Below is a simple example of what I need to achieve, together with my (probably suboptimal, since it relies on std::find) solution. Is it the fastest or most efficient way to perform this task? If not, are there better solutions?
Example
Starting vector: v = {4.5, 1.2, 3.4, 2.3}
Sorted vector: v_s = {4.5, 3.4, 2.3, 1.2}
What I do want: map = {0, 3, 1, 2}
What I do not want: map = {0, 2, 3, 1}
My solution
template <typename A> std::vector<size_t> get_indices(std::vector<A> & v_unsorted, std::vector<A> & v_sorted) {
std::vector<size_t> idx;
for (auto const & element : v_unsorted) {
typename std::vector<A>::iterator itr = std::find(v_sorted.begin(), v_sorted.end(), element);
idx.push_back(std::distance(v_sorted.begin(), itr));
}
return idx;
}
Thanks a lot for your time, cheers!
You can use the code below.
My version of get_indices does the following:
Create a vector of indices mapping sorted -> unsorted, using code similar to the one in the link you mentioned in your post (C++ sorting and keeping track of indexes).
Then by traversing those indices once, create the sorted vector, and the final indices mapping unsorted -> sorted.
The complexity is O(n * log(n)), since the sort is done in O(n * log(n)), and the final traversal is linear.
The code:
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
template <typename T>
std::vector<size_t> get_indices(std::vector<T> const & v_unsorted, std::vector<T> & v_sorted)
{
std::vector<size_t> idx_sorted2unsorted(v_unsorted.size());
std::iota(idx_sorted2unsorted.begin(), idx_sorted2unsorted.end(), 0);
// Create indices mapping (sorted -> unsorted) sorting in descending order:
std::stable_sort(idx_sorted2unsorted.begin(), idx_sorted2unsorted.end(),
[&v_unsorted](size_t i1, size_t i2)
{ return v_unsorted[i1] > v_unsorted[i2]; }); // You can use '<' for ascending order
// Create final indices (unsorted -> sorted) and sorted array:
std::vector<size_t> idx_unsorted2sorted(v_unsorted.size());
v_sorted.resize(v_unsorted.size());
for (size_t i = 0; i < v_unsorted.size(); ++i)
{
idx_unsorted2sorted[idx_sorted2unsorted[i]] = i;
v_sorted[i] = v_unsorted[idx_sorted2unsorted[i]];
}
return idx_unsorted2sorted;
}
int main()
{
std::vector<double> v_unsorted{ 4.5, 1.2, 3.4, 2.3 };
std::vector<double> v_sorted;
std::vector<size_t> idx_unsorted2sorted = get_indices(v_unsorted, v_sorted);
for (auto const & i : idx_unsorted2sorted)
{
std::cout << i << ", ";
}
return 0;
}
Output:
0, 3, 1, 2,
Once you have the mapping from sorted to unsorted indices, you only need a loop to invert it.
I am building on the code from this answer: https://stackoverflow.com/a/12399290/4117728. It provides a function to get the vector you do not want:
#include <iostream>
#include <vector>
#include <numeric> // std::iota
#include <algorithm> // std::sort, std::stable_sort
using namespace std;
template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) {
// initialize original index locations
vector<size_t> idx(v.size());
iota(idx.begin(), idx.end(), 0);
// sort indexes based on comparing values in v
// using std::stable_sort instead of std::sort
// to avoid unnecessary index re-orderings
// when v contains elements of equal values
stable_sort(idx.begin(), idx.end(),
[&v](size_t i1, size_t i2) {return v[i1] < v[i2];});
return idx;
}
Sorting the vector is O(N log N), whereas your code calls find N times, resulting in O(N*N). On the other hand, adding a single loop is only linear, hence sorting plus the loop is still O(N log N).
int main() {
std::vector<double> unsorted{4.5, 1.2, 3.4, 2.3};
auto idx = sort_indexes(unsorted);
for (auto i : idx) std::cout << unsorted[i] << "\n";
// the vector you do not want
for (auto i : idx) std::cout << i << "\n";
// invert it
std::vector<size_t> idx_inverse(idx.size());
for (size_t i=0;i<idx.size();++i) idx_inverse[ idx[i] ] = i;
// the vector you do want
for (auto i : idx_inverse) std::cout << i << "\n";
}

what leads to the difference of sorting way in priority_queue and sort() function?

I'm learning to use the priority_queue container and the sort() function in the standard library, and I am puzzled by the comparison function that each of them takes.
For the sort() function, a compare function that returns first argument < second argument sorts from smallest to greatest, as in code (2) below.
For the priority_queue container, a compare function that returns first argument > second argument yields the elements from smallest to greatest, as in code (1) below.
What leads to this difference?
this is my code of using std::priority_queue:
#include <functional>
#include <queue>
#include <vector>
#include <iostream>
using namespace std;
struct Compare {
bool operator() (int left, int right) {
//here is left>right .......... (1)
return(left>right);
}
};
template<typename T>
void print_queue(T& q) {
while(!q.empty()) {
std::cout << q.top() << " ";
q.pop();
}
std::cout << '\n';
}
int main() {
std::priority_queue<int,vector<int>,Compare> q;
for(int n : {1,8,5,6,3,4,0,9,7,2})
q.push(n);
print_queue(q);
}
Output:
0 1 2 3 4 5 6 7 8 9
this is my code of using sort():
#include<bits/stdc++.h>
using namespace std;
bool compare(int i1, int i2)
{ //here is i1<i2 .............(2)
return (i1 < i2);
}
int main()
{
int arr[] = { 6,2,1,5,8,4};
int n = sizeof(arr)/sizeof(arr[0]);
sort(arr, arr+n, compare);
cout << "Intervals sorted by start time : \n";
for (int i=0; i<n; i++)
cout << arr[i]<<",";
return 0;
}
Output:
1,2,4,5,6,8
The short answer is that the C++ standard says so:
std::sort sorts the elements in the range [first, last) in ascending order
std::priority_queue provides lookup of the largest element (based on std::less)
So you need to invert the comparison to obtain the same result.
Why std::priority_queue was chosen to expose the greatest element, I don't know, but I suspect it was inspired by section 5.2.3 of Knuth (D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching), as stated in the notes of the good old SGI STL documentation.
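
To make the relationship concrete, here is a small sketch that runs the same comparator type through both facilities. The helper names pop_order and sorted_order are made up for this illustration; only std::priority_queue, std::sort, std::less and std::greater come from the standard library.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <vector>

// Push all values into a priority_queue that uses Compare, then pop them
// one by one and record the order in which top() exposes them.
template <typename Compare>
std::vector<int> pop_order(const std::vector<int>& values) {
    std::priority_queue<int, std::vector<int>, Compare> q;
    for (int v : values) q.push(v);
    std::vector<int> out;
    while (!q.empty()) {
        out.push_back(q.top());
        q.pop();
    }
    return out;
}

// Sort a copy of the values with the same comparator type.
template <typename Compare>
std::vector<int> sorted_order(std::vector<int> values) {
    std::sort(values.begin(), values.end(), Compare());
    return values;
}
```

With std::less<int>, sorted_order gives ascending order while pop_order gives descending order, because top() is the element that the comparator places last. Switching the queue to std::greater<int> therefore makes it pop in ascending order, matching std::sort with std::less<int>.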

exclude non-duplicated int in vector

I'm new to C++, I came from Swift background, and I'm thankful for your help in advance.
I have a vector that contains int values. Some of them are repeated.
My task here is to get the largest repeated value from the vector.
Example:
std::vector<int> myVector;
myVector.push_back(1);
myVector.push_back(8);
myVector.push_back(4);
myVector.push_back(4);
I need a function that returns 4 because it's the largest duplicate int in the vector.
Thanks again, and please, if you have any question, please ask it instead of downvoting.
Solution based only on std algorithms:
Sort the list using std::sort.
Iterate backwards over its elements and detect the first one that is equal to its predecessor using std::adjacent_find and reverse iterators.
I doubt it gets simpler than this. For your enjoyment:
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector<int> v{1,2,3,3,4,4,4,5,6,7,7,8};
std::sort(v.begin(), v.end());
auto result = std::adjacent_find(v.rbegin(), v.rend());
if(result == v.rend())
std::cout << "No duplicate elements found.";
else
std::cout << "Largest non-unique element: " << *result;
}
Properties:
Zero space overhead if the list can be sorted in place.
Complexity: O(N log(N)) less-than comparisons for the sort, plus K equality comparisons, where K is equal to the number of unique elements larger than the one you're after.
Lines of code making up the algorithm: 2
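You could use a map, as someone commented above, to store the number of appearances of each element of the vector. Afterwards, you take the maximum element via a custom comparator. Note that the comparator below orders by count first and by value second, so it returns the most frequent value (the larger one on ties). The generic lambda (auto parameters) requires C++14.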
#include <iostream>
#include <algorithm>
#include <map>
#include <vector>
int largestRepeatingNumber(const std::vector<int> & vec)
{
std::map<int, int> counter;
std::for_each(std::begin(vec), std::end(vec), [&counter] (int elem) {
counter.find(elem) == counter.end() ? counter[elem] = 1 : ++counter[elem]; });
return std::max_element(std::begin(counter), std::end(counter), [] (auto lhs, auto rhs) {
if (lhs.second == rhs.second)
return lhs.first < rhs.first;
return lhs.second < rhs.second;
})->first;
}
int main()
{
std::vector<int> myVector;
myVector.push_back(1);
myVector.push_back(8);
myVector.push_back(4);
myVector.push_back(4);
myVector.push_back(3);
myVector.push_back(3);
std::cout << largestRepeatingNumber(myVector);
return 0;
}
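
The map-based code above selects the value with the highest count. If "largest repeated value" is read literally as the largest value that occurs at least twice, the same counting map can be walked from the largest key downwards instead. This is a minimal sketch under that assumption; the function name largest_duplicate and its bool-plus-output-parameter interface are made up for illustration.

```cpp
#include <map>
#include <vector>

// Hypothetical helper: writes the largest value occurring at least twice
// into `out` and returns true; returns false if no value is repeated.
bool largest_duplicate(const std::vector<int>& values, int& out) {
    std::map<int, int> counts;
    for (int v : values) ++counts[v];  // operator[] starts new keys at 0

    // std::map keeps keys sorted, so reverse iteration visits the largest first.
    for (auto it = counts.rbegin(); it != counts.rend(); ++it) {
        if (it->second >= 2) {
            out = it->first;
            return true;
        }
    }
    return false;
}
```

For {1, 8, 4, 4} this yields 4. For {1, 8, 8, 4, 4, 4} it yields 8, whereas a "most frequent" approach would pick 4.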
I have used lower_bound and upper_bound. The strategy is:
1) Sort the original vector (so that the unique function can be used on it).
2) Copy the unique values to a second vector, so that each distinct value is looked up in the original vector only once.
3) For each unique value, find the lower and upper bound in the sorted vector. The distance between them is the number of occurrences. For example, in 1 4 4 8 the value 4 has the largest distance.
4) Store the value in a map using the count as the key (a map is ordered, so the value with the highest count ends up at the end).
#include<iostream>
#include<algorithm>
#include<vector>
#include<map>
#include<iterator> // std::back_inserter, std::ostream_iterator
using namespace std;
int main()
{
std::vector<int> myVector,myvec,counter;
map<int,int> maxdupe;
myVector.push_back(1);
myVector.push_back(8);
myVector.push_back(4);
myVector.push_back(4);
sort(myVector.begin(),myVector.end());
std::unique_copy(myVector.begin(), myVector.end(), std::back_inserter(myvec));
std::copy(myvec.begin(), myvec.end(), std::ostream_iterator<int>(std::cout, " "));
cout<<endl;
for(auto i = myvec.begin(); i!=myvec.end();i++)
{
auto lower = std::lower_bound(myVector.begin(), myVector.end(), *i);
auto upper = std::upper_bound(myVector.begin(), myVector.end(), *i);
maxdupe[upper - lower] = *i;
}
for(auto i= maxdupe.begin();i!= maxdupe.end();i++)
{
cout<<i->first<<i->second<<endl;
}
return 0;
}
Output
1 4 8
18
24

How to count duplicate entries of a vector in C++

I'm using Armadillo to do linear algebra calculation in C++.
For example, given the vector
a = (1,1,2,2,0,2,1,0)
I wish to return the matrix
(0, 2) //means 0 shows 2 times in the vector
(1, 3) //1 shows 3 times
(2, 3) //2 shows 3 times
Is there a function that can do this?
As mentioned in comments you could use a std::map to collect the results. Then you can convert to a matrix as you see fit. You could skip the map step and use a matrix directly if it's already pre-initialised with the rows you're after.
As for a function to do this, you can use std::for_each from <algorithm> along with a lambda expression, although it seems overkill when a loop would be fine.
#include <algorithm>
#include <iostream>
#include <vector>
#include <map>
using namespace std;
int main()
{
vector<int> v{1,1,2,2,0,2,1,0};
map<int,int> dup;
for_each( v.begin(), v.end(), [&dup]( int val ){ dup[val]++; } );
for( auto p : dup ) {
cout << p.first << ' ' << p.second << endl;
}
return 0;
}
Here's another solution, using only Armadillo functions, and a C++11 compiler:
vec a = {1,1,2,2,0,2,1,0}; // vec holds elements of type 'double'
vec b = unique(a);
uvec c = hist(a,b); // uvec holds unsigned integers
mat X(b.n_rows, 2);
X.col(0) = b;
X.col(1) = conv_to<vec>::from(c);
X.print("X:");
Explanation:
vec b = unique(a) creates a vector containing the unique elements of a, sorted in ascending order
uvec c = hist(a,b) creates a histogram of counts of elements in a, using b as the bin centers
conv_to<vec>::from(c) converts c (vector with unsigned integers) to the same vector type as a

C++ sort on vector using function object

I'm trying to sort a vector v1 using another vector v2. I can't wrap my head around this error:
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
Abort trap
while running this code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
class Comp
{
public:
Comp(vector<double>& inVec): _V(inVec) {}
bool operator()(int i, int j) {return (_V.at(i)<_V.at(j));}
private:
vector<double> _V;
};
int main(int argc, char** argv)
{
double x1[] = {90.0, 100.0, 80.0};
double x2[] = {9.0, 3.0, 1.0};
vector<double> v1(x1,x1+3);
vector<double> v2(x2,x2+3);
sort(v1.begin(), v1.end(), Comp(v2)); // sort v1 according to v2
for(unsigned int i=0; i<v1.size(); i++)
{
cout << v1.at(i) << " " << v2.at(i) << endl;
}
return 0;
}
v1 and v2 are of the same size. Why the out_of_range error?
Thanks in advance for any pointers.
I believe that your problem is in this line:
bool operator()(int i, int j) {return (_V.at(i)<_V.at(j));}
The problem is that when the std::sort algorithm uses a custom callback, it passes in the actual values stored in the vector at particular locations, not the indices of those locations within the vector. As a result, when you call
sort(v1.begin(), v1.end(), Comp(v2)); // sort v1 according to v2
The Comp comparator you've written will be getting passed as parameters the values stored in the v1 vector and will then try indexing at those positions into the v2 vector. Since the values in v1 are larger than the size of v2, the call to _V.at(i) will cause an out_of_range exception to be thrown.
If you want to sort the two ranges with respect to one another, you'll need to adopt a different approach. I'm not aware of a straightforward way of doing this, but I'll let you know if I think of one.
The size of v2 is just 3, but you're using each value of v1 as an index into v2 (the comparator holds a copy of v2 in _V). Since the values of v1 (90, 100 and 80) are all greater than the size of v2, this gives the std::out_of_range error here:
bool operator()(int i, int j) {return (_V.at(i)<_V.at(j));}
The std::vector::at function throws a std::out_of_range exception if the index passed to it as argument is greater than or equal to the size of the vector. That is, the index must be less than vector::size().
OK, now you're probably aware of the fact that i and j are actual values held in the vector rather than indices. There is a good reason for that: sorting is all about values, not indexes. Note that you're passing iterators to the sort function, so there is no way it can extract an index for you. Of course, you could get the index relative to the first iterator, but there is no reason for doing this.
However, let's be insane for a while and imagine you would get indices rather than values in your comparator. Assume that your code does what you want and let's think about the following scenario:
v1 = {20,10}; v2 = {2,1}
I secretly assume you want the following output:
v1 = {10, 20}
right? Now imagine I'm a sorting function you're calling and I do following steps:
v2[0] < v2[1] is false, so swap(&v1[0], &v1[1])
It's sorted, isn't it? But wait, I'm a crazy sorting function, so I want to make sure it's sorted, so I do the following:
v2[0] < v2[1] is false, swap(&v1[0], &v1[1])
And again:
v2[0] < v2[1] is false, swap(&v1[0], &v1[1])
and again, again, again...
Can you see the problem? A sorting function places requirements on its comparator (the result must depend only on the two values being compared and must define a consistent ordering), and you would certainly be breaking a fundamental one.
I suspect you need a completely different container (maybe a std::map with keys from vec1 and values from vec2) or at least something like vector< pair<double, double> >, so you can easily sort by either the first or the second value. If not, consider creating a vector with values in the range [0, v2.size()), sorting it using your comparator (the values are equal to indices, so it will be all right) and then printing the correct values from v1. This code works fine:
vector<size_t> indices;
for(size_t i =0; i < v1.size(); ++i)
{
indices.push_back(i);
}
// yes, it works using your original comparator
sort(indices.begin(), indices.end(), Comp(v2));
for(size_t i =0; i < indices.size(); ++i)
{
cout << v1.at(indices[i]) << " " << v2.at(indices[i]) << endl;
}
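
The index-based fragment above depends on the Comp class from the question. A self-contained variant of the same idea might look like the sketch below, which sorts a vector of indices by the keys and then gathers the values in that order. The function name sort_by_keys is made up for illustration, and the sketch assumes both vectors have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns a copy of `values` reordered so that the
// corresponding entries of `keys` are in ascending order.
// Assumes values.size() == keys.size(); at() throws std::out_of_range otherwise.
std::vector<double> sort_by_keys(const std::vector<double>& values,
                                 const std::vector<double>& keys) {
    // order[i] is the position in `keys` of the i-th smallest key
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Gather the values in key order
    std::vector<double> result;
    result.reserve(order.size());
    for (std::size_t i : order) result.push_back(values.at(i));
    return result;
}
```

With v1 = {90, 100, 80} and v2 = {9, 3, 1} from the question, the index order is {2, 1, 0}, so the result is {80, 100, 90}. Here the comparator only ever receives indices, which is what the original Comp class assumed.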
As said in the other answers, the problem is that the sort algorithm passes the actual values to compare rather than indices.
Here is how you can solve it:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
typedef pair<double, double> Zipped; // Represent an element of two lists
// "zipped" together
// Compare the second values of two pairs
bool compareSeconds ( Zipped i, Zipped j )
{
return i.second < j.second;
}
int main ( int argc, char **argv )
{
double x1[] = { 90, 100, 80 };
double x2[] = { 9, 3, 1 };
vector<double> v1(x1, x1 + 3);
vector<double> v2(x2, x2 + 3);
vector<Zipped> zipped(v1.size()); // This will be the zipped form of v1
// and v2
for ( int i = 0; i < zipped.size(); ++i )
{
zipped[i] = Zipped(v1[i], v2[i]);
}
sort(zipped.begin(), zipped.end(), &compareSeconds);
for ( int i = 0; i < zipped.size(); ++i )
{
cout << zipped[i].first << " " << zipped[i].second << endl;
}
for ( int i = 0; i < v1.size(); ++i )
{
v1[i] = zipped[i].first;
}
// At this point, v1 is sorted according to v2
return 0;
}