MATLAB find() / Numpy nonzero idioms for Eigen - c++

This is probably a very basic question, but I spent an absurd amount of time looking for the answer in the documentation, to no avail.
In MATLAB, the find() function gives me an array with the indices of nonzero elements. NumPy's np.nonzero function does something similar.
How do I do this in the C++ Eigen library? I have a Boolean array of
typedef <bool, 10, 1> foobar = MatrixA < MatrixB;
so far. Thanks!

Not sure if this is part of your question, but to construct the appropriate element-wise inequality result you must first cast your matrices to arrays:
MatrixXd A,B;
...
Matrix<bool,Dynamic,Dynamic> C = A.array()<B.array();
Now C is the same size as A and B and C(i,j) = A(i,j) < B(i,j).
To find all of the indices (assuming column-major order) of the true entries, you can use this compact C++11 routine, as described by libigl's conversion table:
VectorXi I = VectorXi::LinSpaced(C.size(),0,C.size()-1);
I.conservativeResize(std::stable_partition(
I.data(), I.data()+I.size(), [&C](int i){return C(i);})-I.data());
Now I is C.count() long and contains the indices of the true entries in C. These two lines essentially implement find.
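For readers who want to try the idiom without Eigen, the same partition trick can be sketched on standard containers. This is only an illustration, not libigl's or Eigen's code: the helper name find_true and the use of std::vector<char> as the Boolean mask are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the nonzero entries of mask,
// in increasing order, similar to MATLAB's find() on a vector.
std::vector<int> find_true(const std::vector<char>& mask) {
    std::vector<int> idx(mask.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., size-1
    // stable_partition moves the indices whose mask entry is true to the
    // front, keeping their relative order, and returns the end of that group.
    auto last = std::stable_partition(
        idx.begin(), idx.end(),
        [&mask](int i) { return mask[static_cast<std::size_t>(i)] != 0; });
    idx.erase(last, idx.end());  // drop the indices of the false entries
    return idx;
}
```

Calling find_true on the mask {0, 1, 1, 0, 1} should give the indices 1, 2 and 4.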

It is reasonable to expect Eigen to have a find() function. Unfortunately, it doesn't, nor does it have a less-than operator for matrices (only for arrays). Fortunately, the problem isn't too difficult. Here is one solution. I am using a std::vector to store the column-major indices of the elements > 0; you could use an Eigen vector such as VectorXi if you prefer. Use this on B - A (B - A > 0 is the same as evaluating B > A). I'm using the STL for_each() function.
#include <algorithm>
#include <iostream>
#include <vector>
#include <Eigen/Dense>
using namespace Eigen;
using namespace std;
class isGreater{
public:
vector<int>* GT;
int it; // running element index; a member rather than a static, so the functor can be reused
isGreater(vector<int> *g) : GT(g), it(0) {}
void operator()(float i){if(i>0)GT->push_back(it); it++;}
};
int main(int argc,char **argv){
MatrixXf P = MatrixXf::Random(4,5);
vector<int> GT;
for_each(P.data(),P.data()+P.rows()*P.cols(),isGreater(&GT));
cout<<P<<endl;
for(int i=0;i<GT.size();++i)cout<<GT[i]<<" ";
cout<<GT.size()<<endl;
return 0;
}

This might work for you and others who check this out. In order to set elements of a matrix m based on the condition on another matrix A, you can use this notation:
m = (A.array() != 0).select(1, m);
This command replaces those elements in matrix m that have non-zero corresponding elements in A, with one.

Related

how to sort an array in descending order using a boolean function?

Since I am new in competitive programming so I am finding this a bit difficult. I encountered a code and I am not able to figure it out, need some help to understand it.
#include<iostream>
#include<algorithm>
using namespace std;
bool mycompare(int a ,int b){
return a>b;
}
int main(){
int a[]={5,4,3,1,2,6,7};
int n =sizeof(a)/sizeof(int);
sort(a,a+n,mycompare);
for(int i=0; i<n;i++){
cout<<a[i]<<" ";
}
return 0;
}
output:
7 6 5 4 3 2 1
How does this code work? More specifically, what does the mycompare function do in the code?
My doubt is: why haven't we passed any arguments to mycompare() inside the main() function, given that the prototype of the function is
bool mycompare(int a, int b);
A comparison-based sorting algorithm sorts the elements solely by pair-wise comparison, i.e., if a < b holds, then a has to be placed before b.
This is a fine approach, but if you limit yourself to using <, it only allows you to sort elements in an ascending order. What if you want to have them in descending order, or any other ordering? This is where the concept of a Comparator (or a Compare callable in the context of the C++ standard) comes into play: It is a binary predicate bool compare(element a, element b) that is supposed to replace the < operator, i.e., a < b becomes compare(a, b) instead. This generalization allows you to encapsulate all types of orderings, in your question you already provided an example where the comparison uses a greater-than operator >, which gives you the aforementioned descending sorted order.
As for how this works internally in C++, the details can be rather complicated, but you can look at it as this:
mycompare without any parameters is a function pointer, i.e. a pointer to the memory address where the machine code for mycompare starts. You can do something like
auto func_pointer = mycompare;
func_pointer(1, 2); // calls mycompare(1, 2)
By giving this function pointer as a parameter to std::sort, you replace the default < comparison by your own. Because std::sort is a template, the compiler can often inline this call, i.e., avoid the function call by copying the code from mycompare into the std::sort instantiation, which can speed up your code significantly.
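To make the "a < b becomes compare(a, b)" idea concrete, here is a toy sketch. It is not how std::sort is implemented; insertion_sort is a hypothetical name, and insertion sort is used only because it is short enough to read at a glance.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// The comparator from the question: "a goes before b" when a > b.
bool mycompare(int a, int b) { return a > b; }

// Toy insertion sort: wherever a plain ascending sort would test `a < b`,
// it calls comp(a, b) instead. Compare can be a function pointer or a lambda.
template <typename Compare>
void insertion_sort(std::vector<int>& v, Compare comp) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        // Move v[j] to the left while it must come before its left neighbour.
        for (std::size_t j = i; j > 0 && comp(v[j], v[j - 1]); --j) {
            std::swap(v[j], v[j - 1]);
        }
    }
}
```

Passing the bare name, as in insertion_sort(v, mycompare), hands over the function itself; the sort routine is the one that supplies the two arguments each time it needs to compare a pair.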
std::sort takes a RandomIt (random access iterator) as its first and second arguments, which must satisfy the requirements of ValueSwappable and LegacyRandomAccessIterator. Instead of using a Plain-Old-Array of int, you may want to use std::array, which provides the iterators through the member functions .begin() and .end().
Using a proper container from the C++ standard library makes sorting with std::sort trivial. You need not even provide a custom compare function to sort in descending order, as std::greater<int>() is provided for you (though your purpose may be to provide the compare function yourself).
Your prototype for mycompare will work fine as is, but preferably the parameters are const type rather than just type, e.g.
bool mycompare(const int a, const int b)
{
return a > b;
}
The implementation using the array container is quite trivial. Simply declare/initialize your array a and then call std::sort (a.begin(), a.end(), mycompare); A complete working example would be:
#include <iostream>
#include <algorithm>
#include <array>
bool mycompare(const int a, const int b)
{
return a > b;
}
int main (void) {
std::array<int, 7> a = { 5, 4, 3, 1, 2, 6, 7 };
std::sort (a.begin(), a.end(), mycompare);
for (auto& i : a)
std::cout << " " << i;
std::cout << '\n';
}
Example Use/Output
$ ./bin/array_sort
7 6 5 4 3 2 1
Sorting the Plain Old Array
If you must use a Plain-Old-Array, then you can use plain-old-pointers as your random access iterators. While not a modern C++ approach, you can handle the plain-old-array with std::sort. You can make use of the builtin std::greater<type>() for a descending sort or std::less<type>() for an ascending sort.
An implementation using pointers would simply be:
#include <iostream>
#include <algorithm>
#include <functional> // std::less, std::greater
int main (void) {
int a[] = { 5, 4, 3, 1, 2, 6, 7 };
size_t n = sizeof a / sizeof *a;
#if defined (ASCEND)
std::sort (a, a + n, std::less<int>());
#else
std::sort (a, a + n, std::greater<int>());
#endif
for (size_t i = 0; i < n; i++)
std::cout << " " << a[i];
std::cout << '\n';
}
(same output unless -DASCEND is added as a define on the commandline, and then an ascending sort will result from the use of std::less<int>())
Look things over and let me know if you have further questions.

generating random array in c++11

I want to create an MxN array (M particles in N dimensional space) filled with random numbers within an upper and lower boundary. I have a working python code that looks something like this:
# upper_bound/lower_bound are arrays of shape (dim,)
positions = np.random.rand(num_particle,dim)*(upper_bound-lower_bound)+lower_bound
Each row represents a particle, and each column represents a dimension in the problem space. So the upper_bound and lower_bound applies to each column. Now I want to translate the above code to c++, and I have something like this:
#include <iostream>
#include <vector>
#include <random>
#include <algorithm>
#include <ctime>
typedef std::vector<double> vect1d;
std::vector<vect1d> positions;
for (int i=0; i<num_particle; i++){
std::mt19937_64 generator(static_cast<std::mt19937::result_type>(time(0)));
std::uniform_real_distribution<double> distribution(0,1);
vect1d pos(dimension);
std::generate(pos.begin(),pos.end(),distribution(generator));
positions[i] = pos;
}
My problems:
1. It gives an error regarding the generator, so I'm not sure if I set it up properly. I'm also not sure how to use std::generate. I'm trying it because I've looked at other similar posts and it seems that it allows me to generate more than one random number at a time, so I don't have to run it MxN times for each element. Is this true, and how do I use it correctly?
2. In Python I can just use vectorization and broadcasting to manipulate the NumPy array. What's the most 'vectorized' way to do it in C++?
3. The above (incorrect) code only creates random numbers between 0 and 1, but how do I incorporate lower_bound and upper_bound as in the Python version? I understand that I can change the values inside distribution(0,1), but the problem is that the limits can be different for each dimension (so each column can have a different valid range), so what's the most efficient way to generate the random numbers, taking into account the range for each dimension?
Thanks
First of all, you're doing more work than you need to with your Python version, just use:
np.random.uniform(lower_bound, upper_bound, size=(num_particle, dim))
In your C++ attempt, the line
std::generate(pos.begin(),pos.end(),distribution(generator));
is incorrect, as the third argument must be a callable (a function or function object), not a value. A reasonable C++ equivalent would be:
using RandomVector = std::vector<double>;
using RandomMatrix = std::vector<RandomVector>;
template <typename Generator=std::mt19937_64>
RandomMatrix&
fill_uniform(const double low, const double high, RandomMatrix& result)
{
Generator gen {static_cast<typename Generator::result_type>(time(0))};
std::uniform_real_distribution<double> dist {low, high};
for (auto& col : result) {
std::generate(std::begin(col), std::end(col), [&] () { return dist(gen); });
}
return result;
}
template <typename Generator=std::mt19937_64>
RandomMatrix
generate_uniform(const double low, const double high,
const std::size_t ncols, const std::size_t nrows)
{
RandomMatrix result(ncols, RandomVector(nrows));
return fill_uniform<Generator>(low, high, result);
}
int main()
{
auto m = generate_uniform(2, 11, 2, 3);
for (const auto& col : m) {
for (const auto& v : col) {
std::cout << v << " ";
}
std::cout << '\n';
}
}
You could generalise this to generate arbitrary dimension tensors (like the NumPy version) without too much work.
I'll address them in random order:
3. You have several options. One is to use one distribution per row, created like distribution(row_lower_limit, row_upper_limit); these should be cheap enough to create that it does not cause issues. If you want to reuse a single U[0, 1] distribution, just do something like row_lower_limit + distribution(generator) * (row_upper_limit - row_lower_limit). The result is in both cases U[row_lower_limit, row_upper_limit].
2. The vectorization comes from the NumPy library, not from Python itself; at most it provides some nice UX. C++ doesn't have a single equivalent to NumPy (there are a lot of libraries for it as well, just nothing so universal). You wouldn't be wrong to use two nested for loops. You'd perhaps be better served by just declaring an NxM array rather than a vector of vectors.
1. Not sure how to help with this one, since we don't know the error. The cplusplus.com reference has an example of how to initialize the generator with reference to a random_device.
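As a rough sketch of the "one distribution per dimension" option from point 3, assuming the bounds arrive as two std::vector<double> of equal length: the helper name random_positions and its signature are made up for this example. The engine is created once by the caller and passed in, rather than being re-created (and re-seeded with the same time(0) value) inside the loop as in the question.

```cpp
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: positions[i][d] is drawn uniformly from
// [lower[d], upper[d]). Assumes lower.size() == upper.size() and
// lower[d] < upper[d] for every dimension d.
Matrix random_positions(std::size_t num_particle,
                        const std::vector<double>& lower,
                        const std::vector<double>& upper,
                        std::mt19937_64& gen) {
    const std::size_t dim = lower.size();
    // One distribution per column, so each dimension has its own range.
    std::vector<std::uniform_real_distribution<double>> dists;
    dists.reserve(dim);
    for (std::size_t d = 0; d < dim; ++d) {
        dists.emplace_back(lower[d], upper[d]);
    }
    // Size the matrix up front; indexing into an empty vector is undefined.
    Matrix positions(num_particle, std::vector<double>(dim));
    for (auto& row : positions) {
        for (std::size_t d = 0; d < dim; ++d) {
            row[d] = dists[d](gen);  // still one draw per element
        }
    }
    return positions;
}
```

A caller would typically construct the engine once, for example std::mt19937_64 gen(std::random_device{}());, and reuse it for every call.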

Find minimum and maximum of a long vector

I want to find both the minimum and maximum of a long vector. The following code works, but I need to traverse the vector twice.
I could use an old fashioned for loop, but I wonder if there is an elegant (c++11, std) way of doing it.
#include <vector>
#include <algorithm>
using namespace std;
int main(int argc, char** argv) {
vector<double> C;
// code to insert values in C not shown here
const double cLower = *min_element(C.begin(), C.end());
const double cUpper = *max_element(C.begin(), C.end());
// code using cLower and cUpper
}
You mean like std::minmax_element?
auto mm = std::minmax_element(std::begin(C), std::end(C));
const double cLower = *mm.first;
const double cUpper = *mm.second;
Note this assumes the range is not empty (as does your existing solution), else the iterator dereferences are Undefined Behaviour.
Also note this is not quite the same as your solution, as max_element returns the first largest element, and minmax_element returns the last largest element. E.g.
1 2 1 2
  ^   ^
  A   B
Where A is found by your solution, and B is found by mine. (This is for reasons of stability; Alex Stepanov got the definition of max wrong in the original STL.)
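The difference is easy to check with a small sketch. The helper name max_positions is hypothetical, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {index found by std::max_element,
// index of the maximum found by std::minmax_element}. Assumes v is non-empty.
std::pair<std::size_t, std::size_t> max_positions(const std::vector<int>& v) {
    auto first_max = std::max_element(v.begin(), v.end());
    auto mm = std::minmax_element(v.begin(), v.end());
    return {static_cast<std::size_t>(first_max - v.begin()),
            static_cast<std::size_t>(mm.second - v.begin())};
}
```

For the sequence 1 2 1 2 this gives index 1 for max_element (position A) and index 3 for minmax_element (position B). When only the values are dereferenced, as in the question, the two approaches agree.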

qsort comparison compilation error

My medianfilter.cpp class invokes qsort as seen below.
vector<float> medianfilter::computeMedian(vector<float> v) {
float arr[100];
std::copy(v.begin(), v.end(), arr);
unsigned int i;
qsort(arr, v.size(), sizeof(float), compare);
for (i = 0; i < v.size(); i++) {
printf("%f ", arr[i]);
}
printf("median=%d ", arr[v.size() / 2]);
return v;
}
The implementation of my comparison is:
int medianfilter::compare(const void * a, const void * b) {
float fa = *(const float*) a;
float fb = *(const float*) b;
return (fa > fb) - (fa < fb);
}
while the declaration in mediafilter.hpp is private and looks like this:
int compare (const void*, const void*);
A compilation error occurs: cannot convert ‘mediafilter::compare’ from type ‘int (mediafilter::)(const void*, const void*)’ to type ‘__compar_fn_t {aka int (*)(const void*, const void*)}’
I don't understand this error completely. How do I correctly declare and implement this comparison method?
Thanks!
Compare is a non-static member function whereas qsort expects a non-member function (or a static member function). As your compare function doesn't seem to use any non-static members of the class, you could just declare it static. In fact I'm not sure what your median filter class does at all. Perhaps you just need a namespace.
Why not sort the vector directly instead of copying it into a second array? Furthermore your code will break if the vector has more than 100 elements.
The default behavior of sort does just what you need, but for completeness I show how to use a compare function.
I also changed the return type of your function because I don't understand why a function called computeMedian wouldn't return the median.
namespace medianfilter
{
bool compare(float fa, float fb)
{
return fa < fb;
}
float computeMedian(vector<float> v)
{
std::sort(v.begin(), v.end(), compare);
// or simply: std::sort(v.begin(), v.end());
for (size_t i = 0; i < v.size(); i++) {
printf("%f ", v[i]);
}
if (v.empty())
{
// what do you want to happen here?
}
else
{
float median = v[v.size() / 2]; // what should happen if size is even?
printf("median=%f ", median); // it was %d before
return median;
}
}
}
You can't call compare as it is because it is a member function and requires a this pointer (i.e. it needs to be called on an object). However, as your compare function doesn't need a this pointer, simply make it a static function and your code will compile.
Declare it like this in your class:
static int compare(const void * a, const void * b);
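A minimal sketch of the static-member fix, using a made-up class MedianFilter and method sortValues as stand-ins for the asker's code (the real class layout is not shown in the question):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

class MedianFilter {  // hypothetical stand-in for the asker's class
public:
    // Sorts v in ascending order using the C library's qsort.
    static void sortValues(std::vector<float>& v) {
        if (v.empty()) return;
        std::qsort(v.data(), v.size(), sizeof(float), compare);
    }

private:
    // static: there is no this pointer, so the name converts to the plain
    // function pointer type that qsort expects.
    static int compare(const void* a, const void* b) {
        const float fa = *static_cast<const float*>(a);
        const float fb = *static_cast<const float*>(b);
        return (fa > fb) - (fa < fb);
    }
};
```

Sorting directly in the vector's own storage also sidesteps the fixed-size float arr[100] buffer from the question.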
Not directly related to your question (for which you already have the answer) but some observations:
Your calculation of the median is wrong. If the number of elements is even, you should return the average of the two center values, not the value of just one of them.
The copy to the array with a set size screams buffer overflow. Copy to another vector and std::sort it or (as suggested by @NeilKirk) just sort the original one unless you have cause not to modify it.
There is no guard against empty input. The median is undefined in this case, but your implementation would just return whatever happens to be in arr[0].
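The three observations above could be addressed together along these lines. This is only a sketch: throwing on empty input is one possible policy chosen for the example, not something the question specifies, and the function name median is likewise an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Takes v by value, so the caller's data is left untouched and no
// fixed-size buffer is needed.
float median(std::vector<float> v) {
    if (v.empty()) {
        // Assumed policy: the median of nothing is an error.
        throw std::invalid_argument("median of an empty vector is undefined");
    }
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    if (v.size() % 2 == 1) {
        return v[mid];                    // odd size: the single middle value
    }
    return (v[mid - 1] + v[mid]) / 2.0f;  // even size: average the two middle values
}
```

For the values 4, 1, 3, 2 this returns 2.5, the average of 2 and 3.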
OK, this is more of an appendix to Eli Algranti's (excellent) answer than an answer to the original question.
Here is generic code to compute the quantile quant of a vector of double called x (which the code below preserves).
First things first: there are many definitions of quantiles (R alone lists 9). The code below corresponds to definition #5 (which is also the default quantile function in MATLAB and generally the one statisticians think of when they think of quantiles).
The key idea here is that when the quantile does not fall on a precise observation (e.g. when you want the 15% quantile of an array of length 10), the implementation below performs the (correct) interpolation between adjacent quantiles (in this case between the 10% and 20% ones). This is important so that when you increase the number of observations (I'm hinting at the name medianfilter here) the value of the quantile does not jump about abruptly but converges smoothly instead (which is one reason why this is the statistician's preferred definition).
The code assumes that x has at least one element (the code below is part of a longer one and I feel this point has been made already).
Unfortunately it's written using many functions from the (excellent!) C++ Eigen library, and it is too late at night for me to translate the Eigen functions or sanitize the variable names, but the key ideas should be readable.
#include <algorithm> // std::nth_element
#include <cmath> // floor, ceil
#include <Eigen/Dense>
#include <Eigen/QR>
using namespace std;
using namespace Eigen;
using Eigen::MatrixXd;
using Eigen::VectorXd;
using Eigen::VectorXi;
double quantiles(const Ref<const VectorXd>& x,const double quant){
//computes the quantile 'quant' of x.
const int n=x.size();
double lq,uq,fq;
const double q1=n*(double)quant+0.5;
const int index1=floor(q1);
const int index2=ceil(q1);
const double index3=(double)index2-q1;
VectorXd x1=x;
std::nth_element(x1.data(),x1.data()+index1-1,x1.data()+x1.size());
lq=x1(index1-1);
if(index1==index2){
fq=lq;
} else {
uq=x1.segment(index1,x1.size()-index1).minCoeff(); // all elements after position index1-1
fq=lq*index3+uq*(1.0-index3);
}
return(fq);
}
So the code uses one call to nth_element, which has average complexity O(n) [sorry for sloppily using big O for average] and (when n is even) one extra call to min() [which in Eigen dialect is written .minCoeff()] on at most n/2 elements of the vector, which is O(n/2).
This is much better than using a partial sort (which would cost O(n log(n/2)), worst case) or a full sort (which would cost O(n log n)).
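For readers without Eigen, the same idea can be sketched with the standard library only. This is an approximation of the approach described above, not a translation the author provided: the name quantile_type5 is made up, the input is assumed to be non-empty with 0 <= p <= 1, and clamping to the smallest or largest observation at the extremes is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper. Takes x by value, so the caller's vector is preserved.
// Interpolates linearly between the order statistics around position n*p + 0.5.
double quantile_type5(std::vector<double> x, double p) {
    const double n = static_cast<double>(x.size());
    // 1-based fractional position, clamped to [1, n] (assumed edge policy).
    const double h = std::max(1.0, std::min(n * p + 0.5, n));
    const std::size_t lo = static_cast<std::size_t>(std::floor(h));
    const double frac = h - static_cast<double>(lo);

    // After this call x[lo - 1] holds the lo-th smallest value and every
    // element after it is at least as large.
    std::nth_element(x.begin(), x.begin() + (lo - 1), x.end());
    const double lower = x[lo - 1];
    if (frac == 0.0) {
        return lower;  // the position falls exactly on an observation
    }
    // The next order statistic is the minimum of the remaining tail.
    // frac > 0 implies h < n, so the tail is non-empty.
    const double upper = *std::min_element(x.begin() + lo, x.end());
    return lower + frac * (upper - lower);
}
```

With p = 0.5 and the values 4, 1, 3, 2, the position is 2.5, so the result is halfway between the second and third smallest values, i.e. 2.5.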

how to vectorize this program

The program below (well, the lines after "from here") is a construct I have to use a lot.
I was wondering whether it is possible (possibly using functions from the Eigen library)
to vectorize or otherwise make this program run faster.
Essentially, given a vector of float x, this construct has to recover the indexes
of the sorted elements of x in an int vector SIndex. For example, if the first
entry of SIndex is 10, it means that the element of x at index 10 was the smallest element
of x.
#include <algorithm>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
using std::vector;
using namespace std;
typedef pair<int, float> sortData;
bool sortDataLess(const sortData& left, const sortData& right){
return left.second<right.second;
}
int main(){
int n=20,i;
float LO=-1.0,HI=1.0;
srand (time(NULL));
vector<float> x(n);
vector<float> y(n);
vector<int> SIndex(n);
vector<sortData> foo(n);
for(i=0;i<n;i++) x[i]=LO+(float)rand()/((float)RAND_MAX/(HI-LO));
//from here:
for(i=0;i<n;i++) foo[i]=sortData(i,x[i]);
sort(foo.begin(),foo.end(),sortDataLess);
for(i=0;i<n;i++){
sortData bar=foo[i];
y[i]=x[bar.first];
SIndex[i]=bar.first;
}
for(i=0;i<n;i++) std::cout << SIndex[i] << std::endl;
return 0;
}
There's no getting around the fact that this is a sorting problem, and vectorization doesn't necessarily improve sorts very much. For example, the partition step of quicksort can do the comparison in parallel, but it then needs to select and store the 0–n values that passed the comparison. This can absolutely be done, but it starts throwing out the advantages you get from vectorization—you need to convert from a comparison mask to a shuffle mask, which is probably a lookup table (bad), and you need a variable-sized store, which means no alignment (bad, although maybe not that bad). Mergesort needs to merge two sorted lists, which in some cases could be improved by vectorization, but in the worst case (I think) needs the same number of steps as the scalar case.
And, of course, there's a decent chance that any major speed boost you get from vectorization will have already been done inside your standard library's std::sort implementation. To get it, though, you'd need to be sorting primitive types with the default comparison operator.
If you're worried about performance, you can easily avoid the last loop, though. Just sort a list of indices using your float array as a comparison:
struct IndirectLess {
template <typename T>
IndirectLess(T iter) : values(&*iter) {}
bool operator()(int left, int right) const
{
return values[left] < values[right];
}
float const* values;
};
int main() {
// ...
std::vector<int> SIndex;
SIndex.reserve(n);
for (int i = 0; i < n; ++i)
SIndex.push_back(i); // indices 0..n-1
std::sort(SIndex.begin(), SIndex.end(), IndirectLess(x.begin()));
// ...
}
Now you've only produced your list of sorted indices. You have the potential to lose some cache locality, so for really big lists it might be slower. At that point it might be possible to vectorize your last loop, depending on the architecture. It's just data manipulation, though—read four values, store 1st and 3rd in one place and 2nd and 4th in another—so I wouldn't expect Eigen to help much at that point.
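With C++11 the same indirect sort can be written with std::iota and a lambda instead of a hand-written functor. The following is a sketch; the helper name argsort is borrowed from NumPy for familiarity and is not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices that would sort x in ascending
// order, so result[0] is the index of the smallest element of x.
std::vector<int> argsort(const std::vector<float>& x) {
    std::vector<int> idx(x.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
    // Sort the indices, comparing the values they refer to.
    std::sort(idx.begin(), idx.end(), [&x](int left, int right) {
        return x[static_cast<std::size_t>(left)] <
               x[static_cast<std::size_t>(right)];
    });
    return idx;
}
```

If the sorted values are also needed, they can be gathered afterwards with y[i] = x[idx[i]]. Note that std::sort is not stable, so the relative order of equal values is unspecified; std::stable_sort could be used if that matters.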