Ignore NaNs in mean and other stats functions using Armadillo

If a matrix contains NaN values, Armadillo returns NaN for any statistic computed over a column or row that contains one. For example, the following code
arma::mat A = {{1, 2, 3}, {6, 7, 8}, {4, 9, 10}};
A(1,1) = arma::datum::nan;
std::cout << A << "\n";
std::cout << arma::mean(A) << "\n" << arma::mean(A, 1);
will return
1.0000 2.0000 3.0000
6.0000 nan 8.0000
4.0000 9.0000 10.0000
3.6667 nan 7.0000
2.0000
nan
7.6667
Is there an efficient way to ignore the NaN values, much like MATLAB's nanmean() / mean(-, 'omitnan')?
In the example above, the mean of the second column would then be 5.5 and the mean of the second row 7, instead of NaN.
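As a rough illustration of the behaviour being asked for, here is a minimal sketch in plain standard C++ (not Armadillo). The function name nan_mean is hypothetical; it only shows the idea of skipping NaN entries before averaging, and returning NaN when nothing usable is left is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: mean of the non-NaN entries of a vector.
// Returns NaN when every entry is NaN (or the vector is empty).
double nan_mean(const std::vector<double>& values) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double v : values) {
        if (!std::isnan(v)) {  // skip NaN entries entirely
            sum += v;
            ++count;
        }
    }
    if (count == 0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return sum / static_cast<double>(count);
}
```

Applied to the second column of the example, {2, NaN, 9}, this gives 5.5, and applied to the second row, {6, NaN, 8}, it gives 7. Within Armadillo, a similar filter could presumably be built per column or row from find_finite() and .elem(), at the cost of one temporary vector per column or row.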

Related

replacing matrix of indices with corresponding vector with armadillo

I have an arma::umat matrix containing indices corresponding to an arma::vec vector containing either 1 or -1:
arma::umat A = { {8,9,7,10,6}, {5,3,1,2,4}};
arma::vec v = {-1, 1, 1, 1, -1, -1, 1, -1, -1 ,1};
I would like to replace each element in the matrix with the corresponding value in the vector, so the output looks like this:
A = {{-1,-1,1,1,-1},{-1,1,-1,1,1}}
Any suggestions?
Thanks

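Saving the result into A is not an option, since A contains unsigned integers and your v vector holds doubles. Create an arma::mat to hold the result and loop over the rows, indexing v with each one. One way to do this is the .each_row member function. Note that the indices in A below are zero-based (each is one less than in your question), because Armadillo indexing starts at 0.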
#include <armadillo>
int main(int argc, char *argv[]) {
    arma::umat A = {{7, 8, 6, 9, 5}, {4, 2, 0, 1, 3}};
    arma::vec v = {-1, 1, 1, 1, -1, -1, 1, -1, -1, 1};
    arma::mat result(A.n_rows, A.n_cols);
    auto lineIdx = 0u;
    // We capture everything by reference and increase the line index after usage.
    // The `.st()` is necessary because the result of indexing `v` is
    // a "column vector" and we need a "row vector".
    A.each_row([&](auto row) { result.row(lineIdx++) = v(row).st(); });
    result.print("result");
    return 0;
}
This code prints
result
-1.0000 -1.0000 1.0000 1.0000 -1.0000
-1.0000 1.0000 -1.0000 1.0000 1.0000

Scatter/Gather like Numpy in ArrayFire

I want to scatter and gather elements from an array X at specific indices along one axis.
So given an array of indices idx, I want to select the idx(0)th element along the 0th column, the idx(1)th element along the 1st column, etc.
In Numpy, the following statement:
X = np.array([[1, 2, 3], [4, 5, 6]])
print(X[[0, 1, 1], range(3)])
prints [1, 5, 6].
Furthermore, I can do this process in reverse:
Y = np.zeros((2, 3))
Y[[0, 1, 1], range(3)] = [1, 5, 6]
print(Y)
This will print
[[1. 0. 0.]
[0. 5. 6.]]
However, when I try to replicate this behavior in ArrayFire:
float elements[] = {1, 2, 3, 4, 5, 6};
af::array X = af::array(3, 2, elements);
int idx_elements[] = {0, 1, 1};
af::array idx = af::array(3, idx_elements);
af::print("", X(af::span, idx));
I get an array of shape [3, 3, 1, 1] with the elements
1.0000 4.0000 4.0000
2.0000 5.0000 5.0000
3.0000 6.0000 6.0000
So how can I achieve the desired numpy-like behavior for scattering and gathering elements in ArrayFire?
To perform the gather operation on a matrix, I can extract the diagonal of the resulting matrix but that may not work in the multidimensional case and it doesn't work in the other (scatter) direction.

X
[3 2 1 1]
1.0000 4.0000
2.0000 5.0000
3.0000 6.0000
idx
[3 1 1 1]
0
1
1
ArrayFire takes the Cartesian product of the index sets when an af::array is used as an index, which explains the output.
The table below shows the index pairs that this produces.
Col\Row    0        1        1       <- from array
   0     (0, 0)   (0, 1)   (0, 1)
   1     (1, 0)   (1, 1)   (1, 1)
   2     (2, 0)   (2, 1)   (2, 1)
   ^
   ^ from sequence
Thus, the output of X(af::span, idx) is a 3x3 matrix.
To gather elements based on coordinates, you need a different function,
approx2. Note that this function takes its indices as floating point arrays only.
float idx_elements[] = {0, 1, 1}; // changed the idx to floats
af::array colIdx = af::array(3, idx_elements);
af::array rowIdx = af::iota(3); // same effect as span
af::array out = approx2(X, rowIdx, colIdx);
af_print(out);
// out
// [3 1 1 1]
// 1.0000
// 5.0000
// 6.0000
To set the values at the given indices, you have to flatten the array, for the same reason:
array::operator() takes the Cartesian product when an af::array is involved.
af::array A = af::constant(0, 3, 2); // same size as X
af::array B = af::flat(A); // flatten the array, this involves meta data modification only
B(rowIdx + 3 * colIdx) = out; // use row & col indices to fetch linear indices
// rowIdx + 3 * colIdx
// [3 1 1 1]
// 0.0000
// 4.0000
// 5.0000
B = moddims(B, A.dims()); // reset the dimensions to original A dims
af_print(B);
// B
// [3 2 1 1]
// 1.0000 0.0000
// 0.0000 5.0000
// 0.0000 6.0000
You can find more details in our indexing tutorial.
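The linear-index arithmetic in the answer (rowIdx + 3 * colIdx) relies on column-major storage. The following is a minimal sketch of that idea in plain standard C++ rather than ArrayFire; the function names gather and scatter and the use of std::vector as a flat column-major buffer are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pick data(rows[k], cols[k]) from a column-major
// matrix that has n_rows rows, one element per coordinate pair.
std::vector<float> gather(const std::vector<float>& data, std::size_t n_rows,
                          const std::vector<std::size_t>& rows,
                          const std::vector<std::size_t>& cols) {
    std::vector<float> out(rows.size());
    for (std::size_t k = 0; k < rows.size(); ++k) {
        // Column-major linear index: row + n_rows * col.
        out[k] = data[rows[k] + n_rows * cols[k]];
    }
    return out;
}

// Hypothetical helper: write values[k] to data(rows[k], cols[k]).
void scatter(std::vector<float>& data, std::size_t n_rows,
             const std::vector<std::size_t>& rows,
             const std::vector<std::size_t>& cols,
             const std::vector<float>& values) {
    for (std::size_t k = 0; k < rows.size(); ++k) {
        data[rows[k] + n_rows * cols[k]] = values[k];
    }
}
```

With the 3x2 matrix X stored as {1, 2, 3, 4, 5, 6}, rows {0, 1, 2} and columns {0, 1, 1}, gather picks the elements at linear indices 0, 4 and 5, which are the same indices the answer prints for rowIdx + 3 * colIdx.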

Arrayfire sparse matrix issues

Getting confused with something that should be simple. Spent a bit of time trying to debug this and am not getting too far. Would appreciate if someone could help me out.
I am trying to define a sparse matrix in arrayfire by specifying the value/column/row triples as specified in this function. I want to store the following matrix as sparse:
3 3 4
3 10 0
4 0 3
I code it up as follows:
int row[] = {0,0,0,1,1,2,2};
int col[] = {0,1,2,0,1,0,2};
double values[] = { 3,3, 4,3,10,4,3};
array rr = sparse(3,3,array(7,values),array(7,row),array(7,col));
af_print(rr);
af_print(dense(rr));
I get the following output:
rr
Storage Format : AF_STORAGE_CSR
[3 3 1 1]
rr: Values
[7 1 1 1]
3.0000
3.0000
4.0000
3.0000
10.0000
4.0000
3.0000
rr: RowIdx
[7 1 1 1]
0
0
0
1
1
2
2
rr: ColIdx
[7 1 1 1]
0
1
2
0
1
0
2
dense(rr)
[3 3 1 1]
0.0000 0.0000 0.0000
0.0000 0.0000 3.0000
3.0000 0.0000 0.0000
When I print the stored matrix in dense format, I get something completely different from what I intended.
How do I make the output of printing the dense version of rr give:
3 3 4
3 10 0
4 0 3

ArrayFire uses (a modified) CSR format, so the row array has to be of length number_of_rows + 1. Normally it would be filled with the number of non-zero entries per row, i.e. {0, 3, 2, 2}. But for ArrayFire, you need to take the cumulative sum, i.e. {0, 3, 5, 7}. So this works for me:
int row[] = {0,3,5,7};
int col[] = {0,1,2,0,1,0,2};
float values[] = {3,3,4,3,10,4,3};
array rr = sparse(3,3,array(7,values),array(4,row),array(7,col));
af_print(rr);
af_print(dense(rr));
However, this is not really convenient, since it is quite different from your input format. As an alternative, you could specify the COO format:
int row[] = {0,0,0,1,1,2,2};
int col[] = {0,1,2,0,1,0,2};
float values[] = { 3,3, 4,3,10,4,3};
array rr = sparse(3,3,array(7,values),array(7,row),array(7,col), AF_STORAGE_COO);
af_print(rr);
af_print(dense(rr));
which produces:
rr
Storage Format : AF_STORAGE_COO
[3 3 1 1]
rr: Values
[7 1 1 1]
3.0000
3.0000
4.0000
3.0000
10.0000
4.0000
3.0000
rr: RowIdx
[7 1 1 1]
0
0
0
1
1
2
2
rr: ColIdx
[7 1 1 1]
0
1
2
0
1
0
2
dense(rr)
[3 3 1 1]
3.0000 3.0000 4.0000
3.0000 10.0000 0.0000
4.0000 0.0000 3.0000
See also https://github.com/arrayfire/arrayfire/issues/2134.
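The step from per-entry row indices to the cumulative row array described in the answer can be sketched as follows. This is plain standard C++, not an ArrayFire API; the function name csr_row_offsets is hypothetical, and the sketch assumes the entries are already ordered by row, as they are in the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: turn COO-style row indices (one per non-zero entry)
// into a CSR row array of length n_rows + 1 holding cumulative counts.
std::vector<int> csr_row_offsets(const std::vector<int>& coo_rows, int n_rows) {
    std::vector<int> offsets(static_cast<std::size_t>(n_rows) + 1, 0);
    // First pass: count the non-zero entries in each row, shifted by one
    // slot so that offsets[0] stays 0.
    for (int r : coo_rows) {
        ++offsets[static_cast<std::size_t>(r) + 1];
    }
    // Second pass: cumulative sum, so offsets[i] is where row i starts.
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
        offsets[i + 1] += offsets[i];
    }
    return offsets;
}
```

For the row indices {0, 0, 0, 1, 1, 2, 2} and 3 rows, the counts are {0, 3, 2, 2} and the cumulative sum is {0, 3, 5, 7}, which is the row array used in the working CSR example.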

How does arma::find_unique determine unique indices?

I am using arma::find_unique, and I thought it returned the index of the first occurrence of each unique value in a vector, but it appears to return something else.
Here is a toy function:
// [[Rcpp::export]]
arma::uvec test(arma::vec& x_) {
    arma::vec x = arma::sort(x_);
    return arma::find_unique(x);
}
If I run the function in R with a simple vector test(5:1) I get a vector of all the indices 0,1,2,3,4 which makes sense since each value is unique.
If I try something like:
set.seed(1991)
var=sample(1:8,20,TRUE)
test(var)
OUTPUT:
1,3,6,7,10,12,14,18.
All those values make sense except the first one. Why is the first unique value at index 1 and not 0? Clearly I am misunderstanding what arma::find_unique intends to do so I would appreciate if someone could enlighten me.

Okay, the following is courtesy of @nrussell, the man is amazing, and was given in the comments to this "answer." (I do not deserve the check mark nor upvotes.)
Actually, I'm pretty sure this is all just a misinterpretation of the Armadillo documentation, which never actually guarantees that a stable sort is used, as @Carl was expecting. Underneath, std::sort is being called, which is not guaranteed to be a stable sort by the C++ standard; also stated here:
"The order of equal elements is not guaranteed to be preserved."
I can demonstrate this here, replicating the "packet" structure used in Armadillo's algorithm. My guess is that libc++ (typically used by OS X) does implement std::sort as a stable sort, while libstdc++ does not.
My turn: The stable sort, or maintaining the relative order of records with equal keys (i.e. values), is the key issue behind this question. For example, consider the following:
dog car pool dig
Sorting by the first letter with a stable sort gives us:
car dog dig pool
Because the word "dog" appeared prior to "dig" in the vector, it therefore must appear before "dig" in the output.
Sorting by the first letter with an unstable sort gives us either:
car dig dog pool
or
car dog dig pool
The same principle applies to numbers, since each key may literally be present elsewhere in the vector. So, we have:
2, 3, 2, 4
Thus, when the unique values are found:
2, 3, 4
The 2 can be reported at either index 0 or index 2.
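To make the difference concrete, here is a minimal sketch in plain standard C++ of a first-occurrence lookup built on std::stable_sort. It is not Armadillo's implementation; the function name first_unique_indices is hypothetical, and the result is ordered by value rather than by index.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: for each distinct value in x, return the index of
// its first occurrence. Results are ordered by ascending value.
std::vector<std::size_t> first_unique_indices(const std::vector<double>& x) {
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // A stable sort keeps equal values in their original relative order,
    // so the first index in each run of equal values is the first occurrence.
    std::stable_sort(order.begin(), order.end(),
                     [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });
    std::vector<std::size_t> result;
    for (std::size_t k = 0; k < order.size(); ++k) {
        if (k == 0 || x[order[k]] != x[order[k - 1]]) {
            result.push_back(order[k]);
        }
    }
    return result;
}
```

For the vector 2, 3, 2, 4 this reports the 2 at index 0, never at index 2. Replacing std::stable_sort with std::sort would drop that guarantee, which is the situation described above.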
As @nrussell explained, macOS since OS X Mavericks (10.9) defaults to --stdlib=libc++ rather than the traditional --stdlib=libstdc++ flag when compiling. This was likely the reason why I was unable to replicate it, as one implementation opts for stability while the other does not.
Original Answer
First, I'm not able to replicate this on macOS... (See end)
It seems we are able to reproduce this on Linux, though (@nrussell), which means that at some point there is an issue in the linked code.
Secondly, arma::find_unique is implemented here using matrix ops with op_find_unique. The latter is the key, as it implements the comparators.
Thus, in short, this should not be possible, given that you sort the vector and the first item is always considered to be unique.
Test function
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec test(arma::vec& x_) {
    Rcpp::Rcout << "Input:" << x_.t() << std::endl;
    arma::vec x = arma::sort(x_);
    Rcpp::Rcout << "Sorted:" << x.t() << std::endl;
    arma::uvec o = arma::find_unique(x);
    Rcpp::Rcout << "Indices:" << o.t() << std::endl;
    return o;
}
/*** R
set.seed(1991)
(v=sample(1:8,20,TRUE))
## [1] 2 2 1 5 7 6 7 6 4 1 5 3 1 4 4 2 8 7 7 8
sort(v)
## [1] 1 1 1 2 2 2 3 4 4 4 5 5 6 6 7 7 7 7 8 8
test(v)
### Received
## 2.0000 2.0000 1.0000 5.0000 7.0000 6.0000 7.0000 6.0000 4.0000 1.0000 5.0000 3.0000 1.0000 4.0000 4.0000 2.0000 8.0000 7.0000 7.0000 8.0000
### Sorted
## 1.0000 1.0000 1.0000 2.0000 2.0000 2.0000 3.0000 4.0000 4.0000 4.0000 5.0000 5.0000 6.0000 6.0000 7.0000 7.0000 7.0000 7.0000 8.0000 8.0000
### Output
## 0 3 6 7 10 12 14 18
*/

Complex Number Matrix multiplication Eigen vs Matlab

Can someone explain to me why the results are different.
Code in C++:
MatrixXcd testTest;
testTest.resize(3,3);
testTest.real()(0,0) = 1;
testTest.real()(0,1) = 2;
testTest.real()(0,2) = 3;
testTest.real()(1,0) = 1;
testTest.real()(1,1) = 2;
testTest.real()(1,2) = 3;
testTest.real()(2,0) = 1;
testTest.real()(2,1) = 2;
testTest.real()(2,2) = 3;
testTest.imag()(0,0) = 1;
testTest.imag()(0,1) = 2;
testTest.imag()(0,2) = 3;
testTest.imag()(1,0) = 1;
testTest.imag()(1,1) = 2;
testTest.imag()(1,2) = 3;
testTest.imag()(2,0) = 1;
testTest.imag()(2,1) = 2;
testTest.imag()(2,2) = 3;
cout<< endl << testTest << endl;
cout<< endl << testTest.transpose() << endl;
cout<< endl << testTest*testTest.transpose() << endl;
cout<< endl << testTest << endl;
Results from C++:
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
(1,1) (1,1) (1,1)
(2,2) (2,2) (2,2)
(3,3) (3,3) (3,3)
(0,28) (0,28) (0,28)
(0,28) (0,28) (0,28)
(0,28) (0,28) (0,28)
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
And the same thing written in Matlab:
testTest = [ complex(1,1) complex(2,2) complex(3,3);
complex(1,1) complex(2,2) complex(3,3);
complex(1,1) complex(2,2) complex(3,3)];
testTest
testTest'
testTest*testTest'
testTest
Matlab results:
testTest =
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
ans =
1.0000 - 1.0000i 1.0000 - 1.0000i 1.0000 - 1.0000i
2.0000 - 2.0000i 2.0000 - 2.0000i 2.0000 - 2.0000i
3.0000 - 3.0000i 3.0000 - 3.0000i 3.0000 - 3.0000i
ans =
28 28 28
28 28 28
28 28 28
testTest =
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
Multiplication of testTest * testTest' in C++ returns complex numbers with real part 0 and imaginary part 28. Matlab returns just doubles with value 28.

' in Matlab does the transpose and takes the complex conjugate (http://uk.mathworks.com/help/matlab/ref/ctranspose.html). If you want just the transpose, use .' (with a dot in front).
Thus, if you change your MATLAB test to
testTest*testTest.'
the results should be the same.
If you want the conjugate transpose in Eigen, you can use matrix.adjoint() (or matrix.conjugate().transpose()).
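The arithmetic behind the two results can be checked by hand for a single entry of the product. The sketch below uses only std::complex from the standard library, not Eigen; the names dot_transpose and dot_adjoint are hypothetical and each computes one entry of the product from two rows of the matrix.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// One entry of A * A^T (plain transpose): sum of a[i] * b[i].
cd dot_transpose(const std::vector<cd>& a, const std::vector<cd>& b) {
    cd sum(0.0, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// One entry of A * A^H (conjugate transpose): sum of a[i] * conj(b[i]).
cd dot_adjoint(const std::vector<cd>& a, const std::vector<cd>& b) {
    cd sum(0.0, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * std::conj(b[i]);
    }
    return sum;
}
```

For the row (1+1i, 2+2i, 3+3i), the plain transpose gives 2i + 8i + 18i = 28i, matching the (0,28) entries printed by the C++ code, while the conjugate transpose gives 2 + 8 + 18 = 28, matching the Matlab result.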
