I'm getting confused by something that should be simple. I've spent some time trying to debug this without getting far, and would appreciate some help.
I am trying to define a sparse matrix in ArrayFire by specifying the value/column/row triples as specified in this function. I want to store the following matrix as sparse:
3 3 4
3 10 0
4 0 3
I code it up as follows:
int row[] = {0,0,0,1,1,2,2};
int col[] = {0,1,2,0,1,0,2};
double values[] = { 3,3, 4,3,10,4,3};
array rr = sparse(3,3,array(7,values),array(7,row),array(7,col));
af_print(rr);
af_print(dense(rr));
I get the following output:
rr
Storage Format : AF_STORAGE_CSR
[3 3 1 1]
rr: Values
[7 1 1 1]
1.0000
2.0000
4.0000
3.0000
10.0000
4.0000
3.0000
rr: RowIdx
[7 1 1 1]
0
0
0
1
1
2
2
rr: ColIdx
[7 1 1 1]
0
1
2
0
1
0
2
dense(rr)
[3 3 1 1]
0.0000 0.0000 0.0000
0.0000 0.0000 3.0000
3.0000 0.0000 0.0000
When I print the stored matrix in dense format, I get something completely different from what I intended.
How do I make the output of printing the dense version of rr give:
3 3 4
3 10 0
4 0 3
By default, ArrayFire stores sparse matrices in CSR format, so the row array has to have length number_of_rows + 1. It does not hold one row index per entry. Counting the non-zero entries per row gives {3, 2, 2}; the row array is the cumulative sum of those counts with a leading zero, i.e. {0, 3, 5, 7}, so entry i marks where row i starts in the values and column arrays. So this works for me:
int row[] = {0,3,5,7};
int col[] = {0,1,2,0,1,0,2};
float values[] = {3,3,4,3,10,4,3};
array rr = sparse(3,3,array(7,values),array(4,row),array(7,col));
af_print(rr);
af_print(dense(rr));
However, this is not really convenient, since it is quite different from your input format. As an alternative, you could specify the COO format:
int row[] = {0,0,0,1,1,2,2};
int col[] = {0,1,2,0,1,0,2};
float values[] = { 3,3, 4,3,10,4,3};
array rr = sparse(3,3,array(7,values),array(7,row),array(7,col), AF_STORAGE_COO);
af_print(rr);
af_print(dense(rr));
which produces:
rr
Storage Format : AF_STORAGE_COO
[3 3 1 1]
rr: Values
[7 1 1 1]
3.0000
3.0000
4.0000
3.0000
10.0000
4.0000
3.0000
rr: RowIdx
[7 1 1 1]
0
0
0
1
1
2
2
rr: ColIdx
[7 1 1 1]
0
1
2
0
1
0
2
dense(rr)
[3 3 1 1]
3.0000 3.0000 4.0000
3.0000 10.0000 0.0000
4.0000 0.0000 3.0000
See also https://github.com/arrayfire/arrayfire/issues/2134.
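The step from per-entry row indices to the cumulative row array is small enough to sketch. The snippet below is plain Python rather than ArrayFire code, and the helper name coo_rows_to_csr_rowptr is made up for illustration. It only shows how the row indices from the question become the row array that the CSR form expects, assuming the entries are already ordered by row.

```python
from itertools import accumulate

def coo_rows_to_csr_rowptr(rows, n_rows):
    """Turn per-entry row indices into a CSR row-pointer array."""
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1          # number of non-zeros in each row
    # Prefix sum with a leading 0: entry i is the offset where row i starts.
    return [0] + list(accumulate(counts))

row = [0, 0, 0, 1, 1, 2, 2]      # row indices from the question
row_ptr = coo_rows_to_csr_rowptr(row, 3)
print(row_ptr)                   # [0, 3, 5, 7]
```

The column and value arrays stay as they are; only the row array changes shape between the two layouts.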
When trying to retrieve data from an af::array (arrayfire) from the device via host(), my output data on the host is wrong (i.e. wrong values). For testing that, I wrote a small code sample (based on https://stackoverflow.com/a/29212923/2546099):
int main(void) {
    size_t vector_size = 16;
    af::array in_test_array = af::constant(1., vector_size), out_test_array = af::constant(0., vector_size);
    af_print(in_test_array);
    double *local_data_ptr = new double[vector_size]();
    for(int i = 0; i < vector_size; ++i)
        std::cout << local_data_ptr[i] << '\t';
    std::cout << '\n';
    in_test_array.host(local_data_ptr);
    for(int i = 0; i < vector_size; ++i)
        std::cout << local_data_ptr[i] << '\t';
    std::cout << '\n';
    delete[] local_data_ptr;
    out_test_array = in_test_array;
    af_print(out_test_array);
    return 0;
}
My output is
in_test_array
[16 1 1 1]
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.007813 0.007813 0.007813 0.007813 0.007813 0.007813 0.007813 0.007813 0 0 0 0 0 0 0 0
out_test_array
[16 1 1 1]
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
Why are half the values in the pointer set to 0.007813, and not all values to 1? When changing the default value for in_test_array to 2, half the values are set to 2, and for 3 those values are set to 32. Why does that happen?
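The data types of the ArrayFire array and the host buffer do not match: af::constant creates an f32 (float) array by default, while local_data_ptr points to double.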
For float use:
af::array in_test_array = af::constant(1., vector_size),
out_test_array = af::constant(0., vector_size);
float *local_data_ptr = new float[vector_size]();
For double use:
af::array in_test_array = af::constant(1., vector_size, f64),
out_test_array = af::constant(0., vector_size, f64);
double *local_data_ptr = new double[vector_size]();
In both cases above, ArrayFire returns 1.0 in the local_data_ptr buffer, although with different data types.
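The odd values in the question (0.007813 for 1, 2 for 2, 32 for 3) are what you get when pairs of 32-bit floats are read back as 64-bit doubles. Here is a small sketch of that effect using Python's struct module instead of ArrayFire. It assumes that host() copies the raw f32 bytes into the buffer without conversion and that the layout is little-endian; the helper name read_floats_as_doubles is made up for illustration.

```python
import struct

def read_floats_as_doubles(value, count):
    """Pack `count` float32 copies of `value`, then misread the bytes as float64."""
    raw = struct.pack("<%df" % count, *([value] * count))   # count * 4 bytes
    return struct.unpack("<%dd" % (count // 2), raw)        # half as many doubles

for v in (1.0, 2.0, 3.0):
    # roughly 0.0078125, 2.0 and 32.0 respectively
    print(v, read_floats_as_doubles(v, 16)[0])
```

Sixteen floats occupy only 64 bytes, so they fill just the first eight doubles of the 128-byte buffer. That would explain why the second half of the output stays at 0.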
I want to scatter and gather elements from an array X at specific indices along one axis.
So given an array of indices idx, I want to select the idx(0)th element along the 0th column, the idx(1)th element along the 1st column, etc.
In Numpy, the following statement:
X = np.array([[1, 2, 3], [4, 5, 6]])
print(X[[0, 1, 1], range(3)])
prints [1, 5, 6].
Furthermore, I can do this process in reverse:
Y = np.zeros((2, 3))
Y[[0, 1, 1], range(3)] = [1, 5, 6]
print(Y)
This will print
[[1. 0. 0.]
[0. 5. 6.]]
However, when I try to replicate this behavior in ArrayFire:
float elements[] = {1, 2, 3, 4, 5, 6};
af::array X = af::array(3, 2, elements);
int idx_elements[] = {0, 1, 1};
af::array idx = af::array(3, idx_elements);
af::print("", X(af::span, idx));
I get an array of shape [3, 3, 1, 1] with the elements
1.0000 4.0000 4.0000
2.0000 5.0000 5.0000
3.0000 6.0000 6.0000
So how can I achieve the desired numpy-like behavior for scattering and gathering elements in ArrayFire?
To perform the gather operation on a matrix, I can extract the diagonal of the resulting matrix but that may not work in the multidimensional case and it doesn't work in the other (scatter) direction.
X
[3 2 1 1]
1.0000 4.0000
2.0000 5.0000
3.0000 6.0000
idx
[3 1 1 1]
0
1
1
ArrayFire takes the Cartesian product of the indices when an af::array is involved in indexing, hence the output.
The table below shows the index pairs that are selected as a result.
Col\Row      0        1        1      <- from array
   0      (0, 0)   (0, 1)   (0, 1)
   1      (1, 0)   (1, 1)   (1, 1)
   2      (2, 0)   (2, 1)   (2, 1)
   ^
   from sequence
Thus, the output of X(af::span, idx) is a 3x3 matrix.
To gather elements based on coordinates, you would need a different function, approx2. Note that this function takes its indices as floating-point arrays only.
float idx_elements[] = {0, 1, 1}; // changed the idx to floats
af::array colIdx = af::array(3, idx_elements);
af::array rowIdx = af::iota(3); // same effect as span
af::array out = approx2(X, rowIdx, colIdx);
af_print(out);
// out
// [3 1 1 1]
// 1.0000
// 5.0000
// 6.0000
To set the values for given indices, you would have to flatten the array, for the very reason that array::operator() takes the Cartesian product when an af::array is involved.
af::array A = af::constant(0, 3, 2); // same size as X
af::array B = af::flat(A); // flatten the array, this involves meta data modification only
B(rowIdx + 3 * colIdx) = out; // use row & col indices to fetch linear indices
// rowIdx + 3 * colIdx
// [3 1 1 1]
// 0.0000
// 4.0000
// 5.0000
B = moddims(B, A.dims()); // reset the dimensions to original A dims
af_print(B);
// B
// [3 2 1 1]
// 1.0000 0.0000
// 0.0000 5.0000
// 0.0000 6.0000
You can find more details in our indexing tutorial.
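The linear-index arithmetic behind the scatter step (rowIdx + 3 * colIdx) can be checked without ArrayFire. The following is a plain-Python sketch, assuming column-major storage as in the X printed above; the helper name linear_indices is made up for illustration.

```python
def linear_indices(row_idx, col_idx, n_rows):
    """Column-major offsets of the (row, col) pairs, as in rowIdx + 3 * colIdx."""
    return [r + n_rows * c for r, c in zip(row_idx, col_idx)]

n_rows, n_cols = 3, 2
X = [1, 2, 3, 4, 5, 6]           # 3x2 matrix stored column by column
row_idx = [0, 1, 2]              # plays the role of af::iota(3)
col_idx = [0, 1, 1]

lin = linear_indices(row_idx, col_idx, n_rows)    # [0, 4, 5]
gathered = [X[i] for i in lin]                    # gather: [1, 5, 6]

B = [0] * (n_rows * n_cols)      # flattened zero matrix, same size as X
for i, v in zip(lin, gathered):
    B[i] = v                     # scatter back through the same offsets
print(lin, gathered, B)          # [0, 4, 5] [1, 5, 6] [1, 0, 0, 0, 5, 6]
```

Read column by column, B is the same 3x2 result that the ArrayFire code prints after moddims.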
I have a 3 dimensional numpy array, (z, x, y). z is a time dimension and x and y are coordinates.
I want to convert this to a multiindexed pandas.DataFrame. I want the row index to be the z dimension
and each column to have values from a unique x, y coordinate (and so, each column would be multi-indexed).
The simplest case (not multi-indexed):
>>> array.shape
(500L, 120L, 100L)
>>> df = pd.DataFrame(array[:,0,0])
>>> df.shape
(500, 1)
I've been trying to pass the whole array into a multiindex dataframe using pd.MultiIndex.from_arrays but I'm getting an error:
NotImplementedError: > 1 ndim Categorical are not supported at this time
It looks like it should be fairly simple, but I can't figure it out.
I find that a Series with a MultiIndex is the most analogous pandas datatype for a numpy array with arbitrarily many dimensions (presumably 3 or more).
Here is some example code:
import pandas as pd
import numpy as np
time_vals = np.linspace(1, 50, 50)
x_vals = np.linspace(-5, 6, 12)
y_vals = np.linspace(-4, 5, 10)
measurements = np.random.rand(50,12,10)
#setup multiindex
mi = pd.MultiIndex.from_product([time_vals, x_vals, y_vals], names=['time', 'x', 'y'])
#connect multiindex to data and save as multiindexed Series
sr_multi = pd.Series(index=mi, data=measurements.flatten())
#pull out a dataframe of x, y at time=22
sr_multi.xs(22, level='time').unstack(level=0)
#pull out a dataframe of y, time at x=3
sr_multi.xs(3, level='x').unstack(level=1)
I think you can use a Panel, and then call to_frame to get a MultiIndex DataFrame:
np.random.seed(10)
arr = np.random.randint(10, size=(5,3,2))
print (arr)
[[[9 4]
[0 1]
[9 0]]
[[1 8]
[9 0]
[8 6]]
[[4 3]
[0 4]
[6 8]]
[[1 8]
[4 1]
[3 6]]
[[5 3]
[9 6]
[9 1]]]
df = pd.Panel(arr).to_frame()
print (df)
             0  1  2  3  4
major minor
0     0      9  1  4  1  5
      1      4  8  3  8  3
1     0      0  9  0  4  9
      1      1  0  4  1  6
2     0      9  8  6  3  9
      1      0  6  8  6  1
Also transpose can be useful:
df = pd.Panel(arr).transpose(1,2,0).to_frame()
print (df)
             0  1  2
major minor
0     0      9  0  9
      1      1  9  8
      2      4  0  6
      3      1  4  3
      4      5  9  9
1     0      4  1  0
      1      8  0  6
      2      3  4  8
      3      8  1  6
      4      3  6  1
Another possible solution with concat:
arr = arr.transpose(1,2,0)
df = pd.concat([pd.DataFrame(x) for x in arr], keys=np.arange(arr.shape[0]))
print (df)
     0  1  2  3  4
0 0  9  1  4  1  5
  1  4  8  3  8  3
1 0  0  9  0  4  9
  1  1  0  4  1  6
2 0  9  8  6  3  9
  1  0  6  8  6  1
np.random.seed(10)
arr = np.random.randint(10, size=(500,120,100))
df = pd.Panel(arr).transpose(2,0,1).to_frame()
print (df.shape)
(60000, 100)
print (df.index.max())
(499, 119)
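Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel-based snippets above only run on older versions. A possible alternative on current pandas, sketched here under the assumption that the layout asked for in the question is wanted (z as the row index, one column per (x, y) pair), is to reshape the array and build the column MultiIndex with from_product. The axis names "x", "y" and "z" are chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
arr = np.random.randint(10, size=(5, 3, 2))     # axes are (z, x, y)
n_z, n_x, n_y = arr.shape

# One column per (x, y) pair; from_product enumerates the pairs in the same
# order in which a C-order reshape flattens the last two axes.
columns = pd.MultiIndex.from_product([range(n_x), range(n_y)], names=["x", "y"])
df = pd.DataFrame(arr.reshape(n_z, n_x * n_y), columns=columns)
df.index.name = "z"

print(df.shape)    # (5, 6)
```

With this layout, df[(1, 0)] is the time series at x=1, y=0, and df.xs(1, axis=1, level="x") selects every y column for x=1.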
I am using arma::find_unique, and I thought it returned the index of the first occurrence of each unique value in a vector, but it appears to return something else.
Here is a toy function:
// [[Rcpp::export]]
arma::uvec test(arma::vec& x_) {
    vec x=arma::sort(x_);
    return arma::find_unique(x);
}
If I run the function in R with a simple vector test(5:1) I get a vector of all the indices 0,1,2,3,4 which makes sense since each value is unique.
If I try something like:
set.seed(1991)
var=sample(1:8,20,TRUE)
test(var)
OUTPUT:
1,3,6,7,19,12,14,18.
All those values make sense except the first one. Why is the first unique value at index 1 and not 0? Clearly I am misunderstanding what arma::find_unique intends to do so I would appreciate if someone could enlighten me.
EDIT
My session information
Okay, the following is courtesy of @nrussell, the man is amazing, and was given in the comments to this "answer." (I do not deserve the check mark nor upvotes.)
Actually, I'm pretty sure this is all just a misinterpretation of the Armadillo documentation, which never actually guarantees that a stable sort is used, as @Carl was expecting. Underneath, std::sort is being called, which is not guaranteed to be a stable sort by the C++ standard; also stated here:
"The order of equal elements is not guaranteed to be preserved."
I can demonstrate this here, replicating the "packet" structure used in Armadillo's algorithm. My guess is that libc++ (typically used by OS X) does implement std::sort as a stable sort, while libstdc++ does not.
My turn: The stable sort, or maintaining the relative order of records with equal keys (i.e. values), is the key issue behind this question. For example, consider the following:
dog car pool dig
Sorting by the first letter with a stable sort gives us:
car dog dig pool
Because the word "dog" appeared prior to "dig" in the vector, it therefore must appear before "dig" in the output.
Sorting by the first letter with an unstable sort gives us:
car dig dog pool
or
car dog dig pool
The principle is relevant to numbers as well, since a duplicated key is literally present elsewhere in the vector. So, we have:
2, 3, 2, 4
Thus, when the unique values are found:
2, 3, 4
The 2 can take index either 0 or 2.
As @nrussell explained, macOS since OS X Mavericks (10.9) relies by default on --stdlib=libc++ vs. the traditional --stdlib=libstdc++ flag for compiling. This was likely the reason why I was unable to replicate it, as one implementation opts for stability while the other does not.
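To make the 2, 3, 2, 4 example concrete, here is a small Python sketch of the idea. It is not Armadillo's actual implementation: it only mimics the "packet" approach of sorting (value, original index) pairs by value and keeping the first packet of each run. Python's sorted() is always stable, so the unstable outcome is simulated by deliberately reversing the order of ties; the helper name first_of_each_run is made up for illustration.

```python
def first_of_each_run(packets):
    """Return the original index carried by the first packet of every run of equal values."""
    out, prev = [], object()      # sentinel that compares unequal to any value
    for value, index in packets:
        if value != prev:
            out.append(index)
            prev = value
    return out

values = [2, 3, 2, 4]
packets = [(v, i) for i, v in enumerate(values)]

# Stable sort: ties keep their original relative order.
stable = sorted(packets, key=lambda p: p[0])
# Another valid ordering that an unstable sort is allowed to produce:
# here the ties come out reversed.
other = sorted(packets, key=lambda p: (p[0], -p[1]))

print(first_of_each_run(stable))   # [0, 1, 3]
print(first_of_each_run(other))    # [2, 1, 3]
```

Both results point at valid positions of the unique values 2, 3 and 4; they differ only in which of the two 2s is reported.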
Original Answer
First, I'm not able to replicate this on macOS... (See end)
It seems as if we are able to repro this on Linux though (@nrussell), which means that at some point there is an issue in the linked code.
Secondly, arma::find_unique is implemented here using matrix ops with op_find_unique. The latter is the key, as it implements the comparators.
Thus, in short, there should be no way this is possible, given that you sort the vector and the first item is always considered to be unique.
Test function
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec test(arma::vec& x_) {
    Rcpp::Rcout << "Input:" << x_.t() << std::endl;
    arma::vec x = arma::sort(x_);
    Rcpp::Rcout << "Sorted:" << x.t() << std::endl;
    arma::uvec o = arma::find_unique(x);
    Rcpp::Rcout << "Indices:" << o.t() << std::endl;
    return o;
}
/*** R
set.seed(1991)
(v=sample(1:8,20,TRUE))
## [1] 2 2 1 5 7 6 7 6 4 1 5 3 1 4 4 2 8 7 7 8
sort(v)
## [1] 1 1 1 2 2 2 3 4 4 4 5 5 6 6 7 7 7 7 8 8
test(v)
### Received
## 2.0000 2.0000 1.0000 5.0000 7.0000 6.0000 7.0000 6.0000 4.0000 1.0000 5.0000 3.0000 1.0000 4.0000 4.0000 2.0000 8.0000 7.0000 7.0000 8.0000
### Sorted
## 1.0000 1.0000 1.0000 2.0000 2.0000 2.0000 3.0000 4.0000 4.0000 4.0000 5.0000 5.0000 6.0000 6.0000 7.0000 7.0000 7.0000 7.0000 8.0000 8.0000
### Output
## 0 3 6 7 10 12 14 18
*/
Can someone explain to me why the results are different?
Code in C++:
MatrixXcd testTest;
testTest.resize(3,3);
testTest.real()(0,0) = 1;
testTest.real()(0,1) = 2;
testTest.real()(0,2) = 3;
testTest.real()(1,0) = 1;
testTest.real()(1,1) = 2;
testTest.real()(1,2) = 3;
testTest.real()(2,0) = 1;
testTest.real()(2,1) = 2;
testTest.real()(2,2) = 3;
testTest.imag()(0,0) = 1;
testTest.imag()(0,1) = 2;
testTest.imag()(0,2) = 3;
testTest.imag()(1,0) = 1;
testTest.imag()(1,1) = 2;
testTest.imag()(1,2) = 3;
testTest.imag()(2,0) = 1;
testTest.imag()(2,1) = 2;
testTest.imag()(2,2) = 3;
cout<< endl << testTest << endl;
cout<< endl << testTest.transpose() << endl;
cout<< endl << testTest*testTest.transpose() << endl;
cout<< endl << testTest << endl;
Results from C++:
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
(1,1) (1,1) (1,1)
(2,2) (2,2) (2,2)
(3,3) (3,3) (3,3)
(0,28) (0,28) (0,28)
(0,28) (0,28) (0,28)
(0,28) (0,28) (0,28)
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
(1,1) (2,2) (3,3)
And the same thing written in Matlab:
testTest = [ complex(1,1) complex(2,2) complex(3,3);
complex(1,1) complex(2,2) complex(3,3);
complex(1,1) complex(2,2) complex(3,3)];
testTest
testTest'
testTest*testTest'
testTest
Matlab results:
testTest =
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
ans =
1.0000 - 1.0000i 1.0000 - 1.0000i 1.0000 - 1.0000i
2.0000 - 2.0000i 2.0000 - 2.0000i 2.0000 - 2.0000i
3.0000 - 3.0000i 3.0000 - 3.0000i 3.0000 - 3.0000i
ans =
28 28 28
28 28 28
28 28 28
testTest =
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
Multiplication of testTest * testTest' in C++ returns complex numbers with real part 0 and imaginary part 28. Matlab returns just doubles with value 28.
' in Matlab does the transpose and takes the complex conjugate (http://uk.mathworks.com/help/matlab/ref/ctranspose.html). If you want to just do the transpose, use .' (with a dot in front).
Thus, if you change your MATLAB test to
testTest*testTest.'
the results should be the same.
If you want the conjugate transpose in Eigen, use matrix.adjoint() (or matrix.conjugate().transpose()).
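The difference between the two products can be checked by hand. This is a plain-Python sketch with nested lists rather than Eigen or MATLAB code, and the helper names matmul, transpose and adjoint are made up for illustration.

```python
def matmul(a, b):
    """Plain triple-loop matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Swap rows and columns without touching the entries."""
    return [list(col) for col in zip(*m)]

def adjoint(m):
    """Conjugate transpose: swap rows and columns and conjugate every entry."""
    return [[z.conjugate() for z in col] for col in zip(*m)]

A = [[complex(1, 1), complex(2, 2), complex(3, 3)] for _ in range(3)]

print(matmul(A, transpose(A))[0][0])   # 28j, the (0,28) that Eigen printed
print(matmul(A, adjoint(A))[0][0])     # (28+0j), the 28 that MATLAB printed
```

Each entry of the plain-transpose product is the sum of (k + ki)^2 = 2k^2 i for k = 1, 2, 3, which gives 28i. With the conjugate, each term is |k + ki|^2 = 2k^2, which gives the real value 28.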