How to use var/variance function in armadillo - c++

How should I use the var() function in Armadillo?
I have a matrix in which rows are variables/features and columns are observations/instances.
I wish to get the variance of each row so I can identify the variables/features with the greatest variance.
Currently I am calling:
auto variances = arma::var(data, 0, 1);
Where data is my matrix.
As far as I can tell, I am currently getting a matrix back, and the documentation suggests this is correct. I was expecting a single vector with one variance value for each row of my matrix.
I can loop through my rows and get the variance for each row individually like so:
for (arma::uword i = 0; i < data.n_rows; ++i)
    auto rowVariance = arma::var(data.row(i));
But I would prefer not to do this.
I would like to get back a single vector containing variance values for each row in my matrix and then use arma::sort_index() on this vector to get a sorted set of indices corresponding to the sorted variances.
Thanks in advance.

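It turns out the problem was that I was writing auto variances = arma::var(data, 0, 1) when I should have been writing arma::Col<T> variances = arma::var(data, 0, 1). My data matrix is of type arma::Mat<T>, since I only allow float and double precision. With auto, the variable takes the type of Armadillo's delayed-evaluation expression rather than an evaluated vector, which is why it did not behave like one.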
The comment above from vagoberto set me on the right track.
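To make the intended result concrete, here is a minimal library-free sketch of what the two steps compute: an unbiased per-row variance (the quantity arma::var(data, 0, 1) is asked for above) followed by an index sort. The Rows alias and both function names are hypothetical and used only for illustration. In Armadillo itself the second step would presumably be arma::sort_index(variances, "descend") on the arma::Col<T> result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a dense matrix: one inner vector per row (feature).
using Rows = std::vector<std::vector<double>>;

// Unbiased sample variance of each row (N - 1 denominator).
// Every row must hold at least two observations.
std::vector<double> row_variances(const Rows& data) {
    std::vector<double> out;
    out.reserve(data.size());
    for (const auto& row : data) {
        assert(row.size() >= 2);
        const double n = static_cast<double>(row.size());
        const double mean = std::accumulate(row.begin(), row.end(), 0.0) / n;
        double ss = 0.0;  // sum of squared deviations from the mean
        for (double x : row) ss += (x - mean) * (x - mean);
        out.push_back(ss / (n - 1.0));
    }
    return out;
}

// Indices that order `values` from largest to smallest.
std::vector<std::size_t> sort_index_descending(const std::vector<double>& values) {
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&values](std::size_t a, std::size_t b) {
                         return values[a] > values[b];
                     });
    return idx;
}
```

The first index returned is then the row (feature) with the greatest variance.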

Related

Extract first N columns of Q from `ColPivHouseholderQR`

I am trying to implement canonical correlation analysis in C++ using the Eigen linear algebra library.
Part of the algorithm involves QR decomposition using the column pivot method. For that piece I am using the ColPivHouseholderQR class. This class contains an object of type HouseholderSequenceType which I would like to use to extract only the first N columns of the matrix rather than the entire Q matrix.
The full Q matrix has as many columns as rows and quickly fills memory for even modest dataset sizes. Furthermore, I don't need the full matrix, but just a subset of the first N columns.
How can I extract only the first N columns from this object?
My code:
// Set up the class and compute the QR decomposition
Eigen::ColPivHouseholderQR< Eigen::MatrixXd > qr;
qr.setThreshold(threshold);
qr.compute(mat);
int rk = qr.rank();
Eigen::ColPivHouseholderQR< Eigen::MatrixXd >::HouseholderSequenceType seq = qr.householderQ();
seq.setLength(rk);
// return the matrix
return (Eigen::MatrixXd) seq;

C++ Armadillo reshape a matrix with only one dimension size

Using Armadillo, how do I reshape a matrix when I only specify one dimension size?
In Matlab documentation, there is this example of such functionality:
Reshape a 6-by-6 magic square matrix into a matrix that has only 3
columns. Specify [] for the first dimension size to let reshape
automatically calculate the appropriate number of rows.
A = magic(6);
B = reshape(A,[],3);
The result is a 12-by-3 matrix, which maintains the same number of
elements (36) as the original 6-by-6 matrix. The elements in B also
maintain their columnwise order from A.
How can that be accomplished with Armadillo?
You can use .size() to get the total number of elements of your matrix and calculate the dimensions yourself.
Example:
B = reshape(A, A.size()/3, 3);
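Note that the integer division above truncates silently when the element count is not a multiple of the column count. The following is a small sketch, with hypothetical helper names, of the arithmetic behind Matlab's [] placeholder: it checks divisibility before inferring the row count, and shows how a column-major reshape leaves each element at the same flat index while its (row, column) position changes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Row count a reshape must use so that n_elem elements fill exactly
// n_cols columns (the value Matlab infers for the [] placeholder).
std::size_t inferred_rows(std::size_t n_elem, std::size_t n_cols) {
    if (n_cols == 0 || n_elem % n_cols != 0) {
        throw std::invalid_argument("element count is not divisible by n_cols");
    }
    return n_elem / n_cols;
}

// (row, col) of flat index k in a column-major matrix with n_rows rows.
struct Position {
    std::size_t row;
    std::size_t col;
};

Position position_of(std::size_t k, std::size_t n_rows) {
    assert(n_rows > 0);
    return {k % n_rows, k / n_rows};
}
```

For the 6-by-6 example, inferred_rows(36, 3) gives 12, and the element at flat index 7 moves from position (1, 1) in A to position (7, 0) in B.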

Principal Component Analysis with Eigen Library

I'm trying to compute the 2 major principal components from a dataset in C++ with Eigen.
The way I do it at the moment is to normalize the data between [0, 1] and then center the mean. After that I compute the covariance matrix and run an eigenvalue decomposition on it. I know SVD is faster, but I'm confused about the computed components.
Here is the major code about how I do it (where traindata is my MxN sized input matrix):
Eigen::VectorXf normalize(Eigen::VectorXf vec) {
for (int i = 0; i < vec.size(); i++) { // normalize each feature.
vec[i] = (vec[i] - minCoeffs[i]) / scalingFactors[i];
}
return vec;
}
// Calculate normalization coefficients (globals of type Eigen::VectorXf).
maxCoeffs = traindata.colwise().maxCoeff();
minCoeffs = traindata.colwise().minCoeff();
scalingFactors = maxCoeffs - minCoeffs;
// For each datapoint.
for (int i = 0; i < traindata.rows(); i++) { // Normalize each datapoint.
traindata.row(i) = normalize(traindata.row(i));
}
// Mean centering data.
Eigen::VectorXf featureMeans = traindata.colwise().mean();
Eigen::MatrixXf centered = traindata.rowwise() - featureMeans;
// Compute the covariance matrix.
Eigen::MatrixXf cov = centered.adjoint() * centered;
cov = cov / (traindata.rows() - 1);
Eigen::SelfAdjointEigenSolver<Eigen::MatrixXf> eig(cov);
// Normalize eigenvalues to make them represent percentages.
Eigen::VectorXf normalizedEigenValues = eig.eigenvalues() / eig.eigenvalues().sum();
// Get the two major eigenvectors and omit the others.
Eigen::MatrixXf evecs = eig.eigenvectors();
Eigen::MatrixXf pcaTransform = evecs.rightCols(2);
// Map the dataset in the new two dimensional space.
traindata = traindata * pcaTransform;
The result of this code is something like this:
To confirm my results, I tried the same with WEKA. So what I did is to use the normalize and the center filter, in this order. Then the principal component filter and save + plot the output. The result is this:
Technically I should have done the same, however the outcome is so different. Can anyone see if I made a mistake?
When scaling to 0,1, you modify the local variable vec but forgot to update traindata.
Moreover, this can be done more easily this way:
RowVectorXf maxCoeffs = traindata.colwise().maxCoeff();
RowVectorXf minCoeffs = traindata.colwise().minCoeff();
RowVectorXf scalingFactors = maxCoeffs - minCoeffs;
traindata = (traindata.rowwise()-minCoeffs).array().rowwise() / scalingFactors.array();
that is, using row-vectors and array features.
Let me also add that the symmetric eigenvalue decomposition is actually faster than SVD. The true advantage of SVD in this case is that it avoids squaring the entries, but since your input data are normalized and centered and you only care about the largest eigenvalues, there is no accuracy concern here.
The reason was that Weka standardized the dataset, meaning it scales each feature to unit variance. When I did the same, the plots looked the same. Technically my approach was correct as well.
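The difference between the two preprocessing choices can be sketched for a single feature column as follows. This is a library-free illustration with hypothetical function names; the N - 1 denominator for the standard deviation is an assumption here, not something confirmed about Weka.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Standardize one feature: subtract the mean and divide by the sample
// standard deviation (N - 1 denominator, an assumption of this sketch).
// Requires at least two values that are not all equal.
std::vector<double> standardize(const std::vector<double>& x) {
    assert(x.size() >= 2);
    const double n = static_cast<double>(x.size());
    const double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double ss = 0.0;  // sum of squared deviations from the mean
    for (double v : x) ss += (v - mean) * (v - mean);
    const double sd = std::sqrt(ss / (n - 1.0));
    assert(sd > 0.0);
    std::vector<double> z;
    z.reserve(x.size());
    for (double v : x) z.push_back((v - mean) / sd);
    return z;
}

// Min-max scaling to [0, 1], the normalization used in the question.
std::vector<double> min_max_scale(const std::vector<double>& x) {
    assert(!x.empty());
    const auto mm = std::minmax_element(x.begin(), x.end());
    const double lo = *mm.first;
    const double range = *mm.second - lo;
    assert(range > 0.0);
    std::vector<double> s;
    s.reserve(x.size());
    for (double v : x) s.push_back((v - lo) / range);
    return s;
}
```

Min-max scaling leaves each feature with its own variance, so features with a wider spread inside [0, 1] still dominate the covariance matrix, whereas standardized features all contribute with unit variance. That is enough to produce visibly different principal components.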

Assigning matrix rows with particular indices

I am porting some code from Matlab to Armadillo and am stuck at a simple step. I am finding all the indices of a vector res on the basis of a condition and then want to store all the rows of a matrix Pts corresponding to the condition.
In Matlab this is:
ifAny = find(res < lim);
Pts = Pts(ifAny,:);
In Armadillo:
arma::uvec ifAny = arma::find(res < lim);
// elem gives only the single column
// Pts = Pts.elem(ifAny);
According to the Submatrix views section of Armadillo's API documentation, X.rows(vector_of_row_indices) extracts the non-contiguous rows listed in vector_of_row_indices from the matrix X.
Thus, in your case, to obtain the equivalent of Matlab's Pts = Pts(ifAny,:), you can use:
Pts = Pts.rows(ifAny);
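For readers unfamiliar with either API, the semantics of the find-then-select pair can be sketched without any library. The names find_below and select_rows are hypothetical. One detail worth keeping in mind when porting: Matlab's find returns 1-based indices, while arma::find returns 0-based ones, as this sketch does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// 0-based indices i where res[i] < lim, in ascending order
// (the role played by find(res < lim)).
std::vector<std::size_t> find_below(const std::vector<double>& res, double lim) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < res.size(); ++i) {
        if (res[i] < lim) idx.push_back(i);
    }
    return idx;
}

// Copy out only the rows listed in `indices`, in that order
// (the role played by Pts.rows(ifAny)).
std::vector<Row> select_rows(const std::vector<Row>& pts,
                             const std::vector<std::size_t>& indices) {
    std::vector<Row> out;
    out.reserve(indices.size());
    for (std::size_t i : indices) {
        assert(i < pts.size());
        out.push_back(pts[i]);
    }
    return out;
}
```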

Eigen SparseMatrix - set row values

I am writing a simulation with Eigen and now need to set a list of rows of my ColumnMajor SparseMatrix like this:
In row n:
for column elements m:
if m == n set value to one
else set value to zero
There is always an element with column index = row index inside the sparse matrix. I tried to use the InnerIterator, but it did not work well since I have a ColumnMajor matrix. The prune method that was suggested in https://stackoverflow.com/a/21006998/3787689 worked, but I just need to set the non-diagonal elements to zero temporarily, and prune seems to actually delete them, which slows down a different part of the program.
How should I proceed in this case?
Thanks in advance!
EDIT: I forgot to make clear: the sparse matrix is already filled with values.
Use triplets for efficient insertion:
const int N = 5;
const int M = 10;
Eigen::SparseMatrix<double> myMatrix(N,M); // N by M matrix with no coefficient, hence this is the null matrix
std::vector<Eigen::Triplet<double>> triplets;
for (int i=0; i<N; ++i) {
triplets.push_back({i,i,1.});
}
myMatrix.setFromTriplets(triplets.begin(), triplets.end());
I solved it like this: since I want to stick to a ColumnMajor matrix, I make a local RowMajor copy and use the InnerIterator to assign the values to the specific rows. After that I overwrite my matrix with the result.
Eigen::SparseMatrix<float, Eigen::RowMajor> rowMatrix;
rowMatrix = colMatrix;
for (uint i = 0; i < rowTable.size(); i++) {
int rowIndex = rowTable(i);
for (Eigen::SparseMatrix<float, Eigen::RowMajor>::InnerIterator
it(rowMatrix, rowIndex); it; ++it) {
if (it.row() == it.col())
it.valueRef() = 1.0f;
else
it.valueRef() = 0.0f;
}
}
colMatrix = rowMatrix;
For beginners, the simplest way to set a row/column/block to zero is to multiply it by 0.0.
So to patch an entire row the way you want, it is enough to do:
A.row(n) *= 0; //Set entire row to 0
A.coeffRef(n,n) = 1; //Set diagonal to 1
This way you don't need to change your code depending on RowMajor/ColMajor order. Eigen will do all the work in a quick way.
Also, if you really want to free the memory after setting the rows to zero, call A.prune(0,0) once you have finished editing all the rows in your matrix.
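The distinction the question draws between zeroing entries and pruning them can be illustrated with a plain coordinate-list sketch. The Entry struct and both functions are hypothetical and do not reflect how Eigen stores its data. The point is only that overwriting values keeps every stored entry in place, whereas pruning would remove the explicit zeros and change the sparsity pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate-format entry: one stored coefficient.
struct Entry {
    int row;
    int col;
    double value;
};

// For every stored entry in row n, set the diagonal to 1 and all other
// columns to an explicit 0. No entry is added or removed.
void set_row_to_identity(std::vector<Entry>& entries, int n) {
    assert(n >= 0);
    for (Entry& e : entries) {
        if (e.row != n) continue;
        e.value = (e.col == n) ? 1.0 : 0.0;
    }
}

// Number of stored entries whose value is exactly zero, i.e. the
// entries a subsequent prune step would drop.
std::size_t count_explicit_zeros(const std::vector<Entry>& entries) {
    std::size_t count = 0;
    for (const Entry& e : entries) {
        if (e.value == 0.0) ++count;
    }
    return count;
}
```

Because the pattern is untouched, the zeroed positions can later be refilled without new insertions, which matches the "temporarily" requirement in the question.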