Symmetric array-like data structure for C++

I'm doing a simulation where I must calculate many averages, and I thought that using boost::accumulators would be a good idea. The problem is that one of the quantities I want to average is a symmetric matrix whose diagonal is known beforehand. So I just need to calculate the averages for Q[i][j] if i < j.
At first I got the impression that I could use a
using namespace boost::accumulators;
using namespace boost::numeric::ublas;
typedef accumulator_set<double, stats<tag::mean> > accumulator;
symmetric_matrix<accumulator, lower> foo; // a symmetric matrix of accumulators
to hold my accumulators. But then it occurred to me that symmetric_matrix might be suitable only for numerical values (types with arithmetic operations defined), or might be optimized for that kind of data in some way. Is this right?
If Boost's symmetric_matrix is not adequate, I need a data structure that can hold the lower triangle of a symmetric matrix without the diagonal; it must be able to hold the accumulators and offer a nice matrix-like syntax. Is this readily available from some library? If not, is there an easy implementation of this kind of structure around?

Try the Boost uBLAS Triangular matrix. Here is an example.
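If the uBLAS containers turn out not to fit (for example because the diagonal should not be stored at all), one hand-rolled alternative is a packed strictly-lower triangle. The following is only a minimal sketch: the class name StrictLowerTriangle and its interface are hypothetical, not part of Boost, and it assumes the element type is default-constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a Boost class): stores only the entries with i > j
// of an n x n symmetric matrix. T only has to be default-constructible, so it
// can be an accumulator type as well as a number.
template <typename T>
class StrictLowerTriangle {
public:
    explicit StrictLowerTriangle(std::size_t n)
        : n_(n), data_(n * (n - 1) / 2) {}

    std::size_t size() const { return n_; }

    // Symmetric access: (i, j) and (j, i) refer to the same stored element.
    // The diagonal is not stored, so i == j violates the precondition.
    T& operator()(std::size_t i, std::size_t j) { return data_[index(i, j)]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data_[index(i, j)]; }

private:
    std::size_t index(std::size_t i, std::size_t j) const {
        assert(i != j && i < n_ && j < n_);
        if (i < j) std::swap(i, j);  // make sure i > j
        // Rows 1..i-1 hold 1 + 2 + ... + (i-1) = i(i-1)/2 entries before row i.
        return i * (i - 1) / 2 + j;
    }

    std::size_t n_;
    std::vector<T> data_;
};
```

With the accumulator typedef from the question, usage would look like StrictLowerTriangle<accumulator> foo(n); foo(i, j)(sample);, assuming accumulator_set can be default-constructed in your Boost version.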

Related

Constructing sparse tridiagonal matrix in Eigen

How do I construct a sparse tridiagonal matrix in Eigen? The matrix that I want to construct looks like this in Python:
alpha = 0.5j/dx**2
off_diag = alpha*np.ones(N-1)
A_fixed = sp.sparse.diags([-off_diag,(1/dt+2*alpha)*np.ones(N),-off_diag],[-1,0,1],format='csc')
How do I do it in C++ using the Eigen package? It looks like I need to use the 'triplet' as documented here, but are there easier ways to do this, considering that this should be a fairly common operation?
Another side question is whether I should use row-major or column major. I want to solve the matrix equation Ax=b, where A is a tridiagonal matrix. When we do matrix-vector multiplication by hand, we usually multiply each row of the matrix by the column vector, so storing the matrix in row-major seems to make more sense. But what about a computer? Which one is preferred if I want to solve Ax=b?
Thanks
The triplets are the designated method of setting up a sparse matrix.
You could go the even more straightforward way and use A.coeffRef(row, col) = val or A.insert(row, col) = val, i.e. fill the matrix element by element.
Since you have a tridiagonal system you know the number of non-zeros of the matrix beforehand and can reserve the space using A.reserve(Nnz).
A dumb way, which nevertheless works, is:
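#include <Eigen/Sparse>
typedef Eigen::SparseMatrix<double, Eigen::RowMajor> CSRMat; // assumed definition of CSRMat
const int N = 1000;
CSRMat U(N,N);                 // strict upper part: -1 on the first superdiagonal
U.reserve(N-1);
for(int j = 0; j<N-1; ++j)
    U.insert(j,j+1) = -1;
CSRMat D(N,N);                 // diagonal part: 2 on the main diagonal
D.setIdentity();
D *= 2;
CSRMat A = U + CSRMat(U.transpose()) + D; // tridiagonal with stencil (-1, 2, -1)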
As to the solvers and preferred storage order that is, as I recall, of minor importance. Whilst C(++) stores contiguous data in row-major format it is up to the algorithm whether the data is accessed in an optimal way (row-by-row for row-major storage order). The correctness of an algorithm does not, as a rule, depend on the storage order of the data. Its performance depends on compatibility of storage order and actual data access patterns.
If you intend to use Eigen's own solvers stick with its default choice (col-major). If you intend to interface with other libraries (e.g. ARPACK) choose the storage order the library prefers/requires.

Efficient way to compute SparseMatrix 1-norm, Inf-norm in Eigen

In Eigen library, I know that there are visitors and reductions for dense Eigen::Matrix class which I can use efficiently to compute their 1-norm, inf-norm, etc. someway like this:
Eigen::MatrixXd A;
...
A.colwise().lpNorm<1>().maxCoeff();
A.rowwise().lpNorm<1>().maxCoeff();
// etc.
Now I have sparse Eigen::SparseMatrix class. How can I efficiently compute these norms in this case?
You can compute the colwise/rowwise 1-norm using a product with a vector of ones:
(Eigen::RowVectorXd::Ones(A.rows()) * A.cwiseAbs()).maxCoeff();
(A.cwiseAbs() * Eigen::VectorXd::Ones(A.cols())).maxCoeff();
Check the generated assembly to see if this gets sufficiently optimized for your purpose. If not, or if you need other lpNorms, you may need to write two nested loops with sparse iterators.

Representation of a symmetric diagonal matrix

Let's assume we have a huge symmetric diagonal matrix. What is an efficient way to implement this?
The only way that I could think of is to use the symmetry property Xij = Xji, which lets us cut the size of the matrix in half. But representing the matrix as a 2D array would be inefficient, since a 2D array cannot take advantage of that reduction.
Representing the matrix as an adjacency list would also be inefficient: viewed as a graph, the matrix would be a dense graph, and adjacency-list operations such as removing, inserting and searching take a lot of time.
But what about using heaps?
There is no one answer until you decide what you are going to do with this matrix (or maybe matrices?).
If you are just going to store and remember it, then just store it sequentially, leaving out the redundant entries. (Your code knows how to access it, because that is all it does, right?)
More probably, you want to do normal matrix operations on it. In that case, are you trying to make the storage efficient, or the execution? In the latter case, I don't see many opportunities based on it being symmetric: the multiplies are the expensive thing and you probably still need all of those. If it is the storage, then are you limiting yourself to operations that only take symmetric matrices in and produce symmetric matrices out? Sounds awfully specific. If so, then you only need to do the calculations for the part you are storing, because by definition the other entries are symmetric, so just write your code to generate that part of the matrix and you are done.
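As a rough illustration of the trade-off described above, here is a minimal sketch of a matrix-vector product on a symmetric matrix stored as a packed lower triangle (diagonal included). The function names packedIndex and symmetricMultiply are made up for this example. Storage drops to n(n+1)/2 values, but the product still performs on the order of n^2 multiplications.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of element (i, j), with i >= j, in a row-wise packed lower triangle
// that includes the diagonal.
inline std::size_t packedIndex(std::size_t i, std::size_t j) {
    return i * (i + 1) / 2 + j;
}

// Computes y = A * x for a symmetric n x n matrix A given by its packed
// lower triangle (n(n+1)/2 stored values).
std::vector<double> symmetricMultiply(const std::vector<double>& packed,
                                      const std::vector<double>& x) {
    const std::size_t n = x.size();
    assert(packed.size() == n * (n + 1) / 2);
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            const double a = packed[packedIndex(i, j)];
            y[i] += a * x[j];               // stored entry A(i, j)
            if (i != j) y[j] += a * x[i];   // mirrored entry A(j, i)
        }
    }
    return y;
}
```

For the 2 x 2 matrix with rows (2, 1) and (1, 3), the packed vector is {2, 1, 3}.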

How to create n-dimensional test data for cluster analysis?

I'm working on a C++ implementation of k-means and therefore I need n-dimensional test data. For the beginning 2D points are sufficient, since they can be visualized easily in a 2D image, but I'd finally prefer a general approach that supports n dimensions.
There was an answer here on stackoverflow, which proposed concatenating sequential vectors of random numbers with different offsets and spreads, but I'm not sure how to create those, especially without including a 3rd party library.
Below is the method declaration I have so far; it contains the parameters which should vary. They can be changed if necessary, with the exception of data, which needs to be a pointer type since I'm using OpenCL.
auto populateTestData(float** data, uint8_t dimension, uint8_t clusters, uint32_t elements) -> void;
Another problem that came to my mind was the efficient detection/avoidance of collisions when generating random numbers. Couldn't that be a performance bottleneck, e.g. if one is generating 100k numbers in a domain of 1M values, i.e. if the ratio of generated numbers to the size of the number space isn't small enough?
QUESTION
How can I efficiently create n-dimensional test data for cluster analysis? What are the concepts I need to follow?
It's possible to use the C++11 (or Boost) random facilities to create clusters, but it takes a bit of work.
1. std::normal_distribution can generate univariate normal samples (by default with zero mean and unit standard deviation).
2. Using 1., you can sample a normal vector (just create an n-dimensional vector of such samples).
3. If you take a vector n from 2. and output A n + b, then you have moved the center to b and reshaped the sample by A. (In particular, for 2 and 3 dimensions it's easy to build A as a rotation matrix.) So, repeatedly sampling 2. and performing this transformation gives you a sample centered at b.
4. Choose k pairs of A, b, and generate your k clusters.
Notes
You can generate different clustering scenarios using different types of A matrices. E.g., if A is a non-length-preserving (scaling) matrix multiplied by a rotation matrix, then you get elongated, ellipsoid-shaped clusters (it's actually interesting to make them wider along the vectors connecting the centers).
You can either generate the "center" vectors b hardcoded, or using a distribution like used for the x vectors above (perhaps uniform, though, using this).
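The recipe above (sample a standard normal vector z, then output A z + b for each cluster) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the Cluster struct, the function name makeClusteredData and the flat row-major output layout are hypothetical choices, not the signature from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical description of one cluster: points are A * z + b, z ~ N(0, I).
struct Cluster {
    std::vector<float> A;  // dimension x dimension matrix, row-major
    std::vector<float> b;  // cluster center, length = dimension
};

// Returns clusters.size() * pointsPerCluster points stored contiguously:
// point p occupies the range [p * dimension, (p + 1) * dimension).
std::vector<float> makeClusteredData(const std::vector<Cluster>& clusters,
                                     std::size_t dimension,
                                     std::size_t pointsPerCluster,
                                     std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<float> standardNormal(0.0f, 1.0f);
    std::vector<float> data;
    data.reserve(clusters.size() * pointsPerCluster * dimension);
    std::vector<float> z(dimension);
    for (const Cluster& c : clusters) {
        assert(c.A.size() == dimension * dimension && c.b.size() == dimension);
        for (std::size_t p = 0; p < pointsPerCluster; ++p) {
            for (float& zi : z) zi = standardNormal(gen);  // standard normal vector
            for (std::size_t r = 0; r < dimension; ++r) {  // component r of A * z + b
                float value = c.b[r];
                for (std::size_t k = 0; k < dimension; ++k)
                    value += c.A[r * dimension + k] * z[k];
                data.push_back(value);
            }
        }
    }
    return data;
}
```

The returned vector's data() pointer can be handed to an API that expects a raw float buffer. Because the samples are continuous floating-point values, no explicit collision avoidance is attempted here.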

C++ equivalent of R list

Looking for something in C++ for easy storage and access of matrices of different sizes. I typically use R, and in R I can use a loop and store a matrix in a list as follows (toy example)
myList <- list(1)
for(i in 1:10){
myList[[i]] <- matrix(rnorm(i^2),i,i)
}
This gives me a list where myList[[i]] will then give me the i-th matrix. Is there anything like this in C++? I have seen Boost functions that can do arrays of varying sizes, but fail to apply them to matrices. Trying to use either Eigen or Armadillo if that helps narrow responses.
There are two parts to the answer you're looking for:
1. The matrices.
2. The container holding all these matrices.
So, for the matrices: if you are planning on doing linear algebra operations, I'd recommend using a specialized library such as Armadillo, which comes with a lot of predefined matrix functions (e.g. eigenvalues, matrix multiplication, etc.). If it's just basic 2D data storage with no special operations, then I'd recommend using an STL vector of vectors to represent your matrices. These containers are dynamic in size (they can be resized at will during execution) and all elements are accessible by index. As Patrick said, more info can be found here: cppreference.com.
An example of a 3x3 matrix of integers filled with 1s would be
std::vector< std::vector<int> > matrix(3,std::vector<int>(3,1));
Then, you have to store these matrices somewhere. For this, it is really going to depend on your needs. The simplest solution would be a vector of matrices (so a vector of vector of vector, really). Your code would behave exactly as in R, and you would be able to access matrix by index. The equivalent C++ code is
#include<vector>
using namespace std;
typedef vector< vector<int> > int_matrix_t;
...
vector<int_matrix_t> my_vector_of_matrices(10);
for (int i = 0; i<10; ++i) {
my_vector_of_matrices[i] = some_function_that_outputs_a_matrix(i);
}
But there are many other containers available. You should survey this chart and choose for yourself!
I believe you can use std::vector.
http://en.cppreference.com/w/cpp/container/vector
std::vector<Matrix> matrices;
matrices.push_back(Matrix(data)); // append; operator[] is only valid for elements that already exist
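For comparison with the R loop in the question, here is a minimal standard-library sketch. The Matrix typedef is just a stand-in for an Eigen or Armadillo matrix type, and makeMatrixList is a hypothetical name. Note that C++ indices start at 0, so the i x i matrix ends up at position i-1.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Simple stand-in for a library matrix type (assumption for this sketch).
typedef std::vector<std::vector<double> > Matrix;

// Builds a list whose entry i-1 is an i x i matrix of standard normal draws,
// mirroring myList[[i]] <- matrix(rnorm(i^2), i, i) from the question.
std::vector<Matrix> makeMatrixList(std::size_t count, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> rnorm(0.0, 1.0);
    std::vector<Matrix> list;
    list.reserve(count);
    for (std::size_t i = 1; i <= count; ++i) {
        Matrix m(i, std::vector<double>(i));
        for (std::size_t r = 0; r < i; ++r)
            for (std::size_t c = 0; c < i; ++c)
                m[r][c] = rnorm(gen);
        list.push_back(m);  // grows the list; no pre-sizing needed
    }
    assert(list.size() == count);
    return list;
}
```

The same pattern should carry over to a library type, e.g. a std::vector of Eigen::MatrixXd, since the container does not require all elements to have the same dimensions.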