Armadillo C++: Efficient access of columns in a cube structure

Using the Armadillo matrix library, I am aware that the efficient way to access a column in a 2D matrix is via a simple call to .col(i).
I am wondering whether there is an efficient way of extracting a column stored in a "cube", without first having to call the slice command.
I need the most efficient possible way of accessing the data stored in, for instance (using MATLAB notation), A(:,i,j). I will be doing this millions of times on a very large dataset, so speed and efficiency are a high priority.

I think you want
B = A.subcube( span::all, span(i), span(j) );
or equivalently
B = A.subcube( span(), span(i), span(j) );
where B will be a row or column vector of the same type as A (e.g. containing double by default, or a number of other available types).

.slice() should be pretty quick. It simply provides a reference to the underlying Mat class. You could try something along these lines:
cube C(4,3,2);
double* mem = C.slice(1).colptr(2);
Also, bear in mind that Armadillo has range checks enabled by default. If you want to avoid the range checks, use the .at() element accessors:
cube C(4,3,2);
C.at(3,2,1) = 456;
Alternatively, you can store your matrices in a field class:
field<mat> F(100);
F(0).ones(12,34);
Corresponding element access:
F(0)(1,2); // with range checks
F.at(0).at(1,2); // without range checks
You can also compile your code with ARMA_NO_DEBUG defined, which will remove all run-time debugging (such as range checks). This will give you a speedup, but it is only recommended once you have debugged all your code (i.e. verified that your algorithm is working correctly). The debugging checks are very useful for picking up mistakes.

Related

Get row vector of 2d vector in C++

I have a vector of vectors in C++ defined as: vector< vector<double> > A;
Let's suppose that A has been filled with some values. Is there a quick way to extract a row vector from A?
For instance, A[0] will give me the first column vector, but how can I quickly get the first row vector?
There is no "quick" way with that data structure: you have to iterate over each column vector, get the value for the desired row, and add it to a temporary row vector. Whether this is fast enough for you depends on what you need. To make it as fast as possible, be sure to allocate the right amount of space in the target row vector, so it doesn't need to be resized while you add the values to it.
Simple solution to performance problem is to use some existing matrix library, such as Eigen suggested in comments.
If you need to do this yourself (because it is an assignment, or because of licensing issues, or whatever), you should probably create your own "Matrix 2D" class and hide the implementation details in it. Then, depending on what exactly you need, you can employ tricks like:
have a "cache" for rows, so if the same row is fetched many times it can come from the cache and a new vector does not need to be created
store the data both as a vector of row vectors and as a vector of column vectors, so you can get either rows or columns in constant time, at the cost of using more memory and making changes twice as expensive due to the duplicated data
dynamically change the internal representation according to current needs, so you get fixed memory usage but pay the processing cost whenever the internal representation has to change
store the data in a flat vector of size rows*columns, and calculate the correct offset from the row and column in your own code
But it bears repeating: someone has already done this for you, so try to use an existing library, if you can...
There is no really fast way to do that. Also, as pointed out, the usual convention is the other way around: A[0] is actually the first row, rather than the first column. However, even getting a column is not trivial, since
{0, 1, 2, 3, 4}
{0}
{0, 1, 2}
is a perfectly possible vector<vector<double>> A, but there is no real column 1, 2, 3 or 4. If you wish to enforce behavior like same-length columns, creating a Matrix class may be a good idea (or using a library).
You could write a function that returns a vector<double> by iterating over the rows and storing the appropriate column value. But you would have to be careful about whether you want to copy the matrix values or point to them (vector<double> vs. vector<double*>). This is not very fast, as the values are not next to each other in memory.
The answer is: in your case there is no simple option corresponding to the one for columns. And one of the reasons is that vector<vector<double>> is a particularly poorly suited container for multi-dimensional data.
In multiple dimensions, one of the important design decisions is which dimension you want to access most efficiently; based on the answer, you can define your container (which might be very specialized and complex).
For example, in your case it is entirely up to you whether to call A[0] a 'column' or a 'row'. You only need to do it consistently (best to define a small interface that makes this explicit). But STOP, don't do that:
This brings you to the next level: for multi-dimensional data you would typically not use vector<vector<double>> at all (but this is a different issue). Look at smart and efficient solutions that already exist, e.g. in uBLAS https://www.boost.org/doc/libs/1_65_1/libs/numeric/ublas/doc/index.html or Eigen3 https://eigen.tuxfamily.org/dox/
You will never be able to beat these highly optimized libraries.

How to use arrays in machine learning classes?

I'm new to C++ and I think a good way for me to jump in is to build some basic models that I've built in other languages. I want to start with just Linear Regression solved using first order methods. So here's how I want things to be organized (in pseudocode).
class LinearRegression
LinearRegression:
tol = <a supplied tolerance or defaulted to 1e-5>
max_ite = <a supplied max iter or default to 1k>
fit(X, y):
// model learns weights specific to this data set
_gradient(X, y):
// compute the gradient
score(X,y):
// model uses weights learned from fit to compute accuracy of
// y_predicted to actual y
My question is: when I use the fit, score and gradient methods, I don't actually need to pass around the arrays (X and y) or even store them anywhere, so I want to use a reference or a pointer to those structures. My problem is that if a method accepts a pointer to a 2D array, I need to supply the second dimension's size ahead of time or use templates. If I use templates, I now have something like this for every method that accepts a 2D array:
template <std::size_t rows, std::size_t cols>
void fit(double (&X)[rows][cols], double (&y)[rows]) { ... }
It seems there is likely a better way. I want my regression class to work with input of any size. How is this done in industry? I know that in some situations the array is flattened into row- or column-major format and just a pointer to the first element is passed, but I don't have enough experience to know what people use in C++.
You raised quite a few points in your question, so here are some points addressing them:
Contemporary C++ discourages working directly with heap-allocated data that you need to manually allocate or deallocate. You can use, e.g., std::vector<double> to represent vectors, and std::vector<std::vector<double>> to represent matrices. Even better would be to use a matrix class, preferably one that is already in mainstream use.
Once you use such a class, you can easily get the dimension at runtime. With std::vector, for example, you can use the size() method. Other classes have other methods. Check the documentation for the one you choose.
You probably really don't want to use templates for the dimensions.
a. If you do so, you will need to recompile each time you get a different input size. Your code will be duplicated (by the compiler) for every distinct set of dimensions you use. Lots of bad stuff, with little gain (in this case). There is no real drawback to getting the dimension at runtime from the class.
b. Templates (in your setting) are fitting for the type of the matrix (e.g., is it a matrix of doubles or floats?), or possibly the number of dimensions (e.g., for specifying tensors).
Your regressor doesn't need to store the matrix and/or vector. Pass them by const reference. Your interface looks like that of sklearn; if you like, check the source code there. Calling fit just causes the class object to store the parameter corresponding to the prediction vector β. It doesn't copy or store the input matrix and/or vector.

How to create n-dimensional test data for cluster analysis?

I'm working on a C++ implementation of k-means and therefore I need n-dimensional test data. For the beginning 2D points are sufficient, since they can be visualized easily in a 2D image, but I'd finally prefer a general approach that supports n dimensions.
There was an answer here on stackoverflow, which proposed concatenating sequential vectors of random numbers with different offsets and spreads, but I'm not sure how to create those, especially without including a 3rd party library.
Below is the method declaration I have so far; it contains the parameters that should vary. They can be changed if necessary, with the exception of data, which needs to be a pointer type since I'm using OpenCL.
auto populateTestData(float** data, uint8_t dimension, uint8_t clusters, uint32_t elements) -> void;
Another problem that came to my mind was the efficient detection/avoidance of collisions when generating random numbers. Couldn't that be a performance bottleneck, e.g. if one is generating 100k numbers in a domain of 1M values, i.e. if the ratio between the generated numbers and the number space isn't small enough?
QUESTION
How can I efficiently create n-dimensional test data for cluster analysis? What are the concepts I need to follow?
It's possible to use C++11 (or Boost) random facilities to create clusters, but it's a bit of work.
1. std::normal_distribution can generate univariate normal samples with zero mean.
2. Using 1., you can sample a normal vector (just create an n-dimensional vector of such samples).
3. If you take a vector n from 2. and output An + b, then you've moved the center to b and reshaped the cloud by A. (In particular, for 2 and 3 dimensions it's easy to build A as a rotation matrix.) So repeatedly sampling as in 2. and applying this transformation gives you a sample centered at b.
4. Choose k pairs of A, b, and generate your k clusters.
Notes
You can generate different clustering scenarios using different types of A matrices. E.g., if A is a non-length-preserving matrix multiplied by a rotation matrix, you can get stretched "ellipsoid" clusters (it's actually interesting to make them wider along the vectors connecting the centers).
You can either hardcode the "center" vectors b, or draw them from a distribution like the one used for the x vectors above (perhaps a uniform one, though).

Efficiently searching arrays in FORTRAN

I am trying to store the stiffness matrix in Fortran in sparse format to save memory, i.e. I am using three vectors for the non-zero elements (irows, icols, A). After finding out the size of these arrays, the next step is to insert values into them. I am using Gauss points, i.e. for each Gauss point I find the local stiffness matrix and then insert it into the global one (irows, icols, A).
The main problem with this insertion is that every time we have to check whether the new value already exists in the global array: if it exists, add the new value to the old one, but if not, append it to the end. That is, we have to search the whole array to find out whether the value exists or not. If these arrays (irows, icols, A) are large, this search is computationally very expensive.
Can anyone suggest a better way of inserting the local stiffness matrix for each Gauss point into the global stiffness matrix?
I am fairly sure that this is a well-known problem in FEM analysis - I found reference to it in the scipy documentation, but of course the principles are language independent. Basically, what you should do is create your matrix in the format you have, but instead of searching the matrix to see whether an entry already exists, just assume that it doesn't. This means you will end up with duplicate entries, which need to be added together to get the correct value.
Once you have constructed your matrix, you will usually convert it to some more efficient form for solving (e.g. CSR etc.) - the exact format may be determined by the sparse solver you are using. During this conversion process duplicate entries should get added together - and some sparse matrix libraries will do this for you. I know that scipy does this, and many of its internal routines are written in Fortran, so you might be able to use one of them (they are all open source). Or you could check whether anything suitable is on netlib.
If you use a pre-sorted data structure, it is very efficient to search - either as your primary data structure or as an auxiliary one. You want one that lets you insert an entry into the middle, for example a binary search tree (http://en.wikipedia.org/wiki/Binary_search_tree).

how to create a 20000*20000 matrix in C++

I try to calculate a problem with 20000 points, so there is a distance matrix with 20000*20000 elements, how can I store this matrix in C++? I use Visual Studio 2008, on a computer with 4 GB of RAM. Any suggestion will be appreciated.
A sparse matrix may be what you're looking for. Many problems don't have values in every cell of a matrix. SparseLib++ is a library which allows for efficient matrix operations.
Avoid the brute force approach you're contemplating and try to envision a solution that involves populating a single 20000 element list, rather than an array that covers every possible permutation.
For starters, consider the following simplistic approach which you may be able to improve upon, given the specifics of your problem:
int bestResult = -1; // some invalid value
int bestInner;
int bestOuter;
for ( int outer = 0; outer < MAX; outer++ )
{
    for ( int inner = 0; inner < MAX; inner++ )
    {
        int candidateResult = SomeFunction( list[ inner ], list[ outer ] );
        if ( candidateResult > bestResult )
        {
            bestResult = candidateResult;
            bestInner = inner;
            bestOuter = outer;
        }
    }
}
You can represent your matrix as a single large array. Whether it's a good idea to do so is for you to determine.
If you need four bytes per cell, your matrix is only 4*20000*20000, that is, 1.6GB. Any platform should give you that much memory for a single process. Windows gives you 2GiB by default for 32-bit processes -- and you can play with the linker options if you need more. All 32-bit unices I tried gave you more than 2.5GiB.
Is there a reason you need the matrix in memory?
Depending on the complexity of the calculations you need to perform, you could simply use a function that calculates your distances on the fly. This could even be faster than precalculating every single distance value if you only use some of them.
Without more references to the problem at hand (and the use of the matrix), you are going to get a lot of answers... so indulge me.
The classic approach here would be to go with a sparse matrix, however the default value would probably be something like 'not computed', which would require special handling.
Perhaps that you could use a caching approach instead.
I would guess that you want to avoid recomputing the distances over and over, and so you'd like to keep them in this huge matrix. Note, however, that you can always recompute them. In general, trading recomputable values for storage in exchange for speed is exactly what caching is about.
So I would suggest using a distance class that abstracts the caching for you.
The basic idea is simple:
When you request a distance, either you already computed it, or not
If computed, return it immediately
If not computed, compute it and store it
If the cache is full, delete some elements to make room
The practice is a bit more complicated, of course, especially for efficiency and because of the limited cache size, which requires an algorithm for selecting the elements to evict, etc.
So before we delve in the technical implementation, just tell me if that's what you're looking for.
Your computer should be able to handle 1.6 GB of data (assuming 32-bit values):
size_t n = 20000;
typedef long dist_type; // 32 bit
std::vector<dist_type> matrix(n * n);
And then use:
dist_type value = matrix[n * y + x];
You can (by using small datatypes), but you probably don't want to.
You are better off using a quadtree (if you need to find the nearest N matches), or a grid of lists (if you want to find all points within radius R).
In physics, you can just approximate distant points with a field, or a representative amalgamation of points.
There's always a solution. What's your problem?
Man, you should avoid the n² problem...
Put your 20,000 points into a voxel grid.
Finding the closest pair of points should then be something like n log n.
As stated by other answers, you should try hard to either use sparse matrix or come up with a different algorithm that doesn't need to have all the data at once in the matrix.
If you really need it, maybe a library like stxxl might be useful, since it's specially designed for huge datasets. It handles the swapping for you almost transparently.
Thanks a lot for your answers. What I am doing is solving a vehicle routing problem with about 20000 nodes. I need one matrix for distances and one matrix for a neighbor list (for each node, all the other nodes sorted by distance). This list will be used very often to find candidates. I guess the distance matrix can sometimes be omitted if we can calculate distances when we need them, but the neighbor list is not convenient to recreate every time. The list data type could be int.
To mgb:
how much can a 64-bit Windows system help in this situation?