I want to create a matrix in Armadillo, which can keep different datatypes in a matrix. For example, I want to have a matrix with three integer columns, a float column, and a column with enumeration value. Is there any solution?
Armadillo matrices store all elements internally as a standard C array of the element datatype. That means all elements must have the same type. This makes sense for Armadillo, since it is intended for linear algebra and numerical computation, not as a general-purpose container.
For your particular case it is probably better to simply create separate objects. You could, for instance, create a matrix of integers (arma::imat or arma::umat, depending on whether you need signed values), a vector of floats (arma::fvec, or arma::vec if double precision is acceptable), and a std::vector for the enumeration column.
Then you can create a struct with three fields to store these objects (or use a tuple) if you always want to keep them together (to easily pass them as arguments, for instance).
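As a rough sketch of the struct idea: the enum Status and the struct MixedTable below are made-up names for illustration, and standard containers stand in for the Armadillo types so the snippet compiles without any library. In real code the integer block would be an arma::imat and the float column an arma::fvec.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enumeration for the fifth column.
enum class Status { Inactive, Active, Unknown };

// One container per element type, kept together in a struct.
// Assumption: std containers replace arma::imat / arma::fvec here
// only to keep the sketch dependency-free.
struct MixedTable {
    std::vector<std::array<int, 3>> ints;  // three integer columns
    std::vector<float> reals;              // one float column
    std::vector<Status> status;            // one enumeration column

    // Append one logical row across all three containers.
    void add_row(const std::array<int, 3>& i, float r, Status s) {
        ints.push_back(i);
        reals.push_back(r);
        status.push_back(s);
    }

    // Number of logical rows; all containers must stay the same length.
    std::size_t rows() const {
        assert(ints.size() == reals.size() && reals.size() == status.size());
        return ints.size();
    }
};
```

Passing a MixedTable by const reference then moves all five "columns" around as one argument.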
In MATLAB (and many other similar languages), we can construct such a matrix with
mat = zeros(A,B,C)
It does not appear that there is such a convenience constructor in Stata [J(.,.,.) appears to only construct 2D matrices]. Is there any way to construct a 3D matrix?
Strictly a matrix is two-dimensional, or one-dimensional as a special case. But we know what you mean.
Mata supports associative arrays.
In Stata, people would be more likely to set up a data structure with several variables as row identifier, column identifier, layer identifier, whatever, whatever else, and so on.
I'm new to C++ and I think a good way for me to jump in is to build some basic models that I've built in other languages. I want to start with just Linear Regression solved using first order methods. So here's how I want things to be organized (in pseudocode).
class LinearRegression
    LinearRegression:
        tol = <a supplied tolerance or defaulted to 1e-5>
        max_iter = <a supplied max iter or default to 1k>
    fit(X, y):
        // model learns weights specific to this data set
    _gradient(X, y):
        // compute the gradient
    score(X, y):
        // model uses weights learned from fit to compute accuracy of
        // y_predicted to actual y
My question is this: the fit, score and gradient methods don't actually need to copy the arrays (X and y) or store them anywhere, so I want to use a reference or a pointer to those structures. My problem is that if a method accepts a pointer to a 2D array, I need to supply the second dimension size ahead of time or use templating. If I use templating, I end up with something like this for every method that accepts a 2D array:
template<std::size_t rows, std::size_t cols>
void fit(double (&X)[rows][cols], double &y){...}
It seems there is likely a better way. I want my regression class to work with input of any size. How is this done in industry? I know that in some situations the array is just flattened into row-major or column-major format and only a pointer to the first element is passed, but I don't have enough experience to know what people use in C++.
You raised quite a few points in your question, so here are some points addressing them:
1. Contemporary C++ discourages working directly with heap-allocated data that you need to allocate and deallocate manually. You can use, e.g., std::vector<double> to represent vectors, and std::vector<std::vector<double>> to represent matrices. Even better would be to use a matrix class, preferably one that is already in mainstream use.
2. Once you use such a class, you can easily get the dimensions at runtime. With std::vector, for example, you can use the size() method. Other classes have other methods. Check the documentation for the one you choose.
3. You probably really don't want to use templates for the dimensions.
   a. If you do so, you will need to recompile each time you get an input of a different size. The compiler will also duplicate your code once for every distinct combination of dimensions you use. Lots of bad stuff, with little gain (in this case). There's no real drawback to getting the dimensions at runtime from the class.
   b. Templates (in your setting) are a good fit for the element type of the matrix (e.g., is it a matrix of doubles or floats), or possibly for the number of dimensions (e.g., for specifying tensors).
4. Your regressor doesn't need to store the matrix and/or vector. Pass them by const reference. Your interface looks like that of sklearn. If you like, check the source code there. Calling fit just makes the object store the learned parameter vector β. It doesn't copy or store the input matrix and/or vector.
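To make the points above concrete, here is a minimal sketch of such a class, assuming plain batch gradient descent on the mean squared error. The learning_rate parameter, the predict helper, and the choice to have score return the mean squared error are my own assumptions, not something from your pseudocode; std::vector<std::vector<double>> is used only to keep it dependency-free, and a real matrix class would slot in the same way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // one inner vector per sample (row)

class LinearRegression {
public:
    explicit LinearRegression(double tol = 1e-5, std::size_t max_iter = 1000,
                              double learning_rate = 0.1)
        : tol_(tol), max_iter_(max_iter), lr_(learning_rate) {}

    // Batch gradient descent. X and y are only read, never copied or stored.
    void fit(const Matrix& X, const Vector& y) {
        assert(!X.empty() && X.size() == y.size());
        weights_.assign(X[0].size(), 0.0);  // dimensions come from the data at runtime
        bias_ = 0.0;
        for (std::size_t it = 0; it < max_iter_; ++it) {
            Vector grad_w;
            double grad_b = 0.0;
            gradient(X, y, grad_w, grad_b);
            double norm2 = grad_b * grad_b;
            for (std::size_t j = 0; j < weights_.size(); ++j) {
                weights_[j] -= lr_ * grad_w[j];
                norm2 += grad_w[j] * grad_w[j];
            }
            bias_ -= lr_ * grad_b;
            if (std::sqrt(norm2) < tol_) break;  // gradient is small enough
        }
    }

    double predict(const Vector& x) const {
        double out = bias_;
        for (std::size_t j = 0; j < weights_.size(); ++j) out += weights_[j] * x[j];
        return out;
    }

    // Mean squared error of the predictions against y (lower is better).
    double score(const Matrix& X, const Vector& y) const {
        double sum = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            sum += r * r;
        }
        return sum / static_cast<double>(X.size());
    }

    const Vector& weights() const { return weights_; }
    double bias() const { return bias_; }

private:
    // Gradient of the mean squared error with respect to weights and bias.
    void gradient(const Matrix& X, const Vector& y, Vector& grad_w, double& grad_b) const {
        const double n = static_cast<double>(X.size());
        grad_w.assign(weights_.size(), 0.0);
        grad_b = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            for (std::size_t j = 0; j < weights_.size(); ++j) grad_w[j] += 2.0 * r * X[i][j] / n;
            grad_b += 2.0 * r / n;
        }
    }

    double tol_;
    std::size_t max_iter_;
    double lr_;
    Vector weights_;
    double bias_ = 0.0;
};
```

The only state the object keeps after fit is the weight vector and the bias. A fixed learning rate is fragile on unscaled data, so treat the default as a placeholder.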
I understand what an array and a matrix are. I want to learn how to create 3D graphics, and I want to know if a multi-dimensional array is the same as a matrix.
There are several uses of the term "matrix". Normally, however, we say that a matrix is a 2-dimensional array of scalar (integer or floating point) values, with known dimensions, an entry for every position (no missing values allowed), and arranged such that the columns represent observations about or operations on the rows of another matrix. So if we have a matrix with four columns, it only makes sense if we have another matrix or vector with four rows to which the four columns apply.
So the obvious way to represent a matrix in C++ is as a 2D array. But 2D arrays aren't identical with matrices. You might have a 2D array that is not a matrix (missing values which are uninitialised or nan), or a matrix that is not a 2D array (we could represent as a 1D array and do the index calculations manually, or as a "sparse matrix" where most values are expected to be zero and we just have a list of non-zero values).
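For the "1D array with manual index calculations" case, a small sketch might look like the following. FlatMatrix is a made-up name, and row-major order is an assumption; column-major works the same way with the offset computed as c * rows + r.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A dense rows x cols matrix stored in one contiguous 1D buffer (row-major).
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols, double fill = 0.0)
        : rows_(rows), cols_(cols), data_(rows * cols, fill) {}

    // Element (r, c) lives at offset r * cols + c in the flat buffer.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const double* data() const { return data_.data(); }  // raw view of the buffer

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

From the outside this still behaves like a matrix, m(1, 2), even though no 2D array exists anywhere in memory.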
A matrix is an abstract mathematical concept that can be modeled in C++ in a number of ways:
A two-dimensional array
An array of pointers to arrays of identical size
A std::vector<std::vector<T>>
A std::array<std::array<T, M>, N>
A library-specific opaque implementation
The actual implementation is always specific to the drawing library that you have in mind.
I want to represent a 2D shape in such a way that it can be interacted with as if it were a vector of points, in particular I want to be able to call operator[] and at() on it and return references to things that act like 2D points. Currently I just use a class whose only member variable is a vector of points and that has various arithmetic and geometric operations defined pointwise on its elements.
However, in other parts of my code I need to treat a vector of n points as an element of 2n dimensional space and perform basic linear algebra on it (e.g. projecting the vector onto a given subspace of R^2n). Currently I'm creating an Eigen::VectorXd object every time I want to do this, and then converting back after performing these operations. I don't want to do this, as I make the conversion often enough that all the copying is a noticeable source of inefficiency.
If I was storing the data as a flat array of doubles/floats/ints, I could cast a pointer to its nth element to a pointer to a Point (whose members would just be a pair of doubles/floats/ints). However, as I don't know the internal representation that Eigen uses for vectors (and it may well change), this isn't possible.
Is there a sensible way of solving this? I could just use Eigen::Vectors everywhere, but I really want most of the code to be able to pretend that it is dealing with a set of points.
"However, as I don't know the internal representation that Eigen uses for vectors (and it may well change), this isn't possible."
Eigen offers the Map classes that allow mapping plain arrays to Eigen structures. For example:
double numbers[2] = {1.0, 2.0};
// Map views the existing array as an Eigen vector without copying it.
double sum = Eigen::Vector2d::Map( numbers ).dot( Eigen::Vector2d::Constant(1) );
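On the point-like side, one possible layout (a sketch, not the only option) is to own the coordinates as a flat buffer and hand out small proxy objects instead of casting pointers. Shape, PointRef, and flat_size are hypothetical names; the snippet itself avoids Eigen so it compiles standalone, and the comment shows where an Eigen::Map would attach.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Proxy that behaves like a reference to a 2D point inside the flat buffer.
struct PointRef {
    double& x;
    double& y;
};

// Hypothetical shape class: n points stored as 2n contiguous doubles,
// laid out as x0, y0, x1, y1, ...
class Shape {
public:
    explicit Shape(std::size_t n_points) : coords_(2 * n_points, 0.0) {}

    std::size_t size() const { return coords_.size() / 2; }

    // Point-like access without copying: the proxy refers into coords_.
    PointRef operator[](std::size_t i) { return {coords_[2 * i], coords_[2 * i + 1]}; }

    // Bounds-checked variant; std::vector::at throws std::out_of_range.
    PointRef at(std::size_t i) { return {coords_.at(2 * i), coords_.at(2 * i + 1)}; }

    // Flat view for linear algebra. With Eigen this could be wrapped as
    // Eigen::Map<Eigen::VectorXd>(shape.data(), shape.flat_size()).
    double* data() { return coords_.data(); }
    std::size_t flat_size() const { return coords_.size(); }

private:
    std::vector<double> coords_;
};
```

Because both views share the same memory, a write through shape[i].x is immediately visible to anything that maps data(), and vice versa.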
I have a dense matrix where the indices correspond to genes. While gene identifiers are often integers, they are not contiguous integers. They could be strings instead, too.
I suppose I could use a boost sparse matrix of some sort with integer keys, and it wouldn't matter if they're contiguous. Or would this still occupy a great deal of space, particularly if some genes have identifiers that are nine digits?
Further, I am concerned that sparse storage is not appropriate, since this is an all-by-all matrix (there will be a distance in each and every cell, provided the gene exists).
I'm unlikely to need to perform any matrix operations (e.g., matrix multiplication). I will need to pull vectors out of the matrix (slices).
It seems like the best type of matrix would be keyed by a Boost unordered_map (a hash map), or perhaps even simply an STL map.
Am I looking at this the wrong way? Do I really need to roll my own? I thought I saw such a class somewhere before.
Thanks!
You could use a std::map to map the gene identifiers to unique, consecutively assigned integers (every time you add a new gene identifier to the map, you can give it the map's size as its identifier, assuming you never remove genes from the map).
If you want to be able to search for the identifier of a gene based on its unique integer, you can use a second map or you could use a boost::bimap, which provides a bidirectional mapping of elements.
As for which matrix container to use, you might consider boost::ublas::matrix; it provides vector-like access to rows and columns of the matrix.
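A minimal sketch of the index-mapping idea, assuming string identifiers and a flat std::vector<double> standing in for the matrix container (boost::ublas::matrix or similar would take its place). GeneIndex and GeneDistances are made-up names, and a plain vector replaces the second map or boost::bimap for the reverse lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Maps arbitrary gene identifiers to consecutive row/column indices.
class GeneIndex {
public:
    // Returns the existing index, or assigns the next free one (the map's size).
    std::size_t index_of(const std::string& gene) {
        auto it = index_.find(gene);
        if (it != index_.end()) return it->second;
        const std::size_t next = index_.size();
        index_.emplace(gene, next);
        names_.push_back(gene);  // reverse lookup: index -> identifier
        return next;
    }

    // Read-only lookup; throws std::out_of_range for an unknown gene.
    std::size_t lookup(const std::string& gene) const { return index_.at(gene); }
    const std::string& name_of(std::size_t idx) const { return names_.at(idx); }
    std::size_t size() const { return index_.size(); }

private:
    std::map<std::string, std::size_t> index_;
    std::vector<std::string> names_;
};

// Dense all-by-all distance matrix addressed by gene identifier.
class GeneDistances {
public:
    explicit GeneDistances(const std::vector<std::string>& genes) {
        for (const auto& g : genes) ids_.index_of(g);
        n_ = ids_.size();  // duplicate identifiers collapse to one index
        dist_.assign(n_ * n_, 0.0);
    }

    double& at(const std::string& a, const std::string& b) {
        return dist_[ids_.lookup(a) * n_ + ids_.lookup(b)];
    }

    // Copy of one row: distances from `gene` to every gene, in index order.
    std::vector<double> slice(const std::string& gene) const {
        const double* first = dist_.data() + ids_.lookup(gene) * n_;
        return std::vector<double>(first, first + n_);
    }

    const GeneIndex& ids() const { return ids_; }

private:
    GeneIndex ids_;
    std::size_t n_ = 0;
    std::vector<double> dist_;
};
```

Storage is n * n doubles regardless of how large or sparse the identifiers themselves are, so a nine-digit gene ID costs nothing extra in the matrix.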
If you don't need matrix operations, you don't need a matrix. A 2D map with string keys can be done with std::map<std::string, std::map<std::string, double> > in plain C++, or with the corresponding hash map from Boost.
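For illustration, a nested map along those lines could look like this. DistanceTable, set_distance, and slice are hypothetical names, and storing each distance under both key orders is my assumption about how a symmetric table would be filled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Outer key: row gene; inner key: column gene; value: distance.
using DistanceTable = std::map<std::string, std::map<std::string, double>>;

// Store a symmetric distance under both orderings of the pair.
inline void set_distance(DistanceTable& t, const std::string& a,
                         const std::string& b, double d) {
    t[a][b] = d;
    t[b][a] = d;
}

// A "slice" is simply the inner map for one gene; at() throws if it is absent.
inline const std::map<std::string, double>& slice(const DistanceTable& t,
                                                  const std::string& gene) {
    return t.at(gene);
}
```

This is convenient but costs far more memory per cell than a dense array, which matters for an all-by-all table.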
There is Boost.MultiArray, which will allow you to work with non-contiguous indices.
If you want an efficient implementation for matrices of static size, there is also Boost.LA, which is now on the review schedule.
And lastly, there is also NT2, which should be submitted to Boost soon.