Eigen LLT (Cholesky) fails, while SVD works - c++

I'm trying to reproduce some numpy code on Gaussian Processes (from here) using Eigen. Basically, I need to sample from a multivariate normal distribution:
samples = np.random.multivariate_normal(mu.ravel(), cov, 1)
The mean vector is currently zero, while the covariance matrix is a square matrix generated via the isotropic squared exponential kernel:
sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)
return sigma_f**2 * np.exp(-0.5 / l**2 * sqdist)
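(In math form, this is the kernel k(x1, x2) = sigma_f^2 * exp(-||x1 - x2||^2 / (2 * l^2)); the sqdist line expands the squared distance as ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2 * dot(x1, x2).)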
I can generate the covariance matrix just fine for now (it can probably be cleaned but for now it's a POC):
Matrix2D kernel(const Matrix2D & x1, const Matrix2D & x2, double l = 1.0, double sigma = 1.0) {
    auto dists = ((-2.0 * (x1 * x2.transpose())).colwise()
                  + x1.rowwise().squaredNorm()).rowwise()
                 + x2.rowwise().squaredNorm().transpose();
    return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dists).array().exp();
}
However, my problems start when I need to sample the multivariate normal.
I've tried using the solution proposed in this accepted answer; however, the decomposition only works with covariance matrices of size up to 30x30; more than that and LLT fails to decompose the matrix. The alternative version provided in the answer also does not work, and creates NaNs. I tried LDLT as well but it also breaks (D contains negative values, so sqrt gives NaN).
Then I got curious and looked into how numpy does this. It turns out the numpy implementation uses an SVD decomposition (via LAPACK) rather than Cholesky. So I tried copying their implementation:
// SVD on the covariance matrix generated via kernel function
Eigen::BDCSVD<Matrix2D> solver(covs, Eigen::ComputeFullV);
normTransform = (-solver.matrixV().transpose()).array().colwise() * solver.singularValues().array().sqrt();
// Generate gaussian samples, "randN" is from the multivariate StackOverflow answer
Matrix2D gaussianSamples = Eigen::MatrixXd::NullaryExpr(1, means.size(), randN);
Eigen::MatrixXd samples = (gaussianSamples * normTransform).rowwise() + means.transpose();
The various minuses are me trying to exactly reproduce numpy's results.
In any case, this works perfectly fine, even with large dimensions. I am wondering why Eigen's LLT fails on this matrix while SVD handles it, given that the covariance matrix is the same in both cases. Is there something I can do to simply use LLT?
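(For reference, a common workaround for exactly this situation, though not something taken from the linked answer: the squared exponential kernel matrix is only positive semi-definite in floating point, so a tiny "jitter" term is often added to the diagonal before decomposing. A minimal sketch reusing the names from the full example below; 1e-10 is a typical magnitude, not a tuned value:)
Matrix2D covs = kernel(X, X);
covs += 1e-10 * Matrix2D::Identity(covs.rows(), covs.cols());
Eigen::LLT<Matrix2D> cholSolver(covs); // info() should now typically be Eigen::Success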
EDIT: Here is my full example:
#include <iostream>
#include <random>
#include <Eigen/Cholesky>
#include <Eigen/SVD>
#include <Eigen/Eigenvalues>

using Matrix2D = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor | Eigen::AutoAlign>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;

/*
  We need a functor that can pretend it's const,
  but to be a good random number generator
  it needs mutable state.
*/
namespace Eigen {
namespace internal {

template<typename Scalar>
struct scalar_normal_dist_op
{
    static std::mt19937 rng;                       // The uniform pseudo-random algorithm
    mutable std::normal_distribution<Scalar> norm; // The gaussian combinator

    EIGEN_EMPTY_STRUCT_CTOR(scalar_normal_dist_op)

    template<typename Index>
    inline const Scalar operator() (Index, Index = 0) const { return norm(rng); }
};

template<typename Scalar> std::mt19937 scalar_normal_dist_op<Scalar>::rng;

template<typename Scalar>
struct functor_traits<scalar_normal_dist_op<Scalar> >
{ enum { Cost = 50 * NumTraits<Scalar>::MulCost, PacketAccess = false, IsRepeatable = false }; };

} // end namespace internal
} // end namespace Eigen

Matrix2D kernel(const Matrix2D & x1, const Matrix2D & x2, double l = 1.0, double sigma = 1.0) {
    auto dists = ((-2.0 * (x1 * x2.transpose())).colwise() + x1.rowwise().squaredNorm()).rowwise()
                 + x2.rowwise().squaredNorm().transpose();
    return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dists).array().exp();
}

int main() {
    unsigned size = 50;
    unsigned seed = 1;

    Matrix2D X = Vector::LinSpaced(size, -5.0, 4.8);

    Eigen::internal::scalar_normal_dist_op<double> randN;           // Gaussian functor
    Eigen::internal::scalar_normal_dist_op<double>::rng.seed(seed); // Seed the rng

    Vector means = Vector::Zero(X.rows());
    auto covs = kernel(X, X);

    Eigen::LLT<Matrix2D> cholSolver(covs);

    // We can only use the Cholesky decomposition if
    // the covariance matrix is symmetric, pos-definite.
    // But a covariance matrix might be pos-semi-definite.
    // In that case, we'll go to an EigenSolver.
    Eigen::MatrixXd normTransform;
    if (cholSolver.info() == Eigen::Success) {
        std::cout << "Used LLT\n";
        // Use the Cholesky solver
        normTransform = cholSolver.matrixL();
    } else {
        std::cout << "Broken\n";
        Eigen::BDCSVD<Matrix2D> solver(covs, Eigen::ComputeFullV);
        normTransform = (-solver.matrixV().transpose()).array().colwise() * solver.singularValues().array().sqrt();
    }

    Matrix2D gaussianSamples = Eigen::MatrixXd::NullaryExpr(1, means.size(), randN);
    Eigen::MatrixXd samples = (gaussianSamples * normTransform).rowwise() + means.transpose();

    return 0;
}

Related

2D transformations by normal vector with eigen library

I am looking for a builtin way with the eigen library to perform coordinate transformations by normal vectors in 2D space.
Mathematically, it's not difficult: Let v = (v_x, v_y) be a 2D column vector and n = (n_x, n_y) be a normal vector, then the transformation I am looking for is one by rotational matrix:
v_T = N * v, with v_T being the transformed vector and N being the rotational matrix, which is
| n_x, n_y |
| -n_y, n_x |
In my case, the data I need to transform is stored in an Array2Xd and the normal vectors are stored in a Matrix2Xd, with each column holding x- and y-component. I need to transform each column in the array by the corresponding normal vector in the matrix.
Currently, I'm doing it like this:
#include <Eigen/Dense>
#include <iostream>

using namespace Eigen;

/* transform a single vector, just for illustration */
Array2d transform_s( const Ref<const Array2d>& v, const Ref<const Vector2d>& n ){
    return {
        n.dot( v.matrix() ),
        -n.y() * v.x() + n.x() * v.y()
    };
}

/* transform multiple columns */
Array2Xd transform_m( const Ref<const Array2Xd>& v, const Ref<const Array2Xd>& n ){
    Array2Xd transformed ( 2, v.cols() );
    /* colwise dot product for first row */
    transformed.row(0) = (n * v).colwise().sum();
    /* even less elegant calculation for the second row */
    transformed.row(1) = n.row(0) * v.row(1) - n.row(1) * v.row(0);
    return transformed;
}

int main(){
    Array2Xd vals (2, 3);
    vals <<
        2, 0,-1,
        0, 3, 2;

    Matrix2Xd n;
    n.resizeLike(vals);
    n <<
        0, 0, 1,
        1,-1, 1;
    n.colwise().normalize();

    std::cout
        << "single column:\n" << transform_s( vals.col(0), n.col(0) )
        << "\nall columns:\n" << transform_m( vals, n.array() )
        << "\n";
    return 0;
}
I'm aware of Eigen::Rotation2D, but it appears to either require an angle or a rotational matrix. I am specifically looking for a way to only provide the normal vectors. Otherwise I need to build the rotational matrices from the normal vectors myself, which doesn't really reduce the complexity on my end.
If there's no way to do this with eigen, I'll accept that as an answer. In that case, I'd be very happy about a more efficient implementation of what I wrote above.
What you are doing is essentially a complex multiplication with conj(n).
There is no elegant way to reinterpret a Vector2d/Array2Xd to a complex<double>/ArrayXcd, but you can hack something together using Maps:
Array2Xd transform_complex( const Ref<const Array2Xd>& v, const Ref<const Array2Xd>& n ){
    Array2Xd transformed(2, v.cols());
    ArrayXcd::Map(reinterpret_cast<std::complex<double>*>(transformed.data()), v.cols())
        = ArrayXcd::Map(reinterpret_cast<std::complex<double> const*>(v.data()), v.cols())
        * ArrayXcd::Map(reinterpret_cast<std::complex<double> const*>(n.data()), n.cols()).conjugate();
    return transformed;
}
You could write yourself a helper function which takes a const Ref<const Array2Xd>& and returns a Map<ArrayXcd> with the same content.
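A minimal sketch of such a helper (untested; it assumes the mapped storage is contiguous, which the default Ref<const Array2Xd> guarantees):
#include <complex>

Map<const ArrayXcd> as_complex( const Ref<const Array2Xd>& a ){
    // view the 2xN column-major storage as N complex numbers (row 0 = real, row 1 = imag)
    return Map<const ArrayXcd>( reinterpret_cast<std::complex<double> const*>(a.data()), a.cols() );
}
With it, the body of transform_complex reduces to assigning as_complex(v) * as_complex(n).conjugate() to the mapped output.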

Armadillo - no member named i in matrix expression

According to Armadillo docs:
.i()
Member function of any matrix expression
Provides an inverse of the matrix expression
...
However, when I try to compile this snippet:
#include <armadillo>
#include <iostream>

arma::sp_mat linReg(arma::sp_mat X, arma::sp_mat Y) {
    return (X.t() * X).i() * X.t() * Y;
}

int main() {
    arma::sp_mat X = arma::sprandu(1000, 10, 0.3);
    arma::sp_mat y = arma::sprandu(1000, 10, 0.3);
    std::cout << linReg(X,y).t() << std::endl;
}
I get the following error:
lreg.cpp: In function ‘arma::sp_mat linReg(arma::sp_mat, arma::sp_mat)’:
lreg.cpp:6:24: error: ‘arma::enable_if2<true, const arma::SpGlue<arma::SpOp<arma::SpMat<double>, arma::spop_htrans>, arma::SpMat<double>, arma::spglue_times> >::result’ {aka ‘const class arma::SpGlue<arma::SpOp<arma::SpMat<double>, arma::spop_htrans>, arma::SpMat<double>, arma::spglue_times>’} has no member named ‘i’
    6 |     return (X.t() * X).i() * X.t() * Y;
      |
I already tried with mat and it works fine. Any clue why it's not working with sparse matrix? And if so, how can we calculate the inverse of a sparse matrix?
Taking the inverse of a sparse matrix is usually undesirable, as the inverse is generally dense; the explicit inverse is rarely required anyway.
Instead of taking the inverse here, treat the problem as solving a system of linear equations, and reformulate it using solve() or spsolve(). Below is an untested example demonstrating the general approach:
arma::mat linReg(const arma::sp_mat& X, const arma::sp_mat& Y) {
    arma::sp_mat A = X.t() * X;
    arma::mat B = arma::mat(X.t() * Y); // convert to dense matrix

    arma::mat result;
    bool ok = arma::spsolve(result, A, B);

    if(ok == false) {
        // handle failure here
    }

    return result;
}

Using derivatives as functions in CppAD

I am trying to modify the example here:
# include <cppad/cppad.hpp>
namespace { // ---------------------------------------------------------
    // define the template function JacobianCases<Vector> in empty namespace
    template <typename Vector>
    bool JacobianCases()
    {   bool ok = true;
        using CppAD::AD;
        using CppAD::NearEqual;
        double eps99 = 99.0 * std::numeric_limits<double>::epsilon();
        using CppAD::exp;
        using CppAD::sin;
        using CppAD::cos;

        // domain space vector
        size_t n = 2;
        CPPAD_TESTVECTOR(AD<double>) X(n);
        X[0] = 1.;
        X[1] = 2.;

        // declare independent variables and start recording
        CppAD::Independent(X);

        // a calculation between the domain and range values
        AD<double> Square = X[0] * X[0];

        // range space vector
        size_t m = 3;
        CPPAD_TESTVECTOR(AD<double>) Y(m);
        Y[0] = Square * exp( X[1] );
        Y[1] = Square * sin( X[1] );
        Y[2] = Square * cos( X[1] );

        // create f: X -> Y and stop tape recording
        CppAD::ADFun<double> f(X, Y);

        // new value for the independent variable vector
        Vector x(n);
        x[0] = 2.;
        x[1] = 1.;

        // compute the derivative at this x
        Vector jac( m * n );
        jac = f.Jacobian(x);

        /*
        F'(x) = [ 2 * x[0] * exp(x[1]) ,  x[0] * x[0] * exp(x[1]) ]
                [ 2 * x[0] * sin(x[1]) ,  x[0] * x[0] * cos(x[1]) ]
                [ 2 * x[0] * cos(x[1]) , -x[0] * x[0] * sin(x[1]) ]
        */
        ok &= NearEqual( 2.*x[0]*exp(x[1]),      jac[0*n+0], eps99, eps99);
        ok &= NearEqual( 2.*x[0]*sin(x[1]),      jac[1*n+0], eps99, eps99);
        ok &= NearEqual( 2.*x[0]*cos(x[1]),      jac[2*n+0], eps99, eps99);
        ok &= NearEqual( x[0] * x[0] *exp(x[1]), jac[0*n+1], eps99, eps99);
        ok &= NearEqual( x[0] * x[0] *cos(x[1]), jac[1*n+1], eps99, eps99);
        ok &= NearEqual(-x[0] * x[0] *sin(x[1]), jac[2*n+1], eps99, eps99);

        return ok;
    }
} // End empty namespace

# include <vector>
# include <valarray>
bool Jacobian(void)
{   bool ok = true;
    // Run with Vector equal to three different cases,
    // all of which are Simple Vectors with elements of type double.
    ok &= JacobianCases< CppAD::vector<double> >();
    ok &= JacobianCases< std::vector<double> >();
    ok &= JacobianCases< std::valarray<double> >();
    return ok;
}
I am trying to modify it in the following way:
Let G be the Jacobian jac that is calculated in this example, in the line:
jac = f.Jacobian(x);
and, as in the example, let X be the independent variables. I would like to construct a new function, H, which is a function of jac, i.e. H(jacobian(X)) = something, such that H is autodifferentiable. An example may be H(X) = jacobian( jacobian(X)[0]), i.e. the jacobian of the first element of jacobian(X) w.r.t X (a second derivative of sorts).
The problem is that jac as written here is of type Vector, which is a parameterized type on a raw double, not an AD<double>. To my knowledge, this means the output is not autodifferentiable.
I am looking for some advice on if it is possible to use the Jacobian in a larger operation, and take the Jacobian of that larger operation (not unlike any arithmetic operator) or if this is not possible.
EDIT: This has been put up for a bounty once, but I'm putting it up again to see if there's a better solution, because I think this is important. To be a bit more clear, the elements that the "correct" answer needs are:
a) A means of calculating arbitrary order derivatives.
b) An intelligent way of not having to specify the order of derivatives a priori. If the maximum order derivative must be known at compile time, the order of derivative can't be determined algorithmically. Further, specifying an enormously large order as in the current answer given will lead to memory allocation issues and, I imagine, performance issues.
c) Abstracting the templating of derivative order away from the end-user. This is important, because it can be difficult to keep track of the order of derivatives needed. This is probably something that comes "for free" if b) is solved.
If anybody can crack this, it would be an awesome contribution and an extremely useful operation.
If you want to nest functions, you should nest the AD<> as well. You can nest Jacobians like any other function; for instance, the code snippet below computes the second derivative by nesting Jacobian calls:
#include <cstring>
#include <iostream>         // standard input/output
#include <vector>           // standard vector
#include <cppad/cppad.hpp>  // the CppAD package http://www.coin-or.org/CppAD/

// main program
int main(void)
{
    using CppAD::AD;    // use AD as abbreviation for CppAD::AD
    using std::vector;  // use vector as abbreviation for std::vector

    auto Square = [](auto t){ return t*t; };

    // domain space vector
    vector< AD<AD<double>> > X(1);  // vector of domain space variables

    // declare independent variables and start recording operation sequence
    CppAD::Independent(X);

    // range space vector
    vector< AD<AD<double>> > Y(1);  // vector of range space variables
    Y[0] = Square(X[0]);            // value during recording of operations

    // store operation sequence in f: X -> Y and stop recording
    CppAD::ADFun<AD<double>> f(X, Y);

    // compute derivative using operation sequence stored in f
    vector<AD<double>> jac(1);  // Jacobian of f (m by n matrix)
    vector<AD<double>> x(1);    // domain space vector
    CppAD::Independent(x);
    jac = f.Jacobian(x);        // Jacobian for operation sequence, recorded on the outer tape
    CppAD::ADFun<double> f2(x, jac);

    vector<double> result(1);
    vector<double> x_res(1);
    x_res[0] = 15.;
    result = f2.Jacobian(x_res);

    // print the results
    std::cout << "f'' computed by CppAD = " << result[0] << std::endl;
}
As a side note: since C++11/14, implementing expression templates and automatic differentiation has become much easier and can be done with far less effort, as shown e.g. towards the end of this video: https://www.youtube.com/watch?v=cC9MtflQ_nI (sorry for the poor quality). If I had to implement reasonably simple symbolic operations, I would start from scratch with modern C++: you can write simpler code, and you get errors that are easier to understand.
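To make that remark concrete, here is a minimal forward-mode sketch built on a dual number (my own illustration, not taken from the video; all names are made up):
#include <iostream>

struct Dual {
    double val; // f(x)
    double der; // f'(x)
};
Dual operator+(Dual a, Dual b) { return { a.val + b.val, a.der + b.der }; }
Dual operator*(Dual a, Dual b) { return { a.val * b.val, a.der * b.val + a.val * b.der }; }

int main() {
    auto Square = [](auto t){ return t * t; };
    Dual x{ 15.0, 1.0 };        // seed dx/dx = 1
    Dual y = Square(x);
    std::cout << y.der << "\n"; // prints f'(15) = 30
}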
Edit:
Generalizing the example to build arbitrary-order derivatives can be a template metaprogramming exercise. The snippet below shows that it is possible using template recursion:
#include <cstring>
#include <iostream>
#include <vector>
#include <cppad/cppad.hpp>

using CppAD::AD;
using std::vector;

template<typename T>
struct remove_ad{
    using type=T;
};

template<typename T>
struct remove_ad<AD<T>>{
    using type=T;
};

template<int N>
struct derivative{
    using type = AD<typename derivative<N-1>::type >;
    static constexpr int order = N;
};

template<>
struct derivative<0>{
    using type = double;
    static constexpr int order = 0;
};

template<typename T>
struct Jac{
    using value_type = typename remove_ad<typename T::type>::type;

    template<typename P, typename Q>
    auto operator()(P & X, Q & Y){
        CppAD::ADFun<value_type> f(X, Y);
        vector<value_type> jac(1);
        vector<value_type> x(1);
        CppAD::Independent(x);
        jac = f.Jacobian(x);
        return Jac<derivative<T::order-1>>{}(x, jac);
    }
};

template<>
struct Jac<derivative<1>>{
    using value_type = derivative<0>::type;

    template<typename P, typename Q>
    auto operator()(P & x, Q & jac){
        CppAD::ADFun<value_type> f2(x, jac);
        vector<value_type> x_res(1);
        x_res[0] = 15.;
        return f2.Jacobian(x_res);
    }
};

int main(void)
{
    constexpr int order = 4;
    auto Square = [](auto t){ return t*t; };
    vector< typename derivative<order>::type > X(1);
    vector< typename derivative<order>::type > Y(1);
    CppAD::Independent(X);
    Y[0] = Square(X[0]);
    auto result = Jac<derivative<order>>{}(X, Y);
    std::cout << "derivative of order " << order << " computed by CppAD = " << result[0] << std::endl;
}
There is a new feature in CppAD that eliminates the need for AD< AD<double> >; see
https://coin-or.github.io/CppAD/doc/base2ad.cpp.htm
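A sketch of the nested second derivative above rewritten with base2ad, based on my reading of that documentation page (treat the details as unverified):
#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main(void)
{   using CppAD::AD;
    using std::vector;

    // record f(x) = x^2 with a single level of AD<double>
    vector< AD<double> > aX(1), aY(1);
    aX[0] = 2.0;
    CppAD::Independent(aX);
    aY[0] = aX[0] * aX[0];
    CppAD::ADFun<double> f(aX, aY);

    // lift f so that its derivatives can themselves be taped
    CppAD::ADFun< AD<double>, double > af = f.base2ad();

    // tape g(x) = f'(x), then differentiate g
    vector< AD<double> > ax(1), ajac(1);
    ax[0] = 2.0;
    CppAD::Independent(ax);
    ajac = af.Jacobian(ax);
    CppAD::ADFun<double> g(ax, ajac);

    vector<double> x(1), ddf(1);
    x[0] = 15.0;
    ddf = g.Jacobian(x); // f''(15); expect 2 for f(x) = x^2
    std::cout << "f'' computed via base2ad = " << ddf[0] << std::endl;
}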

Abstract class on top of Eigen's Matrix and Vector

I would like to implement a neural network framework consisting of layers which can then be composed into a computational graph (see for example caffe). I am using the eigen library for matrices. Eigen distinguishes between vectors and matrices so that for some operations (adding a bias to a matrix) only a vector can be used (and not a matrix with the same dimensions as the vector). For example:
MatrixXf A = MatrixXf(3, 2); // Variables not initialized for brevity
VectorXf v = VectorXf(2);
MatrixXf R1 = A.array().rowwise() + v.transpose().array(); // Broadcasts v correctly
MatrixXf vMat = MatrixXf(1, 2);
MatrixXf R2 = A.array().rowwise() + vMat.array(); // YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX Error
If I want the layers to look something like this:
void AffineForward(std::vector<Tensor> in, std::vector<Tensor> out)
{
    MatrixXf &X = in[0];
    MatrixXf &W = in[1];
    VectorXf &b = in[2];
    out[0] = X * W;
    out[0] += b;
}
how would I design the abstract Tensor class so that I can just send in a std::vector of Tensors? I thought about something like this:
class Tensor
{
public:
    virtual Tensor operator*(const Tensor &t) const = 0;
};

class TensorMatrix : Tensor
{
public:
    TensorMatrix operator*(const TensorMatrix &t) const;
    TensorMatrix operator*(const TensorVector &t) const;

    MatrixXf _data;
};

class TensorVector : Tensor
{
public:
    VectorXf _data;
};
but the virtual Tensor operator* gives a compile-time error (a function returning the abstract class Tensor is not allowed), which makes sense.
What is the easiest way of doing what I want? Creating some class that could be put into a container and I could get both MatrixXf and VectorXf out of it (depending on what the user put in?). Caffe uses something called 'Blob'.
Eigen distinguishes between vectors and matrices so that for some operations (adding a bias to a matrix) only a vector can be used (and not a matrix with the same dimensions as the vector).
This is not true; a vector is a matrix in Eigen. It's just that some operations require dimensions to be known at compile time. In your example,
MatrixXf R1 = A.rowwise() + v.transpose();
MatrixXf R2 = A.rowwise() + vMat;
the second line does not compile because that broadcasting needs a type whose row dimension is 1 at compile time;
the solution is to tell Eigen explicitly that you want a row vector:
MatrixXf R2 = A.rowwise() + vMat.row(0);
Code working with both row and column vectors stored as MatrixXf would look something like this (whether this is advisable depends on your ultimate requirements):
if( vMat.rows() == 1 )
    MatrixXf R1 = A.rowwise() + vMat.row(0); ...
else if( vMat.cols() == 1 )
    MatrixXf R2 = A.rowwise() + vMat.transpose().row(0); ...
else
    whatever...
So you can always store vectors as matrices with Eigen; you just need some care in telling Eigen what to do with them.
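Following that advice, the AffineForward layer from the question could avoid the Tensor hierarchy altogether by storing every operand, including the bias, as a MatrixXf (a sketch under the assumption that the bias is kept as a 1-by-N matrix):
void AffineForward(const std::vector<Eigen::MatrixXf> &in, std::vector<Eigen::MatrixXf> &out)
{
    const Eigen::MatrixXf &X = in[0];
    const Eigen::MatrixXf &W = in[1];
    const Eigen::MatrixXf &b = in[2]; // bias stored as a 1 x N matrix
    out[0] = X * W;
    out[0].rowwise() += b.row(0);     // broadcast the bias over every row
}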

matrix inversion in boost

I am trying to do a simple matrix inversion operation using boost, but I am getting an error.
Basically, what I am trying to find is inverted_matrix = inverse(trans(matrix) * matrix)
The error is:
Check failed in file boost_1_53_0/boost/numeric/ublas/lu.hpp at line 299:
detail::expression_type_check (prod (triangular_adaptor<const_matrix_type,
upper> (m), e), cm2)
terminate called after throwing an instance of
'boost::numeric::ublas::internal_logic'
what(): internal logic
Aborted (core dumped)
My attempt:
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/io.hpp>
#include <boost/numeric/ublas/vector_proxy.hpp>
#include <boost/numeric/ublas/triangular.hpp>
#include <boost/numeric/ublas/lu.hpp>

namespace ublas = boost::numeric::ublas;

template<class T>
bool InvertMatrix (const ublas::matrix<T>& input, ublas::matrix<T>& inverse) {
    using namespace boost::numeric::ublas;
    typedef permutation_matrix<std::size_t> pmatrix;

    // create a working copy of the input
    matrix<T> A(input);

    // create a permutation matrix for the LU-factorization
    pmatrix pm(A.size1());

    // perform LU-factorization
    int res = lu_factorize(A, pm);
    if( res != 0 )
        return false;

    // create identity matrix of "inverse"
    inverse.assign(ublas::identity_matrix<T>(A.size1()));

    // backsubstitute to get the inverse
    lu_substitute(A, pm, inverse);

    return true;
}

int main(){
    using namespace boost::numeric::ublas;
    matrix<double> m(4,5);
    vector<double> v(4);
    vector<double> thetas;

    m(0,0) = 1; m(0,1) = 2104; m(0,2) = 5; m(0,3) = 1; m(0,4) = 45;
    m(1,0) = 1; m(1,1) = 1416; m(1,2) = 3; m(1,3) = 2; m(1,4) = 40;
    m(2,0) = 1; m(2,1) = 1534; m(2,2) = 3; m(2,3) = 2; m(2,4) = 30;
    m(3,0) = 1; m(3,1) =  852; m(3,2) = 2; m(3,3) = 1; m(3,4) = 36;
    std::cout << m << std::endl;

    matrix<double> product = prod(trans(m), m);
    std::cout << product << std::endl;

    matrix<double> inversion(5,5);
    bool inverted;
    inverted = InvertMatrix(product, inversion);
    std::cout << inversion << std::endl;
}
Boost uBLAS has runtime checks to ensure, among other things, numerical stability.
If you look at the source of the error, you can see that it tries to make sure that U*X = B, i.e. X = U^-1*B (or something like that), holds to within some epsilon. If there is too much numerical deviation, the check fails.
You can disable the checks via -DBOOST_UBLAS_NDEBUG, or tweak BOOST_UBLAS_TYPE_CHECK_EPSILON and BOOST_UBLAS_TYPE_CHECK_MIN.
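For example (the source file name is just a placeholder):
g++ -DBOOST_UBLAS_NDEBUG main.cpp -o main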
As m has only 4 rows, prod(trans(m), m) cannot have a rank higher than 4; since the product is a 5x5 matrix, it must be singular (i.e. it has determinant 0), and calculating the inverse of a singular matrix is like dividing by zero. Add independent rows to m to solve this singularity problem, as sketched below.
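A sketch of that fix using the question's data (the fifth row is a hypothetical extra sample, added only to make the rank argument concrete):
matrix<double> m(5,5);
m(0,0) = 1; m(0,1) = 2104; m(0,2) = 5; m(0,3) = 1; m(0,4) = 45;
m(1,0) = 1; m(1,1) = 1416; m(1,2) = 3; m(1,3) = 2; m(1,4) = 40;
m(2,0) = 1; m(2,1) = 1534; m(2,2) = 3; m(2,3) = 2; m(2,4) = 30;
m(3,0) = 1; m(3,1) =  852; m(3,2) = 2; m(3,3) = 1; m(3,4) = 36;
m(4,0) = 1; m(4,1) = 1940; m(4,2) = 4; m(4,3) = 3; m(4,4) = 50; // hypothetical fifth sample
// prod(trans(m), m) is now 5x5 and (generically) full rank, so InvertMatrix can succeed.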
I think your matrix dimensions, 4 by 5, caused the error. As Maarten Hilferink mentioned, you may try a square matrix, such as 5 by 5. The requirements for a matrix to have an inverse are:
The matrix must be square (same number of rows and columns).
The determinant of the matrix must not be zero. Just as a real number must be nonzero to have a reciprocal, a matrix must have a nonzero determinant to have an inverse.