I am using Boost's uBLAS in a numerical code and have a 'heavy' solver in place:
The code works well, but it is painfully slow. After some research I found UMFPACK, which is (among other things) a sparse matrix solver. My code generates large sparse matrices that I need to invert very frequently (more precisely, solve against; the inverse matrix itself is irrelevant), so UMFPACK and Boost's sparse matrix class seem to be a happy marriage.
UMFPACK asks for the sparse matrix to be specified by three vectors: an entry count, row indexes, and the entries. (See example).
My question boils down to this: can I get these three vectors efficiently from Boost's sparse matrix class?

There is a binding for this:
The project seems to be two years stagnant, but it does the job well. An example use:
#include <iostream>
#include <boost/numeric/bindings/traits/ublas_vector.hpp>
#include <boost/numeric/bindings/traits/ublas_sparse.hpp>
#include <boost/numeric/bindings/umfpack/umfpack.hpp>
#include <boost/numeric/ublas/io.hpp>

namespace ublas = boost::numeric::ublas;
namespace umf = boost::numeric::bindings::umfpack;

int main() {
    // 5x5 column-major compressed matrix with room for 12 non-zeros
    ublas::compressed_matrix<double, ublas::column_major, 0,
        ublas::unbounded_array<int>, ublas::unbounded_array<double> > A (5,5,12);
    ublas::vector<double> B (5), X (5);

    A(0,0) = 2.; A(0,1) = 3;
    A(1,0) = 3.; A(1,2) = 4.; A(1,4) = 6;
    A(2,1) = -1.; A(2,2) = -3.; A(2,3) = 2.;
    A(3,2) = 1.;
    A(4,1) = 4.; A(4,2) = 2.; A(4,4) = 1.;

    B(0) = 8.; B(1) = 45.; B(2) = -3.; B(3) = 3.; B(4) = 19.;

    umf::symbolic_type<double> Symbolic;
    umf::numeric_type<double> Numeric;

    umf::symbolic (A, Symbolic);         // symbolic analysis of the sparsity pattern
    umf::numeric (A, Symbolic, Numeric); // numeric LU factorization
    umf::solve (A, X, B, Numeric);       // solve A X = B

    std::cout << X << std::endl; // output: [5](1,2,3,4,5)
}
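For reference, the three arrays UMFPACK expects are the compressed-column arrays: column pointers (cumulative entry counts, one more than the number of columns), row indices, and values. A column-major compressed_matrix keeps its data in the same layout, which is presumably why the binding can hand it over without conversion. The sketch below builds those arrays by hand using only the standard library; `Triplet`, `Csc` and `to_csc` are made-up names for illustration and are not part of uBLAS or UMFPACK.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper types for illustration; not part of uBLAS or UMFPACK.
struct Triplet { int row; int col; double value; };

struct Csc {
    std::vector<int> col_ptr;    // size n_cols + 1; column j occupies [col_ptr[j], col_ptr[j+1])
    std::vector<int> row_idx;    // row index of each stored entry, ascending within a column
    std::vector<double> values;  // value of each stored entry
};

// Build compressed-column arrays from triplets (assumes no duplicate (row, col) pairs).
Csc to_csc(int n_cols, std::vector<Triplet> entries) {
    // Sort by column, then by row, so each column's rows come out in ascending order.
    std::sort(entries.begin(), entries.end(),
              [](const Triplet& a, const Triplet& b) {
                  return a.col != b.col ? a.col < b.col : a.row < b.row;
              });
    Csc m;
    m.col_ptr.assign(n_cols + 1, 0);
    for (const Triplet& t : entries) {
        assert(t.col >= 0 && t.col < n_cols);
        ++m.col_ptr[t.col + 1];       // count entries per column, shifted by one slot
        m.row_idx.push_back(t.row);
        m.values.push_back(t.value);
    }
    // A prefix sum turns the per-column counts into start offsets.
    for (int j = 0; j < n_cols; ++j) m.col_ptr[j + 1] += m.col_ptr[j];
    return m;
}
```

Feeding it the twelve entries of the matrix A above gives col_ptr = {0, 2, 5, 9, 10, 12}, so the "entry count" vector is really a running total per column.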

Implementing the Bartels–Stewart algorithm in Eigen3 -- real matrices only?

Based off this question and solution -- Implementing the Bartels–Stewart algorithm in Eigen3? -- I am trying to solve Lyapunov equations (AX + XA^T = C) using the Eigen library, but am limited to real matrices.
The R (with C++) code below works, but involves complex numbers. It can definitely be simplified (since in this framing there is no B matrix), but the main difficulty is the reliance on complex numbers. The real Schur form seems to be the standard alternative in this case, but the Eigen function matrix_function_solve_triangular_sylvester then does not work, because the input matrix is not upper triangular but upper block triangular. I would be happy to see suggestions on a) removing the need for complex numbers and, if that is possible, b) any efficiency improvements.
# R -----------------------------------------------------------------------
library(inline) # provides cxxfunction
d <- 6 # dimensions
A <- matrix(rnorm(d^2), d, d) # continuous time transition
G <- matrix(rnorm(d^2), d, d)
C <- G %*% t(G) # continuous time pos def error
AHATCH <- A %x% diag(d) + diag(d) %x% A
Xtrue <- matrix(-solve(AHATCH, c(C)), d) # asymptotic error from continuous time
# c++ in R ---------------------------------------------------------------------
sylcpp <- '
using Eigen::Map;
using Eigen::MatrixXd;
// Map the double matrix A from Ar
const Map<MatrixXd> A(as<Map<MatrixXd> >(Ar));
// Map the double matrix Q from Qr
const Map<MatrixXd> Q(as<Map<MatrixXd> >(Qr));
Eigen::MatrixXd B = A.transpose();
// Complex Schur forms: A = U R U^*, B = V S V^*
Eigen::ComplexSchur<Eigen::MatrixXd> SchurA(A);
Eigen::MatrixXcd R = SchurA.matrixT();
Eigen::MatrixXcd U = SchurA.matrixU();
Eigen::ComplexSchur<Eigen::MatrixXd> SchurB(B);
Eigen::MatrixXcd S = SchurB.matrixT();
Eigen::MatrixXcd V = SchurB.matrixU();
// Transform the right-hand side, solve the triangular system, transform back
Eigen::MatrixXcd F = (U.adjoint() * Q) * V;
Eigen::MatrixXcd Y = Eigen::internal::matrix_function_solve_triangular_sylvester(R, S, F);
Eigen::MatrixXd X = ((U * Y) * V.adjoint()).real();
return wrap(X);
'
syl <- cxxfunction(signature(Ar = "matrix", Qr = "matrix"), sylcpp, plugin = "RcppEigen")
X <- syl(A, -C) # assumed call (not shown in the post); -C matches the sign used for Xtrue
X - Xtrue # approx zero
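In principle, you could use RealSchur instead.
That will produce a quasi-triangular real R: block upper triangular, with 1x1 and 2x2 blocks on the diagonal (each 2x2 block corresponds to a complex-conjugate eigenvalue pair), so the triangular Sylvester solve would have to be adapted to handle those blocks.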
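Independently of the Schur route, the Kronecker-product reference solution from the R code in the question can be reproduced in purely real arithmetic. It builds a d^2 x d^2 system, so it is only useful as a small-d check and not as a replacement for Bartels–Stewart. A sketch using only the standard library follows; `solve_dense`, `solve_lyapunov_kron` and the nested std::vector storage are my own choices for illustration, not Eigen API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve the dense system M z = rhs by Gaussian elimination with partial pivoting.
std::vector<double> solve_dense(Matrix M, std::vector<double> rhs) {
    const std::size_t n = rhs.size();
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t piv = c;
        for (std::size_t r = c + 1; r < n; ++r)
            if (std::fabs(M[r][c]) > std::fabs(M[piv][c])) piv = r;
        assert(M[piv][c] != 0.0 && "singular system");
        std::swap(M[c], M[piv]);
        std::swap(rhs[c], rhs[piv]);
        for (std::size_t r = c + 1; r < n; ++r) {
            const double f = M[r][c] / M[c][c];
            for (std::size_t k = c; k < n; ++k) M[r][k] -= f * M[c][k];
            rhs[r] -= f * rhs[c];
        }
    }
    std::vector<double> z(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = rhs[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= M[i][k] * z[k];
        z[i] = s / M[i][i];
    }
    return z;
}

// Solve A X + X A^T = C for X by writing out one linear equation per entry (i, j).
Matrix solve_lyapunov_kron(const Matrix& A, const Matrix& C) {
    const std::size_t d = A.size();
    Matrix M(d * d, std::vector<double>(d * d, 0.0));
    std::vector<double> rhs(d * d);
    for (std::size_t i = 0; i < d; ++i) {
        for (std::size_t j = 0; j < d; ++j) {
            const std::size_t r = i * d + j;  // equation for entry (i, j)
            rhs[r] = C[i][j];
            for (std::size_t k = 0; k < d; ++k) {
                M[r][k * d + j] += A[i][k];   // (A X)_{ij}   = sum_k A_{ik} X_{kj}
                M[r][i * d + k] += A[j][k];   // (X A^T)_{ij} = sum_k X_{ik} A_{jk}
            }
        }
    }
    const std::vector<double> z = solve_dense(M, rhs);
    Matrix X(d, std::vector<double>(d));
    for (std::size_t i = 0; i < d; ++i)
        for (std::size_t j = 0; j < d; ++j) X[i][j] = z[i * d + j];
    return X;
}
```

Passing -C as the right-hand side reproduces the sign convention of Xtrue in the R snippet.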

Vectorization of weighted outer product

I am looking to accelerate the calculation of an approximate weighted covariance.
Specifically, I have an Eigen::VectorXd(N) w and an Eigen::MatrixXd(M,N) points. I'd like to calculate the sum over i of w(i)*points.col(i)*(points.col(i).transpose()).
I am using a for loop but would like to see if I can go faster:
Eigen::VectorXd w = Eigen::VectorXd(N);
Eigen::MatrixXd points = Eigen::MatrixXd(M,N);
Eigen::MatrixXd tempMatrix = Eigen::MatrixXd(M,M);
for (int i = 0; i < N; i++) {
    tempMatrix += w(i)*points.col(i)*(points.col(i).transpose());
}
Looking forward to seeing what can be done!
The following should work:
Eigen::MatrixXd tempMatrix; // not necessary to pre-allocate
// assigning the product allocates tempMatrix if needed
// noalias() tells Eigen that no factor on the right aliases with tempMatrix
tempMatrix.noalias() = points * w.asDiagonal() * points.adjoint();
or directly:
Eigen::MatrixXd tempMatrix = points * w.asDiagonal() * points.adjoint();
If M is really big, it can be significantly faster to just compute one side and copy it (if needed):
Eigen::MatrixXd tempMatrix(M,M);
tempMatrix.triangularView<Eigen::Upper>() = points * w.asDiagonal() * points.adjoint();
tempMatrix.triangularView<Eigen::StrictlyLower>() = tempMatrix.adjoint();
Note that .adjoint() is equivalent to .transpose() for non-complex scalars, but with the former the code also works if points and the result were MatrixXcd instead (w must still be real if the result must be self-adjoint).
Also, notice that the following (from your original code) does not set all entries to zero:
Eigen::MatrixXd tempMatrix = Eigen::MatrixXd(M,M);
If you want this, you need to write:
Eigen::MatrixXd tempMatrix = Eigen::MatrixXd::Zero(M,M);
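To see why the single expression matches the loop: entry (r, c) of the sum is the sum over i of w(i)*points(r,i)*points(c,i), which is exactly entry (r, c) of points * diag(w) * points^T. The plain C++ sketch below spells that out, filling only the upper triangle and mirroring it, in the spirit of the triangularView variant. It does not use Eigen; the name `weighted_scatter` and the column-major std::vector layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// points: M x N, column-major (column i starts at points[i * M]); w holds N weights.
// Returns the symmetric M x M matrix sum_i w[i] * p_i * p_i^T, column-major.
std::vector<double> weighted_scatter(const std::vector<double>& points,
                                     const std::vector<double>& w,
                                     std::size_t M) {
    const std::size_t N = w.size();
    assert(points.size() == M * N);
    std::vector<double> S(M * M, 0.0);  // explicitly zero-initialised accumulator
    for (std::size_t i = 0; i < N; ++i) {
        const double* p = points.data() + i * M;
        for (std::size_t c = 0; c < M; ++c) {
            const double wc = w[i] * p[c];
            // Only rows r <= c: the upper triangle, including the diagonal.
            for (std::size_t r = 0; r <= c; ++r) S[r + c * M] += wc * p[r];
        }
    }
    // Copy the strict upper triangle into the strict lower triangle.
    for (std::size_t c = 0; c < M; ++c)
        for (std::size_t r = 0; r < c; ++r) S[c + r * M] = S[r + c * M];
    return S;
}
```

In practice the Eigen expression is the better choice, since it hands the work to an optimised matrix product; the sketch is only meant to make the identity concrete.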

Efficient distance calculations in Armadillo

I'm new to Armadillo. I have the code below, which I assume is inefficient. Any suggestions to make it more memory efficient and/or faster? Following the Armadillo docs and the Rcpp Gallery, I was unable to get .colptr, uvec, or batch insertion to work, but I assume any of them would be an improvement.
With an input of X (~100 x 30000), even my stupidly large work VM crashes.
Linux release 7.3.1611 (Core)
(24 x 2.494 GHz) processor(s)
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
sp_mat arma_distmat_LT(const arma::mat& x) { // input expected X_{n x p} n << p
    int nr, nc;
    Col<double> col0, col1;
    nr = x.n_rows;
    nc = x.n_cols;
    sp_mat out(nc, nc);
    for (int i = 0; i < nc; i++) {
        col0 = x.col(i);
        for (int j = i + 1; j < nc; j++) {
            col1 = x.col(j);
            // element-wise insertion into a sparse matrix, strictly below the diagonal
            out(j, i) = as_scalar(col0.t() * col1);
        }
    }
    return out;
}
Call: sourceCpp("<file>"); dist_x <- arma_distmat_LT(X)
Note: these are distances because I am calculating cosine similarities where I have set L2 norm == 1.
It looks to me as if you're just computing the (strictly lower triangular part of the) matrix product t(X) %*% X. You can compute the full product directly in R with the underused crossprod function.
X <- matrix(rnorm(100*30000), ncol=30000)
res <- crossprod(X, X)
This takes a few minutes on my laptop. If you let Armadillo form the product directly instead of looping over columns, you can use
sp_mat arma_distmat_LT2(const arma::mat& x) { // input expected X_{n x p} n << p
    int nr, nc;
    Col<double> col0, col1;
    nr = x.n_rows;
    nc = x.n_cols;
    sp_mat out(nc, nc);
    // k = -1 keeps only the entries strictly below the main diagonal
    // (needs an Armadillo version whose trimatl accepts the k argument)
    out = trimatl(x.t() * x, -1);
    return out;
}
Still takes a few minutes. It uses an awful amount of memory though so I doubt you can have a lot of objects in memory at the same time.
The code could probably be optimized to only compute the lower/upper triangular matrix.
Just to show the speedup for a 100*800 matrix:
> microbenchmark(crossprod(X, X), arma_distmat_LT(X), arma_distmat_LT2(X))
Unit: milliseconds
                expr        min         lq       mean     median         uq        max neval cld
     crossprod(X, X)   50.25574   53.72049   57.98812   56.29532   58.71277  160.81227   100   a
  arma_distmat_LT(X) 1331.83243 1471.42465 1523.74060 1492.84611 1512.45416 3080.37891   100   b
 arma_distmat_LT2(X)   29.69420   33.23954   36.24613   35.54700   38.05208   66.07351   100   a
As you can see there is a substantial speedup to be gained by brute-forcing it. That being said I'm sure that the cross product can be optimised further.
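The remark about computing only one triangle can be made concrete without Armadillo. The sketch below stores the strictly lower triangle in packed form, n(n-1)/2 values, which roughly halves the memory of the full product. Note that for 30000 columns this is still about 450 million doubles (around 3.6 GB), and since cosine similarities are generally non-zero, a sparse container is unlikely to save anything. `gram_strict_lower` and `packed_index` are hypothetical helper names, and the input is assumed to be column-major as in Armadillo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position of entry (j, i), with j > i, in column-by-column packed strict-lower storage.
std::size_t packed_index(std::size_t n, std::size_t j, std::size_t i) {
    assert(i < j && j < n);
    // Columns 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) = i * (2n - i - 1) / 2 entries.
    return i * (2 * n - i - 1) / 2 + (j - i - 1);
}

// x: n_rows x n_cols, column-major. Returns the dot products of all column pairs
// (j, i) with j > i, i.e. the strictly lower triangle of x^T x, packed column by column.
std::vector<double> gram_strict_lower(const std::vector<double>& x,
                                      std::size_t n_rows, std::size_t n_cols) {
    assert(x.size() == n_rows * n_cols);
    std::vector<double> out;
    if (n_cols > 1) out.reserve(n_cols * (n_cols - 1) / 2);
    for (std::size_t i = 0; i < n_cols; ++i) {
        const double* ci = x.data() + i * n_rows;
        for (std::size_t j = i + 1; j < n_cols; ++j) {
            const double* cj = x.data() + j * n_rows;
            double s = 0.0;
            for (std::size_t r = 0; r < n_rows; ++r) s += ci[r] * cj[r];
            out.push_back(s);
        }
    }
    return out;
}
```

Entry (j, i) of the result is then read as out[packed_index(n_cols, j, i)]. A tuned BLAS product will likely still be faster per entry than this naive triple loop; the gain here is memory, not speed.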

Dot product as multiplication in armadillo

I have a row vector and a column vector and I would like to take their dot product.
rowvec v = {1,2,3,4};
vec w = {5,6,7,8};
double a = dot(v,w); // works
double b = v*w; // doesn't work
double c = (v*w)(0); // doesn't work
double d = static_cast<vec>(v*w)(0); // works
Is it possible to get something that looks like b? I would like it for readability.
You may also use
double b = as_scalar(v*w);
but that was not really what you wanted ...
I don't think there are any other alternatives, apart from declaring v, w and b as mat. Then v*w gives a 1x1 matrix and w*v gives a 4x4 matrix.
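The underlying issue is that v*w is a 1x1 matrix expression and not a scalar, so a conversion has to happen somewhere. If the product syntax matters more than using Armadillo's own vector types, one option is a thin wrapper whose operator* returns a double. The sketch below shows the idea with the standard library only; `RowVec` and `ColVec` are hypothetical types, not Armadillo classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper types for illustration only; these are not Armadillo classes.
struct RowVec { std::vector<double> data; };
struct ColVec { std::vector<double> data; };

// Row times column is mathematically a 1x1 matrix; here it is returned as a plain double.
double operator*(const RowVec& v, const ColVec& w) {
    assert(v.data.size() == w.data.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < v.data.size(); ++i) sum += v.data[i] * w.data[i];
    return sum;
}
```

With these types, `double b = v * w;` compiles and reads like the line in the question, at the cost of maintaining the wrappers.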

How to implement a left matrix division on C++ using gsl

I am trying to port a MATLAB program to C++.
I want to implement left matrix division between a matrix A and a column vector B.
A is an m-by-n matrix with m not equal to n, and B is a column vector with m components.
The result X = A\B should be the least-squares solution of the under- or overdetermined system of equations AX = B. In other words, X minimizes norm(A*X - B), the length of the vector AX - B.
That is, I want the same result as A\B in MATLAB.
I want to implement this with GSL (the GNU Scientific Library), but I don't know much about math, least-squares fitting or matrix operations. Can somebody tell me how to do this in GSL? Or, if implementing it in GSL is too complicated, can someone suggest a good open-source C/C++ library that provides this matrix operation?
Okay, I finally figured it out by myself after spending another 5 hours on it. Thanks anyway for the suggestions on my question.
Assuming we have a 5 * 2 matrix
A = [1 0
1 0
0 1
1 1
1 1]
and a vector b = [1.8388,2.5595,0.0462,2.1410,0.6750]
The solution to the A \ b would be
#include <stdio.h>
#include <gsl/gsl_linalg.h>

int main (void)
{
    // 5 x 2 matrix A, stored row by row
    double a_data[] = {1.0, 0.0,1.0, 0.0, 0.0,1.0,1.0,1.0,1.0,1.0};
    double b_data[] = {1.8388,2.5595,0.0462,2.1410,0.6750};
    gsl_matrix_view m
        = gsl_matrix_view_array (a_data, 5, 2);
    gsl_vector_view b
        = gsl_vector_view_array (b_data, 5);
    gsl_vector *x = gsl_vector_alloc (2); // size equal to n
    gsl_vector *residual = gsl_vector_alloc (5); // size equal to m
    gsl_vector *tau = gsl_vector_alloc (2); // size equal to min(m,n)
    gsl_linalg_QR_decomp (&m.matrix, tau); // overwrites a_data with the QR factors
    gsl_linalg_QR_lssolve(&m.matrix, tau, &b.vector, x, residual); // least-squares solve
    printf ("x = \n");
    gsl_vector_fprintf (stdout, x, "%g");
    gsl_vector_free (x);
    gsl_vector_free (tau);
    gsl_vector_free (residual);
    return 0;
}
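As a cross-check on the QR result, the same least-squares problem can be solved by hand through the normal equations A^T A x = A^T b, which for a matrix with two columns is just a 2x2 system. This is only a sketch for small, well-conditioned problems: forming A^T A squares the condition number, which is why a QR-based solve is generally preferred. `lstsq_two_columns` is a made-up helper name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Least-squares solution of A x = b for an m x 2 matrix A (row-major, like a_data above),
// obtained from the 2x2 normal equations (A^T A) x = A^T b.
std::array<double, 2> lstsq_two_columns(const std::vector<double>& a,
                                        const std::vector<double>& b) {
    const std::size_t m = b.size();
    assert(a.size() == 2 * m);
    double g00 = 0.0, g01 = 0.0, g11 = 0.0;  // entries of the symmetric matrix A^T A
    double r0 = 0.0, r1 = 0.0;               // entries of A^T b
    for (std::size_t i = 0; i < m; ++i) {
        const double a0 = a[2 * i], a1 = a[2 * i + 1];
        g00 += a0 * a0; g01 += a0 * a1; g11 += a1 * a1;
        r0 += a0 * b[i]; r1 += a1 * b[i];
    }
    const double det = g00 * g11 - g01 * g01;
    assert(det != 0.0 && "columns of A are linearly dependent");
    // Cramer's rule for the 2x2 system.
    return {(g11 * r0 - g01 * r1) / det, (g00 * r1 - g01 * r0) / det};
}
```

For the A and b given above, A^T A = [4 2; 2 3] and A^T b = [7.2143, 2.8622], which gives x of approximately (1.9898, -0.3725).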
In addition to the one you gave, a quick search revealed other GSL examples, one using QR decomposition, the other LU decomposition.
There exist other numeric libraries capable of solving linear systems (a basic functionality in every linear algebra library). For one, Armadillo offers a nice and readable interface:
#include <iostream>
#include <armadillo>
using namespace std;
using namespace arma;

int main()
{
    mat A = randu<mat>(5,2);
    vec b = randu<vec>(5);
    vec x = solve(A, b); // A is not square, so this is a least-squares solve
    cout << x << endl;
    return 0;
}
Another good one is the Eigen library:
#include <iostream>
#include <Eigen/Dense>
using namespace std;
using namespace Eigen;

int main()
{
    Matrix3f A;
    Vector3f b;
    A << 1,2,3, 4,5,6, 7,8,10;
    b << 3, 3, 4;
    Vector3f x = A.colPivHouseholderQr().solve(b);
    cout << "The solution is:\n" << x << endl;
    return 0;
}
Now, one thing to remember is that MLDIVIDE is a super-charged function with multiple execution paths. If the coefficient matrix A has some special structure, that structure is exploited to obtain a faster or more accurate result (it can choose among substitution algorithms, LU and QR factorization, ...).
MATLAB also has PINV which returns the minimal norm least-squares solution, in addition to a number of other iterative methods for solving systems of linear equations.
I'm not sure I understand your question, but if you've already found your solution using MATLAB, you may want to consider using MATLAB Coder, which automatically translates your MATLAB code into C++.