Given a sparse matrix A and a vector b, I would like to obtain a solution x to the equation A * x = b as well as the kernel of A.
One possibility is to convert A to a dense representation.
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/SparseQR>
int main()
{
    // This is a toy problem. My actual matrix
    // is of course bigger and sparser.
    Eigen::SparseMatrix<double> A(2,2);
    A.insert(0,0) = 1;
    A.insert(0,1) = 2;
    A.insert(1,0) = 4;
    A.insert(1,1) = 8;
    A.makeCompressed();

    Eigen::Vector2d b;
    b << 3, 12;

    Eigen::SparseQR<Eigen::SparseMatrix<double>,
                    Eigen::COLAMDOrdering<int> > solver;
    solver.compute(A);
    std::cout << "Solution:\n" << solver.solve(b) << std::endl;

    Eigen::Matrix2d A_dense(A);
    std::cout << "Kernel:\n" << A_dense.fullPivLu().kernel() << std::endl;
    return 0;
}
Is it possible to do the same directly in the sparse representation? I could not find a function kernel() anywhere except in FullPivLu.
I think @chtz's answer is almost correct, except we need to take the last A.cols() - qr.rank() columns. Here is a mathematical derivation.
Say we do a QR decomposition of your matrix Aᵀ as
Aᵀ * P = [Q₁ Q₂] * [R; 0] = Q₁ * R
where P is the permutation matrix, thus
Aᵀ = Q₁ * R * P⁻¹.
We can see that Range(Aᵀ) = Range(Q₁ * R * P⁻¹) = Range(Q₁) (because both P and R are invertible).
Since Aᵀ and Q₁ have the same range space, this implies that A and Q₁ᵀ will also have the same null space, namely Null(A) = Null(Q₁ᵀ). (Here we use the property that Range(M) and Null(Mᵀ) are complements to each other for any matrix M, hence Null(A) = complement(Range(Aᵀ)) = complement(Range(Q₁)) = Null(Q₁ᵀ)).
On the other hand, since the matrix [Q₁ Q₂] is orthonormal, Null(Q₁ᵀ) = Range(Q₂), thus Null(A) = Range(Q₂), i.e., the columns of Q₂ span the kernel of A.
Since Q₂ consists of the rightmost A.cols() - qr.rank() columns of Q, you can call rightCols(A.cols() - qr.rank()) on Q to retrieve the kernel of A.
For more information on the kernel (null space), you can refer to https://en.wikipedia.org/wiki/Kernel_(linear_algebra)
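A minimal sketch of this recipe in Eigen (my own illustration, not a canonical implementation): it assumes SparseQR's default pivot threshold reveals the rank correctly, and since matrixQ() only returns an implicit Householder expression, Q is materialized before slicing.
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/SparseQR>

int main()
{
    Eigen::SparseMatrix<double> A(2,2);
    A.insert(0,0) = 1; A.insert(0,1) = 2;
    A.insert(1,0) = 4; A.insert(1,1) = 8;
    A.makeCompressed();

    // QR-decompose A^T; its Q factor carries the null space of A.
    Eigen::SparseMatrix<double> At = A.transpose();
    Eigen::SparseQR<Eigen::SparseMatrix<double>, Eigen::COLAMDOrdering<int> > qr(At);

    // Materialize Q, then keep its rightmost A.cols() - rank columns.
    Eigen::SparseMatrix<double> Q;
    Q = qr.matrixQ();
    Eigen::MatrixXd kernel = Eigen::MatrixXd(Q).rightCols(A.cols() - qr.rank());
    std::cout << "Kernel:\n" << kernel << std::endl;
    return 0;
}
For the toy matrix above this prints, up to sign and scaling, the direction (2, -1), which indeed satisfies A * x = 0.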
I am looking for a built-in way with the Eigen library to perform coordinate transformations by normal vectors in 2D space.
Mathematically, it's not difficult: let v = (v_x, v_y) be a 2D column vector and n = (n_x, n_y) be a normal vector; then the transformation I am looking for is multiplication by a rotation matrix:
v_T = N * v, with v_T being the transformed vector and N being the rotation matrix, which is
|  n_x  n_y |
| -n_y  n_x |
In my case, the data I need to transform is stored in an Array2Xd and the normal vectors are stored in a Matrix2Xd, with each column holding the x- and y-components. I need to transform each column in the array by the corresponding normal vector in the matrix.
Currently, I'm doing it like this:
#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;
/* transform a single vector, just for illustration */
Array2d transform_s( const Ref<const Array2d>& v, const Ref<const Vector2d>& n ){
    return {
        n.dot( v.matrix() ),
        -n.y() * v.x() + n.x() * v.y()
    };
}

/* transform multiple columns */
Array2Xd transform_m( const Ref<const Array2Xd>& v, const Ref<const Array2Xd>& n ){
    Array2Xd transformed ( 2, v.cols() );
    /* colwise dot product for first row */
    transformed.row(0) = (n * v).colwise().sum();
    /* even less elegant calculation for the second row */
    transformed.row(1) = n.row(0) * v.row(1) - n.row(1) * v.row(0);
    return transformed;
}

int main(){
    Array2Xd vals (2, 3);
    vals <<
        2, 0,-1,
        0, 3, 2;
    Matrix2Xd n;
    n.resizeLike(vals);
    n <<
        0, 0, 1,
        1,-1, 1;
    n.colwise().normalize();
    std::cout
        << "single column:\n" << transform_s( vals.col(0), n.col(0) )
        << "\nall columns:\n"  << transform_m( vals, n.array() )
        << "\n";
    return 0;
}
I'm aware of Eigen::Rotation2D, but it appears to require either an angle or a rotation matrix. I am specifically looking for a way to provide only the normal vectors; otherwise I need to build the rotation matrices from the normal vectors myself, which doesn't really reduce the complexity on my end.
If there's no way to do this with Eigen, I'll accept that as an answer. In that case, I'd be very happy about a more efficient implementation of what I wrote above.
What you are doing is essentially a complex multiplication with conj(n).
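To see why, identify each 2D vector (x, y) with the complex number x + i·y. Then
conj(n) * v = (n_x − i·n_y) * (v_x + i·v_y) = (n_x·v_x + n_y·v_y) + i·(n_x·v_y − n_y·v_x),
whose real and imaginary parts are exactly the two rows of N * v from the question.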
There is no elegant way to reinterpret a Vector2d/Array2Xd as a complex<double>/ArrayXcd, but you can hack something together using Maps:
Array2Xd transform_complex( const Ref<const Array2Xd>& v, const Ref<const Array2Xd>& n ){
    Array2Xd transformed(2, v.cols());
    ArrayXcd::Map(reinterpret_cast<std::complex<double>*>(transformed.data()), v.cols())
        = ArrayXcd::Map(reinterpret_cast<std::complex<double> const*>(v.data()), v.cols())
        * ArrayXcd::Map(reinterpret_cast<std::complex<double> const*>(n.data()), n.cols()).conjugate();
    return transformed;
}
You could write yourself a helper function which takes a const Ref<const Array2Xd>& and returns a Map<ArrayXcd> with the same content.
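For illustration, a minimal sketch of such a helper (hypothetical name asComplex; it assumes the Ref is compact, i.e., the columns are stored back to back without an outer stride):
#include <Eigen/Dense>
#include <complex>

// Hypothetical helper: view a compact 2xN real array as a length-N complex array.
Eigen::Map<const Eigen::ArrayXcd> asComplex( const Eigen::Ref<const Eigen::Array2Xd>& a ){
    return Eigen::Map<const Eigen::ArrayXcd>(
        reinterpret_cast<const std::complex<double>*>(a.data()), a.cols() );
}
With that, the body of transform_complex reduces to writing asComplex(v) * asComplex(n).conjugate() through a non-const Map over the result array.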
I am trying to perform an inplace real to complex FFT with cufft.
I am aware of the similar question How to perform a Real to Complex Transformation with cuFFT. However I have issues trying to reproduce the same method.
If I do an out-of-place transformation, there is no problem, but as soon as I do it in place, I do not get the correct values in the FFT (checked with Python, using binary files in between). I get no errors, just incorrect values.
Here is my code:
void fftCuda2d(mat3d* scene)
{
    cufftResult resultStatus;
    cudaError_t cuda_status;
    cufftHandle plan_forward;
    resultStatus = cufftPlan2d(&plan_forward, scene->_height, scene->_width, CUFFT_R2C);
    cout << "Creating plan forward: " << _cudaGetErrorEnum(resultStatus) << endl;

    cufftComplex *d_fft, *d_scene, *h_fft;
    size_t size_fft = (int(scene->_width/2)+1)*scene->_height;
    cudaMalloc((void**)&d_scene, sizeof(cufftComplex)*size_fft);
    cudaMalloc((void**)&d_fft, sizeof(cufftComplex)*size_fft);
    h_fft = (cufftComplex*) malloc(sizeof(cufftComplex)*size_fft);

    cuda_status = cudaMemcpy(d_scene, scene->_pData, sizeof(cufftReal) * scene->_height * scene->_width, cudaMemcpyHostToDevice);
    resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_scene);
    cuda_status = cudaMemcpy(h_fft, d_scene, sizeof(cufftReal)*scene->_height*scene->_width, cudaMemcpyDeviceToHost);

    FILE* pFileTemp;
    pFileTemp = fopen("temp.bin", "wb");
    size_t check = fwrite(h_fft, sizeof(cufftComplex), size_fft, pFileTemp);
}
If I use resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_fft); and save the output of d_fft, I get the correct result.
Do you see any mistake of mine here?
P.S. mat3d is a struct where _width and _height contain the size of the matrix and _pData is the pointer to the data, but there is no issue with that.
(It seems like this should be a duplicate question but I was not able to locate the duplicate.)
Your input data needs to be organized differently (padded) when using an in-place transform. This is particularly noticeable in the 2D case, because each row of data must be padded.
In the non-inplace R2C transform, the input data is real-valued and of size height*width (for an example R=4, C=4 case):
X X X X
X X X X
X X X X
X X X X
The above data would occupy exactly 16*sizeof(cufftReal) (assuming float input data, dimensions R = 4, C = 4), and it would be organized that way in memory, linearly, with no gaps. However, when we switch to an in-place transform, the size of the input buffer changes, and this change in size has ramifications for data arrangement. Specifically, the size of the input buffer is R*(C/2 + 1)*sizeof(cufftComplex). For the R=4, C=4 example case, that is 12*sizeof(cufftComplex) or 24*sizeof(cufftReal), but it is still organized as 4 rows of data. Each row, therefore, is of length 6 (if measured in cufftReal) or 3 (if measured in cufftComplex). Considering it as cufftReal, when we create our input data we must organize it like this:
X X X X P P
X X X X P P
X X X X P P
X X X X P P
where the P locations are "padding" data, not your input data. If we view this linearly in memory, it looks like:
X X X X P P X X X X P P X X X X P P X X X X P P
That is the expectation/requirement of CUFFT (and I believe it is the same for FFTW). However, since you made no changes to the way you deposited your data, you provided data that looks like this:
X X X X X X X X X X X X X X X X P P P P P P P P
and the difference in those 2 patterns is what accounts for the difference in the result output. There are a variety of ways to fix this. I'll choose to demonstrate using cudaMemcpy2D to populate the device input buffer in the in-place case, which will give us the desired pattern. This may not be the best/fastest way, depending on your application needs.
You were also not copying the correct size of the result data from device back to host.
Here is a fixed example:
$ cat t1589.cu
#include <cufft.h>
#include <iostream>
#include <cstdlib>
struct mat3d{
    int _width;
    int _height;
    cufftReal *_pData;
};
void fftCuda2d(mat3d* scene)
{
    cufftResult resultStatus;
    cudaError_t cuda_status;
    cufftHandle plan_forward;
    resultStatus = cufftPlan2d(&plan_forward, scene->_height, scene->_width, CUFFT_R2C);
    std::cout << "Creating plan forward: " << (int)resultStatus << std::endl;
    cufftComplex *d_fft, *d_scene, *h_fft;
    size_t size_fft = (int(scene->_width/2)+1)*scene->_height;
    cudaMalloc((void**)&d_scene, sizeof(cufftComplex)*size_fft);
    cudaMalloc((void**)&d_fft, sizeof(cufftComplex)*size_fft);
    h_fft = (cufftComplex*) malloc(sizeof(cufftComplex)*size_fft);
#ifdef USE_IP
    cuda_status = cudaMemcpy2D(d_scene, ((scene->_width/2)+1)*sizeof(cufftComplex), scene->_pData, (scene->_width)*sizeof(cufftReal), sizeof(cufftReal) * scene->_width, scene->_height, cudaMemcpyHostToDevice);
    resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_scene);
    cuda_status = cudaMemcpy(h_fft, d_scene, sizeof(cufftComplex)*size_fft, cudaMemcpyDeviceToHost);
#else
    cuda_status = cudaMemcpy(d_scene, scene->_pData, sizeof(cufftReal) * scene->_height * scene->_width, cudaMemcpyHostToDevice);
    resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_fft);
    cuda_status = cudaMemcpy(h_fft, d_fft, sizeof(cufftComplex)*size_fft, cudaMemcpyDeviceToHost);
#endif
    std::cout << "exec: " << (int)resultStatus << std::endl;
    for (int i = 0; i < size_fft; i++)
        std::cout << h_fft[i].x << " " << h_fft[i].y << ",";
    std::cout << std::endl;
}
const int dim = 4;
int main(){
    mat3d myScene;
    myScene._pData = new cufftReal[dim*dim];
    myScene._width = dim;
    myScene._height = dim;
    for (int i = 0; i < dim*dim; i++) myScene._pData[i] = rand()/(float)RAND_MAX;
    fftCuda2d(&myScene);
    std::cout << cudaGetErrorString(cudaGetLastError()) << std::endl;
}
$ nvcc -lineinfo -o t1589 t1589.cu -lcufft
t1589.cu(15): warning: variable "cuda_status" was set but never used
$ ./t1589
Creating plan forward: 0
exec: 0
9.71338 0,-0.153554 1.45243,0.171302 0,0.878097 0.533959,0.424595 -0.834714,0.858133 -0.393671,-0.205139 0,-0.131513 -0.494514,-0.165712 0,0.878097 -0.533959,0.0888268 1.49303,0.858133 0.393671,
no error
$ nvcc -lineinfo -o t1589 t1589.cu -lcufft -DUSE_IP
t1589.cu(15): warning: variable "cuda_status" was set but never used
$ ./t1589
Creating plan forward: 0
exec: 0
9.71338 0,-0.153554 1.45243,0.171302 0,0.878097 0.533959,0.424595 -0.834714,0.858133 -0.393671,-0.205139 0,-0.131513 -0.494514,-0.165712 0,0.878097 -0.533959,0.0888268 1.49303,0.858133 0.393671,
no error
$
// computing the matrix operation here
// resultEigen  = input matrix
// result1Eigen = hidden bias
// result2Eigen = visible bias
// result3Eigen = weight matrix
MatrixXd H;
MatrixXd V;
double well[36];
Map<MatrixXd>( well, H.rows(), H.cols() ) = H;
H = resultEigen * result3Eigen + result1Eigen;
mexPrintf("H is here\n");
for (int i=0; i<36; i++)
{
    mexPrintf("%d\n",H);
}
mexPrintf("\n");
I need to build a reconstruction function for my RBM, and since direct matrix multiplication could get me a better result, I have been using the Eigen library to solve my issues, but I am facing some difficulties.
When running the above code I end up getting a single value for the H matrix, and I wonder why!
Moreover, the parameters used in the computation of H have been initialized as follows:
double *data1 = hbias;
Map<VectorXd>hidden_bias(data1,6,1);
VectorXd result1Eigen;
double result1[6];
result1Eigen = hidden_bias.transpose();
Map<VectorXd>(result1, result1Eigen.cols()) = result1Eigen;
// next param
double *data2 = vbias;
Map<VectorXd>visible_bias(data2,6,1);
VectorXd result2Eigen;
double result2[6];
result2Eigen = visible_bias.transpose();
Map<VectorXd>(result2, result2Eigen.cols()) = result2Eigen;
// next param
double *data3 = w;
Map<MatrixXd>weight_matrix(data3,n_visible,n_hidden);
MatrixXd result3Eigen;
// double result3[36];
mxArray * result3Matrix = mxCreateDoubleMatrix(n_visible, n_hidden, mxREAL );
double *result3=(double*)mxGetData(result3Matrix);
result3Eigen = weight_matrix.transpose();
Map<MatrixXd>(result3, result3Eigen.rows(), result3Eigen.cols()) = result3Eigen;
Lastly, I also face issues printing out data using std::cout from inside the mexFunction.
Thanks for any hints.
The problem is in the printing code, which should index individual elements and use a floating-point format specifier, since H holds doubles:
mexPrintf("%f\n", H(i));
Then, there is no need to duplicate vectors and matrices. For instance, result1 is useless, as you can get a raw pointer to the data stored in result1Eigen using result1Eigen.data(). Likewise, you can directly assign weight_matrix.transpose() to Map<MatrixXd>(result3,...), and I don't see the purpose of well.
Finally, if the sizes are really known at compile time, then it is better to use Matrix<double,6,1> instead of VectorXd and Matrix<double,6,6> instead of MatrixXd. You can expect a significant speedup.
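For illustration, a minimal stand-alone sketch combining both suggestions (hypothetical buffers stand in for the mex arrays; fixed 6x6 sizes are assumed):
#include <Eigen/Dense>
#include <cstdio>

int main(){
    using Vector6d = Eigen::Matrix<double,6,1>;
    using Matrix6d = Eigen::Matrix<double,6,6>;

    double hbias[6] = {1, 2, 3, 4, 5, 6}; // stand-ins for the mex input buffers
    double w[36]    = {0};                // column-major 6x6 weights

    // Map the raw buffers directly; no intermediate copies are needed.
    Eigen::Map<const Vector6d> hidden_bias(hbias);
    Eigen::Map<const Matrix6d> weight_matrix(w);

    // Example product analogous to H = resultEigen * result3Eigen + result1Eigen.
    Vector6d H = weight_matrix.transpose() * hidden_bias + hidden_bias;

    for (int i = 0; i < H.size(); ++i)
        std::printf("%f\n", H(i));        // element-wise, with a float specifier
    return 0;
}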
I am trying to do a simple matrix inversion operation using Boost uBLAS. Basically what I am trying to find is inverted_matrix = inverse(trans(matrix) * matrix), but I am getting the following error:
Check failed in file boost_1_53_0/boost/numeric/ublas/lu.hpp at line 299:
detail::expression_type_check (prod (triangular_adaptor<const_matrix_type,
upper> (m), e), cm2)
terminate called after throwing an instance of
'boost::numeric::ublas::internal_logic'
what(): internal logic
Aborted (core dumped)
My attempt:
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/io.hpp>
#include <boost/numeric/ublas/vector_proxy.hpp>
#include <boost/numeric/ublas/triangular.hpp>
#include <boost/numeric/ublas/lu.hpp>
namespace ublas = boost::numeric::ublas;
template<class T>
bool InvertMatrix (const ublas::matrix<T>& input, ublas::matrix<T>& inverse) {
    using namespace boost::numeric::ublas;
    typedef permutation_matrix<std::size_t> pmatrix;
    // create a working copy of the input
    matrix<T> A(input);
    // create a permutation matrix for the LU-factorization
    pmatrix pm(A.size1());
    // perform LU-factorization
    int res = lu_factorize(A,pm);
    if( res != 0 )
        return false;
    // create identity matrix of "inverse"
    inverse.assign(ublas::identity_matrix<T>(A.size1()));
    // backsubstitute to get the inverse
    lu_substitute(A, pm, inverse);
    return true;
}
int main(){
    using namespace boost::numeric::ublas;
    matrix<double> m(4,5);
    vector<double> v(4);
    vector<double> thetas;
    m(0,0) = 1; m(0,1) = 2104; m(0,2) = 5; m(0,3) = 1; m(0,4) = 45;
    m(1,0) = 1; m(1,1) = 1416; m(1,2) = 3; m(1,3) = 2; m(1,4) = 40;
    m(2,0) = 1; m(2,1) = 1534; m(2,2) = 3; m(2,3) = 2; m(2,4) = 30;
    m(3,0) = 1; m(3,1) = 852;  m(3,2) = 2; m(3,3) = 1; m(3,4) = 36;
    std::cout << m << std::endl;
    matrix<double> product = prod(trans(m), m);
    std::cout << product << std::endl;
    matrix<double> inversion(5,5);
    bool inverted;
    inverted = InvertMatrix(product, inversion);
    std::cout << inversion << std::endl;
}
Boost uBLAS has runtime checks to ensure, among other things, numerical stability.
If you look at the source of the error, you can see that it tries to make sure that the triangular solve is consistent, i.e., that U*X = B still holds (to within some epsilon) after computing X = U⁻¹*B, or something like that. If your data deviates too much numerically, this check will fail.
You can disable the checks via -DBOOST_UBLAS_NDEBUG, or tweak BOOST_UBLAS_TYPE_CHECK_EPSILON and BOOST_UBLAS_TYPE_CHECK_MIN.
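A sketch of the macro route (my assumption: defining BOOST_UBLAS_NDEBUG before the first uBLAS include should be equivalent to passing -DBOOST_UBLAS_NDEBUG on the command line):
// Assumption: the macro must be visible before any uBLAS header is included.
#define BOOST_UBLAS_NDEBUG
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/lu.hpp>
// lu_factorize/lu_substitute now run without the expression type checks.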
As m has only 4 rows, prod(trans(m), m) cannot have a rank higher than 4, and since that product is a 5x5 matrix, it must be singular (i.e., it has determinant 0); calculating the inverse of a singular matrix is like dividing by 0. Add independent rows to m to resolve this singularity problem.
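A small sketch of how you might confirm this diagnosis with uBLAS itself (my own test program, not part of the question's code; note that in floating point the factorization may produce a tiny pivot instead of failing outright):
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/triangular.hpp>
#include <boost/numeric/ublas/lu.hpp>
#include <iostream>

int main(){
    using namespace boost::numeric::ublas;
    matrix<double> m(4,5);
    m(0,0) = 1; m(0,1) = 2104; m(0,2) = 5; m(0,3) = 1; m(0,4) = 45;
    m(1,0) = 1; m(1,1) = 1416; m(1,2) = 3; m(1,3) = 2; m(1,4) = 40;
    m(2,0) = 1; m(2,1) = 1534; m(2,2) = 3; m(2,3) = 2; m(2,4) = 30;
    m(3,0) = 1; m(3,1) = 852;  m(3,2) = 2; m(3,3) = 1; m(3,4) = 36;

    matrix<double> p = prod(trans(m), m);            // 5x5, but rank <= 4
    permutation_matrix<std::size_t> pm(p.size1());
    if (lu_factorize(p, pm) != 0)
        std::cout << "singular: LU hit a zero pivot\n";
    else
        std::cout << "LU succeeded (a pivot may still be numerically tiny)\n";
}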
I think your matrix dimensions, 4 by 5, caused the error. As Maarten Hilferink mentioned, you may try with a square matrix, like 5 by 5. The requirements for a matrix to have an inverse are:
The matrix must be square (same number of rows and columns).
The determinant of the matrix must not be zero. Just as a real number must be nonzero to have a reciprocal, a matrix must have a nonzero determinant to have an inverse.
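For example, in the 2x2 case the determinant check is simple arithmetic: det([1 2; 2 4]) = 1·4 − 2·2 = 0, so that matrix is singular and has no inverse, while det([1 2; 3 4]) = 1·4 − 2·3 = −2 ≠ 0, so that one is invertible.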
I have a ~3000x3000 covariance-like matrix on which I compute the eigenvalue-eigenvector decomposition (it's an OpenCV matrix, and I use cv::eigen() to get the job done).
However, I actually only need, say, the first 30 eigenvalues/eigenvectors; I don't care about the rest. Theoretically, this should allow the computation to be sped up significantly, right? I mean, that means there are 2970 fewer eigenvectors to compute.
Which C++ library will allow me to do that? Please note that OpenCV's eigen() method does have parameters for that, but the documentation says they are ignored, and I tested it myself, they are indeed ignored :D
UPDATE:
I managed to do it with ARPACK: I compiled it for Windows and got it to work. The results look promising; an illustration can be seen in this toy example:
#include "ardsmat.h"
#include "ardssym.h"
int n = 3; // Dimension of the problem.
double* EigVal = NULL; // Eigenvalues.
double* EigVec = NULL; // Eigenvectors stored sequentially.
int lowerHalfElementCount = (n*n+n) / 2;
//whole matrix:
/*
2 3 8
3 9 -7
8 -7 19
*/
double* lower = new double[lowerHalfElementCount]; //lower half of the matrix
//to be filled with COLUMN major (i.e. one column after the other, always starting from the diagonal element)
lower[0] = 2; lower[1] = 3; lower[2] = 8; lower[3] = 9; lower[4] = -7; lower[5] = 19;
//params: dimensions (i.e. width/height), array with values of the lower or upper half (sequentially, row major), 'L' or 'U' for upper or lower
ARdsSymMatrix<double> mat(n, lower, 'L');
// Defining the eigenvalue problem.
int noOfEigVecValues = 2;
//int maxIterations = 50000000;
//ARluSymStdEig<double> dprob(noOfEigVecValues, mat, "LM", 0, 0.5, maxIterations);
ARluSymStdEig<double> dprob(noOfEigVecValues, mat);
// Finding eigenvalues and eigenvectors.
int converged = dprob.EigenValVectors(EigVec, EigVal);
for (int eigValIdx = 0; eigValIdx < noOfEigVecValues; eigValIdx++) {
std::cout << "Eigenvalue: " << EigVal[eigValIdx] << "\nEigenvector: ";
for (int i = 0; i < n; i++) {
int idx = n*eigValIdx+i;
std::cout << EigVec[idx] << " ";
}
std::cout << std::endl;
}
The results are:
9.4298, 24.24059
for the eigenvalues, and
-0.523207, -0.83446237, -0.17299346
0.273269, -0.356554, 0.893416
for the 2 eigenvectors, respectively (one eigenvector per row).
The code fails to find 3 eigenvectors here (it can only find 1 or 2 in this case; an assert() makes sure of that, but well, that's not a problem).
In this article, Simon Funk shows a simple, effective way to estimate a singular value decomposition (SVD) of a very large matrix. In his case, the matrix is sparse, with dimensions: 17,000 x 500,000.
Now, looking here describes how eigenvalue decomposition is closely related to SVD. Thus, you might benefit from considering a modified version of Simon Funk's approach, especially if your matrix is sparse. Furthermore, your matrix is not only square but also symmetric (if that is what you mean by covariance-like), which likely allows additional simplification.
... Just an idea :)
It seems that Spectra will do the job with good performance.
Here is an example from their documentation that computes the three largest eigenvalues of a dense symmetric matrix M (like your covariance matrix):
#include <Eigen/Core>
#include <Spectra/SymEigsSolver.h>
// <Spectra/MatOp/DenseSymMatProd.h> is implicitly included
#include <iostream>
using namespace Spectra;
int main()
{
    // We are going to calculate the eigenvalues of M
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(10, 10);
    Eigen::MatrixXd M = A + A.transpose();

    // Construct matrix operation object using the wrapper class DenseSymMatProd
    DenseSymMatProd<double> op(M);

    // Construct eigen solver object, requesting the largest three eigenvalues
    SymEigsSolver< double, LARGEST_ALGE, DenseSymMatProd<double> > eigs(&op, 3, 6);

    // Initialize and compute
    eigs.init();
    int nconv = eigs.compute();

    // Retrieve results
    Eigen::VectorXd evalues;
    if(eigs.info() == SUCCESSFUL)
        evalues = eigs.eigenvalues();

    std::cout << "Eigenvalues found:\n" << evalues << std::endl;
    return 0;
}