FFTW: Only interested in real result - c++

I am using FFTW to compute the inverse DFT of 2-dimensional complex data. The output of the default setup (complex-to-complex) is complex, and its imaginary parts are not zero. However, I am only interested in the real part of the result, not the imaginary part. FFTW's interleaved real/imaginary output layout is not ideal for me, since I want to post-process the (real) output with SSE. Is there a way to get a real-only array from FFTW? The complex-to-real plans don't seem to work, since the output isn't real.

Real data in [time|freq] domain implies conjugate symmetry about zero in the other domain.
By enforcing conjugate symmetry (averaging the data with the conjugated, index-reversed copy of itself), you can efficiently discard the imaginary part in the other domain. This should allow you to use the real IFFT in FFTW, giving roughly a 2x speedup. Note that the FFTW real IFFT only uses nfft/2+1 bins.
Here's a 1D example to illustrate the point:
X = randn(8,1)+j*randn(8,1);
Xsym = .5*(X + conj(X([1 8:-1:2]'))); % force the symmetric condition
err = real(ifft(X)) - ifft(Xsym);
For a 2D IFFT, it may be best to perform the 2D IFFT as two passes of 1D IFFTs, as described in another answer.
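For readers working in C++, the same 1D check can be sketched with the standard library alone. This is only an illustration of the identity, not FFTW code: naive_idft is a hypothetical O(n^2) helper used for verification, and force_conjugate_symmetry and symmetry_error are names made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(n^2) inverse DFT, used only to check the identity (not FFTW).
std::vector<cplx> naive_idft(const std::vector<cplx>& X) {
    const std::size_t n = X.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> x(n);
    for (std::size_t t = 0; t < n; ++t) {
        cplx sum(0.0, 0.0);
        for (std::size_t k = 0; k < n; ++k) {
            const double angle =
                two_pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += X[k] * std::polar(1.0, angle);
        }
        x[t] = sum / static_cast<double>(n);
    }
    return x;
}

// Xsym[k] = 0.5 * (X[k] + conj(X[(n - k) % n])), i.e. the conjugate-symmetric
// part of X. This mirrors X([1 8:-1:2]) in the MATLAB snippet, 0-based.
std::vector<cplx> force_conjugate_symmetry(const std::vector<cplx>& X) {
    const std::size_t n = X.size();
    std::vector<cplx> Xsym(n);
    for (std::size_t k = 0; k < n; ++k)
        Xsym[k] = 0.5 * (X[k] + std::conj(X[(n - k) % n]));
    return Xsym;
}

// Largest deviation between real(idft(X)) and idft(Xsym), also counting the
// imaginary part of idft(Xsym), which should vanish up to rounding.
double symmetry_error(const std::vector<cplx>& X) {
    const std::vector<cplx> x = naive_idft(X);
    const std::vector<cplx> xs = naive_idft(force_conjugate_symmetry(X));
    double err = 0.0;
    for (std::size_t t = 0; t < X.size(); ++t) {
        err = std::max(err, std::abs(x[t].real() - xs[t].real()));
        err = std::max(err, std::abs(xs[t].imag()));
    }
    return err;
}
```

Calling symmetry_error on any complex spectrum should return a value at rounding level. In a real program, only the first n/2+1 entries of the symmetrized spectrum would be handed to a complex-to-real plan.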


Eigen sparse solvers give drastically different solutions for the same linear system

I am trying to solve a sparse linear system as quickly as possible using eigen.
The docs give you 4 sparse solvers to choose from (but really it's more like these three):
SimplicialLLT: #include <Eigen/SparseCholesky>; direct LLt factorization; SPD matrices; fill-in reducing; LGPL. Note: SimplicialLDLT is often preferable.
SimplicialLDLT: #include <Eigen/SparseCholesky>; direct LDLt factorization; SPD matrices; fill-in reducing; LGPL. Note: recommended for very sparse and not too large problems (e.g., 2D Poisson eq.).
SparseLU: #include <Eigen/SparseLU>; LU factorization; square matrices; fill-in reducing, leverages fast dense algebra; MPL2. Note: optimized for small and large problems with irregular patterns.
When I use the last solver, i.e. I do:
Eigen::SparseLU<Eigen::SparseMatrix<Scalar>> solver(bijection);
Assert(solver.info() == Eigen::Success, "Matrix is degenerate.");
solver.compute(bijection);
Assert(solver.info() == Eigen::Success, "Matrix is degenerate.");
Eigen::VectorXf vertices_u = solver.solve(u);
Assert(solver.info() == Eigen::Success, "Matrix is degenerate.");
Eigen::VectorXf vertices_v = solver.solve(v);
Assert(solver.info() == Eigen::Success, "Matrix is degenerate.");
I get the correct result, which graphically looks like this:
If I use SimplicialLDLT, i.e. if I change the solver line and nothing else to:
Eigen::SimplicialLDLT<Eigen::SparseMatrix<Scalar>> solver(bijection);
I get this degenerate monstrosity:
Basically, the two solvers are returning wildly different results for the exact same sparse system. How is this possible?
None of the error checks return false, so in both versions the matrices are considered to be fine.
According to https://eigen.tuxfamily.org/dox/group__TopicSparseSystems.html, SimplicialLDLT is only for SPD matrices. If your matrix is not SPD, you might try a least-squares approach: https://snaildove.github.io/2017/08/01/positive_definite_and_least_square/

How to extract matrixL() and matrixU() when using Eigen::CholmodSupernodalLLT?

I'm trying to use Eigen::CholmodSupernodalLLT for Cholesky decomposition, but it seems that I cannot get matrixL() and matrixU() from it. How can I extract matrixL() and matrixU() from Eigen::CholmodSupernodalLLT for future use?
A partial answer to integrate what others have said.
Consider Y ~ MultivariateNormal(0, A). One may want to (1) evaluate the (log-)likelihood (a multivariate normal density), (2) sample from such density.
For (1), it is necessary to solve Ax = b where A is symmetric positive-definite, and compute its log-determinant. (2) requires L such that A = L * L.transpose() since Y ~ MultivariateNormal(0, A) can be found as Y = L u where u ~ MultivariateNormal(0, I).
A Cholesky LLT or LDLT decomposition is useful because chol(A) can be used for both purposes. Solving Ax=b is easy given the decomposition, and the determinant (or log-determinant) can be easily derived from the product (or sum of logs) of the components of D or of the diagonal of L. By definition, L can then be used for sampling.
So, in Eigen one can use:
Eigen::SimplicialLDLT solver(A) (or Eigen::SimplicialLLT), then call solver.solve(b) and calculate the determinant using solver.vectorD().diag(). Useful because if A is a covariance matrix, then solver can be used for likelihood evaluations, and matrixL() for sampling.
Eigen::CholmodDecomposition does not give access to matrixL() or vectorD() but exposes .logDeterminant() to achieve the (1) goal but not (2).
Eigen::PardisoLDLT does not give access to matrixL() or vectorD() and does not expose a way to get the determinant.
In some applications, step (2) - sampling - can be done at a later stage so Eigen::CholmodDecomposition is enough. At least in my configuration, Eigen::CholmodDecomposition works 2 to 5 times faster than Eigen::SimplicialLDLT (I guess because of the permutations done under the hood to facilitate parallelization)
Example: in Bayesian spatial Gaussian process regression, the spatial random effects can be integrated out and do not need to be sampled. So MCMC can proceed swiftly with Eigen::CholmodDecomposition to achieve convergence for the unknown parameters. The spatial random effects can then be recovered in parallel using Eigen::SimplicialLDLT. Typically this is only a small part of the computations, but having matrixL() directly from CholmodDecomposition would simplify them a bit.
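To make the two uses of the factor concrete, here is a minimal dense sketch using only the C++ standard library. It is not Eigen or Cholmod code: cholesky_lower, log_determinant and apply_factor are hypothetical helpers, with no pivoting and no sparsity handling, written only to show that the same L gives both the log-determinant and the map from u ~ N(0, I) to y = L u.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Dense Cholesky factor L (lower triangular) with A = L * L^T.
// Assumes A is symmetric positive definite; only the lower triangle is read.
Matrix cholesky_lower(const Matrix& A) {
    const std::size_t n = A.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        assert(d > 0.0 && "matrix is not positive definite");
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return L;
}

// log det(A) = 2 * sum_i log(L_ii), since det(A) = det(L)^2.
double log_determinant(const Matrix& L) {
    double sum = 0.0;
    for (std::size_t i = 0; i < L.size(); ++i) sum += std::log(L[i][i]);
    return 2.0 * sum;
}

// y = L * u. If u is drawn from N(0, I), then y is distributed as N(0, A).
std::vector<double> apply_factor(const Matrix& L, const std::vector<double>& u) {
    const std::size_t n = L.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k <= i; ++k) y[i] += L[i][k] * u[k];
    return y;
}
```

In a sampler, u would be filled from std::normal_distribution before calling apply_factor. The sketch leaves that out to stay deterministic.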
You cannot do this using the given class. The class you are referencing is an equation solver (which indeed uses Cholesky decomposition). To decompose your matrix, you should rather use Eigen::LLT. Code example from their website:
MatrixXd A(3,3);
A << 4,-1,2, -1,6,0, 2,0,5;
LLT<MatrixXd> lltOfA(A);
MatrixXd L = lltOfA.matrixL();
MatrixXd U = lltOfA.matrixU();
As reported somewhere else, e.g., it cannot be done easily.
I am copying a possible recommendation (answered by Gael Guennebaud himself), even if somewhat old:
If you really need access to the factor to do your own cooking, then
better use the built-in SimplicialL{D}LT<> class. Extracting the
factors from the supernodal internal representations of Cholmod/Pardiso
is indeed not straightforward and very rarely needed. We have to
check, but if Cholmod/Pardiso provide routines to manipulate the
factors, like applying it to a vector, then we could let
matrix{L,U}() return a pseudo expression wrapping these routines.
Developing code for extracting this is likely beyond SO, and probably a topic for a feature request.
Of course, the solution with LLT is at hand (but not the topic of the OP).

Eigen equivalent to Octave/MATLAB mldivide for rectangular matrices

I'm using Eigen v3.2.7.
I have a medium-sized rectangular matrix X (170x17) and a column vector Y (170x1), and I'm trying to solve the system using Eigen. Octave solves this problem fine using X\Y, but Eigen is returning incorrect values for these matrices (but not for smaller ones). However, I suspect that the problem is how I'm using Eigen, rather than Eigen itself.
auto X = Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>{170, 17};
auto Y = Eigen::Matrix<T, Eigen::Dynamic, 1>{170};
// Assign their values...
const auto theta = X.colPivHouseholderQr().solve(Y).eval(); // Wrong!
According to the Eigen documentation, the ColPivHouseholderQR solver is for general matrices and pretty robust, but to make sure I've also tried the FullPivHouseholderQR. The results were identical.
Is there some special magic that Octave's mldivide does that I need to implement manually for Eigen?
Update
This spreadsheet has the two input matrices, plus Octave's and my result matrices.
Replacing auto doesn't make a difference, nor would I expect it to because construction cannot be a lazy operation, and I have to call .eval() on the solve result because the next thing I do with the result matrix is get at the raw data (using .data()) on tail and head operations. The expression template versions of the result of those block operations do not have a .data() member, so I have to force evaluation beforehand - in other words theta is the concrete type already, not an expression template.
The result for (X*theta-Y).norm()/Y.norm() is:
2.5365e-007
And the result for (X.transpose()*X*theta-X.transpose()*Y).norm() / (X.transpose()*Y).norm() is:
2.80096e-007
As I'm currently using single precision float for my basic numerical type, that's pretty much zero for both.
According to your verifications, the solution you get is perfectly fine. If you want more accuracy, use double-precision floating-point numbers. Note that MATLAB/Octave use double precision by default.
Moreover, your problem may well not be full rank, in which case it admits an infinite number of solutions. ColPivHouseholderQR picks one, somewhat arbitrarily. On the other hand, mldivide will pick the minimal-norm one, which you can also obtain with Eigen::BDCSVD (Eigen 3.3) or the slower Eigen::JacobiSVD.
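As a rough sanity check of the "pretty much zero" reading, the reported residuals can be compared against the machine epsilon of float. The sketch below uses only the standard library. relative_residual and in_epsilons are hypothetical helpers (not Eigen API), and X is assumed to be stored row-major in a flat vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Relative residual ||X*theta - y|| / ||y|| for a dense X (rows x cols)
// stored row-major in a flat vector.
template <typename T>
T relative_residual(std::size_t rows, std::size_t cols,
                    const std::vector<T>& X,
                    const std::vector<T>& theta,
                    const std::vector<T>& y) {
    assert(X.size() == rows * cols && theta.size() == cols && y.size() == rows);
    T res2 = 0, y2 = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        T r = -y[i];
        for (std::size_t j = 0; j < cols; ++j) r += X[i * cols + j] * theta[j];
        res2 += r * r;
        y2 += y[i] * y[i];
    }
    assert(y2 > 0 && "y must be nonzero");
    return std::sqrt(res2 / y2);
}

// Expresses a relative residual as a multiple of the machine epsilon of T.
template <typename T>
T in_epsilons(T residual) {
    return residual / std::numeric_limits<T>::epsilon();
}
```

For instance, in_epsilons(2.5365e-7f) is only a little above 2, so the residual reported in the question sits at the rounding level of single precision. Switching the scalar type to double is what would lower it further.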

How can I get eigenvalues and eigenvectors fast and accurate?

I need to compute the eigenvalues and eigenvectors of a big matrix (about 1000*1000 or even more). MATLAB works very fast but it does not guarantee accuracy. I need this to be pretty accurate (about 1e-06 error is OK) and within a reasonable time (an hour or two is OK).
My matrix is symmetric and pretty sparse. The exact values are: ones on the diagonal, and on the diagonal below the main diagonal, and on the diagonal above it. Example:
How can I do this? C++ is the most convenient to me.
MATLAB does not guarantee accuracy
I find this claim unreasonable. On what grounds do you say that you can find a (significantly) more accurate implementation than MATLAB's highly refined computational algorithms?
AND... using MATLAB's eig, the following is computed in less than half a second:
%// Generate the input matrix
X = ones(1000);
A = triu(X, -1) + tril(X, 1) - X;
%// Compute eigenvalues
v = eig(A);
It's fast alright!
I need this to be pretty accurate (about 1e-06 error is OK)
Remember that solving eigenvalues accurately is related to finding the roots of the characteristic polynomial. This specific 1000x1000 matrix is very ill-conditioned:
>> cond(A)
ans =
1.6551e+003
A general rule of thumb is that for a condition number of 10^k, you may lose up to k digits of accuracy (on top of what the numerical method itself loses to finite-precision arithmetic).
So in your case, I'd expect the results to be accurate up to an approximate error of 10^-3.
If you're not opposed to using a third party library, I've had great success using the Armadillo linear algebra libraries.
For the example below, arma is the namespace they like to use, vec is a vector, mat is a matrix.
arma::vec getEigenValues(arma::mat M) {
return arma::eig_sym(M);
}
You can also serialize the data directly into MATLAB and vice versa.
Your system is tridiagonal and a (symmetric) Toeplitz matrix. I'd guess that Eigen and MATLAB's eig have special cases to handle such matrices. There is a closed-form solution for the eigenvalues in this case (reference (PDF)). In MATLAB, for your matrix, this is simply:
n = size(A,1);
k = (1:n).';
v = 1-2*cos(pi*k./(n+1));
This can be further optimized by noting that the eigenvalues are centered about 1 and thus only half of them need to be computed:
n = size(A,1);
if mod(n,2) == 0
k = (1:n/2).';
u = 2*cos(pi*k./(n+1));
v = 1+[u;-u];
else
k = (1:(n-1)/2).';
u = 2*cos(pi*k./(n+1));
v = 1+[u;0;-u];
end
I'm not sure how you're going to get faster and more accurate than that with simple code (other than performing a refinement step using the eigenvectors and optimization). The above should be very easy to translate to C++ (or use MATLAB's codegen to generate C/C++ code that uses this or eig). However, your matrix is still ill-conditioned. Just remember that estimates of accuracy are worst case.
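A direct C++ translation of the first MATLAB snippet might look like the sketch below. It assumes the matrix from the question (ones on the main diagonal and on the two adjacent diagonals), and the function name tridiag_ones_eigenvalues is made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Closed-form eigenvalues of the n x n symmetric tridiagonal Toeplitz matrix
// with ones on the main diagonal and on both adjacent diagonals:
//   v_k = 1 - 2*cos(pi*k/(n+1)),  k = 1..n.
// The values come out in ascending order.
std::vector<double> tridiag_ones_eigenvalues(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> v(n);
    for (std::size_t k = 1; k <= n; ++k) {
        const double angle =
            pi * static_cast<double>(k) / static_cast<double>(n + 1);
        v[k - 1] = 1.0 - 2.0 * std::cos(angle);
    }
    return v;
}
```

For n = 2 the matrix is [1 1; 1 1], whose eigenvalues are 0 and 2, and that is what the formula gives. This makes a quick check before running it for n = 1000.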

CUBLAS - matrix addition.. how?

I am trying to use CUBLAS to sum two big matrices of unknown size. I need fully optimized code (if possible), so I chose not to rewrite the (simple) matrix addition code myself but to use CUBLAS instead, in particular the cublasSgemm function, which can sum A and C (if B is a unit matrix): C = alpha*op(A)*op(B) + beta*C
The problem is: C and C++ store matrices in row-major format, while cublasSgemm is intended (for Fortran compatibility) to work in column-major format. You can specify whether A and B are to be transposed first, but you can NOT indicate that C should be transposed. So I'm unable to complete my matrix addition.
I can't transpose the C matrix by myself because the matrix is something like 20000x20000 maximum size.
Any idea on how to solve please?
cublas<t>geam (e.g. cublasSgeam for single precision) was added in CUBLAS 5.0.
It computes the weighted sum of 2 optionally transposed matrices.
If you're just adding the matrices, it doesn't actually matter. You give it alpha, Aij, beta, and Cij. It thinks you're giving it alpha, Aji, beta, and Cji, and gives you what it thinks is Cji = beta Cji + alpha Aji. But that's the correct Cij as far as you're concerned. My worry is when you start going to things which do matter -- like matrix products. There, there's likely no working around it.
But more to the point, you don't want to be using GEMM to do matrix addition: you're doing a completely pointless matrix multiplication (which takes ~20,000^3 operations and many passes through memory) for an operation which should only require ~20,000^2 operations and a single pass! Treat the matrices as 20,000^2-long vectors and use saxpy.
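The saxpy suggestion can be sketched on the host in plain C++. This is a CPU reference loop, not CUBLAS code, and axpy_reference is a hypothetical name. On the GPU the loop would presumably be replaced by one cublasSaxpy call over device pointers with n = rows*cols and unit strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics of BLAS saxpy with unit strides: y[i] = alpha*x[i] + y[i].
// For matrix addition, x and y are the flat storage of A and C. Because the
// operation is element-wise, it gives the same result whether both buffers
// are row-major or both are column-major.
void axpy_reference(float alpha, const std::vector<float>& x,
                    std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = alpha * x[i] + y[i];
}
```

With alpha = 1 this is plain C = A + C in a single pass over memory. The only requirement is that both matrices use the same layout and have no padding between rows or columns.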
Matrix multiplication is memory-bandwidth intensive, so there is a huge (factors of 10x or 100x) difference in performance between coding it yourself and a tuned version. Ideally, you'd change structures in your code to match the library. If you can't, in this case you can manage just by using linear algebra identities. The C-vs-Fortran ordering means that when you pass in A, CUBLAS "sees" A^T (A transpose). Which is fine, we can work around it. If what you want is C = A.B, pass in the matrices in the opposite order, B.A. Then the library sees (B^T . A^T), and calculates C^T = (A.B)^T; and then when it passes back C^T, you get (in your ordering) C. Test it and see.
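The operand-swap trick can be tested on the CPU before involving the GPU. In the sketch below, gemm_col_major is a hypothetical stand-in for a column-major BLAS-style gemm (no transposes, alpha = 1, beta = 0, leading dimensions equal to the row counts). It is not the CUBLAS API, and multiply_row_major is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major reference product: C (m x n) = A (m x k) * B (k x n).
// Element (i, j) of an r-row column-major matrix lives at index i + j*r.
void gemm_col_major(std::size_t m, std::size_t n, std::size_t k,
                    const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) sum += A[i + p * m] * B[p + j * k];
            C[i + j * m] = sum;
        }
    }
}

// Row-major C (m x n) = A (m x k) * B (k x n), using only the column-major
// routine. A row-major m x k buffer read as column-major is the k x m matrix
// A^T, so asking for B^T * A^T = (A*B)^T in column-major terms leaves the
// result buffer holding C in row-major order.
std::vector<float> multiply_row_major(std::size_t m, std::size_t n, std::size_t k,
                                      const std::vector<float>& A,
                                      const std::vector<float>& B) {
    std::vector<float> C(m * n);
    gemm_col_major(n, m, k, B, A, C);  // operands and dimensions swapped
    return C;
}
```

No data is transposed or copied here: only the operand order and the m/n dimensions are exchanged, which is why the approach stays practical for matrices around 20000x20000.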