Symmetry of autocovariance matrix by multiplying feature matrix with its transpose - pca

There is a mathematical theorem stating that a matrix A multiplied with its transpose yields a symmetric, positive semidefinite matrix (and thus real, non-negative eigenvalues).
Why does the symmetry test fail here for medium-sized random matrices?
It always works for small matrices (20,20 etc.)
import numpy as np
features = np.random.random((50,70))
autocovar = np.dot(np.transpose(features),features)
print((np.transpose(autocovar) == autocovar).all())
I always get 'False' when running this code. What am I doing wrong?
I need the autocovariance matrix to perform a PCA but so far I get complex eigenvalues...
Thanks!

This could be due to errors in floating point arithmetic. Your matrix may be very close to a symmetric matrix numerically, but due to errors in finite precision arithmetic it is technically nonsymmetric. As a result a numerical solver may return complex eigenvalues.
One solution (sort of a hack) is to symmetrize the matrix, i.e., replace it with its symmetric part. This matrix is guaranteed to be symmetric, even in floating point arithmetic, and it will be very close to the matrix you define (near machine precision). This can be achieved via
autocovar_sym = .5*(autocovar+autocovar.T)
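A minimal sketch of the same idea, assuming NumPy: compare with a tolerance instead of exact equality, symmetrize, and then use np.linalg.eigh, which is meant for symmetric matrices and returns real eigenvalues. The fixed seed and the variable names are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the run is repeatable
features = rng.random((50, 70))
autocovar = features.T @ features

# Exact equality can fail because of rounding; compare with a tolerance instead.
is_close_to_symmetric = np.allclose(autocovar, autocovar.T)

# Symmetrize, then use the eigensolver intended for symmetric matrices.
autocovar_sym = 0.5 * (autocovar + autocovar.T)
eigenvalues, eigenvectors = np.linalg.eigh(autocovar_sym)

print(is_close_to_symmetric)
print(np.isrealobj(eigenvalues))
```

Because the matrix here is 70x70 but built from only 50 rows, some eigenvalues are zero up to rounding and may come out as tiny negative numbers.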
Hope this helps.

Related

How to find an inverse of a nearly singular matrix?

I am implementing an algorithm in C++ and CUDA, but I ran into trouble when trying to find the inverse of a special matrix.
This matrix has the following properties:
it is a square matrix (say (m+3)x(m+3), m>0);
it is symmetric, i.e. equal to its own transpose;
its main diagonal is all zeros;
it has a 3x3 zero block in the bottom right corner;
it can be written in the form H = [A, B; B', 0].
I have tried several methods, but all of them failed:
pseudo-inverse matrix:
I used MATLAB at first and got an error or warning when I tried inv(H'*H): Warning: Matrix is singular to working precision, or: Matrix is close to singular or badly scaled
some approximation methods:
the reference material is here: approximation. I found two methods: Gauss-Jordan elimination and Cholesky decomposition. When I tried chol in MATLAB, I got the following error: Matrix must be positive definite
Can anybody give me some suggestions?
It would be good to know more about your specific problem and, in particular, whether you need the inverse per se or just need to solve a linear system of equations. I will try to give you directions for both cases.
Let me start from the observation that your matrix is nearly singular, so your system is ill-conditioned.
DETERMINING THE INVERSE OF A NEARLY SINGULAR MATRIX
As clarified in the comments and answers above, seeking the exact inverse of a nearly singular matrix is not meaningful. What makes sense is to construct a regularized inverse of your matrix. You can do that by resorting to the Singular Value Decomposition (SVD) of your matrix. In more detail, you can compute the singular system, remove the least significant singular values, which are the source of the nearly singular behavior of the matrix, and then use the remaining singular values and vectors to form an approximate inverse. Of course, in this case A*A_inv will only give an approximation of the identity matrix.
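The truncated-SVD construction described above can be sketched on the CPU with NumPy. The function name truncated_svd_inverse, the threshold rel_tol, and the small test matrix are assumptions made for illustration; the sketch assumes a real-valued matrix and is not a GPU implementation.

```python
import numpy as np

def truncated_svd_inverse(H, rel_tol=1e-10):
    """Approximate inverse that drops singular values below rel_tol * largest."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    keep = s > rel_tol * s[0]          # s is sorted in descending order
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]        # discarded singular values contribute nothing
    return (Vt.T * s_inv) @ U.T        # V * diag(s_inv) * U^T

# Hypothetical nearly singular symmetric matrix: the first two rows almost coincide.
H = np.array([[1.0, 1.0,         0.0],
              [1.0, 1.0 + 1e-14, 0.0],
              [0.0, 0.0,         1.0]])

H_inv = truncated_svd_inverse(H)
print(H @ H_inv)  # close to a projector, not the identity
```

The choice of rel_tol decides how many singular values are treated as noise, so it has to be tuned to the accuracy of the data.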
How can this be done on GPU? First, let me say that implementing an SVD algorithm in C++ or CUDA is by no means an easy task. There are several techniques among which you should choose depending on the accuracy you need, for example, to determine the singular values. Anyway, Matlab has a set of linear algebra functions working on GPU. Also, CULA and Magma are two libraries offering SVD calculation routines. Also, you can consider using Arrayfire which also offers linear algebra routines, including the SVD.
INVERTING A NEARLY SINGULAR SYSTEM
In this case, you should consider using some form of Tikhonov regularization, which consists of formulating the inversion of the linear system as an optimization problem and adding a regularization term, which may depend on what you already know about your unknowns.
For both of the cases above, I recommend reading some theory. The book
M. Bertero, P. Boccacci, Introduction to Inverse Problems in Imaging
would be useful whether you have to find an approximate inverse or have to explicitly invert the linear system.
The pseudo-inverse matrix is inv(H'*H)*H'. Since the condition number of H is very high (try cond(H)), you may need a regularization factor to obtain it: inv(H'*H + lambda*eye(size(H)))*H'. The smaller lambda is, the lower the bias of the estimate, but too small a value of lambda leads to high variance (the problem becomes ill-conditioned again). You will have to experiment to find a suitable value.
You can of course use pinv(H) directly. pinv(H)*H ~= eye(size(H)) because pinv(H) is only an approximation of the inverse when the rank of H is lower than size(H,1). In other words, the columns of H are not completely independent.
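As a rough NumPy sketch of the regularized formula inv(H'*H + lambda*I)*H', applied to a right-hand side instead of forming the inverse explicitly: the function name tikhonov_solve, the test matrix, and the value of lam are assumptions chosen for illustration.

```python
import numpy as np

def tikhonov_solve(H, b, lam):
    """Minimize ||H x - b||^2 + lam * ||x||^2 via the regularized normal equations."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ b)

# Hypothetical nearly singular symmetric matrix and a right-hand side with known solution.
H = np.array([[1.0, 1.0,         0.0],
              [1.0, 1.0 + 1e-14, 0.0],
              [0.0, 0.0,         1.0]])
x_true = np.ones(3)
b = H @ x_true

x = tikhonov_solve(H, b, lam=1e-8)
print(x)
```

Larger values of lam damp the solution more strongly (more bias); smaller values let the near-singular direction amplify rounding errors again.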
I would like to show you a very simple example:
>>a =
0 1 0
0 0 1
0 1 0
pinv(a) * a
>>
ans =
0 0 0
0 1.0000 0
0 0 1.0000
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>a =
1 0
0 1
1 0
pinv(a) * a
>>
ans =
1.0000 0
0 1.0000
Note that in the second example a * pinv(a) is not the identity matrix, because the columns of a are linearly independent but its rows are not. Check this page for more details.
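The second example can be reproduced in NumPy as a quick check, assuming np.linalg.pinv as the counterpart of MATLAB's pinv. The names left and right are only illustrative.

```python
import numpy as np

# Tall matrix with linearly independent columns but dependent rows.
a = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])

a_pinv = np.linalg.pinv(a)
left = a_pinv @ a    # 2x2: identity, because the columns are independent
right = a @ a_pinv   # 3x3: projector onto the column space, not the identity

print(left)
print(right)
```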

Fourier transform floating point issues

I am implementing a conventional (that means not fast), separated Fourier transform for images. I know that in floating point a sum over one period of sin or cos in equally spaced samples is not perfectly zero, and that this is more a problem with the conventional transform than with the fast.
The algorithm works with 2D double arrays and is correct. The inverse is done inside (via a sign flag of type double and a conditional check when using the asymmetric formula), not outside with conjugations. The results are almost exactly as expected, so this is a question about details:
When I perform a forward transform, save logarithmed magnitude and angle to images, reload them, and do an inverse transform, I experience different types of rounding errors with different types of implemented formulas:
F(u,v) = Sum(x=0->M-1) Sum(y=0->N-1) f(x,y) * e^(-i*2*pi*u*x/M) * e^(-i*2*pi*v*y/N)
f(x,y) = 1/(M*N) * (like above)
F(u,v) = 1/sqrt(M*N) * (like above)
f(x,y) = 1/sqrt(M*N) * (like above)
So the first one is the asymmetric transform pair, the second one the symmetric. With the asymmetric pair, the rounding errors occur more in the bright spots of the image (some pixels end up slightly outside the value range, e.g. at 256). With the symmetric pair, the errors occur more in the constant mid-range area of the image (and never exceed the value range!). In total, it seems that the symmetric pair produces slightly more rounding errors.
It also depends on the input: when the image is stored in [0,255], the rounding errors differ from those for [0,1].
So my question: how should an optimal, most accurate algorithm be implemented (theoretically, no code)? Asymmetric or symmetric pair? Value range of the input in [0,255] or [0,1]? How should the result be scaled linearly before the log-scaled version is saved to file?
Edit:
My algorithm simply computes the separated asymmetric or symmetric DFT formula. Factors are decomposed into real and imaginary parts using Euler's identity, then expanded and summed up separately as real and imaginary parts:
sum_re += f_re * cos(-mode*pi*((2.0*v*y)/N)) - // mode = 1 for forward, -1
f_im * sin(-mode*pi*((2.0*v*y)/N)); // for inverse transform
// sum_im permutated in the known way and + instead of -
This grouping of values inside cos and sin should, in my eyes, give the lowest rounding error (compared to e.g. cos(-mode*2*pi*v*y/N)), because the inexactly rounded transcendental pi is multiplied/divided only once instead of several times. Is that right?
The scale factor 1/(M*N) or 1/sqrt(M*N) is applied separately after each separation, outside of the innermost sum. Is it better inside? Or combined completely at the end of both separations?
For a deeper analysis, I have dropped the input->transform->save-to-file->read-from-file->transform^-1->output workflow and chosen to compare directly in double precision: input->transform->transform^-1->output.
Here are the results for a real-life 704x528 8-bit image (delta = max absolute difference between the real part of input and output):
with input inside [0,1] and asymmetric formula: delta = 2.6609e-13 (corresponds to 6.785295e-11 for [0,255] range).
with input inside [0,1] and symmetric formula: delta = 2.65232e-13 (corresponds to 6.763416e-11 for [0,255] range).
with input inside [0,255] and asymmetric formula: delta = 6.74731e-11.
with input inside [0,255] and symmetric formula: delta = 6.7871e-11.
These are not really significant differences; however, the full-range input with the asymmetric transform performs best. I think the values may get worse with 16-bit input.
But in general I see that the issues I experienced are due more to rounding errors from scaling before saving to file (or the inverse) than to rounding errors in the transform itself.
However, I am curious: what is the most used implementation of the Fourier transform, the symmetric or the asymmetric one? Which value range is generally used for the input, [0,1] or [0,255]? And for the spectra usually shown in log scale: is e.g. [0,M*N] after the asymmetric transform of [0,1] input log-scaled directly to [0,255], or first scaled linearly to [0,255*M*N]?
The errors you report are tiny, normal, and generally can be ignored. Simply scale your results and clamp any results outside the target interval to the endpoints.
In library implementations of FFTs (that is, FFT routines written to be used generally by diverse applications, not custom designed for a single application), little regard is given to scaling; the routine often simply returns data that has been naturally scaled by the arithmetic, with no additional multiplication operations used to adjust the scale. This is because the scale is often either irrelevant for the application (e.g., finding the frequencies with the largest energies works no matter what the scale is) or that the scale may be distributed through multiply operations and performed just once (e.g., instead of scaling in a forward transform and in an inverse transform, the application can get the same effect by explicitly scaling just once). So, since scaling is often not needed, there is no point in including it in a library routine.
The target interval that data are scaled to depends on the application.
Regarding the question on what transform to use (logarithmic or linear) for showing spectra, I cannot advise; I do not work with visualizing spectra.
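The scale-and-clamp advice and the two normalization conventions can be tried out with a small NumPy sketch. Note the assumptions: it uses NumPy's FFT routines rather than the conventional DFT from the question, and the random test image and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 8-bit test image stored as doubles in [0, 255].
image = rng.integers(0, 256, size=(32, 48)).astype(np.float64)

# Asymmetric pair: unscaled forward transform, 1/(M*N) applied once in the inverse.
asym = np.fft.ifft2(np.fft.fft2(image))
# Symmetric pair: 1/sqrt(M*N) applied in both directions.
sym = np.fft.ifft2(np.fft.fft2(image, norm="ortho"), norm="ortho")

delta_asym = np.max(np.abs(asym.real - image))
delta_sym = np.max(np.abs(sym.real - image))
print(delta_asym, delta_sym)

# Round and clamp to the target interval before converting back to 8 bit.
restored = np.clip(np.rint(asym.real), 0, 255).astype(np.uint8)
```

Both deltas should be many orders of magnitude below one grey level, so rounding followed by clamping recovers the 8-bit values.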
Scaling causes roundoff errors. Hence, solution 1 (which scales once) is better than solution 2 (which does it twice). Similarly, scaling once after summation is better than scaling everything before summation.
Do you run y from 0 to 2*N or from -N to +N ? Mathematically it's the same, but you have an extra bit of precision in the latter case.
BTW, what's mode doing in cos(-mode * stuff) ?

Rounding error in dgesv?

I'm using the dgesv and dgemm Fortran subroutines in C++ to do some simple matrix multiplication and left division.
For random matrices A and B, I do:
A\(A\(A*B));
where * is defined using dgemm and \ using dgesv. Obviously, this expression should simplify to the identity matrix. I'm testing my answers against MATLAB and I'm getting more or less 1's on the diagonal but the other entries are very slightly off (the numbers are on the order of magnitude e-15, so they're close to 0 already).
I'm just wondering if this result is to be expected or not? Because if I do something like this:
C = A+B;
D = A*B;
D\(C\(C*C));
the result should come out to D\C. Basically, C\(C*C) is very accurate (matches MATLAB perfectly), but the second I do D\C I get something that's off by e-1 or even e+00. I'm guessing that's not supposed to happen?
Your problem seems to be related to the finite accuracy of floating point variables in C/C++. You can read more about it here. There are some techniques for minimizing that effect (some of them are described in the wiki article), but there will always be some loss of accuracy after a few operations. You might want to use a third-party mathematical library that supports numbers of arbitrary precision (e.g. GMP). But still, as long as you stick to a numerical approach, the accuracy of your calculations will always be limited.
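How much accuracy is lost in a linear solve depends strongly on the condition number of the matrix. One possible explanation for the large error in D\C, not confirmed in the post, is that D = A*B is much worse conditioned than C. The NumPy sketch below only illustrates that effect; the helper relative_solve_error and both test matrices are assumptions, not the matrices from the question.

```python
import numpy as np

def relative_solve_error(A, x_true):
    """Solve A x = b for b = A @ x_true and return the relative error in x."""
    b = A @ x_true
    x = np.linalg.solve(A, b)  # NumPy calls the LAPACK gesv routine here
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

n = 10
rng = np.random.default_rng(0)
x_true = np.ones(n)

# Well-conditioned test matrix: random entries plus a strong diagonal.
well = rng.random((n, n)) + n * np.eye(n)
# Ill-conditioned test matrix: the Hilbert matrix, H[i, j] = 1 / (i + j + 1).
idx = np.arange(n)
ill = 1.0 / (idx[:, None] + idx[None, :] + 1.0)

for name, A in (("well-conditioned", well), ("ill-conditioned", ill)):
    print(name, "cond =", np.linalg.cond(A), "error =", relative_solve_error(A, x_true))
```

Checking cond(D) and cond(C) for the matrices in question would show whether this is what happens there.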

Kiss FFT seems to multiply data by the number of points that it transforms

My limited understanding of the Fourier transform is that you should be able to toggle between the time and frequency domain without changing the original data. So, here is a summary of what I (think I) am doing:
Using kiss_fft_next_fast_size(994) to determine that I should use 1000.
Using kiss_fft_alloc(...) to create a kiss_fft_cfg with nfft = 1000.
Extending my input data from size 994 to 1000 by padding extra points as zero.
Passing kiss_fft_cfg to kiss_fft(...) along with my input and output arrays.
Using kiss_fft_alloc(...) to create an inverse kiss_fft_cfg with nfft = 1000.
Passing the inverse kiss_fft_cfg to kiss_fft(...) inputting the previous output array.
Expecting the original data back, but getting each datum exactly 1000 times bigger!
I have put a full example here, and my 50-odd lines of code can be found right at the end. Although I can work around this by dividing each result by the value of OPTIMAL_SIZE (i.e. 1000) that fix makes me very uneasy without understanding why.
Please can you advise what simply stupid thing(s) I am doing wrong?
This is to be expected: the inverse discrete Fourier transform (which can be implemented using the Fast Fourier Transform) requires a scaling by 1/N, that is, a division by N:
The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
and the signs of the exponents are merely conventions, and differ in
some treatments. The only requirements of these conventions are that
the DFT and IDFT have opposite-sign exponents and that the product of
their normalization factors be 1/N. A normalization of \sqrt{1/N} for both the DFT and IDFT makes the transforms unitary,
which has some theoretical advantages. But it is often more practical
in numerical computation to perform the scaling all at once as above
(and a unit scaling can be convenient in other ways).
http://en.wikipedia.org/wiki/Dft
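The factor of N can be reproduced with a naive, unscaled DFT pair. This is a NumPy sketch of the convention described in the quote, not Kiss FFT itself; dft_unscaled and the sample data are made up for illustration.

```python
import numpy as np

def dft_unscaled(x, inverse=False):
    """Naive O(N^2) DFT with no normalization in either direction."""
    n = len(x)
    k = np.arange(n)
    sign = 1.0 if inverse else -1.0     # opposite-sign exponents for DFT and IDFT
    twiddles = np.exp(sign * 2j * np.pi * np.outer(k, k) / n)
    return twiddles @ x

x = np.array([1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 0.0, 0.0])
n = len(x)

round_trip = dft_unscaled(dft_unscaled(x), inverse=True)  # equals n * x up to rounding
recovered = round_trip / n                                # apply the 1/N scaling once

print(np.allclose(round_trip, n * x), np.allclose(recovered.real, x))
```

Dividing by the transform length once, after the inverse, is therefore the expected step and not a workaround.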

Is there around a straightforward way to invert a triangular (upper or lower) matrix?

I'm trying to implement some basic linear algebra operations and one of these operations is the inversion of a triangular (upper and/or lower) matrix. Is there an easy and stable algorithm to do that?
Thank you.
Yes, use back substitution. A standard algorithm to invert a matrix is to find its LU decomposition (decomposition into a lower-triangular and an upper-triangular matrix), use back substitution on the triangular pieces, and then combine the results to obtain the inverse of the original matrix.
Don't invert it if you can avoid it. It's one of the basic commandments of numerical linear algebra.
It is much faster and numerically more stable to keep the matrix L itself in memory and compute inv(L)b with back-substitution whenever you need the product of inv(L) with some vector b.
Note that the customary algorithm for inverting it requires solving the systems inv(L)[1 0 0 ...],
inv(L)[0 1 0 ....],
inv(L)[0 0 1 ....] and so on, so you see it is much easier not to invert it at all.
Given a lower triangular matrix L, substitution (forward substitution in the lower triangular case, back substitution in the upper triangular case) allows you to solve the system
L x = b
quickly for any right-hand side b.
To invert L, you can solve this system for right-hand sides e1=(1,0,...,0), e2=(0,1,...,0), ..., en=(0,0,...,1) and combine the resulting solution vectors into a single (necessarily lower-triangular) matrix.
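The column-by-column procedure above might look like this in NumPy. It is a sketch under the assumption that L is lower triangular with a nonzero diagonal; the function names and the example matrix are illustrative.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower triangular L, working from the first row down."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        # Everything left of the diagonal in row i multiplies already known x values.
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def invert_lower_triangular(L):
    """Build inv(L) column by column from the right-hand sides e1, ..., en."""
    n = L.shape[0]
    identity = np.eye(n)
    columns = [forward_substitution(L, identity[:, j]) for j in range(n)]
    return np.column_stack(columns)

L = np.array([[2.0,  0.0, 0.0],
              [1.0,  3.0, 0.0],
              [4.0, -1.0, 5.0]])
L_inv = invert_lower_triangular(L)
print(L @ L_inv)
```

If only products inv(L)b are needed, calling forward_substitution(L, b) directly avoids forming the inverse at all.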
If you are interested in a closed-form solution, the diagonal elements of the inverse are the inverses of the original diagonal elements, and the formula for the rest of the elements of the inverse gets more and more complicated as you move away from the diagonal.
If you are talking about single precision reals, have a look at the source code for the LAPACK routines STRTRI and STRTI2.
If B is the inverse of A, a lower triangular matrix, you can use the following MATLAB code:
n = size(A,1);
B = zeros(n);
for i=1:n
    B(i,i) = 1/A(i,i);
    for j=1:i-1
        s = 0;
        for k=j:i-1
            s = s + A(i,k)*B(k,j);
        end
        B(i,j) = -s*B(i,i);
    end
end
Wow, that's practically half the contents of a numerical analysis course. The standard algorithms will do it, and there is a bunch of canned code here. The ultimate source for this and most other usual numerical analysis problems is Numerical Recipes.