I wanted to test a simple Cholesky routine I wrote in C++, so I am generating a random lower-triangular matrix L and multiplying it by its transpose to generate A:
A = L * Lt;
But my code fails to factor A. So I tried this in Matlab:
N=200; L=tril(rand(N, N)); A=L*L'; [lc,p]=chol(A,'lower'); p
This outputs a non-zero p, which means Matlab also fails to factor A. I am guessing the randomness generates (numerically) rank-deficient matrices. Am I right?
Update:
I forgot to mention that the following Matlab code seems to work, as pointed out by Malife below:
N=200; L=rand(N, N); A=L*L'; [lc,p]=chol(A,'lower'); p
The difference is that L is lower-triangular in the first snippet and not in the second. Why should that matter?
I also tried the following with scipy after reading A simple algorithm for generating positive-semidefinite matrices:
from scipy import random, linalg
A = random.rand(100, 100)
B = A*A.transpose()
linalg.cholesky(B)
But it errors out with:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 66, in cholesky
c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 24, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 2-th leading minor not positive definite
I don't understand why that's happening with scipy. Any ideas?
Thanks,
Nilesh.
The problem is not with the Cholesky factorization; the problem is with the random matrix L.
rand(N,N) is much better conditioned than tril(rand(N,N)). To see this, compare cond(rand(N,N)) to cond(tril(rand(N,N))). I got something like 1e3 for the first and 1e19 for the second, so the condition number of the second matrix is much higher and computations with it will be much less stable numerically.
In the ill-conditioned case this results in some small negative computed eigenvalues; to see this, look at the eigenvalues with eig() -- some of the small ones will be negative.
So I would suggest using rand(N,N) to generate a numerically well-behaved random matrix.
BTW if you are interested in the theory of why this happens, you can look at this paper:
http://epubs.siam.org/doi/abs/10.1137/S0895479896312869
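To reproduce that comparison outside MATLAB, here is a small C++ sketch using Eigen (illustrative only; cond2 is a helper defined here, not a library function) that estimates the 2-norm condition number of a full random matrix and of its lower-triangular part:
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/SVD>

using Eigen::MatrixXd;

// 2-norm condition number: ratio of largest to smallest singular value.
static double cond2(const MatrixXd& M) {
    Eigen::JacobiSVD<MatrixXd> svd(M);
    const Eigen::VectorXd& s = svd.singularValues();
    return s(0) / s(s.size() - 1);
}

int main() {
    const int N = 200;
    // Uniform entries in (0,1), like MATLAB's rand(N,N).
    MatrixXd M = ((MatrixXd::Random(N, N).array() + 1.0) / 2.0).matrix();
    // Keep only the lower triangle, like tril().
    MatrixXd L = M.triangularView<Eigen::Lower>();

    std::cout << "cond(full random matrix) ~ " << cond2(M) << "\n"
              << "cond(lower-triangular)   ~ " << cond2(L) << "\n";
    return 0;
}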
As has been said before, the eigenvalues of a triangular matrix lie on its diagonal. Hence, by doing
L=tril(rand(n))
you made sure that eig(L) only yields positive values. You can improve the condition number of L*L' by adding a large enough positive number to the diagonal, e.g.
L=L+n*eye(n)
and L*L' is positive definite and well conditioned:
> cond(L*L')
ans =
1.8400
To generate a random positive definite matrix in MATLAB your code should read:
N=200;
L=rand(N, N);
A=L*transpose(L);
[lc,p]=chol(A,'lower');
eig(A)
p
And indeed you should see that the eigenvalues are all greater than zero and that p is zero.
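Since the original test harness is in C++, here is a minimal sketch of the same recipe using Eigen (an assumption; any dense linear-algebra code would do): generate a full random M, form A = M*M^T, optionally add N to the diagonal, and check that the Cholesky factorization succeeds:
#include <iostream>
#include <Eigen/Dense>

using Eigen::MatrixXd;

int main() {
    const int N = 200;

    // Full (not triangular) random matrix: entries uniform in [-1, 1].
    MatrixXd M = MatrixXd::Random(N, N);

    // A = M * M^T is symmetric positive (semi-)definite; adding N to the
    // diagonal makes it safely positive definite and well conditioned.
    MatrixXd A = M * M.transpose() + N * MatrixXd::Identity(N, N);

    Eigen::LLT<MatrixXd> llt(A);   // Cholesky: A = L * L^T
    std::cout << (llt.info() == Eigen::Success
                      ? "Cholesky succeeded"
                      : "Cholesky failed")
              << std::endl;
    return 0;
}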
You ask about the lower-triangular case. Let's look at a small test case to see what happens and why there are problems; working through a small example is often a good way to see this.
For a simple 5x5 matrix,
L = tril(rand(5))
L =
0.72194 0 0 0 0
0.027804 0.78422 0 0 0
0.26607 0.097189 0.77554 0 0
0.96157 0.71437 0.98738 0.66828 0
0.024571 0.046486 0.94515 0.38009 0.087634
eig(L)
ans =
0.087634
0.66828
0.77554
0.78422
0.72194
Of course, the eigenvalues of a triangular matrix are just the diagonal elements. Since the elements generated by rand are always between 0 and 1, on average they will be roughly 1/2. Perhaps looking at the distribution of the determinant of L will help. Better is to consider the distribution of log(det(L)). Since the determinant will be simply the product of the diagonal elements, the log is the sum of the logs of the diagonal elements. (Yes, I know the determinant is a poor measure of singularity, but the distribution of log(det(L)) is easily computed and I'm feeling too lazy to think about the distribution of the condition number.)
Ah, but the negative log of a uniform random variable is an exponential variate, in this case with rate lambda = 1. By the central limit theorem, the sum of the logs of n uniform random numbers from the interval (0,1) will be approximately Gaussian, with mean -n. Therefore the determinant of a lower-triangular n-by-n matrix generated by such a scheme will be on the order of exp(-n). When n is 200, MATLAB tells me that
exp(-200)
ans =
1.3839e-87
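In symbols, the informal estimate above reads:
E[\log \det L] = \sum_{i=1}^{n} E[\log U_{ii}] = -n
\quad\Longrightarrow\quad
\det L \approx e^{-n},
\qquad
\kappa_2(L L^{\mathsf T}) = \kappa_2(L)^{2},
where the U_{ii} are the i.i.d. uniform(0,1) diagonal entries and \kappa_2 denotes the 2-norm condition number.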
Thus for a matrix of any appreciable size, we can see that it will be poorly conditioned; worse, when you form the product L*L', it will generally be numerically singular. The same argument applies to the condition number. Even for a 20x20 matrix, the condition number of such a lower-triangular matrix is fairly large, and when we form L*L', the condition number is squared, as expected.
L = tril(rand(20));
cond(L)
ans =
1.9066e+07
cond(L*L')
ans =
3.6325e+14
See how much better things are for a full matrix.
A = rand(20);
cond(A)
ans =
253.74
cond(A*A')
ans =
64384
Related
I am using Gurobi (via C++) as part of my MSc thesis to solve Quadratic Knapsack Problem instances. So far, I was able to generate a model with binary decision variables, quadratic objective function and the capacity constraint and Gurobi solved it just fine. Then I wanted to solve the continuous relaxation of the QKP. I built the model as before but with continuous variables instead of binary ones and Gurobi threw me an exception when I tried to optimize it:
10020 - Objective Q not PSD (negative diagonal entry)
This confounded me for a bit, since all values from the problem instance are ≥ 0. In preparing to post this question, I wrote both models out to a file and discovered the reason:
NAME (null)
* Max problem is converted into Min one
This of course means that all previously positive values are now negative. Now I know why Q is not PSD, but how do I fix this? Can I prevent the conversion from a Max problem into a Min one? Do I need to configure the model for the continuous relaxation differently?
From my (inexperienced) point of view it just looks like Gurobi shot itself in the foot.
When you are maximizing a quadratic objective with Gurobi or any other convex optimizer, your 'Q' matrix has to be negative semi-definite, and when you are minimizing, your 'Q' matrix needs to be positive semi-definite. Changing the sign and the sense of the objective changes nothing.
Gurobi doesn't verify that your problem is convex, but it will report any non-convexity it finds. The fact that your original problem seemed to solve as a MIP is an accident and you shouldn't rely on it.
You should instead model a quadratic objective over binary variables as a linear problem, using a standard transformation (a short sketch in C++ follows the constraints below). If x and y are binary, the product x*y can be replaced by a new variable z if you add the constraints
z <= x
z <= y
z >= x + y - 1
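For illustration, here is a minimal C++ sketch of that linearization with the Gurobi C++ API (the variable names and the coefficient q_xy are made up for the example; your QKP model will of course have many such products plus the capacity constraint):
#include "gurobi_c++.h"

int main() {
    GRBEnv env;
    GRBModel model(env);

    double q_xy = 3.0;  // assumed objective coefficient of the product x*y

    GRBVar x = model.addVar(0.0, 1.0, 0.0, GRB_BINARY, "x");
    GRBVar y = model.addVar(0.0, 1.0, 0.0, GRB_BINARY, "y");
    GRBVar z = model.addVar(0.0, 1.0, 0.0, GRB_CONTINUOUS, "z");  // stands in for x*y

    // Linearization of z = x*y for binary x, y
    model.addConstr(z <= x, "z_le_x");
    model.addConstr(z <= y, "z_le_y");
    model.addConstr(z >= x + y - 1, "z_ge_x_plus_y_minus_1");

    // Linear objective: q_xy * z replaces the quadratic term q_xy * x * y
    model.setObjective(q_xy * z, GRB_MAXIMIZE);

    model.optimize();
    return 0;
}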
I am having a strange issue using Eigen (Tuxfamily) in my software (in C++).
I am analyzing a 3D volume image by calculating a Hessian matrix for each pixel.
The volume (approx. 800x800x600) is divided into subvolumes; for each subvolume I sum up all the obtained matrices and then divide by their count to obtain the average (and then I do the same with the subvolume averages, dividing by the number of subvolumes, to obtain the average for the full volume).
The matrices are of type Matrix3d.
The problem is that for most of the sums (and obviously for the averages as well) I obtain something like:
Elements analyzed : 28215
Elements summed : 28215
Subvolume sum :
5143.76 | nan | -2778.05
5402.07 | 16011.9 | -inf
-2778.05 | -8716.86 | 7059.32
I sum them this way:
for (int i = 0; i < (int)OuterVector.size(); i++) {
    AverageProduct += OuterVector[i];
}
Due to the nature of the matrices I know that they should be symmetric, so the correct value is calculated for some of the entries. Any idea why the others might be failing? (And note that it's always the same two positions of the matrix giving me nan and -inf.)
OK, using a mix of the suggestions you gave me in the comments, I tried a couple of fixes and solved the problem.
When I was creating the Eigen::Matrix3d object, I was not initializing its values, so as soon as I added the first OuterVector[i] those two entries went wild (the (0,1) entry went to nan and the (1,2) entry to inf). Strange that it happened only for those two specific entries, and in exactly the same way every time.
So doing (at initialization time)
Matrix3d AverageProduct;
AverageProduct << 0, 0, 0, 0, 0, 0, 0, 0, 0;
was enough to fix it.
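For reference, an equivalent and slightly more idiomatic way is Eigen's Matrix3d::Zero(); here is a small self-contained sketch (the helper name and the use of std::vector are assumptions for the example):
#include <vector>
#include <Eigen/Dense>

using Eigen::Matrix3d;

// Hypothetical helper: average a set of per-voxel Hessians for one subvolume.
Matrix3d averageHessian(const std::vector<Matrix3d>& OuterVector) {
    Matrix3d AverageProduct = Matrix3d::Zero();  // start from an all-zero accumulator
    for (const Matrix3d& H : OuterVector) {
        AverageProduct += H;
    }
    if (!OuterVector.empty()) {
        AverageProduct /= static_cast<double>(OuterVector.size());
    }
    return AverageProduct;
}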
I am using PCA on binary attributes to reduce the dimensions (attributes) of my problem. The initial dimension was 592, and after PCA it is 497. I used PCA before, on numeric attributes in another problem, and it managed to reduce the dimension to a greater extent (about half of the initial dimension). I believe that binary attributes decrease the power of PCA, but I do not know why. Could you please explain why PCA does not work as well on binary data as it does on numeric data?
Thank you.
The principal components of 0/1 data can fall off slowly or rapidly, and so can the PCs of continuous data; it depends on the data. Can you describe your data?
The following picture is intended to compare the PCs of continuous image data vs. the PCs of the same data quantized to 0/1: in this case, inconclusive.
Look at PCA as a way of getting an approximation to a big matrix, first with one term: approximate A ~ c U V^T, with entries c U_i V_j. Consider this a bit, with A say 10k x 500: U is 10k long, V is 500 long.
The top row of the approximation is c U1 V, the second row is c U2 V ... all the rows are proportional to V. Similarly, the leftmost column is c U V1 ... all the columns are proportional to U.
But if all rows are similar (proportional to each other), they can't get near an A matrix with rows or columns like 0100010101 ...
With more terms, A ~ c1 U1 V1^T + c2 U2 V2^T + ..., we can get nearer to A: the faster the ci fall off, the fewer terms we need.
(Of course, all 500 terms recreate A exactly, to within roundoff error.)
The top row of the picture is "lena", a well-known 512 x 512 matrix, with 1-term and 10-term SVD approximations. The bottom row is lena discretized to 0/1, again with 1 term and 10 terms. I thought that the 0/1 lena would be much worse -- comments, anyone?
(U V^T is also written U ⊗ V, called a "dyad" or "outer product".)
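If you want to experiment with this, here is a small C++ sketch using Eigen (purely illustrative; the matrix size, the 0/1 quantization rule and the number of terms k are arbitrary choices) that compares how the singular values fall off for a continuous random matrix versus its 0/1 quantization, and forms a rank-k approximation:
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/SVD>

using Eigen::MatrixXd;

int main() {
    const int n = 60;   // arbitrary small size for the experiment
    const int k = 10;   // number of SVD terms (dyads) to keep

    // Continuous data with entries in [-1, 1] and a 0/1 quantization of it.
    MatrixXd A = MatrixXd::Random(n, n);
    MatrixXd B = (A.array() > 0.0).cast<double>().matrix();

    Eigen::JacobiSVD<MatrixXd> svdA(A);
    Eigen::JacobiSVD<MatrixXd> svdB(B, Eigen::ComputeThinU | Eigen::ComputeThinV);

    std::cout << "First " << k << " singular values (continuous, 0/1):\n";
    for (int i = 0; i < k; ++i)
        std::cout << svdA.singularValues()(i) << "  "
                  << svdB.singularValues()(i) << "\n";

    // Rank-k approximation of the 0/1 matrix: sum of the first k dyads c_i U_i V_i^T.
    MatrixXd Bk = svdB.matrixU().leftCols(k)
                * svdB.singularValues().head(k).asDiagonal()
                * svdB.matrixV().leftCols(k).transpose();

    std::cout << "Relative error of the rank-" << k << " approximation of the 0/1 matrix: "
              << (B - Bk).norm() / B.norm() << "\n";
    return 0;
}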
(The Wikipedia articles Singular value decomposition and Low-rank approximation are a bit math-heavy. An AMS column by David Austin, We Recommend a Singular Value Decomposition, gives some intuition on SVD / PCA -- highly recommended.)
I am trying to do a 2D Real To Complex FFT using CUFFT.
I realize that when I do this I will get W/2+1 complex values back per row (W being the "width" of my H x W matrix).
The question is: if I want to build out a full H x W version of this matrix after the transform, how do I go about copying values from the H x (W/2+1) result matrix back into a full-size matrix so that both halves and the DC value end up in the right place?
Thanks
I'm not familiar with CUDA, so take that into consideration when reading my response. I am familiar with FFTs and signal processing in general, though.
It sounds like you start out with an H (rows) x W (cols) matrix, you do a 2D FFT that essentially does an FFT on each row, and you end up with an H x (W/2+1) matrix. A W-wide FFT returns W values, but the CUDA function only returns W/2+1 of them because the spectrum of real data is conjugate-symmetric, so the negative-frequency data is redundant.
So, if you want to reproduce the missing W/2-1 points, simply mirror the positive-frequency bins, taking the complex conjugate of each. For instance, if one of the rows is as follows:
Index  Data
0      12
1      5 + 2i
2      6
3      2 - 3i
...
The 0 index is your DC power, the 1 index is the lowest positive frequency bin, and so forth. You would thus make your closest-to-DC negative-frequency bin the conjugate 5 - 2i, the next closest 6, and so on. Where you put those values in the array is up to you. I would do it the way Matlab does it, with the negative-frequency data after the positive-frequency data.
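To make the indexing concrete, here is a host-side C++ sketch (an illustration only, not a CUDA kernel; names are made up) that expands the H x (W/2+1) half-spectrum into the full H x W spectrum. Note that for a full 2D real-to-complex transform the row index mirrors as well, not just the column index:
#include <complex>
#include <vector>

// Hypothetical sketch: expand the H x (W/2+1) half-spectrum produced by a
// 2D real-to-complex FFT into the full H x W spectrum using Hermitian symmetry:
//   full[i][j] = conj(half[(H - i) % H][W - j])   for j past the stored half.
// Row-major storage is assumed; 'half' has pitch W/2+1, 'full' has pitch W.
std::vector<std::complex<float>> expandHalfSpectrum(
        const std::vector<std::complex<float>>& half, int H, int W)
{
    const int halfW = W / 2 + 1;
    std::vector<std::complex<float>> full(static_cast<size_t>(H) * W);

    for (int i = 0; i < H; ++i) {
        for (int j = 0; j < W; ++j) {
            if (j < halfW) {
                // Stored part: copy directly (includes DC at j == 0).
                full[i * W + j] = half[i * halfW + j];
            } else {
                // Redundant part: mirror both indices and conjugate.
                const int mi = (H - i) % H;
                const int mj = W - j;  // mirrored column, always within the stored range
                full[i * W + j] = std::conj(half[mi * halfW + mj]);
            }
        }
    }
    return full;
}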
I hope that makes sense.
There are two ways this can be achieved; you will have to write your own kernel for either of them.
1) Take the conjugate of the (half) data you get back to fill in the other half.
2) Since you want the full result anyway, convert the input data from real to complex (by padding with zero imaginary parts) and perform a complex-to-complex transform.
In practice I have noticed that there is not much of a difference in speed either way.
I actually searched the nVidia forums and found a kernel that someone had written that did just what I was asking, and that is what I used. If you search the CUDA forum for "redundant results fft" or similar, you will find it.
I have a matrix with dimensions 500 x 10000. Each row represents a sample. I want to find, for each sample, a set of cells that uniquely identifies that sample. Thus I am looking for a reduced matrix 500 x n, where n < 10000. Is there an algorithm already developed that would help me?
This example in R doesn't necessarily yield an optimal solution (which in wise foresight you didn't request), but is quite quick:
# generate sample matrix
m = matrix(sample(0:4, 500*10000, T), 500, 10000)
# find smallest i for which the first i columns are unique over all sample rows
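# (values are 0..4, so 3 columns give at most 5^3 = 125 < 500 distinct rows; hence start at i = 4)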
for (i in 4:10000) if (nrow(unique(m[, 1:i])) == 500) break