I am having a strange issue using Eigen (Tuxfamily) in my software (in C++).
I am analyzing a 3D volume image by calculating a Hessian matrix for each voxel.
The volume (approx. 800x800x600) is divided into subvolumes; for each subvolume I sum up all the obtained matrices and then divide by their count to obtain the average (and then I do the same with all the subvolume averages to obtain the average for the full volume).
The matrices are of type Matrix3d.
The problem is that for most of the sums (and obviously for the averages as well) I obtain something like:
Elements analyzed : 28215
Elements summed : 28215
Subvolume sum :
5143.76 | nan | -2778.05
5402.07 | 16011.9 | -inf
-2778.05 | -8716.86 | 7059.32
I sum them this way:
for (int i = 0; i < (int)OuterVector.size(); i++) {
    AverageProduct += OuterVector[i];
}
Due to the nature of the matrices I know that they should be symmetric about the diagonal, so the correct value is calculated for some of the entries. Any idea why the others might be failing? (And note that it is always the same two positions of the matrix giving me nan and -inf.)
OK, using a mix of the suggestions you guys gave me in the comments, I tried a couple of fixes and solved the problem.
When I was creating the Eigen::Matrix3d object, I was not initializing its values, so as soon as I added the first OuterVector[i] those two entries went wild (the (0,1) entry went to nan and the (1,2) entry went to inf). Strange that it happened only for those two specific entries, and in the identical way every time.
So doing (at initialization time)
Matrix3d AverageProduct;
AverageProduct << 0, 0, 0,
                  0, 0, 0,
                  0, 0, 0;
(or, equivalently, Matrix3d AverageProduct = Matrix3d::Zero();) was enough to fix it.
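For reference, here is a minimal C++ sketch of the zero-initialized accumulation and averaging described above, assuming OuterVector is a std::vector<Eigen::Matrix3d> (the helper name is made up):

#include <Eigen/Dense>
#include <vector>

// Minimal sketch: zero-initialized accumulation and averaging of the
// per-voxel Hessians stored in OuterVector.
Eigen::Matrix3d averageOf(const std::vector<Eigen::Matrix3d>& OuterVector)
{
    Eigen::Matrix3d AverageProduct = Eigen::Matrix3d::Zero(); // no uninitialized garbage
    for (const Eigen::Matrix3d& m : OuterVector)
        AverageProduct += m;
    if (!OuterVector.empty())
        AverageProduct /= static_cast<double>(OuterVector.size());
    return AverageProduct;
}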
I'm implementing the Mahalanobis distance to measure the distance between two vectors from the same pool, and I've noticed that it seems to be correct most of the time, but sometimes not, maybe due to negative values.
I realized that sometimes the quadratic form (x - y)^T S^-1 (x - y) turns out negative. That's why the distance becomes negative, or rather the code throws an error because of the square root of a negative value.
I wonder about the problem. The data is (a row represents an object):
A: 376.498943729227 2.75082585760394 376.688899264061 2.75084113940164
B: 373.287817831307 2.75074375125675 373.392663499518 2.75092754974534
C: 377.091938091279 2.75082292557743 377.466035993347 2.75077191984784
D: 374.799551607287 2.75094834987157 374.209110037364 2.75091796001419
The covariance matrix S is then
7.13457e-09 3.13933e-05 5.45925e-10 3.80508e-06
3.13933e-05 2.96355 -0.000115865 3.28797
5.45925e-10 -0.000115865 5.31665e-09 -0.000137211
3.80508e-06 3.28797 -0.000137211 3.79042
and the inverse of it is
3.24779e+22 -8.58499e+18 1.40166e+22 7.92177e+18
-8.58499e+18 2.2693e+15 -3.70505e+18 -2.09399e+15
1.40166e+22 -3.70505e+18 6.04917e+21 3.41882e+18
7.92177e+18 -2.09399e+15 3.41882e+18 1.93222e+15
Now I wonder why I get negative results out of the product (x - y)^T S^-1 (x - y) (in the case of B and D)?
I'm not sure if it's a programming problem (that's why I didn't include code lines yet) or rather a theoretical one, but I appreciate any help a lot!
I use the Eigen library.
edit:
I calculated the eigenvalues of the covariance matrix S via R and get:
7.593311e+02 1.243531e-01 1.156646e-02 -3.920936e-04
Why do I have different ones?
I used
M<- matrix(c(376.498943729227, 2.75082585760394, 376.688899264061, 2.75084113940164,
373.287817831307, 2.75074375125675, 373.392663499518, 2.75092754974534,
377.091938091279, 2.75082292557743, 377.466035993347, 2.75077191984784,
374.799551607287, 2.75094834987157, 374.209110037364, 2.75091796001419
), 4, 4)
> M
[,1] [,2] [,3] [,4]
[1,] 376.498944 373.287818 377.091938 374.799552
[2,] 2.750826 2.750744 2.750823 2.750948
[3,] 376.688899 373.392663 377.466036 374.209110
[4,] 2.750841 2.750928 2.750772 2.750918
ev<- eigen(M)
values<- ev$values
values
[1] 7.593311e+02 1.243531e-01 1.156646e-02 -3.920936e-04
Your covariance matrix has two eigenvalues that are almost zero (around 10^-10 and 10^-18). Therefore, the matrix cannot be reliably inverted; it might even be considered non-invertible.
The reason for the two small eigenvalues is that your data points do not fill the entire 4D space but only a 2D subspace (a plane embedded in 4D).
To calculate a reasonable distance, you need to project your points onto a 2D space (or whatever dimensionality your real data have). You can do this with PCA. After this, you can calculate the distance in 2D.
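To illustrate that idea, here is a rough Eigen (C++) sketch: eigendecompose the covariance matrix, keep only the directions whose eigenvalues are clearly nonzero, and compute the Mahalanobis distance within that subspace. The function name and the tolerance are just illustrative choices, not part of the original question:

#include <Eigen/Dense>
#include <cmath>

// Mahalanobis distance between x and y, restricted to the subspace spanned by
// the eigenvectors of the covariance matrix S whose eigenvalues exceed tol.
double mahalanobisInSubspace(const Eigen::VectorXd& x,
                             const Eigen::VectorXd& y,
                             const Eigen::MatrixXd& S,
                             double tol = 1e-8)
{
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(S);  // S is symmetric
    const Eigen::VectorXd& evals = es.eigenvalues();
    const Eigen::MatrixXd& evecs = es.eigenvectors();

    Eigen::VectorXd d = x - y;
    double dist2 = 0.0;
    for (int i = 0; i < evals.size(); ++i) {
        if (evals(i) > tol) {                 // skip the (near-)null directions
            double proj = evecs.col(i).dot(d);
            dist2 += proj * proj / evals(i);  // 1/lambda_i plays the role of the inverse
        }
    }
    return std::sqrt(dist2);
}

Because only the well-determined directions contribute, the quadratic form cannot go negative, which avoids the square-root error.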
I copy-pasted your matrix into Matlab and computed the eigenvalues; the smallest of them is -4.0819e-13.
That doesn't seem too bad, but it reveals a problem: a covariance matrix should be positive semidefinite, so no eigenvalue should be smaller than 0. Likely due to rounding issues in your code, the matrix has a (slightly) negative eigenvalue, which can cause exactly the problem you are having.
Also, since two of the eigenvalues are practically zero, computing the inverse is a very brave move indeed. Meaning: you shouldn't, since you are essentially computing the inverse of a singular matrix.
I am learning about two-dimensional neural networks, so I am facing many obstacles, but I believe it is worth it and I am really enjoying this learning process.
Here's my plan: to make a 2-D NN recognize images of digits. Images are 5 by 3 grids and I prepared 10 images, from zero to nine. For example, this would be number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or 3,4,6,7,9,10,12,13 as 0s, it doesn't matter), and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it zeros OR ones only (nothing in between; the indexes depend on which image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit is recognized, a certain neuron should fire a value of one and the rest should be zeros (not fire).
I am done with implementing everything, but I have a problem with the computation and I would really appreciate any help. I am getting an extremely high error rate and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my backpropagation methods, since I believe the problem is in them. However, to break my work down, I would love to hear some comments first; I want to know whether my design is sound.
Does my plan make sense?
All the posts talk about ranges (0 to 1, -1 to +1, 0.01 to 0.5, etc.). Will it work with just {0, 1} on the output layer rather than a range? If yes, how can I control that?
I am using the hyperbolic tangent as my transfer function. Does it make a difference compared to sigmoid or other functions?
Any ideas/comments/guidance are appreciated and thanks in advance
Well, by the description given above, I think that the design and approach taken are correct! With respect to the choice of the activation function, remember that those functions help to pick out the neurons with the largest activation, and that their algebraic properties, such as an easy derivative, help with the definition of backpropagation. Taking this into account, you should not worry too much about your choice of activation function.
The ranges that you mention above correspond to scaling of the input; it is better to have your input images in the range 0 to 1. This helps to scale the error surface and helps with the speed and convergence of the optimization. Because your input set is composed of images, and each image is composed of pixels, the minimum and maximum values a pixel can attain are 0 and 255, respectively. To scale your input in this example, you can simply divide each value by 255.
Now, with respect to the training problems: have you tried checking whether your gradient calculation routine is correct, i.e., by evaluating the cost function J numerically? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point using the definition of the gradient. Sorry for the Matlab example, but it should be easy to port to C++:
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient (central difference)
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient with the gradient calculated by backpropagation. If the difference between the two is less than about 3e-9 for each component, your implementation should be correct.
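Since the question is about C++, here is a rough port of that numerical-gradient check; it assumes your cost is exposed as a callable J taking the flattened weight vector, and the names are only illustrative:

#include <functional>
#include <vector>

// Numerical gradient of a cost function J at the point theta,
// using central differences: (J(theta + e) - J(theta - e)) / (2e).
std::vector<double> numericalGradient(
    const std::function<double(const std::vector<double>&)>& J,
    std::vector<double> theta,
    double e = 1e-4)
{
    std::vector<double> numgrad(theta.size(), 0.0);
    for (std::size_t p = 0; p < theta.size(); ++p) {
        const double original = theta[p];
        theta[p] = original - e;
        const double loss1 = J(theta);
        theta[p] = original + e;
        const double loss2 = J(theta);
        theta[p] = original;                      // restore before the next component
        numgrad[p] = (loss2 - loss1) / (2.0 * e);
    }
    return numgrad;
}

Using central differences keeps the truncation error at O(e^2), which is why e = 1e-4 is usually accurate enough for this check.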
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory; there you can find a lot of information about neural networks and their paradigms. It's worth taking a look!
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/
I am using PCA on binary attributes to reduce the dimensions (attributes) of my problem. The initial number of dimensions was 592, and after PCA it is 497. I used PCA before on numeric attributes in another problem, and there it managed to reduce the dimensions to a greater extent (to half of the initial dimensions). I believe that binary attributes decrease the power of PCA, but I do not know why. Could you please explain why PCA does not work as well on binary data as on numeric data?
Thank you.
The principal components of 0/1 data can fall off slowly or rapidly, and the PCs of continuous data too; it depends on the data. Can you describe your data?
The following picture is intended to compare the PCs of continuous image data vs. the PCs of the same data quantized to 0/1: in this case, inconclusive.
Look at PCA as a way of getting an approximation to a big matrix, first with one term: approximate A ≈ c U V^T, i.e. A_ij ≈ c U_i V_j.
Consider this a bit, with A say 10k x 500: U is 10k long, V is 500 long. The top row of the approximation is c U_1 V, the second row is c U_2 V, ... so all the rows are proportional to V. Similarly the leftmost column is c U V_1, ... so all the columns are proportional to U.
But if all rows are similar (proportional to each other), they can't get near an A matrix with rows like 0100010101 ...
With more terms, A ≈ c_1 U_1 V_1^T + c_2 U_2 V_2^T + ..., we can get nearer to A: the smaller the higher-order c_i are, the faster the approximation converges. (Of course, all 500 terms recreate A exactly, to within roundoff error.)
In the picture, the top row is "lena", a well-known 512 x 512 matrix, with 1-term and 10-term SVD approximations. The bottom row is lena discretized to 0/1, again with 1 term and 10 terms. I thought that the 0/1 lena would be much worse -- comments, anyone?
(U V^T is also written U ⊗ V, called a "dyad" or "outer product".)
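To make the sum-of-dyads picture concrete, here is a small C++ (Eigen) sketch that builds the rank-k approximation c_1 U_1 V_1^T + ... + c_k U_k V_k^T from the SVD; the function name is made up for illustration:

#include <Eigen/Dense>
#include <algorithm>

// Rank-k approximation of A: keep the k largest singular values and the
// corresponding singular vectors, i.e. the sum of k dyads c_i U_i V_i^T.
Eigen::MatrixXd rankKApprox(const Eigen::MatrixXd& A, int k)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    const Eigen::VectorXd& s = svd.singularValues();   // sorted, largest first
    k = std::min<int>(k, static_cast<int>(s.size()));
    return svd.matrixU().leftCols(k)
         * s.head(k).asDiagonal()
         * svd.matrixV().leftCols(k).transpose();
}

Comparing ||A - rankKApprox(A, k)|| for increasing k shows directly how quickly the c_i fall off for your particular data.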
(The Wikipedia articles Singular value decomposition and Low-rank approximation are a bit math-heavy. An AMS column by David Austin, We Recommend a Singular Value Decomposition, gives some intuition on SVD / PCA -- highly recommended.)
I wanted to test a simple Cholesky code I wrote in C++. So I am generating a random lower-triangular L and multiplying by its transpose to generate A.
A = L * Lt;
But my code fails to factor A. So I tried this in Matlab:
N=200; L=tril(rand(N, N)); A=L*L'; [lc,p]=chol(A,'lower'); p
This outputs a non-zero p, which means Matlab also fails to factor A. I am guessing the randomness generates rank-deficient matrices. Am I right?
Update:
I forgot to mention that the following Matlab code seems to work, as pointed out by Malife below:
N=200; L=rand(N, N); A=L*L'; [lc,p]=chol(A,'lower'); p
The difference is that L is lower-triangular in the first snippet and not in the second. Why should that matter?
I also tried the following with scipy after reading A simple algorithm for generating positive-semidefinite matrices:
from scipy import random, linalg
A = random.rand(100, 100)
B = A*A.transpose()
linalg.cholesky(B)
But it errors out with:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 66, in cholesky
c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 24, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 2-th leading minor not positive definite
I don't understand why that's happening with scipy. Any ideas?
Thanks,
Nilesh.
The problem is not with the Cholesky factorization. The problem is with the random matrix L.
rand(N,N) is much better conditioned than tril(rand(N,N)). To see this, compare cond(rand(N,N)) to cond(tril(rand(N,N))). I got something like 1e3 for the first and 1e19 for the second, so the condition number of the second matrix is much higher and computations will be less stable numerically.
This results in some small negative eigenvalues in the ill-conditioned case; to see this, look at the eigenvalues using eig(): some of the small ones will be negative.
So I would suggest using rand(N,N) to generate a numerically stable random matrix.
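If you want to check this from C++ (the language of the original test code), the 2-norm condition number can be estimated with Eigen as the ratio of the largest to the smallest singular value. A rough sketch with made-up names; note that Eigen's Random() is uniform on [-1, 1], so it is rescaled here to mimic MATLAB's rand:

#include <Eigen/Dense>
#include <iostream>

// 2-norm condition number as the ratio of the extreme singular values.
double cond(const Eigen::MatrixXd& M)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(M);
    const Eigen::VectorXd& s = svd.singularValues();
    return s(0) / s(s.size() - 1);
}

int main()
{
    const int N = 200;
    // Uniform-ish entries in [0, 1], analogous to MATLAB's rand(N,N).
    Eigen::MatrixXd R = ((Eigen::MatrixXd::Random(N, N).array() + 1.0) / 2.0).matrix();
    Eigen::MatrixXd L = Eigen::MatrixXd::Zero(N, N);
    L.triangularView<Eigen::Lower>() = R;   // keep only the lower-triangular part
    std::cout << "cond(full random)      = " << cond(R) << "\n";
    std::cout << "cond(lower triangular) = " << cond(L) << "\n";
}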
BTW if you are interested in the theory of why this happens, you can look at this paper:
http://epubs.siam.org/doi/abs/10.1137/S0895479896312869
As has been said before, the eigenvalues of a triangular matrix lie on the diagonal. Hence, by doing
L=tril(rand(n))
you made sure that eig(L) only yields positive values. You can improve the condition number of L*L' by adding a large enough positive number to the diagonal, e.g.
L=L+n*eye(n)
and L*L' is positive definite and well conditioned:
> cond(L*L')
ans =
1.8400
To generate a random positive definite matrix in MATLAB your code should read:
N=200;
L=rand(N, N);
A=L*transpose(L);
[lc,p]=chol(A,'lower');
eig(A)
p
And you should indeed have the eigenvalues be greater than zero and p be zero.
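Since the original test code is in C++, here is a comparable Eigen sketch that builds A = M * M^T from a full (non-triangular) random matrix and checks whether the Cholesky factorization succeeds; the structure is just an illustration, not the asker's actual code:

#include <Eigen/Dense>
#include <iostream>

int main()
{
    const int N = 200;
    // Full (not triangular) random matrix; for a full-rank M, M * M^T is
    // symmetric positive definite and typically well conditioned.
    Eigen::MatrixXd M = Eigen::MatrixXd::Random(N, N);
    Eigen::MatrixXd A = M * M.transpose();

    Eigen::LLT<Eigen::MatrixXd> llt(A);    // Cholesky: A = L * L^T
    if (llt.info() == Eigen::Success)
        std::cout << "Cholesky succeeded\n";
    else
        std::cout << "Cholesky failed: A is not (numerically) positive definite\n";
}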
You ask about the lower-triangular case. Let's see what happens, and why there are problems; looking at a test case is often a good thing to do.
For a simple 5x5 matrix,
L = tril(rand(5))
L =
0.72194 0 0 0 0
0.027804 0.78422 0 0 0
0.26607 0.097189 0.77554 0 0
0.96157 0.71437 0.98738 0.66828 0
0.024571 0.046486 0.94515 0.38009 0.087634
eig(L)
ans =
0.087634
0.66828
0.77554
0.78422
0.72194
Of course, the eigenvalues of a triangular matrix are just the diagonal elements. Since the elements generated by rand are always between 0 and 1, on average they will be roughly 1/2. Perhaps looking at the distribution of the determinant of L will help; better is to consider the distribution of log(det(L)). Since the determinant is simply the product of the diagonal elements, its log is the sum of the logs of the diagonal elements. (Yes, I know the determinant is a poor measure of singularity, but the distribution of log(det(L)) is easily computed and I'm feeling too lazy to think about the distribution of the condition number.)
Ah, but the negative log of a uniform random variable is an exponential variate, in this case an exponential with lambda = 1. The sum of the logs of a set of n uniform random numbers from the interval (0,1) will, by the central limit theorem, be approximately Gaussian, and the mean of that sum will be -n. Therefore the determinant of a lower-triangular n x n matrix generated by such a scheme will typically be on the order of exp(-n). When n is 200, MATLAB tells me that
exp(-200)
ans =
1.3839e-87
Thus for a matrix of any appreciable size, we can see that it will be poorly conditioned. Worse, when you form the product L*L', it will generally be numerically singular. The same arguments apply to the condition number. Thus, even for a 20x20 matrix, we see that the condition number of such a lower-triangular matrix is fairly large. And when we form the matrix L*L', the condition number is squared, as expected.
L = tril(rand(20));
cond(L)
ans =
1.9066e+07
cond(L*L')
ans =
3.6325e+14
See how much better things are for a full matrix.
A = rand(20);
cond(A)
ans =
253.74
cond(A*A')
ans =
64384
I am trying to do a 2D Real To Complex FFT using CUFFT.
I realize that I will do this and get W/2+1 complex values back (W being the "width" of my H*W matrix).
The question is: what if I want to build out a full H*W version of this matrix after the transform? How do I go about copying some values from the H*(W/2+1) result matrix back to a full-size matrix to get both parts and the DC value in the right place?
Thanks
I'm not familiar with CUDA, so take that into consideration when reading my response. I am familiar with FFTs and signal processing in general, though.
It sounds like you start out with an H (rows) x W (cols) matrix, and that you are doing a 2D FFT that essentially does an FFT on each row, and you end up with an H x (W/2+1) matrix. A W-wide FFT returns W values, but the CUDA function only returns W/2+1 of them because the spectrum of real data is conjugate-symmetric (Hermitian), so the negative-frequency data is redundant.
So, if you want to reproduce the missing W/2-1 points, simply mirror and conjugate the positive-frequency bins. For instance, if one of the rows is as follows:
Index Data
0 12 + i
1 5 + 2i
2 6
3 2 - 3i
...
The 0 index is your DC power, the 1 index is the lowest positive frequency bin, and so forth. You would thus make your closest-to-DC negative-frequency bin 5 - 2i (the complex conjugate of bin 1), the next closest 6 (a purely real value is its own conjugate), and so on. Where you put those values in the array is up to you. I would do it the way Matlab does it, with the negative-frequency data after the positive-frequency data.
I hope that makes sense.
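For the 2D case the symmetry is X[(H-m) mod H][(W-n) mod W] = conj(X[m][n]). Here is a host-side C++ sketch (no CUDA specifics) that expands an H x (W/2+1) row-major result into the full H x W spectrum under that layout assumption; the function name is made up:

#include <complex>
#include <cstddef>
#include <vector>

// Expand the H x (W/2 + 1) output of a 2D real-to-complex FFT into the
// full H x W spectrum using Hermitian symmetry:
//   X[(H - m) % H][(W - n) % W] = conj(X[m][n])
// Both buffers are assumed to be row-major.
std::vector<std::complex<float>> expandHermitian(
    const std::vector<std::complex<float>>& half, int H, int W)
{
    const int Wh = W / 2 + 1;                       // stored columns per row
    std::vector<std::complex<float>> full(static_cast<std::size_t>(H) * W);

    for (int m = 0; m < H; ++m) {
        // Copy the stored (non-negative frequency) columns directly.
        for (int n = 0; n < Wh; ++n)
            full[m * W + n] = half[m * Wh + n];
        // Fill the missing columns from their conjugate-symmetric counterparts.
        for (int n = Wh; n < W; ++n) {
            const int mm = (H - m) % H;             // mirrored row index
            const int nn = W - n;                   // mirrored column, in [1, W/2]
            full[m * W + n] = std::conj(half[mm * Wh + nn]);
        }
    }
    return full;
}

The same index arithmetic can of course be done in a CUDA kernel instead of on the host.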
There are two ways this can be achieved. You will have to write your own kernel to achieve either of them.
1) You will need to take the complex conjugate of the (half) data you get to find the other half.
2) Since you want the full result anyway, it would be best to convert the input data from real to complex (by padding with 0 imaginary parts) and perform the complex-to-complex transform.
From practice I have noticed that there is not much of a difference in speed either way.
I actually searched the nVidia forums and found a kernel that someone had written that did just what I was asking, and that is what I used. If you search the CUDA forum for "redundant results fft" or similar, you will find it.