How to overcome zero reachdist in density based outlier - data-mining

I encounter a problem with my density based outlier detection program. Sometime, the sum of reachDist is zero. For example, my dataSet consists of { a=4,b=4,c=4,d=4,e=5,f=5,g=6 } and my K is 3. The sum of reachDist of a,b,c and d would be zero. May I know how to overcome this problem. Thanks :)

These points clearly are inliers, and should be givne a LOF of 1.

Related

Backpropagation 2-Dimensional Neuron Network C++

I am learning about Two Dimensional Neuron Network so I am facing many obstacles but I believe it is worth it and I am really enjoying this learning process.
Here's my plan: To make a 2-D NN work on recognizing images of digits. Images are 5 by 3 grids and I prepared 10 images from zero to nine. For Example this would be number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or 3,4,6,7,9,10,12,13 as 0s doesn't matter) and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it zeros OR ones only (not in between and the indexes depends on which image I am feeding the layer).
My output layer however will be one dimensional layer of 10 neurons. Depends on which digit was recognized, a certain neuron will fire a value of one and the rest should be zeros (shouldn't fire).
I am done with implementing everything, I have a problem in computing though and I would really appreciate any help. I am getting an extremely high error rate and an extremely low (negative) output values on all output neurons and values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my Backpropagation methods since I believe the problem is in it. However to break down my work I would love to hear some comments first, I want to know if my design is approachable.
Does my plan make sense?
All the posts are speaking about ranges ( 0->1, -1 ->+1, 0.01 -> 0.5 etc ), will it work for either { 0 | .OR. | 1 } on the output layer and not a range? if yes, how can I control that?
I am using TanHyperbolic as my transfer function. Does it make a difference between this and sigmoid, other functions.. etc?
Any ideas/comments/guidance are appreciated and thanks in advance
Well, by the description given above, I think that the design and approach taken it's correct! With respect to the choice of the activation function, remember that those functions help to get the neurons which have the largest activation number, also, their algebraic properties, such as an easy derivative, help with the definition of Backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges that you mention above, correspond to a process of scaling of the input, it is better to have your input images in range 0 to 1. This helps to scale the error surface and help with the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum value and and the maximum value that a pixel can attain is 0 and 255, respectively. To scale your input in this example, it is essential to divide each value by 255.
Now, with respect to the training problems, Have you tried checking if your gradient calculation routine is correct? i.e., by using the cost function, and evaluating the cost function, J? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point, by using the definition of gradient, sorry for the Matlab example, but it should be easy to port to C++:
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
% Set perturbation vector
perturb(p) = e;
loss1 = J(theta - perturb);
loss2 = J(theta + perturb);
% Compute Numerical Gradient
numgrad(p) = (loss2 - loss1) / (2*e);
perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient, with the gradient calculated by using backpropagation. If the difference between each calculation is less than 3e-9, then your implementation shall be correct.
I recommend to checkout the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory, there you can find a lot of information related to neural networks and its paradigms, it's worth to take look at it!
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/

How to prevent Gurobi from converting a Max into a Min prob?

I am using Gurobi (via C++) as part of my MSc thesis to solve Quadratic Knapsack Problem instances. So far, I was able to generate a model with binary decision variables, quadratic objective function and the capacity constraint and Gurobi solved it just fine. Then I wanted to solve the continuous relaxation of the QKP. I built the model as before but with continuous variables instead of binary ones and Gurobi threw me an exception when I tried to optimize it:
10020 - Objective Q not PSD (negative diagonal entry)
Which confounded me for a bit since all values form the problem instance are ≥0. In preparing to post this question, I wrote both models out to file and discovered the reason:
NAME (null)
* Max problem is converted into Min one
Which of course means that all previously positive values are now negative. Now I know why Q is not PSD but how do I fix this? Can I prevent the conversion from a Max problem into a Min one? Do I need to configure the model for the continuous relaxation differently?
From my (unexperienced) point of view it just looks like Gurobi shot itself in the foot.
When you are maximizing a quadratic objective with Gurobi or any other convex optimizer, your 'Q' matrix has to be negative semi-definite, and when you are minimizing, your 'Q' matrix needs to be positive definite. Changing the sign and the sense of the objective changes nothing.
Gurobi doesn't verify that your problem is convex, but it will report any non-convexity it finds. The fact that your original problem seemed to solve as a MIP is an accident and you shouldn't rely on it.
You should model a quadratic objective with binary variables as a linear problem with some simple transformations. If x and y are binary, the expression x*y can be changed to z if you add the constraints
z <= x
z <= y
z >= x + y - 1

Eigen sum of matrices resulting in NaN and -inf values

I am having a strange issue with using Eigen (Tuxfamily) in my software (in c++).
I am analyzing a 3D volume image by calculating for each pixel an Hessian matrix.
The volume (approx 800x800x600) is divided in subvolumes and for each subvolume i sum up all the obtained matrices and then divide them by the amount to obtain the average (and then i do the same summing up all the averages and dividing by the number of subvolumes to obtain the average for the full volume).
The matrices are of type Matrix3d.
The problem is, that for most of the sums (and obviously for the averages as well) i obtain something like :
Elements analyzed : 28215
Elements summed : 28215
Subvolume sum :
5143.76 | nan | -2778.05
5402.07 | 16011.9 | -inf
-2778.05 | -8716.86 | 7059.32
I sum them this way :
for(int i = 0;i<(int)OuterVector.size();i++){
AverageProduct+=OuterVector[i];
}
Due to the nature of the matrices i know that they should be symmetrical on the diagonal, so the correct value is calculated for some of them. Any idea on why the others might be failing? (and consider that it's always the same two position of the matrix giving me nan and -inf)
Ok, using a mix of the suggestions you guys gave me in the comments, i tried a couple of random fixes and i solved the problem.
When i was creating the Eigen::Matrix3d object, i was not initializing the values, so somehow as soon as i was adding the first OuterVector[i] those two values were going wild (the (0,1) was going to nan and the (1,2) was going to inf). Strange that it was only happening only for those two specific values and in the same identical way every time.
So doing (at initialization time)
Matrix3d AverageProduct << 0,0,0,0,0,0,0,0,0;
was enough to fix it.

generating a pseduo-random positive definite matrix

I wanted to test a simple Cholesky code I wrote in C++. So I am generating a random lower-triangular L and multiplying by its transpose to generate A.
A = L * Lt;
But my code fails to factor A. So I tried this in Matlab:
N=200; L=tril(rand(N, N)); A=L*L'; [lc,p]=chol(A,'lower'); p
This outputs non-zero p which means Matlab also fails to factor A. I am guessing the randomness generates rank-deficient matrices. Am I right?
Update:
I forgot to mention that the following Matlab code seems to work as pointed out by Malife below:
N=200; L=rand(N, N); A=L*L'; [lc,p]=chol(A,'lower'); p
The difference is L is lower-triangular in the first code and not the second one. Why should that matter?
I also tried the following with scipy after reading A simple algorithm for generating positive-semidefinite matrices:
from scipy import random, linalg
A = random.rand(100, 100)
B = A*A.transpose()
linalg.cholesky(B)
But it errors out with:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 66, in cholesky
c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 24, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 2-th leading minor not positive definite
I don't understand why that's happening with scipy. Any ideas?
Thanks,
Nilesh.
The problem is not with the cholesky factorization. The problem is with the random matrix L.
rand(N,N) is much better conditioned than tril(rand(N,N)). To see this, compare cond(rand(N,N)) to cond(tril(rand(N,N))). I got something like 1e3 for the first and 1e19 for the second, so the conditioning number of the second matrix is much higher and computations will be less stable numerically.
This will result in getting some small negative eigenvalues in the ill-conditioned case - to see this look at the eigenvalues using eig(), some small ones will be negative.
So I would suggest to use rand(N,N) to generate a numerically stable random matrix.
BTW if you are interested in the theory of why this happens, you can look at this paper:
http://epubs.siam.org/doi/abs/10.1137/S0895479896312869
As has been said before, eigen values of a triangular matrix lie on the diagonal. Hence, by doing
L=tril(rand(n))
you made sure that eig(L) only yield positive values. You can improve the condition number of L*L' by adding a large enough positive number to the diagonal, e.g.
L=L+n*eye(n)
and L*L' is positive definite and well conditioned:
> cond(L*L')
ans =
1.8400
To generate a random positive definite matrix in MATLAB your code should read:
N=200;
L=rand(N, N);
A=L*transpose(L);
[lc,p]=chol(A,'lower');
eig(A)
p
And you should indeed have the eigenvalues be greater than zero and p be zero.
You ask about the lower triangular case. Lets see what happens, and why there are problems. This is often a good thing to do, to look at a test case.
For a simple 5x5 matrix,
L = tril(rand(5))
L =
0.72194 0 0 0 0
0.027804 0.78422 0 0 0
0.26607 0.097189 0.77554 0 0
0.96157 0.71437 0.98738 0.66828 0
0.024571 0.046486 0.94515 0.38009 0.087634
eig(L)
ans =
0.087634
0.66828
0.77554
0.78422
0.72194
Of course, the eigenvalues of a triangular matrix are just the diagonal elements. Since the elements generated by rand are always between 0 and 1, on average they will be roughly 1/2. Perhaps looking at the distribution of the determinant of L will help. Better is to consider the distribution of log(det(L)). Since the determinant will be simply the product of the diagonal elements, the log is the sum of the logs of the diagonal elements. (Yes, I know the determinant is a poor measure of singularity, but the distribution of log(det(L)) is easily computed and I'm feeling too lazy to think about the distribution of the condition number.)
Ah, but the negative log of a uniform random variable is an exponential variate, in this case an exponential with lambda = 1. The sum of the logs of a set of n uniform random numbers from the interval (0,1) will by the central limit theorem be Gaussian. The mean of that sum will be -n. Therefore the determinant of a lower triangular nxn matrix generated by such a scheme will be exp(-n). When n is 200, MATLAB tells me that
exp(-200)
ans =
1.3839e-87
Thus for a matrix of any appreciable size, we can see that it will be poorly conditioned. Worse, when you form the product L*L', it will generally be numerically singular. The same arguments apply to the condition number. Thus, for even a 20x20 matrix, see that the condition number of such a lower triangular matrix is fairly large. Then when we form the matrix L*L', the condition will be squared as expected.
L = tril(rand(20));
cond(L)
ans =
1.9066e+07
cond(L*L')
ans =
3.6325e+14
See how much better things are for a full matrix.
A = rand(20);
cond(A)
ans =
253.74
cond(A*A')
ans =
64384

Best way to program piecewise-linear function on DSP TMS320C5509

There is a Table of pairs , which defines pieces bounds.
And we are using straightforward algorithm:
y = f(x)
Calculate index n in Table using x
Get Yn and Yn+1, compute linear interpolation Y
Y is the answer.
So i think, there must be more efficient method, could you please point me?
Depending on the number and distribution of pairs, you might be able to instead store a table T containing only the Y values at regular intervals. Pick the interval to be a power of 2: i=2^c. Then for a given X:
n=X>>c;
Y= T[n]
Y+= ((T[n+1]-T[n])* (X&(i-1))>>c;
This should work as long as you have space for a table with small enough intervals to catch sudden changes in the slope of Y, and enough headroom in Y for the multiply.
Use binary search for step 1.
EDIT: due to the comment you added afterwards, this is not necessary, since your intervals are equally spaced.