Precomputed distances for spectral clustering with scikit-learn - python-2.7

I'm struggling to make sense of the spectral clustering documentation here.
Specifically:
If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:
np.exp(- X ** 2 / (2. * delta ** 2))
For my data, I have a complete distance matrix of size (n_samples, n_samples) where large entries represent dissimilar pairs, small values represent similar pairs and zero represents identical entries. (I.e. the only zeros are along the diagonal).
So all I need to do is build the SpectralClustering object with affinity = "precomputed" and then pass the transformed distance matrix to fit_predict.
I'm stuck on the suggested transformation equation. np.exp(- X ** 2 / (2. * delta ** 2)).
What is X here? The (n_samples, n_samples) distance matrix?
If so, what is delta? Is it just X.max() - X.min()?
Calling np.exp(- X ** 2 / (2. * (X.max()-X.min()) ** 2)) seems to do the right thing. I.e. big entries become relatively small, and small entries relatively big, with all the entries between 0 and 1. The diagonal is all 1's, which makes sense, since each point has maximal affinity with itself.
But I'm worried. I think if the author had wanted me to use np.exp(- X ** 2 / (2. * (X.max()-X.min()) ** 2)) he would have told me to use just that, instead of throwing delta in there.
So I guess my question is just this. What's delta?

Yes, X in this case is the matrix of distances. delta is a scale parameter that you can tune as you wish. It controls the "tightness", so to speak, of the distance/similarity relation, in the sense that a small delta increases the relative dissimilarity of faraway points.
Notice that delta plays the inverse role of the gamma parameter of the RBF kernel mentioned earlier in the doc link you give (gamma = 1 / (2 * delta ** 2)): both are free parameters which can be used to tune the clustering results.
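For what it's worth, here is a minimal sketch of the whole precomputed-affinity pipeline. The toy data, the median-distance choice of delta and n_clusters=3 are just illustrative assumptions, not something from the question:
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering

rng = np.random.RandomState(0)
points = rng.rand(100, 2)                        # toy data, only used to build a distance matrix
X = squareform(pdist(points))                    # (n_samples, n_samples), zeros on the diagonal

delta = np.median(X)                             # free scale parameter; tune and compare results
affinity = np.exp(-X ** 2 / (2. * delta ** 2))   # the Gaussian/RBF transform from the docs

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
A smaller delta makes the affinity drop off faster with distance, which tends to split the data into tighter clusters.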

Closest point from List for every point of other List

I have a population of so called "Dots" that search for food. Every Dot has a sight_ value, which indicates the range in which it can see food.
The position of each Dot is saved as a pair<uint16_t,uint16_t>. The positions of all foodsources are in a vector<pair<uint16_t,uint16_t>>.
Now I want to find, for every Dot, the closest foodsource that it can see, and I don't want to calculate the distance of every combination.
My idea was to create a copy of the food-vector, sort one copy by x and the other by y. Then find the interval [x-sight, x+sight] respectively [y-sight, y+sight] in the vectors and then create the intersection of both.
I've read over set_intersection, but it requires both ranges to be sorted with the same rule.
Any ideas how I could do this? It could also be that my idea is just the wrong approach.
Thanks
IceFreez3r
Edit:
I did some runtime approximations:
Sort Food: n log n
Find Interval for one Coordinate and one Dot: 2 log n (lower and upper bound)
If we assume equal distribution of food sources, we can calculate the bound that is estimated to be closer to the middle first and then calculate the second bound in the rest interval. This would reduce the runtime to: log n + log(n/2) (Just realized this is probably not *that* powerful: log(n/2) =~ log(n) - 1)
Build intersection: #x * #y =~ (n * sight/testgroundsize)^2
Compute exact Distance for every Food in Intersection: n * (sight/testgroundsize)^2
Sum: 2 n log n + 2 * #Dots * (log n + log(n/2) + (n * sight/testgroundsize)^2 + n * (sight/testgroundsize)^2)
Sum with just limiting one coordinate: n log n + #Dots * (log n + log(n/2) + n * sight/testgroundsize)
I did some tests and just calculated the above formulas on the run:
int dots = dots_.size();
int sum = 2 * n * log(n) + 2 * dots * (log(n) + log(n / 2)
        + pow(n * (sum_sight / dots) / testground_size_, 2)
        + n * pow((sum_sight / dots) / testground_size_, 2));
int sum2 = n * log(n) + dots * (log(n) + log(n / 2)
         + n * (sum_sight / dots) / testground_size_);
cout << n * dots << endl << sum << endl << sum2 << endl;
It turned out the intersection idea is just bad, while the idea of just limiting one coordinate is at least better than brute force.
I didn't think about the grid idea yet @Daniel Jour
You're stepping into a whole field of interesting approaches to this problem. Terms to Google are binary space partitioning, quadtrees, ... and of course nearest neighbour search.
A relatively simple but effective approach when the dots are spread much farther apart than their "visible range":
Select a value "grid size".
Create a map from grid coordinates to a list/set of entities
For each food source: put them in the map at their grid coordinates
For each dot: put them in the map at their grid coordinates and also in the neighbour grid "cells". The size of the neighbourhood depends on the grid size and the dot's sight value
For each entry in the map which contains at least one dot: Either do this algorithm recursively with a smaller grid size or use the brute force approach: check each dot in that grid cell against each food source in that grid cell.
This is a linear algorithm, compared with the quadratic brute force approach.
Calculation of grid coordinates: grid_x = int(x / grid_size) ... same for other coordinate.
Neighbourhood: steps = ceil(sight_value / grid_size) .. the neighbourhood is a square with side length 2×steps + 1 centred at the dot's grid coordinates
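For illustration, here is a language-agnostic sketch of that grid idea (written in Python for brevity, even though your question is C++; all names are made up):
from collections import defaultdict
from math import ceil, dist

def closest_visible_food(dots, sights, food, grid_size):
    # bucket every food source by its integer grid cell
    grid = defaultdict(list)
    for f in food:
        grid[(int(f[0] / grid_size), int(f[1] / grid_size))].append(f)

    results = []
    for (dx, dy), sight in zip(dots, sights):
        cx, cy = int(dx / grid_size), int(dy / grid_size)
        steps = ceil(sight / grid_size)          # neighbourhood half-width in cells
        best, best_d = None, sight               # only food within sight counts
        for gx in range(cx - steps, cx + steps + 1):
            for gy in range(cy - steps, cy + steps + 1):
                for f in grid.get((gx, gy), ()):
                    d = dist((dx, dy), f)        # exact Euclidean check (Pythagoras)
                    if d <= best_d:
                        best, best_d = f, d
        results.append(best)                     # None if nothing is in sight
    return results

print(closest_visible_food([(10, 12)], [5], [(11, 14), (40, 2)], grid_size=5))
If many dots end up in the same cell, recursing with a smaller grid size as described above keeps the per-cell brute force cheap.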
I believe your approach is incorrect, and this can be verified mathematically. What you can do instead is compute the magnitude of the vector joining a dot to a food source (Pythagoras' theorem) and check that this magnitude is less than the sight limit. Regarding efficiency, the first order of business is to determine whether the alternative approach is actually less time-consuming overall, even if some individual calculations become cheaper; the ideal is that the total time decreases, not that work is merely shuffled around by refactoring.
Also, since the position of a dot can be specified relative to an arbitrary frame of reference (a basis, plus one local to the dot in question), your sorted-interval idea would seem to require n*2 data structures, where n is the number of dots, each containing the values sorted relative to one dot, and frankly it is unclear whether that approach would even work, let alone be optimal. You state the design constraint that the solution shall not compute the distance from each dot to each food source, but to achieve this you must implement other procedures to still get correct results, which brings us back to the efficiency point above. Therefore, you may be better off simply calculating the distance in each case. It is also the more elegant solution.

How to calibrate camera focal length, translation and rotation given four points?

I'm trying to find the focal length, position and orientation of a camera in world space.
Because I need this to be resolution-independent, I normalized my image coordinates to be in the range [-1, 1] for x, and a somewhat smaller range for y (depending on aspect ratio). So (0, 0) is the center of the image. I've already corrected for lens distortion (using k1 and k2 coefficients), so this does not enter the picture, except sometimes throwing x or y slightly out of the [-1, 1] range.
As a given, I have a planar, fixed rectangle in world space of known dimensions (in millimeters). The four corners of the rectangle are guaranteed to be visible, and are manually marked in the image. For example:
std::vector<cv::Point3f> worldPoints = {
    cv::Point3f(0, 0, 0),
    cv::Point3f(2000, 0, 0),
    cv::Point3f(0, 3000, 0),
    cv::Point3f(2000, 3000, 0),
};
std::vector<cv::Point2f> imagePoints = {
    cv::Point2f(-0.958707, -0.219624),
    cv::Point2f(-1.22234, 0.577061),
    cv::Point2f(0.0837469, -0.1783),
    cv::Point2f(0.205473, 0.428184),
};
Effectively, the equation I think I'm trying to solve is (see the equivalent in the OpenCV documentation):
    / xi \   / fx  0  0 \ /        tx \ / Xi \
s * | yi | = |  0 fy  0 | |  Rxyz  ty | | Yi |
    \ 1  /   \  0  0  1 / \        tz / | Zi |
                                        \ 1  /
where:
i is 1, 2, 3, 4
xi, yi is the location of point i in the image (between -1 and 1)
fx, fy are the focal lengths of the camera in x and y direction
Rxyz is the 3x3 rotation matrix of the camera (has only 3 degrees of freedom)
tx, ty, tz is the translation of the camera
Xi, Yi, Zi is the location of point i in world space (millimeters)
So I have 8 equations (4 points of 2 coordinates each), and I have 8 unknowns (fx, fy, Rxyz, tx, ty, tz). Therefore, I conclude (barring pathological cases) that a unique solution must exist.
However, I can't seem to figure out how to compute this solution using OpenCV.
I have looked at the imgproc module:
getPerspectiveTransform works, but gives me a 3x3 matrix only (from 2D points to 2D points). If I could somehow extract the needed parameters from this matrix, that would be great.
I have also looked at the calib3d module, which contains a few promising functions that do almost, but not quite, what I need:
initCameraMatrix2D sounds almost perfect, but when I pass it my four points like this:
cv::Mat cameraMatrix = cv::initCameraMatrix2D(
    std::vector<std::vector<cv::Point3f>>({worldPoints}),
    std::vector<std::vector<cv::Point2f>>({imagePoints}),
    cv::Size2f(2, 2), -1);
it returns me a camera matrix that has fx, fy set to -inf, inf.
calibrateCamera seems to use a complicated solver to deal with overdetermined systems and outliers. I tried it anyway, but all I can get from it are assertion failures like this:
OpenCV(3.4.1) Error: Assertion failed (0 <= i && i < (int)vv.size()) in getMat_, file /build/opencv/src/opencv-3.4.1/modules/core/src/matrix_wrap.cpp, line 79
Is there a way to entice OpenCV to do what I need? And if not, how could I do it by hand?
3x3 rotation matrices have 9 elements but, as you said, only 3 degrees of freedom. One subtlety is that exploiting this property makes the equation non-linear in the angles you want to estimate, and non-linear equations are harder to solve than linear ones.
This kind of equation is usually solved by:
considering that the P = K.[R | t] matrix has 12 degrees of freedom and solving the resulting linear equation using the SVD decomposition (see Section 7.1 of 'Multiple View Geometry' by Hartley & Zisserman for more details); a sketch of this is given after this list
decomposing this intermediate result into an initial approximate solution to your non-linear equation (see for example cv::decomposeProjectionMatrix)
refining the approximate solution using an iterative solver which is able to deal with non-linear equations and with the reduced degrees of freedom of the rotation matrix (e.g. the Levenberg-Marquardt algorithm). I am not sure if there is a generic implementation of this in OpenCV, however it is not too complicated to implement one yourself using the Ceres Solver library.
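As a rough sketch of steps 1 and 2 (assuming you had at least 6 well-spread, non-coplanar point matches; this uses Python/NumPy plus the OpenCV bindings, and the function name is made up, so treat it as an illustration rather than a definitive implementation):
import numpy as np
import cv2

def dlt_projection_matrix(world, image):
    # world: (N, 3) 3D points, image: (N, 2) normalized image points, N >= 6
    rows = []
    for (X, Y, Z), (u, v) in zip(world, image):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # homogeneous least squares: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return Vt[-1].reshape(3, 4)            # P = K [R | t], up to scale

# usage (with 6+ matches): P = dlt_projection_matrix(world_pts, image_pts)
#                          K, R, t_hom, *_ = cv2.decomposeProjectionMatrix(P)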
However, your case is a bit particular because you do not have enough point matches to solve the linear formulation (i.e. step 1) reliably. This means that, as you stated it, you have no way to initialize an iterative refining algorithm to get an accurate solution to your problem.
Here are a few work-arounds that you can try:
somehow get 2 additional point matches, leading to a total of 6 matches hence 12 constraints on your linear equation, allowing you to solve the problem using the steps 1, 2, 3 above.
somehow guess manually an initial estimate for your 8 parameters (2 focal lengths, 3 angles & 3 translations), and directly refine them using an iterative solver. Be aware that the iterative process might converge to a wrong solution if your initial estimate is too far off.
reduce the number of unknowns in your model. For instance, if you manage to fix two of the three angles (e.g. roll & pitch) the equations might simplify a lot. Also, the two focal lengths are probably related via the aspect ratio, so if you know it and if your pixels are square, then you actually have a single unknown there.
if all else fails, there might be a way to extract approximated values from the rectifying homography estimated by cv::getPerspectiveTransform.
Regarding the last bullet point, the opposite of what you want is clearly possible. Indeed, the rectifying homography can be expressed analytically knowing the parameters you want to estimate. See for instance this post and this post. There is also a full chapter on this in the Hartley & Zisserman book (chapter 13).
In your case, you want to go the other way around, i.e. to extract the intrinsic & extrinsic parameters from the homography. There is a somewhat related function in OpenCV (cv::decomposeHomographyMat), but it assumes the K matrix is known and it outputs 4 candidate solutions.
In the general case, this would be tricky. But maybe in your case you can guess a reasonable estimate for the focal length, hence for K, and use the point correspondences to select the good solution to your problem. You might also implement a custom optimization algorithm, testing many focal length values and keeping the solution leading to the lowest reprojection error.
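To make that last idea concrete, here is a rough sketch (Python/OpenCV rather than C++): sweep candidate focal lengths, solve the pose for each with cv2.solvePnP, and keep the one with the lowest reprojection error. It assumes square pixels (fx == fy) and the principal point at (0, 0) of your normalized coordinates; the sweep range is a guess.
import numpy as np
import cv2

world = np.array([[0, 0, 0], [2000, 0, 0], [0, 3000, 0], [2000, 3000, 0]], dtype=np.float64)
image = np.array([[-0.958707, -0.219624], [-1.22234, 0.577061],
                  [0.0837469, -0.1783], [0.205473, 0.428184]], dtype=np.float64)

best = None
for f in np.linspace(0.2, 5.0, 500):                       # candidate focal lengths (guessed range)
    K = np.array([[f, 0, 0], [0, f, 0], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(world, image, K, None)   # lens distortion already removed
    if not ok:
        continue
    proj, _ = cv2.projectPoints(world, rvec, tvec, K, None)
    err = np.linalg.norm(proj.reshape(-1, 2) - image)      # reprojection error for this f
    if best is None or err < best[0]:
        best = (err, f, rvec, tvec)

err, f, rvec, tvec = best
R, _ = cv2.Rodrigues(rvec)                                 # camera orientation as a 3x3 matrix
print("focal length:", f, "reprojection error:", err)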

Using Principal Components Analysis (PCA) on binary data

I am using PCA on binary attributes to reduce the dimensions (attributes) of my problem. The initial number of dimensions was 592 and after PCA it is 497. I used PCA before on numeric attributes in another problem and it managed to reduce the dimensions to a greater extent (about half of the initial dimensions). I believe that binary attributes decrease the power of PCA, but I do not know why. Could you please explain why PCA does not work as well here as it does on numeric data?
Thank you.
The principal components of 0/1 data can fall off slowly or rapidly,
and the PCs of continuous data too —
it depends on the data. Can you describe your data ?
The following picture is intended to compare the PCs of continuous image data
vs. the PCs of the same data quantized to 0/1: in this case, inconclusive.
Look at PCA as a way of getting an approximation to a big matrix,
first with one term: approximate A ~ c U V^T, i.e. A_ij ~ c U_i V_j.
Consider this a bit, with A say 10k x 500: U 10k long, V 500 long.
The top row is c U1 V, the second row is c U2 V ...
all the rows are proportional to V.
Similarly the leftmost column is c U V1 ...
all the columns are proportional to U.
But if all rows are similar (proportional to each other),
they can't get near an A matrix with rows or columns like 0100010101 ...
With more terms, A ~ c1 U1 V1^T + c2 U2 V2^T + ...,
we can get nearer to A: the smaller the higher-order ci are, the faster.
(Of course, all 500 terms recreate A exactly, to within roundoff error.)
The top row is "lena", a well-known 512 x 512 matrix,
with 1-term and 10-term SVD approximations.
The bottom row is lena discretized to 0/1, again with 1 term and 10 terms.
I thought that the 0/1 lena would be much worse -- comments, anyone ?
(U V^T is also written U ⊗ V, called a "dyad" or "outer product".)
(The wikipedia articles
Singular value decomposition
and Low-rank approximation
are a bit math-heavy.
An AMS column by
David Austin,
We Recommend a Singular Value Decomposition
gives some intuition on SVD / PCA -- highly recommended.)
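If you want to see the effect described above in numbers, here is a small NumPy experiment (purely synthetic data, my own illustration): it compares how fast the variance is captured for a continuous matrix vs. the same matrix thresholded to 0/1.
import numpy as np

rng = np.random.RandomState(0)
continuous = rng.rand(300, 30) @ rng.rand(30, 100)             # smooth matrix of (roughly) rank 30
binary = (continuous > np.median(continuous)).astype(float)    # same data, quantized to 0/1

for name, A in [("continuous", continuous), ("0/1", binary)]:
    s = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)    # singular values of the centred data
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(energy, 0.95)) + 1                 # components needed for 95% variance
    print(name, "needs", k, "components for 95% of the variance")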

My neural net learns sin x but not cos x

I have built my own neural net and I have a weird problem with it.
The net is quite a simple feed-forward 1-N-1 net with backpropagation learning. Sigmoid is used as the activation function.
My training set is generated with random values between [-PI, PI] and their [0,1]-scaled sine values (this is because the "sigmoid net" produces only values between [0,1] and the unscaled sine function produces values between [-1,1]).
With that training set, and the net set to 1-10-1 with a learning rate of 0.5, everything works great and the net learns the sine function as it should. BUT.. if I do everything exactly the same way for the cosine function, the net won't learn it. Not with any setup of hidden layer size or learning rate.
Any ideas? Am I missing something?
EDIT: My problem seems to be similar to what can be seen with this applet. It won't seem to learn the sine function unless something "easier" is taught to the weights first (like 1400 cycles of a quadratic function). All the other settings in the applet can be left as they initially are. So in the case of sine or cosine it seems that the weights need some boosting in at least partially the right direction before a solution can be found. Why is this?
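For reference, here is a minimal NumPy sketch of the kind of setup I describe (1-10-1, sigmoid in both layers, plain backprop on squared error, targets scaled to [0,1]); the initialization and epoch count are my own guesses, and you can swap np.cos for np.sin to compare the two cases:
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(-np.pi, np.pi, (500, 1))
y = (np.cos(X) + 1.0) / 2.0                    # scale [-1, 1] to [0, 1]; use np.sin to compare

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.randn(1, 10) * 0.5, np.zeros(10)  # input -> hidden (10 units)
W2, b2 = rng.randn(10, 1) * 0.5, np.zeros(1)   # hidden -> output
lr = 0.5

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)                   # hidden activations
    out = sigmoid(h @ W2 + b2)                 # network output
    d_out = (out - y) * out * (1 - out)        # delta at the output (squared error + sigmoid)
    d_h = (d_out @ W2.T) * h * (1 - h)         # delta backpropagated to the hidden layer
    W2 -= lr * (h.T @ d_out) / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X)
    b1 -= lr * d_h.mean(axis=0)

print("final MSE:", float(np.mean((out - y) ** 2)))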
I'm struggling to see how this could work.
You have, as far as I can see, 1 input, N nodes in 1 layer, then 1 output. So there is no difference between any of the nodes in the hidden layer of the net. Suppose you have an input x, and a set of weights wi. Then the output node y will have the value:
y = Σ_i w_i x = x * Σ_i w_i
So this is always linear.
In order for the nodes to be able to learn differently, they must be wired differently and/or have access to different inputs. So you could supply inputs of the value, the square root of the value (giving some effect of scale), etc and wire different hidden layer nodes to different inputs, and I suspect you'll need at least one more hidden layer anyway.
The neural net is not magic. It produces a set of specific weights for a weighted sum. Since you can derive a set of weights to approximate a sine or cosine function, that must inform your idea of what inputs the neural net will need in order to have some chance of succeeding.
An explicit example: the Taylor series of the exponential function is:
exp(x) = 1 + x/1! + x^2/2! + x^3/3! + x^4/4! ...
So if you supplied 6 input nodes with 1, x, x^2, etc., then a neural net that just passed each input to one corresponding node, multiplied it by its weight, and then fed all those outputs to the output node would be capable of the 6-term Taylor expansion of the exponential:
in      hid  out
1    -- h0 --\
x    -- h1 ---\
x^2  -- h2 ----\
x^3  -- h3 ----- y
x^4  -- h4 ----/
x^5  -- h5 ---/
Not much of a neural net, but you get the point.
Further down the Wikipedia page on Taylor series, there are expansions for sin and cos, which are given in terms of odd powers of x and even powers of x respectively (think about it, sin is odd, cos is even, and yes it is that straightforward), so if you supply all the powers of x I would guess that the sin and cos versions will look pretty similar, with alternating zero weights. (sin: 0, 1, 0, -1/6..., cos: 1, 0, -1/2...)
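To see that last point in numbers, here is a quick NumPy check (my own illustration, not part of any net): fit a plain linear model on the features [1, x, x^2, ..., x^5]; for cos, the weights on the odd powers come out essentially zero, mirroring its even Taylor series.
import numpy as np

x = np.linspace(-np.pi, np.pi, 200)
features = np.vander(x, 6, increasing=True)        # columns 1, x, x^2, ..., x^5
weights, *_ = np.linalg.lstsq(features, np.cos(x), rcond=None)
print(np.round(weights, 3))                        # even powers carry the weight, odd powers ~ 0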
I think you can always compute sine and then compute cosine externally. I think your concern here is why the neural net is not learning the cosine function when it can learn the sine function. Assuming that this artifact is not because of your code, I would suggest the following:
It definitely looks like an error in the learning algorithm. It could be because of your starting point. Try starting with weights that give the correct result for the first input and then march forward.
Check if there is heavy bias in your learning - more +ve than -ve
Since cos(x) = sin(90° - x), you could find the weights for sine and then recompute the weights for cosine in one step.

How to get ALL data from 2D Real to Complex FFT in Cuda

I am trying to do a 2D Real To Complex FFT using CUFFT.
I realize that I will do this and get W/2+1 complex values back per row (W being the "width" of my H*W matrix).
The question is: what if I want to build out a full H*W version of this matrix after the transform? How do I go about copying some values from the H*(W/2+1) result matrix back to a full-size matrix, to get both halves and the DC value in the right place?
Thanks
I'm not familiar with CUDA, so take that into consideration when reading my response. I am familiar with FFTs and signal processing in general, though.
It sounds like you start out with an H (rows) x W (cols) matrix, and that after the 2D FFT you end up with an H x (W/2+1) matrix. A full W-wide transform would return W values per row, but the CUDA function only returns W/2+1 because the spectrum of real data is conjugate-symmetric (Hermitian), so the negative-frequency data is redundant.
So, if you want to reproduce the missing W/2-1 points, simply mirror the positive-frequency bins, taking the complex conjugate of each. For instance, if one of the rows is as follows:
Index   Data
0       12 + i
1       5 + 2i
2       6
3       2 - 3i
...
The 0 index is your DC power, the 1 index is the lowest positive frequency bin, and so forth. You would thus make your closest-to-DC negative frequency bin 5 - 2i (the conjugate of bin 1), the next closest 6, and so on. Where you put those values in the array is up to you. I would do it the way Matlab does it, with the negative frequency data after the positive frequency data.
I hope that makes sense.
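Just to make the index bookkeeping concrete, here is a small NumPy sketch (not CUFFT, but the half-spectrum has the same layout) that rebuilds the full H x W spectrum from the H x (W/2+1) result. Note that in the 2D case the mirror also flips the row index, i.e. full[i, j] = conj(half[(H - i) % H, W - j]) for the missing columns.
import numpy as np

H, W = 6, 8
data = np.random.rand(H, W)                   # real input, as for a real-to-complex transform

half = np.fft.rfft2(data)                     # shape (H, W//2 + 1), like the CUFFT R2C output
full = np.zeros((H, W), dtype=complex)
full[:, :W // 2 + 1] = half                   # DC and positive frequencies go in unchanged

for j in range(W // 2 + 1, W):                # fill the redundant negative-frequency columns
    for i in range(H):
        full[i, j] = np.conj(half[(H - i) % H, W - j])

print(np.allclose(full, np.fft.fft2(data)))   # True: matches the full complex-to-complex FFT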
There are two ways this can be achieved. You will have to write your own kernel to achieve either of these.
1) You will need to take the conjugate of the (half) data you get to find the other half.
2) Since you want the full results anyway, it would be best to convert the input data from real to complex (by padding with 0 imaginary) and perform a complex-to-complex transform.
From practice I have noticed that there is not much of a difference in speed either way.
I actually searched the nVidia forums and found a kernel that someone had written that did just what I was asking. That is what I used. If you search the CUDA forum for "redundant results fft" or similar you will find it.