How to cluster a group of nearby points into a single point? - c++

I'm using ORB in OpenCV 3 (C++) to detect features in an image and get back their real coordinates. But some of the detected points are very close to each other, and I don't need all of them; I just need one point per group.
X=[0.493953,0.490301,0.540664,0.575473,0.423641,0.49213,0.366055,0.395635,0.488464,0.486621,0.49213,0.358992,0.397844,0.575473,0.397844,0.425734,0.576992,0.580014,0.425734,-0.810798];
Y=[0.141909,0.154724,-0.03982,0.260174,-0.0699365,0.140797,0.121944,0.31197,0.13856,0.153795,0.137043,0.0239328,0.310085,0.256748,0.312835,-0.0683147,0.255281,0.253498,-0.0629622,-0.932006];
I need to group the nearby points in X, together with their corresponding values in Y, into new arrays so that the result is:
X_new=[-0.810798, 0.358992, 0.395635, 0.423641, 0.486621, 0.540664, 0.576992]
y_new=[-0.932006,0.0239328, 0.31197, -0.0699365, 0.153795, -0.03982, 0.255281]
I first tried sorting the data by X and running nested loops with an if condition based on the distance between the X coordinates, but I didn't get the output I needed.

As I said in the comments, a nicer idea would be to use k-means clustering. You may not get exactly the same points as the ones you mention in your question, but the cluster centres will be a good approximation of what you want to achieve. Hope it helps.
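For example, a minimal C++ sketch using cv::kmeans could look like the following. The number of clusters K is an assumption (K = 7 matches the expected output above); if K is unknown in practice, a distance-threshold grouping may suit you better.

    // A minimal sketch using cv::kmeans from OpenCV 3. K = 7 is assumed from
    // the expected X_new/Y_new arrays in the question.
    #include <opencv2/core.hpp>
    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<double> X = { 0.493953, 0.490301, 0.540664, 0.575473, 0.423641,
                                  0.49213, 0.366055, 0.395635, 0.488464, 0.486621,
                                  0.49213, 0.358992, 0.397844, 0.575473, 0.397844,
                                  0.425734, 0.576992, 0.580014, 0.425734, -0.810798 };
        std::vector<double> Y = { 0.141909, 0.154724, -0.03982, 0.260174, -0.0699365,
                                  0.140797, 0.121944, 0.31197, 0.13856, 0.153795,
                                  0.137043, 0.0239328, 0.310085, 0.256748, 0.312835,
                                  -0.0683147, 0.255281, 0.253498, -0.0629622, -0.932006 };

        // Pack the points into an N x 2 float matrix, one row per (x, y) point.
        cv::Mat samples((int)X.size(), 2, CV_32F);
        for (int i = 0; i < (int)X.size(); ++i) {
            samples.at<float>(i, 0) = (float)X[i];
            samples.at<float>(i, 1) = (float)Y[i];
        }

        const int K = 7;   // assumed number of distinct groups
        cv::Mat labels, centers;
        cv::kmeans(samples, K, labels,
                   cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 100, 1e-4),
                   5, cv::KMEANS_PP_CENTERS, centers);

        // Each row of 'centers' is one representative point; these approximate X_new/Y_new.
        for (int k = 0; k < K; ++k)
            std::cout << centers.at<float>(k, 0) << " " << centers.at<float>(k, 1) << std::endl;
        return 0;
    }

cv::KMEANS_PP_CENTERS uses k-means++ seeding, which tends to be more stable than random initialization for a small data set like this.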

Related

Set points outside plot to lower and upper limit

Maybe this question exists already, but I could not find it.
I am making plots in Python. I don't want to set my axis range so that all points are included: there are some really high or really low values, and all I care about for those points is that they exist. That is, they need to be in the plot, but not at their actual value; rather, somewhere at the edge of the canvas.
So I found something that helps a bit in achieving what I want to do, in this question: Link
So basically this thing works:
xmax=0.18
plt.plot(np.minimum(x,xmax),y)
But when I tried something like this, it didn't work.
xmin=0.8
xmax=0.18
plt.plot(np.minimum(x, xmin,xmax),y)
How can I solve this?
To force the points above a threshold to a maximum level you may use np.minimum(x,xmax).
To force the points below a threshold to a minimum level you may use np.maximum(x,xmin).
To do both, you may combine the two commands:
xlimited = np.minimum(np.maximum(x,xmin),xmax)
Note that to restrict the points in the vertical direction you would, of course, do the same to the y values.

Integrate RANSAC to compute essential matrix

I have calculated the essential matrix using the 5-point algorithm. I'm not sure how to integrate it with RANSAC so that it gives me a better outcome.
Here is the source code. https://github.com/lunzhang/openar/blob/master/src/utils/5point/computeEssential.js
Currently, I was thinking of computing the essential matrix from 5 random points, converting the essential matrix to a fundamental matrix, and checking the error against a threshold using the equation x'Fx = 0. But then I'm not sure what to do afterwards.
How do I know which points to mark as outliers? If the error is too big, do I mark them as outliers right away? Could one point produce different essential matrices depending on what the other 4 points are?
Well, here is a short explanation, in pseudo-code, of how you can integrate this with RANSAC. Basically, all RANSAC does is compute your model (here the essential matrix) using a subset of the data, and then check whether the rest of the data "is happy" with that result. It keeps the result for which the highest portion of the dataset "is happy".
highest_number_of_happy_points = -1;
best_estimated_essential_matrix = Identity;
for iter = 1 to max_iter_number:
    n_pts = get_n_random_pts(P);    // get a subset of n points from the set of points P. You can use 5, but you can also use more.
    E = compute_essential(n_pts);
    number_of_happy_points = 0;
    for pt in P:
        // we want to know if pt is happy with the computed E
        err = cost_function(pt, E);  // for example x^T F x as you propose, or x^T E x with the essential
        if (err < some_threshold):
            number_of_happy_points += 1;
    if (number_of_happy_points > highest_number_of_happy_points):
        highest_number_of_happy_points = number_of_happy_points;
        best_estimated_essential_matrix = E;
This should do the trick. Usually you set some_threshold experimentally to a low value. There are, of course, more sophisticated variants of RANSAC; you can easily find them by googling.
Your idea of using x^TFx is fine in my opinion.
Once this RANSAC completes, you will have best_estimated_essential_matrix. The outliers are those points whose x^TFx value is greater than your chosen threshold.
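As an illustration, here is a small C++ sketch of the per-point check (your linked code is JavaScript, but the logic is the same). The threshold value is an assumption to be tuned experimentally, and the points are assumed to be in normalized homogeneous coordinates:

    #include <opencv2/core.hpp>
    #include <cmath>

    // Algebraic residual |x2^T * E * x1| for one correspondence (x1, x2),
    // given as homogeneous, normalized image coordinates.
    double essential_residual(const cv::Matx33d& E, const cv::Vec3d& x1, const cv::Vec3d& x2)
    {
        return std::abs(x2.dot(E * x1));
    }

    // A point is an inlier ("happy") if its residual is below the threshold.
    bool is_inlier(const cv::Matx33d& E, const cv::Vec3d& x1, const cv::Vec3d& x2,
                   double threshold = 1e-3)   // assumed value; tune experimentally
    {
        return essential_residual(E, x1, x2) < threshold;
    }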
To answer your final question: yes, a point could produce a different matrix given 4 different other points, because their spatial configuration is different (you can have degenerate situations). In an ideal setting this wouldn't be the case, but we always have noise, matching errors and so on, so what happens in the end is that the equations you obtain from 5 points won't produce exactly the same result as those from 5 other points.
Hope this helps.

Cubic Spline : Start/End Segment interpolation

I'm doing spline interpolation in C++. I've used the code from here: http://tehc0dez.blogspot.ch/2010/04/nice-curves-catmullrom-spline-in-c.html (the code is also linked on that page; it's up on GitHub). The app works just fine for closed contours, since it copies the first three points to the end.
But in my case I need to be able to make an "open" shape, or rather a line, where the first and last points are not connected.
It is my understanding that since the Catmull-Rom spline is cubic, I won't be able to calculate the interpolated points for the first and last segment without adding any additional points.
I read that a common method to interpolate the points in those two segments is to use Quadratic Interpolation.
Unfortunately I can't wrap my head around how to do this. I've found out how to do quadratic Bezier approximation, but this is not what I want to do since I don't want to introduce any additional support points.
I found this site: http://dafeda.wordpress.com/2010/09/01/newtons-divided-difference-polynomial-quadratic-interpolation/ which explains quite nicely how to do quadratic interpolation. But I don't know how to adapt it to my case, where I want to calculate a new point rather than just y.
Any help would be appreciated. Thanks !
The usual way to do this is to add a second copy of your two end points... So if you have a spline passing through A-B-C-D then you will calculate the spline A-A-B-C-D-D.
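A rough C++ sketch of that end-point duplication trick follows. The Point type, sample points, and step count are illustrative assumptions, not taken from the linked code:

    #include <vector>
    #include <cstdio>

    struct Point { double x, y; };

    // Uniform Catmull-Rom interpolation between p1 and p2, with p0 and p3 as neighbours.
    static Point catmullRom(const Point& p0, const Point& p1,
                            const Point& p2, const Point& p3, double t)
    {
        double t2 = t * t, t3 = t2 * t;
        auto blend = [&](double a, double b, double c, double d) {
            return 0.5 * ((2.0 * b) + (-a + c) * t
                          + (2.0 * a - 5.0 * b + 4.0 * c - d) * t2
                          + (-a + 3.0 * b - 3.0 * c + d) * t3);
        };
        return { blend(p0.x, p1.x, p2.x, p3.x), blend(p0.y, p1.y, p2.y, p3.y) };
    }

    int main()
    {
        std::vector<Point> pts = { {0, 0}, {1, 2}, {3, 3}, {4, 0} };   // A-B-C-D

        // Duplicate the end points: A-A-B-C-D-D.
        std::vector<Point> padded;
        padded.push_back(pts.front());
        padded.insert(padded.end(), pts.begin(), pts.end());
        padded.push_back(pts.back());

        // Evaluate every segment between padded[i+1] and padded[i+2].
        const int stepsPerSegment = 10;   // assumption: resolution per segment
        for (std::size_t i = 0; i + 3 < padded.size(); ++i)
            for (int s = 0; s <= stepsPerSegment; ++s) {
                Point p = catmullRom(padded[i], padded[i + 1],
                                     padded[i + 2], padded[i + 3],
                                     (double)s / stepsPerSegment);
                std::printf("%f %f\n", p.x, p.y);
            }
        return 0;
    }

With the padding, the first and last segments get a usable neighbour point, so the open curve passes through A and D without needing quadratic end segments.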
Managed to implement a decent solution thanks to the formula found here: http://www.doc.ic.ac.uk/~dfg/AndysSplineTutorial/Parametrics.html
They also provide a nice Java applet to check out the different parameters.
For my problem I set the t1 value to 0.5 and check if t is above/below this threshold, since I only want to draw one segment of the curve! Works out nicely.

How to check for collisions in the x-y plane

I'm writing a mobile robotics application in C/C++ in Ubuntu and, at the moment, I'm using a laser sensor to scan the environment and detect collisions with objects when the robot moves.
This laser has a scan area of 270° and a maximum radius of 4000mm.
It is able to detect an object within this range and to report its distance from the sensor.
Each distance is reported in polar coordinates, so to get more readable data I convert it from polar to Cartesian coordinates, print the results to a text file, and plot them in MATLAB to see what the laser has detected.
This picture shows a typical detection in Cartesian coordinates.
Values are in meters, so 0.75 is 75 centimeters and 2 is two meters. The contiguous blue points are the detected objects, while the points near (0,0) refer to the laser position and must be discarded. The blue points at y < 0 appear because the laser scan area is 270°; I added the red square (1.5 x 2 meters) to mark the region within which I want to implement the collision check.
So, I would like to detect in real time whether there are points (objects) inside that area and, if yes, call some functions. This is a little bit tricky because the check should also detect whether there are contiguous points, to determine whether the object is real or not (i.e. if it detects a point, it should search for the nearest point to determine whether they compose an object or whether it is only an isolated point, which may be a detection error).
This is the function I use to perform a single scan:
struct point pt[limit*URG_POINTS];
//..
for(i = 0; i < limit; i++){
    for(j = 0; j < URG_POINTS; j++){
        ang2 = kDeg2Rad * ((j * 240 / (double)URG_POINTS) - 120);
        offset = 0.03;                      // it depends on the sensor module [m]
        dis = (double) dist[cnt] / 1000.0;
        // THRESHOLD of RANGE
        // if(dis > MAX_RANGE) dis = 0;     // MAX RANGE = 4[m]
        // if(dis < MIN_RANGE) dis = 0;
        pt[cnt].x = dis * cos(ang2) * cos(ang1) + (offset * sin(ang1));    // <-- X POINTS
        pt[cnt].y = dis * sin(ang2);                                       // <-- Y POINTS
        // pt[cnt].z = dis * cos(ang2) * sin(ang1) - (offset * cos(ang1)); // <-- 3D mapping disabled at the moment
        cnt++;
    }
    ang1 += diff;
}
After each single scan, pt contains all the detected points in x-y coordinates.
I'd like to do something like this:
perform a single scan; then, at the end,
apply the collision check to each pt.x and pt.y;
if you find a point in the inner region, check for other nearby points; if there are any, stop the robot;
if not, or if no other nearby points are found, start another scan.
I'd like to know how to easily check for objects (composed of more than one point) inside the previously defined region.
Can you help me, please?
It seems very difficult for me :(
I don't think I can give a complete answer, but a few thoughts on where it might be possible to go.
What do you mean by realtime? How long may any given algorithm take to run? And what processor does your program run on?
Filtering the points that are within your detection area should be quite easy, just by checking whether abs(x) < 0.75 && y > 0 && y < 2. Furthermore, you should only consider points that are far enough away from the origin, i.e. x^2 + y^2 > d for some suitable threshold d.
But that should be the trivial part.
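For instance, a minimal sketch of that filter in C++ might look like this. The point struct is re-declared here so the snippet is self-contained, and the minimum-distance value is an assumption that should match how close to the sensor you want to discard returns:

    #include <vector>
    #include <cmath>

    struct point { double x, y; };

    std::vector<point> pointsInRegion(const point* pts, int count)
    {
        const double halfWidth = 0.75;   // 1.5 m box width / 2
        const double depth     = 2.0;    // box depth in front of the sensor
        const double minDist   = 0.05;   // discard returns too close to the laser (assumption)

        std::vector<point> inside;
        for (int i = 0; i < count; ++i) {
            const point& p = pts[i];
            if (std::fabs(p.x) < halfWidth && p.y > 0.0 && p.y < depth &&
                (p.x * p.x + p.y * p.y) > minDist * minDist)
                inside.push_back(p);
        }
        return inside;
    }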
It gets more interesting when detecting groups of points. DBSCAN has proven to be a fairly good clustering algorithm for detecting 2-dimensional groups of points. The critical question here is whether DBSCAN is fast enough for real-time applications.
If not, you might have to think about optimizing the algorithm (you can push its complexity down to n*log(n) using some clever indexing structures).
Furthermore, it might be worth thinking about how you can incorporate the knowledge you have from your last iteration (assuming a high scan frequency, the data points should not change too much).
It might be worth looking at other robotics projects - I could imagine the problem of interpreting sensor data to construct information of the surroundings is a rather common one.
UPDATE
It is fairly difficult to give you good advice without knowing where you are stumbling in applying DBSCAN to your problem. But let me try to give a step-by-step guide of how such an algorithm may work:
For each data point you receive, check whether it is in the region you want to observe (the conditions given above should work).
If the data point is within the region, save it to some sort of list.
After reading all data points, check whether the list is empty. If so, everything is good. Otherwise you have to check whether there are bigger groups of data points that you have to navigate around.
Now comes the more difficult part. You run DBSCAN on those points and try to find groups of points. Which parameters will work for the algorithm I do not know; that has to be tried. After that you should have some clusters of points. I'm not totally sure what you will do with the groups; one idea would be to find, for each group, the points with the minimum and maximum angle in polar coordinates. That way you can decide how far you have to turn your vehicle. Special care has to be taken if two groups are so close together that it is not possible to navigate through the gap between them, as in the sketch below.
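To make that last step a bit more concrete, here is a small C++ sketch of the per-cluster angle bookkeeping, assuming you already have cluster labels from whichever DBSCAN implementation you choose (label -1 meaning noise); the types and names are illustrative:

    #include <cmath>
    #include <cstddef>
    #include <map>
    #include <vector>

    struct point { double x, y; };

    struct AngularExtent { double minAngle, maxAngle; };

    // For each cluster id, compute the min/max bearing (atan2) of its points,
    // i.e. how far left/right the obstacle extends as seen from the sensor.
    std::map<int, AngularExtent> clusterExtents(const std::vector<point>& pts,
                                                const std::vector<int>& labels)
    {
        std::map<int, AngularExtent> extents;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            if (labels[i] < 0) continue;                 // skip noise points
            double angle = std::atan2(pts[i].y, pts[i].x);
            auto it = extents.find(labels[i]);
            if (it == extents.end())
                extents[labels[i]] = { angle, angle };
            else {
                if (angle < it->second.minAngle) it->second.minAngle = angle;
                if (angle > it->second.maxAngle) it->second.maxAngle = angle;
            }
        }
        return extents;
    }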
For an implementation of DBSCAN you could look here, or just ask Google for help. It is a fairly common algorithm that has been coded thousands of times. For further speed optimizations it might be helpful to create your own implementation; however, if one of the implementations you find seems usable, I would try that first before going all the way and implementing it myself.
If you stumble on specific problems while implementing the algorithm, I would suggest creating a new question, as that is quite far removed from this one and you may reach more people who are willing to help you.
I hope things are a bit clearer now. If not please give the exact point that you have doubts about.

Finding the spread of each cluster from Kmeans

I'm trying to detect how well an input vector fits a given cluster centre. I can find the best match quite easily (the centre with the minimum euclidean distance to the input vector is the best), however, I now need to work how good a match that is.
To do this I need to find the spread (standard deviation?) of the vectors which build up the centroid, then see whether the distance from my input vector to the centre is less than the spread. If it's more than the spread, then I should be able to say that I have no clusters that fit it (given that the best one doesn't fit the input vector well).
I'm not sure how to find the spread per cluster. I have all the centre vectors, and all the training vectors are labelled with their closest cluster, I just can't quite fathom exactly what I need to do to get the spread.
I hope that's clear? If not I'll try to reword it!
TIA
Ian
Use your distance function to calculate the distance from the cluster centre to each point labelled with that cluster, then take the root mean square of those distances (the square root of the average squared distance). That gives you a standard-deviation-like measure of the cluster's spread; the plain mean of the distances is a related but slightly different measure.
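A small C++ sketch of that computation (the data layout and names are assumptions: each training vector is a std::vector<double>, and labels[i] stores the index of the centre that vector i was assigned to):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    double squaredDistance(const std::vector<double>& a, const std::vector<double>& b)
    {
        double s = 0.0;
        for (std::size_t d = 0; d < a.size(); ++d) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // Spread (RMS distance to the centre) of one cluster.
    double clusterSpread(const std::vector<std::vector<double>>& data,
                         const std::vector<int>& labels,
                         const std::vector<double>& centre, int clusterId)
    {
        double sumSq = 0.0;
        int n = 0;
        for (std::size_t i = 0; i < data.size(); ++i)
            if (labels[i] == clusterId) {
                sumSq += squaredDistance(data[i], centre);
                ++n;
            }
        return n > 0 ? std::sqrt(sumSq / n) : 0.0;
    }

Your input vector then "fits" the best-matching cluster if its distance to that centre is below the spread (or below k times the spread for a looser test).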
If you switch to using a different algorithm, such as Mixture of Gaussians, you get the spread (e.g., std. deviation) as part of the model (clustering result).
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/mixture.html
http://en.wikipedia.org/wiki/Mixture_model