Select all points in a matrix within 30m of another point

Select all points in a matrix within 30m of another point - c++

So if you look at my other posts, it's no surprise I'm building a robot that can collect data in a forest, and stick it on a map. We have algorithms that can detect tree centers and trunk diameters and can stick them on a cartesian XY plane.
We're planning to use certain 'key' trees as natural landmarks for localizing the robot, using triangulation and trilateration among other methods, but programming this and keeping data straight and efficient is getting difficult using just Matlab.
Is there a technique for sub-setting an array or matrix of points? Say I have 1000 trees stored over 1km (1000m), is there a way to say, select only points within 30m radius of my current location and work only with those?
I would just use a GIS, but I'm doing this in Matlab and I'm unaware of any GIS plugins for Matlab.
I forgot to mention, this code is going online, meaning it's going on a robot for real-time execution. I don't know if, as the map grows to several miles, using a different data structure will help or if calculating every distance to a random point is what a spatial database is going to do anyway.
I'm thinking of mirroring the array of trees, into two arrays, one sorted by X and the other by Y. Then bubble sorting to determine the 30m range in that. I do the same for both arrays, X and Y, and then have a third cross link table that will select the individual values. But I don't know, what that's called, how to program that and I'm sure someone already has so I don't want to reinvent the wheel.
Cartesian Plane
GIS

You are looking for a spatial database like a quadtree or a kd-tree. I found two kd-tree implementations here and here, but didn't find any quadtree implementations for Matlab.

The simple solution of calculating all the distances and scanning through seems to run almost instantaneously:
lim = 1;
num_trees = 1000;
trees = randn(num_trees,2); %# list of trees as Nx2 matrix
cur = randn(1,2); %# current point as 1x2 vector
dists = hypot(trees(:,1) - cur(1), trees(:,2) - cur(2)); %# distance from all trees to current point
nearby = tree_ary((dists <= lim),:); %# find the nearby trees, pull them from the original matrix
On a 1.2 GHz machine, I can process 1 million trees (1 MTree?) in < 0.4 seconds.
Are you running the Matlab code directly on the robot? Are you using the Real-Time Workshop or something? If you need to translate this to C, you can replace hypot with sqr(trees[i].x - pos.x) + sqr(trees[i].y - pos.y), and replace the limit check with < lim^2. If you really only need to deal with 1 KTree, I don't know that it's worth your while to implement a more complicated data structure.

You can transform you cartesian coordinates into polar coordinates with CART2POL. Then selecting points inside certain radius will be strait-forward.
[THETA,RHO] = cart2pol(X-X0,Y-Y0);
selected = RHO < 30;
where X0, Y0 are coordinates of the current location.

My guess is that trees are distributed roughly evenly through the forest. If that is the case, simply use 30x30 (or 15x15) grid blocks as hash keys into an closed hash table. Look up the keys for all blocks intersecting the search circle, and check all hash entries starting at that key until one is flagged as the last in its "bucket."
0---------10---------20---------30--------40---------50----- address # line
(0,0) (0,30) (0,60) (30,0) (30,30) (30,60) hash key values
(1,3) (10,15) (3,46) (24,9.) (23,65.) (15,55.) tree coordinates + "." flag
For example, to get the trees in (0,0)…(30,30), map (0,0) to the address 0 and read entries (1,3), (10,15), reject (3,46) because it's out of bounds, read (24,9), and stop because it's flagged as the last tree in that sector.
To get trees in (0,60)…(30,90), map (0,60) to address 20. Skip (24, 9), read (23, 65), and stop as it's last.
This will be quite memory efficient as it avoids storing pointers, which would otherwise be of considerable size relative to the actual data. Nevertheless, closed hashing requires leaving some empty space.
The illustration isn't "to scale" as in reality there would be space for several entries between the hash key markers. So you shouldn't have to skip any entries unless there are more trees than average in a local preceding sector.
This does use hash collisions to your advantage, so it's not as random as a hash function typically is. (Not every entry corresponds to a distinct hash value.) However, as dense sections of forest will often be adjacent, you should randomize the mapping of sectors to "buckets," so a given dense sector will hopefully overflow into a less dense one, or the next, or the next.
Additionally, there is the issue of empty sectors and terminating iteration. You could insert a dummy tree into each sector to mark it as empty, or some other simple hack.
Sorry for the long explanation. This kind of thing is simpler to implement than to document. But the performance and the footprint can be excellent.

Use some sort of spatially partitioned data structure. A simple solution would be to simply create a 2d array of lists containing all objects within a 30m x 30m region. Worst case is then that you only need to compare against the objects in four of those lists.
Plenty of more complex (and potentially beneficial) solutions could also be used - something like bi-trees are a bit more complex to implement (not by much though), but could get more optimum performance (especially in cases where the density of objects varies considerably).

You could look at the voronoi diagram support in matlab:
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/voronoi.html
If you base the voronoi polygons on your key trees, and cluster the neighbouring trees into those polygons, that would partition your search space by proximity (finding the enclosing polygon for a given non-key point is fast), but ultimately you're going to get down to computing key to non-key distances by pythagoras or trig and comparing them.
For a few thousand points (trees) brute force might be fast enough if you have a reasonable processor on board. Compute the distance of every other tree from tree n, then select those within 30'. This is the same as having all trees in the same voronoi polygon.
Its been a few years since I worked in GIS but I found the following useful: 'Computational Geometry In C' Joseph O Rourke, ISBN 0-521-44592-2 Paperback.

Related

Searching for geometric shape on cartesian plane by coordinates

I have an algorithmic problem on a Cartesian plane.. I need to efficiently search for geometric shapes that intersect with a given point. There are several shapes(rectangle, circle, triangle and polygon) but those are not important, because the determining the actual point inclusion is not a problem here, I will implement those on my own. The problem lies in determining which shapes need to be verified for the inclusion with the given point. Iterating through all of my shapes on plane and running the point inclusion method on each one of them is inefficient as the number of instances of shapes will be quite large. My first idea was to divide the plane for segments(the plane is finite, but too large for any kind of 3D array) and when adding a shape to the database, i would determine which segments it would intersect with and save them within object of the shape. Then when the point for inclusion verification is given, I would only need to determine the segment in which the point is located and then verify the inclusion only with objects which intersect with that segment.
Is that the way to go? I don't know if the method I described is optimal or if i am not missing something. Any help would be appreciated..
Thanks in advance
P.S.: I will be writing this in C++. That is not really relevant as it is more of an algorithmic problem but I wanted to put that out if someone was curious...

The gridding approach can be used here.
See the plane as a raster image where you draw all your shapes using a scan conversion algorithm, making sure that all pixels even partially covered are filled. For every image pixel, keep a list of the shapes that filled it.
A query is then straightforward: find the pixel where the query point falls in time O(1) and check every shape in the list, in time O(K), where K is the list length, approximately equal to the number of intersecting shapes.
If your image is made of N² pixels and you have M objects having an average area A pixels, you will need to store N²+M.A list elements (a shape identifier + a link to the next). You will choose the pixel size to achieve a good compromise between accuracy and storage cost. In any case, you must limit yourself to N²<Q.M, where Q is the total number of queries, otherwise the cost of just initializing the image could exceed the total query time.
In case your scene is very sparse (more voids than shapes), you can use a compressed representation of the image, using a quadtree.

C++: Find neighbouring grid points from calibration picture from unsorted list

I do have 4 lists of the x and y coordinates of calibration points. Those are in no particular order and not alligned on any axis (they come from a real calibration picture with slight rotation and distortion) but the lists have the same indexing and cannot be sorted in such a way that each list is ascending/descending. They also hold no integer values but floating point. I am now trying to find the four neighbouring points for a given point.
E.g. searching for the neighbours of the point [150,150] would return [140,140], [140,160], [160,140], [160,160] (except for them actually beeing more like [139.581239,138.28812]).
At the moment I have to look through all calibration points for each point to check. There are about 500 calibration points.
Later during the process, I need to know the 4 neighbours for a random point within the 1600x1400 grid for multiple million times. So it is crucial to find those points as fast as possible to avoid calculation time of days or even weeks.
My first approach was checking each of the ~500 calibration points for each point to check and look at their relative position to the checking point (x_calib > x and y_calib > y would be somewhere in the top, right region of the point) and calculate their distance to it. The closest point in each region (top left, top right, lower left, lower right) would then be the respective neighbour point. That seems not the be efficient at all and takes a lot of time.
The second approach was creating a rainbow table for each of the 1600x1400 points and save the respective neighbours (to be exact, to save the index in the list of coordinates). Later on, the process would check this rainbow table at position [x,y,0], [x,y,1], [x,y,2] and [x,y,3] to get the 4 indices of the 4 neighbour points. Though calculating the rainbow table takes some time (~20 minutes for those ~2 million points), this approach speeds up the later processing. Unfortunatelly, this approach makes it difficult to debug the later steps of the process because it takes this much time before the rest even starts..
I still think there should be room for optimization and I would appreciate any suggestion or help to speed up the whole thing. I allready read about the kd-tree thing but did not quite see the possibility to use it here. I'm hoping that there's an approach for this kind of unsorted (and unsortable) list of points which is more efficient than the rainbow table - or which is at least faster at creating the table.
Thanks in advance!

'Stable' multi-dimensional scaling algorithm

I have a wireless mesh network of nodes, each of which is capable of reporting its 'distance' to its neighbors, measured in (simplified) signal strength to them. The nodes are geographically in 3d space but because of radio interference, the distance between nodes need not be trigonometrically (trigonomically?) consistent. I.e., given nodes A, B and C, the distance between A and B might be 10, between A and C also 10, yet between B and C 100.
What I want to do is visualize the logical network layout in terms of connectness of nodes, i.e. include the logical distance between nodes in the visual.
So far my research has shown the multidimensional scaling (MDS) is designed for exactly this sort of thing. Given that my data can be directly expressed as a 2d distance matrix, it's even a simpler form of the more general MDS.
Now, there seem to be many MDS algorithms, see e.g. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html and http://tapkee.lisitsyn.me/ . I need to do this in C++ and I'm hoping I can use a ready-made component, i.e. not have to re-implement an algo from a paper. So, I thought this: https://sites.google.com/site/simpmatrix/ would be the ticket. And it works, but:
The layout is not stable, i.e. every time the algorithm is re-run, the position of the nodes changes (see differences between image 1 and 2 below - this is from having been run twice, without any further changes). This is due to the initialization matrix (which contains the initial location of each node, which the algorithm then iteratively corrects) that is passed to this algorithm - I pass an empty one and then the implementation derives a random one. In general, the layout does approach the layout I expected from the given input data. Furthermore, between different runs, the direction of nodes (clockwise or counterclockwise) can change. See image 3 below.
The 'solution' I thought was obvious, was to pass a stable default initialization matrix. But when I put all nodes initially in the same place, they're not moved at all; when I put them on one axis (node 0 at 0,0 ; node 1 at 1,0 ; node 2 at 2,0 etc.), they are moved along that axis only. (see image 4 below). The relative distances between them are OK, though.
So it seems like this algorithm only changes distance between nodes, but doesn't change their location.
Thanks for reading this far - my questions are (I'd be happy to get just one or a few of them answered as each of them might give me a clue as to what direction to continue in):
Where can I find more information on the properties of each of the many MDS algorithms?
Is there an algorithm that derives the complete location of each node in a network, without having to pass an initial position for each node?
Is there a solid way to estimate the location of each point so that the algorithm can then correctly scale the distance between them? I have no geographic location of each of these nodes, that is the whole point of this exercise.
Are there any algorithms to keep the 'angle' at which the network is derived constant between runs?
If all else fails, my next option is going to be to use the algorithm I mentioned above, increase the number of iterations to keep the variability between runs at around a few pixels (I'd have to experiment with how many iterations that would take), then 'rotate' each node around node 0 to, for example, align nodes 0 and 1 on a horizontal line from left to right; that way, I would 'correct' the location of the points after their relative distances have been determined by the MDS algorithm. I would have to correct for the order of connected nodes (clockwise or counterclockwise) around each node as well. This might become hairy quite quickly.
Obviously I'd prefer a stable algorithmic solution - increasing iterations to smooth out the randomness is not very reliable.
Thanks.
EDIT: I was referred to cs.stackexchange.com and some comments have been made there; for algorithmic suggestions, please see https://cs.stackexchange.com/questions/18439/stable-multi-dimensional-scaling-algorithm .
Image 1 - with random initialization matrix:
Image 2 - after running with same input data, rotated when compared to 1:
Image 3 - same as previous 2, but nodes 1-3 are in another direction:
Image 4 - with the initial layout of the nodes on one line, their position on the y axis isn't changed:

Most scaling algorithms effectively set "springs" between nodes, where the resting length of the spring is the desired length of the edge. They then attempt to minimize the energy of the system of springs. When you initialize all the nodes on top of each other though, the amount of energy released when any one node is moved is the same in every direction. So the gradient of energy with respect to each node's position is zero, so the algorithm leaves the node where it is. Similarly if you start them all in a straight line, the gradient is always along that line, so the nodes are only ever moved along it.
(That's a flawed explanation in many respects, but it works for an intuition)
Try initializing the nodes to lie on the unit circle, on a grid or in any other fashion such that they aren't all co-linear. Assuming the library algorithm's update scheme is deterministic, that should give you reproducible visualizations and avoid degeneracy conditions.
If the library is non-deterministic, either find another library which is deterministic, or open up the source code and replace the randomness generator with a PRNG initialized with a fixed seed. I'd recommend the former option though, as other, more advanced libraries should allow you to set edges you want to "ignore" too.

I have read the codes of the "SimpleMatrix" MDS library and found that it use a random permutation matrix to decide the order of points. After fix the permutation order (just use srand(12345) instead of srand(time(0))), the result of the same data is unchanged.

Obviously there's no exact solution in general to this problem; with just 4 nodes ABCD and distances AB=BC=AC=AD=BD=1 CD=10 you cannot clearly draw a suitable 2D diagram (and not even a 3D one).
What those algorithms do is just placing springs between the nodes and then simulate a repulsion/attraction (depending on if the spring is shorter or longer than prescribed distance) probably also adding spatial friction to avoid resonance and explosion.
To keep a "stable" diagram just build a solution and then only update the distances, re-using the current position from previous solution as starting point. Picking two fixed nodes and aligning them seems a good idea to prevent a slow drift but I'd say that spring forces never end up creating a rotational momentum and thus I'd expect that just scaling and centering the solution should be enough anyway.

Ray-mesh intersection or AABB tree implementation in C++ with little overhead?

Can you recommend me...
either a proven lightweight C / C++ implementation of an AABB tree?
or, alternatively, another efficient data-structure, plus a lightweight C / C++ implementation, to solve the problem of intersecting a large number of rays with a large number of triangles?
"Large number" means several 100k for both rays and triangles.
I am aware that AABB trees are part of the CGAL library and probably of game physics libraries like Bullet. However, I don't want the overhead of an enormous additional library in my project. Ideally, I'd like to use a small float-type templated header-only implementation. I would also go for something with a bunch of CPP files, as long as it integrated easily in my project. Dependency on boost is ok.
Yes, I have googled, but without success.
I should mention that my application context is mesh processing, and not rendering. In a nutshell, I'm transferring the topology of a reference mesh to the geometry of a mesh from a 3D scan. I'm shooting rays from vertices and along the normals of the reference mesh towards the 3D scan, and I need to recover the intersection of these rays with the scan.
Edit
Several answers / comments pointed to nearest-neighbor data structures. I have created a small illustration regarding the problems that arise when ray-mesh intersections are approached with nearest neighbor methods. Nearest neighbors methods can be used as heuristics that work in many cases, but I'm not convinced that they actually solve the problem systematically, like AABB trees do.

While this code is a bit old and using the 3DS Max SDK, it gives a fairly good tree system for object-object collision deformations in C++. Can't tell at a glance if it is Quad-tree, AABB-tree, or even OBB-tree (comments are a bit skimpy too).
http://www.max3dstuff.com/max4/objectDeform/help.html
It will require translation from Max to your own system but it may be worth the effort.

Try the ANN library:
http://www.cs.umd.edu/~mount/ANN/
It's "Approximate Nearest Neighbors". I know, you're looking for something slightly different, but here's how you can use this to speed up your data processing:
Feed points into ANN.
Query a user-selectable (think of this as a "per-mesh knob") radius around each vertex that you want to ray-cast from and find out the mesh vertices that are within range.
Select only the triangles that are within that range, and ray trace along the normal to find the one you want.
By judiciously choosing the search radius, you will definitely get a sizable speed-up without compromising on accuracy.

If there's no real time requirements, I'd first try brute force.
1M * 1M ray->triangle tests shouldn't take much more than a few minutes to run (in CPU).
If that's a problem, the second best thing to do would be to restrict the search area by calculating a adjacency graph/relation between the triangles/polygons in the target mesh. After an initial guess fails, one can try the adjacent triangles. This of course relies on lack of self occlusion / multiple hit points. (which I think is one interpretation of "visibility doesn't apply to this problem").
Also depending on how pathological the topologies are, one could try environment mapping the target mesh on a unit cube (each pixel would consists of a list of triangles projected on it) and test the initial candidate by a single ray->aabb test + lookup.
Given the feedback, there's one more simple option to consider -- space partitioning to simple 3D grid, where each dimension can be subdivided by the histogram of the x/y/z locations or even regularly.
100x100x100 grid is of very manageable size of 1e6 entries
the maximum number of cubes to visit is proportional to the diameter (max 300)
There are ~60000 extreme cells, which suggests an order of 10 triangles per cell
caveats: triangles must be placed on every cell they occupy
-- a conservative algorithm places them to cells they don't belong to; large triangles will probably require clipping and reassembly.

Minimizing Sum of Distances: Optimization Problem

The actual question goes like this:
McDonald's is planning to open a number of joints (say n) along a straight highway. These joints require warehouses to store their food. A warehouse can store food for any number of joints, but has to be located at one of the joints only. McD has a limited number of warehouses (say k) available, and wants to place them in such a way that the average distance of joints from their nearest warehouse is minimized.
Given an array (n elements) of coordinates of the joints and an integer 'k', return an array of 'k' elements giving the coordinates of the optimal positioning of warehouses.
Sorry, I don't have any examples available since I'm writing this down from memory. Anyway, one sample could be:
array={1,3,4,5,7,7,8,10,11} (n=9)
k=1
Ans: {7}
This is what I've been thinking: For k=1, we can simply find out the median of the set, which would give the optimal location of the warehouse. However, for k>1, the given set should be divided into 'k' subsets (disjoint, and of contiguous elements of the superset), and median for each subset would give the warehouse locations. However, I don't understand on what basis the 'k' subsets should be formed. Thanks in advance.
EDIT: There's a variation to this problem also: Instead of sum/avg, minimize the maximum distance between a joint and its closest warehouse. I don't get this either..

The straight highway makes this an exercise in dynamic programming, working from left to right along the highway. A partial solution can be described by the location of the rightmost warehouse and the number of warehouses placed. The cost of the partial solution will be the total distance to the nearest warehouse (for fixed k minimising this is the same as minimising the averge) or the maximum distance so far to the closest warehouse.
At each stage you have worked out the answers for the leftmost N joints and have them indexed by number of warehouses used and position of the rightmost warehouse - you need to save only the best cost. Now consider the next joint and work out the best solution for N+1 joints and all possible values of k and rightmost warehouse, using the answers you have stored for N joints to speed this up. Once you have worked out the best cost solution covering all the joints you know where its rightmost warehouse is, which gives you the location of one warehouse. Go back to the solution that has that warehouse as the rightmost joint and find out what solution that was based on. That gives you one more rightmost warehouse - and so you can work your way back to the location of all the warehouses for the best solution.
I tend to get the cost of working this out wrong, but with N joints and k warehouses to place you have N steps to take, each of the based on considering no more than Nk previous solutions, so I reckon cost is kN^2.

This is NOT a clustering problem, it's a special case of a facility location problem. You can solve it using a general integer / linear programming package, but because the problem is on a line, there may be more efficient (and less expensive software-wise) algorithms that would work. You might consider dynamic programming since there are probably combination of facilities that could be eliminated rather quickly. Look into the P-Median problem for more info.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js