I have a grid of a fixed size. I'm trying to tile it into
multiple rectangles of different sizes.
The size difference is needed because I have to balance
the content of each rectangle in order to parallelize
the process and have balanced threads.
The grid contains individuals, so some cells may be free,
others may contain different types of individuals.
I've been told to look into "Plane tiling algorithms" but
I can't seem to find anything or can't find the right term
for what I'm looking for.
It looks to me like you want to do space subdivision in order to have some kind of search operations up and running in a balanced way.
For that, there are Quadtrees available, balanced Binary Space Partitioning Trees, and K-d trees.
If you need optimized parallel space partitioning, there is a great, fairly recent article about a parallelized octree, together with a related open-source library, from the University of Bonn, Germany.
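To make the idea concrete, here is a minimal sketch of a k-d-style recursive bisection of the grid, where each cut is placed so that roughly half of the occupied cells land on each side. The names (`Grid`, `Rect`, `countOccupied`, `splitBalanced`) and the convention that a non-zero cell means "occupied" are assumptions made for illustration; they are not taken from the question or from any library.

```cpp
#include <cassert>
#include <vector>

// Assumption: grid[y][x] != 0 means the cell holds an individual.
using Grid = std::vector<std::vector<int>>;

// Half-open rectangle: [x0, x1) x [y0, y1).
struct Rect { int x0, y0, x1, y1; };

int countOccupied(const Grid& g, const Rect& r) {
    int n = 0;
    for (int y = r.y0; y < r.y1; ++y)
        for (int x = r.x0; x < r.x1; ++x)
            if (g[y][x] != 0) ++n;
    return n;
}

// Bisect `r` along its longer side at the first cut where the lower part
// holds at least half of the occupied cells, then recurse.
// A depth of d yields at most 2^d tiles.
void splitBalanced(const Grid& g, Rect r, int depth, std::vector<Rect>& out) {
    assert(r.x0 < r.x1 && r.y0 < r.y1);
    const int w = r.x1 - r.x0, h = r.y1 - r.y0;
    if (depth == 0 || (w == 1 && h == 1)) { out.push_back(r); return; }

    const bool cutX = w >= h;                 // cut across the longer side
    const int total = countOccupied(g, r);
    const int lo = cutX ? r.x0 : r.y0;
    const int hi = cutX ? r.x1 : r.y1;

    int cut = lo + 1, acc = 0;
    for (int c = lo; c < hi - 1; ++c) {       // keeps both halves non-empty
        const Rect strip = cutX ? Rect{c, r.y0, c + 1, r.y1}
                                : Rect{r.x0, c, r.x1, c + 1};
        acc += countOccupied(g, strip);
        cut = c + 1;
        if (2 * acc >= total) break;
    }

    Rect a = r, b = r;
    if (cutX) { a.x1 = cut; b.x0 = cut; } else { a.y1 = cut; b.y0 = cut; }
    splitBalanced(g, a, depth - 1, out);
    splitBalanced(g, b, depth - 1, out);
}
```

The resulting rectangles differ in size but hold similar numbers of individuals, so each one could be handed to its own thread. Counting strip by strip is deliberately naive; it only shows where the cuts go.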
Setup
Function will need to provide the distance from a point to the closest edge of a polygon
Point is known to be inside of the polygon
Polygon can be convex or concave
Many points (millions) will need to be tested
Many separate polygons (dozens) will need to be run through the function per point
Precalculated and persistently stored data structures are an option.
The final search function will be in C++
For the function implementation, I know a simple method would be to test the distance to all segments of the polygon using standard distance to line segment formulas. This option would be fairly slow at scale and I am confident there should be a better option.
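For reference, the simple method described above could look roughly like this. It is a sketch of the brute-force baseline only, assuming the polygon is stored as a closed ring of vertices; `Point`, `distSqToSegment` and `distToPolygonEdge` are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };

// Squared distance from p to the segment a-b (projection clamped to the segment).
double distSqToSegment(const Point& p, const Point& a, const Point& b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double lenSq = dx * dx + dy * dy;
    double t = 0.0;                            // degenerate segment: use a
    if (lenSq > 0.0) {
        t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq;
        t = std::max(0.0, std::min(1.0, t));
    }
    const double cx = a.x + t * dx - p.x;
    const double cy = a.y + t * dy - p.y;
    return cx * cx + cy * cy;
}

// Brute force: test every edge of the ring, take one square root at the end.
double distToPolygonEdge(const Point& p, const std::vector<Point>& poly) {
    assert(poly.size() >= 2);
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Point& a = poly[i];
        const Point& b = poly[(i + 1) % poly.size()];  // wrap to close the ring
        best = std::min(best, distSqToSegment(p, a, b));
    }
    return std::sqrt(best);
}
```

This costs one pass over all edges per point and per polygon, which is the cost any spatial index would have to beat. It also works as a reference implementation for checking the results of a faster version.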
My gut instinct is that there should be some very fast known algorithms for this type of function that would have been implemented in a game engine, but I'm not sure where to look.
I've found a reference for storing line segments in a quadtree, which would provide for very rapid searching and I think it could be used for my purpose to quickly narrow down which segment to look at as the closest segment and then would only need to calculate the distance to one line segment.
https://people.cs.vt.edu/~shaffer/Papers/SametCVPR85.pdf
I've not been able to locate any code examples for how this would work. I don't mind implementing algorithms from scratch, but don't see the point in doing so if a working, tested code base exists.
I've been looking at a couple quadtree implementations and I think the way it would work is to create a quadtree per polygon and insert each polygon's line segments with a bounding box into the quadtree for that polygon.
The "query" portion of the function I would be making would then consist of creating a point as a very small bounding box, which would then be used to search against the quadtree structure, which would then find only the very closest portions of the polygon.
http://www.codeproject.com/Articles/30535/A-Simple-QuadTree-Implementation-in-C
and
https://github.com/Esri/geometry-api-java/blob/master/src/main/java/com/esri/core/geometry/QuadTree.java
My real question would be, does this seem like a sound approach for a fast search time function?
Is there something that would work faster?
EDIT:
I've been looking around and found some issues with using a quadtree. The way quadtrees work is good for collision detection, but they aren't set up to allow for efficient nearest neighbor searching.
https://gamedev.stackexchange.com/questions/14373/in-2d-how-do-i-efficiently-find-the-nearest-object-to-a-point
R-Trees look to be a better option.
https://en.wikipedia.org/wiki/R-tree
and
efficient way to handle 2d line segments
Based on those posts, R-trees look like the winner. Also handy to see that C++ Boost already has them implemented. This looks close enough to what I was planning on doing that I'll go ahead and implement it and verify the results.
EDIT:
Since I have implemented a PMR quadtree, I now see that the nearest neighbour search is a bit more complex than I described.
If the quad search result for the search point is empty, then it gets more complex.
I remember a description of this somewhere in Hanan Samet's work on multidimensional search structures.
When giving the answer below, I had in mind searching for all objects within a specified distance. This is easy for the PMR quadtree, but just finding the closest is more complex.
Edit End
I would not use a R-Tree.
The weak point (and the strong point!) on R-trees is the separation of the space into rectangles.
There are three algorithms known to do that separation but none is well suited for all situations.
R-trees are really complex to implement.
Why then do it? Only because R-trees can be twice as fast as a quadtree when perfectly implemented. The speed difference between a quadtree and an R-tree is not relevant. The monetary difference is. (If you have working code for both, I would use the PMR quadtree; if you only have code for the R-tree, then use that; if you have neither, use the PMR quadtree.)
Quad trees (PMR) always work, and they are simple to implement.
Using the PMR quadtree, you just find all segments related to the search point. The result will be a few segments; then you just check them and you are done.
People who say quadtrees are not suited for neighbour search do not know that there are hundreds of different quadtrees. The unsuitability is only true for a point quadtree, not for the PMR one, which stores bounding boxes.
I remember the complex description of finding the neighbour points in a point quadtree. For the PMR quadtree I had nothing to do (for a search within a specified rectangular interval): no code change, just iterate over the result and find the closest.
I think that there are even better solutions than a quadtree or R-tree for your specific question, but the point is that the PMR quadtree always works. Just implement it once and use it for all spatial searches.
Since there are so many more points to test than polygons, you could consider doing some fairly extensive pre-processing of the polygons in order to reduce the average number of tests needed to find the nearest line segment per point.
Consider an approach like this (assumes polygons have no holes):
Walk the edges of the polygon and define line segments along each equidistant line
Test which side of the line segment a point is to restrict the potential set of closest line segments
Build an arithmetic coding tree with each test weighted by the amount of space that is culled by the half-space of the line segment. This should give good average performance in determining the closest segment for a point and open up the possibility of parallel testing over multiple points at once.
This diagram should illustrate the concept. The blue lines define the polygon and the red lines are the equidistant lines.
Notice that needing to support concave polygons greatly increases the complexity, as illustrated by the 6-7-8 region. Concave regions mean that the line segments that extend to infinity may be defined by vertices that are arbitrarily far apart.
You could decompose this problem by fitting a convex hull to the polygon and then doing a fast, convex test for most points and only doing additional work on points that are within the "region of influence" of the concave region, but I am not sure if there is a fast way to calculate that test.
I am not sure how great the quadtree algorithm you posed would be, so I will let someone else comment on that, but I had a thought on something that might be fast and robust.
My thought is you could represent a polygon by a KD-Tree (assuming the vertices are static in time) and then find the nearest two vertices, doing a nearest 2 neighbor search, to whatever the point is that lies in this polygon. These two vertices should be the ones that create the nearest line segment, regardless of convexity, if my thinking is correct.
I have a large graph (the number of vertices can be in the range of 50,000-100,000, the adjacency matrix need not be sparse). Edges in the graph can be removed/added, and I want to update the resulting connected components structure after such changes. I have implemented this in a straightforward fashion with a BFS search myself in C++ (keeping track of unordered_maps of vertices to connected component ids and updating them), but I am wondering if there is a more efficient way to do this using Boost's graph library.
I was able to find some questions similar to this here on Stack Overflow, and came to know of filtered_graph (and the connected_components function), but I am worried about the overhead involved in creating such filtered instances every time we add or remove an edge. (Or should this be a concern at all?!)
I believe your solution is essentially the best possible. If you are only allowed to add edges, then I believe the algorithm can be improved by keeping track of connected components in terms of vertices included, and then when an edge is included you check to see if the two vertices belong to different connected components, in which case you merge the two connected components. This will reduce the complexity from quadratic to best-case per edge added. However, if you are allowed to insert and delete edges, I don't see any asymptotically faster way to solve the problem other than what you described.
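For the insert-only case, the merge step described above is what a disjoint-set (union-find) structure does. The class below is a hand-written sketch with made-up names (`DisjointSets`, `addEdge`), not the Boost API, and it does not handle edge removal at all.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Tracks connected components while edges are only ever added.
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n)
        : parent_(n), size_(n, 1), components_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    // Representative of the component containing v (with path halving).
    std::size_t find(std::size_t v) {
        assert(v < parent_.size());
        while (parent_[v] != v) {
            parent_[v] = parent_[parent_[v]];
            v = parent_[v];
        }
        return v;
    }

    // Adds edge u-v; returns true if it merged two different components.
    bool addEdge(std::size_t u, std::size_t v) {
        std::size_t a = find(u), b = find(v);
        if (a == b) return false;
        if (size_[a] < size_[b]) std::swap(a, b);  // attach smaller to larger
        parent_[b] = a;
        size_[a] += size_[b];
        --components_;
        return true;
    }

    std::size_t components() const { return components_; }

private:
    std::vector<std::size_t> parent_;
    std::vector<std::size_t> size_;
    std::size_t components_;
};
```

Each `addEdge` touches only the two components involved, so no BFS over the whole graph is needed. Once an edge is removed the structure can no longer be trusted, and you would fall back to recomputing (or to a dynamic connectivity algorithm).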
There are algorithms for maintaining connectivity under edge insertions and deletions that are faster than recalculating. This is called "dynamic graph connectivity". Here is a paper on experimental evaluations (some newer theoretical results have been found since, but it is unclear whether they have practical relevance).
Given a huge collection of points (float64) in 2d space...
Is there a way to determine the nearest neighbour using a feature of OpenGL or DirectX?
I've implemented a kd-tree, which is still not fast enough.
A kd-tree should work just fine. But here are some hints.
I once implemented a kd-tree for a million-point data set. Here's what I learned from it:
Did you try profiling your code? You might find that there are easy optimizations to make such as common helper functions needing to be forced inline.
Did you actually test your code to validate that it was culling out tree branches for partitions that are easily identified as "too far away"? If you aren't careful, you can easily have a bug that does needless distance computations on points too far away.
Easiest thing: when comparing distances between points, you don't need to take the SQRT of (x2-x1)*(x2-x1) + (y2-y1)*(y2-y1). Comparing the squared distances gives the same ordering.
Most of the time spent in my code was just building the tree from the original data set, including multiple full sorts on each iteration deciding which axis was the best to partition on. An easier algorithm would be to just alternate between partitioning on the x and y axis for each tree branch and to cache the sorting order for each axis. It may not build the most optimal search tree, but the overall savings can be enormous.
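A compact sketch that combines the last three hints: axes simply alternate between x and y, all comparisons use squared distances, and the far branch is skipped when the splitting line is already farther away than the best match. Note one swap: it uses `std::nth_element` to find the median rather than caching a sort order per axis. The names (`P2`, `buildKd`, `nearestKd`, `nearestIndex`) are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct P2 { double x, y; };

// Squared distance: enough for comparisons, no sqrt needed.
inline double distSq(const P2& a, const P2& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Implicit kd-tree: reorders pts so the median of each range [lo, hi)
// sits at its midpoint. The axis alternates x (0), y (1), x, ...
void buildKd(std::vector<P2>& pts, std::size_t lo, std::size_t hi, int axis) {
    if (hi - lo <= 1) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const P2& a, const P2& b) {
                         return axis == 0 ? a.x < b.x : a.y < b.y;
                     });
    buildKd(pts, lo, mid, 1 - axis);
    buildKd(pts, mid + 1, hi, 1 - axis);
}

void nearestKd(const std::vector<P2>& pts, std::size_t lo, std::size_t hi,
               int axis, const P2& q, std::size_t& best, double& bestSq) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const double d = distSq(pts[mid], q);
    if (d < bestSq) { bestSq = d; best = mid; }

    // Signed offset from the splitting line to the query point.
    const double delta = axis == 0 ? q.x - pts[mid].x : q.y - pts[mid].y;
    if (delta < 0) {
        nearestKd(pts, lo, mid, 1 - axis, q, best, bestSq);       // near side
        if (delta * delta < bestSq)                               // prune far
            nearestKd(pts, mid + 1, hi, 1 - axis, q, best, bestSq);
    } else {
        nearestKd(pts, mid + 1, hi, 1 - axis, q, best, bestSq);   // near side
        if (delta * delta < bestSq)                               // prune far
            nearestKd(pts, lo, mid, 1 - axis, q, best, bestSq);
    }
}

// Index (into the reordered vector) of the point closest to q.
std::size_t nearestIndex(const std::vector<P2>& tree, const P2& q) {
    assert(!tree.empty());
    std::size_t best = 0;
    double bestSq = std::numeric_limits<double>::infinity();
    nearestKd(tree, 0, tree.size(), 0, q, best, bestSq);
    return best;
}
```

Build once with `buildKd(pts, 0, pts.size(), 0)` and then query as often as needed. Checking the results against a plain linear scan on a small data set is a cheap way to confirm that the pruning is not discarding valid candidates.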
I'm terrible with math, but I have a situation where I need to find all points in a 3D space that are arbitrarily close to a vector being projected through that same space. The points can be stored in any fashion the algorithm calls for, not that I can think of any particularly beneficial ordering for them.
Are there any existing C++ algorithms for this feat? And if so (or not), what kind of mathematical concept does or would it entail, since I'd love to attempt to understand it and tie my brain into a pretzel.
( This algorithm would be operating on a space with perhaps 100,000 points in it, it would need to test around 1,000,000 vectors, and need to complete those vectors within 1/30th of a second. I of course doubt if any algorithm can perform this feat at all, but it'll be fun to see if that's true or not. )
You would probably want to store your points in some spatial data structure. The ones that come to mind are:
oct-trees
BSP trees
kd-trees
They have slightly different properties. An oct-tree divides the entire world up into 8 equally sized cubes, organized to themselves form a larger cube. Each of these cubes is then in turn split into 8 evenly sized cubes. You keep splitting the cubes until you have fewer than some number of points in a cube. With this tree structure, you can quite easily traverse the tree, extracting all points that may intersect a given cube. Once you have that list of points, you can test them one at a time. Since your test geometry is a sphere (distance from a point), you would circumscribe a cube around the sphere and get the points that may intersect it. As an optimization, you may also inscribe a cube in your sphere, and anything that for sure intersects that, you can simply include in your hit-set right away.
The BSP tree is a binary space partitioning tree. It's a tree of planes in 3-space, forming a binary tree. The main problem of using this for your problem is that you might have to do a lot of square roots while traversing it, to find the distance to the planes. The principle is the same though: once you have fewer than some number of points, you form a leaf with those points in it. All leaves in a BSP tree are convex regions (except that the leaves along the perimeter will be infinitely large). When building the BSP, you want to split the points in half at each step, to truly get O(log n) searches.
The kd-tree is a special case of BSP, where all planes are axis aligned. This typically speeds up tests against them quite significantly, but doesn't allow you to optimize the planes based on your set of points quite as well.
I don't know of any c++ libraries that implement these, but I'm sure there are plenty of them. These are fairly common techniques used in video games, so you might want to look at game engines.
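Whichever tree is used to narrow down the candidates, the per-point test is a point-to-line distance, which comes down to one cross product. The sketch below treats the projected vector as an infinite line through `origin` with direction `dir`; that reading, and all the names (`Vec3`, `distSqToLine`, `pointsNearLine`), are assumptions made for the example. It checks every point, so it shows the math rather than a way to meet the timing target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}
inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Squared distance from p to the infinite line through `origin` along `dir`:
// |(p - origin) x dir|^2 / |dir|^2. No square root is required.
double distSqToLine(const Vec3& p, const Vec3& origin, const Vec3& dir) {
    const double lenSq = dot(dir, dir);
    assert(lenSq > 0.0);                       // direction must be non-zero
    const Vec3 c = cross(sub(p, origin), dir);
    return dot(c, c) / lenSq;
}

// Indices of all points within `radius` of the line (linear scan).
std::vector<std::size_t> pointsNearLine(const std::vector<Vec3>& pts,
                                        const Vec3& origin, const Vec3& dir,
                                        double radius) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < pts.size(); ++i)
        if (distSqToLine(pts[i], origin, dir) <= radius * radius)
            hits.push_back(i);
    return hits;
}
```

In a tree-based version, the same test would run only on the points in the cells the line passes through.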
It might help your understanding of octrees to think of them in terms of a space-filling curve: a curve that traverses every coordinate exactly once without crossing itself. The curve maps the 3D problem to a 1D one. There are several of these "monster curves", such as the Z curve, the Hilbert curve, and the Moore curve. The latter is built from 4 copies of the Hilbert curve and has very good space-filling quality. But isn't the search for the closest points solved by Dijkstra's algorithm?
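As an illustration of the Z curve mentioned above, the position of a cell along the curve (its Morton code) can be computed by interleaving the bits of its integer coordinates. This sketch assumes coordinates already quantized to 10 bits each, so the code fits into 32 bits; `spreadBits3` and `mortonCode` are names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Moves bit i of the low 10 bits of v to bit 3*i, leaving gaps for the
// other two coordinates.
std::uint32_t spreadBits3(std::uint32_t v) {
    std::uint32_t out = 0;
    for (int i = 0; i < 10; ++i)
        out |= ((v >> i) & 1u) << (3 * i);
    return out;
}

// Z-order (Morton) index of an integer cell; each coordinate must be < 1024.
std::uint32_t mortonCode(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(x < 1024u && y < 1024u && z < 1024u);
    return spreadBits3(x) | (spreadBits3(y) << 1) | (spreadBits3(z) << 2);
}
```

Sorting points by this code places each octree cell's points in one contiguous run of the array, although cells that are adjacent in space can still end up far apart along the curve. The loop is written for readability; faster bit-twiddling variants exist.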
Anybody out there using BGL for large production servers?
How many nodes does your network consist of?
How do you handle community detection?
Does BGL have any cool ways to detect communities?
Sometimes two communities might be linked together by one or two edges, but these edges are not reliable and can fade away. Sometimes there are no edges at all.
Could someone speak briefly on how to solve this problem?
Please open my mind and inspire me.
So far I have managed to work out whether two nodes are on an island (in a community)
in the least expensive manner, but now I need to work out which two nodes on separate islands are closest to each other. We can only make minimal use of unreliable geographical data.
To compare it figuratively to a mainland and an island, and take it out of the social distance context: I want to work out which two bits of land are the closest together across a body of water.
I've used the BGL for graphs with millions of nodes, but the size of the graph you can use depends on what algorithm you are trying to run. You can quickly compute distances between nodes. There are 4 shortest path algorithms which are most applicable depending on your data: (single pairs of points, for all pairs of points, sparse and dense graphs,...).
As for community detection, there aren't any algorithms built into the BGL specifically for that (but maybe you can contribute one when you are finished with your project). There are a few algorithms that might be helpful in building a community detection algorithm. The max-flow/min-cut algorithms are typically used in community detection (if there is a lot of flow possible between two nodes, then they are likely to be in the same community; if there isn't much flow, then the min-cut is likely to represent roads between communities). There are also heuristics to order the nodes of the graph to reduce bandwidth. Nodes making up "communities" are likely to be close to each other in such an ordering.
As far as I know BGL doesn't have any algorithms specifically for community detection.
By "island" do you mean a disconnected subgraph?
Also, graphs do not have any notion of 'distance'.
This 'social distance' is something that you are going to have to define. Once you've done that a large part of the work is done.
There are numerous methods listed on the page you linked to, most of those only require you to define something like a 'distance' metric, and then plug your definitions into the algorithm.
@David Nehme
Graphs without edge-weights are only about connectedness, they have no notion of distance. If you want to talk about a network then you can talk about distance. But a graph with no edge-weights does not have any distance, unless you want to assume an implied edge-weight of 1 for all edges. But this is really just turning the graph into a network.
Also, he is talking about the distance between two disconnected graphs. To model this, you have to introduce an external concept for distance between nodes, separate from the edge-distance.
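To illustrate both points, here is a minimal sketch that assumes an implied edge weight of 1 and computes hop counts with a breadth-first search over a plain adjacency list. It does not use the BGL; `hopDistances` and the `-1` marker for unreachable nodes are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hop count from `source` to every node, treating each edge as weight 1.
// Nodes in a different component ("island") keep the value -1.
std::vector<int> hopDistances(const std::vector<std::vector<int>>& adj,
                              int source) {
    assert(source >= 0 && static_cast<std::size_t>(source) < adj.size());
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> pending;
    dist[source] = 0;
    pending.push(source);
    while (!pending.empty()) {
        const int u = pending.front();
        pending.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {               // first visit is the shortest
                dist[v] = dist[u] + 1;
                pending.push(v);
            }
        }
    }
    return dist;
}
```

Every node on another island comes back as `-1`, with nothing to rank one against another. That is why an external distance (geographical or otherwise) has to be supplied to decide which two nodes on separate islands are closest.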