Choosing an appropriate data structure for point-intersection detecting - c++

I have some kind of shapes (rectangles, triangles, circles, etc.) that are given by points in space (coordinates). For example, for a rectangle - [0,0] and [5,5] - opposite corners.
There will be a lot of such shapes. My task is to find all shapes that contain a given point in themselves or on their edges.
Coordinates are always integers.
The total size of the coordinate plane can reach more than 2,000,000 x 2,000,000 points.
I'm trying to choose an appropriate data structure for this task. I was thinking about the Point Quadtree, and adding each point of the given shape to the tree. However, after thinking a little, I realized that if the maximum size of the shape is given, then I will end up with more than 2,000,000 x 2,000,000 nodes, which will be equivalent to a two-dimensional array with a size of 2,000,000 x 2,000,000. And that's exactly what I'm trying to avoid.
Which data structure would suit me best?

Related

The search for a set of points with a minimum sum of lengths to rectangles. What is the algorithm?

Good day.
I have the task of finding the set of points in 2D space for which the sum of the distances to the rectangles is minimal. For example, for two rectangles, the result will be the next area (picture). Any point in this area has the minimum sum of lengths to A and B rectangles.
Which algorithm is suitable for finding a region, all points of which have the minimum sum of lengths? The number of rectangles can be different, they are randomly located. They can even overlap each other. The sides of the rectangles are parallel to the coordinate axes and cannot be rotated. The region must be either a rectangle or a line or a point.
Hint:
The distance map of a rectangle (function that maps any point (x,y) to the closest distance to the rectangle) is made of four slanted planes (slope 45°), four quarter of cones and the rectangle itself, which is at ground level, forming a continuous surface.
To obtain the global distance map, it "suffices" to sum the distance maps of the individual rectangles. A pretty complex surface will result. Depending on the geometries, the minimum might be achieved on a single vertex, a whole edge or a whole face.
The construction of the global map seems more difficult than that of a line arrangement, due to the conic patches. A very difficult problem in the general case, though the axis-aligned constraint might ease it.
Add on Yves's answer.
As Yves described, each rectangle 'divide' plane into 9 parts and adds different distance method in to the sum. Middle part (rectangle) add distance 0, side parts add coordinate distance to that side, corner parts add point distance to that corner. With that approach plan has to be divided into 9^n parts, and distance sum is calculated by adding appropriate rectangle distance functions. That is feasible if number of rectangles is not too large.
Probably it is not needed to calculate all parts since it is easy to calculate some bound on part min value and check is it needed to calculate part at all.
I am not sure, but it seems to me that global distance map is convex function. If that is the case than it can be solved iteratively by similar idea as in linear programming.

Clustering Points Algorithm

I've applied three different methods of getting sets of points as follows.
Every method produces a vector of Points. Each method is in a different color, red, blue, and green.
Here is the combined image, overlaying all 3 of the sets of points
As you can see in the combined image there are spots in which all three sets "agree" on (i.e are generally in the exact same spot). I would like to find these particular spots and combine them into a single coordinate. I'm not sure where to start with approaching this problem. I've looked into K-means clustering, but to me it seems the problem is that K-means will cluster all the points and take the average with surrounding points, shifting the cluster center from the original position. I could loop through all the points in all the vectors that store the points, but as these images get larger with more points, it becomes very costly and inefficient.
Does anybody have any tips on how to approach this problem? I've been using OpenCV with C++.
Notionally, what you want to do is consider the complete tripartite graph on the three sets of points with edges weighted by distance. Then select edges in order of weight until a triangle appears; call those points a corresponding set, choose (say) their centroid to represent them, and remove them from the graph. Stop when the edge length exceeds some tolerance.
The mathematical justification for this approach is that it is independent of point ordering (except in the unlikely case of problematic ties in distances between points).
The practical implementation of this algorithm (for a significant number of points) involves a search data structure that can quickly find nearby points (not just the nearest): bins of the threshold size, a quad trie, or a k-d tree would work. Probably you would create one for each point set and use the other sets’ points as query points.

Scalable loop over neighbours

It is well-known, that if one wants to layout square grid(aka matrix) of real numbers he can use array with row-major order. Let's draw neighbourhood of some element i:
...................................
...|i-width-1|i-width|i-width+1|...
...| i-1 | i | i+1 |...
...|i+width-1|i+width|i+width+1|...
...................................
Let us for simplicity assume that i somewhere in the middle of square grid, so no bordering issues.(We can add %(width*height) and get connected on borders grid). So if one wants to do something with each element in neighbourhood of i'th element one should do:
//function which does something with element at idx
void DoSomethinWithElement(size_t idx);
//left neighbour
DoSomethinWithElement(i-1);
//right neighbour
DoSomethinWithElement(i+1);
//top neighbour
DoSomethinWithElement(i-width);
//bottom neighbour
DoSomethinWithElement(i+width);
I want to generalize this algorithm for any type of regular polygon grid (i.e. triangle, square, pentagonal, hexagonal etc.) Regular means that it constructed only from one type of polygon (i.e only from triangles).
How to generalize for any type of polygon grid:
1. Layout of other grid in array?
2. These N(for square mesh four) statements in a loop?
The problem "How to find all neighbors of a tile in a polygonal grid?" is solved quickly using graphs. But I want to use arrays so I could copy them to graphics card with CUDA.
Examples of grids:
I want to generalize this algorithm for any type of regular polygon grid (i.e. triangle, square, pentagonal, hexagonal etc.)
Your definition of regular polygon grid is unusual. Typically you may not rotate or mirror the faces of the grid (tiling) for it to be considered regular. There are only 3 regular tilings (triangle, square, hexagon). All pentagonal tilings require mirroring or rotation or both.
Let's simplify your problem to this: "How to find all neighbors of a face in a polygonal grid?" Once you've got that figured out, it's trivial to call a function on each neighbor.
Grids are graphs with certain limitations. It is possible to generalize the search for neighbors by representing the grid with a general graph. The vertices of the graph represent the faces and they have edges to their neighbors. Graph of a grid is a planar graph. When the grid is represented by a graph, the problem becomes: "Given a vertex in a graph, how do I find all adjacent vertices?"
Note that the vertices and edges of the graph are not the same thing as the vertices and edges of the grid. For example, the grid-vertices of a hexagonal grid have three connected edges, while the faces have six neighboring faces and therefore the graph-vertices have six edges each.
One way to represent graphs is the adjacency list. In this representation, you simply need to look up the adjacency list of the vertex to find all it's neighbors.
But I want to use arrays
Well since the size of each adjacency list is constant, they can be implemented with arrays like in decltype_auto's answer. Or you can represent the graph with an adjacency matrix.
But if you want a generic algorithm for any tabular representation of a grid, then I think that you've painted yourself into a corner with that requirement. Each representation is different and you'll need a different algorithm for each.
You can represent the adjacencies of K-polygonal grid of polygon count N by a 1D vector v of N*K length, or a 2D NxK matrix M. Use std::size_t -1, a nullptr or whatever suites the reference type you've stored in v or M to indicate missing neighbors at the border.
Given such a NxK matrix M:
For the neighbors of polygon n you iterate from M[n][0] to M[n][K-1], fetch the neighbors by whatever reference you've stored in M, and apply whatever function you want on them.

Searching for geometric shape on cartesian plane by coordinates

I have an algorithmic problem on a Cartesian plane.. I need to efficiently search for geometric shapes that intersect with a given point. There are several shapes(rectangle, circle, triangle and polygon) but those are not important, because the determining the actual point inclusion is not a problem here, I will implement those on my own. The problem lies in determining which shapes need to be verified for the inclusion with the given point. Iterating through all of my shapes on plane and running the point inclusion method on each one of them is inefficient as the number of instances of shapes will be quite large. My first idea was to divide the plane for segments(the plane is finite, but too large for any kind of 3D array) and when adding a shape to the database, i would determine which segments it would intersect with and save them within object of the shape. Then when the point for inclusion verification is given, I would only need to determine the segment in which the point is located and then verify the inclusion only with objects which intersect with that segment.
Is that the way to go? I don't know if the method I described is optimal or if i am not missing something. Any help would be appreciated..
Thanks in advance
P.S.: I will be writing this in C++. That is not really relevant as it is more of an algorithmic problem but I wanted to put that out if someone was curious...
The gridding approach can be used here.
See the plane as a raster image where you draw all your shapes using a scan conversion algorithm, making sure that all pixels even partially covered are filled. For every image pixel, keep a list of the shapes that filled it.
A query is then straightforward: find the pixel where the query point falls in time O(1) and check every shape in the list, in time O(K), where K is the list length, approximately equal to the number of intersecting shapes.
If your image is made of N² pixels and you have M objects having an average area A pixels, you will need to store N²+M.A list elements (a shape identifier + a link to the next). You will choose the pixel size to achieve a good compromise between accuracy and storage cost. In any case, you must limit yourself to N²<Q.M, where Q is the total number of queries, otherwise the cost of just initializing the image could exceed the total query time.
In case your scene is very sparse (more voids than shapes), you can use a compressed representation of the image, using a quadtree.

find overlapping rectangles algorithm

let's say I have a huge set of non-overlapping rectangle with integer coordinates, who are fixed once and for all
I have another rectangle A with integer coordinates whose coordinates are moving (but you can assume that its size is constant)
What is the most efficient way to find which rectangles are intersecting (or inside) A?
I cannot simply loop through my set as it is too big. Thanks
edit : the rectangles are all parallel to the axis
I'll bet you could use some kind of derivation of a quadtree to do this. Take a look at this example.
Personally, I would solve this with a KD-Tree or a BIH-Tree. They are both adaptive spatial data structures that have a log(n) search time. I have an implementation of both for my Ray Tracer, and they scream.
-- UPDATE --
Store all of your fixed rectangles in the KD-Tree. When you are testing intersections, iterate through the KD-Tree as follows:
function FindRects(KDNode node, Rect searchRect, List<Rect> intersectionRects)
// searchRect is the rectangle you want to test intersections with
// node is the current node. This is a recursive function, so the first call
// is the root node
// intersectionRects contains the list of rectangles intersected
int axis = node.Axis;
// Only child nodes actually have rects in them
if (node is child)
{
// Test for intersections with each rectangle the node owns
for each (Rect nRect in node.Rects)
{
if (nRect.Intersects(searchRect))
intersectionRects.Add(nRect);
}
}
else
{
// If the searchRect's boundary extends into the left bi-section of the node
// we need to search the left sub-tree for intersections
if (searchRect[axis].Min // Min would be the Rect.Left if axis == 0,
// Rect.Top if axis == 1
< node.Plane) // The absolute coordinate of the split plane
{
FindRects(node.LeftChild, searchRect, intersectionRects);
}
// If the searchRect's boundary extends into the right bi-section of the node
// we need to search the right sub-tree for intersections
if (searchRect[axis].Max // Max would be the Rect.Right if axis == 0
// Rect.Bottom if axis == 1
> node.Plane) // The absolute coordinate of the split plane
{
FindRects(node.RightChild, searchRect, intersectionRects);
}
}
This function should work once converted from pseudo-code, but the algorithm is correct. This is a log(n) search algorithm, and possibly the slowest implementation of it (convert from recursive to stack based).
-- UPDATE -- Added a simple KD-Tree building algorithm
The simplest form of a KD tree that contains area/volume shapes is the following:
Rect bounds = ...; // Calculate the bounding area of all shapes you want to
// store in the tree
int plane = 0; // Start by splitting on the x axis
BuildTree(_root, plane, bounds, insertRects);
function BuildTree(KDNode node, int plane, Rect nodeBds, List<Rect> insertRects)
if (insertRects.size() < THRESHOLD /* Stop splitting when there are less than some
number of rects. Experiment with this, but 3
is usually a decent number */)
{
AddRectsToNode(node, insertRects);
node.IsLeaf = true;
return;
}
float splitPos = nodeBds[plane].Min + (nodeBds[plane].Max - nodeBds[plane].Min) / 2;
// Once you have a split plane calculated, you want to split the insertRects list
// into a list of rectangles that have area left of the split plane, and a list of
// rects that have area to the right of the split plane.
// If a rect overlaps the split plane, add it to both lists
List<Rect> leftRects, rightRects;
FillLists(insertRects, splitPos, plane, leftRects, rightRects);
Rect leftBds, rightBds; // Split the nodeBds rect into 2 rects along the split plane
KDNode leftChild, rightChild; // Initialize these
// Build out the left sub-tree
BuildTree(leftChild, (plane + 1) % NUM_DIMS, // 2 for a 2d tree
leftBds, leftRects);
// Build out the right sub-tree
BuildTree(rightChild, (plane + 1) % NUM_DIMS,
rightBds, rightRects);
node.LeftChild = leftChild;
node.RightChild = rightChild;
There a bunch of obvious optimizations here, but build time is usually not as important as search time. That being said, a well build tree is what makes searching fast. Look up SAH-KD-Tree if you want to learn how to build a fast kd-tree.
You can create two vectors of rectangle indexes (because two diagonal points uniquely define your rectangle), and sort them by one of coordinates. Then you search for overlaps using those two index arrays, which is going to be logarithmic instead of linear complexity.
You can do a random "walking" algorithm ... basically create a list of neighbors for all your fixed position rectangles. Then randomly pick one of the fixed-position rectangles, and check to see where the target rectangle is in comparison to the current fixed-position rectangle. If it's not inside the rectangle you randomly picked as the starting point, then it will be in one of the eight directions which correspond to a given neighbor of your current fixed position rectangle (i.e., for any given rectangle there will be a rectangle in the N, NE, E, SE, S, SW, W, NW directions). Pick the neighboring rectangle in the closest given direction to your target rectangle, and re-test. This is essentially a randomized incremental construction algorithm, and it's performance tends to be very good for geometric problems (typically logarithmic for an individual iteration, and O(n log n) for repeated iterations).
Create a matrix containing "quadrant" elements, where each quadrant represents an N*M space within your system, with N and M being the width and height of the widest and tallest rectangles, respectively. Each rectangle will be placed in a quadrant element based on its upper left corner (thus, every rectangle will be in exactly one quadrant). Given a rectangle A, check for collisions between rectangles in the A's own quadrant and the 8 adjacent quadrants.
This is an algorithm I recall seeing recommended as a simple optimization to brute force hit-tests in collision detection for game design. It works best when you're mostly dealing with small objects, though if you have a couple large objects you can avoid wrecking its efficiency by performing collision detection on them separately and not placing them in a quadrant, thus reducing quadrant size.
As they are not overlapping I would suggest an approach similar (but not equal) to Jason Moore (B).
Sort your array by x of upper left corner.
And sort a copy by y of upper left corner. (of course you would just sort pointers to them to save memory).
Now you once create two sets Sliding_Window_X and Sliding_Window_Y.
You search with binary search once your x-coordinate (upper left) for your A window in the x-sorted array and your y-coordinate. You put your results into the corrospondng Sliding_Window_Set. Now you add all following rectangles in the ordered array that have a lower x(y) (this time lower right) coordinate than your lower right of A.
The result is that you have in your Sliding_Window-sets the windows that overlap with your A in one coordinate. The overlapping of A is the intersection of Sliding_Window_X and _Y.
The Sliding_Window sets can be easily represented by just 2 numbers (begin and end index of the corrosponding sorted array).
As you say you move A, it is now really easy to recalculate the overlap. Depending on the direction you can now add/remove Elements to the Sliding_Window set. I.e. you take just the next element from the sorted array at the front/end of the set and maybe remove on at the end.
Topcoder provides a way to determine if a point lies within a rectangle. It says that say we have a point x1,y1 and a rectangle. We should choose a random point very far away from current locality of reference in the rectangular co-ordinate system say x2,y2.
Now we should make a line segment with the points x1,y1 and x2,y2. If this line segment intersects odd number of sides of the given rectangle (it'll be 1 in our case, this method can be extended to general polygons as well) then the point x1,y1 lies inside the rectangle and if it intersects even number of sides it lies outside the rectangle.
Given two rectangles, we need to repeat this process for every vertex of 1 triangle to possibly lie in the second triangle. This way we'd be able to determine if two rectangles overlap even if they are not aligned to the x or y axis.
Interval Trees: Are BSTs designed with taking 'lo' value as key in an interval. So, for example if we want to insert (23, 46) in the tree, we'd insert it using '23' in the BST.
Also, with interval trees at each node, we keep the maximum endpoint (hi value) of the sub-tree rooted at that node.
This order of insertion allows us to search all 'R' intersections in R(logN) time. [We search for first intersection in logN time and all R in RlogN time] Please refer to interval trees documentation for how insert, search is done and details of complexity.
Now for this problem, we use an algorithm known as sweep-line algorithm. Imagine we have a vertical line (parallel to y-axis) which is sweeping the 2D space and in this process intersects with the rectangles.
1) Arrange rectangles in increasing order of x-cordinates (left-edge wise) either via priority queue or via sorting . Complexity NlogN if N rectangles.
2) As this line sweeps from left to right, following are the intersection cases:
If line intersects the left side of a rectangle never seen, add the y co-ords of the rectangle's side to the interval tree. [say (x1,y1) and (x1,y2) are left edge co-ordinates of the rectangle add interval (y1, y2) to the interval tree] ---> (NlogN)
Do a range search on the interval tree. [say (x1,y1) and (x1,y2) are left edge co-ordinates of the rectangle, take the interval (y1,y2) and do an interval intersection query on the tree to find all intersections] ---> RlogN (in practice)
If line intersects the right side of a rectangle, remove it's y-coords from the interval tree as the rectangle is now processed completely. ----> NlogN
Total complexity : NlogN + RlogN
Let your set of rectangle be (Xi1,Yi1,Xi2,Yi2) where i varies from 0 to N.
Rectangle A and B can NOT be intersecting if Ax1 > Bx2 || Ay1 < By2 || Bx1 > Ax2 || By1 < Ay2.
Create tree which is optimized for range/interval (For exa: segment tree or interval tree)
See http://w3.jouy.inra.fr/unites/miaj/public/vigneron/cs4235/l5cs4235.pdf
Use this tree to find set of triangle while your triangle is changing coordinates.
By calculating the area of each rectangle and and checking the length L, height H and area of rectangles whether exceeds or not the length and height and area of a rectangle A
Method (A)
You could use an interval tree or segment tree. If the trees were created so that they would be balanced this would give you a run time of O(log n). I assume this type of preprocessing is practical because it would only happen once (it seems like you are more concerned with the runtime once the rectangle starts moving than you are with the amount of initial preprocessing for the first time). The amount of space would be O(n) or O(n log n) depending on your choice above.
Method (B)
Given that your large set of rectangles are of fixed size and never change their coordinates and that they are non-overlapping, you could try a somewhat different style of algorithm/heuristic than proposed by others here (assuming you can live with a one-time, upfront preprocessing fee).
Preprocessing Algorithm [O(n log n) or O(n^2) runtime {only run once though}, O(n) space]
Sort the rectangles by their horizontal coordinates using your favorite sorting algorithm (I am assuming O(n log n) run time).
Sort the rectangles by their vertical coordinates using your favorite sorting algorithm (I am assuming O(n log n) run time)
Compute a probability distribution function and a cumulative distribution function of the horizontal coordinates. (Runtime of O(1) to O(n^2) depending on method used and what kind of distribution your data has)
a) If your rectangles' horizontal coordinates follow some naturally occurring process then you can probably estimate their distribution function by using a known distribution (ex: normal, exponential, uniform, etc.).
b) If your rectangles' horizontal coordinates do not follow a known distribution, then you can calculate a custom/estimated distribution by creating a histogram.
Compute a probability distribution function and a cumulative distribution function of the vertical coordinates.
a) If your rectangles' vertical coordinates follow some naturally occurring process then you can probably estimate their distribution function by using a known distribution (ex: normal, exponential, uniform, etc.).
b) If your rectangles' vertical coordinates do not follow a known distribution, then you can calculate a custom/estimated distribution by creating a histogram.
Real-time Intersection Finding Algorithm [Anywhere from O(1) to O(log n) to O(n) {note: if O(n), then the constant in front of n would be very small} run time depending on how well the distribution functions fit the dataset]
Taking the horizontal coordinate of your moving rectangle and plug it into the cumulative density function for the horizontal coordinates of the many rectangles. This will output a probability (value between 0 and 1). Multiply this value times n (where n is the number of many rectangles you have). This value will be the array index to check in the sorted rectangle list. If the rectangle of this array index happens to be intersecting then you are done and can proceed to the next step. Otherwise, you have to scan the surrounding neighbors to determine if the neighbors intersect with the moving rectangle. You can attack this portion of the problem multiple ways:
a) do a linear scan until finding the intersecting rectangle or finding a rectangle on the other side of the moving rectangle
b) calculate a confidence interval using the probability density functions you calculated to give you a best guess at potential boundaries (i.e. an interval where an intersection must lie). Then do a binary search on this small interval. If the binary search fails then revert back to a linear search in part (a).
Do the same thing as step 1, but do it for the vertical portions rather than the horizontal parts.
If step 1 yielded an intersection and step 2 yielded an intersection and the intersecting rectangle in step 1 was the same rectangle as in step 2, then the rectangle must intersect with the moving rectangle. Otherwise there is no intersection.
Use an R+ tree, which is most likely precisely the specific tree structure you are looking for. R+ trees explicitly do not allow overlapping in the internal (non-leaf) structure in exchange for speed. As long as no object exists in multiple leaves at once, there is no overlap. In your implementation, rather than support overlap, whenever an object needs to be added to multiple leaves, just return true instead.
Here is a detailed description of the data structure, including how to manage rectangles:
The R+-tree: A dynamic index for multi-dimensional objects