I have a homework assignment and have no idea where to start or what to do. Basically, I need to organize roughly 1 million points using "Spatial Hashing" and use the hash table to find two points that are closest to each other, and return the distance.
Assignment Specifics are:
To find the closest pair of points quickly, you will divide the unit square containing the points into a b×b grid of square “cells”, each representing a 2D square of size 1/b×1/b. Each point should be “hashed” to the cell containing it. For example, if you are storing the x coordinate of a point in a “double” variable named x, then (int)(x * b) will scale the coordinate up by b and round down to the nearest integer; the result (in the range 0...b−1) can be used as one of the indices into your 2D array of cells. The other index is calculated the same way, only using the y coordinate. After hashing, each point needs only to be compared to the other points within its cell and the 8 cells immediately surrounding its cell; this should result in many fewer comparisons than if we simply compared every point to every other point. You will need to select an appropriate value of b as part of this lab exercise. You may want to consider what the dangers are in setting b too low or too high.
Internally, the grid of cells should be stored as a 2-dimensional array of pointers to linked lists, with each linked list containing the set of points belonging to a single cell. The array of cells must be dynamically allocated (with the new command) and subsequently de-allocated at the end of your program (with the delete command). This means that, since each cell of the table is itself a pointer to a node of a linked list, the top-level variable representing the 2D array of pointers will be of type “Node ***” (with three *’s!), assuming “Node” is the name of the structure representing a node in your linked lists.
Your program should consist of the following major steps:
Allocate and initialize 2D array of pointers to linked lists.
Read input file, inserting each point into the appropriate linked list based on the cell to which it maps.
For each point, compare it to all the points within its cell and the 8 adjacent cells; remember the smallest distance obtained during this process.
De-allocate 2D array and linked lists.
Print out minimum distance.
Part of this lab also involves figuring out a good choice for the value of b. Please include in a comment in your code a brief description of why you think your choice of b is a good one.
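The five steps above can be sketched roughly as follows. This is a minimal illustration, not the assignment's reference solution: the function name closestPairDistance, the helper cellIndex, and passing the points in as a std::vector (instead of reading an input file) are all assumptions made for the example. The result is only the true minimum when the closest pair lies in the same or in adjacent cells, which is one of the dangers of choosing b too high.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Node {            // one point in a cell's linked list
    double x, y;
    Node* next;
};

// Map a coordinate in [0, 1] to a cell index in [0, b-1].
int cellIndex(double v, int b) {
    int i = static_cast<int>(v * b);
    if (i < 0) i = 0;
    if (i >= b) i = b - 1;   // v == 1.0 would otherwise give index b
    return i;
}

// Hypothetical helper: the points are passed in rather than read from a file.
double closestPairDistance(const std::vector<std::pair<double, double>>& pts, int b) {
    assert(b > 0);
    // Step 1: allocate the b x b array of list heads, all empty.
    Node*** grid = new Node**[b];
    for (int i = 0; i < b; ++i) {
        grid[i] = new Node*[b];
        for (int j = 0; j < b; ++j) grid[i][j] = nullptr;
    }
    // Step 2: push each point onto the front of its cell's list.
    for (const auto& p : pts) {
        int i = cellIndex(p.first, b), j = cellIndex(p.second, b);
        grid[i][j] = new Node{p.first, p.second, grid[i][j]};
    }
    // Step 3: compare every point with the points in its 3x3 neighbourhood.
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < b; ++i)
        for (int j = 0; j < b; ++j)
            for (Node* p = grid[i][j]; p; p = p->next)
                for (int di = -1; di <= 1; ++di)
                    for (int dj = -1; dj <= 1; ++dj) {
                        int ni = i + di, nj = j + dj;
                        if (ni < 0 || ni >= b || nj < 0 || nj >= b) continue;
                        for (Node* q = grid[ni][nj]; q; q = q->next) {
                            if (q == p) continue;   // skip the point itself
                            best = std::min(best, std::hypot(p->x - q->x, p->y - q->y));
                        }
                    }
    // Step 4: free every list, then the rows, then the top-level array.
    for (int i = 0; i < b; ++i) {
        for (int j = 0; j < b; ++j)
            while (grid[i][j]) {
                Node* next = grid[i][j]->next;
                delete grid[i][j];
                grid[i][j] = next;
            }
        delete[] grid[i];
    }
    delete[] grid;
    return best;   // step 5 (printing) is left to the caller
}
```

With fewer than two points the function returns infinity. Each pair is examined twice here, which keeps the loop simple at the cost of a constant factor.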
TL;DR
I want to keep track of the spatial structure of some objects via a 2d vector of pointers pointing onto the objects. The objects themselves are in a 1d vector. I was told this is not the way to do things and I am searching for a better approach.
I want to write a simulation of spatial cell growth in C++. The cells should be placed onto some kind of 2d grid because I need a spatial structure between them to implement the growing algorithm. The simulation starts from a line of cells in a "corridor" of set width, but not yet known length. For implementing the growing algorithm, which repeatedly places new cells adjacent to some old cell onto the grid, I need a spatial structure between the cells (hence the usage of a 2d grid). This growth process will go on for an extremely long time until some condition is met. Sometimes it will end quickly, but sometimes the needed "length" of my corridor will exceed the memory capacity of my computer. Luckily I only need to remember certain cells in the back of my corridor. Cells whose ancestry line loses connection to the growing front of the cell colony can be "forgotten". Also, I only need this 2d spatial structure for some small grid "wandering along" the front of my cell colony, since further back everything will be filled already and therefore the spatial structure there is not important anymore.
I therefore want to have some data structure which keeps track of my "important" cells, which doesn't have to be spatially structured, and then some additional small grid structure which wanders along with the front of my colony and can point to the cells currently at the front of my growing interface.
My initial idea was to have a 2d grid of pointers which point into a large vector. The vector would keep track of all the important cells, i.e. every new cell object would be added to the vector, and once it becomes unimportant I would delete it from the vector. At the same time the grid of pointers which points at the elements of the vector would keep track of my spatial structure at the front, and the pointers would be updated as soon as my cells grow out of my pointer grid, i.e. then I would move my grid along with the front.
In a different question of mine (Strange output from dereferencing pointers into a vector) I was told that it is not good to keep pointers into a vector. I don't see how else I could do it, though. Maybe someone has an idea on how to handle this. Thank you!
From what I understand of your question, and I am sure I do not understand it fully, you want to simulate some kind of cells growing in a 2-D grid.
Assuming your cell is some kind of object, store your cells directly in a 2-D array and keep a list that stores the indices of the important cells.
As an implementation, let: -
class Cell{
public:
    ... //data members
    bool enabled; //Just for representing empty/available spaces in the 2-D grid. If it is set to true, then there is a cell in that unit of the grid.
    Cell(){
        enabled = false;
    }
    ... //other cell data
};
be your class that is used to make individual cell objects; then: -
std::vector<std::vector<Cell>> grid(m, std::vector<Cell>(n));
would represent your 2-D grid of size m*n.
Now, due to the default constructor, all the cells inside grid would have enabled = false. This is good because it can work as empty spaces for your cells to grow.
Next, have a list that stores your 'important' cell's indices: -
struct Index{
    int x, y;
    Index(int x_, int y_){
        x = x_;
        y = y_;
    }
};
std::list<Index> importantCells;
Note: I am using list here because I assume you don't need random access.
Initially, put some 'important' cells into your grid, for example at (0, 0) of your grid: -
grid[0][0] = Cell(...); //Assuming you have some parameterized constructor for that...
and now, store the index 0,0 in your importantCells list: -
importantCells.emplace_back(0, 0); //or use push_back() if you wish
Now, traverse the importantCells list, read the positions of the 'important' cells and update them in your 2-D grid. This will lead to the generation of new 'important' cells and possibly the removal of the current 'important' cell (because it might not have more room to grow). Add the newly generated cells to the list using emplace_front, and if the current cell becomes unimportant, remove it from the list...
You can also use a vector if you want random access, but a list seems better for your scenario.
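Put together, one update step of this scheme might look like the sketch below. The growth rule is purely hypothetical (each important cell tries to occupy the grid position at y + 1 and then stops being important), and the function name growOnce and the alias Grid are made up for illustration. The real rule would replace the body of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

struct Cell {
    bool enabled = false;   // true if this grid position holds a cell
};

struct Index {
    int x, y;
    Index(int x_, int y_) : x(x_), y(y_) {}
};

using Grid = std::vector<std::vector<Cell>>;

// One growth step under a hypothetical rule: every important cell tries to
// place a new cell at (x, y + 1) and is then dropped from the list.
// Returns the number of newly placed cells.
int growOnce(Grid& grid, std::list<Index>& importantCells) {
    int placed = 0;
    for (auto it = importantCells.begin(); it != importantCells.end(); ) {
        const int x = it->x;
        const int y = it->y + 1;
        if (y < static_cast<int>(grid[x].size()) && !grid[x][y].enabled) {
            grid[x][y].enabled = true;
            // New cells go to the front, so this loop never visits them:
            // inserting into a std::list keeps existing iterators valid.
            importantCells.emplace_front(x, y);
            ++placed;
        }
        it = importantCells.erase(it);   // erase returns the next element
    }
    return placed;
}
```

Because the list only stores indices, nothing dangles when the grid's storage is reallocated or moved, which was the problem with raw pointers into a vector.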
I'm currently working on a problem where I want to maintain the convex hull of a set of linear functions. It might look something like this:
I'm using a set<Line> to maintain the lines so that I can dynamically insert lines, which works fine. The lines are ordered by increasing slope, which is defined by the operator< of the lines. By throwing out "superseded" lines, the data structure guarantees that every line will have some segment that is a part of the convex hull.
Now the problem is that I want to search in this data structure for the crossing point whose X coordinate precedes a given x. Since those crossing points are only implicitly defined by adjacency in the set (in the image above, those are the points N, Q etc.), it seems to be entirely impossible to solve with the set alone, since I don't have
The option to find an element by anything but the primary compare function
The option to "binary search" in the underlying search tree myself, that is, compute the pre-order predecessor or successor of an iterator
The option to access elements by index efficiently
I am thus inclined to use a second set<pair<set<Line>::iterator, set<Line>::iterator> >, but this seems incredibly hacky. Seeing as we mainly need this for programming contests, I want to minimize code size, so I want to avoid a second set or a custom BBST data structure.
Is there a good way to model this scenario which still lets me maintain the lines dynamically and binary search by the value of a function on adjacent elements, with a reasonable amount of code?
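One idiom often seen in contest code for this situation, shown here only as a sketch and not as something proposed in the question, is to cache each line's right-hand crossing point in a mutable member and to give the set a transparent comparator (std::less<>, C++14), so that lower_bound can search by x while the elements stay ordered by slope. The assumptions here are integer coefficients, maximum queries (the upper envelope), and the names LineContainer, isect, add and query.

```cpp
#include <cassert>
#include <climits>
#include <set>

// A line y = k*x + m. p is the largest x for which this line is still on the
// upper envelope; it is maintained by LineContainer and must be mutable
// because set elements are otherwise const.
struct Line {
    mutable long long k, m, p;
    bool operator<(const Line& o) const { return k < o.k; }   // order by slope
    bool operator<(long long x) const { return p < x; }       // search by x
};

struct LineContainer : std::multiset<Line, std::less<>> {
    static constexpr long long inf = LLONG_MAX;

    // Division rounding toward negative infinity.
    static long long floorDiv(long long a, long long b) {
        return a / b - ((a ^ b) < 0 && a % b);
    }

    // Store in x->p the crossing of x with its successor y and report
    // whether y is superseded (it would win nowhere on the envelope).
    bool isect(iterator x, iterator y) {
        if (y == end()) { x->p = inf; return false; }
        if (x->k == y->k) x->p = x->m > y->m ? inf : -inf;
        else x->p = floorDiv(y->m - x->m, x->k - y->k);
        return x->p >= y->p;
    }

    void add(long long k, long long m) {
        auto z = insert({k, m, 0}), y = z++, x = y;
        while (isect(y, z)) z = erase(z);              // drop superseded successors
        if (x != begin() && isect(--x, y)) isect(x, y = erase(y));  // new line itself useless
        while ((y = x) != begin() && (--x)->p >= y->p)
            isect(x, erase(y));                        // drop superseded predecessors
    }

    // Maximum of k*x + m over all stored lines.
    long long query(long long x) const {
        assert(!empty());
        auto l = *lower_bound(x);   // first line whose segment reaches x
        return l.k * x + l.m;
    }
};
```

For example, after add(1, 0) and add(-1, 0) the container represents max(x, -x), so query(2) and query(-2) both give 2. The crossing points are rounded down to integers, so the sketch is only meant for integer x.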
My program contains polygons which have the form of a vector containing points (2 dimensional double coordinates, stored in a self-made structure). I'm looking for a quick way of finding the smallest square containing my polygon (i.e. knowing the maximal and minimal coordinates of all the points).
Is there a quicker way than just parsing all the points and storing the minimum and maximum values?
The algorithm you are describing is straightforward: iterate over all your points and find the minimum and maximum for each coordinate. This is an O(n) algorithm, n being the number of points you have.
You can't do better, since you will need to check at least all your points once, otherwise the last one could be outside the square you found.
Now, the complexity is at best O(n), so all you can do is minimize the constant factors, and in this case they are already pretty small: only one loop over your vector, looking for two maximums and two minimums.
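As a minimal sketch of that single loop (the names Point, Box and boundingBox are assumptions, since the question does not show its own structure):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point { double x, y; };

struct Box { double minX, minY, maxX, maxY; };

// One pass over the points, tracking both minimums and both maximums.
Box boundingBox(const std::vector<Point>& pts) {
    assert(!pts.empty());   // an empty polygon has no bounding box
    Box b{pts[0].x, pts[0].y, pts[0].x, pts[0].y};
    for (const Point& p : pts) {
        b.minX = std::min(b.minX, p.x);
        b.minY = std::min(b.minY, p.y);
        b.maxX = std::max(b.maxX, p.x);
        b.maxY = std::max(b.maxY, p.y);
    }
    return b;
}
```

This yields the axis-aligned bounding rectangle; if a true square is needed, its side would be the larger of maxX - minX and maxY - minY.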
You can either iterate through all points and find max and min values, or do some preprocessing, for example, store your points in treap (http://en.wikipedia.org/wiki/Treap).
There is no way w/o some preprocessing to do it better than just iterating over all points.
I'm not sure if there can be any faster way to find the min & max values in an array of values than linear time. The only 'optimization' I can think of is to find these values on one of the other occasions you're iterating the array (filling it/performing a function on all points), then perform checks on any data update.
Suppose I have a sequence of characters (ABCDEF....), in an array or a string or any suitable data structure, and these characters are distributed over the sites of a 3D lattice, such that position 1 corresponds to coordinates (1,1,1) and so on. When I perform any operation on this lattice, e.g. a periodic translation in the x-direction, which means all elements are shifted cyclically in the direction of x, this should alter the sequence of characters in my data structure accordingly. My question: which data structures/functions/libraries can do these permutations efficiently in C++? Speed is important because this has to be done many times.
In 1D, you could think about it as a circular doubly linked list. The advantage would be that you could use an STL list container and make your life easier.
The exercise of extending this to 3D is left to the reader.
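As a different technique from the linked list suggested above, a cyclic shift along x can also be sketched with std::rotate on a flat array. The layout is an assumption: the whole lattice is stored in one std::string with x varying fastest, so every run of nx consecutive characters is one line along x. The name shiftX is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Cyclically shift every x-line of the lattice by s sites in +x direction.
// Assumes x is the fastest-varying index, so each x-line is contiguous.
void shiftX(std::string& lattice, std::size_t nx, std::size_t s) {
    assert(nx > 0 && lattice.size() % nx == 0);
    s %= nx;
    for (std::size_t row = 0; row < lattice.size(); row += nx) {
        auto first = lattice.begin() + static_cast<std::ptrdiff_t>(row);
        auto last = first + static_cast<std::ptrdiff_t>(nx);
        // The element at offset nx - s becomes the new first element,
        // i.e. every character moves s places to the right with wrap-around.
        std::rotate(first, first + static_cast<std::ptrdiff_t>(nx - s), last);
    }
}
```

With nx = 3, shifting "ABCDEF" by one site turns the lines "ABC" and "DEF" into "CAB" and "FDE". Shifts along y or z are not contiguous in this layout and would need strided copies or an index offset instead.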
I have with me a lot of x,y points and each x,y point has some extra data associated with it. This extra data I'll be storing in a struct.
My application requires that given any one point, I'll have to find how many other points lie within a rectangular area surrounding this point (this point is at the centre of the rectangle).
One logic I've thought of is to store all x points as the keys in a map A and all y points as the keys in another map B.
Map A will have x as the key and y values as the value.
Map B will have y as the key and the associated struct as the value.
This way, if the given point is (10.5,20.6), I can use upper_bound(10.5+RECTANGLE_WIDTH) and lower_bound(10.5-RECTANGLE_WIDTH) to find the range of x values lying within the rectangle and for the corresponding y values, find whether the y values lie within the +- range of 20.6.
My whole point of using map was because I have a massive store of x,y points and the searching has to be done every two seconds. So I had to use the log(n) search of map.
I feel that this can be done in a more efficient way. Suggestions?
This is a typical application for a quadtree. The quadtree facilitates lookup of the m points lying in your rectangle in O(log(n) + m), where n is the total number of points.
Edit: Your approach using the map is not nearly as efficient. For randomly distributed points, it would have an O(sqrt(n)) average complexity, and O(n) worst-case.
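For illustration, a very small point quadtree with a rectangle count might look like the following. It is a sketch under several assumptions: the names Pt, Box and QuadTree, the node capacity of 4, the depth limit of 16 (which guards against endless splitting on duplicate points), and half-open boxes are all choices made for this example. Points already stored in a node stay there when it splits.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Pt { double x, y; };

// Half-open axis-aligned box [x0, x1) x [y0, y1).
struct Box {
    double x0, y0, x1, y1;
    bool contains(const Pt& p) const {
        return p.x >= x0 && p.x < x1 && p.y >= y0 && p.y < y1;
    }
    bool intersects(const Box& o) const {
        return x0 < o.x1 && o.x0 < x1 && y0 < o.y1 && o.y0 < y1;
    }
};

class QuadTree {
public:
    explicit QuadTree(const Box& bounds, int depth = 0)
        : bounds_(bounds), depth_(depth) {}

    // Returns false if p lies outside this node's bounds.
    bool insert(const Pt& p) {
        if (!bounds_.contains(p)) return false;
        if (!child_[0]) {
            if (pts_.size() < kCapacity || depth_ >= kMaxDepth) {
                pts_.push_back(p);
                return true;
            }
            split();
        }
        for (auto& c : child_)
            if (c->insert(p)) return true;
        return false;   // not reached: the four children tile bounds_
    }

    // Number of stored points inside the query box.
    std::size_t count(const Box& query) const {
        if (!bounds_.intersects(query)) return 0;   // prune this subtree
        std::size_t n = 0;
        for (const Pt& p : pts_)
            if (query.contains(p)) ++n;
        if (child_[0])
            for (const auto& c : child_) n += c->count(query);
        return n;
    }

private:
    static constexpr std::size_t kCapacity = 4;
    static constexpr int kMaxDepth = 16;

    void split() {
        const double mx = (bounds_.x0 + bounds_.x1) / 2;
        const double my = (bounds_.y0 + bounds_.y1) / 2;
        const int d = depth_ + 1;
        child_[0] = std::make_unique<QuadTree>(Box{bounds_.x0, bounds_.y0, mx, my}, d);
        child_[1] = std::make_unique<QuadTree>(Box{mx, bounds_.y0, bounds_.x1, my}, d);
        child_[2] = std::make_unique<QuadTree>(Box{bounds_.x0, my, mx, bounds_.y1}, d);
        child_[3] = std::make_unique<QuadTree>(Box{mx, my, bounds_.x1, bounds_.y1}, d);
    }

    Box bounds_;
    int depth_;
    std::vector<Pt> pts_;
    std::unique_ptr<QuadTree> child_[4];
};
```

To answer the original question, the query box would be centred on the given point, and one would be subtracted from the count if that point itself is stored in the tree.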
How about you store the points as a simple 2 dimensional array of pointers to those structs, and when you need to find a point x,y it's a simple index operation. The same goes for any other points (x+a,y+b).
If you use a std::map of points the lookup will always be O(log N) where N is the number of points you have.
Your other option would be to divide your search space into buckets and put your point into buckets. You then calculate in your rectangle:
any buckets for which all the points are inside your rectangle
any for which there is some overlap.
For those with some overlap, you can then look up in your collection, which is O(M) if you use the right collection type per bucket, but M should be smaller than N. It may even be that M rarely exceeds a handful, in which case you can probably check them linearly.
Working out which buckets overlap is a constant time operation but you have to run through these linearly (even to check if they are empty) so having too many of them may also be an issue.
The first observation would be that std::map wouldn't be the most efficient structure in any case. Your input is pretty much fixed, apparently (from the comments). In that case, std::binary_search on a sorted std::vector is more efficient. The main benefit of std::map over a sorted std::vector is that insertion is O(log N) instead of O(N), and you don't need that.
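A sketch of the sorted-vector idea follows. Note that std::binary_search only reports whether a value is present, so this example uses std::lower_bound and std::upper_bound to obtain the range of x values instead. The names Pt and countInRect and the half-width/half-height parameters are assumptions, and the vector must already be sorted by x.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Count points inside the rectangle centred at (cx, cy).
// Precondition: sortedByX is sorted by ascending x.
std::size_t countInRect(const std::vector<Pt>& sortedByX,
                        double cx, double cy, double halfW, double halfH) {
    // First point with x >= cx - halfW.
    auto lo = std::lower_bound(sortedByX.begin(), sortedByX.end(), cx - halfW,
                               [](const Pt& p, double v) { return p.x < v; });
    // First point with x > cx + halfW.
    auto hi = std::upper_bound(lo, sortedByX.end(), cx + halfW,
                               [](double v, const Pt& p) { return v < p.x; });
    std::size_t n = 0;
    for (auto it = lo; it != hi; ++it)          // candidates: right x range
        if (std::fabs(it->y - cy) <= halfH) ++n; // keep those in the y range
    return n;
}
```

If the centre point is itself in the vector it is counted too, so subtract one to get the number of other points.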
The next observation would be that you can probably afford to be a bit inaccurate in the first phase. Your output set will probably be a lot smaller than the total number of points (else a linear search would be in order). Assuming that this is the case, you might benefit from rounding up your rectangle. This will result in more candidate points, which you then check against the precise boundary.
For instance, if your points are randomly distributed in the XY plane between (0,0) and (200,300), it would be possible to create a 20x30 matrix with each element holding a subarea of size (10,10). If you now need the points in the rectangle from (64,23) to (78,45), you need to check subareas [6,2], [6,3], [6,4], [7,2], [7,3] and [7,4], which is only 6 of the 600. In the second step, you'd throw out results such as (61,25).
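The two-phase lookup from this example can be sketched as below. The class name BucketGrid and its members are assumptions for illustration; the numbers in the usage note are the ones from the paragraph above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

class BucketGrid {
public:
    // Covers [0, width] x [0, height] with square buckets of side `cell`.
    BucketGrid(double width, double height, double cell)
        : cell_(cell),
          nx_(static_cast<int>(std::ceil(width / cell))),
          ny_(static_cast<int>(std::ceil(height / cell))),
          buckets_(static_cast<std::size_t>(nx_) * ny_) {
        assert(cell > 0 && nx_ > 0 && ny_ > 0);
    }

    // Bucket column/row of a coordinate, clamped to the grid.
    int column(double x) const { return clamp(static_cast<int>(x / cell_), nx_); }
    int row(double y) const { return clamp(static_cast<int>(y / cell_), ny_); }

    void insert(const Pt& p) {
        buckets_[index(column(p.x), row(p.y))].push_back(p);
    }

    // Count points with x0 <= x <= x1 and y0 <= y <= y1.
    std::size_t count(double x0, double y0, double x1, double y1) const {
        std::size_t n = 0;
        // Phase 1: visit only the buckets the rectangle overlaps.
        for (int ix = column(x0); ix <= column(x1); ++ix)
            for (int iy = row(y0); iy <= row(y1); ++iy)
                // Phase 2: check each candidate against the exact bounds.
                for (const Pt& p : buckets_[index(ix, iy)])
                    if (p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1) ++n;
        return n;
    }

private:
    static int clamp(int i, int n) { return std::max(0, std::min(i, n - 1)); }
    std::size_t index(int ix, int iy) const {
        return static_cast<std::size_t>(ix) * ny_ + iy;
    }

    double cell_;
    int nx_, ny_;
    std::vector<std::vector<Pt>> buckets_;
};
```

For a BucketGrid(200, 300, 10), the rectangle from (64,23) to (78,45) touches columns 6 to 7 and rows 2 to 4, which are the six subareas listed above. A point such as (61,25) sits in bucket [6,2] and is therefore a candidate, but the exact check in the second phase rejects it.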