Prim's algorithm for dynamic locations - c++

Suppose you have an input file:
<total vertices>
<x-coordinate 1st location><y-coordinate 1st location>
<x-coordinate 2nd location><y-coordinate 2nd location>
<x-coordinate 3rd location><y-coordinate 3rd location>
...
How can Prim's algorithm be used to find the MST for these locations? I understand this problem is typically solved using an adjacency matrix. Any references would be great if applicable.

If you already know prim, it is easy. Create adjacency matrix adj[i][j] = distance between location i and location j

I'm just going to describe some implementations of Prim's and hopefully that gets you somewhere.
First off, your question doesn't specify how edges are input to the program. You have a total number of vertices and the locations of those vertices. How do you know which ones are connected?
Assuming you have the edges (and the weights of those edges. Like #doomster said above, it may be the planar distance between the points since they are coordinates), we can start thinking about our implementation. Wikipedia describes three different data structures that result in three different run times: http://en.wikipedia.org/wiki/Prim's_algorithm#Time_complexity
The simplest is the adjacency matrix. As you might guess from the name, the matrix describes nodes that are "adjacent". To be precise, there are |v| rows and columns (where |v| is the number of vertices). The value at adjacencyMatrix[i][j] varies depending on the usage. In our case it's the weight of the edge (i.e. the distance) between node i and j (this means that you need to index the vertices in some way. For instance, you might add the vertices to a list and use their position in the list).
Now using this adjacency matrix our algorithm is as follows:
Create a dictionary which contains all of the vertices and is keyed by "distance". Initially the distance of all of the nodes is infinity.
Create another dictionary to keep track of "parents". We use this to generate the MST. It's more natural to keep track of edges, but it's actually easier to implement by keeping track of "parents". Note that if you root a tree (i.e. designate some node as the root), then every node (other than the root) has precisely one parent. So by producing this dictionary of parents we'll have our MST!
Create a new list with a randomly chosen node v from the original list.
Remove v from the distance dictionary and add it to the parent dictionary with a null as its parent (i.e. it's the "root").
Go through the row in the adjacency matrix for that node. For any node w that is connected (for non-connected nodes you have to set their adjacency matrix value to some special value. 0, -1, int max, etc.) update its "distance" in the dictionary to adjacencyMatrix[v][w]. The idea is that it's not "infinitely far away" anymore... we know we can get there from v.
While the dictionary is not empty (i.e. while there are nodes we still need to connect to)
Look over the dictionary and find the vertex with the smallest distance x
Add it to our new list of vertices
For each of its neighbors, update their distance to min(adjacencyMatrix[x][neighbor], distance[neighbor]) and also update their parent to x. Basically, if there is a faster way to get to neighbor then the distance dictionary should be updated to reflect that; and if we then add neighbor to the new list we know which edge we actually added (because the parent dictionary says that its parent was x).
We're done. Output the MST however you want (everything you need is contained in the parents dictionary)
I admit there is a bit of a leap from the wikipedia page to the actual implementation as outlined above. I think the best way to approach this gap is to just brute force the code. By that I mean, if the pseudocode says "find the min [blah] such that [foo] is true" then write whatever code you need to perform that, and stick it in a separate method. It'll definitely be inefficient, but it'll be a valid implementation. The issue with graph algorithms is that there are 30 ways to implement them and they are all very different in performance; the wikipedia page can only describe the algorithm conceptually. The good thing is that once you implement it some way, you can find optimizations quickly ("oh, if I keep track of this state in this separate data structure, I can make this lookup way faster!"). By the way, the runtime of this is O(|V|^2). I'm too lazy to detail that analysis, but loosely it's because:
All initialization is O(|V|) at worse
We do the loop O(|V|) times and take O(|V|) time to look over the dictionary to find the minimum node. So basically the total time to find the minimum node multiple times is O(|V|^2).
The time it takes to update the distance dictionary is O(|E|) because we only process each edge once. Since |E| is O(|V|^2) this is also O(|V|^2)
Keeping track of the parents is O(|V|)
Outputting the tree is O(|V| + |E|) = O(|E|) at worst
Adding all of these (none of them should be multiplied except within (2)) we get O(|V|^2)
The implementation with a heap is O(|E|log(|V|) and it's very very similar to the above. The only difference is that updating the distance is O(log|V|) instead of O(1) (because it's a heap), BUT finding/removing the min element is O(log|V|) instead of O(|V|) (because it's a heap). The time complexity is quite similar in analysis and you end up with something like O(|V|log|V| + |E|log|V|) = O(|E|log|V|) as desired.
Actually... I'm a bit confused why the adjacency matrix implementation cares about it being an adjacency matrix. It could just as well be implemented using an adjacency list. I think the key part is how you store the distances. I could be way off in my implementation outlined above, but I am pretty sure it implements Prim's algorithm is satisfies the time complexity constraints outlined by wikipedia.

Related

Finding the best algorithm for nearest neighbor search in a 2D plane with moving points

I am looking for an efficient way to perform nearest neighbor searches within a specified radius in a two-dimensional plane. According to Wikipedia, space-partitioning data structures, such as :
k-d trees,
r-trees,
octrees,
quadtrees,
cover trees,
metric trees,
BBD trees
locality-sensitive hashing,
and bins,
are often used for organizing points in a multi-dimensional space and can provide O(log n) performance for search and insert operations. However, in my case, the points in the two-dimensional plane are moving at each iteration, so I need to update the tree accordingly. Rebuilding the tree from scratch at each iteration seems easier, but I would like to avoid it if possible because the points only move slightly between iterations.
I have read that k-d trees are not naturally balanced, which could be an issue in my case. R-trees, on the other hand, are better suited for storing rectangles. Bin algorithms, on the other hand, are easy to implement and provide near-linear search performance within local bins.
I am working on an autonomous agent simulation where 1,000,000 agents are rendered in the GPU, and the CPU is responsible for computing the next movement of each agent. Each agent is influenced by other agents within its line of sight, or in other words, other agents within a circular sector of angle θ and radius r. So here specific requirements for my use case:
Search space is a 2-d plane,
Each object is a point identified with the x,y coordinate.
All points are frequently updated by a small factor.
Cannot afford any O(n^2) algorithms.
Search within a radius (circular sector)
Search for all candidates within the search surface.
Given these considerations, what would be the best algorithms for my use case?
I think you could potentially solve this by doing a sort of scheduling approach. If you know that no object will move more than d distance in each iteration, and you want to know which objects are within X distance of each other on each iteration, then given the distances between all objects you know that on the next iteration the only potential pairs of objects that would change their neighbor status would be those with a distance between X-d and X+d. The iteration after that it would be X-2d and X+2d and so on.
So I'm thinking that you could do an initial distance calculation between all pairs of objects, and then based on each difference you can create an NxN matrix where the value in each cell is which iteration you will need to re-check their distance. Then when you re-check those during that iteration, you would update their values in this matrix for the next iteration that they need to be checked.
The only problem is whether calculating an initial NxN distance matrix is feasible.

Path finding on a large list of nodes? Around 100,000 nodes

I have a list of nodes as 2D coordinate (array of float) and the goal is to find how many nodes are linked to the source node(given).
Two nodes are defined as linked, if the distance between the nodes is less than or equal to 10. Also, if distance between A and B is <= 10, distance between B and C is <= 10 and distance between A and C > 10, even then, A and C are linked as then path would be is A->B->C. So, it is a typical graph search problem in theory.
Here is the problem. I have around 100,000 nodes in a list. Each node is a 2D coordinate point. Since, the list is enormous, conventional traversal and path finding algorithms like DFS or BFS would take up a O(n^2) to construct the adjacency list, which is not ideal and not what I am looking for.
I researched on the internet and found out that Quad Tree or kd Tree probably might be the best to implement in this case. I have made my own Quad Tree class also, I just don't understand how to implement a search algorithm like DFS on it. Or if there is something else that I am missing out on?
A quadtree groups points by splitting 2D space into quarters, either until each point has a quadrant to itself, or until you reach a minimum size, after which you lump all points within the quadrant into a list.
Since you're trying to find all points within a maximum distance of each point in your source list, you don't need to go all the way down to one-point-per-cell. To pick a cutoff, I would do performance tests on some different values, but as a starting point for minimum quadrant size the maximum connection distance for points is probably a good guess.
So now you have all of your points grouped into a tree and you need to know how to actually find nearby ones.
Since the quadtree encodes spatial information, to find points within a certain distance of any given point, you would descend the quadtree and use that spatial information to exclude entire quadrants from your search. To do this, you would check whether the nearest bound of each quadrant is beyond the maximum distance from the point you are searching from. If the closest edge of that quadrant is beyond the maximum distance, then none of the points in that quadrant can possibly be within the maximum distance, so there is no need to explore that part of the tree. (This is similar to how a binary search doesn't need to search parts of a sorted array or tree, because it knows that those parts cannot possibly contain the value being searched for).
Once you get down to the level of the quadtree where you have a single point or list of points, you would do a regular euclidean distance check with those points to see if they were actually within the maximum distance. (Don't forget to check for equality, otherwise you'll find the same point you're searching around).
So, for example, if you were searching for points near one of the points in the bottom-right corner of this image, there would be no need to search the other three top-level quadrants because all three of them would be beyond the maximum distance. This would save you from exploring all of the sub-quadrants in those parts of the tree and avoid doing distance comparisons against all of those points.
If, however, you are searching for a point near the edge of a quadrant, you do need to check neighboring quadrants, because the nearest bound will be close enough that you cannot exclude the possibility of a valid point being in that quadrant.
In your particular case, you would make use of this by building the quadtree once, and then looping over the original list of points and doing the search I described above to find all points near that point. You would then use the found-points to build a connectivity graph, which could be efficiently traversed by Depth/Breadth-First-Search or could be given edge-weights to be used with a more complex, weighted search like Dijkstra's Algorithm or A*.

Efficiently search among pairs of adjacent elements in a `set`

I'm currently working on a problem where I want to maintain the convex hull of a set of linear functions. It might look something like this:
I'm using a set<Line> to maintain the lines so that I can dynamically insert lines, which works fine. The lines are ordered by increasing slope, which is defined by the operator< of the lines. By throwing out "superseded" lines, the data structure guarantees that every line will have some segment that is a part of the convex hull.
Now the problem is that I want to search in this data structure for the crossing point whose X coordinate precedes a given x. Since those crossing points are only implicitely defined by adjacency in the set (in the image above, those are the points N, Q etc.), it seems to be entirely impossible to solve with the set alone, since I don't have
The option to find an element by anything but the primary compare function
The option to "binary search" in the underlying search tree myself, that is, compute the pre-order predecessor or successor of an iterator
The option to access elements by index efficiently
I am thus inclined to use a second set<pair<set<Line>::iterator, set<Line>::iterator> > >, but this seems incredibly hacky. Seeing as we mainly need this for programming contests, I want to minimize code size, so I want to avoid a second set or a custom BBST data structure.
Is there a good way to model this scenario which still let's me maintain the lines dynamically and binary search by the value of a function on adjacent elements, with a reasonable amount of code?

Given an even number of vertices, how to find an optimum set of pairs based on proximity?

The problem:
We have a set of n vertices in 3D euclidean space, and there is an even number of these vertices.
We want to pair them up based on their proximity. In other words, we'd like to be able to find a set of vertex pairs, where the vertices in each pair are as close as possible together.
We want to minimise sacrificing the proximity between the vertices of any other pairs as much as possible in doing this.
I am not looking for the most optimal solution (if it even strictly exists/can be done), just a reasonable one that can be computed relatively quickly.
A relatively awful brute force approach involves choosing a vertex and looping through the rest to find its nearest neighbor and then repeating until there are none left. Of course as we near the end of the list the closest vertex could be very far away, but it is the only choice, therefore this can fail badly on the third point above.
A common approach for this kind of problems (especially if n is large) is to precompute a spatial index structure, such as a kd tree or an octtree and perform the search for nearest neighbors with the help of it. Through the nodes of the octtree, the available point are put into bins, so you can be sure they are mutually close. Also you minimize the number of comparisons.
A sketch of the implementation with an octtree: you need a Node class that stores its bounding box. A derived LeafNode class stores small number of points up to a maximum (e.g. k = 20), that are added with an insert function. A derived NonLeafNode class stores references to 8 subnodes (which may be both Leaf and NonLeafNodes).
The tree is represented by a root node, all insertions and queries start here. The tree is built up by starting with the first k points being inserted into a LeafNode. If the k+1st point is inserted, the bounding box is split into 8 sub boxes and the contained points are sorted into them. The current LeafNode is replaced by one NonLeafNode with 8 subnodes.
This is iterated until all points are in the tree.
For nearest neighbor searches, the tree is traversed starting from the root node by comparing with the bounding box. If the query point is within a node's bounding box, the traversal goes into that node. Note that if you found the nearest candidate, you also need to check with neighboring nodes in the octtree.
For a kdtree implementation check the wikipedia page, looks quite straigthforward.
Since you are not looking for an optimal solution, here's a heuristic you may consider.
For each point p compute two points: the nearest neighbour and the farthest neighbour that are closest and farthest to p respectively. Now let q be the point with the largest farthest neighbour (q is an extreme point in the input). Match q with its nearest neighbour, delete both of them and recursively compute the matching for the remaining points.
This is certainly NOT optimal, but it does seem to do reasonably well on small input sets. If you need an optimal solution you should read about the euclidean matching problem.

Finding edge in weighted graph

I have a graph with four nodes, each node represents a position and they are laid out like a two dimensional grid. Every node has a connection (an edge) to all (according to the position) adjacent nodes. Every edge also has a weight.
Here are the nodes represented by A,B,C,D and the weight of the edges is indicated by the numbers:
A 100 B
120 220
C 150 D
I want to structure a container and an algorithm that switches the nodes sharing the edge with the highest weight. Then reset the weight of that edge. No node (position) can be switched more than once each time the algorithm is executed.
For example, processing the above, the highest weight is on edge BD, so we switch those. Since no node can be switched more than once, all edges involved in either B or D is reset.
A D
120
C B
Then, the next highest weight is on the only edge left, switching those would give us the final layout: C,D,A,B.
I'm currently running a quite awful implementation of this. I store a long list of edges, holding four values for the nodes they are (potentially) connected to, a value for its weight and the position for the node itself. Every time anything is requested, I loop through the entire list.
I'm writing this in C++, could some parts of the STL help speed this up? Also, how to avoid the duplication of data? A node position is currently in five objects. The node itself that is there and the four nodes indicating a connection to it.
In short, I want help with:
Can this be structured in a way so that there is no data duplication?
Recognise the problem? If any of this has a name, tell me so I can google for more info on the subject.
Fast algorithms are always nice.
As for names, this is a vertex cover problem. Optimal vertex cover is NP-hard with decent approximation solutions, but your problem is simpler. You're looking at a pseudo-maximum under a tighter edge selection criterion. Specifically, once an edge is selected every connected edge is removed (representing the removal of vertices to be swapped).
For example, here's a standard greedy approach:
0) sort the edges; retain adjacency information
while edges remain:
1) select the highest edge
2) remove all adjacent edges from the list
endwhile
The list of edges selected gives you the vertices to swap.
Time complexity is O(Sorting vertices + linear pass over vertices), which in general will boil down to O(sorting vertices), which will likely by O(V*log(V)).
The method of retaining adjacency information depends on the graph properties; see your friendly local algorithms text. Feel free to start with an adjacency matrix for simplicity.
As with the adjacency information, most other speed improvements will apply best to graphs of a certain shape but come with a tradeoff of time versus space complexity.
For example, your problem statement seems to imply that the vertices are laid out in a square pattern, from which we could derive many interesting properties. For example, that system is very easily parallelized. Also, the adjacency information would be highly regular but sparse at large graph sizes (most vertices wouldn't be connected to each other). This makes the adjacency matrix give a high overhead; you could instead store adjacency in an array of 4-tuples as it would retain fast access but almost entirely eliminate overhead.
If you have bigger graphs look into the boost graph library. It gives you good data structures for graphs and basic iterators for different types of graph traversing