Finding edge in weighted graph

I have a graph with four nodes, each node represents a position and they are laid out like a two dimensional grid. Every node has a connection (an edge) to all (according to the position) adjacent nodes. Every edge also has a weight.
Here are the nodes represented by A,B,C,D and the weight of the edges is indicated by the numbers:
A 100 B
120 220
C 150 D
I want to structure a container and an algorithm that switches the nodes sharing the edge with the highest weight. Then reset the weight of that edge. No node (position) can be switched more than once each time the algorithm is executed.
For example, processing the above, the highest weight is on edge BD, so we switch those. Since no node can be switched more than once, all edges involving either B or D are reset.
A D
120
C B
Then, the next highest weight is on the only edge left, switching those would give us the final layout: C,D,A,B.
I'm currently running a quite awful implementation of this. I store a long list of edges, holding four values for the nodes they are (potentially) connected to, a value for its weight and the position for the node itself. Every time anything is requested, I loop through the entire list.
I'm writing this in C++, could some parts of the STL help speed this up? Also, how to avoid the duplication of data? A node position is currently in five objects. The node itself that is there and the four nodes indicating a connection to it.
In short, I want help with:
Can this be structured in a way so that there is no data duplication?
Recognise the problem? If any of this has a name, tell me so I can google for more info on the subject.
Fast algorithms are always nice.

As for names, the edges you select form a matching (no two of them share a vertex), which is closely related to the vertex cover problem. Optimal vertex cover is NP-hard with decent approximation solutions, but your problem is simpler. You're looking at a pseudo-maximum under a tighter edge selection criterion. Specifically, once an edge is selected every connected edge is removed (representing the removal of vertices to be swapped).
For example, here's a standard greedy approach:
0) sort the edges; retain adjacency information
while edges remain:
1) select the highest edge
2) remove all adjacent edges from the list
endwhile
The list of edges selected gives you the vertices to swap.
Time complexity is O(sorting edges + linear pass over edges), which in general boils down to the sort, so O(E*log(E)). On a grid the number of edges is proportional to the number of vertices, so this is O(V*log(V)).
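The greedy steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not a definitive implementation: the `Edge` struct and the `select_swaps` name are made up for illustration, and nodes are assumed to be identified by indices so that edges never copy node data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical edge record: endpoints are node indices, not copies of node data.
struct Edge {
    std::size_t u, v;
    int weight;
};

// Greedy selection: walk the edges from heaviest to lightest and keep an edge
// only if neither endpoint has been used yet. Returns the kept edges in pick order.
std::vector<Edge> select_swaps(std::vector<Edge> edges, std::size_t node_count) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight > b.weight; });
    std::vector<bool> used(node_count, false);
    std::vector<Edge> picked;
    for (const Edge& e : edges) {
        assert(e.u < node_count && e.v < node_count);
        if (used[e.u] || used[e.v]) continue;  // an adjacent edge was already selected
        used[e.u] = true;
        used[e.v] = true;
        picked.push_back(e);
    }
    return picked;
}
```

Marking endpoints as used replaces the explicit "remove all adjacent edges" step: an edge touching a used node is simply skipped when the loop reaches it. With A, B, C, D numbered 0 to 3, the example from the question selects BD first and then AC.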
The method of retaining adjacency information depends on the graph properties; see your friendly local algorithms text. Feel free to start with an adjacency matrix for simplicity.
As with the adjacency information, most other speed improvements will apply best to graphs of a certain shape but come with a tradeoff of time versus space complexity.
For example, your problem statement seems to imply that the vertices are laid out in a square pattern, from which we could derive many interesting properties. For example, that system is very easily parallelized. Also, the adjacency information would be highly regular but sparse at large graph sizes (most vertices wouldn't be connected to each other). This makes the adjacency matrix give a high overhead; you could instead store adjacency in an array of 4-tuples as it would retain fast access but almost entirely eliminate overhead.
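One possible shape for the "array of 4-tuples" idea, assuming the nodes sit in a rows x cols grid stored in row-major order. The names `grid_adjacency`, `Direction` and `kNone` are illustrative, and the sentinel for "no neighbour" is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel meaning "no neighbour in this direction" (an assumption of this sketch).
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Direction indices into each 4-tuple.
enum Direction { Up = 0, Right = 1, Down = 2, Left = 3 };

// Build the neighbour indices for a rows x cols grid stored in row-major order.
std::vector<std::array<std::size_t, 4>> grid_adjacency(std::size_t rows, std::size_t cols) {
    assert(rows > 0 && cols > 0);
    std::vector<std::array<std::size_t, 4>> adj(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const std::size_t i = r * cols + c;
            adj[i][Up]    = (r > 0)        ? i - cols : kNone;
            adj[i][Right] = (c + 1 < cols) ? i + 1    : kNone;
            adj[i][Down]  = (r + 1 < rows) ? i + cols : kNone;
            adj[i][Left]  = (c > 0)        ? i - 1    : kNone;
        }
    }
    return adj;
}
```

Because each entry holds only an index, a node's position lives in exactly one place, which also addresses the data duplication mentioned in the question.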

If you have bigger graphs, look into the Boost Graph Library. It gives you good data structures for graphs and basic iterators for different types of graph traversal.

Related

Finding the best algorithm for nearest neighbor search in a 2D plane with moving points

I am looking for an efficient way to perform nearest neighbor searches within a specified radius in a two-dimensional plane. According to Wikipedia, space-partitioning data structures such as:
k-d trees,
r-trees,
octrees,
quadtrees,
cover trees,
metric trees,
BBD trees,
locality-sensitive hashing,
and bins,
are often used for organizing points in a multi-dimensional space and can provide O(log n) performance for search and insert operations. However, in my case, the points in the two-dimensional plane are moving at each iteration, so I need to update the tree accordingly. Rebuilding the tree from scratch at each iteration seems easier, but I would like to avoid it if possible because the points only move slightly between iterations.
I have read that k-d trees are not naturally balanced, which could be an issue in my case. R-trees are better suited for storing rectangles. Bin algorithms, by contrast, are easy to implement and provide near-linear search performance within local bins.
I am working on an autonomous agent simulation where 1,000,000 agents are rendered on the GPU, and the CPU is responsible for computing the next movement of each agent. Each agent is influenced by other agents within its line of sight, or in other words, other agents within a circular sector of angle θ and radius r. Here are the specific requirements for my use case:
Search space is a 2-d plane,
Each object is a point identified with the x,y coordinate.
All points are frequently updated by a small factor.
Cannot afford any O(n^2) algorithms.
Search within a radius (circular sector)
Search for all candidates within the search surface.
Given these considerations, what would be the best algorithms for my use case?

I think you could potentially solve this with a sort of scheduling approach. Suppose no object moves more than d in each iteration, and you want to know which objects are within X distance of each other on each iteration. Since both objects in a pair can move, the distance between them changes by at most 2d per iteration. So, given the distances between all objects, the only pairs that could change their neighbor status on the next iteration are those with a distance between X-2d and X+2d. The iteration after that it would be X-4d and X+4d, and so on.
So I'm thinking that you could do an initial distance calculation between all pairs of objects, and then based on each distance create an NxN matrix where the value in each cell is the iteration at which you will need to re-check that pair's distance. When you re-check a pair during that iteration, you update its cell with the next iteration at which it needs to be checked.
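The value stored in each cell could be computed along these lines. This is only a sketch under stated assumptions: `radius` stands for X and `max_step` for d, the function name is made up, and the pair's distance is assumed to change by at most 2 * max_step per iteration because both objects may move by up to max_step.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of iterations after which a pair currently `dist` apart has to be
// re-checked against the neighbour radius `radius`. Each object moves at most
// `max_step` per iteration, so the pair's distance changes by at most
// 2 * max_step per iteration. The result is never less than 1.
long iterations_until_recheck(double dist, double radius, double max_step) {
    assert(max_step > 0.0);
    const double gap = std::fabs(dist - radius);
    const long k = static_cast<long>(std::ceil(gap / (2.0 * max_step)));
    return std::max(1L, k);
}
```

A pair that is far from the threshold in either direction gets a large value and can be skipped for many iterations, while a pair close to the threshold is re-checked on the very next iteration.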
The only problem is whether calculating an initial NxN distance matrix is feasible.

Boost Graph find neighbours of a group of vertices

I have an undirected_unweighted_graph graph; which is defined as follows:
typedef typename boost::adjacency_list<boost::vecS,boost::vecS,boost::undirectedS,boost::no_property,boost::no_property> undirected_unweighted_graph;
It has several vertices which are interconnected by undirected edges.
During my algorithm, I'm searching for a connected subgraph of graph which only contains some of the vertices, which has certain properties.
I'm using a linear optimization software package which provides me with possible optimal solutions for my problem. A solution consists of a set of vertices with a fixed size n and might be infeasible (i.e. the vertices are not connected in the corresponding subgraph of graph). I'm currently generating a new graph with the vertices of the solution and adding the edges which are also present in graph. I'm using boost::connected_components() to calculate the connected components for it.
Now I come to my question:
The next step for me is to improve the performance of generating a solution by imposing a constraint. Specifically, I will "grow" a solution, starting from a single node and ending with a subgraph of n nodes. At each stage, a partial solution will grow by adding one of its neighbors. (The idea is that if a partial solution can grow to a full solution, then at least one of its neighbors will be in the full solution.) How can I identify these neighbors?
My approach is the following:
I'm iterating over each component and then iterate over boost::out_edges(v, g). I then have to check if the neighbor is part of my component or not. If it is not part of the component I add it to the component neighbor group. I wonder if there is any way in boost to iterate over boost::out_edges(V, g) for a list of vertices V.
EDIT
To be more concrete: Given a graph, I am able to iterate over the neighbors of a given vertex like this:
for (auto edge: boost::make_iterator_range(boost::out_edges(v, graph))) {
//do stuff
}
What if I have a connected component, say a vector of vertices std::vector<size_t> component? What I want are the outgoing edges of the component, meaning all outgoing edges of its vertices excluding those between two vertices of component. Is there an elegant way to get those edges efficiently?

I would not iterate over multiple vertices. Instead, I would maintain two sets of vertices – one containing the vertices in the current partial solution, and one containing the vertices adjacent to (and not in) the partial solution. When the linear optimization package adds a vertex to the partial solution, that vertex should also be moved from the set of adjacencies to the set of vertices in the solution. Next the edges coming from the new vertex need to be iterated over, but only those from the new vertex. For each vertex adjacent to the new one, if it is not in the partial solution then add it to the set of adjacent vertices.
I would also try something similar using just one set containing both the vertices in the partial solution and those adjacent to the partial solution. Less overhead. Depending on what the surrounding code expects, this set might work as well as the one with just the adjacent vertices.
The advantage of this approach is that you eliminate repetitive work. If you already looked at all the neighbors of vertex A, why should you need to look at them again just because vertex B was added to your set?
A disadvantage of this approach is that you might require significant memory overhead if you need to backtrack at times (think depth-first search and maintaining a stack of these sets). How bad this is depends on how big n is and, on average, how many edges connect to each vertex. Even in a bad case, some clever elimination of redundancy might salvage this approach, but I'll leave that for later.
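A minimal sketch of the two-set bookkeeping described above. To keep it self-contained it uses plain adjacency lists in place of the Boost graph, which is an assumption of this example; with BGL the inner loop would instead iterate over boost::adjacent_vertices(v, graph). The `PartialSolution` struct and `add_vertex` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Plain adjacency lists stand in for the Boost graph here (an assumption made
// to keep the sketch self-contained).
using Adjacency = std::vector<std::vector<std::size_t>>;

struct PartialSolution {
    std::set<std::size_t> members;   // vertices in the partial solution
    std::set<std::size_t> frontier;  // vertices adjacent to it, but not in it
};

// Add vertex v to the partial solution and update the frontier by looking
// only at v's own neighbours.
void add_vertex(PartialSolution& s, const Adjacency& adj, std::size_t v) {
    assert(v < adj.size());
    s.frontier.erase(v);  // v is no longer merely adjacent
    s.members.insert(v);
    for (std::size_t w : adj[v]) {
        if (s.members.count(w) == 0) s.frontier.insert(w);
    }
}
```

Each call costs time proportional to the degree of the new vertex (times the set operations), regardless of how large the partial solution already is.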

Boost Graph : Test if two vertices are adjacent

I'm new to the C++ Boost libraries, in particular the Boost Graph Library, which I need for coding some algorithms where I frequently check whether two vertices are adjacent and deal with other graph concepts such as computing graph invariants.
What I know is that we can iterate through adjacent vertices with the function adjacent_vertices(u, g), but I'm searching for an efficient way to test whether two vertices u, v are adjacent without doing a linear search.

The AdjacencyMatrix concept gives a complexity guarantee that the edge() function must return in constant time.
To check if two vertices v and w are adjacent in G, you write edge(v, w, G).second, since the function returns a pair where the second value indicates if the edge exists.
The edge() function is implemented for other graph representations as well. Here is a graph that shows how different representations compare with regard to performance of checking vertex adjacency:
Here is the code used to generate the data for this plot. Each data point is 100 random graphs of medium density, with 100 random edge checks per graph. Note the logarithmic y axis.
The best choice will ultimately depend on your particular application, because for other operations the ordering of structures by speed is different. In other words, avoid premature optimization.

BGL is a highly generic library. You can adapt most any datastructure for use with its algorithms.
You can vary the edge container. You don't mention it, but I'm assuming you've been looking at the interface/complexity guarantees for boost::adjacency_list.
Indeed the edge membership test will be O(n) even if you use setS for the edge container selector. This is mostly because adjacency lists store outgoing edges per vertex. So in the worst case, each vertex contains at most one outgoing edge and the search is practically O(n) [1].
In this case you simply want to select another graph implementation.
The documentation page on Graph Concepts is a good starting point to find out which concepts are expected, as well as which models supply those concepts.
In the worst case you can adapt your data structure for use with Boost Graph algorithms. E.g. you could store all edges in a simple std::[unordered_]set<std::pair<VID, VID> > and adapt it to model the EdgeListGraph concept.
That way you will have performant lookups.
[1] of course this also means, in best case the search is whatever your set implementation affords: O(log n) because all edges could originate from the same vertex...
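The lookup side of that idea might look like the sketch below, for an undirected graph. It shows only the edge store and the adjacency test; adapting it to model EdgeListGraph is not shown. The `EdgeSet` class is a hypothetical name, and `VID` is assumed here to be an unsigned integer vertex id.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

using VID = std::size_t;  // vertex id type; assumed to be an integer index here

// Undirected edge store: each edge is kept once, with the smaller id first,
// so (u, v) and (v, u) map to the same key.
class EdgeSet {
public:
    void add(VID u, VID v) { edges_.insert(key(u, v)); }

    // O(log E) lookup, independent of the degree of u or v.
    bool adjacent(VID u, VID v) const { return edges_.count(key(u, v)) > 0; }

private:
    static std::pair<VID, VID> key(VID u, VID v) {
        return u < v ? std::make_pair(u, v) : std::make_pair(v, u);
    }
    std::set<std::pair<VID, VID>> edges_;
};
```

Swapping std::set for std::unordered_set would need a hash for the pair type, which the standard library does not provide out of the box.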

Prim's algorithm for dynamic locations

Suppose you have an input file:
<total vertices>
<x-coordinate 1st location><y-coordinate 1st location>
<x-coordinate 2nd location><y-coordinate 2nd location>
<x-coordinate 3rd location><y-coordinate 3rd location>
...
How can Prim's algorithm be used to find the MST for these locations? I understand this problem is typically solved using an adjacency matrix. Any references would be great if applicable.

If you already know Prim's algorithm, it is easy. Create an adjacency matrix where adj[i][j] = distance between location i and location j.

I'm just going to describe some implementations of Prim's and hopefully that gets you somewhere.
First off, your question doesn't specify how edges are input to the program. You have a total number of vertices and the locations of those vertices. How do you know which ones are connected?
Assuming you have the edges (and the weights of those edges. Like #doomster said above, it may be the planar distance between the points since they are coordinates), we can start thinking about our implementation. Wikipedia describes three different data structures that result in three different run times: http://en.wikipedia.org/wiki/Prim's_algorithm#Time_complexity
The simplest is the adjacency matrix. As you might guess from the name, the matrix describes nodes that are "adjacent". To be precise, there are |v| rows and columns (where |v| is the number of vertices). The value at adjacencyMatrix[i][j] varies depending on the usage. In our case it's the weight of the edge (i.e. the distance) between node i and j (this means that you need to index the vertices in some way. For instance, you might add the vertices to a list and use their position in the list).
Now using this adjacency matrix our algorithm is as follows:
Create a dictionary which contains all of the vertices and is keyed by "distance". Initially the distance of all of the nodes is infinity.
Create another dictionary to keep track of "parents". We use this to generate the MST. It's more natural to keep track of edges, but it's actually easier to implement by keeping track of "parents". Note that if you root a tree (i.e. designate some node as the root), then every node (other than the root) has precisely one parent. So by producing this dictionary of parents we'll have our MST!
Create a new list with a randomly chosen node v from the original list.
Remove v from the distance dictionary and add it to the parent dictionary with a null as its parent (i.e. it's the "root").
Go through the row in the adjacency matrix for that node. For any node w that is connected (for non-connected nodes you have to set their adjacency matrix value to some special value. 0, -1, int max, etc.) update its "distance" in the dictionary to adjacencyMatrix[v][w]. The idea is that it's not "infinitely far away" anymore... we know we can get there from v.
While the dictionary is not empty (i.e. while there are nodes we still need to connect to)
Look over the dictionary and find the vertex with the smallest distance x
Add it to our new list of vertices
For each of its neighbors that is still in the dictionary, compare adjacencyMatrix[x][neighbor] with distance[neighbor]; if the matrix entry is smaller, set distance[neighbor] to it and set the neighbor's parent to x. Basically, if there is a faster way to get to neighbor then the distance dictionary should be updated to reflect that; and if we then add neighbor to the new list we know which edge we actually added (because the parent dictionary says that its parent was x).
We're done. Output the MST however you want (everything you need is contained in the parents dictionary)
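The steps above could be sketched in C++ as follows. This is one possible reading of the description, not a reference implementation: plain vectors stand in for the two dictionaries, the start vertex is fixed at 0 rather than chosen randomly, and `kNoEdge` / `kNoParent` are sentinel values made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Sentinels chosen for this sketch: "not connected" and "no parent (root)".
constexpr double kNoEdge = std::numeric_limits<double>::infinity();
constexpr std::size_t kNoParent = static_cast<std::size_t>(-1);

// Prim's algorithm on a symmetric adjacency matrix, O(|V|^2).
// adj[i][j] is the weight of edge (i, j), or kNoEdge if there is none.
// Returns parent[v] for every vertex; vertex 0 is the root.
std::vector<std::size_t> prim_parents(const std::vector<std::vector<double>>& adj) {
    const std::size_t n = adj.size();
    assert(n > 0);
    std::vector<double> dist(n, kNoEdge);           // the "distance" dictionary
    std::vector<std::size_t> parent(n, kNoParent);  // the "parents" dictionary
    std::vector<bool> in_tree(n, false);
    dist[0] = 0.0;  // start from vertex 0 instead of a random vertex
    for (std::size_t step = 0; step < n; ++step) {
        // Find the closest vertex that is not in the tree yet.
        std::size_t x = kNoParent;
        for (std::size_t v = 0; v < n; ++v) {
            if (!in_tree[v] && (x == kNoParent || dist[v] < dist[x])) x = v;
        }
        if (dist[x] == kNoEdge) break;  // the remaining vertices are unreachable
        in_tree[x] = true;
        // Relax the neighbours of x that are still outside the tree.
        for (std::size_t w = 0; w < n; ++w) {
            if (!in_tree[w] && adj[x][w] < dist[w]) {
                dist[w] = adj[x][w];
                parent[w] = x;
            }
        }
    }
    return parent;
}
```

For the input file in the question, adj[i][j] would be filled with the Euclidean distance between locations i and j, and the MST edges are the pairs (v, parent[v]) for every v other than the root.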
I admit there is a bit of a leap from the wikipedia page to the actual implementation as outlined above. I think the best way to approach this gap is to just brute force the code. By that I mean, if the pseudocode says "find the min [blah] such that [foo] is true" then write whatever code you need to perform that, and stick it in a separate method. It'll definitely be inefficient, but it'll be a valid implementation. The issue with graph algorithms is that there are 30 ways to implement them and they are all very different in performance; the wikipedia page can only describe the algorithm conceptually. The good thing is that once you implement it some way, you can find optimizations quickly ("oh, if I keep track of this state in this separate data structure, I can make this lookup way faster!"). By the way, the runtime of this is O(|V|^2). I'm too lazy to detail that analysis, but loosely it's because:
1) All initialization is O(|V|) at worst
2) We do the loop O(|V|) times and take O(|V|) time to look over the dictionary to find the minimum node. So basically the total time to find the minimum node multiple times is O(|V|^2).
3) The time it takes to update the distance dictionary is O(|E|) because we only process each edge once. Since |E| is O(|V|^2) this is also O(|V|^2)
4) Keeping track of the parents is O(|V|)
5) Outputting the tree is O(|V| + |E|) = O(|E|) at worst
Adding all of these (none of them should be multiplied except within (2)) we get O(|V|^2)
The implementation with a heap is O(|E|log|V|) and it's very similar to the above. The only difference is that updating the distance is O(log|V|) instead of O(1) (because it's a heap), BUT finding/removing the min element is O(log|V|) instead of O(|V|) (because it's a heap). The time complexity is quite similar in analysis and you end up with something like O(|V|log|V| + |E|log|V|) = O(|E|log|V|) as desired.
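A heap-based variant might look like this sketch. Note that std::priority_queue has no decrease-key operation, so this version swaps in a common substitute: it pushes a new entry on every improvement and skips stale entries when they are popped. That gives O(|E|log|E|), which is the same as O(|E|log|V|) because |E| is at most |V|^2. The names are illustrative, and for brevity it returns only the total MST weight.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Adjacency list: adj[u] holds (neighbour, weight) pairs.
using WeightedAdjacency = std::vector<std::vector<std::pair<std::size_t, double>>>;

// Total weight of the minimum spanning tree of the component containing vertex 0.
double prim_heap_weight(const WeightedAdjacency& adj) {
    assert(!adj.empty());
    using Item = std::pair<double, std::size_t>;  // (edge weight, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;  // min-heap
    std::vector<bool> in_tree(adj.size(), false);
    double total = 0.0;
    heap.push(Item(0.0, 0));
    while (!heap.empty()) {
        const Item top = heap.top();
        heap.pop();
        const std::size_t u = top.second;
        if (in_tree[u]) continue;  // stale entry: u was already connected more cheaply
        in_tree[u] = true;
        total += top.first;
        for (const auto& nw : adj[u]) {
            if (!in_tree[nw.first]) heap.push(Item(nw.second, nw.first));
        }
    }
    return total;
}
```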
Actually... I'm a bit confused why the adjacency matrix implementation cares about it being an adjacency matrix. It could just as well be implemented using an adjacency list. I think the key part is how you store the distances. I could be way off in my implementation outlined above, but I am pretty sure it implements Prim's algorithm and satisfies the time complexity constraints outlined by Wikipedia.

Given an even number of vertices, how to find an optimum set of pairs based on proximity?

The problem:
We have a set of n vertices in 3D euclidean space, and there is an even number of these vertices.
We want to pair them up based on their proximity. In other words, we'd like to be able to find a set of vertex pairs, where the vertices in each pair are as close as possible together.
We want to minimise sacrificing the proximity between the vertices of any other pairs as much as possible in doing this.
I am not looking for the most optimal solution (if it even strictly exists/can be done), just a reasonable one that can be computed relatively quickly.
A relatively awful brute force approach involves choosing a vertex and looping through the rest to find its nearest neighbor and then repeating until there are none left. Of course as we near the end of the list the closest vertex could be very far away, but it is the only choice, therefore this can fail badly on the third point above.

A common approach for this kind of problem (especially if n is large) is to precompute a spatial index structure, such as a k-d tree or an octree, and perform the search for nearest neighbors with the help of it. Through the nodes of the octree, the available points are put into bins, so you can be sure they are mutually close. You also minimize the number of comparisons.
A sketch of the implementation with an octree: you need a Node class that stores its bounding box. A derived LeafNode class stores a small number of points up to a maximum (e.g. k = 20), which are added with an insert function. A derived NonLeafNode class stores references to 8 subnodes (each of which may be either a LeafNode or a NonLeafNode).
The tree is represented by a root node, all insertions and queries start here. The tree is built up by starting with the first k points being inserted into a LeafNode. If the k+1st point is inserted, the bounding box is split into 8 sub boxes and the contained points are sorted into them. The current LeafNode is replaced by one NonLeafNode with 8 subnodes.
This is iterated until all points are in the tree.
For nearest neighbor searches, the tree is traversed starting from the root node by comparing with the bounding box. If the query point is within a node's bounding box, the traversal goes into that node. Note that once you have found the nearest candidate, you also need to check the neighboring nodes in the octree.
For a k-d tree implementation check the Wikipedia page; it looks quite straightforward.

Since you are not looking for an optimal solution, here's a heuristic you may consider.
For each point p compute two points: its nearest neighbour and its farthest neighbour. Now let q be the point whose farthest neighbour is most distant (q is an extreme point in the input). Match q with its nearest neighbour, delete both of them and recursively compute the matching for the remaining points.
This is certainly NOT optimal, but it does seem to do reasonably well on small input sets. If you need an optimal solution you should read about the Euclidean matching problem.
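A naive sketch of this heuristic, written as a loop rather than a recursion. It recomputes every distance in each round, so it is O(n^3) and only meant to illustrate the idea on small inputs; the `Point` alias and the function names are made up, and squared distances are used since only comparisons matter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;

double squared_distance(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < 3; ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
    return s;
}

// Returns pairs of indices into `pts`. Requires an even number of points.
std::vector<std::pair<std::size_t, std::size_t>> pair_up(const std::vector<Point>& pts) {
    assert(pts.size() % 2 == 0);
    std::vector<std::size_t> alive(pts.size());  // indices of unmatched points
    for (std::size_t i = 0; i < alive.size(); ++i) alive[i] = i;
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    while (!alive.empty()) {
        // Find q, the remaining point whose farthest neighbour is most distant,
        // together with q's nearest neighbour (both as positions in `alive`).
        std::size_t q = 0, q_nearest = 0;
        double best_far = -1.0;
        for (std::size_t a = 0; a < alive.size(); ++a) {
            double far = -1.0, near = -1.0;
            std::size_t near_pos = a;
            for (std::size_t b = 0; b < alive.size(); ++b) {
                if (a == b) continue;
                const double d = squared_distance(pts[alive[a]], pts[alive[b]]);
                if (d > far) far = d;
                if (near < 0.0 || d < near) { near = d; near_pos = b; }
            }
            if (far > best_far) { best_far = far; q = a; q_nearest = near_pos; }
        }
        pairs.emplace_back(alive[q], alive[q_nearest]);
        // Erase the higher position first so the lower one stays valid.
        const std::size_t hi = q > q_nearest ? q : q_nearest;
        const std::size_t lo = q > q_nearest ? q_nearest : q;
        alive.erase(alive.begin() + static_cast<std::ptrdiff_t>(hi));
        alive.erase(alive.begin() + static_cast<std::ptrdiff_t>(lo));
    }
    return pairs;
}
```

Precomputing the distances once, or using a spatial index for the nearest-neighbour queries, would be the obvious next step for larger inputs.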
