I have an undirected_unweighted_graph graph, which is defined as follows:
typedef typename boost::adjacency_list<boost::vecS,boost::vecS,boost::undirectedS,boost::no_property,boost::no_property> undirected_unweighted_graph;
It has several vertices which are interconnected by undirected edges.
During my algorithm, I'm searching for a connected subgraph of graph which only contains some of the vertices, which has certain properties.
I'm using a linear optimization software package which provides me with possible optimal solutions for my problem. A solution consists of a set of vertices with a fixed size n and might be infeasible (i.e. the vertices are not connected in the corresponding subgraph of graph). I'm currently generating a new graph with the vertices of the solution and adding the edges which are also present in graph. I'm using boost::connected_components() to calculate the connected components for it.
Now I come to my question:
The next step for me is to improve the performance of generating a solution by imposing a constraint. Specifically, I will "grow" a solution, starting from a single node and ending with a subgraph of n nodes. At each stage, a partial solution will grow by adding one of its neighbors. (The idea is that if a partial solution can grow to a full solution, then at least one of its neighbors will be in the full solution.) How can I identify these neighbors?
My approach is the following:
I iterate over each component, and for each of its vertices v I iterate over boost::out_edges(v, g). I then have to check whether each neighbor is part of my component. If it is not, I add it to the component's neighbor group. I wonder if there is any way in boost to iterate over boost::out_edges(V, g) for a list of vertices V.
EDIT
To be more concrete: Given a graph, I am able to iterate over the neighbors of a given vertex like this:
for (auto edge: boost::make_iterator_range(boost::out_edges(v, graph))) {
//do stuff
}
What if I have a connected component, say a vector of vertices std::vector<size_t> component. What I want are the outgoing edges of the component meaning all outgoing edges of the vertices excluding those which are between two vertices of component. Is there an elegant way to get those edges efficiently?
I would not iterate over multiple vertices. Instead, I would maintain two sets of vertices – one containing the vertices in the current partial solution, and one containing the vertices adjacent to (and not in) the partial solution. When the linear optimization package adds a vertex to the partial solution, that vertex should also be moved from the set of adjacencies to the set of vertices in the solution. Next the edges coming from the new vertex need to be iterated over, but only those from the new vertex. For each vertex adjacent to the new one, if it is not in the partial solution then add it to the set of adjacent vertices.
I would also try something similar using just one set containing both the vertices in the partial solution and those adjacent to the partial solution. Less overhead. Depending on what the surrounding code expects, this set might work as well as the one with just the adjacent vertices.
The advantage of this approach is that you eliminate repetitive work. If you already looked at all the neighbors of vertex A, why should you need to look at them again just because vertex B was added to your set?
A disadvantage of this approach is that you might require significant memory overhead if you need to backtrack at times (think depth-first search and maintaining a stack of these sets). How bad this is depends on how big n is and, on average, how many edges connect to each vertex. Even in a bad case, some clever elimination of redundancy might salvage this approach, but I'll leave that for later.
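The two-set bookkeeping described in this answer could be sketched roughly as follows. This is only an illustration under assumptions: the PartialSolution type and its member names are hypothetical, and the graph is represented as a plain adjacency list (adj[v] lists the neighbors of v) rather than a Boost graph. With the Boost graph from the question, the inner loop would instead walk boost::adjacent_vertices(v, graph).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: tracks a partial solution and the vertices adjacent to it.
struct PartialSolution {
    std::unordered_set<std::size_t> inside;    // vertices in the partial solution
    std::unordered_set<std::size_t> frontier;  // adjacent to the solution, not in it

    // Adds v to the solution. Only the edges of v are inspected, so the
    // neighbors of earlier vertices are never scanned again.
    // adj[v] is assumed to list the neighbors of v.
    void add(std::size_t v, const std::vector<std::vector<std::size_t>>& adj) {
        assert(v < adj.size());
        frontier.erase(v);  // v may have been a candidate neighbor before
        inside.insert(v);
        for (std::size_t w : adj[v]) {
            if (inside.count(w) == 0) {
                frontier.insert(w);  // no effect if w is already in the frontier
            }
        }
    }
};
```

After each call to add, frontier holds exactly the candidates for the next growth step. If backtracking is needed, one option is to copy the structure before each step, which is the memory overhead mentioned above.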
I am trying to solve a SSSP problem in a connected directed weighted cyclic graph with non-negative weights. The catch here is, this problem asks for the SSSP that uses at most k vertices.
I tried using a modified Dijkstra's algorithm to solve this problem, keeping a 3-tuple in my priority queue, i.e. (vertex weight, number of vertices in path to this vertex (inclusive), vertex index). My algorithm prevents nodes that are more than k vertices away from being pushed into the priority queue and thus being considered in the shortest path.
Somehow, my algorithm is getting the wrong answer. One reason is that if an initially smaller weighted edge leads to a non-valid path and an initially larger weighted edge leads to a valid path, then my algorithm (being greedy) will report that it cannot find a valid path to the destination.
Edit: Solution code redacted as it is not helpful.
I've found it hard to read your code, so maybe you're already doing this: give each vertex a collection of best paths (edit: actually each vertex stores only the previous step of each of the possible paths), storing the least expensive one for each number of visited vertices. Once a path goes over the maximum vertex count you can discard it, but you can't discard a more expensive path (in terms of total edge lengths) until you know that the cheaper paths will eventually reach the target in an acceptable number of vertices.
At the end you may have more than one complete path, and you just choose the least expensive one edge-wise, regardless of its number of vertices (you'd have already discarded it if there were too many).
(Your code would be easier to read if you create a class/struct for some of the things you're storing as pairs of pairs etc, then you can give the members clearer names than second.first etc. Even if you are OK with the current naming, the extra clarity may help you get some other answers if the above hasn't helped.)
Edit to answer: "How do I keep the more expensive path until I know that the cheaper path will lead to a solution? "
Your priority queue is nearly doing this already. It's not that each vertex (n) has a complete path stored, as I originally implied; currently you just store the best previous vertex (n-1) to use to get to vertex n, along with its cost and its vertex count. I'm saying that instead of storing that one best choice for vertex n-1, you store several options, e.g. the best route to A using 3 previous vertices is from vertex B and costs 12, and the best route using 4 is from vertex C and costs 10.
(In all the above best means best found so far in the search)
You only need to store the cheapest route for a given number of vertices. You keep a route if (but only if) it's better on either the cost or the vertex count.
In my example above you need to keep both, because the cheaper route to this vertex uses more previous vertices and so might result in too many vertices before reaching the target; it's not clear at that stage which path will be best in the end.
So you need to change your collection type, and your rule for discarding options.
You could for example use a std::map where previous vertices count is the key and total edge cost and previous vertex ID are stored in the value, or an array of total costs where index is the count.
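As a rough sketch of the "array of total costs where index is the count" idea, the search below keeps one best cost per (vertex, vertex count) pair instead of one per vertex. The function name, the Graph alias and the return convention (-1 for "no valid path") are assumptions made for this illustration; they do not come from the question's redacted code.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// adj[u] holds (v, weight) pairs for the directed edges u -> v; weights are non-negative.
using Graph = std::vector<std::vector<std::pair<int, long long>>>;

// Cheapest cost from src to dst over paths that use at most k vertices
// (both endpoints counted), or -1 if no such path exists.
long long shortest_with_at_most_k_vertices(const Graph& adj, int src, int dst, int k) {
    assert(k >= 1);
    const long long INF = std::numeric_limits<long long>::max();
    // best[v][c] = cheapest cost found so far to reach v using exactly c vertices.
    std::vector<std::vector<long long>> best(adj.size(), std::vector<long long>(k + 1, INF));
    using State = std::tuple<long long, int, int>;  // (cost, vertex count, vertex)
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    best[src][1] = 0;
    pq.push(State(0, 1, src));
    while (!pq.empty()) {
        long long cost;
        int count, u;
        std::tie(cost, count, u) = pq.top();
        pq.pop();
        if (cost > best[u][count]) continue;  // stale entry, a cheaper one was processed
        if (u == dst) return cost;            // states pop in cost order, so this is optimal
        if (count == k) continue;             // one more vertex would exceed the limit
        for (const auto& edge : adj[u]) {
            const int v = edge.first;
            const long long next_cost = cost + edge.second;
            if (next_cost < best[v][count + 1]) {
                best[v][count + 1] = next_cost;
                pq.push(State(next_cost, count + 1, v));
            }
        }
    }
    return -1;
}
```

Because a more expensive route with fewer vertices lives in a different slot of best than a cheaper route with more vertices, neither overwrites the other, which is the failure described in the question.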
I think you want to store two counters with each node: one for the node count and one for the weighted distance. You use the node count as an early terminator to discard those paths from the set of potential options. You use the weighted distance to choose the next node to iterate, and discard based on node count. In this way, if you fully terminate all the nodes on the periphery as discardable, then you know there's no eligible path to the destination that's at most the required number of hops. If you get to the destination within your list of periphery nodes, then you automatically know it's not more than the restricted number of nodes, and by induction you know it's already the shortest way of getting there, because every other path that could be found from then on must already be longer.
I have a triangle mesh which contains millions of triangles. Currently only the triangles and the vertices are stored in my data structure. I want to reconstruct all the edges and store them in a data container. The idea is something like this: traverse all the triangles, take each pair of a triangle's vertices, and create an edge between them. The problem is that a shared edge may be created twice. To overcome this, I need a data container EdgeContainer to store the edges, and it should have a function to check whether an edge has already been created. So it is like a map with multiple keys, but it should also have the following properties:
EdgeContainer(v1, v2) should return the same result as EdgeContainer(v2, v1), where v1 and v2 are the pointers to two vertices.
EdgeContainer should have a function like EdgeContainer::Remove(v1), which will remove all edges incident to vertex v1.
The implementation should be as efficient as possible.
Is there any existing library which can handle this?
First, I suggest you have a look at the concept of half-edge meshes (http://www.flipcode.com/archives/The_Half-Edge_Data_Structure.shtml). It is used in CGAL and also in OpenMesh, and you should be aware of the concept if you are going to use either of them.
I myself recommend OpenMesh (http://openmesh.org/Documentation/OpenMesh-2.0-Documentation/tutorial_01.html). It is free and open source, you can easily create a mesh from a set of vertices and indices, and after creating the mesh you can easily iterate over all edges.
Your easiest bet would be to use the CGAL library, which is basically designed for doing this.
http://doc.cgal.org/latest/Triangulation_2/index.html
It provides natural iterators for iterating over faces, edges and vertices.
Notice that in CGAL the edges are not actually stored explicitly; they are generated each time the structure is iterated. This can be done efficiently using some clever rules that stop you from counting things twice: looking at the code, it appears that each face is iterated once, and an edge is added for each face neighbouring the current face that comes earlier in the list of faces than the current face.
Note that visiting the edges in this fashion only requires constant time per edge (depending on how you store your faces), so you are unlikely to benefit from storing them separately. Also note that the edge is defined by two adjacent faces, rather than two adjacent vertices. You can transform between the two representations in constant time.
Simple solution is to use sorted pair of indices:
struct _edge_desc : public std::pair<int,int> {
_edge_desc(int a, int b): std::pair<int,int>(a<b?a:b, a<b?b:a) {}
};
std::set<_edge_desc> Edges;
If additional info about the edges is needed, it can be stored in a separate vector; instead of using a set to store the edges, use a map from each edge to its index in that vector.
std::vector<some_struct> EdgesInfo;
std::map<_edge_desc, int> EdgesMap;
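The sorted-pair set above answers "has this edge been created?", but it does not directly support removing all edges incident to a vertex. One possible alternative, sketched here under assumptions, stores a symmetric adjacency map instead. The class below is hypothetical: it borrows the EdgeContainer and Remove names from the question, identifies vertices by integer index rather than by pointer, and adds Add, Contains and EdgeCount as made-up member names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical EdgeContainer: an undirected edge {a, b} is stored in both
// adjacency_[a] and adjacency_[b], so lookups are symmetric by construction.
class EdgeContainer {
public:
    // Inserts the edge {a, b}; returns false if it already existed.
    bool Add(int a, int b) {
        assert(a != b);  // self-loops are not expected in a triangle mesh
        const bool inserted = adjacency_[a].insert(b).second;
        adjacency_[b].insert(a);
        return inserted;
    }

    // Contains(a, b) and Contains(b, a) always agree.
    bool Contains(int a, int b) const {
        const auto it = adjacency_.find(a);
        return it != adjacency_.end() && it->second.count(b) > 0;
    }

    // Removes every edge incident to v.
    void Remove(int v) {
        const auto it = adjacency_.find(v);
        if (it == adjacency_.end()) return;
        for (int w : it->second) {
            adjacency_[w].erase(v);  // w is already a key, so nothing is inserted here
        }
        adjacency_.erase(it);
    }

    // Each edge is counted once from each endpoint.
    std::size_t EdgeCount() const {
        std::size_t degree_sum = 0;
        for (const auto& entry : adjacency_) degree_sum += entry.second.size();
        return degree_sum / 2;
    }

private:
    std::unordered_map<int, std::unordered_set<int>> adjacency_;
};
```

Add, Contains and Remove take expected constant time per edge touched, at the price of storing every edge twice. Whether that trade is acceptable for millions of triangles would need measuring.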
I am trying to create a minimum spanning tree using Prim's algorithm and I have a major question about the actual heap. I structured my graph's adjacency list as a vector of vertexes, and each vertex has a vector of edges. The edges contain a weight, a connecting vertex, and a key. I am not sure whether my heap should be a heap of vertexes or edges. If I make it a heap of vertexes then there is no way to determine whether the weights are going from the same parent and destination vertexes, which makes me think that I should be making a heap for each vertex's list of edges. So my final question is: should I be creating a heap of edges, or a heap of vertexes? If it's a heap of edges, should I be using the weight on the edges as the key, or should I have a separate data member called key that I can actually use for the priority queue? Thanks!
You should make a minHeap of edges since you are going to sort edges by their weight but the edges should contain two vertexes: representing one vertex on each end. Otherwise, as you suggested: there is no way to determine whether the weights are going from the same parent and destination vertexes. Therefore you should re-structure your edge class and make a minHeap of them.
Consider the algorithm from Wiki as well.
1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
2. Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
3. Repeat step 2 (until all vertices are in the tree).
I don't fully understand the key field in the edge class. I assume it's like an Id to the edge. So you should make a heap of them but since you are providing user-defined data structure to the heap, you should also provide a comparison function for the edge class, i.e. define the bool operator<(const Edge&) method.
Your heap could be of pairs <vertex, weight>, and will contain vertices, which are a single edge away from any vertex already in the partial minimum spanning tree. (edit: in some cases it may contain a vertex which is already in the partial MST, you should ignore such elements when they pop out).
It could be a heap of edges like <src, dst, weight>, which is practically the same, you just ignore src while dst is the same as vertex in the first variant.
PS. Regarding that key thing, I see no need for any keys, you need to compare weights.
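The first variant in this answer, a heap of <vertex, weight> pairs in which vertices already in the tree are ignored when they pop out, might look roughly like this. It is a sketch under assumptions: the WeightedAdj alias and the function name are invented for the example, the graph is assumed to be connected, and vertex 0 is used as the starting vertex.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (v, weight) pairs; each undirected edge appears in both lists.
using WeightedAdj = std::vector<std::vector<std::pair<int, double>>>;

// Total weight of a minimum spanning tree of a connected graph.
double prim_mst_weight(const WeightedAdj& adj) {
    assert(!adj.empty());
    using Entry = std::pair<double, int>;  // (edge weight, vertex reached by that edge)
    // std::greater turns std::priority_queue, a max-heap by default, into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<bool> in_tree(adj.size(), false);
    double total = 0.0;
    heap.push(Entry(0.0, 0));  // start from vertex 0 at zero cost
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        const int v = top.second;
        if (in_tree[v]) continue;  // v joined the tree through a cheaper edge: ignore
        in_tree[v] = true;
        total += top.first;
        for (const auto& edge : adj[v]) {
            if (!in_tree[edge.first]) heap.push(Entry(edge.second, edge.first));
        }
    }
    return total;
}
```

Storing <src, dst, weight> instead, as in the second variant, would only add a field that lets the tree edges themselves be reported.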
The heap must hold the unvisited vertices, each keyed by the weight of the smallest edge connecting it to the tree. Because such a vertex is not yet visited, every edge to it is unvisited too, so the minimum over all of these keys identifies the next edge to add to the spanning tree; you remove the corresponding vertex from the heap. The only difficulty is keeping these minimum edge weights up to date, since the spanning tree changes in every iteration and new edges become available. The way to do it is to store the position of each unvisited vertex in the heap. When a new vertex is added to the spanning tree, look at its edges to unvisited vertices and use the stored positions to reach each target vertex's heap record directly. Update that vertex's minimum cost if the new edge weight is less than its current cost, then bubble it up the heap using the standard procedure to maintain the min-heap.
Data structures:
<Vertex, Weight>: vertex id and the weight of the minimum edge to it, stored as a heap record.
position[Vertex]: the position of the vertex's record in the heap.
Note: built-in heap functions won't help you here, so you need to build your own heap to make this work efficiently. Initialize the key value of each vertex to some infinite value at the start.
Another approach: store every edge that points to an unvisited vertex, with its weight, in a min-heap. This requires more space than the other approach but has a similar amortized time complexity. When you extract an edge, check whether the vertex it points to has been visited; if so, discard the edge and extract again.
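For illustration, the decrease-key idea can be sketched without writing a heap by hand. Note that this swaps in a different technique from the one described above: an ordered std::set of (key, vertex) pairs stands in for the hand-built heap with a position array, and "updating" a vertex means erasing its old pair and inserting the new one. The Adjacency alias and function name are assumptions, the graph is assumed connected, and vertex 0 is taken as the root.

```cpp
#include <cassert>
#include <limits>
#include <set>
#include <utility>
#include <vector>

// adj[u] holds (v, weight) pairs; each undirected edge appears in both lists.
using Adjacency = std::vector<std::vector<std::pair<int, int>>>;

// Returns parent[v] for every vertex in the MST (-1 for the root, vertex 0).
std::vector<int> prim_parents(const Adjacency& adj) {
    assert(!adj.empty());
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> key(adj.size(), INF);  // cheapest known edge into each vertex
    std::vector<int> parent(adj.size(), -1);
    std::vector<bool> visited(adj.size(), false);
    std::set<std::pair<int, int>> pending;  // (key, vertex), smallest key first
    key[0] = 0;
    pending.insert(std::make_pair(0, 0));
    while (!pending.empty()) {
        const int u = pending.begin()->second;
        pending.erase(pending.begin());
        visited[u] = true;
        for (const auto& edge : adj[u]) {
            const int v = edge.first;
            const int w = edge.second;
            if (!visited[v] && w < key[v]) {
                pending.erase(std::make_pair(key[v], v));  // no-op if v was not queued
                key[v] = w;
                parent[v] = u;
                pending.insert(std::make_pair(w, v));
            }
        }
    }
    return parent;
}
```

Each vertex appears in pending at most once, which is the space advantage over the edge-heap approach. Both the erase and the insert are logarithmic, so the asymptotic cost matches a binary heap with decrease-key.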
Suppose you have an input file:
<total vertices>
<x-coordinate 1st location><y-coordinate 1st location>
<x-coordinate 2nd location><y-coordinate 2nd location>
<x-coordinate 3rd location><y-coordinate 3rd location>
...
How can Prim's algorithm be used to find the MST for these locations? I understand this problem is typically solved using an adjacency matrix. Any references would be great if applicable.
If you already know Prim's algorithm, it is easy: create an adjacency matrix with adj[i][j] = distance between location i and location j.
I'm just going to describe some implementations of Prim's and hopefully that gets you somewhere.
First off, your question doesn't specify how edges are input to the program. You have a total number of vertices and the locations of those vertices. How do you know which ones are connected?
Assuming you have the edges and their weights (as #doomster said above, a weight may be the planar distance between the points, since they are coordinates), we can start thinking about our implementation. Wikipedia describes three different data structures that result in three different run times: http://en.wikipedia.org/wiki/Prim's_algorithm#Time_complexity
The simplest is the adjacency matrix. As you might guess from the name, the matrix describes nodes that are "adjacent". To be precise, there are |v| rows and columns (where |v| is the number of vertices). The value at adjacencyMatrix[i][j] varies depending on the usage. In our case it's the weight of the edge (i.e. the distance) between node i and j (this means that you need to index the vertices in some way. For instance, you might add the vertices to a list and use their position in the list).
Now using this adjacency matrix our algorithm is as follows:
1. Create a dictionary which contains all of the vertices and maps each one to its "distance". Initially the distance of all of the nodes is infinity.
2. Create another dictionary to keep track of "parents". We use this to generate the MST. It's more natural to keep track of edges, but it's actually easier to implement by keeping track of "parents". Note that if you root a tree (i.e. designate some node as the root), then every node (other than the root) has precisely one parent. So by producing this dictionary of parents we'll have our MST!
3. Create a new list with a randomly chosen node v from the original list.
4. Remove v from the distance dictionary and add it to the parent dictionary with a null as its parent (i.e. it's the "root").
5. Go through the row in the adjacency matrix for that node. For any node w that is connected (for non-connected nodes you have to set their adjacency matrix value to some special value: 0, -1, int max, etc.) update its "distance" in the dictionary to adjacencyMatrix[v][w] and record v as its parent. The idea is that it's not "infinitely far away" anymore... we know we can get there from v.
6. While the distance dictionary is not empty (i.e. while there are nodes we still need to connect to):
   a. Look over the dictionary, find the vertex x with the smallest distance, and remove it from the dictionary.
   b. Add it to our new list of vertices.
   c. For each of its neighbors still in the dictionary, if adjacencyMatrix[x][neighbor] is smaller than distance[neighbor], update distance[neighbor] to that value and update the neighbor's parent to x. Basically, if there is a faster way to get to neighbor then the distance dictionary should be updated to reflect that; and if we then add neighbor to the new list we know which edge we actually added (because the parent dictionary says that its parent was x).
7. We're done. Output the MST however you want (everything you need is contained in the parents dictionary).
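The steps above can be sketched for the coordinate input from the question as follows. This is an illustration under assumptions, not the answer's own code: every pair of locations is assumed to be connected by an edge whose weight is the planar distance, plain vectors indexed by vertex stand in for the two dictionaries, location 0 is used as the root instead of a random one, and the function name is made up.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Returns parent[i] for every location in the MST (-1 for the root, location 0).
std::vector<int> prim_from_points(const std::vector<std::pair<double, double>>& pts) {
    assert(!pts.empty());
    const std::size_t n = pts.size();
    // Adjacency matrix: adj[i][j] = planar distance between locations i and j.
    std::vector<std::vector<double>> adj(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            adj[i][j] = std::hypot(pts[i].first - pts[j].first,
                                   pts[i].second - pts[j].second);
        }
    }
    std::vector<double> distance(n, std::numeric_limits<double>::infinity());
    std::vector<int> parent(n, -1);
    std::vector<bool> in_tree(n, false);
    distance[0] = 0.0;  // the root is picked first
    for (std::size_t step = 0; step < n; ++step) {
        // Linear scan for the closest vertex that is not in the tree yet.
        std::size_t x = n;
        for (std::size_t i = 0; i < n; ++i) {
            if (!in_tree[i] && (x == n || distance[i] < distance[x])) x = i;
        }
        in_tree[x] = true;
        // Relax the neighbors of x; the parent changes only when the distance improves.
        for (std::size_t nb = 0; nb < n; ++nb) {
            if (!in_tree[nb] && adj[x][nb] < distance[nb]) {
                distance[nb] = adj[x][nb];
                parent[nb] = static_cast<int>(x);
            }
        }
    }
    return parent;
}
```

The MST edges are the pairs (i, parent[i]) for every i other than the root.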
I admit there is a bit of a leap from the wikipedia page to the actual implementation as outlined above. I think the best way to approach this gap is to just brute force the code. By that I mean, if the pseudocode says "find the min [blah] such that [foo] is true" then write whatever code you need to perform that, and stick it in a separate method. It'll definitely be inefficient, but it'll be a valid implementation. The issue with graph algorithms is that there are 30 ways to implement them and they are all very different in performance; the wikipedia page can only describe the algorithm conceptually. The good thing is that once you implement it some way, you can find optimizations quickly ("oh, if I keep track of this state in this separate data structure, I can make this lookup way faster!"). By the way, the runtime of this is O(|V|^2). I'm too lazy to detail that analysis, but loosely it's because:
1. All initialization is O(|V|) at worst.
2. We do the loop O(|V|) times and take O(|V|) time to look over the dictionary to find the minimum node. So basically the total time to find the minimum node multiple times is O(|V|^2).
3. The time it takes to update the distance dictionary is O(|E|) because we only process each edge once. Since |E| is O(|V|^2) this is also O(|V|^2).
4. Keeping track of the parents is O(|V|).
5. Outputting the tree is O(|V| + |E|) = O(|E|) at worst.
Adding all of these (none of them should be multiplied except within (2)) we get O(|V|^2).
The implementation with a heap is O(|E| log |V|) and it's very similar to the above. The only difference is that updating the distance is O(log |V|) instead of O(1) (because it's a heap), BUT finding/removing the min element is O(log |V|) instead of O(|V|) (because it's a heap). The time complexity is quite similar in analysis and you end up with something like O(|V| log |V| + |E| log |V|) = O(|E| log |V|) as desired.
Actually... I'm a bit confused why the adjacency matrix implementation cares about it being an adjacency matrix. It could just as well be implemented using an adjacency list. I think the key part is how you store the distances. I could be way off in my implementation outlined above, but I am pretty sure it implements Prim's algorithm and satisfies the time complexity constraints outlined by Wikipedia.
I have a graph with four nodes, each node represents a position and they are laid out like a two dimensional grid. Every node has a connection (an edge) to all (according to the position) adjacent nodes. Every edge also has a weight.
Here are the nodes represented by A,B,C,D and the weight of the edges is indicated by the numbers:
A 100 B
120 220
C 150 D
I want to structure a container and an algorithm that switches the nodes sharing the edge with the highest weight. Then reset the weight of that edge. No node (position) can be switched more than once each time the algorithm is executed.
For example, processing the above, the highest weight is on edge BD, so we switch those. Since no node can be switched more than once, all edges involving either B or D are reset.
A D
120
C B
Then, the next highest weight is on the only edge left, switching those would give us the final layout: C,D,A,B.
I'm currently running a quite awful implementation of this. I store a long list of edges, holding four values for the nodes they are (potentially) connected to, a value for its weight and the position for the node itself. Every time anything is requested, I loop through the entire list.
I'm writing this in C++, could some parts of the STL help speed this up? Also, how to avoid the duplication of data? A node position is currently in five objects. The node itself that is there and the four nodes indicating a connection to it.
In short, I want help with:
Can this be structured in a way so that there is no data duplication?
Recognise the problem? If any of this has a name, tell me so I can google for more info on the subject.
Fast algorithms are always nice.
As for names, this resembles a vertex cover problem, although a set of edges in which no two share a vertex is usually called a matching, so "maximum weight matching" is another term worth searching for. Optimal vertex cover is NP-hard with decent approximation solutions, but your problem is simpler. You're looking at a pseudo-maximum under a tighter edge selection criterion. Specifically, once an edge is selected every connected edge is removed (representing the removal of vertices to be swapped).
For example, here's a standard greedy approach:
0) sort the edges; retain adjacency information
while edges remain:
1) select the highest edge
2) remove all adjacent edges from the list
endwhile
The list of edges selected gives you the vertices to swap.
Time complexity is O(sorting the edges + a linear pass over the edges), which in general boils down to the sorting cost, likely O(E*log(E)). On a grid, where E is proportional to V, that is O(V*log(V)).
The method of retaining adjacency information depends on the graph properties; see your friendly local algorithms text. Feel free to start with an adjacency matrix for simplicity.
As with the adjacency information, most other speed improvements will apply best to graphs of a certain shape but come with a tradeoff of time versus space complexity.
For example, your problem statement seems to imply that the vertices are laid out in a square pattern, from which we could derive many interesting properties. For example, that system is very easily parallelized. Also, the adjacency information would be highly regular but sparse at large graph sizes (most vertices wouldn't be connected to each other). This makes the adjacency matrix give a high overhead; you could instead store adjacency in an array of 4-tuples as it would retain fast access but almost entirely eliminate overhead.
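The greedy loop from this answer could be sketched as follows. The Edge struct, the function name and the integer vertex ids are assumptions for the example. Instead of physically removing adjacent edges from the list, the sketch marks the endpoints of each selected edge as used and skips any later edge that touches a used vertex, which has the same effect.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical edge record: endpoints are vertex indices in [0, vertex_count).
struct Edge {
    int u;
    int v;
    int weight;
};

// Picks edges in descending weight order so that no vertex is used twice.
// Each returned edge names a pair of vertices to swap.
std::vector<Edge> select_swaps(std::vector<Edge> edges, int vertex_count) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight > b.weight; });
    std::vector<bool> used(vertex_count, false);
    std::vector<Edge> chosen;
    for (const Edge& e : edges) {
        assert(e.u >= 0 && e.u < vertex_count && e.v >= 0 && e.v < vertex_count);
        if (used[e.u] || used[e.v]) continue;  // an adjacent edge was already selected
        used[e.u] = true;
        used[e.v] = true;
        chosen.push_back(e);
    }
    return chosen;
}
```

With A, B, C, D numbered 0 to 3 and the weights from the question, this selects BD first and then AC, matching the worked example in the question.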
If you have bigger graphs, look into the Boost Graph Library. It gives you good data structures for graphs and basic iterators for different types of graph traversal.