topological sort on boost graph for vertices that are connected through a given vertex only - boost-graph

I have a large graph which represents a set of dependencies.
A -->B
A -->C
B -->F
C -->G
D -->E
E -->L
F -->H
F -->I
H -->K
J -->K
Given B as a starting node, result needs to be : B F I H K
I do not need any other nodes since they cannot be reached from B.
A user can specify a node and I need to collect all the paths and the nodes in those paths and print the topological sort of those nodes.
Currently I am implementing this by
Create a new graph
Extract all the outgoing edges from the given node.
Add those edges in a graph. From those edges, get the nodes and also add them to the new graph.
For each node in step 3, repeat (2) and (3)
On that new graph, perform a topological sort and get the order of the nodes
Is there any other better way to do this or known algorithm for finding the topological sort of a subset of nodes?

I would look into the depth_first_visit() function, and the Visitor Pattern provided by boost. The Visitor Pattern allows you to inspect vertices and edges as the tree is searched. The depth_first_visit() function allows you to start a DFS search starting at a specific vertex.

Related

When adding a vertex to a weighted undirected graph, which weight stays?

I'm doing an adjacency list implementation of a graph class in C++ (but that's kinda irrelevant). I have a weighted directionless graph and I am writing the addVertex method. I have basically everything down, but I'm not sure what I should do when I add a vertex in between two others that were already there and had their own corresponding weights.
Should I just throw away the old weight that the vertex stored? Should I use the new one that was passed in? A mix of both? Or does it not matter at all what I pick?
I just wanted to make sure that I was not doing something I shouldn't.
Thanks!
I guess it depends on what you want to achieve. Usually, an adjacency list is a nested list whereby each row i indicates the i-th node's neighbourhood. To be precise, each entry in the i-th node's neighbourhood represents an outgoing connection from node i to j. The adjacency list does not comprise edge or arc weights.
Hence, adding a vertex n should do not affect the existing adjacency list's entries but adds a new empty row n to the adjacency list. However, adding or removing edges alter the adjacency list's entries. Thus, adding a vertex n "between two other [nodes i and j] that were already there and had their own corresponding weights" implies that you remove the existing connection between i and j and eventually add two new connections (i,n) and (n,j). If there are no capacity restrictions on the edges and the sum of distances (i,n) and (n,j) dominates the distance (I,j) this could be fine. However, if the weights represent capacities (e.g. max-flow problem) you should keep both connections.
So your question seems to be incomplete or at least unprecise. I assume that your goal is to calculate the shortest distances between each pair of nodes within an undirected graph. I suggest keeping all the different connections in your graph. Shortest path algorithms can calculate the shortest connections between each node pair after you have finished your graph's creation.

BGL - determine all mincuts

Given a graph, I want to find all edges (if any), that if removed spilt the graph into two components.
An initial idea would have been to assign a weight of 1 to all edges, then calculate the mincut of the graph. mincut > 1 implies there is no single edge that when removed causes a split.
For mincut == 1, it would have been nice if the algorithm would provide for each mincut the edges it consists of.
Unfortunately, BGL does not seem to support that kind of thing:
The stoer_wagner_min_cut function determines exactly one of the min-cuts as well as its weight.
(http://www.boost.org/doc/libs/1_59_0/libs/graph/doc/stoer_wagner_min_cut.html)
Is there a way to make this work (i.e. to determine more than one mincut) with the BGL or will I have to come up with something different?
This may come a little bit late...
From what I see you only need to find all edges that don't belong to any cycle in the graph(assuming the graph is already connected).
This can be done by iteratively removing leaf nodes(and the edges connected to a them), much like what you do in topological sorting, until there's no leaf node left i.e. every edge in the remaining graph belongs to at least one cycle. All edges removed during the process will be the ones you want.
In pseudocode, for a connected undirected graph G=(V,E), you can do this:
S = Ø
while(there exists a node n∈V s.t. degree(n)==1)
e = edge connected to n
S = S∪{e}
E = E-{e}
V = V-{n}
return S
which can be done in O(|V|+|E|) time

Minimum path to connect k nodes in a graph with n nodes

Given a graph with n nodes and n-1 edges, find the lowest amount of edges needed to connect k nodes. You are given m numbers k <= n for which you have to solve the problem. The edges are unweighted and the graph does not necessarily describe a binary tree.
For a graph with n=5 and k=4, a possible path could be 1-2-5-2-3.
My approach was a greedy method in which I start from the node with the highest rank and try to add edges from there, but I did not have much success in trying to formulate an algorithm.
Try to find longest path in your tree, you will take every edge on that route once, than choose random node that has an edge, that connect that node to already choosen elements. So find longest path in your tree if that path has k or more nodes it is the end, you choose element on that path, every edge will be taken once Else choose a random, untaken already node, that is connected by any edge directly to one of already taken nodes, as long as you do not have k nodes in your collection.

Subset of vertices

I have a homework problem and i don't know how to solve it. If you could give me an idea i would be very grateful.
This is the problem:
"You are given a connected undirected graph which has N vertices and N edges. Each vertex has a cost. You have to find a subset of vertices so that the total cost of the vertices in the subset is minimum, and each edge is incident with at least one vertex from the subset."
Thank you in advance!
P.S: I have tought about a solution for a long time, and the only ideas i came up with are backtracking or an minimum cost matching in bipartite graph but both ideas are too slow for N=100000.
This may be solved in linear time using dynamic programming.
A connected graph with N vertices and N edges contains exactly one cycle. Start with detecting this cycle (with the help of depth-first search).
Then remove any edge on this cycle. Two vertices incident to this edge are u and v. After this edge removal, we have a tree. Interpret it as a rooted tree with the root u.
Dynamic programming recurrence for this tree may be defined this way:
w0[k] = 0 (for leaf nodes)
w1[k] = vertex_cost (for leaf nodes)
w0[k] = w1[k+1] (for nodes with one descendant)
w1[k] = vertex_cost + min(w0[k+1], w1[k+1]) (for nodes with one descendant)
w0[k] = sum(w1[k+1], x1[k+1], ...) (for branch nodes)
w1[k] = vertex_cost + sum(min(w0[k+1], w1[k+1]), min(x0[k+1], x1[k+1]), ...)
Here k is the node depth (distance from root), w0 is cost of the sub-tree starting from node w when w is not in the "subset", w1 is cost of the sub-tree starting from node w when w is in the "subset".
For each node only two values should be calculated: w0 and w1. But for nodes that were on the cycle we need 4 values: wi,j, where i=0 if node v is not in the "subset", i=1 if node v is in the "subset", j=0 if current node is not in the "subset", j=1 if current node is in the "subset".
Optimal cost of the "subset" is determined as min(u0,1, u1,0, u1,1). To get the "subset" itself, store back-pointers along with each sub-tree cost, and use them to reconstruct the subset.
Due to the number of edges are strict to the same number of vertices, so it's not the common Vertex cover problem which is NP-Complete. I think there's a polynomial solution here:
An N vertices and (N-1) edges graph is a tree. Your graph has N vertices and N edges. Firstly find the awful edge causing a loop and make the graph to a tree. You could use DFS to find the loop (O(N)). Removing any one of the edges in the loop would make a possible tree. In extreme condition you would get N possible trees (the raw graph is a circle).
Apply a simple dynamic planning algorithm (O(N)) to each possible tree (O(N^2)), then find the one with the least cost.

Why do we need a priority queue in Prim's Algorithm

As my question speaks I want to know why do we use Priority queue in Prim's Algorithm?
How does it saves us from using the naive way (yes I've heard of it but don't know why).
I'd be very happy if anyone could explain step by step for adjacency list . I am using Cormen's book.
The pseudocode :
Prim(G,w,r) //what is w (weight?) and r?
For each u in V[G]
do key[u] ← ∞ // what is key?
π[u] ← NIL
key[r] ← 0
Q ← V[G]
While Q ≠ Ø
do u ← EXTRACT-MIN(Q)
for each v in Adj[u]
if v is in Q and w(u,v) < key[v]
then π[v] ← u
key[v] ← w(u,v)
I am thinking to use std::vector then std::make_heap(); as priority queue for storing edges.
In prim's algorithm, there is a step where you have to get the 'nearest' vertex. This step would cost O(N) if using normal array, but it'd take only O(logN) if you use priority queue (heap for example)
Hence, the reason for using priority queue is to reduce the algorithm's time complexity (which mean it make your program run faster)
**
Update:
**
Here is Prim's algorithm's description from Wikipedia. The bold part is the part for finding nearest vertex I talked about:
Input: A non-empty connected weighted graph with vertices V and edges E (the weights can be negative).
Initialize: Vnew = {x}, where x is an arbitrary node (starting point) from V, Enew = {}
Repeat until Vnew = V:
Choose an edge (u, v) with minimal weight such that u is in Vnew and v is not (if there are multiple edges with the same weight, any of them may be picked)
Add v to Vnew, and (u, v) to Enew
Output: Vnew and Enew describe a minimal spanning tree
You don't "need" it. In fact, a naive implementation of Prim's algorithm would simply do a linear search of the array of distances to find the next nearest vertex. Dijkstra's algorithm works the exact same way.
The reason why people use it is because it significantly speeds up the runtime of the algorithm. It turns from O(V^2 + E) to O(E*log(V)).
The key to this is the EXTRACT-MIN(Q) function. If you do it naively, this operation would take O(V) time. With a heap, it only takes O(logV) time.
Doing this roughly from memory, so it may be slightly inconsistent, but it gets the point across:
class Graph
Set<node> nodes; // The set of nodes in the graph
MultiMap<Node, Edge> edges; // Map from Node, to a list of weighted edges connected to the node. If it weren't weighted, any spanning tree by definition would be a minimum spanning tree.
Graph Prim(Graph input):
Graph MST = new Graph();
PriorityQueue<Edge> candidateEdges;
Node anyNode = input.pickAnyNodeAtRandom()
candidateEdges.putAll(input.edges.get(anyNode));
while MST.nodes.size() < input.nodes.size():
edge = candidateEdges.takeLowest() // THIS IS THE IMPORTANT PART
if edge.v1 in MST.nodes and edge.v2 not in MST.nodes:
MST.nodes.add(edge.v2)
MST.edges.add(edge)
candidateEdges.add(edge.v2.edges)
Basically, at each step in the algorithm, you're looking for the minimum edge with one vertex in the partial minimum spanning tree, and one vertex not in the tree, and you're going to add said edge to the tree. How do you do that efficiently? If you have a way to efficiently order all of the edges connected to a vertex in your partial spanning tree, you can simply iterate through them until you find an edge with an acceptable vertex.
Without such an ordered data structure, you'd have to iterate through all candidate edges each time to find the minimum, rather than being able to efficiently grab the minimum directly.
Prim's algorithm uses two Sets - lets say U and V/U.
You are starting from the root, (root is the only element in U).
You place all the vertexes adjacent to it in the queue, with weight[v] = dist[root,v] where v is adjacent to root.
So when you are popping from the queue, you are taking the vertex (lets say u) that has one end in U and end in V/U and is the smallest with that property.
You set its weight, its parent to be root and etc... and put all its ajdacent nodes in the queue. So now the queue has all the nodes ajdacent to root and all the nodes the ajdacent to root and all the nodes ajdacent to u with their respective weights. So when you pop from it, you will once more get a node from V/U which is 'closest' to U.
In the implementation, they are initially adding every vertex to the queue with INFINITY priority, but they are gradually updating the weights, as you can see. This reflects in the priority queue as well, guaranteeng the text above.
Hope it helps.