Subset of vertices - c++

I have a homework problem and I don't know how to solve it. If you could give me an idea I would be very grateful.
This is the problem:
"You are given a connected undirected graph which has N vertices and N edges. Each vertex has a cost. You have to find a subset of vertices so that the total cost of the vertices in the subset is minimum, and each edge is incident with at least one vertex from the subset."
Thank you in advance!
P.S.: I have thought about a solution for a long time, and the only ideas I came up with are backtracking or a minimum cost matching in a bipartite graph, but both ideas are too slow for N=100000.

This may be solved in linear time using dynamic programming.
A connected graph with N vertices and N edges contains exactly one cycle. Start with detecting this cycle (with the help of depth-first search).
Then remove any edge of this cycle; let u and v be the two vertices incident to it. After this edge removal, we have a tree. Interpret it as a rooted tree with root u.
Dynamic programming recurrence for this tree may be defined this way:
For a leaf node w:
w0 = 0
w1 = cost(w)
For a node w with a single child c:
w0 = c1
w1 = cost(w) + min(c0, c1)
For a branch node w with children c, d, ...:
w0 = c1 + d1 + ...
w1 = cost(w) + min(c0, c1) + min(d0, d1) + ...
Here w0 is the cost of the sub-tree rooted at w when w is not in the "subset", and w1 is the cost when w is in the "subset". If w is not in the subset, every child must be, so that the edge between them is covered.
For each node only two values need to be calculated: w0 and w1. But for nodes that were on the cycle we need 4 values w00, w01, w10, w11, where the first index is 0 if node v is not in the "subset" and 1 if it is, and the second index is 0 if the current node is not in the "subset" and 1 if it is.
The optimal cost of the "subset" is min(u01, u10, u11). To get the "subset" itself, store back-pointers along with each sub-tree cost, and use them to reconstruct the subset.
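For illustration, here is a rough C++ sketch of the tree recurrence above (identifiers are mine, not from the question). Instead of the four-value bookkeeping for cycle nodes, it handles the removed cycle edge (u, v) the simpler way: run the DP once rooted at u with u forced into the subset and once rooted at v with v forced in, then take the minimum, since at least one of u and v must be chosen to cover the removed edge.
#include <array>
#include <algorithm>
#include <vector>

struct TreeCover {
    std::vector<std::vector<int>> adj;         // adjacency list of the tree (cycle edge already removed)
    std::vector<long long> cost;               // vertex costs
    std::vector<std::array<long long, 2>> dp;  // dp[v][0]: v not in subset, dp[v][1]: v in subset

    // Recursive DFS kept for clarity; for N = 100000 an explicit stack is safer.
    void dfs(int v, int parent) {
        dp[v][0] = 0;
        dp[v][1] = cost[v];
        for (int c : adj[v]) {
            if (c == parent) continue;
            dfs(c, v);
            dp[v][0] += dp[c][1];                      // v excluded -> child must be included
            dp[v][1] += std::min(dp[c][0], dp[c][1]);  // v included -> child may be either
        }
    }

    // u and v are the endpoints of the removed cycle edge.
    long long solve(int u, int v) {
        dp.assign(adj.size(), {0, 0});
        dfs(u, -1);
        long long withU = dp[u][1];                    // u forced into the subset
        dfs(v, -1);
        long long withV = dp[v][1];                    // v forced into the subset
        return std::min(withU, withV);
    }
};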

Since the number of edges equals the number of vertices, this is not the general vertex cover problem, which is NP-complete. I think there is a polynomial solution here:
A graph with N vertices and (N-1) edges is a tree. Your graph has N vertices and N edges. First find the extra edge that creates a cycle and turn the graph into a tree. You can use DFS to find the cycle (O(N)). Removing any one of the edges on the cycle gives a possible tree. In the extreme case you would get N possible trees (the original graph is a single cycle).
Apply a simple dynamic programming algorithm (O(N)) to each possible tree (O(N^2) in total), then take the one with the least cost.


how to find S or less nodes in tree with minimum distance? [duplicate]

Given an undirected tree with unweighted edges, N vertices and N-1 edges, and a number K, find K nodes so that every node of the tree is within distance S of at least one of the K nodes. Also, S has to be the smallest possible S, so that if there were S' < S at least one node would be unreachable in S' steps.
I tried solving this problem, however, I feel that my supposed solution is not very fast.
My solution:
set x=1
for every node, find the nodes that are within distance x of it
let the node which covers the most nodes within that distance be one of the K nodes.
recompute for every node, not counting already covered nodes.
do this till I have found K nodes. Then, if every node is covered we are done, else increase x.
This problem is called p-center, and you can find several papers online about it, such as this. It is indeed NP-hard for general graphs, but polynomial on trees, both weighted and unweighted.
To me it looks like a clustering problem. Try the k-means (Wikipedia) algorithm with k equal to your K. Since you have a tree and all vertices are connected, you can use the number of edges between two vertices as the distance measure.
When the algorithm converges you get the K nodes that should be found. Then you can determine S by iterating through all k clusters: for every node in a cluster, calculate its distance to the center node, and the overall maximum is S.
Update: But actually I see that the k-means algorithm does not find a global optimum, so this algorithm won't necessarily produce the best result either ...
You say N nodes and N-1 edges, so your graph is a tree. You are actually looking for a connected K-subset of nodes minimizing the longest edge.
A polynomial algorithm may be:
Sort all your edges by increasing distance.
Then loop over the edges:
if neither of the 2 nodes is in a group, create a new group.
else if one node is in an existing group, add the other to that group.
else both nodes are in 2 different groups, so merge the groups.
When a group reaches K, break the loop and you have your connected K-subset.
Nevertheless, note that the group can contain more than K nodes. Imagine the problem of having 4 nodes, close together two by two: there would be no exact 3-subset solution to your problem.
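For illustration, a rough C++ sketch of this grouping with a disjoint-set union (union-find), assuming each edge carries the distance mentioned above; the types and input format are placeholders, not from the question.
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int a, b; double dist; };

struct DSU {
    std::vector<int> parent, size;
    DSU(int n) : parent(n), size(n, 1) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    int unite(int x, int y) {                          // returns the size of the merged group
        x = find(x); y = find(y);
        if (x == y) return size[x];
        if (size[x] < size[y]) std::swap(x, y);
        parent[y] = x;
        size[x] += size[y];
        return size[x];
    }
};

// Returns the first group to reach at least K nodes (it can overshoot K, as noted above).
std::vector<int> groupOfAtLeastK(int n, std::vector<Edge> edges, int K) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& x, const Edge& y) { return x.dist < y.dist; });
    DSU dsu(n);
    int root = -1;
    for (const Edge& e : edges) {
        if (dsu.unite(e.a, e.b) >= K) { root = dsu.find(e.a); break; }
    }
    std::vector<int> group;
    if (root != -1)
        for (int v = 0; v < n; ++v)
            if (dsu.find(v) == root) group.push_back(v);
    return group;
}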

BGL - determine all mincuts

Given a graph, I want to find all edges (if any) that, if removed, split the graph into two components.
An initial idea would have been to assign a weight of 1 to all edges, then calculate the mincut of the graph. mincut > 1 implies there is no single edge that when removed causes a split.
For mincut == 1, it would have been nice if the algorithm would provide for each mincut the edges it consists of.
Unfortunately, BGL does not seem to support that kind of thing:
The stoer_wagner_min_cut function determines exactly one of the min-cuts as well as its weight.
(http://www.boost.org/doc/libs/1_59_0/libs/graph/doc/stoer_wagner_min_cut.html)
Is there a way to make this work (i.e. to determine more than one mincut) with the BGL or will I have to come up with something different?
This may come a little bit late...
From what I see, you only need to find all edges that don't belong to any cycle in the graph (assuming the graph is already connected).
This can be done by iteratively removing leaf nodes (and the edges connected to them), much like what you do in topological sorting, until there is no leaf node left, i.e. every edge in the remaining graph belongs to at least one cycle. All edges removed during the process are the ones you want.
In pseudocode, for a connected undirected graph G=(V,E), you can do this:
S = Ø
while (there exists a node n ∈ V s.t. degree(n) == 1)
    e = edge connected to n
    S = S ∪ {e}
    E = E - {e}
    V = V - {n}
return S
which can be done in O(|V|+|E|) time
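For completeness, here is a rough C++ sketch of that leaf-peeling loop, assuming the graph comes in as an edge list with 0-based vertex ids (the interface is an assumption, not from the question):
#include <queue>
#include <utility>
#include <vector>

// Repeatedly strip degree-1 vertices and collect the edges removed along the way.
std::vector<std::pair<int,int>> peelLeafEdges(int n, const std::vector<std::pair<int,int>>& edges) {
    std::vector<std::vector<std::pair<int,int>>> adj(n);   // (neighbour, edge index)
    std::vector<int> degree(n, 0);
    for (int i = 0; i < (int)edges.size(); ++i) {
        auto [a, b] = edges[i];
        adj[a].push_back({b, i});
        adj[b].push_back({a, i});
        ++degree[a]; ++degree[b];
    }
    std::vector<bool> edgeRemoved(edges.size(), false);
    std::queue<int> leaves;
    for (int v = 0; v < n; ++v)
        if (degree[v] == 1) leaves.push(v);

    std::vector<std::pair<int,int>> removed;                // the set S from the pseudocode
    while (!leaves.empty()) {
        int v = leaves.front(); leaves.pop();
        if (degree[v] != 1) continue;                       // its last edge may already be gone
        for (auto [u, idx] : adj[v]) {
            if (edgeRemoved[idx]) continue;
            edgeRemoved[idx] = true;
            removed.push_back(edges[idx]);
            --degree[v]; --degree[u];
            if (degree[u] == 1) leaves.push(u);             // u has become a leaf in turn
        }
    }
    return removed;
}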

Domino path algorithm

I've got a (I hope) simple question about an algorithm that can solve the "domino path" problem. I'm looking for a solution that solves this problem in less than O(n^2) complexity.
I've got a group of n points (n in [1, 100 000], every point is different) with x and y coords:
(0,1)
(1,3)
(1,2)
(2,4)
(3,5)
(4,2)
(5,0)
I'm looking for the "path" from a start point (0,y) to an end point (x,0) (the intermediate points need to chain together like domino blocks). In this example the path looks like this: (0,1) > (1,3) > (3,5) > (5,0). If the points create more than one path, choose any of them. Can it be done in less than O(n^2)?
EDIT: Thanks for the graph algorithm, but can it be done without it? I'm looking for some tricky recurrence algorithm or something like that.
Yes. You should read up on Dijkstra's algorithm which runs in O(E+V log V) where E is the number of edges in your graph and V is the number of vertices. A breadth-first search would also work, since the graph is unweighted. That would run in O(E+V) time.
Though these are common ways of solving this problem, they are by no means the only ones.
We can solve the problem in O(n) using a queue, similar to a Breadth First Search. You do not need Dijkstra's algorithm, but you do need to store your input as a graph.
Pseudocode:
Q = a FIFO queue
enqueue all points (0, y)
while Q not empty:
    remove an element p from Q
    enqueue all unvisited nodes reachable from p
    if you enqueued an (x, 0) node, you have a solution
You can get the path back by keeping track of your distances for example. If d[i] = cost to reach node i, then find a node k connected to your solution node for which d[k] + 1 = d[solution], then repeat the process for k.
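For illustration, a rough C++ sketch of this queue-based search over the domino points: points are indexed by their first coordinate so every bucket is scanned at most once, which keeps the whole thing O(n). The input/output conventions (indices, return value) are assumptions.
#include <algorithm>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<int> dominoPath(const std::vector<std::pair<int,int>>& pts) {
    std::unordered_map<int, std::vector<int>> byFirst;       // first coordinate -> point indices
    for (int i = 0; i < (int)pts.size(); ++i)
        byFirst[pts[i].first].push_back(i);

    std::vector<int> parent(pts.size(), -1);
    std::vector<bool> visited(pts.size(), false);
    std::queue<int> q;
    for (int i : byFirst[0]) { visited[i] = true; q.push(i); }   // all (0, y) start points

    int goal = -1;
    while (!q.empty()) {
        int p = q.front(); q.pop();
        if (pts[p].second == 0) { goal = p; break; }             // reached an (x, 0) point
        auto it = byFirst.find(pts[p].second);
        if (it == byFirst.end()) continue;
        for (int nxt : it->second) {
            if (visited[nxt]) continue;
            visited[nxt] = true;
            parent[nxt] = p;
            q.push(nxt);
        }
        it->second.clear();      // everything in this bucket is visited; never scan it again
    }
    std::vector<int> path;
    for (int v = goal; v != -1; v = parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;                 // point indices from a (0, y) start to the (x, 0) goal; empty if none
}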

Graph theory: Breadth First Search

There are n vertices connected by m edges. Some of the vertices are special and others are ordinary. There is at most one path from one vertex to another.
First Query:
I need to find out how many pairs of special vertices exists which are connected directly or indirectly.
My approach:
I"ll apply BFS (via queue )to see how many nodes are connected to each other somehow. Let number of special vertices I discover in this be n, then answer to my query would be nC2. I'll repeat this till all vertices are visited.
Second Query:
How many vertices lie on a path between any two special vertices.
My approach:
In my approach for query 1, I'll apply BFS to find the path between any two special vertices and then backtrack and mark the vertices lying on the path.
Problem:
The number of vertices can be as high as 50,000. So applying BFS and then backtracking would, I guess, be too slow for my time constraint (2 seconds).
I have a list of all vertices and their adjacency lists. Now, while pushing vertices onto my queue during BFS, can I somehow also calculate the answer to query 2? Is there a better approach to this problem? The input format is such that I'll be told whether each vertex is special or not, one by one, and then I'll be given info about the i-th edge, which connects two vertices. There is at most one path from one vertex to another.
The first query is solved by splitting your forest into trees.
Starting with the full set of vertices, pick one, then visit every node you can reach from there, until you cannot visit any more vertices. This is one tree. Repeat for each tree.
You now have K bags of vertices, each containing some number of special ones. That answers the first question.
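A small C++ sketch of that counting (one traversal per tree, then add C(s, 2) for the s special vertices found in it); the 0-based vertex ids and the special flag array are assumptions about the input format.
#include <stack>
#include <vector>

long long countSpecialPairs(int n,
                            const std::vector<std::vector<int>>& adj,
                            const std::vector<bool>& special) {
    std::vector<bool> visited(n, false);
    long long answer = 0;
    for (int start = 0; start < n; ++start) {
        if (visited[start]) continue;
        long long s = 0;                       // special vertices in this tree
        std::stack<int> st;
        st.push(start);
        visited[start] = true;
        while (!st.empty()) {                  // iterative DFS over one component
            int v = st.top(); st.pop();
            if (special[v]) ++s;
            for (int u : adj[v])
                if (!visited[u]) { visited[u] = true; st.push(u); }
        }
        answer += s * (s - 1) / 2;             // C(s, 2) connected special pairs
    }
    return answer;
}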
For the second question, I suppose a trivial solution is indeed to BFS the path between the two vertices of each queried pair within their sub-graph.
You could also take advantage of the tree nature of your sub-graph. This question: How to find the shortest simple path in a Tree in a linear time? mentions it. (I have not really dug into this yet, though.)
For the first query, one round of BFS and some simple calculation as you have described is optimal.
For the second query, assuming the worst case where all vertices are special and the graph is a tree, doing a BFS per query gives O(Q|V|) complexity, where Q is the number of queries. You are going to run into trouble if Q is larger than 10^4 and |V| is also larger than 10^4.
In the worst case, we are basically solving the all-pairs shortest path problem, but on a tree/forest. When |V| is small, we can do a BFS from every node, which gives an O(|V|^2) algorithm. However, there is a faster algorithm:
Read all second type queries and store all the pairs in a set S.
For each tree in the forest:
Choose a root node for the current tree. Calculate the distance from the root node to every other node in the current tree (regardless of whether it is special or not).
Calculate the lowest common ancestor (LCA) for all pairs of nodes being queried (which are stored in set S). This can be done with Tarjan's offline LCA algorithm.
Calculate the distance between a pair of nodes as: dist(root, a) + dist(root, b) - 2 * dist(root, lca(a, b)) (a small sketch follows below)
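Given the root distances from step 2 and the LCAs from step 3, the distance (and hence the number of vertices on a path) follows directly. A small C++ sketch, where depth[] and lca() are assumed to be provided already (e.g. by Tarjan's offline LCA); the names are placeholders:
#include <vector>

extern std::vector<long long> depth;   // depth[v] = dist(root, v) within v's tree
extern int lca(int a, int b);          // assumed helper, e.g. Tarjan's offline LCA

long long distBetween(int a, int b) {
    int l = lca(a, b);
    return depth[a] + depth[b] - 2 * depth[l];
}

long long verticesOnPath(int a, int b) {
    return distBetween(a, b) + 1;      // count both endpoints of the path
}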
Let arr be a bool array where arr[i] is 1 if vertex i is special and 0 otherwise.
find-set(i) returns the root node of i's tree, so any nodes lying in the same tree return the same number.
for(int i = 1; i < n; i++){
    for(int j = i + 1; j <= n; j++){
        if(arr[i] == 1 && arr[j] == 1){        // if both are special
            if(find-set(i) == find-set(j)){    // and both i and j belong to the same tree
                // k++, where k is the answer to the first query.
                // bfs(i, j): find the intermediate vertices and set ver[v] = 1
                // for each intermediate vertex/node v on the path.
            }
        }
    }
}
Finally, count the number of 1's in the ver array, which is the answer to the second query.

Why do we need a priority queue in Prim's Algorithm

As my question title says, I want to know why we use a priority queue in Prim's algorithm.
How does it save us from using the naive way (yes, I've heard of it but don't know why)?
I'd be very happy if anyone could explain step by step for an adjacency list. I am using Cormen's book.
The pseudocode:
Prim(G, w, r)  // what is w (weight?) and r?
    for each u in V[G]
        do key[u] ← ∞  // what is key?
           π[u] ← NIL
    key[r] ← 0
    Q ← V[G]
    while Q ≠ Ø
        do u ← EXTRACT-MIN(Q)
           for each v in Adj[u]
               if v is in Q and w(u,v) < key[v]
                   then π[v] ← u
                        key[v] ← w(u,v)
I am thinking of using std::vector with std::make_heap() as the priority queue for storing edges.
In Prim's algorithm, there is a step where you have to get the 'nearest' vertex. This step costs O(N) if you use a normal array, but only O(log N) if you use a priority queue (a heap, for example).
Hence, the reason for using a priority queue is to reduce the algorithm's time complexity (which means it makes your program run faster).
Update:
Here is Prim's algorithm's description from Wikipedia. The step of choosing the minimal-weight edge is the 'finding the nearest vertex' part I talked about:
Input: A non-empty connected weighted graph with vertices V and edges E (the weights can be negative).
Initialize: Vnew = {x}, where x is an arbitrary node (starting point) from V, Enew = {}
Repeat until Vnew = V:
Choose an edge (u, v) with minimal weight such that u is in Vnew and v is not (if there are multiple edges with the same weight, any of them may be picked)
Add v to Vnew, and (u, v) to Enew
Output: Vnew and Enew describe a minimal spanning tree
You don't "need" it. In fact, a naive implementation of Prim's algorithm would simply do a linear search of the array of distances to find the next nearest vertex. Dijkstra's algorithm works the exact same way.
The reason why people use it is because it significantly speeds up the runtime of the algorithm. It turns from O(V^2 + E) to O(E*log(V)).
The key to this is the EXTRACT-MIN(Q) function. If you do it naively, this operation would take O(V) time. With a heap, it only takes O(logV) time.
Doing this roughly from memory, so it may be slightly inconsistent, but it gets the point across:
class Graph
    Set<Node> nodes;             // the set of nodes in the graph
    MultiMap<Node, Edge> edges;  // map from a node to the weighted edges connected to it. If the graph weren't weighted, any spanning tree would by definition be a minimum spanning tree.

Graph Prim(Graph input):
    Graph MST = new Graph();
    PriorityQueue<Edge> candidateEdges;
    Node anyNode = input.pickAnyNodeAtRandom()
    MST.nodes.add(anyNode)                           // start the tree with one arbitrary node
    candidateEdges.addAll(input.edges.get(anyNode))
    while MST.nodes.size() < input.nodes.size():
        edge = candidateEdges.takeLowest()           // THIS IS THE IMPORTANT PART
        if edge.v1 in MST.nodes and edge.v2 not in MST.nodes:
            MST.nodes.add(edge.v2)
            MST.edges.add(edge)
            candidateEdges.addAll(input.edges.get(edge.v2))
Basically, at each step in the algorithm, you're looking for the minimum edge with one vertex in the partial minimum spanning tree, and one vertex not in the tree, and you're going to add said edge to the tree. How do you do that efficiently? If you have a way to efficiently order all of the edges connected to a vertex in your partial spanning tree, you can simply iterate through them until you find an edge with an acceptable vertex.
Without such an ordered data structure, you'd have to iterate through all candidate edges each time to find the minimum, rather than being able to efficiently grab the minimum directly.
Prim's algorithm uses two sets - let's say U and V\U.
You start from the root (the root is the only element in U).
You place all the vertices adjacent to it in the queue, with weight[v] = dist[root, v] where v is adjacent to root.
So when you pop from the queue, you take the vertex (let's say u) whose connecting edge has one end in U and the other in V\U and is the smallest with that property.
You set its weight, set its parent to be root, and so on, and put all its adjacent nodes in the queue. So now the queue has all the nodes adjacent to root and all the nodes adjacent to u, with their respective weights. So when you pop from it, you will once more get a node from V\U which is 'closest' to U.
In the implementation, they initially add every vertex to the queue with INFINITY priority, and then gradually update the weights, as you can see. This is reflected in the priority queue as well, guaranteeing the behaviour described above.
Hope it helps.
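Since the question mentions std::vector and std::make_heap, here is a rough sketch of how the CLRS version above is usually written in C++ with std::priority_queue. There is no DECREASE-KEY, so stale heap entries are simply skipped when popped (lazy deletion); the adjacency-list format is an assumption.
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs; r is the root vertex.
long long primTotalWeight(int n, const std::vector<std::vector<std::pair<int, int>>>& adj, int r) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> key(n, INF), parent(n, -1);       // key[] and π[] from the pseudocode
    std::vector<bool> inMST(n, false);
    // min-heap of (key, vertex) pairs; pop() plays the role of EXTRACT-MIN
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int>>,
                        std::greater<>> pq;
    key[r] = 0;
    pq.push({0, r});
    long long total = 0;
    while (!pq.empty()) {
        auto [k, u] = pq.top();
        pq.pop();
        if (inMST[u]) continue;                        // stale entry: u was already extracted
        inMST[u] = true;
        total += k;
        for (auto [v, w] : adj[u]) {                   // relax the edges out of u
            if (!inMST[v] && w < key[v]) {
                key[v] = w;
                parent[v] = u;
                pq.push({key[v], v});                  // push a new entry instead of DECREASE-KEY
            }
        }
    }
    return total;                                      // total MST weight (graph assumed connected)
}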