How to compare nodes in minHeapify for minimum spanning tree? - c++

I am trying to create a minimum spanning tree using prim's algorithm and I have a major question about the actual heap. I structured my graphs adjacency list to be a vector of vertexes, and each vertex has a vector of edges. The edges contain a weight, a connecting vertex, and a key. I am not sure whether my heap should be a heap of vertexes or edges. If I make it a heap of vertexes then there is no way to determine whether the weights are going from the same parent and destination vertexes, which makes me think that I should be making a heap for each vertexes list of edges. So my final question is should I be creating a heap of edges, or a heap of vertexes? If its a list of edges, should I be using the weight on the edges as the key, or should I have a separate data member called key that I can actually use for the priority queue? Thanks!

You should make a minHeap of edges since you are going to sort edges by their weight but the edges should contain two vertexes: representing one vertex on each end. Otherwise, as you suggested: there is no way to determine whether the weights are going from the same parent and destination vertexes. Therefore you should re-structure your edge class and make a minHeap of them.
Consider the algorithm from Wiki as well.
Initialize a tree with a single vertex, chosen
arbitrarily from the graph.
Grow the tree by one edge: Of the edges
that connect the tree to vertices not yet in the tree, find the
minimum-weight edge, and transfer it to the tree.
Repeat step 2 (until all vertices are in the tree).
I don't fully understand the key field in the edge class. I assume it's like an Id to the edge. So you should make a heap of them but since you are providing user-defined data structure to the heap, you should also provide a comparison function for the edge class, i.e. define the bool operator<(const Edge&) method.

Your heap could be of pairs <vertex, weight>, and will contain vertices, which are a single edge away from any vertex already in the partial minimum spanning tree. (edit: in some cases it may contain a vertex which is already in the partial MST, you should ignore such elements when they pop out).
It could be a heap of edges like <src, dst, weight>, which is practically the same, you just ignore src while dst is the same as vertex in the first variant.
PS. Regarding that key thing, I see no need for any keys, you need to compare weights.

The heap must maintain the vertices with key as the smallest weighted edge to it. As the vertex is still not visited hence any edge to it will be unvisited hence the minimum of all unvisited edge to unvisited vertex will be the next edge to be added to spanning hence you remove the vertex corresponding to it. The only problem here is to maintain the updated weights to minimum edges to a vertex in heap as the spanning tree changes in every iteration and new edges are added to it. The way to do it is to keep the position of each unvisited vertex in the heap, when a new vertex is added to spanning tree the unvisited edges from it are updated using the direct position of vertex they are pointing to using stored positions. Then you update the vertex minimum cost if the current cost is less that new edge weight added. Then bubble it up the heap using standard procedure of heap to maintain the min heap.
DataStructure: -
<Vertex,Weight> : Vertex id & weight of minimum edge to it as record of heap
position[Vertex] : The position of vertex record in heap.
Note: inbuilt function wont help you here hence you need to build your own heap to make this work efficiently.Initialize the key values of each vertex to some infinite value at the start
Another Approach: Store the all edge which point to unvisited vertex with weight in min heap. But that would require higher space complexity then other approach but has similar amortized time complexity. When you extract a edge check if the vertex it is pointing to is visited or not, if visited extract again and discard the edge.

Related

Boost Graph find neighbours of a group of vertices

I have an undirected_unweighted_graph graph; which is defined as follows:
typedef typename boost::adjacency_list<boost::vecS,boost::vecS,boost::undirectedS,boost::no_property,boost::no_property> undirected_unweighted_graph;
It has several vertices which are interconnected by undirected edges.
During my algorithm, I'm searching for a connected subgraph of graph which only contains some of the vertices, which has certain properties.
I'm using a linear optimization software package which provides me with possible optimal solutions for my problem. A solution consists of a set of vertices with a fixed size n and might be infeasible (i.e. the vertices are not connected in the corresponding subgraph of graph). I'm currently generating a new graph with the vertices of the solution and adding the edges which are also present in graph. I'm using boost::connected_components() to calculate the connected components for it.
Now I come to my question:
The next step for me is to improve the performance of generating a solution by imposing a constraint. Specifically, I will "grow" a solution, starting from a single node and ending with a subgraph of n nodes. At each stage, a partial solution will grow by adding one of its neighbors. (The idea is that if a partial solution can grow to a full solution, then at least one of its neighbors will be in the full solution.) How can I identify these neighbors?
My approach is the following:
I'm iterating over each component and then iterate over boost::out_edges(v, g). I then have to check if the neighbor is part of my component or not. If it is not part of the component I add it to the component neighbor group. I wonder if there is any way in boost to iterate over boost::out_edges(V, g) for a list of vertices V.
EDIT
To be more concrete: Given a graph, I am able to iterate over the neighbors of a given vertex like this:
for (auto edge: boost::make_iterator_range(boost::out_edges(v, graph))) {
//do stuff
}
What if I have a connected component, say a vector of vertices std::vector<size_t> component. What I want are the outgoing edges of the component meaning all outgoing edges of the vertices excluding those which are between two vertices of component. Is there an elegant way to get those edges efficiently?
I would not iterate over multiple vertices. Instead, I would maintain two sets of vertices – one containing the vertices in the current partial solution, and one containing the vertices adjacent to (and not in) the partial solution. When the linear optimization package adds a vertex to the partial solution, that vertex should also be moved from the set of adjacencies to the set of vertices in the solution. Next the edges coming from the new vertex need to be iterated over, but only those from the new vertex. For each vertex adjacent to the new one, if it is not in the partial solution then add it to the set of adjacent vertices.
I would also try something similar using just one set containing both the vertices in the partial solution and those adjacent to the partial solution. Less overhead. Depending on what the surrounding code expects, this set might work as well as the one with just the adjacent vertices.
The advantage of this approach is that you eliminate repetitive work. If you already looked at all the neighbors of vertex A, why should you need to look at them again just because vertex B was added to your set?
A disadvantage of this approach is that you might require significant memory overhead if you need to backtrack at times (think depth-first search and maintaining a stack of these sets). How bad this is depends on how big n is and, on average, how many edges connect to each vertex. Even in a bad case, some clever elimination of redundancy might salvage this approach, but I'll leave that for later.

Does time complexity of dijkstra's algorithm for shortest path depends on data structure used?

One way to store the graph is to implement nodes as structures, like
struct node {
int vertex; node* next;
};
where vertex stores the vertex number and next contains link to the other node.
Another way I can think of is to implement it as vectors, like
vector<vector< pair<int,int> > G;
Now, while applying Dijkstra's algorithm for shortest path, we need to build priority queue and other required data structures and so as in case 2 (vector implementation).
Will there be any difference in complexity in above two different methods of applying graph? Which one is preferable?
EDIT:
In first case, every node is associated with a linked list of nodes which are directly accessible from the given node. In second case,
G.size() is the number of vertices in our graph
G[i].size() is the number of vertices directly reachable from vertex with index i
G[i][j].first is the index of j-th vertex reachable from vertex i
G[i][j].second is the length of the edge heading from vertex i to vertex G[i][j].first
Both are adjacency list representations. If implemented correctly, that would be expected to result in the same time complexity. You'd get a different time complexity if you use an adjacency matrix representation.
In more detail - this comes down to the difference between an array (vector) and a linked-list. When all you're doing is iterating through the entire collection (i.e. the neighbours of a vertex), as you do in Dijkstra's algorithm, this takes linear time (O(n)) regardless of whether you're using an array or linked-list.
The resulting complexity for running Dijkstra's algorithm, as noted on Wikipedia, would be
O(|E| log |V|) with a binary heap in either case.

Datastructure for undirected graph edges with constant time complexity

I have a undirected graph where the nodes are stored in a flat array. Now I am looking for a data structure for the edges. It should have constant time complexity for getting all edges of a given node. An edge contains two node indices and additional information such as a weight.
The only way I see is duplicating the data, one sorted by the left node and another sorted by the right node.
vector<vector<int>> left, right;
But I would like to prevent duplicating the edges.
It sounds like you just want an adjacency list representation.
In this representation, each node would store a list of all its connected edges.
For an undirected graph, you can have each endpoint both store the edge.
There isn't really a way to get the connected edges for a node in constant time without some duplication. But you can just store a pointer, reference or unique ID (which can be an index in an edge array, for example) to the actual edge, preventing the need to actually have 2 copies of it floating around.
Make a vector of vectors.
Each node will have a vector of all the nodes it has.
You should build this during the graph creation.

Graph Representation in C++

I am going through a book The Design and Analysis of Computer Algorithms Reading through the Graph chapter, I am trying to implement DFS. By Reading definition of this algorithm it says, Graph G=(V,E) partiions the edges in E into two sets T and B. An Edge (v,w) is place in set T if vertes w has not been previously visited when we are at vertex v considering edged (v,w) , otherwise edge `(v,w) is place in set B.
Basically his algorithm of DFS will give me new Graph which will be G=(V,T). I want to know how one would implement this in C++.
I tried using adjacency list, but I am confuse is there a need of storing edges of just a map of list should be fine.
In VTK, edges are stored in a vector, and it always stores a pair (v,w). Near this vector there are 2 other vector of vectors to store in and out edges of graph nodes. When a new edge is added, it added to edge vector, its nodes (v,w) are added to in and out edges vector of vectors, too.
I am not quite clear about what your exact question is. I assume that you are asking about how to maintain two sets T and B to distinguish edges that have been visited from edges that have been not during DFS. I think the easiest way to do so is to add a bool field "visited" to the node struct in your adjacency list. Initial value of this field for all nodes are "false". Suppose in the above case, when DFS come to v, and the edge (v,w) is not visited, then the node on the list of v that corresponds to w would have a value "false" for "visited" at that time. Otherwise it will have a value of "true".
I think the author just try to give you the idea that edges will be categorized into two kinds: visited and not visited at the end of DFS. But I don't think keep two explicit sets maintaining those two kinds of edges are necessary. You can always print the visited edges after DFS according to their updated "visited" value.

Finding edge in weighted graph

I have a graph with four nodes, each node represents a position and they are laid out like a two dimensional grid. Every node has a connection (an edge) to all (according to the position) adjacent nodes. Every edge also has a weight.
Here are the nodes represented by A,B,C,D and the weight of the edges is indicated by the numbers:
A 100 B
120 220
C 150 D
I want to structure a container and an algorithm that switches the nodes sharing the edge with the highest weight. Then reset the weight of that edge. No node (position) can be switched more than once each time the algorithm is executed.
For example, processing the above, the highest weight is on edge BD, so we switch those. Since no node can be switched more than once, all edges involved in either B or D is reset.
A D
120
C B
Then, the next highest weight is on the only edge left, switching those would give us the final layout: C,D,A,B.
I'm currently running a quite awful implementation of this. I store a long list of edges, holding four values for the nodes they are (potentially) connected to, a value for its weight and the position for the node itself. Every time anything is requested, I loop through the entire list.
I'm writing this in C++, could some parts of the STL help speed this up? Also, how to avoid the duplication of data? A node position is currently in five objects. The node itself that is there and the four nodes indicating a connection to it.
In short, I want help with:
Can this be structured in a way so that there is no data duplication?
Recognise the problem? If any of this has a name, tell me so I can google for more info on the subject.
Fast algorithms are always nice.
As for names, this is a vertex cover problem. Optimal vertex cover is NP-hard with decent approximation solutions, but your problem is simpler. You're looking at a pseudo-maximum under a tighter edge selection criterion. Specifically, once an edge is selected every connected edge is removed (representing the removal of vertices to be swapped).
For example, here's a standard greedy approach:
0) sort the edges; retain adjacency information
while edges remain:
1) select the highest edge
2) remove all adjacent edges from the list
endwhile
The list of edges selected gives you the vertices to swap.
Time complexity is O(Sorting vertices + linear pass over vertices), which in general will boil down to O(sorting vertices), which will likely by O(V*log(V)).
The method of retaining adjacency information depends on the graph properties; see your friendly local algorithms text. Feel free to start with an adjacency matrix for simplicity.
As with the adjacency information, most other speed improvements will apply best to graphs of a certain shape but come with a tradeoff of time versus space complexity.
For example, your problem statement seems to imply that the vertices are laid out in a square pattern, from which we could derive many interesting properties. For example, that system is very easily parallelized. Also, the adjacency information would be highly regular but sparse at large graph sizes (most vertices wouldn't be connected to each other). This makes the adjacency matrix give a high overhead; you could instead store adjacency in an array of 4-tuples as it would retain fast access but almost entirely eliminate overhead.
If you have bigger graphs look into the boost graph library. It gives you good data structures for graphs and basic iterators for different types of graph traversing