Parallelizing Boruvka with OpenMP - C++

I have implemented Boruvka's algorithm sequentially in C++, and I know one of the advantages of the algorithm is that it can easily be parallelized. I am trying to do this using OpenMP, but I can't figure out how to get it to work. I read an edge list from graph.txt and write the resulting minimum spanning tree to mst.txt. Here is my sequential code for Boruvka:
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
// data structure for one edge of the input edge list
struct Edge {
    int v1, v2, weight; // two connected vertices and a weight
};
// structure for the graph
struct Graph {
    int vertex, edge;
    Edge* e; // undirected graph, so the edge v1-v2 is the same as v2-v1
};
// creates a graph with the given number of vertices and edges
struct Graph* formGraph(int vertex, int edge)
{
    Graph* graph = new Graph;
    graph->vertex = vertex;
    graph->edge = edge;
    graph->e = new Edge[edge]; // each undirected edge is stored once
    return graph;
}
// structure for the subsets (disjoint-set forest) within the graph
struct Subset {
    int parent, rank; // rank is an upper bound on the height of the subset's tree
};
// returns the root of the set containing i, compressing the path on the way
int find(struct Subset subset[], int i)
{
    if (subset[i].parent != i) {
        subset[i].parent = find(subset, subset[i].parent);
    }
    // i now points directly at its root
    return subset[i].parent;
}
// union of the two sets containing set1 and set2
void Union(struct Subset subs[], int set1, int set2)
{
    int root1 = find(subs, set1);
    int root2 = find(subs, set2);
    // union by rank: the root with the higher rank becomes the parent
    if (subs[root1].rank < subs[root2].rank) {
        subs[root1].parent = root2;
    }
    else if (subs[root1].rank > subs[root2].rank) {
        subs[root2].parent = root1;
    }
    else // equal ranks: root1 becomes the parent and its rank grows by 1
    {
        subs[root2].parent = root1;
        subs[root1].rank++;
    }
}
// the Boruvka algorithm implementation
void boruvka(struct Graph* graph) {
    // set data of initial graph
    int vertex = graph->vertex;
    int edge = graph->edge;
    Edge* e = graph->e;
    // initially there are as many subsets (trees) as vertices
    struct Subset *subs = new Subset[vertex];
    int *lightest = new int[vertex]; // index of the lightest edge leaving each subset, -1 = none
    for (int v = 0; v < vertex; v++)
    {
        subs[v].parent = v; // every vertex starts as its own root
        subs[v].rank = 0;   // single-vertex trees have rank 0
    }
    int components = vertex; // initial trees = number of vertices
    int minWeight = 0;
    // keep going until there is only one tree
    while (components > 1)
    {
        int before = components;
        // reset at the start of every round so no choice from a previous round survives
        for (int v = 0; v < vertex; v++)
            lightest[v] = -1;
        // find the lightest edge leaving each subset
        for (int i = 0; i < edge; i++)
        {
            int set1 = find(subs, e[i].v1);
            int set2 = find(subs, e[i].v2);
            // skip edges whose endpoints are already in the same subset
            if (set1 == set2)
                continue;
            if (lightest[set1] == -1 || e[lightest[set1]].weight > e[i].weight)
                lightest[set1] = i;
            if (lightest[set2] == -1 || e[lightest[set2]].weight > e[i].weight)
                lightest[set2] = i;
        }
        // add the selected edges to the MST
        for (int i = 0; i < vertex; i++)
        {
            if (lightest[i] == -1)
                continue;
            int s1 = find(subs, e[lightest[i]].v1);
            int s2 = find(subs, e[lightest[i]].v2);
            // two subsets may have picked the same edge; add it only once
            if (s1 == s2)
                continue;
            minWeight += e[lightest[i]].weight;
            // edges are printed in the order they are added (not sorted)
            printf("Edge %d-%d included in MST with weight %d\n",
                   e[lightest[i]].v1, e[lightest[i]].v2, e[lightest[i]].weight);
            // union the subsets together and decrease the component count
            Union(subs, s1, s2);
            components--;
        }
        if (components == before)
            break; // no edge leaves any tree: the graph is disconnected
    }
    printf("Weight of MST is %d\n", minWeight);
    delete[] subs;
    delete[] lightest;
}
// main function for calling boruvka
int main() {
    ifstream infile("graph.txt"); // input filename here
    if (!infile.is_open()) {
        fprintf(stderr, "could not open graph.txt\n");
        return 1;
    }
    string line;
    getline(infile, line);
    int V = atoi(line.c_str()); // first line: number of vertices
    getline(infile, line);
    int E = atoi(line.c_str()); // second line: number of edges
    // create graph for boruvka
    struct Graph* graph = formGraph(V, E);
    // read at most E edge lines; testing getline itself stops cleanly at end of file
    int count = 0;
    while (count < E && getline(infile, line)) {
        stringstream ssin(line);
        int v1, v2, w;
        if (!(ssin >> v1 >> v2 >> w))
            continue; // skip blank or malformed lines
        graph->e[count].v1 = v1;
        graph->e[count].v2 = v2;
        graph->e[count].weight = w;
        count++;
    }
    infile.close(); // close the input file
    graph->edge = count; // only use the edges actually read
    freopen("mst.txt", "w", stdout); // write output into mst.txt
    // call boruvka function
    boruvka(graph);
    delete[] graph->e;
    delete graph;
    return 0;
}
An example of my graph.txt is this:
9
14
0 1 4
7 8 7
1 2 8
1 7 11
2 3 7
2 5 4
2 8 2
3 4 9
3 5 14
4 5 10
5 6 2
6 7 1
6 8 6
0 7 8
The output for this example, which is correct, is written to mst.txt:
Edge 0-1 included in MST with weight 4
Edge 2-8 included in MST with weight 2
Edge 2-3 included in MST with weight 7
Edge 3-4 included in MST with weight 9
Edge 5-6 included in MST with weight 2
Edge 6-7 included in MST with weight 1
Edge 1-2 included in MST with weight 8
Edge 2-5 included in MST with weight 4
Weight of MST is 37

According to the algorithm, in each iteration every tree in the forest independently selects exactly one edge to add to the forest (different trees may select the same edge), until the added edges connect the whole forest into a single tree.
This means the search for each tree's edge can be done in parallel: as long as there is more than one tree, multiple threads can share the search.

If you're interested, I've written an implementation of the parallel Boruvka's algorithm using OpenMP.
We store the graph as an edge list (edges) where each edge (u, v) appears twice: as an edge from u and from v. At each step of the algorithm, edges is sorted in O(E log E) = O(E log V) time.
Then the edges are split between P processors. Each of them computes the array of shortest edges from its local nodes. Because allocating raw (uninitialized) memory for all nodes takes constant time, we can simply store this as an array and avoid hash maps. Then we merge the per-processor results into a global shortest-edge array using compare-and-swap (CAS). Note that because the edge list was sorted beforehand, all edges from u form a contiguous segment of edges. Because of this, the total number of extra iterations of the CAS loop does not exceed O(P), which gives O(E / P + P) = O(E / P) time for this step (assuming P^2 <= E).
After that, we can merge components along the added edges in O(V * alpha(V) / P) time using a parallel DSU algorithm.
The next step is updating the lists of vertices and edges; this can be done with a parallel cumulative (prefix) sum in O(V / P) and O(E / P) time respectively.
Since the total number of iterations is O(log V), the overall time complexity is O(E log^2 V / P).

Related

How to implement Dijkstra Algorithm for finding shortest path from 1 node to all other in an undirected graph in C and print the distances as well

There is an undirected graph. You need to store all edge weights in a two-dimensional array cost[][], and calculate the shortest distance from the source node 0 to all other nodes. Suppose there are at most 100 nodes. If there is no edge between two nodes, we set their weight to a very large number, MAX_DIS=999999, to denote these two nodes are not connected directly.
In this exercise, you are required to fulfill the following two tasks.
Initialize Cost Array
Initially, we have the array cost[100][100] to store the edge costs. We input the total number of nodes n and the number of edges m, and then input all edges in <x,y,w> format, where w is the weight of edge (x,y). If there is no edge between two nodes, we set the cost to MAX_DIS.
Calculate Shortest Distance.
With the cost array, we need to compute the shortest distance between node 0 and all other nodes. We also need to initialize the distance array distance[100] first. Then, in each loop, we first find the node w with the minimum distance distance[w] and update distance[v] for every node v adjacent to w.
Below is my code for this challenge, but it does not work properly for all test cases. It works for some, but I can't figure out where the problem is. I hope this is a good challenge to solve, which is why I am posting it here. Can you help me debug this code?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define MAX_NODES 100
#define MAX_DIS 999999
int cost[MAX_NODES][MAX_NODES];
int distance[MAX_NODES];
void initial(int m, int n);
void Dijkstra(int n);
void initial(int m, int n)
{
/*
let user input all edges and their weights and initialize cost[][].
note that if no edge between (x,y), set cost[x][y]=MAX_DIS
and cost[a][b]=cost[b][a] for the undirected graph.
Fill in your code here...
*/
int i,j;
for(i=0;i<n;i++){
for(j=0;j<n;j++){
cost[i][j] = MAX_DIS;
}
}
cost[0][0] = 0;
int weight,x,y;
for(i=0; i < m; i++){
scanf("%d %d %d", &x,&y,&weight);
cost[x][y] = weight;
cost[y][x] = weight;
}
}
void Dijsktra(int n)
{
/*
Fill in your code here...
calculate the distance from node 0 to all other nodes.
*/
int i;
int S[n];
S[0] = 1;
int all_visited = 0;
for(i=1;i<n;i++){
S[i] = -1;
}
for(i=0;i<n;i++){
distance[i] = cost[0][i];
}
while(all_visited != 1){
int temp = MAX_DIS;
int pos = -1;
for(i=1;i<n;i++){
if(S[i] == -1 && cost[0][i] <= temp){
temp = cost[0][i];
pos = i;
}
}
S[pos] = 1;
for(i=0;i<n;i++){
if(S[i] == -1)
break;
}
if(i==n)
all_visited = 1;
for(i=1; i<n; i++){
distance[i] = (int)fmin(distance[i], distance[pos] + cost[pos][i]);
}
}
}
int main()
{
int m,n;
printf("Input the number of nodes:\n");
scanf("%d",&n);
printf("Input the number of edges:\n");
scanf("%d",&m);
printf("Input these edges:\n");
initial(m,n);
Dijsktra(n);
for(int i=0;i<n;i++)
printf("%d ",distance[i]);
return 0;
}
This is a test case for which my code fails:
Input the number of nodes:
8
Input the number of edges:
10
Input these edges:
0 1 2
1 2 9
2 3 4
3 5 7
2 4 8
5 6 10
6 7 8
7 5 1
7 3 4
0 4 10
Expected output - 0 2 11 15 10 20 27 19
My output - 0 2 11 15 10 999999 999999 999999
Use a break statement in this loop:
for(i=1;i<n;i++){
if(S[i] == -1 && cost[0][i] <= temp){
temp = cost[0][i];
pos = i;
break; //here
}
}

How can I return all the shortest paths with tied weight using bellman-ford algorithm?

As the title says, what I'm looking for is printing "all the shortest paths" that are tied by weight.
Example:
We have a graph with the path 0 -> 1 -> 3, which has weight 6, but we also have the path 0 -> 3, which has weight 6 as well. The algorithm below only returns the first path; I would like to know whether it is possible to return both, or in general all equally short paths. Also, is there a more efficient or elegant way of printing the shortest path? I took this code as an example only; mine is very similar but prints only the path from the source to the last vertex.
There is a similar question answered here, but I could not understand the code, since I'm only familiar with C++.
#include <iostream>
#include <vector>
#include <iomanip>
#include <climits>
using namespace std;
// Data structure to store graph edges
struct Edge
{
int source, dest, weight;
};
// Recursive function to print the path of vertex v from the source vertex
void printPath(vector<int> const &parent, int v)
{
if (v < 0)
return;
printPath(parent, parent[v]);
cout << v << " ";
}
// Function to run Bellman Ford Algorithm from given source
void BellmanFord(vector<Edge> const &edges, int source, int N)
{
// count number of edges present in the graph
int E = edges.size();
// distance[] and parent[] stores shortest-path (least cost/path)
// information. Initially all vertices except source vertex have
// a weight of infinity and a no parent
vector<int> distance (N, INT_MAX);
distance[source] = 0;
vector<int> parent (N, -1);
int u, v, w, k = N;
// Relaxation step (run V-1 times)
while (--k)
{
for (int j = 0; j < E; j++)
{
// edge from u to v having weight w
u = edges[j].source, v = edges[j].dest;
w = edges[j].weight;
// if the distance to the destination v can be
// shortened by taking the edge u-> v
if (distance[u] != INT_MAX && distance[u] + w < distance[v])
{
// update distance to the new lower value
distance[v] = distance[u] + w;
// set v's parent as u
parent[v] = u;
}
}
}
// Run Relaxation step once more for Nth time to
// check for negative-weight cycles
for (int i = 0; i < E; i++)
{
// edge from u to v having weight w
u = edges[i].source, v = edges[i].dest;
w = edges[i].weight;
// if the distance to the destination v can be
// shortened by taking the edge u-> v
if (distance[u] != INT_MAX && distance[u] + w < distance[v])
{
cout << "Negative Weight Cycle Found!!";
return;
}
}
for (int i = 0; i < N; i++)
{
cout << "Distance of vertex " << i << " from the source is "
<< setw(2) << distance[i] << ". It's path is [ ";
printPath(parent, i); cout << "]" << '\n';
}
}
// main function
int main()
{
// vector of graph edges as per above diagram
vector<Edge> edges =
{
// (x, y, w) -> edge from x to y having weight w
{ 0, 1, 2 }, { 1, 3, 4 }, { 0, 3, 6 }
};
// Set maximum number of nodes in the graph
int N = 5;
// let source be vertex 0
int source = 0;
// run Bellman Ford Algorithm from given source
BellmanFord(edges, source, N);
return 0;
}
Looking at this more abstractly, you have something that can find a smallest thing, and you want to change it to also return all the other things equally small. (The same principles apply when looking for largest things. Sticking to "smallest", though, lends itself to an easier explanation.)
Many algorithms for finding an extreme thing have some part where they ask "is A more extreme than B". For example, you might see code like the following:
if ( A < B )
smallest = A;
Notice how this ignores ties (A == B) the same way it ignores worse results (A > B). Hence, you get only the first of the best results returned. So this is something to change. However, you cannot simply change A < B to A <= B, since that would replace B with A in the case of a tie, the same way it is replaced when A is a better result. (You'd get only the last of the best results returned.) The three cases (less than, equal to, and greater than) need to be dealt with separately.
Another aspect to look at is how the smallest thing is tracked. The above code snippet suggests that smallest has the same type as A; this is inadequate for tracking multiple solutions. You probably will want a container to track solutions. A vector is likely a reasonable choice.
Putting this together, the above code might become something more like the following (after changing the declaration of smallest):
if ( A < B ) {
smallest.clear();
smallest.push_back(A);
}
else if ( A == B ) {
smallest.push_back(A);
}
How can this be applied to Bellman-Ford?
Fortunately the key part of the code is relatively easy since there are comments documenting it. The harder part is changing the code to track multiple results, as there are two pieces of data updated when a shorter path is found. It looks like parent is the data that needs to be expanded. Here is a new declaration for it:
vector< vector<int> > parent (N);
This uses an empty vector instead of -1 to indicate "no parent". The check for the shortest path can now become:
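if (distance[u] != INT_MAX) {
    // if the distance to the destination v can be
    // shortened by taking the edge u-> v
    if (distance[u] + w < distance[v])
    {
        // update distance to the new lower value
        distance[v] = distance[u] + w;
        // forget the previous parent list.
        parent[v].clear();
    }
    // if u-> v is a way to get the shortest distance to the
    // destination v and u is not recorded yet (the relaxation loop
    // runs N-1 times, so without this check u would be added again
    // on every pass; find() needs #include <algorithm>)
    if (distance[u] + w == distance[v] &&
        find(parent[v].begin(), parent[v].end(), u) == parent[v].end())
    {
        // add u as a possible parent for v
        parent[v].push_back(u);
    }
}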
This differs a little from the general approach in that there is no "else". It is the same logic, just arranged a bit differently. Note that when the first if clause is entered, the distance vector is updated, so the second if clause is entered as well.
I think that handling the found paths is a separate (and not trivial) question, so I'll leave it to you to figure out how to update printPath(). I will, though, give a version that preserves the old output (just the first of the shortest paths) while receiving the new results. This is not a recommendation so much as an illustration relating the new data structure to the old.
// Recursive function to print the path of (just) the first given vertex from source vertex.
void printPath(vector< vector<int> > const &parent, vector<int> const &vlist)
{
if (vlist.empty())
return;
int v = vlist.front();
printPath(parent, parent[v]);
cout << v << " ";
}

Undirected Graph Issue

I wrote this code to generate a random undirected graph 100 times, with randomly generated nodes and weights. My problem is that when I try to store the shortest distance computed by DijkstraComputePaths (min_distance), something goes wrong: when I print the size of my list, it is always 1. What is wrong?
// Random Graph Generator
for (int n = 1; n <= 101; ++n)
{
int r = 0;
nodeCount = 10; //rand() % 8128 + 64;
while (r <= nodeCount)
{
++r;
int nodeNumb = (rand() % 6); // Generates a possible node from 0 to 5 (six possibilities)
int nodeDest = (rand() % 6); // Generates a possible node destination, same range as above
int node_weight = rand() % 100 + 1; // Generates a random edge weight from 1 to 100
// Create adjacency list
adjacency_list[nodeNumb].push_back(neighbourer(nodeDest, node_weight));
// For undirected graph create opposite connection back
adjacency_list[nodeDest].push_back(neighbourer(nodeNumb, node_weight));
}
vector<weight_w> min_distance; // declare vector for minimum distance
vector<vertex_v> previous; // declare vector to hold previous vertices
int origin = 3; // origin to be inputted
int destination = 5; // destination to be inputted
list<double> pathCount;
DijkstraComputePaths(origin, adjacency_list, min_distance, previous);
pathCount.push_back(min_distance[destination]);
for (int deleteIterator = 0; deleteIterator <= 6; ++deleteIterator)
{
adjacency_list[deleteIterator].clear();
}
cout << "The List Size is: " << pathCount.size() << endl;
}
The reason you only ever have 1 element in your list is that list<double> pathCount; is declared inside the body of your outer for loop.
This means that on every iteration the old list is destroyed, a fresh one is created, and only 1 value is appended to it.
Instead, move the definition of pathCount outside the for loop, so that its scope is larger than the loop body.
Of course, I cannot guarantee the correctness of your program after that fix, because the definitions of neighbourer, vertex_v, weight_w and DijkstraComputePaths are missing.

Reading in to Linked-List not Working correctly C++

I have been stuck on this problem for hours; will someone please help?
The input is in the following format
5
1 2 9.0
1 3 12.0
2 4 18.0
2 3 6.0
2 5 20.0
3 5 15.0
0
1 5
The first number is the number of vertices in the graph. The following lines, up to the 0, are the edges of the graph: the first and second numbers are the vertices and the third is the length of the edge between them. I am trying to read in the data and store each edge in the List adjacency of its vertices. This example makes a graph with five vertices, with edges from 1 to 2 and 3, from 2 to 4, 3 and 1, and so on; it also stores the reverse edges, e.g. 2 1 9.0.
It is not storing the data correctly: when reading in, each new edge for a vertex seems to overwrite the previous data. It does create multiple ListCells, because when I print them out I get:
1 3 12.000
1 3 12.000
2 5 20.000
2 5 20.000
2 5 20.000
2 5 20.000
3 5 15.000
3 5 15.000
3 5 15.000
So it is the right number of cells, just the wrong information.
#include <cstdio>
using namespace std;
int tracing= 1;
struct ListCell
{
ListCell* next;
int vertex;
double weight;
ListCell(int v, double w, ListCell* nxt)
{
vertex = v;
weight = w;
next = nxt;
}
};
typedef ListCell* List;
struct Vertex
{
bool signaled;
long distance;
List adjacency;
};
struct Graph
{
int numVertices;
Vertex* vertexInfo;
Graph(int n)
{
numVertices = n;
vertexInfo = new Vertex[n+1];
for(int i = 1; i <= n; i++)
{
vertexInfo[i].signaled = false;
}
}
};
//==============================================================
// tail
//==============================================================
//
//==============================================================
List tail(List L)
{
return L->next;
}
//==============================================================
// isEmpty
//==============================================================
//
//==============================================================
bool isEmpty(List L)
{
return L == NULL;
}
//==============================================================
// readIn
//==============================================================
//
//==============================================================
Graph readIn()
{
int g;
int p1;
int p2;
float edge;
scanf("%i ", &g);
Graph myGraph(g);
scanf("%i", &p1);
while(p1 != 0)
{
scanf("%i", &p2);
scanf("%f", &edge);
if(tracing >0)
{
printf("Edge from %i to %i is %5.3f\n", p1, p2, edge);
}
myGraph.vertexInfo[p1].adjacency = new ListCell
(p2,edge,myGraph.vertexInfo[p1].adjacency);
myGraph.vertexInfo[p2].adjacency = new ListCell
(p1, edge, myGraph.vertexInfo[p2].adjacency);
scanf("%i", &p1);
}
return myGraph;
}
//==============================================================
// printOut
//==============================================================
//
//==============================================================
void printOut(Graph myGraph)
{
int n;
int length = myGraph.numVertices;
float d;
List p;
printf("There are %i vertices.\n", length);
printf("The edges are as follows. \n\n");
for(int i=1; i<=length; i++)
{
p= myGraph.vertexInfo[i].adjacency;
for(p=p; !isEmpty(p); p=tail(p))
{
n = myGraph.vertexInfo[i].adjacency -> vertex;
d = myGraph.vertexInfo[i].adjacency -> weight;
if(i<n)
{
printf("%i %i %7.3f \n",i,n,d);
}
}
}
}
//==============================================================
// main
//==============================================================
int main(int argc, char** argv)
{
Graph myGraph = readIn();
printOut(myGraph);
return 0;
}
There is a lot wrong with your code. The first thing and probably the most glaring is this:
myGraph.vertexInfo[p1].adjacency = new ListCell
(p2,edge,myGraph.vertexInfo[p1].adjacency);
You call this in the input loop. Look at the first vertex you input (1). You input it twice, once for "1 2 9.0" and again for "1 3 12.0". The variable p1 doesn't change (it remains 1), but you're overwriting the first input with the second input.
So you not only have an error in the input, you have a memory leak since you're creating a ListCell dynamically, and overwriting the previous dynamically allocated ListCell.
Maybe an array is not what you should use as a vertexInfo. Maybe it should be a map of vertex to vertices/distances:
std::map<int, std::vector<std::pair<int,double>>>
where the key is the vertex, and the inner vector is a vector of all the adjacent vertices and distances (each one represented as a pair).

Storing Data into a Linked List C++

The input is in the following format.
5
1 2 9.0
1 3 12.0
2 4 18.0
2 3 6.0
2 5 20.0
3 5 15.0
0
1 5
The first number is the number of vertices in the graph. The following lines, up to the 0, are the edges of the graph: the first and second numbers are the vertices and the third is the length of the edge between them. I cannot figure out how to store the data into the List adjacency of each vertex when reading it in. E.g., vertex 1 would have two ListCells containing 2 9.0 and 3 12.0. I would also need to put 1 9.0 and 1 12.0 into vertices 2 and 3. But I cannot figure out how to store the data into the ListCells.
Code so far:
#include <cstdio>
using namespace std;
struct ListCell; // forward declaration so the typedef compiles before the definition
typedef ListCell* List;
struct ListCell
{
ListCell* next;
int vertex;
double weight;
ListCell(int v, double w, ListCell* nxt)
{
vertex = v;
weight = w;
next = nxt;
}
};
struct Vertex
{
bool signaled;
long distance;
Vertex* next; // must be a pointer: a struct cannot contain a member of its own type
List adjacency;
};
struct Graph
{
int numVertices;
Vertex* vertexInfo;
Graph(int n)
{
numVertices = n;
vertexInfo = new Vertex[n+1];
for(int i = 1; i <= n; i++)
{
vertexInfo[i].signaled = false;
}
}
};
//==============================================================
// readIn
//==============================================================
//
//==============================================================
void readIn()
{
int n, p1, p2;
double edge;
scanf("%i ", &n);
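Graph myGraph(n); // "Graph(n);" alone would declare a new variable named n
while(scanf("%i", &p1) == 1 && p1 != 0) // read until the terminating 0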
{
}
}
I prefer to define data structures in a way that suits the business logic.
I would suggest you have a look at The Art of Computer Programming to get an idea of some best practices.
Take a look at the "Linear Lists" chapter.
Hint:
traverse the list and append the new node (please take care of corner cases):
Vertex* v = &vertexInfo[i]; // vertexInfo[i] is a Vertex, so take its address
while (v->next != NULL) {
    v = v->next;
}
v->next = new Vertex(....);