Using Dijkstra's algorithm with an unordered_map graph - c++

So this is my current code; I will post the header declarations below:
// Using Dijkstra's
int Graph::closeness(string v1, string v2){
int edgesTaken = 0;
unordered_map<string, bool> visited;
unordered_map<string, int> distances;
string source = v1; // Starting node
while(source != v2 && !visited[source]){
// The node has been visited
visited[source] = 1;
// Set all initial distances to infinity
for(auto i = vertices.begin(); i != vertices.end(); i++){
distances[i->first] = INT_MAX;
}
// Consider all neighbors and calculate distances from the current node
// & store them in the distances map
for(int i = 0; i < vertices[source].edges.size(); i++){
string neighbor = vertices[source].edges[i].name;
distances[neighbor] = vertices[source].edges[i].weight;
}
// Find the neighbor with the least distance
int minDistance = INT_MAX;
string nodeWithMin;
for(auto i = distances.begin(); i != distances.end(); i++){
int currDistance = i->second;
if(currDistance < minDistance){
minDistance = currDistance;
nodeWithMin = i->first;
}
}
// There are no neighbors and the node hasn't been found yet
// then terminate the function and return -1. The nodes aren't connected
if(minDistance == INT_MAX)
return -1;
// Set source to the neighbor that has the shortest distance
source = nodeWithMin;
// Increment edgesTaken
edgesTaken++;
// clear the distances map to prepare for the next iteration
distances.clear();
}
return edgesTaken;
}
Declarations (This is an undirected graph) :
class Graph{
private:
// This holds the connected name and the corresponding weight
struct EdgeInfo{
std::string name;
int weight;
EdgeInfo() { }
EdgeInfo(std::string n, int w) : name(n), weight(w) { }
};
// This will hold the data members of the vertices, inclu
struct VertexInfo{
float value;
std::vector<EdgeInfo> edges;
VertexInfo() { }
VertexInfo(float v) : value(v) { }
};
// A map is used so that the name is used as the index
std::unordered_map<std::string, VertexInfo> vertices;
NOTE: Please do not suggest that I change the header declarations. I am contributing to a project that already has 8 other functions written, and it's definitely too late to go back and change anything, since every other function would then have to be rewritten.
I'm currently getting incorrect output. The function does handle the unconnected case correctly, however (if two vertices aren't connected, the function should return -1). If the two nodes are the same vertex, e.g. closeness("Boston", "Boston"), the function should return 0.
Example graph
The closeness of each pair of vertices on the left is shown on the right:
Correct:
Trenton -> Philadelphia: 2
Binghamton -> San Francisco: -1
Boston -> Boston: 0
Palo Alto -> Boston: -1
Output of my function:
Trenton -> Philadelphia: 3
Binghamton -> San Francisco: -1
Boston -> Boston: 0
Palo Alto -> Boston: 3
I've tried to copy Dijkstra's algorithm exactly as it is described, but I'm getting incorrect results. I've been trying to figure this out for a while now; can anyone point me in the right direction?

This is most certainly not a real answer to the question (since I'm not pointing you in a direction regarding your implementation), but have you thought about just using the Boost Graph Library?
It boils down to writing a short traits class for your graph structure (so it is not necessary to alter your graph definition/header), and it is, at least for these fundamental algorithms, proven to work stably and correctly.
I'd always suggest not reinventing the wheel, especially when it comes to graphs and numerics...

Your implementation is wrong, and it is only by chance you get "correct" results.
Let's do one example by hand, from Trenton to Philadelphia. I use the first letter of each city as its label.
First iteration
visited = [(T, 1), (N, 0), (W, 0), (P, 0), (B, 0)]
minDistance = 3;
nodeWithMin = N;
edgesTaken = 1
second iteration
visited = [(T, 1), (N, 1), (W, 0), (P, 0), (B, 0)]
minDistance = 2;
nodeWithMin = W;
edgesTaken = 2
third iteration
visited = [(T, 1), (N, 1), (W, 1), (P, 0), (B, 0)]
minDistance = 2;
nodeWithMin = N;
edgesTaken = 3;
fourth iteration
N is already 1 so we stop. Can you see the errors?
Traditionally, Dijkstra's shortest-path algorithm is implemented with a priority queue:
dijkstra(graph, source)
    weights is a map indexed by nodes with all weights = infinity
    predecessors is a map indexed by nodes with all predecessors set to itself
    unvisited is a priority queue containing all nodes
    weights[source] = 0
    unvisited.increase(source)
    while unvisited is not empty
        current = unvisited.pop()
        for each neighbour of current
            if weights[current] + edge_weight(current, neighbour) < weights[neighbour]
                weights[neighbour] = weights[current] + edge_weight(current, neighbour)
                unvisited.increase(neighbour)
                predecessors[neighbour] = current
    return (weights, predecessors)
And you can get the path length by following the predecessors.
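For comparison, here is a minimal C++ sketch of the priority-queue version, written against the asker's data layout. The structs are copied in simplified form so the block compiles on its own, and closeness is written as a free function taking the vertices map (inside the class it would read the member directly). It assumes that "closeness" means the total weight of the shortest path and that every edge target is also a key of vertices. Instead of a decrease-key operation, stale queue entries are simply skipped when popped.

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified copies of the question's private structs, so the sketch compiles alone.
struct EdgeInfo {
    std::string name;
    int weight;
    EdgeInfo(std::string n, int w) : name(std::move(n)), weight(w) {}
};
struct VertexInfo {
    float value = 0;
    std::vector<EdgeInfo> edges;
};

// Dijkstra with a min-priority queue. Returns the total weight of the shortest
// path from v1 to v2, 0 if v1 == v2, or -1 if v2 is unreachable.
// Assumes every edge target is also a key of `vertices`.
int closeness(const std::unordered_map<std::string, VertexInfo>& vertices,
              const std::string& v1, const std::string& v2) {
    if (vertices.count(v1) == 0 || vertices.count(v2) == 0) return -1;
    std::unordered_map<std::string, int> dist;  // best known distance from v1
    for (const auto& kv : vertices) dist[kv.first] = INT_MAX;
    dist[v1] = 0;

    using Item = std::pair<int, std::string>;  // (distance, vertex name)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({0, v1});
    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const int d = top.first;
        const std::string& u = top.second;
        if (d > dist[u]) continue;  // stale entry: u was settled with a smaller distance
        if (u == v2) return d;      // the first time v2 is popped, d is final
        for (const EdgeInfo& e : vertices.at(u).edges) {
            assert(e.weight >= 0 && "Dijkstra requires non-negative weights");
            const int nd = d + e.weight;
            if (nd < dist[e.name]) {  // relax edge u -> e.name
                dist[e.name] = nd;
                pq.push({nd, e.name});
            }
        }
    }
    return -1;  // queue exhausted without reaching v2
}
```

If closeness should instead count edges, the same loop works with every edge weight treated as 1.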

The problem with Palo Alto -> Boston seems to be that the algorithm takes the route Palo Alto -> San Francisco -> Los Angeles -> San Francisco (edgesTaken = 3) and then fails the while condition because San Francisco has already been visited, so the function returns edgesTaken instead of -1.

Related

Finding the number of strongly connected components

I have written this code to find the number of SCCs (strongly connected components):
#include <iostream>
#include <vector>
using namespace std;
int n , m;
vector<vector<int>>G(101) , GT(101);
void read()
{
//n = number of nodes , m = number of edges.
cin>>n>>m;
for (int i = 0 ; i < m ; i++)
{
int a , b;
cin>>a>>b;
G[a].push_back(b);
GT[b].push_back(a);
}
}
void DFS_G(int node , vector<int>&V)
{
V[node] = 1;
for (int x : G[node])
{
if (!V[x])
DFS_G(x , V);
}
}
void DFS_GT(int node , vector<int>&V)
{
V[node] = 1;
for (int x : GT[node])
{
if (!V[x])
DFS_GT(x , V);
}
}
int main()
{
//G-graph
//GT-reversed graph
int SCC = 0;
read();
vector<int>component(101 , 0);
vector<int>reachedG(101) , reachedGT(101);//will keep nodes reached from x in G and in GT
for (int i = 1 ; i <= n ; i++)
{
if (!component[i])
{
component[i] = 1;
for(int j = 1 ; j <= n ; j++)
{
reachedG[j] = reachedGT[j] = 0;
}
DFS_G(i , reachedG);
DFS_GT(i , reachedGT);
for (int j = 1 ; j <= n ; j++)
{
if (reachedG[j] == 1 && reachedGT[j] == 1)
{
component[j] = 1;
}
}
SCC++;
}
}
cout<<SCC;
return 0;
}
Let's say you are at node X. First we DFS from X and find the nodes that we can reach from it; we mark them as reached in our reachedG vector. As you may know, by reversing a graph and then running DFS from X, the nodes you encounter are exactly the nodes that can reach X; I keep them in reachedGT. So the intersection of these two vectors is the SCC that X is in. However, as I read on the internet, this isn't the best implementation of Kosaraju's algorithm (it runs two DFSs per component, so it can take O(V·(V+E)) time). The more efficient one is this one from https://www.geeksforgeeks.org/strongly-connected-components/.
The steps are the following:
1. Create an empty stack 'S' and do a DFS traversal of the graph. In the DFS traversal, after calling recursive DFS for the adjacent vertices of a vertex, push the vertex onto the stack. In the above graph, if we start DFS from vertex 0, we get vertices in the stack as 1, 2, 4, 3, 0.
2. Reverse the directions of all arcs to obtain the transpose graph.
3. One by one, pop a vertex from S while S is not empty. Let the popped vertex be 'v'. If v has not been visited yet, take v as the source and do DFS (call DFSUtil(v)). The DFS starting from v prints the strongly connected component of v. In the above example, we process vertices in the order 0, 3, 4, 2, 1 (one by one popped from the stack).
I've spent quite a few hours reading about it, and I still don't understand the logic behind the stack of finishing times. However, I think this approach is really similar to mine and that I'm missing something. I'd be happy if you helped me!
Let v be the last node to be finished. Every node that can reach v in the original graph (hence that v can reach in the transpose) is in v's strong component. Why? Suppose to the contrary that x is a node that can reach v, but v can't reach x. When we start x, the node v cannot be on the stack at the time because that would imply a path to x. We can't finish x until we've at least started every node that x can reach, so if v starts before x, it's already finished (because not on the stack), and if v starts after x, it finishes before x (because it's higher on the stack than x). Contradiction. This argument extends to a correctness proof.
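A compact C++ sketch of the stack-based algorithm may make the argument above concrete. It assumes 0-based vertices stored as adjacency lists (the question's code is 1-based with fixed-size arrays). The order vector plays the role of the stack S, and the second pass runs on the transpose in decreasing finishing time, so each DFS started there covers exactly one strongly connected component.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Kosaraju's algorithm (sketch): counts the strongly connected components of a
// directed graph whose vertices are 0..n-1, given as adjacency lists.
int countSCC(const std::vector<std::vector<int>>& g) {
    const int n = static_cast<int>(g.size());

    // Transpose graph: every arc u -> v becomes v -> u.
    std::vector<std::vector<int>> gt(n);
    for (int u = 0; u < n; ++u)
        for (int v : g[u]) {
            assert(v >= 0 && v < n);
            gt[v].push_back(u);
        }

    // Pass 1: DFS on g; a vertex is appended to `order` only after all of its
    // descendants are finished, so `order` is the stack S (top = back).
    std::vector<bool> seen(n, false);
    std::vector<int> order;
    std::function<void(int)> dfs1 = [&](int u) {
        seen[u] = true;
        for (int v : g[u])
            if (!seen[v]) dfs1(v);
        order.push_back(u);
    };
    for (int u = 0; u < n; ++u)
        if (!seen[u]) dfs1(u);

    // Pass 2: pop vertices in decreasing finishing time and DFS on the transpose;
    // each DFS started here covers exactly one strongly connected component.
    std::vector<bool> assigned(n, false);
    std::function<void(int)> dfs2 = [&](int u) {
        assigned[u] = true;
        for (int v : gt[u])
            if (!assigned[v]) dfs2(v);
    };
    int count = 0;
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        if (!assigned[*it]) {
            dfs2(*it);
            ++count;
        }
    return count;
}
```

Each vertex and arc is handled a constant number of times per pass, so this runs in O(V + E), compared with one pair of DFSs per component in the question's version. The recursion depth can reach V, so very deep graphs may need an iterative DFS.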

A-Star Search Algorithm won't find a valid path

I'm trying to implement an A* algorithm for pathfinding in my 3D grid. I've been following a tutorial but I'm not getting a valid path. I've stepped through my code to find out what's going on, but I don't know how to solve the problem. For the most basic test I'm just using a 2-D grid (it's 3-D, but there's only one Z option, so basically 2-D).
Here's what it's doing:
So we start at 0,0 (orange) and want to get to 1,2 (green). First it calculates the two options for the orange square, north and east, and gets distances of 2 and 1.414 for F values of 3 and 2.414. It moves to the east square (0,1). Great. But now it calculates the two open squares from 0,1 which are 1,1 and 0,2, both of which have a g value of 2 and an h value (distance) of 1, making their F values both be 3.
Since their F values are 3 and we already have an option with an F value of 3 (1,0 from the starting point), these two options are ignored even though they are clearly the best options.
It then continues onward and switches to moving to 1,0 where it then calculates 1,1 as 3 again and 2,0 as 4.236. 1,1's f value is not bigger than our current f value though, so it's ignored and we move upward to 2,0.
2,0 can only move right so it does.
2,1 can only move down since 2,2 is an invalid square, but the f value of moving to 1,1 is saved as 3, so it's again ignored, leaving us with no valid path between 0,0 and 1,2. What am I missing?
Here's a snippet of my path loop. There are a bunch of custom structs in here, and I'm using TMap from Unreal Engine to store my closed list, but I don't think that matters to the question. Here's a quick-and-dirty summary of what these structs are:
PCell: Holds cell coordinates
PPair: Holds cell coordinates as a PCell and an F value
FVectorInt: 3-D integer vector
FPathCell: Holds parent coordinates, and f, g, and h values.
cellDetails is a 3D dynamic array of FPathCell
closedMap is a TMap with <key, value> as <IntVector, bool>
Also locationIsWalkable(FVectorInt, StepDirection) is just code that checks to see if the player can walk to a cell from a certain direction. You can ignore that part.
std::set<PPair> openList;
PPair originPair = PPair();
originPair.cell = PCell(i, j, k);
originPair.f = 0.0;
openList.insert(originPair);
bool foundDestination = false;
FPathCell destPair;
FVectorInt destCell;
while (!openList.empty() && !foundDestination)
{
iterations++;
PPair p = *openList.begin();
//Remove vertex
openList.erase(openList.begin());
//Add vertex to closed list
i = p.cell.i;
j = p.cell.j;
k = p.cell.k;
closedMap.Remove(FIntVector(i, j, k));
closedMap.Add(FIntVector(i, j, k), true);
double gNew, hNew, fNew;
//Generate movement options
//Option 1: NORTH (+X)
//Process if valid movement
if (locationIsWalkable(FVectorInt(i + 1, j, k), StepDirection::North))
{
FVectorInt check = FVectorInt(i + 1, j, k);
//If this cell is the destination
if (check == destination)
{
foundDestination = true;
//Set the parent of the destination cell
cellDetails[check.x][check.y][check.z].parent_i = i;
cellDetails[check.x][check.y][check.z].parent_j = j;
cellDetails[check.x][check.y][check.z].parent_k = k;
destPair = cellDetails[check.x][check.y][check.z];
destCell = check;
break;
}
//Else if this cell is not in the closed list
else if (!closedMap.FindRef(FIntVector(check.x, check.y, check.z)))
{
gNew = cellDetails[i][j][k].g + 1;
hNew = calculateHValue(check, destination);
fNew = gNew + hNew;
if (cellDetails[check.x][check.y][check.z].f == FLT_MAX ||
cellDetails[check.x][check.y][check.z].f > fNew) {
PPair cellPair = PPair();
cellPair.cell = PCell(check.x, check.y, check.z);
cellPair.f = fNew;
openList.insert(cellPair);
cellDetails[check.x][check.y][check.z].f = fNew;
cellDetails[check.x][check.y][check.z].g = gNew;
cellDetails[check.x][check.y][check.z].h = hNew;
cellDetails[check.x][check.y][check.z].parent_i = i;
cellDetails[check.x][check.y][check.z].parent_j = j;
cellDetails[check.x][check.y][check.z].parent_k = k;
}
}
}
//11 other movement options
}
inline bool operator<(const PPair& lhs, const PPair& rhs)
{
return lhs.f < rhs.f;
}
There are 12 movement options (north, south, east, west, up+north, down+north, etc.), but they all use basically the same code and just swap out the check vector for the appropriate movement.
Here's the tutorial I followed
Since their F values are 3 and we already have an option with an F value of 3 (1,0 from the starting point), these two options are ignored even though they are clearly the best options.
This must be your mistake. These options shall not be 'ignored', but rather 'delayed till they are the next-best options'. The way it's done is that on every iteration of A* you ought to select the open cell with the lowest F-score.
In your example, once you expand 0,1 (to get 0,2 and 1,1), your open set should look like:
(1,0):3 (1,1):3 (0,2):3
(It can also be any other permutation of these, because they have the same scores.)
Now imagine that it chooses to visit 1,0. It adds 2,0 to the queue, but 1,1 and 0,2 should still be there:
(1,1):3 (0,2):3 (2,0):4.236
Since 2,0 has a higher F-score than 1,1 or 0,2, it will not be chosen yet. Instead your algorithm shall pick 1,1 or 0,2 at this iteration, thus arriving at the destination 1,2.
As for your code, you're using an std::set for the openList, and since your operator< compares only f, the set treats two cells with the same F score as equivalent and keeps only one of them. You can use std::multiset or std::priority_queue to combat that. However, A* can decrease the score of nodes already in the open set, and neither data structure allows that operation in sub-linear time. By inserting the same node multiple times (every time its score decreases) and ignoring any pops after it has been closed, you'll still get a correct, though sub-optimal, algorithm.
Some A* implementations instead use heaps with an efficient decrease-key operation, such as binomial or Fibonacci heaps.
The C++ standard library doesn't provide these, but you can find libraries implementing them on the web (Boost.Heap, for example, has binomial_heap and fibonacci_heap).
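Below is a simplified C++ sketch of the "insert duplicates and skip closed cells" fix. It is not a drop-in replacement for the Unreal code: it assumes a plain 2-D grid of bools (true = walkable), four moves with cost 1, and a Manhattan heuristic, whereas the question uses 12 moves and diagonal costs. The open set is a std::priority_queue, which keeps entries with tied F values instead of discarding them.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// A* on a 4-connected grid with unit step cost and Manhattan heuristic (sketch).
// Returns the length of the shortest path from (si, sj) to (ti, tj), or -1.
int astarGrid(const std::vector<std::vector<bool>>& walkable,
              int si, int sj, int ti, int tj) {
    assert(!walkable.empty() && !walkable[0].empty());
    const int rows = static_cast<int>(walkable.size());
    const int cols = static_cast<int>(walkable[0].size());
    auto h = [&](int i, int j) { return std::abs(i - ti) + std::abs(j - tj); };
    std::vector<std::vector<int>> g(rows, std::vector<int>(cols, -1));  // -1: not reached yet
    std::vector<std::vector<bool>> closed(rows, std::vector<bool>(cols, false));

    // Min-heap on f. Unlike a std::set ordered only by f, it keeps several
    // entries with equal f, and the same cell may appear more than once.
    using Entry = std::tuple<int, int, int>;  // (f, i, j)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    g[si][sj] = 0;
    open.emplace(h(si, sj), si, sj);

    const int di[4] = {1, -1, 0, 0};
    const int dj[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        const Entry top = open.top();
        open.pop();
        const int i = std::get<1>(top), j = std::get<2>(top);
        if (closed[i][j]) continue;  // stale duplicate: cell was already expanded
        closed[i][j] = true;
        if (i == ti && j == tj) return g[i][j];
        for (int m = 0; m < 4; ++m) {
            const int ni = i + di[m], nj = j + dj[m];
            if (ni < 0 || nj < 0 || ni >= rows || nj >= cols) continue;
            if (!walkable[ni][nj] || closed[ni][nj]) continue;
            const int ng = g[i][j] + 1;
            if (g[ni][nj] == -1 || ng < g[ni][nj]) {  // found a better path to (ni, nj)
                g[ni][nj] = ng;
                open.emplace(ng + h(ni, nj), ni, nj);  // re-insert instead of decrease-key
            }
        }
    }
    return -1;
}
```

On a 3x3 grid with cell 2,2 blocked, the path from 0,0 to 1,2 has length 3; cells whose F value ties with an existing entry simply wait their turn in the queue.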

Depth First Search on Adjacency Matrix

For this program, I am given a set of inputs that I need to store in an adjacency matrix. I've done this, so I have an adjacency matrix Matrix[11][11]. Now, using this matrix, I need to perform a depth first search and return the pi values.
I have the pseudocode for this, so I believe that I need two methods: DFS(graph) and DFS-VISIT(node). However, I'm having trouble actually implementing this. Can I do this using the adjacency matrix directly or do I somehow need to create a graph using the matrix? Any help with actually coding this would be appreciated.
DFS(G)
    for each u ∈ V[G] do
        color[u] = WHITE
        π[u] = NIL
    time = 0
    for each u ∈ V[G] do
        if color[u] = WHITE then
            DFS-VISIT(u)

DFS-VISIT(u)
    color[u] = GRAY
    time++
    d[u] = time
    for each v ∈ Adj[u] do
        if color[v] = WHITE then
            π[v] = u
            DFS-VISIT(v)
    color[u] = BLACK
    time++
    f[u] = time
The pseudo-code you have there seems to assume an adjacency list.
Specifically this code: (indentation corresponding to code blocks assumed)
for each v ∈ Adj[u] do
    if color[v] = WHITE then
        π[v] = u
        DFS-VISIT(v)
The difference is: with an adjacency matrix, all the vertices are there, and one typically uses 0/1 flags to indicate whether there's an edge between the current and target vertices.
So, you should loop through all vertices for that row in the adjacency matrix, and only do something when the flag is set to 1.
That part of the pseudo-code will look something like:
for v = 1 to n do // assuming 1-indexed
    if color[v] = WHITE && AdjMatrix[u][v] == 1 then
        π[v] = u
        DFS-VISIT(v)
As far as I can tell, the rest of the pseudo-code should look identical.
DFS is generally preferred on an adjacency-list representation, because the resulting time complexity is O(|V| + |E|); with an adjacency matrix it becomes O(|V|^2), since every row has to be scanned in full. Below is an implementation of DFS assuming an adjacency-matrix representation:
#include <vector>
using namespace std;

#define WHITE 0
#define GRAY 1
#define BLACK 2

int n;      // number of vertices; set n and fill G before calling DFS()
int time_;  // global timestamp (named time_ to avoid clashing with ::time)
vector<int> color, par, strt, fin;
vector<vector<int> > G;  // adjacency matrix: G[u][v] != 0 means an edge u -> v

void dfs_visit(int);

void DFS(){
    color.assign(n, WHITE);
    par.assign(n, -1);
    strt.assign(n, 0);
    fin.assign(n, 0);
    time_ = 0;
    for(int j = 0; j < n; j++)
        if(color[j] == WHITE)
            dfs_visit(j);
}

void dfs_visit(int u){
    time_++;
    strt[u] = time_;
    color[u] = GRAY;
    // scan the whole row u of the matrix: O(|V|) work per vertex
    for(int v = 0; v < n; v++)
        if(G[u][v] && color[v] == WHITE){
            par[v] = u;
            dfs_visit(v);
        }
    color[u] = BLACK;
    time_++;
    fin[u] = time_;
}
The par[] vector stores the parent (π value) of each vertex, and the strt[] and fin[] vectors record the discovery and finishing timestamps. Vertices are numbered from 0.

C++ Trouble with Understanding BFS to find Shortest Path

I have a simple graph and my assignment is to find the shortest path between two nodes. I've done my best to read through the BFS pseudocode and examples but it's just not clicking.
My nodes are stored in an adjacency list (at the moment I'm not concerned with edge weights).
Here's a visual of the data: the first column is the vector element, and the row to its immediate right is another vector. The vector element number is the number of the corresponding node.
=====================================
0 | (1, 100), (3, 50), (7, 100)
1 | (0, 100), (4, 50), (5, 10)
2 | (3, 1000), (4, 200), (6, 1000)
3 | (0, 50), (2, 1000)
4 | (1, 50), (2, 200), (6, 100)
5 | (1, 10), (6, 50), (7, 2000)
6 | (2, 1000), (4, 100), (5, 50)
7 | (0, 100), (5, 2000)
I'm trying to implement a BFS via the pseudocode I found on Wikipedia, but I'm not getting it. My adjacency list is stored in a vector: vector<vector <vertex> > adjList; vertex is a struct of two ints: 1) node and 2) weight (again, I'm not really concerned with weights right now, but I'm leaving this struct set up this way to modify later).
My implementation is pretty basic:
vector<vector <vertex> > adjList; // the vector adjacency list
// the start and end vertices are passed into the function
int startV; //num of start node
int endV; //num of destination node
bool visited = false, done = false; //control
vector<int> marked, path; // holds visited and final path
Queue<int> Q; // Q for BFS
Q.push(startV); // enqueue BFS
while (!Q.empty() && !done) {
int t = Q.front(); Q.pop(); // mark and pop(back)
marked.push_back(t); // push back start node onto marked
if (t == endV) // if t is our destination
done = true; // done, get out of here
size_t adjE = adjList[t].size(); // degree of this node
for (int i = 0; i < adjE; i++) {
int u = adjList[t][i].node; // visit each adjacent node
for (int j = 0; j < marked.size(); j++) {
if (marked[j] == u) // test if node has been marked
visited = true;
}
// check if this node has
if (!visited) { // already been visited
marked.push_back(u); // if not enqueue
Q.push(u);
}
}
}
}
I know there has to be something wrong with my implementation. I'm just not seeing what is.
Update:
I solved this by using a multimap approach. A detailed explanation is here: Traverse MultiMap to Find Path from a Given Value to a Given Key
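I think your logic for finding visited nodes is incorrect: visited is never reset to false, so once any neighbor has been found in marked, every later neighbor is treated as visited too. Try: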
...
int u = adjList[t][i].node;
visited = false;
for (int j = 0; j < marked.size(); j++) {
// std::find() does essentially all this for you. Check it out.
if (marked[j] == u) {
visited = true;
}
}
if (!visited) {
marked.push_back(u);
Q.push(u);
}
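To go from "which nodes are reachable" to an actual shortest path, BFS also needs to remember how each node was reached. Here is a minimal C++ sketch that assumes the same vertex struct and ignores weights. marked is a vector<bool> indexed by node, so the visited test is O(1) instead of a scan of a list, and the parent entries are walked back from endV at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

struct vertex { int node; int weight; };  // same shape as the question's struct

// Unweighted BFS from startV; returns the nodes on a shortest path
// (startV first, endV last), or an empty vector if endV is unreachable.
std::vector<int> bfsPath(const std::vector<std::vector<vertex>>& adjList,
                         int startV, int endV) {
    const int n = static_cast<int>(adjList.size());
    assert(startV >= 0 && startV < n && endV >= 0 && endV < n);
    std::vector<bool> marked(n, false);  // O(1) visited test instead of scanning a list
    std::vector<int> parent(n, -1);      // parent[v] = node we reached v from
    std::queue<int> Q;
    marked[startV] = true;
    Q.push(startV);
    while (!Q.empty()) {
        int t = Q.front();
        Q.pop();
        if (t == endV) break;
        for (const vertex& e : adjList[t]) {
            if (!marked[e.node]) {  // mark when enqueued, so no node is queued twice
                marked[e.node] = true;
                parent[e.node] = t;
                Q.push(e.node);
            }
        }
    }
    if (!marked[endV]) return {};
    std::vector<int> path;
    for (int v = endV; v != -1; v = parent[v]) path.push_back(v);  // walk back to startV
    std::reverse(path.begin(), path.end());
    return path;
}
```

On the adjacency list in the question, bfsPath(adjList, 0, 6) returns 0, 1, 4, 6 (three edges; 0, 1, 5, 6 is equally short, and BFS reports whichever it discovers first).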

Generating a random DAG

I am solving a problem on directed acyclic graphs.
But I am having trouble testing my code on some directed acyclic graphs. The test graphs should be large, and (obviously) acyclic.
I tried a lot to write code for generating acyclic directed graphs. But I failed every time.
Is there some existing method to generate acyclic directed graphs I could use?
I cooked up a C program that does this. The key is to 'rank' the nodes, and only draw edges from lower ranked nodes to higher ranked ones.
The program I wrote prints in the DOT language.
Here is the code itself, with comments explaining what it means:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MIN_PER_RANK 1 /* Nodes/Rank: How 'fat' the DAG should be. */
#define MAX_PER_RANK 5
#define MIN_RANKS 3 /* Ranks: How 'tall' the DAG should be. */
#define MAX_RANKS 5
#define PERCENT 30 /* Chance of having an Edge. */
int main (void)
{
int i, j, k,nodes = 0;
srand (time (NULL));
int ranks = MIN_RANKS
+ (rand () % (MAX_RANKS - MIN_RANKS + 1));
printf ("digraph {\n");
for (i = 0; i < ranks; i++)
{
/* New nodes of 'higher' rank than all nodes generated till now. */
int new_nodes = MIN_PER_RANK
+ (rand () % (MAX_PER_RANK - MIN_PER_RANK + 1));
/* Edges from old nodes ('nodes') to new ones ('new_nodes'). */
for (j = 0; j < nodes; j++)
for (k = 0; k < new_nodes; k++)
if ( (rand () % 100) < PERCENT)
printf (" %d -> %d;\n", j, k + nodes); /* An Edge. */
nodes += new_nodes; /* Accumulate into old node set. */
}
printf ("}\n");
return 0;
}
And here is the graph generated from a test run:
The answer to https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs applies: if you have an adjacency matrix representation of the edges of your graph, and the matrix is strictly lower triangular (all entries on and above the diagonal are zero), then it's necessarily a DAG.
A similar approach would be to take an arbitrary ordering of your nodes, and then consider edges from node x to y only when x < y. That constraint should also get your DAGness by construction. Memory comparison would be one arbitrary way to order your nodes if you're using structs to represent nodes.
Basically, the pseudocode would be something like:
for(i = 0; i < N; i++) {
for (j = i+1; j < N; j++) {
maybePutAnEdgeBetween(i, j);
}
}
where N is the number of nodes in your graph.
The pseudocode suggests that the number of potential DAGs on N nodes that respect a fixed node ordering is
2^(N*(N-1)/2),
since there are
N*(N-1)/2
pairs (i, j) with i < j ("N choose 2"), and for each pair we can choose either to have the edge between them or not.
So, to try to put all these reasonable answers together:
(In the following, I used V for the number of vertices in the generated graph, and E for the number of edges, and we assume that E ≤ V(V-1)/2.)
Personally, I think the most useful answer is in a comment, by Flavius, who points at the code at http://condor.depaul.edu/rjohnson/source/graph_ge.c. That code is really simple, and it's conveniently described by a comment, which I reproduce:
To generate a directed acyclic graph, we first
generate a random permutation dag[0],...,dag[v-1].
(v = number of vertices.)
This random permutation serves as a topological
sort of the graph. We then generate random edges of the
form (dag[i],dag[j]) with i < j.
In fact, what the code does is generate the requested number of edges by repeatedly doing the following:
generate two numbers in the range [0, V);
reject them if they're equal;
swap them if the first is larger;
reject them if it has generated them before.
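A C++ sketch of that procedure follows. It is not the original graph_ge.c code: the function name randomDagRejection is hypothetical, std::mt19937 stands in for whatever generator you use, and a std::set records the position pairs already generated. The random permutation dag[] is the topological order, so every accepted pair (i, j) with i < j becomes the edge dag[i] -> dag[j].

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Rejection-based random DAG with exactly E edges (sketch, hypothetical name).
// A random permutation dag[0..V-1] serves as the topological order; every
// accepted position pair (i, j) with i < j becomes the edge dag[i] -> dag[j].
std::vector<std::pair<int, int>> randomDagRejection(int V, int E, std::mt19937& rng) {
    assert(V >= 2 && E >= 0 && E <= V * (V - 1) / 2);
    std::vector<int> dag(V);
    std::iota(dag.begin(), dag.end(), 0);
    std::shuffle(dag.begin(), dag.end(), rng);

    std::uniform_int_distribution<int> pick(0, V - 1);
    std::set<std::pair<int, int>> seen;  // position pairs generated so far
    std::vector<std::pair<int, int>> edges;
    while (static_cast<int>(edges.size()) < E) {
        int i = pick(rng), j = pick(rng);
        if (i == j) continue;                       // reject equal numbers
        if (i > j) std::swap(i, j);                 // swap so that i < j
        if (!seen.insert({i, j}).second) continue;  // reject pairs seen before
        edges.emplace_back(dag[i], dag[j]);
    }
    return edges;
}
```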
The problem with this solution is that as E gets closer to the maximum number of edges V(V-1)/2, the algorithm becomes slower and slower, because it has to reject more and more edges. A better solution would be to make a vector of all V(V-1)/2 possible edges, randomly shuffle it, and select the first E edges in the shuffled list.
The reservoir sampling algorithm lets us do this in O(E) space, since we can deduce the endpoints of the kth edge from the value of k; consequently, we don't actually have to create the source vector. However, it still requires O(V^2) time.
Alternatively, one can do a Fisher-Yates shuffle (or Knuth shuffle, if you prefer), stopping after E iterations. In the version of the FY shuffle presented in Wikipedia, this will produce the trailing entries, but the algorithm works just as well backwards:
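#include <random>
#include <utility>
#include <vector>

// At the end of this snippet, a consists of a random sample of the
// integers in the half-open range [0, V(V-1)/2). (They still need to be
// converted to pairs of endpoints.)
std::mt19937 rng(std::random_device{}());
std::vector<int> a;
int N = V * (V - 1) / 2;
for (int i = 0; i < N; ++i) a.push_back(i);
for (int i = 0; i < E; ++i) {
    // pick j uniformly from [i, N) and move a[j] into slot i
    int j = std::uniform_int_distribution<int>(i, N - 1)(rng);
    std::swap(a[i], a[j]);
}
a.resize(E);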
The shuffle loop itself takes only O(E) time, but the vector a requires O(N) = O(V^2) space (and O(N) time to fill). In fact, this can be improved to O(E) space with some trickery, but an SO code snippet is too small to contain the result, so I'll provide a simpler one in O(E) space and O(E log E) time. I assume that there is a class DAG with at least:
class DAG {
public:
    // Construct an empty DAG with v vertices
    explicit DAG(int v);
    // Add the directed edge i->j, where 0 <= i, j < v
    void add(int i, int j);
};
Now here goes:
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Return a randomly-constructed DAG with V vertices and E edges.
// It's required that 0 < E < V(V-1)/2.
template<typename PRNG>
DAG RandomDAG(int V, int E, PRNG& prng) {
    using dist = std::uniform_int_distribution<int>;
    // Make a random sample of size E
    std::vector<int> sample;
    sample.reserve(E);
    int N = V * (V - 1) / 2;
    dist d(0, N - E);  // uniform_int_distribution is closed range
    // Random vector of integers in [0, N-E]
    for (int i = 0; i < E; ++i) sample.push_back(d(prng));
    // Sort them, and make them unique
    std::sort(sample.begin(), sample.end());
    for (int i = 1; i < E; ++i) sample[i] += i;
    // Now it's a unique sorted list of integers in [0, N-E+E-1]
    // Randomly shuffle the endpoints, so the topological sort
    // is different, too.
    std::vector<int> endpoints;
    endpoints.reserve(V);
    for (int i = 0; i < V; ++i) endpoints.push_back(i);
    std::shuffle(endpoints.begin(), endpoints.end(), prng);
    // Finally, create the dag
    DAG rv(V);
    for (auto& v : sample) {
        // Decode v = tail*(tail-1)/2 + head into the pair head < tail
        int tail = int(0.5 + std::sqrt((v + 1) * 2));
        int head = v - tail * (tail - 1) / 2;
        rv.add(endpoints[head], endpoints[tail]);
    }
    return rv;
}
You could generate a random directed graph, and then do a depth-first search for cycles. When you find a cycle, break it by deleting an edge.
I think this is O(E(V + E)) in the worst case: each DFS takes O(V + E), and each one removes at least one edge (so there are at most E of them).
If you generate the directed graph by uniformly random selecting all V^2 possible edges, and you DFS in random order and delete a random edge - this would give you a uniform distribution (or at least close to it) over all possible dags.
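Here is a sketch of this idea in C++, with one deviation from the description above: instead of repeating the DFS once per cycle, it runs a single DFS and drops every back edge it meets. Tree, forward and cross edges all point from a vertex that finishes later to one that finishes earlier, so the kept edges cannot form a cycle. The function name is hypothetical, the DFS visits vertices in index order rather than in random order, and no claim is made about how uniform the resulting distribution is.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Sketch: build a random directed graph (each ordered pair u != v gets an edge
// with probability p), then run one DFS and drop every back edge it finds.
// Returns the kept edges as (from, to) pairs.
std::vector<std::pair<int, int>> randomDagByDroppingBackEdges(int V, double p, unsigned seed) {
    assert(p >= 0.0 && p <= 1.0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(p);
    std::vector<std::vector<int>> adj(V);
    for (int u = 0; u < V; ++u)
        for (int v = 0; v < V; ++v)
            if (u != v && coin(rng)) adj[u].push_back(v);

    enum Color { WHITE, GRAY, BLACK };  // unvisited / on the DFS stack / finished
    std::vector<Color> color(V, WHITE);
    std::vector<std::pair<int, int>> kept;
    std::function<void(int)> dfs = [&](int u) {
        color[u] = GRAY;
        for (int v : adj[u]) {
            if (color[v] == GRAY) continue;  // back edge: it would close a cycle, drop it
            kept.emplace_back(u, v);         // tree, forward or cross edge: keep it
            if (color[v] == WHITE) dfs(v);
        }
        color[u] = BLACK;
    };
    for (int u = 0; u < V; ++u)
        if (color[u] == WHITE) dfs(u);
    return kept;
}
```

With p = 1 (a complete directed graph), exactly one direction of every pair survives, giving V(V-1)/2 edges.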
A very simple approach is:
Randomly assign edges by iterating over the indices of a lower diagonal matrix (as suggested by a link above: https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs)
This will give you a DAG with possibly more than one component. You can use a Disjoint-set data structure to give you the components that can then be merged by creating edges between the components.
Disjoint-sets are described here: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
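A minimal C++ sketch of that approach, assuming edges are only drawn from a lower to a higher index (equivalent to filling one triangle of the matrix) and using a hypothetical DisjointSet helper. After the random edges are placed, consecutive vertices that are still in different components are joined by an edge from the lower to the higher index, so the graph stays acyclic and becomes connected (ignoring edge directions).

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Minimal union-find (disjoint-set) with path halving; hypothetical helper.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    // Merges the sets of a and b; returns false if they were already together.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Random edges i -> j (i < j) with probability p, then join consecutive
// vertices that are still in different components. Every edge goes from a
// lower to a higher index, so the result is a DAG that is connected when
// edge directions are ignored.
std::vector<std::pair<int, int>> randomConnectedDag(int n, double p, std::mt19937& rng) {
    assert(p >= 0.0 && p <= 1.0);
    std::bernoulli_distribution coin(p);
    std::vector<std::pair<int, int>> edges;
    DisjointSet components(n);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (coin(rng)) {
                edges.emplace_back(i, j);
                components.unite(i, j);
            }
    for (int i = 1; i < n; ++i)
        if (components.unite(i - 1, i)) edges.emplace_back(i - 1, i);  // bridge two components
    return edges;
}
```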
Edit: I initially found this post while I was working with a scheduling problem named flexible job shop scheduling problem with sequencing flexibility where jobs (the order in which operations are processed) are defined by directed acyclic graphs. The idea was to use an algorithm to generate multiple random directed graphs (jobs) and create instances of the scheduling problem to test my algorithms. The code at the end of this post is a basic version of the one I used to generate the instances. The instance generator can be found here.
I translated the code to Python and added a step that removes transitive (shortcut) edges from the random DAG: an edge x -> z is dropped whenever a two-step path x -> y -> z exists. This way, the generated graph has fewer edges but the same reachability.
The transitive graph can be visualized at http://dagitty.net/dags.html by pasting the output into Model code (on the right).
Python version of the algorithm
import random

class Graph:
    nodes = []
    edges = []
    removed_edges = []

    def remove_edge(self, x, y):
        e = (x, y)
        try:
            self.edges.remove(e)
            # print("Removed edge %s" % str(e))
            self.removed_edges.append(e)
        except ValueError:  # edge was not present
            return

    def Nodes(self):
        return self.nodes

    # Sample data
    def __init__(self):
        self.nodes = []
        self.edges = []

def get_random_dag():
    MIN_PER_RANK = 1  # Nodes/Rank: How 'fat' the DAG should be
    MAX_PER_RANK = 2
    MIN_RANKS = 6  # Ranks: How 'tall' the DAG should be
    MAX_RANKS = 10
    PERCENT = 0.3  # Chance of having an Edge
    nodes = 0
    ranks = random.randint(MIN_RANKS, MAX_RANKS)
    adjacency = []
    for i in range(ranks):
        # New nodes of 'higher' rank than all nodes generated till now
        new_nodes = random.randint(MIN_PER_RANK, MAX_PER_RANK)
        # Edges from old nodes ('nodes') to new ones ('new_nodes')
        for j in range(nodes):
            for k in range(new_nodes):
                if random.random() < PERCENT:
                    adjacency.append((j, k + nodes))
        nodes += new_nodes

    # Remove transitive (shortcut) edges
    G = Graph()
    # Append nodes
    for i in range(nodes):
        G.nodes.append(i)
    # Append adjacencies
    for i in range(len(adjacency)):
        G.edges.append(adjacency[i])
    N = G.Nodes()
    for x in N:
        for y in N:
            for z in N:
                if (x, y) != (y, z) and (x, y) != (x, z):
                    if (x, y) in G.edges and (y, z) in G.edges:
                        G.remove_edge(x, z)

    # Print graph
    for i in range(nodes):
        print(i)
    print()
    for value in G.edges:
        print(str(value[0]) + ' ' + str(value[1]))

get_random_dag()
Below, you can see in the figure the random DAG with many redundant edges generated by the Python code above.
I adapted the code to generate the same graph (same reachability) but with the least possible number of edges. This is also called transitive reduction.
def get_random_dag():
    MIN_PER_RANK = 1  # Nodes/Rank: How 'fat' the DAG should be
    MAX_PER_RANK = 3
    MIN_RANKS = 15  # Ranks: How 'tall' the DAG should be
    MAX_RANKS = 20
    PERCENT = 0.3  # Chance of having an Edge
    nodes = 0
    node_counter = 0
    ranks = random.randint(MIN_RANKS, MAX_RANKS)
    adjacency = []
    rank_list = []
    for i in range(ranks):
        # New nodes of 'higher' rank than all nodes generated till now
        new_nodes = random.randint(MIN_PER_RANK, MAX_PER_RANK)
        rank_nodes = []  # node ids of this rank (avoids shadowing built-in list)
        for j in range(new_nodes):
            rank_nodes.append(node_counter)
            node_counter += 1
        rank_list.append(rank_nodes)
        print(rank_list)
        # Edges from the previous rank only to the new nodes ('new_nodes')
        if i > 0:
            for j in rank_list[i - 1]:
                for k in range(new_nodes):
                    if random.random() < PERCENT:
                        adjacency.append((j, k + nodes))
        nodes += new_nodes
    for i in range(nodes):
        print(i)
    print()
    for edge in adjacency:
        print(str(edge[0]) + ' ' + str(edge[1]))
    print()
    print()
Result:
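Create a graph with nodes 1..n and an edge n1 -> n2 for every pair of nodes with n1 != n2 and n2 % n1 == 0. Since n1 divides n2 and n1 != n2 implies n1 < n2, every edge points from a smaller to a larger number, so the graph is acyclic.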
I recently tried re-implementing the accepted answer and found that it is nondeterministic. If you don't enforce the min_per_rank parameter, you could end up with a graph with 0 nodes.
To prevent this, I wrapped the for loops in a function and then checked, after each rank, that min_per_rank was satisfied. Here's the JavaScript implementation:
https://github.com/karissa/random-dag
And some pseudo-C code that would replace the accepted answer's main loop.
int pushed = 0
int addRank (void)
{
for (j = 0; j < nodes; j++)
for (k = 0; k < new_nodes; k++)
if ( (rand () % 100) < PERCENT)
printf (" %d -> %d;\n", j, k + nodes); /* An Edge. */
if (pushed < min_per_rank) return addRank()
else pushed = 0
return 0
}
Generating a random DAG which might not be connected
Here's a simple algorithm for generating a random DAG that might not be connected.
const randomDAG = (x, n) => {
const length = n * (n - 1) / 2;
const dag = new Array(length);
for (let i = 0; i < length; i++) {
dag[i] = Math.random() < x ? 1 : 0;
}
return dag;
};
const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;
const dagToDot = (n, dag) => {
let dot = "digraph {\n";
for (let i = 0; i < n; i++) {
dot += ` ${i};\n`;
for (let j = i + 1; j < n; j++) {
const k = dagIndex(n, i, j);
if (dag[k]) dot += ` ${i} -> ${j};\n`;
}
}
return dot + "}";
};
const randomDot = (x, n) => dagToDot(n, randomDAG(x, n));
new Viz().renderSVGElement(randomDot(0.3, 10)).then(svg => {
document.body.appendChild(svg);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
If you run this code snippet a couple of times, you might see a DAG which is not connected.
So, how does this code work?
A directed acyclic graph (DAG) always has a topological order. If we number its vertices in that order, every edge goes from a lower-numbered vertex to a higher-numbered one. An undirected graph of n vertices can have a maximum of n * (n - 1) / 2 edges, not counting repeated edges or edges from a vertex to itself. If we only allow edges from a lower vertex to a higher vertex, the direction of every edge is predetermined. As a result, any undirected graph on the numbered vertices becomes a DAG.
This means that you can represent the entire DAG using a one-dimensional array of n * (n - 1) / 2 edge weights. An edge weight of 0 means that the edge is absent. Hence, we just create a random array of zeros and ones, and that's our random DAG.
An edge from vertex i to vertex j in a DAG of n vertices, where i < j, has an edge weight at index k where k = n * i + j - (i + 1) * (i + 2) / 2.
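To double-check the index formula, here is a small Python sketch. The functions `dag_index`, `random_dag` and `dag_edges` are hypothetical ports of the JavaScript `dagIndex` and `randomDAG`; they are not part of the original answer. Rows 0 to i - 1 of the upper triangle hold n * i - i * (i + 1) / 2 slots, and the edge i -> j is entry j - i - 1 of row i. Adding the two gives the formula above. The sketch confirms this by checking that a row-by-row scan of all pairs i < j visits slots 0, 1, 2, ... in order.

```python
import random


def dag_index(n, i, j):
    """Slot of edge i -> j (i < j) in the packed upper triangle."""
    # (i + 1) * (i + 2) is always even, so integer division is exact.
    return n * i + j - (i + 1) * (i + 2) // 2


def random_dag(x, n, rng=random):
    """One 0/1 flag per possible edge i -> j with i < j; 1 with probability x."""
    return [1 if rng.random() < x else 0 for _ in range(n * (n - 1) // 2)]


def dag_edges(n, dag):
    """Unpack the flat list back into (i, j) edges, always with i < j."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dag[dag_index(n, i, j)]]


if __name__ == "__main__":
    n = 5
    # Row-by-row scan of all pairs i < j: should print 0, 1, ..., 9.
    print([dag_index(n, i, j) for i in range(n) for j in range(i + 1, n)])
```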
Generating a connected DAG
Once you generate a random DAG, you can check whether it's connected (ignoring edge directions, i.e. weakly connected) using the following function.
const isConnected = (n, dag) => {
const reached = new Array(n).fill(false);
reached[0] = true;
const queue = [0];
while (queue.length > 0) {
const x = queue.shift();
for (let i = 0; i < n; i++) {
if (i === x || reached[i]) continue; // skip the current vertex and vertices already reached
const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
if (dag[j] === 0) continue;
reached[i] = true;
queue.push(i);
}
}
return reached.every(x => x); // return true if every vertex was reached
};
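If it's not connected, then its complement is always connected. Two vertices in different components are adjacent in the complement. Two vertices in the same component are both adjacent in the complement to any vertex of another component. The complement is still a DAG, because its edges also go from lower to higher vertices.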
const complement = dag => dag.map(x => x ? 0 : 1);
const randomConnectedDAG = (x, n) => {
const dag = randomDAG(x, n);
return isConnected(n, dag) ? dag : complement(dag);
};
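Here is a Python sketch of the same check-then-complement idea, assuming the packed upper-triangle layout described above. The functions `is_connected`, `complement` and `random_connected_dag` are hypothetical ports of the JavaScript functions. Like `isConnected`, the search ignores edge directions, so it tests weak connectivity.

```python
import random
from collections import deque


def dag_index(n, i, j):
    # Same packed upper-triangle layout as the JavaScript dagIndex.
    return n * i + j - (i + 1) * (i + 2) // 2


def is_connected(n, dag):
    """Breadth-first search from vertex 0, ignoring edge directions."""
    if n == 0:
        return True
    reached = [False] * n
    reached[0] = True
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for i in range(n):
            if reached[i]:  # also skips i == x
                continue
            k = dag_index(n, i, x) if i < x else dag_index(n, x, i)
            if dag[k]:
                reached[i] = True
                queue.append(i)
    return all(reached)


def complement(dag):
    """Flip every flag; edges still point from lower to higher index."""
    return [0 if flag else 1 for flag in dag]


def random_connected_dag(x, n, rng=random):
    dag = [1 if rng.random() < x else 0 for _ in range(n * (n - 1) // 2)]
    return dag if is_connected(n, dag) else complement(dag)
```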
Note that if we create a random DAG with 30% of the possible edges, then its complement will have 70% of them. Hence, the only value of x that gives the same expected edge density whether or not the complement is taken is 50%. However, if you care more about connectivity than about the percentage of edges, this shouldn't be a deal breaker.
Finally, putting it all together.
const randomDAG = (x, n) => {
const length = n * (n - 1) / 2;
const dag = new Array(length);
for (let i = 0; i < length; i++) {
dag[i] = Math.random() < x ? 1 : 0;
}
return dag;
};
const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;
const isConnected = (n, dag) => {
const reached = new Array(n).fill(false);
reached[0] = true;
const queue = [0];
while (queue.length > 0) {
const x = queue.shift();
for (let i = 0; i < n; i++) {
if (i === x || reached[i]) continue; // skip the current vertex and vertices already reached
const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
if (dag[j] === 0) continue;
reached[i] = true;
queue.push(i);
}
}
return reached.every(x => x); // return true if every vertex was reached
};
const complement = dag => dag.map(x => x ? 0 : 1);
const randomConnectedDAG = (x, n) => {
const dag = randomDAG(x, n);
return isConnected(n, dag) ? dag : complement(dag);
};
const dagToDot = (n, dag) => {
let dot = "digraph {\n";
for (let i = 0; i < n; i++) {
dot += ` ${i};\n`;
for (let j = i + 1; j < n; j++) {
const k = dagIndex(n, i, j);
if (dag[k]) dot += ` ${i} -> ${j};\n`;
}
}
return dot + "}";
};
const randomConnectedDot = (x, n) => dagToDot(n, randomConnectedDAG(x, n));
new Viz().renderSVGElement(randomConnectedDot(0.3, 10)).then(svg => {
document.body.appendChild(svg);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
If you run this code snippet a couple of times, you may see a DAG with a lot more edges than others.
Generating a connected DAG with a certain percentage of edges
If you care about both connectivity and having a certain percentage of edges, then you can use the following algorithm.
1. Start with a fully connected graph.
2. Randomly remove edges.
3. After removing an edge, check if the graph is still connected.
4. If it's no longer connected, then add that edge back.
It should be noted that this algorithm is not as efficient as the previous method, because it runs a full connectivity check after every edge removal.
const randomDAG = (x, n) => {
const length = n * (n - 1) / 2;
const dag = new Array(length).fill(1);
for (let i = 0; i < length; i++) {
if (Math.random() < x) continue;
dag[i] = 0;
if (!isConnected(n, dag)) dag[i] = 1;
}
return dag;
};
const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;
const isConnected = (n, dag) => {
const reached = new Array(n).fill(false);
reached[0] = true;
const queue = [0];
while (queue.length > 0) {
const x = queue.shift();
for (let i = 0; i < n; i++) {
if (i === x || reached[i]) continue; // skip the current vertex and vertices already reached
const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
if (dag[j] === 0) continue;
reached[i] = true;
queue.push(i);
}
}
return reached.every(x => x); // return true if every vertex was reached
};
const dagToDot = (n, dag) => {
let dot = "digraph {\n";
for (let i = 0; i < n; i++) {
dot += ` ${i};\n`;
for (let j = i + 1; j < n; j++) {
const k = dagIndex(n, i, j);
if (dag[k]) dot += ` ${i} -> ${j};\n`;
}
}
return dot + "}";
};
const randomDot = (x, n) => dagToDot(n, randomDAG(x, n));
new Viz().renderSVGElement(randomDot(0.3, 10)).then(svg => {
document.body.appendChild(svg);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
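The JavaScript above tries the edges in index order. Below is a Python sketch of the same remove-and-restore loop that tries them in a shuffled order instead, which is closer to the "randomly remove edges" step. The function `random_connected_dag_keep` and the other helpers are hypothetical ports, and `x` is still the probability that an edge is kept outright. Edges whose removal would disconnect the graph are put back, so the final edge count can exceed x times the maximum.

```python
import random
from collections import deque


def dag_index(n, i, j):
    # Packed upper-triangle layout, as in the JavaScript dagIndex.
    return n * i + j - (i + 1) * (i + 2) // 2


def is_connected(n, dag):
    """BFS from vertex 0 that ignores edge directions (weak connectivity)."""
    reached = [False] * n
    if n:
        reached[0] = True
    queue = deque([0] if n else [])
    while queue:
        x = queue.popleft()
        for i in range(n):
            if not reached[i] and dag[dag_index(n, min(i, x), max(i, x))]:
                reached[i] = True
                queue.append(i)
    return all(reached)


def random_connected_dag_keep(x, n, rng=random):
    """Start complete, then try to drop each edge (in random order) with prob. 1 - x."""
    dag = [1] * (n * (n - 1) // 2)
    order = list(range(len(dag)))
    rng.shuffle(order)
    for k in order:
        if rng.random() < x:
            continue  # keep this edge
        dag[k] = 0
        if not is_connected(n, dag):
            dag[k] = 1  # removing it disconnected the graph: put it back
    return dag
```

With x = 0, every edge is tried. An edge that had to be put back stays necessary, because the graph only loses edges afterwards. The result is therefore a spanning tree with exactly n - 1 edges.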
Hope that helps.
To test algorithms, I generated random graphs based on node layers. This is the Python script (it also prints the adjacency list). You can change the node connection probability percentages or add layers to get slightly different or "taller" graphs:
# Weighted DAG generator by forward layers
import argparse
import random
parser = argparse.ArgumentParser("dag_gen2")
parser.add_argument(
"--layers",
help="DAG forward layers. Default=5",
type=int,
default=5,
)
args = parser.parse_args()
layers = [[] for _ in range(args.layers)]
edges = {}
node_index = -1
print(f"Creating {len(layers)} layers graph")
# Random horizontal connections -low probability-
def random_horizontal(layer):
    for node1 in layer:
        # Avoid cycles: only link to later nodes of the same layer, so every
        # edge goes from a lower to a higher node index
        for node2 in filter(lambda n2: n2 > node1, layer):
            if random.randint(0, 100) < 10:
                w = random.randint(1, 10)
                edges[node1].append((node2, w))
# Connect two layers
def connect(layer1, layer2):
    random_horizontal(layer1)
    for node1 in layer1:
        for node2 in layer2:
            if random.randint(0, 100) < 30:
                w = random.randint(1, 10)
                edges[node1].append((node2, w))
# Start nodes 1 to 3
start_nodes = random.randint(1, 3)
start_layer = []
for sn in range(start_nodes):
    node_index += 1
    start_layer.append(node_index)
# Gen nodes
for layer in layers:
    nodes = random.randint(2, 5)
    for n in range(nodes):
        node_index += 1
        layer.append(node_index)
# Connect all
layers.insert(0, start_layer)
for layer in layers:
    for node in layer:
        edges[node] = []
for i, layer in enumerate(layers[:-1]):
    connect(layer, layers[i + 1])
# Print in DOT language
print("digraph {")
for node_key in [node_key for node_key in edges.keys() if len(edges[node_key]) > 0]:
    for node_dst, weight in edges[node_key]:
        print(f" {node_key} -> {node_dst} [label={weight}];")
print("}")
print("---- Adjacency list ----")
print(edges)
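Since `random_horizontal` adds edges between nodes of the same layer, it is worth confirming that a generated graph really is acyclic. Below is a minimal Python sketch that assumes the adjacency format the script prints (`{node: [(dst, weight), ...]}`). `is_acyclic` is a hypothetical helper that runs Kahn's topological sort. It reports a cycle if some node never reaches in-degree 0.

```python
from collections import deque


def is_acyclic(edges):
    """Kahn's algorithm on an adjacency dict {node: [(dst, weight), ...]}."""
    # Collect every node, including ones that only appear as destinations.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(dst for dst, _ in targets)
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for dst, _ in targets:
            indegree[dst] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for dst, _ in edges.get(node, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    # Nodes on a cycle never drop to in-degree 0, so they are never visited.
    return visited == len(nodes)
```

For example, `assert is_acyclic(edges)` could be added at the end of the script.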