Hi, I am trying to parallelize this code with OpenMP. It computes hydrodynamic vorticity with an implicit finite-difference scheme, using the alternating direction implicit (ADI) method.
I would like to speed up its execution (here Nx = Ny = 100).
The problem is that using OpenMP this way slows the code down instead of speeding it up. I have tried specifying the shared variables, but that does not help much.
Any ideas?
All the best
void ADI(double vort[][Ny], double psi[][Ny], double n[][Ny],
double cls[][Ny],double AAx[], double BBx[], double CCx[], double DDx[],
double AAy[], double BBy[], double CCy[], double DDy[],
double cx[][Ny], double cy[][Ny], double epsx[][Ny], double epsy[][Ny],
double vortx[], double vorty[Ny-2], double dx, double Dxs, double coefMass,
double coefMasCls)
{
////////////sweep along y////////////
//compute the ADI coefficients
int i=0, j=0;
#pragma omp parallel for private(Dxs,i) shared(psi,vort)
for (i=0; i<Nx; i++) //Boundary condition along x
{
vort[i][0]=(psi[i][0]-psi[i][1])*2/Dxs;
vort[i][Ny-1] = (psi[i][Ny-1]-psi[i][Ny-2])*2/Dxs;
}
#pragma omp parallel for private(Dxs,j) shared(psi,vort)
for (j=0; j<Ny; j++) //Boundary condition
{
vort[0][j] = (psi[0][j]-psi[1][j])*2/Dxs;
vort[Nx-1][j] = (psi[Nx-1][j]-psi[Nx-2][j])*2/Dxs;
}
for (j=1; j<Ny-1; j++) //interior points
{
#pragma omp parallel for private(coefMasCls,coefMasCls,i) shared(psi,vort,n,cls)
for (i=1; i<Nx-1; i++) //interior points
{
vort[i][j] = vort[i][j] - coefMass * (n[i+1][j]-n[i-1][j])- coefMasCls * (cls[i+1][j]-cls[i-1][j]);;
}
//i=0;
//vort[i][j] = vort[i][j] + coefMass*(n[1][j]-n[1][j]);
//i=Nx-1;
//vort[i][j] = vort[i][j] + coefMass*(n[Nx-2][j]-n[Nx-2][j]);
}
for (i=1; i<Nx-1; i++) //interior points
{
for (j=1; j<Ny-1; j++) //interior points
{
AAy[j] = -.5 * ( .5 * (1 + epsy[i][j]) * cy[i][j-1] + dx);
BBy[j] = 1 + dx + .5 * epsy[i][j] * cy[i][j];
CCy[j] = .5 * ( .5 * ( 1 - epsy[i][j] ) * cy[i][j+1] - dx);
DDy[j] = .5 * (.5 * ( 1 + epsx[i][j] ) * cx[i-1][j] + dx ) * vort[i-1][j]
+ ( 1 - dx - .5 * epsx[i][j] * cx[i][j] ) * vort[i][j]
+ .5 * (- .5 * ( 1 - epsx[i][j] ) * cx[i+1][j] + dx ) * vort[i+1][j];
vorty[j] = vort[i][j];
}
DDy[1]=DDy[1] - AAy[1] * vort[i][0]; //AA[0] is not taken into account by the tridiag method, so include it in the right-hand side
DDy[Ny-2]=DDy[Ny-2] - CCy[Ny-2]* vort[i][Ny-1]; //moving boundary condition
//DDy[Ny-3]= DDy[Ny-3]; //zero vorticity on the free-slip boundary
tridiag(AAy, BBy, CCy, DDy, vorty, Ny-1); //does not compute the points at 0 and Ny-1
for (j=1; j<Ny-1; j++)
{
vort[i][j]=vorty[j];
}
}
////////////sweep along x//////////
//compute the ADI coefficients
for (j=1; j<Ny-1; j++)
{
for (i=1; i<Nx-1; i++)
{
AAx[i] = -.5* ( .5 * ( 1 + epsx[i][j] ) * cx[i-1][j] + dx );
BBx[i] = 1 + dx + .5 * epsx[i][j] * cx[i][j];
CCx[i] = .5 * ( .5 * ( 1 - epsx[i][j] ) * cx[i+1][j] - dx) ;
DDx[i]= .5 * ( .5 * ( 1 + epsy[i][j] ) * cy[i][j-1] + dx ) * vort[i][j-1]
+ ( 1 - dx - .5 * epsy[i][j] * cy[i][j] ) * vort[i][j]
+ .5 * (-.5 * ( 1 - epsy[i][j] ) * cy[i][j+1] + dx ) * vort[i][j+1];
vortx[i]=vort[i][j];
}
DDx[1] = DDx[1] - AAx[1]* vort[0][j];
DDx[Nx-2] = DDx[Nx-2] - CCx[Nx-2] * vort[Nx-1][j];
tridiag(AAx, BBx, CCx, DDx, vortx, Nx-1); //does not compute the points at 0 and Nx-1
for (i=1; i<Nx-1; i++)
{
vort[i][j]=vortx[i];
}
}
}
The first thing to do is indeed to isolate which loop parallelizations hurt performance the most, but the last parallelized loop there looks very much like it would cause cache thrashing. Simplifying the structure a bit:
double vort[Nx][Ny];
// ...
for (int j=1; j<Ny-1; ++j) {
#pragma omp parallel for
for (int i=1; i<Nx-1; ++i) {
vort[i][j] -= f(i, j);
}
}
Any given thread is going to read and update in turn the values in vort at offsets j+k*Ny, j+(k+1)*Ny, j+(k+2)*Ny etc. depending on how the for loop is chunked across the threads. Each of these accesses is going to pull in a cache-line's worth of data to update 8 bytes. And when the outer loop starts again, chances are none of the data you just accessed is still going to be in cache.
All things being equal, if you can arrange your array accesses so that you're moving in the direction of the smallest stride (for C arrays, that's the last index), your cache behaviour will be much better. For dimension size 100, the arrays are likely not so big that this makes a huge difference. For e.g. Nx, Ny = 1000, it will likely be devastating to access the array the 'wrong way'.
This would give poorer performance in serial code, but I think adding threads to it just makes it that much worse.
That all said, the amount of computation done in each of these inner loops is quite small; there's a good chance you're going to be constrained by memory bandwidth regardless.
Addendum
Just to be explicit, the 'right' loop access would look like:
for (int i=1; i<Nx-1; ++i) {
for (int j=1; j<Ny-1; ++j) {
vort[i][j] -= f(i, j);
}
}
And to parallelize it, you can let the compiler chunk the data across threads more effectively by using the collapse clause:
#pragma omp parallel for collapse(2)
for (int i=1; i<Nx-1; ++i) {
for (int j=1; j<Ny-1; ++j) {
vort[i][j] -= f(i, j);
}
}
Lastly, in order to avoid false sharing (threads treading on each other's cache lines), it's good to make sure that two adjacent rows of the array don't share data in the same cache line. One could make sure that each row is aligned on to a multiple of the cache-line size in memory, or more simply just add padding to the end of each row.
double vort[Nx][Ny+8]; // 8 doubles ~ 64 bytes
(Assuming a cache-line of 64 bytes, this should suffice.)
When I try to convert a MATLAB program consisting of several functions to C++ with the MATLAB Coder app, I get an error (originally shown as a screenshot).
The variable calc is a struct in MATLAB. However, if I run the conversion on the function conscalc by itself, there is no problem at all.
What is the problem?
Here is the code of function conscalc:
function [calc] = conscalc(rho, gv, calc)
tp = 1 + rho;
sqrt2 = sqrt(2);
consadj = 2 ^ (0.5 * (1 - calc.alpha));
% initialization;
cv = zeros(gv.nacatlim, 1); % consumption equivalents of future expected marginal utility;
yac = zeros(gv.nacatlim, 1); % income (y) - end of period asset (a) - optimal consumption given end of period asse (c);
margu = zeros(gv.nvcatlim, gv.nacatlim); % marginal utility next period by next period survival status (1 - 3) and initial asset (1 - gv.nacatlim);
cons = zeros(gv.T - gv.beginage + 1, gv.nvcatlim, gv.nacatlim); % optimal consumption at the current period by age (25 - 99), survival status (1 - 3) and initial asset (1 - gv.nacatlim);
% simplified backward induction;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculate optimal consumption and marginal utility for the terminal age %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% the optimal decision is to consume everything at the terminal age;
for vcat = 1: 3; % survival status;
for acat = 1 : gv.nacatlim; % asset;
y = calc.acats(acat, 1) + calc.income(vcat, gv.T - gv.beginage + 1); % total resources in the last period given initial asset and income;
mu = y ^ (calc.alpha - 1); % marginal utility given next period survival status and initial asset;
if vcat == 1; % married couple adjustment;
mu = consadj * mu;
end;
% save to marginal utility next period (when calculating backward to age - 1) for later calculations;
margu(vcat, acat) = mu;
end;
end;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculate optimal consumption and marginal utility for ages gv.t to gv.T - 1 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for age = gv.T - 1 : -1 : gv.beginage; % age;
for vcat = 1 : 3; % survival status;
y = calc.income(vcat, age - gv.beginage + 1); % income given survival status and age;
% calculate expected marginal utility next period given current period end of period asset;
for acat = 1 : gv.nacatlim; % asset;
mu = 0; % expected marginal utility next period given current period survival status and end of period asset;
for rcat = 1 : gv.nrcatlim; % asset return shock;
mur = 0; % marginal utility next period given current period survival status, end of period asset and asset return shock;
% (end of period asset + saving) * asset return shock is asset next period;
% interpolation;
% find corresponding asset grid point of the next period initial asset given current period end of period asset and asset return shock;
acatf = floor(calc.rtransa(acat, rcat));
if acatf >= gv.nacatlim;
acatf = gv.nacatlim - 1;
end;
fa = calc.rtransa(acat, rcat) - acatf;
for vcatt = 1 : 3; % survival status next period;
if vcatt == 1 || (vcat == 1 && age >= gv.surviveage); % the codes are not right. if vcat == 2/3, the program uses margu(1, acatf); should use margu(2/3, acatf); ???
mu0 = margu(vcatt, acatf);
mu1 = margu(vcatt, acatf + 1);
if mu0 <= 0 || mu1 <= 0;
fprintf('Interpolaton Error: Bad mu in rho section: %2d %2d %14.6e %14.6e %2d %2d %2d', ...
calc.obs, age, mu0, mu1, vcat, acat, rcat);
return;
end;
if fa <= 1;
murv = (1 - fa) * mu0 + fa * mu1;
else
if mu0 > mu1;
dmu = mu0 - mu1;
mufact = dmu / mu0;
murv = mu1 / (1 + (fa - 1) * mufact);
else
murv = mu1;
end;
end;
if vcat == 1 && age >= gv.surviveage; % both spouses alive;
mur = mur + calc.transrate(vcatt, age - gv.beginage + 1) * murv;
else
mur = mur + murv;
end;
end;
end;
mu = mu + calc.rprob(rcat, 1) * mur;
end;
% marginal utility this period should equal to the discounted expected marginal utility next period;
% convert optimal discounted expected marginal utility back to consumption level;
if vcat == 1; % both spouses alive;
cv(acat, 1) = sqrt2 * (mu / tp) ^ (1 / (calc.alpha - 1));
elseif vcat == 2 || vcat == 3; % only one spouse alive;
cv(acat, 1) = (mu * calc.srate(vcat - 1, age - gv.beginage + 1) / tp) ^ (1 / (calc.alpha - 1));
end;
yac(acat, 1) = y - calc.acats(acat, 1) - cv(acat, 1); % income - end of period asset - consumption;
end;
% find optimal consumption at the current period given initial asset;
k = 1; % initialize asset grid point;
for acat = 1 : gv.nacatlim; % asset;
nassets = - calc.acats(acat, 1); % - initial asset level at the current period;
% find how much asset left after consumption;
% - asset(t) = income - end of period asset(t) - optimal consumption(t) given end of period asset(t);
% interpolation;
if yac(k, 1) < nassets;
k = k - 1;
while k >= 1 && yac(k, 1) < nassets;
k = k - 1;
end;
if k < 1; % optimal to leave no assets to next period;
f = 0;
k = 1;
elseif k >= 1;
f = (yac(k, 1) - nassets) / (yac(k, 1) - yac(k + 1, 1));
end;
elseif yac(k, 1) >= nassets;
while k < gv.nacatlim && yac(k + 1, 1) >= nassets;
k = k + 1;
end;
if k > gv.nacatlim - 1; % requires extrapolation;
k = gv.nacatlim - 1;
if cv(k + 1, 1) > cv(k, 1);
f = (yac(k, 1) - nassets) / (yac(k, 1) - yac(k + 1, 1));
else
f = 1 + (yac(k + 1, 1) - nassets) / (calc.acats(k + 1, 1) - calc.acats(k, 1));
end;
elseif k <= gv.nacatlim - 1;
f = (yac(k, 1) - nassets) / (yac(k, 1) - yac(k + 1, 1));
end;
end;
c = y + calc.acats(acat, 1) - ((1 - f) * calc.acats(k, 1) + f * calc.acats(k + 1, 1)); % optimal consumption at the current period;
% calculate marginal utility at the current period given optimal consumption;
if vcat == 1; % married couple adjustment;
mu = consadj * c ^ (calc.alpha - 1);
elseif vcat == 2 || vcat == 3;
mu = c ^ (calc.alpha - 1);
end;
% save optimal consumption to corresponding optimal consumption matrix for later calculations;
cons(age - gv.beginage + 1, vcat, acat) = c; % optimal consumption at the current period;
margu(vcat, acat) = mu; % marginal utility next period (when calculating backward at age - 1), given survival status and initial asset;
end;
end;
end;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% assign the values to structure variable calc for future calculations %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
calc.cons = cons;
end
The error suggests that during the automatic input type definition your function is called with structures that have different sets of fields.
To debug, put a breakpoint in your entry-point function in MATLAB and run your input definition test script. Take note of the structures being passed in to see where the mismatch originates.
I have a c[N][M] matrix where I apply a max-sum operation over a (K+1)² window. I am trying to reduce the complexity of the naive algorithm.
In particular, here's my code snippet in C++:
int N,M,K;
std::cin >> N >> M >> K;
std::pair< unsigned , unsigned > opt[N][M];
unsigned c[N][M];
// Read values for c[i][j]
// Initialize all opt[i][j] at (0,0).
for ( int i = 0; i < N; i ++ ) {
for ( int j = 0; j < M ; j ++ ) {
unsigned max = 0;
int posX = i, posY = j;
for ( int ii = i; (ii >= i - K) && (ii >= 0); ii -- ) {
for ( int jj = j; (jj >= j - K) && (jj >= 0); jj -- ) {
// Ignore the (i,j) position
if (( ii == i ) && ( jj == j )) {
continue;
}
if ( opt[ii][jj].second > max ) {
max = opt[ii][jj].second;
posX = ii;
posY = jj;
}
}
}
opt[i][j].first = opt[posX][posY].second;
opt[i][j].second = c[i][j] + opt[posX][posY].first;
}
}
The goal of the algorithm is to compute opt[N-1][M-1].
Example: for N = 4, M = 4, K = 2 and:
c[N][M] = 4 1 1 2
6 1 1 1
1 2 5 8
1 1 8 0
... the result should be opt[N-1][M-1] = {14, 11}.
The running complexity of this snippet is however O(N M K²). My goal is to reduce the running time complexity. I have already seen posts like this, but it appears that my "filter" is not separable, probably because of the sum operation.
More information (optional): this is essentially an algorithm which develops the optimal strategy in a "game" where:
Two players lead a single team in a N × M dungeon.
Each position of the dungeon has c[i][j] gold coins.
Starting position: (N-1,M-1) where c[N-1][M-1] = 0.
The active player chooses the next position to move the team to, from position (x,y).
The next position can be any of (x-i, y-j), i <= K, j <= K, i+j > 0. In other words, they can move only left and/or up, up to a step K per direction.
The player who just moved the team gets the coins in the new position.
The active player alternates each turn.
The game ends when the team reaches (0,0).
Optimal strategy for both players: maximize their own sum of gold coins, if they know that the opponent is following the same strategy.
Thus, opt[i][j].first represents the coins of the player who will now move from (i,j) to another position. opt[i][j].second represents the coins of the opponent.
Here is an O(N * M) solution.
Fix the bottom row r of the window. If, for every column, the maximum over the rows between r - K and r is known, the problem reduces to the well-known sliding window maximum problem, so the answer for a fixed row can be computed in O(M) time.
Iterate over the rows in increasing order. For each column, the maximum over the rows between r - K and r is itself a sliding window maximum problem, so processing one column across all rows takes O(N) time.
The total time complexity is O(N * M).
However, there is one issue with this solution: it does not exclude the (i, j) element. This can be fixed by running the algorithm described above twice (with K * (K + 1) and (K + 1) * K windows) and merging the results, since a (K + 1) * (K + 1) square without a corner is the union of two rectangles of sizes K * (K + 1) and (K + 1) * K.
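The sliding window maximum step can be sketched in Python with a monotonic deque. This is only an illustration on a static grid: the names sliding_max and sliding_max_2d are made up here, the corner exclusion is not handled, and in the real dynamic program the windows would have to be maintained incrementally as opt values are produced.

```python
from collections import deque

def sliding_max(values, k):
    """out[j] = max(values[max(0, j - k) : j + 1]), computed in O(len(values)) total."""
    out = []
    window = deque()  # indices in the current window, values strictly decreasing
    for j, v in enumerate(values):
        # Drop indices that have fallen out of the window [j - k, j].
        while window and window[0] < j - k:
            window.popleft()
        # Drop values that can never be the maximum again.
        while window and values[window[-1]] <= v:
            window.pop()
        window.append(j)
        out.append(values[window[0]])
    return out

def sliding_max_2d(grid, k):
    """Maximum over the (k+1) x (k+1) block whose bottom-right corner is each cell."""
    # First pass: sliding maximum down every column.
    cols = [sliding_max(list(col), k) for col in zip(*grid)]
    # Transpose back to rows, then second pass along every row.
    rows = [list(row) for row in zip(*cols)]
    return [sliding_max(row, k) for row in rows]
```

Each element enters and leaves the deque at most once, which is where the linear bound per row or column comes from.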
I have a problem that could be boiled down to finding a way of mapping a triangular matrix to a vector skipping the diagonal.
Basically I need to translate this C++ code using the Gecode libraries
// implied constraints
for (int k=0, i=0; i<n-1; i++)
for (int j=i+1; j<n; j++, k++)
rel(*this, d[k], IRT_GQ, (j-i)*(j-i+1)/2);
Into this MiniZinc (functional) code
constraint
forall ( i in 1..m-1 , j in i+1..m )
( (differences[?]) >= (floor(int2float(( j-i )*( j-i+1 )) / int2float(2)) ));
And I need to figure out the index in differences[?].
MiniZinc is a functional/mathematical language with no proper for loops.
So I have to map those indexes i and j that are touching all and only the cells of an upper triangular matrix, skipping its diagonal, to a k that numbers those cells from 0 to whatever.
If this was a regular triangular matrix (it's not), a solution like this would do
index = x + (y+1)*y/2
The matrix I'm handling is a square n*n matrix with indexes going from 0 to n-1, but it would be nice to provide a more general solution for an n*m matrix.
Here's the full Minizinc code
% modified version of the file found at https://github.com/MiniZinc/minizinc-benchmarks/blob/master/golomb/golomb.mzn
include "alldifferent.mzn";
int: m;
int: n = m*m;
array[1..m] of var 0..n: mark;
array[int] of var 0..n: differences = [mark[j] - mark[i] | i in 1..m, j in i+1..m];
constraint mark[1] = 0;
constraint forall ( i in 1..m-1 ) ( mark[i] < mark[i+1] );
% this version of the constraint works
constraint forall ( i in 1..m-1 , j in i+1..m )
( (mark[j] - mark[i]) >= (floor(int2float(( j-i )*( j-i+1 )) / int2float(2))) );
%this version does not
%constraint forall ( i in 1..m-1, j in i+1..m )
% ( (differences[(i-1) + ((j-2)*(j-1)) div 2]) >= (floor(int2float(( j-i )*( j-i+1 )) / int2float(2))) );
constraint alldifferent(differences);
constraint differences[1] < differences[(m*(m-1)) div 2];
solve :: int_search(mark, input_order, indomain, complete) minimize mark[m];
output ["golomb ", show(mark), "\n"];
Thanks.
Be careful. The formula you found from that link, index = x + (y+1)*y/2, includes the diagonal entries, and is for a lower triangular matrix, which I gather is not what you want. The exact formula you are looking for is actually index = x + ((y-1)y)/2
(see: https://math.stackexchange.com/questions/646117/how-to-find-a-function-mapping-matrix-indices).
Again, watch out: the formula I gave you assumes your indices x, y are zero-based. Your MiniZinc code uses indices i, j that start from 1 (1 <= i <= m, 1 <= j <= m). For indices that start from 1, the formula is T(i,j) = i + ((j-2)(j-1))/2. So your code should look like:
constraint
forall ( i in 1..m-1 , j in i+1..m )
((differences[i + ((j-2)*(j-1)) div 2]) >= ...
Note that (j-2)(j-1) will always be a multiple of 2, so we can just use integer division with divisor 2 (no need to worry about converting to/from floats).
The above assumes you are using a square m*m matrix.
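A quick way to convince yourself that the one-based formula hits every cell above the diagonal exactly once is to enumerate it. The Python sketch below (the helper name pair_index and the choice m = 5 are just for illustration) walks the pairs column by column and collects the indices:

```python
def pair_index(i, j):
    """One-based index of the pair (i, j) with 1 <= i < j, counted column by column."""
    return i + (j - 2) * (j - 1) // 2

m = 5
# Visit column j = 2..m and, within it, rows i = 1..j-1.
indices = [pair_index(i, j) for j in range(2, m + 1) for i in range(1, j)]
print(indices)  # consecutive integers 1 .. m*(m-1)/2
```

Note that the numbering runs down each column in turn, so it differs from the order produced by a comprehension that iterates over i first and j second.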
To generalise to a M*N rectangular matrix, one formula could be:
where 0 <= i < M, 0<= j < N [If you again, need your indices to start from 1, replace i with i-1 and j with j-1 in the above formula]. This touches all of cells of an upper triangular matrix as well as the 'extra block on the side' of the square that occurs when N > M. That is, it touches all cells (i,j) such that i < j for 0 <= i < M, 0 <= j < N.
Full code:
% original: https://github.com/MiniZinc/minizinc-benchmarks/blob/master/golomb/golomb.mzn
include "alldifferent.mzn";
int: m;
int: n = m*m;
array[1..m] of var 0..n: mark;
array[1..(m*(m-1)) div 2] of var 0..n: differences;
constraint mark[1] = 0;
constraint forall ( i in 1..m-1 ) ( mark[i] < mark[i+1] );
constraint alldifferent(differences);
constraint forall (i,j in 1..m where j > i)
(differences[i + ((j-1)*(j-2)) div 2] = mark[j] - mark[i]);
constraint forall (i,j in 1..m where j > i)
(differences[i + ((j-1)*(j-2)) div 2] >= (floor(int2float(( j-i )*( j-i+1 )) / int2float(2))));
constraint differences[1] < differences[(m*(m-1)) div 2];
solve :: int_search(mark, input_order, indomain, complete)
minimize mark[m];
output ["golomb ", show(mark), "\n"];
Lower triangular version (take previous code and swap i and j where necessary):
% original: https://github.com/MiniZinc/minizinc-benchmarks/blob/master/golomb/golomb.mzn
include "alldifferent.mzn";
int: m;
int: n = m*m;
array[1..m] of var 0..n: mark;
array[1..(m*(m-1)) div 2] of var 0..n: differences;
constraint mark[1] = 0;
constraint forall ( i in 1..m-1 ) ( mark[i] < mark[i+1] );
constraint alldifferent(differences);
constraint forall (i,j in 1..m where i > j)
(differences[j + ((i-1)*(i-2)) div 2] = mark[i] - mark[j]);
constraint forall (i,j in 1..m where i > j)
(differences[j + ((i-1)*(i-2)) div 2] >= (floor(int2float(( i-j )*( i-j+1 )) / int2float(2))));
constraint differences[1] < differences[(m*(m-1)) div 2];
solve :: int_search(mark, input_order, indomain, complete)
minimize mark[m];
output ["golomb ", show(mark), "\n"];
I am solving a problem on directed acyclic graphs.
But I am having trouble testing my code on some directed acyclic graphs. The test graphs should be large and (obviously) acyclic.
I have tried many times to write code that generates directed acyclic graphs, but I failed every time.
Is there an existing method for generating directed acyclic graphs that I could use?
I cooked up a C program that does this. The key is to 'rank' the nodes, and only draw edges from lower ranked nodes to higher ranked ones.
The program I wrote prints in the DOT language.
Here is the code itself, with comments explaining what it means:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MIN_PER_RANK 1 /* Nodes/Rank: How 'fat' the DAG should be. */
#define MAX_PER_RANK 5
#define MIN_RANKS 3 /* Ranks: How 'tall' the DAG should be. */
#define MAX_RANKS 5
#define PERCENT 30 /* Chance of having an Edge. */
int main (void)
{
int i, j, k,nodes = 0;
srand (time (NULL));
int ranks = MIN_RANKS
+ (rand () % (MAX_RANKS - MIN_RANKS + 1));
printf ("digraph {\n");
for (i = 0; i < ranks; i++)
{
/* New nodes of 'higher' rank than all nodes generated till now. */
int new_nodes = MIN_PER_RANK
+ (rand () % (MAX_PER_RANK - MIN_PER_RANK + 1));
/* Edges from old nodes ('nodes') to new ones ('new_nodes'). */
for (j = 0; j < nodes; j++)
for (k = 0; k < new_nodes; k++)
if ( (rand () % 100) < PERCENT)
printf (" %d -> %d;\n", j, k + nodes); /* An Edge. */
nodes += new_nodes; /* Accumulate into old node set. */
}
printf ("}\n");
return 0;
}
And here is the graph generated from a test run:
The answer to https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs applies: if you have an adjacency matrix representation of the edges of your graph, then if the matrix is lower triangular, it's a DAG by necessity.
A similar approach would be to take an arbitrary ordering of your nodes, and then consider edges from node x to y only when x < y. That constraint should also get your DAGness by construction. Memory comparison would be one arbitrary way to order your nodes if you're using structs to represent nodes.
Basically, the pseudocode would be something like:
for(i = 0; i < N; i++) {
for (j = i+1; j < N; j++) {
maybePutAnEdgeBetween(i, j);
}
}
where N is the number of nodes in your graph.
The pseudocode suggests that the number of DAGs it can produce, given N nodes in a fixed order, is
2^(N*(N-1)/2),
since there are
N*(N-1)/2
unordered pairs ("N choose 2"), and we can choose either to have the edge between them or not.
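As a minimal Python sketch of the pseudocode above (the function name random_dag and the edge probability p are assumptions for illustration; the answer leaves maybePutAnEdgeBetween unspecified):

```python
import random

def random_dag(n, p, rng=random):
    """Return edges (i, j) with i < j; each candidate pair is kept with probability p."""
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if rng.random() < p]

# Example: a reproducible graph on 6 nodes with roughly 30% of the 15 candidate edges.
edges = random_dag(6, 0.3, random.Random(42))
```

Because every edge goes from a smaller index to a larger one, the order 0, 1, ..., n-1 is a topological order and no cycle can appear.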
So, to try to put all these reasonable answers together:
(In the following, I used V for the number of vertices in the generated graph, and E for the number of edges, and we assume that E ≤ V(V-1)/2.)
Personally, I think the most useful answer is in a comment, by Flavius, who points at the code at http://condor.depaul.edu/rjohnson/source/graph_ge.c. That code is really simple, and it's conveniently described by a comment, which I reproduce:
To generate a directed acyclic graph, we first
generate a random permutation dag[0],...,dag[v-1].
(v = number of vertices.)
This random permutation serves as a topological
sort of the graph. We then generate random edges of the
form (dag[i],dag[j]) with i < j.
In fact, what the code does is generate the request number of edges by repeatedly doing the following:
generate two numbers in the range [0, V);
reject them if they're equal;
swap them if the first is larger;
reject them if it has generated them before.
The problem with this solution is that as E gets close to the maximum number of edges V(V-1)/2, the algorithm becomes slower and slower, because it has to reject more and more edges. A better solution would be to make a vector of all V(V-1)/2 possible edges, randomly shuffle it, and select the first E edges (the requested number) in the shuffled list.
The reservoir sampling algorithm lets us do this in space O(E), since we can deduce the endpoints of the kth edge from the value of k. Consequently, we don't actually have to create the source vector. However, it still requires O(V^2) time.
Alternatively, one can do a Fisher-Yates shuffle (or Knuth shuffle, if you prefer), stopping after E iterations. In the version of the FY shuffle presented in Wikipedia, this will produce the trailing entries, but the algorithm works just as well backwards:
// At the end of this snippet, a consists of a random sample of E of the
// integers in the half-open range [0, V(V-1)/2). (They still need to be
// converted to pairs of endpoints).
vector<int> a;
int N = V * (V - 1) / 2;
for (int i = 0; i < N; ++i) a.push_back(i);
for (int i = 0; i < E; ++i) {
  int j = i + rand() % (N - i); // random index in [i, N); modulo bias ignored
  swap(a[i], a[j]);
}
a.resize(E);
The shuffle loop takes only O(E) time, but the vector of candidates requires O(V^2) space (and as much time to fill). In fact, this can be improved to O(E) space with some trickery, but an SO code snippet is too small to contain the result, so I'll provide a simpler one in O(E) space and O(E log E) time. I assume that there is a class DAG with at least:
class DAG {
public:
// Construct an empty DAG with v vertices
explicit DAG(int v);
// Add the directed edge i->j, where 0 <= i, j < v
void add(int i, int j);
};
Now here goes:
// Return a randomly-constructed DAG with V vertices and E edges.
// It's required that 0 < E < V(V-1)/2.
template<typename PRNG>
DAG RandomDAG(int V, int E, PRNG& prng) {
using dist = std::uniform_int_distribution<int>;
// Make a random sample of size E
std::vector<int> sample;
sample.reserve(E);
int N = V * (V - 1) / 2;
dist d(0, N - E); // uniform_int_distribution is closed range
// Random vector of integers in [0, N-E]
for (int i = 0; i < E; ++i) sample.push_back(dist(prng));
// Sort them, and make them unique
std::sort(sample.begin(), sample.end());
for (int i = 1; i < E; ++i) sample[i] += i;
// Now it's a unique sorted list of integers in [0, N-E+E-1]
// Randomly shuffle the endpoints, so the topological sort
// is different, too.
std::vector<int> endpoints;
endpoints.reserve(V);
for (int i = 0; i < V; ++i) endpoints.push_back(i);
std::shuffle(endpoints.begin(), endpoints.end(), prng);
// Finally, create the dag
DAG rv(V);
for (auto& v : sample) {
int tail = int(0.5 + sqrt((v + 1) * 2));
int head = v - tail * (tail - 1) / 2;
rv.add(endpoints[head], endpoints[tail]); // relabel through the shuffled order
}
return rv;
}
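For comparison, the same idea can be sketched in Python, where random.sample applied to a range picks E distinct edge numbers without materializing all V(V-1)/2 candidates. This is an illustration rather than a translation of the C++ above: the names decode_edge and random_dag_edges are made up, and the decoding uses an integer square root instead of the floating-point formula.

```python
import math
import random

def decode_edge(k):
    """Map an edge number k in [0, V(V-1)/2) to a pair (head, tail) with head < tail."""
    tail = (1 + math.isqrt(8 * k + 1)) // 2   # largest t with t*(t-1)/2 <= k
    head = k - tail * (tail - 1) // 2
    return head, tail

def random_dag_edges(v, e, rng=random):
    """Return e distinct edges of a random DAG on vertices 0..v-1."""
    n = v * (v - 1) // 2
    labels = list(range(v))
    rng.shuffle(labels)                       # random topological order
    picks = rng.sample(range(n), e)           # e distinct edge numbers
    return [(labels[h], labels[t]) for h, t in map(decode_edge, picks)]
```

Every edge goes from an earlier position to a later position in the shuffled order, so the result is acyclic by construction.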
You could generate a random directed graph, and then do a depth-first search for cycles. When you find a cycle, break it by deleting an edge.
I think this is worst case O(VE). Each DFS takes O(V), and each one removes at least one edge (so max E)
If you generate the directed graph by uniformly random selecting all V^2 possible edges, and you DFS in random order and delete a random edge - this would give you a uniform distribution (or at least close to it) over all possible dags.
A very simple approach is:
Randomly assign edges by iterating over the indices of a lower diagonal matrix (as suggested by a link above: https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs)
This will give you a DAG with possibly more than one component. You can use a Disjoint-set data structure to give you the components that can then be merged by creating edges between the components.
Disjoint-sets are described here: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Edit: I initially found this post while I was working with a scheduling problem named flexible job shop scheduling problem with sequencing flexibility where jobs (the order in which operations are processed) are defined by directed acyclic graphs. The idea was to use an algorithm to generate multiple random directed graphs (jobs) and create instances of the scheduling problem to test my algorithms. The code at the end of this post is a basic version of the one I used to generate the instances. The instance generator can be found here.
I translated it to Python and added functionality to compute the transitive reduction of the random DAG. That way, the generated graph has the minimum number of edges with the same reachability.
The reduced graph can be visualized at http://dagitty.net/dags.html by pasting the output into Model code (on the right).
Python version of the algorithm
import random

class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = []
        self.removed_edges = []

    def remove_edge(self, x, y):
        e = (x, y)
        try:
            self.edges.remove(e)
            # print("Removed edge %s" % str(e))
            self.removed_edges.append(e)
        except ValueError:
            # The edge is not present; nothing to remove
            return

    def Nodes(self):
        return self.nodes

def get_random_dag():
    MIN_PER_RANK = 1  # Nodes/Rank: How 'fat' the DAG should be
    MAX_PER_RANK = 2
    MIN_RANKS = 6  # Ranks: How 'tall' the DAG should be
    MAX_RANKS = 10
    PERCENT = 0.3  # Chance of having an Edge
    nodes = 0
    ranks = random.randint(MIN_RANKS, MAX_RANKS)
    adjacency = []
    for i in range(ranks):
        # New nodes of 'higher' rank than all nodes generated till now
        new_nodes = random.randint(MIN_PER_RANK, MAX_PER_RANK)
        # Edges from old nodes ('nodes') to new ones ('new_nodes')
        for j in range(nodes):
            for k in range(new_nodes):
                if random.random() < PERCENT:
                    adjacency.append((j, k + nodes))
        nodes += new_nodes

    # Compute transitive graph
    G = Graph()
    # Append nodes
    for i in range(nodes):
        G.nodes.append(i)
    # Append adjacencies
    for i in range(len(adjacency)):
        G.edges.append(adjacency[i])
    N = G.Nodes()
    # Drop every edge (x, z) that is implied by a path x -> y -> z
    for x in N:
        for y in N:
            for z in N:
                if (x, y) != (y, z) and (x, y) != (x, z):
                    if (x, y) in G.edges and (y, z) in G.edges:
                        G.remove_edge(x, z)

    # Print graph
    for i in range(nodes):
        print(i)
    print()
    for value in G.edges:
        print(str(value[0]) + ' ' + str(value[1]))

get_random_dag()
Below, you can see in the figure the random DAG with many redundant edges generated by the Python code above.
I adapted the code to generate the same graph (same reachability) but with the least possible number of edges. This is also called transitive reduction.
def get_random_dag():
    MIN_PER_RANK = 1  # Nodes/Rank: How 'fat' the DAG should be
    MAX_PER_RANK = 3
    MIN_RANKS = 15  # Ranks: How 'tall' the DAG should be
    MAX_RANKS = 20
    PERCENT = 0.3  # Chance of having an Edge
    nodes = 0
    node_counter = 0
    ranks = random.randint(MIN_RANKS, MAX_RANKS)
    adjacency = []
    rank_list = []
    for i in range(ranks):
        # New nodes of 'higher' rank than all nodes generated till now
        new_nodes = random.randint(MIN_PER_RANK, MAX_PER_RANK)
        list = []
        for j in range(new_nodes):
            list.append(node_counter)
            node_counter += 1
        rank_list.append(list)
        print(rank_list)
        # Edges from old nodes ('nodes') to new ones ('new_nodes')
        if i > 0:
            for j in rank_list[i - 1]:
                for k in range(new_nodes):
                    if random.random() < PERCENT:
                        adjacency.append((j, k + nodes))
        nodes += new_nodes
    for i in range(nodes):
        print(i)
    print()
    for edge in adjacency:
        print(str(edge[0]) + ' ' + str(edge[1]))
    print()
    print()
Result:
Create a graph with n nodes (numbered from 1) and add an edge between each pair of nodes n1 and n2 whenever n1 != n2 and n2 % n1 == 0.
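A minimal Python sketch of this construction follows. It assumes the nodes are numbered 1..n and that each edge points from the divisor to the multiple; the function name divisibility_dag is made up for illustration.

```python
def divisibility_dag(n):
    """Edges a -> b for 1 <= a < b <= n whenever a divides b."""
    return [(a, b)
            for a in range(1, n + 1)
            for b in range(2 * a, n + 1, a)]  # proper multiples of a up to n

print(divisibility_dag(6))
```

Since every edge goes from a smaller number to a larger one, the graph is acyclic. It is deterministic rather than random, which can be handy for reproducible tests.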
I recently tried re-implementing the accepted answer and found that it is non-deterministic. If you don't enforce the min_per_rank parameter, you could end up with a graph with 0 nodes.
To prevent this, I wrapped the for loops in a function and then checked, after each rank, that min_per_rank was satisfied. Here's the JavaScript implementation:
https://github.com/karissa/random-dag
And some pseudo-C code that would replace the accepted answer's main loop.
int pushed = 0;
int addRank (void)
{
  for (j = 0; j < nodes; j++)
    for (k = 0; k < new_nodes; k++)
      if ( (rand () % 100) < PERCENT)
      {
        printf (" %d -> %d;\n", j, k + nodes); /* An Edge. */
        pushed++; /* Count the edge. */
      }
  if (pushed < min_per_rank) return addRank();
  else pushed = 0;
  return 0;
}
Generating a random DAG which might not be connected
Here's a simple algorithm for generating a random DAG that might not be connected.
const randomDAG = (x, n) => {
  const length = n * (n - 1) / 2;
  const dag = new Array(length);
  for (let i = 0; i < length; i++) {
    dag[i] = Math.random() < x ? 1 : 0;
  }
  return dag;
};

const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;

const dagToDot = (n, dag) => {
  let dot = "digraph {\n";
  for (let i = 0; i < n; i++) {
    dot += ` ${i};\n`;
    for (let j = i + 1; j < n; j++) {
      const k = dagIndex(n, i, j);
      if (dag[k]) dot += ` ${i} -> ${j};\n`;
    }
  }
  return dot + "}";
};

const randomDot = (x, n) => dagToDot(n, randomDAG(x, n));

new Viz().renderSVGElement(randomDot(0.3, 10)).then(svg => {
  document.body.appendChild(svg);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
If you run this code snippet a couple of times, you might see a DAG which is not connected.
So, how does this code work?
Any directed acyclic graph (DAG) can be topologically sorted, so once its vertices are numbered in topological order, every edge goes from a lower vertex to a higher vertex. In that sense a DAG is just an undirected graph on ordered vertices. An undirected graph of n vertices can have a maximum of n * (n - 1) / 2 edges, not counting repeated edges or edges from a vertex to itself. Since you can only have an edge from a lower vertex to a higher vertex, the direction of every edge is predetermined.
This means that you can represent the entire DAG using a one-dimensional array of n * (n - 1) / 2 edge weights. An edge weight of 0 means that the edge is absent. Hence, we just create a random array of zeros and ones, and that's our random DAG.
An edge from vertex i to vertex j in a DAG of n vertices, where i < j, has an edge weight at index k where k = n * i + j - (i + 1) * (i + 2) / 2.
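As a quick sanity check of that formula, here is a small Python sketch (the name dag_index is just a transcription of dagIndex, not part of the original snippet). It enumerates every pair i < j for n = 5 and shows that the formula assigns the indices 0 to 9 in order, with no gaps and no collisions.

```python
def dag_index(n, i, j):
    """Index of the edge i -> j (with i < j) in the flat edge array."""
    # (i + 1) * (i + 2) is always even, so integer division is exact.
    return n * i + j - (i + 1) * (i + 2) // 2


n = 5
indices = [dag_index(n, i, j) for i in range(n) for j in range(i + 1, n)]
print(indices)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```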
Generating a connected DAG
Once you generate a random DAG, you can check if it's connected using the following function.
const isConnected = (n, dag) => {
  const reached = new Array(n).fill(false);
  reached[0] = true;
  const queue = [0];
  while (queue.length > 0) {
    const x = queue.shift();
    for (let i = 0; i < n; i++) {
      // skip the current vertex itself and anything already reached
      if (i === x || reached[i]) continue;
      // edges are treated as undirected, so look up the pair in either order
      const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
      if (dag[j] === 0) continue;
      reached[i] = true;
      queue.push(i);
    }
  }
  return reached.every(x => x); // return true if every vertex was reached
};
If it's not connected, then its complement will always be connected. This is a standard property of undirected graphs, and isConnected ignores edge direction.
const complement = dag => dag.map(x => x ? 0 : 1);

const randomConnectedDAG = (x, n) => {
  const dag = randomDAG(x, n);
  return isConnected(n, dag) ? dag : complement(dag);
};
Note that if we create a random DAG with 30% edges then its complement will have 70% edges. Hence, the only safe value for x is 50%. However, if you care about connectivity more than the percentage of edges then this shouldn't be a deal breaker.
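To make the complement trick and its effect on the edge count concrete, here is a small Python sketch. It is a transcription of the idea, not the snippet itself. The names dag_index, is_connected and complement are illustrative, and the 4-vertex example graph is an assumption chosen for the demonstration.

```python
from collections import deque


def dag_index(n, i, j):
    """Index of the edge i -> j (with i < j) in the flat edge array."""
    return n * i + j - (i + 1) * (i + 2) // 2


def is_connected(n, dag):
    """Breadth-first search from vertex 0, ignoring edge direction."""
    reached = [False] * n
    reached[0] = True
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for i in range(n):
            if reached[i]:  # also skips i == x, which is already reached
                continue
            k = dag_index(n, i, x) if i < x else dag_index(n, x, i)
            if dag[k]:
                reached[i] = True
                queue.append(i)
    return all(reached)


def complement(dag):
    """Flip every edge: present becomes absent and vice versa."""
    return [0 if w else 1 for w in dag]


# 4 vertices with only the edge 0 -> 1: vertices 2 and 3 are isolated.
n = 4
dag = [0] * (n * (n - 1) // 2)
dag[dag_index(n, 0, 1)] = 1

print(is_connected(n, dag))              # False
print(is_connected(n, complement(dag)))  # True
print(sum(dag), sum(complement(dag)))    # 1 5
```

The last line shows the trade-off: the original graph has 1 of 6 possible edges, while its complement has 5 of 6.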
Finally, putting it all together.
const randomDAG = (x, n) => {
  const length = n * (n - 1) / 2;
  const dag = new Array(length);
  for (let i = 0; i < length; i++) {
    dag[i] = Math.random() < x ? 1 : 0;
  }
  return dag;
};

const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;

const isConnected = (n, dag) => {
  const reached = new Array(n).fill(false);
  reached[0] = true;
  const queue = [0];
  while (queue.length > 0) {
    const x = queue.shift();
    for (let i = 0; i < n; i++) {
      // skip the current vertex itself and anything already reached
      if (i === x || reached[i]) continue;
      const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
      if (dag[j] === 0) continue;
      reached[i] = true;
      queue.push(i);
    }
  }
  return reached.every(x => x); // return true if every vertex was reached
};

const complement = dag => dag.map(x => x ? 0 : 1);

const randomConnectedDAG = (x, n) => {
  const dag = randomDAG(x, n);
  return isConnected(n, dag) ? dag : complement(dag);
};

const dagToDot = (n, dag) => {
  let dot = "digraph {\n";
  for (let i = 0; i < n; i++) {
    dot += ` ${i};\n`;
    for (let j = i + 1; j < n; j++) {
      const k = dagIndex(n, i, j);
      if (dag[k]) dot += ` ${i} -> ${j};\n`;
    }
  }
  return dot + "}";
};

const randomConnectedDot = (x, n) => dagToDot(n, randomConnectedDAG(x, n));

new Viz().renderSVGElement(randomConnectedDot(0.3, 10)).then(svg => {
  document.body.appendChild(svg);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
If you run this code snippet a couple of times, you may see a DAG with a lot more edges than others.
Generating a connected DAG with a certain percentage of edges
If you care about both connectivity and having a certain percentage of edges then you can use the following algorithm.
1. Start with a fully connected graph.
2. Randomly remove edges.
3. After removing an edge, check if the graph is still connected.
4. If it's no longer connected then add that edge back.
It should be noted that this algorithm is not as efficient as the previous method.
const randomDAG = (x, n) => {
  const length = n * (n - 1) / 2;
  const dag = new Array(length).fill(1); // start with every edge present
  for (let i = 0; i < length; i++) {
    if (Math.random() < x) continue;       // keep this edge with probability x
    dag[i] = 0;                            // otherwise try to remove it
    if (!isConnected(n, dag)) dag[i] = 1;  // put it back if the graph fell apart
  }
  return dag;
};

const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;

const isConnected = (n, dag) => {
  const reached = new Array(n).fill(false);
  reached[0] = true;
  const queue = [0];
  while (queue.length > 0) {
    const x = queue.shift();
    for (let i = 0; i < n; i++) {
      // skip the current vertex itself and anything already reached
      if (i === x || reached[i]) continue;
      const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
      if (dag[j] === 0) continue;
      reached[i] = true;
      queue.push(i);
    }
  }
  return reached.every(x => x); // return true if every vertex was reached
};

const dagToDot = (n, dag) => {
  let dot = "digraph {\n";
  for (let i = 0; i < n; i++) {
    dot += ` ${i};\n`;
    for (let j = i + 1; j < n; j++) {
      const k = dagIndex(n, i, j);
      if (dag[k]) dot += ` ${i} -> ${j};\n`;
    }
  }
  return dot + "}";
};

const randomDot = (x, n) => dagToDot(n, randomDAG(x, n));

new Viz().renderSVGElement(randomDot(0.3, 10)).then(svg => {
  document.body.appendChild(svg);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
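For readers who prefer Python, the four steps can be sketched as follows. This is a hedged variation rather than a port: the snippet above tries the edges in index order, whereas this sketch shuffles the order first, which is my own assumption about what "randomly remove edges" could mean. All function names are illustrative.

```python
import random
from collections import deque


def dag_index(n, i, j):
    """Index of the edge i -> j (with i < j) in the flat edge array."""
    return n * i + j - (i + 1) * (i + 2) // 2


def is_connected(n, dag):
    """Breadth-first search from vertex 0, ignoring edge direction."""
    reached = [False] * n
    reached[0] = True
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for i in range(n):
            if reached[i]:
                continue
            k = dag_index(n, i, x) if i < x else dag_index(n, x, i)
            if dag[k]:
                reached[i] = True
                queue.append(i)
    return all(reached)


def random_connected_dag(x, n, rng=random):
    """Start from the complete DAG and drop edges while staying connected."""
    length = n * (n - 1) // 2
    dag = [1] * length
    order = list(range(length))
    rng.shuffle(order)  # assumption: visit the edges in random order
    for k in order:
        if rng.random() < x:  # keep this edge with probability x
            continue
        dag[k] = 0
        if not is_connected(n, dag):
            dag[k] = 1  # removing it disconnected the graph, so restore it
    return dag


dag = random_connected_dag(0.3, 10)
print(is_connected(10, dag), sum(dag))
```

With x = 0 every edge is tried for removal, so only bridges survive and the result is a spanning tree with n - 1 edges. Each connectivity check costs O(n^2) and there are O(n^2) edges, which is why this approach is slower than the complement method.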
Hope that helps.
To test algorithms, I generated random graphs based on node layers. This is the Python script (it also prints the adjacency list). You can change the node connection probabilities or add layers to get slightly different or "taller" graphs:
# Weighted DAG generator by forward layers
import argparse
import random

parser = argparse.ArgumentParser("dag_gen2")
parser.add_argument(
    "--layers",
    help="DAG forward layers. Default=5",
    type=int,
    default=5,
)
args = parser.parse_args()
layers = [[] for _ in range(args.layers)]
edges = {}
node_index = -1
print(f"Creating {len(layers)} layers graph")

# Random horizontal connections -low probability-
def random_horizontal(layer):
    for node1 in layer:
        # Avoid cycles: only link to higher-numbered nodes of the same layer
        for node2 in filter(lambda n2: node1 < n2, layer):
            if random.randint(0, 100) < 10:
                w = random.randint(1, 10)
                edges[node1].append((node2, w))

# Connect two layers
def connect(layer1, layer2):
    random_horizontal(layer1)
    for node1 in layer1:
        for node2 in layer2:
            if random.randint(0, 100) < 30:
                w = random.randint(1, 10)
                edges[node1].append((node2, w))

# Start nodes 1 to 3
start_nodes = random.randint(1, 3)
start_layer = []
for _ in range(start_nodes):
    node_index += 1
    start_layer.append(node_index)

# Gen nodes
for layer in layers:
    nodes = random.randint(2, 5)
    for n in range(nodes):
        node_index += 1
        layer.append(node_index)

# Connect all
layers.insert(0, start_layer)
for layer in layers:
    for node in layer:
        edges[node] = []
for i, layer in enumerate(layers[:-1]):
    connect(layer, layers[i + 1])

# Print in DOT language
print("digraph {")
for node_key in [node_key for node_key in edges.keys() if len(edges[node_key]) > 0]:
    for node_dst, weight in edges[node_key]:
        print(f" {node_key} -> {node_dst} [label={weight}];")
print("}")
print("---- Adjacency list ----")
print(edges)
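Since the output is meant as test input for algorithms that expect a DAG, it can be worth checking a generated adjacency list for cycles. A minimal sketch using Kahn's algorithm, assuming the same {node: [(destination, weight), ...]} shape as the edges dict printed by the script. The helper name is_acyclic and the two sample graphs are hypothetical.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the weighted adjacency dict contains no directed cycle."""
    # Count incoming edges for every node that appears anywhere.
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for dst, _weight in targets:
            indegree[dst] = indegree.get(dst, 0) + 1

    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for dst, _weight in edges.get(node, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)

    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(indegree)


good = {0: [(1, 3), (2, 7)], 1: [(2, 1)], 2: []}
bad = {0: [(1, 2)], 1: [(2, 5)], 2: [(0, 4)]}
print(is_acyclic(good))  # True
print(is_acyclic(bad))   # False
```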