Find Max and Min of a Doubly Linked List in Golang

I need to write a function that returns the max and min values (int values) of a doubly linked list (not an array) in Golang. I've written the code for getting these values from an array, but I have no idea where to start when using a doubly linked list (or a circular one):
package main

import "fmt"

func main() {
    var a = [5]int{11, -4, 7, 8, -10}
    min, max := findMinAndMax(a)
    fmt.Println("Min: ", min)
    fmt.Println("Max: ", max)
}
This works, but I need to replace the array with a doubly linked list (for example, called 'l') holding 5 int values (2, 5, 7, 9, 12), from which I need to get the max and min values:
l = {2, 5, 7, 9, 12} // a doubly linked list

The code below will help you find the answer. I used the Go container/list package's (doubly linked) list for the findMinAndMax operation.
package main

import (
    "container/list"
    "fmt"
)

func main() {
    var a = [5]int{11, -4, 7, 8, -10}
    // declare a double list
    i := list.New()
    // push a's values to list
    for _, valueA := range a {
        i.PushFront(valueA)
    }
    min, max := findMinAndMax(i)
    fmt.Println("Min: ", min) // Output: Min: -10
    fmt.Println("Max: ", max) // Output: Max: 11
}

func findMinAndMax(a *list.List) (min int, max int) {
    min = a.Front().Value.(int)
    max = a.Front().Value.(int)
    for e := a.Front(); e != nil; e = e.Next() {
        value := e.Value.(int)
        if value < min {
            min = value
        }
        if value > max {
            max = value
        }
    }
    return min, max
}
The Go Playground for list
For a circular linked list (ring), the implementation will be as below. I used the Go container/ring package's Ring for the implementation.
package main

import (
    "container/ring"
    "fmt"
)

func main() {
    var a = [5]int{11, -4, 7, 8, -10}
    // declare a container ring
    r := ring.New(len(a))
    // push a's values to ring
    for _, valueA := range a {
        r.Value = valueA
        r = r.Next()
    }
    min, max := findMinAndMax(r)
    fmt.Println("Min: ", min) // Output: Min: -10
    fmt.Println("Max: ", max) // Output: Max: 11
}

func findMinAndMax(r *ring.Ring) (min int, max int) {
    min = r.Value.(int)
    max = r.Value.(int)
    tmp := r
    for {
        value := r.Value.(int)
        if value < min {
            min = value
        }
        if value > max {
            max = value
        }
        r = r.Next()
        // r == tmp means one full circle around the ring is completed
        if r == tmp {
            break
        }
    }
    return min, max
}
The Go Playground for ring

This is the code I wrote for getting the max and min of an array, but I need both from a doubly linked list; I hope this is understood:
func findMinAndMax(a [5]int) (min int, max int) {
    min = a[0]
    max = a[0]
    for _, value := range a {
        if value < min {
            min = value
        }
        if value > max {
            max = value
        }
    }
    return min, max
}

Related

Maximum difference between sum of even and odd position elements: How to memoize the brute-force approach?

I have the following code for a problem.
The problem is: maximize the absolute difference between the sum of elements at the even and the odd positions of an array. To do so, you may delete as many elements as you want.
I did it by brute force, using backtracking. My logic is that, for each index, I have 2 options:
a) delete it (in this case, I put it in a set)
b) don't delete it (in this case, I removed the index from the set and backtracked).
I took the local maximum of the two cases and updated the global maximum value appropriately.
void maxAns(vector<int> &arr, int index, set<int> &removed, int &res)
{
    if (index < 0)
        return;
    int k = 0;
    int s3 = 0, s4 = 0;
    for (int i = 0; i < arr.size(); i++)
    {
        if (i != index)
        {
            set<int>::iterator it = removed.find(i);
            if (it == removed.end())
            {
                if (k % 2 == 0)
                    s3 += arr[i];
                else
                    s4 += arr[i];
                k++;
            }
        }
        else // don't delete the element
        {
            if (k % 2 == 0)
                s3 += arr[i];
            else
                s4 += arr[i];
            k++;
        }
    }
    k = 0;
    int s1 = 0, s2 = 0;
    for (int i = 0; i < arr.size(); i++)
    {
        if (i != index)
        {
            set<int>::iterator it = removed.find(i);
            if (it == removed.end())
            {
                if (k % 2 == 0)
                    s1 += arr[i];
                else
                    s2 += arr[i];
                k++;
            }
        }
        else // delete the element
        {
            // add index into the removed set
            removed.insert(index);
        }
    }
    // delete the index element
    int t1 = abs(s1 - s2);
    maxAns(arr, index - 1, removed, res);
    // don't delete the index element, and then backtrack
    set<int>::iterator itr = removed.find(index);
    removed.erase(itr);
    int t2 = abs(s3 - s4);
    maxAns(arr, index - 1, removed, res);
    // choose the max value
    res = max(res, max(t1, t2));
}
Please suggest how to memoize this solution, as I think it's quite inefficient. Feel free to share any interesting approach.
Hint: divide and conquer. Consider that a fixed-length list, as the left part of a larger list, maximised (or minimised) for the actual rather than absolute difference, and depending on the parity of its length, would pair better with a right part that does not depend on the parity of its own length.
[0,3] ++ [0,3] -> diff -3 -3 = -6
[0,3] ++ [9,13,1] -> diff -3 -3 = -6
We can also easily create base cases for max_actual_diff and min_actual_diff of lists with lengths 1 and 2. Note that the best choice for those might include omitting one or more of those few elements.
JavaScript code:
function max_diff(A, el, er, memo){
    if (memo[['mx', el, er]])
        return memo[['mx', el, er]]
    if (er == el)
        return memo[['mx', el, er]] = [A[el], 1, 0, 0]
    var best = [A[el], 1, 0, 0]
    if (er == el + 1){
        if (A[el] - A[er] > best[2]){
            best[2] = A[el] - A[er]
            best[3] = 2
        }
        if (A[er] > best[0]){
            best[0] = A[er]
            best[1] = 1
        }
        return memo[['mx', el, er]] = best
    }
    const mid = el + ((er - el) >> 1)
    const left = max_diff(A, el, mid, memo)
    const right_min = min_diff(A, mid + 1, er, memo)
    const right_max = max_diff(A, mid + 1, er, memo)
    // Best odd = odd + even
    if (left[0] - right_min[2] > best[0]){
        best[0] = left[0] - right_min[2]
        best[1] = left[1] + right_min[3]
    }
    // Best odd = even + odd
    if (left[2] + right_max[0] > best[0]){
        best[0] = left[2] + right_max[0]
        best[1] = left[3] + right_max[1]
    }
    // Best even = odd + odd
    if (left[0] - right_min[0] > best[2]){
        best[2] = left[0] - right_min[0]
        best[3] = left[1] + right_min[1]
    }
    // Best even = even + even
    if (left[2] + right_max[2] > best[2]){
        best[2] = left[2] + right_max[2]
        best[3] = left[3] + right_max[3]
    }
    return memo[['mx', el, er]] = best
}

function min_diff(A, el, er, memo){
    if (memo[['mn', el, er]])
        return memo[['mn', el, er]]
    if (er == el)
        return memo[['mn', el, er]] = [A[el], 1, 0, 0]
    var best = [A[el], 1, 0, 0]
    if (er == el + 1){
        if (A[el] - A[er] < best[2]){
            best[2] = A[el] - A[er]
            best[3] = 2
        }
        if (A[er] < best[0]){
            best[0] = A[er]
            best[1] = 1
        }
        return memo[['mn', el, er]] = best
    }
    const mid = el + ((er - el) >> 1)
    const left = min_diff(A, el, mid, memo)
    const right_min = min_diff(A, mid + 1, er, memo)
    const right_max = max_diff(A, mid + 1, er, memo)
    // Best odd = odd + even
    if (left[0] - right_max[2] < best[0]){
        best[0] = left[0] - right_max[2]
        best[1] = left[1] + right_max[3]
    }
    // Best odd = even + odd
    if (left[2] + right_min[0] < best[0]){
        best[0] = left[2] + right_min[0]
        best[1] = left[3] + right_min[1]
    }
    // Best even = odd + odd
    if (left[0] - right_max[0] < best[2]){
        best[2] = left[0] - right_max[0]
        best[3] = left[1] + right_max[1]
    }
    // Best even = even + even
    if (left[2] + right_min[2] < best[2]){
        best[2] = left[2] + right_min[2]
        best[3] = left[3] + right_min[3]
    }
    return memo[['mn', el, er]] = best
}

var memo = {}
var A = [1, 2, 3, 4, 5]
console.log(`A: ${ JSON.stringify(A) }`)
console.log(
    JSON.stringify(max_diff(A, 0, A.length-1, memo)) + ' // [odd max, len, even max, len]')
console.log(
    JSON.stringify(min_diff(A, 0, A.length-1, memo)) + ' // [odd min, len, even min, len]')
console.log('\nmemo:\n' + JSON.stringify(memo))
Maximize the absolute difference between the sum of elements at the odd and even positions of an array. To do so, you may delete as many elements as you want.
Example
A = [9, 5, 2, 9, 4]
Ans = 16 => [9, 2, 9] = 9-2+9
A = [8, 6, 2, 7, 7, 2, 7]
Ans = 18 => [8, 2, 7, 2, 7] = 8-2+7-2+7
Hint:
At position "i+1", let the maximum and minimum possible difference over all the subsequences of the subarray A[i+1, n] be Max and Min respectively.
Then at position "i", the maximum and minimum possible difference over all the subsequences of the subarray A[i, n] can be calculated from two choices:
Include the current element arr[i]
Don't include the current element arr[i]
Max = MAX(Max, arr[i] - Min)
Min = MIN(Min, arr[i] - Max)
Explanation:
A = 9, 5, 2, 9, 4
Max = 16, 12, 9, 9, 4
Min = -7, -7, -7, 0, 0
Final Answer: Max(Max[0], Min[0]) = Max(16, -7) = 16
Time Complexity: O(n)
Space Complexity: O(1), as just the two variables Max and Min are used
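For concreteness, here is a minimal Python sketch of this hint; the function name and the right-to-left iteration are my choices, not part of the original hint:
def max_alternating_diff(arr):
    # mx/mn: max/min signed difference over subsequences of the suffix seen so far
    mx, mn = 0, 0
    for x in reversed(arr):  # process positions i = n-1 .. 0
        # either skip x, or prepend it (the rest of the subsequence flips sign)
        mx, mn = max(mx, x - mn), min(mn, x - mx)
    return mx  # mx >= 0 >= mn always holds, so mx is the final answer

print(max_alternating_diff([9, 5, 2, 9, 4]))        # 16
print(max_alternating_diff([8, 6, 2, 7, 7, 2, 7]))  # 18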
Let's say we always add the values at even positions and we always subtract the values at odd positions. Now, we will iterate from 1 to n and make choices: keep the element or delete it. The thing is that when we keep an element, we need to know if it is at an even position or an odd position, so we will do Dynamic Programming:
dp[i][0]: max possible sum using the elements in a1, a2, …, ai where the resulting array is of even length.
dp[i][1]: same as above, but now the resulting array is of odd length.
Transitions are: keep it or delete it.
dp[i][r] = max(dp[i-1][r], dp[i-1][!r] + a[i] * ((r == 0) ? 1 : -1));
Now it's when someone says: wait a minute, you are always adding at even positions and subtracting at odd positions; what if it's the other way around? Well, for that, perform the DP again but adding at odd positions and subtracting at even positions. You keep the maximum of both solutions.
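A minimal Python sketch of this DP, with the table rolled into two values; the helper names and the sign-flip parameter are mine:
def best_sum(a, even_sign):
    NEG = float('-inf')
    # dp[r]: best sum over kept subsequences whose length has parity r
    dp = [0, NEG]  # the empty subsequence has even length and sum 0
    for x in a:
        keep_even = dp[1] + even_sign * x  # x lands at an even position
        keep_odd = dp[0] - even_sign * x   # x lands at an odd position
        dp = [max(dp[0], keep_even), max(dp[1], keep_odd)]
    return max(dp)

def solve(a):
    # run both sign conventions and keep the better one
    return max(best_sum(a, 1), best_sum(a, -1))

print(solve([9, 5, 2, 9, 4]))  # 16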

Find the largest palindrome which is a product of two 5-digit prime numbers: where is the mistake?

I am trying to find the largest palindrome which is a product of two 5-digit prime numbers. The program also has to return the two factors. The problem is that the program runs for a very long time and gives the wrong result. Where is the mistake, and how can I correct it?
from sympy import sieve
from sympy.utilities.iterables import multiset_combinations
from numpy import prod

def max_palindrome(prime_numbers):
    prime_numbers_list_unique_combinations = multiset_combinations(prime_numbers, 2)
    list_palindromes = ((prod(i), i) for i in prime_numbers_list_unique_combinations if
                        str(prod(i)) == str(prod(i))[::-1])
    result = max(list_palindromes)
    return "The max palindrome is {0} which is producted of {1} and {2} numbers".format(
        result[0], result[1][0], result[1][1])

from timeit import default_timer
start = default_timer()
print max_palindrome((i for i in sieve.primerange(9999, 99999)))
# My wrong result: The max palindrome is 1997667991 which is producted of 69143 and 91009 numbers
end = default_timer()
print "The time of max_palindrome program's execution is {} sec".format(end - start)
# The time of max_palindrome program's execution is 876.579393732 sec
You need to use generators to chain your operations lazily, otherwise you will be storing huge lists.
First we need to recover the primes.
import math

def find_primes(n):
    # Prepare our sieve; for readability we make index match the number by adding 0 and 1
    primes = [False] * 2 + [True] * (n - 1)
    # Remove non-primes
    for x in range(2, int(math.sqrt(n) + 1)):
        if primes[x]:
            primes[2*x::x] = [False] * (n // x - 1)
    return [x for x, is_prime in enumerate(primes) if is_prime]
Then to keep from memory errors, we make sure to keep everything a generator.
import itertools

# Helper the answer assumes; a simple string-reversal check
def is_palindrome(n):
    s = str(n)
    return s == s[::-1]

def find_palindrome(n):
    # These all output generators instead of lists
    primes = find_primes(n)
    factors = itertools.combinations_with_replacement(primes, 2)
    products = ((p * q, p, q) for p, q in factors)
    palindromes = ((pal, p, q) for pal, p, q in products if is_palindrome(pal))
    try:
        return max(palindromes)
    except ValueError:
        # No palindrome exists
        return None
Profit.
find_palindrome(99999) # (999949999, 30109, 33211)
Running time is around 25 seconds on an Intel i5-2500K.
We start from the largest palindrome and check whether it has 5-digit prime divisors. Then we take the previous palindrome, and so on (sorry, but the example is in JavaScript):
const MAX_DIVISOR = 99999;
const MIN_DIVISOR = 10000;

function isPalindrome(number) {
    let rebmun = 0;
    let remainder = number;
    while (remainder !== 0) {
        let lastDigit = remainder % 10;
        rebmun = rebmun * 10 + lastDigit;
        remainder = (remainder - lastDigit) / 10;
    }
    return number === rebmun;
}
function getPrimeDivisors(oddNumber) {
    if (oddNumber % 2 === 0) return [];
    for (let i = 3; i <= Math.sqrt(oddNumber); i += 2) {
        if (oddNumber % i === 0) {
            // both factors must be 5-digit numbers
            if (i < MIN_DIVISOR || oddNumber / i > MAX_DIVISOR) {
                return [];
            } else {
                return [i, oddNumber / i];
            }
        }
    }
    return [];
}
function getMaxPalindrome(fromOdd) {
    while (!isPalindrome(fromOdd)) {
        fromOdd -= 2;
    }
    return fromOdd;
}

function getPrevPalindrome(palindrome) {
    function digitsCount(number) {
        let count = 0;
        while (number !== 0) {
            count++;
            number = ~~(number / 10);
        }
        return count;
    }
    function calcPalindrome(num, count) {
        let partToMerge = count % 2 ? ~~(num / 10) : num;
        while (partToMerge) {
            let lastDigit = partToMerge % 10;
            num = num * 10 + lastDigit;
            partToMerge = (partToMerge - lastDigit) / 10;
        }
        return num;
    }
    const palindromeLength = digitsCount(palindrome);
    const newFirstPart = ~~(palindrome / Math.pow(10, ~~(palindromeLength / 2))) - 1;
    const prevPalindrome = calcPalindrome(newFirstPart, palindromeLength);
    return palindromeLength > digitsCount(prevPalindrome) ? (prevPalindrome * 10 + 9) : prevPalindrome;
}

function main() {
    let palindrome = getMaxPalindrome(MAX_DIVISOR * MAX_DIVISOR);
    let divisors = getPrimeDivisors(palindrome);
    while (divisors.length !== 2) {
        palindrome = getPrevPalindrome(palindrome);
        divisors = getPrimeDivisors(palindrome);
    }
    return {palindrome, divisors};
}

Spark - Not all data processes in JavaRDD complex object

I have the following code that reads in a text file of 5 rows of CSV floats:
0.014, 0.035, 0.030, 0.018, 0.023, 0.027, 0.035, 0.036, -0.009, -0.013, 0.026, 0.042
0.032, 0.055, -0.036, 0.052, 0.047, 0.034, 0.063, 0.048, 0.025, 0.040, 0.036, -0.017
0.054, 0.056, 0.048, -0.007, 0.053, 0.036, 0.017, 0.047, 0.019, 0.017, 0.040, 0.032
0.038, 0.062, -0.037, 0.050, 0.065, -0.043, 0.062, 0.034, 0.035, 0.056, 0.057, 0.025
0.049, 0.067, -0.039, 0.051, 0.049, 0.037, 0.055, 0.025, 0.052, 0.020, 0.045, 0.040
The code loads in the data using Spark's JavaSparkContext textFile().
JavaSparkContext sc = new JavaSparkContext(master, "basicportopt", System.getenv("SPARK_HOME"), System.getenv("JARS"));
JavaRDD<String> lines = sc.textFile(".../src/main/Resources/portfolio.txt");
Next, the data is loaded into a JavaRDD as a List of Lists of type Double:
JavaRDD<List<List<Double>>> inputData = lines.map(new Function<String, List<List<Double>>>() {
    @Override
    public List<List<Double>> call(String s) {
        List<List<Double>> dd = new ArrayList<List<Double>>();
        double d = 0;
        List<Double> myDoubles = new ArrayList<Double>();
        for (String value : s.split(",\\s*")) {
            d = Double.parseDouble(value);
            myDoubles.add(d);
        }
        dd.add(myDoubles);
        return dd;
    }
});
Finally, the idea is that the data will be manipulated to produce some summary results using the following algorithm:
inputData.foreach(new VoidFunction<List<List<Double>>>() {
    public void call(List<List<Double>> col) {
        System.out.println("Starting with first row...");
        ArrayList l = (ArrayList) col.get(0);
        for (List<Double> m : col) {
            Double sum1 = 0.0;
            for (Double d : m) {
                sum1 += d;
            }
            Double avg1 = sum1 / m.size();
            System.out.println("The avg of the row \"m\" being worked with: " + avg1);
            System.out.println("Crunch the first row with the other rows including self.");
            for (List<Double> n : col) {
                Double sum2 = 0.0;
                for (Double d : n) {
                    sum2 += d;
                }
                Double avg2 = sum2 / n.size();
                System.out.println("The avg of the row \"n\" being worked with: " + avg2);
                Double xy = 0.0;
                for (int index = 0; index < m.size(); index++) {
                    xy += m.get(index) * n.get(index);
                }
                xy -= (avg1 * avg2);
                System.out.println("Resulting covariant: " + xy);
            }
        }
    }
});
However, while I would expect to get 25 results, I only get 5, because in the line:
for (List<Double> m : col) {...}
I would expect "col" to have 5 elements, but stepping through the debugger shows only 1 element.
But using the collect() method:
List<List<List<Double>>> cols = inputData.collect();
shows 5 elements.
Why does "col" inside the foreach() method not contain the 5 elements?

Generating a random DAG

I am solving a problem on directed acyclic graph.
But I am having trouble testing my code on some directed acyclic graphs. The test graphs should be large, and (obviously) acyclic.
I tried a lot to write code for generating acyclic directed graphs. But I failed every time.
Is there some existing method to generate acyclic directed graphs I could use?
I cooked up a C program that does this. The key is to 'rank' the nodes, and only draw edges from lower-ranked nodes to higher-ranked ones.
The program I wrote prints in the DOT language.
Here is the code itself, with comments explaining what it means:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MIN_PER_RANK 1 /* Nodes/Rank: How 'fat' the DAG should be. */
#define MAX_PER_RANK 5
#define MIN_RANKS 3    /* Ranks: How 'tall' the DAG should be. */
#define MAX_RANKS 5
#define PERCENT 30     /* Chance of having an Edge. */

int main (void)
{
    int i, j, k, nodes = 0;
    srand (time (NULL));

    int ranks = MIN_RANKS
                + (rand () % (MAX_RANKS - MIN_RANKS + 1));

    printf ("digraph {\n");
    for (i = 0; i < ranks; i++)
    {
        /* New nodes of 'higher' rank than all nodes generated till now. */
        int new_nodes = MIN_PER_RANK
                        + (rand () % (MAX_PER_RANK - MIN_PER_RANK + 1));

        /* Edges from old nodes ('nodes') to new ones ('new_nodes'). */
        for (j = 0; j < nodes; j++)
            for (k = 0; k < new_nodes; k++)
                if ( (rand () % 100) < PERCENT)
                    printf ("  %d -> %d;\n", j, k + nodes); /* An Edge. */

        nodes += new_nodes; /* Accumulate into old node set. */
    }
    printf ("}\n");
    return 0;
}
And here is the graph generated from a test run:
The answer to https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs applies: if you have an adjacency matrix representation of the edges of your graph, then if the matrix is lower triangular, it's a DAG by necessity.
A similar approach would be to take an arbitrary ordering of your nodes, and then consider edges from node x to y only when x < y. That constraint should also get you DAG-ness by construction. Comparing memory addresses would be one arbitrary way to order your nodes if you're using structs to represent them.
Basically, the pseudocode would be something like:
for (i = 0; i < N; i++) {
    for (j = i + 1; j < N; j++) {
        maybePutAnEdgeBetween(i, j);
    }
}
where N is the number of nodes in your graph.
The pseudocode suggests that the number of potential DAGs, given N nodes, is 2^(N*(N-1)/2), since there are N*(N-1)/2 ordered pairs ("N choose 2"), and we can choose either to have the edge between them or not.
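A quick runnable sketch of this construction in Python, with maybePutAnEdgeBetween modeled as a coin flip (the probability p is an assumption):
import random

def random_dag(n, p=0.5):
    # Edges only ever go from a lower index to a higher one,
    # so the result cannot contain a cycle.
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if random.random() < p]

print(random_dag(5))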
So, to try to put all these reasonable answers together:
(In the following, I used V for the number of vertices in the generated graph, and E for the number of edges, and we assume that E ≤ V(V-1)/2.)
Personally, I think the most useful answer is in a comment, by Flavius, who points at the code at http://condor.depaul.edu/rjohnson/source/graph_ge.c. That code is really simple, and it's conveniently described by a comment, which I reproduce:
To generate a directed acyclic graph, we first
generate a random permutation dag[0],...,dag[v-1].
(v = number of vertices.)
This random permutation serves as a topological
sort of the graph. We then generate random edges of the
form (dag[i],dag[j]) with i < j.
In fact, what the code does is generate the requested number of edges by repeatedly doing the following:
generate two numbers in the range [0, V);
reject them if they're equal;
swap them if the first is larger;
reject them if they have been generated before.
The problem with this solution is that as E gets close to the maximum number of edges V(V-1)/2, the algorithm becomes slower and slower, because it has to reject more and more edges. A better solution would be to make a vector of all V(V-1)/2 possible edges; randomly shuffle it; and select the first E edges in the shuffled list.
The reservoir sampling algorithm lets us do this in space O(E), since we can deduce the endpoints of the kth edge from the value of k. Consequently, we don't actually have to create the source vector. However, it still requires O(V²) time.
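For reference, here is a small Python sketch of the scheme graph_ge.c is described as using; the names are mine, the shuffled permutation plays the role of the topological sort, and the rejection steps above map onto the checks in the loop (the set quietly rejects repeats):
import random

def random_dag_edges(v, e):
    # requires e <= v * (v - 1) / 2
    perm = list(range(v))
    random.shuffle(perm)               # random topological order
    edges = set()
    while len(edges) < e:
        i, j = random.randrange(v), random.randrange(v)
        if i == j:
            continue                   # reject equal endpoints
        if i > j:
            i, j = j, i                # swap so the edge follows the order
        edges.add((perm[i], perm[j]))  # adding a duplicate changes nothing
    return sorted(edges)

print(random_dag_edges(6, 8))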
Alternatively, one can do a Fisher-Yates shuffle (or Knuth shuffle, if you prefer), stopping after E iterations. In the version of the FY shuffle presented in Wikipedia, this will produce the trailing entries, but the algorithm works just as well backwards:
// At the end of this snippet, a consists of a random sample of the
// integers in the half-open range [0, V(V-1)/2). (They still need to be
// converted to pairs of endpoints.)
vector<int> a;
int N = V * (V - 1) / 2;
for (int i = 0; i < N; ++i) a.push_back(i);
for (int i = 0; i < E; ++i) {
    int j = i + rand(N - i);  // rand(k): a random integer in [0, k)
    swap(a[i], a[j]);
}
a.resize(E);
This requires only O(E) time, but it requires O(N) = O(V²) space. In fact, this can be improved to O(E) space with some trickery, but an SO code snippet is too small to contain the result, so I'll provide a simpler one in O(E) space and O(E log E) time. I assume that there is a class DAG with at least:
class DAG {
    // Construct an empty DAG with v vertices
    explicit DAG(int v);
    // Add the directed edge i->j, where 0 <= i, j < v
    void add(int i, int j);
};
Now here goes:
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Return a randomly-constructed DAG with V vertices and E edges.
// It's required that 0 < E < V(V-1)/2.
template<typename PRNG>
DAG RandomDAG(int V, int E, PRNG& prng) {
    using dist = std::uniform_int_distribution<int>;
    // Make a random sample of size E
    std::vector<int> sample;
    sample.reserve(E);
    int N = V * (V - 1) / 2;
    dist d(0, N - E); // uniform_int_distribution is closed range
    // Random vector of integers in [0, N-E]
    for (int i = 0; i < E; ++i) sample.push_back(d(prng));
    // Sort them, and make them unique
    std::sort(sample.begin(), sample.end());
    for (int i = 1; i < E; ++i) sample[i] += i;
    // Now it's a unique sorted list of integers in [0, N-E+E-1]
    // Randomly shuffle the endpoints, so the topological sort
    // is different, too.
    std::vector<int> endpoints;
    endpoints.reserve(V);
    for (int i = 0; i < V; ++i) endpoints.push_back(i);
    std::shuffle(endpoints.begin(), endpoints.end(), prng);
    // Finally, create the dag
    DAG rv(V);
    for (auto& v : sample) {
        int tail = int(0.5 + sqrt((v + 1) * 2));
        int head = v - tail * (tail - 1) / 2;
        rv.add(endpoints[head], endpoints[tail]);
    }
    return rv;
}
You could generate a random directed graph, and then do a depth-first search for cycles. When you find a cycle, break it by deleting an edge.
I think this is worst case O(E(V+E)): each DFS takes O(V+E), and each pass removes at least one edge (so there are at most E passes).
If you generate the directed graph by selecting uniformly among all V^2 possible edges, and you DFS in random order and delete a random edge, this would give you a uniform distribution (or at least something close to it) over all possible DAGs.
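Here is a rough Python sketch of this generate-then-repair idea (all names are mine, and the DFS restarts from scratch after each deletion, so this favors clarity over speed):
import random

def random_digraph(n, p):
    # adjacency sets; possibly cyclic
    return {u: {v for v in range(n) if v != u and random.random() < p}
            for u in range(n)}

def find_cycle_edge(adj, n):
    color = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = finished
    def dfs(u):
        color[u] = 1
        for v in adj[u]:
            if color[v] == 1:
                return (u, v)   # back edge: closes a cycle
            if color[v] == 0:
                hit = dfs(v)
                if hit:
                    return hit
        color[u] = 2
        return None
    for s in range(n):
        if color[s] == 0:
            hit = dfs(s)
            if hit:
                return hit
    return None

def random_dag(n, p=0.3):
    adj = random_digraph(n, p)
    while (e := find_cycle_edge(adj, n)):
        adj[e[0]].remove(e[1])  # break the cycle by deleting one of its edges
    return adj

print(random_dag(6))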
A very simple approach is:
Randomly assign edges by iterating over the indices of a lower diagonal matrix (as suggested by the link above: https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs).
This will give you a DAG with possibly more than one component. You can use a disjoint-set data structure to find the components, which can then be merged by creating edges between them (see the sketch after the link below).
Disjoint-sets are described here: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
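A minimal sketch of that merging step in Python, assuming edges already go from lower to higher index as in the lower-diagonal construction (the union-find here is deliberately small):
def connect_dag(n, edges):
    # union-find over the vertices
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    # one representative per component, then chain the components together;
    # every added edge goes from a lower to a higher index, so it stays a DAG
    roots = sorted({find(x) for x in range(n)})
    return edges + list(zip(roots, roots[1:]))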
Edit: I initially found this post while I was working on a scheduling problem named the flexible job shop scheduling problem with sequencing flexibility, where jobs (the order in which operations are processed) are defined by directed acyclic graphs. The idea was to use an algorithm to generate multiple random directed graphs (jobs) and create instances of the scheduling problem to test my algorithms. The code at the end of this post is a basic version of the one I used to generate the instances. The instance generator can be found here.
I translated it to Python and integrated some functionality to create a transitive version of the random DAG. In this way, the generated graph has the minimum number of edges with the same reachability.
The transitive graph can be visualized at http://dagitty.net/dags.html by pasting the output into Model code (on the right).
Python version of the algorithm
import random

class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = []
        self.removed_edges = []

    def remove_edge(self, x, y):
        e = (x, y)
        try:
            self.edges.remove(e)
            # print("Removed edge %s" % str(e))
            self.removed_edges.append(e)
        except ValueError:
            return

    def Nodes(self):
        return self.nodes

def get_random_dag():
    MIN_PER_RANK = 1   # Nodes/Rank: How 'fat' the DAG should be
    MAX_PER_RANK = 2
    MIN_RANKS = 6      # Ranks: How 'tall' the DAG should be
    MAX_RANKS = 10
    PERCENT = 0.3      # Chance of having an Edge

    nodes = 0
    ranks = random.randint(MIN_RANKS, MAX_RANKS)
    adjacency = []
    for i in range(ranks):
        # New nodes of 'higher' rank than all nodes generated till now
        new_nodes = random.randint(MIN_PER_RANK, MAX_PER_RANK)
        # Edges from old nodes ('nodes') to new ones ('new_nodes')
        for j in range(nodes):
            for k in range(new_nodes):
                if random.random() < PERCENT:
                    adjacency.append((j, k + nodes))
        nodes += new_nodes

    # Compute transitive graph
    G = Graph()
    # Append nodes
    for i in range(nodes):
        G.nodes.append(i)
    # Append adjacencies
    for i in range(len(adjacency)):
        G.edges.append(adjacency[i])

    N = G.Nodes()
    for x in N:
        for y in N:
            for z in N:
                if (x, y) != (y, z) and (x, y) != (x, z):
                    if (x, y) in G.edges and (y, z) in G.edges:
                        G.remove_edge(x, z)

    # Print graph
    for i in range(nodes):
        print(i)
    print()
    for value in G.edges:
        print(str(value[0]) + ' ' + str(value[1]))

get_random_dag()
Below, you may see in the figure the random DAG with many redundant edges generated by the Python code above.
I adapted the code to generate the same graph (same reachability) but with the fewest possible edges. This is also called transitive reduction.
def get_random_dag():
    MIN_PER_RANK = 1   # Nodes/Rank: How 'fat' the DAG should be
    MAX_PER_RANK = 3
    MIN_RANKS = 15     # Ranks: How 'tall' the DAG should be
    MAX_RANKS = 20
    PERCENT = 0.3      # Chance of having an Edge

    nodes = 0
    node_counter = 0
    ranks = random.randint(MIN_RANKS, MAX_RANKS)
    adjacency = []
    rank_list = []
    for i in range(ranks):
        # New nodes of 'higher' rank than all nodes generated till now
        new_nodes = random.randint(MIN_PER_RANK, MAX_PER_RANK)
        rank_nodes = []
        for j in range(new_nodes):
            rank_nodes.append(node_counter)
            node_counter += 1
        rank_list.append(rank_nodes)
        print(rank_list)
        # Edges from old nodes ('nodes') to new ones ('new_nodes')
        if i > 0:
            for j in rank_list[i - 1]:
                for k in range(new_nodes):
                    if random.random() < PERCENT:
                        adjacency.append((j, k + nodes))
        nodes += new_nodes

    for i in range(nodes):
        print(i)
    print()
    for edge in adjacency:
        print(str(edge[0]) + ' ' + str(edge[1]))
    print()
    print()
Result:
Create a graph with n nodes and an edge between each pair of nodes n1 and n2 if n1 != n2 and n2 % n1 == 0.
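That construction takes only a few lines in Python; it is acyclic because any edge n1 -> n2 implies n1 < n2:
def divisibility_dag(n):
    return [(n1, n2)
            for n1 in range(1, n + 1)
            for n2 in range(1, n + 1)
            if n1 != n2 and n2 % n1 == 0]

print(divisibility_dag(8))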
I recently tried re-implementing the accepted answer and found that it is non-deterministic: if you don't enforce the min_per_rank parameter, you could end up with a graph with 0 nodes.
To prevent this, I wrapped the for loops in a function and then checked, after each rank, that min_per_rank was satisfied. Here's the JavaScript implementation:
https://github.com/karissa/random-dag
And some pseudo-C code that would replace the accepted answer's main loop.
int pushed = 0;

int addRank (void)
{
    for (j = 0; j < nodes; j++)
        for (k = 0; k < new_nodes; k++)
            if ( (rand () % 100) < PERCENT)
            {
                printf ("  %d -> %d;\n", j, k + nodes); /* An Edge. */
                pushed++; /* count what was pushed for this rank (the original pseudo-code never incremented this) */
            }
    if (pushed < min_per_rank) return addRank ();
    else pushed = 0;
    return 0;
}
Generating a random DAG which might not be connected
Here's a simple algorithm for generating a random DAG that might not be connected.
const randomDAG = (x, n) => {
    const length = n * (n - 1) / 2;
    const dag = new Array(length);
    for (let i = 0; i < length; i++) {
        dag[i] = Math.random() < x ? 1 : 0;
    }
    return dag;
};

const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;

const dagToDot = (n, dag) => {
    let dot = "digraph {\n";
    for (let i = 0; i < n; i++) {
        dot += ` ${i};\n`;
        for (let j = i + 1; j < n; j++) {
            const k = dagIndex(n, i, j);
            if (dag[k]) dot += ` ${i} -> ${j};\n`;
        }
    }
    return dot + "}";
};

const randomDot = (x, n) => dagToDot(n, randomDAG(x, n));

new Viz().renderSVGElement(randomDot(0.3, 10)).then(svg => {
    document.body.appendChild(svg);
});

<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
If you run this code snippet a couple of times, you might see a DAG which is not connected.
So, how does this code work?
A directed acyclic graph (DAG) is just a directed graph whose vertices can be topologically sorted: order the vertices so that every edge goes from a lower vertex to a higher one. An undirected graph of n vertices can have a maximum of n * (n - 1) / 2 edges, not counting repeated edges or edges from a vertex to itself. Since we only allow an edge from a lower vertex to a higher vertex, the direction of every edge is predetermined.
This means that you can represent the entire DAG using a one-dimensional array of n * (n - 1) / 2 edge weights. An edge weight of 0 means that the edge is absent. Hence, we just create a random array of zeros and ones, and that's our random DAG.
An edge from vertex i to vertex j in a DAG of n vertices, where i < j, has an edge weight at index k where k = n * i + j - (i + 1) * (i + 2) / 2.
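A quick sanity check of that index formula (Python): over all pairs i < j it should enumerate each index in [0, n * (n - 1) / 2) exactly once.
def dag_index(n, i, j):
    return n * i + j - (i + 1) * (i + 2) // 2

n = 5
ks = [dag_index(n, i, j) for i in range(n) for j in range(i + 1, n)]
assert sorted(ks) == list(range(n * (n - 1) // 2))
print(ks)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]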
Generating a connected DAG
Once you generate a random DAG, you can check if it's connected using the following function.
const isConnected = (n, dag) => {
    const reached = new Array(n).fill(false);
    reached[0] = true;
    const queue = [0];
    while (queue.length > 0) {
        const x = queue.shift();
        for (let i = 0; i < n; i++) {
            if (i === x || reached[i]) continue; // skip self and already-reached vertices
            const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
            if (dag[j] === 0) continue;
            reached[i] = true;
            queue.push(i);
        }
    }
    return reached.every(x => x); // return true if every vertex was reached
};
If it's not connected then its complement will always be connected.
const complement = dag => dag.map(x => x ? 0 : 1);

const randomConnectedDAG = (x, n) => {
    const dag = randomDAG(x, n);
    return isConnected(n, dag) ? dag : complement(dag);
};
Note that if we create a random DAG with 30% edges then its complement will have 70% edges. Hence, the only safe value for x is 50%. However, if you care about connectivity more than the percentage of edges then this shouldn't be a deal breaker.
Finally, putting it all together.
const randomDAG = (x, n) => {
    const length = n * (n - 1) / 2;
    const dag = new Array(length);
    for (let i = 0; i < length; i++) {
        dag[i] = Math.random() < x ? 1 : 0;
    }
    return dag;
};

const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;

const isConnected = (n, dag) => {
    const reached = new Array(n).fill(false);
    reached[0] = true;
    const queue = [0];
    while (queue.length > 0) {
        const x = queue.shift();
        for (let i = 0; i < n; i++) {
            if (i === x || reached[i]) continue; // skip self and already-reached vertices
            const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
            if (dag[j] === 0) continue;
            reached[i] = true;
            queue.push(i);
        }
    }
    return reached.every(x => x); // return true if every vertex was reached
};

const complement = dag => dag.map(x => x ? 0 : 1);

const randomConnectedDAG = (x, n) => {
    const dag = randomDAG(x, n);
    return isConnected(n, dag) ? dag : complement(dag);
};

const dagToDot = (n, dag) => {
    let dot = "digraph {\n";
    for (let i = 0; i < n; i++) {
        dot += ` ${i};\n`;
        for (let j = i + 1; j < n; j++) {
            const k = dagIndex(n, i, j);
            if (dag[k]) dot += ` ${i} -> ${j};\n`;
        }
    }
    return dot + "}";
};

const randomConnectedDot = (x, n) => dagToDot(n, randomConnectedDAG(x, n));

new Viz().renderSVGElement(randomConnectedDot(0.3, 10)).then(svg => {
    document.body.appendChild(svg);
});

<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
If you run this code snippet a couple of times, you may see a DAG with a lot more edges than others.
Generating a connected DAG with a certain percentage of edges
If you care about both connectivity and having a certain percentage of edges then you can use the following algorithm.
Start with a fully connected graph.
Randomly remove edges.
After removing an edge, check if the graph is still connected.
If it's no longer connected then add that edge back.
It should be noted that this algorithm is not as efficient as the previous method.
const randomDAG = (x, n) => {
    const length = n * (n - 1) / 2;
    const dag = new Array(length).fill(1);
    for (let i = 0; i < length; i++) {
        if (Math.random() < x) continue;
        dag[i] = 0;
        if (!isConnected(n, dag)) dag[i] = 1;
    }
    return dag;
};

const dagIndex = (n, i, j) => n * i + j - (i + 1) * (i + 2) / 2;

const isConnected = (n, dag) => {
    const reached = new Array(n).fill(false);
    reached[0] = true;
    const queue = [0];
    while (queue.length > 0) {
        const x = queue.shift();
        for (let i = 0; i < n; i++) {
            if (i === x || reached[i]) continue; // skip self and already-reached vertices
            const j = i < x ? dagIndex(n, i, x) : dagIndex(n, x, i);
            if (dag[j] === 0) continue;
            reached[i] = true;
            queue.push(i);
        }
    }
    return reached.every(x => x); // return true if every vertex was reached
};

const dagToDot = (n, dag) => {
    let dot = "digraph {\n";
    for (let i = 0; i < n; i++) {
        dot += ` ${i};\n`;
        for (let j = i + 1; j < n; j++) {
            const k = dagIndex(n, i, j);
            if (dag[k]) dot += ` ${i} -> ${j};\n`;
        }
    }
    return dot + "}";
};

const randomDot = (x, n) => dagToDot(n, randomDAG(x, n));

new Viz().renderSVGElement(randomDot(0.3, 10)).then(svg => {
    document.body.appendChild(svg);
});

<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/viz.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/viz.js/2.1.2/full.render.js"></script>
Hope that helps.
To test algorithms, I generated random graphs based on node layers. This is the Python script (it also prints the adjacency list). You can change the node connection probability percentages or add layers to get slightly different or "taller" graphs:
# Weighted DAG generator by forward layers
import argparse
import random

parser = argparse.ArgumentParser("dag_gen2")
parser.add_argument(
    "--layers",
    help="DAG forward layers. Default=5",
    type=int,
    default=5,
)
args = parser.parse_args()

layers = [[] for _ in range(args.layers)]
edges = {}
node_index = -1
print(f"Creating {len(layers)} layers graph")

# Random horizontal connections -low probability-
def random_horizontal(layer):
    for node1 in layer:
        # Avoid cycles
        for node2 in filter(
            lambda n2: node1 != n2 and node1 not in map(lambda el: el[0], edges[n2]),
            layer,
        ):
            if random.randint(0, 100) < 10:
                w = random.randint(1, 10)
                edges[node1].append((node2, w))

# Connect two layers
def connect(layer1, layer2):
    random_horizontal(layer1)
    for node1 in layer1:
        for node2 in layer2:
            if random.randint(0, 100) < 30:
                w = random.randint(1, 10)
                edges[node1].append((node2, w))

# Start nodes 1 to 3
start_nodes = random.randint(1, 3)
start_layer = []
for sn in range(start_nodes + 1):
    node_index += 1
    start_layer.append(node_index)

# Gen nodes
for layer in layers:
    nodes = random.randint(2, 5)
    for n in range(nodes):
        node_index += 1
        layer.append(node_index)

# Connect all
layers.insert(0, start_layer)
for layer in layers:
    for node in layer:
        edges[node] = []
for i, layer in enumerate(layers[:-1]):
    connect(layer, layers[i + 1])

# Print in DOT language
print("digraph {")
for node_key in [node_key for node_key in edges.keys() if len(edges[node_key]) > 0]:
    for node_dst, weight in edges[node_key]:
        print(f" {node_key} -> {node_dst} [label={weight}];")
print("}")

print("---- Adjacency list ----")
print(edges)

Why are F# list ranges so much slower than for loops?

I'm surprised how much slower the List range is for the example below. On my machine the for loop is a factor of 8 or so quicker.
Is an actual list of 10,000,000 elements created first? And if so, is there a reason (other than it has not been done yet) why this can't be optimised away by the compiler?
open System
open System.Diagnostics

let timeFunction f v =
    let sw = Stopwatch.StartNew()
    let result = f v
    sw.ElapsedMilliseconds

let length = 10000000

let doSomething n =
    (float n) ** 0.1 |> ignore

let listIter n =
    [1..length] |> List.iter (fun x -> doSomething (x+n))

let forLoop n =
    for x = 1 to length do
        doSomething (x+n)

printf "listIter : %d\n" (timeFunction listIter 1) // c50
GC.Collect()
printf "forLoop : %d\n" (timeFunction forLoop 1) // c1000
GC.Collect()
Using ILSpy, listIter looks like this:
public static void listIter(int n)
{
    ListModule.Iterate<int>(
        new listIter#17(n),
        SeqModule.ToList<int>(
            Operators.CreateSequence<int>(
                Operators.OperatorIntrinsics.RangeInt32(1, 1, 10000000)
            )
        )
    );
}
Here are the basic steps involved:
RangeInt32 creates an IEnumerable (which is inexplicably wrapped by CreateSequence)
SeqModule.ToList builds a list from that sequence
An instance of listIter#17 (your lambda) is new'd up
ListModule.Iterate traverses the list calling the lambda for each element
vs forLoop, which doesn't look much different from what you've written:
public static void forLoop(int n)
{
    for (int x = 1; x < 10000001; x++)
    {
        int num = x + n;
        double num2 = Math.Pow((double)num, 0.1);
    }
}
...no IEnumerable, lambda (it's automatically inlined), or list creation. There's a potentially significant difference in the amount of work being done.
EDIT
For curiosity's sake, here are FSI timings for list, seq, and for loop versions:
listIter - Real: 00:00:03.889, CPU: 00:00:04.680, GC gen0: 57, gen1: 51, gen2: 6
seqIter - Real: 00:00:01.340, CPU: 00:00:01.341, GC gen0: 0, gen1: 0, gen2: 0
forLoop - Real: 00:00:00.565, CPU: 00:00:00.561, GC gen0: 0, gen1: 0, gen2: 0
and the seq version for reference:
let seqIter n =
    {1..length} |> Seq.iter (fun x -> doSomething (x+n))
Using {1..length} |> Seq.iter is certainly faster, as you don't create the full list in memory.
Another way that is slightly faster than your for loop is:
let reclist n =
    let rec downrec x n =
        match x with
        | 0 -> ()
        | x -> doSomething (x+n); downrec (x-1) n
    downrec length n
Interestingly, the code for the recursive function boils down to:
while (true)
{
    switch (x)
    {
        case 0:
            return;
        default:
        {
            int num = x + n;
            double num2 = Math.Pow((double)num, 0.1);
            int arg_26_0 = x - 1;
            n = n;
            x = arg_26_0;
            break;
        }
    }
}
Even when using optimization, there are still a few lines that could have been removed, i.e. down to this:
while (true)
{
    switch (x)
    {
        case 0:
            return;
        default:
        {
            int num = x + n;
            double num2 = Math.Pow((double)num, 0.1);
            x = x - 1;
            break;
        }
    }
}