X10 Parallel processing shared variable - concurrency

Please forgive me if my question is not well phrased. I am reading the tutorials for IBM's X10. Here's the code that computes pi, but it confuses me:
public static def countPoints(n: Int, rand: ()=>Double) {
    var inCircle: Double = 0.0;
    for (var j: Long = 1; j <= n; j++) {
        val x = rand();
        val y = rand();
        if (x*x + y*y <= 1.0) inCircle++;
    }
    return inCircle;
}
val N = args.size() > 0 ? Long.parse(args(0)) : 100000;
val THREADS = args.size() > 1 ? Int.parse(args(1)) : 4;
val nPerThread = N/THREADS;
val inCircle = new Array[Long](1..THREADS);
finish for (var k: Int = 1; k <= THREADS; k++) {
    val r = new Random(k*k + k + 1);
    val rand = () => r.nextDouble();
    val kk = k;
    async inCircle(kk) = countPoints(nPerThread, rand);
}
var totalInCircle: Long = 0;
for (var k: Int = 1; k <= THREADS; k++) {
    totalInCircle += inCircle(k);
}
val pi = (4.0*totalInCircle)/N;
The program itself is not hard. My question is: since each countPoints() call repeatedly invokes its rand argument, and it looks as if only one rand is created before the threads are spawned, will different threads share the same rand and incur a race condition? If not, why not?

It is good that you worry about a possible race condition here; it is often overlooked when random number generators are invoked in parallel.
Luckily, this example is free of an RNG race condition. Each iteration of the k for-loop creates and seeds a new random number generator instance and then spawns one activity. Since each countPoints call uses its own RNG, there is no race condition here.
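To make the point concrete, here is a rough analogue in C++ rather than X10 (my own illustration, not code from the tutorial): each worker constructs its own seeded generator, so no RNG state is shared between threads. If all workers called one shared generator instead, that would be a data race.
#include <iostream>
#include <random>
#include <thread>
#include <vector>

// Each thread owns its generator, mirroring how the X10 loop above creates one
// Random per async activity. Seeds follow the k*k + k + 1 pattern from the question.
long long countPoints(long long n, std::mt19937& gen) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    long long inCircle = 0;
    for (long long j = 0; j < n; ++j) {
        double x = dist(gen), y = dist(gen);
        if (x * x + y * y <= 1.0) ++inCircle;
    }
    return inCircle;
}

int main() {
    const long long N = 100000;
    const int THREADS = 4;
    const long long perThread = N / THREADS;
    std::vector<long long> inCircle(THREADS);
    std::vector<std::thread> workers;
    for (int k = 0; k < THREADS; ++k) {
        workers.emplace_back([&, k] {
            std::mt19937 gen(k * k + k + 1);   // private to this thread: no sharing, no race
            inCircle[k] = countPoints(perThread, gen);
        });
    }
    for (auto& t : workers) t.join();
    long long total = 0;
    for (long long c : inCircle) total += c;
    std::cout << "pi ~= " << 4.0 * total / N << "\n";
}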

Minimum pluses required to make the equation (x = y) correct

Problem Statement:
Given an equation “x=y”, for example, “111=12”, you need to add pluses
inside x to make the equation correct. In our example “111=12”, we can
add one plus “11+1=12” and the equation becomes correct. You need to
find the minimum number of pluses to add to x to make the equation
correct. If there is no answer print -1.
Note that the value of y won’t exceed 5000. The numbers in the
corrected equation may contain arbitrary amounts of leading zeros.
Input Format: The first line contains a string A, as described in the problem statement.
Constraints: 1 <= len(A) <= 10^3
I tried a recursive approach: for every character of x there are two options, either place a plus sign after the current digit, or group the current digit with the next one and move on. I checked all the combinations and took the minimum number of pluses. As you can see, this is exponential in complexity. I haven't been able to apply dynamic programming to this problem because I can't think of the right states.
I know this problem can be solved with dynamic programming, but I don't know how to identify the state and the transition.
The first thing that comes to mind is to have a table
int f[N+1][M+1];
where N = len(x) and M = y. Then f[i][j] would record the solution to the sub-problem substr(x,0,i)=j; i.e. how many pluses are needed to get the sum j from the first i digits of x. The table can be incrementally updated through the recurrence relation:
f[i][j] = minimum over 0 <= k < i of (f[k][j - atoi(substr(x,k,i))] + 1)
Configurations that aren't obtainable or out-of-bounds should be understood as having f[i][j] == +infinity rather than -1.
The size of the table will be O(N*M) and the running time is O(N² M).
I'll leave the implementation details and the starting condition for you to complete.
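For concreteness, here is one possible sketch of that table in C++ (an illustration only, not the answer's own code). One way to pin down the starting condition is to let f[i][j] count plus-separated groups rather than pluses, with f[0][0] = 0, and subtract one at the end:
#include <iostream>
#include <string>
#include <vector>

// f[i][j] = minimum number of plus-separated groups covering the first i digits
// of x and summing to j. The answer is f[N][y] - 1 pluses, or -1 if unreachable.
int minPluses(const std::string& x, int y) {
    const int n = static_cast<int>(x.size());
    const int INF = 1 << 29;
    std::vector<std::vector<int>> f(n + 1, std::vector<int>(y + 1, INF));
    f[0][0] = 0;                               // zero digits consumed: zero groups, sum zero
    for (int k = 0; k < n; ++k) {              // k = start of the next group
        long long value = 0;
        for (int i = k + 1; i <= n; ++i) {     // the group is x[k..i)
            value = value * 10 + (x[i - 1] - '0');
            if (value > y) break;              // appending digits never shrinks the value
            int v = static_cast<int>(value);
            for (int j = v; j <= y; ++j)
                if (f[k][j - v] + 1 < f[i][j])
                    f[i][j] = f[k][j - v] + 1;
        }
    }
    return f[n][y] >= INF ? -1 : f[n][y] - 1;
}

int main() {
    std::cout << minPluses("111", 12) << "\n"; // prints 1: "11+1=12"
}
Building each group left to right and breaking once its value exceeds y keeps the inner loops within the O(N² M) bound mentioned above.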
Backtracking along with DP (memoization) helped me pass all the cases.
Here is my code; it passed all the test cases within the given time limit.
all_ans = {}
def min_pulses(A, target):
    if (A, target) in all_ans:
        return all_ans[(A, target)]
    if len(A) == 0:
        if target != 0:
            return -1
        else:
            return 0
    while len(A) > 0 and A[0] == '0':
        A = A[1:]
    if len(A) == 0 and target == 0:
        return 1
    if target < 0:
        return -1
    i = 1
    ans = float('inf')  # initialize ans to infinity so that min() can update it
    while i <= 5 and i <= len(A):
        curr_num = A[:i]
        curr_ans = min_pulses(A[i:], target - int(curr_num))
        if curr_ans >= 0:
            ans = min(1 + curr_ans, ans)
        i += 1
    if ans == float('inf'):
        ans = -1
    all_ans[(A, target)] = ans
    return ans

equation = input().split('=')
A = equation[0]
target = int(equation[1])
groups = min_pulses(A, target)
if groups < 0:
    print(-1)
else:
    print(groups - 1)
//import java.io.*;
import java.util.*;
//import java.lang.Math;

class NewClass15 {
    public static int minimum_pluses(String S)
    {
        StringBuilder s = new StringBuilder();
        int target = 0;
        for (int i = 0; i < S.length(); i++) // distinguishing left and right strings respectively
        {
            if (S.charAt(i) == '=')
            {
                target = Integer.parseInt(S.substring(i + 1, S.length()));
                break;
            }
            s.append(S.charAt(i));
        }
        dp = new int[1000][5001][6];
        int temp = dfs(s.toString(), 0, 0, target);
        if (temp >= max)
            return -1;
        else
            return temp;
    }

    static int dp[][][];

    private static int dfs(String s, int len, int ind, int target)
    {
        if (target < 0 || len > 5) return max;
        if (ind == s.length())
        {
            int x = 0;
            if (len != 0)
                x = Integer.parseInt(s.substring(ind - len, ind));
            target -= x;
            if (target == 0) return 0;
            return max;
        }
        if (dp[ind][target][len] != 0)
        {
            System.out.println("1 dfs(" + ind + "," + target + "," + len + ")");
            return dp[ind][target][len] - 1;
        }
        // add
        long ans = max;
        if (s.charAt(ind) == '0' && len == 0)
        {
            System.out.println("2 dfs(" + 0 + "," + (ind + 1) + "," + target + ")");
            ans = Math.min(ans, dfs(s, 0, ind + 1, target));
            return (int) (ans);
        }
        System.out.println("3 dfs(" + (len + 1) + "," + (ind + 1) + "," + target + ")");
        ans = Math.min(ans, dfs(s, len + 1, ind + 1, target));
        // add +
        if (len != 0)
        {
            int x = Integer.parseInt(s.substring(ind - len, ind));
            int j = ind;
            while (j < s.length() && s.charAt(j) == '0') j++;
            if (j != s.length()) j = j + 1;
            System.out.println("4 dfs(" + (1) + "," + (j) + "," + (target - x) + ")");
            ans = Math.min(ans, 1 + dfs(s, 1, j, target - x));
        }
        System.out.println("final dfs(" + ind + "," + target + "," + len + ")");
        dp[ind][target][len] = (int) (ans + 1);
        return (int) (ans);
    }

    static int max = 10000;

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String A;
        A = scan.next();
        int result;
        result = minimum_pluses(A);
        System.out.print(result);
    }
}
This is an answer in Java, if it is of some help. I have not written the code myself, though.
Can you provide me with some test cases for this minimum-pluses question?
Thank you.
"""2nd one Answer"""
def permute(s):
result = [[s]]
for i in range(1, len(s)):
first = [s[:i]]
rest = s[i:]
for p in permute(rest):
result.append(first + p)
return [[int(j) for j in i] for i in result]
def problem(s):
x,y=s.split("=")
data=permute(x)
newdata=[]
for i in range(1,len(x)+1,1):
for j in data:
if i==len(j):
newdata.append(j)
for i in newdata:
if sum(i)==int(y):
print("str 1",i)
return
print("str -1")
def check_constraint(s):
if (not (1<=len(s)<=10^3)):
print(-1)
elif (s.split("=")[0]==s.split("=")[1]):
print(1)
elif (not (len(s.split("=")[0])>=len(s.split("=")[1]))):
print(-1)
else:
problem(s)
A=input()
check_constraint(A)

How is myArray.size() implemented in Salesforce Apex? Does the method return a value stored on the List object, or is it calculated at the time of the call?

Often when I'm writing a loop in Apex, I wonder whether it's inefficient to call myArray.size() inside the loop. My question is really: should I call myArray.size() inside the loop, or store myArray.size() in an Integer variable before starting the loop and reuse that instead (assuming the size of myArray remains constant)?
Under the hood, the method either counts the elements by iterating through the array and incrementing a counter that it then returns, or the List/Set/Map class stores a size field internally and updates it whenever the collection is changed. Different languages handle this in different ways, so how does Apex do it?
I went searching but couldn't find an answer. The answer could change the way I code, since an O(n) size() called on every iteration would turn a linear loop into a quadratic number of operations.
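For illustration only (a sketch in C++ rather than Apex, since Apex does not document its internals), these are the two strategies described above; the first makes size() constant time, the second makes it linear in the collection's length:
#include <cstddef>
#include <iostream>
#include <vector>

// Strategy 1: the collection maintains a size field, so size() just returns it.
// (std::vector already does this internally; the explicit field only makes the idea visible.)
class CachedSizeList {
public:
    void add(int value) { data_.push_back(value); ++size_; }
    std::size_t size() const { return size_; }   // O(1): no element is touched
private:
    std::vector<int> data_;
    std::size_t size_ = 0;
};

// Strategy 2: size() walks the elements and counts them, so it is O(n),
// and calling it on every iteration of a loop over the same data is O(n^2) overall.
struct Node { int value; Node* next; };

std::size_t countedSize(const Node* head) {
    std::size_t n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) ++n;
    return n;
}

int main() {
    CachedSizeList xs;
    for (int i = 0; i < 5; ++i) xs.add(i);
    std::cout << xs.size() << "\n";              // prints 5 without touching the elements
    Node c{3, nullptr}, b{2, &c}, a{1, &b};
    std::cout << countedSize(&a) << "\n";        // prints 3 by walking the list
}
Either way, hoisting the call out of the loop can only help, which is consistent with the benchmark numbers below.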
I tried running a benchmark of the two scenarios, as laid out below, and found that myArray.size() does take longer, but that didn't really answer my question, especially since the code is too simple to make much of a difference.
In both samples, the list has ten thousand elements, and I only start timing once the list has been fully created. The loop itself doesn't do anything interesting; it just counts up, which was the least resource-heavy thing I could think of.
Sample 1 - store size of list in an integer before entering loop:
List<Account> myArray = [SELECT Id FROM Account LIMIT 10000];
Long startTime = System.now().getTime();
Integer size = myArray.size();
Integer count = 0;
for (Integer i = 0; i < 100000; i++) {
    if (count < size) {
        count++;
    }
}
Long finishTime = System.now().getTime();
Long benchmark = finishTime - startTime;
System.debug('benchmark: ' + benchmark);
Results after running 5 times: 497, 474, 561, 445, 474
Sample 2 - use myArray.size() inside loop:
List<Account> myArray = [SELECT Id FROM Account LIMIT 10000];
Long startTime = System.now().getTime();
Integer count = 0;
for (Integer i = 0; i < 100000; i++) {
    if (count < myArray.size()) {
        count++;
    }
}
Long finishTime = System.now().getTime();
Long benchmark = finishTime - startTime;
System.debug('benchmark: ' + benchmark);
Results after running 5 times: 582, 590, 667, 742, 730
Sample 3 - just for good measure (control), here's the loop without the if condition:
Long startTime = System.now().getTime();
Long count = 0;
for (Integer i = 0; i < 100000; i++) {
    count++;
}
Long finishTime = System.now().getTime();
Long benchmark = finishTime - startTime;
System.debug('benchmark: ' + benchmark);
Results after running 5 times: 349, 348, 486, 475, 531

Converting a C for loop to a Fortran do loop

I am converting a C++ program (from another author) to a Fortran program, and my C is not too strong. I came across for-loop constructs starting with
for (int n = 1; 1; ++n) {
...
I would have expected this to convert to a Fortran Do as per
Do n=1, 1, 2
...
... at least that is my guess based on my understanding of what ++n will do.
Is my translation correct? If so, the loop will cycle at most once, so what am I missing?
I understand that C for-loops have something of a "do while" aspect to them, and hence there are wrinkles when porting them to Fortran DO loops.
Anyway ... a clarification would be much appreciated.
EDIT: after some prompt responses, I think I see where this is going.
First, the exact C code, copied and pasted but trimmed a little, is:
for (int n = 1; 1; ++n) {
    const double coef = exp(-a2*(n*n)) * expx2 / (a2*(n*n) + y*y);
    prod2ax *= exp2ax;
    prodm2ax *= expm2ax;
    sum1 += coef;
    sum2 += coef * prodm2ax;
    sum4 += (coef * prodm2ax) * (a*n);
    sum3 += coef * prod2ax;
    sum5 += (coef * prod2ax) * (a*n);
    // test convergence via sum5, since this sum has the slowest decay
    if ((coef * prod2ax) * (a*n) < relerr * sum5) break;
}
So yes, there is a "break" in the loop, which on the Fortran side is replaced with an "Exit".
I think the key, judging from the answers below, is that the original code's author wrote
for (int n = 1; 1; ++n)
precisely to create an infinite loop, and I had not guessed that this for construct would do that.
Anyway, I can certainly create an infinite loop with an "Exit" in Fortran (though I expect I might "do" it a bit more judiciously)
Many thanks to all.
It seems Mr Gregory's response was the one that immediately led to a solution for me, so I will mark his as correct. As for the Fortran side, there are a number of alternatives, such as:
Do While ( .true. )
    :
    If ( something ) Exit
End Do
but being old fashioned I would probably use a construct with a "limit" such as
Do i = 1, MaxIter
    :
    If ( something ) Exit
End Do
For slightly fancier applications I might include a return flag in case it did not converge within MaxIter iterations, etc.
It's difficult to be definitive without seeing how the C++ program breaks out of that loop, but a straightforward Fortran equivalent would be
n = 1
do
    ! code, including an exit under some condition, presumably on the value of n
    n = n + 1
end do
If the loop is terminated when n reaches a critical value then the equivalent might be
do n = 1, critical_value  ! no need to indicate the step size if it is 1
    ! code
end do
Are you sure you wrote the C code correctly? Typically loops in C/C++ are done like this:
for (int n = 1; n < 10; ++n) {
    // ...
}
Note the "n < 10" test condition. Your code's test condition is simply 1, which will always evaluate to Boolean "true". This means the code will loop infinitely, unless there's a break inside the loop, which you haven't shown.
++n means "increment n".
So if the code you've shown is indeed correct, the FORTRAN equivalent would be:
n = 1
do
    ! [Body of the loop, which you haven't shown]
    n = n + 1
enddo
Here's what
for (int n = 1; 1; ++n)
does:
It sets n to 1, then loops infinitely, incrementing n by 1 at the end of each loop iteration. The loop will never terminate unless something inside the loop breaks out.
It's been a long time since I wrote Fortran but as I recall the do loop you translated it to is not correct.
I don't think you can translate
for (int n = 1; 1; ++n)
to a FORTRAN DO loop. From what I recall, the notion of the generic conditional in C/C++ cannot be emulated in a FORTRAN DO loop.
The equivalent of
Do n=1, 1, 2
in C/C++ is
for ( int n = 1; n <= 1; n += 2 )
A few notes in addition to CareyGregory’s answer.
++n means ‘increment n by one (before n is evaluated)’
In C and C++, a for loop has three clauses, much like in FORTRAN:
for (init; condition; increment)
The difference is that each of the clauses must be a complete expression, whereas in FORTRAN the clauses are just values. It is just a ‘short’ way of writing an equivalent while loop:
int n = 1;        │   for (int n = 1; 1; ++n)   │   n = 1
while (1)         │   {                         │   do
{                 │       ...                   │       ...
    ...           │   }                         │       n = n + 1
    ++n;          │                             │   enddo
}                 │                             │

Discrete-event Simulation Algorithm 1.2.1 in C++

I'm currently trying to work with, and extend, the algorithm given in the "Discrete-Event Simulation" textbook, page 15. My C++ knowledge is limited. This is not a homework problem; I just want to understand how to approach it in C++ and understand what's going on.
I want to be able to compute 12 delays in a single-server FIFO service node.
The algorithm in the book is as follows:
Co = 0.0;                      // assumes that a_0 = 0.0
i = 0;
while (more jobs to process) {
    i++;
    a_i = GetArrival();
    if (a_i < c_(i-1))
        d_i = c_(i-1) - a_i;   // calculate delay for job i
    else
        d_i = 0.0;             // job i has no delay
    s_i = GetService();
    c_i = a_i + d_i + s_i;     // calculate departure time for job i
}
n = i;
return d_1, d_2, ..., d_n
The GetArrival and GetService procedures read the next arrival and service time from a file.
Just looking at the pseudo-code, it seems you need only one a (its value at step i), one c (its value at step i-1), and an array of ds to store the delays. I'm assuming the first line of your pseudo-code is c_0 = 0 and not Co = 0; otherwise the code doesn't make a lot of sense.
Now here is a C++-ized version of the pseudo-code:
std::vector<int> d;
int c = 0;
int a, s;
while (!arrivalFile.eof() && !serviceFile.eof())
{
    arrivalFile >> a;
    int delay = 0;
    if (a < c)
        delay = c - a;
    d.push_back(delay);
    serviceFile >> s;
    c = a + delay + s;
}
return d;
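For completeness, here is one way the fragment above could be wrapped into a self-contained program; the file names, the use of double for the times, and the cap at 12 printed delays are my own assumptions, not part of the original answer:
#include <fstream>
#include <iostream>
#include <vector>

// Reads arrival and service times in parallel and returns the per-job delays,
// following the same recurrence as the pseudo-code above.
std::vector<double> computeDelays(std::istream& arrivals, std::istream& services)
{
    std::vector<double> d;      // d[i] is the delay of job i+1
    double c = 0.0;             // departure time of the previous job (c_0 = 0)
    double a, s;
    while (arrivals >> a && services >> s)
    {
        double delay = (a < c) ? c - a : 0.0;  // wait only if the server is still busy
        d.push_back(delay);
        c = a + delay + s;                     // departure time of this job
    }
    return d;
}

int main()
{
    std::ifstream arrivalFile("arrivals.txt");  // assumed file names
    std::ifstream serviceFile("services.txt");
    std::vector<double> delays = computeDelays(arrivalFile, serviceFile);
    for (std::size_t i = 0; i < delays.size() && i < 12; ++i)  // the first 12 delays
        std::cout << "d_" << (i + 1) << " = " << delays[i] << "\n";
}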
If I understand the code right, d_1, d_2, ..., d_n are the delays you get; the number of delays depends on the number of jobs to process (the while (more jobs to process) condition),
so if you have 12 jobs you will have 12 delays.
In general, if the arrival time is less than the previous departure time, then the delay is the previous departure time minus the current arrival time:
if (a_i < c_(i-1))
    d_i = c_(i-1) - a_i;
The first departure time, c_0, is set to zero.
If something is not clear, let me know.

Fast bitarray in OCaml

Yet another synthetic benchmark: Sieve of Eratosthenes
C++
#include <vector>
#include <cmath>

void find_primes(int n, std::vector<int>& out)
{
    std::vector<bool> is_prime(n + 1, true);
    int last = sqrt(n);
    for (int i = 2; i <= last; ++i)
    {
        if (is_prime[i])
        {
            for (int j = i * i; j <= n; j += i)
            {
                is_prime[j] = false;
            }
        }
    }
    for (unsigned i = 2; i < is_prime.size(); ++i)
    {
        if (is_prime[i])
        {
            out.push_back(i);
        }
    }
}
OCaml (using Jane Street's Core and Res libraries)
open Core.Std
module Bits = Res.Bits
module Vect = Res.Array

let find_primes n =
  let is_prime = Bits.make (n + 1) true in
  let last = float n |! sqrt |! Float.iround_exn ~dir:`Zero in
  for i = 2 to last do
    if not (Bits.get is_prime i) then () else begin
      let j = ref (i * i) in
      while !j <= n do
        Bits.set is_prime !j false;
        j := !j + i;
      done;
    end;
  done;
  let ar = Vect.empty () in
  for i = 2 to n do
    if Bits.get is_prime i then Vect.add_one ar i else ()
  done;
  ar
I was surprised that the OCaml version (native) is about 13 times slower than the C++ one. I replaced Res.Bits with Core_extended.Bitarray, but then it became about 18 times slower. Why is it so slow? Doesn't OCaml provide fast operations for bit manipulation? Is there an alternative, fast implementation of bit arrays?
To be clear: I come from the C++ world and am considering OCaml as a possible alternative for writing performance-critical code. Frankly, results like these scare me a little.
EDIT:
Profiling results
Each sample counts as 0.01 seconds.
  %   cumulative    self               self     total
 time    seconds   seconds     calls  ms/call  ms/call  name
50.81       1.26      1.26                              camlRes__pos_1113
 9.72       1.50      0.24                              camlRes__unsafe_get_1117
 6.68       1.66      0.17                              camlRes__unsafe_set_1122
 6.28       1.82      0.16                              camlNopres_impl__set_1054
 6.07       1.97      0.15                              camlNopres_impl__get_1051
 5.47       2.10      0.14  47786824     0.00     0.00  caml_apply3
 3.64       2.19      0.09  22106943     0.00     0.00  caml_apply2
 2.43       2.25      0.06    817003     0.00     0.00  caml_oldify_one
 2.02       2.30      0.05         1    50.00   265.14  camlPrimes__find_primes_64139
 1.21       2.33      0.03                              camlRes__unsafe_get_1041
...
Did you try using a simple data structure first, before jumping to the sophisticated ones?
On my machine, the following code is only 4x slower than your C++ version (note that I made only minimal changes, using an Array as the cache and a list to accumulate the results; you could use the array get/set syntactic sugar):
let find_primes n =
  let is_prime = Array.make (n + 1) true in
  let last = int_of_float (sqrt (float n)) in
  for i = 2 to last do
    if not (Array.get is_prime i) then () else begin
      let j = ref (i * i) in
      while !j <= n do
        Array.set is_prime !j false;
        j := !j + i;
      done;
    end;
  done;
  let ar = ref [] in
  for i = 2 to n do
    if Array.get is_prime i then ar := i :: !ar else ()
  done;
  ar
(4x slower: it takes 4 s to compute the first 10_000_000 primes, vs. 1 s for g++ -O1 or -O2 on your code.)
Realizing that the efficiency of your bitvector solution probably comes from the economical memory layout, I changed the code to use strings instead of arrays:
let find_primes n =
  let is_prime = String.make (n + 1) '0' in
  let last = int_of_float (sqrt (float n)) in
  for i = 2 to last do
    if not (String.get is_prime i = '0') then () else begin
      let j = ref (i * i) in
      while !j <= n do
        String.set is_prime !j '1';
        j := !j + i;
      done;
    end;
  done;
  let ar = ref [] in
  for i = 2 to n do
    if String.get is_prime i = '0' then ar := i :: !ar else ()
  done;
  ar
This now takes only 2 s, which makes it 2x slower than your C++ solution.
It seems Jeffrey Scofield is right: such a terrible performance degradation is due to the div and mod operations.
I prototyped a small Bitarray module:
module Bitarray = struct
  type t = { len : int; buf : string }

  let create len x =
    let init = (if x = true then '\255' else '\000') in
    let buf = String.make (len / 8 + 1) init in
    { len = len; buf = buf }

  let get t i =
    let ch = int_of_char (t.buf.[i lsr 3]) in
    let mask = 1 lsl (i land 7) in
    (ch land mask) <> 0

  let set t i b =
    let index = i lsr 3 in
    let ch = int_of_char (t.buf.[index]) in
    let mask = 1 lsl (i land 7) in
    let new_ch = if b then (ch lor mask) else (ch land lnot mask) in
    t.buf.[index] <- char_of_int new_ch
end
It uses a string as a byte array (8 bits per char). Initially I used x / 8 and x mod 8 for bit extraction; that version was 10x slower than the C++ code. Then I replaced them with x lsr 3 and x land 7, and now it is only 4x slower than C++.
It's not often useful to compare micro-benchmarks like this, but the basic conclusion is probably correct. This is a case where OCaml is at a distinct disadvantage. C++ can access a more or less ideal representation (vector of machine integers). OCaml can make a vector, but can't get at the machine integers directly. So OCaml has to use div and mod where C++ can use shift and mask.
I reproduced this test (using a different bit vector library) and found that appreciable time in OCaml was spent constructing the result, which isn't a bit array. So the test might not be measuring exactly what you think.
Update
I tried some quick tests packing 32 booleans into a 63-bit int. It does seem to make things go faster, but only a little bit. It's not a perfect test, but it suggests gasche is right that the non-power-of-2 effect is minor.
Please make sure that you install Core including the .cmx file (.cmxa is not enough!), otherwise cross-module inlining will not work. Your profile suggests that certain calls may not have been inlined, which would explain a dramatic loss of efficiency.
Sadly, the Oasis packaging tool, which a lot of OCaml projects use, currently has a bug that prevents it from installing the .cmx file. The Core package is also affected by this problem, probably irrespective of which package manager (Opam, Godi) you use.