I have three nested DO loops over three parameters, and I want to use OpenMP to speed up the calculation of the maximum value of F as a function of A, so that I can eventually plot F_max vs A. Here is my attempt; it only uses multiple cores at the start and then drops to one. If I leave out the critical statement, it doesn't produce the correct max value. How can I get this to work properly? Any help would be greatly appreciated.
PROGRAM maxfunction
Do i=0, n1
A= i*1.0
Max=0
Do j=0, n2
B=j*1.0
Do k=0,n3
C=k*1.0
F=findF(A,B,C)
If( F > Max) Then
Max= F
endif
Enddo
Enddo
Enddo
ENDPROGRAM
FUNCTION findF(a,b,c)
findF= g1(a,b,c) + g2(a,b,c) +g3(a,b,c)
!returns value of findF
ENDFUNCTION findF
function g1(a,b,c)
!do stuff with a,b,c
!returns value g1
endfunction g1
function g2(a,b,c)
!do stuff with a,b,c
!returns value g2
endfunction g2
function g3(a,b,c)
!do stuff with a,b,c
!returns value g3
endfunction g3
The code has this minimal structure.
You can try adapting the following code. Hope it helps. The key feature is the OpenMP reduction clause; don't hesitate to read up on that topic. https://computing.llnl.gov/tutorials/openMP/
PROGRAM maxfunction
double precision Maximum
n1 = 10
n2 = 20
n3 = 30
Maximum = -1000000.0
c$OMP PARALLEL DO DEFAULT(SHARED)
c$OMP+ SHARED(n1, n2, n3)
c$OMP+ PRIVATE(i,j, A, B, C, F)
c$OMP+ reduction(max: Maximum)
Do i=0, n1
A= i*1.0
Do j=0, n2
B=j*1.0
Do k=0,n3
C=k*1.0
F=findF(A,B,C)
If( F > Maximum) Then
Maximum= F
endif
Enddo
Enddo
Enddo
c$OMP END PARALLEL DO
print *, Maximum
ENDPROGRAM
with
function g1(a,b,c)
g1 = a
endfunction g1
function g2(a,b,c)
g2 = b
endfunction g2
function g3(a,b,c)
g3 = -c
endfunction g3
This code returns 30.0 regardless of the number of threads.
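The reduction above yields a single overall maximum, while the question asks for F_max as a function of A. A minimal free-form sketch of that variant follows, assuming the same stand-in findF = a + b - c used in this answer; the array name fmax is hypothetical. Each iteration of the outer loop writes only its own element fmax(i), so no reduction or critical section should be needed.

```fortran
program fmax_per_a
  implicit none
  integer, parameter :: n1 = 10, n2 = 20, n3 = 30
  real :: fmax(0:n1)          ! hypothetical name: one maximum per value of A
  real :: a, b, c, f
  integer :: i, j, k

  ! Only the outer loop is shared out; every thread owns whole values of i.
  !$omp parallel do default(shared) private(j, k, a, b, c, f)
  do i = 0, n1
     a = real(i)
     fmax(i) = -huge(1.0)
     do j = 0, n2
        b = real(j)
        do k = 0, n3
           c = real(k)
           f = findf(a, b, c)
           if (f > fmax(i)) fmax(i) = f
        end do
     end do
  end do
  !$omp end parallel do

  do i = 0, n1
     print *, i, fmax(i)
  end do
  print *, "overall max:", maxval(fmax)

contains

  ! Stand-in for the real function (g1 = a, g2 = b, g3 = -c as above).
  function findf(a, b, c) result(f)
    real, intent(in) :: a, b, c
    real :: f
    f = a + b - c
  end function findf

end program fmax_per_a
```

With this stand-in, fmax(i) is i + 20, so the overall maximum agrees with the 30.0 reported above.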
I would like to ask whether OpenMP can parallelize operations on Fortran arrays of the same shape and size written in simple notation. I did some research, but I could not find out whether it is possible.
By simple notation I mean the following form:
a = b + c * 1.1
Find below a full example:
PROGRAM Parallel_Hello_World
USE OMP_LIB
implicit none
integer, parameter :: ILEN = 1000
integer :: a(ILEN,ILEN), b(ILEN,ILEN), c(ILEN,ILEN), d(ILEN,ILEN)
integer :: i, j
a = 1
b = 2
!$OMP PARALLEL SHARED(a, b, c, d)
!$OMP DO
DO i=1,ILEN
DO j=1, ILEN
c(j,i) = a(j,i) + b(j,i) * 1.1
ENDDO
END DO
!$OMP END DO
! is this loop parallel?
d = a + b * 1.1
!$OMP END PARALLEL
write (*,*) "Total C: ", c(1:5, 1)
write (*,*) "Total D: ", d(1:5, 1)
write (*,*) "C same D? ", all(c == d)
END
Is the assignment to d parallelized by OpenMP with the current notation?
As commented by @Gilles, the answer is to wrap the array assignment in a workshare construct:
!$OMP WORKSHARE
d = a + b * 1.1
!$OMP END WORKSHARE
Find more info here
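For reference, here is a minimal complete sketch of the example with the workshare construct in place. The array size is reduced to keep it quick, and the program name is made up for illustration. Note that workshare only allows the runtime to divide the array assignment among threads; how well it does so depends on the compiler.

```fortran
program workshare_demo
  implicit none
  integer, parameter :: ilen = 200     ! smaller than in the question
  integer :: a(ilen,ilen), b(ilen,ilen), c(ilen,ilen), d(ilen,ilen)
  integer :: i, j

  a = 1
  b = 2

  !$omp parallel shared(a, b, c, d) private(i, j)

  ! Explicit loops, shared out by the DO construct.
  !$omp do
  do i = 1, ilen
     do j = 1, ilen
        c(j,i) = a(j,i) + b(j,i) * 1.1
     end do
  end do
  !$omp end do

  ! Array notation, shared out by the WORKSHARE construct.
  !$omp workshare
  d = a + b * 1.1
  !$omp end workshare

  !$omp end parallel

  write (*,*) "C same D? ", all(c == d)
end program workshare_demo
```

Without the workshare directives, every thread in the parallel region would execute the whole assignment to d redundantly.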
I want to solve the random walk problem, so I wrote sequential Fortran code, and now I need to parallelize it.
subroutine random_walk(walkers)
implicit none
include "omp_lib.h"
integer :: i, j, col, row, walkers,m,n,iter
real, dimension(:, :), allocatable :: matrix, res
real :: point, z
col = 12
row = 12
allocate (matrix(row, col), res(row, col))
! Read from file
open(2, file='matrix.txt')
do i = 1, row
read(2, *)(matrix(i, j), j=1,col)
end do
res = matrix
! Solve task
!$omp parallel private(i,j,m,n,point,iter)
!$omp do collapse(2)
do i= 2, 11
do j=2, 11
m = i
n = j
iter = 1
point = 0
do while (iter <= walkers)
call random_number(z)
if (z <= 0.25) m = m - 1
if (z > 0.25 .and. z <= 0.5) n = n +1
if (z > 0.5 .and. z <= 0.75) m = m +1
if (z > 0.75) n = n - 1
if (m == 1 .or. m == 12 .or. n == 1 .or. n == 12) then
point = point + matrix(m, n)
m = i
n = j
iter = iter + 1
end if
end do
point = point / walkers
res(i, j) = point
end do
end do
!$omp end do
!$omp end parallel
! Write to file
open(2, file='out_omp.txt')
do i = 1, row
write(2, *)(res(i, j), j=1,col)
end do
contains
end
So, the problem is that the parallel program runs MUCH slower than its sequential version.
Where is the mistake (apart from my terrible code)?
Update: the code now uses !$omp do directives, but the result is still the same: it is much slower than the sequential version.
Most probably, the behavior is related to the random number generation. The Fortran RANDOM_NUMBER procedure is not guaranteed to be thread-safe, although it is thread-safe at least with the GNU compiler thanks to a GNU extension. In any case, its performance seems to be very bad, as you note.
If you switch to a different thread-safe random number generator, your code can scale well. I used the classical ran2.f generator:
http://www-star.st-and.ac.uk/~kw25/research/montecarlo/ran2.f
modified to make it thread-safe. If I am not wrong, to do that:
in the calling unit declare and define:
integer :: iv(32), iy, idum2, idum
idum2 = 123456789 ; iv(:) = 0 ; iy = 0
in OpenMP directives add idum as private and idum2, iv, iy as firstprivate (by the way you need to add z as private too)
in the parallel section add (before do)
idum = - omp_get_thread_num()
to have different random numbers for different threads
from the ran2 function, remove the DATA and SAVE lines and pass idum2, iv, iy as arguments:
FUNCTION ran2(idum, iv, iy, idum2)
call ran2 instead of random_number intrinsic
z = ran2(idum, iv, iy, idum2)
With walkers=100000 (GNU compiler) these are my times:
1 thread => 4.7s
2 threads => 2.4s
4 threads => 1.5s
8 threads => 0.78s
16 threads => 0.49s
Not strictly related to the question, but I have to say that drawing a real number for every step, when all you need is one of four directions (+1 or -1 in m or n), and then branching on it with conditionals could probably be replaced by a more efficient strategy.
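The idea behind the ran2 modification, keeping the generator state in private variables and passing it in explicitly, can be sketched with a much simpler generator. The code below is an illustration only: it swaps in the Park-Miller "minimal standard" generator rather than ran2, the names lcg_mod and lcg_next are made up, and seeding threads with consecutive integers is a crude choice that real Monte Carlo work should not rely on.

```fortran
module lcg_mod
  use iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Park-Miller generator. The state is an argument, so there is no
  ! SAVEd or DATA-initialized variable for threads to fight over.
  function lcg_next(state) result(z)
    integer(int64), intent(inout) :: state
    real(real64) :: z
    state = mod(16807_int64 * state, 2147483647_int64)
    z = real(state, real64) / 2147483647.0_real64   ! 0 < z < 1
  end function lcg_next
end module lcg_mod

program per_thread_rng
  !$ use omp_lib
  use iso_fortran_env, only: int64, real64
  use lcg_mod
  implicit none
  integer(int64) :: state
  real(real64) :: z
  integer :: k, bad

  bad = 0
  !$omp parallel private(state, k, z) reduction(+:bad)
  state = 12345_int64
  ! Give each thread a different starting state (compiled only with OpenMP).
  !$ state = state + omp_get_thread_num()
  !$omp do
  do k = 1, 100000
     z = lcg_next(state)
     if (z <= 0.0_real64 .or. z >= 1.0_real64) bad = bad + 1
  end do
  !$omp end do
  !$omp end parallel

  print *, "out-of-range draws:", bad
end program per_thread_rng
```

In the random walk code, the call to random_number(z) would be replaced by a call of this kind, with state (and z) listed as private.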
I want to do a fixed calculation step for multiple .dat files.
Here is my code for what I want to do with one .dat file i.e. the calculation:
dimension t(128716),x(128716)
open (unit=88,file='ALFA-gua-100m-2.dat',status='unknown')
do i=1,128716
read(88,*)t(i),x(i)
enddo
sum=0
do j=1,128716
sum=sum+x(j)
enddo
write(*,*)sum/128716
close(88)
stop
end
How do I go about this? Please suggest!
Here is my code for multiple file :
dimension t(128716),x(128716)
open (unit=11,file='ALFA-gua-100m-2.dat',status='unknown')
open (unit=12,file='ALFA-gua-100m-5.dat',status='unknown')
do i=1,2
ii = i + 10
do j=1,128716
read(ii,*)t(j),x(j)
enddo
sum=0
do k=1,128716
sum=sum+x(k)
enddo
enddo
do l=1,2
ll = l + 10
write(ll,*)sum/128716.0
close(ll)
enddo
stop
end
But it's not working.
An addendum to @VladimirF's answer.
To sum all the elements in an array called x we can simply write
sumx = sum(x)
there is no need for the programmer to write a loop at all. If using an array of sums, then something like
sums(1) = sum(x)
would be appropriate.
Then to calculate the mean of an array I'd write
meanx = sum(x)/size(x)
While I'm writing: it's not a good idea to call a variable sum. There's an existing intrinsic function of that name and it will only confuse readers (though not the compiler) to have a variable of that name too.
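A minimal sketch of the intrinsic-based mean, using a small made-up array in place of the data read from the file:

```fortran
program mean_demo
  implicit none
  real :: x(5), meanx

  x = [1.0, 2.0, 3.0, 4.0, 5.0]   ! made-up sample data

  ! sum and size are intrinsics; no explicit loop or counter is needed.
  meanx = sum(x) / size(x)

  print *, "mean =", meanx
end program mean_demo
```

For this sample the mean is 3.0.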
You cannot use the same sum for two iterations of the i loop when you have two separate loops. It will get overwritten when processing the second file.
You can join the loops into one.
do i=1,2
ii = i + 10
do j=1,128716
read(ii,*)t(j),x(j)
enddo
sum=0
do k=1,128716
sum=sum+x(k)
enddo
write(ii,*)sum/128716.0
close(ii)
enddo
Alternatively, you can keep the two loops separate and store the sums in an array, sums(i).
do i=1,2
ii = i + 10
do j=1,128716
read(ii,*)t(j),x(j)
enddo
sums(i)=0
do k=1,128716
sums(i)=sums(i)+x(k)
enddo
enddo
do l=1,2
ll = l + 10
write(ll,*)sums(l)/128716.0
close(ll)
enddo
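Putting both answers together, one possible shape for the whole task is sketched below. It is an assumption-laden illustration: the file names demo-a.dat and demo-b.dat are placeholders, the program writes its own tiny input files so that it runs on its own, and the means are printed to the screen instead of being written back into the data files.

```fortran
program mean_per_file
  implicit none
  integer, parameter :: nfiles = 2, npts = 4
  character(len=32) :: fnames(nfiles)
  real :: t(npts), x(npts), means(nfiles)
  integer :: i, j, u

  ! Placeholder names; substitute the real .dat files here.
  fnames = [character(len=32) :: 'demo-a.dat', 'demo-b.dat']

  ! Create small stand-in input files (two columns: t and x).
  do i = 1, nfiles
     open(newunit=u, file=trim(fnames(i)), status='replace', action='write')
     do j = 1, npts
        write(u, *) real(j), real(i * j)
     end do
     close(u)
  end do

  ! The actual task: the same calculation step for every file.
  do i = 1, nfiles
     open(newunit=u, file=trim(fnames(i)), status='old', action='read')
     do j = 1, npts
        read(u, *) t(j), x(j)
     end do
     close(u)
     means(i) = sum(x) / size(x)
     print *, trim(fnames(i)), means(i)
  end do
end program mean_per_file
```

With the stand-in data the two means are 2.5 and 5.0. Storing the file names in an array avoids hard-coding one open statement and one unit number per file.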
program main
use omp_lib
implicit none
integer :: n=8
integer :: i, j, myid, a(8, 8), b, c(8)
! Generate a 8*8 array A
!$omp parallel default(none), private(i, myid), &
!$omp shared(a, n)
myid = omp_get_thread_num()+1
do i = 1, n
a(i, myid) = i*myid
end do
!$omp end parallel
! Array A
print*, 'Array A is'
do i = 1, n
print*, a(:, i)
end do
! Sum of array A
b = 0
!$omp parallel reduction(+:b), shared(a, n), private(i, myid)
myid = omp_get_thread_num()+1
do i = 1, n
b = b + a(i, myid)
end do
!$omp end parallel
print*, 'Sum of array A by reduction is ', b
b = 0
c = 0
!$omp parallel do
do i = 1, n
do j = 1, n
c(i) = c(i) + a(j, i)
end do
end do
!$omp end parallel do
print*, 'Sum of array A by using parallel do is', sum(c)
!$omp parallel do
do i = 1, n
do j = 1, n
b = b + a(j, i)
end do
end do
!$omp end parallel do
print*, 'Sum of array A by using parallel do in another way is', b
end program main
I wrote the Fortran code above to sum all elements of an 8*8 array with OpenMP in three different ways. The first uses a reduction and works. In the second, I created a one-dimensional array with 8 elements, summed each column in a parallel region, and then added up the column sums; this works as well. In the third, I accumulated every element of the array into a single integer inside a parallel do region. This result is not correct and varies from run to run. I don't understand why this happens. Is it because I didn't specify which variables are public and private, or because the variable b is overwritten along the way?
There is a race condition on b in your third scenario: several threads read and write the same variable without proper synchronization or privatization.
Note that you don't have a race condition in the second scenario: each thread updates data (i.e. c(i)) that no other thread accesses.
Finally, some solutions for your last scenario:
Add the reduction(+:b) clause to the directive
Add an !$omp atomic directive before the b = b + a(j, i) statement
Implement a manual privatization
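The three options can be sketched side by side as follows. This is a minimal illustration, not the asker's exact program: the array is filled sequentially with a(j, i) = i*j, and the names b_red, b_atomic, b_manual and b_local are made up.

```fortran
program sum_fixes
  implicit none
  integer, parameter :: n = 8
  integer :: a(n, n), i, j
  integer :: b_red, b_atomic, b_manual, b_local

  do i = 1, n
     do j = 1, n
        a(j, i) = i * j
     end do
  end do

  ! 1) Reduction: each thread sums into a private copy, combined at the end.
  b_red = 0
  !$omp parallel do reduction(+:b_red) private(j)
  do i = 1, n
     do j = 1, n
        b_red = b_red + a(j, i)
     end do
  end do
  !$omp end parallel do

  ! 2) Atomic: every single update of the shared variable is protected.
  b_atomic = 0
  !$omp parallel do private(j)
  do i = 1, n
     do j = 1, n
        !$omp atomic
        b_atomic = b_atomic + a(j, i)
     end do
  end do
  !$omp end parallel do

  ! 3) Manual privatization: private partial sums, merged once per thread.
  b_manual = 0
  !$omp parallel private(i, j, b_local)
  b_local = 0
  !$omp do
  do i = 1, n
     do j = 1, n
        b_local = b_local + a(j, i)
     end do
  end do
  !$omp end do
  !$omp critical
  b_manual = b_manual + b_local
  !$omp end critical
  !$omp end parallel

  print *, b_red, b_atomic, b_manual, sum(a)
end program sum_fixes
```

All four printed values should be 1296 for any number of threads. The atomic version is usually the slowest of the three, because it synchronizes on every addition.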
I already have the code below, but it is still not working. If the code is correct, please tell me how to compile it. I tried compiling it like this:
gfortran trap.f -fopenmp
PROGRAM TRAP
USE OMP_LIB
DOUBLE PRECISION INTEG, TMPINT
DOUBLE PRECISION A, B
PARAMETER (A=3.0, B=7.0)
INTEGER N
PARAMETER (N=10)
DOUBLE PRECISION H
DOUBLE PRECISION X
INTEGER I
DOUBLE PRECISION F
H = (B-A)/N
INTEG = 0.0
TMPINT = 0.0
!$omp parallel firstprivate(X, TMPINT) shared(INTEG)
!$omp do
DO 10 I=1,N-1,1
X=A+I*H
TMPINT = TMPINT + F(X)
10 CONTINUE
!$omp end do
!$omp critical
INTEG = INTEG + TMPINT
!$omp end critical
!$omp end parallel
NTEG = (INTEG+(F(A)+F(B))/2.0)*H
PRINT *, "WITH N=", N, "INTEGRAL=", INTEG
END
FUNCTION F(X)
DOUBLE PRECISION X
F = X / (X + 1) * EXP(-X + 2)
END
Compiler gives following problems:
[http://i.stack.imgur.com/QPknv.png][1]
[http://i.stack.imgur.com/GYkmN.png][2]
Your program has the suffix .f, so gfortran assumes that the code is in fixed format and complains that many statements are "unclassifiable". To fix this, change the file name to trap.f90 and compile with gfortran -fopenmp trap.f90 so that free format is assumed. There are other problems as well: one is that the return type of the function F(X) does not match the type declared in the main program, so F(X) needs to be modified as
FUNCTION F(X)
implicit none !<--- this is always recommended
DOUBLE PRECISION X, F !<--- add F here
F = X / (X + 1) * EXP(-X + 2)
END
Another issue is that NTEG is probably a typo of INTEG, so it should be modified as
INTEG = (INTEG+(F(A)+F(B))/2.0)*H
(this is automatically detected if we have implicit none in the main program). Now running the code with, e.g. 8 threads, gives
$ OMP_NUM_THREADS=8 ./a.out
WITH N= 10 INTEGRAL= 0.28927708626319770
while the exact result is 0.28598... Increasing the value of N, we can confirm that the agreement becomes better:
WITH N= 100 INTEGRAL= 0.28602065571967972
WITH N= 1000 INTEGRAL= 0.28598803555916535
WITH N= 10000 INTEGRAL= 0.28598770935198736
WITH N= 100000 INTEGRAL= 0.28598770608991503
BTW, it is probably easier to use the reduction clause to do the same thing, for example:
INTEG = 0.0
!$omp parallel do reduction(+ : integ) private(x)
DO I = 1, N-1
X = A + I * H
INTEG = INTEG + F( X )
ENDDO
!$omp end parallel do
INTEG = (INTEG+(F(A)+F(B))/2.0)*H
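For completeness, here is a minimal free-form sketch of the whole program built around the reduction clause. It is one possible arrangement, not the only one: F is written as an internal function so that its type is checked by the compiler, N is set to 1000, and the kind parameter dp is introduced for illustration.

```fortran
program trap_reduction
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: a = 3.0_dp, b = 7.0_dp
  integer, parameter :: n = 1000
  real(dp) :: h, x, integ
  integer :: i

  h = (b - a) / n
  integ = 0.0_dp

  ! Interior points of the trapezoidal rule, summed with a reduction.
  !$omp parallel do reduction(+ : integ) private(x)
  do i = 1, n - 1
     x = a + i * h
     integ = integ + f(x)
  end do
  !$omp end parallel do

  ! Add the end points with weight 1/2 and scale by the step size.
  integ = (integ + (f(a) + f(b)) / 2.0_dp) * h
  print *, "WITH N=", n, "INTEGRAL=", integ

contains

  function f(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = x / (x + 1.0_dp) * exp(-x + 2.0_dp)
  end function f

end program trap_reduction
```

Saved as a .f90 file and compiled with gfortran -fopenmp, this should print a value close to the 0.28598... quoted above.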
Your code is in fixed form (.f). Therefore, you must follow the rules of the fixed format: the first six columns of each line have a special meaning and should be blank unless they hold a comment marker (column 1), a statement label such as 10 (columns 1 to 5), or a continuation character (column 6).
If you format your code accordingly, the compiler complains about a mismatch in the return type of F(X). As you do not use implicit none, the type is determined by the first letter of the function name, and F maps to a (single precision) real. So you need to specify the return type explicitly.
Then the code looks like:
      PROGRAM TRAP
      USE OMP_LIB
      DOUBLE PRECISION INTEG, TMPINT
      DOUBLE PRECISION A, B
      PARAMETER (A=3.0, B=7.0)
      INTEGER N
      PARAMETER (N=10)
      DOUBLE PRECISION H
      DOUBLE PRECISION X
      INTEGER I
      DOUBLE PRECISION F
      H = (B-A)/N
      INTEG = 0.0
      TMPINT = 0.0
c$omp parallel firstprivate(X, TMPINT) shared(INTEG)
c$omp do
      DO 10 I=1,N-1,1
        X=A+I*H
        TMPINT = TMPINT + F(X)
   10 CONTINUE
c$omp end do
c$omp critical
      INTEG = INTEG + TMPINT
c$omp end critical
c$omp end parallel
      INTEG = (INTEG+(F(A)+F(B))/2.0)*H
      PRINT *, "WITH N=", N, "INTEGRAL=", INTEG
      END
      DOUBLE PRECISION FUNCTION F(X)
      DOUBLE PRECISION X
      F = X / (X + 1) * EXP(-X + 2)
      END
[Please note that I also changed the NTEG = line to INTEG =, as I believe this is what was intended. I did not check the code for validity.]