parallel do mistake in fortran

program main
   use omp_lib
   implicit none
   integer :: n = 8
   integer :: i, j, myid, a(8, 8), b, c(8)
   ! Generate an 8*8 array A
   !$omp parallel default(none), private(i, myid), &
   !$omp shared(a, n)
   myid = omp_get_thread_num() + 1
   do i = 1, n
      a(i, myid) = i*myid
   end do
   !$omp end parallel
   ! Array A
   print*, 'Array A is'
   do i = 1, n
      print*, a(:, i)
   end do
   ! Sum of array A
   b = 0
   !$omp parallel reduction(+:b), shared(a, n), private(i, myid)
   myid = omp_get_thread_num() + 1
   do i = 1, n
      b = b + a(i, myid)
   end do
   !$omp end parallel
   print*, 'Sum of array A by reduction is ', b
   b = 0
   c = 0
   !$omp parallel do
   do i = 1, n
      do j = 1, n
         c(i) = c(i) + a(j, i)
      end do
   end do
   !$omp end parallel do
   print*, 'Sum of array A by using parallel do is', sum(c)
   !$omp parallel do
   do i = 1, n
      do j = 1, n
         b = b + a(j, i)
      end do
   end do
   !$omp end parallel do
   print*, 'Sum of array A by using parallel do in another way is', b
end program main
I wrote the Fortran code above to use OpenMP to sum all the elements of an 8*8 array in three different ways. The first uses a reduction and works. In the second, I create a one-dimensional array with 8 elements, sum each column into it inside the parallel region, and then add those partial sums together; this works as well. In the third, I use a single integer to accumulate every element of the array inside a parallel do region. That result is not correct and varies from run to run. I don't understand why this happens. Is it because I didn't specify shared and private, or is the variable b being overwritten during the loop?

There is a race condition on b in your third scenario: several threads are reading and writing the same variable without proper synchronization or privatization.
Note that you don't have a race condition in the second scenario: each thread updates data (i.e. c(i)) that no other thread accesses.
Finally, some solutions for your last scenario:
Add the reduction(+:b) clause to the directive (see the sketch below)
Add an !$omp atomic directive immediately before the b = b + a(j, i) statement
Implement a manual privatization
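For instance, a minimal sketch of the first option, keeping the variable names from the question:
b = 0
!$omp parallel do private(j) reduction(+:b)
do i = 1, n
   do j = 1, n
      ! each thread accumulates into its own private copy of b;
      ! the private copies are summed when the construct ends
      b = b + a(j, i)
   end do
end do
!$omp end parallel do
print*, 'Sum of array A with reduction on the parallel do is', b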

Related

OpenACC | Fortran 90: What is the best way to parallelize nested DO loop?

I am trying to parallelize the following nested DO loop structure (the first code below) using the 'collapse' directive in OpenACC. The variable 'nbl' from the outermost loop appears in the bounds of the other DO loops, so there is a dependency. Thanks to the compiler, it shows an error in advance. So I had to compromise and apply the 'collapse' directive only to the four innermost loops. Is there a way to parallelize this loop nest for maximum performance by also exploiting the parallelism of "nbl = 1,nblocks"?
Compiler: pgfortran
Flags: -acc -fast -ta=tesla:managed -Minfo=accel
Code that's giving error due to data dependency between outer most DO loop and other inner DO loops:
!$acc parallel loop collapse(5)
DO nbl = 1,nblocks
  DO n_prim = 1,nprims
    DO k = 1, NK(nbl)
      DO j = 1, NJ(nbl)
        DO i = 1, NI(nbl)
          Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
        ENDDO
      ENDDO
    ENDDO
  ENDDO
ENDDO
!$acc end parallel loop
Compromised working code with less parallelism:
DO nbl = 1,nblocks
  !$acc parallel loop collapse(4)
  DO n_prim = 1,nprims
    DO k = 1, NK(nbl)
      DO j = 1, NJ(nbl)
        DO i = 1, NI(nbl)
          Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
        ENDDO
      ENDDO
    ENDDO
  ENDDO
  !$acc end parallel loop
ENDDO
Thanks!
The dependency is with the array look-ups for the upper bounds of the loops. In order to collapse loops, the iteration count of the loop must be known before entering, but here the count is variable.
Try something like the following and split the parallelism into two levels:
!$acc parallel loop collapse(2)
DO nbl = 1,nblocks
  DO n_prim = 1,nprims
    !$acc loop collapse(3)
    DO k = 1, NK(nbl)
      DO j = 1, NJ(nbl)
        DO i = 1, NI(nbl)
          Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
        ENDDO
      ENDDO
    ENDDO
  ENDDO
ENDDO
!$acc end parallel loop
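If you want to be explicit about which hardware level each group of loops maps to, a possible variant (a sketch only, not tested against your code) adds the gang and vector clauses to the two loop directives:
!$acc parallel loop gang collapse(2)
DO nbl = 1,nblocks
  DO n_prim = 1,nprims
    !$acc loop vector collapse(3)
    DO k = 1, NK(nbl)
      DO j = 1, NJ(nbl)
        DO i = 1, NI(nbl)
          Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
        ENDDO
      ENDDO
    ENDDO
  ENDDO
ENDDO
!$acc end parallel loop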

fortran arrays same shape and size parallel in OpenMP

I would like to ask whether OpenMP can parallelize operations on Fortran arrays of the same shape and size written in simple whole-array notation. I did some research but could not find out whether this is possible.
By simple notation I mean the following form:
a = b + c * 1.1
A full example is below:
PROGRAM Parallel_Hello_World
  USE OMP_LIB
  implicit none
  integer, parameter :: ILEN = 1000
  integer :: a(ILEN,ILEN), b(ILEN,ILEN), c(ILEN,ILEN), d(ILEN,ILEN)
  integer :: i, j
  a = 1
  b = 2
  !$OMP PARALLEL SHARED(a, b, c, d)
  !$OMP DO
  DO i=1,ILEN
    DO j=1, ILEN
      c(j,i) = a(j,i) + b(j,i) * 1.1
    ENDDO
  END DO
  !$OMP END DO
  ! is this loop parallel?
  d = a + b * 1.1
  !$OMP END PARALLEL
  write (*,*) "Total C: ", c(1:5, 1)
  write (*,*) "Total D: ", d(1:5, 1)
  write (*,*) "C same D? ", all(c == d)
END
Is the d loop parallelized with openMP with the current notation?
As commented by Gilles, the answer is to wrap the assignment in a WORKSHARE construct:
!$OMP WORKSHARE
d = a + b * 1.1
!$OMP END WORKSHARE
Find more info here
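For reference, a sketch of how the WORKSHARE construct slots into the parallel region from the question (same variable names, otherwise unchanged):
!$OMP PARALLEL SHARED(a, b, c, d)
!$OMP DO
DO i=1,ILEN
  DO j=1, ILEN
    c(j,i) = a(j,i) + b(j,i) * 1.1
  ENDDO
END DO
!$OMP END DO
! WORKSHARE divides the whole-array assignment among the threads,
! instead of every thread executing it redundantly
!$OMP WORKSHARE
d = a + b * 1.1
!$OMP END WORKSHARE
!$OMP END PARALLEL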

error: reduction variable ‘v1’ is private in outer context

I have the following Fortran code
program hello
  use omp_lib
  implicit none
  integer :: num_threads = 2
  print*, "Display Hello world!"
  print*, "Number of threads used = ", num_threads
  call loop()
end program hello

subroutine loop()
  integer :: i, j, k, n
  real :: c0
  real, allocatable :: v1(:,:)
  n = 3
  c0 = 0.
  if (.not. allocated(v1)) allocate(v1(n,n))
  v1 = c0
  !$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
  do i = 1, n
    do j = 1, n
      do k = 1, n
        v1(i,j) = v1(i,j) + k
      end do
      write (*,*) i, j, v1(i,j)
    end do
  end do
  !$omp end do
end subroutine
Compiling with gfortran -fopenmp leads to
error: reduction variable ‘v1’ is private in outer context
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
I checked the existing question reduction variable is private in outer context, but I am still unsure of the reason for my issue: v1 is only used inside the loop.
What is the reason for the error message reduction variable 'v1' is private in outer context?
[Solved by adding !$omp parallel and !$omp end parallel]
Thanks to Ian Bush's comment: the !$omp do worksharing construct needs to be nested inside a parallel region that creates the thread team. By adding !$omp parallel and !$omp end parallel, i.e.,
program hello
  use omp_lib
  implicit none
  integer :: num_threads = 2
  print*, "Display Hello world!"
  print*, "Number of threads used = ", num_threads
  call loop()
end program hello

subroutine loop()
  integer :: i, j, k, n
  real :: c0
  real, allocatable :: v1(:,:)
  n = 3
  c0 = 0.
  if (.not. allocated(v1)) allocate(v1(n,n))
  !$omp parallel
  !$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
  do i = 1, n
    do j = 1, n
      v1(i,j) = c0
      do k = 1, n
        v1(i,j) = v1(i,j) + k
      end do
      write (*,*) i, j, v1(i,j)
    end do
  end do
  !$omp end do
  !$omp end parallel
end subroutine
the code runs normally.
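As a side note, a minimal sketch using the combined !$omp parallel do construct (assuming the same v1, n, and c0 as above; the write statement is omitted) expresses the same fix in a single directive:
!$omp parallel do private(j, k) schedule(dynamic) reduction(+: v1)
do i = 1, n
  do j = 1, n
    v1(i,j) = c0
    do k = 1, n
      v1(i,j) = v1(i,j) + k
    end do
  end do
end do
!$omp end parallel do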

Why does my OpenMP code with reduction give wrong results?

My Fortran code is as follows:
! ...................... initialization
do ia = 1, NLEV
   do ic = 1, NLEV
      ZGamma(ia,ic) = zero
   enddo
enddo
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(H,ZRO) REDUCTION(+: ZGamma)
!$OMP DO SCHEDULE(DYNAMIC)
do iabcd = 1, H%iabcd_max
   ia = H%ka(iabcd)
   ib = H%kb(iabcd)
   ic = H%kc(iabcd)
   id = H%kd(iabcd)
   ZGamma(ia,ic) = ZGamma(ia,ic) + H%ME2BM(iabcd)*ZRO(id,ib)
   ZGamma(ib,ic) = ZGamma(ib,ic) - H%ME2BM(iabcd)*ZRO(id,ia)
   ZGamma(ia,id) = ZGamma(ia,id) - H%ME2BM(iabcd)*ZRO(ic,ib)
   ZGamma(ib,id) = ZGamma(ib,id) + H%ME2BM(iabcd)*ZRO(ic,ia)
   if (ia+ib .eq. ic+id) cycle
   ZGamma(ic,ia) = ZGamma(ic,ia) + H%ME2BM(iabcd)*ZRO(ib,id)
   ZGamma(id,ia) = ZGamma(id,ia) - H%ME2BM(iabcd)*ZRO(ib,ic)
   ZGamma(ic,ib) = ZGamma(ic,ib) - H%ME2BM(iabcd)*ZRO(ia,id)
   ZGamma(id,ib) = ZGamma(id,ib) + H%ME2BM(iabcd)*ZRO(ia,ic)
enddo ! iabcd
!$OMP END DO
!$OMP END PARALLEL
In the above code I calculate the 2D array ZGamma(i,j) using OpenMP. The code compiles without any problem, but the results are wrong. Could anyone tell me what the problem in the code is, and what changes I should make?
By the way, as the index iabcd runs from 1 to H%iabcd_max, the values of (ia,ib,ic,id) can be (1,1,1,1), (1,1,1,2), (1,1,1,...), (1,1,2,1), (1,1,2,...), etc.

Nested Loop Optimization in OpenMP

I can't get the correct output once OpenMP is applied. Is there any way to get this right?
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do i = 1, size1
   do j = 1, size1
      outcountb = outcountb + 1
      outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
   enddo
enddo
!$OMP END PARALLEL DO
Just swap your loops and everything will be fine:
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do j = 1, size1     ! <-- Swap i and
   do i = 1, size1  !     j here
      outcountb = outcountb + 1
      outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
   enddo
enddo
!$OMP END PARALLEL DO
In your example, multiple threads write to the same memory address outmtresult(j) because you parallelize the do i loop. By swapping the loops you parallelize over do j, so no two concurrent threads write to the same destination.