Don't always get expected results - Fortran

The following is the code of a simple tidal-transport model in which I have used OpenMP to parallelize the computation:
!$OMP PARALLEL SHARED(w, u, v, nthreads, chunk) PRIVATE(i, j, tid)
do it = 1, itlast
!$OMP DO SCHEDULE(DYNAMIC, CHUNK)
   do j = 2, nyw-1
      do i = 2, nxw-1
         w(i,j) = w(i,j) - rx*depth*(u(i,j) - u(i-1,j)) &
                         - ry*depth*(v(i,j) - v(i,j-1))
      end do
   end do
!$OMP END DO
!$OMP SINGLE
   call boudary_condition(it)
!$OMP END SINGLE
!$OMP DO SCHEDULE(DYNAMIC, CHUNK)
   do j = 1, nyw
      jv1 = j
      if (jv1 .ge. nyw-1) jv1 = nyw-1
      do i = 1, nxw-1
         u(i,j) = u(i,j) - rxg*(w(i+1,j) - w(i,j)) &
                         - constant*u(i,j)*sqrt((u(i,j)**2.) + (v(i,jv1)**2.))
      end do
   end do
!$OMP END DO
!$OMP DO SCHEDULE(DYNAMIC, CHUNK)
   do j = 1, nyw-1
      do i = 1, nxw
         iu1 = i
         if (iu1 .ge. nxw-1) iu1 = nxw-1
         v(i,j) = v(i,j) - ryg*(w(i,j+1) - w(i,j)) &
                         - constant*v(i,j)*sqrt((u(iu1,j)**2.) + (v(i,j)**2.))
      end do
   end do
!$OMP END DO
   call transport_equation(chunk)
!$OMP MASTER
   mtprint = it/ntserprint
   if (ntserprint*mtprint .ne. it) goto 20
   call timeseries(it)
20 continue
!$OMP END MASTER
end do
!$OMP END PARALLEL
The problem is that I don't always get the expected results. Using the same input file, I should always get the same results, but sometimes the output file contains NaN. I don't quite understand why this happens. I'm using Intel Visual Fortran Composer XE 2013 on Windows 10 to compile and run the executable.

You need at least to make it, jv1, and iu1 private.
Try fixing these first, and let us know.
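For example, the opening directive could look like this (a minimal sketch that only extends your PRIVATE list; everything else stays as in your code):
!$OMP PARALLEL SHARED(w, u, v, nthreads, chunk) PRIVATE(i, j, tid, it, jv1, iu1)
Left shared, jv1 and iu1 are written by all threads at once, so a thread can pick up another thread's value in the sqrt terms; that data race makes the results nondeterministic and can destabilize the scheme enough to produce NaN on some runs.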

Related

OpenACC | Fortran 90: What is the best way to parallelize nested DO loop?

I am trying to parallelize the following nested DO loop structure (the first code below) using the 'collapse' directive in OpenACC. The variable 'nbl' from the outermost loop appears in the bounds of the inner DO loops, so there is a dependency; thankfully, the compiler flags this as an error in advance. So I had to compromise and apply the 'collapse' directive only to the four innermost loops. Is there a way to parallelize this loop nest for maximum performance by utilizing the parallelism of "nbl = 1,nblocks" as well?
Compiler: pgfortran
Flags: -acc -fast -ta=tesla:managed -Minfo=accel
Code that gives an error due to the data dependency between the outermost DO loop and the inner DO loops:
!$acc parallel loop collapse(5)
DO nbl = 1,nblocks
   DO n_prim = 1,nprims
      DO k = 1, NK(nbl)
         DO j = 1, NJ(nbl)
            DO i = 1, NI(nbl)
               Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
            ENDDO
         ENDDO
      ENDDO
   ENDDO
ENDDO
!$acc end parallel loop
Compromised working code with less parallelism:
DO nbl = 1,nblocks
!$acc parallel loop collapse(4)
   DO n_prim = 1,nprims
      DO k = 1, NK(nbl)
         DO j = 1, NJ(nbl)
            DO i = 1, NI(nbl)
               Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
            ENDDO
         ENDDO
      ENDDO
   ENDDO
!$acc end parallel loop
ENDDO
Thanks!
The dependency is with the array look-ups for the upper bounds of the inner loops. In order to collapse loops, the iteration count must be known before entering the loop nest, but here the counts are variable.
Try something like the following, which splits the parallelism into two levels:
!$acc parallel loop collapse(2)
DO nbl = 1,nblocks
   DO n_prim = 1,nprims
!$acc loop collapse(3)
      DO k = 1, NK(nbl)
         DO j = 1, NJ(nbl)
            DO i = 1, NI(nbl)
               Px(i,j,k,nbl,n_prim) = i*j + Cx(i,j,k,nbl,1)*Cx(i,j,k,nbl,5) + Cx(i,j,k,nbl,2)
            ENDDO
         ENDDO
      ENDDO
   ENDDO
ENDDO
!$acc end parallel loop
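With this split, nbl is already fixed by the time the inner loop directive is reached, so NK(nbl), NJ(nbl), and NI(nbl) are known trip counts and the inner collapse(3) is legal, while the outer collapse(2) still exposes the parallelism over nbl and n_prim.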

error: reduction variable ‘v1’ is private in outer context

I have the following Fortran code
program hello
   use omp_lib
   implicit none
   integer :: num_threads = 2
   print*, "Display Hello world!"
   print*, "Number of threads used = ", num_threads
   call loop()
end program hello
subroutine loop()
   integer :: i,j,k,n
   real :: c0
   real, allocatable :: v1(:,:)
   n = 3
   c0 = 0.
   if (.not. allocated (v1)) allocate(v1(n,n))
   v1 = c0
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
   do i = 1, n
      do j = 1, n
         do k = 1, n
            v1(i,j) = v1(i,j) + k
         end do
         write (*,*) i, j, v1(i,j)
      end do
   end do
!$omp end do
end subroutine
gfortran -fopenmp leads to
error: reduction variable ‘v1’ is private in outer context
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
I checked reduction variable is private in outer context, but I am still unsure about the reason for my issue. v1 is only used inside the loop.
What is the reason for the error message reduction variable ‘v1’ is private in outer context?
[Solved by adding !$omp parallel and !$omp end parallel]
Thanks to Ian Bush's comment. By adding !$omp parallel and !$omp end parallel, i.e.,
program hello
   use omp_lib
   implicit none
   integer :: num_threads = 2
   print*, "Display Hello world!"
   print*, "Number of threads used = ", num_threads
   call loop()
end program hello
subroutine loop()
   integer :: i,j,k,n
   real :: c0
   real, allocatable :: v1(:,:)
   n = 3
   c0 = 0.
   if (.not. allocated (v1)) allocate(v1(n,n))
!$omp parallel
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
   do i = 1, n
      do j = 1, n
         v1(i,j) = c0
         do k = 1, n
            v1(i,j) = v1(i,j) + k
         end do
         write (*,*) i, j, v1(i,j)
      end do
   end do
!$omp end do
!$omp end parallel
end subroutine
the code runs normally. The !$omp do directive only shares out the iterations among the threads of an existing team; it needs an enclosing !$omp parallel region, and the reduction variable has to be shared in that enclosing region, which is what the error message was complaining about.

Why does my OpenMP code with reduction give wrong results?

My Fortran code is as follows:
! ...................... initialization
do ia=1,NLEV
   do ic=1,NLEV
      ZGamma(ia,ic) = zero
   enddo
enddo
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(H,ZRO) REDUCTION(+: ZGamma)
!$OMP DO SCHEDULE(DYNAMIC)
do iabcd=1,H%iabcd_max
   ia = H%ka(iabcd)
   ib = H%kb(iabcd)
   ic = H%kc(iabcd)
   id = H%kd(iabcd)
   ZGamma(ia,ic) = ZGamma(ia,ic) + H%ME2BM(iabcd)*ZRO(id,ib)
   ZGamma(ib,ic) = ZGamma(ib,ic) - H%ME2BM(iabcd)*ZRO(id,ia)
   ZGamma(ia,id) = ZGamma(ia,id) - H%ME2BM(iabcd)*ZRO(ic,ib)
   ZGamma(ib,id) = ZGamma(ib,id) + H%ME2BM(iabcd)*ZRO(ic,ia)
   if (ia+ib .eq. ic+id) cycle
   ZGamma(ic,ia) = ZGamma(ic,ia) + H%ME2BM(iabcd)*ZRO(ib,id)
   ZGamma(id,ia) = ZGamma(id,ia) - H%ME2BM(iabcd)*ZRO(ib,ic)
   ZGamma(ic,ib) = ZGamma(ic,ib) - H%ME2BM(iabcd)*ZRO(ia,id)
   ZGamma(id,ib) = ZGamma(id,ib) + H%ME2BM(iabcd)*ZRO(ia,ic)
enddo ! iabcd
!$OMP END DO
!$OMP END PARALLEL
In the above code, I calculate the 2D array ZGamma(i,j) using OpenMP. I can compile the code without any problem, but the results are wrong. Could anyone tell me what the problem is in the code? What changes should I make?
By the way, as the index "iabcd" runs from 1 to "H%iabcd_max", the values of "(ia,ib,ic,id)" can be "(1,1,1,1), (1,1,1,2), (1,1,1,...), (1,1,2,1), (1,1,2,...)", etc.

parallel do mistake in Fortran

program main
   use omp_lib
   implicit none
   integer :: n=8
   integer :: i, j, myid, a(8, 8), b, c(8)
   ! Generate a 8*8 array A
!$omp parallel default(none), private(i, myid), &
!$omp shared(a, n)
   myid = omp_get_thread_num()+1
   do i = 1, n
      a(i, myid) = i*myid
   end do
!$omp end parallel
   ! Array A
   print*, 'Array A is'
   do i = 1, n
      print*, a(:, i)
   end do
   ! Sum of array A
   b = 0
!$omp parallel reduction(+:b), shared(a, n), private(i, myid)
   myid = omp_get_thread_num()+1
   do i = 1, n
      b = b + a(i, myid)
   end do
!$omp end parallel
   print*, 'Sum of array A by reduction is ', b
   b = 0
   c = 0
!$omp parallel do
   do i = 1, n
      do j = 1, n
         c(i) = c(i) + a(j, i)
      end do
   end do
!$omp end parallel do
   print*, 'Sum of array A by using parallel do is', sum(c)
!$omp parallel do
   do i = 1, n
      do j = 1, n
         b = b + a(j, i)
      end do
   end do
!$omp end parallel do
   print*, 'Sum of array A by using parallel do in another way is', b
end program main
I wrote the piece of Fortran code above to use OpenMP to sum up all elements of an 8*8 array in three different ways. The first one uses reduction and works. In the second, I created a one-dimensional array with 8 elements, summed up each column in the parallel region, and then summed the partial results; this works as well. In the third one I used a single integer to sum up every element of the array inside a parallel do region. This result is not correct and varies every run. I don't understand why this happens. Is it because I didn't specify shared and private, or because the variable b is overwritten during the procedure?
There is a race condition on b in your third scenario: several threads read and write the same variable without proper synchronization / privatization.
Note that you don't have a race condition in the second scenario: each thread updates data (i.e. c(i)) that no other thread accesses.
Finally, some solutions for your last scenario:
Add the reduction(+:b) clause to the directive (see the sketch below)
Add an !$omp atomic directive right before the b = b + a(j, i) statement
You can implement a manual privatization (a private partial sum per thread, combined at the end)
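For instance, the reduction variant could look like this (a minimal sketch of just the third loop; the rest of the program stays unchanged):
!$omp parallel do private(j) reduction(+:b)
do i = 1, n
   do j = 1, n
      b = b + a(j, i)
   end do
end do
!$omp end parallel do
Each thread then accumulates into its own private copy of b, and the copies are added together once at the end of the parallel region.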

Nested Loop Optimization in OpenMP

I can't get the output result correct once OpenMP is applied. Is there any way to get this right?
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do i=1,size1
   do j=1, size1
      outcountb = outcountb + 1
      outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
   enddo
enddo
!$OMP END PARALLEL DO
Just swap your loops and everything will be fine:
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do j=1,size1      ! <-- Swap i and
   do i=1, size1  !     j here
      outcountb = outcountb + 1
      outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
   enddo
enddo
!$OMP END PARALLEL DO
In your example, multiple threads write into the same memory address outmtresult(j), since you parallelize the do i loop.
By swapping the loops, you parallelize over do j, so multiple concurrent threads never write to the same destination.