I would like to ask whether openMP is capable of parallelizing fortran arrays with the same shape and size using simple notation. I did some research but I am not capable to find or figure out whether it is possible.
I refer as simple notation the following form:
a = b + c * 1.1
Find below a full example:
PROGRAM Parallel_Hello_World
USE OMP_LIB
implicit none
integer, parameter :: ILEN = 1000
integer :: a(ILEN,ILEN), b(ILEN,ILEN), c(ILEN,ILEN), d(ILEN,ILEN)
integer :: i, j
a = 1
b = 2
!$OMP PARALLEL SHARED(a, b, c, d)
!$OMP DO
DO i=1,ILEN
DO j=1, ILEN
c(j,i) = a(j,i) + b(j,i) * 1.1
ENDDO
END DO
!$OMP END DO
# is this loop parallel?
d = a + b * 1.1
!$OMP END PARALLEL
write (*,*) "Total C: ", c(1:5, 1)
write (*,*) "Total D: ", d(1:5, 1)
write (*,*) "C same D? ", all(c == d)
END
Is the d loop parallelized with openMP with the current notation?
As commented by #Gilles the answer to the question is to wrap it with the workshare clause:
!$OMP WORKSHARE
d = a + b * 1.1
!$OMP END WORKSHARE
Find more info here
Related
I have the following Fortran code
program hello
use omp_lib
implicit none
integer :: num_threads = 2
print*, "Display Hello world!"
print*, "Number of threads used = ", num_threads
call loop()
end program hello
subroutine loop()
integer :: i,j,k,n
real :: c0
real, allocatable :: v1(:,:)
n = 3
c0 = 0.
if (.not. allocated (v1)) allocate(v1(n,n))
v1 = c0
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
do i = 1, n
do j = 1, n
do k = 1, n
v1(i,j) = v1(i,j) + k
end do
write (*,*) i, j, v1(i,j)
end do
end do
!$omp end do
end subroutine
gfotran -fopenmp leads to
error: reduction variable ‘v1’ is private in outer context
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
I checked reduction variable is private in outer context
but still unsure the reason for my issue. v1 is only used inside the loop.
What's the reason for the error message reduction variable ‘v1’ is private in outer context ?
[Solved, by adding !$omp parallel and !$omp end parallel]
Thanks for Ian Bush's comment. By adding !$omp parallel and !$omp end parallel, i.e.,
program hello
use omp_lib
implicit none
integer :: num_threads = 2
print*, "Display Hello world!"
print*, "Number of threads used = ", num_threads
call loop()
end program hello
subroutine loop()
integer :: i,j,k,n
real :: c0
real, allocatable :: v1(:,:)
n = 3
c0 = 0.
if (.not. allocated (v1)) allocate(v1(n,n))
!$omp parallel
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
do i = 1, n
do j = 1, n
v1(i,j) = c0
do k = 1, n
v1(i,j) = v1(i,j) + k
end do
write (*,*) i, j, v1(i,j)
end do
end do
!$omp end do
!$omp end parallel
end subroutine
the code runs normally.
I was trying to parallelize a code in Fortran using openMP, with this code:
program pigreco
!----------------------------------------!
use OMP_LIB
implicit none
!----------------------------------------!
integer :: i
integer, parameter :: N = 100000
integer, parameter :: NCPU = 4
real*8 :: t0, t1
real :: h, totale, x, f
!----------------------------------------!
print '(a,2x,i15)', ' Number of intervals: ', N
totale = 0.0
h = 1. / N
call OMP_SET_NUM_THREADS(NCPU)
write(*, '(a,i10)') 'Numero di processori totali: ', NCPU
t0 = OMP_GET_WTIME()
!----------------------------------------!
#ifdef PARALLEL
!
print '(a)', "Scelta la versione parallela."
!
!$OMP PARALLEL DO PRIVATE(x, f) REDUCTION(+:totale)
!
do i = 1, N
x = (i - 0.5) * h
f = (4 * h) / (1 + x**2)
totale = totale + f
enddo
!$OMP END PARALLEL DO
!
#endif
!
t1 = OMP_GET_WTIME()
!
PRINT '(a,2x,f30.25)', ' Computed PI =', totale
PRINT '(a,2x,f30.25)', ' Total computational time =', t1 - t0
!
end program pigreco
When I then try to compile with the line: gfortran prova.F90 -fopenmp -D PARALLEL it gives me an error that says "unclassifiable OpenMP directive at (1)".
The problem is that you defined PARALLEL with the preprocessor, so instead of reading OMP PARALLEL DO, the compiler reads OMP 1 DO, which of course doesn't make sense. Change #ifdef PARALLEL to #ifdef RUNPARALLEL and -DPARALLEL to -DRUNPARALLEL, then the compiler gives no error.
Alternatively, you can use the fact that when compiling with OpenMP support the macro variable _OPENMP is defined automatically, so you could use #ifdef _OPENMP, and no -D flag.
My Fortran code is as follows:
! ...................... initialization
do ia=1,NLEV
do ic=1,NLEV
ZGamma(ia,ic) =zero
enddo
enddo
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(H,ZRO) REDUCTION(+: ZGamma)
!$OMP DO SCHEDULE(DYNAMIC)
do iabcd=1,H%iabcd_max
ia = H%ka(iabcd)
ib = H%kb(iabcd)
ic = H%kc(iabcd)
id = H%kd(iabcd)
ZGamma(ia,ic)=ZGamma(ia,ic) + H%ME2BM(iabcd)*ZRO(id,ib)
ZGamma(ib,ic)=ZGamma(ib,ic) - H%ME2BM(iabcd)*ZRO(id,ia)
ZGamma(ia,id)=ZGamma(ia,id) - H%ME2BM(iabcd)*ZRO(ic,ib)
ZGamma(ib,id)=ZGamma(ib,id) + H%ME2BM(iabcd)*ZRO(ic,ia)
if(ia+ib.eq.ic+id) cycle
ZGamma(ic,ia)=ZGamma(ic,ia) + H%ME2BM(iabcd)*ZRO(ib,id)
ZGamma(id,ia)=ZGamma(id,ia) - H%ME2BM(iabcd)*ZRO(ib,ic)
ZGamma(ic,ib)=ZGamma(ic,ib) - H%ME2BM(iabcd)*ZRO(ia,id)
ZGamma(id,ib)=ZGamma(id,ib) + H%ME2BM(iabcd)*ZRO(ia,ic)
enddo ! iabcd
!$OMP END DO
!$OMP END PARALLEL
In the above code, I calculated the 2D array ZGamma(i,j) using OpenMP. Even though I can compile the code without any problem. Could anyone tell me what the problem is in the code? What changes should I make?
By the way, as the index "iabcd" running from "1" to "H%iabcd_max", the values of "(ia,ib,ic,id)" can be "(1,1,1,1),(1,1,1,2),(1,1,1,...), (1,1,2,1),(1,1,2,..)," etc.
program main
use omp_lib
implicit none
integer :: n=8
integer :: i, j, myid, a(8, 8), b, c(8)
! Generate a 8*8 array A
!$omp parallel default(none), private(i, myid), &
!$omp shared(a, n)
myid = omp_get_thread_num()+1
do i = 1, n
a(i, myid) = i*myid
end do
!$omp end parallel
! Array A
print*, 'Array A is'
do i = 1, n
print*, a(:, i)
end do
! Sum of array A
b = 0
!$omp parallel reduction(+:b), shared(a, n), private(i, myid)
myid = omp_get_thread_num()+1
do i = 1, n
b = b + a(i, myid)
end do
!$omp end parallel
print*, 'Sum of array A by reduction is ', b
b = 0
c = 0
!$omp parallel do
do i = 1, n
do j = 1, n
c(i) = c(i) + a(j, i)
end do
end do
!$omp end parallel do
print*, 'Sum of array A by using parallel do is', sum(c)
!$omp parallel do
do i = 1, n
do j = 1, n
b = b + a(j, i)
end do
end do
!$omp end parallel do
print*, 'Sum of array A by using parallel do in another way is', b
end program main
I wrote a piece of Fortran code above to implement OpenMP to sum up all elements in a 8*8 array in three different ways. First one uses reduction and works. Second, I created a one dimension array with 8 elements. I sum up each column in parallel region and then sum them up. And this works as well. Third one I used an integer to sum up every element in array, and put it in parallel do region. This result is not correct and varies every time. I don't understand why this situation happens. Is because didn't specify public and private or the variable b is overwritten in the procedure?
There is a race condition on b on your third scenario: several threads are reading and writing the same variable without proper synchronization / privatization.
Note that you don't have a race condition in the second scenario: each thread is updating some data (i.e. c(i)) that no one else is accessing.
Finally, some solutions to your last scenario:
Add the reducion(+:b) clause to the pragma
Add a pragma omp atomic directive before the b = b + c(j,i) expression
You can implement a manual privatization
I can't get the output result correct once applied openMP, is it anywhere get this right?
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do i=1,size1
do j=1, size1
outcountb = outcountb + 1
outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
enddo
enddo
!$OMP END PARALLEL DO
Just swap your loops and everything will be fine:
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do j=1,size1 ! <-- Swap i and
do i=1, size1 ! j here
outcountb = outcountb + 1
outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
enddo
enddo
!$OMP END PARALLEL DO
In your example, multiple threads write into the same memory address outmtresult(j) since you parallelize the do i loop.
By swapping the loops, you parallelize over do j and you will not write
at the same destination with multiple concurrent threads.