error: reduction variable ‘v1’ is private in outer context - fortran

I have the following Fortran code
program hello
use omp_lib
implicit none
integer :: num_threads = 2
print*, "Display Hello world!"
print*, "Number of threads used = ", num_threads
call loop()
end program hello
subroutine loop()
integer :: i,j,k,n
real :: c0
real, allocatable :: v1(:,:)
n = 3
c0 = 0.
if (.not. allocated (v1)) allocate(v1(n,n))
v1 = c0
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
do i = 1, n
do j = 1, n
do k = 1, n
v1(i,j) = v1(i,j) + k
end do
write (*,*) i, j, v1(i,j)
end do
end do
!$omp end do
end subroutine
Compiling with gfortran -fopenmp leads to
error: reduction variable ‘v1’ is private in outer context
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
I checked the related question "reduction variable is private in outer context",
but I am still unsure about the reason for my issue: v1 is only used inside the loop.
What is the reason for the error message "reduction variable ‘v1’ is private in outer context"?
[Solved, by adding !$omp parallel and !$omp end parallel]

Thanks to Ian Bush's comment, this is solved. After adding !$omp parallel and !$omp end parallel, i.e.,
program hello
use omp_lib
implicit none
integer :: num_threads = 2
print*, "Display Hello world!"
print*, "Number of threads used = ", num_threads
call loop()
end program hello
subroutine loop()
integer :: i,j,k,n
real :: c0
real, allocatable :: v1(:,:)
n = 3
c0 = 0.
if (.not. allocated (v1)) allocate(v1(n,n))
v1 = c0 ! the reduction adds the per-thread copies to this initial value
!$omp parallel
!$omp do private(i, j, k) schedule(dynamic) reduction(+: v1)
do i = 1, n
do j = 1, n
v1(i,j) = c0
do k = 1, n
v1(i,j) = v1(i,j) + k
end do
write (*,*) i, j, v1(i,j)
end do
end do
!$omp end do
!$omp end parallel
end subroutine
the code runs normally.
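A possible explanation, offered as a sketch rather than a definitive diagnosis: a bare !$omp do is an orphaned worksharing construct, and v1 is local to the subroutine, so it counts as private in the context the loop binds to, whereas a reduction variable on a worksharing loop must be shared in the enclosing parallel region. Since each v1(i,j) is written by exactly one iteration of the i loop, the reduction clause is arguably not needed at all. The variant below uses the combined parallel do and leaves v1 shared; the program name loop_demo and the final check are illustrative additions, not part of the original code.

```fortran
program loop_demo
  implicit none
  integer, parameter :: n = 3
  integer :: i, j, k
  real, allocatable :: v1(:,:)

  allocate(v1(n,n))
  v1 = 0.0

  ! v1 is shared by default. Each iteration of the i loop writes only to
  ! row i, so no two threads touch the same element and no reduction
  ! clause is required.
  !$omp parallel do private(j, k) schedule(dynamic)
  do i = 1, n
    do j = 1, n
      do k = 1, n
        v1(i,j) = v1(i,j) + k
      end do
    end do
  end do
  !$omp end parallel do

  ! Every element should equal 1 + 2 + 3 = 6.
  if (any(abs(v1 - 6.0) > 1.0e-6)) error stop "unexpected result"
  print *, v1(1,:)
end program loop_demo
```

Compiled with gfortran -fopenmp, this should build without the "private in outer context" error, because the parallel region and the loop are created by the same directive.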

Related

fortran arrays same shape and size parallel in OpenMP

I would like to ask whether OpenMP can parallelize operations on Fortran arrays of the same shape and size written in simple notation. I did some research, but I could not find out whether this is possible.
By simple notation I mean the following form:
a = b + c * 1.1
Find below a full example:
PROGRAM Parallel_Hello_World
USE OMP_LIB
implicit none
integer, parameter :: ILEN = 1000
integer :: a(ILEN,ILEN), b(ILEN,ILEN), c(ILEN,ILEN), d(ILEN,ILEN)
integer :: i, j
a = 1
b = 2
!$OMP PARALLEL SHARED(a, b, c, d)
!$OMP DO
DO i=1,ILEN
DO j=1, ILEN
c(j,i) = a(j,i) + b(j,i) * 1.1
ENDDO
END DO
!$OMP END DO
! is this loop parallel?
d = a + b * 1.1
!$OMP END PARALLEL
write (*,*) "Total C: ", c(1:5, 1)
write (*,*) "Total D: ", d(1:5, 1)
write (*,*) "C same D? ", all(c == d)
END
Is the assignment to d parallelized by OpenMP with the current notation?
As commented by @Gilles, the answer is to wrap the assignment in a workshare construct:
!$OMP WORKSHARE
d = a + b * 1.1
!$OMP END WORKSHARE
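For illustration, here is a minimal sketch of the full example with the workshare construct in place. The program name workshare_demo, the smaller array size, and the final check are assumptions made for this sketch. How well a compiler actually divides the array assignment among threads is implementation dependent.

```fortran
program workshare_demo
  implicit none
  integer, parameter :: ilen = 200   ! smaller than the original 1000, for a quick run
  integer :: a(ilen,ilen), b(ilen,ilen), c(ilen,ilen), d(ilen,ilen)
  integer :: i, j

  a = 1
  b = 2

  !$omp parallel shared(a, b, c, d) private(i, j)

  ! Explicit loops, split across threads by the do construct.
  !$omp do
  do i = 1, ilen
    do j = 1, ilen
      c(j,i) = a(j,i) + b(j,i) * 1.1
    end do
  end do
  !$omp end do

  ! Without workshare, every thread in the region would execute the whole
  ! array assignment redundantly. With it, the runtime may divide the
  ! assignment into units of work shared among the threads.
  !$omp workshare
  d = a + b * 1.1
  !$omp end workshare

  !$omp end parallel

  if (.not. all(c == d)) error stop "c and d differ"
  print *, "C same D? ", all(c == d)
end program workshare_demo
```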

Unclassifiable OpenMP directive in a Fortran program

I was trying to parallelize a Fortran program using OpenMP, with this code:
program pigreco
!----------------------------------------!
use OMP_LIB
implicit none
!----------------------------------------!
integer :: i
integer, parameter :: N = 100000
integer, parameter :: NCPU = 4
real*8 :: t0, t1
real :: h, totale, x, f
!----------------------------------------!
print '(a,2x,i15)', ' Number of intervals: ', N
totale = 0.0
h = 1. / N
call OMP_SET_NUM_THREADS(NCPU)
write(*, '(a,i10)') 'Numero di processori totali: ', NCPU
t0 = OMP_GET_WTIME()
!----------------------------------------!
#ifdef PARALLEL
!
print '(a)', "Scelta la versione parallela."
!
!$OMP PARALLEL DO PRIVATE(x, f) REDUCTION(+:totale)
!
do i = 1, N
x = (i - 0.5) * h
f = (4 * h) / (1 + x**2)
totale = totale + f
enddo
!$OMP END PARALLEL DO
!
#endif
!
t1 = OMP_GET_WTIME()
!
PRINT '(a,2x,f30.25)', ' Computed PI =', totale
PRINT '(a,2x,f30.25)', ' Total computational time =', t1 - t0
!
end program pigreco
When I then try to compile with the line: gfortran prova.F90 -fopenmp -D PARALLEL it gives me an error that says "unclassifiable OpenMP directive at (1)".
The problem is that you defined PARALLEL with the preprocessor, so instead of reading OMP PARALLEL DO, the compiler reads OMP 1 DO, which of course doesn't make sense. Change #ifdef PARALLEL to #ifdef RUNPARALLEL and -DPARALLEL to -DRUNPARALLEL, then the compiler gives no error.
Alternatively, you can use the fact that when compiling with OpenMP support the macro variable _OPENMP is defined automatically, so you could use #ifdef _OPENMP, and no -D flag.
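The _OPENMP approach can be sketched as follows. This is a reduced version of the program above, with assumed names (pi_openmp_macro, total) and double precision chosen for the sketch. It assumes the file is run through the preprocessor, for example by giving it a .F90 extension, and that the # lines start in the first column.

```fortran
program pi_openmp_macro
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  double precision :: h, total, x

  h = 1.0d0 / n
  total = 0.0d0

  ! _OPENMP is defined by the compiler itself when OpenMP is enabled,
  ! so no -D flag is needed and no directive keyword gets redefined.
#ifdef _OPENMP
  print '(a)', "Compiled with OpenMP support."
#else
  print '(a)', "Compiled without OpenMP; the loop runs serially."
#endif

  ! Without -fopenmp the directive below is an ordinary comment.
  !$omp parallel do private(x) reduction(+:total)
  do i = 1, n
    x = (i - 0.5d0) * h
    total = total + 4.0d0 * h / (1.0d0 + x**2)
  end do
  !$omp end parallel do

  if (abs(total - 3.141592653589793d0) > 1.0d-8) error stop "pi estimate is off"
  print '(a,f20.15)', " Computed PI = ", total
end program pi_openmp_macro
```

Because the directive line is never touched by a user-defined macro, the same source builds with or without -fopenmp.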

Can I speed this up any more?

I am interested in speeding up computation time for the subroutine compoundret which basically compounds a monthly return series over some holding period, say one month, three months, six months, etc.
I will be calling this subroutine from R as a dll. I have written a main function in the attached code snippet to get everything working in fortran.
subroutine compoundret(R_c, R, RF, horizons, Tn, N, M)
implicit none
! Arguments declarations
integer, intent(in) :: horizons(M), Tn, N, M
real*8, intent(in) :: RF(Tn), R(Tn, N, M)
real*8, intent(out) :: R_c(Tn, N, M)
! Intermediary Variables
integer :: t, j, k
real*8 :: RF_Temp(Tn, N, M)
R_c = 0.0
do t = 1, Tn
RF_Temp(t,:,:) = RF(t)
end do
!$acc data copyin(r(Tn,N,M), RF_Temp(Tn,N,M), horizons(M)), create(R_c(Tn,N,M))
!$acc parallel loop
do k = 1, M
do j = 1, N
do t = 1, Tn - horizons(k) + 1
R_c( t, j, k) = PRODUCT( 1 + R( t:t + horizons(k) - 1, j, k) + &
RF_Temp( t:t + horizons(k) - 1, j, k)) - &
PRODUCT(1+ RF_Temp( t:t + horizons(k) - 1, j, k))
end do
end do
end do
!$acc end parallel
!$acc update host(R_c)
!$acc end data
end subroutine compoundret
Program main
implicit none
real*8 :: df(1000,5000, 6)
real*8 :: retdata(size(df,1),size(df,2),size(df,3)),RF(size(df,1))
integer :: horizons(6), Tn, N, M
Tn = size(df, 1)
N = size(df, 2)
M = size(df, 3)
df = 0.001
RF = 0.001
horizons(:) = (/1,3,6,12,24,48/)
call compoundret(retdata,df,RF,horizons, Tn, N, M)
print*, retdata(1, 1, 1:M)
end program
My target platform is a compute 6.0 device (GTX 1060).
I'd recommend collapsing the two outer loops and then adding "!$acc loop vector" on the "t" loop.
!$acc parallel loop collapse(2)
do k = 1, M
do j = 1, N
!$acc loop vector
do t = 1, Tn - horizons(k) + 1
R_c( t, j, k) = PRODUCT( 1 + R( t:t + horizons(k) - 1, j, k) + &
RF_Temp( t:t + horizons(k) - 1, j, k)) - &
PRODUCT(1+ RF_Temp( t:t + horizons(k) - 1, j, k))
end do
end do
end do
!$acc end parallel
Right now, you're only parallelizing the outer loop and since "M" is quite small, you're underutilizing the GPU.
Note that the PGI 2017 compilers have a bug which will prevent you from using OpenACC within a DLL (shared objects on Linux are fine). We're working on fixing this issue in the 18.1 compilers. Your current options are to either wait until 18.1 is released early next year or go back to the 16.10 compilers. If you're using the PGI Community edition, you'll need to wait for the 18.4 compilers in April.
Also, putting OpenACC in shared or dynamic libraries requires the use of the "-ta=tesla:nordc" option.

parallel do mistake in fortran

program main
use omp_lib
implicit none
integer :: n=8
integer :: i, j, myid, a(8, 8), b, c(8)
! Generate a 8*8 array A
!$omp parallel default(none), private(i, myid), &
!$omp shared(a, n)
myid = omp_get_thread_num()+1
do i = 1, n
a(i, myid) = i*myid
end do
!$omp end parallel
! Array A
print*, 'Array A is'
do i = 1, n
print*, a(:, i)
end do
! Sum of array A
b = 0
!$omp parallel reduction(+:b), shared(a, n), private(i, myid)
myid = omp_get_thread_num()+1
do i = 1, n
b = b + a(i, myid)
end do
!$omp end parallel
print*, 'Sum of array A by reduction is ', b
b = 0
c = 0
!$omp parallel do
do i = 1, n
do j = 1, n
c(i) = c(i) + a(j, i)
end do
end do
!$omp end parallel do
print*, 'Sum of array A by using parallel do is', sum(c)
!$omp parallel do
do i = 1, n
do j = 1, n
b = b + a(j, i)
end do
end do
!$omp end parallel do
print*, 'Sum of array A by using parallel do in another way is', b
end program main
I wrote the Fortran code above to sum all elements of an 8*8 array with OpenMP in three different ways. The first uses a reduction and works. In the second, I created a one-dimensional array with 8 elements, summed each column in a parallel region, and then added the column sums; this works as well. In the third, I used a single integer to accumulate every element of the array inside a parallel do region. That result is incorrect and varies from run to run. I don't understand why this happens. Is it because I didn't specify public and private, or because the variable b is overwritten during the procedure?
There is a race condition on b in your third scenario: several threads are reading and writing the same variable without proper synchronization or privatization.
Note that you don't have a race condition in the second scenario: each thread is updating some data (i.e. c(i)) that no one else is accessing.
Finally, some solutions to your last scenario:
Add the reduction(+:b) clause to the !$omp parallel do directive
Add an !$omp atomic directive before the b = b + a(j,i) statement
You can implement a manual privatization
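The first two options can be sketched as below. This is a minimal, self-contained variant rather than the original program: the array is filled serially so that the result does not depend on the number of threads, and the names sum_demo, b_red and b_atomic are introduced only for this illustration.

```fortran
program sum_demo
  implicit none
  integer, parameter :: n = 8
  integer :: a(n,n), b_red, b_atomic, i, j

  ! Fill a serially so that a(j,i) = j*i; the exact sum is 36*36 = 1296.
  do i = 1, n
    do j = 1, n
      a(j,i) = j * i
    end do
  end do

  ! Option 1: reduction. Each thread accumulates into its own private
  ! copy of b_red, and the copies are combined at the end of the loop.
  b_red = 0
  !$omp parallel do private(j) reduction(+:b_red)
  do i = 1, n
    do j = 1, n
      b_red = b_red + a(j,i)
    end do
  end do
  !$omp end parallel do

  ! Option 2: atomic. b_atomic stays shared, but every update is
  ! performed atomically, which is correct and usually slower.
  b_atomic = 0
  !$omp parallel do private(j)
  do i = 1, n
    do j = 1, n
      !$omp atomic
      b_atomic = b_atomic + a(j,i)
    end do
  end do
  !$omp end parallel do

  if (b_red /= 1296 .or. b_atomic /= 1296) error stop "wrong sum"
  print *, "reduction:", b_red, " atomic:", b_atomic
end program sum_demo
```

The reduction is generally the better choice here, since threads only synchronize once when the private copies are combined.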

Nested Loop Optimization in OpenMP

I can't get the correct output once OpenMP is applied. Is there any way to get this right?
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do i=1,size1
do j=1, size1
outcountb = outcountb + 1
outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
enddo
enddo
!$OMP END PARALLEL DO
Just swap your loops and everything will be fine:
!$OMP PARALLEL DO SHARED(outmtresult,inpa,inpb,dynindexlist) PRIVATE(i,j) REDUCTION(+:outcountb)
do j=1,size1 ! <-- Swap i and
do i=1, size1 ! j here
outcountb = outcountb + 1
outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
enddo
enddo
!$OMP END PARALLEL DO
In your example, multiple threads write to the same memory address outmtresult(j), since you parallelize the do i loop.
By swapping the loops, you parallelize over do j, so each element outmtresult(j) is written by a single thread and no two threads write to the same destination concurrently.
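A self-contained sketch of the swapped-loop version, checked against a serial reference, might look like this. The input values, alpha1 = 0.5, size1 = 200 and the program name are hypothetical, since the original snippet does not show them.

```fortran
program swap_loops_demo
  implicit none
  integer, parameter :: size1 = 200    ! hypothetical size
  real, parameter :: alpha1 = 0.5      ! hypothetical value
  real :: inpa(size1), inpb(size1), dynindexlist(size1)
  real :: outmtresult(size1), expected(size1)
  integer :: i, j, outcountb

  ! Hypothetical input data, kept small so tan() stays well behaved.
  do j = 1, size1
    inpa(j) = 0.001 * j
    inpb(j) = 0.002 * j
    dynindexlist(j) = real(j)
  end do

  ! Serial reference: in the original loop order the last pass
  ! (i = size1) determines the final value of each element.
  do j = 1, size1
    expected(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(size1)
  end do

  ! Parallel over j: each thread owns a distinct set of outmtresult(j).
  outcountb = 0
  !$omp parallel do shared(outmtresult, inpa, inpb, dynindexlist) &
  !$omp private(i, j) reduction(+:outcountb)
  do j = 1, size1
    do i = 1, size1
      outcountb = outcountb + 1
      outmtresult(j) = tan(inpa(j) + inpb(j)) + alpha1 + dynindexlist(i)
    end do
  end do
  !$omp end parallel do

  if (outcountb /= size1 * size1) error stop "wrong iteration count"
  if (any(abs(outmtresult - expected) > 1.0e-4 * abs(expected))) &
    error stop "wrong result"
  print *, "iterations:", outcountb
end program swap_loops_demo
```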