Nesting OMP DO directives in Fortran

I'm having problems trying to nest an OMP DO directive inside another OMP DO directive in Fortran.
Here's the code:
DO in=2,n_niveles
  allocate(cvalor(2,npuntosp(in),npuntost(in)))
  !allocate(avalor(2,npuntosp(in-1),npuntost(in-1)))
  allocate(valor_t2(npuntost(in),npuntosp(in-1),2))
  !$OMP PARALLEL NUM_THREADS(hilos) DEFAULT(PRIVATE) FIRSTPRIVATE(n_niveles,in) SHARED(npuntosp,npuntost,cubos,central_reg,sumazm1n,expo,mphi,mtheta)
  !$OMP DO SCHEDULE(STATIC)
  DO aux=1,cubos(in-1)%ncubos_nivel
    ...
    (some code here)
    ...
    !$OMP PARALLEL NUM_THREADS(hilos) DEFAULT(PRIVATE) FIRSTPRIVATE(cuboj,in) SHARED(valor_t2,cvalor)
    !$OMP DO SCHEDULE(STATIC)
    do i=1,npuntost(in)
      val=mtheta(in-1)%inicio(i,1)
      do jj=val,val+mtheta(in-1)%inicio(i,2)
        do k=1,npuntosp(in-1)
          valor_t2(i,k,1)=valor_t2(i,k,1)+mtheta(in-1)%matriz(i,jj)*sumazm1n(in-1)%region(cuboj)%valor(1,k,jj)
          valor_t2(i,k,2)=valor_t2(i,k,2)+mtheta(in-1)%matriz(i,jj)*sumazm1n(in-1)%region(cuboj)%valor(2,k,jj)
        end do
      end do
      do k=1,npuntosp(in)
        val=mphi(in-1)%inicio(k,1)
        do jj=val,val+mphi(in-1)%inicio(k,2)
          cvalor(1,k,i)=cvalor(1,k,i)+valor_t2(i,jj,1)*mphi(in-1)%matriz(jj,k)
          cvalor(2,k,i)=cvalor(2,k,i)+valor_t2(i,jj,2)*mphi(in-1)%matriz(jj,k)
        end do
      end do
    end do
    !$OMP END DO
    !$OMP END PARALLEL
    ...
    (some code here)
    ...
  END DO
  !$OMP END DO
  !$OMP END PARALLEL
  deallocate(cvalor)
  deallocate(valor_t2)
END DO
When the code is executed, an access violation exception occurs inside the second OpenMP parallel region. Sometimes, instead of that exception, I get an overflow at the variable valor_t2.
Maybe OpenMP does not support this kind of parallelization, but I've searched the net and haven't found anything about it. I know that OpenMP supports nesting one OMP PARALLEL directive inside another and I know how that works, but this problem is giving me a headache.
Any ideas about what is happening?
Thank you so much!

You're going to want to use the COLLAPSE clause on the DO directive at the top level. See the link below for information:
https://computing.llnl.gov/tutorials/openMP/
Note that COLLAPSE requires the loops to be perfectly nested: this will only work if the code represented by (some code here) can be moved out from between the two loop headers.
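A hedged sketch of what a collapsed loop nest looks like (the bounds ncubos and npts are placeholders, not taken from the code above):
!$OMP PARALLEL DO COLLAPSE(2) NUM_THREADS(hilos) DEFAULT(PRIVATE) &
!$OMP& SHARED(cvalor,valor_t2) FIRSTPRIVATE(in,ncubos,npts)
DO aux=1,ncubos                 ! ncubos is a placeholder bound
   DO i=1,npts                  ! npts is a placeholder bound
      ! loop body goes here; COLLAPSE(2) fuses both loops into a
      ! single iteration space, so no statements may appear between
      ! the two DO headers or between the two END DOs
   END DO
END DO
!$OMP END PARALLEL DO
This gives one team ncubos*npts iterations to share, instead of starting a second team inside every outer iteration.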

Is there a way to completely stop all calculations on a thread?

Explanation of code and approach:
There are various mathematical methods (Fortran subroutines) for solving for a variable y. Each method is sequential and runs on a single thread. The speed of each method depends on unknown conditions (i.e. it is a no-free-lunch situation and I do not know in advance which method will be fastest). Therefore, the approach is to run each method on a separate thread, and once one method has found the solution, the calculations on the other threads should stop (the results are required for operations after the parallel sections region).
!$omp parallel sections lastprivate(x, y)
!$omp section
call method_1_for_solving_y(x)
!$omp cancel sections
!$omp section
call method_2_for_solving_y(x)
!$omp cancel sections
. . .
!$omp section
call method_z_for_solving_y(x)
!$omp cancel sections
!$omp end parallel sections
The question:
The !$omp cancel sections construct does not completely stop all operations on the threads that have not yet found the solution. Is there a way to completely stop calculations on those threads?
Any additional advice or possible other approaches would be appreciated.
Regards.
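One point worth knowing here: OpenMP cancellation is cooperative. It only takes effect if the OMP_CANCELLATION environment variable is set to true, and a thread only honours a pending cancel when it reaches a cancellation point. Since each method is a single long subroutine call with no cancellation points inside it, the other threads simply run to completion. A common workaround is a shared abort flag that each method polls; a minimal sketch, assuming the method signatures can be changed to accept the flag (done_flag and these signatures are placeholders, not the original API):
logical :: done_flag
done_flag = .false.
!$omp parallel sections lastprivate(x, y)
!$omp section
call method_1_for_solving_y(x, done_flag)
!$omp section
call method_2_for_solving_y(x, done_flag)
!$omp end parallel sections
! Inside each method, the main iteration loop polls the flag:
!    do while (.not. converged)
!       if (done_flag) return      ! another thread already finished
!       ... one iteration of the method ...
!    end do
!    done_flag = .true.            ! tell the other threads to stop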

OpenMP accumulate array inside nested parallelism

I am converting an existing application to run with multiple threads, using OpenMP with nested parallelism for this purpose.
The code looks like this (Fortran):
!$omp parallel do private(array) ...
DO i=1...
   ...
C ---- plenty of code ----
   ...
   !$omp parallel do private(z1,z2,z3,value)...
   DO j=1...
      ...
      !$omp critical
      DO z1=..
         DO z2=..
            DO z3=..
               ...
               value = ...
               array(z1,z2,z3) = array(z1,z2,z3) + value
            END DO
         END DO
      END DO
      !$omp end critical
   END DO
END DO
I added an OMP CRITICAL because the accumulation was not thread safe, but this is causing threads from other teams to wait unnecessarily.
What is the best way to parallelize this? Is there any way to make a reduction work in this case?
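One thing that may help: in Fortran, OpenMP allows the reduction clause on array variables, which removes the need for the critical section; each thread accumulates into its own private copy of array, and the copies are combined at the end of the loop. A minimal sketch, assuming array has known bounds (the bounds nj, n1, n2, n3 and the loop body are placeholders):
!$omp parallel do private(z1,z2,z3,value) reduction(+:array)
DO j=1,nj
   DO z1=1,n1
      DO z2=1,n2
         DO z3=1,n3
            value = ...            ! placeholder for the real computation
            array(z1,z2,z3) = array(z1,z2,z3) + value
         END DO
      END DO
   END DO
END DO
!$omp end parallel do
Note that the outer loop can keep its private(array) exactly as in the original; the inner reduction then combines the inner team's copies into the outer thread's private array.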

OpenMP: Have a MASTER construct inside parallel do

I have Fortran code that looks like this:
!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(var1, var2, var3, numberOfCalculationsPerformed)
do ix = 1,nx
  ! Do parallel work
  do iy = 1,ny
    ! Do a lot of work....
    !$OMP ATOMIC
    numberOfCalculationsPerformed = numberOfCalculationsPerformed+1
    !$OMP END ATOMIC
    !$OMP MASTER
    ! Report progress
    call progressCallBack(numberOfCalculationsPerformed/totalNCalculations)
    !$OMP END MASTER
  end do
end do
When I try to compile it, the compiler reports
error #7102: An OpenMP* MASTER directive is not permitted in the
dynamic extent of a DO, PARALLEL DO, SECTIONS, PARALLEL SECTIONS, or
SINGLE directive.
I do not understand this. I have tried modifying the parallel do construct to this:
!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(var1, var2, var3, numberOfCalculationsPerformed), &
!$OMP& SCHEDULE(STATIC)
(thinking it had something to do with the scheduling), but that did nothing to change the error.
Does anyone know what I am getting wrong? Is it simply impossible to use MASTER inside a parallel do construct? If so, are there alternatives?
Edit:
Using
!$OMP SINGLE
!$OMP END SINGLE
instead of the MASTER construct yields the same result (error message).
P.S. I only need one of the threads to execute progressCallBack.
The question is a bit old, but since I recently stumbled across the same issue, I wanted to share a simple solution. The error occurs because a MASTER (or SINGLE) region may not be nested inside a worksharing region such as the parallel do. The workaround is to formulate an if clause that only evaluates to true for one of the threads. This is easily achieved by querying the current thread number: requiring it to be zero makes the clause true for exactly one thread of the team:
! Note: OMP_GET_THREAD_NUM comes from the omp_lib module (use omp_lib)
!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(var1, var2, var3, numberOfCalculationsPerformed)
do ix = 1,nx
  ! Do parallel work
  do iy = 1,ny
    ! Do a lot of work....
    !$OMP ATOMIC
    numberOfCalculationsPerformed = numberOfCalculationsPerformed+1
    !$OMP END ATOMIC
    if (OMP_GET_THREAD_NUM() == 0) then
      ! Report progress
      call progressCallBack(numberOfCalculationsPerformed/totalNCalculations)
    end if
  end do
end do
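One caveat with this pattern (an addition, not part of the original answer): the plain read of numberOfCalculationsPerformed in the callback argument races with the atomic updates from the other threads. A hedged sketch of a safer read, where localCount is a private variable introduced here purely for illustration:
if (OMP_GET_THREAD_NUM() == 0) then
  !$OMP ATOMIC READ
  localCount = numberOfCalculationsPerformed
  ! Report progress based on the consistently read value
  call progressCallBack(localCount/totalNCalculations)
end if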

Summation error in OpenMP Fortran

I am trying to sum up a variable with OpenMP, using the code given below.
normr=0.0
!$omp parallel default(private) shared(nelem,normr,cell_data,alphar,betar,k)
!$omp do REDUCTION(+:normr)
do ii=1,nelem
  nnodese=cell_data(ii)%num_vertex
  pe=cell_data(ii)%porder
  ndofe=cell_data(ii)%ndof
  num_neighboure=cell_data(ii)%num_neighbour
  be=>cell_data(ii)%Force
  Ke=>cell_data(ii)%K
  Me=>cell_data(ii)%M
  pressuree=>cell_data(ii)%p
  Rese=>cell_data(ii)%Res
  neighbour_indexe=>cell_data(ii)%neighbour_index(:)
  Rese(:)=be(:)
  Rese(:)=Rese(:)-cmplx(-1.0,1.0*alphar/k)*matmul(Me(:,:),pressuree(:))
  Rese(:)=Rese(:)-cmplx(1.0,1.0*k*betar)*matmul(Ke(:,:),pressuree(:))
  do jj=1,num_neighboure
    nbeindex=neighbour_indexe(jj)
    Knbe=>cell_data(ii)%neighbour(jj)%Knb
    pressurenb=>cell_data(nbeindex)%p
    ndofnb=cell_data(nbeindex)%ndof
    Rese(:)=Rese(:)-cmplx(1.0,1.0*k*betar)*matmul(Knbe(:,:),pressurenb(:))
    nullify(pressurenb)
    nullify(Knbe)
  end do
  normr=normr+dot_product(Rese(:),Rese(:))
  nullify(pressuree)
  nullify(Ke)
  nullify(Me)
  nullify(Rese)
  nullify(neighbour_indexe)
  nullify(be)
end do
!$omp end do
!$omp end parallel
The result for the summed variable, normr, is different for the parallel and the sequential code. In one of the posts I have seen that the inner loop variable should be defined inside the parallel construct (why, I don't know). I also changed the pointers to locally allocated variables, but the result did not change. normr is a saved real variable.
Any suggestions and help will be appreciated.
Best Regards,
Gokmen
normr can be different for the parallel and the sequential code because the summation does not take place in the same order: floating-point addition is not associative, so reordering the partial sums changes the rounding. Hence, the difference need not be an error and is to be expected from the reduction operation.
Not being an error does not necessarily mean not being a problem. One way around this would be to move the summation out of the parallel loop:
! keep_dot_product is a shared real array of length nelem, allocated beforehand
!$omp parallel default(private) shared(... keep_dot_product)
!$omp do
do ii=1,nelem
  ! ...
  keep_dot_product(ii) = dot_product(Rese(:),Rese(:))
  ! ...
end do
!$omp end do
!$omp end parallel
normr = sum(keep_dot_product)
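Because each partial result is stored at its own index, sum(keep_dot_product) always adds the values in the same order, so the result is reproducible from run to run. A hedged sketch of the surrounding bookkeeping, assuming the names above:
real, allocatable :: keep_dot_product(:)
allocate(keep_dot_product(nelem))
keep_dot_product = 0.0
! ... run the parallel loop shown above ...
normr = sum(keep_dot_product)
deallocate(keep_dot_product)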

Strange gfortran compilation error when adding OpenMP directives

I have a legacy Fortran source file named pot.f to which I need to apply OpenMP for parallelization, as shown below, but I get error messages about an unexpected END statement, etc. When I comment out the !$OMP lines by adding an additional ! in the first column, there are no errors.
It is really weird to me. Can anybody tell me what went wrong?
subroutine pot_osc(rvp,R_pot,e_pot,pe_pot,ftmp,gtmp,vtmp,natoms)
implicit none
include 'sizes.h'
include 'constants.h'
include 'omp_lib.h'
double precision ftmp(maxatoms,3),gtmp(3),R_pot(maxatoms,3)
!$OMP PARALLEL WORKSHARE SHARED(gtmp,ftmp)
!$OMP PARALLEL NUM_THREADS(16)
gtmp = 0d0
ftmp = 0d0
!$OMP END PARALLEL WORKSHARE
return
end
subroutine pot_asym(rvp,vtmp)
implicit none
include 'constants.h'
return
end
Error messages:
end
1
Error: Unexpected END statement at (1)
subroutine pot_asym(rvp,vtmp)
1
Error: Unclassifiable statement at (1)
You start a second parallel region with the second OpenMP directive, and it is never terminated by an end parallel. The OpenMP directive should instead read
!$OMP PARALLEL WORKSHARE SHARED(gtmp,ftmp) NUM_THREADS(16)
gtmp = 0d0
ftmp = 0d0
!$OMP END PARALLEL WORKSHARE
or, if you would like to keep the line break, use
!$OMP PARALLEL WORKSHARE SHARED(gtmp,ftmp) &
!$OMP NUM_THREADS(16)
gtmp = 0d0
ftmp = 0d0
!$OMP END PARALLEL WORKSHARE
In the past, I experienced some problems with exactly this kind of initialization. It seems that, when compiled with gfortran, the master thread did all the work. Even worse, by means of the "first-touch principle", the whole array was located in the memory associated with the first thread; on our ccNUMA machine this led to a huge slowdown.
To solve this, I used explicit loops for the initialization:
!$OMP PARALLEL DO SHARED(gtmp,ftmp) NUM_THREADS(16)
do i=1,maxatoms
  ftmp(i,:) = 0d0
enddo
!$OMP END PARALLEL DO
! No need to do three elements in parallel
gtmp = 0d0
I don't know whether they have fixed this problem, but I have used this way of initializing arrays in shared memory ever since.