Openmp nested parallelism use available threads - fortran

So, I have this simple Fortran do loop and inside that loop a couple of subroutines are called. I have
Made the do loop parallel with OpenMP, like this
!$omp parallel do
do i=1,n
call a()
call b()
enddo
!$omp end parallel do
Now most of the times the number of iterations in the loop is
less compared to the number of processor/threads available and the subroutines that are called inside the
loop can be called in parallel. So, is there a way to call the subroutines in parallel inside the parallel
do loop ? I have tried with task like this
!$omp parallel do
do i=1,n
!$omp task
call a(i , j )
!$omp end task
!$omp task
call b(i, k)
!$omp end task
!$omp taskwait
enddo
!$omp end parallel do
But this shows some error with segmentation fault. Is there any way to achieve this.
UPDATE:
So, I found out the main reason for the segmentation fault is coming from the fftw library. Lets consider a dummy program
program name
!$use omp_lib
implicit real*8(a-h,p-z)
call system_clock(count_rate=irate)
call system_clock(it1)
!$ call omp_set_nested(.true.)
!$omp parallel do
do i =1,5
call test(i)
print *, i
enddo
!$omp end parallel do
call system_clock(it2)
print *, (it2-it1)/real(irate, kind=8)
end program name
subroutine test(ii)
! just a dummy subroutine for heavy computation
implicit real*8(a-h,p-z)
do j=1,40000
!$omp task
do k=1,40000
x = exp(sqrt(sqrt(2.0d0*ii**3)**2))
enddo
!$omp end task
enddo
end subroutine
This program works exactly what I wants and using the task directives, uses the remaining threads and improves the performance. Now lets consider another dummy program but with fftw, similar to what I'm working.
program name
!$use omp_lib
implicit real*8(a-h,p-z)
integer, parameter :: n=8192*8
complex(kind=8) :: arr(n)
real(kind=8) :: tmp1(n), tmp2(n)
integer(kind=8) :: pF
integer :: i
call system_clock(count_rate=irate)
call dfftw_plan_dft_1d(pF,n,arr,arr,-1,0) ! forward
call system_clock(it1)
!$ call omp_set_nested(.true.)
!$omp parallel do private(arr)
do i =1,5
call random_number(tmp1)
call random_number(tmp2)
arr = cmplx(tmp1, tmp2, kind=8)
call test(pF, arr)
print *, i
enddo
!$omp end parallel do
call system_clock(it2)
print *, (it2-it1)/real(irate, kind=8)
end program name
subroutine test(pF, arr)
implicit real*8(a-h,p-z)
complex(kind=8) :: arr(:)
integer(kind=8) :: pF
do j=1,100
!$omp task private(arr)
do k=1, 100
call dfftw_execute_dft(pF, arr, arr)
enddo
!$omp end task
enddo
end subroutine
Now, this throws the segmentation fault. (NOTE: I have no random numer call in my actual program, they are here just for a dummy purpose). I have checked http://www.fftw.org/fftw3_doc/Thread-safety.html and fftw_execute is thread safe and the program works without the task directives. But with the task it throws error. Anyone knows how to fix this ?

Sigh, yet another example of why !$omp do parallel is a bad idea ... I really do think it is best to clearly separate the thread creation and worksharing phases.
As Vladimir says in the comments you haven't provided nearly enough detail to tell why you are getting a segmentation fault. However you seem to have a few misconceptions about OpenMP which I can try to address.
Firstly a very quick and dirty way to achieve what you want and avoiding any extra OpenMP directives is
!$omp parallel default( none ) private( i ) shared( n ) ! Create threads
!$omp do ! Now share out the work
Do i = 1, 2 * n
If( Mod( i, 2 ) == 1 ) Then
Call a
Else
Call b
End Do
!$omp end do
!$omp end parallel
However if you want to use tasks you're probably not doing it the easiest way if all calls to a and b are completely independent. In that case remember that a new task is created whenever ANY thread hits a !$omp task, and that that task can be executed by any thread, not just the one that created it. Following that logic something like
!$omp parallel default( none ) private( i ) shared( n ) ! Crate the threads
!$omp single
Do i = 1, n
!$omp task
Call a
!$omp end task
!$omp task
call b
!$omp end task
end do
!$omp end single
!$omp end parallel
is what you want - you use one thread to create the list of tasks, and then (or more probably while the list is being created) all the available threads will execute them, each task being taken by the next available thread. Note I have also missed out the taskwait directive as from your description I'm not sure why you think you need it as I can see no need for synchronisation at that point.

Related

Summation error in openmp fortran

I am trying to sum up of a variable with openmp with code given below.
normr=0.0
!$omp parallel default(private) shared(nelem,normr,cell_data,alphar,betar,k)
!$omp do REDUCTION(+:normr)
do ii=1,nelem
nnodese=cell_data(ii)%num_vertex
pe=cell_data(ii)%porder
ndofe=cell_data(ii)%ndof
num_neighboure=cell_data(ii)%num_neighbour
be=>cell_data(ii)%Force
Ke=>cell_data(ii)%K
Me=>cell_data(ii)%M
pressuree=>cell_data(ii)%p
Rese=>cell_data(ii)%Res
neighbour_indexe=>cell_data(ii)%neighbour_index(:)
Rese(:)=be(:)
Rese(:)=Rese(:)-cmplx(-1.0,1.0*alphar/k)*matmul(Me(:,:),pressuree(:))
Rese(:)=Rese(:)-cmplx(1.0,1.0*k*betar)*matmul(Ke(:,:),pressuree(:))
do jj=1,num_neighboure
nbeindex=neighbour_indexe(jj)
Knbe=>cell_data(ii)%neighbour(jj)%Knb
pressurenb=>cell_data(nbeindex)%p
ndofnb=cell_data(nbeindex)%ndof
Rese(:)=Rese(:)-cmplx(1.0,1.0*k*betar)*matmul(Knbe(:,:),pressurenb(:))
nullify(pressurenb)
nullify(Knbe)
end do
normr=normr+dot_product(Rese(:),Rese(:))
nullify(pressuree)
nullify(Ke)
nullify(Me)
nullify(Rese)
nullify(neighbour_indexe)
nullify(be)
end do
!$omp end do
!$omp end parallel
The result for summed variable, normr, is different for parallel and sequantial code. In one of the posts I have seen that inner loop variable should be defined inside the parallel construct(Why I don't know). I also changed the pointers to locall allocated variables but result did not changed. normr is a saved real variable.
Any suggestions and helps will be appreciated.
Best Regards,
Gokmen
normr can be different for the parallel and the sequential code, because the summation does not take place in the same order. Hence, the difference does not need to be an error and can be expected from the reduction operation.
Not being an error does not necessary mean not being a problem. One way around this would be to move the summation out of the parallel loop:
!$omp parallel default(private) shared(... keep_dot_product)
!$OMP do
do ii=1,nelem
! ...
keep_dot_product(ii) = dot_product(Rese(:),Rese(:))
! ...
end do
!$omp end do
!$omp end parallel
normr = sum(keep_dot_product)

Strange gfortran compilation error when adding Openmp directives

I have legacy fortran source file named pot.f,
which I need to apply OpenMP to parallel as shown below,but I can error messages about unexpected end state etc. But when I comment out $OMP lines by adding additional ! in the first column, there are not errors.
It is really weird to me. Can anybody tell me what went wrong?
subroutine pot_osc(rvp,R_pot,e_pot,pe_pot,ftmp,gtmp,vtmp,natoms)
implicit none
include 'sizes.h'
include 'constants.h'
include 'omp_lib.h'
double precision ftmp(maxatoms,3),gtmp(3),R_pot(maxatoms,3)
!$OMP PARALLEL WORKSHARE SHARED(gtmp,ftmp)
!$OMP PARALLEL NUM_THREADS(16)
gtmp = 0d0
ftmp = 0d0
!$OMP END PARALLEL WORKSHARE
return
end
subroutine pot_asym(rvp,vtmp)
implicit none
include 'constants.h'
return
end
Error messages:
end
1
Error: Unexpected END statement at (1)
subroutine pot_asym(rvp,vtmp)
1
Error: Unclassifiable statement at (1)
You start a second parallel section in the second OpenMP directive, which is not terminated by an end parallel. So the OpenMP directive should read
!$OMP PARALLEL WORKSHARE SHARED(gtmp,ftmp) NUM_THREADS(16)
gtmp = 0d0
ftmp = 0d0
!$OMP END PARALLEL WORKSHARE
or if you like to keep the line break use
!$OMP PARALLEL WORKSHARE SHARED(gtmp,ftmp) &
!$OMP NUM_THREADS(16)
gtmp = 0d0
ftmp = 0d0
!$OMP END PARALLEL WORKSHARE
In the past, I experienced some problems with exactly this kind of initialization. It seems that when compiled with gfortran the master thread did all the work. Even worse, by means of the "first-stouch principle", the whole array was located in the memory associated with the first thread. On our CCNUMA machine this lead to a huge slowdown.
To solve this I used explicit loops to initialize:
!$OMP PARALLEL DO SHARED(gtmp,ftmp) NUM_THREADS(16)
do i=1,maxatoms
ftmp(i,:) = 0d0
enddo
!$OMP END PARALLEL DO
! No need to do three elements in parallel
gtmp = 0d0
I don't know whether they fixed this problem, but I use this way of initialization for arrays in shared memory since then.

Openmp Fortran Subroutine

I am trying to parallelize a subroutine using Openmp.
The subroutine contains a successive over relaxation loop which runs on the total
error which is a shared variable. Now, when I parralelize the part where I call the
subroutine in the main program, it makes the error a private variable and then I can't make it explicitly a shared variable in the main program.
I am pasting the code for reference.
program test
!$omp parallel
call sub()
!$omp end parallel
end program test
subroutine sub()
do while(totalerror.ge.0.0001.and.sor.lt.10000)
totalerror=0.0
sor=sor+1
error=0.0
!$OMP DO REDUCTION(+:toterror) REDUCTION(MAX:error)
! shared (vorticity,strmfn,toterror,error,guess) PRIVATE (i,j,t1,t2)
do i=1,nx
do j=1,ny
guess(i,j)=0.25*((h**2.)*vorticity(i,j)+strmfn(i+1,j)+strmfn(i- 1,j)+strmfn(i,j+1)+strmfn(i,j-1))
totalerror = totalerror + error
error = max(abs(strmfn(`enter code here`i,j) - guess(i,j)),error)
strmfn(i,j)= strmfn(i,j) + omega*(guess(i,j)-strmfn(i,j))
enddo
enddo
!$OMP END DO
enddo
Any help would be appreciated.
toterror and error shouldn't be in the shared clause since they are in the reduction. If you need shared versions, copy them to different variables.

Thread issues when writing to files with OpenMP in Fortran

The number of files that are getting written is always less than the number of threads. Logically for me, when I can have 4 threads and the CPU is working at 400%, I was expecting the number of files to be 4 (one each corresponding to every single thread). I don't know if there is a problem with my code or this is how it is supposed to work. The code is as follows:
!!!!!!!! module
module common
use iso_fortran_env
implicit none
integer,parameter:: dp=real64
real(dp):: aa,bb
contains
subroutine evolve(y,yevl)
implicit none
integer(dp),parameter:: id=2
real(dp),intent(in):: y(id)
real(dp),intent(out):: yevl(id)
yevl(1)=y(2)+1.d0-aa*y(1)**2
yevl(2)=bb*y(1)
end subroutine evolve
end module common
use common
implicit none
integer(dp):: iii,iter,i
integer(dp),parameter:: id=2
real(dp),allocatable:: y(:),yt(:)
integer(dp):: OMP_GET_THREAD_NUM, IXD
allocate(y(id)); allocate(yt(id)); y=0.d0; yt=0.d0; bb=0.3d0
!$OMP PARALLEL PRIVATE(iii,iter,y,i,yt) SHARED(bb)
IXD=OMP_GET_THREAD_NUM()
!$OMP DO
do iii=1,20000; print*,iii !! EXPECTED THREADS TO BE OF 5000 ITERATIONS EACH
aa=1.d0+dfloat(iii-1)*0.4d0/80000.d0
loop1: do iter=1,10 !! THE INITIAL CONDITION LOOP
call random_number(y)!! RANDOM INITIALIZATION OF THE VARIABLE
loop2: do i=1,70000 !! ITERATION OF THE SYSTEM
call evolve(y,yt)
y=yt
enddo loop2 !! END OF SYSTEM ITERATION
write(IXD+1,*)aa,yt !!! WRITING FILE CORRESPONDING TO EACH THREAD
enddo loop1 !!INITIAL CONDITION ITERATION DONE
enddo
!$OMP ENDDO
!$OMP END PARALLEL
end
Is this behavior resulting from some race issue in the code? The code compiles and executes just fine without any warnings or errors with ifort version 13.1.0 on ubuntu. Thanks a bunch for any comments or suggestions.
The variable IXD should be explicitely declared as private to make sure every thread has an own copy of it. Changing the line(s)
!$OMP PARALLEL PRIVATE(iii,iter,y,i,yt) SHARED(bb)
IXD=OMP_GET_THREAD_NUM()
to
!$OMP PARALLEL PRIVATE(iii,iter,y,i,yt,ixd) SHARED(bb)
IXD=OMP_GET_THREAD_NUM()
solves the problem.

OpenMP and shared variable in Fortran which are not shared

I encounter a problem with OpenMP and shared variables I cannot understand. Everything I do is in Fortran 90/95.
Here is my problem: I have a parallel region defined in my main program, with the clause DEFAULT(SHARED), in which I call a subroutine that does some computation. I have a local variable (an array) I allocate and on which I do the computations. I was expecting this array to be shared (because of the DEFAULT(SHARED) clause), but it seems that it is not the case.
Here is an example of what I am trying to do and that reproduce the error I get:
program main
!$ use OMP_LIB
implicit none
integer, parameter :: nx=10, ny=10
real(8), dimension(:,:), allocatable :: array
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP SINGLE
allocate(array(nx,ny))
!$OMP END SINGLE
!$OMP WORKSHARE
array = 1.
!$OMP END WORKSHARE
call compute(array,nx,ny)
!$OMP SINGLE
deallocate(array)
!$OMP END SINGLE
!$OMP END PARALLEL
contains
!=============================================================================
! SUBROUTINES
!=============================================================================
subroutine compute(array, nx, ny)
!$ use OMP_LIB
implicit none
real(8), dimension(nx,ny) :: array
integer :: nx, ny
real(8), dimension(:,:), allocatable :: q
integer :: i, j
!$OMP SINGLE
allocate(q(nx,ny))
!$OMP END SINGLE
!$OMP WORKSHARE
q = 0.
!$OMP END WORKSHARE
print*, 'q before: ', q(1,1)
!$OMP DO SCHEDULE(RUNTIME)
do j = 1, ny
do i = 1, nx
if(mod(i,j).eq.0) then
q(i,j) = array(i,j)*2.
else
q(i,j) = array(i,j)*0.5
endif
end do
end do
!$OMP END DO
print*, 'q after: ', q(1,1)
!$OMP SINGLE
deallocate(q)
!$OMP END SINGLE
end subroutine compute
!=============================================================================
end program main
When I execute it like that, I get a segmentation fault, because the local array q is allocated on one thread but not on the others, and when the others try to access it in memory, it crashes.
If I get rid of the SINGLE region the local array q is allocated (though sometimes it crashes, which make sense, if different threads try to allocate it whereas it is already the case (and actually it puzzles me why it does not crash everytime)) but then it is clearly as if the array q is private (therefore one thread returns me the expected value, whereas the others return me something else).
It really puzzled me why the q array is not shared although I declared my parallel region with the clause DEFAULT(SHARED). And since I am in an orphaned subroutine, I cannot declare explicitely q as shared, since it is known only in the subroutine compute... I am stuck with this problem so far, I could not find a workaround.
Is it normal? Should I expect this behaviour? Is there a workaround? Do I miss something obvious?
Any help would be highly appreciated!
q is an entity that is "inside a region but not inside a construct" in terms of OpenMP speak. The subroutine that q is local to is in a procedure that is called during a parallel construct, but q itself does not lexically appear in between the PARALLEL and END PARALLEL directives.
The data sharing rules for such entities in OpenMP then dictate that q is private.
The data sharing clauses such as DEFAULT(SHARED), etc only apply to things that appear in the construct itself (things that lexically appear in between the PARALLEL and END PARALLEL). (They can't apply to things in the region generally - procedures called in the region may have been separately compiled and might be called outside of any parallel constructs.)
The array q is defined INSIDE the called subroutine. Every thread calls this subroutine independently and therefore every thread will have it's own copy. The shared directive in the outer subroutine cannot change this. Try to declare it with the save attribute.