Using OpenMP and FFTW in Fortran

I am trying to use FFTW with OpenMP in Fortran, but I cannot get any program to build.
I believe I have installed and configured FFTW correctly:
./configure --enable-openmp --enable-threads
and I seem to have all the correct libraries and files, but every program fails at link time with the error
undefined reference to 'fftw_init_threads'
The code I use is below:
program trial
use omp_lib
implicit none
include "fftw3.f"
integer :: id, nthreads, void
integer :: error
call fftw_init_threads(void)
!$omp parallel private(id)
id = omp_get_thread_num()
write (*,*) 'Hello World from thread', id
!$omp barrier
if ( id == 0 ) then
nthreads = omp_get_num_threads()
write (*,*) 'There are', nthreads, 'threads'
end if
!$omp end parallel
end program
and I compile it with
gfortran trial.f90 -I/home/files/include -L/home/files/lib -lfftw3_omp -lfftw3 -lm -fopenmp
I would greatly appreciate any help.

The old FORTRAN interface seems not to support OpenMP, so I suggest you use the new Fortran 2003 interface. Please note that fftw_init_threads() is a function, not a subroutine, so it must be assigned to a variable rather than invoked with call.
You also need to use the ISO_C_binding module:
program trial
use,intrinsic :: ISO_C_binding
use omp_lib
implicit none
include "fftw3.f03"
integer :: id, nthreads, void
integer :: error
void = fftw_init_threads()
!$omp parallel private(id)
id = omp_get_thread_num()
write (*,*) 'Hello World from thread', id
!$omp barrier
if ( id == 0 ) then
nthreads = omp_get_num_threads()
write (*,*) 'There are', nthreads, 'threads'
end if
!$omp end parallel
end program
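
As a follow-up to the program above, here is a minimal sketch of how the threaded library might then be used for an actual transform. It assumes FFTW 3.3 or later built with --enable-openmp, that fftw3.f03 is on the include path, and the same link flags as in the question (-lfftw3_omp -lfftw3 -lm -fopenmp). The program name, the array size n, and the variable names are made up for illustration.

```fortran
program fftw_threads_sketch
  use, intrinsic :: iso_c_binding
  use omp_lib
  implicit none
  include "fftw3.f03"
  integer(C_INT), parameter :: n = 1024      ! illustrative transform length
  complex(C_DOUBLE_COMPLEX) :: a(n)
  type(C_PTR) :: plan
  integer(C_INT) :: ok

  ! fftw_init_threads is a function; it returns zero on failure
  ok = fftw_init_threads()
  if (ok == 0) stop 'fftw_init_threads failed'

  ! Plans created after this call may use up to this many threads
  call fftw_plan_with_nthreads(int(omp_get_max_threads(), C_INT))

  ! In-place forward transform; FFTW_ESTIMATE does not overwrite the array
  plan = fftw_plan_dft_1d(n, a, a, FFTW_FORWARD, FFTW_ESTIMATE)

  a = cmplx(1.0_C_DOUBLE, 0.0_C_DOUBLE, kind=C_DOUBLE_COMPLEX)
  call fftw_execute_dft(plan, a, a)

  ! The unnormalized DFT of a constant 1 has the value n in its first bin
  print *, 'first bin:', real(a(1))

  call fftw_destroy_plan(plan)
  call fftw_cleanup_threads()
end program fftw_threads_sketch
```

The threading here is internal to FFTW: a single call to fftw_execute_dft is split across threads, so no !$omp directives are needed around it.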

Related

Openmp nested parallelism use available threads

So, I have this simple Fortran do loop, and inside that loop a couple of subroutines are called. I have
made the do loop parallel with OpenMP, like this
!$omp parallel do
do i=1,n
call a()
call b()
enddo
!$omp end parallel do
Most of the time, the number of iterations in the loop is
smaller than the number of processors/threads available, and the subroutines called inside the
loop are independent and could run in parallel. So, is there a way to call the subroutines in parallel inside the parallel
do loop? I have tried using tasks like this
!$omp parallel do
do i=1,n
!$omp task
call a(i , j )
!$omp end task
!$omp task
call b(i, k)
!$omp end task
!$omp taskwait
enddo
!$omp end parallel do
But this fails with a segmentation fault. Is there any way to achieve this?
UPDATE:
I found out that the segmentation fault comes from the FFTW library. Let's consider a dummy program
program name
!$ use omp_lib
implicit real*8(a-h,p-z)
call system_clock(count_rate=irate)
call system_clock(it1)
!$ call omp_set_nested(.true.)
!$omp parallel do
do i =1,5
call test(i)
print *, i
enddo
!$omp end parallel do
call system_clock(it2)
print *, (it2-it1)/real(irate, kind=8)
end program name
subroutine test(ii)
! just a dummy subroutine for heavy computation
implicit real*8(a-h,p-z)
do j=1,40000
!$omp task
do k=1,40000
x = exp(sqrt(sqrt(2.0d0*ii**3)**2))
enddo
!$omp end task
enddo
end subroutine
This program does exactly what I want: with the task directives it uses the remaining threads and improves the performance. Now let's consider another dummy program, this time with FFTW, similar to what I'm working on.
program name
!$ use omp_lib
implicit real*8(a-h,p-z)
integer, parameter :: n=8192*8
complex(kind=8) :: arr(n)
real(kind=8) :: tmp1(n), tmp2(n)
integer(kind=8) :: pF
integer :: i
call system_clock(count_rate=irate)
call dfftw_plan_dft_1d(pF,n,arr,arr,-1,0) ! forward
call system_clock(it1)
!$ call omp_set_nested(.true.)
!$omp parallel do private(arr)
do i =1,5
call random_number(tmp1)
call random_number(tmp2)
arr = cmplx(tmp1, tmp2, kind=8)
call test(pF, arr)
print *, i
enddo
!$omp end parallel do
call system_clock(it2)
print *, (it2-it1)/real(irate, kind=8)
end program name
subroutine test(pF, arr)
implicit real*8(a-h,p-z)
complex(kind=8) :: arr(:)
integer(kind=8) :: pF
do j=1,100
!$omp task private(arr)
do k=1, 100
call dfftw_execute_dft(pF, arr, arr)
enddo
!$omp end task
enddo
end subroutine
Now, this throws the segmentation fault. (NOTE: I have no random number calls in my actual program; they are here just as dummy input.) I have checked http://www.fftw.org/fftw3_doc/Thread-safety.html and fftw_execute is thread safe, and the program works without the task directives. But with the tasks it crashes. Does anyone know how to fix this?
Sigh, yet another example of why !$omp parallel do is a bad idea ... I really do think it is best to clearly separate the thread creation and worksharing phases.
As Vladimir says in the comments, you haven't provided nearly enough detail to tell why you are getting a segmentation fault. However, you seem to have a few misconceptions about OpenMP which I can try to address.
Firstly, a very quick and dirty way to achieve what you want while avoiding any extra OpenMP directives is
!$omp parallel default( none ) private( i ) shared( n ) ! Create threads
!$omp do ! Now share out the work
Do i = 1, 2 * n
If( Mod( i, 2 ) == 1 ) Then
Call a
Else
Call b
End If
End Do
!$omp end do
!$omp end parallel
However if you want to use tasks you're probably not doing it the easiest way if all calls to a and b are completely independent. In that case remember that a new task is created whenever ANY thread hits a !$omp task, and that that task can be executed by any thread, not just the one that created it. Following that logic something like
!$omp parallel default( none ) private( i ) shared( n ) ! Create the threads
!$omp single
Do i = 1, n
!$omp task
Call a
!$omp end task
!$omp task
call b
!$omp end task
end do
!$omp end single
!$omp end parallel
is what you want - you use one thread to create the list of tasks, and then (or more probably while the list is being created) all the available threads will execute them, each task being taken by the next available thread. Note I have also missed out the taskwait directive as from your description I'm not sure why you think you need it as I can see no need for synchronisation at that point.
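
The single-plus-task pattern described above can be sketched as a complete program. This is only an illustration under stated assumptions: the module work_mod and the bodies of a and b are hypothetical stand-ins for the real subroutines, and each call writes to its own array element so that the tasks are independent.

```fortran
module work_mod
  implicit none
contains
  ! Hypothetical stand-ins for the real subroutines a and b
  subroutine a(i, res)
    integer, intent(in) :: i
    real(8), intent(out) :: res
    res = sqrt(real(i, 8))
  end subroutine a

  subroutine b(i, res)
    integer, intent(in) :: i
    real(8), intent(out) :: res
    res = real(i, 8)**2
  end subroutine b
end module work_mod

program task_sketch
  use work_mod
  implicit none
  integer, parameter :: n = 4
  real(8) :: ra(n), rb(n)
  integer :: i

  !$omp parallel default(none) shared(ra, rb) private(i)
  !$omp single
  ! One thread creates all 2*n tasks; any thread in the team may run them
  do i = 1, n
     !$omp task firstprivate(i) shared(ra)
     call a(i, ra(i))
     !$omp end task
     !$omp task firstprivate(i) shared(rb)
     call b(i, rb(i))
     !$omp end task
  end do
  !$omp end single
  ! The implicit barrier here guarantees that all tasks have completed
  !$omp end parallel

  print *, ra
  print *, rb
end program task_sketch
```

Putting a and b in a module gives every call an explicit interface, which matters as soon as the real routines take assumed-shape array arguments such as arr(:).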

Openmp parallel workshare for allocatable array

I want to do some element-wise calculations on arrays in Fortran 90 while parallelizing my code with OpenMP. I currently have the following code:
program test
implicit none
integer,parameter :: n=50
integer :: i
integer(8) :: t1,t2,freq
real(8) :: seq(n),r(n,n,n,n)
real(8),dimension(n,n,n,n) :: x
call system_clock(COUNT_RATE=freq)
seq=[(i,i=1,n)]
x=spread(spread(spread(seq,2,n),3,n),4,n)
call system_clock(t1)
!$omp parallel workshare
! do some array calculation
r=atan(exp(-x))
!$omp end parallel workshare
call system_clock(t2)
print*, sum(r)
print '(f6.3)',(t2-t1)/real(freq)
end program test
I now want to replace the static arrays x and r with allocatable arrays, so I write:
real(8),dimension(:,:,:,:),allocatable :: x,r
allocate(x(n,n,n,n))
allocate(r(n,n,n,n))
but then the program runs serially, without errors, and the compiler seems to ignore the line "!$omp parallel workshare".
What options should I use to parallelize in this case? I have tried omp parallel do with explicit loops, but it is much slower.
I am compiling my code with gfortran 5.1.0 on Windows:
gfortran -ffree-form test.f -o main.exe -O3 -fopenmp -fno-automatic
I have come across this issue in gfortran before. The solution is to specify the array in the following form:
!$omp parallel workshare
! do some array calculation
r(:,:,:,:) = atan(exp(-x))
!$omp end parallel workshare
Here is the reference.
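
For completeness, a minimal sketch of the allocatable version with the array-section form on the left-hand side. The program name and the smaller n are chosen for illustration only. A likely explanation, stated here as an assumption, is that assigning to a whole allocatable array allows automatic reallocation, which gfortran does not split across threads, whereas assigning to the section r(:,:,:,:) cannot reallocate.

```fortran
program workshare_alloc
  implicit none
  integer, parameter :: n = 20               ! smaller than the original 50
  integer :: i
  real(8) :: seq(n)
  real(8), allocatable :: x(:,:,:,:), r(:,:,:,:)

  allocate(x(n,n,n,n), r(n,n,n,n))
  seq = [(real(i, 8), i = 1, n)]
  x = spread(spread(spread(seq, 2, n), 3, n), 4, n)

  !$omp parallel workshare
  ! An array section on the left-hand side rules out reallocation on assignment
  r(:,:,:,:) = atan(exp(-x))
  !$omp end parallel workshare

  print *, sum(r)
  deallocate(x, r)
end program workshare_alloc
```

Whether the workshare construct is actually executed in parallel remains compiler dependent, so it is worth timing both forms.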

Fortran arrays in hybrid MPI/OpenMP

I am facing the following issue when running a hybrid MPI/OpenMP
code with GNU and Intel compilers and OpenMPI. The code is big (commercial)
and written in Fortran. It compiles and runs fine with the GNU compilers
but crashes with the Intel compilers.
I have located the part of the code where the program stops working;
it has the following structure:
subroutine test(n,dx,dy)
integer :: i
integer :: n ! n = 6 in the failing case
real*8 :: dx(n),dy(n), ener
ener=0.0
!$omp parallel num_threads(2)
!$omp do
do i=1,100
ener = ener + funct(n,dx,dy) + i
enddo
!$omp end do
!$omp end parallel
end subroutine test
and the function funct has this structure:
real*8 function funct(n,dx,dy)
integer :: n
real*8 :: dx(*),dy(*)
funct = 0.0
do i=1,n
funct = funct + dx(i)+dy(i)
enddo
end function funct
Specifically, the code stops inside funct (with Intel). The
program reaches the end of funct, but only one of the two
requested threads is able to return the value; I checked
that by printing the thread numbers.
This issue occurs only with the Intel compilers; with GNU I don't get
it.
One way to avoid the issue, I found, is by using plain arrays
inside funct as follows:
real*8 function funct(n,dx,dy)
integer :: n
real*8 :: dx(n),dy(n)
but my point is that I don't understand what is happening.
My guess is that in the Intel case, the compiler cannot
figure out the length of dx and dy inside funct but I am
not sure. I tried to reproduce this issue with a small
Fortran program but I was not able to see that issue.
Any comment is welcome.
One update: I eliminated the race condition (it is
not the real problem; what I wrote here was only the structure of the code).
I realized that subroutine test is being called from another subroutine,
upper, which defines dx and dy as pointers:
subroutine upper
real*8,save,pointer :: dx(:)=>Null(), dy(:)=>Null()
....
call test(n,dx,dy)
...
end subroutine upper
What I did now was to replace the pointers with allocatables:
subroutine upper
real*8,save,dimension(:),allocatable :: dx,dy
....
allocate(dx(n),dy(n))
call test(n,dx,dy)
...
end subroutine upper
and I no longer get the issue with Intel. I don't know what the relevant
difference between pointers and allocatables could be.
Thanks.
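
The race on ener mentioned in the update can be removed with a reduction clause. The following is only a sketch of that one point, not a diagnosis of the Intel crash: it keeps the structure from the question but uses explicit-shape dummy arguments and allocatable actual arguments, and the module name and the test values in dx and dy are invented for the example.

```fortran
module ener_mod
  implicit none
contains
  real(8) function funct(n, dx, dy)
    integer, intent(in) :: n
    real(8), intent(in) :: dx(n), dy(n)      ! explicit-shape dummies
    integer :: i
    funct = 0.0d0
    do i = 1, n
       funct = funct + dx(i) + dy(i)
    end do
  end function funct
end module ener_mod

program reduction_sketch
  use ener_mod
  implicit none
  integer, parameter :: n = 6
  real(8), allocatable :: dx(:), dy(:)
  real(8) :: ener
  integer :: i

  allocate(dx(n), dy(n))
  dx = 1.0d0                                 ! made-up test data
  dy = 2.0d0
  ener = 0.0d0

  ! Each thread accumulates a private partial sum; the partial sums are
  ! combined into the shared ener at the end of the loop
  !$omp parallel do num_threads(2) reduction(+:ener)
  do i = 1, 100
     ener = ener + funct(n, dx, dy) + i
  end do
  !$omp end parallel do

  ! With this data the total is 100*18 + 5050 = 6850
  print *, ener
end program reduction_sketch
```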

Openmp Fortran Subroutine

I am trying to parallelize a subroutine using OpenMP.
The subroutine contains a successive over-relaxation loop that is controlled by the total
error, which should be a shared variable. When I parallelize the part of the main program where I call the
subroutine, the error becomes a private variable, and I cannot explicitly make it a shared variable in the main program.
I am pasting the code for reference.
program test
!$omp parallel
call sub()
!$omp end parallel
end program test
subroutine sub()
do while(totalerror.ge.0.0001.and.sor.lt.10000)
totalerror=0.0
sor=sor+1
error=0.0
!$OMP DO REDUCTION(+:toterror) REDUCTION(MAX:error)
! shared (vorticity,strmfn,toterror,error,guess) PRIVATE (i,j,t1,t2)
do i=1,nx
do j=1,ny
guess(i,j)=0.25*((h**2.)*vorticity(i,j)+strmfn(i+1,j)+strmfn(i-1,j)+strmfn(i,j+1)+strmfn(i,j-1))
totalerror = totalerror + error
error = max(abs(strmfn(i,j) - guess(i,j)),error)
strmfn(i,j)= strmfn(i,j) + omega*(guess(i,j)-strmfn(i,j))
enddo
enddo
!$OMP END DO
enddo
Any help would be appreciated.
toterror and error shouldn't be in the shared clause since they are in the reduction. If you need shared versions, copy them to different variables.
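
A minimal sketch of the reduction clauses on a relaxation sweep follows. Two things differ from the question and are assumptions of this example: the parallel region is opened around the loop itself, so that toterror and error are shared variables outside it, and a Jacobi-style update with separate old and new arrays is swapped in for the in-place SOR update so that the iterations do not depend on each other. The grid size and the boundary values are invented.

```fortran
program reduction_sweep
  implicit none
  integer, parameter :: nx = 32, ny = 32
  real(8) :: old(0:nx+1, 0:ny+1), new(0:nx+1, 0:ny+1)
  real(8) :: toterror, error, d
  integer :: i, j, sweep

  old = 0.0d0
  old(0, :) = 1.0d0                          ! made-up boundary condition
  new = old
  toterror = 1.0d0
  sweep = 0

  do while (toterror >= 1.0d-4 .and. sweep < 10000)
     sweep = sweep + 1
     toterror = 0.0d0
     error = 0.0d0
     ! toterror and error appear only in reduction clauses, not in shared
     !$omp parallel do private(i, d) reduction(+:toterror) reduction(max:error)
     do j = 1, ny
        do i = 1, nx
           new(i,j) = 0.25d0*(old(i+1,j) + old(i-1,j) + old(i,j+1) + old(i,j-1))
           d = abs(new(i,j) - old(i,j))
           toterror = toterror + d
           error = max(error, d)
        end do
     end do
     !$omp end parallel do
     ! After the loop the reduced values are ordinary shared variables
     old(1:nx, 1:ny) = new(1:nx, 1:ny)
  end do

  print *, 'sweeps:', sweep, ' total error:', toterror, ' max error:', error
end program reduction_sweep
```

Because the reduced values are available after the end of the parallel do, the while condition can test toterror without any extra copy.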

OpenMP and shared variable in Fortran which are not shared

I have run into a problem with OpenMP and shared variables that I cannot understand. Everything I do is in Fortran 90/95.
Here is my problem: I have a parallel region defined in my main program, with the clause DEFAULT(SHARED), in which I call a subroutine that does some computation. The subroutine has a local variable (an array) that I allocate and on which I do the computations. I was expecting this array to be shared (because of the DEFAULT(SHARED) clause), but that does not seem to be the case.
Here is an example of what I am trying to do, which reproduces the error I get:
program main
!$ use OMP_LIB
implicit none
integer, parameter :: nx=10, ny=10
real(8), dimension(:,:), allocatable :: array
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP SINGLE
allocate(array(nx,ny))
!$OMP END SINGLE
!$OMP WORKSHARE
array = 1.
!$OMP END WORKSHARE
call compute(array,nx,ny)
!$OMP SINGLE
deallocate(array)
!$OMP END SINGLE
!$OMP END PARALLEL
contains
!=============================================================================
! SUBROUTINES
!=============================================================================
subroutine compute(array, nx, ny)
!$ use OMP_LIB
implicit none
real(8), dimension(nx,ny) :: array
integer :: nx, ny
real(8), dimension(:,:), allocatable :: q
integer :: i, j
!$OMP SINGLE
allocate(q(nx,ny))
!$OMP END SINGLE
!$OMP WORKSHARE
q = 0.
!$OMP END WORKSHARE
print*, 'q before: ', q(1,1)
!$OMP DO SCHEDULE(RUNTIME)
do j = 1, ny
do i = 1, nx
if(mod(i,j).eq.0) then
q(i,j) = array(i,j)*2.
else
q(i,j) = array(i,j)*0.5
endif
end do
end do
!$OMP END DO
print*, 'q after: ', q(1,1)
!$OMP SINGLE
deallocate(q)
!$OMP END SINGLE
end subroutine compute
!=============================================================================
end program main
When I run it like that, I get a segmentation fault, because the local array q is allocated on one thread but not on the others, and the others crash when they try to access it.
If I remove the SINGLE region, the local array q does get allocated, although the program sometimes crashes. That makes sense if several threads try to allocate an array that is already allocated (and it actually puzzles me why it does not crash every time). But then q clearly behaves as if it were private: one thread returns the expected value, whereas the others return something else.
It really puzzles me why the q array is not shared although I declared my parallel region with the clause DEFAULT(SHARED). And since I am in an orphaned subroutine, I cannot explicitly declare q as shared, because it is known only inside the subroutine compute. I am stuck on this problem and have not found a workaround.
Is this normal? Should I expect this behaviour? Is there a workaround? Am I missing something obvious?
Any help would be highly appreciated!
q is an entity that is "inside a region but not inside a construct" in OpenMP terms. The subroutine that q is local to is called during a parallel construct, but q itself does not lexically appear between the PARALLEL and END PARALLEL directives.
The data-sharing rules for such entities in OpenMP then dictate that q is private.
Data-sharing clauses such as DEFAULT(SHARED) only apply to things that appear in the construct itself (things that lexically appear between the PARALLEL and END PARALLEL directives). (They can't apply to things in the region generally: procedures called in the region may have been separately compiled and might also be called outside of any parallel construct.)
The array q is defined INSIDE the called subroutine. Every thread calls this subroutine independently, and therefore every thread has its own copy. The DEFAULT(SHARED) clause in the calling program cannot change this. Try declaring q with the save attribute.
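
The save suggestion can be sketched as follows. This is an illustration under assumptions, not the only possible fix: the module name is invented, compute is moved into a module, and the workshare initialization from the question is dropped for brevity. A local variable with the save attribute is shared among the threads that call the routine, so one thread can allocate it inside a SINGLE block for the whole team.

```fortran
module compute_mod
  implicit none
contains
  subroutine compute(array, nx, ny)
    integer, intent(in) :: nx, ny
    real(8), intent(in) :: array(nx, ny)
    real(8), allocatable, save :: q(:,:)     ! SAVE makes q shared, not per-thread
    integer :: i, j

    !$omp single
    allocate(q(nx, ny))
    !$omp end single
    ! The implicit barrier at END SINGLE ensures q exists before any thread uses it

    !$omp do
    do j = 1, ny
       do i = 1, nx
          if (mod(i, j) == 0) then
             q(i,j) = array(i,j)*2.0d0
          else
             q(i,j) = array(i,j)*0.5d0
          end if
       end do
    end do
    !$omp end do

    ! The implicit barrier at END DO ensures q is complete before it is read
    !$omp single
    print *, 'q(1,1) =', q(1,1), ' sum =', sum(q)
    deallocate(q)
    !$omp end single
  end subroutine compute
end module compute_mod

program save_sketch
  use compute_mod
  implicit none
  integer, parameter :: nx = 10, ny = 10
  real(8) :: array(nx, ny)

  array = 1.0d0
  !$omp parallel default(shared)
  call compute(array, nx, ny)
  !$omp end parallel
end program save_sketch
```

The trade-off is that a saved local makes compute non-reentrant: it must not be called concurrently from two independent parallel regions. Allocating q in the caller and passing it in as an argument avoids that restriction.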