I want to do some element-wise calculations on arrays in Fortran 90 while parallelizing my code with OpenMP. I currently have the following code:
program test
implicit none
integer,parameter :: n=50
integer :: i
integer(8) :: t1,t2,freq
real(8) :: seq(n),r(n,n,n,n)
real(8),dimension(n,n,n,n) :: x
call system_clock(COUNT_RATE=freq)
seq=[(i,i=1,n)]
x=spread(spread(spread(seq,2,n),3,n),4,n)
call system_clock(t1)
!$omp parallel workshare
! do some array calculation
r=atan(exp(-x))
!$omp end parallel workshare
call system_clock(t2)
print*, sum(r)
print '(f6.3)',(t2-t1)/real(freq)
end program test
I now want to replace the static arrays x and r with allocatable arrays, so I write:
real(8),dimension(:,:,:,:),allocatable :: x,r
allocate(x(n,n,n,n))
allocate(r(n,n,n,n))
but then the program runs serially, without errors, and the compiler seems to ignore the "!$omp parallel workshare" directive.
What options should I use to parallelize in this case? I have tried omp parallel do with explicit loops, but it is much slower.
I am compiling my code with gfortran 5.1.0 on Windows:
gfortran -ffree-form test.f -o main.exe -O3 -fopenmp -fno-automatic
I have come across this issue in gfortran before. The solution is to write the left-hand side of the assignment as an explicit array section:
!$omp parallel workshare
! do some array calculation
r(:,:,:,:) = atan(exp(-x))
!$omp end parallel workshare
Here is the reference.
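Putting the suggestion together with the allocatable declarations from the question gives the sketch below. It is only an illustration of the workaround described above; the program name workshare_alloc is made up, and the explanation in the comment (that gfortran treats whole-array assignment to an allocatable as a possible reallocation and so does not share the work) is my assumption, not something stated in the answer.

```fortran
program workshare_alloc
  implicit none
  integer, parameter :: n = 50
  integer :: i
  integer(8) :: t1, t2, freq
  real(8) :: seq(n)
  real(8), dimension(:,:,:,:), allocatable :: x, r

  allocate(x(n,n,n,n), r(n,n,n,n))
  call system_clock(count_rate=freq)
  seq = [(real(i, 8), i = 1, n)]
  x = spread(spread(spread(seq, 2, n), 3, n), 4, n)

  call system_clock(t1)
  !$omp parallel workshare
  ! The explicit section r(:,:,:,:) tells the compiler that r keeps its
  ! current shape (assumption: this is what lets gfortran split the work).
  r(:,:,:,:) = atan(exp(-x))
  !$omp end parallel workshare
  call system_clock(t2)

  print *, sum(r)
  print '(f6.3)', (t2 - t1) / real(freq)
  deallocate(x, r)
end program workshare_alloc
```

It can be built with the same flags as in the question, for example gfortran -O3 -fopenmp.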
I have a very strange error when I enable OpenMP in my compilation options. I have pinned it down to a call to a module subroutine using a dynamic sized array from my main program subroutine. Here is a simplified example:
module arr_mod
contains
subroutine add2_mod(arr)
integer, dimension(:) :: arr
integer i, n
n = size(arr)
do i=1,n
arr(i) = arr(i)+2
enddo
end subroutine
end module
PROGRAM TEST_OMP
use arr_mod
integer, dimension(2000000) :: array
array = 0
write(*,*) array(1)
contains
subroutine add2()
! Note that this subroutine is not even called in the main program...
! When the next line is commented, the program runs.
call add2_mod(array)
end subroutine
END PROGRAM TEST_OMP
When I compile and run this program without OpenMP, it runs fine:
$ gfortran -o test_omp test_omp.f90
$ ./test_omp
0
But when I use OpenMP, the program immediately segfaults:
$ gfortran -o test_omp test_omp.f90 -fopenmp
$ ./test_omp
[1] 10291 segmentation fault ./test_omp
If I remove the program subroutine (or simply comment the add2_mod call), it works fine even with OpenMP. It still works fine even if I call the add2_mod subroutine from the main program directly.
It also works when compiling with optimizations (tested with -O3), and when setting unlimited stack with ulimit -s unlimited.
As far as I can tell, it works fine with Intel Fortran (tested on version 17, with no specific flags other than -qopenmp).
As noted in the gfortran documentation:
-fopenmp implies -frecursive, i.e., all local arrays will be allocated on the stack. When porting existing code to OpenMP, this may lead to surprising results, especially to segmentation faults if the stacksize is limited.
To overcome this limitation, you can force the array onto the heap with the allocatable attribute:
module arr_mod
contains
subroutine add2_mod(arr)
integer, dimension(:) :: arr
integer i, n
n = size(arr)
do i=1,n
arr(i) = arr(i)+2
enddo
end subroutine
end module
PROGRAM TEST_OMP
use arr_mod
integer, dimension(:), allocatable :: array ! array is allocated on the heap
allocate(array(2000000))
array = 0
write(*,*) array(1)
contains
subroutine add2()
call add2_mod(array)
end subroutine
END PROGRAM TEST_OMP
This version works fine with the -fopenmp flag.
I am assessing the performance of a Fortran 90 code. Running the code through Intel's Advisor program I see that loops with the following style are not getting vectorized. An example of the loop structure is shown in the Subroutine and module files described below.
The code is compiled with Intel's compiler 19.0.3, with -O3 optimization turned on.
Subroutine SampleProblem
Use GlobalVariables
Implicit None
Integer :: ND, K, LP, L
Real :: AVTMP
! Sample of loop structure that is not vectorized
DO ND=1,NDM
DO K=1,KS
DO LP=1,LLWET(K,ND)
L = LKWET(LP,K,ND)
AVTMP = AVMX*HPI(L)
ENDDO
ENDDO
ENDDO
End Subroutine SampleProblem
LLWET and LKWET are allocatable arrays declared in a module 'GlobalVariables'. Something like:
Module GlobalVariables
Implicit None
! Variable declarations
REAL :: AVMX
INTEGER :: NDM, KS
REAL,ALLOCATABLE,DIMENSION(:) :: HPI
INTEGER,ALLOCATABLE,DIMENSION(:,:) :: LLWET
INTEGER,ALLOCATABLE,DIMENSION(:,:,:) :: LKWET
End Module GlobalVariables
I don't see why this loop would not get vectorized by the compiler. There are many loops like this all over the code and none of them get vectorized, according to the results reported by Intel's Advisor. I have tried forcing vectorization with a !$SIMD block around the loop.
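For comparison, here is a small self-contained sketch of the same loop shape with the standard OpenMP form of the directive, !$omp simd, on the innermost loop. All array sizes and values are hypothetical, and the scalar AVTMP of the original is replaced by an array (my assumption) so that every iteration produces an independent result. The read HPI(L) is an indirect, gather-style access, which compilers often decline to vectorize on their own.

```fortran
program simd_sketch
  implicit none
  integer, parameter :: nl = 8, ks = 2, ndm = 2   ! hypothetical sizes
  real :: avmx, hpi(nl), avtmp(nl)
  integer :: llwet(ks, ndm), lkwet(nl, ks, ndm)
  integer :: nd, k, lp, l

  avmx = 2.0
  hpi = [(real(lp), lp = 1, nl)]
  llwet = nl
  do nd = 1, ndm
    do k = 1, ks
      lkwet(:, k, nd) = [(lp, lp = nl, 1, -1)]    ! reversed index list
    end do
  end do

  avtmp = 0.0
  do nd = 1, ndm
    do k = 1, ks
      ! Ask the compiler to vectorize the innermost loop; l is private
      ! to each SIMD lane.
      !$omp simd private(l)
      do lp = 1, llwet(k, nd)
        l = lkwet(lp, k, nd)          ! indirect index (gather)
        avtmp(lp) = avmx * hpi(l)     ! contiguous store
      end do
    end do
  end do

  print '(f8.1)', sum(avtmp)
end program simd_sketch
```

The directive only takes effect when OpenMP SIMD support is enabled, for example with -fopenmp-simd in gfortran or -qopenmp-simd in the Intel compiler; otherwise it is treated as a comment.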
I am facing the following issue when running a hybrid MPI/OpenMP
code with GNU and Intel compilers and OpenMPI. The code is big (commercial)
written in Fortran. It compiles and runs fine with GNU compilers
but crashes with Intel compilers.
I have monitored the part of the code when the program stops working,
it has the following structure:
subroutine test(n,dx,dy)
integer :: i
integer :: n                 ! n = 6 in this example
real*8 :: dx(n),dy(n), ener
ener=0.0
!$omp parallel num_threads(2)
!$omp do
do i=1,100
ener = ener + funct(n,dx,dy) + i
enddo
!$omp end do
!$omp end parallel
end subroutine test
and the function funct has this structure:
real*8 function funct(n,dx,dy)
integer :: n
real*8 :: dx(*),dy(*)
funct = 0.0
do i=1,n
funct = funct + dx(i)+dy(i)
enddo
end function funct
Specifically, the code stops inside funct (with Intel). The
program reaches the end of funct, but only one of the two
requested threads returns the value; I checked
that by printing the thread numbers.
This issue is only for Intel compilers, for GNU I don't get
the issue.
One way to avoid the issue, I found, is to use explicit-shape arrays
inside funct, as follows:
real*8 function funct(n,dx,dy)
integer :: n
real*8 :: dx(n),dy(n)
but my point is that I don't understand what is happening.
My guess is that in the Intel case, the compiler cannot
figure out the length of dx and dy inside funct but I am
not sure. I tried to reproduce this issue with a small
Fortran program but I was not able to see that issue.
Any comment is welcome.
One update: I eliminated the issue with the race condition (this is
not the real problem, what I wrote here was the structure of the code).
I realized that subroutine test is called from another subroutine,
upper, which declares dx and dy as pointers:
subroutine upper
real*8,save,pointer :: dx(:)=>Null(), dy(:)=>Null()
....
call test(n,dx,dy)
...
end subroutine upper
What I did now was replace the pointers with allocatables:
subroutine upper
real*8,save,dimension(:),allocatable :: dx,dy
....
allocate(dx(n),dy(n))
call test(n,dx,dy)
...
end subroutine upper
and I no longer get the issue with Intel. I don't know what the relevant
difference between pointers and allocatables is here.
Thanks.
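As a side note on the race condition mentioned in the update: in the loop as posted, both threads update ener without synchronization. A minimal sketch of one way to remove that race is a reduction clause, shown below. The module name funct_mod, the program name, and the values of dx and dy are made up for illustration; this does not address the pointer versus allocatable behaviour.

```fortran
module funct_mod
  implicit none
contains
  real(8) function funct(n, dx, dy)
    integer, intent(in) :: n
    real(8), intent(in) :: dx(n), dy(n)   ! explicit-shape dummies
    integer :: i
    funct = 0.0d0
    do i = 1, n
      funct = funct + dx(i) + dy(i)
    end do
  end function funct
end module funct_mod

program reduction_sketch
  use funct_mod
  implicit none
  integer, parameter :: n = 6
  real(8) :: dx(n), dy(n), ener
  integer :: i

  dx = 1.0d0      ! hypothetical test data
  dy = 2.0d0
  ener = 0.0d0
  ! Each thread accumulates into its own copy of ener; the copies are
  ! summed into the shared variable at the end of the loop.
  !$omp parallel do num_threads(2) reduction(+:ener)
  do i = 1, 100
    ener = ener + funct(n, dx, dy) + i
  end do
  !$omp end parallel do

  print '(f8.1)', ener
end program reduction_sketch
```

With this data funct returns 18 on every call, so the program prints 6850.0 regardless of the number of threads.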
I am currently trying to run fftw with OpenMP on Fortran but I am having some problems running any programs.
I believe I have installed/configured fftw correctly:
./configure --enable-openmp --enable-threads
and I seem to have all the correct libraries and files, but I cannot get any program to build. I keep getting the error
undefined reference to 'fftw_init_threads'
The code I use is below:
program trial
use omp_lib
implicit none
include "fftw3.f"
integer :: id, nthreads, void
integer :: error
call fftw_init_threads(void)
!$omp parallel private(id)
id = omp_get_thread_num()
write (*,*) 'Hello World from thread', id
!$omp barrier
if ( id == 0 ) then
nthreads = omp_get_num_threads()
write (*,*) 'There are', nthreads, 'threads'
end if
!$omp end parallel
end program
and to compile it I run
gfortran trial.f90 -I/home/files/include -L/home/files/lib -lfftw3_omp -lfftw3 -lm -fopenmp
It would be greatly appreciated, if anyone could help me.
The old FORTRAN interface seems not to support OpenMP... I suggest you use the new Fortran 2003 interface. Please note that fftw_init_threads() is a function!
You also need to include the ISO_C_binding module:
program trial
use,intrinsic :: ISO_C_binding
use omp_lib
implicit none
include "fftw3.f03"
integer :: id, nthreads, void
integer :: error
void = fftw_init_threads()
!$omp parallel private(id)
id = omp_get_thread_num()
write (*,*) 'Hello World from thread', id
!$omp barrier
if ( id == 0 ) then
nthreads = omp_get_num_threads()
write (*,*) 'There are', nthreads, 'threads'
end if
!$omp end parallel
end program
I am trying to parallelize a subroutine using OpenMP.
The subroutine contains a successive over-relaxation loop that runs on the total
error, which is a shared variable. Now, when I parallelize the part where I call the
subroutine in the main program, it makes the error a private variable, and then I can't explicitly make it a shared variable in the main program.
I am pasting the code for reference.
program test
!$omp parallel
call sub()
!$omp end parallel
end program test
subroutine sub()
do while(totalerror.ge.0.0001.and.sor.lt.10000)
totalerror=0.0
sor=sor+1
error=0.0
!$OMP DO REDUCTION(+:toterror) REDUCTION(MAX:error)
! shared (vorticity,strmfn,toterror,error,guess) PRIVATE (i,j,t1,t2)
do i=1,nx
do j=1,ny
guess(i,j)=0.25*((h**2.)*vorticity(i,j)+strmfn(i+1,j)+strmfn(i-1,j)+strmfn(i,j+1)+strmfn(i,j-1))
totalerror = totalerror + error
error = max(abs(strmfn(i,j) - guess(i,j)),error)
strmfn(i,j)= strmfn(i,j) + omega*(guess(i,j)-strmfn(i,j))
enddo
enddo
!$OMP END DO
enddo
Any help would be appreciated.
toterror and error shouldn't be in the shared clause since they are in the reduction. If you need shared versions, copy them to different variables.
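A minimal sketch of an orphaned reduction, that is, an !$omp do inside a subroutine that is called from a parallel region, is given below. It assumes the accumulated variable is declared in a module so that it is shared between threads; variables local to the subroutine would be private to each thread, which may be the behaviour described in the question. The names acc_mod, accumulate and total are made up for illustration and the loop body is a stand-in for the SOR update.

```fortran
module acc_mod
  implicit none
  real(8) :: total = 0.0d0      ! module variable: shared among threads
contains
  subroutine accumulate(n)
    integer, intent(in) :: n
    integer :: i                ! local variable: private to each thread

    ! Reset the shared accumulator once; "end single" implies a barrier.
    !$omp single
    total = 0.0d0
    !$omp end single

    ! Orphaned worksharing loop: binds to the parallel region of the caller.
    !$omp do reduction(+:total)
    do i = 1, n
      total = total + real(i, 8)
    end do
    !$omp end do
  end subroutine accumulate
end module acc_mod

program orphaned_reduction
  use acc_mod
  implicit none
  !$omp parallel
  call accumulate(100)
  !$omp end parallel
  print '(f8.1)', total
end program orphaned_reduction
```

The program prints 5050.0, the sum of 1 to 100, for any number of threads. A max reduction for the error would follow the same pattern with reduction(max:...).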