OpenMP parallel function calls - Fortran

I have some Fortran code written for scientific purposes. It contains two conditional function calls that are tested and run sequentially. I would like to parallelise these calls, as they do not depend on each other, but I am unsure of the best way to do it.
A sample of the code I have is as follows:
...
if ( flag1 ) then
  stat1 = func1( tm, dt, struct % a, struct % b )
  if ( stat1 .gt. 0 ) then
    call die_error("error message")
    status = 1
    return
  end if
end if
if ( flag2 ) then
  stat2 = func2( tm, dt, struct % a, struct % c )
  if ( stat2 .gt. 0 ) then
    call die_error("error message 2")
    status = 1
    return
  end if
end if
...
The flags flag1 and flag2 are user defined: either one, both, or neither may be true, which is why they are tested independently.
The function arguments tm and dt are integer and double precision variables respectively but they are not altered by func1 or func2.
The custom data type struct contains three double precision arrays: a, b and c. The array struct % a is not altered by func1 or func2. However, struct % b is altered by func1 and struct % c is altered by func2.
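For concreteness, the derived type is assumed to look roughly like the following; its actual definition is not shown in the question, so the names and declarations below are only a guess:
type :: my_struct                            ! hypothetical name
  double precision, allocatable :: a(:)      ! read by func1 and func2, never written
  double precision, allocatable :: b(:)      ! written by func1 only
  double precision, allocatable :: c(:)      ! written by func2 only
end type my_struct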
In the particular case when flag1 and flag2 are both true, then func1 and func2 may be called in parallel. However, I am unsure of the following:
how to correctly deal with the custom data type; do I need locks for example?
how to correctly deal with struct % a as it isn't altered by the code; firstprivate(struct % a) for example?
how to correctly implement the conditions as I only wish to parallelise if flag1 and flag2 are both true?
My attempt is as follows:
...
!$omp parallel num_threads(2) firstprivate(tm, dt, struct % a) if(flag1 .and. flag2)
!$omp master
!$omp task lastprivate(stat1)
  stat1 = func1( tm, dt, struct % a, struct % b )
!$omp end task
!$omp task lastprivate(stat2)
  stat2 = func2( tm, dt, struct % a, struct % c )
!$omp end task
!$omp taskwait
if ( (stat1 .gt. 0) .or. (stat2 .gt. 0) ) then
  call die_error("error message")
  status = 1
  return
end if
!$omp end master
!$omp end parallel
if ( flag1 .and. .not.flag2 ) then
  stat1 = func1( tm, dt, struct % a, struct % b )
  if ( stat1 .gt. 0 ) then
    call die_error("error message")
    status = 1
    return
  end if
end if
if ( flag2 .and. .not.flag1 ) then
  stat2 = func2( tm, dt, struct % a, struct % c )
  if ( stat2 .gt. 0 ) then
    call die_error("error message 2")
    status = 1
    return
  end if
end if
...
A few further questions I have from above are:
Is the !$omp master directive necessary?
Should !$omp sections directive be used instead of !$omp task?
Is my return from the !$omp parallel directive safe if either func1 or func2 fail?
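For reference, here is one possible arrangement addressing the concerns above, sketched under the assumption that func1 and func2 touch only the arguments shown (no shared module variables, no common I/O). Because the two functions write to disjoint components (struct % b and struct % c) and only read struct % a, tm and dt, no locks or firstprivate copies are needed; the whole struct can simply be shared. The early return is kept outside the parallel construct, since branching out of an OpenMP region is not allowed, and the two original serial branches still handle the flag1-only and flag2-only cases:
if ( flag1 .and. flag2 ) then
  !$omp parallel sections num_threads(2)
  !$omp section
  stat1 = func1( tm, dt, struct % a, struct % b )    ! writes struct % b only
  !$omp section
  stat2 = func2( tm, dt, struct % a, struct % c )    ! writes struct % c only
  !$omp end parallel sections
  if ( stat1 .gt. 0 ) call die_error("error message")
  if ( stat2 .gt. 0 ) call die_error("error message 2")
  if ( (stat1 .gt. 0) .or. (stat2 .gt. 0) ) then
    status = 1
    return
  end if
else
  ! fall back to the two original serial if-blocks for the flag1-only and flag2-only cases
end if
With sections there is no need for !$omp master or !$omp taskwait at all, which also sidesteps the first two follow-up questions; whether tasks or sections are used makes little difference for just two independent calls.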

Related

Fortran code freezes when using mpi_send on an HPC but not on my laptop

I have a subroutine that is supposed to mix values in an array W % R between different processors using MPI_SEND. It works on my laptop (in the sense it doesn't crash) with both Intel and gfortran compilers. But when I run it on an HPC the program freezes the first time the subroutine is called.
SUBROUTINE mix_walkers( W )
include 'mpif.h'
TYPE(walkerList), INTENT(INOUT) :: W
INTEGER, SAVE :: calls = 0
INTEGER :: ierr, nthreads, rank, width, self, send, recv, sendFrstWlkr, sendLstWlkr, sendWlkrcount, &
recvFrstWlkr, recvlstWlkr, recvWlkrcount, status
calls = calls + 1
CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nthreads, ierr )
CALL MPI_COMM_RANK ( MPI_COMM_WORLD, rank, ierr )
width = W % nwlkr / nthreads
IF( MODULO( calls, nthreads ) == 0 ) calls = calls + 1
send = MODULO( rank + calls, nthreads )
recv = MODULO( rank - calls, nthreads )
sendFrstWlkr = width * send + 1
recvFrstWlkr = width * recv + 1
sendLstWlkr = MIN( sendFrstWlkr - 1 + width, W % nwlkr )
recvlstWlkr = MIN( recvFrstWlkr - 1 + width, W % nwlkr )
sendWlkrcount = SIZE( W % R( :, :, sendFrstWlkr : sendlstWlkr ) )
recvWlkrcount = SIZE( W % R( :, :, recvFrstWlkr : recvlstWlkr ) )
IF( send == rank ) RETURN
ASSOCIATE( sendWalkers => W % R( :, :, sendFrstWlkr : sendlstWlkr ) , &
recvWalkers => W % R( :, :, recvFrstWlkr : recvLstWlkr ) )
CALL MPI_SEND( sendWalkers, sendWlkrcount, MPI_DOUBLE_PRECISION, send, calls, MPI_COMM_WORLD, ierr )
CALL MPI_RECV( recvWalkers, recvWlkrcount, MPI_DOUBLE_PRECISION, recv, calls, MPI_COMM_WORLD, status, ierr )
END ASSOCIATE
END SUBROUTINE mix_walkers
MPI_SEND is blocking: it is not guaranteed to return until the destination process has posted a matching receive. In your code, the receives may never be reached, because every process may still be waiting in its send. To fix this, look into MPI_ISEND/MPI_IRECV together with MPI_WAIT, or into MPI_SENDRECV.
For more details, see Section 3.4 of the MPI standard at https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
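As a concrete illustration of the MPI_SENDRECV suggestion, the ASSOCIATE block in the posted subroutine could be replaced by something like the sketch below (untested; note that the status argument must be an integer array of size MPI_STATUS_SIZE, not the scalar declared in the original code):
INTEGER :: sr_status( MPI_STATUS_SIZE )   ! goes with the other declarations at the top

ASSOCIATE( sendWalkers => W % R( :, :, sendFrstWlkr : sendLstWlkr ), &
           recvWalkers => W % R( :, :, recvFrstWlkr : recvLstWlkr ) )
  ! The combined call lets MPI match the send and the receive internally,
  ! so no rank can block forever in a lone MPI_SEND.
  CALL MPI_SENDRECV( sendWalkers, sendWlkrcount, MPI_DOUBLE_PRECISION, send, calls, &
                     recvWalkers, recvWlkrcount, MPI_DOUBLE_PRECISION, recv, calls, &
                     MPI_COMM_WORLD, sr_status, ierr )
END ASSOCIATE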

cgeev solves non-Hermitian matrix incorrectly

Recently I wanted to reproduce Fig. 1(a) of Edge States and Topological Invariants of Non-Hermitian Systems. I used cgeev to compute the eigenvalues of the non-Hermitian Hamiltonian matrices, but the solutions came out weird.
Here is my Fortran code; the result corresponding to Fig. 1(a) is written to abs.dat.
module pub
implicit none
complex,parameter::im = (0.0,1.0)
real,parameter::pi = 3.1415926535
integer xn,N,en,kn
parameter(xn = 100,N = xn*2,en = 100)
complex Ham(N,N)
real t1,t2,t3,gam
!-----------------
integer::lda = N
integer,parameter::lwmax=2*N + N**2
complex,allocatable::w(:) ! store eigenvalues
complex,allocatable::work(:)
real,allocatable::rwork(:)
integer lwork
integer info
integer LDVL, LDVR
parameter(LDVL = N, LDVR = N )
complex VL( LDVL, N ), VR( LDVR, N )
end module pub
!=====================================================
program sol
use pub
! Physics memory allocate
allocate(w(N))
allocate(work(lwmax))
allocate(rwork(2*N))
!-----------------
t2 = 1.0
t3 = 0.0
gam = 3.0/4.0
call band()
end program sol
!======================================================
subroutine band()
use pub
integer m1,i
open(11,file="real.dat")
open(12,file="imag.dat")
open(13,file="abs.dat")
do m1 = -en,en
t1 = 3.0*m1/en
call matset()
call eigsol()
write(11,999)t1,(real(w(i)),i = 1,N)
write(12,999)t1,(aimag(w(i)),i = 1,N)
write(13,999)t1,(abs(w(i)),i = 1,N)
end do
close(11)
close(12)
close(13)
999 format(201f11.6)
end subroutine band
!======================================================
subroutine matset()
use pub
real kx
complex sx(2,2),sy(2,2),sz(2,2)
integer k,m1,m2
sx(1,2) = 1.0
sx(2,1) = 1.0
sy(1,2) = -im
sy(2,1) = im
sz(1,1) = 1.0
sz(2,2) = -1.0
!--------
Ham = 0.0
do k = 0,xn-1
if(k == 0)then
do m1 = 1,2
do m2 = 1,2
ham(m1,m2) = t1*sx(m1,m2) + im*gam/2.0*sy(m1,m2)
ham(m1,m2 + 2) = (t2 + t3)/2.0*sx(m1,m2) - im*(t2 - t3)/2.0*sy(m1,m2)
end do
end do
elseif(k == xn-1)then
do m1 = 1,2
do m2 = 1,2
ham(k*2 + m1,k*2 + m2) = t1*sx(m1,m2) + im*gam/2.0*sy(m1,m2)
ham(k*2 + m1,k*2 + m2 - 2) = (t2 + t3)/2.0*sx(m1,m2) + im*(t2 - t3)/2.0*sy(m1,m2)
end do
end do
else
do m1 = 1,2
do m2 = 1,2
ham(k*2 + m1,k*2 + m2) = t1*sx(m1,m2) + im*gam/2.0*sy(m1,m2)
! right hopping
ham(k*2 + m1,k*2 + m2 + 2) = (t2 + t3)/2.0*sx(m1,m2) - im*(t2 - t3)/2.0*sy(m1,m2)
! left hopping
ham(k*2 + m1,k*2 + m2 - 2) = (t2 + t3)/2.0*sx(m1,m2) + im*(t2 - t3)/2.0*sy(m1,m2)
end do
end do
end if
end do
return
end subroutine matset
!==============================================================================
subroutine eigsol()
use pub
! Query the optimal workspace.
LWORK = -1
CALL cgeev( 'Vectors', 'Vectors', N, Ham, LDA, W, VL, LDVL, VR, LDVR, WORK, LWORK, RWORK, INFO)
LWORK = MIN( LWMAX, INT( WORK( 1 ) ) )
! Solve eigenproblem.
CALL cgeev( 'Vectors', 'Vectors', N, Ham, LDA, W, VL, LDVL,VR, LDVR, WORK, LWORK, RWORK, INFO)
! Check for convergence.
IF( INFO.GT.0 ) THEN
WRITE(*,*)'The algorithm failed to compute eigenvalues.'
STOP
END IF
! open(120,file="eigval.dat")
! do m = 1,N
! write(120,*)m,w(m)
! end do
! close(120)
return
end subroutine eigsol
Did I use the wrong function from LAPACK, or is my code incorrect?
I use Intel Fortran; the compile command is
ifort -mkl file.f90 -o a.out
and the program is run with ./a.out &

Read multiple netCDF files in Fortran using OpenMP

I have two netCDF files - ocean_rst_01.nc and ocean_rst_02.nc - which I wish to read in parallel using OpenMP in Fortran. The code I am using is the following:
implicit none
....
....
!$OMP THREADPRIVATE(ncid,varid)
write ( *, '(a,i8)' ) &
' The number of processors available = ', omp_get_num_procs ( )
write ( *, '(a,i8)' ) &
' The number of threads available = ', omp_get_max_threads ( )
filen = "ocean_rst_0x"
!$OMP PARALLEL DO PRIVATE(j,filename,varname,buf)
do i = 1, nbv
j = OMP_GET_THREAD_NUM()
print*,"i=",i, "j = ",j! OMP_GET_THREAD_NUM()
if (i<10) write(num,"(a11,i1)") filen,i
if (i>=10) write(num,"(a10,i2)") filen,i
filename(i)=trim(num)//".nc"
write(6,*) "Reading file = ",trim(filename(i))," by ", j
varname="CHLA"
call check( NF90_OPEN(trim(filename(i)),NF90_NOWRITE,ncid) )
call check( NF90_INQ_VARID(ncid,trim(varname),varid) )
start = (/1, 1, 1/)
count = (/nlon, nlat, nlev/)
call check( NF90_GET_VAR(ncid,varid,buf,start=start, &
count = count) )
call check( NF90_CLOSE(ncid))
call check( NF90_OPEN(trim(filename(i)),NF90_NOWRITE,ncid) )
call check( NF90_INQ_VARID(ncid,"zeta",varid) )
start_1 = (/1, 1/)
count_1 = (/nlon, nlat/)
call check( NF90_GET_VAR(ncid,varid,buf2d,start=start_1, &
count = count_1) )
call check( NF90_CLOSE(ncid))
var(:,:,:,i) = buf(:,:,:)
var2d(:,:,i) = buf2d(:,:)
write(6,*) "var = ",var(132,231,39,i), "read by thread = ",j
write(6,*) "zeta = ",var2d(132,231,i), "read by thread = ",j
enddo
!$OMP END PARALLEL DO
end
I am compiling the code in a CRAY machine using ftn and netcdf4 libraries.
I am getting random outputs. Sometimes I get the right output; sometimes the output of var2d and/or var is the same for both threads. If I declare buf2d as private, I get a segmentation fault (core dumped) error. And sometimes I get the following error:
a.out: posixio.c:442: px_rel: Assertion `pxp->bf_offset <= offset && offset < pxp->bf_offset + (off_t) pxp->bf_extent' failed.
Aborted (core dumped)
The above algorithm works fine if I read multiple ASCII files. What is the right way to read multiple netCDF files in Fortran 90 using OpenMP?
Unfortunately, netCDF is currently not thread-safe; see for instance
https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg13578.html
though people do seem to be thinking about making it so, for instance
https://www.unidata.ucar.edu/blogs/developer/entry/implementing-thread-safe-access-to
Thus what you want to do is unlikely to work, and if it does it may not work reliably, as the actions of one thread will "interfere" with those of another.
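If the loop must stay inside an OpenMP region anyway, one workaround (my own suggestion, not something the linked pages prescribe, and it serialises the I/O so it gains no speed from threading) is to protect every netCDF call with a named critical section, roughly as sketched below. This assumes buf is a fixed-size per-thread work array and that ncid and varid are made private to the loop instead of THREADPRIVATE:
!$OMP PARALLEL DO PRIVATE(j, filename, buf, ncid, varid)
do i = 1, nbv
  ! ... build filename(i) exactly as in the original loop ...
  !$OMP CRITICAL (netcdf_io)   ! only one thread talks to the netCDF library at a time
  call check( NF90_OPEN(trim(filename(i)), NF90_NOWRITE, ncid) )
  call check( NF90_INQ_VARID(ncid, "CHLA", varid) )
  call check( NF90_GET_VAR(ncid, varid, buf, start=(/1,1,1/), count=(/nlon,nlat,nlev/)) )
  call check( NF90_CLOSE(ncid) )
  !$OMP END CRITICAL (netcdf_io)
  var(:,:,:,i) = buf(:,:,:)    ! each iteration writes a distinct slice of var, so no race here
end do
!$OMP END PARALLEL DO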

Compare two lines in Fortran

I have a data file with 2 columns. Let's say:
column 1 (8,8,8,6,9), reading it as a.
column 2 (3,4,5,6,7), reading it as b.
I want to write code that checks: if a(i) = a(i+1), then set b = 0.
So the result should be: column 1 (a) stays (8,8,8,6,9), and column 2 (b) should become (0,0,0,6,7).
I tried this but failed:
program read2cols
implicit none
integer ::ios,i,j
real a,b
open(file='8081.txt', unit=22, status='old', action='read')
do
read(22,*,iostat=ios) a(i),b(j)
if(a(i)<a(i))b=0
if(ios/=0) exit
print*,a,b
enddo
close(22)
end program read2cols
Your program can be something like this:
program read2cols
implicit none
integer :: ios, i, j
real :: a(5), b(5)
open(file='8081.txt', unit=22, status='old', action='read')
read(22, *, iostat = ios) a(1), b(1)
do i = 2,5
read(22, *, iostat = ios) a(i), b(i)
if (ios /= 0) exit
if (a(i-1) == a(i)) b(i-1) = 0
end do
print *, a, b
close(22)
end program read2cols
Output:
8.00000000 8.00000000 8.00000000 6.00000000 9.00000000
0.00000000 0.00000000 5.00000000 6.00000000 7.00000000
Notes:
You declare a and b as scalars and then index through them with i; fix this by declaring a(5) and b(5) as arrays. The loop index is missing in the do statement; it should read do i = ... Finally, the condition should be if (a(i-1) == a(i)) b(i-1) = 0, because you can only compare a value after it has been read.

How to send and receive data in a loop

I am facing a problem in sending and receiving data in a do loop. Check the code below:
include 'mpif.h'
parameter (NRA = 4)
parameter (NCA = 4)
parameter (MASTER = 0)
parameter (FROM_MASTER = 1)
parameter (FROM_WORKER = 2)
integer numtasks,taskid,numworkers,source,dest,mtype,
& cols,avecol,extra, offset,i,j,k,ierr,rc
integer status(MPI_STATUS_SIZE)
real*8 a(NRA,NCA)
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numtasks, ierr )
numworkers = numtasks-1
print *, 'task ID= ',taskid
C *************************** master task *************************************
if (taskid .eq. MASTER) then
if (numworkers .NE. 2) then
print *, 'Please use 3 processors'
print *,'Quitting...'
call MPI_ABORT(MPI_COMM_WORLD,rc,ierr)
endif
C Initialize A and B
do 30 i=1, NRA
do 30 j=1, NCA
a(i,j) = (i-1)+(j-1)
30 continue
C Send matrix data to the worker tasks
avecol = NCA/numworkers
extra = mod(NCA,numworkers)
offset = 1
mtype = FROM_MASTER
do 50 dest=1, numworkers
if (dest .le. extra) then
cols = avecol + 1
else
cols = avecol
endif
write(*,*)' sending',cols,' cols to task',dest
call MPI_SEND(offset,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& dest,mtype,MPI_COMM_WORLD,ierr )
offset = offset + cols
50 continue
C Receive results from worker tasks
mtype = FROM_WORKER
do 60 i=1, numworkers
source = i
call MPI_RECV(offset,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(cols,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& source,mtype,MPI_COMM_WORLD,status,ierr)
60 continue
C Print results
do 90 i=1, NRA
do 80 j = 1, NCA
write(*,70)a(i,j)
70 format(2x,f8.2,$)
80 continue
print *, ' '
90 continue
endif
C *************************** worker task *************************************
if (taskid > MASTER) then
C Receive matrix data from master task
mtype = FROM_MASTER
call MPI_RECV(offset,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV( cols,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
start0 = offset
end0 = offset+cols-1
C Do matrix multiply
do t=1,5
do i=1, NRA
do j=start0,end0
a(i,j) = a(i,j)*t
enddo
enddo
C Send results back to master task
mtype = FROM_WORKER
call MPI_SEND(offset,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,ierr)
enddo
endif
call MPI_FINALIZE(ierr)
end
I want to print the matrix a to the screen on every pass of the do loop. When I execute the code, it is printed only once, i.e. for the first iteration of the loop (t=1). How can I modify this code so that the matrix a is printed every time it has been calculated?
I got it: I have to put a loop around the master's receives of the data from the workers. The modified code is below.
include 'mpif.h'
parameter (NRA = 4)
parameter (NCA = 4)
parameter (MASTER = 0)
parameter (FROM_MASTER = 1)
parameter (FROM_WORKER = 2)
integer numtasks,taskid,numworkers,source,dest,mtype,
& cols,avecol,extra, offset,i,j,k,ierr,rc
integer status(MPI_STATUS_SIZE)
real*8 a(NRA,NCA)
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numtasks, ierr )
numworkers = numtasks-1
print *, 'task ID= ',taskid
C *************************** master task *************************************
if (taskid .eq. MASTER) then
if (numworkers .NE. 2) then
print *, 'Please use 3 processors'
print *,'Quitting...'
call MPI_ABORT(MPI_COMM_WORLD,rc,ierr)
endif
C Initialize A and B
do 30 i=1, NRA
do 30 j=1, NCA
a(i,j) = (i-1)+(j-1)
30 continue
C Send matrix data to the worker tasks
avecol = NCA/numworkers
extra = mod(NCA,numworkers)
offset = 1
mtype = FROM_MASTER
do 50 dest=1, numworkers
if (dest .le. extra) then
cols = avecol + 1
else
cols = avecol
endif
write(*,*)' sending',cols,' cols to task',dest
call MPI_SEND(offset,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& dest,mtype,MPI_COMM_WORLD,ierr )
offset = offset + cols
50 continue
C Receive results from worker tasks
do t = 1,5
mtype = FROM_WORKER
do 60 i=1, numworkers
source = i
call MPI_RECV(offset,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(cols,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& source,mtype,MPI_COMM_WORLD,status,ierr)
60 continue
C Print results
do 90 i=1, NRA
do 80 j = 1, NCA
write(*,70)a(i,j)
70 format(2x,f8.2,$)
80 continue
print *, ' '
90 continue
end do
endif
C *************************** worker task *************************************
if (taskid > MASTER) then
C Receive matrix data from master task
mtype = FROM_MASTER
call MPI_RECV(offset,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV( cols,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
start0 = offset
end0 = offset+cols-1
C Do matrix multiply
do t=1,5
do i=1, NRA
do j=start0,end0
a(i,j) = a(i,j)*t
enddo
enddo
C Send results back to master task
mtype = FROM_WORKER
call MPI_SEND(offset,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,ierr)
enddo
endif
call MPI_FINALIZE(ierr)
end