How to send and receive data in a loop - Fortran

I am facing a problem in sending and receiving data in a do loop. Check the code below:
include 'mpif.h'
parameter (NRA = 4)
parameter (NCA = 4)
parameter (MASTER = 0)
parameter (FROM_MASTER = 1)
parameter (FROM_WORKER = 2)
integer numtasks,taskid,numworkers,source,dest,mtype,
& cols,avecol,extra, offset,i,j,k,ierr,rc
integer status(MPI_STATUS_SIZE)
real*8 a(NRA,NCA)
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numtasks, ierr )
numworkers = numtasks-1
print *, 'task ID= ',taskid
C *************************** master task *************************************
if (taskid .eq. MASTER) then
if (numworkers .NE. 2) then
print *, 'Please use 3 processors'
print *,'Quitting...'
call MPI_ABORT(MPI_COMM_WORLD,rc,ierr)
endif
C Initialize A and B
do 30 i=1, NRA
do 30 j=1, NCA
a(i,j) = (i-1)+(j-1)
30 continue
C Send matrix data to the worker tasks
avecol = NCA/numworkers
extra = mod(NCA,numworkers)
offset = 1
mtype = FROM_MASTER
do 50 dest=1, numworkers
if (dest .le. extra) then
cols = avecol + 1
else
cols = avecol
endif
write(*,*)' sending',cols,' cols to task',dest
call MPI_SEND(offset,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& dest,mtype,MPI_COMM_WORLD,ierr )
offset = offset + cols
50 continue
C Receive results from worker tasks
mtype = FROM_WORKER
do 60 i=1, numworkers
source = i
call MPI_RECV(offset,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(cols,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& source,mtype,MPI_COMM_WORLD,status,ierr)
60 continue
C Print results
do 90 i=1, NRA
do 80 j = 1, NCA
write(*,70)a(i,j)
70 format(2x,f8.2,$)
80 continue
print *, ' '
90 continue
endif
C *************************** worker task *************************************
if (taskid > MASTER) then
C Receive matrix data from master task
mtype = FROM_MASTER
call MPI_RECV(offset,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV( cols,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
start0 = offset
end0 = offset+cols-1
C Do matrix multiply
do t=1,5
do i=1, NRA
do j=start0,end0
a(i,j) = a(i,j)*t
enddo
enddo
C Send results back to master task
mtype = FROM_WORKER
call MPI_SEND(offset,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,ierr)
enddo
endif
call MPI_FINALIZE(ierr)
end
I want matrix a to be printed on the screen every time it is updated inside the do loop. When I execute the code, it gets printed only once, i.e. for the first iteration of the do loop (t=1). How do I modify this code so that matrix a is printed on the screen every time it is calculated?

I got it: I have to put a loop on the master side while receiving the data from the workers. The modified code:
include 'mpif.h'
parameter (NRA = 4)
parameter (NCA = 4)
parameter (MASTER = 0)
parameter (FROM_MASTER = 1)
parameter (FROM_WORKER = 2)
integer numtasks,taskid,numworkers,source,dest,mtype,
& cols,avecol,extra, offset,i,j,k,ierr,rc
integer status(MPI_STATUS_SIZE)
real*8 a(NRA,NCA)
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numtasks, ierr )
numworkers = numtasks-1
print *, 'task ID= ',taskid
C *************************** master task *************************************
if (taskid .eq. MASTER) then
if (numworkers .NE. 2) then
print *, 'Please use 3 processors'
print *,'Quitting...'
call MPI_ABORT(MPI_COMM_WORLD,rc,ierr)
endif
C Initialize A and B
do 30 i=1, NRA
do 30 j=1, NCA
a(i,j) = (i-1)+(j-1)
30 continue
C Send matrix data to the worker tasks
avecol = NCA/numworkers
extra = mod(NCA,numworkers)
offset = 1
mtype = FROM_MASTER
do 50 dest=1, numworkers
if (dest .le. extra) then
cols = avecol + 1
else
cols = avecol
endif
write(*,*)' sending',cols,' cols to task',dest
call MPI_SEND(offset,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,dest,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& dest,mtype,MPI_COMM_WORLD,ierr )
offset = offset + cols
50 continue
C Receive results from worker tasks
do t = 1,5
mtype = FROM_WORKER
do 60 i=1, numworkers
source = i
call MPI_RECV(offset,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(cols,1,MPI_INTEGER,source,
& mtype,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(a(1,offset),cols*NRA,MPI_DOUBLE_PRECISION,
& source,mtype,MPI_COMM_WORLD,status,ierr)
60 continue
C Print results
do 90 i=1, NRA
do 80 j = 1, NCA
write(*,70)a(i,j)
70 format(2x,f8.2,$)
80 continue
print *, ' '
90 continue
end do
endif
C *************************** worker task *************************************
if (taskid > MASTER) then
C Receive matrix data from master task
mtype = FROM_MASTER
call MPI_RECV(offset,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV( cols,1,MPI_INTEGER,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
call MPI_RECV(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,status,ierr)
start0 = offset
end0 = offset+cols-1
C Do matrix multiply
do t=1,5
do i=1, NRA
do j=start0,end0
a(i,j) = a(i,j)*t
enddo
enddo
C Send results back to master task
mtype = FROM_WORKER
call MPI_SEND(offset,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(cols,1,MPI_INTEGER,MASTER,mtype,
& MPI_COMM_WORLD,ierr)
call MPI_SEND(a(1,offset),cols*NCA,MPI_DOUBLE_PRECISION,MASTER,
& mtype,MPI_COMM_WORLD,ierr)
enddo
endif
call MPI_FINALIZE(ierr)
end
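In other words, the workers' send loop and the master's receive loop must have the same trip count. Below is a minimal, self-contained sketch of that pattern, not taken from the post above: the chunk array, its length and the tag values are placeholders, and the column bookkeeping is stripped out.
      program loop_pattern
      include 'mpif.h'
      integer, parameter :: MASTER = 0, FROM_WORKER = 2
      integer, parameter :: chunklen = 4
      double precision chunk(chunklen)
      integer taskid, numtasks, numworkers, source, t, ierr
      integer status(MPI_STATUS_SIZE)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, taskid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
      numworkers = numtasks - 1
      chunk = dble(taskid)
      do t = 1, 5
        if (taskid .eq. MASTER) then
C         master: one receive per worker in every iteration,
C         then print (or assemble) the received data
          do source = 1, numworkers
            call MPI_RECV(chunk, chunklen, MPI_DOUBLE_PRECISION,
     &                    source, FROM_WORKER, MPI_COMM_WORLD,
     &                    status, ierr)
            print *, 'step', t, 'from task', source, chunk(1)
          end do
        else
C         worker: update its chunk, then one send per iteration
          chunk = chunk + 1.0d0
          call MPI_SEND(chunk, chunklen, MPI_DOUBLE_PRECISION,
     &                  MASTER, FROM_WORKER, MPI_COMM_WORLD, ierr)
        end if
      end do
      call MPI_FINALIZE(ierr)
      end program loop_pattern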

Related

More processors requested than permitted

I parallelized three nested loops with MPI. When I ran the code, an error popped up, saying 'srun: error: Unable to create step for job 20258899: More processors requested than permitted'.
Here is the script that I used to submit the job:
#!/bin/bash
#SBATCH --partition=workq
#SBATCH --job-name="code"
#SBATCH --nodes=2
#SBATCH --time=1:00:00
#SBATCH --exclusive
#SBATCH --err=std.err
#SBATCH --output=std.out
#---#
module switch PrgEnv-cray PrgEnv-intel
export OMP_NUM_THREADS=1
#---#
echo "The job "${SLURM_JOB_ID}" is running on "${SLURM_JOB_NODELIST}
#---#
srun --ntasks=1000 --cpus-per-task=${OMP_NUM_THREADS} --hint=nomultithread ./example_parallel
My code is pasted below. Would anyone please tell me what the problem with my code is? Is my use of MPI wrong or not? Thank you very much.
PROGRAM THREEDIMENSION
USE MPI
IMPLICIT NONE
INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(p=15,r=14)
INTEGER :: i, j, k, le(3)
REAL (KIND=dp), ALLOCATABLE :: kp(:,:,:,:), kpt(:,:), col1(:), col2(:)
REAL (KIND=dp) :: su, co, tot
INTEGER :: world_size, world_rank, ierr
INTEGER :: world_comm_1st, world_comm_2nd, world_comm_3rd
INTEGER :: th3_dimension_size, th3_dimension_size_max, th3_dimension_rank
INTEGER :: th2_dimension_size, th2_dimension_size_max, th2_dimension_rank
INTEGER :: th1_dimension_size, th1_dimension_size_max, th1_dimension_rank
INTEGER :: proc_1st_dimension_len, proc_2nd_dimension_len, proc_3rd_last_len, proc_i, proc_j, proc_k
REAL (KIND=dp) :: t0, t1
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, world_size, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, world_rank, ierr)
IF (world_rank == 0) THEN
t0 = MPI_WTIME()
END IF
le(1) = 1000
le(2) = 600
le(3) = 900
ALLOCATE (kp(le(1),le(2),le(3),3))
ALLOCATE (kpt(le(3),3))
ALLOCATE (col1(le(1)))
ALLOCATE (col2(le(2)))
DO i = 1, le(1), 1
DO j = 1, le(2), 1
DO k = 1, le(3), 1
kp(i,j,k,1) = DBLE(i+j+j+1)
kp(i,j,k,2) = DBLE(i+j+k+2)
kp(i,j,k,3) = DBLE(i+j+k+3)
END DO
END DO
END DO
proc_1st_dimension_len = (world_size - 1) / le(1) + 1
proc_2nd_dimension_len = (world_size - 1 / (le(1) + le(2))) + 1
proc_3rd_last_len = MOD(world_size - 1, le(1)+le(2)) + 1
IF (world_rank <= proc_3rd_last_len*proc_2nd_dimension_len*proc_1st_dimension_len) THEN
proc_i = MOD(world_rank,proc_1st_dimension_len)
proc_j = world_rank / proc_1st_dimension_len
proc_k = world_rank / (proc_1st_dimension_len*proc_2nd_dimension_len)
ELSE
proc_i = MOD(world_rank-proc_3rd_last_len,proc_1st_dimension_len-1)
proc_j = (world_rank-proc_3rd_last_len) / proc_1st_dimension_len-1
proc_k = (world_rank-proc_3rd_last_len) / (proc_2nd_dimension_len*proc_2nd_dimension_len-1)
END IF
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
CALL MPI_COMM_SPLIT(MPI_COMM_WORLD,proc_i,world_rank,world_comm_1st,ierr)
CALL MPI_COMM_SIZE(world_comm_1st,th1_dimension_size,ierr)
CALL MPI_COMM_RANK(world_comm_1st,th1_dimension_rank,ierr)
CALL MPI_COMM_SPLIT(MPI_COMM_WORLD,proc_j,world_rank,world_comm_2nd,ierr)
CALL MPI_COMM_SIZE(world_comm_2nd,th2_dimension_size,ierr)
CALL MPI_COMM_RANK(world_comm_2nd,th2_dimension_rank,ierr)
CALL MPI_COMM_SPLIT(MPI_COMM_WORLD,proc_k,world_rank,world_comm_3rd,ierr)
CALL MPI_COMM_SIZE(world_comm_3rd,th3_dimension_size,ierr)
CALL MPI_COMM_RANK(world_comm_3rd,th3_dimension_rank,ierr)
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
CALL MPI_ALLREDUCE(th1_dimension_size,th1_dimension_size_max,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD,ierr)
CALL MPI_ALLREDUCE(th2_dimension_size,th2_dimension_size_max,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD,ierr)
IF (world_rank == 0) THEN
OPEN (UNIT=3, FILE='out.dat', STATUS='UNKNOWN')
END IF
col1 = 0.0
DO i = 1, le(1), 1
IF (MOD(i-1,th1_dimension_size_max) /= th1_dimension_rank) CYCLE
col2 = 0.0
DO j = 1, le(2), 1
IF (MOD(j-1,th2_dimension_size_max) /= th2_dimension_rank) CYCLE
kpt = kp(i,j,:,:)
su = 0.0
DO k = 1, le(3), 1
IF(MOD(k-1,th1_dimension_size*th2_dimension_size) /= th3_dimension_rank) CYCLE
CALL CAL(kpt(k,3),co)
su = su + co
END DO
CALL MPI_BARRIER(world_comm_3rd,ierr)
CALL MPI_REDUCE(su,col2(j),1,MPI_DOUBLE,MPI_SUM,0,world_comm_3rd,ierr)
END DO
CALL MPI_BARRIER(world_comm_2nd,ierr)
CALL MPI_REDUCE(col2,col1(i),le(2),MPI_DOUBLE,MPI_SUM,0,world_comm_2nd,ierr)
END DO
CALL MPI_BARRIER(world_comm_1st,ierr)
tot = 0.0
IF (th1_dimension_rank == 0) THEN
CALL MPI_REDUCE(col1,tot,le(1),MPI_DOUBLE,MPI_SUM,0,world_comm_1st,ierr)
WRITE (UNIT=3, FMT=*) tot
CLOSE (UNIT=3)
END IF
DEALLOCATE (kp)
DEALLOCATE (kpt)
DEALLOCATE (col1)
DEALLOCATE (col2)
IF (world_rank == 0) THEN
t1 = MPI_WTIME()
WRITE (UNIT=3, FMT=*) 'Total time:', t1 - t0, 'seconds'
END IF
CALL MPI_FINALIZE (ierr)
STOP
END PROGRAM THREEDIMENSION
SUBROUTINE CAL(arr,co)
IMPLICIT NONE
INTEGER, PARAMETER :: dp=SELECTED_REAL_KIND(p=15,r=14)
INTEGER :: i
REAL (KIND=dp) :: arr(3), co
co = 0.0d0
co = co + (arr(1) ** 2 + arr(2) * 3.1d1) / (arr(3) + 5.0d-1)
RETURN
END SUBROUTINE CAL
With the #SBATCH directives in the header of the file, you request two nodes explicitly, and, as you do not specify --ntasks, you get the default of one task per node, so you implicitly request two tasks.
Then, when the job starts, your srun line tries to "use" 1000 tasks. You should have a line
#SBATCH --ntasks=1000
in the header, as suggested by Gilles. The srun command will inherit those 1000 tasks by default, so there is no need to specify them there in this case.
Also, if ${OMP_NUM_THREADS} were not 1, you would have to specify --cpus-per-task in the header as an #SBATCH directive as well; otherwise you would face the same error.
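For concreteness, a sketch of the relevant lines after that change (the rest of the script stays as above, and the 1000-task value is just the one requested in the question):
#SBATCH --ntasks=1000
...
srun --cpus-per-task=${OMP_NUM_THREADS} --hint=nomultithread ./example_parallel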

Fortran code freezes when using mpi_send on an HPC but not on my laptop

I have a subroutine that is supposed to mix values in an array W % R between different processors using MPI_SEND. It works on my laptop (in the sense that it doesn't crash) with both the Intel and gfortran compilers. But when I run it on an HPC, the program freezes the first time the subroutine is called.
SUBROUTINE mix_walkers( W )
include 'mpif.h'
TYPE(walkerList), INTENT(INOUT) :: W
INTEGER, SAVE :: calls = 0
INTEGER :: ierr, nthreads, rank, width, self, send, recv, sendFrstWlkr, sendLstWlkr, sendWlkrcount, &
recvFrstWlkr, recvlstWlkr, recvWlkrcount, status
calls = calls + 1
CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nthreads, ierr )
CALL MPI_COMM_RANK ( MPI_COMM_WORLD, rank, ierr )
width = W % nwlkr / nthreads
IF( MODULO( calls, nthreads ) == 0 ) calls = calls + 1
send = MODULO( rank + calls, nthreads )
recv = MODULO( rank - calls, nthreads )
sendFrstWlkr = width * send + 1
recvFrstWlkr = width * recv + 1
sendLstWlkr = MIN( sendFrstWlkr - 1 + width, W % nwlkr )
recvlstWlkr = MIN( recvFrstWlkr - 1 + width, W % nwlkr )
sendWlkrcount = SIZE( W % R( :, :, sendFrstWlkr : sendlstWlkr ) )
recvWlkrcount = SIZE( W % R( :, :, recvFrstWlkr : recvlstWlkr ) )
IF( send == rank ) RETURN
ASSOCIATE( sendWalkers => W % R( :, :, sendFrstWlkr : sendlstWlkr ) , &
recvWalkers => W % R( :, :, recvFrstWlkr : recvLstWlkr ) )
CALL MPI_SEND( sendWalkers, sendWlkrcount, MPI_DOUBLE_PRECISION, send, calls, MPI_COMM_WORLD, ierr )
CALL MPI_RECV( recvWalkers, recvWlkrcount, MPI_DOUBLE_PRECISION, recv, calls, MPI_COMM_WORLD, status, ierr )
END ASSOCIATE
END SUBROUTINE mix_walkers
MPI_SEND is blocking. It is not guaranteed to return until the process being sent to posts a corresponding receive. In the code you have, the receives may never be reached, because every process may be stuck waiting in its send. To fix this, investigate MPI_ISEND/MPI_IRECV together with MPI_WAIT, or MPI_SENDRECV.
For more details see section 3.4 in the MPI standard at https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
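As a minimal sketch (not from the original post), the blocking pair inside the ASSOCIATE block could be replaced with a single combined call. It reuses the variables from mix_walkers and assumes status is declared as INTEGER status(MPI_STATUS_SIZE), as MPI requires, rather than as a scalar:
! MPI_SENDRECV performs the send and the receive as one operation,
! so no rank can deadlock waiting for the other side to post first
CALL MPI_SENDRECV( sendWalkers, sendWlkrcount, MPI_DOUBLE_PRECISION, send, calls, &
                   recvWalkers, recvWlkrcount, MPI_DOUBLE_PRECISION, recv, calls, &
                   MPI_COMM_WORLD, status, ierr )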

Partition a 3D array AND use allgather

I now want to use allgather to rebuild a 3D array. 16 CPUs are claimed, and the data of the Y-Z plane are partitioned into 4*4 parts.
Also a new type (newtype) is created for convenience.
Are the errors related to this new type? Thanks!
!==================================================================================================================================
!****** [ Program main ] ********************************************************************************************************
!==================================================================================================================================
program main
Use mpi
implicit none
integer i, j, k, count, realsize
integer, parameter :: nx = 8, ny = 8, nz = 8
Integer :: interval
real(4), dimension(nx,ny,nz):: u_xyz
Real(4),dimension(:,:,:), allocatable :: Temp0
! === MPI Related ===
Integer, Parameter :: master = 0
Integer :: ierr, num_procs, myid, p_row, p_col, newtype, resizedtype
integer, save :: MPI_COMM_CART
integer, dimension(2) :: dims, coord
Integer, Dimension(2) :: R_coord, C_coord, MPGD
Integer, Dimension(3) :: sizes, subsizes, starts
integer,dimension(:),allocatable :: displacement
integer(kind=mpi_address_kind) :: lb, extent
logical, dimension(2) :: periodic
!--------------=======--------------
! Initialize MPI
!
call MPI_Init ( ierr )
!
! Get the number of processes.
!
call MPI_Comm_size ( MPI_COMM_WORLD, num_procs, ierr )
!
! Get the individual process ID.
!
call MPI_Comm_rank ( MPI_COMM_WORLD, myid, ierr )
!--------------=======--------------
! Y-Z PLANE SPLIT
!--------------=======--------------
p_row = 4; p_col = 4
If(p_row*p_col .NE. num_procs) Print *, 'Wrong CPU Numbers'
!--------------=======--------------
dims(1) = p_row
dims(2) = p_col
periodic(1) = .false.
periodic(2) = .false.
call MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periodic, &
.false., & ! do not reorder rank
MPI_COMM_CART, ierr)
call MPI_CART_COORDS(MPI_COMM_CART,myid,2,coord,ierr)
!--------------=======--------------
!----------YZ Plane Locations-----
!--------------=======--------------
Interval = Ceiling(dble(ny)/dble(p_row))
If (coord(1) .NE. p_row-1 ) then
R_coord(1) = 1 + (coord(1))*Interval
R_coord(2) = R_coord(1) + Interval - 1
Else
R_coord(1) = 1 + coord(1)*Interval
R_coord(2) = ny
End If
Interval = Ceiling(dble(nz)/dble(p_col))
If (coord(2) .NE. p_col-1 ) then
C_coord(1) = 1 + (coord(2))*Interval
C_coord(2) = C_coord(1) + Interval - 1
Else
C_coord(1) = 1 + (coord(2))*Interval
C_coord(2) = nz
End If
!--------------=======--------------
!----------Obtain displacement-----
!--------------=======--------------
! COUNT = 0
! DO K=1,nz
! DO J=1,ny
! DO I=1,nx
! If(i==1.and.j== R_coord(1).and.k==C_coord(1)) print *, myid, R_coord(1), C_coord(1), COUNT
! COUNT = COUNT + 1
! ENDDO
! ENDDO
! ENDDO
allocate(Temp0(nx,R_coord(1):R_coord(2),C_coord(1):C_coord(2)))!
allocate(displacement(num_procs))
Do k=C_coord(1),C_coord(2)
Do j=R_coord(1),R_coord(2)
Do i=1,nx
u_xyz(i,j,k)= i+j+k
End Do; End Do
End Do
Do i=0,num_procs-1
displacement(i)= (i/4)*(16) + mod(i,4)*128
! if(myid==0) print *, i, displacement(i)
Enddo
!--------------=======--------------
! --- Create the same block type ---
!--------------=======--------------
sizes(1) = nx
sizes(2) = ny
sizes(3) = nz
subsizes(1) = nx
subsizes(2) = R_coord(2)-R_coord(1)+1
subsizes(3) = C_coord(2)-C_coord(1)+1
starts(1) = 0 ! 0-based index
starts(2) = 0
starts(3) = 0
call MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_REAL, newtype, ierr)
call MPI_Type_size(MPI_REAL, realsize, ierr)
extent = 1*realsize
lb = 0
call MPI_Type_create_resized(newtype, lb, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
Call MPI_Allgather(Temp0(1,R_coord(1),C_coord(1)),resizedtype, &
1, u_xyz, resizedtype, displacement, &
1, MPI_COMM_WORLD)
call MPI_TYPE_FREE(newtype,ierr)
777 Format(15e25.16e3)
Call MPI_Barrier(MPI_COMM_WORLD, ierr)
Call MPI_Finalize ( ierr )
stop
end program main
The code produced the following error messages:
[desktop:18885] *** An error occurred in MPI_Allgather
[desktop:18885] *** reported by process [139648622723073,139646566662149]
[desktop:18885] *** on communicator MPI_COMM_SELF
[desktop:18885] *** MPI_ERR_TYPE: invalid datatype
[desktop:18885] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[desktop:18885] *** and potentially your MPI job)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[31373,1],0]
Exit code: 3
--------------------------------------------------------------------------
[desktop:18878] 7 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[desktop:18878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
The correct code is below, thanks to the comments above. Care should be taken when defining the type and the MPI_Allgatherv arguments:
recvcounts
integer array (of length group size) containing the number of elements that are to be received from each process
displs
integer array (of length group size); entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i
recvtype
data type of the receive buffer elements
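Isolating just the gather call from the full listing below and annotating the arguments (recvcnt(:) is set to 1, and displacement is expressed in units of the resized type's extent, i.e. one REAL):
Call MPI_AllgatherV(Temp0(1,R_coord(1),C_coord(1)),            & ! sendbuf
     subsizes(1)*subsizes(2)*subsizes(3), MPI_REAL,            & ! sendcount, sendtype
     u_xyz, recvcnt, displacement, resizedrv,                  & ! recvbuf, recvcounts, displs, recvtype
     MPI_COMM_WORLD, ierr)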
!==================================================================================================================================
!****** [ Program main ] ********************************************************************************************************
!==================================================================================================================================
program main
Use mpi
implicit none
integer i, j, k,ii
integer count, realsize
integer, parameter :: nx = 8, ny = 8, nz = 8
Integer :: interval
real(4), dimension(nx*ny*nz):: u_xyz
Real(4),dimension(:,:,:), allocatable :: Temp0
! === MPI Related ===
Integer, Parameter :: master = 0
Integer :: ierr, num_procs, myid, p_row, p_col, newtype, resizedsd, resizedrv
integer, save :: MPI_COMM_CART
integer, dimension(2) :: dims, coord
Integer, Dimension(2) :: R_coord, C_coord, MPGD
Integer, Dimension(3) :: sizes, subsizes, starts
integer,dimension(:),allocatable :: displacement, recvcnt
integer(kind=mpi_address_kind) :: lb, extent
logical, dimension(2) :: periodic
!--------------=======--------------
! Initialize MPI
!
call MPI_Init ( ierr )
!
! Get the number of processes.
!
call MPI_Comm_size ( MPI_COMM_WORLD, num_procs, ierr )
!
! Get the individual process ID.
!
call MPI_Comm_rank ( MPI_COMM_WORLD, myid, ierr )
!--------------=======--------------
! Y-Z PLANE SPLIT
!--------------=======--------------
p_row = 4; p_col = 4
If(p_row*p_col .NE. num_procs) Print *, 'Wrong CPU Numbers'
!--------------=======--------------
dims(1) = p_row
dims(2) = p_col
periodic(1) = .false.
periodic(2) = .false.
call MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periodic, &
.false., & ! do not reorder rank
MPI_COMM_CART, ierr)
call MPI_CART_COORDS(MPI_COMM_CART,myid,2,coord,ierr)
!--------------=======--------------
!----------YZ Plane Locations-----
!--------------=======--------------
Interval = Ceiling(dble(ny)/dble(p_row))
If (coord(1) .NE. p_row-1 ) then
R_coord(1) = 1 + (coord(1))*Interval
R_coord(2) = R_coord(1) + Interval - 1
Else
R_coord(1) = 1 + coord(1)*Interval
R_coord(2) = ny
End If
Interval = Ceiling(dble(nz)/dble(p_col))
If (coord(2) .NE. p_col-1 ) then
C_coord(1) = 1 + (coord(2))*Interval
C_coord(2) = C_coord(1) + Interval - 1
Else
C_coord(1) = 1 + (coord(2))*Interval
C_coord(2) = nz
End If
!--------------=======--------------
!----------Obtain displacement-----
!--------------=======--------------
! COUNT = 0
! DO K=1,nz
! DO J=1,ny
! DO I=1,nx
! If(i==1.and.j== R_coord(1).and.k==C_coord(1)) print *, myid, R_coord(1), C_coord(1), COUNT
! COUNT = COUNT + 1
! ENDDO
! ENDDO
! ENDDO
allocate(Temp0(1:nx,R_coord(1):R_coord(2),C_coord(1):C_coord(2)))!
allocate(displacement(num_procs),recvcnt(num_procs))
Do k=C_coord(1),C_coord(2)
Do j=R_coord(1),R_coord(2)
Do i=1,nx
Temp0(i,j,k)= i+j*10+k*100
End Do; End Do
End Do
Do i=1,num_procs
ii = i-1
displacement(i)= (ii/4)*(16) + mod(ii,4)*128
! if(myid==0) print *, i, displacement(i)
Enddo
!--------------=======--------------
! --- Create the same block type ---
!--------------=======--------------
sizes(1) = nx
sizes(2) = ny
sizes(3) = nz
subsizes(1) = nx
subsizes(2) = R_coord(2)-R_coord(1)+1
subsizes(3) = C_coord(2)-C_coord(1)+1
starts(1) = 0 ! 0-based index
starts(2) = 0
starts(3) = 0
recvcnt(:)= 1
call MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_REAL, newtype, ierr)
call MPI_Type_size(MPI_REAL, realsize, ierr)
extent = 1*realsize
lb = 0
call MPI_Type_create_resized(newtype, lb, extent, resizedrv, ierr)
call MPI_Type_commit(resizedrv, ierr)
Call MPI_AllgatherV(Temp0(1,R_coord(1),C_coord(1)), subsizes(1)*subsizes(2)*subsizes(3), MPI_REAL, &
u_xyz, recvcnt,displacement, resizedrv, MPI_COMM_WORLD, ierr)
call MPI_TYPE_FREE(resizedrv,ierr)
! If(myid.eq.10) then
! Count = 0
! do k=1,nz
! do J=1,ny
! do i=1,nx
! Count = Count + 1
! print*, u_xyz(count)- (i+j*10+k*100), i,j,k
! enddo; enddo; enddo
! end if
777 Format(15e25.16e3)
Call MPI_Barrier(MPI_COMM_WORLD, ierr)
Call MPI_Finalize ( ierr )
stop
end program main

Using OpenACC in a Fortran 77 project, but it has no effect and outputs no kernel information

I have a Fortran 77 reservoir simulation project and want to use OpenACC directives to accelerate it. The compiler is PGI Visual Fortran. Here is one subroutine:
SUBROUTINE jbild(a, b, impl,
[ ia, ja, neqa, kvst, ka, ibkmax, nja, ndima, nbmxc,
[ isymm)
USE parameter_data
USE connect_data
USE contrl
IMPLICIT REAL*8(A-H,O-Z)
include 'eleme.com'
COMMON/G9/NEXG(MNOGN)
COMMON/shiftf/SFTMIN(mxcom)
COMMON/gm_nm/gamman(njamax)
common/jocab2/uf(3,ibnd+maxlay),flw(3),fsav(3,ibnd+maxlay)
[ ,fsum(3),fsums(3), fdsum(3), fdp(3), fdiag(3)
COMMON/well_1/iwell(mnel)
COMMON/well_2/pwell(mnogn),vol_w(mnogn)
COMMON/source/qm_bc(mxcom)
integer ndima, nbmxc, ibkmax, nja, impl(ibkmax),
[ ia(ibkmax+1), ja(nja), neqa(ibkmax), kvst(ibkmax+1),
[ ka( nja+1), isymm(nja)
double precision a(ndima), b(nbmxc)
double precision fdsav(mxcom),eps
parameter (eps=1.0d-300)
COMMON/scndv2/densn(maxnn,mxphs+1),accn(maxnn,mxcom),acck(mxcom),acck_All
double precision Epsilon1(mxphs), Epsilon2(mxphs)
IDiagonal_dominace = 1 !--- =1: JACOBI
do 1 i=1, ka( nja+1 ) ! ndima
a(i) = 0.0d0
1 continue
do 3 i = 1, kvst( ibkmax + 1) ! nbmxc
b(i) = 0.0d0
3 continue
EpsilonMax1 = 0.0
EpsilonMax2 = 0.0
NEpsilonMax1 = 1
NEpsilonMax2 = 1
do 1000 i=1, ibkmax
inode=i
imat=matx(i)
do iphas=1, nph
fsum(iphas) = 0.0d0
fsums(iphas) = 0.0d0
fdsum( iphas ) = 0.0d0
fdsav(iphas) = 0.0d0
do index=1, ia(i+1)-1 - ( ia(i) )
fsav(iphas, index) = 0.0d0
enddo
enddo
if(iwell(inode).eq.0) then
do iphas=1,mnph
qm_bc(iphas)=0.0d0
enddo
elseif(iwell(inode).eq.1) then
call bc_ev(INODE)
elseif(iwell(inode).eq.2) then
call bc_well(inode,ishift,ia, ja,nja)
endif
call eqnsa(INODE,IMAT,ishift)
jconet=0
do 12 index = ia(i)+1, ia(i+1)-1
id = ja( index )
jconet=jconet+1
if(dabs(gamman(index)).le.eps) goto 12
call eqnsf(INODE,id,imat,ishift,index,jconet,impl(i))
do 14 iphas = 1, mxphs
if( impl(i) .eq. 0 )then
a( ka(isymm(index)) + iphas ) = fdp( iphas )
else
fsav(iphas, jconet) = flw(iphas)
endif
14 continue
if( (impl(i) .eq. 0) .AND. (IDiagonal_dominace .eq. 1) )then !---
a( ka(isymm(index)) + 1 ) = fdp( 3 )
a( ka(isymm(index)) + 3 ) = fdp( 1 )
endif
12 continue
do 15 iphas = 1, mxphs
b( kvst(i) + iphas ) = -fsum( iphas )
if(EPSN1.GT.0.0.AND.EPSN2.GT.0.0) then !-- -----------------
Epsilon1(iphas) = abs(b( kvst(i) + iphas )/(acck_All+1.0D-20)) !---
Epsilon2(iphas) = abs(b( kvst(i) + iphas )) !---
if(EpsilonMax1(iphas).LT.Epsilon1(iphas)) then
EpsilonMax1(iphas) = Epsilon1(iphas)
NEpsilonMax1(iphas) = i
endif
if(EpsilonMax2(iphas).LT.Epsilon2(iphas)) then
EpsilonMax2(iphas) = Epsilon2(iphas)
NEpsilonMax2(iphas) = i
endif
endif !--------------------------
if( impl(i) .eq. 0) then
fdsav( iphas ) = fdiag( iphas)
else
fsums(iphas) = fsum(iphas)
fdsav(iphas) = fdiag(iphas)
endif
15 continue
if(IDiagonal_dominace.EQ.1)then !---
b( kvst(i) + 1 ) = -fsum( 3 )
b( kvst(i) + 3 ) = -fsum( 1 )
endif
do 2000 icol=1, nph
isave=1
ishift=1
call save_v(INODE,ISAVE,ICOL)
call shif(INODE,ICOL,stemp)
if( impl(i) .eq. 0)then
kupdat = 0
else
kupdat = 1
endif
if(MOP(10).NE.0) kupdat=0 !--- add by Diyuan, 2014-6-6
call eosms(inode,kupdat,b,kvst)
if(iwell(inode).eq.0) then
!$acc loop
do iphas=1,mnph
qm_bc(iphas)=0.0d0
enddo
elseif(iwell(inode).eq.1) then
call bc_ev(INODE)
elseif(iwell(inode).eq.2) then
call bc_well(inode,ishift,ia, ja,nja)
endif
call eqnsa(inode,IMAT,ishift)
if( impl(i) .eq. 0 ) go to 100
if( impl(i) .eq. 0 .and. icol .gt. 1) go to 100
jconet=0
do 150 index =ia(i)+1, ia(i+1)-1
id = ja( index )
jconet=jconet+1
if(dabs(gamman(index)).le.eps) goto 150
call save_tauf_C(idcon(index),1,ICOL)
call EOSMS_Connection(idcon(index),kupdat) !-- add by Diyuan, 2012-6-22
call eqnsf(INODE,id,imat,ishift,index,jconet,impl(i))
call save_tauf_C(idcon(index),2,ICOL)
do 101 irow=1,mxphs
if( impl(i) .eq. 1)then
a(ka(isymm(index))+(irow-1)*mxphs+icol) = +(flw(irow)-fsav(irow,jconet))/stemp
else
a(ka(isymm(index))+irow) = (flw(irow)-fsav(irow,jconet))/stemp
endif
101 continue
if(IDiagonal_dominace.EQ.1)then !---
a_temp = a(ka(isymm(index))+(1-1)*mxphs+icol)
a(ka(isymm(index))+(1-1)*mxphs+icol) = a(ka(isymm(index))+(3-1)*mxphs+icol)
a(ka(isymm(index))+(3-1)*mxphs+icol) = a_temp
endif
150 continue
100 continue
do 120 irow=1, mxphs
itemp = ka( ia(i) ) + (irow-1)* mxphs
if( impl(i) .eq. 0 ) then
a(itemp + icol) = ( fdiag(irow) - fdsav(irow) ) / stemp
if( icol .eq. 1)then
a( itemp + icol ) = a( itemp + icol ) + fdsum(irow)
endif
else
a(itemp+icol) = + (fsum(irow) - fsums(irow ))/ stemp
endif
120 continue
if(IDiagonal_dominace.EQ.1)then !---
itemp1 = ka( ia(i) ) + (1-1)* mxphs
itemp3 = ka( ia(i) ) + (3-1)* mxphs
a_temp = a(itemp1 + icol)
a(itemp1 + icol) = a(itemp3 + icol)
a(itemp3 + icol) = a_temp
endif
isave=2
call save_v(INODE,ISAVE,ICOL)
2000 continue
1000 continue
ishift=0
RETURN
END
But when I add the OpenACC directive, I can't see the compiler feedback or the data-movement information, and the console doesn't print kernel execution time information either. I have set up the environment variables and command-line options to make sure this information is printed. The directive I added is:
!$acc parallel loop
do iphas=1, nph
fsum(iphas) = 0.0d0
fsums(iphas) = 0.0d0
fdsum( iphas ) = 0.0d0
fdsav(iphas) = 0.0d0
do index=1, ia(i+1)-1 - ( ia(i) )
fsav(iphas, index) = 0.0d0
enddo
enddo
!$acc end parallel
The arrays are stored in .com include files. I don't know why the OpenACC directive has no effect, or what impact the goto statements have. Should I delete the goto statements and modify the program to make it work with OpenACC?
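(For reference, and purely as an assumption about the setup since the exact build options are not shown: with the PGI compilers the accelerator feedback referred to above is normally requested at compile time with something like the following, where jbild.f is a hypothetical file name.)
pgfortran -acc -Minfo=accel -c jbild.f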

Fortran runtime error: Bad real number

I have the code enclosed below. It is supposed to read a binary file and produce a special format. (This code is part of the SIESTA code.) However, I receive the following error when I execute the code:
At line 127 of file grid2cube.f (unit = 5, file = 'stdin')
Fortran runtime error: Bad real number in item 0 of list input
The Fortran compiler and flags with which I compiled the main code are:
FC=/usr/local/bin/mpif90
FFLAGS=-g -O2
FPPFLAGS=-DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
LDFLAGS=
This code is also compiled with the same flags:
/usr/local/bin/mpif90 -c -g -O2 grid2cube.f
/usr/local/bin/mpif90 -o grid2cube grid2cube.o
I also changed "-O2" to "-O1" and "-O0" and recompiled everything, but the same error was produced. Besides, I am using mpich-3.0.4 and gfortran as the base.
Please kindly help me correct this error.
program grid2cube
implicit none
integer maxp, natmax, nskip
parameter (maxp = 12000000)
parameter (natmax = 1000)
integer ipt, isp, ix, iy, iz, i, ip, natoms, np,
. mesh(3), nspin, Ind, id(3), iix, iiy,
. iiz, ii, length, lb
integer is(natmax), izat(natmax)
character sysname*70, fnamein*75, fnameout(2)*75,
. fnamexv*75, paste*74, task*5, fform*12
double precision rho(maxp,2), rhot(maxp,2)
double precision cell(3,3), xat(natmax,3), cm(3), rt(3),
. delta(3), dr(3), residual
external paste, lb
c ---------------------------------------------------------------------------
read(*,*)
read(5,*) sysname
read(5,*) task
read(5,*) rt(1),rt(2),rt(3)
read(5,*) nskip
read(5,*) fform
fnamexv = paste(sysname,'.XV')
if (task .eq. 'rho') then
fnamein = paste(sysname,'.RHO')
else if (task .eq. 'drho') then
fnamein = paste(sysname,'.DRHO')
else if (task .eq. 'ldos') then
fnamein = paste(sysname,'.LDOS')
else if (task .eq. 'vt') then
fnamein = paste(sysname,'.VT')
else if (task .eq. 'vh') then
fnamein = paste(sysname,'.VH')
else if (task .eq. 'bader') then
fnamein = paste(sysname,'.BADER')
else
write(6,*) 'Wrong task'
write(6,*) 'Accepted values: rho, drho, ldos, vh, vt, bader'
write(6,*) '(in lower case!!!!)'
stop
endif
length = lb(fnamein)
write(6,*)
write(6,*) 'Reading grid data from file ',fnamein(1:length)
c read function from the 3D grid --------------------------------------------
open( unit=1, file=fnamein, form=fform, status='old' )
if (fform .eq. 'unformatted') then
read(1) cell
else if (fform .eq. 'formatted') then
do ix=1,3
read(1,*) (cell(iy,ix),iy=1,3)
enddo
else
stop 'ERROR: last input line must be formatted or unformatted'
endif
write(6,*)
write(6,*) 'Cell vectors'
write(6,*)
write(6,*) cell(1,1),cell(2,1),cell(3,1)
write(6,*) cell(1,2),cell(2,2),cell(3,2)
write(6,*) cell(1,3),cell(2,3),cell(3,3)
residual = 0.0d0
do ix=1,3
do iy=ix+1,3
residual = residual + cell(ix,iy)**2
enddo
enddo
if (residual .gt. 1.0d-6) then
write(6,*)
write(6,*) 'ERROR: this progam can only handle orthogonal cells'
write(6,*) ' with vectors pointing in the X, Y and Z directions'
stop
endif
if (fform .eq. 'unformatted') then
read(1) mesh, nspin
else
read(1,*) mesh, nspin
endif
write(6,*)
write(6,*) 'Grid mesh: ',mesh(1),'x',mesh(2),'x',mesh(3)
write(6,*)
write(6,*) 'nspin = ',nspin
write(6,*)
do ix=1,3
dr(ix)=cell(ix,ix)/mesh(ix)
enddo
np = mesh(1) * mesh(2) * mesh(3)
if (np .gt. maxp) stop 'grid2d: Parameter MAXP too small'
C read(1) ( (rho(ip,isp), ip = 1, np), isp = 1,nspin)
do isp=1,nspin
Ind=0
if (fform .eq. 'unformatted') then
do iz=1,mesh(3)
do iy=1,mesh(2)
read(1) (rho(Ind+ix,isp),ix=1,mesh(1))
Ind=Ind+mesh(1)
enddo
enddo
else
do iz=1,mesh(3)
do iy=1,mesh(2)
read(1,'(e15.6)') (rho(Ind+ix,isp),ix=1,mesh(1))
Ind=Ind+mesh(1)
enddo
enddo
endif
enddo
C translate cell
do ix=1,3
delta(ix) = rt(ix)/dr(ix)
id(ix) = delta(ix)
delta(ix) = rt(ix) - id(ix) * dr(ix)
enddo
do iz=1,mesh(3)
do iy=1,mesh(2)
do ix=1,mesh(1)
iix=ix+id(1)
iiy=iy+id(2)
iiz=iz+id(3)
if (iix .lt. 1) iix=iix+mesh(1)
if (iiy .lt. 1) iiy=iiy+mesh(2)
if (iiz .lt. 1) iiz=iiz+mesh(3)
if (iix .gt. mesh(1)) iix=iix-mesh(1)
if (iiy .gt. mesh(2)) iiy=iiy-mesh(2)
if (iiz .gt. mesh(3)) iiz=iiz-mesh(3)
if (iix .lt. 1) stop 'ix < 0'
if (iiy .lt. 1) stop 'iy < 0'
if (iiz .lt. 1) stop 'iz < 0'
if (iix .gt. mesh(1)) stop 'ix > cell'
if (iiy .gt. mesh(2)) stop 'iy > cell'
if (iiz .gt. mesh(3)) stop 'iz > cell'
i=ix+(iy-1)*mesh(1)+(iz-1)*mesh(1)*mesh(2)
ii=iix+(iiy-1)*mesh(1)+(iiz-1)*mesh(1)*mesh(2)
do isp=1,nspin
rhot(ii,isp)=rho(i,isp)
enddo
enddo
enddo
enddo
close(1)
open( unit=3, file=fnamexv, status='old', form='formatted')
read(3,*)
read(3,*)
read(3,*)
read(3,*) natoms
do i=1,natoms
read(3,*) is(i),izat(i),(xat(i,ix),ix=1,3)
enddo
do i=1,natoms
do ix=1,3
xat(i,ix)=xat(i,ix)+rt(ix)-delta(ix)
if (xat(i,ix) .lt. 0.0) xat(i,ix)=xat(i,ix)+cell(ix,ix)
if (xat(i,ix) .gt. cell(ix,ix))
. xat(i,ix)=xat(i,ix)-cell(ix,ix)
enddo
enddo
close(3)
if (nspin .eq. 1) then
fnameout(1) = paste(fnamein,'.cube')
else if (nspin .eq. 2) then
fnameout(1) = paste(fnamein,'.UP.cube')
fnameout(2) = paste(fnamein,'.DN.cube')
else
stop 'nspin must be either 1 or 2'
endif
do isp=1,nspin
length = lb(fnameout(isp))
write(6,*) 'Writing CUBE file ',fnameout(isp)(1:length)
C open( unit=2, file=fnameout(isp), status='new', form='formatted')
open( unit=2, file=fnameout(isp), form='formatted')
length = lb(fnameout(isp))
write(2,*) fnameout(isp)(1:length)
write(2,*) fnameout(isp)(1:length)
write(2,'(i5,4f12.6)') natoms, 0.0,0.0,0.0
do ix=1,3
ii = mesh(ix)/nskip
if (ii*nskip .ne. mesh(ix)) ii = ii+1
write(2,'(i5,4f12.6)')
. ii,(cell(ix,iy)/ii,iy=1,3)
enddo
do i=1,natoms
write(2,'(i5,4f12.6)') izat(i),0.0,(xat(i,ix),ix=1,3)
enddo
do ix=1,mesh(1),nskip
do iy=1,mesh(2),nskip
write(2,'(6e13.5)')
. (rhot(ix+(iy-1)*mesh(1)+(iz-1)*mesh(1)*mesh(2),isp),
. iz=1,mesh(3),nskip)
enddo
enddo
close(2)
enddo
write(6,*)
end
CHARACTER*(*) FUNCTION PASTE( STR1, STR2 )
C CONCATENATES THE STRINGS STR1 AND STR2 REMOVING BLANKS IN BETWEEN
C Writen by Jose M. Soler
CHARACTER*(*) STR1, STR2
DO 10 L = LEN( STR1 ), 1, -1
IF (STR1(L:L) .NE. ' ') GOTO 20
10 CONTINUE
20 PASTE = STR1(1:L)//STR2
END
INTEGER FUNCTION LB ( STR1 )
C RETURNS THE SIZE IF STRING STR1 WITH BLANKS REMOVED
C Writen by P. Ordejon from Soler's paste.f
CHARACTER*(*) STR1
DO 10 L = LEN( STR1 ), 1, -1
IF (STR1(L:L) .NE. ' ') GOTO 20
10 CONTINUE
20 LB = L
END
The statement at the error line is:
read(5,*) rt(1),rt(2),rt(3)
This is a list-directed formatted read. As you indicated in the comment, you are trying to read binary (unformatted) data. That cannot work. The statement above expects formatted data, that is, text with human-readable numbers.
Also, the pre-connected unit 5 is standard input. It cannot work for unformatted data anyway if you first read formatted data from it (with read(5,*) sysname).
Side note: the number 5 for standard input is not standardized, but is quite a safe assumption in practice. But I would use * instead of 5 anyway.
Response to a comment:
read(*,*) also cannot work. Generally, whenever you provide a format, which is the second argument in the parentheses of read or write, you do formatted I/O. It doesn't matter whether the format is * or something else. You cannot read unformatted data this way. You have to open a file for the unformatted read with form='unformatted' (with any access mode you like) and read it with:
read(file_unit_number) rt(1),rt(2),rt(3)
If you cannot read the numbers in the data file as text, you cannot use a formatted read.
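A minimal sketch of the unformatted variant, assuming a hypothetical file name and unit number (the record layout must of course match what the writing program produced):
      program read_unformatted
      implicit none
      integer, parameter :: iu = 10
      double precision rt(3)
c     open the binary file for unformatted (record-based) reading;
c     'data.bin' is a placeholder name
      open(unit=iu, file='data.bin', form='unformatted', status='old')
c     one unformatted read consumes one record written by an
c     unformatted write; no format specification is given
      read(iu) rt(1), rt(2), rt(3)
      print *, rt
      close(iu)
      end program read_unformatted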