I am trying to use MPI_TYPE_VECTOR in order to exchange information between ghost cells of a 2D computational domain. The test case is the following:
The topology is made of two blocks in the vertical direction and one block in the other direction. The objective is to fill the ghost cells of each block with the corresponding cells of the neighbor block.
If I don't use the derived types from MPI, the communications are just fine.
PROGRAM test
USE mpi
IMPLICIT NONE
INTEGER*4, PARAMETER :: kim = 3 ! Number of cells (horizontal)
& , kjm = 2 ! Number of cells (vertical)
& , is = 1 ! Number of ghost cells
INTEGER*4, DIMENSION(1-is:kim+is,1-is:kjm+is)
& :: vect2d ! 2D vector representing the computational domain
INTEGER*4 :: i, j
! MPI Parameters
INTEGER*4, PARAMETER :: ndims = 2
INTEGER*4 :: mpicode
INTEGER*4 :: nb_procs
INTEGER*4 :: rank
INTEGER*4 :: comm
INTEGER*4 :: etiquette = 100
INTEGER*4, DIMENSION (ndims) :: dims
INTEGER*4, DIMENSION (ndims) :: coords
LOGICAL, DIMENSION (ndims) :: periods
LOGICAL :: reorganisation
INTEGER, DIMENSION(MPI_STATUS_SIZE) :: statut
INTEGER*4, DIMENSION (2*ndims) :: neighbors
INTEGER*4 :: type_column
! Initialisation MPI
CALL MPI_INIT (mpicode)
CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nb_procs, mpicode)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, rank, mpicode)
dims(1) = 1
dims(2) = 2
periods = .FALSE.
reorganisation = .FALSE.
CALL MPI_CART_CREATE (MPI_COMM_WORLD, ndims, dims, periods,
& reorganisation, comm, mpicode)
CALL MPI_CART_COORDS (comm, rank, ndims, coords, mpicode)
CALL MPI_CART_SHIFT (comm, 0, 1, neighbors(1), neighbors(2),
& mpicode)
CALL MPI_CART_SHIFT (comm, 1, 1, neighbors(3), neighbors(4),
& mpicode)
! Initialisation domain
DO i = 1-is, kim+is
DO j = 1-is, kjm +is
vect2d(i,j) = rank
END DO
END DO
! Communications
! Without vector type
DO i = 1, kim
CALL MPI_RECV (vect2d(i,1-is:0),
& size(vect2d(i,1-is:0)),
& MPI_INTEGER, neighbors(3), etiquette, comm,
& statut, mpicode)
CALL MPI_SEND (vect2d(i,kjm+1-is:kjm),
& size(vect2d(i,kjm+1-is:kjm)),
& MPI_INTEGER, neighbors(4), etiquette, comm,
& mpicode)
CALL MPI_RECV (vect2d(i,kjm+1:kjm+is),
& size(vect2d(i,kjm+1:kjm+is)),
& MPI_INTEGER, neighbors(4), etiquette, comm,
& statut, mpicode)
CALL MPI_SEND (vect2d(i,1:is),
& size(vect2d(i,1:is)),
& MPI_INTEGER, neighbors(3), etiquette, comm,
& mpicode)
END DO
! Write
OPEN (10,
& file = "test.dat",
& form = "formatted",
& status = "unknown")
IF (rank .EQ.1 ) WRITE(10,*) vect2D
CLOSE(10)
CALL MPI_FINALIZE(mpicode)
END PROGRAM test
In this case the first line of the output for the upper block is 1 0 0 0 1, so the ghost cells are filled with the value of the lower block.
With MPI_TYPE_VECTOR
PROGRAM test
USE mpi
IMPLICIT NONE
INTEGER*4, PARAMETER :: kim = 3 ! Number of cells (horizontal)
& , kjm = 2 ! Number of cells (vertical)
& , is = 1 ! Number of ghost cells
INTEGER*4, DIMENSION(1-is:kim+is,1-is:kjm+is)
& :: vect2d ! 2D vector representing the computational domain
INTEGER*4 :: i, j
! MPI Parameters
INTEGER*4, PARAMETER :: ndims = 2
INTEGER*4 :: mpicode
INTEGER*4 :: nb_procs
INTEGER*4 :: rank
INTEGER*4 :: comm
INTEGER*4 :: etiquette = 100
INTEGER*4, DIMENSION (ndims) :: dims
INTEGER*4, DIMENSION (ndims) :: coords
LOGICAL, DIMENSION (ndims) :: periods
LOGICAL :: reorganisation
INTEGER, DIMENSION(MPI_STATUS_SIZE) :: statut
INTEGER*4, DIMENSION (2*ndims) :: neighbors
INTEGER*4 :: type_column
!------------------------------------------------------------------------------
! Initialisation MPI
CALL MPI_INIT (mpicode)
CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nb_procs, mpicode)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, rank, mpicode)
dims(1) = 1
dims(2) = 2
periods = .FALSE.
reorganisation = .FALSE.
CALL MPI_CART_CREATE (MPI_COMM_WORLD, ndims, dims, periods,
& reorganisation, comm, mpicode)
CALL MPI_CART_COORDS (comm, rank, ndims, coords, mpicode)
CALL MPI_CART_SHIFT (comm, 0, 1, neighbors(1), neighbors(2),
& mpicode)
CALL MPI_CART_SHIFT (comm, 1, 1, neighbors(3), neighbors(4),
& mpicode)
CALL MPI_TYPE_VECTOR(is, 1, (kim+2*is),
& MPI_INTEGER, type_column, mpicode)
CALL MPI_TYPE_COMMIT(type_column, mpicode)
!------------------------------------------------------------------------------
! Initialisation domain
DO i = 1-is, kim+is
DO j = 1-is, kjm +is
vect2d(i,j) = rank
END DO
END DO
!------------------------------------------------------------------------------
! Communications
! With vector type
DO i = 1, kim
CALL MPI_RECV (vect2d(j,1-is),
& 1,
& type_column, neighbors(3), etiquette, comm,
& statut, mpicode)
CALL MPI_SEND (vect2d(i,kjm+1-is),
& 1,
& type_column, neighbors(4), etiquette, comm,
& mpicode)
CALL MPI_RECV (vect2d(i,kjm+1),
& 1,
& type_column, neighbors(4), etiquette, comm,
& statut, mpicode)
CALL MPI_SEND (vect2d(i,1),
& 1,
& type_column, neighbors(3), etiquette, comm,
& mpicode)
END DO
!------------------------------------------------------------------------------
! Write
OPEN (10,
& file = "test.dat",
& form = "formatted",
& status = "unknown")
IF (rank .EQ.1 ) WRITE(10,*) vect2D
CLOSE(10)
CALL MPI_FINALIZE(mpicode)
END PROGRAM test
In this case the output isn't the expected one: I obtain 1 1 1 1 0 instead.
I suppose the problem comes from the definition of my type_column, but I cannot figure it out, so any help and explanation is welcome!
Thanks
EDIT : Here is the domain associated with the matrix in the test case
(0,3) | (1,3) | (2,3) | (3,3) | (4,3)
(0,2) | (1,2) | (2,2) | (3,2) | (4,2)
(0,1) | (1,1) | (2,1) | (3,1) | (4,1)
(0,0) | (1,0) | (2,0) | (3,0) | (4,0)
For what I understood, the memory disposition associated with this matrix is :
(0,0) | (1,0) | (2,0) | (3,0) | (4,0)|(0,1) | (1,1) | (2,1) | (3,1) | (4,1) ...
The main confusion here is that you have used the terminology "column", but the way you have drawn the arrays means that you are actually swapping rows. Let's stick with your convention for drawing the arrays and call the directions "horizontal" (increasing value of first index "i") and "vertical" (increasing value of second index "j") to avoid confusion.
If we fix your i <-> j bug mentioned in the comments then your code works. However, the whole point of using vectors is to avoid multiple sends, so, as written, your vector code doesn't improve on using basic datatypes. For the pattern shown above, at least for a single-depth halo, you could just send contiguous data with count=kim. For deeper halos you would need a vector type with count=is, blocksize=kim and stride=kim+2*is. Having done this, you would use count=1 when sending and receiving, and you transfer the whole halo with a single call.
For halo swaps in the horizontal direction (i.e. swapping vertical halos) you always need a vector type with count=kjm, blocksize=is and stride=kim+2*is. Again, use count=1 when sending and receiving.
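To make that concrete, here is a minimal sketch of the two types for your layout (the names type_horiz_halo and type_vert_halo are mine; kim, kjm, is, vect2d, neighbors, etiquette, comm, statut and mpicode are as declared in your program):
INTEGER :: type_horiz_halo, type_vert_halo
! Horizontal halo (rows of the drawing, depth is): is blocks of kim
! contiguous cells, separated by the padded row length kim+2*is.
CALL MPI_TYPE_VECTOR(is, kim, kim+2*is, MPI_INTEGER, type_horiz_halo, mpicode)
CALL MPI_TYPE_COMMIT(type_horiz_halo, mpicode)
! Vertical halo (columns of the drawing, depth is): kjm blocks of is
! contiguous cells, same stride.
CALL MPI_TYPE_VECTOR(kjm, is, kim+2*is, MPI_INTEGER, type_vert_halo, mpicode)
CALL MPI_TYPE_COMMIT(type_vert_halo, mpicode)
! One send/receive pair replaces the whole loop over i: send the top
! interior rows upward, receive the bottom ghost rows from below.
CALL MPI_SEND(vect2d(1,kjm+1-is), 1, type_horiz_halo, neighbors(4), etiquette, comm, mpicode)
CALL MPI_RECV(vect2d(1,1-is), 1, type_horiz_halo, neighbors(3), etiquette, comm, statut, mpicode)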
There are other issues with issuing the receives first (potential deadlock for periodic boundary conditions) which can't be entirely solved by putting the sends first (since MPI_Send could be implemented synchronously), but for your particular test code the solution lies in being consistent with the terminology and using appropriate vector types.
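As an untested sketch of one way to sidestep the send/receive ordering question, you could use MPI_SENDRECV for the swap in the j direction, reusing the type_horiz_halo defined in the sketch above (non-existent neighbours returned by MPI_CART_SHIFT are MPI_PROC_NULL, so the calls degenerate safely at the non-periodic domain boundaries):
! Send my top interior rows up, receive my bottom ghost rows from below ...
CALL MPI_SENDRECV(vect2d(1,kjm+1-is), 1, type_horiz_halo, neighbors(4), etiquette, &
                  vect2d(1,1-is),     1, type_horiz_halo, neighbors(3), etiquette, &
                  comm, statut, mpicode)
! ... then send my bottom interior rows down, receive my top ghost rows from above.
CALL MPI_SENDRECV(vect2d(1,1),     1, type_horiz_halo, neighbors(3), etiquette, &
                  vect2d(1,kjm+1), 1, type_horiz_halo, neighbors(4), etiquette, &
                  comm, statut, mpicode)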
Related
I am making a module in Fortran 90 to run PARPACK on a given matrix. I have an existing ARPACK code which functions normally, as expected. I tried converting it to PARPACK and it runs into memory errors during cleanup. I am fairly new to coding and Fortran, so please excuse any blunders I've made.
The code:
!ARPACK module
module parpack
implicit none
contains
subroutine parp
! use mpi
include '/usr/lib/x86_64-linux-gnu/openmpi/include/mpif.h'
integer comm, myid, nprocs, rc, nloc, status(MPI_STATUS_SIZE)
integer, parameter :: pres=8
integer nev, ncv, maxn, maxnev, maxncv
parameter (maxn=10**7, maxnev=maxn-1, maxncv=maxn)
! Arrays for SNAUPD
integer iparam(11), ipntr(14)
logical, allocatable :: select(:)
real(kind=pres), allocatable :: workd(:), workl(:), worktmp1(:), worktmp2(:)
! Scalars for SNAUPD
character bmat*1, which*2
integer ido, n, info, ierr, ldv
integer i, j, ishfts, maxitr, mode1, nconv
integer(kind=pres) lworkl
real(kind=pres) tol
! Arrays for SNEUPD
real(kind=pres), allocatable :: d(:,:), resid(:), v(:,:), workev(:), z(:,:)
! Scalars for SNEUPD
logical rvec, first
real sigmar, sigmai
!==============================================
real(kind=pres), allocatable :: mat(:,:)
open (11, file = 'matrix.dat', status = 'old')
read (11,*) n
!=============================================
! Dimension of the problem
nev = n/10
ncv = nev+2
ldv = n
bmat = 'I'
which = 'LM'
! Additional environment variables
ido = 0
tol = 0.0E+0
info = 0
lworkl = 3*ncv**2+6*ncv
! Algorithm Mode specifications:
ishfts = 1
maxitr = 300
mode1 = 1
iparam(1) = ishfts
iparam(3) = maxitr
iparam(7) = mode1
! Distribution to nodes
!=============================================
! Matrix allocation
allocate (mat(n,n))
! PDNAUPD
allocate (workd(5*n))
allocate (workl(lworkl))
allocate (resid(n))
allocate (worktmp1(n))
allocate (worktmp2(n))
! PDNEUPD
allocate (d(n,3))
allocate (v(ldv,ncv))
allocate (workev(3*n))
allocate (z(ldv,ncv))
allocate (select(ncv))
!===========================================
! Read Matrix from the provided file
mat = 0
read(11,*) mat
mat = transpose(mat)
!===========================================
! MPI Calling
call MPI_INIT(ierr)
comm = MPI_COMM_WORLD
call MPI_COMM_RANK(comm, myid, ierr)
call MPI_COMM_SIZE(comm, nprocs, ierr)
nloc = n/nprocs
! if ( mod(n, nprocs) .gt. myid ) nloc = nloc + n
!===============================================
20 continue
call pdnaupd(comm, ido, bmat, nloc, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, info) !Top level solver
call MPI_BARRIER(comm,ierr)
print *, ido, info, iparam(5) !for testing
!===============================================
if (ido .eq. -1 .or. ido .eq. 1) then
worktmp1 = 0
if (myid .ne. 0) then !It is slave
call MPI_SEND(workd(ipntr(1)), nloc, MPI_DOUBLE_PRECISION, 0, 0, comm, ierr)
else !It is host
worktmp1(1:nloc) = workd(ipntr(1):ipntr(1)+nloc-1)
i = nprocs
if (i .gt. 1) then
do i=1,nprocs-1
call MPI_RECV(worktmp1(i*nloc+1), nloc, MPI_DOUBLE_PRECISION, i, 0, comm, status, ierr)
end do
endif
endif
call MPI_BARRIER(comm,ierr)
if (myid .eq. 0) then !It is host
! Matrix multiplication
worktmp2 = 0
call matmultiply(n, mat, worktmp1, worktmp2)
workd(ipntr(2):ipntr(2)+nloc-1) = worktmp2(1:nloc)
i = nprocs
if (i .gt. 1) then
do i=1,nprocs-1
call MPI_SEND(worktmp2(i*nloc+1), nloc, MPI_DOUBLE_PRECISION, i, 100*i, comm, ierr)
end do
endif
else !It is slave
call MPI_RECV(workd(ipntr(2)), nloc, MPI_DOUBLE_PRECISION, 0, 100*myid, comm, status, ierr)
endif
go to 20
! call matmultiply(n, mat, workd(ipntr(1):ipntr(1)+n-1), workd(ipntr(2):ipntr(2)+n-1))
! go to 20
endif
! print *, info !for testing
!===============================================================
! Post-processing for eigenvalues
rvec = .true.
if (myid .eq. 0) then
call pdneupd ( comm, rvec, 'A', select, d, d(1,2), z, ldv, sigmar, sigmai, &
workev, bmat, n, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, &
workd, workl, lworkl, info)
endif
! print *, info !for testing
close(11)
call MPI_FINALIZE(ierr)
return
end subroutine
!==============================================================================================
! Additional Function definitions
subroutine matmultiply(n, mat, v, w)
integer n, i, j
integer, parameter :: pres=8
real(kind = pres) mat(n,n), temp(n)
real(kind = pres) v(n), w(n)
temp = 0
do j = 1,n
do i = 1,n
temp(j) = temp(j) + mat(i,j)*v(i)
end do
end do
w = temp
return
end subroutine
end module
I apologize for the ton of redundant lines and comments, I am yet to clean it up for finalization.
When I run the code on a single thread with ./a.out, I get the following output:
Invalid MIT-MAGIC-COOKIE-1 key 1 0 1629760560
1 0 1629760560
1 0 1629760560
1 0 1629760560
.
.
. <A long chain as the code exhausts all iterations.>
. <The first of the numbers is ido, which starts at 1 instead of -1 for some reason; the second is info; the third is iparam(5), which is a random number until the final iteration.>
.
99 1 1
munmap_chunk(): invalid pointer
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7f5a863d0d01 in ???
#1 0x7f5a863cfed5 in ???
#2 0x7f5a8620420f in ???
#3 0x7f5a8620418b in ???
#4 0x7f5a861e3858 in ???
#5 0x7f5a8624e3ed in ???
#6 0x7f5a8625647b in ???
#7 0x7f5a862566cb in ???
#8 0x560f05ac1819 in ???
#9 0x560f05abd7bc in checker
at /home/srivatsank/Desktop/fortran/lap_vs_arp/ptest/ptest.f90:45
#10 0x560f05abd8d9 in main
at /home/srivatsank/Desktop/fortran/lap_vs_arp/ptest/ptest.f90:3
Aborted (core dumped)
Line 45 in ptest is call parp, and line 3 is use parpack (the name of the module).
The main code is as follows:
program checker
use parpack
use arpack
! use lapack
implicit none
!Program to test LAPACK and ARPACK
! 1. Variable definition
integer a,n,i
real, allocatable :: mat(:,:)
real t0, t1
a=2
! Loop
! do 20 a = 1,3
! Open File
open(unit=10, file = 'matrix.dat', status = 'replace')
! 2. Generate Symmetric matrices
n = 10**a
allocate (mat(n,n))
call RANDOM_NUMBER(mat)
! 3. Save symmetric matrices to r.dat
write (10,*) n
do 30 i=1,n
write(10,*) mat(i,:)
30 end do
deallocate(mat)
close(10)
! 4. Test time taken by each of the routines
! call cpu_time(t0)
! call arp
! call cpu_time(t1)
! print *, 'n:', n, 'ARPACK time taken:', t1-t0
call cpu_time(t0)
call parp
call cpu_time(t1)
print *, 'n:', n, 'PARPACK time taken:', t1-t0
!20 end do
end program checker
The memory error occurs at the very end of the subroutine, when the main program tries to return from the subroutine. I have verified this by putting print statements as the last lines of the subroutine.
And when running mpirun -np 4 a.out, the code just enters pdneupd and sits there forever. Could anyone help?
I modified my code and put MPI_RECV before MPI_SEND. This time I did not receive any error message; however, it seems the code still runs into a deadlock. The reason I think so is that I open some files (UNIT=11, 12, 13, 14) before the MPI_RECV and MPI_SEND calls, then collect data through these two calls and write it into those files, but no data is ever written. I paste my modified code below. Would you please have a look at it and give me some suggestions? Thank you so much.
PROGRAM MAIN
USE MPI
USE CAL
IMPLICIT NONE
INTEGER :: nb !Number of valence band
DOUBLE PRECISION :: me !Minimum eigen value
DOUBLE COMPLEX, ALLOCATABLE :: u_s1(:,:) !Array to store the contribution of each eigen state to the total spin orbit torque
DOUBLE COMPLEX, ALLOCATABLE :: u_s2(:,:) !Array to store the contribution of each eigen state to the total spin orbit torque
DOUBLE COMPLEX, ALLOCATABLE :: u_t1(:,:) !Array to collect the contribution of each eigen state to the total spin orbit torque from all processors
DOUBLE COMPLEX, ALLOCATABLE :: u_t2(:,:) !Array to collect the contribution of each eigen state to the total spin orbit torque from all processors
DOUBLE COMPLEX :: sr1 !Sum of Fermi surface part for spin orbit torque on all km(1) k points
DOUBLE COMPLEX :: sr2 !Sum of Fermi surface part for spin orbit torque on all km(1) k points
DOUBLE PRECISION, ALLOCATABLE, TARGET :: nme(:) !Array to store the minimum eigen value
INTEGER, ALLOCATABLE, TARGET :: nnb(:) !Array to store the number of valence band
INTEGER :: world_size !MPI
INTEGER :: world_rank, ierr !MPI
INTEGER :: irank, j0 !MPI
!
!Initializing MPI
CALL MPI_Init(ierr)
CALL MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)
CALL MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
!
!Opening file that stores the total spin orbit torque value for the Fermi surface part
OPEN (UNIT=11, FILE='SOT_Surface.dat', STATUS='UNKNOWN')
!
!Opening file that stores the spin orbit torque for the Fermi surface part versus energy
OPEN (UNIT=12, FILE='SOT_Surface_sve_xz.dat', STATUS='UNKNOWN')
OPEN (UNIT=13, FILE='SOT_Surface_sve_yz.dat', STATUS='UNKNOWN')
!
!Opening file that stores the minimum eigen value and number of valence band
OPEN (UNIT=14, FILE='SOT_mineig_numval.dat', STATUS='UNKNOWN')
!
!Allocating the array used to store the contribution of each eigen state to the total spin orbit torque
ALLOCATE (u_s1(2,nu_wa*km(1)))
ALLOCATE (u_s2(2,nu_wa*km(1)))
!
!Allocating array to collect the contribution of each eigen state to the total spin orbit torque from all processors
IF (world_rank .EQ. 0) THEN
ALLOCATE (u_t1(2,nu_wa*km(1)*km(2)))
ALLOCATE (u_t2(2,nu_wa*km(1)*km(2)))
END IF
u_t1 = CMPLX(0.0d0, 0.0d0)
u_t2 = CMPLX(0.0d0, 0.0d0)
!
!Allocating array to collect the number of valence band and the minimum eigen value
IF (world_rank .EQ. 0) THEN
ALLOCATE (nme(km(2)))
ALLOCATE (nnb(km(2)))
END IF
nme = 0.0d0
nnb = 0
!
!opening test files
open (unit=21,file='normalisedprefactor.dat',status='unknown')
open (unit=22,file='gd.dat',status='unknown')
open (unit=23,file='con.dat',status='unknown')
open (unit=24,file='par.dat',status='unknown')
open (unit=25,file='grga.dat',status='unknown')
open (unit=26,file='nfdk.dat',status='unknown')
!Reading the Cartesian coordinates of k-point mesh
DO j = 1, km(2), 1
IF (mod(j-1, world_size) .NE. world_rank) CYCLE
DO k = 1, km(1), 1
kp(k,:) = ka(j,k,:)
END DO
!Building up Hamiltonian matrix on k points and diagonalising the matrix to obtain Eigen vectors and values
CALL HAMSUR(vd,kp,nu_wa,nu_nr,km(1),nd1,nd2,nd3,nd4,nd5,hr1,hr2,hr3,hr4,hr5,tb,ec,ev,fermi,an,wf,bv,dk,u_s1,u_s2,sr1,sr2,nb,me)
!
CALL MPI_Barrier(MPI_COMM_WORLD, ierr)
IF (WORLD_RANK .EQ. 0) THEN
u_t1(1:2,1+nu_wa*km(1)*(j-1):nu_wa*km(1)*j) = u_s1
u_t2(1:2,1+nu_wa*km(1)*(j-1):nu_wa*km(1)*j) = u_s2
DO k = 1, WORLD_SIZE-1, 1
IF (j-1+k .EQ. km(2)) EXIT
l = k + 101
n = k + 102
CALL MPI_RECV(u_t1(1,1+nu_wa*km(1)*(j-1+k)), 2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, k,l, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
CALL MPI_RECV(u_t2(1,1+nu_wa*km(1)*(j-1+k)), 2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, k,n, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
END DO
ELSE
l = WORLD_RANK + 101
n = WORLD_RANK + 102
CALL MPI_SEND(u_s1,2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, 0, l, MPI_COMM_WORLD, ierr)
CALL MPI_SEND(u_s2,2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, 0, n, MPI_COMM_WORLD, ierr)
END IF
crr1 = crr1 + sr1
crr2 = crr2 + sr2
IF (WORLD_RANK .EQ. 0) THEN
nme(j-1) = me
nnb(j-1) = nb
DO k = 1, WORLD_SIZE-1, 1
IF (j-1+k .EQ. km(2)) EXIT
l = k + 103
n = k + 104
CALL MPI_RECV(nme(j-1+k), 1, MPI_DOUBLE, k,l, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
CALL MPI_RECV(nnb(j-1+k), 1, MPI_INT, k,n, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
END DO
ELSE
l = WORLD_RANK + 103
n = WORLD_RANK + 104
CALL MPI_SEND(me, 1, MPI_DOUBLE, 0, l, MPI_COMM_WORLD, ierr)
CALL MPI_SEND(nb, 1, MPI_INT, 0, n, MPI_COMM_WORLD, ierr)
END IF
END DO
!
CALL MPI_Barrier(MPI_COMM_WORLD, ierr)
IF (world_rank .EQ. 0) THEN
ALLOCATE (crr1_all(world_size))
ALLOCATE (crr2_all(world_size))
END IF
crr1_all = CMPLX(0.0d0, 0.0d0)
crr2_all = CMPLX(0.0d0, 0.0d0)
CALL MPI_Gather(crr1, 1, MPI_double_complex, crr1_all, 1, MPI_double_complex, 0, MPI_COMM_WORLD, ierr)
CALL MPI_Gather(crr2, 1, MPI_double_complex, crr2_all, 1, MPI_double_complex, 0, MPI_COMM_WORLD, ierr)
!Writing total conductivity value into the file
IF (world_rank .EQ. 0) THEN
crr1_total = CMPLX(0.0d0, 0.0d0)
crr2_total = CMPLX(0.0d0, 0.0d0)
DO i = 1, world_size, 1
crr1_total = crr1_total + crr1_all(i)
crr2_total = crr2_total + crr2_all(i)
END DO
!Finding the minimum eigen value
NULLIFY (p1, p2)
p1 => nme(1)
p2 => nnb(1)
DO i = 2, km(2), 1
IF (p1 .GE. nme(i)) THEN
p1 => nme(i)
END IF
IF (p2 .LE. nnb(i)) THEN
p2 => nnb(i)
END IF
END DO
WRITE (UNIT=14, FMT='(A27,$)') 'The minimum eigen value is:'
WRITE (UNIT=14, FMT=*) p1
WRITE (UNIT=14, FMT='(A30,$)') 'The number of valence band is:'
WRITE (UNIT=14, FMT=*) p2
!
!Constant for the coefficient
pi = DACOS(-1.0d0)
hb = 1.054571817d-34 !(unit - J)
es = 1.602176634d-19 !(unit - J*s)
!
WRITE (UNIT=11, FMT='(A55,$)') 'Spin Orbit Torque without coeffieicnt within x-z plane:'
WRITE (UNIT=11, FMT=*) crr1_total
WRITE (UNIT=11, FMT='(A52,$)') 'Spin Orbit Torque with coefficient within x-z plane:'
WRITE (UNIT=11, FMT=*) crr1_total * es ** 2 * hb / 4.0d0 / pi
WRITE (UNIT=11, FMT='(A55,$)') 'Spin Orbit Torque without coeffieicnt within y-z plane:'
WRITE (UNIT=11, FMT=*) crr2_total
WRITE (UNIT=11, FMT='(A52,$)') 'Spin Orbit Torque with coefficient within y-z plane:'
WRITE (UNIT=11, FMT=*) crr2_total * es ** 2 * hb / 4.0d0 / pi
DO i = 1, nu_wa*km(1)*km(2), 1
WRITE (UNIT=12, FMT=*) u_t1(1:2,i)
WRITE (UNIT=13, FMT=*) u_t2(1:2,i)
END DO
END IF
!
!Finalising MPI
CALL MPI_Finalize(ierr)
!
!Deallocating arrays that store and collect the Fermi-surface-part contribution of each eigen state to the total spin orbit torque
DEALLOCATE (u_s1)
DEALLOCATE (u_s2)
DEALLOCATE (u_t1)
DEALLOCATE (u_t2)
!
STOP
END PROGRAM MAIN
subroutine collect(rank, nprocs, n_local, n_global, u_initial_local)
use mpi
implicit none
integer*8 :: i_local_low, i_local_high
integer*8 :: i_global_low, i_global_high
integer*8 :: i_local, i_global
integer*8 :: n_local, n_global
real*8 :: u_initial_local(n_local)
real*8, dimension(:), allocatable :: u_global
integer :: procs
integer*8 :: n_local_procs
! Data declarations for MPI
integer :: ierr ! error signal variable, Standard value - 0
integer :: rank ! process ID (pid) / Number
integer :: nprocs ! number of processors
! MPI send/ receive arguments
integer :: buffer(2)
integer, parameter :: collect1 = 10
integer, parameter :: collect2 = 20
! status variable - tells the status of send/ received calls
! Needed for receive subroutine
integer, dimension(MPI_STATUS_SIZE) :: status1
i_global_low = (rank *(n_global-1))/nprocs
i_global_high = ((rank+1) *(n_global-1))/nprocs
if (rank > 0) then
i_global_low = i_global_low - 1
end if
i_local_low = 0
i_local_high = i_global_high - i_global_low
if (rank == 0) then
allocate(u_global(1:n_global))
do i_local = i_local_low, i_local_high
i_global = i_global_low + i_local - i_local_low
u_global(i_global) = u_initial_local(i_local)
end do
do procs = 1,nprocs-1
call MPI_RECV(buffer, 2, MPI_INTEGER, procs, collect1, MPI_COMM_WORLD, status1, ierr)
i_global_low = buffer(1)
n_local_procs = buffer(2)
call MPI_RECV(u_global(i_global_low+1), n_local_procs, MPI_DOUBLE_PRECISION, procs, collect2, MPI_COMM_WORLD, status1, ierr)
end do
print *, u_global
else
buffer(1) = i_global_low
buffer(2) = n_local
call MPI_SEND(buffer, 2, MPI_INTEGER, 0, collect1, MPI_COMM_WORLD, ierr)
call MPI_SEND(u_initial_local, n_local, MPI_DOUBLE_PRECISION, 0, collect2, MPI_COMM_WORLD, ierr)
end if
return
end subroutine collect
I am getting an error for the MPI_SEND and MPI_RECV calls with the collect2 tag: "There is no specific subroutine for the generic ‘mpi_recv’ at (1)", where (1) points at the end of the statement (.......ierr). The MPI_SEND with the collect2 tag sends an array and the matching MPI_RECV receives that array.
This does not happen for the collect1 tag.
Your n_local is integer*8 but it must be integer (see How to debug Fortran 90 compile error "There is no specific subroutine for the generic 'foo' at (1)"?).
There are many articles (like https://blogs.cisco.com/performance/can-i-mpi_send-and-mpi_recv-with-a-count-larger-than-2-billion) about the problem of large arrays (more than maxint elements) and MPI. If you do have problems with n_local being too large for a default integer, you can use derived types (like MPI_Type_contiguous) to lower the number of elements passed to MPI procedures so that the count fits into a 4-byte integer.
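As a hedged illustration of that second workaround, using the variables of your collect subroutine (the chunk size and the assumption that n_local is an exact multiple of it are mine, purely for the sketch):
! Pack CHUNK elements into one derived-type element so the count passed to
! MPI_SEND/MPI_RECV stays well inside the range of a default 4-byte integer.
integer, parameter :: chunk = 1048576
integer :: blocktype, nblocks
call MPI_TYPE_CONTIGUOUS(chunk, MPI_DOUBLE_PRECISION, blocktype, ierr)
call MPI_TYPE_COMMIT(blocktype, ierr)
nblocks = int(n_local/chunk)   ! assumes n_local is an exact multiple of chunk
call MPI_SEND(u_initial_local, nblocks, blocktype, 0, collect2, MPI_COMM_WORLD, ierr)
! ... and symmetrically on rank 0:
! call MPI_RECV(u_global(i_global_low+1), nblocks, blocktype, procs, collect2, MPI_COMM_WORLD, status1, ierr)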
I have a 2D array of integers and I want to send its rows to separate processes. I assume that the number of rows (M = 5) is not evenly divisible by the number of processes (size = 4), so in my case process 0 will obtain an additional row. The size of the 2D array A is MxN (5x10).
Here is my code
PROGRAM SCATTERV_MATRIX
INCLUDE 'mpif.h'
integer :: rank, size, ierr, dest, src, tag !MPI variables
integer :: status(MPI_STATUS_SIZE) !MPI variables
INTEGER, PARAMETER :: N = 10 !number of columns
INTEGER, PARAMETER :: M = 5 !number of rows
INTEGER, ALLOCATABLE, DIMENSION(:,:) :: A !MxN matrix A
INTEGER :: NEWTYPE, RESIZEDTYPE !MPI derived data types
INTEGER, ALLOCATABLE, DIMENSION(:,:) :: LOCAL
INTEGER, ALLOCATABLE :: SENDCOUNTS(:), DISPLS(:)
INTEGER :: RECVCOUNT, NRBUF
INTEGER :: MMIN, MEXTRA, INTSIZE, K, I, J
INTEGER :: START, EXTENT !(KIND=MPI_ADRESS_KIND)
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
IF ( rank == 0 ) THEN !allocate and create 2Darray
ALLOCATE( A (M, N) )
K = 1
DO I = 1, M
DO J = 1, N
A(I, J) = K
K = K + 1
END DO
END DO
END IF
ALLOCATE( SENDCOUNTS(0:size-1), DISPLS(0:size-1) )
MMIN = M/size !number of rows divided by number of processors
MEXTRA = MOD(M, size) !extra rows
K = 0
DO I = 0, size-1
IF (I < MEXTRA) THEN !SENDCOUNTS=(/2,1,1,1/)
SENDCOUNTS(I) = MMIN + 1
ELSE
SENDCOUNTS(I) = MMIN
END IF
DISPLS(I) = K !DISPLS=(/0,2,3,4/)
K = K + SENDCOUNTS(I)
END DO
RECVCOUNT = SENDCOUNTS(rank)
ALLOCATE( LOCAL(RECVCOUNT,N) )
CALL MPI_TYPE_VECTOR(N, 1, M, MPI_INTEGER, NEWTYPE, ierr)
CALL MPI_TYPE_COMMIT(NEWTYPE, ierr)
START = 0
CALL MPI_TYPE_SIZE(MPI_INTEGER, INTSIZE, ierr)
EXTENT = 1*INTSIZE
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
CALL MPI_TYPE_COMMIT(RESIZEDTYPE, ierr)
LOCAL(:, :) = 0
CALL MPI_SCATTERV( &
A, SENDCOUNTS, DISPLS, RESIZEDTYPE, &
LOCAL, RECVCOUNT*N, MPI_INTEGER, &
0, MPI_COMM_WORLD, ierr)
WRITE(*,*) rank, ':', LOCAL
CALL MPI_FINALIZE(ierr)
END PROGRAM SCATTERV_MATRIX
After successful compilation I got a "Program Exception - access violation" error. All my previous Fortran MPI programs worked fine. There must be some bug in the code, probably in MPI_SCATTERV.
I was mainly following this answer. I will be grateful for any suggestion. Thank you.
There's an error in your code:
INTEGER :: START, EXTENT !(KIND=MPI_ADRESS_KIND)
This line should be:
INTEGER(KIND=MPI_ADDRESS_KIND) :: START, EXTENT
In MPI, anything related to a memory address, or similar concepts such as memory displacements, file sizes, or file cursors, must not be a normal integer. Somehow you have this information in your comment, but you also misspelled MPI_ADDRESS_KIND.
Vladimir F correctly pointed out that you should 'USE MPI' instead of 'INCLUDE 'mpif.h''. This gives the compiler the opportunity to check the data types. For example, gfortran gives the following error message:
test.f90:59:71:
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
1
Error: There is no specific subroutine for the generic
‘mpi_type_create_resized’ at (1)
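Putting the two fixes together, here is a minimal self-contained sketch (the vector type parameters are taken from the question's N = 10 and M = 5; everything else is only there to make the fragment compile):
PROGRAM RESIZE_FIX
USE MPI                                            !instead of INCLUDE 'mpif.h'
IMPLICIT NONE
INTEGER :: ierr, INTSIZE, NEWTYPE, RESIZEDTYPE
INTEGER(KIND=MPI_ADDRESS_KIND) :: START, EXTENT    !address-sized, as MPI requires
CALL MPI_INIT(ierr)
!Same vector type as in the question: N = 10 blocks of 1 integer, stride M = 5
CALL MPI_TYPE_VECTOR(10, 1, 5, MPI_INTEGER, NEWTYPE, ierr)
CALL MPI_TYPE_COMMIT(NEWTYPE, ierr)
CALL MPI_TYPE_SIZE(MPI_INTEGER, INTSIZE, ierr)
START = 0_MPI_ADDRESS_KIND
EXTENT = INT(INTSIZE, MPI_ADDRESS_KIND)            !resize the extent to one integer
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
CALL MPI_TYPE_COMMIT(RESIZEDTYPE, ierr)
CALL MPI_TYPE_FREE(RESIZEDTYPE, ierr)
CALL MPI_TYPE_FREE(NEWTYPE, ierr)
CALL MPI_FINALIZE(ierr)
END PROGRAM RESIZE_FIX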
I am rewriting a numerical simulation code that is parallelized using MPI in one direction.
So far, the arrays containing the data were saved by the master MPI process, which implied transferring the data from all MPI processes to one and allocating huge arrays to store the whole thing. It is neither very efficient nor classy, and it becomes a problem at large resolutions.
I am therefore trying to use MPI-IO to write the file directly from the distributed arrays. One of the constraints I have is that the written file needs to respect the Fortran "unformatted" format (i.e. a 4-byte integer before and after each record indicating its size).
I wrote a simple test program that works when I write only one distributed array to the file. However, when I write several arrays, the total size of the file is wrong, and when comparing it to the equivalent Fortran 'unformatted' file, the files are different.
Here is the sample code:
module arrays_dim
implicit none
INTEGER, PARAMETER :: dp = kind(0.d0)
integer, parameter :: imax = 500
integer, parameter :: jmax = 50
integer, parameter :: kmax = 10
end module arrays_dim
module mpi_vars
use mpi
implicit none
integer, save :: ierr, myID, numprocs
integer, save :: i_start, i_end, i_mean, i_loc
integer, save :: subArray, fileH
integer(MPI_OFFSET_KIND), save :: offset, currPos
end module mpi_vars
program test
use mpi
use arrays_dim
use mpi_vars
real(dp), dimension(0:imax,0:jmax+1,0:kmax+1) :: v, w
real(dp), dimension(:,:,:), allocatable :: v_loc, w_loc
integer :: i, j, k
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myID, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
i_mean = (imax+1)/numprocs
i_start = myID*i_mean
i_end = i_start+i_mean-1
if(i_mean*numprocs<imax+1) then
if(myID == numprocs-1) i_end = imax
endif
i_loc = i_end - i_start + 1
allocate(v_loc(i_start:i_end,0:jmax+1,0:kmax+1))
allocate(w_loc(i_start:i_end,0:jmax+1,0:kmax+1))
print*, 'I am:', myID, i_start, i_end, i_loc
do k=0,kmax+1
do j=0,jmax+1
do i=0,imax
v(i,j,k) = i+j+k
w(i,j,k) = i*j*k
enddo
enddo
enddo
if(myID==0) then
open(10,form='unformatted')
write(10) v
!write(10) w
close(10)
endif
do k=0,kmax+1
do j=0,jmax+1
do i=i_start,i_end
v_loc(i,j,k) = i+j+k
w_loc(i,j,k) = i*j*k
enddo
enddo
enddo
call MPI_Type_create_subarray (3, [imax+1, jmax+2, kmax+2], [i_loc, jmax+2, kmax+2], &
[i_start, 0, 0], &
MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, subArray, ierr)
call MPI_Type_commit(subArray, ierr)
call MPI_File_open(MPI_COMM_WORLD, 'mpi.dat', &
MPI_MODE_WRONLY + MPI_MODE_CREATE + MPI_MODE_APPEND, &
MPI_INFO_NULL, fileH, ierr )
call saveMPI(v_loc, (i_loc)*(jmax+2)*(kmax+2))
!call saveMPI(w_loc, (i_loc)*(jmax+2)*(kmax+2))
call MPI_File_close(fileH, ierr)
deallocate(v_loc,w_loc)
call MPI_FINALIZE(ierr)
end program test
!
subroutine saveMPI(array, n)
use mpi
use arrays_dim
use mpi_vars
implicit none
real(dp), dimension(n) :: array
integer :: n
offset = (imax+1)*(jmax+2)*(kmax+2)*8
if(myID==0) then
call MPI_File_seek(fileH, int(0,MPI_OFFSET_KIND), MPI_SEEK_CUR, ierr)
call MPI_File_write(fileH, [(imax+1)*(jmax+2)*(kmax+2)*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
call MPI_File_seek(fileH, offset, MPI_SEEK_CUR, ierr)
call MPI_File_write(fileH, [(imax+1)*(jmax+2)*(kmax+2)*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
endif
call MPI_File_set_view(fileH, int(4,MPI_OFFSET_KIND), MPI_DOUBLE_PRECISION, subArray, 'native', MPI_INFO_NULL, ierr)
call MPI_File_write_all(fileH, array, (i_loc)*(jmax+2)*(kmax+2), MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
end subroutine saveMPI
When the lines !write(10) w and !call saveMPI(w_loc, (i_loc)*(jmax+2)*(kmax+2)) are commented out (i.e. I only write the v array), the code works fine:
mpif90.openmpi -O3 -o prog main.f90
mpirun.openmpi -np 4 ./prog
cmp mpi.dat fort.10
cmp does not generate any output, so the files are identical.
If, however, I uncomment these lines, then the resulting files (mpi.dat and fort.10) are different. I am sure that the problem lies in the way I define the offset used to write the data at the right position in the file, but I do not know how to tell the second call of saveMPI that the initial position should be the end of the file. What am I missing?
Only the first call to saveMPI works as you expect. Everything gets messed up from the second call onward. Here are a few indications of what is happening:
MPI_File_set_view resets the independent file pointers and the shared file pointer to zero. See MPI_File_set_view for more details. So you are actually overwriting v data with w data when you call MPI_File_set_view in saveMPI.
With MPI_File_write, the data is written into those parts of the file specified by the current view. This means that the way you are adding the size information to the file is not really compatible with the view previously set for v.
Calling MPI_File_seek with MPI_SEEK_CUR sets the position relative to the current position of the individual pointer. So, for the second call, it is relative to the individual pointer of process 0.
I do not use parallel IO that much, so I cannot help more than this without stepping into the docs, which I do not have time to do. The hints I can give are to:
add an additional parameter to saveMPI that will contain the absolute displacement of the data to write; this can be an [in out] arg. For the first call, it will be zero and for subsequent calls, it will be the size of all data already written to file, including the size information. It can be updated in saveMPI.
before writing the size information (by process 0), call MPI_File_set_view to reset the view to a linear byte stream as originally given by MPI_File_open. This can be done by setting both the etype and the filetype to MPI_BYTE in the call to MPI_File_set_view. Look into the doc of MPI_File_open for more information. You will then have two calls to MPI_File_set_view in saveMPI.
Your saveMPI subroutine could look like
subroutine saveMPI(array, n, disp)
use mpi
use arrays_dim
use mpi_vars
implicit none
real(dp), dimension(n) :: array
integer :: n, disp
offset = (imax+1)*(jmax+2)*(kmax+2)*8
call MPI_File_set_view(fileH, int(disp,MPI_OFFSET_KIND), MPI_BYTE, MPI_BYTE, 'native', MPI_INFO_NULL, ierr)
if(myID==0) then
call MPI_File_seek(fileH, int(0,MPI_OFFSET_KIND), MPI_SEEK_END, ierr)
call MPI_File_write(fileH, [(imax+1)*(jmax+2)*(kmax+2)*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
call MPI_File_seek(fileH, int(offset,MPI_OFFSET_KIND), MPI_SEEK_CUR, ierr)
call MPI_File_write(fileH, [(imax+1)*(jmax+2)*(kmax+2)*8], 1, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
endif
call MPI_File_set_view(fileH, int(disp+4,MPI_OFFSET_KIND), MPI_DOUBLE_PRECISION, subArray, 'native', MPI_INFO_NULL, ierr)
call MPI_File_write_all(fileH, array, (i_loc)*(jmax+2)*(kmax+2), MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
disp = disp+offset+8
end subroutine saveMPI
and called like:
disp = 0
call saveMPI(v_loc, (i_loc)*(jmax+2)*(kmax+2), disp)
call saveMPI(w_loc, (i_loc)*(jmax+2)*(kmax+2), disp)
Finally, make sure that you delete the file between two runs, because you are using MPI_MODE_APPEND.