use mpi_cart_create with master and slave - fortran

I want to create a topology without processor 0 in it. My idea is to make processor 0 the master and build the topology from the remaining (slave) processes. After certain computations in the topology, I will send data to the master. Here is my code:
include "mpif.h"
integer maxn
integer myid,Root,numprocs,numtasks,taskid
integer comm2d, ierr
integer dims(2)
logical periods(2)
data periods/2*.false./
Root = 0
CALL MPI_INIT( ierr )
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr )
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr )
numtasks = numprocs-1
if(myid .eq. Root) then
print *, 'Hello I am master'
endif
c Get a new communicator for a decomposition of the domain.
c Let MPI find a "good" decomposition
dims(1) = 0
dims(2) = 0
CALL MPI_DIMS_CREATE(numtasks,2,dims,ierr)
CALL MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periods,.true.,
* comm2d,ierr)
c Get my position in this communicator
CALL MPI_COMM_RANK( comm2d, taskid, ierr )
c
print *, 'task ID= ',taskid
if (myid .eq. master) then
print *,dims(1),dims(2)
endif
CALL MPI_Comm_free( comm2d, ierr )
30 CALL MPI_FINALIZE(ierr)
STOP
END
But when I run the above code, I get the following error:
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(121): MPI_Comm_rank(MPI_COMM_NULL, rank=0x7fff08bf960c)
failed PMPI_Comm_rank(73).: Null communicator
How can I eliminate the error? What am I doing wrong?

You start your MPI job with numprocs processes. Then you create a Cartesian topology with numtasks = numprocs-1 processes. Therefore, one of the processes ends up not being part of the Cartesian communicator and receives MPI_COMM_NULL in comm2d. Calling MPI_COMM_RANK with a null communicator is an error. The solution is to fix your code to first check the value of comm2d:
CALL MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periods,.true.,
* comm2d,ierr)
IF (comm2d.NE.MPI_COMM_NULL) THEN
...
ENDIF
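A minimal sketch of how the two branches could look, staying in the fixed-form style of the question (which process is left out of the grid is up to the MPI implementation, so identify the master by testing for MPI_COMM_NULL rather than by world rank):
CALL MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periods,.true.,
* comm2d,ierr)
IF (comm2d .EQ. MPI_COMM_NULL) THEN
c this rank was left out of the grid: do the master's work here
ELSE
c ranks inside the grid: it is now safe to query the topology
CALL MPI_COMM_RANK(comm2d, taskid, ierr)
c ... computation on the Cartesian grid, send results to the master ...
CALL MPI_COMM_FREE(comm2d, ierr)
ENDIF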

Related

MPI_WIN_ALLOCATE_SHARED: memory limited?

It seems that whenever I try to allocate a window of around 30-32 MB, I get a segmentation fault.
I am using the routine MPI_WIN_ALLOCATE_SHARED.
Does anybody know if there is a limit to how big my window can be? If so, is there a way to compile my code relaxing that limit?
I am using Intel MPI 19.0.3 and ifort 19.0.3.
The example below is written in Fortran. By varying the integer size_ you can see when the segmentation fault occurs. I tested it with size_=10e3 and size_=10e4; the latter caused a segmentation fault.
program TEST_STACK
use, INTRINSIC ::ISO_C_BINDING
implicit none
include 'mpif.h'
!--- Parameters (They should not be changed ! )
integer, parameter :: whoisroot = 0 ! - Root always 0 here
!--- General parallel
integer :: whoami ! - My rank
integer :: mpi_nproc ! - no. of procs
integer :: mpierr ! - Error status
integer :: status(MPI_STATUS_SIZE)! - For MPI_RECV
!--- Shared memory stuff
integer :: whoami_shm ! - Local rank in shared memory group
integer :: mpi_shm_nproc ! - No. of procs in Shared memory group
integer :: no_partners ! - No. of partners for share memory
integer :: info_alloc
!--- MPI groups
integer :: world_group ! - All procs across all nodes
integer :: shared_group ! - Only procs that share memory
integer :: MPI_COMM_SHM ! - Shared memory communicators (for those in shared_group)
type(C_PTR) :: ptr_buf
integer(kind = MPI_ADDRESS_KIND) :: size_bytes, lb
integer :: win, size_, disp_unit
call MPI_INIT ( mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_SIZE ( MPI_COMM_WORLD, mpi_nproc, mpierr)
call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD
& , MPI_COMM_TYPE_SHARED
& , 0
& , MPI_INFO_NULL
& , MPI_COMM_SHM
& , mpierr )
call MPI_COMM_RANK( MPI_COMM_SHM, whoami_shm, mpierr )
call MPI_COMM_SIZE( MPI_COMM_SHM, mpi_shm_nproc, mpierr )
size_ = 10e4! - seg fault
size_bytes = size_ * MPI_REAL
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
call MPI_INFO_CREATE( info_alloc, mpierr )
call MPI_INFO_SET( info_alloc
& , "alloc_shared_noncontig"
& , "true"
& , mpierr )
!
call MPI_WIN_ALLOCATE_SHARED( size_bytes
& , disp_unit
& , info_alloc
& , MPI_COMM_SHM
& , ptr_buf
& , win
& , mpierr )
call MPI_WIN_FREE(win, mpierr)
end program TEST_STACK
I run my code using the following command:
mpif90 test_stack.f90; mpirun -np 2 ./a.out
The mpif90 wrapper is linked against my ifort 19.0.3 and the Intel MPI library. This has been verified by running
mpif90 -v
To be precise, my mpif90 is a symbolic link to my mpiifort wrapper. This is made for personal convenience, but I guess it shouldn't be causing problems.
The manual says that the call to MPI_WIN_ALLOCATE_SHARED looks like this
USE MPI
MPI_WIN_ALLOCATE_SHARED(SIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR)
INTEGER(KIND=MPI_ADDRESS_KIND) SIZE, BASEPTR
INTEGER DISP_UNIT, INFO, COMM, WIN, IERROR
At least the types of disp_unit and baseptr do not match in your program.
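For illustration, a hedged sketch of declarations that would match the quoted mpif.h binding (reusing the question's variable names; with the mpi_f08 module the baseptr argument is instead a TYPE(C_PTR), which is closer to what the original code declares):
! Sketch only: with mpif.h, baseptr must be an INTEGER(MPI_ADDRESS_KIND),
! not a TYPE(C_PTR); disp_unit is a plain default INTEGER (a byte count).
integer(kind = MPI_ADDRESS_KIND) :: size_bytes, baseptr
integer :: disp_unit, win, mpierr

call MPI_WIN_ALLOCATE_SHARED( size_bytes, disp_unit, info_alloc, &
                              MPI_COMM_SHM, baseptr, win, mpierr )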
I was finally able to diagnose where the error stems from.
In the code I have
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
MPI_REAL is a constant/parameter defined by MPI and is not equal to 4, as I very wrongly expected (4 for 4 bytes of single precision). In my version it is set to 1275069468, which most likely is an internal id rather than any sensible byte count.
Hence, multiplying this number by the size of my array very quickly exceeds the available memory, and it can also overflow the range that a default (4-byte) integer can represent.
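A minimal corrected sketch of that computation, assuming the window should hold size_ default reals (storage_size is standard Fortran 2008 and avoids hard-coding the 4):
! Sketch: bytes per element taken from the type itself, not from MPI_REAL
disp_unit  = storage_size(1.0) / 8                      ! 4 for a default REAL
size_bytes = int(size_, MPI_ADDRESS_KIND) * disp_unit   ! product done in address-kind integers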

Incorrect results when reading binary file with MPI I/O

I am new to MPI and am struggling with reading a binary file.
Specifically, I have a 198 x 50 x 50 array of integers (16-bit integers, to be specific) stored in a binary file. I want to use 2 compute nodes to process this file, so there are two MPI processes and each process handles half of the input. I am using the function MPI_FILE_READ_AT to read the respective regions. I expect the array values to fill the variable/argument 'bucket' that I pass into the call, but a sanity-check printout of the 'bucket' entries tells me that the values in bucket are all incorrect. I feel that I am going wrong with the arguments.
program main
use mpi
implicit none
integer :: i, error, num_processes, id, fh
integer(MPI_OFFSET_KIND) :: filesize, offset
integer(MPI_OFFSET_KIND) :: num_bytes_per_process
integer(MPI_OFFSET_KIND) :: num_bytes_this_process
integer :: num_ints_per_process, num_ints_this_process
integer(kind = 2), dimension(:), allocatable :: bucket
character(len=100) :: inputFileName
integer, parameter :: INTKIND=2
! Initialize
inputFileName = 'xyz_50x50'
print *, 'MPI_OFFSET_KIND =', MPI_OFFSET_KIND
! MPI basics
call MPI_Init ( error )
call MPI_Comm_size ( MPI_COMM_WORLD, num_processes, error )
call MPI_Comm_rank ( MPI_COMM_WORLD, id, error )
! Open the file
call MPI_FILE_OPEN(MPI_COMM_WORLD, inputFileName, MPI_MODE_RDONLY, &
MPI_INFO_NULL, fh, error)
! get the size of the file
call MPI_File_get_size(fh, filesize, error)
! Note: filesize is the TOTAL number of bytes in the file
num_bytes_per_process = filesize/num_processes
num_ints_per_process = num_bytes_per_process/INTKIND
offset = id * num_bytes_per_process
num_bytes_this_process = min(num_bytes_per_process, filesize - offset)
num_ints_this_process = num_bytes_this_process/INTKIND
allocate(bucket(num_ints_this_process))
call MPI_FILE_READ_AT(fh, offset, bucket, num_ints_this_process, &
MPI_SHORT, MPI_STATUS_SIZE, error)
do i = 1, num_ints_this_process
if (bucket(i) /= 0) then
print *, "my id is ", id, " and bucket(",i,")=", bucket(i)
endif
enddo
! close the file
call MPI_File_close(fh, error)
! close mpi
call MPI_Finalize(error)
end program main
You have to use MPI_STATUS_IGNORE instead of MPI_STATUS_SIZE
(FWIW, I am unable to compile this program unless I fix this):
call MPI_FILE_READ_AT(fh, offset, bucket, num_ints_this_process, &
MPI_SHORT, MPI_STATUS_IGNORE, error)
Note that since all MPI tasks read the file at the same time, you should rather use the collective MPI_File_read_at_all() subroutine in order to improve performance.
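For instance, the collective variant is a drop-in replacement for the corrected call above:
call MPI_FILE_READ_AT_ALL(fh, offset, bucket, num_ints_this_process, &
                          MPI_SHORT, MPI_STATUS_IGNORE, error)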

MPI in Fortran gives garbage values

PROGRAM ShareNeighbors
IMPLICIT REAL (a-h,o-z)
INCLUDE "mpif.h"
PARAMETER (m = 500, n = 500)
DIMENSION a(m,n), b(m,n)
DIMENSION h(m,n)
INTEGER istatus(MPI_STATUS_SIZE)
INTEGER iprocs, jprocs
PARAMETER (ROOT = 0)
integer dims(2),coords(2)
logical periods(2)
data periods/2*.false./
integer status(MPI_STATUS_SIZE)
integer comm2d,req,source
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
! Get a new communicator for a decomposition of the domain.
! Let MPI find a "good" decomposition
dims(1) = 0
dims(2) = 0
CALL MPI_DIMS_CREATE(nprocs,2,dims,ierr)
if (myrank.EQ.Root) then
print *,nprocs,'processors have been arranged into',dims(1),'X',dims(2),'grid'
endif
CALL MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periods,.true., &
comm2d,ierr)
! Get my position in this communicator
CALL MPI_COMM_RANK(comm2d,myrank,ierr)
! Get the decomposition
CALL fnd2ddecomp(comm2d,m,n,ista,iend,jsta,jend)
! print *,ista,jsta,iend,jend
ilen = iend - ista + 1
jlen = jend - jsta + 1
CALL MPI_Cart_get(comm2d,2,dims,periods,coords,ierr)
iprocs = dims(1)
jprocs = dims(2)
myranki = coords(1)
myrankj = coords(2)
DO j = jsta, jend
DO i = ista, iend
a(i,j) = myrank+1
ENDDO
ENDDO
! Send data from each processor to Root
call MPI_ISEND(ista,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
Root,1,MPI_COMM_WORLD,req,ierr )
! Recieved the results from othe precessors
if (myrank.EQ.Root) then
do source = 0,nprocs-1
call MPI_RECV(ista,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(iend,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(jsta,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(jend,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
ilen = iend - ista + 1
jlen = jend - jsta + 1
call MPI_RECV(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
source,1,MPI_COMM_WORLD,status,ierr)
! print the results
call ZMINMAX(m,n,ista,iend,jsta,jend,a(:,:),amin,amax)
print *, 'myid=',source,amin,amax
call MPI_Wait(req, status, ierr)
enddo
endif
CALL MPI_FINALIZE(ierr)
END
subroutine fnd2ddecomp(comm2d,m,n,ista,iend,jsta,jend)
integer comm2d
integer m,n,ista,jsta,iend,jend
integer dims(2),coords(2),ierr
logical periods(2)
! Get (i,j) position of a processor from Cartesian topology.
CALL MPI_Cart_get(comm2d,2,dims,periods,coords,ierr)
! Decomposition in first (ie. X) direction
CALL MPE_DECOMP1D(m,dims(1),coords(1),ista,iend)
! Decomposition in second (ie. Y) direction
CALL MPE_DECOMP1D(n,dims(2),coords(2),jsta,jend)
return
end
SUBROUTINE MPE_DECOMP1D(n,numprocs,myid,s,e)
integer n,numprocs,myid,s,e,nlocal,deficit
nlocal = n / numprocs
s = myid * nlocal + 1
deficit = mod(n,numprocs)
s = s + min(myid,deficit)
! Give one more slice to processors
if (myid .lt. deficit) then
nlocal = nlocal + 1
endif
e = s + nlocal - 1
if (e .gt. n .or. myid .eq. numprocs-1) e = n
return
end
SUBROUTINE ZMINMAX(IX,JX,SX,EX,SY,EY,ZX,ZXMIN,ZXMAX)
INTEGER :: IX,JX,SX,EX,SY,EY
REAL :: ZX(IX,JX)
REAL :: ZXMIN,ZXMAX
ZXMIN=1000.
ZXMAX=-1000.
DO II=SX,EX
DO JJ=SY,EY
IF(ZX(II,JJ).LT.ZXMIN)ZXMIN=ZX(II,JJ)
IF(ZX(II,JJ).GT.ZXMAX)ZXMAX=ZX(II,JJ)
ENDDO
ENDDO
RETURN
END
When I run the above code with 4 processors, Root receives garbage values, whereas with 15 processors the data transfer is correct. How can I tackle this?
I guess it is related to buffering, a point which is not clear to me. How should I handle the buffering correctly?
1. problem
You are doing multiple sends
call MPI_ISEND(ista,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
Root,1,MPI_COMM_WORLD,req,ierr )
and all of them with the same request variable req. That can't work.
2. problem
You are using a subarray a(ista:iend,jsta:jend) in a non-blocking MPI call. That is not allowed*. You need to copy the array into a temporary contiguous buffer, or use an MPI derived subarray datatype (too hard for you at this stage).
The reason for the problem is that the compiler will create a temporary copy just for the call to ISend. The ISend will remember the address of that temporary, but will not send anything yet. Then the temporary is deleted and the address becomes invalid. When MPI_Wait later tries to use that address, it fails.
3. problem
Your MPI_Wait is in the wrong place. It must come after the sends and outside of any if conditions, so that it is always executed (provided you are always sending).
You must collect all the requests separately and then wait for all of them. It is best to keep them in an array and wait for all of them at once using MPI_Waitall.
Remember, ISend typically does not actually send anything if the buffer is large; the exchange often happens during the Wait operation, at least for larger arrays.
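To make problems 1-3 concrete, here is a minimal sketch of the send side (reusing names from the program above; the distinct tags are my own choice for clarity, and the matching receives on Root must then use the same tags):
real, allocatable :: abuf(:,:)
integer :: reqs(5)
integer, parameter :: root = 0
! Problem 2: make an explicit, persistent contiguous copy of the subarray
allocate(abuf(ilen,jlen))
abuf = a(ista:iend,jsta:jend)
! Problem 1: one request per ISend
call MPI_ISEND(ista, 1, MPI_INTEGER, root, 1, MPI_COMM_WORLD, reqs(1), ierr)
call MPI_ISEND(iend, 1, MPI_INTEGER, root, 2, MPI_COMM_WORLD, reqs(2), ierr)
call MPI_ISEND(jsta, 1, MPI_INTEGER, root, 3, MPI_COMM_WORLD, reqs(3), ierr)
call MPI_ISEND(jend, 1, MPI_INTEGER, root, 4, MPI_COMM_WORLD, reqs(4), ierr)
call MPI_ISEND(abuf, ilen*jlen, MPI_REAL, root, 5, MPI_COMM_WORLD, reqs(5), ierr)
! ... Root posts its blocking receives here, looping over all sources ...
! Problem 3: wait on all requests, on every rank, outside any if block
call MPI_WAITALL(5, reqs, MPI_STATUSES_IGNORE, ierr)
deallocate(abuf)   ! only safe after the Waitall has completed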
Recommendation:
Take a simple example problem and try to exchange just two small arrays with MPI_IRecv and MPI_ISend between two processes. Make the test problem as simple as you can, learn from it, and take small steps. Take no offence, but your current understanding of non-blocking MPI is too weak to write full-scale programs. MPI is hard, and non-blocking MPI is even harder.
* Not allowed when using the interface available in MPI-2. MPI-3 brings a new interface, available via use mpi_f08, with which it is possible. But learn the basics first.

Invalid pointer and segmentation fault when using MPI_Gather in Fortran

I have a simple program, which is supposed to gather a number of small arrays into one big one using MPI.
PROGRAM main
include 'mpif.h'
integer ierr, i, myrank, thefile, n_procs
integer, parameter :: BUFSIZE = 3
complex*16, allocatable :: loc_arr(:), glob_arr(:)
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, n_procs, ierr)
allocate(loc_arr(BUFSIZE))
loc_arr = 0.7 * myrank - cmplx(0.3, 0, kind=8)
allocate(glob_arr(n_procs* BUFSIZE))
write (*,*) myrank, shape(glob_arr)
call MPI_Gather(loc_arr, BUFSIZE, MPI_DOUBLE_COMPLEX,&
glob_arr, n_procs * BUFSIZE, MPI_DOUBLE_COMPLEX,&
0, MPI_COMM_WORLD, ierr)
write (*,*) myrank,"Errorcode:" , ierr
call MPI_FINALIZE(ierr)
END PROGRAM main
I have some experience with MPI in C, but for Fortran 90 nothing seems to work. Here is how I compile (I use ifort) and run it:
mpif90 test.f90 -check all && mpirun -np 4 ./a.out
1 12
3 12
3 Errorcode: 0
1 Errorcode: 0
0 12
2 12
2 Errorcode: 0
0 Errorcode: 0
*** Error in `./a.out': free(): invalid pointer: 0x0000000000a25790 ***
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 10889 RUNNING AT LenovoX1kabel
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
What am I doing wrong? Sometimes I get this invalid-pointer error, sometimes I get a segmentation fault, but to me it doesn't look like any of the ifort checks complain.
All the error codes are 0, so I'm not sure where I go wrong.
You should never pass the total number of elements across all processes as a count in MPI collectives; counts are per process. That is a simple rule of thumb.
Therefore the recvcount of n_procs * BUFSIZE is clearly wrong.
And indeed the manual states: recvcount - number of elements for any single receive (integer, significant only at root).
You should just use BUFSIZE. This is the same in C and Fortran.
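In other words, a corrected sketch of the gather call from the program above (only recvcount changes; glob_arr still needs room for n_procs * BUFSIZE elements):
call MPI_Gather(loc_arr, BUFSIZE, MPI_DOUBLE_COMPLEX,&
                glob_arr, BUFSIZE, MPI_DOUBLE_COMPLEX,&
                0, MPI_COMM_WORLD, ierr)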

Using MPI in subroutines

I'm working on code that serially calls a subroutine (which in turn performs iterations) many times. I wish to parallelize the iterations inside the subroutine. The problem with MPI is that I'm only allowed to initialize it once, so I cannot initialize it in my subroutine, which gets called multiple times. Can anyone suggest a way out of this?
My problem is roughly as outlined below:
program p
...
do i=1,10000
call s(i)
end do
end program p
subroutine s(j)
...
do i=1,10000
...
end do
end subroutine s
I wish to parallelize this process.
Thanks a lot. That helped! But let me reframe my question:
Within the iterations of the main program, along with the subroutine s, I have to call another subroutine s2 (which doesn't need to be parallelized). I thought it could be done this way:
!initialize mpi
do i = 1, 1000
if (rank /= 0) then
call s
else
call s2
end if
end do
!finalize mpi
But the main problem here is that while the rest of the processes proceed slowly, process 0 will proceed quickly (something not desirable). So, is it possible to make process 0 wait after each iteration until the other processes complete their iteration?
You need to initialize and finalize MPI in the main program. Typically, you then define a load-balancing that is valid for the work in the subroutine.
Then you do your loop inside the subroutine in parallel and gather (reduce?) the results at the end of the subroutine so you have all information you need when the subroutine is called next.
This works the same way as it would with a loop in the main program (without calling the subroutine).
Here is a minimum example:
module testMod
use mpi
implicit none
!#include "mpif.h"
!===
contains
!===
subroutine s(mysize, myrank, array)
integer,intent(in) :: mysize, myrank
integer,intent(inout) :: array(:)
integer :: i, ierror
! Do stuff
do i=1,size(array)
! Skip element that is not associated with the current process
if ( mod(i,mysize) .ne. myrank ) cycle
array(i) = array(i) + 1
enddo ! i
! MPI Allreduce
call MPI_Allreduce(MPI_IN_PLACE, array, size(array), MPI_INTEGER, &
MPI_MAX, MPI_COMM_WORLD, ierror)
end subroutine
end module
program mpiTest
use testMod
use mpi
implicit none
!#include "mpif.h"
integer :: mysize, myrank, ierror
integer,parameter :: ITER=100
integer,parameter :: arraySize=10
integer :: work(arraySize)
integer :: i
! MPI Initialization
call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierror)
call MPI_Comm_size(MPI_COMM_WORLD, mysize, ierror)
work = 0
do i=1,ITER
call s(mysize, myrank, work)
enddo
if ( myrank .eq. 0 ) write(*,*) work
! MPI Finalize
call MPI_Finalize(ierror)
end program