I am trying to calculate the sum of a column vector using MPI.
I have read the following thread
Can MPI sendbuf and recvbuf be the same thing?
and found that using MPI_IN_PLACE would be a good choice to minimize memory usage for very large data.
However, in my trivial test the call to MPI_Allreduce keeps causing a segmentation fault:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
I could not figure out what went wrong.
Any help would be greatly appreciated.
Here's my code:
program mpitest
implicit none
include "mpif.h"
integer :: i, mprank, mpsize, mpierr
real(KIND=8), dimension(5) :: Qin
call MPI_INIT(mpierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,mpsize,mpierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,mprank,mpierr)
write(*,*) mprank,'/',mpsize
do i = 1, 5
Qin(i) = (mprank+1)*i
write(*,*) Qin(i), mprank
enddo
call MPI_AllReduce(MPI_IN_PLACE,Qin,5,MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD)
if (mprank == 0) then
do i = 1, 5
write(*,*) Qin(i)
enddo
endif
call MPI_FINALIZE(mpierr)
end program mpitest
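For reference, the Fortran mpif.h binding of MPI_ALLREDUCE takes ierror as its last argument, and MPI_DOUBLE is the C datatype handle rather than a Fortran one; a call matching the binding would look like this sketch (MPI_REAL8 should also match real(KIND=8)):
call MPI_ALLREDUCE(MPI_IN_PLACE, Qin, 5, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, mpierr)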
It seems that whenever I try to allocate a window of around 30-32 MB I get a segmentation fault.
I am using the routine MPI_WIN_ALLOCATE_SHARED.
Does anybody know if there is a limit to how big my window can be? If so, is there a way to compile my code relaxing that limit?
I am using Intel MPI 19.0.3 and ifort 19.0.3.
The example below is written in Fortran. By varying the integer size_ you can see when the segmentation fault occurs. I tested it with size_=10e3 and size_=10e4; the latter caused a segmentation fault.
C------
program TEST_STACK
use, INTRINSIC ::ISO_C_BINDING
implicit none
include 'mpif.h'
!--- Parameters (They should not be changed ! )
integer, parameter :: whoisroot = 0 ! - Root always 0 here
!--- General parallel
integer :: whoami ! - My rank
integer :: mpi_nproc ! - no. of procs
integer :: mpierr ! - Error status
integer :: status(MPI_STATUS_SIZE)! - For MPI_RECV
!--- Shared memory stuff
integer :: whoami_shm ! - Local rank in shared memory group
integer :: mpi_shm_nproc ! - No. of procs in Shared memory group
integer :: no_partners ! - No. of partners for share memory
integer :: info_alloc
!--- MPI groups
integer :: world_group ! - All procs across all nodes
integer :: shared_group ! - Only procs that share memory
integer :: MPI_COMM_SHM ! - Shared memory communicators (for those in shared_group)
type(C_PTR) :: ptr_buf
integer(kind = MPI_ADDRESS_KIND) :: size_bytes, lb
integer :: win, size_, disp_unit
call MPI_INIT ( mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_SIZE ( MPI_COMM_WORLD, mpi_nproc, mpierr)
call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD
& , MPI_COMM_TYPE_SHARED
& , 0
& , MPI_INFO_NULL
& , MPI_COMM_SHM
& , mpierr )
call MPI_COMM_RANK( MPI_COMM_SHM, whoami_shm, mpierr )
call MPI_COMM_SIZE( MPI_COMM_SHM, mpi_shm_nproc, mpierr )
size_ = 10e4! - seg fault
size_bytes = size_ * MPI_REAL
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
call MPI_INFO_CREATE( info_alloc, mpierr )
call MPI_INFO_SET( info_alloc
& , "alloc_shared_noncontig"
& , "true"
& , mpierr )
!
call MPI_WIN_ALLOCATE_SHARED( size_bytes
& , disp_unit
& , info_alloc
& , MPI_COMM_SHM
& , ptr_buf
& , win
& , mpierr )
call MPI_WIN_FREE(win, mpierr)
end program TEST_STACK
I run my code using the following command:
mpif90 test_stack.f90; mpirun -np 2 ./a.out
This wrapper is linked to my ifort 19.0.3 and Intel MPI library. This has been verified by running
mpif90 -v
To be very precise, my mpif90 is a symbolic link to my mpiifort wrapper. This is done for personal convenience and shouldn't be causing problems, I guess?
The manual says that the call to MPI_WIN_ALLOCATE_SHARED looks like this
USE MPI
MPI_WIN_ALLOCATE_SHARED(SIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR)
INTEGER(KIND=MPI_ADDRESS_KIND) SIZE, BASEPTR
INTEGER DISP_UNIT, INFO, COMM, WIN, IERROR
At least the types of disp_unit and baseptr do not match in your program.
I was finally able to diagnose where the error stems from.
In the code I have
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
MPI_REAL is a constant/parameter defined by MPI and is not equal to 4, as I had very wrongly expected (4 bytes for single precision). In my MPI version it is set to 1275069468, which most likely is an id/handle rather than any sensible byte count.
Hence, multiplying this number by the size of my array very quickly exceeds the available memory, and also the range that can be represented by a default 4-byte integer.
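With that in mind, a sizing sketch that uses the element size in bytes instead of the datatype handle could look like this (storage_size is a Fortran 2008 intrinsic returning bits; with older compilers one could simply hard-code 4):
disp_unit  = storage_size(0.0) / 8                    ! bytes per default REAL (typically 4)
size_bytes = int(size_, MPI_ADDRESS_KIND) * disp_unit ! total window size in bytes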
There are several threads with titles similar to mine, but I do not believe they address the same problem. One was very similar, fortran pass allocated array to main procedure, but the answer required Fortran 2008. I am after a Fortran 90/95 solution.
Another very good and quite similar thread is Dynamic array allocation in fortran90. However, in that approach they allocate in the subroutine but never appear to deallocate, which seems odd. My method looks, on the surface at least, to be the same, yet when I print the array in the main program only blank spaces are printed. When I print it in the subroutine itself, the correct values, and the correct number of values, are printed.
In the following, a main program calls a subroutine. The subroutine reads data into an allocatable array and passes the array back to the main program. I do this using small subroutines, each designed to look for specific terms in the input file. All of these subroutines are in one module file. So there are three files: Main.f90, input_read.f90 and filename.inp.
It seems, then, that I do not know how to pass an array that is allocatable both in program Main.f90 and in the called subroutine, where it is actually allocated, sized, and then deallocated before being passed back to the main program. This perhaps sounds confusing, so here is the code for all three files. I apologize for the poor formatting from pasting; I tried to separate all the rows.
main.f90:
Program main
use input_read ! the module with the subroutines used for reading filename.inp
implicit none
REAL, Allocatable :: epsilstar(:)
INTEGER :: natoms
call Obtain_LJ_Epsilon(epsilstar, natoms)
print*, 'LJ Epsilon : ', epsilstar
END Program main
Next is the module with a subroutine (I removed all but the necessary one for space), input_read.f90:
module input_read
contains
!===============================================================
!===============================================================
Subroutine Obtain_LJ_Epsilon(epsilstar,natoms)
! Reads epsilon and sigma parameters for Lennard-Jones Force-Field and also
! counts the number of types of atoms in the system
!===============================================================
!===============================================================
INTEGER :: error,line_number,natoms_eps,i
CHARACTER(120) :: string, next_line, next_next_line,dummy_char
CHARACTER(8) :: dummy_na,dummy_eps
INTEGER,intent(out) :: natoms
LOGICAL :: Proceed
real, intent(out), allocatable :: epsilstar(:)
error = 0
line_number = 0
Proceed = .true.
open(10,file='filename.inp',status='old')
!=============================================
! Find key word LJ_Epsilon
!=============================================
DO
line_number = line_number + 1
Read(10,'(A120)',iostat=error) string
IF (error .NE. 0) THEN
print*, "Error, stopping read input due to an error reading line"
exit
END IF
IF (string(1:12) == '$ LJ_epsilon') THEN
line_number = line_number + 1
exit
ELSE IF (string(1:3) == 'END' .or. line_number > 2000) THEN
print*, "Hit end of file before reading '$ LJ_epsilon' "
Proceed = .false.
exit
ENDIF
ENDDO
!========================================================
! Key word found, now determine number of parameters
! needing to be read
!========================================================
natoms_eps = -1
dummy_eps = 'iii'
do while ((dummy_eps(1:1) .ne. '$') .and. (dummy_eps(1:1) .ne. ' '))
natoms_eps = natoms_eps + 1
read(10,*) dummy_eps
enddo !we now know the number of atoms in the system (# of parameters)
close(10)
Allocate(epsilstar(natoms_eps))
epsilstar = 0.0
!============================================================
! Number of parameters found, now read their values
!============================================================
if(Proceed) then
open(11,file='filename.inp',status='old')
do i = 1,line_number-1
read(11,*) ! note it is not recording anything for this do loop
enddo
do i = 1,natoms_eps
read(11,*) dummy_char
read(dummy_char,*) epsilstar(i) ! convert string read in to real, and store in epsilstar
enddo
close(11)
PRINT*, 'LJ_epsilon: ', epsilstar ! printing to make sure it worked
endif
deallocate(epsilstar)
END Subroutine Obtain_LJ_Epsilon
end module input_read
And finally the input file: filename.inp
# Run_Type
NVT
# Run_Name
Test_Name
# Pressure
1.0
# Temperature
298.15
# Number_Species
# LJ_epsilon
117.1
117.1
117.1
# LJ_sigma
3.251
3.251
3.251
END
And again, I can't figure out how to pass the allocated epsilstar array back to the main program. I have tried passing an unallocated array from main.f90 to the subroutine, allocating it inside, passing it back, and deallocating it in main.f90, but that did not work. I have also tried it as the code currently stands: it runs without errors, but it does not pass epsilstar back from the subroutine, where it is correctly found and filled.
It turns out that the mistake I made was deallocating the array in the subroutine before returning to the main program. By NOT deallocating it, the array is passed back just fine. I do not deallocate it in the main program either.
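A stripped-down sketch of the working pattern, with hypothetical names (the allocatable dummy argument is the same TR 15581 / Fortran 2003 feature the code above already relies on):
module alloc_demo
contains
   subroutine fill(arr, n)
      ! Allocate in the callee and do NOT deallocate; the caller receives the allocation.
      real, allocatable, intent(out) :: arr(:)
      integer, intent(in) :: n
      allocate(arr(n))
      arr = 1.0
   end subroutine fill
end module alloc_demo
program demo
   use alloc_demo
   implicit none
   real, allocatable :: a(:)
   call fill(a, 3)
   print *, a   ! a is still allocated here; it is released automatically at program end
end program demo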
I have a simple program, which is supposed to gather a number of small arrays into one big one using MPI.
PROGRAM main
include 'mpif.h'
integer ierr, i, myrank, thefile, n_procs
integer, parameter :: BUFSIZE = 3
complex*16, allocatable :: loc_arr(:), glob_arr(:)
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, n_procs, ierr)
allocate(loc_arr(BUFSIZE))
loc_arr = 0.7 * myrank - cmplx(0.3, 0, kind=8)
allocate(glob_arr(n_procs* BUFSIZE))
write (*,*) myrank, shape(glob_arr)
call MPI_Gather(loc_arr, BUFSIZE, MPI_DOUBLE_COMPLEX,&
glob_arr, n_procs * BUFSIZE, MPI_DOUBLE_COMPLEX,&
0, MPI_COMM_WORLD, ierr)
write (*,*) myrank,"Errorcode:" , ierr
call MPI_FINALIZE(ierr)
END PROGRAM main
I have some experience with MPI in C, but for Fortran 90 nothing seems to work. Here is how I compile (I use ifort) and run it:
mpif90 test.f90 -check all && mpirun -np 4 ./a.out
1 12
3 12
3 Errorcode: 0
1 Errorcode: 0
0 12
2 12
2 Errorcode: 0
0 Errorcode: 0
*** Error in `./a.out': free(): invalid pointer: 0x0000000000a25790 ***
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 10889 RUNNING AT LenovoX1kabel
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
What am I doing wrong? Sometimes I get this pointer problem, sometimes I get a segmentation fault, but it doesn't look like any of the ifort checks complain.
All the error codes are 0, so I'm not sure where I go wrong.
As a simple rule of thumb: the count arguments of MPI collectives are per-process counts, so you should never multiply them by the number of processes.
Therefore the receive count n_procs * BUFSIZE is clearly wrong.
And indeed the manual states: recvcount - number of elements for any single receive (integer, significant only at root).
You should just use BUFSIZE. This is the same for C and Fortran.
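In other words, only the receive count changes; a corrected call would look like this:
call MPI_Gather(loc_arr, BUFSIZE, MPI_DOUBLE_COMPLEX,&
                glob_arr, BUFSIZE, MPI_DOUBLE_COMPLEX,&
                0, MPI_COMM_WORLD, ierr)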
I'm working on code that serially calls a subroutine (which in turn performs iterations) many times. I wish to parallelize the iterations inside the subroutine. The problem with MPI is that I'm only allowed to initialize it once; hence, I cannot initialize it in my subroutine, which gets called multiple times. Can anyone suggest a way out of this?
My problem is roughly as outlined below:
program p
...
do i=1,10000
call s(i)
end do
end program p
subroutine s(j)
...
do i=1,10000
...
end do
end subroutine s
I wish to parallelize this process.
Thanks a lot. That helped! But let me reframe my question.
Within the iterations of the main program, along with the subroutine s, I have to call another subroutine, s2, which doesn't need to be parallelized. I thought it could be done this way:
! initialize MPI
do i = 1, 1000
   if (rank /= 0) then
      call s
   else
      call s2
   end if
end do
! finalize MPI
But the main problem here is that while the rest of the processes proceed slowly, process 0 will proceed quickly (something not desirable). So, is it possible to make process 0 wait after each iteration until the other processes complete their iterations?
You need to initialize and finalize MPI in the main program. Typically, you then define a load-balancing scheme that is valid for the work in the subroutine.
Then you do your loop inside the subroutine in parallel and gather (reduce?) the results at the end of the subroutine so you have all information you need when the subroutine is called next.
This works the same way as it would with a loop in the main program (without calling the subroutine).
Here is a minimum example:
module testMod
use mpi
implicit none
!#include "mpif.h"
!===
contains
!===
subroutine s(mysize, myrank, array)
integer,intent(in) :: mysize, myrank
integer,intent(inout) :: array(:)
integer :: i, ierror
! Do stuff
do i=1,size(array)
! Skip element that is not associated with the current process
if ( mod(i,mysize) .ne. myrank ) cycle
array(i) = array(i) + 1
enddo ! i
! MPI Allreduce
call MPI_Allreduce(MPI_IN_PLACE, array, size(array), MPI_INTEGER, &
MPI_MAX, MPI_COMM_WORLD, ierror)
end subroutine
end module
program mpiTest
use testMod
use mpi
implicit none
!#include "mpif.h"
integer :: mysize, myrank, ierror
integer,parameter :: ITER=100
integer,parameter :: arraySize=10
integer :: work(arraySize)
integer :: i
! MPI Initialization
call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierror)
call MPI_Comm_size(MPI_COMM_WORLD, mysize, ierror)
work = 0
do i=1,ITER
call s(mysize, myrank, work)
enddo
if ( myrank .eq. 0 ) write(*,*) work
! MPI Finalize
call MPI_Finalize(ierror)
end program
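The example can be compiled and run like the other snippets here, assuming the module and program are saved together in a file called mpiTest.f90 (the name is arbitrary) and the MPI wrapper is mpif90:
mpif90 mpiTest.f90 && mpirun -np 4 ./a.out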
My problem is that I don't know how to call subroutines when I use an MPI scheme in Fortran.
I have written this small code named TRY.f90, which calls a subroutine named CONCENTRATION.f90. How should I change CONCENTRATION.f90 in order to make the code work?
PROGRAM TRY
USE MPI
integer status(mpi_status_size)
INTEGER I, J, K, II, IERR, MY_ID, NUM_PROCS, PSP
INTEGER , PARAMETER :: GRIDX =64, GRIDY=64
REAL , DIMENSION(gridx,gridy) :: PSI
PSI=0
PRINT*, 'VARIABLE'
CALL MPI_INIT(IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_ID,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)
CALL CONCENTRATION(GRIDX, GRIDY, NUM_PROCS, MY_ID , PSI)
IF (MY_ID .NE. 0) THEN
CALL mpi_send( PSI(1+MY_ID*GRIDX/NUM_PROCS:(MY_ID+1)*GRIDX/NUM_PROCS:1,1:GRIDY:1),&
(GRIDX/NUM_PROCS)*GRIDY,mpi_real, 0,10,mpi_comm_world,ierr)
END IF
IF (MY_ID .EQ. 0) THEN
DO II=1,NUM_PROCS-1
CALL mpi_recv(PSI(1+II*GRIDX/NUM_PROCS:(II+1)*GRIDX/NUM_PROCS:1,1:GRIDY:1),&
(GRIDX/NUM_PROCS)*GRIDY,mpi_real, &
II,10,mpi_comm_world,status,ierr)
END DO
END IF
CALL MPI_FINALIZE(IERR)
END PROGRAM TRY
I am using a subroutine named CONCENTRATION.f90 which is:
SUBROUTINE CONCENTRATION(GRIDX, GRIDY, NUM_PROCS, MY_ID , PSI)
implicit none
INTEGER*8, INTENT(IN) :: GRIDX, GRIDY
INTEGER , INTENT(IN) :: NUM_PROCS, MY_ID
REAL*8 , DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI
INTEGER*8 I, J
DO I=1+MY_ID*GRIDX/NUM_PROCS, (MY_ID+1)*GRIDX/NUM_PROCS
DO J=1,GRIDY
PSI(I,J)=2.0
END DO
END DO
END SUBROUTINE CONCENTRATION
The code currently gives me an error; I think I need to make some changes to the subroutine CONCENTRATION.f90, or perhaps change the way I call it.
Could you please tell me what those changes are? Thanks for your help in advance.
Your program segfaults because of a type mismatch. In the main program you have declared PSI as an array of REAL:
REAL , DIMENSION(gridx,gridy) :: PSI
while in the CONCENTRATION subroutine you declare it with another type, REAL*8:
REAL*8 , DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI
By default REAL is 4 bytes long while REAL*8 (or DOUBLE PRECISION or REAL(KIND=8)) is 8 bytes long. So you are giving CONCENTRATION an array that is half the size it believes it to be, and all ranks from NUM_PROCS/2 onwards write past the end of the PSI array and thus cause segfaults. If you run with one process only, then even rank 0 will segfault.
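A sketch of matching declarations, assuming double precision is intended everywhere (the integer kinds of GRIDX and GRIDY should likewise agree between caller and callee, and the MPI calls would then pass MPI_DOUBLE_PRECISION instead of mpi_real):
! In the main program:
REAL(KIND=8), DIMENSION(GRIDX,GRIDY) :: PSI
! In the subroutine:
INTEGER, INTENT(IN) :: GRIDX, GRIDY, NUM_PROCS, MY_ID
REAL(KIND=8), DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI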
You should also read about MPI collective operations. MPI_GATHER and MPI_GATHERV do exactly what you are trying to achieve with multiple sends and receives here.
The only change would be to declare concentration as reentrant. That could be the default for Fortran 90. (The bulk of my experience is F77, and reentrant is not the default there.)