I'm working on code that serially calls a subroutine (which in turn performs iterations) many times. I wish to parallelize the iterations inside the subroutine. The problem with mpi is that I'm only allowed to initialize it once. Hence, I cannot initialize it in my subroutine, which gets called multiple times. Can anyone suggest a way out of this?
My problem is roughly as outlined below:
program p
...
do i=1,10000
call subroutine s(i)
end do
end program p
subroutine s(j)
...
do i=1,10000
...
end do
end subroutine s
I wish to parallelize this process.
Thanks a lot. That helped! But let me re frame my question,
Within the iterations of the main program, along with the subroutine s, I have to call another subroutine s2, (which doesn't need to be parallelized). I thought, it could be done this way:
!initialize mpi
do i=1:1000
if rank!=0
call s
else call s2
end if
end do
!finalize mpi
But the main problem here is, while the rest of the processes proceed slowly, process 0 will proceed quickly. (Something not desirable).So, is it possible to make process 0 wait after each iteration till the other process complete their iteration?
You need to initialize and finalize MPI in the main program. Typically, you then define a load-balancing that is valid for the work in the subroutine.
Then you do your loop inside the subroutine in parallel and gather (reduce?) the results at the end of the subroutine so you have all information you need when the subroutine is called next.
This works the same way as it would with a loop in the main program (without calling the subroutine).
Here is a minimum example:
module testMod
use mpi
implicit none
!#include "mpif.h"
!===
contains
!===
subroutine s(mysize, myrank, array)
integer,intent(in) :: mysize, myrank
integer,intent(inout) :: array(:)
integer :: i, ierror
! Do stuff
do i=1,size(array)
! Skip element that is not associated with the current process
if ( mod(i,mysize) .ne. myrank ) cycle
array(i) = array(i) + 1
enddo ! i
! MPI Allreduce
call MPI_Allreduce(MPI_IN_PLACE, array, size(array), MPI_INTEGER, &
MPI_MAX, MPI_COMM_WORLD, ierror)
end subroutine
end module
program mpiTest
use testMod
use mpi
implicit none
!#include "mpif.h"
integer :: mysize, myrank, ierror
integer,parameter :: ITER=100
integer,parameter :: arraySize=10
integer :: work(arraySize)
integer :: i
! MPI Initialization
call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierror)
call MPI_Comm_size(MPI_COMM_WORLD, mysize, ierror)
work = 0
do i=1,ITER
call s(mysize, myrank, work)
enddo
if ( myrank .eq. 0 ) write(*,*) work
! MPI Finalize
call MPI_Finalize(ierror)
end program
Related
Since the collective communications of MPI don't have a flag parameter I was wondering if this can cause problems:
I have 2 MPI_communicators, that are not equal, but overlapping and I want to execute MPI_IBcast on both of them simuntaniously:
program main
use mpi
implicit none
integer :: comm1, comm2, ierr, me, req(2)
integer :: color1(4) = [1,1,1,0]
integer :: color2(4) = [0,1,1,1]
integer :: a, b, new_me
call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
a = me - 1
b = me + 1
call MPI_Comm_split(MPI_COMM_WORLD, color1(me+1), me, comm1, ierr)
call MPI_Comm_split(MPI_COMM_WORLD, color2(me+1), me, comm2, ierr)
CALL MPI_IBcast(a,1, MPI_INTEGER, 0, comm1, req(1), ierr)
CALL MPI_IBcast(b,1, MPI_INTEGER, 0, comm2, req(2), ierr)
call MPI_Waitall(2, req, MPI_STATUSES_IGNORE, ierr)
call MPI_Finalize(ierr)
write (*,*) me, a, b
end program main
It worked fine on my local machine, but my question is: Is this guaranteed to work or do I have to use MPI_Bcast, rather than MPI_Ibcast?
The communications use different communicators. As such it is fine. It doesn't matter that they both contain the same set of processes, they are different communicators and that is all that matters.
Consider the following Fortran program
program test_prg
use iso_fortran_env, only : real64
use mpi_f08
implicit none
real(real64), allocatable :: arr_send(:), arr_recv(:)
integer :: ierr
call MPI_Init(ierr)
allocate(arr_send(3), arr_recv(3))
arr_send = 1
print *, lbound(arr_recv)
call MPI_Gatherv(arr_send, size(arr_send), MPI_DOUBLE_PRECISION, arr_recv, [size(arr_send)], [0], MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
print *, lbound(arr_recv)
call MPI_Finalize(ierr)
end program
Execution of this program on 1 processor (compiled with gfortran 9.3.0 and mpich 3.3.2), prints:
1
0
So arr_recv has changed its lower bound after the call to MPI_Gatherv. If I use arr_recv(1) instead of arr_recv in the call to MPI_Gatherv, then it doesn't change. If I replace mpi_f08 module with mpi, then using either arr_recv(1) or arr_recv doesn't change the lower bound.
Why is lower bound changing in this program?
At this stage, I believe this is a bug in gfortran affecting the MPI Fortran 2018 bindings (e.g. use mpi_f08) and I reported it at https://gcc.gnu.org/pipermail/fortran/2020-September/055068.html.
All gfortran versions are affected (I tried 9.2.0, 10.2.0 and the latest master branch, versions 8 and earlier do not support dimension(..).
The reproducer below can be used to evidence the issue
MODULE FOO
INTERFACE
SUBROUTINE dummyc(x0) BIND(C, name="sync")
type(*), dimension(..) :: x0
END SUBROUTINE
END INTERFACE
contains
SUBROUTINE dummy(x0)
type(*), dimension(..) :: x0
call dummyc(x0)
END SUBROUTINE
END MODULE
PROGRAM main
USE FOO
IMPLICIT NONE
integer :: before(2), after(2)
INTEGER, parameter :: n = 1
DOUBLE PRECISION, ALLOCATABLE :: buf(:)
DOUBLE PRECISION :: buf2(n)
ALLOCATE(buf(n))
before(1) = LBOUND(buf,1)
before(2) = UBOUND(buf,1)
CALL dummy (buf)
after(1) = LBOUND(buf,1)
after(2) = UBOUND(buf,1)
if (before(1) .NE. after(1)) stop 1
if (before(2) .NE. after(2)) stop 2
before(1) = LBOUND(buf2,1)
before(2) = UBOUND(buf2,1)
CALL dummy (buf2)
after(1) = LBOUND(buf2,1)
after(2) = LBOUND(buf2,1)
if (before(1) .NE. after(1)) stop 3
if (before(2) .NE. after(2)) stop 4
END PROGRAM
FWIW, Intel ifort compiler (I tried 18.0.5) works fine with the reproducer.
This question is a bit old, but I had the same issue on the last days with GNU Fortran (GCC) 11.2.0 when moving from use mpi to use mpi_f08 - but in my case, it was with MPI_Allreduce. It's clearly a bug on the MPI functions, but one workaround is to send the argument with the bounds defined, as var(1:end). This worked for me. In your case, you could try:
call MPI_Gatherv(arr_send(1:3), size(arr_send(1:3)), MPI_DOUBLE_PRECISION, arr_recv(1:3), [size(arr_send(1:3))], [0], MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
In this thread it has been explained in two ways how to pass messages using MPI with declared data types. I have a data structure with allocatables
type t_example
real, allocatable :: x(:), y(:), z(:)
end type
For maintability of the code would the easiest thing not be to use MPI_TYPE_CONTIGUOUS as follows
! -- declare
type(t_example) :: p1
type(MPI_DATATYPE) :: mpi_dtexample
(...)
call MPI_TYPE_CONTIGUOUS(sizeof(p1), MPI_BYTE, mpi_dtexample, ierr);
call MPI_TYPE_COMMIT(mpi_dtexample, ierr)
Following this I can simply use the send/recv with mpi_dtexample as being the data type.
I cannot come to my mind when it becomes more sensible to use the mpi_type_create_struct, as this would require you to explicitly tell the sequence of the declared type, with the data type and their corresponding sizes.
YES, the MPI_TYPE_CONTIGUOUS approach assumes that the declared type is contiguous and I would not be able to use this approach if I wanted to pass certain strided elements of the declared type.
Is there else anything I should raise my alarm bells on when using MPI_TYPE_CONTIGUOUS
A FULL EXAMPLE
Running with 2 ranks only.
module md_
use mpi_f08
integer numtasks, rank, tag, i, ierr
type(mpi_status) stat
type T_PART
real, allocatable :: x(:), y(:), z(:)
end type
contains
end module
program struct
use md_
implicit none
type(t_part) :: test_
type(mpi_datatype) :: mpidt
integer :: sz, szz
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
tag = 1
szz= 10
allocate(test_% x(szz),test_% y(szz), test_% z(szz) )
sz = sizeof(test_)
call MPI_Type_contiguous(sz, MPI_BYTE ,mpidt, ierr)
call MPI_TYPE_COMMIT(mpidt, ierr)
if (rank .eq. 0) then
do i=1,szz
test_%x(i) = i*1+mod(i,3)
test_%y(i) = i*2+mod(i,3)
test_%z(i) = i*3+mod(i,3)
end do
call MPI_SEND(test_, 1, mpidt, 1, tag, &
MPI_COMM_WORLD, ierr)
else
call MPI_RECV(test_, 1, mpidt, 0, tag, &
MPI_COMM_WORLD, stat, ierr)
endif
print *, 'rank= ',rank,' test_% x= ', test_%z(1) ! seg faults for rank 2
call mpi_barrier(MPI_COMM_WORLD, ierr)
! free datatype when done using it
call MPI_TYPE_FREE(mpidt, ierr)
call MPI_FINALIZE(ierr)
end
I am writing a generic subroutine in fortran90 that will read in a column of data (real values). The subroutine should first check to see that the file exists and can be opened, then it determines the number of elements (Array_Size) in the column by reading the number of lines until end of file. Next the subroutine rewinds the file back to the beginning and reads in the data points and assigns each to an array (Column1(n)) and also determines the largest element in the array (Max_Value). The hope is that this subroutine can be written to be completely generic and not require any prior knowledge of the number of data points in the file, which is why the number of elements is first determined so the array, "Column1", can be dynamically allocated to contain "Array_Size" number of data points. Once the array is passed to the main program, it is transferred to another array and the initial dynamically allocated array is deallocated so that the routine can be repeated for multiple other input files, although this example only reads in one data file.
As written below, the program compiles just fine on the Intel fortran compiler; however, when it runs it gives me a severe (174): SIGSEV fault. I place the write(,) statements before and after the allocate statement in the subroutine and it prints the first statement "Program works here", but not the second, which indicates that the problem is occurring at the ALLOCATE (Column1(Array_Size)) statement, between the two write(,) statements. I re-compiled it with -C flag and ran the executable, which fails again and states severe (408): "Attempt to fetch from allocatable variable MISC_ARRAY when it is not allocated". The variable MISC_ARRAY is the dummy variable in the main program, which seems to indicate that the compiler wants the array allocated in the main program and not in the subprogram. If I statically allocate the array, the program works just fine. In order to make the program generic and not require any knowledge of the size of each file, it needs to be dynamically allocated and this should happen in the subprogram, not the main program. Is there a way to accomplish this that I am not seeing?
PROGRAM MAIN
IMPLICIT NONE
! - variable Definitions for MAIN program
INTEGER :: n
! - Variable Definitions for EXPENSE READER Subprograms
REAL, DIMENSION(:), ALLOCATABLE :: Misc_Array,MISC_DATA
INTEGER :: Size_Misc
REAL :: Peak_Misc_Value
! REAL :: Misc_Array(365)
CHARACTER(LEN=13) :: File_Name
File_Name = "Misc.txt"
CALL One_Column(File_Name,Size_Misc,Peak_Misc_Value,Misc_Array)
ALLOCATE (MISC_DATA(Size_Misc))
DO n = 1,Size_Misc ! Transfers array data
MISC_DATA(n) = Misc_Array(n)
END DO
DEALLOCATE (Misc_Array)
END PROGRAM MAIN
SUBROUTINE One_Column(File_Name,Array_Size,Max_Value,Column1)
IMPLICIT NONE
REAL, DIMENSION(:), ALLOCATABLE,INTENT(OUT) :: Column1
! REAL :: Column1(365)
REAL, INTENT(OUT) :: Max_Value
CHARACTER,INTENT(IN) :: File_Name*13
INTEGER, INTENT(OUT) :: Array_Size
INTEGER :: Open_Status,Input_Status,n
! Open the file and check to ensure it is properly opened
OPEN(UNIT=100,FILE = File_Name,STATUS = 'old',ACTION = 'READ', &
IOSTAT = Open_Status)
IF(Open_Status > 0) THEN
WRITE(*,'(A,A)') "**** Cannot Open ",File_Name
STOP
RETURN
END IF
! Determine the size of the file
Array_Size = 0
DO 300
READ(100,*,IOSTAT = Input_Status)
IF(Input_Status < 0) EXIT
Array_Size = Array_Size + 1
300 CONTINUE
REWIND(100)
WRITE(*,*) "Program works here"
ALLOCATE (Column1(Array_Size))
WRITE(*,*) "Program stops working here"
Max_Value = 0.0
DO n = 1,Array_Size
READ(100,*) Column1(n)
IF(Column1(n) .GT. Max_Value) Max_Value = Column1(n)
END DO
END SUBROUTINE One_Column
This is an educated guess: I think that the subroutine One_Column ought to have an explicit interface. As written the source code has 2 compilation units, a program (called main) and an external subroutine (called One_Column).
At compile-time the compiler can't figure out the correct way to call the subroutine from the program. In good-old (emphasis on old) Fortran style it takes a leap of faith and leaves it to the linker to find a subroutine with the right name and crosses its fingers (as it were) and hopes that the actual arguments match the dummy arguments at run-time. This approach won't work on subroutines returning allocated data structures.
For a simple fix move end program to the end of the source file, in the line vacated enter the keyword contains. The compiler will then take care of creating the necessary interface.
For a more scalable fix, put the subroutine into a module and use-associate it.
I think it is important to show the corrected code so that future users can read the question and also see the solution. I broke the subroutine into a series of smaller functions and one subroutine to keep the data as local as possible and implemented it into a module. The main program and module are attached. The main program includes a call to the functions twice, just to show that it can be used modularly to open multiple files.
PROGRAM MAIN
!
! - Author: Jonathan A. Webb
! - Date: December 11, 2014
! - Purpose: This code calls subprograms in module READ_COLUMNAR_FILE
! to determine the number of elements in an input file, the
! largest element in the input file and reads in the column of
! data as an allocatable array
!***************************************************************************
!***************************************************************************
!********************* **********************
!********************* VARIABLE DEFINITIONS **********************
!********************* **********************
!***************************************************************************
!***************************************************************************
USE READ_COLUMNAR_FILE
IMPLICIT NONE
CHARACTER(LEN=13) :: File_Name
INTEGER :: Size_Misc,Size_Bar,Unit_Number
REAL :: Peak_Misc_Value,Peak_Bar_Value
REAL, DIMENSION(:), ALLOCATABLE :: Misc_Array,Bar_Array
!***************************************************************************
!***************************************************************************
!********************* **********************
!********************* FILE READER BLOCK **********************
!********************* **********************
!***************************************************************************
!***************************************************************************
! - This section reads in data from all of the columnar input decks.
! User defines the input file name and number
File_Name = "Misc.txt"; Unit_Number = 100
! Determines the number of rows in the file
Size_Misc = File_Length(File_Name,Unit_Number)
! Yields the allocatable array and the largest element in the array
CALL Read_File(File_Name,Unit_Number,Misc_Array,Peak_Misc_Value)
File_Name = "Bar.txt"; Unit_Number = 100
Size_Bar = File_Length(File_Name,Unit_Number)
CALL Read_File(File_Name,Unit_Number,Bar_Array,Peak_Bar_Value)
END PROGRAM MAIN
MODULE READ_COLUMNAR_FILE
!***********************************************************************************
!***********************************************************************************
! ***
! Author: Jonathan A. Webb ***
! Purpose: Compilation of subprograms required to read in multi-column ***
! data files ***
! Drafted: December 11, 2014 ***
! ***
!***********************************************************************************
!***********************************************************************************
!
!-----------------------------------
! Public functions and subroutines for this module
!-----------------------------------
PUBLIC :: Read_File
PUBLIC :: File_Length
!-----------------------------------
! Private functions and subroutines for this module
!-----------------------------------
PRIVATE :: Check_File
!===============================================================================
CONTAINS
!===============================================================================
SUBROUTINE Check_File(Unit_Number,Open_Status,File_Name)
INTEGER,INTENT(IN) :: Unit_Number
CHARACTER(LEN=13), INTENT(IN) :: File_Name
INTEGER,INTENT(OUT) :: Open_Status
! Check to see if the file exists
OPEN(UNIT=Unit_Number,FILE = File_Name,STATUS='old',ACTION='read', &
IOSTAT = Open_Status)
IF(Open_Status .GT. 0) THEN
WRITE(*,*) "**** Cannot Open ", File_Name," ****"
STOP
RETURN
END IF
END SUBROUTINE Check_File
!===============================================================================
FUNCTION File_Length(File_Name,Unit_Number)
INTEGER :: File_Length
INTEGER, INTENT(IN) :: Unit_Number
CHARACTER(LEN=13),INTENT(IN) :: File_Name
INTEGER :: Open_Status,Input_Status
! Calls subroutine to check on status of file
CALL Check_File(Unit_Number,Open_Status,File_Name)
IF(Open_Status .GT. 0)THEN
WRITE(*,*) "**** Cannot Read", File_Name," ****"
STOP
RETURN
END IF
! Determine File Size
File_Length = 0
DO 300
READ(Unit_Number,*,IOSTAT = Input_Status)
IF(Input_Status .LT. 0) EXIT
File_Length = File_Length + 1
300 CONTINUE
CLOSE(Unit_Number)
END FUNCTION File_Length
!===============================================================================
SUBROUTINE Read_File(File_Name,Unit_Number,Column1,Max_Value)
INTEGER, INTENT(IN) :: Unit_Number
REAL, DIMENSION(:), ALLOCATABLE,INTENT(OUT) :: Column1
CHARACTER(LEN=13),INTENT(IN) :: File_Name
REAL, INTENT(OUT) :: Max_Value
INTEGER :: Array_Size,n
! Determines the array size and allocates the array
Array_Size = File_Length(File_Name,Unit_Number)
ALLOCATE (Column1(Array_Size))
! - Reads in columnar array and determines the element with
! the largest value
Max_Value = 0.0
OPEN(UNIT= Unit_Number,File = File_Name)
DO n = 1,Array_Size
READ(Unit_Number,*) Column1(n)
IF(Column1(n) .GT. Max_Value) Max_Value = Column1(n)
END DO
CLOSE(Unit_Number)
END SUBROUTINE Read_File
!===============================================================================
END MODULE READ_COLUMNAR_FILE
My problem is that I don't know how to call subroutines when I use mpi scheme in Fortran.
I have written this small code named TRY.f90 in which there is a subroutine named CONCENTRATION.f90. How should I change CONCENTRATION.f90 in order to make the code works?
PROGRAM TRY
USE MPI
integer status(mpi_status_size)
INTEGER I, J, K, II, IERR, MY_ID, NUM_PROCS, PSP
INTEGER , PARAMETER :: GRIDX =64, GRIDY=64
REAL , DIMENSION(gridx,gridy) :: PSI
PSI=0
PRINT*, 'VARIABLE'
CALL MPI_INIT(IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_ID,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)
CALL CONCENTRATION(GRIDX, GRIDY, NUM_PROCS, MY_ID , PSI)
IF (MY_ID .NE. 0) THEN
CALL mpi_send( PSI(1+MY_ID*GRIDX/NUM_PROCS:(MY_ID+1)*GRIDX/NUM_PROCS:1,1:GRIDY:1),&
(GRIDX/NUM_PROCS)*GRIDY,mpi_real, 0,10,mpi_comm_world,ierr)
END IF
IF (MY_ID .EQ. 0) THEN
DO II=1,NUM_PROCS-1
CALL mpi_recv(PSI(1+II*GRIDX/NUM_PROCS:(II+1)*GRIDX/NUM_PROCS:1,1:GRIDY:1),&
(GRIDX/NUM_PROCS)*GRIDY,mpi_real, &
II,10,mpi_comm_world,status,ierr)
END DO
END IF
CALL MPI_FINALIZE(IERR)
END PROGRAM TRY
I am using a subroutine named CONCENTRATION.f90 which is:
SUBROUTINE CONCENTRATION(GRIDX, GRIDY, NUM_PROCS, MY_ID , PSI)
implicit none
INTEGER*8, INTENT(IN) :: GRIDX, GRIDY
INTEGER , INTENT(IN) :: NUM_PROCS, MY_ID
REAL*8 , DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI
INTEGER*8 I, J
DO I=1+MY_ID*GRIDX/NUM_PROCS, (MY_ID+1)*GRIDX/NUM_PROCS
DO J=1,GRIDY
PSI(I,J)=2.0
END DO
END DO
END SUBROUTINE CONCENTRATION
The code currently gives me error since I think I should have made some changes on the subroutine CONCENTRATION.f90. Or I should also change the way I call the subroutine.
Could you please tell me what are those changes? Thanks for your helps in advance
Your program segfaults because of type mismatch. In the main program you have declared PSI as an array of REAL:
REAL , DIMENSION(gridx,gridy) :: PSI
while in the CONCENTRATION subroutine you use another type of REAL*8:
REAL*8 , DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI
By default REAL is 4 bytes long while REAL*8 (or DOUBLE PRECISION or REAL(KIND=8)) is 8 bytes long. So you are giving to CONCENTRATION an array that is 2 times smaller than what it believes to be and all ranks from NUM_PROCS/2 onwards write past the end of the PSI array and thus cause segfaults. If you run with one process only, then even rank 0 will segfault.
You should also read about MPI collective operations. MPI_GATHER and MPI_GATHERV do exactly what you are trying to achieve whith multiple sends and receives here.
The only change would be to declare concentration as reentrant. That could be the default for Fortran 90. (The bulk of my experience is F77, and reentrant is not the default there.)