How to divide work between processes in MPI - Fortran

I have the following code to count the number of primes. I have divided the work in the loop across processors. The problem is that when the subroutine assigns sub-sections of the loop to processors, it assigns them according to rank, and I can't seem to control the order in which they arrive,
i.e. I would like them to come as 0,1,2,3... and not as 2,1,0,3...
So that, for example, with 500 iterations in the loop and 5 processors:
Rank 0 executes [1 - 100]
Rank 1 executes [101-200] etc...
program main
  implicit none
  include 'mpif.h'
  integer(4), parameter :: n = 36500
  integer(4) :: a(n)
  integer(4) :: i
  integer(4) :: j
  integer(4) :: ista
  integer(4) :: iend
  integer(4) :: sum      ! per-rank count of primes
  integer(4) :: f=0
  integer(4) :: ssum     ! global count of primes
  integer(4) :: ierr
  integer(4) :: iproc
  integer(4) :: nproc

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, iproc, ierr)

  ! Divide the iterations 2..n among the ranks
  call para_range(2, n, nproc, iproc, ista, iend)

  sum = 0
  print *, ista, "-", iend, ">", iproc

  do i = ista, iend
    f = 0
    do j = 2, int(sqrt(real(i)))
      if (mod(i,j) == 0) then
        f = 1
      end if
    end do
    if (f == 0) then
      sum = sum + 1
    end if
  end do

  ! Combine the per-rank counts onto rank 0
  call MPI_REDUCE(sum, ssum, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  if (iproc == 0) write(6,*) 'Total No of primes=', ssum

  call MPI_FINALIZE(ierr)
end program main
subroutine para_range(n1, n2, nprocs, irank, ista, iend)
  integer(4) :: n1     ! Lowest value of iteration variable
  integer(4) :: n2     ! Highest value of iteration variable
  integer(4) :: nprocs ! No of Cores/Processors you want to use
  integer(4) :: irank  ! Process rank
  integer(4) :: ista   ! Start of iterations for rank iproc
  integer(4) :: iend   ! End of iterations for rank iproc
  integer(4) :: iwork1, iwork2

  print *, irank

  iwork1 = ( n2 - n1 + 1 ) / nprocs
  iwork2 = MOD(n2 - n1 + 1, nprocs)
  ista = irank * iwork1 + n1 + MIN(irank, iwork2)
  iend = ista + iwork1 - 1
  if ( iwork2 > irank ) then
    iend = iend + 1
  end if
end subroutine para_range
I am using Open MPI.

This question is pretty much the same as your other question (Open MPI ranks are not in order) and the answer is the same. You're misunderstanding the problem.
The "ordering" of the ranks in your case can be seen as arbitrary and unimportant. The ranks that you assign the problem to will do the work you assign them. The problem you have is that you want them all to be printed out in sorted order. That is impossible in MPI. There is no guarantee that if you print messages from all of the ranks at exactly the same time, they will be printed in any particular order. The reason for this is that all of the output must first be sent to the process that launches the application mpiexec or mpirun and then printed to the screen. This transfer may be faster for some processes than others. If it's critical that you output all of the results in order, you must send them all to the same process first, then print them out from there. It is guaranteed that if you print text out all in the same rank, they will come out in the correct order.
So your code will look roughly like this:
...initialize variables...
...divide up work...
...each rank does the work assigned to it...
...MPI_SEND result to rank 0...
if (rank == 0)
    MPI_RECV results from all processes
    print results in order
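A minimal sketch of that pattern for the prime counter above (assuming rank 0 has an integer array results(0:nproc-1) to collect the per-rank counts; the tag value 0 is arbitrary):

if (iproc /= 0) then
  ! every worker sends its local count to rank 0
  call MPI_SEND(sum, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
else
  results(0) = sum
  do i = 1, nproc-1
    ! rank 0 receives the counts one source rank at a time
    call MPI_RECV(results(i), 1, MPI_INTEGER, i, 0, MPI_COMM_WORLD, &
                  MPI_STATUS_IGNORE, ierr)
  end do
  ! all printing happens on rank 0, so the output order is deterministic
  do i = 0, nproc-1
    print *, 'rank', i, 'counted', results(i), 'primes'
  end do
end if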

Related

Using MPI_PUT in fortran and different ranks have different displacements using c_loc

I have MPI ranks split up to calculate different parts of an array, then I want to put/send those slices onto a different rank that doesn't participate in the calculation. That rank is the master of a new communicator set up to do other things with the array (averaging, IO, etc.). I got it to work with MPI_isend and MPI_irecv, and now I want to try MPI_Put.
use mpi_f08
use iso_c_binding
implicit none
integer, parameter :: n=10, gps = 18, pes=12, dpes = 6
integer :: main=pes, d=dpes
integer :: diag_master
integer :: global_size, global_rank, diag_size, diag_rank
type(MPI_comm),allocatable :: diag_comm
integer :: pelist_diag
TYPE(MPI_Win) :: win
integer :: ierr, i, j
type(MPI_COMM) :: comm, mycomm
integer :: gsz, grk
integer :: lsz, lrk
integer(KIND=MPI_ADDRESS_KIND) :: local_group
logical :: local_flag
integer :: color,key
!!! THIS IS THE ARRAY
real, dimension(n,pes) :: r
!!!
logical :: on_dpes = .false.
logical,allocatable,dimension(:) :: dpes_list ! true if on dpes list
integer :: comm_manager
integer :: dmg
integer(KIND=MPI_ADDRESS_KIND) :: buff_size !< the size of a variable type
integer(kind=MPI_ADDRESS_KIND) :: displacement
integer :: disp_size
integer :: loc_base
integer, pointer :: fptr
!!!!!!!! THIS ALL WORKS BEGIN !!!!!!!!
comm=MPI_COMM_WORLD
call MPI_INIT(ierr)
call MPI_COMM_SIZE(COMM, gsz, ierr)
call MPI_COMM_RANK(COMM, grk, ierr)
allocate(dpes_list(gsz))
! write (6,*) "I am ",grk," of ",gsz
!> Find the group
call MPI_COMM_GET_ATTR(COMM,MPI_APPNUM,local_group,local_flag,ierr)
!> Split a new communicator as mycom
color = int(local_group)
key = 0
call MPI_COMM_SPLIT(COMM, color, key, mycomm, ierr)
!> Get information about the split communicators
call mpi_comm_size(mycomm,lsz,ierr)
call mpi_comm_rank(mycomm,lrk,ierr)
!> Create data on the main communicator
if (lsz == pes) then
   comm_manager = main
   on_dpes = .false.
   r = 0.0
   if (mod(lrk,2) == 0) then
      c_loop: do concurrent (i=1:n)
         r(i,lrk+1) = sin(real(i))+real(i)
      enddo c_loop
   else
      r(:,lrk+1) = 10.0-dble(lrk)
   endif
endif
if (lsz == dpes) then
   diag_size = lsz
   diag_rank = lrk
   comm_manager = d
   on_dpes = .true.
   diag_comm = mycomm
   if (lrk==0) then
      dmg = grk
   endif
endif
call MPI_ALLGATHER(on_dpes,1,MPI_LOGICAL, &
                   dpes_list,gsz,MPI_LOGICAL, MPI_COMM_WORLD, ierr)
!> Get the master of dpes
do i=1,gsz
   if (dpes_list(i)) then
      dmg = i-1
      exit
   endif
enddo
diag_master = dmg
diag_global_master = dmg
!!!!!!!! THIS ALL WORKS END !!!!!!!!
!! At this point, the ranks that participate in the calculation
!! have values in r(i,lrk+1) where lrk is their rank
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!! THIS IS WHERE THINGS GO WRONG? !!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
disp_size = storage_size(r)
buff_size = disp_size*size(r)
call c_f_pointer(c_loc(r(1,1)),fptr)
loc_base = fptr
nullify(fptr)
write (6,*) loc_base, grk
call MPI_Win_create(loc_base,buff_size,disp_size,MPI_INFO_NULL,&
                    mpi_comm_world,win,ierr)
call MPI_Win_Fence(0,win,ierr)
displacement = loc_base + disp_size *buff_size
! if (.not.allocated(diag_comm)) then
if (grk == 11) then
   call MPI_Put(r(:,global_rank+1),size(r,1),MPI_FLOAT,&
                diag_master,displacement,size(r,1), MPI_FLOAT, win ,ierr)
endif
call MPI_Win_Fence(0,win,ierr)
CALL MPI_WIN_FREE(win, ierr)
call MPI_FINALIZE(ierr)
I have ! if (.not.allocated(diag_comm)) then commented out because I tried to do this with all of the ranks that calculate r, but I got the same result.
I am compiling with mpiifort -O0 -fpe0 -init=snan,arrays -no-wrap-margin -traceback -stand f18 and run with mpirun -n 12 ./$#.x : -n 6 ./$#.x in my Makefile. The version of mpiifort I am using is
> mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 2 for Linux*
Copyright 2003-2019, Intel Corporation.
ifort version 19.0.2.187
The output (write (6,*) loc_base, grk) is strange.
1072411986 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
0 10
0 11
2142952877 12
2142952877 13
2142952877 14
2142952877 15
2142952877 16
2142952877 17
Rank 12-17 are the ranks that don't participate in "calculating r", but I'm not sure why c_loc(r(1,1)) is different for these ranks. Also, it is different for rank 0.
My actual questions are
1) How do I calculate the displacement variable? Am I doing it correctly? Is it supposed to differ between ranks (because it will, in this case)?
2) Why is c_loc(r(1,1)) different for ranks 12-17? Does it have anything to do with the fact that this is an SPMD program? Why is it different for rank 0?
3) Can I do the one-sided communication with all of the ranks instead of just one? When I did this the other way, I had each rank call mpi_isend, and then I just called mpi_irecv in a loop over all of the sending ranks. Can I do something similar with MPI_Put? Should I be using MPI_Get? Something else?
4) How do I get this to work? This is just an educational example for myself, and what I actually need to do is much more complicated.
I can answer item 2, at least. You have:
call c_f_pointer(c_loc(r(1,1)),fptr)
loc_base = fptr
where loc_base is declared integer. You seem to be assuming that loc_base is some sort of address, but it is not. In Fortran, intrinsic assignment from a pointer assigns the value of the target, not the location of the target. So you're effectively doing a TRANSFER of the REAL values of r to loc_base - probably not what you want.
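For the window itself, a hedged sketch (not a complete answer to the other items): you can create the window directly on r and skip the pointer-to-integer round trip entirely. Two things to watch, assuming the declarations above: MPI_Win_create takes the window size in bytes and a displacement unit in bytes, while storage_size returns bits, and the Fortran datatype for default real is MPI_REAL rather than MPI_FLOAT. The target_disp below is only an illustrative offset that places a column of r into the matching column of the target's window.

integer :: disp_unit
integer(kind=MPI_ADDRESS_KIND) :: win_size, target_disp

disp_unit = storage_size(r) / 8                           ! bytes per element of r
win_size  = int(disp_unit, MPI_ADDRESS_KIND) * size(r)    ! window size in bytes
! expose r itself as the window; no c_loc or integer copy needed
call MPI_Win_create(r, win_size, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

call MPI_Win_fence(0, win, ierr)
! target_disp is counted in elements because disp_unit is the element size
target_disp = int(lrk, MPI_ADDRESS_KIND) * size(r,1)
call MPI_Put(r(:,lrk+1), size(r,1), MPI_REAL, diag_master, target_disp, &
             size(r,1), MPI_REAL, win, ierr)
call MPI_Win_fence(0, win, ierr)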

Fortran code returns 0 for every calculation in the loop

Can anyone help me find where I am going wrong in writing this code?
program time_period
  ! This program calculates time period of an SHM given length of the chord
  implicit none
  integer, parameter:: length=10
  real, parameter :: g=9.81, pi=3.1415926535897932384
  integer, dimension(1:length) :: chordlength
  integer :: l
  real :: time
  do l= 1,length
    time = 2*pi*(chordlength(l)/(g))**.5
    print *, l, time
  enddo
end program
Result:
1 0.00000000E+00
2 0.00000000E+00
3 0.00000000E+00
4 0.00000000E+00
5 0.00000000E+00
6 0.00000000E+00
7 0.00000000E+00
8 0.00000000E+00
9 0.00000000E+00
10 0.00000000E+00
If the chord lengths you're interested in are the integer values 1,2,...,10 you hardly need an array to store them. Further, if what you are interested in is the SHM period for each of those 10 chord lengths, it strikes me that you should have an array like this:
real, dimension(length) :: shm_periods
which you would then populate, perhaps like this:
do l= 1,length
  shm_periods(l) = 2*pi*(l/g)**.5
  print *, l, shm_periods(l)
enddo
Next, you could learn about Fortran's array syntax and write only one statement to assign values to shm_periods.
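For instance, a one-statement version using an implied-do array constructor (same names as above; just a sketch):

shm_periods = 2*pi*sqrt( [ (real(l), l=1,length) ] / g )
print *, shm_periods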
@High Performance Mark
I worked it the following way:
program time_period
  ! This program calculates time period of an SHM given length of the chord
  implicit none
  integer, parameter:: length=10
  real, parameter :: g=9.81, pi=3.1415926535897932384
  integer, dimension(1:length) :: chordlength
  integer :: l
  real, dimension(1:length) :: timeperiod
  do l= 1,length
    print *, 'Enter ChordLength', l
    read *, chordlength(l)
    timeperiod(l) = 2*pi*(chordlength(l)/g)**.5
  enddo
  do l=1,length
    print *, l, timeperiod(l)
  enddo
end program
It's giving me results, but it asks me to type in the chord lengths... appreciate your help.
The code below does not answer your question (since you already did that). But it does address some issues with the design of the code.
As a next step, let's say you want to a) use a function for the calculation, b) have some standard length values for which to display the period, and c) input a custom length for calculation.
Fortran allows for the declaration of elemental functions which can operate on single values or arrays just the same (with no need for a loop). See the example below:
elemental function CalcTimePeriod(chord_length) result(period)
  ! Calculate the SHM time period from the chord length
  real, parameter :: g=9.80665, pi=3.1415926535897932384
  real, intent(in) :: chord_length
  real :: period
  period = 2*pi*sqrt(chord_length/g)
end function
So I am posting the code below in hopes that you can learn something new with modern Fortran.
program SHM_CalcTime
  implicit none
  ! Variables
  integer, parameter :: n = 10
  real, dimension(n) :: gen_lengths, periods
  real :: input_length
  integer :: i

  ! Example calculation from generated array of chord lengths
  ! fill an array of lengths using the formula len = 1.0 + (i-1)/2
  gen_lengths = [ (1.0+real(i-1)/2, i=1, n) ]
  ! calculate the time periods for ALL the lengths in the array
  periods = CalcTimePeriod(gen_lengths)

  write (*, '(1x,a14,1x,a18)') 'length', 'period'
  do i=1,n
    write (*, '(1x,g18.4,1x,g18.6)') gen_lengths(i), periods(i)
  end do

  input_length = 1.0
  do while( input_length>0 )
    write (*,*) 'Enter chord length (0 to exit):'
    read (*,*) input_length
    if(input_length<=0.0) then
      exit
    end if
    write (*, '(1x,g18.4,1x,g18.6)') input_length, CalcTimePeriod(input_length)
  end do

contains

  elemental function CalcTimePeriod(chord_length) result(period)
    ! Calculate the SHM time period from the chord length
    real, parameter :: g=9.80665, pi=3.1415926535897932384
    real, intent(in) :: chord_length
    real :: period
    period = 2*pi*sqrt(chord_length/g)
  end function

end program SHM_CalcTime
On a final note, observe that programs can have internal procedures declared after a contains statement, with no need for the explicit interface declaration you would need with older Fortran variants.

Using MPI in subroutines

I'm working on code that serially calls a subroutine (which in turn performs iterations) many times. I wish to parallelize the iterations inside the subroutine. The problem with MPI is that I'm only allowed to initialize it once. Hence, I cannot initialize it in my subroutine, which gets called multiple times. Can anyone suggest a way out of this?
My problem is roughly as outlined below:
program p
  ...
  do i=1,10000
    call s(i)
  end do
end program p

subroutine s(j)
  ...
  do i=1,10000
    ...
  end do
end subroutine s
I wish to parallelize this process.
Thanks a lot. That helped! But let me reframe my question:
Within the iterations of the main program, along with the subroutine s, I have to call another subroutine s2 (which doesn't need to be parallelized). I thought it could be done this way:
!initialize mpi
do i=1,1000
  if (rank /= 0) then
    call s
  else
    call s2
  end if
end do
!finalize mpi
But the main problem here is that, while the rest of the processes proceed slowly, process 0 will proceed quickly (something not desirable). So, is it possible to make process 0 wait after each iteration until the other processes complete their iteration?
You need to initialize and finalize MPI in the main program. Typically, you then define a load-balancing that is valid for the work in the subroutine.
Then you do your loop inside the subroutine in parallel and gather (reduce?) the results at the end of the subroutine so you have all information you need when the subroutine is called next.
This works the same way as it would with a loop in the main program (without calling the subroutine).
Here is a minimum example:
module testMod
  use mpi
  implicit none
  !#include "mpif.h"
!===
contains
!===
  subroutine s(mysize, myrank, array)
    integer,intent(in) :: mysize, myrank
    integer,intent(inout) :: array(:)
    integer :: i, ierror

    ! Do stuff
    do i=1,size(array)
      ! Skip element that is not associated with the current process
      if ( mod(i,mysize) .ne. myrank ) cycle
      array(i) = array(i) + 1
    enddo ! i

    ! MPI Allreduce
    call MPI_Allreduce(MPI_IN_PLACE, array, size(array), MPI_INTEGER, &
                       MPI_MAX, MPI_COMM_WORLD, ierror)
  end subroutine
end module

program mpiTest
  use testMod
  use mpi
  implicit none
  !#include "mpif.h"
  integer :: mysize, myrank, ierror
  integer,parameter :: ITER=100
  integer,parameter :: arraySize=10
  integer :: work(arraySize)
  integer :: i

  ! MPI Initialization
  call MPI_Init(ierror)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierror)
  call MPI_Comm_size(MPI_COMM_WORLD, mysize, ierror)

  work = 0
  do i=1,ITER
    call s(mysize, myrank, work)
  enddo

  if ( myrank .eq. 0 ) write(*,*) work

  ! MPI Finalize
  call MPI_Finalize(ierror)
end program

How to call subroutines in fortran parallelized by MPI?

My problem is that I don't know how to call subroutines when I use an MPI scheme in Fortran.
I have written this small code named TRY.f90, in which there is a subroutine named CONCENTRATION.f90. How should I change CONCENTRATION.f90 in order to make the code work?
PROGRAM TRY
  USE MPI

  integer status(mpi_status_size)
  INTEGER I, J, K, II, IERR, MY_ID, NUM_PROCS, PSP
  INTEGER , PARAMETER :: GRIDX =64, GRIDY=64
  REAL , DIMENSION(gridx,gridy) :: PSI

  PSI=0
  PRINT*, 'VARIABLE'

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_ID,IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)

  CALL CONCENTRATION(GRIDX, GRIDY, NUM_PROCS, MY_ID , PSI)

  IF (MY_ID .NE. 0) THEN
    CALL mpi_send( PSI(1+MY_ID*GRIDX/NUM_PROCS:(MY_ID+1)*GRIDX/NUM_PROCS:1,1:GRIDY:1),&
         (GRIDX/NUM_PROCS)*GRIDY,mpi_real, 0,10,mpi_comm_world,ierr)
  END IF

  IF (MY_ID .EQ. 0) THEN
    DO II=1,NUM_PROCS-1
      CALL mpi_recv(PSI(1+II*GRIDX/NUM_PROCS:(II+1)*GRIDX/NUM_PROCS:1,1:GRIDY:1),&
           (GRIDX/NUM_PROCS)*GRIDY,mpi_real, &
           II,10,mpi_comm_world,status,ierr)
    END DO
  END IF

  CALL MPI_FINALIZE(IERR)
END PROGRAM TRY
I am using a subroutine named CONCENTRATION.f90 which is:
SUBROUTINE CONCENTRATION(GRIDX, GRIDY, NUM_PROCS, MY_ID , PSI)
  implicit none
  INTEGER*8, INTENT(IN) :: GRIDX, GRIDY
  INTEGER , INTENT(IN) :: NUM_PROCS, MY_ID
  REAL*8 , DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI
  INTEGER*8 I, J
  DO I=1+MY_ID*GRIDX/NUM_PROCS, (MY_ID+1)*GRIDX/NUM_PROCS
    DO J=1,GRIDY
      PSI(I,J)=2.0
    END DO
  END DO
END SUBROUTINE CONCENTRATION
The code currently gives me an error, since I think I should have made some changes to the subroutine CONCENTRATION.f90, or I should change the way I call the subroutine.
Could you please tell me what those changes are? Thanks for your help in advance.
Your program segfaults because of a type mismatch. In the main program you have declared PSI as an array of REAL:
REAL , DIMENSION(gridx,gridy) :: PSI
while in the CONCENTRATION subroutine you use a different type, REAL*8:
REAL*8 , DIMENSION(GRIDX,GRIDY), INTENT(OUT) :: PSI
By default REAL is 4 bytes long while REAL*8 (or DOUBLE PRECISION, or REAL(KIND=8)) is 8 bytes long. So you are giving CONCENTRATION an array that is half the size it believes it to be, and all ranks from NUM_PROCS/2 onwards write past the end of the PSI array, which causes the segfaults. If you run with one process only, then even rank 0 will segfault.
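One consistent choice (just a sketch; any kind works as long as the main program, the subroutine, and the MPI datatype all agree):

INTEGER, PARAMETER :: dp = KIND(1.0d0)
REAL(dp), DIMENSION(GRIDX,GRIDY) :: PSI   ! use the same kind in CONCENTRATION
! ...and pass MPI_DOUBLE_PRECISION instead of mpi_real in the send/receive calls.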
You should also read about MPI collective operations. MPI_GATHER and MPI_GATHERV do exactly what you are trying to achieve with the multiple sends and receives here.
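A sketch of how that gather might look for this decomposition (untested; MYBLOCK and BLOCKS are hypothetical helper arrays introduced here because a block of rows of a column-major array is not contiguous, so each rank packs its rows before the gather and rank 0 unpacks them afterwards):

REAL, ALLOCATABLE :: MYBLOCK(:,:), BLOCKS(:,:,:)
ALLOCATE(MYBLOCK(GRIDX/NUM_PROCS, GRIDY), BLOCKS(GRIDX/NUM_PROCS, GRIDY, NUM_PROCS))

MYBLOCK = PSI(1+MY_ID*GRIDX/NUM_PROCS:(MY_ID+1)*GRIDX/NUM_PROCS, :)
CALL MPI_GATHER(MYBLOCK, (GRIDX/NUM_PROCS)*GRIDY, MPI_REAL, &
                BLOCKS, (GRIDX/NUM_PROCS)*GRIDY, MPI_REAL, 0, MPI_COMM_WORLD, IERR)
IF (MY_ID .EQ. 0) THEN
  DO II = 0, NUM_PROCS-1
    PSI(1+II*GRIDX/NUM_PROCS:(II+1)*GRIDX/NUM_PROCS, :) = BLOCKS(:,:,II+1)
  END DO
END IF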
The only change would be to declare concentration as reentrant. That could be the default for Fortran 90. (The bulk of my experience is F77, and reentrant is not the default there.)

how to write wrapper for 'allocate'

I am trying to write a wrapper for the 'allocate' function, i.e. a function which receives an array and dimensions, allocates memory, and returns the allocated array. The most important thing is that the function must work with arrays of different rank. But I have to explicitly state the rank of the array in the function interface, and in this case the code only compiles if I pass arrays of that rank as a parameter. For example, this code does not compile:
module memory_allocator
contains
  subroutine memory(array, length)
    implicit none
    real(8), allocatable, intent(out), dimension(:) :: array
    integer, intent(in) :: length
    integer :: ierr
    print *, "memory: before: ", allocated(array)
    allocate(array(length), stat=ierr)
    if (ierr /= 0) then
      print *, "error allocating memory: ierr=", ierr
    end if
    print *, "memory: after: ", allocated(array)
  end subroutine memory

  subroutine freem(array)
    implicit none
    real(8), allocatable, dimension(:) :: array
    print *, "freem: before: ", allocated(array)
    deallocate(array)
    print *, "freem: after: ", allocated(array)
  end subroutine freem
end module memory_allocator

program alloc
  use memory_allocator
  implicit none
  integer, parameter :: n = 3
  real(8), allocatable, dimension(:,:,:) :: foo
  integer :: i, j, k

  print *, "main: before memory: ", allocated(foo)
  call memory(foo, n*n*n)
  print *, "main: after memory: ", allocated(foo)

  do i = 1,n
    do j = 1,n
      do k = 1, n
        foo(i, j, k) = real(i*j*k)
      end do
    end do
  end do
  print *, foo

  print *, "main: before freem: ", allocated(foo)
  call freem(foo)
  print *, "main: after freem: ", allocated(foo)
end program alloc
Compilation error:
gfortran -o alloc alloc.f90 -std=f2003
alloc.f90:46.14:
call memory(foo, n*n*n)
1
Error: Rank mismatch in argument 'array' at (1) (1 and 3)
alloc.f90:60.13:
call freem(foo)
1
Error: Rank mismatch in argument 'array' at (1) (1 and 3)
Is there any way of implementing such a wrapper?
Thanks!
This can be done via a generic interface block. You have to create procedures for each rank that you want to handle, e.g., memory_1d, memory_2d, ... memory_4d. (Obviously a lot of cut & pasting.) Then you write a generic interface block that gives all of these procedures the alternative name memory as a generic procedure name. When you call memory, the compiler selects which memory_Xd should be called based on the rank of the argument. The same goes for your freem functions.
This is how intrinsic functions such as sin have long worked -- you can call sin with real arguments of various precisions, or with a complex argument, and the compiler figures out which actual sin function to call. In really old FORTRAN you had to use different names for the different sin functions. In modern Fortran you can set up the same thing with your own routines.
Edit: adding a code example demonstrating the method & syntax:
module double_array_mod
  implicit none

  interface double_array
    module procedure double_vector
    module procedure double_array_2D
  end interface double_array

  private ! hides items not listed on public statement
  public :: double_array

contains

  subroutine double_vector (vector)
    integer, dimension (:), intent (inout) :: vector
    vector = 2 * vector
  end subroutine double_vector

  subroutine double_array_2D (array)
    integer, dimension (:,:), intent (inout) :: array
    array = 2 * array
  end subroutine double_array_2D

end module double_array_mod

program demo_user_generic
  use double_array_mod
  implicit none
  integer, dimension (2) :: A = [1, 2]
  integer, dimension (2,2) :: B = reshape ( [11, 12, 13, 14], [2,2] )
  integer :: i

  write (*, '( / "vector before:", / 2(2X, I3) )' ) A
  call double_array (A)
  write (*, '( / "vector after:", / 2(2X, I3) )' ) A

  write (*, '( / "2D array before:" )' )
  do i=1, 2
    write (*, '( 2(2X, I3) )' ) B (i, :)
  end do
  call double_array (B)
  write (*, '( / "2D array after:" )' )
  do i=1, 2
    write (*, '( 2(2X, I3) )' ) B (i, :)
  end do

  stop
end program demo_user_generic
subroutine memory(array, length) has as its first dummy argument a 1-dimensional array (real(8), allocatable, intent(out), dimension(:) :: array).
Calling this subroutine from your main program with the 3-dimensional array foo (real(8), allocatable, dimension(:,:,:) :: foo) is obviously an error, and this is what the compiler actually said.
If you really need such subroutines, write one memory/freem pair for each array rank - one pair for 1-dimensional arrays, another for 2-dimensional arrays, etc.
By the way, the memory subroutines will differ in general, because in order to allocate an n-dimensional array you need to pass n extents to the above-mentioned subroutine.
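A sketch of how the generic-interface approach from the other answer carries over to this allocation wrapper (only the 1D and 3D cases are shown; note that each specific routine takes as many extent arguments as its rank):

module memory_allocator
  implicit none
  interface memory
    module procedure memory_1d
    module procedure memory_3d
  end interface memory
contains
  subroutine memory_1d(array, n1)
    real(8), allocatable, intent(out) :: array(:)
    integer, intent(in) :: n1
    integer :: ierr
    allocate(array(n1), stat=ierr)
    if (ierr /= 0) print *, "error allocating memory: ierr=", ierr
  end subroutine memory_1d

  subroutine memory_3d(array, n1, n2, n3)
    real(8), allocatable, intent(out) :: array(:,:,:)
    integer, intent(in) :: n1, n2, n3   ! one extent per dimension
    integer :: ierr
    allocate(array(n1, n2, n3), stat=ierr)
    if (ierr /= 0) print *, "error allocating memory: ierr=", ierr
  end subroutine memory_3d
end module memory_allocator

With this, call memory(foo, n, n, n) in the main program resolves to memory_3d, and a corresponding freem generic can be built the same way.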