Value of variable for every single iteration - fortran

This is my code:
Program Arrays_0
Implicit none
Integer :: i , Read_number , Vig_Position , Vipg_Position , n_iter
Integer , parameter :: Br_gra = 12
Integer , parameter , dimension ( Br_gra ) :: Vig = [ ( i , i = 1 , Br_gra) ]
Integer , parameter , dimension ( Br_gra ) :: Vipg = [ 0 , 1 , 1 , 1 , 2 , 2 , 3 , 4 , 4 , 7 , 7 , 7 ]
Integer :: Result_of_calculation
Write(*,*)"Enter the number (From 1 to Br_gra):"
Read(*,*) Read_number
Vig_Position = Vig(Read_number)
Vipg_Position = Vipg(Vig_Position)
n_iter = 0
Result_of_calculation = Vig_Position
Do while( Vipg_Position .ne. Vipg(1) )
n_iter = n_iter + 1
Vig_Position = Vipg_Position
Result_of_calculation = Result_of_calculation + Vig_Position
Vipg_Position = Vipg(Vig_Position)
End Do
Write(*,'(a,1x,i0)')"The number of iteration is:",n_iter
Write(*,'(a,1x,i0)')"The result of calculation is:",Result_of_calculation
End Program Arrays_0
The intention is to record the values of these variables at every iteration:
Vig_Position, Result_of_calculation and Vipg_Position.
How should I declare variables for that kind of calculation?
In general, is there another method for counting the number of iterations?
How can I declare variables whose size depends on the number of iterations, before the code has computed that number as a result of the calculation?

Now that the question has been clarified, here's a typical way of solving the problem in Fortran. It isn't the only possible way, but it is the most general. The strategy used in routine resize, doubling the old size, is reasonable: you want to minimize the number of times it is called. The data set in the sample program is small, so to show the effect I made the initial allocation very small. In reality, you would want a reasonably large initial allocation (say, 100 at least).
Note the use of an internal procedure that inherits the type vals_t from its host.
Program Arrays_0
Implicit none
Integer :: i , Read_number , Vig_Position , Vipg_Position , n_iter
Integer , parameter :: Br_gra = 12
Integer , parameter , dimension ( Br_gra ) :: Vig = [ ( i , i = 1 , Br_gra) ]
Integer , parameter , dimension ( Br_gra ) :: Vipg = [ 0 , 1 , 1 , 1 , 2 , 2 , 3 , 4 , 4 , 7 , 7 , 7 ]
Integer :: Result_of_calculation
! Declare a type that will hold one iteration's values
type vals_t
integer Vig_Position
integer Vipg_Position
integer Result_of_calculation
end type vals_t
! Declare an allocatable array to hold the values
! Initial size doesn't matter, but should be close
! to a lower limit of possible sizes
type(vals_t), allocatable :: vals(:)
allocate (vals(2))
Write(*,*)"Enter the number (From 1 to Br_gra):"
Read(*,*) Read_number
Vig_Position = Vig(Read_number)
Vipg_Position = Vipg(Vig_Position)
n_iter = 0
Result_of_calculation = Vig_Position
Do while( Vipg_Position .ne. Vipg(1) )
n_iter = n_iter + 1
Vig_Position = Vipg_Position
Result_of_calculation = Result_of_calculation + Vig_Position
Vipg_Position = Vipg(Vig_Position)
! Do we need to make vals bigger?
if (n_iter > size(vals)) call resize(vals)
vals(n_iter) = vals_t(Vig_Position,Vipg_Position,Result_of_calculation)
End Do
Write(*,'(a,1x,i0)')"The number of iteration is:",n_iter
Write(*,'(a,1x,i0)')"The result of calculation is:",Result_of_calculation
! Now vals is an array of size(vals) of the sets of values
! For demonstration, print the size of the array and the values
Write(*,'(a,1x,i0)')"Size of vals is:", size(vals)
Write(*,'(3i7)') vals(1:n_iter)
contains
! Subroutine resize reallocates the array passed to it
! with double the current size, copies the old data to
! the new array, and transfers the allocation to the
! input array
subroutine resize(old_array)
type(vals_t), allocatable, intent(inout) :: old_array(:)
type(vals_t), allocatable :: new_array(:)
! Allocate a new array at double the size
allocate (new_array(2*size(old_array)))
write (*,*) "Allocated new array of size ", size(new_array)
! Copy the data
new_array(1:size(old_array)) = old_array
! Transfer the allocation to old_array
call MOVE_ALLOC (FROM=new_array, TO=old_array)
! new_array is now deallocated
return
end subroutine resize
End Program Arrays_0
Sample output:
Enter the number (From 1 to Br_gra):
12
Allocated new array of size 4
The number of iteration is: 3
The result of calculation is: 23
Size of vals is: 4
7 3 19
3 1 22
1 0 23
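On the question of whether there is another way to count the iterations: one alternative, sketched here rather than taken from the answer above, is to run the loop twice. The first pass only counts, the arrays are then allocated at exactly that size, and the second pass stores the values. The program name two_pass, the three separate arrays and the hard-coded start value 12 are assumptions made for the illustration.

```fortran
program two_pass
  implicit none
  integer, parameter :: br_gra = 12
  integer, parameter :: vipg(br_gra) = [0, 1, 1, 1, 2, 2, 3, 4, 4, 7, 7, 7]
  integer, allocatable :: vig_pos(:), vipg_pos(:), running_sum(:)
  integer :: start, pos, n_iter, k, total

  start = 12   ! assumed input; replace with a Read statement as needed

  ! Pass 1: only count how many iterations the loop will take
  n_iter = 0
  pos = vipg(start)
  do while (pos /= vipg(1))
    n_iter = n_iter + 1
    pos = vipg(pos)
  end do

  ! The size is now known, so allocate exactly once
  allocate (vig_pos(n_iter), vipg_pos(n_iter), running_sum(n_iter))

  ! Pass 2: repeat the walk and store the values of every iteration
  total = start
  pos = vipg(start)
  do k = 1, n_iter
    vig_pos(k) = pos
    total = total + pos
    running_sum(k) = total
    pos = vipg(pos)
    vipg_pos(k) = pos
  end do

  write (*, '(a,1x,i0)') "The number of iteration is:", n_iter
  do k = 1, n_iter
    write (*, '(3i7)') vig_pos(k), vipg_pos(k), running_sum(k)
  end do
end program two_pass
```

With the start value 12 this should reproduce the three rows of the sample output above. The trade-off is that the loop body runs twice, which is fine when an iteration is cheap; when it is expensive, the growing array of the answer is the better fit.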

Related

I have an array of zeros and I want to add 1's to some array elements

I tried this in fortran.
The initial array is zeros, for example:
InitialMatrix = 0 0
0 0
0 0
0 0
And I want to add numbers 1 sequentially:
FinalMatrix = 0 0
0 1
1 0
1 1
As if adding one bit at a time.
I generated a matrix containing all elements equal to zero and tried to use ibset to change the zero element to 1, but without success.
The code I made was this one:
program test
implicit none
integer(1) numSitio
integer:: Comb
integer:: i, j
integer, dimension(10, 10)::MatrixZeros
integer, dimension(10, 10)::MatrixSpins
print*, "Set the number of sites: "
read(*,*)numSitio
Comb = 2**numSitio
MatrixZeros = 0
MatrixSpins = ibset(MatrixZeros, 1)
do i = 1, Comb
do j = 1, numSitio
MatrixSpins(i,j) = 0
end do
end do
do i = 1, Comb
write(*,*)(MatrixSpins(i,j), j= 1, numSitio)
end do
!write(*,*)MatrixZeros
end program test
I generated a matrix of zeros to be auxiliary, and then I created the matrix of spins that I want. I tried using the ibset command to add numbers 1 to zero array.
Note: I want to generate a matrix with n columns and 2^n rows, where the first row is all zeros and the second row has a 1 bit in the last column. In the third row, the 1 bit moves from the last column to the column on its left, and so on, adding 1 bits until, in the last row of the matrix, all elements are 1.
If you simply want to count in binary, then just
program test
implicit none
integer i, n
character(len=10) fmt ! holds n as text; a bare "character" is only one character long
print *, "Input number of bits: "; read *, n
write( fmt, "( i0 )" ) n
print "( b0." // trim( fmt ) // " )", ( i, i = 0, 2 ** n - 1 )
end program test
If you absolutely have to store all the configurations in a matrix (an array of characters would be just as good) then you can just count up with traditional "carry" operations:
program test
implicit none
integer i, j, imx, jmx
integer n
integer, allocatable :: M(:,:)
print *, "Input number of bits: "; read *, n
imx = 2 ** n - 1; jmx = n - 1
allocate( M(0:imx, 0:jmx) )
M = 0
do i = 1, imx
M(i,:) = M(i-1,:)
M(i,jmx) = M(i,jmx) + 1
j = jmx
do while ( M(i,j) > 1 ) ! "carry" operations
M(i,j) = 0
j = j - 1
M(i,j) = M(i,j) + 1
end do
end do
do i = 0, imx
print "( *( i1, 1x ) )", M(i,:)
end do
end program test
or you could use bit operations:
program test
implicit none
integer i, j, imx, jmx, p
integer n
integer, allocatable :: M(:,:)
print *, "Input number of bits: "; read *, n
imx = 2 ** n - 1; jmx = n - 1
allocate( M(0:imx, 0:jmx) )
M = 0
p = 1
do j = 0, jmx
do i = 0, imx
if ( iand(i,p) > 0 ) M(i,jmx-j) = 1
end do
p = 2 * p
end do
do i = 0, imx
print "( *( i1, 1x ) )", M(i,:)
end do
end program test
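As a side note on why the ibset attempt in the question did not help: ibset(MatrixZeros, 1) sets bit 1 inside every element, so each 0 becomes 2, and it does not touch different columns. A compact variant of the bit-operation approach, given as a sketch, tests bit n-j of the row index with btest instead. The program name bits_btest and the fixed n = 3 are assumptions for the illustration.

```fortran
program bits_btest
  implicit none
  integer, parameter :: n = 3          ! assumed number of bits (columns)
  integer :: m(0:2**n - 1, n)          ! rows 0 .. 2**n-1, columns 1 .. n
  integer :: i, j

  do j = 1, n
    do i = 0, 2**n - 1
      ! Column n holds bit 0 (least significant), column 1 holds bit n-1
      m(i, j) = merge(1, 0, btest(i, n - j))
    end do
  end do

  do i = 0, 2**n - 1
    print "( *( i1, 1x ) )", m(i, :)
  end do
end program bits_btest
```

Each row i is simply the binary representation of i, so the first row is all zeros and the last row is all ones.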

Need Help: Fortran Infinite Loop

I'm very new to using Fortran, and I can't seem to figure out why this subroutine is getting stuck in an infinite loop. Here's the code for said DO loop:
SUBROUTINE FILLARRAY(K, N)
REAL X, Y
INTEGER XPOS, YPOS
INTEGER K(N,N)
DO 10 I = 1, 100
15 CALL RANDOM_NUMBER(X)
CALL RANDOM_NUMBER(Y)
XPOS = 20 * X + 1.0
YPOS = 20 * Y + 1.0
PRINT *, XPOS
PRINT *, YPOS
IF(K(XPOS, YPOS).NE.1) THEN
K(XPOS,YPOS) = 1
END IF
IF (K(XPOS, YPOS).EQ.1) THEN
GOTO 15
END IF
10 CONTINUE
RETURN
END
I am basically trying to fill a 20 x 20 array randomly with the value 1.
I was also wondering if there is a way to forego using END IF that anyone knows about! Thank you!
The array will eventually all be set to 1, leading to an infinite loop with GOTO 15.
Try this code instead:
IF(K(XPOS, YPOS).NE.1) THEN
K(XPOS,YPOS) = 1
ELSE
GOTO 15
END IF
This method is horribly inefficient. I'd do it something like the code below. Note that I've filled the array with i rather than 1, partly to show the random order of filling and partly as a check that I haven't made a mistake, since each number should appear exactly once.
ian#eris:~/work/stack$ cat random_fill.f90
Program random_fill
Implicit None
Integer, Parameter :: n = 5
Integer, Dimension( 1:n, 1:n ) :: K
Call fillarray( k, n )
Write( *, '( 5( 5( i2, 1x ) / ) )' ) K
Contains
Subroutine fillarray( k, n )
Implicit None
Integer , Intent( In ) :: n
Integer, Dimension( 1:n, 1:n ), Intent( Out ) :: K
Integer, Dimension( : ), Allocatable :: index_list
Real :: rand
Integer :: val, x, y
Integer :: i
index_list = [ ( i, i = 0, n * n - 1 ) ]
Do i = 1, n * n
Call Random_number( rand )
val = 1 + Int( rand * Size( index_list ) )
x = 1 + index_list( val ) / n
y = 1 + Mod( index_list( val ), n )
K( x, y ) = i
index_list = [ index_list( :val - 1 ), index_list( val + 1: ) ]
End Do
End Subroutine fillarray
End Program random_fill
ian#eris:~/work/stack$ gfortran -O -Wall -Wextra -pedantic -fcheck=all -std=f2008 random_fill.f90
ian#eris:~/work/stack$ ./a.out
11 8 14 24 16
19 23 25 15 3
21 20 5 7 18
6 17 22 12 9
2 4 1 10 13
ian#eris:~/work/stack$ ./a.out
24 15 7 22 25
8 17 10 1 14
9 5 4 12 2
11 21 20 3 18
6 19 23 13 16
ian#eris:~/work/stack$ ./a.out
22 11 6 21 24
7 3 8 10 25
17 19 16 2 9
13 4 15 5 23
12 1 14 20 18
You are stuck in an infinite loop because the statement goto 15 is always executed.
If k(xpos, ypos) is 1 then the first if statement is false, but the second is true so the goto 15 is executed.
If instead k(xpos, ypos) is not 1 then the first if statement is true, and so k(xpos, ypos) is set to 1. The second if statement is only evaluated after this, and so is true, and so the goto 15 is executed.
As other answers have mentioned, the method you are using is horribly inefficient. However, if you still want to use it, here is the fixed code, with a number of modernisations:
subroutine fillarray(k, n)
implicit none
integer, parameter :: dp = kind(1.0d0) ! double precision kind used below
integer, intent(in) :: n
integer, intent(inout) :: k(n,n)
real(dp) :: x, y
integer :: xpos, ypos
integer :: i
i=1
do while (i<=100)
call random_number(x)
call random_number(y)
xpos = 20*x + 1.0_dp
ypos = 20*y + 1.0_dp
if (k(xpos, ypos)/=1) then
k(xpos, ypos) = 1
i = i+1
endif
enddo
end subroutine
Note that this assumes that the array k has already been initialised, otherwise checking the contents of the array will lead to undefined behaviour.
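As to whether end if is optional: for a block if (if ... then) it is not optional, it is always required. All languages need to know where the end of a block is. C uses }, Python uses un-indentation, Fortran uses end if. The only form without end if is the one-line logical if, if (condition) statement, which has no then and controls a single statement.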
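To make the last point concrete, here is a minimal sketch of the same fill written with a one-line if and cycle, so that neither goto nor end if appears. The program name fill_demo, the counter placed and the min guard on the index are assumptions for the illustration; the 20 x 20 size and the 100 cells come from the question.

```fortran
program fill_demo
  implicit none
  integer, parameter :: n = 20, n_fill = 100
  integer :: k(n, n), placed, xpos, ypos
  real :: x, y

  k = 0          ! initialise before testing any element
  placed = 0
  do while (placed < n_fill)
    call random_number(x)
    call random_number(y)
    xpos = min(n, int(n*x) + 1)   ! min() guards against rounding up to n+1
    ypos = min(n, int(n*y) + 1)
    if (k(xpos, ypos) == 1) cycle   ! one-line if: no then, no end if
    k(xpos, ypos) = 1
    placed = placed + 1
  end do

  print '(a,i0)', 'cells set to 1: ', sum(k)
end program fill_demo
```

The loop only counts a cell when it was actually changed, so it stops after exactly 100 distinct cells have been set.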

Population of vector with matrix elements column by column

My IDE is CodeBlocks 16.01.
This is my code:
Program Matrix_To_Vector
Implicit none
Integer::i,j
Integer, parameter :: M = 3 , N = 2
Integer, dimension ( M , N ) :: Matrix_0
Integer, dimension ( M*N ) :: Vector_0
! Population of matrix
Do i = 1 , 3
Do j = 1 , 2
Matrix_0(i,j) = i+j
End Do
End Do
Open (15, File = 'Result.txt', Status = 'Unknown', Action = 'Write')
Do i = 1 , 3
Write(15,*) Matrix_0(i,:)
End Do
Write(15,*) ( Vector_0(i), i =1 , size(Vector_0))
Close (15)
End Program Matrix_To_Vector
The result of matrix population is:
2 3
3 4
4 5
My intention is to build the vector Vector_0 from the elements of the matrix Matrix_0. The size of the vector is M*N. The first element of the vector is element (1,1) of the matrix and the last is (3,2); I want to fill it column by column.
Is there a way to do that with do loops?
The content of the wanted vector is:
2 3 4 3 4 5
like this?
do j=1,2
vector_0(3*(j-1)+1:3*(j-1)+3)=Matrix_0(:,j)
enddo
of course you could just do
vector_0=reshape(matrix_0,shape(vector_0))
as well
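For completeness, here is a small self-contained sketch that fills the vector both ways, with explicit do loops and with reshape, and checks that the two agree. The name vector_loop and the program name are assumptions for the illustration; the matrix is the one from the question.

```fortran
program matrix_to_vector_demo
  implicit none
  integer, parameter :: m = 3, n = 2
  integer :: matrix_0(m, n), vector_0(m*n), vector_loop(m*n)
  integer :: i, j

  ! Same matrix as in the question: element (i,j) = i + j
  do j = 1, n
    do i = 1, m
      matrix_0(i, j) = i + j
    end do
  end do

  ! Explicit loops: column j occupies positions (j-1)*m+1 .. j*m
  do j = 1, n
    do i = 1, m
      vector_loop((j - 1)*m + i) = matrix_0(i, j)
    end do
  end do

  ! reshape walks the matrix in memory order, which is column by column
  vector_0 = reshape(matrix_0, [m*n])

  print '(*(i0,1x))', vector_loop
  print '(*(i0,1x))', vector_0
  print *, 'identical: ', all(vector_0 == vector_loop)
end program matrix_to_vector_demo
```

Both lines should show 2 3 4 3 4 5. Because Fortran stores arrays column by column, reshape needs no index arithmetic at all.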

MPI, SUBARRAY types

I have trouble using the subarray type. I'm trying to transfer part of a global domain (represented by a 2D array) between two processes. I have no problem achieving this without the subarray type. The following example illustrates what I want to do. A whole 2D domain is divided equally into two parts, one per MPI process, one containing zeros (left) and the other containing ones (right). On each MPI process, the half-domain is made of the "real domain" plus a border of guard cells (that's why the array indexing begins at 1-ist, see below). The objective is simple: the right domain has to send its first two columns into the two "guard cell" columns of the left one.
The code that works is the following:
PROGRAM TEST
USE mpi
IMPLICIT NONE
INTEGER*4, PARAMETER :: ist = 2 ! Guard cells
INTEGER*4, PARAMETER :: nx = 5, ny = 2 ! Domain size
INTEGER*4, DIMENSION (1-ist:nx+ist,1-ist:ny+ist) :: prim ! A vector
INTEGER*4, DIMENSION (1:ist,1-ist:ny+ist) :: prim_S ! Mini vector (Send)
INTEGER*4, DIMENSION (1:ist,1-ist:ny+ist) :: prim_R ! Mini vector (Receive)
! MPI stuff
INTEGER*4, PARAMETER :: ndims = 2
INTEGER*4 :: mpicode, nb_procs, rang, comm, etiquette = 100
LOGICAL, DIMENSION (ndims) :: periods
LOGICAL :: reorganisation
INTEGER*4, DIMENSION (ndims) :: dims
INTEGER*4, DIMENSION (2) :: voisinage
INTEGER*4 :: i, j
!--------------------------------------------------------------------
periods = .FALSE.
reorganisation = .FALSE.
dims(1) = 2
dims(2) = 1
! Initialize MPI
CALL MPI_INIT (mpicode)
CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nb_procs, mpicode)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, rang, mpicode)
WRITE (*,*) "PROCESSUS ", rang, " OK"
! Create topology
CALL MPI_CART_CREATE (MPI_COMM_WORLD, ndims, dims, periods,
& reorganisation, comm, mpicode)
CALL MPI_CART_SHIFT (comm, 0, 1, voisinage(1), voisinage(2),
& mpicode)
! Fill each part of the domain
IF (rang .eq. 0) then
prim = 0
ELSE
prim = 1
END IF
! Print the left side BEFORE communication
IF (rang .eq. 0) then
DO j=1-ist, ny+ist
WRITE (*,*) prim(:,j)
END DO
WRITE(*,*) " "
END IF
IF (rang .eq. 1) then
DO i=1, ist
DO j=1-ist, ny+ist
prim_S(i,j) = prim(i,j)
END DO
END DO
END IF
CALL MPI_BARRIER (MPI_COMM_WORLD, mpicode)
! Communication
IF (rang .eq. 0) then
CALL MPI_RECV (prim_R, size(prim_R), MPI_INTEGER
& , voisinage(2),
& etiquette, comm, mpicode)
END IF
IF (rang .eq. 1) then
CALL MPI_SEND (prim_S, size(prim_S), MPI_INTEGER ,
& voisinage(1),
& etiquette,comm, mpicode)
END IF
IF (rang .eq. 0) then
DO i=nx+1, nx+ist
DO j=1-ist, ny+ist
prim(i,j) = prim_R(i-nx,j)
END DO
END DO
END IF
! Print the left domain AFTER the communication
IF (rang .eq. 0) then
DO j=1-ist, ny+ist
WRITE (*,*) prim(:,j)
END DO
END IF
CALL MPI_FINALIZE(mpicode)
END PROGRAM
So it's working, here is the output after the communication :
0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 1
I don't like this method very much, and since the subarray type looks as if it was created for exactly this purpose, I would like to use it. Here is the code, equivalent to the previous one:
PROGRAM TEST
USE mpi
IMPLICIT NONE
INTEGER*4, PARAMETER :: ist = 2 ! Guard cells
INTEGER*4, PARAMETER :: nx = 5, ny = 2 ! Domain size
INTEGER*4, DIMENSION (1-ist:nx+ist,1-ist:ny+ist) :: prim ! A vector
! MPI stuff
INTEGER*4, PARAMETER :: ndims = 2
INTEGER*4 :: mpicode, nb_procs, rang, comm, etiquette = 100
LOGICAL, DIMENSION (ndims) :: periods
LOGICAL :: reorganisation
INTEGER*4, DIMENSION (ndims) :: dims
INTEGER*4, DIMENSION (6) :: voisinage
INTEGER*4, DIMENSION (2) :: profil_tab, profil_sous_tab
INTEGER*4 :: i, j
INTEGER*4 :: type_envoi_W, type_envoi_E
INTEGER*4 :: type_reception_W, type_reception_E
!--------------------------------------------------------------------
periods = .FALSE.
reorganisation = .FALSE.
dims(1) = 2
dims(2) = 1
CALL MPI_INIT (mpicode)
CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nb_procs, mpicode)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, rang, mpicode)
WRITE (*,*) "PROCESSUS ", rang, " OK"
CALL MPI_CART_CREATE (MPI_COMM_WORLD, ndims, dims, periods,
& reorganisation, comm, mpicode)
CALL MPI_CART_SHIFT (comm, 0, 1, voisinage(1), voisinage(2),
& mpicode)
profil_tab(:) = SHAPE (prim)
profil_sous_tab(:) = (/ist, ny+2*ist/)
! Envoi W
CALL MPI_TYPE_CREATE_SUBARRAY (2, profil_tab, profil_sous_tab,
& (/ist,0/) , MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION
& , type_envoi_W, mpicode)
CALL MPI_TYPE_COMMIT (type_envoi_W, mpicode)
! Reception E
CALL MPI_TYPE_CREATE_SUBARRAY (2, profil_tab, profil_sous_tab,
& (/nx+ist,0/) , MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION,
& type_reception_E, mpicode)
CALL MPI_TYPE_COMMIT (type_reception_E, mpicode)
IF (rang .eq. 0) then
prim = 0
ELSE
prim = 1
END IF
IF (rang .eq. 0) then
DO j=1-ist, ny+ist
WRITE (*,*) prim(:,j)
END DO
WRITE(*,*) " "
END IF
CALL MPI_BARRIER (MPI_COMM_WORLD, mpicode)
IF (rang .eq. 0) then
CALL MPI_RECV (prim, 1, type_reception_E, voisinage(2),
& etiquette, comm, mpicode)
END IF
IF (rang .eq. 1) then
CALL MPI_SEND (prim, 1, type_envoi_W, voisinage(1),
& etiquette,comm, mpicode)
END IF
IF (rang .eq. 0) then
DO j=1-ist, ny+ist
WRITE (*,*) prim(:,j)
END DO
END IF
CALL MPI_FINALIZE(mpicode)
END PROGRAM
The output is that weird domain, plus a segmentation fault... :
0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
I guess I got the starting coordinates wrong when creating the subarray types, but I don't understand why.
I hope you can help me with that! Thanks for reading; it's quite a long post, but I tried to be clear.
Oak
Your array type should be composed of MPI_INTEGER, not MPI_DOUBLE_PRECISION.
Your MPI_RECV() call in both cases requires a Status argument.
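A minimal sketch of the subarray version with both fixes applied is given below. It is a simplified free-form rewrite, not the asker's exact program: it assumes exactly two processes, uses ranks in MPI_COMM_WORLD directly instead of a Cartesian topology, and the names type_send_w, type_recv_e and status are chosen for the illustration. The start offsets are the ones from the question.

```fortran
program subarray_halo
  use mpi
  implicit none
  integer, parameter :: ist = 2          ! guard-cell width, as in the question
  integer, parameter :: nx = 5, ny = 2   ! interior size, as in the question
  integer, parameter :: tag = 100
  integer :: prim(1-ist:nx+ist, 1-ist:ny+ist)
  integer :: sizes(2), subsizes(2), starts(2)
  integer :: type_send_w, type_recv_e
  integer :: status(MPI_STATUS_SIZE)
  integer :: rank, nprocs, ierr, j

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  if (nprocs /= 2) call MPI_Abort(MPI_COMM_WORLD, 1, ierr)  ! sketch assumes 2 ranks

  sizes    = shape(prim)
  subsizes = [ist, ny + 2*ist]

  ! First two interior columns of the sender (0-based offset ist)
  starts = [ist, 0]
  call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
       MPI_ORDER_FORTRAN, MPI_INTEGER, type_send_w, ierr)
  call MPI_Type_commit(type_send_w, ierr)

  ! Right-hand guard columns of the receiver (0-based offset nx+ist)
  starts = [nx + ist, 0]
  call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
       MPI_ORDER_FORTRAN, MPI_INTEGER, type_recv_e, ierr)
  call MPI_Type_commit(type_recv_e, ierr)

  prim = rank   ! zeros on rank 0, ones on rank 1

  if (rank == 0) then
    ! Note the status argument before ierr
    call MPI_Recv(prim, 1, type_recv_e, 1, tag, MPI_COMM_WORLD, status, ierr)
    do j = 1 - ist, ny + ist
      write (*, '(*(i2))') prim(:, j)
    end do
  else
    call MPI_Send(prim, 1, type_send_w, 0, tag, MPI_COMM_WORLD, ierr)
  end if

  call MPI_Type_free(type_send_w, ierr)
  call MPI_Type_free(type_recv_e, ierr)
  call MPI_Finalize(ierr)
end program subarray_halo
```

Run with two processes (for example mpirun -np 2). Because the base type now matches the integer array, the message covers only the two guard columns, which should give the same picture as the working version: seven columns of 0 followed by two columns of 1.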

Sending 2D arrays in Fortran with MPI_Gather

I want to send 2d chunks of data using MPI_GATHER. For example: I have 2x3 arrays on each node and I want 8x3 array on root, if I have 4 nodes. For 1d arrays, MPI_GATHER sorts data according to MPI ranks, but for 2d data it creates a mess!
What is the clean way to put chunks in order?
I expected the output of this code:
program testmpi
use mpi
implicit none
integer :: send (2,3)
integer :: rec (4,3)
integer :: ierror,my_rank,i,j
integer :: type_col ! handle for a derived MPI datatype (not used yet)
call MPI_Init(ierror)
! find out process rank
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
if (my_rank==0) then
send=1
do i=1,2
print*,(send(i,j),j=1,3)
enddo
endif
if (my_rank==1) then
send=5
! do 1,2
! print*,(send(i,j),j=1,3)
! enddo
endif
call MPI_GATHER(send,6,MPI_INTEGER,rec,6,MPI_INTEGER,0,MPI_COMM_WORLD,ierror)
if (my_rank==0) then
print*,'<><><><><>rec'
do i=1,4
print*,(rec(i,j),j=1,3)
enddo
endif
call MPI_Finalize(ierror)
end program testmpi
to be something like this :
1 1 1
1 1 1
5 5 5
5 5 5
but it looks like this:
1 1 5
1 1 5
1 5 5
1 5 5
The following is a literal Fortran translation of this answer. I had thought this was unnecessary, but the multiple differences in array indexing and memory layout might mean that it is worth doing a Fortran version.
Let me start by saying that you generally don't really want to do this - scatter and gather huge chunks of data from some "master" process. Normally you want each task to be chugging away at its own piece of the puzzle, and you should aim to never have one processor need a "global view" of the whole data; as soon as you require that, you limit scalability and the problem size. If you're doing this for I/O - one process reads the data, then scatters it, then gathers it back for writing, you'll want eventually to look into MPI-IO.
Getting to your question, though, MPI has very nice ways of pulling arbitrary data out of memory, and scatter/gathering it to and from a set of processors. Unfortunately that requires a fair number of MPI concepts - MPI Types, extents, and collective operations. A lot of the basic ideas are discussed in the answer to this question -- MPI_Type_create_subarray and MPI_Gather .
Consider a 1d integer global array that task 0 has that you want to distribute to a number of MPI tasks, so that they each get a piece in their local array. Say you have 4 tasks, and the global array is [1,2,3,4,5,6,7,8]. You could have task 0 send four messages (including one to itself) to distribute this, and when it's time to re-assemble, receive four messages to bundle it back together; but that obviously gets very time consuming at large numbers of processes. There are optimized routines for these sorts of operations: scatter/gather operations. So in this 1d case you'd do something like this:
integer, dimension(8) :: global ! only root has this
integer, dimension(2) :: local ! everyone has this
integer, parameter :: root = 0
integer :: rank, comsize
integer :: i, ierr
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
if (rank == root) then
global = [ (i, i=1,8) ]
endif
call MPI_Scatter(global, 2, MPI_INTEGER, & ! send everyone 2 ints from global
local, 2, MPI_INTEGER, & ! each proc receives 2 ints into local
root, & ! sending process is root,
MPI_COMM_WORLD, ierr) ! all procs in COMM_WORLD participate
After this, the processors' data would look like
task 0: local:[1,2] global: [1,2,3,4,5,6,7,8]
task 1: local:[3,4] global: [garbage]
task 2: local:[5,6] global: [garbage]
task 3: local:[7,8] global: [garbage]
That is, the scatter operation takes the global array and sends contiguous 2-int chunks to all the processors.
To re-assemble the array, we use the MPI_Gather() operation, which works exactly the same but in reverse:
local = local + rank
call MPI_Gather (local, 2, MPI_INTEGER, & ! everyone sends 2 ints from local
global, 2, MPI_INTEGER, & ! root receives 2 ints from each proc into global
root, & ! receiving process is root,
MPI_COMM_WORLD, ierr) ! all procs in COMM_WORLD participate
And now the arrays look like:
task 0: local:[1,2] global: [1,2,4,5,7,8,10,11]
task 1: local:[4,5] global: [garbage-]
task 2: local:[7,8] global: [garbage-]
task 3: local:[10,11] global: [garbage-]
Gather brings all the data back.
What happens if the number of processes doesn't evenly divide the number of data points, and we need to send different numbers of items to each process? Then you need a generalized version of scatter, MPI_Scatterv, which lets you specify the counts for each processor, and displacements -- where in the global array that piece of data starts. So let's say with the same 4 tasks you had an array of characters [a,b,c,d,e,f,g,h,i] with 9 characters, and you were going to assign every process two characters except the last, which got three. Then you'd need
character, dimension(9) :: global
character, dimension(3) :: local
integer, dimension(4) :: counts
integer, dimension(4) :: displs
integer :: mycounts
if (rank == root) then
global = [ (achar(i+ichar('a')), i=0,8) ]
endif
local = ['-','-','-']
counts = [2,2,2,3]
displs = [0,2,4,6]
mycounts = counts(rank+1)
call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) chars from displs(i)
MPI_CHARACTER, &
local, mycounts, MPI_CHARACTER, & ! I get mycounts chars into
root, & ! root rank does sending
MPI_COMM_WORLD, ierr) ! all procs in COMM_WORLD participate
Now the data looks like
task 0: local:"ab-" global: "abcdefghi"
task 1: local:"cd-" global: *garbage*
task 2: local:"ef-" global: *garbage*
task 3: local:"ghi" global: *garbage*
You've now used scatterv to distribute the irregular amounts of data. The displacement in each case is 2*rank from the start of the array (measured in characters: the displacement is in units of the type being sent for a scatter or received for a gather, not generally in bytes), and the counts are [2,2,2,3]. If it had been the first processor we wanted to have 3 characters, we would have set counts=[3,2,2,2] and the displacements would have been [0,3,5,7]. Gatherv again works exactly the same but in reverse; the counts and displs arrays would remain the same.
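The scatterv/gatherv round trip just described can be sketched as one complete program. It assumes exactly 4 processes; the program name and the upper-casing step (there only to show that the gathered data really came back from the other ranks) are additions for the illustration.

```fortran
program scatterv_gatherv_demo
  use mpi
  implicit none
  integer, parameter :: root = 0
  character :: global(9), local(3)
  integer :: counts(4), displs(4)
  integer :: rank, comsize, mycount, i, ierr

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (comsize /= 4) then
    if (rank == root) print *, 'Run with exactly 4 processes.'
    call MPI_Finalize(ierr)
    stop
  end if

  if (rank == root) global = [ (achar(i + ichar('a')), i = 0, 8) ]
  local   = '-'
  counts  = [2, 2, 2, 3]
  displs  = [0, 2, 4, 6]
  mycount = counts(rank + 1)

  ! Root sends counts(i) characters starting at displs(i) to rank i-1
  call MPI_Scatterv(global, counts, displs, MPI_CHARACTER, &
                    local, mycount, MPI_CHARACTER,          &
                    root, MPI_COMM_WORLD, ierr)

  ! Change the local piece: convert it to upper case
  do i = 1, mycount
    local(i) = achar(ichar(local(i)) - 32)
  end do

  ! Same counts and displs, opposite direction
  call MPI_Gatherv(local, mycount, MPI_CHARACTER,           &
                   global, counts, displs, MPI_CHARACTER,   &
                   root, MPI_COMM_WORLD, ierr)

  if (rank == root) print *, global
  call MPI_Finalize(ierr)
end program scatterv_gatherv_demo
```

On the root, global should come back as the same nine letters in upper case, in the original order, since each rank's piece is written back at the displacement it was read from.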
Now, for 2D, this is a bit trickier. If we want to send 2d subblocks of a 2d array, the data we're sending is no longer contiguous. If we're sending (say) 3x3 subblocks of a 6x6 array to 4 processors, the data we're sending has holes in it:
2D Array
---------
|000|222|
|000|222|
|000|222|
|---+---|
|111|333|
|111|333|
|111|333|
---------
Actual layout in memory
[000111000111000111222333222333222333]
(Note that all high-performance computing comes down to understanding the layout of data in memory.)
If we want to send the data that is marked "1" to task 1, we need to skip three values, send three values, skip three values, send three values, skip three values, send three values. A second complication is where the subregions stop and start; note that region "1" doesn't start where region "0" stops; after the last element of region "0", the next location in memory is partway through region "1".
Let's tackle the first layout problem first - how to pull out just the data we want to send. We could always just copy out all the "0" region data to another, contiguous array, and send that; if we planned it out carefully enough, we could even do that in such a way that we could call MPI_Scatter on the results. But we'd rather not have to transpose our entire main data structure that way.
So far, all the MPI data types we've used are simple ones - MPI_INTEGER specifies (say) 4 bytes in a row. However, MPI lets you create your own data types that describe arbitrarily complex data layouts in memory. And this case -- rectangular subregions of an array -- is common enough that there's a specific call for that. For the 2-dimensional case we're describing above,
integer :: newtype
integer, dimension(2) :: sizes, subsizes, starts
sizes = [6,6] ! size of global array
subsizes = [3,3] ! size of sub-region
starts = [0,0] ! let's say we're looking at region "0"
! which begins at offset [0,0]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_commit(newtype, ierr)
This creates a type which picks out just the region "0" from the global array. Note that even in Fortran, the start parameter is given as an offset (eg, 0-based) from the start of the array, not an index (eg, 1-based).
We could send just that piece of data now to another processor
call MPI_Send(global, 1, newtype, dest, tag, MPI_COMM_WORLD, ierr) ! send region "0"
and the receiving process could receive it into a local array. Note that the receiving process, if it's only receiving it into a 3x3 array, cannot describe what it's receiving as a type of newtype; that no longer describes the memory layout, because there aren't big skips between the end of one column and the start of the next. Instead, it's just receiving a block of 3*3 = 9 integers:
call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, rstatus, ierr)
Note that we could do this for other sub-regions, too, either by creating a different type (with different start array) for the other blocks, or just by sending starting from the first location of the particular block:
if (rank == root) then
call MPI_Send(global(4,1), 1, newtype, 1, tag, MPI_COMM_WORLD, ierr)
call MPI_Send(global(1,4), 1, newtype, 2, tag, MPI_COMM_WORLD, ierr)
call MPI_Send(global(4,4), 1, newtype, 3, tag, MPI_COMM_WORLD, ierr)
local = global(1:3, 1:3)
else
call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, rstatus, ierr)
endif
Now that we understand how to specify subregions, there's only one more thing to discuss before using scatter/gather operations, and that's the "size" of these types. We couldn't just use MPI_Scatter() (or even scatterv) with these types yet, because these types have an extent of 15 integers; that is, where they end is 15 integers after they start -- and where they end doesn't line up nicely with where the next block begins, so we can't just use scatter - it would pick the wrong place to start sending data to the next processor.
Of course, we could use MPI_Scatterv() and specify the displacements ourselves, and that's what we'll do - except the displacements are in units of the send-type size, and that doesn't help us either; the blocks start at offsets of (0,3,18,21) integers from the start of the global array, and the fact that a block ends 15 integers from where it starts doesn't let us express those displacements in integer multiples at all.
To deal with this, MPI lets you set the extent of the type for the purposes of these calculations. It doesn't truncate the type; it's just used for figuring out where the next element starts given the last element. For types like these with holes in them, it's frequently handy to set the extent to be something smaller than the distance in memory to the actual end of the type.
We can set the extent to be anything that's convenient to us. We could just make the extent 1 integer, and then set the displacements in units of integers. In this case, though, I like to set the extent to be 3 integers, the size of a sub-column; that way, block "1" starts immediately after block "0", and block "3" starts immediately after block "2". Unfortunately, it doesn't quite work as nicely when jumping from block "1" to block "2", but that can't be helped.
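The arithmetic behind this choice can be checked without MPI. The sketch below computes, for the 6x6 array split into a 2x2 grid of 3x3 blocks, where each block starts, first in units of the 3-integer extent and then in integers. The program name is an assumption; the variable names follow the full example further down.

```fortran
program block_displacements
  implicit none
  integer, parameter :: gridsize = 6, procgridsize = 2
  integer, parameter :: localsize = gridsize / procgridsize
  integer :: displs(procgridsize**2)
  integer :: row, col, p

  do col = 1, procgridsize
    do row = 1, procgridsize
      p = 1 + (row - 1) + procgridsize*(col - 1)   ! block number + 1
      ! Going down one block row costs 1 extent; going right one block
      ! column skips localsize full columns of procgridsize extents each
      displs(p) = (row - 1) + localsize*procgridsize*(col - 1)
    end do
  end do

  print '(a,*(i0,1x))', 'displacements in extents : ', displs
  print '(a,*(i0,1x))', 'displacements in integers: ', displs*localsize
end program block_displacements
```

This gives 0 1 6 7 in extents and 0 3 18 21 in integers, matching the block offsets quoted above; the uneven step from 1 to 6 is the reason a plain scatter is not enough here.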
So to scatter the subblocks in this case, we'd do the following:
integer(kind=MPI_ADDRESS_KIND) :: extent, begin
integer :: intsize, resizedtype
starts = [0,0]
sizes = [6, 6]
subsizes = [3, 3]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_INTEGER, &
newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
begin = 0 ! the lower bound must be of kind MPI_ADDRESS_KIND, so pass a variable rather than a literal 0
extent = 3*intsize
call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
Here we've created the same block type as before, but we've resized it; we haven't changed where the type "starts" (the 0) but we've changed where it "ends" (3 integers). We didn't mention this before, but the MPI_Type_commit is required to be able to use the type; but you only need to commit the final type you actually use, not any intermediate steps. You use MPI_Type_free to free the committed type when you're done.
So now, finally, we can scatterv the blocks: the data manipulations above are a little complicated, but once it's done, the scatterv looks just like before:
counts = 1 ! we will send one of these new types to everyone
displs = [0,1,6,7] ! the starting point of everyone's data
! in the global array, in block extents
call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i)
resizedtype, &
local, 3*3, MPI_INTEGER, & ! I'm receiving 3*3 int
root, MPI_COMM_WORLD, ierr) !... from (root, MPI_COMM_WORLD)
And now we're done, after a little tour of scatter, gather, and MPI derived types.
An example code which shows both the gather and the scatter operation, with character arrays, follows. Running the program:
$ mpirun -np 4 ./scatter2d
global array is:
000222
000222
000222
111333
111333
111333
Rank 0 received:
000
000
000
Rank 1 received:
111
111
111
Rank 2 received:
222
222
222
Rank 3 received:
333
333
333
Rank 0 sending:
111
111
111
Rank 1 sending:
222
222
222
Rank 2 sending:
333
333
333
Rank 3 sending:
444
444
444
Root received:
111333
111333
111333
222444
222444
222444
and the code follows:
program scatter
use mpi
implicit none
integer, parameter :: gridsize = 6 ! size of array
integer, parameter :: procgridsize = 2 ! size of process grid
character, allocatable, dimension (:,:) :: global, local
integer, dimension(procgridsize**2) :: counts, displs
integer, parameter :: root = 0
integer :: rank, comsize
integer :: localsize
integer :: i, j, row, col, ierr, p, charsize
integer, dimension(2) :: sizes, subsizes, starts
integer :: newtype, resizedtype
integer, parameter :: tag = 1
integer, dimension(MPI_STATUS_SIZE) :: rstatus
integer(kind=MPI_ADDRESS_KIND) :: extent, begin
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
if (comsize /= procgridsize**2) then
if (rank == root) then
print *, 'Only works with np = ', procgridsize**2, ' for now.'
endif
call MPI_Finalize(ierr)
stop
endif
localsize = gridsize/procgridsize
allocate( local(localsize, localsize) )
if (rank == root) then
allocate( global(gridsize, gridsize) )
forall( col=1:procgridsize, row=1:procgridsize )
global((row-1)*localsize+1:row*localsize, &
(col-1)*localsize+1:col*localsize) = &
achar(ichar('0')+(row-1)+(col-1)*procgridsize)
end forall
print *, 'global array is: '
do i=1,gridsize
print *, global(i,:)
enddo
endif
starts = [0,0]
sizes = [gridsize, gridsize]
subsizes = [localsize, localsize]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_CHARACTER, &
newtype, ierr)
call MPI_Type_size(MPI_CHARACTER, charsize, ierr)
extent = localsize*charsize
begin = 0
call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
counts = 1 ! we will send one of these new types to everyone
forall( col=1:procgridsize, row=1:procgridsize )
displs(1+(row-1)+procgridsize*(col-1)) = (row-1) + localsize*procgridsize*(col-1)
endforall
call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i)
resizedtype, &
local, localsize**2, MPI_CHARACTER, & ! I'm receiving localsize**2 chars
root, MPI_COMM_WORLD, ierr) !... from (root, MPI_COMM_WORLD)
do p=1, comsize
if (rank == p-1) then
print *, 'Rank ', rank, ' received: '
do i=1, localsize
print *, local(i,:)
enddo
endif
call MPI_Barrier(MPI_COMM_WORLD, ierr)
enddo
local = achar( ichar(local) + 1 )
do p=1, comsize
if (rank == p-1) then
print *, 'Rank ', rank, ' sending: '
do i=1, localsize
print *, local(i,:)
enddo
endif
call MPI_Barrier(MPI_COMM_WORLD, ierr)
enddo
call MPI_Gatherv( local, localsize**2, MPI_CHARACTER, & ! I'm sending localsize**2 chars
global, counts, displs, resizedtype,&
root, MPI_COMM_WORLD, ierr)
if (rank == root) then
print *, ' Root received: '
do i=1,gridsize
print *, global(i,:)
enddo
endif
call MPI_Type_free(resizedtype, ierr) ! the committed type actually used in the collectives
call MPI_Type_free(newtype,ierr)
if (rank == root) deallocate(global)
deallocate(local)
call MPI_Finalize(ierr)
end program scatter
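The displacement formula in the program above can be checked without MPI. The following is only a minimal sketch, assuming the same 6x6 array and 2x2 process grid; the program name `block_displs` and the variable `offset` are made up for illustration. It recomputes `displs` and confirms that each value, multiplied by the resized extent (`localsize` elements), equals the zero-based column-major offset of the first element of the corresponding block.

```fortran
program block_displs
    implicit none
    integer, parameter :: gridsize = 6, procgridsize = 2
    integer, parameter :: localsize = gridsize/procgridsize
    integer :: displs(procgridsize**2)
    integer :: row, col, p, offset

    do col = 1, procgridsize
        do row = 1, procgridsize
            p = 1 + (row-1) + procgridsize*(col-1)
            ! same formula as in the scatter program, in units of the
            ! resized extent (localsize elements)
            displs(p) = (row-1) + localsize*procgridsize*(col-1)
            ! zero-based column-major offset of the block's first element
            offset = (row-1)*localsize + (col-1)*localsize*gridsize
            if (displs(p)*localsize /= offset) error stop 'displacement mismatch'
        end do
    end do

    print '(a,4(1x,i0))', 'displs =', displs   ! prints: displs = 0 1 6 7
end program block_displs
```

Moving 'down' one block adds 1 to the displacement, while moving 'across' one block column adds `gridsize`, which is why the displacements have to be listed explicitly in the 2d block case.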
So that's the general solution. For your particular case, where we are just appending by rows, we don't need a Gatherv; a plain gather is enough, because in this case every block is displaced by the same amount from the previous one. Before, in the 2d block case, we had one displacement going 'down', and then jumps in that displacement as you went 'across' to the next column of blocks. Here, the displacement is always one extent from the previous one, so we don't need to give displacements explicitly. So a final code looks like:
program testmpi
use mpi
implicit none
integer, dimension(:,:), allocatable :: send, recv
integer, parameter :: nsendrows = 2, nsendcols = 3
integer, parameter :: root = 0
integer :: ierror, my_rank, comsize, i, j, ierr
integer :: blocktype, resizedtype
integer, dimension(2) :: starts, sizes, subsizes
integer (kind=MPI_Address_kind) :: start, extent
integer :: intsize
call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierror)
allocate( send(nsendrows, nsendcols) )
send = my_rank
if (my_rank==root) then
! we're going to append the local arrays
! as groups of send rows
allocate( recv(nsendrows*comsize, nsendcols) )
endif
! describe what these subblocks look like inside the full concatenated array
sizes = [ nsendrows*comsize, nsendcols ]
subsizes = [ nsendrows, nsendcols ]
starts = [ 0, 0 ]
call MPI_Type_create_subarray( 2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_INTEGER, &
blocktype, ierr)
start = 0
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = intsize * nsendrows
call MPI_Type_create_resized(blocktype, start, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
call MPI_Gather( send, nsendrows*nsendcols, MPI_INTEGER, & ! everyone sends 2*3 ints
recv, 1, resizedtype, & ! root gets 1 resized type from everyone
root, MPI_COMM_WORLD, ierr)
if (my_rank==0) then
print*,'<><><><><>recv'
do i=1,nsendrows*comsize
print*,(recv(i,j),j=1,nsendcols)
enddo
endif
call MPI_Finalize(ierror)
end program testmpi
Running this with 3 processes gives:
$ mpirun -np 3 ./testmpi
<><><><><>recv
0 0 0
0 0 0
1 1 1
1 1 1
2 2 2
2 2 2
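To see why a plain gather is enough in the row-appending case, the placement can be imitated in serial Fortran. This is a rough sketch, not MPI code: it assumes 3 ranks, and the names `gather_layout`, `flat` and `base` are hypothetical. Contribution `r` is written starting `r` extents into the receive buffer (one extent being `nsendrows` elements) and then follows the subarray layout inside the full column-major array.

```fortran
program gather_layout
    implicit none
    integer, parameter :: nsendrows = 2, nsendcols = 3, comsize = 3
    integer :: recv(nsendrows*comsize, nsendcols)
    integer :: flat(nsendrows*comsize*nsendcols)
    integer :: r, i, j, k, base

    flat = -1
    do r = 0, comsize-1
        ! a gather places contribution r at r * extent;
        ! the resized extent is nsendrows elements
        base = r*nsendrows
        do j = 1, nsendcols
            do i = 1, nsendrows
                ! zero-based column-major offset inside the full array
                k = base + (i-1) + (j-1)*nsendrows*comsize
                flat(k+1) = r
            end do
        end do
    end do

    recv = reshape(flat, shape(recv))
    do i = 1, nsendrows*comsize
        print '(3(1x,i0))', recv(i,:)
    end do
end program gather_layout
```

The rows come out grouped by rank (two rows of 0, two of 1, two of 2), matching the output of the MPI run above, without any explicit displacement array.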
Here's another code block for any other struggling Fortran beginners out there like myself. It shows two different ways to achieve an MPI_Gather on an nx * ny array divided into M * N blocks, with one block on each process.
One way uses MPI derived datatypes; the other simply sends the raw data as a 1D buffer and sorts it afterward on the main node.
Both produce an ordered M * N 2D array. For the example of 3 x 2 sub-arrays in x * y, each having 4 x 5 elements, the output is:
n mpi ranks: 6
rank 0 has topology coords 0,0
rank 1 has topology coords 0,1
rank 2 has topology coords 1,0
rank 3 has topology coords 1,1
rank 4 has topology coords 2,0
rank 5 has topology coords 2,1
1 1 1 1 3 3 3 3 5 5 5 5
1 1 1 1 3 3 3 3 5 5 5 5
1 1 1 1 3 3 3 3 5 5 5 5
1 1 1 1 3 3 3 3 5 5 5 5
1 1 1 1 3 3 3 3 5 5 5 5
0 0 0 0 2 2 2 2 4 4 4 4
0 0 0 0 2 2 2 2 4 4 4 4
0 0 0 0 2 2 2 2 4 4 4 4
0 0 0 0 2 2 2 2 4 4 4 4
0 0 0 0 2 2 2 2 4 4 4 4
Note that if the raw data is sent and not re-arranged it would be ordered by default like so:
5 5 5 5 5 5 5 5 5 5 5 5
4 4 4 4 5 5 5 5 5 5 5 5
4 4 4 4 4 4 4 4 4 4 4 4
3 3 3 3 3 3 3 3 4 4 4 4
3 3 3 3 3 3 3 3 3 3 3 3
2 2 2 2 2 2 2 2 2 2 2 2
1 1 1 1 2 2 2 2 2 2 2 2
1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0
The following code can be compiled with
mpif90 -Wall test.F90 -o test.out
and run with
mpirun -n 6 test.out
program main
use mpi
use, intrinsic :: iso_fortran_env
use iso_c_binding
implicit none
! ===
integer(int64) i, j
integer(int32) nx , ny
integer(int32) nxr , nyr
integer(int32) npx , npy
integer(int32) ri , rj
integer, dimension(2) :: mpi_coords, mpi_coords2
integer rank, n_ranks, ierror
integer rank_cart
integer comm, comm2d
real(real64), dimension(:,:), allocatable :: A, A_global
real(real64), dimension(:), allocatable :: B_global
integer(int32), dimension(:), allocatable :: lengths, displacements
integer(int32) subarraytype, resizedtype
integer(int32) dblsize
integer(kind=MPI_ADDRESS_KIND) :: start, extent
! === MPI interface initialization
call MPI_INIT(ierror)
comm = MPI_COMM_WORLD
call MPI_COMM_SIZE(comm, n_ranks, ierror)
call MPI_COMM_RANK(comm, rank, ierror)
if (rank .eq. 0) then
print '(a, i0)', 'n mpi ranks: ', n_ranks
end if
npx = 3 !! n processes in x
nxr = 4 !! n pts per rank in x
nx = npx*nxr !! n pts x total
npy = 2 !! n processes in y
nyr = 5 !! n pts per rank in y
ny = npy*nyr !! n pts y total
! === check that n_ranks are equal to the hardcoded number of processes in [x,y]
if (n_ranks .ne. npx*npy) then
if (rank .eq. 0) then
print '(a)','n_ranks != npx*npy'
end if
call MPI_Abort(MPI_COMM_WORLD, -1, ierror)
call MPI_Finalize(ierror)
end if
call MPI_BARRIER(comm, ierror)
! === create 2D Cartesian grid
! reorder = .false. keeps ranks in comm2d identical to ranks in comm (the code below uses both)
call MPI_Cart_create(comm, 2, (/npx,npy/), (/.true.,.false./), .false., comm2d, ierror)
! === get rank in 2D communicator
call MPI_Comm_rank(comm2d, rank_cart, ierror)
! === get this rank ID and coordinates within the 2D topology
call MPI_Cart_coords(comm2d, rank_cart, 2, mpi_coords, ierror)
! === print topology
if (.true.) then
print '(a,i0,a,i0,a,i0)', 'rank ', rank_cart, ' has topology coords ', mpi_coords(1), ',', mpi_coords(2)
end if
call MPI_BARRIER(comm, ierror)
flush(output_unit) ! standard FLUSH statement; output_unit comes from iso_fortran_env
! === populate data
allocate( A(nxr,nyr) )
A(:,:) = real(rank,real64)
! ===
if (.true.) then !! use MPI derived types
allocate (lengths(n_ranks))
allocate (displacements(n_ranks))
call MPI_Type_create_subarray(2, (/nx,ny/), (/nxr,nyr/), (/0,0/), &
MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, subarraytype, ierror)
call MPI_Type_size(MPI_DOUBLE_PRECISION, dblsize, ierror)
start = 0
extent = nxr*dblsize
call MPI_Type_create_resized(subarraytype, start, extent, resizedtype, ierror)
call MPI_Type_commit(resizedtype, ierror)
lengths = 1
!! displacements = [ npx*0*nyr+0, npx*1*nyr+0, &
!! npx*0*nyr+1, npx*1*nyr+1, &
!! npx*0*nyr+2, npx*1*nyr+2 ] !! for 3x2
!! displacements = [ npx*0*nyr+0, npx*1*nyr+0, npx*2*nyr+0, &
!! npx*0*nyr+1, npx*1*nyr+1, npx*2*nyr+1 ] !! for 2x3
do i=0,n_ranks-1
call MPI_Cart_coords(comm2d, int(i,int32), 2, mpi_coords2, ierror)
ri = mpi_coords2(1)
rj = mpi_coords2(2)
displacements(i+1) = npx*rj*nyr + ri
end do
if (rank .eq. 0) then
allocate( A_global(nx,ny) )
end if
call MPI_Gatherv( A , nxr*nyr, MPI_DOUBLE_PRECISION, &
A_global, lengths, displacements, resizedtype, &
0, comm2d, ierror )
else !! send raw data in 1D, then re-arrange
if (rank .eq. 0) then
allocate( A_global(nx,ny) )
allocate( B_global(nx*ny) )
end if
call MPI_Gather( A , nxr*nyr, MPI_DOUBLE_PRECISION, &
B_global, nxr*nyr, MPI_DOUBLE_PRECISION, &
0, comm2d, ierror )
if (rank .eq. 0) then
if (.true.) then !! re-arrange data
do i=0,n_ranks-1
call MPI_Cart_coords(comm2d, int(i,int32), 2, mpi_coords2, ierror)
ri = mpi_coords2(1)
rj = mpi_coords2(2)
A_global( (ri+0)*nxr+1:(ri+1)*nxr , &
(rj+0)*nyr+1:(rj+1)*nyr ) &
= &
reshape( B_global( (i+0)*nxr*nyr+1 : (i+1)*nxr*nyr ) , (/nxr,nyr/) )
end do
else !! dont do re-arrange
A_global(:,:) = reshape( B_global , (/nx,ny/) )
end if
end if
if (rank .eq. 0) then
deallocate( B_global )
end if
end if
! ===
!! print the array to the terminal
if (rank .eq. 0) then
do j=ny, 1, -1
do i=1, nx
write(*,'(i0,a)',advance='no') int(A_global(i,j)) , " "
end do
write (*,*) ''
end do
end if
!! save data to binary file
if (rank==0) then
open(3, file=trim("A.dat"), access="stream")
write(3) reshape( A_global , (/nx,ny/) )
close(3)
end if
if (rank .eq. 0) then
deallocate( A_global )
end if
! ===
call MPI_BARRIER(comm, ierror)
call MPI_FINALIZE(ierror)
end program main
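The re-arrangement step of the second method can be tried on its own, without MPI. The sketch below rests on some assumptions: the gathered buffer is filled by hand so that rank `r` contributes `nxr*nyr` copies of `r`, and the rank-to-coordinate mapping is taken to be row-major (`ri = r/npy`, `rj = mod(r,npy)`), which matches the topology printed above but is hard-coded here instead of being queried with MPI_Cart_coords. The program name `rearrange_blocks` is made up.

```fortran
program rearrange_blocks
    use, intrinsic :: iso_fortran_env, only: real64
    implicit none
    integer, parameter :: npx = 3, npy = 2, nxr = 4, nyr = 5
    integer, parameter :: nx = npx*nxr, ny = npy*nyr
    real(real64) :: A_global(nx,ny), B_global(nx*ny)
    integer :: r, ri, rj, j

    ! stand-in for the gathered 1D buffer: rank r contributes nxr*nyr values of r
    do r = 0, npx*npy-1
        B_global(r*nxr*nyr+1 : (r+1)*nxr*nyr) = real(r, real64)
    end do

    do r = 0, npx*npy-1
        ! assumed row-major rank -> coords mapping (see the printed topology)
        ri = r / npy
        rj = mod(r, npy)
        ! each section holds exactly nxr x nyr elements
        A_global(ri*nxr+1:(ri+1)*nxr, rj*nyr+1:(rj+1)*nyr) = &
            reshape(B_global(r*nxr*nyr+1 : (r+1)*nxr*nyr), [nxr, nyr])
    end do

    ! print with y increasing upward, as in the MPI program
    do j = ny, 1, -1
        print '(12(i0,1x))', int(A_global(:,j))
    end do
end program rearrange_blocks
```

The upper bounds of both the array sections and the buffer slice are `(ri+1)*nxr`, `(rj+1)*nyr` and `(r+1)*nxr*nyr`, so every slice has exactly the size of one block and the last rank stays inside the buffer.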
A final note: pulling all data to one process (for I/O purposes or otherwise) is typically very bad practice for parallel programming. The moment this is done, the code will lose potential scalability. The method shown here is really only fit for smaller codes or for debugging. For I/O purposes, one should seriously look into collective MPI-I/O or libraries such as HDF5 which allow for collective I/O.