I am very new to MPI and Fortran alike. I have been trying to figure this out for a few hours now, with no luck. In my code below everything works just fine (besides the fact that my s variable is isolated between processes). When I try to implement MPI_SEND and MPI_RECV I get seg faults constantly, and I can't seem to figure out what the issue is.
SUBROUTINE do_mpi_simpsons(l, u, n)
INTEGER, INTENT (in) :: l, u, n
! REAL, INTENT (in) :: func
DOUBLE PRECISION :: result, walltime
INTEGER :: clock_start, clock_rate, clock_max, clock_end
DOUBLE PRECISION :: h, s, argVal, finalS
INTEGER :: rank, size, ierror, tag, status(MPI_STATUS_SIZE), count, start, stop
walltime = 0.0D0
h = (u - l) / dble(n)
s = func_hw(dble(l)) + func_hw(dble(u))
CALL system_clock(clock_start, clock_rate, clock_max)
CALL MPI_INIT(ierror)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
count = n / size
start = rank * count
stop = start + count -1
! WRITE(*,*) "Start: ", start
! WRITE(*,*) "Stop: ", stop
WRITE(*,*) rank
DO i = start, stop, 2
s = s + 4 * func_hw(dble(l)+dble(i)*h)
END DO
DO i = start+1, stop-1, 2
s = s + 2 * func_hw(dble(l)+dble(i)*h)
END DO
! This block is causing the seg faults
IF(rank.eq.0) THEN
finalS = s
DO i = 1, size - 1
CALL MPI_RECV(s, 64, MPI_DOUBLE, i, 1, MPI_COMM_WORLD, status, ierror)
finalS = finalS + s
END DO
ELSE
CALL MPI_SEND(s, 64, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, ierror)
END IF
CALL MPI_FINALIZE(ierror)
CALL system_clock(clock_end, clock_rate, clock_max)
walltime = walltime + real(clock_end - clock_start) / real(clock_rate)
result = s * h / 3
WRITE(*,*) "walltime = ", walltime, " seconds"
WRITE(*,*) "result = ", result
END SUBROUTINE
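For reference, a likely cause of the seg faults and a minimal sketch of how that block could be written so it does not overrun the buffers (this is only a sketch against the code above; func_hw, the loop bounds and the timing are unchanged): the count argument of MPI_SEND/MPI_RECV is the number of elements in the buffer, so passing 64 with the scalar s tells MPI to read or write 64 doubles starting at a variable that holds only one, and the Fortran datatype matching DOUBLE PRECISION is MPI_DOUBLE_PRECISION. Alternatively the whole exchange collapses into one MPI_REDUCE.
IF (rank .EQ. 0) THEN
   finalS = s
   DO i = 1, size - 1
      ! receive ONE double precision value into the scalar s
      CALL MPI_RECV(s, 1, MPI_DOUBLE_PRECISION, i, 1, MPI_COMM_WORLD, status, ierror)
      finalS = finalS + s
   END DO
ELSE
   CALL MPI_SEND(s, 1, MPI_DOUBLE_PRECISION, 0, 1, MPI_COMM_WORLD, ierror)
END IF
! or, equivalently, a single collective reduction:
! CALL MPI_REDUCE(s, finalS, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierror)
With either variant only rank 0 holds the combined sum, so result = finalS * h / 3 should be computed and printed on rank 0 only. (Note also that s is initialised with the two endpoint terms on every rank, so those get counted size times in finalS; that is a separate issue from the seg fault.)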
I would like to perform many independent operations (e.g. time integration of an ODE with different initial conditions) using MPI and Fortran. The initial conditions are, for example, a $2 \times 1000$ vector IC.
do i=1,1000
(x0,y0) = (x(i),y(i))
Solve an ODE with (x0,y0) for a time duration
Save the result at the end of this duration
enddo
Can anyone help with a minimal code using MPI, or a link to something similar?
I have already used OpenMP, but I think with MPI I would have access to more CPUs.
If the operations are truly independent (and the number of cases is a multiple of the number of processors) then:
call mpi_scatter to distribute the start points from root
call your work routine (the ODE solver) on every process
call mpi_gather to collect the results back on root
root can then write to file.
If the number of processors doesn't divide into the number of cases then you can use mpi_scatterv and mpi_gatherv instead (a sketch of that variant follows the example below).
Example (rather trivial work per job, rather than solving ODEs):
program main
use iso_fortran_env
use mpi
implicit none
integer stat(mpi_status_size), tag, ierr
integer size, rank
integer, parameter :: N = 256 * 1000 ! assumes this is a multiple of the number of processors
integer, parameter :: root = 0
integer myN
integer i
real(real64), allocatable :: Y(:), myY(:)
real(real64) start, finish
call mpi_init( ierr )
call mpi_comm_size( mpi_comm_world, size, ierr )
call mpi_comm_rank( mpi_comm_world, rank, ierr )
! Set initial values for full array, then start timing
if ( rank == root ) then
allocate( Y(N) )
Y = [ ( i + 0.0_real64, i = 1, N ) ]
start = gettime()
end if
! Root parcels out the work (i.e., distributes the starting points)
! Processor with rank r will look at indices 1+r*N/size to (r+1)*N/size
myN = N / size
allocate( myY(myN) )
call mpi_scatter( Y, myN, mpi_double_precision, &
myY, myN, mpi_double_precision, root, mpi_comm_world, ierr )
! Each processor does its own work
call myWork( myN, myY )
! Root gets its results back
call mpi_gather ( myY, myN, mpi_double_precision, &
Y, myN, mpi_double_precision, root, mpi_comm_world, ierr )
! Root concludes timing, then writes to file
if ( rank == root ) then
finish = gettime()
write( *, * ) "Time taken = ", finish - start
open( 10, file="output.txt" )
write( 10, "( i8, 1x, es11.4 )" ) ( i, Y(i), i = 1, N )
close( 10 )
deallocate( Y )
end if
deallocate( myY )
call mpi_finalize( ierr )
contains
subroutine myWork( N, Y )
integer , intent(in ) :: N
real(real64), intent(inout) :: Y(N)
integer i
do i = 1, 10000
Y = 2 * Y - Y ! silly example, just to use some flops
end do
end subroutine myWork
real(real64) function getTime()
integer t(8)
call date_and_time( values=t )
getTime = 3600 * t(5) + 60 * t(6) + t(7) + 0.001 * t(8)
end function getTime
end program main
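For the case mentioned above where the number of cases is not a multiple of the number of processors, here is a rough sketch of the mpi_scatterv/mpi_gatherv bookkeeping. It replaces the myN = N/size, allocate, scatter and gather lines of the example; counts, displs and r are my own names, everything else is as in the example:
integer, allocatable :: counts(:), displs(:)
integer r

allocate( counts(0:size-1), displs(0:size-1) )
displs(0) = 0
do r = 0, size - 1
   counts(r) = N / size
   if ( r < mod( N, size ) ) counts(r) = counts(r) + 1   ! first few ranks take one extra case
   if ( r > 0 ) displs(r) = displs(r-1) + counts(r-1)
end do

myN = counts(rank)
allocate( myY(myN) )
call mpi_scatterv( Y, counts, displs, mpi_double_precision, &
                   myY, myN, mpi_double_precision, root, mpi_comm_world, ierr )
call myWork( myN, myY )
call mpi_gatherv ( myY, myN, mpi_double_precision, &
                   Y, counts, displs, mpi_double_precision, root, mpi_comm_world, ierr )
Strictly only root needs counts and displs, but computing them on every rank is harmless and keeps the code symmetric.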
I am making a module in Fortran 90 to run PARPACK on a given matrix. I have an existing ARPACK code which functions normally, as expected. I tried converting it to PARPACK, and it runs into memory errors during cleanup. I am fairly new to coding and Fortran, so please excuse any blunders I've made.
The code:
!ARPACK module
module parpack
implicit none
contains
subroutine parp
! use mpi
include '/usr/lib/x86_64-linux-gnu/openmpi/include/mpif.h'
integer comm, myid, nprocs, rc, nloc, status(MPI_STATUS_SIZE)
integer, parameter :: pres=8
integer nev, ncv, maxn, maxnev, maxncv
parameter (maxn=10**7, maxnev=maxn-1, maxncv=maxn)
! Arrays for SNAUPD
integer iparam(11), ipntr(14)
logical, allocatable :: select(:)
real(kind=pres), allocatable :: workd(:), workl(:), worktmp1(:), worktmp2(:)
! Scalars for SNAUPD
character bmat*1, which*2
integer ido, n, info, ierr, ldv
integer i, j, ishfts, maxitr, mode1, nconv
integer(kind=pres) lworkl
real(kind=pres) tol
! Arrays for SNEUPD
real(kind=pres), allocatable :: d(:,:), resid(:), v(:,:), workev(:), z(:,:)
! Scalars for SNEUPD
logical rvec, first
real sigmar, sigmai
!==============================================
real(kind=pres), allocatable :: mat(:,:)
open (11, file = 'matrix.dat', status = 'old')
read (11,*) n
!=============================================
! Dimension of the problem
nev = n/10
ncv = nev+2
ldv = n
bmat = 'I'
which = 'LM'
! Additional environment variables
ido = 0
tol = 0.0E+0
info = 0
lworkl = 3*ncv**2+6*ncv
! Algorithm Mode specifications:
ishfts = 1
maxitr = 300
mode1 = 1
iparam(1) = ishfts
iparam(3) = maxitr
iparam(7) = mode1
! Distribution to nodes
!=============================================
! Matrix allocation
allocate (mat(n,n))
! PDNAUPD
allocate (workd(5*n))
allocate (workl(lworkl))
allocate (resid(n))
allocate (worktmp1(n))
allocate (worktmp2(n))
! PDNEUPD
allocate (d(n,3))
allocate (v(ldv,ncv))
allocate (workev(3*n))
allocate (z(ldv,ncv))
allocate (select(ncv))
!===========================================
! Read Matrix from the provided file
mat = 0
read(11,*) mat
mat = transpose(mat)
!===========================================
! MPI Calling
call MPI_INIT(ierr)
comm = MPI_COMM_WORLD
call MPI_COMM_RANK(comm, myid, ierr)
call MPI_COMM_SIZE(comm, nprocs, ierr)
nloc = n/nprocs
! if ( mod(n, nprocs) .gt. myid ) nloc = nloc + n
!===============================================
20 continue
call pdnaupd(comm, ido, bmat, nloc, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, info) !Top level solver
call MPI_BARRIER(comm,ierr)
print *, ido, info, iparam(5) !for testing
!===============================================
if (ido .eq. -1 .or. ido .eq. 1) then
worktmp1 = 0
if (myid .ne. 0) then !It is slave
call MPI_SEND(workd(ipntr(1)), nloc, MPI_DOUBLE_PRECISION, 0, 0, comm, ierr)
else !It is host
worktmp1(1:nloc) = workd(ipntr(1):ipntr(1)+nloc-1)
i = nprocs
if (i .gt. 1) then
do i=1,nprocs-1
call MPI_RECV(worktmp1(i*nloc+1), nloc, MPI_DOUBLE_PRECISION, i, 0, comm, status, ierr)
end do
endif
endif
call MPI_BARRIER(comm,ierr)
if (myid .eq. 0) then !It is host
! Matrix multiplication
worktmp2 = 0
call matmultiply(n, mat, worktmp1, worktmp2)
workd(ipntr(2):ipntr(2)+nloc-1) = worktmp2(1:nloc)
i = nprocs
if (i .gt. 1) then
do i=1,nprocs-1
call MPI_SEND(worktmp2(i*nloc+1), nloc, MPI_DOUBLE_PRECISION, i, 100*i, comm, ierr)
end do
endif
else !It is slave
call MPI_RECV(workd(ipntr(2)), nloc, MPI_DOUBLE_PRECISION, 0, 100*myid, comm, status, ierr)
endif
go to 20
! call matmultiply(n, mat, workd(ipntr(1):ipntr(1)+n-1), workd(ipntr(2):ipntr(2)+n-1))
! go to 20
endif
! print *, info !for testing
!===============================================================
! Post-processing for eigenvalues
rvec = .true.
if (myid .eq. 0) then
call pdneupd ( comm, rvec, 'A', select, d, d(1,2), z, ldv, sigmar, sigmai, &
workev, bmat, n, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, &
workd, workl, lworkl, info)
endif
! print *, info !for testing
close(11)
call MPI_FINALIZE(ierr)
return
end subroutine
!==============================================================================================
! Additional Function definitions
subroutine matmultiply(n, mat, v, w)
integer n, i, j
integer, parameter :: pres=8
real(kind = pres) mat(n,n), temp(n)
real(kind = pres) v(n), w(n)
temp = 0
do j = 1,n
do i = 1,n
temp(j) = temp(j) + mat(i,j)*v(i)
end do
end do
w = temp
return
end subroutine
end module
I apologize for the ton of redundant lines and comments; I have yet to clean it up for finalization.
When I run the code on a single thread with ./a.out, I get the following output:
Invalid MIT-MAGIC-COOKIE-1 key 1 0 1629760560
1 0 1629760560
1 0 1629760560
1 0 1629760560
.
.
. <A long chain as the code exhausts all iterations. The first of the numbers is ido, which
.  starts with 1 instead of -1 for some reason; the second is info; and the third is iparam(5),
.  which is a random number until the final iteration.>
.
99 1 1
munmap_chunk(): invalid pointer
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7f5a863d0d01 in ???
#1 0x7f5a863cfed5 in ???
#2 0x7f5a8620420f in ???
#3 0x7f5a8620418b in ???
#4 0x7f5a861e3858 in ???
#5 0x7f5a8624e3ed in ???
#6 0x7f5a8625647b in ???
#7 0x7f5a862566cb in ???
#8 0x560f05ac1819 in ???
#9 0x560f05abd7bc in checker
at /home/srivatsank/Desktop/fortran/lap_vs_arp/ptest/ptest.f90:45
#10 0x560f05abd8d9 in main
at /home/srivatsank/Desktop/fortran/lap_vs_arp/ptest/ptest.f90:3
Aborted (core dumped)
Line 45 in ptest.f90 is the call parp statement; line 3 is use parpack (parpack being the name of the module).
The main code is as follows:
program checker
use parpack
use arpack
! use lapack
implicit none
!Program to test LAPACK and ARPACK
! 1. Variable definition
integer a,n,i
real, allocatable :: mat(:,:)
real t0, t1
a=2
! Loop
! do 20 a = 1,3
! Open File
open(unit=10, file = 'matrix.dat', status = 'replace')
! 2. Generate Symmetric matrices
n = 10**a
allocate (mat(n,n))
call RANDOM_NUMBER(mat)
! 3. Save symmetric matrices to r.dat
write (10,*) n
do 30 i=1,n
write(10,*) mat(i,:)
30 end do
deallocate(mat)
close(10)
! 4. Test time taken by each of the routines
! call cpu_time(t0)
! call arp
! call cpu_time(t1)
! print *, 'n:', n, 'ARPACK time taken:', t1-t0
call cpu_time(t0)
call parp
call cpu_time(t1)
print *, 'n:', n, 'PARPACK time taken:', t1-t0
!20 end do
end program checker
The memory error occurs at the very end of the subroutine, when the main program tries to return from the subroutine. I have verified this by putting print statements as the last lines in the subroutine.
And on running mpirun -np 4 a.out, the code just enters the pdneupd process and sits there for eternity. Could anyone help?
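Not a full answer, but one thing worth checking, based on how the parallel example drivers (pdndrv1 and friends) are usually structured rather than on anything confirmed above: pdnaupd and pdneupd both take comm as their first argument and are meant to be called by every rank, so wrapping pdneupd in if (myid .eq. 0) leaves the other ranks with nothing to do, which would be consistent with the hang under mpirun -np 4. A sketch of the post-processing step with all ranks participating; I have also passed nloc rather than n as the dimension argument, on the assumption that it should match what was given to pdnaupd:
! Post-processing: every rank calls pdneupd with the same local dimension it
! used in pdnaupd; only rank 0 reports afterwards.
rvec = .true.
call pdneupd ( comm, rvec, 'A', select, d, d(1,2), z, ldv, sigmar, sigmai, &
               workev, bmat, nloc, which, nev, tol, resid, ncv, v, ldv,    &
               iparam, ipntr, workd, workl, lworkl, info )
if (myid .eq. 0) then
   print *, 'pdneupd info = ', info, ', converged eigenvalues = ', iparam(5)
endif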
I have the following code:
real :: s_s, d_s, s_r(size), d_r(size)
integer :: k, k_r(size)
! - size = number of processors
! - Do something to initialise s_s, d_s, k
write(*,*) "SENDING >>>>"
write(*,*) s_s, d_s
call MPI_Allgather( s_s, 1, MPI_REAL,
& s_r, 1, MPI_REAL, MPI_COMM_PGM, mpi_err)
call MPI_Allgather( d_s, 1, MPI_REAL,
& d_r, 1, MPI_REAL, MPI_COMM_PGM, mpi_err)
call MPI_Allgather ( k, 1, MPI_INTEGER,
& k_r, 1, MPI_INTEGER, MPI_COMM_PGM, mpi_err)
write(*,*) "RECEIVED <<<<"
write(*,*) s_r, d_r, kr
This generates the following output:
SENDING >>>>
-1803.80339864908 0.616157856320407
RECEIVED <<<<
6.953077622513053E-310 3.565412685916647E-314 1.221334434576037E-314
1.498827614035474E-314 6.952991536467244E-310 6.953288052096687E-310
6.953108563966064E-310 2.350861403096908E-314 4 1
2 3
k_r is being gathered correctly; however, s_r and d_r seem to be receiving junk. Could this be because of the MPI datatypes? I tried MPI_REAL, MPI_REAL8, and MPI_DOUBLE, but that didn't work. Furthermore, mpi_err = MPI_SUCCESS.
What could I do to resolve this?
EDIT 1
I worked on the following prototype program:
program allgather
implicit none
include "mpif.h"
real a(4)
integer rank,size,ierr
real as(4)
real ar(16)
integer i, j, k,z
a=1
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
if(size.ne.4)then
write(*,*)'Error!:# of processors must be equal to 4'
write(*,*)'Programm aborting....'
call MPI_ABORT(ierr)
endif
do k=1,4
if ( rank == (mod(k, size))) then
a(k) = k
else
a(k) = 0.0
endif
enddo
write(*,*) "Rank :", rank
write(*,*) a
call MPI_Allgather(a, 4, MPI_REAL, ar,
& 4,
& MPI_REAL, MPI_COMM_WORLD, ierr)
write(*,*) "Recieved array"
write(*,*) ar
do i = 1, 16
if ( ar(i) /= 0.0 ) then
z = mod(i, size)
if ( z == 0 ) then
a( size ) = ar(i)
else
a ( z ) = ar(i)
endif
endif
enddo
write(*,*) "---------"
write(*,*) a
write(*,*) "---------"
call MPI_FINALIZE(ierr)
end
And this generates the expected results, i.e. ar doesn't gather junk. However, I'm unable to tell the difference between the two implementations.
It turns out that for the project, the data type to be used was MPI_FLT. It is strange that MPI_FLT works while MPI_REALx (x = 4, 8) and MPI_FLOAT do not. I grep-ed for MPI_FLT in the project to see what it is defined as, but it didn't turn up anywhere.
The OpenMPI version I'm using is:
$ mpirun --version
mpirun (Open MPI) 3.0.0
The compiler I use is:
$ mpifort --version
ifort (IFORT) 19.0.1.144 20181018
In a future edit I will elaborate on the cause.
I have a 2D array of integers and I want to send its rows to separate processes. I assume that the number of rows (M = 5) is not evenly divisible by the number of processes (size = 4), so in my case process 0 will obtain an additional row. The size of the 2D array A is MxN (5x10).
Here is my code
PROGRAM SCATTERV_MATRIX
INCLUDE 'mpif.h'
integer :: rank, size, ierr, dest, src, tag !MPI variables
integer :: status(MPI_STATUS_SIZE) !MPI variables
INTEGER, PARAMETER :: N = 10 !number of columns
INTEGER, PARAMETER :: M = 5 !number of rows
INTEGER, ALLOCATABLE, DIMENSION(:,:) :: A !MxN matrix A
INTEGER :: NEWTYPE, RESIZEDTYPE !MPI derived data types
INTEGER, ALLOCATABLE, DIMENSION(:,:) :: LOCAL
INTEGER, ALLOCATABLE :: SENDCOUNTS(:), DISPLS(:)
INTEGER :: RECVCOUNT, NRBUF
INTEGER :: MMIN, MEXTRA, INTSIZE, K, I, J
INTEGER :: START, EXTENT !(KIND=MPI_ADRESS_KIND)
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
IF ( rank == 0 ) THEN !allocate and create 2Darray
ALLOCATE( A (M, N) )
K = 1
DO I = 1, M
DO J = 1, N
A(I, J) = K
K = K + 1
END DO
END DO
END IF
ALLOCATE( SENDCOUNTS(0:size-1), DISPLS(0:size-1) )
MMIN = M/size !number of rows divided by number of processors
MEXTRA = MOD(M, size) !extra rows
K = 0
DO I = 0, size-1
IF (I < MEXTRA) THEN !SENDCOUNTS=(/2,1,1,1/)
SENDCOUNTS(I) = MMIN + 1
ELSE
SENDCOUNTS(I) = MMIN
END IF
DISPLS(I) = K !DISPLS=(/0,2,3,4/)
K = K + SENDCOUNTS(I)
END DO
RECVCOUNT = SENDCOUNTS(rank)
ALLOCATE( LOCAL(RECVCOUNT,N) )
CALL MPI_TYPE_VECTOR(N, 1, M, MPI_INTEGER, NEWTYPE, ierr)
CALL MPI_TYPE_COMMIT(NEWTYPE, ierr)
START = 0
CALL MPI_TYPE_SIZE(MPI_INTEGER, INTSIZE, ierr)
EXTENT = 1*INTSIZE
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
CALL MPI_TYPE_COMMIT(RESIZEDTYPE, ierr)
LOCAL(:, :) = 0
CALL MPI_SCATTERV( &
A, SENDCOUNTS, DISPLS, RESIZEDTYPE, &
LOCAL, RECVCOUNT*N, MPI_INTEGER, &
0, MPI_COMM_WORLD, ierr)
WRITE(*,*) rank, ':', LOCAL
CALL MPI_FINALIZE(ierr)
END PROGRAM SCATTERV_MATRIX
After successful compilation I got a "Program Exception - access violation" error. All my previous Fortran MPI programs worked fine. There must be some bug in the code, probably in MPI_SCATTERV.
I was mainly following this answer. I will be grateful for any suggestion. Thank you.
There's an error in your code:
INTEGER :: START, EXTENT !(KIND=MPI_ADRESS_KIND)
This line should be:
INTEGER(KIND=MPI_ADDRESS_KIND) :: START, EXTENT
In MPI, anything that is related to a memory address, or to similar concepts such as a memory displacement, a file size, or a file cursor, must not be a normal integer. You actually have this information in your comment, but MPI_ADDRESS_KIND is also misspelled there.
Vladimir F correctly pointed out that you should USE mpi instead of INCLUDE 'mpif.h'. This gives the compiler the opportunity to check the data types. For example, gfortran gives the following error message:
test.f90:59:71:
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
1
Error: There is no specific subroutine for the generic
‘mpi_type_create_resized’ at (1)
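For illustration, a minimal compilable sketch of just the type construction with the corrected declaration and USE mpi (the rest of SCATTERV_MATRIX can stay as posted; N and M are the same constants as in the question):
PROGRAM RESIZED_TYPE_DEMO
USE mpi                                          ! gives interface checking, unlike mpif.h
IMPLICIT NONE
INTEGER, PARAMETER :: N = 10, M = 5              ! columns, rows as in the question
INTEGER :: NEWTYPE, RESIZEDTYPE, INTSIZE, ierr
INTEGER(KIND=MPI_ADDRESS_KIND) :: START, EXTENT  ! address-sized integers
CALL MPI_INIT(ierr)
CALL MPI_TYPE_VECTOR(N, 1, M, MPI_INTEGER, NEWTYPE, ierr)   ! one row of the MxN matrix
CALL MPI_TYPE_COMMIT(NEWTYPE, ierr)
CALL MPI_TYPE_SIZE(MPI_INTEGER, INTSIZE, ierr)
START = 0
EXTENT = 1*INTSIZE                               ! shrink the extent to one integer
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
CALL MPI_TYPE_COMMIT(RESIZEDTYPE, ierr)
CALL MPI_TYPE_FREE(RESIZEDTYPE, ierr)
CALL MPI_TYPE_FREE(NEWTYPE, ierr)
CALL MPI_FINALIZE(ierr)
END PROGRAM RESIZED_TYPE_DEMO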
I am writing a simple MPI-based program in Fortran. I have been successful in estimating the partial sums, but I am facing a problem in calculating the global sum using MPI_ALLREDUCE.
Main code:
program tst_trap
use iso_fortran_env
use some_functions
implicit none
include 'mpif.h'
integer :: count1, count2, count_rate, i, npts,n,n1
real(kind=8) :: answer, dx, sum, x, xmax, xmin,ssum,ierror
integer(4) :: ista, iend,ierr, iproc, nproc
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, iproc, ierr)
npts = 1000000001
call para_range(1,npts,nproc, iproc, ista, iend)
write(output_unit,'(a)') 'Fortran version (MPI)'
dx = (xmax-xmin)/real(npts-1)
sum=0.0
do i = ista, iend
x = xmin + real(i-1)*dx
sum = sum + g(x)
end do
sum = (sum + 0.5*(g(xmin) + g(xmax)) )*dx
write(6,*) 'Procesor ',iproc,':',' partial sum=', sum
call MPI_allREDUCE(sum,ssum,1,MPI_REAL,MPI_SUM,MPI_COMM_WORLD,ierr)
sum = ssum
if ( iproc .eq. 0) write(6,*) 'global sum =', sum
call MPI_FINALIZE(ierr)
end program tst_trap
Output :
Procesor 0 : partial sum= 7.490350421761612E-002
Procesor 2 : partial sum= 3.94636946947332
Procesor 3 : partial sum= 19.0687865689115
Procesor 1 : partial sum= 0.696046284884674
global sum = 2.114738958711681E-314
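A likely culprit here is the datatype: sum and ssum are declared real(kind=8), but the reduction is posted with MPI_REAL, which describes a default 4-byte real, so the library reads and combines the wrong number of bytes and the result comes out as denormal garbage. A sketch of the call with the datatype matched to the buffers:
call MPI_ALLREDUCE(sum, ssum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
(MPI_REAL8 would also match real(kind=8) with most compilers.)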