Obtain global sum using the local sums (MPI/Fortran)

I am writing a simple MPI-based program in Fortran. I have been successful in estimating the partial sums, but I am facing a problem in the calculation of the global sum using MPI_Allreduce.
Main code:
program tst_trap
  use iso_fortran_env
  use some_functions
  implicit none
  include 'mpif.h'
  integer :: count1, count2, count_rate, i, npts, n, n1
  real(kind=8) :: answer, dx, sum, x, xmax, xmin, ssum, ierror
  integer(4) :: ista, iend, ierr, iproc, nproc

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, iproc, ierr)

  npts = 1000000001
  call para_range(1, npts, nproc, iproc, ista, iend)
  write(output_unit,'(a)') 'Fortran version (MPI)'
  dx = (xmax - xmin)/real(npts - 1)
  sum = 0.0
  do i = ista, iend
    x = xmin + real(i-1)*dx
    sum = sum + g(x)
  end do
  sum = (sum + 0.5*(g(xmin) + g(xmax)))*dx
  write(6,*) 'Processor ', iproc, ':', ' partial sum=', sum

  call MPI_Allreduce(sum, ssum, 1, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)
  sum = ssum
  if (iproc .eq. 0) write(6,*) 'global sum =', sum

  call MPI_FINALIZE(ierr)
end program tst_trap
Output:
Processor 0 : partial sum= 7.490350421761612E-002
Processor 2 : partial sum= 3.94636946947332
Processor 3 : partial sum= 19.0687865689115
Processor 1 : partial sum= 0.696046284884674
global sum = 2.114738958711681E-314
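A likely cause, noted editorially: sum and ssum are declared real(kind=8), but the reduction passes MPI_REAL, which describes a default 4-byte real, so MPI interprets the buffers with the wrong width and the result is garbage. A minimal sketch of the corrected call, the rest of the program unchanged:

call MPI_Allreduce(sum, ssum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)

MPI_REAL8 would also match real(kind=8) on most platforms.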

Related

MPI, Fortran, multitasking

I would like to perform many independent operations (e.g. time integration of an ODE with different initial conditions) using MPI and Fortran. The initial conditions are, for example, a $2 \times 1000$ array IC.
do i=1,1000
(x0,y0) = (x(i),y(i))
Solve an ODE with (x0,y0) for a time duration
Save the result at the end of this duration
enddo
Can anyone help with a minimal code using MPI, or a link to something similar?
I have already used OpenMP, but I think with MPI I would have access to more CPUs.
If the operations are truly independent (and the number of cases is a multiple of the number of processors) then:
call mpi_scatter to distribute the start points from root
let each processor work through its own set of cases
call mpi_gather to collect the results back on root
root can then write to file.
If the number of processors doesn't divide the number of cases, then you can use mpi_scatterv and mpi_gatherv instead (see the sketch after the example below).
Example (rather trivial work per job, rather than solving ODEs):
program main
   use iso_fortran_env
   use mpi
   implicit none

   integer stat(mpi_status_size), tag, ierr
   integer size, rank
   integer, parameter :: N = 256 * 1000   ! assumes this is a multiple of the number of processors
   integer, parameter :: root = 0
   integer myN
   integer i
   real(real64), allocatable :: Y(:), myY(:)
   real(real64) start, finish

   call mpi_init( ierr )
   call mpi_comm_size( mpi_comm_world, size, ierr )
   call mpi_comm_rank( mpi_comm_world, rank, ierr )

   ! Set initial values for full array, then start timing
   if ( rank == root ) then
      allocate( Y(N) )
      Y = [ ( i + 0.0_real64, i = 1, N ) ]
      start = gettime()
   end if

   ! Root parcels out the work (i.e., distributes the starting points)
   ! Processor with rank r will look at indices 1+r*N/size to (r+1)*N/size
   myN = N / size
   allocate( myY(myN) )
   call mpi_scatter( Y, myN, mpi_double_precision, &
                     myY, myN, mpi_double_precision, root, mpi_comm_world, ierr )

   ! Each processor does its own work
   call myWork( myN, myY )

   ! Root gets its results back
   call mpi_gather ( myY, myN, mpi_double_precision, &
                     Y, myN, mpi_double_precision, root, mpi_comm_world, ierr )

   ! Root concludes timing, then writes to file
   if ( rank == root ) then
      finish = gettime()
      write( *, * ) "Time taken = ", finish - start
      open( 10, file="output.txt" )
      write( 10, "( i8, 1x, es11.4 )" ) ( i, Y(i), i = 1, N )
      close( 10 )
      deallocate( Y )
   end if
   deallocate( myY )

   call mpi_finalize( ierr )

contains

   subroutine myWork( N, Y )
      integer,      intent(in   ) :: N
      real(real64), intent(inout) :: Y(N)
      integer i
      do i = 1, 10000
         Y = 2 * Y - Y    ! silly example, just to use some flops
      end do
   end subroutine myWork

   real(real64) function getTime()
      integer t(8)
      call date_and_time( values=t )
      getTime = 3600 * t(5) + 60 * t(6) + t(7) + 0.001 * t(8)
   end function getTime

end program main
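For the mpi_scatterv/mpi_gatherv variant mentioned above, only the distribution bookkeeping changes. A minimal sketch (not part of the original answer), reusing the names from the example, with counts and displs as the only new variables:

integer, allocatable :: counts(:), displs(:)
integer :: r, extra

allocate( counts(0:size-1), displs(0:size-1) )
extra = mod( N, size )
do r = 0, size - 1
   counts(r) = N / size + merge( 1, 0, r < extra )   ! first 'extra' ranks get one more case
   displs(r) = r * ( N / size ) + min( r, extra )    ! offset of rank r's block in Y
end do
myN = counts(rank)
allocate( myY(myN) )
call mpi_scatterv( Y, counts, displs, mpi_double_precision, &
                   myY, myN, mpi_double_precision, root, mpi_comm_world, ierr )
call myWork( myN, myY )
call mpi_gatherv ( myY, myN, mpi_double_precision, &
                   Y, counts, displs, mpi_double_precision, root, mpi_comm_world, ierr )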

Inaccurate results from Lagrange Interpolation in Fortran

I have written a Fortran program to compute the Lagrange interpolation of two data sets, x and G. I am not able to evaluate the defined function correctly: my program runs, but the numbers are not at all accurate. Please see the first two programs (Matlab code) for the intended result; they were provided by the author and are what I am trying to emulate in Fortran:
%% example 1.1 lagrange interpolation %(Matlab)
% X : interpolation points
% Y : value of f(X)
% x : points where we want an evaluation of P(x),
% where P is the interpolator polynomial
x = [-1:0.01:1];
X = [-1:0.20:1];
y = 1./(1+25*x.^2);
Y = 1./(1+25*X.^2);
pol = lagrange_interp(X,Y,x)
%plot(x,pol,'k',x,y,'k--',X,Y,'k.');
legend('Lagrange Polynomial','Expected behavior','Data Points');
function polynomial = lagrange_interp(X,Y,x) %(Matlab)
n = length(X);
phi = ones(n,length(x));
polynomial = zeros(1,length(x));
i = 0;
j = 0;
for i = [1:n]
    for j = [1:n]
        if not(i==j)
            phi(i,:) = phi(i,:).*(x-X(j))./(X(i)-X(j));
        end;
    end;
end;
for i = [1:n]
    polynomial = polynomial + Y(i)*phi(i,:);
end;
!Lagrange Interpolation example !(Fortran)
program Lagrange
   implicit none
   integer :: i
   integer, parameter :: n = 10
   integer, parameter :: z = 201
   integer, parameter :: z1 = 11
   real, parameter :: delta = .01
   real, parameter :: delta2 = .20
   real, dimension(1:z) :: x, G, y, H
   real*8, dimension(1:n) :: M
   real*8, dimension(1:n) :: linterp(n)
   x(1) = -1
   G(1) = -1
   do i = 2, z
      x(i) = x(i-1) + delta
      y(i) = 1/(1 + 25*(x(i)**2))
   end do
   print *, "The one-dimensional array x is:", x(1:z)
   print *, "The one dimensional array y is", y(1:z)
   do i = 2, z1
      G(i) = G(i-1) + delta2
      H(i) = 1/(1 + 25*(G(i)**2))
   end do
   print *, "The other one-dimensional array G is:", G(1:z1)
   print *, "Then the one dimensional array H is", H(1:z1)
   M = linterp(1:n)
   print *, M(1:n)
end program

!Lagrange interpolation polynomial function !(Fortran)
real*8 function linterp(n)
   implicit none
   integer, parameter :: n = 10
   integer, dimension(1:n) :: poly, pol
   integer :: i, j
   i = 0
   j = 0
   do i = 1, n
      do j = 1, n
         if (i /= j) then
            poly(i,j) = poly(i,j)*(x(i) - G(j))/(y(i) - G(j))
         end if
      end do
   end do
   print *, poly(i,j)
   do i = 1, n
      pol(i) = pol(i) + H(i)*poly(i,j)
   end do
   print *, pol(1:n)
end function
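For comparison, here is a minimal sketch (not the poster's code) of how the Matlab routine might translate to Fortran. It assumes the function lives in a module, or is given an explicit interface, so the caller can see the array-valued result:

function lagrange_interp(X, Y, x) result(polynomial)
   use iso_fortran_env, only: real64
   implicit none
   real(real64), intent(in) :: X(:), Y(:)   ! interpolation points and f(X)
   real(real64), intent(in) :: x(:)         ! points where P(x) is evaluated
   real(real64) :: polynomial(size(x))
   real(real64) :: phi(size(X), size(x))    ! Lagrange basis polynomials at x
   integer :: i, j

   phi = 1.0_real64
   do i = 1, size(X)
      do j = 1, size(X)
         if (i /= j) phi(i,:) = phi(i,:) * (x - X(j)) / (X(i) - X(j))
      end do
   end do
   polynomial = 0.0_real64
   do i = 1, size(X)
      polynomial = polynomial + Y(i) * phi(i,:)
   end do
end function lagrange_interp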

MPI_SCATTERV in Fortran - sending rows of 2D array

I have a 2D array of integers and I want to send its rows to separate processes. I assume that the number of rows (M=5) is not evenly divisible by the number of processes (size=4), so in my case process 0 will obtain an additional row. The size of the 2D array A is MxN (5x10).
Here is my code
PROGRAM SCATTERV_MATRIX
   INCLUDE 'mpif.h'
   integer :: rank, size, ierr, dest, src, tag   !MPI variables
   integer :: status(MPI_STATUS_SIZE)            !MPI variables
   INTEGER, PARAMETER :: N = 10                  !number of columns
   INTEGER, PARAMETER :: M = 5                   !number of rows
   INTEGER, ALLOCATABLE, DIMENSION(:,:) :: A     !MxN matrix A
   INTEGER :: NEWTYPE, RESIZEDTYPE               !MPI derived data types
   INTEGER, ALLOCATABLE, DIMENSION(:,:) :: LOCAL
   INTEGER, ALLOCATABLE :: SENDCOUNTS(:), DISPLS(:)
   INTEGER :: RECVCOUNT, NRBUF
   INTEGER :: MMIN, MEXTRA, INTSIZE, K, I, J
   INTEGER :: START, EXTENT !(KIND=MPI_ADRESS_KIND)

   CALL MPI_INIT(ierr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)

   IF ( rank == 0 ) THEN        !allocate and create 2D array
      ALLOCATE( A (M, N) )
      K = 1
      DO I = 1, M
         DO J = 1, N
            A(I, J) = K
            K = K + 1
         END DO
      END DO
   END IF

   ALLOCATE( SENDCOUNTS(0:size-1), DISPLS(0:size-1) )
   MMIN = M/size                !number of rows divided by number of processors
   MEXTRA = MOD(M, size)        !extra rows
   K = 0
   DO I = 0, size-1
      IF (I < MEXTRA) THEN      !SENDCOUNTS=(/2,1,1,1/)
         SENDCOUNTS(I) = MMIN + 1
      ELSE
         SENDCOUNTS(I) = MMIN
      END IF
      DISPLS(I) = K             !DISPLS=(/0,2,3,4/)
      K = K + SENDCOUNTS(I)
   END DO
   RECVCOUNT = SENDCOUNTS(rank)
   ALLOCATE( LOCAL(RECVCOUNT,N) )

   CALL MPI_TYPE_VECTOR(N, 1, M, MPI_INTEGER, NEWTYPE, ierr)
   CALL MPI_TYPE_COMMIT(NEWTYPE, ierr)
   START = 0
   CALL MPI_TYPE_SIZE(MPI_INTEGER, INTSIZE, ierr)
   EXTENT = 1*INTSIZE
   CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
   CALL MPI_TYPE_COMMIT(RESIZEDTYPE, ierr)

   LOCAL(:, :) = 0
   CALL MPI_SCATTERV( &
      A, SENDCOUNTS, DISPLS, RESIZEDTYPE, &
      LOCAL, RECVCOUNT*N, MPI_INTEGER, &
      0, MPI_COMM_WORLD, ierr)
   WRITE(*,*) rank, ':', LOCAL
   CALL MPI_FINALIZE(ierr)
END PROGRAM SCATTERV_MATRIX
After successful compilation I got a "Program Exception - access violation" error. All my previous Fortran MPI programs worked fine. There must be some bug in the code, probably in MPI_SCATTERV.
I was mainly following this answer. I will be grateful for any suggestion. Thank you.
There's an error in your code:
INTEGER :: START, EXTENT !(KIND=MPI_ADRESS_KIND)
This line should be:
INTEGER(KIND=MPI_ADDRESS_KIND) :: START, EXTENT
In MPI, anything related to a memory address, or to similar concepts such as a memory displacement, file size, or file cursor, must not be a normal integer. You actually have this information in your comment, but MPI_ADDRESS_KIND is misspelled there.
Vladimir F correctly pointed out that you should USE MPI instead of INCLUDE 'mpif.h'. This gives the compiler the opportunity to check the data types. For example, gfortran gives the following error message:
test.f90:59:71:

CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
                                                                      1
Error: There is no specific subroutine for the generic ‘mpi_type_create_resized’ at (1)
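With the corrected kind (and USE MPI), the resizing section might look like this sketch, the rest of the program staying as it was:

INTEGER(KIND=MPI_ADDRESS_KIND) :: START, EXTENT

CALL MPI_TYPE_SIZE(MPI_INTEGER, INTSIZE, ierr)
START  = 0
EXTENT = INTSIZE   ! converted to the address kind on assignment
CALL MPI_TYPE_CREATE_RESIZED(NEWTYPE, START, EXTENT, RESIZEDTYPE, ierr)
CALL MPI_TYPE_COMMIT(RESIZEDTYPE, ierr)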

Check bounds changes variables

I'm porting a program that I use in a chemistry classroom from Matlab (very forgiving) to Fortran (err, not so much). The problem I see is that if I include print statements in one subroutine, my code returns significantly different values than if I don't (the ones with the print statement included are correct).
After reading Stack Overflow, I removed the print statement and recompiled with gfortran and -fcheck=bounds, and my program returned the correct results, with no errors during compilation.
The subroutines are stored in a module Basis_Subs and called from the main program, which I've posted below. The problem appears in the 4-dimensional matrix Gabcd(nb,nb,nb,nb), which is constructed using the subroutine Build_Electron_Repulsion from the Basis_Subs module. That subroutine calculates the matrix elements of Gabcd and uses one internal helper function, Rntuv, and one internal subroutine, gprod_1D, both of which are also stored in the Basis_Subs module.
These functions/routines are used in another section of the program, and that portion of the program doesn't show any errors or funny array behavior. That leads me to think the problem must be in Build_Electron_Repulsion, in how I'm calling Build_Electron_Repulsion, or in how I'm calling the helper functions from inside Build_Electron_Repulsion.
I've posted the main program, the subroutines Build_Electron_Repulsion and gprod_1D, and the function Rntuv. What I'm really wondering is whether you have any tips on tracking down where the error might be.
I'm using a pico-style editor and gfortran.
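(One general tip, added editorially: results that change when print statements or bounds checking are added usually point to an uninitialized variable or an out-of-bounds write. gfortran can flush these out with options such as -g -Wall -fcheck=all -finit-real=snan -ffpe-trap=invalid, which initialize reals to signalling NaNs and trap the first time one is used in arithmetic.)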
Main Program, Z.f08
program HF
   use typedefs
   use Basis_Subs
   use SCF_Mod
   implicit none
   real(dp) :: output, start, finish
   integer (kind=4) :: IFLAG, i, N, nb, j, k, l, natom
   integer, allocatable, dimension(:) :: Z
   real(dp), allocatable, dimension(:,:) :: AL, S, T, VAB, H0
   real(dp), allocatable, dimension(:,:,:,:) :: Gabcd
   real(dp), dimension(maxl) :: Ex = 0
   real(dp) :: Energy, Nuc
   type(primitive) :: g1, Build_Primitive
   type(Basis) :: b1
   type(Basis), dimension(100) :: bases
   character(LEN=20) :: fname

   print *, 'Input the filename'
   read (*,*) fname
   open(unit=12, file=fname)
   read(12,*) natom
   allocate(Z(natom))
   allocate(AL(natom,3))
   read(12,*) Z
   do i = 1, natom
      read(12,*) AL(i,1), AL(i,2), AL(i,3)
   end do
   print *, 'Atomic Coordinates = ', AL
   print *, 'Z in the main routine = ', Z

   call cpu_time(start)

   ! Calculate the energies that don't depend on electrons
   call Nuclear_Repulsion(natom, Z, AL, Nuc)
   N = Sum(Z)

   ! Build the atom-specific basis set
   call Build_Bases(Z, AL, nb, bases)

   ! Using nb, from Build_Bases, allocate matrices
   allocate(S(nb,nb))
   allocate(T(nb,nb))
   allocate(VAB(nb,nb))
   allocate(Gabcd(nb,nb,nb,nb))

   call Build_Overlap(bases, nb, S)
   call Build_Kinetic(bases, nb, T)
   call Build_Nuclear_Attraction(Z, AL, bases, nb, VAB)
   H0 = T + VAB
   call Build_Electron_Repulsion(bases, nb, Gabcd)
   call cpu_time(finish)
   print *, 'Total time for Matrix Elements = ', finish - start

   call SCF(N, nb, H0, S, Gabcd, Nuc, Energy)
end program HF
Build_Electron_Repulsion is located inside the module Basis_Subs:
subroutine Build_Electron_Repulsion(bases, nbases, Gabcd)
   !! Calculate the 4-centered electron repulsion integrals. Loop over the array
   !! of basis sets 1:nb four times. Each element of the basis set is a defined
   !! type that includes an array of gaussian functions and contraction
   !! coefficients, basis(a)%g(1:nga) and basis(a)%c(1:nga). For each gaussian
   !! in each basis set, calculate
   !! int(int(basis(a1)*basis(b1)*basis(c2)*basis(d2)*1/r12 dr1)dr2).
   !! Uses helper function Rntuv listed below.
   implicit none
   type(basis), dimension(100), intent(in) :: bases
   integer, intent(in) :: nbases
   real(dp), dimension(nbases,nbases,nbases,nbases), intent(out) :: Gabcd
   integer :: a, b, c, d, nga, ngb, ngc, ngd, index, lx, ly, lz, llx, lly, llz
   integer :: llxmax, llymax, llzmax, lxmax, lymax, lzmax, xmax, ymax, zmax
   integer :: x, y, z
   real(dp) :: p, q, midpoint, PX, PY, PZ, output
   real(dp) :: pp, qq, midpoint2, PPX, PPY, PPZ, tmp
   real(dp) :: alpha_a, alpha_b, alpha_c, alpha_d, alpha
   real(dp) :: ax, ay, az, bx, by, bz, cx, cy, cz, dx, dy, dz
   real(dp), dimension(maxl) :: EabX, EabY, EabZ, EcdX, EcdY, EcdZ
   real(dp), dimension(2*maxl, 2*maxl, 2*maxl) :: R

   R = 0
   Gabcd = 0.0D0
   print *, 'Calculating 4 centered integrals'
   do a = 1, nbases
   do b = 1, nbases
   do c = 1, nbases
   do d = 1, nbases
      do nga = 1, bases(a)%n
      do ngb = 1, bases(b)%n
         alpha_a = bases(a)%g(nga)%alpha
         alpha_b = bases(b)%g(ngb)%alpha
         p = alpha_a + alpha_b
         ax = bases(a)%g(nga)%x
         ay = bases(a)%g(nga)%y
         az = bases(a)%g(nga)%z
         bx = bases(b)%g(ngb)%x
         by = bases(b)%g(ngb)%y
         bz = bases(b)%g(ngb)%z
         PX = (alpha_a*ax + alpha_b*bx)/p
         PY = (alpha_a*ay + alpha_b*by)/p
         PZ = (alpha_a*az + alpha_b*bz)/p
         call gprod_1D(ax, alpha_a, bases(a)%g(nga)%lx, bx, alpha_b, bases(b)%g(ngb)%lx, EabX)
         call gprod_1D(ay, alpha_a, bases(a)%g(nga)%ly, by, alpha_b, bases(b)%g(ngb)%ly, EabY)
         call gprod_1D(az, alpha_a, bases(a)%g(nga)%lz, bz, alpha_b, bases(b)%g(ngb)%lz, EabZ)
         lxmax = bases(a)%g(nga)%lx + bases(b)%g(ngb)%lx
         lymax = bases(a)%g(nga)%ly + bases(b)%g(ngb)%ly
         lzmax = bases(a)%g(nga)%lz + bases(b)%g(ngb)%lz
         do ngc = 1, bases(c)%n
         do ngd = 1, bases(d)%n
            alpha_c = bases(c)%g(ngc)%alpha
            alpha_d = bases(d)%g(ngd)%alpha
            pp = alpha_c + alpha_d
            cx = bases(c)%g(ngc)%x
            cy = bases(c)%g(ngc)%y
            cz = bases(c)%g(ngc)%z
            dx = bases(d)%g(ngd)%x
            dx = bases(d)%g(ngd)%y
            dz = bases(d)%g(ngd)%z
            PPX = (alpha_c*cx + alpha_d*dx)/pp
            PPY = (alpha_c*cy + alpha_d*dy)/pp
            PPZ = (alpha_c*cz + alpha_d*dz)/pp
            llxmax = bases(c)%g(ngc)%lx + bases(d)%g(ngd)%lx
            llymax = bases(c)%g(ngc)%ly + bases(d)%g(ngd)%ly
            llzmax = bases(c)%g(ngc)%lz + bases(d)%g(ngd)%lz
            call gprod_1D(cx, alpha_c, bases(c)%g(ngc)%lx, dx, alpha_d, bases(d)%g(ngd)%lx, EcdX)
            call gprod_1D(cy, alpha_c, bases(c)%g(ngc)%ly, dy, alpha_d, bases(d)%g(ngd)%ly, EcdY)
            call gprod_1D(cz, alpha_c, bases(c)%g(ngc)%lz, dz, alpha_d, bases(d)%g(ngd)%lz, EcdZ)
            alpha = p*pp/(p + pp)
            tmp = 0
            xmax = lxmax + llxmax
            ymax = lymax + llymax
            zmax = lzmax + llzmax
            do x = 0, xmax
               do y = 0, ymax
                  do z = 0, zmax
                     R(x+1,y+1,z+1) = Rntuv(0, x, y, z, alpha, PX, PY, PZ, PPX, PPY, PPZ)
                  end do
               end do
            end do
            !if (a ==1 .and. b==1 .and. c ==1 .and. d==1) then
            !   print *, ' R = ', R(1,1,1)
            !   print *, xmax, ymax, zmax
            !   print *, a,b,c,d,nga,ngb,ngc,ngd, 'R = ', R(1,1,1)
            !end if
            !if (PZ == PPZ) then
            !   print *, R(1,1,1)
            !   output = Rntuv(0,0,0,0,alpha, PX, PY, PZ, PPX, PPY, PPZ)
            !   print *, output
            !   print *, a,b,c,d, PY, PPY
            !end if
            do lx = 0, lxmax
               do ly = 0, lymax
                  do lz = 0, lzmax
                     do llx = 0, llxmax
                        do lly = 0, llymax
                           do llz = 0, llzmax
                              tmp = tmp + EabX(lx+1)*EabY(ly+1)*EabZ(lz+1)*(-1.0D0)**(llx + lly + llz) * &
                                    EcdX(llx+1)*EcdY(lly+1)*EcdZ(llz+1)*R(lx+llx+1, ly+lly+1, lz+llz+1)
                           end do
                        end do
                     end do
                  end do
               end do
            end do
            Gabcd(a,b,c,d) = Gabcd(a,b,c,d) + 2.0D0*pi**2.5D0/(p*pp*sqrt(p + pp))*tmp*bases(a)%g(nga)%N &
                             * bases(b)%g(ngb)%N * bases(c)%g(ngc)%N * bases(d)%g(ngd)%N * bases(a)%c(nga) &
                             * bases(b)%c(ngb) * bases(c)%c(ngc) * bases(d)%c(ngd)
         end do
         end do
      end do
      end do
   end do
   end do
   end do
   end do
end subroutine Build_Electron_Repulsion
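One detail worth flagging in the ngd loop above (an editorial observation, not from the original post): dx is assigned twice, first dx=bases(d)%g(ngd)%x and then dx=bases(d)%g(ngd)%y, so dy is never set before it is used in PPY and in the gprod_1D calls. An uninitialized local like this is exactly the kind of bug whose symptoms move around when print statements or bounds checking change the stack layout. The second assignment was presumably meant to be:

dy = bases(d)%g(ngd)%y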
real(dp) function Rntuv(n, tmax, umax, vmax, p, Px, Py, Pz, Ax, Ay, Az) result(out)
   ! Rntuv(n,t,u,v,p,P,A) determines the helper integral Rntuv for the coulomb
   ! integral of order n, the t,u,v-th Hermite polynomial with exponent p
   ! centered at [Px Py Pz] and charge centered at location [Ax Ay Az]
   implicit none
   integer, intent(in) :: n, tmax, umax, vmax
   real(dp), intent(in) :: Px, Py, Pz, Ax, Ay, Az, p
   real(dp) :: PA2, output
   real(dp), dimension(n+tmax+umax+vmax+2, tmax+1, umax+1, vmax+1) :: R
   integer :: nmax, t, u, v
   integer :: i, IFLAG

   R = 0
   nmax = n + tmax + umax + vmax + 2
   PA2 = (Px-Ax)**2.0D0 + (Py-Ay)**2.0D0 + (Pz-Az)**2.0D0
   do i = 0, nmax-1
      output = Boys(i, p*PA2)
      R(i+1,1,1,1) = (-2*p)**(1.0D0*i)*Boys(i, p*PA2)
   end do
   do t = 1, tmax
      if (t == 1) then
         do i = 1, nmax-1
            R(i,2,1,1) = (Px - Ax)*R(i+1,1,1,1)
         end do
      else
         do i = 1, nmax-1
            R(i,t+1,1,1) = (t-1)*R(i+1,t-1,1,1) + (Px-Ax)*R(i+1,t,1,1)
         end do
      end if
   end do
   do u = 1, umax
      if (u == 1) then
         do i = 1, nmax-1
            R(i,tmax+1,2,1) = (Py-Ay)*R(i+1,tmax+1,1,1)
         end do
      else
         do i = 1, nmax-1
            R(i,tmax+1,u+1,1) = (u-1)*R(i+1,tmax+1,u-1,1) + (Py-Ay)*R(i+1,tmax+1,u,1)
         end do
      end if
   end do
   do v = 1, vmax
      if (v == 1) then
         do i = 1, nmax-1
            R(i,tmax+1,umax+1,2) = (Pz-Az)*R(i+1,tmax+1,umax+1,1)
         end do
      else
         do i = 1, nmax-1
            R(i,tmax+1,umax+1,v+1) = (v-1)*R(i+1,tmax+1,umax+1,v-1) + (Pz-Az)*R(i+1,tmax+1,umax+1,v)
         end do
      end if
   end do
   out = R(n+1,tmax+1,umax+1,vmax+1)
end function Rntuv
subroutine gprod_1D(x1, alpha1, lx1, x2, alpha2, lx2, Ex)
   real(dp), intent(in) :: x1, alpha1, x2, alpha2
   integer, intent(in) :: lx1, lx2
   integer :: tmax, i, j, t, qint
   real(dp) :: p, q, midpoint, weighted_middle, KAB
   real(dp), dimension(maxl), intent(inout) :: Ex
   real(dp), dimension(maxl, maxl, 2*maxl) :: coefficients

   coefficients = 0.0D0
   tmax = lx1 + lx2
   Ex = 0
   p = alpha1 + alpha2
   q = alpha1*alpha2/p
   midpoint = x1 - x2
   weighted_middle = (alpha1*x1 + alpha2*x2)/p
   KAB = e**(-q*midpoint**2.0D0)
   coefficients(1,1,1) = KAB
   i = 0
   j = 0
   do while (i < lx1)
      do t = 0, i+j+1
         if (t == 0) then
            coefficients(i+2,j+1,t+1) = (weighted_middle - x1)*coefficients(i+1,j+1,t+1) + (t+1)*coefficients(i+1,j+1,t+2)
         else
            coefficients(i+2,j+1,t+1) = 1/(2*p)*coefficients(i+1,j+1,t) + (weighted_middle - x1)*coefficients(i+1,j+1,t+1) + &
                                        (t+1)*coefficients(i+1,j+1,t+2)
         end if
      end do
      i = i + 1
   end do
   do while (j < lx2)
      do t = 0, i+j+1
         if (t == 0) then
            coefficients(i+1,j+2,t+1) = (weighted_middle - x2)*coefficients(i+1,j+1,t+1) + (dble(t)+1.0d0)*coefficients(i+1,j+1,t+2)
         else
            coefficients(i+1,j+2,t+1) = 1/(2*p)*coefficients(i+1,j+1,t) + (weighted_middle - x2)*coefficients(i+1,j+1,t+1) + &
                                        (t+1)*coefficients(i+1,j+1,t+2)
         end if
      end do
      j = j + 1
   end do
   do qint = 1, i+j+1
      Ex(qint) = coefficients(i+1,j+1,qint)
   end do
end subroutine gprod_1D

Seg Faults while using MPI (Fortran)

I am very new to MPI and Fortran alike. I have been working on trying to figure this out for a few hours now, with no luck. In my code below, everything is working just fine (besides the fact that my s variable is isolated between processes). When I try to implement MPI_SEND and MPI_RECV, I get seg faults constantly. I can't seem to figure out what the issue is.
SUBROUTINE do_mpi_simpsons(l, u, n)
   INTEGER, INTENT (in) :: l, u, n
   ! REAL, INTENT (in) :: func
   DOUBLE PRECISION :: result, walltime
   INTEGER :: clock_start, clock_rate, clock_max, clock_end
   DOUBLE PRECISION :: h, s, argVal, finalS
   INTEGER :: rank, size, ierror, tag, status(MPI_STATUS_SIZE), count, start, stop

   walltime = 0.0D0
   h = (u - l) / dble(n)
   s = func_hw(dble(l)) + func_hw(dble(u))

   CALL system_clock(clock_start, clock_rate, clock_max)
   CALL MPI_INIT(ierror)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)

   count = n / size
   start = rank * count
   stop = start + count - 1
   ! WRITE(*,*) "Start: ", start
   ! WRITE(*,*) "Stop: ", stop
   WRITE(*,*) rank

   DO i = start, stop, 2
      s = s + 4 * func_hw(dble(l) + dble(i)*h)
   END DO
   DO i = start+1, stop-1, 2
      s = s + 2 * func_hw(dble(l) + dble(i)*h)
   END DO

   ! This block is causing the seg faults
   IF (rank.eq.0) THEN
      finalS = s
      DO i = 1, size - 1
         CALL MPI_RECV(s, 64, MPI_DOUBLE, i, 1, MPI_COMM_WORLD, status, ierror)
         finalS = finalS + s
      END DO
   ELSE
      CALL MPI_SEND(s, 64, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, ierror)
   END IF

   CALL MPI_FINALIZE(ierror)
   CALL system_clock(clock_end, clock_rate, clock_max)
   walltime = walltime + real(clock_end - clock_start) / real(clock_rate)
   result = s * h / 3
   WRITE(*,*) "walltime = ", walltime, " seconds"
   WRITE(*,*) "result = ", result
END SUBROUTINE
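A note on the failing block (editorial, not from the original thread): s is a single DOUBLE PRECISION scalar, but both calls pass a count of 64, so MPI reads and writes 63 elements beyond it, which is a textbook recipe for a seg fault. MPI_DOUBLE is also the C datatype; Fortran code would normally use MPI_DOUBLE_PRECISION. A sketch of the block with those two changes:

IF (rank .eq. 0) THEN
   finalS = s
   DO i = 1, size - 1
      CALL MPI_RECV(s, 1, MPI_DOUBLE_PRECISION, i, 1, MPI_COMM_WORLD, status, ierror)
      finalS = finalS + s
   END DO
ELSE
   CALL MPI_SEND(s, 1, MPI_DOUBLE_PRECISION, 0, 1, MPI_COMM_WORLD, ierror)
END IF

A single collective, CALL MPI_REDUCE(s, finalS, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierror), would do the same accumulation in one call.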