I am using persistent communication in my CFD code. The communication is set up in a separate subroutine, and in the main program, where I have the time loop, I call MPI_STARTALL() and MPI_WAITALL().
To keep this short, I am only showing the first part of the setup; the rest of the arrays are handled in exactly the same way.
My setup subroutine looks like:
Subroutine MPI_Subroutine
  use Variables
  use mpi
  implicit none

  ! Starting up MPI
  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,npes,ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,MyRank,ierr)

  ! Compute the size of the local block (1D decomposition)
  Jmax = JmaxGlobal
  Imax = ImaxGlobal/npes
  if (MyRank.lt.(ImaxGlobal - npes*Imax)) then
    Imax = Imax + 1
  end if
  if (MyRank.ne.0.and.MyRank.ne.(npes-1)) then
    Imax = Imax + 2
  else
    Imax = Imax + 1
  end if

  ! Computing neighbors
  if (MyRank.eq.0) then
    Left = MPI_PROC_NULL
  else
    Left = MyRank - 1
  end if
  if (MyRank.eq.(npes-1)) then
    Right = MPI_PROC_NULL
  else
    Right = MyRank + 1
  end if

  ! Initializing the arrays in each processor, according to the number of local nodes
  Call InitializeArrays

  ! Creating the channels of communication for this computation,
  ! sending and receiving the u_old ghost cells
  Call MPI_SEND_INIT(u_old(2,:),Jmax,MPI_DOUBLE_PRECISION,Left,tag,MPI_COMM_WORLD,req(1),ierr)
  Call MPI_RECV_INIT(u_old(Imax,:),Jmax,MPI_DOUBLE_PRECISION,Right,tag,MPI_COMM_WORLD,req(2),ierr)
  Call MPI_SEND_INIT(u_old(Imax-1,:),Jmax,MPI_DOUBLE_PRECISION,Right,tag,MPI_COMM_WORLD,req(3),ierr)
  Call MPI_RECV_INIT(u_old(1,:),Jmax,MPI_DOUBLE_PRECISION,Left,tag,MPI_COMM_WORLD,req(4),ierr)
Since I am debugging the code, I am just checking these arrays. When I inspect them, the ghost cells are full of zeroes, so I guess I am getting these calls wrong.
The main code, where I call MPI_STARTALL and MPI_WAITALL, looks like:
Program
  use Variables
  use mpi
  implicit none

  open(32, file = 'error.dat')

  Call MPI_Subroutine

  !kk=kk+1
  DO kk=1, 2001
    ! A lot of calculation

    ! communicating the maximum error among the processes and delta t
    call MPI_REDUCE(eps,epsGlobal,1,MPI_DOUBLE_PRECISION,MPI_MAX,0,MPI_COMM_WORLD,ierr)
    call MPI_BCAST(epsGlobal,1,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)

    call MPI_REDUCE(delta_t,delta_tGlobal,1,MPI_DOUBLE_PRECISION,MPI_MIN,0,MPI_COMM_WORLD,ierr)
    if(MyRank.eq.0) delta_t = delta_tGlobal
    call MPI_BCAST(delta_t,1,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)

    if(MyRank.eq.0) then
      write(*,*) kk,epsGlobal,(kk*delta_t)
      write(32,*) kk,epsGlobal
    endif

    Call Swap

    Call MPI_STARTALL(4,req,ierr)
    Call MPI_WAITALL(4,req,status,ierr)
  enddo
The variables are declared in another module. The MPI-related variables look like:
! MPI variables
INTEGER :: npes, MyRank, ierr, Left, Right, tag
INTEGER :: status(MPI_STATUS_SIZE,4)
INTEGER,dimension(4) :: req
I appreciate your time and suggestions on this problem.
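For reference, here is a minimal, self-contained sketch of this persistent-communication pattern (Send_init/Recv_init once, then Startall/Waitall every iteration), exchanging one contiguous buffer between two ranks. All names in the sketch are illustrative and not taken from the code above:

program persistent_demo
  use mpi
  implicit none
  integer, parameter :: n = 10
  double precision :: sendbuf(n), recvbuf(n)
  integer :: preq(2), pstat(MPI_STATUS_SIZE,2)
  integer :: ierr, myrank, npes, other, step

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, npes, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  other = merge(1, 0, myrank == 0)         ! partner rank; run with exactly 2 processes

  ! Set the channels up once; the buffers must stay valid (and contiguous)
  ! for the whole lifetime of the persistent requests.
  call MPI_SEND_INIT(sendbuf, n, MPI_DOUBLE_PRECISION, other, 0, MPI_COMM_WORLD, preq(1), ierr)
  call MPI_RECV_INIT(recvbuf, n, MPI_DOUBLE_PRECISION, other, 0, MPI_COMM_WORLD, preq(2), ierr)

  do step = 1, 5
    sendbuf = dble(myrank*100 + step)      ! refresh the send buffer each iteration
    call MPI_STARTALL(2, preq, ierr)       ! start both persistent requests
    call MPI_WAITALL(2, preq, pstat, ierr) ! complete them before touching the buffers again
  end do

  call MPI_REQUEST_FREE(preq(1), ierr)
  call MPI_REQUEST_FREE(preq(2), ierr)
  call MPI_FINALIZE(ierr)
end program persistent_demo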
I am trying to write collectively with MPI in Fortran from a CFD code. In each process the data are divided into blocks, each with a general number of cells, and a structure var(b) is created which hosts the two variables r and p of block b. A doubly nested MPI derived datatype is then created to collect all the data in a process: the first type collects all the variables of a block, and the second collects all the block structures of the process. Each process then has to write one instance of this nested derived datatype, where the offset is the total amount of data held by the previous processes (0 for rank 0, all the data of rank 0 for rank 1, and so on). The code is the following:
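As a worked example of that offset, using the block sizes set in the code below (ni(b) = b/3+1, nj(b) = b, nk(b) = b/5+1, NB = 4): the four blocks hold 1, 2, 6 and 8 cells, i.e. 17 cells per process; with 2 variables of 8 bytes each, that is 17 * 2 * 8 = 272 bytes per process, so rank 0 should write at byte offset 0, rank 1 at offset 272, and so on, which is what the mpidisp computation further down evaluates.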
module var_mod
type vt
sequence
double precision,dimension(:,:,:),allocatable :: r,p
end type vt
type(vt),target,dimension(:),allocatable :: var
end module var_mod
PROGRAM main
USE MPI_F08
USE var_mod
IMPLICIT NONE
! FILES
INTEGER,PARAMETER :: NB = 4
!----------------------------------------------------------------
INTEGER :: b,i,j,k,me,np
TYPE(MPI_File) :: mpifh
INTEGER(KIND=MPI_OFFSET_KIND) :: mpidisp,sum_dim
integer,dimension(:),allocatable :: ni,nj,nk,mpiblock,mpistride
integer :: cont,mpierr
INTEGER,dimension(nb) :: Blocks
INTEGER(KIND=MPI_ADDRESS_KIND),dimension(:),allocatable :: Offsets,Pos
INTEGER(KIND=MPI_COUNT_KIND) :: lb, ext8
TYPE(MPI_Datatype),dimension(:),allocatable :: Elem_Type,Types
TYPE(MPI_Datatype) :: All_Type,mpiparflowtype
TYPE(MPI_Status) :: status
CHARACTER(LEN=MPI_MAX_ERROR_STRING) :: string
INTEGER :: resultlen
!----------------------------------------------------------------
call mpi_init
call mpi_comm_size(mpi_comm_world,np)
call mpi_comm_rank(mpi_comm_world,me)
allocate (ni(nb))
allocate (nj(nb))
allocate (nk(nb))
allocate (var(nb))
do b = 1,NB
ni(b) = b/3+1
nj(b) = b
nk(b) = b/5+1
allocate (var(b)%r(ni(b),nj(b),nk(b)))
allocate (var(b)%p(ni(b),nj(b),nk(b)))
END DO
!
! Initialize the data
!
do b = 1,nb
DO k = 1,nk(b)
DO j = 1,nj(b)
DO i = 1,ni(b)
var(b)%r(i,j,k) = 10000*me+1000*b+i*100+j*10+k
var(b)%p(i,j,k) = -var(b)%r(i,j,k)
END DO
END DO
END DO
end do
! (1) Create a separate structure datatype for each record
allocate (Offsets(2),Pos(2),Types(2),Elem_Type(nb))
DO b = 1,nb
CALL MPI_GET_ADDRESS(var(b)%r,POS(1))
CALL MPI_GET_ADDRESS(var(b)%p,POS(2))
Offsets = POS-POS(1)
Types = MPI_REAL8
Blocks = ni(b)*nj(b)*nk(b)
CALL MPI_TYPE_CREATE_STRUCT(2,Blocks,Offsets,Types,Elem_Type(b),mpierr)
END DO
deallocate (Offsets,Pos,Types)
! Create a structure of structures that describes the whole array
allocate (Offsets(nb),Pos(nb))
Blocks = 1
DO b = 1,nb
CALL MPI_GET_ADDRESS(var(b)%r,POS(b))
END DO
Offsets = POS-POS(1)
CALL MPI_TYPE_CREATE_STRUCT(nb,Blocks,Offsets,Elem_Type,All_Type)
CALL MPI_TYPE_COMMIT(All_Type,mpierr)
! Free the intermediate datatypes
DO b = 1,nb
CALL MPI_TYPE_FREE(Elem_Type(b))
END DO
deallocate(Offsets,Pos,Elem_Type)
! Set index
cont = 1
allocate(mpiblock(cont))
allocate(mpistride(cont))
mpiblock=1
mpistride=0
call MPI_TYPE_INDEXED(cont,mpiblock,mpistride,All_Type,mpiparflowtype)
call MPI_TYPE_COMMIT(mpiparflowtype)
deallocate(mpiblock,mpistride)
! Position where to write
CALL MPI_Type_get_extent(MPI_REAL8, lb, ext8)
mpidisp = 0
do b = 1,nb
mpidisp = mpidisp + (ni(b)*nj(b)*nk(b)) ! number of cells in block b
end do
mpidisp = mpidisp*2*ext8*me ! multiply by the number of variables and the size in bytes of each variable, and shift by the process rank
! Open file
call MPI_FILE_OPEN(MPI_COMM_WORLD,'MPIDATA',IOR(MPI_MODE_CREATE,MPI_MODE_WRONLY),MPI_INFO_NULL,mpifh)
! setting file view
call MPI_FILE_SET_VIEW(mpifh,mpidisp,All_Type,mpiparflowtype,'native',MPI_INFO_NULL,mpierr)
write(*,*) me,'error on file set view:',mpierr
call MPI_Error_string(mpierr, string, resultlen)
write(*,*) 'string:',trim(string),resultlen
! MPI Write file
call MPI_FILE_WRITE_ALL(mpifh,var(1)%r,1,All_Type,status)
! Close file
call MPI_FILE_CLOSE(mpifh)
! deallocations and free
CALL MPI_TYPE_FREE(All_Type)
CALL MPI_TYPE_FREE(mpiparflowtype)
do b = 1,nb
deallocate (var(b)%r,var(b)%p)
END DO
deallocate (var)
deallocate (ni,nj,nk)
! end
call mpi_finalize
END PROGRAM main
When the code is launched, for instance, on two processes (it compiles cleanly with both the Intel and GNU compilers), the run completes, but MPI_FILE_SET_VIEW reports an MPI_ERR_TYPE error and the data file contains only rank 0's data.
I would expect a file with data from all ranks, but I cannot understand what the problem is.
Say I have a Fortran program that performs two tasks on an array: task A computes its mean and task B doubles it. The point is that task B should be independent of task A. When accelerating the program with OpenACC, it would make sense to run the two tasks concurrently by making task A asynchronous:
program test
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real(8) :: mean
  real(8) :: array(n)
  real(8) :: array_d(n)

  ! initialize array
  array = [(i, i=1, n)]

  !$acc kernels async num_gangs(1)
  ! Task A: get mean of array
  mean = 0d0
  !$acc loop independent reduction(+:mean)
  do i = 1, n
    mean = mean + array(i)
  end do
  mean = mean / n
  !$acc end kernels

  !$acc kernels
  ! Task B: work on array
  !$acc loop independent
  do i = 1, n
    array(i) = array(i) * 2
  end do
  !$acc end kernels

  !$acc wait

  ! print array and mean
  print "(10(g0.2, x))", array(:10)
  print "('mean = ', g0.2)", mean
end program
However, when running the two tasks at the same time, task B will modify the array that task A is reading, leading to incorrect values. On CPU (no acceleration) I get:
2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 500000.5000000000
On GPU (using the NVIDIA HPC SDK), I get a different mean which is obviously incorrect:
2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 999967.6836640000
Is there an elegant way to "protect" the array being worked by task A?
There are several threads with titles similar to mine, but I do not believe they are the same. One, fortran pass allocated array to main procedure, was very similar, but the answer required Fortran 2008; I am after a Fortran 90/95 solution.
Another very good and quite similar thread is Dynamic array allocation in fortran90. However, in that method, while they allocate in the subroutine, they never appear to deallocate, which seems odd. My method looks, on the surface at least, to be the same, yet when I print the array in the main program, only blank spaces are printed. When I print inside the subroutine itself, the array prints the correct values, and the correct number of them.
In the following, a main program calls a subroutine. This subroutine reads data into an allocatable array and passes the array back to the main program. I do this by using small subroutines, each designed to look for specific terms in the input file. All of these subroutines are in one module file, so there are three files: Main.f90, input_read.f90 and filename.inp.
It seems, then, that I do not know how to pass an array that is allocatable in program Main.f90 as well as in the called subroutine, where it is actually allocated, sized, and then deallocated before being passed back to the main program. This perhaps sounds confusing, so here is the code for all three files. I apologize for the poor formatting when I pasted it; I tried to separate all the rows.
main.f90:
Program main
use input_read ! the module with the subroutines used for reading filename.inp
implicit none
REAL, Allocatable :: epsilstar(:)
INTEGER :: natoms
call Obtain_LJ_Epsilon(epsilstar, natoms)
print*, 'LJ Epsilon : ', epsilstar
END Program main
Next is the module with a subroutine (I removed all but the necessary one for space), input_read.f90:
module input_read
contains
!===============================================================
!===============================================================
Subroutine Obtain_LJ_Epsilon(epsilstar,natoms)
! Reads epsilon and sigma parameters for Lennard-Jones Force-Field and also
! counts the number of types of atoms in the system
!===============================================================
!===============================================================
INTEGER :: error,line_number,natoms_eps,i
CHARACTER(120) :: string, next_line, next_next_line,dummy_char
CHARACTER(8) :: dummy_na,dummy_eps
INTEGER,intent(out) :: natoms
LOGICAL :: Proceed
real, intent(out), allocatable :: epsilstar(:)
error = 0
line_number = 0
Proceed = .true.
open(10,file='filename.inp',status='old')
!=============================================
! Find key word LJ_Epsilon
!=============================================
DO
line_number = line_number + 1
Read(10,'(A120)',iostat=error) string
IF (error .NE. 0) THEN
print*, "Error, stopping read input due to an error reading line"
exit
END IF
IF (string(1:12) == '$ LJ_epsilon') THEN
line_number = line_number + 1
exit
ELSE IF (string(1:3) == 'END' .or. line_number > 2000) THEN
print*, "Hit end of file before reading '$ LJ_epsilon' "
Proceed = .false.
exit
ENDIF
ENDDO
!========================================================
! Key word found, now determine number of parameters
! needing to be read
!========================================================
natoms_eps = -1
dummy_eps = 'iii'
do while ((dummy_eps(1:1) .ne. '$') .and. (dummy_eps(1:1) .ne. ' '))
natoms_eps = natoms_eps + 1
read(10,*) dummy_eps
enddo !we now know the number of atoms in the system (# of parameters)
close(10)
Allocate(epsilstar(natoms_eps))
epsilstar = 0.0
!============================================================
! Number of parameters found, now read their values
!============================================================
if(Proceed) then
open(11,file='filename.inp',status='old')
do i = 1,line_number-1
read(11,*) ! note it is not recording anything for this do loop
enddo
do i = 1,natoms_eps
read(11,*) dummy_char
read(dummy_char,*) epsilstar(i) ! convert string read in to real, and store in epsilstar
enddo
close(11)
PRINT*, 'LJ_epsilon: ', epsilstar ! printing to make sure it worked
endif
deallocate(epsilstar)
END Subroutine Obtain_LJ_Epsilon
end module input_read
And finally the input file: filename.inp
# Run_Type
NVT
# Run_Name
Test_Name
# Pressure
1.0
# Temperature
298.15
# Number_Species
# LJ_epsilon
117.1
117.1
117.1
# LJ_sigma
3.251
3.251
3.251
END
And again, I can't figure out how to pass the allocated epsilstar array to the main program. I have tried passing an unallocated array to the subroutine from the main.f90, allocating it inside, passing it back, and deallocating it in the main.f90, but that did not work. I have tried it as the code currently is... the code works (i.e. is bug free) but it does not pass epsilstar from the subroutine where it correctly finds it and creates an array.
It turns out that the mistake I made was deallocating the array in the subroutine before passing it back to the main program. By NOT deallocating it, the array is passed back fine. Also, I do not deallocate it in the main program either.
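In code terms, the fix amounts to dropping the deallocate at the end of the subroutine; a sketch of the relevant closing lines only:

    PRINT*, 'LJ_epsilon: ', epsilstar ! printing to make sure it worked
  endif
  ! no deallocate(epsilstar) here: the allocated array is returned to the
  ! caller through the intent(out), allocatable dummy argument
END Subroutine Obtain_LJ_Epsilon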
PROGRAM ShareNeighbors
IMPLICIT REAL (a-h,o-z)
INCLUDE "mpif.h"
PARAMETER (m = 500, n = 500)
DIMENSION a(m,n), b(m,n)
DIMENSION h(m,n)
INTEGER istatus(MPI_STATUS_SIZE)
INTEGER iprocs, jprocs
PARAMETER (ROOT = 0)
integer dims(2),coords(2)
logical periods(2)
data periods/2*.false./
integer status(MPI_STATUS_SIZE)
integer comm2d,req,source
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
! Get a new communicator for a decomposition of the domain.
! Let MPI find a "good" decomposition
dims(1) = 0
dims(2) = 0
CALL MPI_DIMS_CREATE(nprocs,2,dims,ierr)
if (myrank.EQ.Root) then
print *,nprocs,'processors have been arranged into',dims(1),'X',dims(2),'grid'
endif
CALL MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periods,.true., &
comm2d,ierr)
! Get my position in this communicator
CALL MPI_COMM_RANK(comm2d,myrank,ierr)
! Get the decomposition
CALL fnd2ddecomp(comm2d,m,n,ista,iend,jsta,jend)
! print *,ista,jsta,iend,jend
ilen = iend - ista + 1
jlen = jend - jsta + 1
CALL MPI_Cart_get(comm2d,2,dims,periods,coords,ierr)
iprocs = dims(1)
jprocs = dims(2)
myranki = coords(1)
myrankj = coords(2)
DO j = jsta, jend
DO i = ista, iend
a(i,j) = myrank+1
ENDDO
ENDDO
! Send data from each processor to Root
call MPI_ISEND(ista,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
Root,1,MPI_COMM_WORLD,req,ierr )
! Receive the results from the other processors
if (myrank.EQ.Root) then
do source = 0,nprocs-1
call MPI_RECV(ista,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(iend,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(jsta,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
call MPI_RECV(jend,1,MPI_INTEGER,source, &
1,MPI_COMM_WORLD,status,ierr )
ilen = iend - ista + 1
jlen = jend - jsta + 1
call MPI_RECV(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
source,1,MPI_COMM_WORLD,status,ierr)
! print the results
call ZMINMAX(m,n,ista,iend,jsta,jend,a(:,:),amin,amax)
print *, 'myid=',source,amin,amax
call MPI_Wait(req, status, ierr)
enddo
endif
CALL MPI_FINALIZE(ierr)
END
subroutine fnd2ddecomp(comm2d,m,n,ista,iend,jsta,jend)
integer comm2d
integer m,n,ista,jsta,iend,jend
integer dims(2),coords(2),ierr
logical periods(2)
! Get (i,j) position of a processor from Cartesian topology.
CALL MPI_Cart_get(comm2d,2,dims,periods,coords,ierr)
! Decomposition in first (ie. X) direction
CALL MPE_DECOMP1D(m,dims(1),coords(1),ista,iend)
! Decomposition in second (ie. Y) direction
CALL MPE_DECOMP1D(n,dims(2),coords(2),jsta,jend)
return
end
SUBROUTINE MPE_DECOMP1D(n,numprocs,myid,s,e)
integer n,numprocs,myid,s,e,nlocal,deficit
nlocal = n / numprocs
s = myid * nlocal + 1
deficit = mod(n,numprocs)
s = s + min(myid,deficit)
! Give one more slice to processors
if (myid .lt. deficit) then
nlocal = nlocal + 1
endif
e = s + nlocal - 1
if (e .gt. n .or. myid .eq. numprocs-1) e = n
return
end
SUBROUTINE ZMINMAX(IX,JX,SX,EX,SY,EY,ZX,ZXMIN,ZXMAX)
INTEGER :: IX,JX,SX,EX,SY,EY
REAL :: ZX(IX,JX)
REAL :: ZXMIN,ZXMAX
ZXMIN=1000.
ZXMAX=-1000.
DO II=SX,EX
DO JJ=SY,EY
IF(ZX(II,JJ).LT.ZXMIN)ZXMIN=ZX(II,JJ)
IF(ZX(II,JJ).GT.ZXMAX)ZXMAX=ZX(II,JJ)
ENDDO
ENDDO
RETURN
END
When I run the above code with 4 processors, Root receives garbage values, whereas with 15 processors the data transfer is proper. How can I tackle this?
I guess it is related to buffering, a point which is not clear to me. How do I handle the buffering correctly?
Problem 1
You are doing multiple sends:
call MPI_ISEND(ista,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1, &
MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
Root,1,MPI_COMM_WORLD,req,ierr )
and all of them use the same request variable req. That cannot work; each outstanding non-blocking operation needs its own request.
Problem 2
You are using a subarray a(ista:iend,jsta:jend) in non-blocking MPI. That is not allowed*. You need to copy the data into a temporary contiguous buffer, or use an MPI derived subarray datatype (too hard for you at this stage).
The reason for the problem is that the compiler creates a temporary copy just for the call to ISend. ISend remembers the address of that temporary, but does not send anything yet. Then the temporary is deleted and the address becomes invalid, and when MPI_Wait tries to use that address it fails.
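A sketch of the copy-to-a-temporary-buffer workaround (sendbuf and req_a are illustrative names, the declarations go with the others in the specification part, and the buffer must stay allocated until the matching wait has completed):

real, allocatable :: sendbuf(:,:)
integer :: req_a

allocate(sendbuf(ista:iend, jsta:jend))
sendbuf = a(ista:iend, jsta:jend)   ! a contiguous copy that we own, not a compiler temporary
call MPI_ISEND(sendbuf, ilen*jlen, MPI_REAL, Root, 1, MPI_COMM_WORLD, req_a, ierr)
! ... only after MPI_WAIT/MPI_WAITALL on req_a has completed:
! deallocate(sendbuf)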
Problem 3
Your MPI_Wait is in the wrong place. It must come after the sends, outside of any if conditions, so that it is always executed (provided you are always sending).
You must collect all the requests separately and then wait for all of them. It is best to keep them in an array and wait for them all at once using MPI_Waitall, as in the sketch below.
Remember that ISend typically does not actually send anything if the buffer is large; the exchange often happens during the Wait operation, at least for larger arrays.
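A minimal sketch of that request-array pattern for the five sends in the question (requests, statuses and nreq are illustrative names and their declarations belong in the specification part; sendbuf is the contiguous copy from the previous sketch):

integer :: requests(5), statuses(MPI_STATUS_SIZE,5), nreq

nreq = 0
nreq = nreq + 1; call MPI_ISEND(ista, 1, MPI_INTEGER, Root, 1, MPI_COMM_WORLD, requests(nreq), ierr)
nreq = nreq + 1; call MPI_ISEND(iend, 1, MPI_INTEGER, Root, 1, MPI_COMM_WORLD, requests(nreq), ierr)
nreq = nreq + 1; call MPI_ISEND(jsta, 1, MPI_INTEGER, Root, 1, MPI_COMM_WORLD, requests(nreq), ierr)
nreq = nreq + 1; call MPI_ISEND(jend, 1, MPI_INTEGER, Root, 1, MPI_COMM_WORLD, requests(nreq), ierr)
nreq = nreq + 1; call MPI_ISEND(sendbuf, ilen*jlen, MPI_REAL, Root, 1, MPI_COMM_WORLD, requests(nreq), ierr)

! ... the root's receive loop goes here ...

! every rank waits once, outside any if block, for all of its own sends
call MPI_WAITALL(nreq, requests, statuses, ierr)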
Recommendation:
Take a simple example problem and try to exchange just two small arrays with MPI_IRecv and MPI_ISend between two processes, as simple a test case as you can manage. Learn from it, and take small steps. Take no offence, but your current understanding of non-blocking MPI is too weak to write full-scale programs. MPI is hard; non-blocking MPI is even harder.
* Not allowed when using the interface available in MPI-2. MPI-3 brings a new interface, available via use mpi_f08, where it is possible. But learn the basics first.
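Along the lines of that recommendation, here is a small, self-contained sketch (illustrative names) that exchanges one array in each direction between exactly two processes with MPI_IRECV/MPI_ISEND and a single MPI_WAITALL:

program exchange_demo
  use mpi
  implicit none
  integer, parameter :: n = 8
  real :: outbuf(n), inbuf(n)
  integer :: requests(2), statuses(MPI_STATUS_SIZE,2)
  integer :: ierr, myrank, nprocs, other

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  other = 1 - myrank                     ! partner rank; run with exactly 2 processes

  outbuf = real(myrank)

  ! post the receive and the send, each with its own request
  call MPI_IRECV(inbuf,  n, MPI_REAL, other, 1, MPI_COMM_WORLD, requests(1), ierr)
  call MPI_ISEND(outbuf, n, MPI_REAL, other, 1, MPI_COMM_WORLD, requests(2), ierr)

  ! complete both operations before reading inbuf or reusing outbuf
  call MPI_WAITALL(2, requests, statuses, ierr)

  print *, 'rank', myrank, 'received', inbuf(1)
  call MPI_FINALIZE(ierr)
end program exchange_demo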
Hermite Interpolation woes
I am trying to find the Newton divided differences for the function and derivative values of a given set of x's. I'm running into serious problems: my code works for tiny examples but fails on bigger ones. As is clearly visible below, my answers are very much larger than the original function values.
Does anybody have any idea what I'm doing wrong?
program inter
implicit none
integer ::n,m
integer ::i
real(kind=8),allocatable ::xVals(:),fxVals(:),newtonDivDiff(:),dxVals(:),zxVals(:),zdxVals(:),zfxVals(:)
real(kind=8) ::Px
real(kind=8) ::x
Open(Unit=8,File="data/xVals")
Open(Unit=9,File="data/fxVals")
Open(Unit=10,File="data/dxVals")
n = 4 ! literal number of data pts
m = n*2+1
!after we get the data points allocate the space
allocate(xVals(0:n))
allocate(fxVals(0:n))
allocate(dxVals(0:n))
allocate(newtonDivDiff(0:n))
!allocate the zvalue arrays
allocate(zxVals(0:m))
allocate(zdxVals(0:m))
allocate(zfxVals(0:m))
!since the size is the same we can read in one loop
do i=0,n
Read(8,*) xVals(i)
Read(9,*) fxVals(i)
Read(10,*) dxVals(i)
end do
! construct the z illusion
do i=0,m,2
zxVals(i) = xVals(i/2)
zxVals(i+1) = xVals(i/2)
zdxVals(i) = dxVals(i/2)
zdxVals(i+1) = dxVals(i/2)
zfxVals(i) = fxVals(i/2)
zfxVals(i+1) = fxVals(i/2)
end do
!slightly modified business as usual
call getNewtonDivDiff(zxVals,zdxVals,zfxVals,newtonDivDiff,m)
do i=0,n
call evaluatePolynomial(m,newtonDivDiff,xVals(i),Px,zxVals)
print*, xVals(i) ,Px
end do
close(8)
close(9)
close(10)
stop
deallocate(xVals,fxVals,dxVals,newtonDivDiff,zxVals,zdxVals,zfxVals)
end program inter
subroutine getNewtonDivDiff(xVals,dxVals,fxVals,newtonDivDiff,n)
implicit none
integer ::i,k
integer, intent(in) ::n
real(kind=8), allocatable,dimension(:,:) ::table
real(kind=8),intent(in) ::xVals(0:n),dxVals(0:n),fxVals(0:n)
real(kind=8), intent(inout) ::newtonDivDiff(0:n)
allocate(table(0:n,0:n))
table = 0.0d0
do i=0,n
table(i,0) = fxVals(i)
end do
do k=1,n
do i = k,n
if( k .eq. 1 .and. mod(i,2) .eq. 1) then
table(i,k) = dxVals(i)
else
table(i,k) = (table(i,k-1) - table(i-1,k-1))/(xVals(i) - xVals(i-k))
end if
end do
end do
do i=0,n
newtonDivDiff(i) = table(i,i)
!print*, newtonDivDiff(i)
end do
deallocate(table)
end subroutine getNewtonDivDiff
subroutine evaluatePolynomial(n,newtonDivDiff,x,Px,xVals)
implicit none
integer,intent(in) ::n
real(kind=8),intent(in) ::newtonDivDiff(0:n),xVals(0:n)
real(kind=8),intent(in) ::x
real(kind=8), intent(out) ::Px
integer ::i
Px = newtonDivDiff(n)
do i=n,1,-1
Px = Px * (x- xVals(i-1)) + newtonDivDiff(i-1)
end do
end subroutine evaluatePolynomial
Values
x f(x) f'(x)
1.16, 1.2337, 2.6643
1.32, 1.6879, 2.9989
1.48, 2.1814, 3.1464
1.64, 2.6832, 3.0862
1.8, 3.1553, 2.7697
Output
1.1599999999999999 62.040113431002474
1.3200000000000001 180.40121445431600
1.4800000000000000 212.36319446149312
1.6399999999999999 228.61845650513027
1.8000000000000000 245.11610836104515
You are accessing the array newtonDivDiff out of bounds.
You first allocate it as 0:n (the main program's n), and you declare it in subroutine getNewtonDivDiff as 0:n (the subroutine's n), but you pass m (m = n*2+1) as the argument n. That means you tell the subroutine that the array has bounds 0:m, which is 0:9, while it actually only has bounds 0:4.
It is quite difficult to debug the program as it stands; I had to use valgrind. If you move your subroutines into a module and change the dummy arguments to assumed-shape arrays (:), then the bounds checking in gfortran (-fcheck=all) will catch the error.
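A sketch of that change for one of the routines (interp_mod is an illustrative module name; the logic is unchanged and only the declarations differ, so with use interp_mod in the main program and -fcheck=all the mismatched bounds are reported at run time):

module interp_mod
  implicit none
contains
  subroutine evaluatePolynomial(n, newtonDivDiff, x, Px, xVals)
    integer, intent(in)       :: n
    real(kind=8), intent(in)  :: newtonDivDiff(0:), xVals(0:)  ! assumed shape, lower bound 0
    real(kind=8), intent(in)  :: x
    real(kind=8), intent(out) :: Px
    integer :: i

    Px = newtonDivDiff(n)      ! with -fcheck=all this access is now bounds-checked
    do i = n, 1, -1
      Px = Px*(x - xVals(i-1)) + newtonDivDiff(i-1)
    end do
  end subroutine evaluatePolynomial
end module interp_mod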
Other notes:
kind=8 is ugly; 8 can mean different things for different compilers. If you want 64-bit reals, you can use kind=real64 (real64 comes from the module iso_fortran_env, Fortran 2008) or use selected_real_kind() (the Fortran 90 way of choosing a kind parameter); see the sketch after these notes.
You do not have to deallocate the local allocatable arrays in your subroutines; they are deallocated automatically when the subroutine returns.
The deallocate statement in your main program comes after the stop statement, so it will never be executed. I would just delete the stop; there is no reason to have it.
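For the kind note above, a minimal illustration of both options (dp is an illustrative name):

program kinds_demo
  use iso_fortran_env, only: real64
  implicit none
  real(real64) :: a                                  ! Fortran 2008 named constant
  integer, parameter :: dp = selected_real_kind(15)  ! Fortran 90: at least 15 significant digits
  real(dp) :: b

  a = 1.0_real64
  b = 1.0_dp
  print *, kind(a), kind(b)
end program kinds_demo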