I like to do this:
program main
implicit none
integer l
integer, allocatable, dimension(:) :: array
allocate(array(10))
array = 0
!$omp parallel do private(array)
do l = 1, 10
array(l) = l
enddo
!$omp end parallel do
print *, array
deallocate(array)
end
But I am running into error messages:
*** glibc detected *** ./a.out: munmap_chunk(): invalid pointer: 0x00007fff25d05a40 ***
According to some discussions on the Intel forums this seems to be a bug in ifort, but it should be resolved in the version I am using (11.1.073, Linux). This is a massively downscaled version of my code! Unfortunately I cannot use static arrays as a workaround.
If I put the print into the loop, I get other errors:
*** glibc detected *** ./a.out: double free or corruption (out): 0x00002b22a0c016f0 ***
I didn't get the errors you're getting, but you have an issue with privatizing the array in your OpenMP directive.
[mjswartz#666-lgn testfiles]$ vi array.f90
[mjswartz#666-lgn testfiles]$ ifort -o array array.f90 -openmp
[mjswartz#666-lgn testfiles]$ ./array
0 0 0 0 0 0
0 0 0 0
[mjswartz#666-lgn testfiles]$ vi array.f90
[mjswartz#666-lgn testfiles]$ ifort -o array array.f90 -openmp
[mjswartz#666-lgn testfiles]$ ./array
1 2 3 4 5 6
7 8 9 10
First run is with private array, second is without.
program main
implicit none
integer l
integer, allocatable, dimension(:) :: array
allocate(array(10))
!$omp parallel do
do l = 1, 10
array(l) = l
enddo
print*, array
deallocate(array)
end program main
I just ran your code with ifort and OpenMP and it spewed 0d0's; I had to kill the run manually. What is your expected output? I'm not a big fan of unnecessarily allocating arrays dynamically: you know what size your matrices will be, so just make the sizes parameters and declare them statically. I'll mess with some stuff and edit this response in a few.
OK, so here are my edits:
program main
implicit none
integer :: l, j
integer, parameter :: lmax = 15e3
integer, parameter :: jmax = 25
integer, parameter :: nk = 300
complex*16, dimension(9*nk) :: x0, xin, xout
complex*16, dimension(lmax) :: e_pump, e_probe
complex*16 :: e_pumphlp, e_probehlp
character*25 :: problemtype
real*8 :: m
! OpenMP variables
integer :: myid, nthreads, omp_get_num_threads, omp_get_thread_num
x0 = 0.0d0
problemtype = 'type1'
if (problemtype .ne. 'type1') then
write(*,*) 'Problem type not specified. Quitting'
stop
else
! Spawn a parallel region explicitly scoping all variables
!$omp parallel private(myid)
myid = omp_get_thread_num()
if (myid .eq. 0) then
nthreads = omp_get_num_threads()
write(*,*) 'Starting program with', nthreads, 'threads'
endif
!$omp do private(j,l,m,e_pumphlp,e_probehlp,e_pump,e_probe)
do j = 1, jmax - 1
do l = 1, lmax
call electricfield(0.0d0, 0.0d0, e_pumphlp, &
e_probehlp, 0.0d0)
! print *, e_pumphlp, e_probehlp
e_pump(l) = e_pumphlp
e_probe(l) = e_probehlp
print *, e_pump(l), e_probe(l)
end do
end do
!$omp end parallel
end if
end program main
Notice I removed your use of a module, since it was unnecessary: you have an external module containing a single subroutine, so just make it an external subroutine. I also changed your matrices to be statically allocated. Case statements are a fancy and expensive version of if statements; you were evaluating the case 15e3*25 times rather than once (expensive), so I moved that check outside the loops. I changed the OpenMP calls only slightly, and I gave you some output so that you know what OpenMP is actually doing.
Here is the new subroutine:
subroutine electricfield(t, tdelay, e_pump, e_probe, phase)
implicit none
real*8, intent(in) :: t, tdelay
complex*16, intent(out) :: e_pump, e_probe
real*8, optional, intent (in) :: phase
e_pump = 0.0d0
e_probe = 0.0d0
return
end subroutine electricfield
I just removed the module shell around it and changed some of your variable names. Fortran is not case sensitive, so don't torture yourself by writing in all caps and having to keep it up throughout.
I compiled this with
ifort -o diffeq diffeq.f90 electricfield.f90 -openmp
and ran with
./diffeq > output
to catch the program vomiting 0's and to see how many threads I was using:
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
Starting program with 32 threads
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
Hope this helps!
It would appear that you are running into a compiler bug associated with the implementation of OpenMP 3.0.
If you can't update your compiler, then you will need to change your approach. There are a few options - for example, you could make the allocatable arrays shared, increase their rank by one and have one thread allocate them such that the extent of the additional dimension is the number of workers in the team. All subsequent references to those arrays then need to have the subscript for that additional rank be the OpenMP thread number (+ 1, depending on what you've used for the lower bound).
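For example, a minimal (untested) sketch of that approach might look like the following, where the array stays shared, its extra dimension has one slice per thread, and the slices are combined afterwards:
program extra_rank
use omp_lib
implicit none
integer :: l, tid, nworkers
integer, allocatable :: array(:,:)
nworkers = omp_get_max_threads()
allocate(array(10, nworkers))   ! one slice of the extra dimension per thread
array = 0
!$omp parallel private(tid)
tid = omp_get_thread_num() + 1  ! +1 because the extra dimension starts at 1
!$omp do
do l = 1, 10
array(l, tid) = l               ! each thread writes only its own slice
enddo
!$omp end do
!$omp end parallel
print *, sum(array, dim=2)      ! combine the per-thread slices
deallocate(array)
end program extra_rank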
Explicit allocation of the private allocatable arrays inside the parallel construct (only) may also be an option.
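A minimal sketch of this second option might look like the following, where each thread allocates and deallocates its own private copy entirely inside the parallel construct:
program alloc_inside
implicit none
integer :: l
integer, allocatable, dimension(:) :: array
!$omp parallel private(array)
allocate(array(10))   ! the private copy is allocated by each thread itself
array = 0
!$omp do
do l = 1, 10
array(l) = l          ! each private copy holds only that thread's iterations
enddo
!$omp end do
deallocate(array)     ! deallocated before leaving the parallel construct
!$omp end parallel
end program alloc_inside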
Related
Consider the following Fortran program
program test_prg
use iso_fortran_env, only : real64
use mpi_f08
implicit none
real(real64), allocatable :: arr_send(:), arr_recv(:)
integer :: ierr
call MPI_Init(ierr)
allocate(arr_send(3), arr_recv(3))
arr_send = 1
print *, lbound(arr_recv)
call MPI_Gatherv(arr_send, size(arr_send), MPI_DOUBLE_PRECISION, arr_recv, [size(arr_send)], [0], MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
print *, lbound(arr_recv)
call MPI_Finalize(ierr)
end program
Execution of this program on 1 processor (compiled with gfortran 9.3.0 and mpich 3.3.2) prints:
1
0
So arr_recv has changed its lower bound after the call to MPI_Gatherv. If I use arr_recv(1) instead of arr_recv in the call to MPI_Gatherv, then it doesn't change. If I replace the mpi_f08 module with mpi, then using either arr_recv(1) or arr_recv doesn't change the lower bound.
Why is the lower bound changing in this program?
At this stage, I believe this is a bug in gfortran affecting the MPI Fortran 2018 bindings (e.g. use mpi_f08) and I reported it at https://gcc.gnu.org/pipermail/fortran/2020-September/055068.html.
All gfortran versions are affected (I tried 9.2.0, 10.2.0 and the latest master branch; versions 8 and earlier do not support dimension(..)).
The reproducer below can be used to evidence the issue:
MODULE FOO
INTERFACE
SUBROUTINE dummyc(x0) BIND(C, name="sync")
type(*), dimension(..) :: x0
END SUBROUTINE
END INTERFACE
contains
SUBROUTINE dummy(x0)
type(*), dimension(..) :: x0
call dummyc(x0)
END SUBROUTINE
END MODULE
PROGRAM main
USE FOO
IMPLICIT NONE
integer :: before(2), after(2)
INTEGER, parameter :: n = 1
DOUBLE PRECISION, ALLOCATABLE :: buf(:)
DOUBLE PRECISION :: buf2(n)
ALLOCATE(buf(n))
before(1) = LBOUND(buf,1)
before(2) = UBOUND(buf,1)
CALL dummy (buf)
after(1) = LBOUND(buf,1)
after(2) = UBOUND(buf,1)
if (before(1) .NE. after(1)) stop 1
if (before(2) .NE. after(2)) stop 2
before(1) = LBOUND(buf2,1)
before(2) = UBOUND(buf2,1)
CALL dummy (buf2)
after(1) = LBOUND(buf2,1)
after(2) = UBOUND(buf2,1)
if (before(1) .NE. after(1)) stop 3
if (before(2) .NE. after(2)) stop 4
END PROGRAM
FWIW, Intel ifort compiler (I tried 18.0.5) works fine with the reproducer.
This question is a bit old, but I had the same issue in the last few days with GNU Fortran (GCC) 11.2.0 when moving from use mpi to use mpi_f08 - in my case it was with MPI_Allreduce. It's clearly a bug in the MPI bindings, but one workaround is to pass the argument with its bounds written out explicitly, as var(1:end). This worked for me. In your case, you could try:
call MPI_Gatherv(arr_send(1:3), size(arr_send(1:3)), MPI_DOUBLE_PRECISION, arr_recv(1:3), [size(arr_send(1:3))], [0], MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
I have MPI ranks split up to calculate different parts of an array, and then I want to put/send those slices onto a different rank that doesn't participate in the calculation. That rank is the master of a new communicator set up to do other things with the array (averaging, IO, etc.). I got it to work with MPI_isend and MPI_irecv, and now I want to try MPI_Put.
use mpi_f08
use iso_c_binding
implicit none
integer, parameter :: n=10, gps = 18, pes=12, dpes = 6
integer :: main=pes, d=dpes
integer :: diag_master, diag_global_master
integer :: global_size, global_rank, diag_size, diag_rank
type(MPI_comm),allocatable :: diag_comm
integer :: pelist_diag
TYPE(MPI_Win) :: win
integer :: ierr, i, j
type(MPI_COMM) :: comm, mycomm
integer :: gsz, grk
integer :: lsz, lrk
integer(KIND=MPI_ADDRESS_KIND) :: local_group
logical :: local_flag
integer :: color,key
!!! THIS IS THE ARRAY
real, dimension(n,pes) :: r
!!!
logical :: on_dpes = .false.
logical,allocatable,dimension(:) :: dpes_list ! true if on dpes list
integer :: comm_manager
integer :: dmg
integer(KIND=MPI_ADDRESS_KIND) :: buff_size !< the size of a variable type
integer(kind=MPI_ADDRESS_KIND) :: displacement
integer :: disp_size
integer :: loc_base
integer, pointer :: fptr
!!!!!!!! THIS ALL WORKS BEGIN !!!!!!!!
comm=MPI_COMM_WORLD
call MPI_INIT(ierr)
call MPI_COMM_SIZE(COMM, gsz, ierr)
call MPI_COMM_RANK(COMM, grk, ierr)
allocate(dpes_list(gsz))
! write (6,*) "I am ",grk," of ",gsz
!> Find the group
call MPI_COMM_GET_ATTR(COMM,MPI_APPNUM,local_group,local_flag,ierr)
!> Split a new communicator as mycom
color = int(local_group)
key = 0
call MPI_COMM_SPLIT(COMM, color, key, mycomm, ierr)
!> Get information about the split communicators
call mpi_comm_size(mycomm,lsz,ierr)
call mpi_comm_rank(mycomm,lrk,ierr)
!> Create data on the main communicator
if (lsz == pes) then
comm_manager = main
on_dpes = .false.
r = 0.0
if (mod(lrk,2) == 0) then
c_loop: do concurrent (i=1:n)
r(i,lrk+1) = sin(real(i))+real(i)
enddo c_loop
else
r(:,lrk+1) = 10.0-dble(lrk)
endif
endif
if (lsz == dpes) then
diag_size = lsz
diag_rank = lrk
comm_manager = d
on_dpes = .true.
diag_comm = mycomm
if (lrk==0) then
dmg = grk
endif
endif
call MPI_ALLGATHER(on_dpes,1,MPI_LOGICAL, &
dpes_list,gsz,MPI_LOGICAL, MPI_COMM_WORLD, ierr)
!> Get the master of dpes
do i=1,gsz
if (dpes_list(i)) then
dmg = i-1
exit
endif
enddo
diag_master = dmg
diag_global_master = dmg
!!!!!!!! THIS ALL WORKS END !!!!!!!!
!! At this point, the ranks that participate in the calculation
!! have values in r(i,lrk+1) where lrk is their rank
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!! THIS IS WHERE THINGS GO WRONG? !!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
disp_size = storage_size(r)
buff_size = disp_size*size(r)
call c_f_pointer(c_loc(r(1,1)),fptr)
loc_base = fptr
nullify(fptr)
write (6,*) loc_base, grk
call MPI_Win_create(loc_base,buff_size,disp_size,MPI_INFO_NULL,&
mpi_comm_world,win,ierr)
call MPI_Win_Fence(0,win,ierr)
displacement = loc_base + disp_size *buff_size
! if (.not.allocated(diag_comm)) then
if (grk == 11) then
call MPI_Put(r(:,global_rank+1),size(r,1),MPI_FLOAT,&
diag_master,displacement,size(r,1), MPI_FLOAT, win ,ierr)
endif
call MPI_Win_Fence(0,win,ierr)
CALL MPI_WIN_FREE(win, ierr)
call MPI_FINALIZE(ierr)
I have ! if (.not.allocated(diag_comm)) then commented out because I tried to do this with all of the ranks that calculate r, but I got the same result.
I am compiling with mpiifort -O0 -fpe0 -init=snan,arrays -no-wrap-margin -traceback -stand f18 and running with mpirun -n 12 ./$#.x : -n 6 ./$#.x in my Makefile. The version of mpiifort I am using is:
> mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 2 for Linux*
Copyright 2003-2019, Intel Corporation.
ifort version 19.0.2.187
The output (write (6,*) loc_base, grk) is strange.
1072411986 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
0 10
0 11
2142952877 12
2142952877 13
2142952877 14
2142952877 15
2142952877 16
2142952877 17
Ranks 12-17 are the ranks that don't participate in "calculating r", but I'm not sure why c_loc(r(1,1)) is different for these ranks. It is also different for rank 0.
My actual questions are:
1) How do I calculate the displacement variable? Am I doing it correctly? Is it supposed to differ between ranks (it does in this case)?
2) Why is c_loc(r(1,1)) different for ranks 12-17? Does it have anything to do with the fact that this is an SPMD program? Why is it different for rank 0?
3) Can I do the one-way communication with all of the ranks instead of just one? When I did this the other way, I had each rank call mpi_isend and then called mpi_irecv in a loop over all of the sending ranks. Can I do something similar with MPI_Put? Should I be using MPI_Get? Something else?
4) How do I get this to work? This is just an educational example for myself, and what I actually need to do is much more complicated.
I can answer item 2, at least. You have:
call c_f_pointer(c_loc(r(1,1)),fptr)
loc_base = fptr
where loc_base is declared integer. You seem to be assuming that loc_base is some sort of address, but it is not. In Fortran, intrinsic assignment from a pointer assigns the value of the target, not the location of the target. So you're effectively doing a TRANSFER of the REAL values of r to loc_base - probably not what you want.
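For what it's worth, a minimal untested sketch (reusing the declarations from the code above) of creating the window on r directly, with the displacement treated as an offset in units of disp_size into the target's window rather than an address:
! sketch only: expose r itself as the window buffer
disp_size = storage_size(r) / 8                          ! bytes per element
buff_size = int(disp_size, MPI_ADDRESS_KIND) * size(r)   ! window size in bytes
call MPI_Win_create(r, buff_size, disp_size, MPI_INFO_NULL, &
MPI_COMM_WORLD, win, ierr)
displacement = 0_MPI_ADDRESS_KIND                        ! offset into the window, not an address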
While trying to use a derived type for blockdiagonal matrices in Fortran code, I stumbled over a surprising bug in the following piece of code with the following compilers:
GNU Fortran (SUSE Linux) 7.4.0
ifort (IFORT) 18.0.5 20180823
ifort (IFORT) 16.0.1 20151021
If I compile
gfortran -Wall -Werror --debug ifort_bug.f && valgrind ./a.out
I have no errors reported from valgrind.
If I compile
ifort -warn all,error -debug -stacktrace ifort_bug.f && valgrind ./a.out
I get a segmentation fault for ifort_18 in my code and "only" memory leaks for ifort_16.
Is this a bug in the Intel compilers, or does gfortran silently fix my bad code?
module blockdiagonal_matrices
implicit none
private
public :: t_blockdiagonal, new, delete,
& blocksizes, operator(.mult.), mult
save
integer, parameter :: dp = kind(1.d0)
type :: t_blockdiagonal
real(dp), allocatable :: block(:, :)
end type
interface new
module procedure block_new
end interface
interface delete
module procedure block_delete
end interface
interface operator (.mult.)
module procedure mult_blocks
end interface
contains
subroutine block_new(blocks, blocksizes)
type(t_blockdiagonal), intent(out) :: blocks(:)
integer, intent(in) :: blocksizes(:)
integer :: i, L
do i = 1, size(blocks)
L = blocksizes(i)
allocate(blocks(i)%block(L, L))
end do
end subroutine
subroutine block_delete(blocks)
type(t_blockdiagonal) :: blocks(:)
integer :: i
do i = 1, size(blocks)
deallocate(blocks(i)%block)
end do
end subroutine
function blocksizes(A) result(res)
type(t_blockdiagonal), intent(in) :: A(:)
integer :: res(size(A))
integer :: i
res = [(size(A(i)%block, 1), i = 1, size(A))]
end function
function mult_blocks(A, B) result(C)
type(t_blockdiagonal), intent(in) :: A(:), B(:)
type(t_blockdiagonal) :: C(size(A))
integer :: i
call new(C, blocksizes=blocksizes(A))
do i = 1, size(A)
C(i)%block = matmul(A(i)%block, B(i)%block)
end do
end function
subroutine mult(A, B, C)
type(t_blockdiagonal), intent(in) :: A(:), B(:)
type(t_blockdiagonal) :: C(:)
integer :: i
do i = 1, size(A)
C(i)%block = matmul(A(i)%block, B(i)%block)
end do
end subroutine
end module blockdiagonal_matrices
program time_blockdiagonal
use blockdiagonal_matrices
integer, parameter :: n_blocks = 2, L_block = 10**2
type(t_blockdiagonal) :: A(n_blocks), B(n_blocks), C(n_blocks)
integer :: i
integer :: start, finish, rate
call system_clock(count_rate=rate)
call new(A, blocksizes=[(L_block, i = 1, n_blocks)])
call new(B, blocksizes=[(L_block, i = 1, n_blocks)])
call new(C, blocksizes=[(L_block, i = 1, n_blocks)])
do i = 1, n_blocks
call random_number(A(i)%block)
call random_number(B(i)%block)
end do
call system_clock(start)
C = A .mult. B
call system_clock(finish)
write(6,*) 'Elapsed Time in seconds:',
& dble(finish - start) / dble(rate)
call system_clock(start)
call mult(A, B, C)
call system_clock(finish)
write(6,*) 'Elapsed Time in seconds:',
& dble(finish - start) / dble(rate)
call delete(A)
call delete(B)
call delete(C)
end program time_blockdiagonal
The ifort_18 segmentation fault is:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
a.out 00000000004134BD Unknown Unknown Unknown
libpthread-2.26.s 00007FF720EB6300 Unknown Unknown Unknown
a.out 000000000040ABB8 Unknown Unknown Unknown
a.out 000000000040B029 Unknown Unknown Unknown
a.out 0000000000407113 Unknown Unknown Unknown
a.out 0000000000402B4E Unknown Unknown Unknown
libc-2.26.so 00007FF720B0AF8A __libc_start_main Unknown Unknown
a.out 0000000000402A6A Unknown Unknown Unknown
When I went into it with gdb, it turned out that the segfault is raised upon returning from function mult_blocks.
blockdiagonal_matrices::mult_blocks (c=..., a=..., b=...) at ifort_bug.f:63
63 do i = 1, size(A)
(gdb) s
64 C(i)%block = matmul(A(i)%block, B(i)%block)
(gdb) s
s
s
65 end do
(gdb) s
64 C(i)%block = matmul(A(i)%block, B(i)%block)
(gdb) s
65 end do
(gdb) s
66 end function
(gdb) s
Program received signal SIGSEGV, Segmentation fault.
0x000000000040abc4 in do_deallocate_all ()
(gdb) q
A debugging session is active.
Even with this information I cannot find a bug in my code.
EDIT
I found a fix, although I don't understand why it works.
Compilation fix
If I use -heap-arrays for compilation it works, so at first glance it seems to run into a stack overflow. Doing ulimit -s unlimited does not solve the problem, though.
Code fix
If I explicitly allocate in the code it solves the issue.
subroutine new(blocks, blocksizes)
type(t_blockdiagonal), allocatable, intent(out) :: blocks(:)
integer, intent(in) :: blocksizes(:)
integer :: i, L
allocate(blocks(size(blocksizes)))
do i = 1, size(blocks)
L = blocksizes(i)
allocate(blocks(i)%block(L, L))
end do
end subroutine
subroutine delete(blocks)
type(t_blockdiagonal), allocatable :: blocks(:)
integer :: i
do i = 1, size(blocks)
deallocate(blocks(i)%block)
end do
deallocate(blocks)
end subroutine
function mult_blocks(A, B) result(C)
type(t_blockdiagonal), intent(in) :: A(:), B(:)
type(t_blockdiagonal), allocatable :: C(:)
integer :: i
call new(C, blocksizes=blocksizes(A))
do i = 1, size(A)
C(i)%block = matmul(A(i)%block, B(i)%block)
end do
end function
This way of writing it is IMHO actually better than before and not a "dirty hack".
It is no longer possible to call new with differing sizes of blocksizes and blocks.
Open question
Speaking in C terms, the type(t_blockdiagonal) :: blocks(n)
should be a float** blocks[n], i.e. just a vector of pointers to pointers.
The allocation of the actual blocks happened on the heap in the first version as well. Hence I don't understand the stack overflow for a vector that contains ca. 10 pointers.
Intel® Fortran Compiler - Increased stack usage of 8.0 or higher compilers causes segmentation fault (via archive.org)
According to this article, ulimit -s unlimited does not mean that the stack will be literally "unlimited":
The size of "unlimited" varies by Linux configuration, so you may need to specify a larger, specific number to ulimit (for example, 999999999). On Linux also note that many 32bit Linux distributions ship with a pthread static library (libpthread.a) that at runtime will fix the stacksize to 2093056 bytes regardless of the ulimit setting.
It is most likely that you do indeed have a stack overflow, given that -heap-arrays solves the issue.
I'm writing a subroutine that transforms a regular vector into one containing only its non-zero elements. Say vector a=(0,0,1,2,3)' (n by 1). Then the non-zero vector is c=(1,2,3), and the row index is recorded as ic=(0,0,0,1,2,3), where ic(1)=0 and ic(i+1)-ic(i) is the number of non-zero elements in the i-th row. The vector index jc=(1,1,1) has size 3 as there are 3 non-zero entries. See the sparse matrix wiki FYI: https://en.wikipedia.org/wiki/Sparse_matrix.
Despite its simplicity, I'm having trouble running the following code, named sparsem.f90:
!This subroutine converts a regular sparse matrix a into a CSR form
MODULE SPARSEM
CONTAINS
SUBROUTINE vsparse(a,c,jc,ic,counta,ierr,myid)
IMPLICIT NONE
REAL(8), INTENT(IN):: a(:)
INTEGER, INTENT(IN):: counta,myid
REAL(8), INTENT(OUT):: c(counta)
INTEGER, INTENT(OUT):: jc(counta),ic(size(a)+1)
INTEGER:: ierr,countaa,i
character(len=90):: filename
ierr=0
jc=0
c=0.0d0
ic=0
PRINT *, 'SIZE OF A IN VSPARSE', size(a),count(a>0.0d0),counta
IF (COUNT(a>0.0d0) /= counta) THEN
ierr=1
PRINT *, 'ERROR: number count of non-zero a(i,j) is not', counta
ELSE
countaa=0
ic(1)=0
DO i=1,size(a)
IF (a(i) > 0.0d0 ) THEN
countaa=countaa+1
c(countaa)=a(i)
ic(i+1)=ic(i)+1
jc(countaa)=1
IF (countaa<100) PRINT *,'checkcheckcheck', a(i), &
countaa,jc(countaa),c(countaa),jc(1:5)
ELSE
ic(i+1)=ic(i)
END IF
END DO
PRINT *, 'JCJCJCJC',jc(1:5)
END IF
IF (myid==7) THEN
WRITE(filename,'("sparsedens_dcheck",I1,".txt")') myid+1
OPEN(UNIT=212101, FILE="/home/wenya/Workspace/Model4/valuef/"//filename,ACTION='write',status='replace')
DO i=1,counta+1
IF (i<=counta) THEN
WRITE(212101,*) c(i),jc(i)
ELSE
WRITE(212101,*) 0.0D0,0
END IF
END DO
CLOSE(212101)
END IF
return
END SUBROUTINE vsparse
END MODULE SPARSEM
So the three jc print statements should give 1 1 1 1 1.... Yet starting from the second one, the result is 6750960 6750691 6750692.... The array jc has size 9,000,000, and I know the first 2,250,000 elements are 0.
To replicate this problem, here is the main program
PROGRAM MAIN
USE SPARSEM
IMPLICIT NONE
REAL(8):: dens_last(9000000)
REAL(8), ALLOCATABLE :: dens(:)
INTEGER, ALLOCATABLE :: ic(:),jc(:)
INTEGER:: i, ncount, ierr
dens_last(1:2250000)=0.0d0
dens_last(2250001:9000000)=1.0d0/6750000.0d0
ncount=count(dens_last>0.0d0)
ALLOCATE(dens(ncount), ic(9000000+1), jc(ncount))
CALL vsparse(dens_last, dens, jc, ic, ncount, ierr, 0) ! last argument is myid
DEALLOCATE(dens,ic,jc)
END PROGRAM MAIN
I am using gfortran 6.3.0 and the latest version of Open MPI on an Ubuntu 17.04 computer. Although Open MPI is not used in this example, it is used in the rest of the program. Any thoughts? Thanks!
Hermite Interpolation woes
I am trying to find the Newton divided differences for the function and derivative values at a given set of x's. I'm running into serious problems: my code works for tiny examples but fails on bigger ones. As is clearly visible, my answers are very much larger than the original function values.
Does anybody have any idea what I'm doing wrong?
program inter
implicit none
integer ::n,m
integer ::i
real(kind=8),allocatable ::xVals(:),fxVals(:),newtonDivDiff(:),dxVals(:),zxVals(:),zdxVals(:),zfxVals(:)
real(kind=8) ::Px
real(kind=8) ::x
Open(Unit=8,File="data/xVals")
Open(Unit=9,File="data/fxVals")
Open(Unit=10,File="data/dxVals")
n = 4 ! literal number of data pts
m = n*2+1
!after we get the data points allocate the space
allocate(xVals(0:n))
allocate(fxVals(0:n))
allocate(dxVals(0:n))
allocate(newtonDivDiff(0:n))
!allocate the zvalue arrays
allocate(zxVals(0:m))
allocate(zdxVals(0:m))
allocate(zfxVals(0:m))
!since the size is the same we can read in one loop
do i=0,n
Read(8,*) xVals(i)
Read(9,*) fxVals(i)
Read(10,*) dxVals(i)
end do
! contstruct the z illusion
do i=0,m,2
zxVals(i) = xVals(i/2)
zxVals(i+1) = xVals(i/2)
zdxVals(i) = dxVals(i/2)
zdxVals(i+1) = dxVals(i/2)
zfxVals(i) = fxVals(i/2)
zfxVals(i+1) = fxVals(i/2)
end do
!slightly modified business as usual
call getNewtonDivDiff(zxVals,zdxVals,zfxVals,newtonDivDiff,m)
do i=0,n
call evaluatePolynomial(m,newtonDivDiff,xVals(i),Px,zxVals)
print*, xVals(i) ,Px
end do
close(8)
close(9)
close(10)
stop
deallocate(xVals,fxVals,dxVals,newtonDivDiff,zxVals,zdxVals,zfxVals)
end program inter
subroutine getNewtonDivDiff(xVals,dxVals,fxVals,newtonDivDiff,n)
implicit none
integer ::i,k
integer, intent(in) ::n
real(kind=8), allocatable,dimension(:,:) ::table
real(kind=8),intent(in) ::xVals(0:n),dxVals(0:n),fxVals(0:n)
real(kind=8), intent(inout) ::newtonDivDiff(0:n)
allocate(table(0:n,0:n))
table = 0.0d0
do i=0,n
table(i,0) = fxVals(i)
end do
do k=1,n
do i = k,n
if( k .eq. 1 .and. mod(i,2) .eq. 1) then
table(i,k) = dxVals(i)
else
table(i,k) = (table(i,k-1) - table(i-1,k-1))/(xVals(i) - xVals(i-k))
end if
end do
end do
do i=0,n
newtonDivDiff(i) = table(i,i)
!print*, newtonDivDiff(i)
end do
deallocate(table)
end subroutine getNewtonDivDiff
subroutine evaluatePolynomial(n,newtonDivDiff,x,Px,xVals)
implicit none
integer,intent(in) ::n
real(kind=8),intent(in) ::newtonDivDiff(0:n),xVals(0:n)
real(kind=8),intent(in) ::x
real(kind=8), intent(out) ::Px
integer ::i
Px = newtonDivDiff(n)
do i=n,1,-1
Px = Px * (x- xVals(i-1)) + newtonDivDiff(i-1)
end do
end subroutine evaluatePolynomial
Values
x f(x) f'(x)
1.16, 1.2337, 2.6643
1.32, 1.6879, 2.9989
1.48, 2.1814, 3.1464
1.64, 2.6832, 3.0862
1.8, 3.1553, 2.7697
Output
1.1599999999999999 62.040113431002474
1.3200000000000001 180.40121445431600
1.4800000000000000 212.36319446149312
1.6399999999999999 228.61845650513027
1.8000000000000000 245.11610836104515
You are accessing array newtonDivDiff out of bounds.
You first allocate it as 0:n (the main program's n), then you pass it to subroutine getNewtonDivDiff as 0:n (the subroutine's n), but you pass m (m=n*2+1) to the argument n. That means you tell the subroutine that the array has bounds 0:m, which is 0:9, but it only has bounds 0:4.
It is quite difficult to debug the program as it stands; I had to use valgrind. If you move your subroutines to a module and change the dummy arguments to assumed-shape arrays (:), then the bounds checking in gfortran (-fcheck=all) will catch the error.
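As an illustration only, a sketch of that suggestion for evaluatePolynomial (the explicit n argument becomes unnecessary, since the bounds travel with the assumed-shape dummies; the same pattern applies to getNewtonDivDiff):
module poly_eval
use iso_fortran_env, only: real64
implicit none
contains
subroutine evaluatePolynomial(newtonDivDiff, xVals, x, Px)
real(real64), intent(in) :: newtonDivDiff(0:), xVals(0:)
real(real64), intent(in) :: x
real(real64), intent(out) :: Px
integer :: i, n
n = ubound(newtonDivDiff, 1)   ! bounds come from the actual argument
Px = newtonDivDiff(n)
do i = n, 1, -1
Px = Px * (x - xVals(i-1)) + newtonDivDiff(i-1)
end do
end subroutine evaluatePolynomial
end module poly_eval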
Other notes:
kind=8 is ugly; 8 can mean different things for different compilers. If you want 64-bit variables, you can use kind=real64 (real64 comes from the module iso_fortran_env in Fortran 2008) or use selected_real_kind() (the Fortran 90 way of getting a kind parameter).
You do not have to deallocate your local arrays in the subroutines; they are deallocated automatically.
Your deallocate statement in the main program comes after the stop statement, so it will never be executed. I would just delete the stop; there is no reason to have it.