FFTW fortran 90: allocatable cut in half when N>20 - fortran

Hi everyone and happy new year !
I'm trying to use the fftw library in a simple fortran 90 code (yes, an old fortran...).
This is a very simple code computing the FFT of vector in=1,2,..., N. I'm surprise by the fact that, for N<20, it works. For N >= 20, it does not work anymore. I guess I missed something important but can't figure out what... And was wondering if you could help me...
I compile my code with this command
ifort test.f90 -o test -lfftw3f
And the code is the following
program test
implicit none
include "fftw3.f"
integer, parameter :: fp =4
integer*8 :: N
double complex, allocatable, dimension (:) :: in, out, aux
integer*8 :: plan
integer*8 :: i, errflag
N=10
allocate(in(N), stat=errflag)
allocate(out(N), stat=errflag)
do i=1,N
in(i) = i
end do
call sfftw_plan_dft_1d(plan, N, in, out, -1, 0)
do i=1,N
print *, in(i)
end do
print *, "================================================"
do i=1,N
print *, out(i)
end do
call sfftw_execute_dft(plan, in, out)
call sfftw_destroy_plan(plan)
deallocate(in, out)
end program test
Surpringly (for me), the vector "in" is modified after the line
call sfftw_plan_dft_1d(plan, N, in, out, -1, 0)
Indeed, the vector is "cut in half" as soon as N>20, in the sense that:
in(i) = 0 if i < N/2
in(i) = i otherwise
However, with N =10 for example, the result seems to be good (same as the one obtained with scilab fft function).
I'm kind of lost and not totally familiar with fortran. Did I missed something important ?
Thank you so much in advance !
edit : whoups, bad copy/paste in the code...

Looking around for what your flags meant, I found this here
http://www.fftw.org/fftw3_doc/Planner-Flags.html#Planner-Flags
Important: the planner overwrites the input array during planning unless a saved plan (see Wisdom) is available for that problem, so you should initialize your input data after creating the plan. The only exceptions to this are the FFTW_ESTIMATE and FFTW_WISDOM_ONLY flags, as mentioned below.
Maybe try
call sfftw_plan_dft_1d(plan, N, in, out, -1, 0)
do i=1,N
in(i) = i
end do
Or I may be misreading something completely out of contect, but I guess it's worth a try :)

Related

Fortran manage local array memory

This is more of a best practice on Fortran code writing other than solving an error.
I have this following code sample with some large array that needs to be passed around to some subroutine for some calculation
program name
implicit none
integer, parameter:: n = 10**8
complex(kind=8) :: x(n)
integer :: i, nVal
nVal = 30
do i =1,1000
call test(x,nVal)
!-----other calculations-----!
! after every step nVal chnages, and after few step nVal converges
! e.g. `nVal` starts from 30 and converges at 14, after 10-15 steps, and stays there for rest of the loop
! once `nVal` converges the `workarray` requires much less memory than it requires at the starts
enddo
contains
subroutine test(arr,m)
integer , intent(inout) :: m
complex(kind=8), intent(inout) :: arr(n)
complex(kind=8) :: workarray(n,m) ! <-- large workspace
!----- do calculation-----------!
!--- check convergence of `m`----!
end
end program name
The internal workarray depends on a value that decreases gradually and reaches a convergence, and stays there for rest of the code. If I check the memory usage with top it shows at 27% from starts to finish. But after few steps the memory requirement should decrease too.
So, I modified the code to use allocatable workarray like this,
program name
implicit none
integer, parameter:: n = 10**8
complex(kind=8) :: x(n)
integer :: i, nVal, oldVal
complex(kind=8), allocatable :: workarray(:,:)
nVal = 30
oldVal = nVal
allocate(workarray(n,nVal))
do i =1,1000
! all calculation of the subroutine `test` brought to this main code
!--- check convergence of `nVal`----!
if(nVal /= oldVal) then
deallocate(workarray)
allocate(workarray(n,nVal))
oldVal = nVal
endif
enddo
end program name
Now, If I use top the memory usage starts at about 28% and then decreases and reaches a converged value of 19%.
Now, my question is how should I code situations like this. The allocatable option do decreases memory requirement but it also hampers the code readability a little bit and introduces code duplication in several places. On the other hand, the prior option keeps larger memory for the whole time where much less memory would suffice. So, what is preferred way of coding in this situation?
I can't help you decide which of the two methods is better; it will depend on how you (or the users of your code) value the potential tradeoff between memory use and cpu use. However, I can suggest a better version of your second method.
Rather than passing workarray in and out of test, you can keep it local to test and use the save attribute to make it persistent between procedure calls.
This would look something like
program name
implicit none
integer, parameter :: dp = selected_real_kind(15,300)
integer, parameter:: n = 10**8
complex(dp) :: x(n)
integer :: i, nVal
nVal = 30
do i =1,1000
call test(x,nVal)
enddo
contains
subroutine test(arr,m)
complex(dp), intent(inout) :: arr(:)
integer, intent(inout) :: m
! Initialise workarray to an empty array
! Avoids having to check if it is allocated each time
complex(dp), allocatable, save :: workarray(:,:) = reshape([complex(dp)::], [0, 0])
! Reallocate workarray if necessary.
if (size(workarray, 2)<m) then
deallocate(workarray)
allocate(workarray(size(arr), m))
endif
end subroutine
end program
If m is likely to increase slowly, you may also want to consider replacing allocate(workarray(size(arr), m)) with allocate(workarray(size(arr), 2*m)), such that you get c++ std::vector-style memory management.
The main downside of this approach (besides not reducing the memory use) is that you need to be more careful if you want to run parallel code which uses procedures with saved variables.

Deallocate causes program to stop without error message

I'm learning Fortran with the book Fortran 90 for scientists and engineers by Brian Hahn. In chapter 9 about arrays, pages 131/132, he gives the following code as an example of dynamic arrays
Program Chap_9_Allocatable_Array
Implicit none
! Variables
Real, dimension(:), Allocatable :: X, OldX
Real A
Integer IO, N, i
! Body of Chap_9_Allocatable_Array
Allocate( X(0) ) !Size zero to sart with?
N = 0
Open(1, File = 'Data.txt')
Do
Read(1, *, IOStat = IO) A
If (IO < 0) Exit
N = N + 1
Allocate( OldX( Size(X) ) )
OldX = X !Entire array can be assigned
Deallocate( X )
Allocate( X(N) )
X = OldX
X(N) = A
Deallocate( OldX )
End do
Print *, (X(i), i = 1, N)
End program Chap_9_Allocatable_Array
I have implemented this program in Visual Studio Community 2019 with the Intel Visual Fortran Compiler. The purpose of this program as he explains is
The following program extract shows how to use allocatable arrays, as these beasts are called, to read an unknown amount of data, which unfortunately must be supplied one item per line because of the way READ works.
I found an interesting error. The file data.txt consists of 100 random numbers, 1 per row. When I try to run it, it just seems to stall for a couple of seconds and then the console simply prints the
Press any key to continue.
prompt, without an error message. I have inserted some debug prints and determined that the program runs the do cycle between 3 to 8 times before stopping. I have not been able to determine the reason. If I then change the data.txt file to only be 3 numbers long, the program runs as intended. With the debug prints, I have pinned the error to being the
Deallocate( X )
line. If I debug the program in Visual Studio I just get the following message:
Chap_9_Allocatable_Array.exe has triggered a breakpoint.
There have been a few minor errors in the book. Just in this example, the author seems to have forgotten to declare i, which caused a compile error. However, as I'm only beggining to understand arrays, I don't know what else to try. Any ideas?

A fortran equivalent to unique

I have found many questions that turn around this issue, but none that directly answer the question:
-in fortran, what are (a) the fastest (wall clock) and (b) the most elegant (concise and clear) way to eliminate duplicates from a list of integers
There has to be a better way than my feeble attempt:
Program unique
implicit none
! find "indices", the list of unique numbers in "list"
integer( kind = 4 ) :: kx, list(10)
integer( kind = 4 ),allocatable :: indices(:)
logical :: mask(10)
!!$ list=(/3,2,5,7,3,1,4,7,3,3/)
list=(/1,(kx,kx=1,9)/)
mask(1)=.true.
do kx=10,2,-1
mask(kx)= .not.(any(list(:kx-1)==list(kx)))
end do
indices=pack([(kx,kx=1,10)],mask)
print *,indices
End Program unique
My attempt expects the list to be ordered, but it would be better if that requirement were lifted
I just couldn't help myself, so I wrote up an answer you may enjoy. The following code will return an array of unique values in ascending order for an input array of unsorted integers. Note that the output results are the actual values, not just the indices.
program unique_sort
implicit none
integer :: i = 0, min_val, max_val
integer, dimension(10) :: val, unique
integer, dimension(:), allocatable :: final
val = [ 3,2,5,7,3,1,4,7,3,3 ]
min_val = minval(val)-1
max_val = maxval(val)
do while (min_val<max_val)
i = i+1
min_val = minval(val, mask=val>min_val)
unique(i) = min_val
enddo
allocate(final(i), source=unique(1:i)) !<-- Or, just use unique(1:i)
print "(10i5:)", final
end program unique_sort
! output: 1 2 3 4 5 7
See this gist for timing comparisons between (unique_sort) above, your example (unique_indices), and the example at Rosetta Code (remove_dups) as well as a couple of variations. I'd like to test #High Performance Mark's code but haven't yet.
Run program 1,000,000 times, 100 integers 0<=N<=50
- unique_sort t~2.1 sec input: unsorted, w/duplicates output: sorted unique values
- remove_dup t~1.4 input: unsorted, w/duplicates output: unsorted unique values
- unique_indices t~1.0 input: sorted, w/duplicates output: unsorted indices for unique values
- BONUS!(Python) t~4.1 input: unsorted, w/duplicates output: sorted unique values
Bottom line: on my machine (i7 8GB laptop) unique_indices is slightly faster than remove_dups. However, remove_dups does not require the input array to be pre-sorted, and actually returns the values rather than the indices (see the gist for a modified version of unique_indices that returns the values instead, which doesn't seem to slow it down much at all).
On the other hand, unique_sort takes around twice as long, but is designed to handle unsorted input, and also returns the values in sorted order, in 8 LOC (minus the var declarations). So that seems a fair trade-off. Anywho, I'm sure unique_sort can be optimized for greater speed using some sort of masking statement, but that's for another day.
Update
The timings shown above were obtained from a test program where each subroutine was placed in a module and executed via a procedure call. However, I found a surprisingly large improvement in performance when unique_sort was placed directly in the main program, completing in only ~0.08 sec for 1 million runs. A speedup of ~25x simply by not using a procedure seems strange to me - ordinarily, I assume that the compiler optimizes the cost of procedure calls away. For example, I found no difference in performance for remove_dup or unique_indices whether they were executed via a procedure or placed directly in the main program.
After #VladimirF pointed out that I was overcomparing, I found I could vectorize my original code (remove the do loop do kx....). I have coupled the "unique" function with a mergesort algorithm loosely based on wikipedia. The guts are contained in module SortUnique
Module SortUnique
contains
Recursive Subroutine MergeSort(temp, Begin, Finish, list)
! 1st 3 arguments are input, 4th is output sorted list
implicit none
integer(kind=4),intent(inout) :: Begin,list(:),temp(:)
integer(kind=4),intent(in) :: Finish
integer(kind=4) :: Middle
if (Finish-Begin<2) then !if run size =1
return !it is sorted
else
! split longer runs into halves
Middle = (Finish+Begin)/2
! recursively sort both halves from list into temp
call MergeSort(list, Begin, Middle, temp)
call MergeSort(list, Middle, Finish, temp)
! merge sorted runs from temp into list
call Merge(temp, Begin, Middle, Finish, list)
endif
End Subroutine MergeSort
Subroutine Merge(list, Begin, Middle, Finish, temp)
implicit none
integer(kind=4),intent(inout) :: list(:),temp(:)
integer(kind=4),intent(in) ::Begin,Middle,Finish
integer(kind=4) :: kx,ky,kz
ky=Begin
kz=Middle
!! While there are elements in the left or right runs...
do kx=Begin,Finish-1
!! If left run head exists and is <= existing right run head.
if (ky.lt.Middle.and.(kz.ge.Finish.or.list(ky).le.list(kz))) then
temp(kx)=list(ky)
ky=ky+1
else
temp(kx)=list(kz)
kz = kz + 1
end if
end do
End Subroutine Merge
Function Unique(list)
!! usage sortedlist=Unique(list)
implicit none
integer(kind=4) :: strt,fin,N
integer(kind=4), intent(inout) :: list(:)
integer(kind=4), allocatable :: unique(:),work(:)
logical,allocatable :: mask(:)
! sort
work=list;strt=1;N=size(list);fin=N+1
call MergeSort(work,strt,fin,list)
! cull duplicate indices
allocate(mask(N));
mask=.false.
mask(1:N-1)=list(1:N-1)==list(2:N)
unique=pack(list,.not.mask)
End Function Unique
End Module SortUnique
Program TestUnique
use SortUnique
implicit none
! find "indices", the list of unique numbers in "list"
integer (kind=4),allocatable :: list(:),newlist(:)
integer (kind=4) :: kx,N=100000 !N even
real (kind=4) :: start,finish,myrandom
allocate(list(N))
do kx=1,N
call random_number(myrandom)
list(kx)=ifix(float(N)/2.*myrandom)
end do
call cpu_time(start)
newlist=unique(list)
call cpu_time(finish)
print *,"cull duplicates: ",finish-start
print *,"size(newlist) ",size(newlist)
End Program TestUnique
At #HighPerformanceMark 's suggestion, the function is simply invoked as newlist=unique(list). The above is certainly not concise, but it seems clear, and it is about 200 times faster than either my original or the other solutions proposed.

Passing an allocated array from a SUBTROUTINE to the main program in Fortran

There are several threads with similar titles of mine, but I do not believe they are the same. One was very similar fortran pass allocated array to main procedure, but the answer required Fortran 2008. I am after a Fortran 90/95 solution.
Another very good, and quite similar thread is Dynamic array allocation in fortran90. However in this method while they allocate in the subroutine, they don't ever appear to deallocate, which seems odd. My method looks on the surface at least to be the same, yet when I print the array in the main program, only blank spaces are printed. When I print in the subroutine itself, the array prints to screen the correct values, and the correct number of values.
In the following a MAIN program calls a subroutine. This subroutine reads data into an allocatable array, and passes the array back to the main program. I do this by using small subroutines each designed to look for specific terms in the input file. All of these subroutines are in one module file. So there are three files: Main.f90, input_read.f90 and filename.inp.
It seems then that I do not know how to pass an array that is allocatable in program Main.f90 as well as in the called subroutine where it is actually allocated, sized, and then deallocated before being passed to program Main. This perhaps sounds confusing, so here is the code for all three programs. I apologize for the poor formatting when I pasted it. I tried to separate all the rows.
main.f90:
Program main
use input_read ! the module with the subroutines used for reading filename.inp
implicit none
REAL, Allocatable :: epsilstar(:)
INTEGER :: natoms
call Obtain_LJ_Epsilon(epsilstar, natoms)
print*, 'LJ Epsilon : ', epsilstar
END Program main
Next is the module with a subroutine (I removed all but the necessary one for space), input_read.f90:
module input_read
contains
!===============================================================
!===============================================================
Subroutine Obtain_LJ_Epsilon(epsilstar,natoms)
! Reads epsilon and sigma parameters for Lennard-Jones Force-Field and also
! counts the number of types of atoms in the system
!===============================================================
!===============================================================
INTEGER :: error,line_number,natoms_eps,i
CHARACTER(120) :: string, next_line, next_next_line,dummy_char
CHARACTER(8) :: dummy_na,dummy_eps
INTEGER,intent(out) :: natoms
LOGICAL :: Proceed
real, intent(out), allocatable :: epsilstar(:)
error = 0
line_number = 0
Proceed = .true.
open(10,file='filename.inp',status='old')
!=============================================
! Find key word LJ_Epsilon
!=============================================
DO
line_number = line_number + 1
Read(10,'(A120)',iostat=error) string
IF (error .NE. 0) THEN
print*, "Error, stopping read input due to an error reading line"
exit
END IF
IF (string(1:12) == '$ LJ_epsilon') THEN
line_number = line_number + 1
exit
ELSE IF (string(1:3) == 'END' .or. line_number > 2000) THEN
print*, "Hit end of file before reading '$ LJ_epsilon' "
Proceed = .false.
exit
ENDIF
ENDDO
!========================================================
! Key word found, now determine number of parameters
! needing to be read
!========================================================
natoms_eps = -1
dummy_eps = 'iii'
do while ((dummy_eps(1:1) .ne. '$') .and. (dummy_eps(1:1) .ne. ' '))
natoms_eps = natoms_eps + 1
read(10,*) dummy_eps
enddo !we now know the number of atoms in the system (# of parameters)
close(10)
Allocate(epsilstar(natoms_eps))
epsilstar = 0.0
!============================================================
! Number of parameters found, now read their values
!============================================================
if(Proceed) then
open(11,file='filename.inp',status='old')
do i = 1,line_number-1
read(11,*) ! note it is not recording anything for this do loop
enddo
do i = 1,natoms_eps
read(11,*) dummy_char
read(dummy_char,*) epsilstar(i) ! convert string read in to real, and store in epsilstar
enddo
close(11)
PRINT*, 'LJ_epsilon: ', epsilstar ! printing to make sure it worked
endif
deallocate(epsilstar)
END Subroutine Obtain_LJ_Epsilon
end module input_read
And finally the input file: filename.inp
# Run_Type
NVT
# Run_Name
Test_Name
# Pressure
1.0
# Temperature
298.15
# Number_Species
# LJ_epsilon
117.1
117.1
117.1
# LJ_sigma
3.251
3.251
3.251
END
And again, I can't figure out how to pass the allocated epsilstar array to the main program. I have tried passing an unallocated array to the subroutine from the main.f90, allocating it inside, passing it back, and deallocating it in the main.f90, but that did not work. I have tried it as the code currently is... the code works (i.e. is bug free) but it does not pass epsilstar from the subroutine where it correctly finds it and creates an array.
It turns out that the mistake I made was in deallocating the array in the subroutine before passing it to the main program. By NOT deallocating, the array was sent back fine. Also, I do not deallocate in the main program either.

Whats wrong with my Hermite Interpolation in Fortran?

Hermite Interpolation woes
I am trying to find the Newton Dividing Differences for the function and derivative values of a given set of x's. I'm running into serious problems with my code working for tiny examples, but failing on bigger one's. As is clearly visible, my answers are very much larger than they original function values.
Does anybody have any idea what I'm doing wrong?
program inter
implicit none
integer ::n,m
integer ::i
real(kind=8),allocatable ::xVals(:),fxVals(:),newtonDivDiff(:),dxVals(:),zxVals(:),zdxVals(:),zfxVals(:)
real(kind=8) ::Px
real(kind=8) ::x
Open(Unit=8,File="data/xVals")
Open(Unit=9,File="data/fxVals")
Open(Unit=10,File="data/dxVals")
n = 4 ! literal number of data pts
m = n*2+1
!after we get the data points allocate the space
allocate(xVals(0:n))
allocate(fxVals(0:n))
allocate(dxVals(0:n))
allocate(newtonDivDiff(0:n))
!allocate the zvalue arrays
allocate(zxVals(0:m))
allocate(zdxVals(0:m))
allocate(zfxVals(0:m))
!since the size is the same we can read in one loop
do i=0,n
Read(8,*) xVals(i)
Read(9,*) fxVals(i)
Read(10,*) dxVals(i)
end do
! contstruct the z illusion
do i=0,m,2
zxVals(i) = xVals(i/2)
zxVals(i+1) = xVals(i/2)
zdxVals(i) = dxVals(i/2)
zdxVals(i+1) = dxVals(i/2)
zfxVals(i) = fxVals(i/2)
zfxVals(i+1) = fxVals(i/2)
end do
!slightly modified business as usual
call getNewtonDivDiff(zxVals,zdxVals,zfxVals,newtonDivDiff,m)
do i=0,n
call evaluatePolynomial(m,newtonDivDiff,xVals(i),Px,zxVals)
print*, xVals(i) ,Px
end do
close(8)
close(9)
close(10)
stop
deallocate(xVals,fxVals,dxVals,newtonDivDiff,zxVals,zdxVals,zfxVals)
end program inter
subroutine getNewtonDivDiff(xVals,dxVals,fxVals,newtonDivDiff,n)
implicit none
integer ::i,k
integer, intent(in) ::n
real(kind=8), allocatable,dimension(:,:) ::table
real(kind=8),intent(in) ::xVals(0:n),dxVals(0:n),fxVals(0:n)
real(kind=8), intent(inout) ::newtonDivDiff(0:n)
allocate(table(0:n,0:n))
table = 0.0d0
do i=0,n
table(i,0) = fxVals(i)
end do
do k=1,n
do i = k,n
if( k .eq. 1 .and. mod(i,2) .eq. 1) then
table(i,k) = dxVals(i)
else
table(i,k) = (table(i,k-1) - table(i-1,k-1))/(xVals(i) - xVals(i-k))
end if
end do
end do
do i=0,n
newtonDivDiff(i) = table(i,i)
!print*, newtonDivDiff(i)
end do
deallocate(table)
end subroutine getNewtonDivDiff
subroutine evaluatePolynomial(n,newtonDivDiff,x,Px,xVals)
implicit none
integer,intent(in) ::n
real(kind=8),intent(in) ::newtonDivDiff(0:n),xVals(0:n)
real(kind=8),intent(in) ::x
real(kind=8), intent(out) ::Px
integer ::i
Px = newtonDivDiff(n)
do i=n,1,-1
Px = Px * (x- xVals(i-1)) + newtonDivDiff(i-1)
end do
end subroutine evaluatePolynomial
Values
x f(x) f'(x)
1.16, 1.2337, 2.6643
1.32, 1.6879, 2.9989
1.48, 2.1814, 3.1464
1.64, 2.6832, 3.0862
1.8, 3.1553, 2.7697
Output
1.1599999999999999 62.040113431002474
1.3200000000000001 180.40121445431600
1.4800000000000000 212.36319446149312
1.6399999999999999 228.61845650513027
1.8000000000000000 245.11610836104515
You are accessing array newtonDivDiff out of bounds.
You are first allocating it as 0:n (main program's n) then you are passing to subroutine getNewtonDivDiff as 0:n (the subroutine's n) but you pass m (m=n*2+1) to the argument n. That means you tell the subroutine that the array has bounds 0:m which is 0:9, but it has only bounds 0:4.
It is quite difficult to debug the program as it stands, I had to use valgrind. If you move your subroutines to a module and change the dummy arguments to assumed shape arrays (:,:) then the bound checking in gfortran (-fcheck=all) will catch the error.
Other notes:
kind=8 is ugly, 8 can mean different things for different compilers. If you want 64bit variables, you can use kind=real64 (real64 comes from module iso_fortran_env in Fortran 2008) or use selected_real_kind() (Fortran 90 kind parameter)
You do not have to deallocate your local arrays in the subroutines, they are deallocated automatically.
Your deallocate statement in the main program is after the stop statement, it will never be executed. I would just delete the stop, there is no reason to have it.