Sign issues with Householder Transformation QR Decomposition - fortran

I am having trouble implementing the QR Decomposition using the Householder Reflection algorithm. I am getting the right numbers for the decomposition but the signs are incorrect. I can't seem to troubleshoot what's causing this issue since I can't find good pseudocode for the algorithm elsewhere on the internet. I have a feeling it has something to do with how I am choosing the sign for the tau variable, but I am following the pseudocode given to me from my textbook.
I have chosen to write this algorithm as a Fortran subroutine to prepare for an upcoming test on this material, and for now, it only computes the upper triangular matrix and not Q. Here is the output that this algorithm gives me:
-18.7616634 -9.80723381 -15.7768545 -11.0864372
0.00000000 -9.99090481 -9.33576298 -7.53412533
0.00000000 0.00000000 -5.99453259 -9.80128098
0.00000000 0.00000000 0.00000000 -0.512612820
and here is the correct output:
18.7616634 9.80723286 15.7768517 11.0864372
0.00000000 9.99090672 9.33576488 7.53412533
0.00000000 0.00000000 5.99453354 9.80128098
0.00000000 0.00000000 0.00000000 0.512614250
So it is basically just a sign issue it seems because the values themselves are otherwise correct.
! QR Decomposition with Householder reflections
! Takes a square N x N matrix A
! upper triangular matrix R is stored in A
SUBROUTINE qrdcmp(A, N)
REAL, DIMENSION(10,10) :: A, Q, I_N, uu
REAL gamma, tau, E
INTEGER N, & ! value of n for n x n matrix
I,J,K ! loop indices
E = EPSILON(E) ! machine epsilon for default real
! Identity matrix initialization
DO I=1,N
DO J=1,N
IF(I == J) THEN
I_N(I,J) = 1.0
ELSE
I_N(I,J) = 0.0
END IF
END DO
END DO
! Main loop
DO K=1,N
IF(ABS(MAXVAL(A(K:N,K))) < E .and. ABS(MINVAL(A(K:N,K))) < E) THEN
gamma = 0
ELSE
tau = 0
DO J=K,N
tau = tau + A(J,K)**2.0
END DO
tau = tau**(0.5) ! tau currently holds norm of the vector
IF(A(K,K) < 0) tau = -tau
A(K,K) = A(K,K) + tau
gamma = A(K,K)/tau
DO J=K+1,N
A(J,K) = A(J,K)/A(K,K)
END DO
A(K,K) = 1
END IF
DO I=K,N
DO J=K,N
uu(I,J) = gamma*A(I,K)*A(J,K)
END DO; END DO
Q(K:N,K:N) = -1*I_N(K:N,K:N) + uu(K:N,K:N)
A(K:N,K+1:N) = MATMUL(TRANSPOSE(Q(K:N,K:N)), A(K:N,K+1:N))
A(K,K) = -tau
END DO
END SUBROUTINE qrdcmp
Any insight would be greatly appreciated. Also, I apologize if my code is not as easily readable as it could be.

In the construction of a reflector, you have the choice of two bisectors between the current column vector and the associated basis vector. Here this is done in the line IF(A(K,K) < 0) tau = -tau, as you already identified. "Unfortunately", the stable choice that avoids cancellation issues, usually also produces a sign flip in R as you observed. For instance, the decomposition of a matrix close to I will produce a result close to Q,R = -I,-I. If you want the R factor with positive diagonal, you have to shift the signs from R to Q in an extra step after completing the decomposition.
LAPACK had a short period where the choice producing the positive diagonal was the default; this produced some strange errors and numerical stability problems where there previously were none.
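The extra step of shifting the signs from R to Q can be sketched as follows. This is only an illustration, written in Python rather than Fortran for brevity, and it assumes Q and R are both available as full matrices; the helper name make_diagonal_positive and the 2 x 2 factorization are made up for this example. It relies on A = QR = (QS)(SR) for any diagonal S with entries +1 or -1, so negating row i of R together with column i of Q leaves the product unchanged.

```python
def make_diagonal_positive(q, r):
    """Return (q2, r2) with q2*r2 == q*r and no negative diagonal entry in r2."""
    n = len(r)
    # s[i] = -1 where the diagonal entry of R is negative, +1 otherwise
    s = [-1.0 if r[i][i] < 0 else 1.0 for i in range(n)]
    q2 = [[row[i] * s[i] for i in range(n)] for row in q]   # scale columns of Q
    r2 = [[s[i] * c for c in r[i]] for i in range(n)]       # scale rows of R
    return q2, r2

def matmul(x, y):
    return [[sum(x[a][i] * y[i][b] for i in range(len(y)))
             for b in range(len(y[0]))] for a in range(len(x))]

# Hypothetical factorization of A = [[3, 1], [4, 2]] with a negative diagonal in R
q = [[-0.6, 0.8], [-0.8, -0.6]]
r = [[-5.0, -2.2], [0.0, -0.4]]
q_pos, r_pos = make_diagonal_positive(q, r)
a_check = matmul(q_pos, r_pos)   # should still reproduce A
```

In the subroutine from the question, which keeps only R, the same idea would amount to negating every row K of A whose diagonal entry is negative once the main loop has finished.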

Related

matrix diagonalization and basis change with geev

I want to diagonalize a matrix and then be able to do basis changes. The aim in the end is to do matrix exponentiation, with exp(A) = P.exp(D).P^{-1}.
I use sgeev to diagonalize A. If I am not mistaken (and I probably am since it's not working), sgeev gives me P in the vr matrix and P^{-1} is transpose(vl). The diagonal matrix can be reconstituted from the eigenvalues wr.
The problem is that when I try to verify the matrix transformation by computing P * D * P^{-1} it's not giving A back.
Here's my code:
integer :: i,n, info
real::norm
real, allocatable:: A(:,:), B(:,:), C(:,:),D(:,:)
real, allocatable:: wr(:), wi(:), vl(:, :), vr(:, :), work(:)
n=3
allocate(vr(n,n), vl(n,n), wr(n), wi(n), work(4*n))
allocate(A(n,n),B(n,n), C(n,n),D(n,n))
A(1,:)=(/1,0,1/)
A(2,:)=(/0,2,1/)
A(3,:)=(/0,3,1/)
call sgeev('V','V',n,A,n,wr,wi,vl,n,vr,n,work,size(work,1),info)
print*,'eigenvalues'
do i=1,n
print*,i,wr(i),wi(i)
enddo
D=0.0
D(1,1)=wr(1)
D(2,2)=wr(2)
D(3,3)=wr(3)
C = matmul(D,transpose(vl))
B = matmul(vr,C)
print*,'A'
do i=1, n
print*, B(i,:)
enddo
The printed result is:
eigenvalues
1 1.00000000 0.00000000
2 3.30277562 0.00000000
3 -0.302775621 0.00000000
A
0.688247263 0.160159975 0.764021933
0.00000000 1.66581571 0.817408621
0.00000000 2.45222616 0.848407149
A is not the original A, not even up to a possible scaling factor.
I guess I am somehow mistaken since I checked the eigenvectors by computing matmul(A,vr) = matmul(vr,D) and matmul(transpose(vl),A) = matmul(D, transpose(vl)), and it worked.
Where am I wrong?
The problem is that transpose(vl) is not the inverse of vr. The normalisation used by sgeev is that each eigenvector (each column of vl or vr) is individually normalised to unit length. This means that dot_product(vl(:,i), vr(:,j)) is zero if i/=j (for distinct eigenvalues), but is in general not equal to 1 if i==j.
If you want to get P^{-1}, you need to scale each column of vl by a factor of 1/dot_product(vl(:,i),vr(:,i)) before transposing it.
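To make the scaling concrete, here is a small sketch in Python (not Fortran, and not calling LAPACK). It assumes the matrix A from the question and uses unit-length eigenvectors that I worked out by hand for this particular A, to mimic what sgeev returns; the names right, left, p_inv and reconstruct are only for this illustration.

```python
import math

A = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [0.0, 3.0, 1.0]]

def normalized(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Eigenvalues of A: 1 and the two roots of lam**2 - 3*lam - 1 = 0
lams = [1.0, (3 + math.sqrt(13)) / 2, (3 - math.sqrt(13)) / 2]

# Unit-norm right and left eigenvectors (hand-derived for this A)
right = [normalized([1.0, 0.0, 0.0])] + [
    normalized([1 / (l - 1), 1 / (l - 2), 1.0]) for l in lams[1:]]
left = [normalized([3.0, -3.0, 1.0])] + [
    normalized([0.0, l - 1, 1.0]) for l in lams[1:]]

# Row i of P^{-1}: left eigenvector i divided by dot(left_i, right_i)
p_inv = [[c / dot(l, r) for c in l] for l, r in zip(left, right)]

def reconstruct(rows):
    # (P * D * rows)[a][b] = sum_i right[i][a] * lam_i * rows[i][b]
    return [[sum(right[i][a] * lams[i] * rows[i][b] for i in range(3))
             for b in range(3)] for a in range(3)]

scaled = reconstruct(p_inv)    # rescaled left eigenvectors: recovers A
unscaled = reconstruct(left)   # plain transpose(vl), as in the question
```

Here scaled agrees with A to rounding error, while unscaled[0][0] comes out as 3/sqrt(19), about 0.688, which is the same wrong entry the question printed.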

Need to find when value is == 0, but I cannot due to numerical errors

I have 2 large lists of vectors (>10,000 vectors each, say vi and wi) and I am trying to find when vi cross-product wi = 0, or, when vi x wi = 0.
The lists of vectors are previously calculated (this is Computational Fluid Dynamics and the calculated vectors represent properties of a fluid. I am doing research in Vortex Identification and this calculation is necessary).
I am trying to find when the cross product == 0 but I only get 3 results out of the thousands where the cross product is satisfied. We are trying to automate a method done by hand so we know for a fact that there are more than 3 vectors.
Our assumption is that since we are using basic numerical methods (of low orders) to calculate the vectors, there is a build up of errors.
TLDR: In essence, this does not work due to numerical errors:
real :: cross1, cross2, cross3
logical :: check1, check2, check3
logical :: is_seed
check1 = cross1 == 0.0
check2 = cross2 == 0.0
check3 = cross3 == 0.0
is_seed = check1 .and. check2 .and. check3
so, we have to do this:
real :: cross1, cross2, cross3
real :: tol
logical :: check1, check2, check3
logical :: is_seed
tol = 1.0e-4 ! NEED TO FIND OUT HOW TO CALCULATE
check1 = abs(cross1) <= tol
check2 = abs(cross2) <= tol
check3 = abs(cross3) <= tol
is_seed = check1 .and. check2 .and. check3
but I want to know how to calculate tol automatically and not hard code it. How can this be done?
Edit 1
As pointed out in the comments, the function below is entirely equivalent to the built-in function spacing(x).
Edit 2
Use the following function ulp(x) to find the value of the least significant bit in the mantissa of an IEEE 754 number x:
32-bit
elemental function ulp32(x) result(d)
real(real32), intent(in) :: x
real(real32) :: d
d = 2.0**(-floor(-log(x)/log(2e0))-24)
end function
64-bit
elemental function ulp64(x) result(d)
real(real64), intent(in) :: x
real(real64) :: d
d = 2d0**(-floor(-log(x)/log(2d0))-53)
end function
interface ulp
procedure :: ulp32, ulp64
end interface
with some results given values between 1 and 1e9
x 32bit 64bit
517.54 0.00006104 0.00000000000011369
1018.45 0.00006104 0.00000000000011369
1972.33 0.00012207 0.00000000000022737
5416.69 0.00048828 0.00000000000090949
11812.67 0.00097656 0.00000000000181899
13190.24 0.00097656 0.00000000000181899
18099.97 0.00195312 0.00000000000363798
28733.47 0.00195312 0.00000000000363798
86965.21 0.00781250 0.00000000001455192
135734.23 0.01562500 0.00000000002910383
203975.41 0.01562500 0.00000000002910383
780835.66 0.06250000 0.00000000011641532
2343924.58 0.25000000 0.00000000046566129
2552437.80 0.25000000 0.00000000046566129
6923904.28 0.50000000 0.00000000093132257
8929837.66 1.00000000 0.00000000186264515
29408286.38 2.00000000 0.00000000372529030
70054595.74 8.00000000 0.00000001490116119
231986024.46 16.00000000 0.00000002980232239
392724963.99 32.00000000 0.00000005960464478
It is recommended to pick a tol value that is a multiple of ulp, with a multiplier that is a power of two. Each additional power of two shifts the tolerance over by one bit. As a rough rule of thumb, you can expect each operation that propagates round-off errors to also enlarge the error, roughly in proportion to 2**n where n is the number of operations.
So depending on the magnitude of the values compared, the tolerance should be approximated by tol = factor * abs(x) * 2**(-24)
For example, comparing two values of x=12418.16752 and y=12418.16774 pick a tolerance with
tol = 8*ulp(15000.0)
check = abs(x-y) <= tol
I get a tol=7.8125000E-03 and the result check=.true.
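For anyone who wants to check these numbers without a Fortran compiler, here is a rough Python equivalent of the 32-bit case. It is only a sketch: the function name ulp32 is reused from above, but the body uses math.frexp instead of logarithms and assumes x is nonzero and in the normal range. In Fortran itself, spacing(x) gives the same quantity directly.

```python
import math

def ulp32(x):
    """Spacing of IEEE 754 single-precision numbers near x (x != 0, normal range)."""
    _, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 24)   # 24 significand bits in single precision

x, y = 12418.16752, 12418.16774
tol = 8 * ulp32(15000.0)             # factor 8 = 2**3, i.e. three bits of slack
check = abs(x - y) <= tol
print(tol, check)
```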
Edit 0
<Post deleted>
In the first place, you should have knowledge of the error on the vector components, otherwise no test for zero can be conclusive.
Now the absolute error on the cross product is like
(u + δu) x (v + δv) - u x v ≈ u x δv + δu x v
and in the worst case the vectors in each product are orthogonal, giving the estimate |u||δv| + |δu||v| = (|u|+|v|)δ, where δ bounds both |δu| and |δv|. So a value of |u x v| below this bound could correspond to parallel vectors.
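As a sketch of how this bound could be used as the tolerance, assuming you have some estimate delta of the absolute error in each input vector (the value 1.0e-4 below is just a placeholder, and the function names are made up for this example), written in Python for brevity:

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def may_be_parallel(u, v, delta):
    """True if |u x v| is within the worst-case error bound (|u| + |v|) * delta."""
    return norm(cross(u, v)) <= (norm(u) + norm(v)) * delta

delta = 1.0e-4   # assumed absolute error of each input vector
nearly_parallel = may_be_parallel((1.0, 2.0, 3.0), (2.00001, 4.0, 6.0), delta)
orthogonal = may_be_parallel((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), delta)
```

The point of the design is that the tolerance scales with the magnitudes of the two vectors being compared, instead of being one hard-coded number for the whole field.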
I found a solution to my problem.
First, I take the magnitude of the vector. I do this so I only have to work with one value instead of 3. This is possible since ||v|| = 0 if and only if v = 0. I save the magnitude of those vectors in a new array called cross_mag (since the vector is the result of a cross product).
Then I find the lowest value in the array that is not zero. (This is to discount outliers that may be equal to zero)
I found that when the number is written in scientific notation, the exponent of 10 will give me a power x that I can base my tolerance off of. I do this using log10( min_value ).
I then increase the power of the lowest value by 1, which increases the total tolerance directly by a factor of 10.
I use this new value as the exponent of my tol. (This can of course be scaled which I have done by a factor of 1.5).
Or:
real, dimension(:,:,:) :: cross_mag
real :: min_val, ex, tol
integer :: i, j, k, imax, jmax, kmax
! Find new "zero" that is based off of the lowest values.
! This new zero is required due to the buildup of numerical errors.
min_val = rrspacing(1.0)
do k = 1, kmax
do j = 1, jmax
do i = 1, imax
if ((cross_mag(i,j,k) < min_val) .and. (cross_mag(i,j,k) .ne. 0.0)) then
min_val = cross_mag(i,j,k)
end if
end do
end do
end do
ex = log10(abs(min_val))
ex = floor(ex)
tol = 1.5 * 10.0**(ex + 1.0)
write(*,*) 'min_val: ', min_val
write(*,*) 'tol: ', tol
I found this works plenty well for my work and gives me a reasonable number of vectors to work with. I thank you all for helping me find the rrspacing() function, which helped me create an arbitrarily large number.

Foucault Pendulum simulation

Program Foucault
IMPLICIT NONE
REAL,DIMENSION(:),ALLOCATABLE :: t, x,y
REAL,PARAMETER :: pi=3.14159265358979323846, g=9.81
REAL :: L, vitessea, lat, h, omega, beta
INTEGER :: i , zeta
zeta=1000
Allocate(x(zeta),y(zeta),t(zeta))
L=67.
lat=49.0/180.0*pi ! real division; 49/180 would be integer division and evaluate to 0
omega=sqrt(g/L)
h=0.01
Do i= 1,zeta
IF(i==1 .OR. i==2) THEN
t(1)=0.0
t(2)=0.0
x(1)=0.1
x(2)=1
y(1)=0.0
y(2)=0.0
ELSE
t(i+1)=real(i)*h
x(i+1)=(-omega**2*x(i)+2.0*((y(i)-y(i-1))/h)*latang(lat))*h**2+2.0*x(i)-x(i-1)
y(i+1)=(-omega**2*y(i)-2.0*((x(i)-x(i-1))/h)*latang(lat))*h**2+2.0*y(i)-y(i-1)
END IF
WRITE(40,*) t(i), x(i)
WRITE(60,*) t(i), y(i)
WRITE(50,*) x(i), y(i)
END DO
Contains
REAL Function latang(alpha)
REAL, INTENT(IN) :: alpha
REAL :: sol
latang=2*pi*sin(alpha)/86400
END FUNCTION
End Program Foucault
I'm trying to simulate the original Foucault pendulum in Paris. My code seems to work, but so far I have only been able to reproduce the right-hand graphic below, the "flower" evolution. I kept changing my parameters to obtain the left-hand graphic, but without success.
I took parameters of Foucault Pendulum installed in Paris with L=67, angular velocity of earth =2*pi/86400 and latitude of 49/180*pi.
My initial conditions are as written in the code. I tried a wide range of parameters, varying all of my initial conditions, the latitude and the angular velocity, but I couldn't get the desired left-hand result.
I used the Foucault differential equations below. I discretized them with the finite difference method (simpler than Runge-Kutta), replacing the second derivative by its central finite difference and the first derivative by its backward finite difference. I then built my loop by isolating x(i+1) and y(i+1) in both equations.
My code is very sensitive to parameters such as h (the time step), the earth's angular velocity and the latitude (which is normal). I tried a wide range of values, from large to small step sizes h, from low to high latitudes, different initial conditions, etc., but I could never get the left-hand graphic, which is the one I need.
What can be done to get the left one?
I was able to get the two charts by speeding up the earth's rotation 120-fold and allowing the simulation to run for 32 swings of the pendulum. Also, I noticed that Euler integration added energy to the system, making for bad results, so I switched to a standard RK4 implementation.
and here is the code I used to solve this ODE:
program FoucaultOde
implicit none
integer, parameter :: sp = kind(1.0), dp = kind(1d0)
! Constants
real, parameter :: g=9.80665, pi =3.1415926536
! Variables
real, allocatable :: y(:,:), yp(:), k0(:),k1(:),k2(:),k3(:)
real :: lat, omega, h, L, earth, period
real :: t0,x0,y0,vx0,vy0
integer :: i, zeta, f1, swings
! Code starts here
swings = 32
zeta = 400*swings
L = 67
lat = 49*pi/180
period = 24*60*60 ! period = 86400
earth = (2*pi*sin(lat)/period)*120 ! 120x multiplier for rotation
omega = sqrt(g/L)
allocate(y(5,zeta))
allocate(yp(5), k0(5),k1(5),k2(5),k3(5))
! make pendulum complete 'swings' cycles in 'zeta' steps
h = swings*2*pi/(omega*zeta)
t0 = 0
x0 = 0.5 ! Initial displacement
y0 = 0
vx0 = 0
vy0 = 0
! Initial conditions in the state vector Y
Y(:,1) = [t0,x0,y0,vx0,vy0]
do i=2, zeta
! Euler method (single step)
! Yp = ode(Y(:,i-1))
! Runge-Kutta method (four steps)
k0 = ode(Y(:,i-1))
k1 = ode(Y(:,i-1) + h/2*k0)
k2 = ode(Y(:,i-1) + h/2*k1)
k3 = ode(Y(:,i-1) + h*k2)
Yp = (k0+2*k1+2*k2+k3)/6
! Take a step
Y(:,i) = Y(:,i-1) + h*Yp
end do
open( newunit=f1, file='results.csv', status = 'replace', pad='no')
! write header
write (f1, '(a15,a,a15,a,a15,a,a15,a,a15)') 't',',', 'x',',','y',',', 'vx',',','vy'
! write rows of data, comma-separated
do i=1, zeta
write (f1, '(g,a,g,a,g,a,g,a,g)') y(1,i),',',y(2,i),',',y(3,i),',',y(4,i),',',y(5,i)
end do
close(f1)
contains
function ode(Y) result(Yp)
real, intent(in) :: Y(5)
real :: Yp(5), t,px,py,vx,vy,ax,ay
! Read state vector Y to component values
t = Y(1)
px = Y(2)
py = Y(3)
vx = Y(4)
vy = Y(5)
! Reference paper:
! http://www.legi.grenoble-inp.fr/people/Achim.Wirth/final_version.pdf
ax = -(omega**2)*px + 2*vy*earth ! (equation 53)
ay = -(omega**2)*py - 2*vx*earth ! (equation 54)
! State vector rate. Note, rate of time is always 1.0
Yp = [1.0, vx, vy, ax, ay]
end function
end program FoucaultOde
The resulting file results.csv looks like this for me (for checking)
t, x, y, vx, vy
.000000 , 5.000000 , .000000 , .000000 , .000000
.4105792E-01, 4.999383 , .1112020E-06, -.3004657E-01, .8124921E-05
.8211584E-01, 4.997533 , .8895339E-06, -.6008571E-01, .3249567E-04
.1231738 , 4.994450 , .3001796E-05, -.9011002E-01, .7310022E-04
.1642317 , 4.990134 , .7114130E-05, -.1201121 , .1299185E-03
.2052896 , 4.984587 , .1389169E-04, -.1500844 , .2029225E-03
.2463475 , 4.977810 , .2399832E-04, -.1800197 , .2920761E-03
.2874054 , 4.969805 , .3809619E-04, -.2099106 , .3973353E-03
...
from which I plotted the 2nd and 3rd columns in one chart, and the 4th and 5th for the second chart.
There is one thing that may be wrong depending on how you manage different step sizes, and an observation on the physics of the real-world example. With the initialization of the arrays, you imply an initial velocity of about 0.9/0.01=90 [m/s] in x direction away from the center. To get compatible results for different step sizes, you would need to adapt the calculation of x(2). However, in the graphs the plot starts from a point with zero velocity. This you can implement to first order by setting x(2)=x(1)=1. As the used integration method is also first order, this is sufficient.
For the second point, note that one can write the system using complex coordinates z=x+iy as
z'' = -w^2*z - 2*i*E*z', E = Omega*sin(theta)
This is a linear ODE with constant coefficients, the solution of it is
z(t) = exp(-i*E*t) * (A*cos(w1*t)+B*sin(w1*t)), w1 = sqrt(w^2+E^2)
This describes a pendulum motion of frequency w1 whose plane rotates with frequency E clockwise. The grand rotation has period T=2*pi/E, during which w1*T/(2*pi)=w1/E pendulum swings occur.
Now insert your numbers, w=sqrt(g/L)=0.383 and E=2*pi*sin(49°)/86400=5.49e-05, so that essentially w1=w. The number of pendulum cycles per full rotation is w/E=6972, so that you can expect a densely filled circle in the plot. Or a very narrow double wedge if only a few cycles are plotted. As each cycle takes 2*pi/w=16.4 [s], and the integration goes 1000 steps of step size 0.01, in the plot as it is you can expect a swing forth and part of the swing back.
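The numbers quoted here are easy to re-derive. A quick check in Python (used only as a calculator; the variable names are mine, the constants are the ones from the question):

```python
import math

g, L = 9.81, 67.0
w = math.sqrt(g / L)                                   # pendulum frequency [rad/s]
E = 2 * math.pi * math.sin(math.radians(49)) / 86400   # Omega*sin(theta) [rad/s]
w1 = math.sqrt(w**2 + E**2)                            # practically equal to w

swing_period = 2 * math.pi / w      # seconds per pendulum cycle
swings_per_rotation = w / E         # cycles per full turn of the swing plane
print(f"w = {w:.3f}, E = {E:.3e}, w/E = {swings_per_rotation:.0f}, "
      f"period = {swing_period:.1f} s")
```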
To be more realistic, set the initial velocity to zero, that is, the pendulum is taken to its start position and then let go. Also increase the time to 30 [s] to have more than one pendulum cycle in the plot.
From this we can see that the solutions converge, and with some imagination, that they converge linearly.
To get a plot like in the cited images, one needs a much smaller ratio w/E; counting the swings, it has to be around 15. Note that you cannot get this ratio anywhere on earth with a realistically scaled pendulum. So set w=pi, E=pi/16 and integrate over 15 time units using the first-order method.
This deteriorates really fast, even for the smallest step size with 40 points in a pendulum cycle.
For a better result, increase the local truncation order to the next higher by using the central difference in the first derivative approximation.
z(i+1) - 2*z(i) + z(i-1) = -w^2*z(i)*dt^2 - i*E*(z(i+1)-z(i-1))*dt
z(i+1) = ( 2*z(i) - z(i-1) - w^2*z(i)*dt^2 + i*E*z(i-1)*dt ) / (1+i*E*dt)
The division by the complex number can also be easily carried out in the real components of the trajectory,
! x(i+1)-2*x(i)+x(i-1) = h^2*(-omega**2*x(i)) + h*earth*(y(i+1)-y(i-1))
! y(i+1)-2*y(i)+y(i-1) = h^2*(-omega**2*y(i)) - h*earth*(x(i+1)-x(i-1))
t(i) = t(i-1) + h
cx = (2-(h*omega)**2)*x(i) - x(i-1) - h*earth*y(i-1)
cy = (2-(h*omega)**2)*y(i) - y(i-1) + h*earth*x(i-1)
den = 1+(h*earth)**2
x(i+1) = (cx + h*earth*cy)/den
y(i+1) = (cy - h*earth*cx)/den
Now to respect the increased order, also the initial points need to have an order of accuracy more, using again zero initial speed, this gives in the second order Taylor expansion
z(2) = z(1) - 0.5*w^2*z(1)*dt^2
All the step sizes that gave deviating and structurally deteriorating results in the first-order method now give visually identical, structurally stable results in this second-order method.
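For reference, the second-order scheme can be tried out in a few lines with complex arithmetic. This is a minimal sketch in Python rather than Fortran, assuming the scaled parameters w=pi, E=pi/16 from above and the release position z(1)=1 with zero speed; the function names simulate and exact are made up for this example. It compares the end point with the closed-form solution, where A=1 and B=i*E/w1 follow from the initial conditions.

```python
import cmath
import math

def simulate(w, E, dt, t_end):
    """Integrate z'' = -w^2 z - 2i E z' with the central-difference scheme."""
    steps = round(t_end / dt)
    z = [1.0 + 0.0j]                               # released at x = 1, y = 0
    z.append(z[0] - 0.5 * w**2 * z[0] * dt**2)     # second-order start, zero speed
    for i in range(1, steps):
        rhs = 2 * z[i] - z[i - 1] - w**2 * z[i] * dt**2 + 1j * E * z[i - 1] * dt
        z.append(rhs / (1 + 1j * E * dt))
    return z

def exact(w, E, t):
    """Closed-form solution for z(0) = 1, z'(0) = 0."""
    w1 = math.sqrt(w**2 + E**2)
    return cmath.exp(-1j * E * t) * (math.cos(w1 * t) + 1j * E / w1 * math.sin(w1 * t))

w, E, dt, t_end = math.pi, math.pi / 16, 0.01, 15.0
z = simulate(w, E, dt, t_end)
error = abs(z[-1] - exact(w, E, t_end))
print("error at t_end:", error)
```

Plotting the real part of z against the imaginary part gives the trajectory in the x-y plane.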

Evaluating the fast Fourier transform of Gaussian function in FORTRAN using FFTW3 library

I am trying to write a FORTRAN code to evaluate the fast Fourier transform of the Gaussian function f(r)=exp(-(r^2)) using FFTW3 library. As everyone knows, the Fourier transform of the Gaussian function is another Gaussian function.
I consider evaluating the Fourier-transform integral of the Gaussian function in the spherical coordinate.
Hence the resulting integral can be simplified to be integral of [r*exp(-(r^2))*sin(kr)]dr.
I wrote the following FORTRAN code to evaluate the discrete SINE transform DST which is the discrete Fourier transform DFT using a PURELY real input array. DST is performed by C_FFTW_RODFT00 existing in FFTW3, taking into account that the discrete values in position space are r=i*delta (i=1,2,...,1024), and the input array for DST is the function r*exp(-(r^2)) NOT the Gaussian. The sine function in the integral of [r*exp(-(r^2))*sin(kr)]dr resulting from the INTEGRATION over the SPHERICAL coordinates, and it is NOT the imaginary part of exp(ik.r) that appears when taking the analytic Fourier transform in general.
However, the result is not a Gaussian function in the momentum space.
Module FFTW3
use, intrinsic :: iso_c_binding
include 'fftw3.f03'
end module
program sine_FFT_transform
use FFTW3
implicit none
integer, parameter :: dp=selected_real_kind(8)
real(kind=dp), parameter :: pi=acos(-1.0_dp)
integer, parameter :: n=1024
real(kind=dp) :: delta, k
real(kind=dp) :: numerical_F_transform
integer :: i
type(C_PTR) :: my_plan
real(C_DOUBLE), dimension(1024) :: y
real(C_DOUBLE), dimension(1024) :: yy, yk
integer(C_FFTW_R2R_KIND) :: C_FFTW_RODFT00
my_plan= fftw_plan_r2r_1d(1024,y,yy,FFTW_FORWARD, FFTW_ESTIMATE)
delta=0.0125_dp
do i=1, n !inserting the input one-dimension position function
y(i)= 2*(delta)*(i-1)*exp(-((i-1)*delta)**2)
! I multiplied by 2 due to the definition of C_FFTW_RODFT00 in FFTW3
end do
call fftw_execute_r2r(my_plan, y,yy)
do i=2, n
k = (i-1)*pi/n/delta
yk(i) = 4*pi*delta*yy(i)/2 !I divide by 2 due to the definition of
!C_FFTW_RODFT00
numerical_F_transform=yk(i)/k
write(11,*) i,k,numerical_F_transform
end do
call fftw_destroy_plan(my_plan)
end program
Executing the previous code gives the following plot which is not for Gaussian function.
Can anyone help me understand what the problem is? I guess the problem is mainly due to FFTW3. Maybe I did not use it properly especially concerning the boundary conditions.
Looking at the related pages on the FFTW site (Real-to-Real Transforms, transform kinds, Real-odd DFT (DST)) and the header file for Fortran, it seems that FFTW expects FFTW_RODFT00 etc. rather than FFTW_FORWARD for specifying the kind of real-to-real transform. For example,
! my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_FORWARD, FFTW_ESTIMATE )
my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_RODFT00, FFTW_ESTIMATE )
performs the "type-I" discrete sine transform (DST-I) shown in the above page. This modification seems to fix the problem (i.e., makes the Fourier transform a Gaussian with positive values).
The following is a slightly modified version of OP's code to experiment with the above modification:
! ... only the modified part is shown...
real(dp) :: delta, k, r, fftw, num, ana
integer :: i, j, n
type(C_PTR) :: my_plan
real(C_DOUBLE), allocatable :: y(:), yy(:)
delta = 0.0125_dp ; n = 1024 ! rmax = 12.8
! delta = 0.1_dp ; n = 128 ! rmax = 12.8
! delta = 0.2_dp ; n = 64 ! rmax = 12.8
! delta = 0.4_dp ; n = 32 ! rmax = 12.8
allocate( y( n ), yy( n ) )
! my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_FORWARD, FFTW_ESTIMATE )
my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_RODFT00, FFTW_ESTIMATE )
! Loop over r-grid
do i = 1, n
r = i * delta ! (2-a)
y( i )= r * exp( -r**2 )
end do
call fftw_execute_r2r( my_plan, y, yy )
! Loop over k-grid
do i = 1, n
! Result of FFTW
k = i * pi / ((n + 1) * delta) ! (2-b)
fftw = 4 * pi * delta * yy( i ) / k / 2 ! the last 2 due to RODFT00
! Numerical result via quadrature
num = 0
do j = 1, n
r = j * delta
num = num + r * exp( -r**2 ) * sin( k * r )
enddo
num = num * 4 * pi * delta / k
! Analytical result
ana = sqrt( pi )**3 * exp( -k**2 / 4 )
! Output
write(10,*) k, fftw
write(20,*) k, num
write(30,*) k, ana
end do
Compile (with gfortran-8.2 + FFTW3.3.8 + OSX10.11):
$ gfortran -fcheck=all -Wall sine.f90 -I/usr/local/Cellar/fftw/3.3.8/include -L/usr/local/Cellar/fftw/3.3.8/lib -lfftw3
If we use FFTW_FORWARD as in the original code, we get
which has a negative lobe (where fort.10, fort.20, and fort.30 correspond to FFTW, quadrature, and analytical results). Modifying the code to use FFTW_RODFT00 changes the result as below, so the modification seems to be working (but please see below for the grid definition).
Additional notes
I have slightly modified the grid definition for r and k in my code (Lines (2-a) and (2-b)), which is found to improve the accuracy. But I'm still not sure whether the above definition matches the definition used by FFTW, so please read the manual for details...
The fftw3.f03 header file gives the interface for fftw_plan_r2r_1d
type(C_PTR) function fftw_plan_r2r_1d(n,in,out,kind,flags) bind(C, name='fftw_plan_r2r_1d')
import
integer(C_INT), value :: n
real(C_DOUBLE), dimension(*), intent(out) :: in
real(C_DOUBLE), dimension(*), intent(out) :: out
integer(C_FFTW_R2R_KIND), value :: kind
integer(C_INT), value :: flags
end function fftw_plan_r2r_1d
(Because of no Tex support, this part is very ugly...) The integral of 4 pi r^2 * exp(-r^2) * sin(kr)/(kr) from r = 0 to infinity is pi^(3/2) * exp(-k^2 / 4) (obtained from Wolfram Alpha or by noting that this is actually a 3-D Fourier transform of exp(-(x^2 + y^2 + z^2)) by exp(-i*(k1 x + k2 y + k3 z)) with k =(k1,k2,k3)). So, although a bit counter-intuitive, the result becomes a positive Gaussian.
I guess the r-grid can be chosen much coarser (e.g. delta up to 0.4), which gives almost the same accuracy as long as it covers the frequency domain of the transformed function (here exp(-r^2)).
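The grid definitions (2-a) and (2-b) can be checked without FFTW by evaluating the DST-I sum directly. The sketch below is in Python rather than Fortran and does not call FFTW at all; the function dst1 is my own direct O(n^2) implementation of the formula the FFTW manual gives for RODFT00, Y_k = 2 * sum_j X_j * sin(pi*(j+1)*(k+1)/(n+1)) with zero-based j and k, and I picked the coarse grid delta = 0.2, n = 64 to keep it fast.

```python
import math

def dst1(x):
    """DST-I in the RODFT00 convention (direct sum, zero-based indices)."""
    n = len(x)
    return [2 * sum(x[j] * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1))
                    for j in range(n)) for k in range(n)]

delta, n = 0.2, 64                                   # rmax = 12.8
r = [(j + 1) * delta for j in range(n)]              # r-grid, as in (2-a)
y = [rj * math.exp(-rj**2) for rj in r]
yy = dst1(y)

k = [(i + 1) * math.pi / ((n + 1) * delta) for i in range(n)]   # k-grid, as in (2-b)
num = [4 * math.pi * delta * yy[i] / k[i] / 2 for i in range(n)]
ana = [math.pi**1.5 * math.exp(-ki**2 / 4) for ki in k]
max_err = max(abs(a - b) for a, b in zip(num, ana))
print("largest deviation from the analytical Gaussian:", max_err)
```

With these grids sin(k*r) coincides exactly with the sine factor of the transform, which is presumably why the modified grid improves the accuracy.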
Of course there are negative components of the real part to the FFT of a limited Gaussian spectrum. You are just using the real part of the transform. So your plot is absolutely correct.
You seem to be mistaking the real part with the magnitude, which of course would not be negative. For that you would need to fftw_plan_dft_r2c_1d and then calculate the absolute values of the complex coefficients. Or you might be mistaking the Fourier transform with a limited DFT.
You might want to check here to convince yourself of the correctness of your calculation above:
http://docs.mantidproject.org/nightly/algorithms/FFT-v1.html
Please do keep in mind that the plots on the above page are shifted, so that the 0 frequency is in the middle of the spectrum.
Citing yourself, the numeric integration of [r*exp(-(r^2))*sin(kr)]dr would have negative components for all k>1 if normalised to 0 for highest frequency.
TLDR: Your plot is absolutely state of the art and in line with discrete and limited functional analysis.

Discrete Fourier Transform seems to be printing incorrect answers?

I am attempting to write a program that calculates the discrete Fourier transform of a set of given data. I've sampled a sine wave, so my set is (pi/2,pi,3*pi/2,2*pi). Here is my program:
program DFT
implicit none
integer :: k, N, x, y, j, r, l
integer, parameter :: dp = selected_real_kind(15,300)
real, allocatable,dimension(:) :: h, rst
integer, dimension(:,:), allocatable :: W
real(kind=dp) :: pi
open(unit=100, file="dft.dat",status='replace')
N = 4
allocate(h(N))
allocate(rst(N))
allocate(W(-N/2:N/2,1:N))
pi = 3.14159265359
do k=1,N
h(k) = k*(pi*0.5)
end do
do j = -N/2,N/2
do k = 1, N
W(j,k) = EXP((2.0_dp*pi*cmplx(0.0_dp,1.0_dp)*j*k)/N)
end do
end do
rst = matmul(W,h)
!print *, h, w
write(100,*) rst
end program
And this prints out the array rst as:
0.00000000 0.00000000 15.7079639 0.00000000 0.00000000
Using an online calculator, the results should be:
15.7+0j -3.14+3.14j -3.14+0j -3.14-3.14j
I'm not sure why rst is 1 entry too long either.
Can anyone spot why it's printing out 0 for 3/4 of the results? I notice that 15.7 appears in both the actual answers and my result.
Thank you
Even though the question has been answered and accepted, the program given has so many problems that I had to say...
The input given is not a sine wave, it's a linear function of time. Kind of like a 1-based ramp input.
For DFTs the indices normally are considered to go from 0:N-1, not 1:N.
For W the Nyquist frequency is represented twice, as -N/2 and N/2. Again, it would have been normal to number the rows 0:N-1. This is also why you have an extra output in your rst vector.
pi is double precision but only initialized to 12 significant figures. It's hard to tell if there's a typo in your value of pi which is why many would use 4*atan(1.0_dp) or acos(-1.0_dp).
Notice that h(N) is actually going to end up as the zero time input, which is one reason the whole world indexes DFT vectors from zero.
The expression cmplx(0.0_dp,1.0_dp) is sort of futile because the CMPLX intrinsic always returns a single precision result if the third optional KIND= argument is not present. As a complex literal, (0.0_dp,1.0_dp) would be double precision. However, you could as well use (0,1) because it's exactly representable in single precision and would be converted to double precision when it gets multiplied by the growing product on its left. Also 2.0_dp could have been represented successfully as 2 with less clutter.
The expression EXP((2.0_dp*pi*cmplx(0.0_dp,1.0_dp)*j*k)/N) is appropriate for inverse DFT, disregarding normalization. Thus I would have written the whole thing more cleanly and correctly as EXP(-2*pi*(0,1)*j*k/N). Then the output should have been directly comparable to what the online calculator printed out.
Fortran does complex numbers for you but you must declare the appropriate variables as complex. Try
complex, allocatable,dimension(:) :: rst
complex, dimension(:,:), allocatable :: W
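To see what the corrected program should print, here is the same calculation as a short Python sketch (not Fortran; the function name dft is mine). It uses the forward-transform sign exp(-2*pi*i*j*k/N), zero-based indices 0:N-1 for both the samples and the frequencies, and complex results, as suggested in the points above.

```python
import cmath
import math

def dft(h):
    """Forward DFT by direct summation, indices running over 0..N-1."""
    n = len(h)
    return [sum(h[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

# The ramp input from the question: pi/2, pi, 3*pi/2, 2*pi
h = [(k + 1) * math.pi / 2 for k in range(4)]
rst = dft(h)
for value in rst:
    print(f"{value.real:8.3f} {value.imag:+8.3f}j")
```

This gives four complex values, 5*pi, -pi+pi*i, -pi and -pi-pi*i, which agree with the online calculator's 15.7, -3.14+3.14j, -3.14 and -3.14-3.14j.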