Fortran - avoid casting in a loop - casting

This problem always seems to come to me so I would like to know a better solution than what I currently do.
I want to build an array up using loops where the iteration parameter is used in the calculation. The problem is that integers in the array index work fine, but lead to errors when used in calculations. Perhaps an example of the problem would be useful.
DO ii = 1,N
ir = real(ii,kind=dp) ! Cast this to real
DO ji=1,N
jr = real(ji,kind=dp) ! Cast this to real.
IF (abs(sqrt((ir-1)**2 + (jr-1)**2)) < Lim) THEN
A(ii,ji) = 1;
ELSEIF (abs(sqrt((ir-1)**2 + (jr-N)**2)) < Lim) THEN
A(ii,ji) = 1;
ELSEIF (abs(sqrt((ir-N)**2 + (jr-1)**2)) < Lim) THEN
A(ii,ji) = 1;
ELSEIF (abs(sqrt((ir-N)**2 + (jr-N)**2)) < Lim) THEN
A(ii,ji) = 1;
END IF
END DO
END DO
Is there a way to do this without casting to real every time through the loops? I have a feeling this is a bottleneck.

I doubt that casting is a bottleneck, but you could try actually using reals and incrementing in the loops:
ri = 1.0
do ii=1,n
rj = 1.0
do jj=1,n
..math...
rj = rj + 1.0
enddo
ri = ri + 1.0
enddo

Related

if statement to determine steady-state

My code below correctly solves a 1D heat equation for a function u(x,t). I now want to find the steady-state solution, the solution that no longer changes in time so it should satisfy u(t+1)-u(t) = 0. What is the most efficient way to find the steady-state solution? I show three different attempts below, but I'm not sure if either are actually doing what I want. The first and third have correct syntax, the second method has a syntax error due to the if statement. Each method is different due to the change in the if structure.
Method 1 :
program parabolic1
integer, parameter :: n = 10, m = 20
real, parameter :: h = 0.1, k = 0.005 !step sizes
real, dimension (0:n) :: u,v
integer:: i,j
real::pi,pi2
u(0) = 0.0; v(0) = 0.0; u(n) = 0.0; v(n) =0.0
pi = 4.0*atan(1.0)
pi2 = pi*pi
do i=1, n-1
u(i) = sin( pi*real(i)*h)
end do
do j = 1,m
do i = 1, n-1
v(i) = 0.5*(u(i-1)+u(i+1))
end do
t = real(j)*k !increment in time, now check for steady-state
!steady-state check: this checks the solutions at every space point which I don't think is correct.
do i = 1,n-1
if ( v(i) - u(i) .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
end do
do i = 1, n-1 !updating solution
u(i) = v(i)
end do
end do
end program parabolic1
Method 2 :
program parabolic1
integer, parameter :: n = 10, m = 20
real, parameter :: h = 0.1, k = 0.005 !step sizes
real, dimension (0:n) :: u,v
integer:: i,j
real::pi,pi2
u(0) = 0.0; v(0) = 0.0; u(n) = 0.0; v(n) =0.0
pi = 4.0*atan(1.0)
pi2 = pi*pi
do i=1, n-1
u(i) = sin( pi*real(i)*h)
end do
do j = 1,m
do i = 1, n-1
v(i) = 0.5*(u(i-1)+u(i+1))
end do
t = real(j)*k !increment in time, now check for steady-state
!steady-state check: (This gives an error message since the if statement doesn't have a logical scalar expression, but I want to compare the full arrays v and u as shown.
if ( v - u .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
do i = 1, n-1 !updating solution
u(i) = v(i)
end do
end do
end program parabolic1
Method 3 :
program parabolic1
integer, parameter :: n = 10, m = 20
real, parameter :: h = 0.1, k = 0.005 !step sizes
real, dimension (0:n) :: u,v
integer:: i,j
real::pi,pi2
u(0) = 0.0; v(0) = 0.0; u(n) = 0.0; v(n) =0.0
pi = 4.0*atan(1.0)
pi2 = pi*pi
do i=1, n-1
u(i) = sin( pi*real(i)*h)
end do
do j = 1,m
do i = 1, n-1
v(i) = 0.5*(u(i-1)+u(i+1))
end do
t = real(j)*k !increment in time, now check for steady-state
!steady-state check: Perhaps this is the correct expression I want to use
if( norm2(v) - norm2(u) .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
do i = 1, n-1 !updating solution
u(i) = v(i)
end do
end do
end program parabolic1
Without discussing which method to determine "closeness" is best or correct (not really being a programming problem) we can focus on what the Fortran parts of the methods are doing.
Method 1 and Method 2 are similar ideas (but broken in their execution), while Method 3 is different (and broken in another way).
Note also that in general one wants to compare the magnitude of the difference abs(v-u) rather than the (signed) difference v-u. With non-monotonic changes over iterations these are quite different.
Method 3 uses norm2(v) - norm2(u) to test whether the arrays u and v are similar. This isn't correct. Consider
norm2([1.,0.])-norm2([0.,1.])
instead of the more correct
norm2([1.,0.]-[0.,1.])
Method 2's
if ( v - u .LT. 1.0e-7 ) then
has the problem of being an invalid array expression, but the "are all points close?" can be written appropriately as
if ( ALL( v - u .LT. 1.0e-7 )) then
(You'll find other questions around here about such array reductions).
Method 1 tries something similar, but incorrectly:
do i = 1,n-1
if ( v(i) - u(i) .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
end do
This is incorrect in one big way, and one subtle way.
First, the loop is exited when the condition tests true the first time, with a message saying the steady state is reached. This is incorrect: you need all values close, while this is testing for any value close.
Second, when the condition is met, you exit. But you don't exit the time iteration loop, you exit the closeness testing loop. (exit without a construct name leaves the innermost do construct). You'll be in exactly the same situation, running again immediately after this innermost construct whether the tested condition is ever or never met (if ever met you'll get the message also). You will need to use a construct name on the time loop.
I won't show how to do that (again there are other questions here about that), because you also need to fix the test condition, by which point you'll be better off using if(all(... (corrected Method 2) without that additional do construct.
For Methods 1 and 2 you'll have something like:
if (all(v-u .lt 1e-7)) then
print *, "Converged"
exit
end if
And for Method 3:
if (norm2(v-u) .lt. 1e-7) then
print *, "Converged"
exit
end if

Fortran OpenMP code much slower than its not parallel version

I want to solve the Random Walk problem, so i wrote a fortran sequental code and now i need to parallel this code.
subroutine random_walk(walkers)
implicit none
include "omp_lib.h"
integer :: i, j, col, row, walkers,m,n,iter
real, dimension(:, :), allocatable :: matrix, res
real :: point, z
col = 12
row = 12
allocate (matrix(row, col), res(row, col))
! Read from file
open(2, file='matrix.txt')
do i = 1, row
read(2, *)(matrix(i, j), j=1,col)
end do
res = matrix
! Solve task
!$omp parallel private(i,j,m,n,point,iter)
!$omp do collapse(2)
do i= 2, 11
do j=2, 11
m = i
n = j
iter = 1
point = 0
do while (iter <= walkers)
call random_number(z)
if (z <= 0.25) m = m - 1
if (z > 0.25 .and. z <= 0.5) n = n +1
if (z > 0.5 .and. z <= 0.75) m = m +1
if (z > 0.75) n = n - 1
if (m == 1 .or. m == 12 .or. n == 1 .or. n == 12) then
point = point + matrix(m, n)
m = i
n = j
iter = iter + 1
end if
end do
point = point / walkers
res(i, j) = point
end do
end do
!$omp end do
!$omp end parallel
! Write to file
open(2, file='out_omp.txt')
do i = 1, row
write(2, *)(res(i, j), j=1,col)
end do
contains
end
So, the problem is that parallel program computes MUCH lesser than its sequential version.
Where is the mistake?(except my terrible code)
Update: for now the code is with !$omp do directives, but the result is still the same: it is much lesser than its sequential version.
Most probably, the behavior is related to the random number extraction. RANDOM_NUMBER Fortran procedure is not even guaranteed to be thread-safe but it is thread-safe at least in GNU compiler thanks to a GNU extension. But in any case the performances seem to be very bad as you note.
If you switch to a different thread-safe random number generator, the scalability of your code can be good. I used the classical ran2.f generator:
http://www-star.st-and.ac.uk/~kw25/research/montecarlo/ran2.f
modified to make it thread-safe. If I am not wrong, to do that:
in the calling unit declare and define:
integer :: iv(32), iy, idum2, idum
idum2 = 123456789 ; iv(:) = 0 ; iy = 0
in OpenMP directives add idum as private and idum2, iv, iy as firstprivate (by the way you need to add z as private too)
in the parallel section add (before do)
idum = - omp_get_thread_num()
to have different random numbers for different threads
from ran2 function remove DATA and SAVE lines e pass idum2, iv, iy as arguments:
FUNCTION ran2(idum, iv, iy, idum2)
call ran2 instead of random_number intrinsic
z = ran2(idum, iv, iy, idum2)
With walkers=100000 (GNU compiler) these are my times:
1 thread => 4.7s
2 threads => 2.4s
4 threads => 1.5s
8 threads => 0.78s
16 threads => 0.49s
Not strictly related to the question but I have to say that extracting a real number for each 4 "bit"s info you need (+1 or -1) and the usage of conditionals can be probably changed using a more efficient strategy.

Newton Raphson iteration - unable to iterate

I am not sure this question is on topic here or elsewhere (or not on topic at all anywhere).
I have inherited Fortran 90 code that does Newton Raphson interpolation where logarithm of temperature is interpolated against logarithm of pressure.
The interpolation is of the type
t = a ln(p) + b
and where a, b are defined as
a = ln(tup/tdwn)/(alogpu - alogpd)
and
b = ln T - a * ln P
Here is the test program. It is shown only for a single iteration. But the actual program runs over three FOR loops over k,j and i. In reality pthta is a 3D array(k,j,i) and thta is a 1D array (k)
program test
implicit none
integer,parameter :: dp = SELECTED_REAL_KIND(12,307)
real(kind=dp) kappa,interc,pres,dltdlp,tup,tdwn
real(kind=dp) pthta,alogp,alogpd,alogpu,thta,f,dfdp,p1
real(kind=dp) t1,resid,potdwn,potup,pdwn,pup,epsln,thta1
integer i,j,kout,n,maxit,nmax,resmax
kappa = 2./7.
epsln = 1.
potdwn = 259.39996337890625
potup = 268.41687198359159
pdwn = 100000.00000000000
pup = 92500.000000000000
alogpu = 11.43496392350051
alogpd = 11.512925464970229
thta = 260.00000000000000
alogp = 11.512925464970229
! known temperature at lower level
tdwn = potdwn * (pdwn/100000.)** kappa
! known temperature at upper level
tup = potup *(pup/100000.)** kappa
! linear change of temperature wrt lnP between different levels
dltdlp = dlog(tup/tdwn)/(alogpu-alogpd)
! ln(T) value(intercept) where Pressure is 1 Pa and assume a linear
! relationship between P and T
interc = dlog(tup) - dltdlp*alogpu
! Initial guess value for pressure
pthta = exp((dlog(thta)-interc-kappa*alogp)/(dltdlp-kappa))
n=0
1900 continue
!First guess of temperature at intermediate level
t1 = exp(dltdlp * dlog(pthta)+interc)
!Residual error when calculating Newton Raphson iteration(Pascal)
resid = pthta - 100000.*(t1/thta)**(1./kappa)
print *, dltdlp,interc,t1,resid,pthta
if (abs(resid) .gt. epsln) then
n=n+1
if (n .le. nmax) then
! First guess of potential temperature given T1 and
! pressure level guess
thta1 = t1 * (100000./pthta)**kappa
f= thta - thta1
dfdp = (kappa-dltdlp)*(100000./pthta)**kappa*exp(interc + (dltdlp -1.)*dlog(pthta))
p1 = pthta - f/dfdp
if (p1 .le. pdwn) then
if (p1 .ge. pup) then
pthta = p1
goto 1900
else
n = nmax
end if
end if
else
if (resid .gt. resmax) resmax = resid
maxit = maxit+1
goto 2100
end if
end if
2100 continue
end program test
When you run this program with real data from a data file the value of resid is the following
2.7648638933897018E-010
and it does not differ much for the entire execution. Most of the values are in the range
1E-10 and 1E-12
So given these values the following IF condition
IF (abs(resid) .gt. epsln)
never gets called and the Newton Raphson iteration never gets executed. So I looked at two ways to get this to work. One is to remove the exponential call in these two steps
pthta = exp((dlog(thta)-interc-kappa*alogp)/(dltdlp-kappa))
t1 = exp(dltdlp * dlog(pthta)+interc)
i.e. keep everything in the logarithmic space and take the exponent after the Newton Raphson iteration completes. That part does converge without a problem.
The other way I tried to make this work is to truncate
t1 = exp(dltdlp * dlog(pthta)+interc)
When I truncate it to an integer the value of resid changes dramatically from
1E-10 to 813. I do not understand how truncating that function call leads to such a large value change. Truncating that result does result to a successful completion.
So I am not sure which is the better way to proceed further.
How can I decide which would be the better way to approach this ?
From a research perspective, I'd say your first solution is likely the more appropriate approach. In a physical simulation, one should always work with the logarithm of the properties that are by-definition always positive. In the above code, these would be temperature and pressure. Strictly positive-definite physical variables often result in overflow and underflow in computation, whether you use Fortran or any other programming language, or any possible variable kind. If something can happen, it will happen.
This is true about other physical quantities as well, for example, energy (the typical energy of a Gamma-Ray-Burst is ~10^54 ergs), volume of objects in arbitrary dimensions (the volume of a 100-dimensional sphere of radius 10meters is ~ 10^100), or even probability (the likelihood function in many statistical problems can take values of ~10^{-1000} or less). Working with log-transform of positive-definite variables would enable your code to handle numbers as big as ~10^10^307 (for a double precision variable).
A few notes also regarding the Fortran syntax used in your code:
The variable RESMAX is used in your code without initialization.
When assigning values to variables, it is important to specify the kind of the literal constants appropriately, otherwise, the program results might be affected. For example, here is the output of your original code compiled with Intel Fortran Compiler 2018 in debug mode:
-0.152581477302743 7.31503025786548 259.608693509165
-3.152934473473579E-002 99474.1999921620
And here is the same code's output, but with all literal constants suffixed with the kind parameter _dp (see the revised version of your code below):
-0.152580456940175 7.31501855886952 259.608692604963
-8.731149137020111E-011 99474.2302854451
The output from the revised code in this answer is slightly different from the output of the original code in the above question.
There is no need to use .gt., .ge., .le., .lt., ..., for comparison. These are legacy FORTRAN syntax, as far as I am aware. Use instead the more attractive symbols ( < , > , <= , >= , == ) for comparison.
There is no necessity to use a GOTO statement in a Fortran program. This is again legacy FORTRAN. Frequently, simple elegant do-loops and if-blocks can replace GOTO statements, just as in the revised code below.
There is no need to use kind-specific intrinsic functions in Fortran anymore (such as dexp, dlog, ... for double precision). Almost all (and perhaps all) of Fortran intrinsic functions have generic names (exp, log, ...) in the current Fortran standard.
The following is a revision of the program in this question, that resolves all of the above obsolete syntax, as well as the problem of dealing with extremely large or small positive-definite variables (I probably went too far in log-transforming some variables that would never cause overflow or underflow, but my purpose here was to just show the logic behind log-transformation of positive-definite variables and how to deal with their arithmetics without potentially causing overflow/underflow/error_in_results).
program test
implicit none
integer,parameter :: dp = SELECTED_REAL_KIND(12,307)
real(kind=dp) kappa,interc,pres,dltdlp,tup,tdwn
real(kind=dp) pthta,alogp,alogpd,alogpu,thta,f,dfdp,p1
real(kind=dp) t1,resid,potdwn,potup,pdwn,pup,epsln,thta1
integer i,j,kout,n,maxit,nmax,resmax
real(kind=dp) :: log_resmax, log_pthta, log_t1, log_dummy, log_residAbsolute, sign_of_f
real(kind=dp) :: log_epsln, log_pdwn, log_pup, log_thta, log_thta1, log_p1, log_dfdp, log_f
logical :: residIsPositive, resmaxIsPositive, residIsBigger
log_resmax = log(log_resmax)
resmaxIsPositive = .true.
kappa = 2._dp/7._dp
epsln = 1._dp
potdwn = 259.39996337890625_dp
potup = 268.41687198359159_dp
pdwn = 100000.00000000000_dp
pup = 92500.000000000000_dp
alogpu = 11.43496392350051_dp
alogpd = 11.512925464970229_dp
thta = 260.00000000000000_dp
alogp = 11.512925464970229_dp
log_epsln = log(epsln)
log_pup = log(pup)
log_pdwn = log(pdwn)
log_thta = log(thta)
! known temperature at lower level
tdwn = potdwn * (pdwn/1.e5_dp)**kappa
! known temperature at upper level
tup = potup *(pup/1.e5_dp)** kappa
! linear change of temperature wrt lnP between different levels
dltdlp = log(tup/tdwn)/(alogpu-alogpd)
! ln(T) value(intercept) where Pressure is 1 Pa and assume a linear
! relationship between P and T
interc = log(tup) - dltdlp*alogpu
! Initial guess value for pressure
!pthta = exp( (log(thta)-interc-kappa*alogp) / (dltdlp-kappa) )
log_pthta = ( log_thta - interc - kappa*alogp ) / ( dltdlp - kappa )
n=0
MyDoLoop: do
!First guess of temperature at intermediate level
!t1 = exp(dltdlp * log(pthta)+interc)
log_t1 = dltdlp * log_pthta + interc
!Residual error when calculating Newton Raphson iteration(Pascal)
!resid = pthta - 1.e5_dp*(t1/thta)**(1._dp/kappa)
log_dummy = log(1.e5_dp) + ( log_t1 - log_thta ) / kappa
if (log_pthta>=log_dummy) then
residIsPositive = .true.
log_residAbsolute = log_pthta + log( 1._dp - exp(log_dummy-log_pthta) )
else
residIsPositive = .false.
log_residAbsolute = log_dummy + log( 1._dp - exp(log_pthta-log_dummy) )
end if
print *, "log-transformed values:"
print *, dltdlp,interc,log_t1,log_residAbsolute,log_pthta
print *, "non-log-transformed values:"
if (residIsPositive) print *, dltdlp,interc,exp(log_t1),exp(log_residAbsolute),exp(log_pthta)
if (.not.residIsPositive) print *, dltdlp,interc,exp(log_t1),-exp(log_residAbsolute),exp(log_pthta)
!if (abs(resid) > epsln) then
if ( log_residAbsolute > log_epsln ) then
n=n+1
if (n <= nmax) then
! First guess of potential temperature given T1 and
! pressure level guess
!thta1 = t1 * (1.e5_dp/pthta)**kappa
log_thta1 = log_t1 + ( log(1.e5_dp)-log_pthta ) * kappa
!f = thta - thta1
if ( log_thta>=thta1 ) then
log_f = log_thta + log( 1._dp - exp( log_thta1 - log_thta ) )
sign_of_f = 1._dp
else
log_f = log_thta + log( 1._dp - exp( log_thta - log_thta1 ) )
sign_of_f = 1._dp
end if
!dfdp = (kappa-dltdlp)*(1.e5_dp/pthta)**kappa*exp(interc + (dltdlp -1._dp)*log(pthta))
! assuming kappa-dltdlp>0 is TRUE always:
log_dfdp = log(kappa-dltdlp) + kappa*(log(1.e5_dp)-log_pthta) + interc + (dltdlp -1._dp)*log_pthta
!p1 = pthta - f/dfdp
! p1 should be, by definition, positive. Therefore:
log_dummy = log_f - log_dfdp
if (log_pthta>=log_dummy) then
log_p1 = log_pthta + log( 1._dp - sign_of_f*exp(log_dummy-log_pthta) )
else
log_p1 = log_dummy + log( 1._dp - sign_of_f*exp(log_pthta-log_dummy) )
end if
!if (p1 <= pdwn) then
if (log_p1 <= log_pdwn) then
!if (p1 >= pup) then
if (log_p1 >= log_pup) then
log_pthta = log_p1
cycle MyDoLoop
else
n = nmax
end if
end if
else
!if (resid > resmax) resmax = resid
residIsBigger = ( residIsPositive .and. resmaxIsPositive .and. log_residAbsolute>log_resmax ) .or. &
( .not.residIsPositive .and. .not.resmaxIsPositive .and. log_residAbsolute<log_resmax ) .or. &
( residIsPositive .and. .not. resmaxIsPositive )
if ( residIsBigger ) then
log_resmax = log_residAbsolute
resmaxIsPositive = residIsPositive
end if
maxit = maxit+1
end if
end if
exit MyDoLoop
end do MyDoLoop
end program test
Here is a sample output of this program, which agrees well with the output of the original code:
log-transformed values:
-0.152580456940175 7.31501855886952 5.55917546888014
-22.4565579499410 11.5076538974964
non-log-transformed values:
-0.152580456940175 7.31501855886952 259.608692604963
-1.767017293116268E-010 99474.2302854451
For comparison, here is the output from the original code:
-0.152580456940175 7.31501855886952 259.608692604963
-8.731149137020111E-011 99474.2302854451

How to compute a sum of products of array elements in Fortran?

Given A(I,J,K) with I = 1,2,3 and J = 1,2,3, I want to take the following sum
B(K) = c(1)*c(1)*A(1,1,K) + c(1)*c(2)*A(1,2,K) + c(1)*c(3)*A(1,3,K) + &
c(2)*c(1)*A(2,1,K) + c(2)*c(2)*A(2,2,K) + c(2)*c(3)*A(2,3,K) + &
c(3)*c(1)*A(3,1,K) + c(3)*c(2)*A(3,2,K) + c(3)*c(3)*A(3,3,K)
which gets cumbersome for large values of I and J, with c(I) and c(J) being constants for each I or J.
How do I write this code more efficiently? I think that a DO WHILE loop might be the answer, but I'm a beginner and can't figure out how to do it. Could someone please give me a hint?
My attempt:
DO K = 1,100
J = 1.d0
DO WHILE (J .LE. 3)
I = 1.d0
DO WHILE (I .LE. 3)
A(I,J,K) = c(I)*c(J)*A(I,J,K) ! + ???
I = I + 1.d0
END DO
END DO
END DO
Just use a do loop for J and I like you did for K. Accumulate the sum in B(K), which starts at 0.
DO K = 1,100
B(K) = 0
DO J = 1,3
DO I = 1,3
B(K) = B(K) + c(I)*c(J)*A(I,J,K)
END DO
END DO
END DO

Implementation of the Discrete Fourier Transform - FFT

I am trying to do a project in sound processing and need to put the frequencies into another domain. Now, I have tried to implement an FFT, that didn't go well. I tried to understand the z-transform, that didn't go to well either. I read up and found DFT's a lot more simple to understand, especially the algorithm. So I coded the algorithm using examples but I do not know or think the output is right. (I don't have Matlab on here, and cannot find any resources to test it) and wondered if you guys knew if I was going in the right direction. Here is my code so far:
#include <iostream>
#include <complex>
#include <vector>
using namespace std;
const double PI = 3.141592;
vector< complex<double> > DFT(vector< complex<double> >& theData)
{
// Define the Size of the read in vector
const int S = theData.size();
// Initalise new vector with size of S
vector< complex<double> > out(S, 0);
for(unsigned i=0; (i < S); i++)
{
out[i] = complex<double>(0.0, 0.0);
for(unsigned j=0; (j < S); j++)
{
out[i] += theData[j] * polar<double>(1.0, - 2 * PI * i * j / S);
}
}
return out;
}
int main(int argc, char *argv[]) {
vector< complex<double> > numbers;
numbers.push_back(102023);
numbers.push_back(102023);
numbers.push_back(102023);
numbers.push_back(102023);
vector< complex<double> > testing = DFT(numbers);
for(unsigned i=0; (i < testing.size()); i++)
{
cout << testing[i] << endl;
}
}
The inputs are:
102023 102023
102023 102023
And the result:
(408092, 0)
(-0.0666812, -0.0666812)
(1.30764e-07, -0.133362)
(0.200044, -0.200043)
Any help or advice would be great, I'm not expecting a lot, but, anything would be great. Thank you :)
#Phorce is right here. I don't think there is any reson to reinvent the wheel. However, if you want to do this so that you understand the methodology and to have the joy of coding it yourself I can provide a FORTRAN FFT code that I developed some years ago. Of course this is not C++ and will require a translation; this should not be too difficult and should enable you to learn a lot in doing so...
Below is a Radix 4 based algorithm; this radix-4 FFT recursively partitions a DFT into four quarter-length DFTs of groups of every fourth time sample. The outputs of these shorter FFTs are reused to compute many outputs, thus greatly reducing the total computational cost. The radix-4 decimation-in-frequency FFT groups every fourth output sample into shorter-length DFTs to save computations. The radix-4 FFTs require only 75% as many complex multiplies as the radix-2 FFTs. See here for more information.
!+ FILE: RADIX4.FOR
! ===================================================================
! Discription: Radix 4 is a descreet complex Fourier transform algorithim. It
! is to be supplied with two real arrays, one for real parts of function
! one for imaginary parts: It can also unscramble transformed arrays.
! Usage: calling FASTF(XREAL,XIMAG,ISIZE,ITYPE,IFAULT); we supply the
! following:
!
! XREAL - array containing real parts of transform sequence
! XIMAG - array containing imagianry parts of transformation sequence
! ISIZE - size of transform (ISIZE = 4*2*M)
! ITYPE - +1 forward transform
! -1 reverse transform
! IFAULT - 1 if error
! - 0 otherwise
! ===================================================================
!
! Forward transform computes:
! X(k) = sum_{j=0}^{isize-1} x(j)*exp(-2ijk*pi/isize)
! Backward computes:
! x(j) = (1/isize) sum_{k=0}^{isize-1} X(k)*exp(ijk*pi/isize)
!
! Forward followed by backwards will result in the origonal sequence!
!
! ===================================================================
SUBROUTINE FASTF(XREAL,XIMAG,ISIZE,ITYPE,IFAULT)
REAL*8 XREAL(*),XIMAG(*)
INTEGER MAX2,II,IPOW
PARAMETER (MAX2 = 20)
! Check for valid transform size upto 2**(max2):
IFAULT = 1
IF(ISIZE.LT.4) THEN
print*,'FFT: Error: Data array < 4 - Too small!'
return
ENDIF
II = 4
IPOW = 2
! Prepare mod 2:
1 IF((II-ISIZE).NE.0) THEN
II = II*2
IPOW = IPOW + 1
IF(IPOW.GT.MAX2) THEN
print*,'FFT: Error: FFT1!'
return
ENDIF
GOTO 1
ENDIF
! Check for correct type:
IF(IABS(ITYPE).NE.1) THEN
print*,'FFT: Error: Wrong type of transformation!'
return
ENDIF
! No entry errors - continue:
IFAULT = 0
! call FASTG to preform transformation:
CALL FASTG(XREAL,XIMAG,ISIZE,ITYPE)
! Due to Radix 4 factorisation results are not in the same order
! after transformation as they were when the data was submitted:
! We now call SCRAM, to unscramble the reults:
CALL SCRAM(XREAL,XIMAG,ISIZE,IPOW)
return
END
!-END: RADIX4.FOR
! ===============================================================
! Discription: This is the radix 4 complex descreet fast Fourier
! transform with out unscrabling. Suitable for convolutions or other
! applications that do not require unscrambling. Designed for use
! with FASTF.FOR.
!
SUBROUTINE FASTG(XREAL,XIMAG,N,ITYPE)
INTEGER N,IFACA,IFCAB,LITLA
INTEGER I0,I1,I2,I3
REAL*8 XREAL(*),XIMAG(*),BCOS,BSIN,CW1,CW2,PI
REAL*8 SW1,SW2,SW3,TEMPR,X1,X2,X3,XS0,XS1,XS2,XS3
REAL*8 Y1,Y2,Y3,YS0,YS1,YS2,YS3,Z,ZATAN,ZFLOAT,ZSIN
ZATAN(Z) = ATAN(Z)
ZFLOAT(K) = FLOAT(K) ! Real equivalent of K.
ZSIN(Z) = SIN(Z)
PI = (4.0)*ZATAN(1.0)
IFACA = N/4
! Forward transform:
IF(ITYPE.GT.0) THEN
GOTO 5
ENDIF
! If this is for an inverse transform - conjugate the data:
DO 4, K = 1,N
XIMAG(K) = -XIMAG(K)
4 CONTINUE
5 IFCAB = IFACA*4
! Proform appropriate transformations:
Z = PI/ZFLOAT(IFCAB)
BCOS = -2.0*ZSIN(Z)**2
BSIN = ZSIN(2.0*Z)
CW1 = 1.0
SW1 = 0.0
! This is the main body of radix 4 calculations:
DO 10, LITLA = 1,IFACA
DO 8, I0 = LITLA,N,IFCAB
I1 = I0 + IFACA
I2 = I1 + IFACA
I3 = I2 + IFACA
XS0 = XREAL(I0) + XREAL(I2)
XS1 = XREAL(I0) - XREAL(I2)
YS0 = XIMAG(I0) + XIMAG(I2)
YS1 = XIMAG(I0) - XIMAG(I2)
XS2 = XREAL(I1) + XREAL(I3)
XS3 = XREAL(I1) - XREAL(I3)
YS2 = XIMAG(I1) + XIMAG(I3)
YS3 = XIMAG(I1) - XIMAG(I3)
XREAL(I0) = XS0 + XS2
XIMAG(I0) = YS0 + YS2
X1 = XS1 + YS3
Y1 = YS1 - XS3
X2 = XS0 - XS2
Y2 = YS0 - YS2
X3 = XS1 - YS3
Y3 = YS1 + XS3
IF(LITLA.GT.1) THEN
GOTO 7
ENDIF
XREAL(I2) = X1
XIMAG(I2) = Y1
XREAL(I1) = X2
XIMAG(I1) = Y2
XREAL(I3) = X3
XIMAG(I3) = Y3
GOTO 8
! Now IF required - we multiply by twiddle factors:
7 XREAL(I2) = X1*CW1 + Y1*SW1
XIMAG(I2) = Y1*CW1 - X1*SW1
XREAL(I1) = X2*CW2 + Y2*SW2
XIMAG(I1) = Y2*CW2 - X2*SW2
XREAL(I3) = X3*CW3 + Y3*SW3
XIMAG(I3) = Y3*CW3 - X3*SW3
8 CONTINUE
IF(LITLA.EQ.IFACA) THEN
GOTO 10
ENDIF
! Calculate a new set of twiddle factors:
Z = CW1*BCOS - SW1*BSIN + CW1
SW1 = BCOS*SW1 + BSIN*CW1 + SW1
TEMPR = 1.5 - 0.5*(Z*Z + SW1*SW1)
CW1 = Z*TEMPR
SW1 = SW1*TEMPR
CW2 = CW1*CW1 - SW1*SW1
SW2 = 2.0*CW1*SW1
CW3 = CW1*CW2 - SW1*SW2
SW3 = CW1*SW2 + CW2*SW1
10 CONTINUE
IF(IFACA.LE.1) THEN
GOTO 14
ENDIF
! Set up tranform split for next stage:
IFACA = IFACA/4
IF(IFACA.GT.0) THEN
GOTO 5
ENDIF
! This is the calculation of a radix two-stage:
DO 13, K = 1,N,2
TEMPR = XREAL(K) + XREAL(K + 1)
XREAL(K + 1) = XREAL(K) - XREAL(K + 1)
XREAL(K) = TEMPR
TEMPR = XIMAG(K) + XIMAG(K + 1)
XIMAG(K + 1) = XIMAG(K) - XIMAG(K + 1)
XIMAG(K) = TEMPR
13 CONTINUE
14 IF(ITYPE.GT.0) THEN
GOTO 17
ENDIF
! For the inverse case, cojugate and scale the transform:
Z = 1.0/ZFLOAT(N)
DO 16, K = 1,N
XIMAG(K) = -XIMAG(K)*Z
XREAL(K) = XREAL(K)*Z
16 CONTINUE
17 return
END
! ----------------------------------------------------------
!-END of subroutine FASTG.FOR.
! ----------------------------------------------------------
!+ FILE: SCRAM.FOR
! ==========================================================
! Discription: Subroutine for unscrambiling FFT data:
! ==========================================================
SUBROUTINE SCRAM(XREAL,XIMAG,N,IPOW)
INTEGER L(19),II,J1,J2,J3,J4,J5,J6,J7,J8,J9,J10,J11,J12
INTEGER J13,J14,J15,J16,J17,J18,J19,J20,ITOP,I
REAL*8 XREAL(*),XIMAG(*),TEMPR
EQUIVALENCE (L1,L(1)),(L2,L(2)),(L3,L(3)),(L4,L(4))
EQUIVALENCE (L5,L(5)),(L6,L(6)),(L7,L(7)),(L8,L(8))
EQUIVALENCE (L9,L(9)),(L10,L(10)),(L11,L(11)),(L12,L(12))
EQUIVALENCE (L13,L(13)),(L14,L(14)),(L15,L(15)),(L16,L(16))
EQUIVALENCE (L17,L(17)),(L18,L(18)),(L19,L(19))
II = 1
ITOP = 2**(IPOW - 1)
I = 20 - IPOW
DO 5, K = 1,I
L(K) = II
5 CONTINUE
L0 = II
I = I + 1
DO 6, K = I,19
II = II*2
L(K) = II
6 CONTINUE
II = 0
DO 9, J1 = 1,L1,L0
DO 9, J2 = J1,L2,L1
DO 9, J3 = J2,L3,L2
DO 9, J4 = J3,L4,L3
DO 9, J5 = J4,L5,L4
DO 9, J6 = J5,L6,L5
DO 9, J7 = J6,L7,L6
DO 9, J8 = J7,L8,L7
DO 9, J9 = J8,L9,L8
DO 9, J10 = J9,L10,L9
DO 9, J11 = J10,L11,L10
DO 9, J12 = J11,L12,L11
DO 9, J13 = J12,L13,L12
DO 9, J14 = J13,L14,L13
DO 9, J15 = J14,L15,L14
DO 9, J16 = J15,L16,L15
DO 9, J17 = J16,L17,L16
DO 9, J18 = J17,L18,L17
DO 9, J19 = J18,L19,L18
J20 = J19
DO 9, I = 1,2
II = II +1
IF(II.GE.J20) THEN
GOTO 8
ENDIF
! J20 is the bit reverse of II!
! Pairwise exchange:
TEMPR = XREAL(II)
XREAL(II) = XREAL(J20)
XREAL(J20) = TEMPR
TEMPR = XIMAG(II)
XIMAG(II) = XIMAG(J20)
XIMAG(J20) = TEMPR
8 J20 = J20 + ITOP
9 CONTINUE
return
END
! -------------------------------------------------------------------
!-END:
! -------------------------------------------------------------------
Going through this and understanding it will take time! I wrote this using a CalTech paper I found years ago, I cannot recall the reference I am afraid. Good luck.
I hope this helps.
Your code works.
I would give more digits for PI ( 3.1415926535898 ).
Also, you have to devide the output of the DFT summation by S, the DFT size.
Since the input series in your test is constant, the DFT output should have only one non-zero coefficient.
And indeed all the output coefficients are very small relative to the first one.
But for a large input length, this is not an efficient way of implementing the DFT.
If timing is a concern, look into the Fast Fourrier Transform for faster methods to calculate the DFT.
Your code looks right to me. I'm not sure what you were expecting for output but, given that your input is a constant value, the DFT of a constant is a DC term in bin 0 and zeroes in the remaining bins (or a close equivalent, which you have).
You might try testing you code with a longer sequence containing some type of waveform like a sine wave or a square wave. In general, however, you should consider using something like fftw in production code. Its been wrung out and highly optimized by many people for a long time. FFTs are optimized DFTs for special cases (e.g., lengths that are powers of 2).
Your code looks okey. out[0] should represent the "DC" component of your input waveform. In your case, it is 4 times bigger than the input waveform, because your normalization coefficient is 1.
The other coefficients should represent the amplitude and phase of your input waveform. The coefficients are mirrored, i.e., out[i] == out[N-i]. You can test this with the following code:
double frequency = 1; /* use other values like 2, 3, 4 etc. */
for (int i = 0; i < 16; i++)
numbers.push_back(sin((double)i / 16 * frequency * 2 * PI));
For frequency = 1, this gives:
(6.53592e-07,0)
(6.53592e-07,-8)
(6.53592e-07,1.75661e-07)
(6.53591e-07,2.70728e-07)
(6.5359e-07,3.75466e-07)
(6.5359e-07,4.95006e-07)
(6.53588e-07,6.36767e-07)
(6.53587e-07,8.12183e-07)
(6.53584e-07,1.04006e-06)
(6.53581e-07,1.35364e-06)
(6.53576e-07,1.81691e-06)
(6.53568e-07,2.56792e-06)
(6.53553e-07,3.95615e-06)
(6.53519e-07,7.1238e-06)
(6.53402e-07,1.82855e-05)
(-8.30058e-05,7.99999)
which seems correct to me: negligible DC, amplitude 8 for 1st harmonics, negligible amplitudes for other harmonics.
MoonKnight has already provided a radix-4 Decimation In Frequency Cooley-Tukey scheme in Fortran. I'm below providing a radix-2 Decimation In Frequency Cooley-Tukey scheme in Matlab.
The code is an iterative one and considers the scheme in the following figure:
A recursive approach is also possible.
As you will see, the implementation calculates also the number of performed multiplications and additions and compares it with the theoretical calculations reported in How many FLOPS for FFT?.
The code is obviously much slower than the highly optimized FFTW exploited by Matlab.
Note also that the twiddle factors omegaa^((2^(p - 1) * n)) can be calculated off-line and then restored from a lookup table, but this point is skipped in the code below.
For a Matlab implementation of an iterative radix-2 Decimation In Time Cooley-Tukey scheme, please see Implementing a Fast Fourier Transform for Option Pricing.
% --- Radix-2 Decimation In Frequency - Iterative approach
clear all
close all
clc
N = 32;
x = randn(1, N);
xoriginal = x;
xhat = zeros(1, N);
numStages = log2(N);
omegaa = exp(-1i * 2 * pi / N);
mulCount = 0;
sumCount = 0;
tic
M = N / 2;
for p = 1 : numStages;
for index = 0 : (N / (2^(p - 1))) : (N - 1);
for n = 0 : M - 1;
a = x(n + index + 1) + x(n + index + M + 1);
b = (x(n + index + 1) - x(n + index + M + 1)) .* omegaa^((2^(p - 1) * n));
x(n + 1 + index) = a;
x(n + M + 1 + index) = b;
mulCount = mulCount + 4;
sumCount = sumCount + 6;
end;
end;
M = M / 2;
end
xhat = bitrevorder(x);
timeCooleyTukey = toc;
tic
xhatcheck = fft(xoriginal);
timeFFTW = toc;
rms = 100 * sqrt(sum(sum(abs(xhat - xhatcheck).^2)) / sum(sum(abs(xhat).^2)));
fprintf('Time Cooley-Tukey = %f; \t Time FFTW = %f\n\n', timeCooleyTukey, timeFFTW);
fprintf('Theoretical multiplications count \t = %i; \t Actual multiplications count \t = %i\n', ...
2 * N * log2(N), mulCount);
fprintf('Theoretical additions count \t\t = %i; \t Actual additions count \t\t = %i\n\n', ...
3 * N * log2(N), sumCount);
fprintf('Root mean square with FFTW implementation = %.10e\n', rms);
Your code is correct to obtain the DFT.
The function you are testing is (sin ((double) i / points * frequency * 2) which corresponds to a synoid of amplitude 1, frequency 1 and sampling frequency Fs = number of points taken.
Operating with the obtained data we have:
As you can see, the DFT coefficients are symmetric with respect to the position coefficient N / 2, so only the first N / 2 provide information. The amplitude obtained by means of the module of the real and imaginary part must be divided by N and multiplied by 2 to reconstruct it. The frequencies of the coefficients will be multiples of Fs / N by the coefficient number.
If we introduce two sinusoids, one of frequency 2 and amplitude 1.3 and another of frequency 3 and amplitude 1.7.
for (int i = 0; i < 16; i++)
{
numbers.push_back(1.3 *sin((double)i / 16 * frequency1 * 2 * PI)+ 1.7 *
sin((double)i / 16 * frequency2 * 2 * PI));
}
The obtained data are:
Good luck.