How to compute the magnitude of each complex number in an array? - fortran

I'm attempting to test a program that calculates the discrete Fourier transform of a signal, namely a sine wave. To test it, I need to plot my results. However, the result is an array of size N (currently at 400) and is filled with complex numbers of the form z = x + iy. So I know that to test it I need to plot these results, and that to do this I need to plot |z|. Here's my program:
program DFT
implicit none
integer :: k, N, x, y, j, r, l, istat, p
integer, parameter :: dp = selected_real_kind(15,300)
real, allocatable,dimension(:) :: h
complex, allocatable, dimension(:) :: rst
complex, dimension(:,:), allocatable :: W
real(kind=dp) :: pi
p = 2*pi
!open file to write results to
open(unit=100, file="dft.dat", status='replace')
N = 400
!allocate arrays as length N, apart from W (NxN)
allocate(h(N))
allocate(rst(N))
allocate(W(-N/2:N/2,1:N))
pi = 3.14159265359
!loop to create the sample containing array
do k=1,N
h(k) = sin((2*pi*k)/N)
end do
!loop to fill the product matrix with values
do j = -N/2,N/2
do k = 1, N
W(j,k) = EXP((2.0_dp*pi*cmplx(0.0_dp,1.0_dp)*j*k)/N)
end do
end do
!use of matmul command to multiply matrices
rst = matmul(W,h)
print *, h, w
write(100,*) rst
end program
So my question is how do I take the magnitude of all the individual complex numbers in the array?

The ABS intrinsic function returns the magnitude of a complex number in Fortran. It is an elemental function as well, so for an array of type complex simply ABS( array ) will return a real array with the same kind as the original containing the results you want.

Related

Computing the Jacobian matrix in Fortran

In Newton's method, to solve a nonlinear system of equations we need to find the Jacobian matrix and the determinant of the inverse of the Jacobian matrix.
Here are my component functions,
real function f1(x,y)
parameter (pi = 3.141592653589793)
f1 = log(abs(x-y**2)) - sin(x*y) - sin(pi)
end function f1
real function f2(x,y)
f2 = exp(x*y) + cos(x-y) - 2
end function f2
For the 2x2 case I am computing the Jacobian matrix and determinant of the inverse of Jacobian matrix like this,
x = [2,2]
h = 0.00001
.
.
! calculate approximate partial derivative
! you can make it more accurate by reducing the
! value of h
j11 = (f1(x(1)+h,x(2))-f1(x(1),x(2)))/h
j12 = (f1(x(1),x(2)+h)-f1(x(1),x(2)))/h
j21 = (f2(x(1)+h,x(2))-f2(x(1),x(2)))/h
j22 = (f2(x(1),x(2)+h)-f2(x(1),x(2)))/h
! calculate the Jacobian
J(1,:) = [j11,j12]
J(2,:) = [j21,j22]
! calculate inverse Jacobian
inv_J(1,:) = [J(2,2),-J(1,2)]
inv_J(2,:) = [-J(2,1),J(1,1)]
DET=J(1,1)*J(2,2) - J(1,2)*J(2,1)
inv_J = inv_J/DET
.
.
How do I in Fortran extend this to evaluate a Jacobian for m functions evaluated at n points?
Here is a more flexible jacobian calculator.
The results with the 2×2 test case are what you expect
arguments (x)
2.00000000000000
2.00000000000000
values (y)
1.44994967586787
53.5981500331442
Jacobian
0.807287239448229 3.30728724371454
109.196300248300 109.196300248300
I check the results against a symbolic calculation for the given inputs of
Console.f90
program Console1
use ISO_FORTRAN_ENV
implicit none
! Variables
integer, parameter :: wp = real64
real(wp), parameter :: pi = 3.141592653589793d0
! Interfaces
interface
function fun(x,n,m) result(y)
import
integer, intent(in) :: n,m
real(wp), intent(in) :: x(m)
real(wp) :: y(n)
end function
end interface
real(wp) :: h
real(wp), allocatable :: x(:), y(:), J(:,:)
! Body of Console1
x = [2d0, 2d0]
h = 0.0001d0
print *, "arguments"
print *, x(1)
print *, x(2)
y = test(x,2,2)
print *, "values"
print *, y(1)
print *, y(2)
J = jacobian(test,x,2,h)
print *, "Jacobian"
print *, J(1,:)
print *, J(2,:)
contains
function test(x,n,m) result(y)
! Test case per original question
integer, intent(in) :: n,m
real(wp), intent(in) :: x(m)
real(wp) :: y(n)
y(1) = log(abs(x(1)-x(2)**2)) - sin(x(1)*x(2)) - sin(pi)
y(2) = exp(x(1)*x(2)) + cos(x(1)-x(2)) - 2
end function
function jacobian(f,x,n,h) result(u)
procedure(fun), pointer, intent(in) :: f
real(wp), allocatable, intent(in) :: x(:)
integer, intent(in) :: n
real(wp), intent(in) :: h
real(wp), allocatable :: u(:,:)
integer :: j, m
real(wp), allocatable :: y1(:), y2(:), e(:)
m = size(x)
allocate(u(n,m))
do j=1, m
e = element(j, m) ! Get kronecker delta for j-th value
y1 = f(x-e*h/2,n,m)
y2 = f(x+e*h/2,n,m)
u(:,j) = (y2-y1)/h ! Finite difference for each column
end do
end function
function element(i,n) result(e)
! Kronecker delta vector. All zeros, except the i-th value.
integer, intent(in) :: i, n
real(wp) :: e(n)
e(:) = 0d0
e(i) = 1d0
end function
end program Console1
I will answer about evaluation in different points. This is quite simple. You just need an array of points, and if the points are in some regular grid, you may not even need that.
You may have an array of xs and array of ys or you can have an array of derived datatype with x and y components.
For the former:
real, allocatable :: x(:), y(:)
x = [... !probably read from some data file
y = [...
do i = 1, size(x)
J(i) = Jacobian(f, x(i), y(i))
end do
If you want to have many functions at the same time, the problem is always in representing functions. Even if you have an array of function pointers, you need to code them manually. A different approach is to have a full algebra module, where you enter some string representing a function and you can evaluate such function and even compute derivatives symbolically. That requires a parser, an evaluator, it is a large task. There are libraries for this. Evaluation of such a derivative will be slow unless further optimizing steps (compiling to machine code) are undertaken.
Numerical evaluation of the derivative is certainly possible. It will slow the convergence somewhat, depending on the order of the approximation of the derivative. You do a difference of two points for the numerical derivative. You can make an interpolating polynomial from values in multiple points to get a higher-order approximation (finite difference approximations), but that costs machine cycles.
Normally we can use auto difference tools as #John Alexiou mentioned. However in practise I prefer using MATLAB to analytically solve out the Jacobian and then use its build-in function fortran() to convert the result to a f90 file. Take your function as an example. Just type these into MATLAB
syms x y
Fval=sym(zeros(2,1));
Fval(1)=log(abs(x-y^2)) - sin(x*y) - sin(pi);
Fval(2)=exp(x*y) + cos(x-y) - 2;
X=[x;y];
Fjac=jacobian(Fval,X);
fortran(Fjac)
which will yield
Fjac(1,1) = -y*cos(x*y)-((-(x-y**2)/abs(-x+y**2)))/abs(-x+y**2)
Fjac(1,2) = -x*cos(x*y)+(y*((-(x-y**2)/abs(-x+y**2)))*2.0D0)/abs(-
&x+y**2)
Fjac(2,1) = -sin(x-y)+y*exp(x*y)
Fjac(2,2) = sin(x-y)+x*exp(x*y)
to you. You just get an analytical Jacobian fortran function.
Meanwhile, it is impossible to solve the inverse of a mxn matrix because of rank mismatching. You should simplify the system of equations to get a nxn Jacobin.
Additionally, when we use Newton-Raphson's method we do not solve the inverse of the Jacobin which is time-consuming and inaccurate for a large system. An easy way is to use dgesv in LAPACK for dense Jacobin. As we only need to solve the vector x from system of linear equations
Jx=-F
dgesv use LU decomposition and Gaussian elimination to solve above system of equations which is extremely faster than solving inverse matrix.
If the system of equations is large, you can use UMFPACK and its fortran interface module mUMFPACK to solve the system of equations in which J is a sparse matrix. Or use subroutine ILUD and LUSOL in a wide-spread sparse matrix library SPARSEKIT2.
In addition to these, there are tons of other methods which try to solve the Jx=-F faster and more accurate such as Generalized Minimal Residual (GMRES) and Stabilized Bi-Conjugate Gradient (BICGSTAB) which is a strand of literature.

How to calculate several different random numbers from normal dist? using fortran

I have to find 'n' random numbers from a normal distribution given the mean and standard deviation. I have figure out how to get a random number, but when try to loop it to get several different random numbers, it gives me the same number x amount of times?
program prac10
implicit none
real :: mu, sigma
integer :: i
!print*, "mean and stdev?"
!read*, mu, sigma
mu=1.1
sigma=0.1
do i=1, 2 # here is the part I think I am stuck on??
call normal(mu,sigma)
enddo
end program
subroutine normal(mu,sigma)
implicit none
integer :: i
integer :: n
real :: u, v, z, randnum
real :: mu, sigma
real :: pi=3.141593
call RANDOM_SEED()
call RANDOM_NUMBER(u)
call RANDOM_NUMBER(v)
z=sqrt(-2*log(u))*cos(2*pi*v)
randnum=mu+sigma*z
print*, randnum
end subroutine
particularly the part where I should be looping/repeating. I used from 1:2, replacing n with 2 for now so that I wouldn't have to input it every time I try to run it
The most important fact is that you must not call RANOM_SEED repeatedly. It is supposed to be called only once.
It is also good to modify the subroutine to generate the number and pass it further. Notice also the change of formatting to make it more readable and the change in the value of pi.
program prac10
implicit none
real :: mu, sigma, x
integer :: i
call RANDOM_SEED()
mu = 1.1
sigma = 0.1
do i = 1, 2
call normal(x,mu,sigma)
print *, x
end do
end program
subroutine normal(randnum,mu,sigma)
implicit none
integer :: i
integer :: n
real :: u, v, z, randnum
real :: mu, sigma
real :: pi = acos(-1.0)
call RANDOM_NUMBER(u)
call RANDOM_NUMBER(v)
z = sqrt(-2*log(u)) * cos(2*pi*v)
randnum = mu + sigma*z
end subroutine

Evaluating the fast Fourier transform of Gaussian function in FORTRAN using FFTW3 library

I am trying to write a FORTRAN code to evaluate the fast Fourier transform of the Gaussian function f(r)=exp(-(r^2)) using FFTW3 library. As everyone knows, the Fourier transform of the Gaussian function is another Gaussian function.
I consider evaluating the Fourier-transform integral of the Gaussian function in the spherical coordinate.
Hence the resulting integral can be simplified to be integral of [r*exp(-(r^2))*sin(kr)]dr.
I wrote the following FORTRAN code to evaluate the discrete SINE transform DST which is the discrete Fourier transform DFT using a PURELY real input array. DST is performed by C_FFTW_RODFT00 existing in FFTW3, taking into account that the discrete values in position space are r=i*delta (i=1,2,...,1024), and the input array for DST is the function r*exp(-(r^2)) NOT the Gaussian. The sine function in the integral of [r*exp(-(r^2))*sin(kr)]dr resulting from the INTEGRATION over the SPHERICAL coordinates, and it is NOT the imaginary part of exp(ik.r) that appears when taking the analytic Fourier transform in general.
However, the result is not a Gaussian function in the momentum space.
Module FFTW3
use, intrinsic :: iso_c_binding
include 'fftw3.f03'
end module
program sine_FFT_transform
use FFTW3
implicit none
integer, parameter :: dp=selected_real_kind(8)
real(kind=dp), parameter :: pi=acos(-1.0_dp)
integer, parameter :: n=1024
real(kind=dp) :: delta, k
real(kind=dp) :: numerical_F_transform
integer :: i
type(C_PTR) :: my_plan
real(C_DOUBLE), dimension(1024) :: y
real(C_DOUBLE), dimension(1024) :: yy, yk
integer(C_FFTW_R2R_KIND) :: C_FFTW_RODFT00
my_plan= fftw_plan_r2r_1d(1024,y,yy,FFTW_FORWARD, FFTW_ESTIMATE)
delta=0.0125_dp
do i=1, n !inserting the input one-dimension position function
y(i)= 2*(delta)*(i-1)*exp(-((i-1)*delta)**2)
! I multiplied by 2 due to the definition of C_FFTW_RODFT00 in FFTW3
end do
call fftw_execute_r2r(my_plan, y,yy)
do i=2, n
k = (i-1)*pi/n/delta
yk(i) = 4*pi*delta*yy(i)/2 !I divide by 2 due to the definition of
!C_FFTW_RODFT00
numerical_F_transform=yk(i)/k
write(11,*) i,k,numerical_F_transform
end do
call fftw_destroy_plan(my_plan)
end program
Executing the previous code gives the following plot which is not for Gaussian function.
Can anyone help me understand what the problem is? I guess the problem is mainly due to FFTW3. Maybe I did not use it properly especially concerning the boundary conditions.
Looking at the related pages in the FFTW site (Real-to-Real Transforms, transform kinds, Real-odd DFT (DST)) and the header file for Fortran, it seems that FFTW expects FFTW_RODFT00 etc rather than FFTW_FORWARD for specifying the kind of
real-to-real transform. For example,
! my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_FORWARD, FFTW_ESTIMATE )
my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_RODFT00, FFTW_ESTIMATE )
performs the "type-I" discrete sine transform (DST-I) shown in the above page. This modification seems to fix the problem (i.e., makes the Fourier transform a Gaussian with positive values).
The following is a slightly modified version of OP's code to experiment the above modification:
! ... only the modified part is shown...
real(dp) :: delta, k, r, fftw, num, ana
integer :: i, j, n
type(C_PTR) :: my_plan
real(C_DOUBLE), allocatable :: y(:), yy(:)
delta = 0.0125_dp ; n = 1024 ! rmax = 12.8
! delta = 0.1_dp ; n = 128 ! rmax = 12.8
! delta = 0.2_dp ; n = 64 ! rmax = 12.8
! delta = 0.4_dp ; n = 32 ! rmax = 12.8
allocate( y( n ), yy( n ) )
! my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_FORWARD, FFTW_ESTIMATE )
my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_RODFT00, FFTW_ESTIMATE )
! Loop over r-grid
do i = 1, n
r = i * delta ! (2-a)
y( i )= r * exp( -r**2 )
end do
call fftw_execute_r2r( my_plan, y, yy )
! Loop over k-grid
do i = 1, n
! Result of FFTW
k = i * pi / ((n + 1) * delta) ! (2-b)
fftw = 4 * pi * delta * yy( i ) / k / 2 ! the last 2 due to RODFT00
! Numerical result via quadrature
num = 0
do j = 1, n
r = j * delta
num = num + r * exp( -r**2 ) * sin( k * r )
enddo
num = num * 4 * pi * delta / k
! Analytical result
ana = sqrt( pi )**3 * exp( -k**2 / 4 )
! Output
write(10,*) k, fftw
write(20,*) k, num
write(30,*) k, ana
end do
Compile (with gfortran-8.2 + FFTW3.3.8 + OSX10.11):
$ gfortran -fcheck=all -Wall sine.f90 -I/usr/local/Cellar/fftw/3.3.8/include -L/usr/local/Cellar/fftw/3.3.8/lib -lfftw3
If we use FFTW_FORWARD as in the original code, we get
which has a negative lobe (where fort.10, fort.20, and fort.30 correspond to FFTW, quadrature, and analytical results). Modifying the code to use FFTW_RODFT00 changes the result as below, so the modification seems to be working (but please see below for the grid definition).
Additional notes
I have slightly modified the grid definition for r and k in my code (Lines (2-a) and (2-b)), which is found to improve the accuracy. But I'm still not sure whether the above definition matches the definition used by FFTW, so please read the manual for details...
The fftw3.f03 header file gives the interface for fftw_plan_r2r_1d
type(C_PTR) function fftw_plan_r2r_1d(n,in,out,kind,flags) bind(C, name='fftw_plan_r2r_1d')
import
integer(C_INT), value :: n
real(C_DOUBLE), dimension(*), intent(out) :: in
real(C_DOUBLE), dimension(*), intent(out) :: out
integer(C_FFTW_R2R_KIND), value :: kind
integer(C_INT), value :: flags
end function fftw_plan_r2r_1d
(Because of no Tex support, this part is very ugly...) The integral of 4 pi r^2 * exp(-r^2) * sin(kr)/(kr) for r = 0 -> infinite is pi^(3/2) * exp(-k^2 / 4) (obtained from Wolfram Alpha or by noting that this is actually a 3-D Fourier transform of exp(-(x^2 + y^2 + z^2)) by exp(-i*(k1 x + k2 y + k3 z)) with k =(k1,k2,k3)). So, although a bit counter-intuitive, the result becomes a positive Gaussian.
I guess the r-grid can be chosen much coarser (e.g. delta up to 0.4), which gives almost the same accuracy as long as it covers the frequency domain of the transformed function (here exp(-r^2)).
Of course there are negative components of the real part to the FFT of a limited Gaussian spectrum. You are just using the real part of the transform. So your plot is absolutely correct.
You seem to be mistaking the real part with the magnitude, which of course would not be negative. For that you would need to fftw_plan_dft_r2c_1d and then calculate the absolute values of the complex coefficients. Or you might be mistaking the Fourier transform with a limited DFT.
You might want to check here to convince yourself of the correctness of you calculation above:
http://docs.mantidproject.org/nightly/algorithms/FFT-v1.html
Please do keep in mind that the plots on the above page are shifted, so that the 0 frequency is in the middle of the spectrum.
Citing yourself, the nummeric integration of [r*exp(-(r^2))*sin(kr)]dr would have negative components for all k>1 if normalised to 0 for highest frequency.
TLDR: Your plot is absolute state of the art and inline with discrete and limited functional analysis.

Prevent changing variables with intent(in)

so reading the following question (Correct use of FORTRAN INTENT() for large arrays) I learned that defining a variable with intent(in) isn't enough, since when the variable is passed to another subroutine/function, it can be changed again. So how can I avoid this? In the original thread they talked about putting the subroutine into a module, but that doesn't help for me. For example I want to calculate the determinant of a matrix with a LU-factorization. Therefore I use the Lapack function zgetrf, but however this function alters my input matrix and the compiler don't displays any warnings. So what can I do?
module matHelper
implicit none
contains
subroutine initMat(AA)
real*8 :: u
double complex, dimension(:,:), intent(inout) :: AA
integer :: row, col, counter
counter = 1
do row=1,size(AA,1)
do col=1,size(AA,2)
AA(row,col)=cmplx(counter ,0)
counter=counter+1
end do
end do
end subroutine initMat
!subroutine to write a Matrix to file
!Input: AA - double complex matrix
! fid - integer file id
! fname - file name
! stat - integer status =replace[0] or old[1]
subroutine writeMat(AA,fid, fname, stat)
integer :: fid, stat
character(len=*) :: fname
double complex, dimension(:,:), intent(in) :: AA
integer :: row, col
character (len=64) :: fmtString
!opening file with given options
if(fid /= 0) then
if(stat == 0) then
open(unit=fid, file=fname, status='replace', &
action='write')
else if(stat ==1) then
open(unit=fid, file=fname, status='old', &
action='write')
else
print*, 'Error while trying to open file with Id', fid
return
end if
end if
!initializing matrix print format
write(fmtString,'(I0)') size(aa,2)
fmtString = '('// trim(fmtString) //'("{",ES10.3, ",", 1X, ES10.3,"}",:,1X))'
!write(*,*) fmtString
!writing matrix to file by iterating through each row
do row=1,size(aa,1)
write(fid,fmt = fmtString) AA(row,:)
enddo
write(fid,*) ''
end subroutine writeMat
!function to calculate the determinant of the input
!Input: AA - double complex matrix
!Output determinantMat - double complex,
! 0 if AA not a square matrix
function determinantMat(AA)
double complex, dimension(:,:), intent(in) :: AA
double complex :: determinantMat
integer, dimension(min(size(AA,1),size(AA,2)))&
:: ipiv
integer :: ii, info
!check if not square matrix, then set determinant to 0
if(size(AA,1)/= size(AA,2)) then
determinantMat = 0
return
end if
!compute LU facotirzation with LAPACK function
call zgetrf(size(AA,1),size(AA,2), AA,size(AA,1), ipiv,info)
if(info /= 0) then
determinantMat = cmplx(0.D0, 0.D0)
return
end if
determinantMat = cmplx(1.D0, 0.D0)
!determinant of triangular matrix is product of diagonal elements
do ii=1,size(AA,1)
if(ipiv(ii) /= ii) then
!a permutation was done, so a factor of -1
determinantMat = -determinantMat *AA(ii,ii)
else
!no permutation, so no -1
determinantMat = determinantMat*AA(ii,ii)
end if
end do
end function determinantMat
end module matHelper
!***********************************************************************
!module which stores matrix elements, dimension, trace, determinant
program test
use matHelper
implicit none
double complex, dimension(:,:), allocatable :: AA, BB
integer :: n, fid
fid = 0;
allocate(AA(3,3))
call initMat(AA)
call writeMat(AA,0,' ', 0)
print*, 'Determinante: ',determinantMat(AA) !changes AA
call writeMat(AA,0, ' ', 0)
end program test
PS: I am using the ifort compiler v15.0.3 20150407
I do not have ifort at home, but you may want to try compiling with '-check interfaces' and maybe with '-ipo'. You may need the path to 'zgetrf' for the '-check interfaces' to work, and if that is not source then it may not help.
If you declare 'function determinantMat' as 'PURE FUNCTION determinantMat' then I am pretty sure it would complain because 'zgetrf' is not known to be PURE nor ELEMENTAL. Try ^this stuff^ first.
If LAPACK has a module, then zgetrf could be known to be, or not be, PURE/ELEMENTAL. https://software.intel.com/en-us/articles/blas-and-lapack-fortran95-mod-files
I would suggest you add to your compile line:
-check interfaces -ipo
During initial build I like (Take it out for speed once it works):
-check all -warn all
Making a temporary array is one way around it. (I have not compiled this, so it is only a conceptual exemplar.)
PURE FUNCTION determinantMat(AA)
USE LAPACK95 !--New Line--!
IMPLICIT NONE !--New Line--!
double complex, dimension(:,:) , intent(IN ) :: AA
double complex :: determinantMat !<- output
!--internals--
integer, dimension(min(size(AA,1),size(AA,2))) :: ipiv
!!--Next line is new--
double complex, dimension(size(AA,1),size(AA,2)) :: AA_Temp !!<- I have no idea if this will work, you may need an allocatable??
integer :: ii, info
!check if not square matrix, then set determinant to 0
if(size(AA,1)/= size(AA,2)) then
determinantMat = 0
return
end if
!compute LU factorization with LAPACK function
!!--Next line is new--
AA_Temp = AA !--Initialise AA_Temp to be the same as AA--!
call zgetrf(size(AA_temp,1),size(AA_Temp,2), AA_Temp,size(AA_Temp,1), ipiv,info)
if(info /= 0) then
determinantMat = cmplx(0.D0, 0.D0)
return
end if
determinantMat = cmplx(1.D0, 0.D0)
!determinant of triangular matrix is product of diagonal elements
do ii=1,size(AA_Temp,1)
if(ipiv(ii) /= ii) then
!a permutation was done, so a factor of -1
determinantMat = -determinantMat *AA_Temp(ii,ii)
else
!no permutation, so no -1
determinantMat = determinantMat*AA_Temp(ii,ii)
end if
end do
end function determinantMat
With the 'USE LAPACK95' you probably do not need PURE, but if you wanted it to be PURE then you want to explicitly say so.

LAPACK: Are operations on packed storage matrices faster?

I want to tridiagonalize a real symmetric matrix using Fortran and LAPACK. LAPACK basically provides two routines, one operating on the full matrix, the other on the matrix in packed storage. While the latter surely uses less memory, I was wondering if anything can be said about the speed difference?
It's an empirical question, of course: but in general, nothing comes for free, and less memory/more runtime is a pretty common tradeoff.
In this case, the indexing for the data is more complex for the packed case, so as you traverse the matrix, the cost of getting your data is a little higher. (Complicating this picture is that for symmetric matrices, the lapack routines also assume a certain kind of packing - that you only have the upper or lower component of the matrix available).
I was messing around with an eigenproblem earlier today, so I'll use that as a measurement benchmark; trying with a simple symmetric test case (The Herdon matrix, from http://people.sc.fsu.edu/~jburkardt/m_src/test_mat/test_mat.html ), and comparing ssyevd with sspevd
$ ./eigen2 500
Generating a Herdon matrix:
Unpacked array:
Eigenvalues L_infty err = 1.7881393E-06
Packed array:
Eigenvalues L_infty err = 3.0994415E-06
Packed time: 2.800000086426735E-002
Unpacked time: 2.500000037252903E-002
$ ./eigen2 1000
Generating a Herdon matrix:
Unpacked array:
Eigenvalues L_infty err = 4.5299530E-06
Packed array:
Eigenvalues L_infty err = 5.8412552E-06
Packed time: 0.193900004029274
Unpacked time: 0.165000006556511
$ ./eigen2 2500
Generating a Herdon matrix:
Unpacked array:
Eigenvalues L_infty err = 6.1988831E-06
Packed array:
Eigenvalues L_infty err = 8.4638596E-06
Packed time: 3.21040010452271
Unpacked time: 2.70149993896484
There's about an 18% difference, which I must admit is larger than I expected (also with a slightly larger error for the packed case?). This is with intel's MKL. The performance difference will depend on your matrix in general, of course, as eriktous points out, and on the problem you're doing; the more random access to the matrix you have to do, the worse the overhead would be. The code I used is as follows:
program eigens
implicit none
integer :: nargs,n ! problem size
real, dimension(:,:), allocatable :: A, B, Z
real, dimension(:), allocatable :: PA
real, dimension(:), allocatable :: work
integer, dimension(:), allocatable :: iwork
real, dimension(:), allocatable :: eigenvals, expected
real :: c, p
integer :: worksize, iworksize
character(len=100) :: nstr
integer :: unpackedclock, packedclock
double precision :: unpackedtime, packedtime
integer :: i,j,info
! get filename
nargs = command_argument_count()
if (nargs /= 1) then
print *,'Usage: eigen2 n'
print *,' Where n = size of array'
stop
endif
call get_command_argument(1, nstr)
read(nstr,'(I)') n
if (n < 4 .or. n > 25000) then
print *, 'Invalid n ', nstr
stop
endif
! Initialize local arrays
allocate(A(n,n),B(n,n))
allocate(eigenvals(n))
! calculate the matrix - unpacked
print *, 'Generating a Herdon matrix: '
A = 0.
c = (1.*n * (1.*n + 1.) * (2.*n - 5.))/6.
forall (i=1:n-1,j=1:n-1)
A(i,j) = -1.*i*j/c
endforall
forall (i=1:n-1)
A(i,i) = (c - 1.*i*i)/c
A(i,n) = 1.*i/c
endforall
forall (j=1:n-1)
A(n,j) = 1.*j/c
endforall
A(n,n) = -1./c
B = A
! expected eigenvalues
allocate(expected(n))
p = 3. + sqrt((4. * n - 3.) * (n - 1.)*3./(n+1.))
expected(1) = p/(n*(5.-2.*n))
expected(2) = 6./(p*(n+1.))
expected(3:n) = 1.
print *, 'Unpacked array:'
allocate(work(1),iwork(1))
call ssyevd('N','U',n,A,n,eigenvals,work,-1,iwork,-1,info)
worksize = int(work(1))
iworksize = int(work(1))
deallocate(work,iwork)
allocate(work(worksize),iwork(iworksize))
call tick(unpackedclock)
call ssyevd('N','U',n,A,n,eigenvals,work,worksize,iwork,iworksize,info)
unpackedtime = tock(unpackedclock)
deallocate(work,iwork)
if (info /= 0) then
print *, 'Error -- info = ', info
endif
print *,'Eigenvalues L_infty err = ', maxval(eigenvals-expected)
! pack array
print *, 'Packed array:'
allocate(PA(n*(n+1)/2))
allocate(Z(n,n))
do i=1,n
do j=i,n
PA(i+(j-1)*j/2) = B(i,j)
enddo
enddo
allocate(work(1),iwork(1))
call sspevd('N','U',n,PA,eigenvals,Z,n,work,-1,iwork,-1,info)
worksize = int(work(1))
iworksize = iwork(1)
deallocate(work,iwork)
allocate(work(worksize),iwork(iworksize))
call tick(packedclock)
call sspevd('N','U',n,PA,eigenvals,Z,n,work,worksize,iwork,iworksize,info)
packedtime = tock(packedclock)
deallocate(work,iwork)
deallocate(Z,A,B,PA)
if (info /= 0) then
print *, 'Error -- info = ', info
endif
print *,'Eigenvalues L_infty err = ', &
maxval(eigenvals-expected)
deallocate(eigenvals, expected)
print *,'Packed time: ', packedtime
print *,'Unpacked time: ', unpackedtime
contains
subroutine tick(t)
integer, intent(OUT) :: t
call system_clock(t)
end subroutine tick
! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
end function tock
end program eigens