lapack zheevd gives wrong results - fortran

I am trying to use LAPACK's zheevd to diagonalize a complex Hermitian matrix. I've written a small example that produces no compile-time or run-time errors but gives wrong eigenvalues. Here's the code:
program test
implicit none
INTEGER, PARAMETER :: N=4
INTEGER, PARAMETER :: LDA = N
INTEGER, PARAMETER :: LWMAX = 1000
INTEGER :: INFO, LWORK, LIWORK, LRWORK,i,j
INTEGER :: IWORK( LWMAX )
REAL(8) :: W(N), RWORK( LWMAX )
COMPLEX(16) :: A(LDA, N), WORK(LWMAX), zero
character(len=1) :: job,uplo
! the matrix I want to diagonalize is:
! ( 3.40, 0.00) ( -2.36, -1.93) ( -4.68, 9.55) ( 5.37, -1.23)
! A= ( -2.36, 1.93) ( 6.94, 0.00) ( 8.13, -1.47) ( 2.07, -5.78)
! ( -4.68, -9.55) ( 8.13, 1.47) ( -2.14, 0.00) ( 4.68, 7.44)
! ( 5.37, 1.23) ( 2.07, 5.78) ( 4.68, -7.44) ( -7.42, 0.00)
zero=dcmplx(0.0d0,0.0d0)
A=zero
A(1,1)= dcmplx( 3.40d0, 0.0d0); A(1,2)=dcmplx(-2.36d0, -1.93d0); A(1,3)= dcmplx(-4.68d0,9.55d0)
A(1,4)= dcmplx( 5.37d0, -1.23d0)
A(2,2)= dcmplx( 6.94d0, 0.0d0); A(2,3)=dcmplx( 8.13d0, -1.47d0); A(2,4)= dcmplx( 2.07d0, -5.78d0)
A(3,3)= dcmplx(-2.14d0, 0.0d0); A(3,4)=dcmplx( 4.68d0, 7.44d0); A(4,4)= dcmplx(-7.42d0, 0.0d0)
job='V'; uplo='U'
LWORK= N**2 + 2*N; LRWORK= 2*N**2 + 5*N + 1; LIWORK= 5*N+3
CALL ZHEEVD( job, uplo, N, A, LDA, W, WORK, LWORK, RWORK,LRWORK,IWORK,LIWORK, INFO )
IF( INFO > 0 ) THEN
WRITE(*,*)'The algorithm failed to compute eigenvalues.'
STOP
END IF
print*, 'eigenvalues found'
do i=1,N
print*, W(i)
end do
open(1, file='eigenvectors.dat')
write(1,10) ((A(i,j),j=1,N),i=1,N)
10 format(4(F10.5,2X,F10.5))
end program test
When I run the code, the eigenvalues I get are:
-2.8413, 0, 0, 2.8413
while the actual eigenvalues are: -21.968, 16.3387, 6.45946, -0.0501069
I keep rereading the routine's reference guide and everything looks correct, so it should work, except it doesn't. Does anyone have an idea what is wrong with my code?
Thanks

There are three main problems here that I can see:
The most serious issue is that you have translated the COMPLEX*16 types in the MKL example you based your code on as COMPLEX(16). That is incorrect. COMPLEX*16 means 16 bytes in total (two 8-byte reals), which corresponds to COMPLEX(8) on most compilers, so you should use COMPLEX(8). I don't know whether your toolchain actually has an extended-precision complex type, but if it does, there is a size mismatch between your arrays and what LAPACK expects.
There is a typo in the code that means that the values of the matrix you pass to LAPACK are not the same as in your comments (and presumably also not the same as the matrix you computed the eigenvalues for).
Lastly, and just as importantly, you have not defined an interface for ZHEEVD (or declared it as external). The compiler will therefore guess an implicit interface, and it is quite probable that the arguments your code passes are inconsistent with what LAPACK expects, especially given the type mismatch in the complex arguments.
I would expect that fixing some combination of these three will fix the results.
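To make the first point concrete, here is a minimal sketch that prints the kind number of double precision and the storage size of the matching complex type. The names dp and complex_kind_check are only illustrative. Kind numbers are compiler-specific, so taking the kind from kind(1.0d0) (or real64) avoids hard-coding 8 or 16.

```fortran
program complex_kind_check
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! dp is an illustrative name; kind(1.0d0) is the kind of DOUBLE PRECISION.
  integer, parameter :: dp = kind(1.0d0)
  complex(dp) :: z

  z = cmplx(3.40d0, 0.0d0, kind=dp)

  ! The kind number itself is compiler-specific (8 with gfortran and ifort).
  print '(a,i0)', 'kind of double precision: ', dp
  ! COMPLEX*16 / DOUBLE COMPLEX occupies 16 bytes, i.e. 128 bits.
  print '(a,i0)', 'bits in complex(dp):      ', storage_size(z)
  print '(a,l1)', 'dp equals real64:         ', dp == real64
end program complex_kind_check
```

Declaring A and WORK as complex(dp) with this dp gives the 16-byte complex type that ZHEEVD expects.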

Related

Evaluating the fast Fourier transform of Gaussian function in FORTRAN using FFTW3 library

I am trying to write Fortran code to evaluate the Fourier transform of the Gaussian function f(r)=exp(-(r^2)) using the FFTW3 library. As is well known, the Fourier transform of a Gaussian is another Gaussian.
I evaluate the Fourier-transform integral of the Gaussian in spherical coordinates.
The resulting integral then simplifies to the integral of [r*exp(-(r^2))*sin(kr)]dr.
I wrote the following Fortran code to evaluate the discrete sine transform (DST), which is the discrete Fourier transform (DFT) using a purely real input array. The DST is performed by C_FFTW_RODFT00 in FFTW3, with the discrete values in position space r=i*delta (i=1,2,...,1024), and the input array for the DST is the function r*exp(-(r^2)), NOT the Gaussian. The sine function in the integral of [r*exp(-(r^2))*sin(kr)]dr results from the integration over the spherical coordinates; it is NOT the imaginary part of exp(ik.r) that appears when taking the analytic Fourier transform in general.
However, the result is not a Gaussian function in momentum space.
Module FFTW3
use, intrinsic :: iso_c_binding
include 'fftw3.f03'
end module
program sine_FFT_transform
use FFTW3
implicit none
integer, parameter :: dp=selected_real_kind(8)
real(kind=dp), parameter :: pi=acos(-1.0_dp)
integer, parameter :: n=1024
real(kind=dp) :: delta, k
real(kind=dp) :: numerical_F_transform
integer :: i
type(C_PTR) :: my_plan
real(C_DOUBLE), dimension(1024) :: y
real(C_DOUBLE), dimension(1024) :: yy, yk
integer(C_FFTW_R2R_KIND) :: C_FFTW_RODFT00
my_plan= fftw_plan_r2r_1d(1024,y,yy,FFTW_FORWARD, FFTW_ESTIMATE)
delta=0.0125_dp
do i=1, n !inserting the input one-dimension position function
y(i)= 2*(delta)*(i-1)*exp(-((i-1)*delta)**2)
! I multiplied by 2 due to the definition of C_FFTW_RODFT00 in FFTW3
end do
call fftw_execute_r2r(my_plan, y,yy)
do i=2, n
k = (i-1)*pi/n/delta
yk(i) = 4*pi*delta*yy(i)/2 !I divide by 2 due to the definition of
!C_FFTW_RODFT00
numerical_F_transform=yk(i)/k
write(11,*) i,k,numerical_F_transform
end do
call fftw_destroy_plan(my_plan)
end program
Executing the code above gives the following plot, which is not a Gaussian.
Can anyone help me understand what the problem is? I guess the problem mainly lies in how I use FFTW3. Maybe I did not use it properly, especially concerning the boundary conditions.
Looking at the related pages on the FFTW site (Real-to-Real Transforms, transform kinds, Real-odd DFT (DST)) and the header file for Fortran, it seems that FFTW expects FFTW_RODFT00 etc. rather than FFTW_FORWARD for specifying the kind of real-to-real transform. For example,
! my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_FORWARD, FFTW_ESTIMATE )
my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_RODFT00, FFTW_ESTIMATE )
performs the "type-I" discrete sine transform (DST-I) described on the page above. This modification seems to fix the problem (i.e., it makes the Fourier transform a Gaussian with positive values).
The following is a slightly modified version of the OP's code for experimenting with this modification:
! ... only the modified part is shown...
real(dp) :: delta, k, r, fftw, num, ana
integer :: i, j, n
type(C_PTR) :: my_plan
real(C_DOUBLE), allocatable :: y(:), yy(:)
delta = 0.0125_dp ; n = 1024 ! rmax = 12.8
! delta = 0.1_dp ; n = 128 ! rmax = 12.8
! delta = 0.2_dp ; n = 64 ! rmax = 12.8
! delta = 0.4_dp ; n = 32 ! rmax = 12.8
allocate( y( n ), yy( n ) )
! my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_FORWARD, FFTW_ESTIMATE )
my_plan= fftw_plan_r2r_1d( n, y, yy, FFTW_RODFT00, FFTW_ESTIMATE )
! Loop over r-grid
do i = 1, n
r = i * delta ! (2-a)
y( i )= r * exp( -r**2 )
end do
call fftw_execute_r2r( my_plan, y, yy )
! Loop over k-grid
do i = 1, n
! Result of FFTW
k = i * pi / ((n + 1) * delta) ! (2-b)
fftw = 4 * pi * delta * yy( i ) / k / 2 ! the last 2 due to RODFT00
! Numerical result via quadrature
num = 0
do j = 1, n
r = j * delta
num = num + r * exp( -r**2 ) * sin( k * r )
enddo
num = num * 4 * pi * delta / k
! Analytical result
ana = sqrt( pi )**3 * exp( -k**2 / 4 )
! Output
write(10,*) k, fftw
write(20,*) k, num
write(30,*) k, ana
end do
Compile (with gfortran-8.2 + FFTW3.3.8 + OSX10.11):
$ gfortran -fcheck=all -Wall sine.f90 -I/usr/local/Cellar/fftw/3.3.8/include -L/usr/local/Cellar/fftw/3.3.8/lib -lfftw3
If we use FFTW_FORWARD as in the original code, the resulting plot (figure not shown) has a negative lobe, where fort.10, fort.20, and fort.30 correspond to the FFTW, quadrature, and analytical results. Modifying the code to use FFTW_RODFT00 changes the result (figure not shown), so the modification seems to be working (but please see below for the grid definition).
Additional notes
I have slightly modified the grid definition for r and k in my code (Lines (2-a) and (2-b)), which is found to improve the accuracy. But I'm still not sure whether the above definition matches the definition used by FFTW, so please read the manual for details...
The fftw3.f03 header file gives the interface for fftw_plan_r2r_1d
type(C_PTR) function fftw_plan_r2r_1d(n,in,out,kind,flags) bind(C, name='fftw_plan_r2r_1d')
import
integer(C_INT), value :: n
real(C_DOUBLE), dimension(*), intent(out) :: in
real(C_DOUBLE), dimension(*), intent(out) :: out
integer(C_FFTW_R2R_KIND), value :: kind
integer(C_INT), value :: flags
end function fftw_plan_r2r_1d
(Because there is no TeX support, this part is very ugly...) The integral of 4 pi r^2 * exp(-r^2) * sin(kr)/(kr) from r = 0 to infinity is pi^(3/2) * exp(-k^2 / 4) (obtained from Wolfram Alpha, or by noting that this is actually the 3-D Fourier transform of exp(-(x^2 + y^2 + z^2)) with kernel exp(-i*(k1 x + k2 y + k3 z)), where k = (k1,k2,k3)). So, although a bit counter-intuitive, the result becomes a positive Gaussian.
I guess the r-grid can be chosen much coarser (e.g. delta up to 0.4), which gives almost the same accuracy as long as it covers the frequency domain of the transformed function (here exp(-r^2)).
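For reference, the grid choice in lines (2-a) and (2-b) and the analytic result quoted in the notes above can be written out as follows. The first formula is the RODFT00 (DST-I) definition from the FFTW manual, with zero-based indices. The identification of r_j and k_m is my reading of how the code maps onto that definition, so treat it as a sketch rather than an authoritative statement.

```latex
% FFTW RODFT00 (DST-I), zero-based indices j, m = 0, ..., n-1
\[
  Y_m = 2 \sum_{j=0}^{n-1} X_j \sin\!\left[\frac{\pi (j+1)(m+1)}{n+1}\right]
\]

% Grid used in the code: r_j = (j+1)\delta, k_m = (m+1)\pi / ((n+1)\delta),
% so the argument of the sine is exactly k_m r_j
\[
  k_m r_j = \frac{\pi (j+1)(m+1)}{n+1},
  \qquad X_j = r_j e^{-r_j^2}
  \;\Longrightarrow\;
  \frac{4\pi\delta}{k_m}\,\frac{Y_m}{2}
  = \frac{4\pi}{k_m} \sum_{j=0}^{n-1} r_j e^{-r_j^2} \sin(k_m r_j)\,\delta
\]

% Continuum limit: 3-D Fourier transform of a spherically symmetric Gaussian
\[
  F(k) = \int e^{-|\mathbf{r}|^2} e^{-i\mathbf{k}\cdot\mathbf{r}}\,d^3r
       = \frac{4\pi}{k} \int_0^\infty r\,e^{-r^2} \sin(kr)\,dr
       = \pi^{3/2} e^{-k^2/4}
\]
```

With the Fortran one-based index i = j + 1 this gives r = i * delta and k = i * pi / ((n + 1) * delta), which is what the modified code uses.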
Of course there are negative components in the real part of the FFT of a limited Gaussian spectrum. You are just using the real part of the transform, so your plot is absolutely correct.
You seem to be mistaking the real part for the magnitude, which of course would not be negative. For that you would need fftw_plan_dft_r2c_1d and then calculate the absolute values of the complex coefficients. Or you might be mistaking the Fourier transform for a limited DFT.
You might want to check here to convince yourself of the correctness of your calculation above:
http://docs.mantidproject.org/nightly/algorithms/FFT-v1.html
Please keep in mind that the plots on that page are shifted, so that the zero frequency is in the middle of the spectrum.
To cite you, the numerical integration of [r*exp(-(r^2))*sin(kr)]dr would have negative components for all k>1 if normalised to 0 for the highest frequency.
TL;DR: Your plot is absolutely state of the art and in line with discrete and limited functional analysis.

How to find optimal block size and LWORK in LAPACK

I am trying to find the inverse and eigenfunctions of n x n Hermitian matrices using Fortran with LAPACK.
How do I choose the optimal values for parameters like lda, lwork, liwork and lrwork? I browsed through some examples and found these choices:
integer,parameter::lda=nh
integer,parameter::lwork=2*nh+nh*nh
integer,parameter::liwork=3+5*nh
integer,parameter::lrwork=1 + 5*nh + 2*nh*nh
where nh is the dimension of the matrix. I also found another example with lwork=16*nh. How can I determine the best choice? At this point, I am dealing with Hermitian matrices of at most 500x500.
I found this documentation which suggests
WORK
(workspace) REAL array, dimension (LWORK)
On exit, if INFO = 0, then WORK(1) returns the optimal LWORK.
LWORK
(input) INTEGER
The dimension of the array WORK. LWORK >= max(1,N).
For optimal performance LWORK >= N*NB, where NB is the optimal block size returned by ILAENV.
Is it possible to find out the optimal block size using WORK or ILAENV for a given matrix dimension?
I am using both gfortran and ifort with mkl.
EDIT
Based on the comment by @percusse and @kvantour's answer, here is a sample code
character,parameter::jobz="v",uplo="u"
integer, parameter::nh=15
complex*16::m(nh,nh),m1(nh,nh)
integer,parameter::lda=nh
integer::ipiv(nh),info
complex*16::work(1)
real*8::rwork(1), w(nh)
integer::iwork(1)
real*8::x1(nh,nh),x2(nh,nh)
call random_seed()
call random_number(x1)
call random_number(x2)
m=cmplx(x1,x2)
m1=conjg(m)
m1=transpose(m1)
m=(m+m1)/2.0
call zheevd(jobz,uplo,nh,m,lda,w,work,-1,rwork,-1,iwork, -1,info)
print*,"info : ", info
print*,"lwork: ", int(work(1)) , 2*nh+nh*nh
print*,"lrwork:", int(rwork(1)) , 1 + 5*nh + 2*nh*nh
print*,"liwork:", int(iwork(1)) , 3+5*nh
end
info : 0
lwork: 255 255
lrwork: 526 526
liwork: 78 78
I'm not sure what you are implying with "Is it possible to find out the optimal block size using WORK or ILAENV for a particular machine architecture?". You can, however, find the optimal values for a particular problem.
E.g., if you want to find the eigenvalues of a complex Hermitian matrix using cheev, you can ask the routine to return the value to you:
subroutine CHEEV( JOBZ, UPLO, N, A, LDA, W, WORK, LWORK, RWORK, INFO )
character , intent(in) :: JOBZ
character , intent(in) :: UPLO
integer , intent(in) :: N
complex, dimension(lda,*), intent(inout) :: A
integer , intent(in) :: LDA
real , dimension(*) , intent(out) :: W
complex, dimension(*) , intent(out) :: WORK
integer , intent(in) :: LWORK
real , dimension(*) , intent(out) :: RWORK
integer , intent(out) :: INFO
Then the documentation clearly states (be advised, in the past this was easier to read):
WORK is COMPLEX array, dimension (MAX(1,LWORK))
On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
LWORK is INTEGER
The length of the array WORK. LWORK >= max(1,2*N-1).
For optimal efficiency, LWORK >= (NB+1)*N,
where NB is the blocksize for CHETRD returned by ILAENV. If LWORK = -1, then a workspace query is assumed; the routine
only calculates the optimal size of the WORK array, returns
this value as the first entry of the WORK array, and no error
message related to LWORK is issued by XERBLA.
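So all you need to do is
allocate(work(1)) ! the query writes its answer to work(1)
call cheev(jobz, uplo, n, a, lda, w, work, -1, rwork, info) ! LWORK = -1: workspace query only
lwork=int(work(1)) ! optimal LWORK (real part of work(1))
deallocate(work)
allocate(work(lwork))
call cheev(jobz, uplo, n, a, lda, w, work, lwork, rwork, info) ! actual computation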
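The same two-step pattern can be sketched for ZHEEVD, which has three work arrays. This is a minimal example under some assumptions: it must be linked against a LAPACK library (for instance with -llapack), the 2x2 test matrix is made up for illustration, and the names dp and zheevd_query are not from the original posts.

```fortran
program zheevd_query
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind
  integer, parameter :: n = 2
  complex(dp) :: a(n,n)
  real(dp)    :: w(n)
  complex(dp), allocatable :: work(:)
  real(dp),    allocatable :: rwork(:)
  integer,     allocatable :: iwork(:)
  integer :: lwork, lrwork, liwork, info
  external :: zheevd

  ! Illustrative Hermitian matrix [[2, i], [-i, 2]]; its eigenvalues are 1 and 3.
  a(1,1) = (2.0_dp,  0.0_dp); a(1,2) = (0.0_dp, 1.0_dp)
  a(2,1) = (0.0_dp, -1.0_dp); a(2,2) = (2.0_dp, 0.0_dp)

  ! Step 1: workspace query. Each work array needs at least one element,
  ! because the sizes are returned in the first entry.
  allocate(work(1), rwork(1), iwork(1))
  call zheevd('V', 'U', n, a, n, w, work, -1, rwork, -1, iwork, -1, info)
  if (info /= 0) stop 'workspace query failed'
  lwork  = int(real(work(1)))
  lrwork = int(rwork(1))
  liwork = iwork(1)

  ! Step 2: allocate the reported sizes and do the actual computation.
  deallocate(work, rwork, iwork)
  allocate(work(lwork), rwork(lrwork), iwork(liwork))
  call zheevd('V', 'U', n, a, n, w, work, lwork, rwork, lrwork, &
              iwork, liwork, info)
  if (info /= 0) stop 'zheevd failed'

  print '(a,2f8.4)', 'eigenvalues: ', w
end program zheevd_query
```

The sizes returned by the query may differ between LAPACK implementations, which is one reason to query rather than hard-code the formulas.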

Solving eigenvalue linear system with Fortran using Lapack zggev routine

I am trying to use the ZGGEV routine from LAPACK to solve the generalized eigenvalue problem A x = v B x, where v is the eigenvalue. I'm doing a small test on some random matrices. Let's say
A=[ -21.10-22.50i 53.50-50.50i -34.50+127.50i 7.50+0.50i;
-0.46-7.78i -3.50-37.50i -15.50+58.50i -10.50-1.50i;
4.30-5.50i 39.70-17.10i -68.50+12.50i -7.50-3.50i;
5.50+4.40i 14.40+43.30i -32.50-46.00i -19.00-32.50i]
B=[ 1.00-5.00i 1.60+1.20i -3.00+0.00i 0.00-1.00i ;
0.80-0.60i 3.00-5.00i -4.00+3.00i -2.40-3.20i ;
1.00+0.00i 2.40+1.80i -4.00-5.00i 0.00-3.00i ;
0.00+1.00i -1.80+2.40i 0.00-4.00i 4.00-5.00i]
My Fortran code is as follows:
program testz
implicit none
integer, parameter :: N=4, nb=64, Nmax=10
integer :: lda,ldb,ldvr,lwork
parameter (lda=Nmax, ldb=Nmax, ldvr=Nmax,lwork=Nmax+Nmax*nb)
integer :: i,j,info
complex(kind=16) :: A(lda,Nmax), alpha(Nmax), B(ldb,Nmax),
& beta(Nmax), dummy(1,1), vr(ldvr,Nmax), work(lwork), eig(Nmax)
double precision :: rwork(8*Nmax)
A(1,1)=(-21.10,-22.50);A(1,2)=(53.50,-50.50)
A(1,3)=(-34.50,127.50);A(1,4)=(7.50,0.50)
A(2,1)=(-0.46,7.78);A(2,2)=(-3.5,-37.5)
A(2,3)=(-15.5,58.5);A(2,4)=(-10.5,-1.5)
A(3,1)=(4.3,-5.5);A(3,2)=(39.7,-17.1)
A(3,3)=(-68.5,12.5);A(3,4)=(-7.5,-3.5)
A(4,1)=(5.5,4.4);A(4,2)=(14.4,43.3)
A(4,3)=(-32.5,-46);A(4,4)=(-19,32.5)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
B(1,1)=(1,-5);B(1,2)=(1.6,1.2)
B(1,3)=(-3,0);B(1,4)=(0,-1)
B(2,1)=(0.8,-0.6);B(2,2)=(3,-5)
B(2,3)=(-4,3);B(2,4)=(-2.4,-3.2)
B(3,1)=(1,0);B(3,2)=(2.4,1.8)
B(3,3)=(-4,-5);B(3,4)=(0,-3)
B(4,1)=(0,1);B(4,2)=(-1.8,2.4)
B(4,3)=(0,-4);B(4,4)=(4,-5)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
call zggev('n','v',N,A,lda,B,ldb,alpha,beta,dummy,1,vr,ldvr,
& work,lwork,rwork,info)
eig=alpha/beta
!here skip the heading to the file
do j=1,N
write(7,12) eig(j)
12 format('(',2F8.4,')')!,'(',2F8.4,')','(',2F8.4,')','(',2F8.4,')')
end do
end program
my results are:
( NaN NaN)
(****************)
( 0.0000 0.0000)
( 0.0000 0.0000)
while the correct answer should be
Eigenvalue( 1) = ( 3.0000E+00,-9.0000E+00)
Eigenvalue( 2) = ( 2.0000E+00,-5.0000E+00)
Eigenvalue( 3) = ( 3.0000E+00,-1.0000E+00)
Eigenvalue( 4) = ( 4.0000E+00,-5.0000E+00)

Fortran strange segmentation fault

I have a problem with my main code, so I tried to isolate it.
This is the small code I came up with:
MODULE Param
IMPLICIT NONE
integer, parameter :: dr = SELECTED_REAL_KIND(15, 307)
integer :: D =3
integer :: Q=10
integer :: mmo=16
integer :: n=2
integer :: x=80
integer :: y=70
integer :: z=20
integer :: tMax=8
END MODULE Param
module m
contains
subroutine compute(f, r)
USE Param, ONLY: dr, mmo, x, y, z, n
IMPLICIT NONE
real (kind=dr), intent(in) :: f(x,y,z, 0:mmo, n)
real (kind=dr), intent(out) :: r(x, y, z, n)
real (kind=dr) :: fGlob(x,y,z, 0:mmo)
!-------------------------------------------------------------------------
print*, 'We are in compute subroutine'
r= 00.0
fGlob=sum(f,dim=5)
r=sum(f, dim=4)
print*, 'fGlob=', fGlob(1,1,1, 1)
print*, 'f=', f(1,1,1, 0,1)
print*, 'r=', r(1,1,1, 1)
end subroutine compute
end module m
PROGRAM test_prog
USE Param
USE m
Implicit None
integer :: tStep
real (kind=dr), dimension(:,:,:, :,:), allocatable :: f
real (kind=dr), dimension(:,:,:,:), allocatable :: r
!----------------------------------------------------------------------------
! Initialise the parameters.
print*, 'beginning of the test'
! Allocate
allocate(f(x,y,z, 0:mmo,n))
allocate(r(x,y,z, n))
f=1.0_dr
! ---------------------------------------------------------
! Iteration over time
! ---------------------------------------------------------
do tStep = 1, tMax
print *, tStep
call compute(f,r)
f=f+1
print *, 'tStep', tStep
enddo
print*, 'f=', f(1,1,1, 0,1)
print*, 'r=', r(1,1,1, 1)
! Deallocation
deallocate(f)
deallocate(r)
print*, 'End of the test program'
END PROGRAM test_prog
For now, I cannot understand why I get a segmentation fault when I compile with ifort, while it works when I compile with gfortran. Worse, when I compile with either ifort or gfortran with their fast options, I again get a segmentation fault (core dumped) error. Even more confusing, when I compile with either compiler with traceback options, everything works fine.
I know that a segmentation fault (core dumped) error usually means that I am trying to read or write at a wrong location (matrix indices, etc.), but in this small code I see no such mistake.
Can anyone help me understand why these errors occur?
The problem comes from the size of the stack: some compilers place automatic arrays and array temporaries on the stack by default (ifort), and others do so when optimising (gfortran -Ofast). Here, the arrays exceed the size of the stack.
To solve this, I use the -heap-arrays option for the ifort compiler and the -fno-stack-arrays option for the gfortran compiler.
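To put numbers on it: fGlob in compute has 80*70*20*17 = 1,904,000 elements of 8 bytes each, about 15 MB, which is more than a typical default stack limit (commonly 8 MB on Linux). As an alternative to the compiler flags, the local array can be made allocatable so that it lives on the heap regardless of options. The following is only a sketch of that idea, not the original poster's fix. The module name m_heap and the explicit loops (used instead of sum to avoid array temporaries) are my own choices.

```fortran
module m_heap
  implicit none
  integer, parameter :: dr = selected_real_kind(15, 307)
contains
  subroutine compute(f, r)
    ! Assumed-shape dummies: the shapes come from the caller's arrays.
    real(dr), intent(in)  :: f(:,:,:,:,:)
    real(dr), intent(out) :: r(:,:,:,:)
    ! Allocatable instead of automatic: storage is taken from the heap.
    real(dr), allocatable :: fGlob(:,:,:,:)
    integer :: k

    allocate(fGlob(size(f,1), size(f,2), size(f,3), size(f,4)))

    fGlob = 0.0_dr
    do k = 1, size(f,5)          ! sum over the fifth index
      fGlob = fGlob + f(:,:,:,:,k)
    end do

    r = 0.0_dr
    do k = 1, size(f,4)          ! sum over the fourth index
      r = r + f(:,:,:,k,:)
    end do

    print '(a,f6.1)', 'fGlob = ', fGlob(1,1,1,1)
    print '(a,f6.1)', 'r     = ', r(1,1,1,1)
  end subroutine compute
end module m_heap

program heap_demo
  use m_heap
  implicit none
  real(dr), allocatable :: f(:,:,:,:,:), r(:,:,:,:)

  ! Same extents as in the question: x=80, y=70, z=20, 0:mmo -> 17, n=2.
  allocate(f(80,70,20,17,2), r(80,70,20,2))
  f = 1.0_dr
  call compute(f, r)
end program heap_demo
```

With f set to 1 everywhere, fGlob is 2 (the sum over n=2) and r is 17 (the sum over the 17 values of the fourth index).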

Segmentation fault using SCALAPACK in Fortran? No backtrace?

I'm trying to find the eigenvalues and eigenvectors of a Hermitian matrix using SCALAPACK and MPI in Fortran. For bug-squashing, I made this program as simple as possible, but am still getting a segmentation fault. Per the answers given to people with similar questions, I've tried changing all of my integers to integer*8, and all of my reals to real*8 or real*16, but I still get this issue. Most interestingly, I don't even get a backtrace for the segmentation fault: the program hangs up when trying to give me a backtrace and has to be aborted manually.
Also, please forgive my lack of knowledge -- I'm not familiar with most program-y things but I've done my best. Here is my code:
PROGRAM easydiag
IMPLICIT NONE
INCLUDE 'mpif.h'
EXTERNAL BLACS_EXIT, BLACS_GET, BLACS_GRIDEXIT, BLACS_GRIDINFO
EXTERNAL BLACS_GRIDINIT, BLACS_PINFO,BLACS_SETUP, DESCINIT
INTEGER,EXTERNAL::NUMROC,ICEIL
REAL*8,EXTERNAL::PDLAMCH
INTEGER,PARAMETER::XNDIM=4 ! MATRIX WILL BE XNDIM BY XNDIM
INTEGER,PARAMETER::EXPND=XNDIM
INTEGER,PARAMETER::NPROCS=1
INTEGER COMM,MYID,ROOT,NUMPROCS,IERR,STATUS(MPI_STATUS_SIZE)
INTEGER NUM_DIM
INTEGER NPROW,NPCOL
INTEGER CONTEXT, MYROW, MYCOL
COMPLEX*16,ALLOCATABLE::HH(:,:),ZZ(:,:),MATTODIAG(:,:)
REAL*8:: EIG(2*XNDIM) ! EIGENVALUES
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
ROOT=0
NPROW=INT(SQRT(REAL(NPROCS)))
NPCOL=NPROCS/NPROW
NUM_DIM=2*EXPND/NPROW
CALL SL_init(CONTEXT,NPROW,NPCOL)
CALL BLACS_GRIDINFO( CONTEXT, NPROW, NPCOL, MYROW, MYCOL )
ALLOCATE(MATTODIAG(XNDIM,XNDIM),HH(NUM_DIM,NUM_DIM),ZZ(NUM_DIM,NUM_DIM))
MATTODIAG=0.D0
CALL MAKEHERMMAT(XNDIM,MATTODIAG)
CALL MPIDIAGH(EXPND,MATTODIAG,ZZ,MYROW,MYCOL,NPROW,NPCOL,NUM_DIM,CONTEXT,EIG)
DEALLOCATE(MATTODIAG,HH,ZZ)
CALL MPI_FINALIZE(IERR)
END
!****************************************************
SUBROUTINE MAKEHERMMAT(XNDIM,MATTODIAG)
IMPLICIT NONE
INTEGER:: XNDIM, I, J, COUNTER
COMPLEX*16:: MATTODIAG(XNDIM,XNDIM)
REAL*8:: RAND
COUNTER = 1
DO J=1,XNDIM
DO I=J,XNDIM
MATTODIAG(I,J)=COUNTER
COUNTER=COUNTER+1
END DO
END DO
END
!****************************************************
SUBROUTINE MPIDIAGH(EXPND,A,Z,MYROW,MYCOL,NPROW,NPCOL,NUM_DIM,CONTEXT,W)
IMPLICIT NONE
EXTERNAL DESCINIT
REAL*8,EXTERNAL::PDLAMCH
INTEGER EXPND,NUM_DIM
INTEGER CONTEXT
INTEGER MYCOL,MYROW,NPROW,NPCOL
COMPLEX*16 A(NUM_DIM,NUM_DIM), Z(NUM_DIM,NUM_DIM)
REAL*8 W(2*EXPND)
INTEGER N
CHARACTER JOBZ, RANGE, UPLO
INTEGER IL,IU,IA,JA,IZ,JZ
INTEGER LIWORK,LRWORK,LWORK
INTEGER M, NZ, INFO
REAL*8 ABSTOL, ORFAC, VL, VU
INTEGER DESCA(50), DESCZ(50)
INTEGER IFAIL(2*EXPND), ICLUSTR(2*NPROW*NPCOL)
REAL*8 GAP(NPROW*NPCOL)
INTEGER,ALLOCATABLE:: IWORK(:)
REAL*8,ALLOCATABLE :: RWORK(:)
COMPLEX*16,ALLOCATABLE::WORK(:)
N=2*EXPND
JOBZ='V'
RANGE='I'
UPLO='U' ! This should be U rather than L
VL=0.d0
VU=0.d0
IL=1 ! EXPND/2+1
IU=2*EXPND ! EXPND+(EXPND/2) ! HERE IS FOR THE CUTTING OFF OF THE STATE
M=IU-IL+1
ORFAC=-1.D0
IA=1
JA=1
IZ=1
JZ=1
ABSTOL=PDLAMCH( CONTEXT, 'U')
CALL DESCINIT( DESCA, N, N, NUM_DIM, NUM_DIM, 0, 0, CONTEXT, NUM_DIM, INFO )
CALL DESCINIT( DESCZ, N, N, NUM_DIM, NUM_DIM, 0, 0, CONTEXT, NUM_DIM, INFO )
LWORK = -1
LRWORK = -1
LIWORK = -1
ALLOCATE(WORK(LWORK))
ALLOCATE(RWORK(LRWORK))
ALLOCATE(IWORK(LIWORK))
CALL PZHEEVX( JOBZ, RANGE, UPLO, N, A, IA, JA, DESCA, VL, &
VU, IL, IU, ABSTOL, M, NZ, W, ORFAC, Z, IZ, &
JZ, DESCZ, WORK, LWORK, RWORK, LRWORK, IWORK, &
LIWORK, IFAIL, ICLUSTR, GAP, INFO )
LWORK = INT(ABS(WORK(1)))
LRWORK = INT(ABS(RWORK(1)))
LIWORK =INT (ABS(IWORK(1)))
DEALLOCATE(WORK)
DEALLOCATE(RWORK)
DEALLOCATE(IWORK)
ALLOCATE(WORK(LWORK))
ALLOCATE(RWORK(LRWORK))
ALLOCATE(IWORK(LIWORK))
PRINT*, LWORK, LRWORK, LIWORK
CALL PZHEEVX( JOBZ, RANGE, UPLO, N, A, IA, JA, DESCA, VL, &
VU, IL, IU, ABSTOL, M, NZ, W, ORFAC, Z, IZ, &
JZ, DESCZ, WORK, LWORK, RWORK, LRWORK, IWORK, &
LIWORK, IFAIL, ICLUSTR, GAP, INFO )
RETURN
END
The problem is with the second PZHEEVX function. I'm fairly certain that I'm using it correctly since this code is a simpler version of another more complicated code that works fine. For this purpose, I'm only using one processor.
Help!
According to this page
setting LWORK = -1 seems to request the PZHEEVX routine to return the necessary size of all the work arrays, for example,
If LWORK = -1, then LWORK is global input and a workspace query
is assumed; the routine only calculates the optimal size for
all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is
issued by PXERBLA.
Similar explanations can be found for LRWORK = -1. As for IWORK,
IWORK (local workspace) INTEGER array
On return, IWORK(1) contains the amount of integer workspace
required.
but in your program the work arrays are allocated as
LWORK = -1
LRWORK = -1
LIWORK = -1
ALLOCATE(WORK(LWORK))
ALLOCATE(RWORK(LRWORK))
ALLOCATE(IWORK(LIWORK))
and after the first call of PZHEEVX, the sizes of the work arrays are obtained as
LWORK = INT(ABS(WORK(1)))
LRWORK = INT(ABS(RWORK(1)))
LIWORK =INT (ABS(IWORK(1)))
which looks inconsistent (-1 vs 1). So it would be better to change the allocation to (*)
allocate( WORK(1), RWORK(1), IWORK(1) )
An example on this page also seems to allocate the work arrays this way. Another point of concern is that INT() is used in several places (for example, NPROW=INT(SQRT(REAL(NPROCS)))); I guess it might be better to use NINT() to avoid the effect of round-off errors.
(*) More precisely, allocating an array with an upper bound of -1 does not do what is intended: the allocated array has size 0 (thanks to @francescalus), so WORK(1) does not exist. You can verify this by printing size(a) or a(:). To catch this kind of error, it is very useful to add compiler options like -fcheck=all (for gfortran) or -check (for ifort).
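The zero-size behaviour is easy to check in isolation. A minimal sketch (the program name is arbitrary, and no ScaLAPACK is involved):

```fortran
program zero_size_alloc
  implicit none
  integer :: lwork
  complex(kind(1.0d0)), allocatable :: work(:)

  ! Upper bound below the lower bound (1): the array has zero elements,
  ! so work(1) would be out of bounds.
  lwork = -1
  allocate(work(lwork))
  print '(a,i0)', 'size after allocate(work(-1)): ', size(work)
  deallocate(work)

  ! One element is enough to receive the result of a workspace query.
  allocate(work(1))
  print '(a,i0)', 'size after allocate(work(1)):  ', size(work)
end program zero_size_alloc
```

The first line reports a size of 0 and the second a size of 1.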
There's a fishy piece of dimensioning in your code which can easily be responsible for the segfault. In your main program you set
EXPND=XNDIM=4
NUM_DIM=2*EXPND !NPROW==1 for a single-process test
ALLOCATE(MATTODIAG(XNDIM,XNDIM)) ! MATTODIAG(4,4)
Then you pass your MATTODIAG, the Hermitian matrix, to
CALL MPIDIAGH(EXPND,MATTODIAG,ZZ,MYROW,...)
which is in turn defined as
SUBROUTINE MPIDIAGH(EXPND,A,Z,MYROW,...)
COMPLEX*16 A(NUM_DIM,NUM_DIM) ! A(8,8)
This is already an inconsistency, which can mess up the computations in that subroutine (even without having a segfault). Furthermore, the subroutine along with scalapack thinks that A is of size (8,8), instead of (4,4) which you allocated in the main program, allowing the subroutine to overrun available memory.
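One way to catch this kind of mismatch early is to check the actual shape before handing the array to a routine that takes explicit dimensions. The sketch below assumes the check is done in a module procedure with an assumed-shape dummy. The names shape_guard and is_square_of_size are hypothetical helpers, not part of ScaLAPACK or the original program.

```fortran
module shape_guard
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: .true. when a is an n-by-n array.
  logical function is_square_of_size(a, n)
    complex(dp), intent(in) :: a(:,:)
    integer, intent(in) :: n
    is_square_of_size = (size(a,1) == n .and. size(a,2) == n)
  end function is_square_of_size
end module shape_guard

program shape_demo
  use shape_guard
  implicit none
  integer, parameter :: xndim = 4
  integer :: num_dim
  complex(dp), allocatable :: mattodiag(:,:)

  num_dim = 2*xndim                  ! 8, as in the question with NPROW = 1
  allocate(mattodiag(xndim, xndim))  ! 4 x 4, as in the question
  mattodiag = (0.0_dp, 0.0_dp)

  if (is_square_of_size(mattodiag, num_dim)) then
    print '(a)', 'shape ok'
  else
    print '(a,i0,a,i0,a,i0,a,i0)', 'shape mismatch: expected ', &
      num_dim, 'x', num_dim, ', got ', size(mattodiag,1), 'x', size(mattodiag,2)
  end if
end program shape_demo
```

With the dimensions from the question this reports that an 8x8 array was expected but a 4x4 one was passed.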