The FFT result keeps diverging (Fortran/MKL) - fortran

I tried to perform a 2-dimensional FFT with Visual Fortran and Intel MKL, but the result keeps blowing up. How can I fix it?
I would appreciate it if you could answer my question.
Here is my code:
First, I created a sinusoidal wave,
real(8), allocatable:: Wave_ini_2D(:,:) ! [nx, nt] dimensional
over the domain x: [-40, 40], t: [0, 10], with resolutions nx and nt, respectively.
The wave is then passed to a subroutine as follows:
subroutine FFTF2_R2C(nx, nt, L1, Wave_ini_2D, Wave_fft_2D)
integer, intent(in):: L1(2), nx, nt
real(8), intent(in):: Wave_ini_2D(nx, nt)
complex(8), intent(out):: Wave_fft_2D(L1(1), L1(2))
type(DFTI_DESCRIPTOR), POINTER :: My_Desc2_Handle ! FFT handler
real(8):: Win(nx*nt)
integer:: status, i
Win = reshape(Wave_ini_2D, shape(Win))
Status = DftiCreateDescriptor( My_Desc2_Handle, DFTI_SINGLE, DFTI_REAL, 2, L1)
Status = DftiCommitDescriptor( My_Desc2_Handle)
Status = DftiComputeForward( My_Desc2_Handle, Win)
Status = DftiFreeDescriptor(My_Desc2_Handle)
Wave_fft_2D = reshape(Win, shape(Wave_fft_2D))
end subroutine
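For what it's worth, two things in the subroutine above are worth checking: the descriptor is created with DFTI_SINGLE while the data are real(8), and the in-place real-to-complex transform returns its result in MKL's packed conjugate-even layout, so reshaping the real work array straight into a complex array does not recover the spectrum. Below is a minimal out-of-place, double-precision sketch of how such a transform is commonly set up; the output shape (nx/2+1, nt) and the stride values are my assumptions, not taken from the question.

subroutine FFTF2_R2C_sketch(nx, nt, Wave_ini_2D, Wave_fft_2D)
  use MKL_DFTI
  implicit none
  integer, intent(in) :: nx, nt
  real(8), intent(in) :: Wave_ini_2D(nx, nt)
  complex(8), intent(out) :: Wave_fft_2D(nx/2+1, nt) ! conjugate-even layout: only nx/2+1 rows stored
  type(DFTI_DESCRIPTOR), pointer :: handle
  real(8) :: Win(nx, nt)
  integer :: status

  Win = Wave_ini_2D
  ! DFTI_DOUBLE matches real(8); DFTI_SINGLE would misread the data
  status = DftiCreateDescriptor(handle, DFTI_DOUBLE, DFTI_REAL, 2, [nx, nt])
  status = DftiSetValue(handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE)
  status = DftiSetValue(handle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)
  status = DftiSetValue(handle, DFTI_INPUT_STRIDES, [0, 1, nx])
  status = DftiSetValue(handle, DFTI_OUTPUT_STRIDES, [0, 1, nx/2+1])
  status = DftiCommitDescriptor(handle)
  status = DftiComputeForward(handle, Win(:,1), Wave_fft_2D(:,1))
  status = DftiFreeDescriptor(handle)
end subroutine FFTF2_R2C_sketch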

Related

Gathering data through MPI

I use the MPI_Gather command to collect data from each processor, but I got the following error (line 523 in MAINp.f90 is the error line).
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
sot 0000000000427FD3 Unknown Unknown Unknown
libpthread-2.26.s 00002AAAB0D1C2F0 Unknown Unknown Unknown
sot 000000000041D2AE MAIN__ 523 MAINp.f90
sot 0000000000409B92 Unknown Unknown Unknown
libc-2.26.so 00002AAAB115034A __libc_start_main Unknown Unknown
sot 0000000000409AAA Unknown Unknown Unknown
srun: error: nid01236: task 19: Exited with exit code 174
srun: Terminating job step 14213926.0
slurmstepd: error: *** STEP 14213926.0 ON nid01236 CANCELLED AT 2020-04-23T06:53:35 ***
I do not know why it is wrong. I just want to collect data from each processor. I have only put part of my MAINp.f90 below, and the error line follows the label (!THIS IS THE ERROR LINE). Would anyone please give me some suggestions? Thank you.
PROGRAM MAIN
USE MPI
USE CAL
IMPLICIT NONE
!Variables for setting up the parameters in INPUT.dat file
CHARACTER (LEN=50) :: na(6) !Array to store the names of Hamiltonian files from wannier90
DOUBLE PRECISION :: an !Angle interval
INTEGER :: km(2) !k point mesh
INTEGER :: vd !Velocity direction of the Hamiltonian matrix
DOUBLE PRECISION :: fermi !Fermi energy value
DOUBLE PRECISION :: wf !Energy window
DOUBLE PRECISION :: bv !Broadening value
DOUBLE PRECISION :: pi !pi
DOUBLE PRECISION :: hb !h_bar
DOUBLE PRECISION :: es !Electron volt
!
!Variables for parameters in '.wout' file
INTEGER :: sta !Status of files
DOUBLE PRECISION :: rea_c(3,3) !Lattice constant of unit cell in real space
DOUBLE PRECISION :: rec_c(3,3) !Vectors of unit cell in the reciprocal space
!
!Variables for parameters in Hamiltonian ('_hr.dat') file from wannier90
INTEGER :: nu_wa !Number of wannier function
INTEGER :: nu_nr(5) !Number of Wigner-Seitz grid point
INTEGER, ALLOCATABLE :: nd1(:) !Degeneracy of each Wigner-Seitz grid point with magnetization along z axis
INTEGER, ALLOCATABLE :: nd2(:) !Degeneracy of each Wigner-Seitz grid point with magnetization along different axes
INTEGER, ALLOCATABLE :: nd3(:) !Degeneracy of each Wigner-Seitz grid point with magnetization along different axes
INTEGER, ALLOCATABLE :: nd4(:) !Degeneracy of each Wigner-Seitz grid point with magnetization along different axes
INTEGER, ALLOCATABLE :: nd5(:) !Degeneracy of each Wigner-Seitz grid point with magnetization along different axes
DOUBLE PRECISION, ALLOCATABLE :: hr1(:,:) !Array to store the Hamiltonian matrix information in '_hr.dat' file, magnetization along z axis
DOUBLE PRECISION, ALLOCATABLE :: hr2(:,:) !Array to store the Hamiltonian matrix information in '_hr.dat' file, magnetization along other axes
DOUBLE PRECISION, ALLOCATABLE :: hr3(:,:) !Array to store the Hamiltonian matrix information in '_hr.dat' file, magnetization along other axes
DOUBLE PRECISION, ALLOCATABLE :: hr4(:,:) !Array to store the Hamiltonian matrix information in '_hr.dat' file, magnetization along other axes
DOUBLE PRECISION, ALLOCATABLE :: hr5(:,:) !Array to store the Hamiltonian matrix information in '_hr.dat' file, magnetization along other axes
!
!Internal variables
INTEGER :: i, j, k, l, n !Integer for loop
CHARACTER (LEN=100) :: str !String for transitting data
DOUBLE PRECISION :: tr(3) !Array for transitting data
DOUBLE PRECISION, ALLOCATABLE :: kp(:,:) !Array to store the Cartesian coordinate of k-point mesh
DOUBLE PRECISION, ALLOCATABLE :: ka(:,:,:) !Array to store the Cartesian coordinates of all k points
DOUBLE COMPLEX, ALLOCATABLE :: tb(:,:) !Array to store the extracted tight binding Hamiltonian matrix
DOUBLE COMPLEX, ALLOCATABLE :: ec(:,:) !Array to store the Eigen vector matrix
DOUBLE PRECISION, ALLOCATABLE :: ev(:,:) !Array to store the Eigen value on single k point
DOUBLE PRECISION :: dk(2) !Array to store the Delta kx and ky
INTEGER :: nb !Number of valence band
DOUBLE PRECISION :: me !Minimum eigen value
DOUBLE COMPLEX, ALLOCATABLE :: u_s1(:,:) !Array to store the contribution of each eigen state to the total spin orbit torque
DOUBLE COMPLEX, ALLOCATABLE :: u_s2(:,:) !Array to store the contribution of each eigen state to the total spin orbit torque
DOUBLE COMPLEX, ALLOCATABLE :: u_t1(:,:) !Array to collect the contribution of each eigen state to the total spin orbit torque from all processors
DOUBLE COMPLEX, ALLOCATABLE :: u_t2(:,:) !Array to collect the contribution of each eigen state to the total spin orbit torque from all processors
DOUBLE COMPLEX :: sr1 !Sum of Fermi surface part for spin orbit torque on all km(1) k points
DOUBLE COMPLEX :: sr2 !Sum of Fermi surface part for spin orbit torque on all km(1) k points
DOUBLE COMPLEX, ALLOCATABLE :: crr1_all(:) !Array of ct
DOUBLE COMPLEX, ALLOCATABLE :: crr2_all(:) !Array of ct
DOUBLE COMPLEX :: crr1 !Sum of conductivity on all k points
DOUBLE COMPLEX :: crr2 !Sum of conductivity on all k points
DOUBLE COMPLEX :: crr1_total !Sum of conductivity
DOUBLE COMPLEX :: crr2_total !Sum of conductivity
DOUBLE PRECISION, ALLOCATABLE, TARGET :: nme(:) !Array to store the minimum eigen value
INTEGER, ALLOCATABLE, TARGET :: nnb(:) !Array to store the number of valence band
DOUBLE PRECISION, POINTER :: p1 !Pointer used to find the minimum eigen value
INTEGER, POINTER :: p2 !Pointer used to find the number of valence band
!
!Parameters for timer
INTEGER :: cr, t00, t0, t !Timer variables
DOUBLE PRECISION :: ra !Timer rate
!Parameters for MPI
INTEGER :: world_size !MPI
INTEGER :: world_rank, ierr !MPI
INTEGER :: irank, j0 !MPI
!
!Initializing MPI
CALL MPI_Init(ierr)
CALL MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)
CALL MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
!
!Allocating the array used to store the contribution of each eigen state to the total spin orbit torque
ALLOCATE (u_s1(2,nu_wa*km(1)))
ALLOCATE (u_s2(2,nu_wa*km(1)))
!
!Initialising array used to store the total conductivity
cr = CMPLX(0.0d0, 0.0d0)
!
!Allocating array to collect the contribution of each eigen state to the total spin orbit torque from all processors
IF (world_rank .EQ. 0) THEN
ALLOCATE (u_t1(2,nu_wa*km(1)*km(2)))
ALLOCATE (u_t2(2,nu_wa*km(1)*km(2)))
END IF
u_t1 = CMPLX(0.0d0, 0.0d0)
u_t2 = CMPLX(0.0d0, 0.0d0)
!
!Allocating array to collect the number of valence band and the minimum eigen value
IF (world_rank .EQ. 0) THEN
ALLOCATE (nme(km(2)))
ALLOCATE (nnb(km(2)))
END IF
nme = 0.0d0
nnb = 0
!
!Reading the Cartesian coordinates of k-point mesh
DO j = 1, km(2), 1
IF (mod(j-1, world_size) .NE. world_rank) CYCLE
DO k = 1, km(1), 1
kp(k,:) = ka(j,k,:)
END DO
!Building up Hamiltonian matrix on k points and diagonalising the matrix to obtain Eigen vectors and values
CALL HAMSUR(vd,kp,nu_wa,nu_nr,km(1),nd1,nd2,nd3,nd4,nd5,hr1,hr2,hr3,hr4,hr5,tb,ec,ev,fermi,an,wf,bv,dk,u_s1,u_s2,sr1,sr2,nb,me)
!
!THIS IS THE ERROR LINE
CALL MPI_Gather(u_s1, 2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, u_t1(1:2,1+nu_wa*km(1)*(j-1):nu_wa*km(1)*j),&
2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
CALL MPI_Gather(u_s2, 2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, u_t2(1:2,1+nu_wa*km(1)*(j-1):nu_wa*km(1)*j),&
2*nu_wa*km(1), MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
crr1 = crr1 + sr1
crr2 = crr2 + sr2
CALL MPI_Gather(me, 1, MPI_DOUBLE, nme(j), 1, MPI_INT, 0, MPI_COMM_WORLD, ierr)
CALL MPI_Gather(nb, 1, MPI_INT, nnb(j), 1, MPI_INT, 0, MPI_COMM_WORLD, ierr)
END DO
!
CALL MPI_Barrier(MPI_COMM_WORLD, ierr)
IF (world_rank .EQ. 0) THEN
ALLOCATE (crr1_all(world_size))
ALLOCATE (crr2_all(world_size))
END IF
crr1_all = CMPLX(0.0d0, 0.0d0)
crr2_all = CMPLX(0.0d0, 0.0d0)
CALL MPI_Gather(crr1, 1, MPI_double_complex, crr1_all, 1, MPI_double_complex, 0, MPI_COMM_WORLD, ierr)
CALL MPI_Gather(crr2, 1, MPI_double_complex, crr2_all, 1, MPI_double_complex, 0, MPI_COMM_WORLD, ierr)
!Writing total conductivity value into the file
IF (world_rank .EQ. 0) THEN
crr1_total = CMPLX(0.0d0, 0.0d0)
crr2_total = CMPLX(0.0d0, 0.0d0)
DO i = 1, world_size, 1
crr1_total = crr1_total + crr1_all(i)
crr2_total = crr2_total + crr2_all(i)
END DO
!Finding the minimum eigen value
NULLIFY (p1, p2)
p1 => nme(1)
p2 => nnb(1)
DO i = 2, km(2), 1
IF (p1 .GE. nme(i)) THEN
p1 => nme(i)
END IF
IF (p2 .LE. nnb(i)) THEN
p2 => nnb(i)
END IF
END DO
WRITE (UNIT=14, FMT='(A27,$)') 'The minimum eigen value is:'
WRITE (UNIT=14, FMT=*) p1
WRITE (UNIT=14, FMT='(A30,$)') 'The number of valence band is:'
WRITE (UNIT=14, FMT=*) p2
!
!Constant for the coefficient
pi = DACOS(-1.0d0)
hb = 1.054571817d-34 !(unit - J)
es = 1.602176634d-19 !(unit - J*s)
!
END IF
!
IF (world_rank .EQ. 0) THEN
DEALLOCATE (crr1_all)
DEALLOCATE (crr2_all)
END IF
!Finalising MPI
CALL MPI_Finalize(ierr)
!
!Deallocating arrays that store and collect the fermi-surface-part contribution of each eigen state to the total spin orbit torque
DEALLOCATE (u_s1)
DEALLOCATE (u_s2)
DEALLOCATE (u_t1)
DEALLOCATE (u_t2)
!
STOP
END PROGRAM MAIN
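A few things in the excerpt above are worth checking against how MPI_Gather works: u_t1, u_t2, nme and nnb are assigned on every rank but allocated only on rank 0; the receive buffer on the root must hold sendcount elements from every rank (so a slice with room for just one rank's contribution is too small); and the send and receive datatypes should match (one of the gathers mixes MPI_DOUBLE with MPI_INT). Any of these can produce a SIGSEGV. For reference, here is a minimal, self-contained sketch of the basic gather pattern; the names and block size are illustrative only, not taken from the program above.

program gather_blocks
  use mpi
  implicit none
  integer, parameter :: nblk = 4                 ! elements contributed by each rank
  double complex :: sendbuf(nblk)
  double complex, allocatable :: recvbuf(:)      ! only meaningful on the root
  integer :: rank, nprocs, ierr

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

  sendbuf = dcmplx(dble(rank), 0.0d0)            ! every rank contributes nblk values

  if (rank == 0) then
    allocate(recvbuf(nblk*nprocs))               ! root needs room for all ranks
  else
    allocate(recvbuf(1))                         ! dummy buffer; ignored off-root
  end if

  ! send count/type and receive count/type must describe the same amount of data per rank
  call mpi_gather(sendbuf, nblk, MPI_DOUBLE_COMPLEX, &
                  recvbuf, nblk, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, recvbuf

  call mpi_finalize(ierr)
end program gather_blocks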

System of linear equations in fortran using DGESV [duplicate]

I'm struggling with LAPACK's dgetrf and dgetri routines. Below is a subroutine I've created (the variable fit_coeffs is defined externally and is allocatable; it's not the problem). When I run it, I get memory allocation errors that appear when I assign fit_coeffs, due to the matmul(ATA,AT) line. I know this from inserting a bunch of print statements. Also, both error-checking statements after the calls to the LAPACK subroutines are printed, suggesting an error.
Does anyone understand where this comes from? I'm compiling using the command:
gfortran -Wall -cpp -std=f2003 -ffree-form -L/home/binningtont/lapack-3.4.0/ read_grib.f -llapack -lrefblas.
Thanks in advance!
subroutine polynomial_fit(x_array, y_array, D)
integer, intent(in) :: D
real, intent(in), dimension(:) :: x_array, y_array
real, allocatable, dimension(:,:) :: A, AT, ATA
real, allocatable, dimension(:) :: work
integer, dimension(:), allocatable :: pivot
integer :: l, m, n, lda, lwork, ok
l = D + 1
lda = l
lwork = l
allocate(fit_coeffs(l))
allocate(pivot(l))
allocate(work(l))
allocate(A(size(x_array),l))
allocate(AT(l,size(x_array)))
allocate(ATA(l,l))
do m = 1,size(x_array),1
do n = 1,l,1
A(m,n) = x_array(m)**(n-1)
end do
end do
AT = transpose(A)
ATA = matmul(AT,A)
call dgetrf(l, l, ATA, lda, pivot, ok)
! ATA is now represented as PLU (permutation, lower, upper)
if (ok /= 0) then
write(6,*) "HERE"
end if
call dgetri(l, ATA, lda, pivot, work, lwork, ok)
! ATA now contains the inverse of the matrix ATA
if (ok /= 0) then
write(6,*) "HERE"
end if
fit_coeffs = matmul(matmul(ATA,AT),y_array)
deallocate(pivot)
deallocate(fit_coeffs)
deallocate(work)
deallocate(A)
deallocate(AT)
deallocate(ATA)
end subroutine polynomial_fit
1) Where is fit_coeffs declared? I can't see how the above can even compile
1b) Implicit None is your friend!
2) You do have an interface in scope at the calling point, don't you?
3) dgetrf and dgetri want "double precision" while you have single. So you need sgetrf and sgetri
"Fixing" all these and completing the program I get
Program testit
Implicit None
Real, Dimension( 1:100 ) :: x, y
Integer :: D
Interface
subroutine polynomial_fit(x_array, y_array, D)
Implicit None ! Always use this!!
integer, intent(in) :: D
real, intent(in), dimension(:) :: x_array, y_array
End subroutine polynomial_fit
End Interface
Call Random_number( x )
Call Random_number( y )
D = 6
Call polynomial_fit( x, y, D )
End Program testit
subroutine polynomial_fit(x_array, y_array, D)
Implicit None ! Always use this!!
integer, intent(in) :: D
real, intent(in), dimension(:) :: x_array, y_array
real, allocatable, dimension(:,:) :: A, AT, ATA
real, allocatable, dimension(:) :: work, fit_coeffs
integer, dimension(:), allocatable :: pivot
integer :: l, m, n, lda, lwork, ok
l = D + 1
lda = l
lwork = l
allocate(fit_coeffs(l))
allocate(pivot(l))
allocate(work(l))
allocate(A(size(x_array),l))
allocate(AT(l,size(x_array)))
allocate(ATA(l,l))
do m = 1,size(x_array),1
do n = 1,l,1
A(m,n) = x_array(m)**(n-1)
end do
end do
AT = transpose(A)
ATA = matmul(AT,A)
call sgetrf(l, l, ATA, lda, pivot, ok)
! ATA is now represented as PLU (permutation, lower, upper)
if (ok /= 0) then
write(6,*) "HERE"
end if
call sgetri(l, ATA, lda, pivot, work, lwork, ok)
! ATA now contains the inverse of the matrix ATA
if (ok /= 0) then
write(6,*) "HERE"
end if
fit_coeffs = matmul(matmul(ATA,AT),y_array)
deallocate(pivot)
deallocate(fit_coeffs)
deallocate(work)
deallocate(A)
deallocate(AT)
deallocate(ATA)
end subroutine polynomial_fit
This runs to completion. If I omit the interface I get "HERE" printed twice. If I use the d versions I get seg faults.
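If double precision is what is actually wanted, the alternative to point 3 is to promote every real entity to double precision and keep dgetrf/dgetri. A hedged sketch along those lines (the kind parameter wp and returning fit_coeffs as a dummy argument are my additions, not part of the code above; the caller needs an explicit interface):

subroutine polynomial_fit_dp(x_array, y_array, D, fit_coeffs)
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, intent(in) :: D
  real(wp), intent(in), dimension(:) :: x_array, y_array
  real(wp), intent(out), allocatable, dimension(:) :: fit_coeffs
  real(wp), allocatable :: A(:,:), AT(:,:), ATA(:,:), work(:)
  integer, allocatable :: pivot(:)
  integer :: l, m, n, ok

  l = D + 1
  allocate(fit_coeffs(l), pivot(l), work(l))
  allocate(A(size(x_array),l), AT(l,size(x_array)), ATA(l,l))
  do m = 1, size(x_array)
    do n = 1, l
      A(m,n) = x_array(m)**(n-1)              ! Vandermonde matrix
    end do
  end do
  AT = transpose(A)
  ATA = matmul(AT, A)
  call dgetrf(l, l, ATA, l, pivot, ok)        ! LU factorisation in double precision
  if (ok /= 0) return
  call dgetri(l, ATA, l, pivot, work, l, ok)  ! invert from the LU factors
  if (ok /= 0) return
  fit_coeffs = matmul(matmul(ATA, AT), y_array)
end subroutine polynomial_fit_dp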
Does this answer your question?

Using mpi_scatterv with 4D fortran array

I'm trying to break up a 4D array over the third dimension, and send to each node using MPI. Basically, I'm computing derivatives of a matrix, Cpq, with respect to atom positions in each of the three Cartesian directions. Cpq is of size nat_sl x nat_sl, so dCpqdR is of size nat_sl x nat_sl x nat x 3. At the end of the day, for every s,i pair, I have to compute the matrix product of dCpqdR sandwiched between the transpose of the eigenvectors of Cpq and the eigenvectors of Cpq, like so:
temp = MATMUL(TRANSPOSE(Cpq), MATMUL(dCpqdR(:, :, s, i), Cpq))
This is fine, but as it turns out, the loop over s and i is now by far the slowest part of my code. Because each iteration can be done independently, I was hoping that I could break up dCpqdR and give each task its own s, i to compute the derivative of. That is, I'd like task 1 to get dCpqdR(:,:,1,1), task 2 to get dCpqdR(:,:,1,2), etc.
I've got this working in some sense by using a buffered send/recv pair of calls. The root node allocates a temporary array, fills it, sends it to the relevant nodes, and the relevant nodes do their computations as they wish. This works, but it can be slow and memory-inefficient. I'd ideally like to break it up in a more memory-efficient way.
The logical thing to do, then, is to use mpi_scatterv, but here is where I start running into problems, as I'm having trouble figuring out the memory layout. I've written this so far:
call mpi_type_create_subarray(4, (/ nat_sl, nat_sl, nat, 3 /), (/nat_sl, nat_sl, n_pairs(me_image+1), 3/),&
(/0, 0, 0, 0/), mpi_order_fortran, mpi_double_precision, subarr_typ, ierr)
call mpi_type_commit(subarr_typ, ierr)
call mpi_scatterv(dCpqdR, n_pairs(me_image+1), f_displs, subarr_typ,&
my_dCpqdR, 3*nat_sl*3*nat_sl*3*n_pairs(me_image+1), subarr_typ,&
root_image, intra_image_comm, ierr)
I've computed n_pairs using this subroutine:
subroutine mbdvdw_para_init_int_forces()
implicit none
integer :: p, s, i, counter, k, cpu_ind
integer :: num_unique_rpq, n_pairs_per_proc, cpu
real(dp) :: Rpq(3), Rpq_norm, current_val
num_pairs = nat
if(.not.allocated(f_cpu_id)) allocate(f_cpu_id(nat, 3))
n_pairs_per_proc = floor(dble(num_pairs)/nproc_image)
cpu = 0
n_pairs = 0
counter = 1
p = 1
do counter = 0, num_pairs-1, 1
n_pairs(modulo(counter, nproc_image)+1) = n_pairs(modulo(counter, nproc_image)+1) + 1
end do
do s = 1, nat, 1
f_cpu_id(s) = cpu
if((counter.lt.num_pairs)) then
if(p.eq.n_pairs(cpu+1)) then
cpu = cpu + 1
p = 0
end if
end if
p = p + 1
end do
call mp_set_displs( n_pairs, f_displs, num_pairs, nproc_image)
f_displs = f_displs*nat_sl*nat_sl*3
end subroutine mbdvdw_para_init_int_forces
and the full method for the matrix multiplication is
subroutine mbdvdw_interacting_energy(energy, forcedR, forcedh, forcedV)
implicit none
real(dp), intent(out) :: energy
real(dp), dimension(nat, 3), intent(out) :: forcedR
real(dp), dimension(3,3), intent(out) :: forcedh
real(dp), dimension(nat), intent(out) :: forcedV
real(dp), dimension(3*nat_sl, 3*nat_sl) :: temp
real(dp), dimension(:,:,:,:), allocatable :: my_dCpqdR
integer :: num_negative, i_atom, s, i, j, counter
integer, parameter :: eigs_check = 200
integer :: subarr_typ, ierr
! lapack work variables
integer :: LWORK, errorflag
real(dp) :: WORK((3*nat_sl)*(3+(3*nat_sl)/2)), eigenvalues(3*nat_sl)
call start_clock('mbd_int_energy')
call mp_sum(Cpq, intra_image_comm)
eigenvalues = 0.0_DP
forcedR = 0.0_DP
energy = 0.0_DP
num_negative = 0
forcedV = 0.0_DP
errorflag=0
LWORK=3*nat_sl*(3+(3*nat_sl)/2)
call DSYEV('V', 'U', 3*nat_sl, Cpq, 3*nat_sl, eigenvalues, WORK, LWORK, errorflag)
if(errorflag.eq.0) then
do i_atom=1, 3*nat_sl, 1
!open (unit=eigs_check, file="eigs.tmp",action="write",status="unknown",position="append")
! write(eigs_check, *) eigenvalues(i_atom)
!close(eigs_check)
if(eigenvalues(i_atom).ge.0.0_DP) then
energy = energy + dsqrt(eigenvalues(i_atom))
else
num_negative = num_negative + 1
end if
end do
if(num_negative.ge.1) then
write(stdout, '(3X," WARNING: Found ", I3, " Negative Eigenvalues.")'), num_negative
end if
else
end if
energy = energy*nat/nat_sl
!!!!!!!!!!!!!!!!!!!!
! Forces below here. There's going to be some long parallelization business.
!!!!!!!!!!!!!!!!!!!!
call start_clock('mbd_int_forces')
if(.not.allocated(my_dCpqdR)) allocate(my_dCpqdR(nat_sl, nat_sl, n_pairs(me_image+1), 3)), my_dCpqdR = 0.0_DP
if(mbd_vdw_forces) then
do s=1,nat,1
if(me_image.eq.(f_cpu_id(s)+1)) then
do i=1,3,1
temp = MATMUL(TRANSPOSE(Cpq), MATMUL(my_dCpqdR(:, :, counter, i), Cpq))
do j=1,3*nat_sl,1
if(eigenvalues(j).ge.0.0_DP) then
forcedR(s, i) = forcedR(s, i) + 1.0_DP/(2.0_DP*dsqrt(eigenvalues(j)))*temp(j,j)
end if
end do
end do
counter = counter + 1
end if
end do
forcedR = forcedR*nat/nat_sl
do s=1,3,1
do i=1,3,1
temp = MATMUL(TRANSPOSE(Cpq), MATMUL(dCpqdh(:, :, s, i), Cpq))
do j=1,3*nat_sl,1
if(eigenvalues(j).ge.0.0_DP) then
forcedh(s, i) = forcedh(s, i) + 1.0_DP/(2.0_DP*dsqrt(eigenvalues(j)))*temp(j,j)
end if
end do
end do
end do
forcedh = forcedh*nat/nat_sl
call mp_sum(forcedR, intra_image_comm)
call mp_sum(forcedh, intra_image_comm)
end if
call stop_clock('mbd_int_forces')
call stop_clock('mbd_int_energy')
return
end subroutine mbdvdw_interacting_energy
But when run, it's complaining that
[MathBook Pro:58100] *** An error occurred in MPI_Type_create_subarray
[MathBook Pro:58100] *** reported by process [2560884737,2314885530279477248]
[MathBook Pro:58100] *** on communicator MPI_COMM_WORLD
[MathBook Pro:58100] *** MPI_ERR_ARG: invalid argument of some other kind
[MathBook Pro:58100] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[MathBook Pro:58100] *** and potentially your MPI job)
so something is going wrong, but I have no idea what. I know my description is somewhat sparse to start with, so please let me know what information would be necessary to help.
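For reference, here is a minimal, self-contained sketch of the general subarray-plus-scatter technique (it is not a fix for the code above; array names and sizes are made up). The two details that usually cause MPI_ERR_ARG or garbage here are that the count passed for a derived datatype is in units of that type (typically 1 per rank, not the number of base elements), and that the type's extent must be shrunk with mpi_type_create_resized so that successive displacements step from one block to the next. The same pattern generalises to slicing a 4D array over its third dimension.

program scatter_rows
  use mpi
  implicit none
  integer, parameter :: ncols = 5
  double precision, allocatable :: a(:,:)    ! global array; only filled on the root
  double precision :: myrow(ncols)
  integer :: rowtype, rowtype_resized, ierr, rank, nprocs, i, j
  integer(kind=MPI_ADDRESS_KIND) :: lb, extent

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(a(nprocs, ncols))                 ! row i is non-contiguous (stride nprocs)
  if (rank == 0) then
    do j = 1, ncols
      do i = 1, nprocs
        a(i, j) = 10*i + j
      end do
    end do
  end if

  ! one row of a(nprocs, ncols): full sizes, subsizes (1, ncols), starting at (0, 0)
  call mpi_type_create_subarray(2, [nprocs, ncols], [1, ncols], [0, 0], &
       MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, rowtype, ierr)
  ! shrink the extent to one element so block k starts at a(1+k, 1)
  call mpi_type_get_extent(MPI_DOUBLE_PRECISION, lb, extent, ierr)
  call mpi_type_create_resized(rowtype, 0_MPI_ADDRESS_KIND, extent, rowtype_resized, ierr)
  call mpi_type_commit(rowtype_resized, ierr)

  ! send one resized row type per rank; receive it as ncols contiguous doubles
  call mpi_scatter(a, 1, rowtype_resized, myrow, ncols, MPI_DOUBLE_PRECISION, &
       0, MPI_COMM_WORLD, ierr)

  write(*,'(A,I0,A,5F7.1)') 'rank ', rank, ' got ', myrow

  call mpi_type_free(rowtype_resized, ierr)
  call mpi_finalize(ierr)
end program scatter_rows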

FFTW, simple derivative in fortran

I am struggling with this simple derivative of a sine with FFTW. At first look it seems OK, but then there is quite a big error (5e-6) when compared with the exact solution...
I do see that after the c2r transform the complex input is all messed up, but it seems to me that this same complex input is the cause of my problem... What am I doing wrong? I didn't use any pointers and tried to keep everything as simple as possible; still I can't figure out what's wrong.
Any help is appreciated! Thanks!!!
program main
! C binding
use, intrinsic :: iso_c_binding
implicit none
double precision, parameter :: pi = 4*ATAN(1.0)
complex, parameter :: ii =(0.0,1.0)
integer(C_INT), parameter :: Nx = 32
integer(C_INT), parameter :: Ny = Nx
double precision, parameter :: Lx = 2*pi, Ly = 2*pi
! Derived parameters
double precision, parameter :: dx = Lx/Nx, dy = Ly/Ny
real(C_DOUBLE), dimension(Nx,Ny) :: x,y, u0,in,dudx,dudxE, errdU
real(C_DOUBLE), dimension(Nx/2+1,Ny) :: kx, ky
! Fourier space variables
complex(C_DOUBLE_COMPLEX), dimension(Nx/2+1,Ny) :: u_hat_x, out
! indices
integer :: i, j
!---FFTW plans
type(C_PTR) :: pf, pb
! FFTW include
include 'fftw3.f03'
write(*,'(A)') 'Starting...'
! Grid
forall(i=1:Nx,j=1:Ny)
x(i,j) = (i-1)*dx
y(i,j) = (j-1)*dy
end forall
! Compute the wavenumbers
forall(i=1:Nx/2,j=1:Ny) kx(i,j)=2*pi*(i-1)/Lx
kx(Nx/2+1,:) = 0.0
forall(i=1:Nx/2+1,j=1:Ny/2) ky(i,j)=2*pi*(j-1)/Ly
forall(i=1:Nx/2+1,j=Ny/2+1:Ny) ky(i,j)=2*pi*(j-Ny-1)/Ly
! Initial Condition
u0 = sin(2*x)
dudxE = 2*cos(2*x)
! Go to Fourier Space
in = u0
pf = fftw_plan_dft_r2c_2d(Ny, Nx, in,out ,FFTW_ESTIMATE)
call fftw_execute_dft_r2c(pf,in,out)
u_hat_x = out
! Derivative
out = ii*kx*out
! Back to physical space
pb = fftw_plan_dft_c2r_2d(Ny, Nx, out,in,FFTW_ESTIMATE)
call fftw_execute_dft_c2r(pb,out,in)
! rescale
dudx = in/Nx/Ny
! check the derivative
errdU = dudx - dudxE
! Write file
write(*,*) 'Writing to files...'
OPEN(UNIT=1, FILE="out_for.dat", ACTION="write", STATUS="replace", &
FORM="unformatted")
WRITE(1) kx,u0,dudx,errdU,abs(u_hat_x)
CLOSE(UNIT=1)
call fftw_destroy_plan(pf)
call fftw_destroy_plan(pb)
write(*,'(A)') 'Done!'
end program main
steabert was right! The problem was simply with the pi definition, which was only single precision! Thank you so much!
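For reference, the corrected declaration implied by that comment would be along these lines (a one-line sketch; the ATAN argument simply needs to be a double-precision literal):

! use a double-precision literal so pi carries full precision
double precision, parameter :: pi = 4*atan(1.0d0)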

FORTRAN ZGEEV, all 0 eigenvalues

I am trying to get the ZGEEV routine in LAPACK to work for a test problem and am having some difficulties. I just started coding in Fortran a week ago, so I think it is probably something very trivial that I am missing.
I have to diagonalize rather large complex symmetric matrices. To get started, I created a 200 by 200 matrix in Matlab, which I have verified is diagonalizable. When I run the code, it brings up no errors and INFO = 0, suggesting success. However, all the eigenvalues are (0,0), which I know is wrong.
Attached is my code.
PROGRAM speed_zgeev
IMPLICIT NONE
INTEGER(8) :: N
COMPLEX*16, DIMENSION(:,:), ALLOCATABLE :: MAT
INTEGER(8) :: INFO, I, J
COMPLEX*16, DIMENSION(:), ALLOCATABLE :: RWORK
COMPLEX*16, DIMENSION(:), ALLOCATABLE :: D
COMPLEX*16, DIMENSION(1,1) :: VR, VL
INTEGER(8) :: LWORK = -1
COMPLEX*16, DIMENSION(:), ALLOCATABLE :: WORK
DOUBLE PRECISION :: RPART, IPART
EXTERNAL ZGEEV
N = 200
ALLOCATE(D(N))
ALLOCATE(RWORK(2*N))
ALLOCATE(WORK(N))
ALLOCATE(MAT(N,N))
OPEN(UNIT = 31, FILE = "newmat.txt")
OPEN(UNIT = 32, FILE = "newmati.txt")
DO J = 1,N
DO I = 1,N
READ(31,*) RPART
READ(32,*) IPART
MAT(I,J) = CMPLX(RPART, IPART)
END DO
END DO
CLOSE(31)
CLOSE(32)
CALL ZGEEV('N','N', N, MAT, N, D, VL, 1, VR, 1, WORK, LWORK, RWORK, INFO)
INFO = WORK(1)
DEALLOCATE(WORK)
ALLOCATE(WORK(INFO))
CALL ZGEEV('N','N', N, MAT, N, D, VL, 1, VR, 1, WORK, LWORK, RWORK, INFO)
IF (INFO .EQ. 0) THEN
PRINT*, D(1:10)
ELSE
PRINT*, INFO
END IF
DEALLOCATE(MAT)
DEALLOCATE(D)
DEALLOCATE(RWORK)
DEALLOCATE(WORK)
END PROGRAM speed_zgeev
I have tried the same code on smaller matrices, of size 30 by 30 and they work fine. Any help would be appreciated! Thanks.
I forgot to mention that I am loading the matrices from a test file, which I have verified is read in correctly.
Maybe LWORK = WORK(1) instead of INFO = WORK(1)? Also change ALLOCATE(WORK(INFO)) to ALLOCATE(WORK(LWORK)).
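For reference, here is a minimal, self-contained sketch of the two-call workspace-query pattern that comment is describing: call ZGEEV once with LWORK = -1, read the optimal size from WORK(1), allocate, then call again. Note also that ZGEEV expects RWORK to be a DOUBLE PRECISION array of length 2*N and, with the standard LP64 LAPACK interface, default-kind integers. The matrix below is a made-up example, not the poster's data.

program zgeev_query
  implicit none
  integer, parameter :: n = 4
  complex*16 :: a(n, n), w(n), vl(1, 1), vr(1, 1), wkopt(1)
  complex*16, allocatable :: work(:)
  double precision :: rwork(2*n)
  integer :: lwork, info, i, j
  external zgeev

  ! a small complex test matrix
  do j = 1, n
    do i = 1, n
      a(i, j) = dcmplx(i + j, i - j)
    end do
  end do

  ! workspace query: LWORK = -1, optimal size returned in WKOPT(1)
  call zgeev('N', 'N', n, a, n, w, vl, 1, vr, 1, wkopt, -1, rwork, info)
  lwork = int(wkopt(1))
  allocate(work(lwork))

  ! actual eigenvalue computation
  call zgeev('N', 'N', n, a, n, w, vl, 1, vr, 1, work, lwork, rwork, info)
  if (info == 0) then
    print *, w
  else
    print *, 'ZGEEV failed, INFO = ', info
  end if
end program zgeev_query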