PGI, OpenMP and namelist in Fortran

I have run into a problem reading a namelist in a Fortran program that uses OpenMP, when compiling with the Portland Group (PGI) compiler.
What I am trying to do is simple: inside a SINGLE region I call a read_nml subroutine, which sets the parameters to their default values and then opens, reads, and closes the namelist file. The parameters read from the namelist are threadprivate, and I broadcast them to the other threads after the read with COPYPRIVATE.
This works perfectly well with the GNU and Intel compilers, but it fails with PGI and I cannot see why. I get no error, but the parameters keep their default values instead of the ones in the namelist.
Here is an example of what I am trying to do:
program read_input
!$ use OMP_LIB
use params
implicit none
integer :: rank=0, nthreads=1
!$OMP PARALLEL DEFAULT(PRIVATE)
!$ rank = OMP_GET_THREAD_NUM()
!$ nthreads = OMP_GET_NUM_THREADS()
!$OMP SINGLE
print*, 'There is ', nthreads, ' threads running'
call read_nml
!$OMP END SINGLE COPYPRIVATE(nx, ny, nz)
print*, 'Rank: ', rank
print*, 'nx, ny, nz: ', nx, ny, nz
!$OMP END PARALLEL
contains
subroutine read_nml
use params
implicit none
namelist /input_params/ nx, ny, nz
call default_parameters
print*, 'nx, ny, nz (default): ', nx, ny, nz
open(unit=1, file='input', status='old')
read(1, input_params)
close(1)
print*, 'nx, ny, nz (read): ', nx, ny, nz
return
end subroutine read_nml
subroutine default_parameters
use params
implicit none
nx = 2; ny = 2; nz = 2
return
end subroutine default_parameters
end program read_input
The module params is very simple and contains only:
module params
integer :: nx, ny, nz
!$OMP THREADPRIVATE(nx, ny, nz)
end module params
Compiling with pgfortran, here is the output I get (with 2 threads):
Start program: read_input
There is 2 threads running
nx, ny, nz (default): 2 2 2
Rank: 0
nx, ny, nz: 2 2 2
Rank: 1
nx, ny, nz: 2 2 2
And if I compile the same piece of code with Intel or GNU compilers (still with 2 threads):
Start program: read_input
There is 2 threads running
nx, ny, nz (default): 2 2 2
nx, ny, nz (read): 10 10 10
Rank: 0
nx, ny, nz: 10 10 10
Rank: 1
nx, ny, nz: 10 10 10
Any thought or hint will be appreciated!

I don't know the exact reason, but I found a workaround by experimenting with the code.
If the variables named in the namelist are local to the subroutine (here, dummy arguments) rather than threadprivate module variables, they are read with no problem. So replacing
call read_nml
by
call read_nml(nx, ny, nz)
and the subroutine read_nml by
subroutine read_nml(nx, ny, nz)
  implicit none
  integer :: nx, ny, nz
  namelist /input_params/ nx, ny, nz
  call default_parameters
  print*, 'nx, ny, nz (default): ', nx, ny, nz
  open(unit=1, file='input', status='old')
  read(1, input_params)
  close(1)
  print*, 'nx, ny, nz (read): ', nx, ny, nz
  return
end subroutine read_nml
works well. I suspect it is once again a matter of the data-sharing status of the variables, but I don't understand why the Intel and GNU compilers handle it without trouble whereas the PGI compiler does not. In fact, that is why I made the parameters threadprivate in my module in the first place: to prevent this kind of behaviour. If somebody can give a better answer than this one, I'm still interested!
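A variant of the same workaround can be sketched as follows: the namelist group names only plain local variables, and the values are copied into the threadprivate module variables afterwards. This is a minimal sketch, not verified against PGI; the module name nml_reader, the renamed variables tp_nx, tp_ny, tp_nz, and the file name input_sketch.nml are all hypothetical.

```fortran
module params
  implicit none
  integer :: nx, ny, nz
  !$omp threadprivate(nx, ny, nz)
end module params

module nml_reader            ! hypothetical module name
  implicit none
contains
  subroutine read_nml(filename)
    ! Rename the threadprivate module variables so that plain local
    ! variables can carry the names expected in the namelist file.
    use params, only: tp_nx => nx, tp_ny => ny, tp_nz => nz
    character(len=*), intent(in) :: filename
    integer :: nx, ny, nz
    integer :: unit_nml
    namelist /input_params/ nx, ny, nz

    ! Defaults, as in default_parameters in the question.
    nx = 2; ny = 2; nz = 2

    open(newunit=unit_nml, file=filename, status='old', action='read')
    read(unit_nml, nml=input_params)
    close(unit_nml)

    ! Copy the local values into this thread's threadprivate copies.
    tp_nx = nx; tp_ny = ny; tp_nz = nz
  end subroutine read_nml
end module nml_reader

program read_input_locals
  use params
  use nml_reader
  implicit none
  integer :: u

  ! Write a small namelist file so the sketch runs on its own.
  open(newunit=u, file='input_sketch.nml', status='replace', action='write')
  write(u, '(a)') '&input_params'
  write(u, '(a)') '  nx = 10, ny = 10, nz = 10'
  write(u, '(a)') '/'
  close(u)

  !$omp parallel
  !$omp single
  call read_nml('input_sketch.nml')
  !$omp end single copyprivate(nx, ny, nz)
  !$omp critical
  print *, 'nx, ny, nz: ', nx, ny, nz
  !$omp end critical
  !$omp end parallel
end program read_input_locals
```

Unlike passing the module variables as actual arguments, this version never lets a dummy argument alias a variable that the subroutine also reaches through the module.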

Related

OpenACC: Why updating an array depends on the location of the update directive

I'm new to OpenACC and am trying to use it to accelerate a particle code. However, I don't understand why updating an array (eta in the program below) on the host gives different results depending on where '!$acc update self' is placed. Here is code that reproduces the problem:
program approximateFun
use funs
use paras_mod
integer :: Nx
real(dp) :: dx
real(dp), dimension(:), allocatable :: x, eta
real(dp), dimension(5) :: xp, fAtxp
!$acc declare create(Nx)
!$acc declare create(x)
!$acc declare create(eta)
!$acc declare create(dx)
!$acc declare create(fAtxp)
!$acc declare create(xp)
Nx = 16
!$acc update device(Nx)
xp = (/3.9, 4.1, 4.5, 5.0, 5.6/)
!$acc update device(xp)
allocate(x(1 : Nx))
allocate(eta(1 : Nx))
eta = 0.0d0
dx = 2 * pi / (Nx - 1)
!$acc update device(dx)
do i = 1, Nx
x(i) = (i - 1.0d0) * dx
end do
!$acc update device(x)
call calc_etaVec(x, Nx, eta)
!$acc update self(eta) ! gives the correct results
!$acc parallel loop present(dx, xp, eta, fAtxp)
do i = 1, 5
call calcFunAtx(xp(i), dx, eta, fAtxp(i))
end do
!$acc update self (fAtxp)
!!$acc update self(eta) !---> gives wrong result
write(6, *) 'eta', eta
do i = 1, 5
write(6, *) 'xp, fAtxp', xp(i), fAtxp(i)
end do
deallocate(x)
deallocate(eta)
end program approximateFun
The previous program uses the following modules
MODULE funs
use paras_mod
implicit none
CONTAINS
subroutine calc_etaVec(x, nx, eta)
integer, intent(in) :: nx
real(dp), dimension(:), intent(in) :: x
real(dp), dimension(:), intent(out) :: eta
integer :: i
!$acc parallel loop present(x, eta)
do i = 1, nx
eta(i) = sin(x(i))
end do
end subroutine
subroutine calcFunAtx(xp, dx, eta, fAtx)
real(dp), intent(in) :: xp, dx
real(dp), dimension(:), intent(in) :: eta
real(dp), intent(out) :: fAtx
integer :: idx
!$acc routine seq
idx = 1 + floor(xp / dx)
fAtx = eta(idx)
end subroutine calcFunAtx
END MODULE
and
module paras_mod
implicit none
save
INTEGER, PARAMETER :: dp = selected_real_kind(14,300)
REAL(dp), PARAMETER :: PI=4.0d0*atan(1.0d0)
end module paras_mod
When using !$acc update self(eta) directly after call calc_etaVec(x, Nx, eta), eta is updated correctly. But when used after the loop, only the first five elements are correct, while the remaining are zeros. What are the reasons behind that?
thanks
The output when !$acc update self(eta) is used directly after call calc_etaVec(x, Nx, eta) is
0.000000000000000 0.4067366430758002
0.7431448254773941 0.9510565162951535 0.9945218953682734
0.8660254037844388 0.5877852522924732 0.2079116908177597
-0.2079116908177591 -0.5877852522924730 -0.8660254037844384
-0.9945218953682732 -0.9510565162951536 -0.7431448254773946
-0.4067366430758009 -2.4492935982947064E-016
which is correct. However, when used after the loop, the output is
0.000000000000000 0.4067366430758002
0.7431448254773941 0.9510565162951535 0.9945218953682734
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000
This one was perplexing until I determined that it's a compiler error. I've filed a problem report, TPR #32673, and sent it to engineering for review.
Setting the environment variable NV_ACC_NOTIFY=2, which shows the data movement, I see that the compiler copies only 40 bytes (five 8-byte elements) instead of the correct 128 (all 16 elements). However, if I remove "eta" from the preceding present clause, the copy is correct.
#ifdef WORKS
!$acc parallel loop present(dx, fAtxp)
#else
!$acc parallel loop present(dx, fAtxp, eta)
#endif
do i = 1, 5
  call calcFunAtx(xp(i), dx, eta, fAtxp(i))
end do
Also, this only occurs when using an allocatable in a declare create directive. If you switch to data regions, the issue doesn't occur. That is probably why we didn't see it before (it looks like the bug has been there since mid-2020): using the declare directive in anything but a module is rare.
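The data-region alternative mentioned above could look roughly like the sketch below. It is an assumption about how the program might be restructured, not the poster's code: the declare create directives are dropped, a structured data region copies x and xp in and eta and fAtxp out, and the two modules are merged into one hypothetical module funs_sketch. It has not been checked against an affected compiler version; without OpenACC enabled the directives are plain comments and the program runs on the host.

```fortran
module funs_sketch            ! hypothetical: funs and paras_mod merged
  implicit none
  integer, parameter :: dp = selected_real_kind(14, 300)
  real(dp), parameter :: pi = 4.0_dp * atan(1.0_dp)
contains
  subroutine calc_etaVec(x, nx, eta)
    integer, intent(in) :: nx
    real(dp), dimension(:), intent(in) :: x
    real(dp), dimension(:), intent(out) :: eta
    integer :: i
    ! x and eta are present through the caller's data region.
    !$acc parallel loop present(x, eta)
    do i = 1, nx
      eta(i) = sin(x(i))
    end do
  end subroutine calc_etaVec

  subroutine calcFunAtx(xp, dx, eta, fAtx)
    real(dp), intent(in) :: xp, dx
    real(dp), dimension(:), intent(in) :: eta
    real(dp), intent(out) :: fAtx
    integer :: idx
    !$acc routine seq
    idx = 1 + floor(xp / dx)
    fAtx = eta(idx)
  end subroutine calcFunAtx
end module funs_sketch

program approximateFun_data
  use funs_sketch
  implicit none
  integer, parameter :: Nx = 16
  real(dp), dimension(:), allocatable :: x, eta
  real(dp), dimension(5) :: xp, fAtxp
  real(dp) :: dx
  integer :: i

  allocate(x(Nx), eta(Nx))
  xp = [3.9_dp, 4.1_dp, 4.5_dp, 5.0_dp, 5.6_dp]
  dx = 2 * pi / (Nx - 1)
  do i = 1, Nx
    x(i) = (i - 1) * dx
  end do
  eta = 0.0_dp
  fAtxp = 0.0_dp

  ! Structured data region: inputs are copied to the device on entry,
  ! results are copied back to the host at "end data".
  !$acc data copyin(x, xp) copyout(eta, fAtxp)
  call calc_etaVec(x, Nx, eta)
  !$acc parallel loop present(xp, eta, fAtxp)
  do i = 1, 5
    call calcFunAtx(xp(i), dx, eta, fAtxp(i))
  end do
  !$acc end data

  write(6, *) 'eta', eta
  do i = 1, 5
    write(6, *) 'xp, fAtxp', xp(i), fAtxp(i)
  end do
  deallocate(x, eta)
end program approximateFun_data
```

Because copyout transfers the whole of eta when the region ends, no separate update self directive is needed in this layout.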

Reading integer data from a file using fortran [duplicate]

I'm stuck on a problem where I need to compute the values of a function f[x,y,z] on a grid. Below is the program as I wrote it, evaluating on a one-dimensional grid only.
I wrote the program:
program CHISQUARE_MINIMIZATION_VELOCITY_PROFILES
use distribution
IMPLICIT none
integer, parameter :: kp=1001 ! Parameter which states the number of points on the grid.
integer, parameter :: ndata=13 ! Parameter which states the number of elements of the data file.
integer, parameter :: nconst=3 ! Fixed integer parameter.
integer i, j, n
real*8 rc0, rcf, V00, V0f, d00, d0f, rc, V0, d, z
real*8 rcr(kp), V0r(kp), d0r(kp), chisq(kp)
!Scaling radius range
rc0=0.0d-5 ! kpc
rcf=1.0d2 ! kpc
call linspace(rc0,rcf,kp,rcr)
!**************If I call like this, it works normal*****************
!CHISQUARED(1.3d0, 130.2d0, 0.12d0, 1.0d0, 1.0d0, 2.0d0, 0.0d0, 0.0d0, 1, !ndata, nconst)
! **1.27000000000000 0.745818846396887**
! Press any key to continue
!**************If I call like this, it works normal*****************
!******* Here is where my problem is****************
do j=1, kp
rc=rcr(j)
write(*,*) rc, CHISQUARED(rc, 130.2d0, 0.12d0, 1.0d0, 1.0d0, 2.0d0, 0.0d0, 0.0d0, 1, ndata, nconst)
enddo
!******* Here is where my problem is****************
end program CHISQUARE_MINIMIZATION_VELOCITY_PROFILES
I use the module where I compute the chi^2 distribution, coming from a theoretical model...
MODULE distribution
IMPLICIT NONE
CONTAINS
! I define here the chi^2 function****
real*8 function CHISQUARED(rc, V0, d, alpha, gamma, chi, a, b, n, ndata, nconst)
integer i, n, ndata, nconst
real*8 rc, V0, d
real*8 alpha, gamma, chi, a, b, s
real*8, DIMENSION(ndata,3) :: X
open(unit=1, file="data.txt")
s=0.0d0
do i=1, ndata
Read(1,*) X(i,:)
s=s+((X(i,2)-VELOCITYPROFILE(X(i,1), rc, V0, d, alpha, gamma, chi, a, b, n))/(X(i,3)))**2.0d0
end do
CHISQUARED=s/(ndata-nconst)
end function CHISQUARED
!****Here I define the model function
real*8 function VELOCITYPROFILE(r, rc, V0, d, alpha, gamma, chi, a, b, n)
integer i, n
real*8 r, rc, V0, d, alpha, gamma, chi, a, b, z
if (rc < 0.0d0 .OR. d < 0.0d0 .OR. a <0.0d0 .OR. b <0.0d0 .OR. alpha < 0.0d0 .OR. gamma <0.0d0 .OR. chi < 0.0d0 .OR. n<1 ) then
VELOCITYPROFILE=0.0d0
return
else
z=0.0d0
do i=0,n
z=z+((V0*((r/rc)**(1.5d0))*(1+a+r/rc)**(-gamma*(2*n+0.5d0)))/((a+(r/rc)**alpha)**(chi/2.0d0)))*(((b+r/rc)**gamma)/d)**i
end do
VELOCITYPROFILE=z
end if
end function VELOCITYPROFILE
END MODULE distribution
!*****************END OF THE MODULE******************************
the data.txt file is of the form
0.24 37.31 6.15
0.28 37.92 5.5
0.46 47.12 3.9
0.64 53.48 2.8
0.73 55.14 3.3
0.82 58.47 2.5
1.08 66.15 3.3
1.22 69.39 2.75
1.45 74.55 5.
1.71 77.94 2.93
1.87 81.66 2.5
2.2 86.81 3.02
2.28 90.08 2.1
2.69 94.38 3.92
2.7 95.36 1.8
In order to get several values of the function CHISQUARED, I use the subroutine linspace to generate the partition of the 1-dimensional grid
subroutine linspace(xi,xf,jmax,y)
integer jmax,j
real*8 xi,xf,y(jmax)
y=(/(xi+dble(j-1)*(xf-xi)/(dble(jmax)-1.0d0), j=1, jmax)/)
end subroutine linspace
What happens is that if in the main program, I call the function CHISQUARED like this:
CHISQUARED(1.3d0, 130.2d0, 0.12d0, 1.0d0, 1.0d0, 2.0d0, 0.0d0, 0.0d0, 1, ndata, nconst)
**1.27000000000000 0.745818846396887**
Press any key to continue
I get a finite value, something like 0.7. (I restricted the data file, so the result won't match the one shown above; 0.7 is just an example.) However, when I put the call inside a loop, as in the program above, to get the values on the one-dimensional grid, I get the error
**0.000000000000000E+000 NaN**
forrtl: severe (24): end-of-file during read, unit 1, file C:\Users\Ernesto Lopez Fune\Desktop\Minimize\newone\chisquarerotationcurve\data.txt
Image PC Routine Line Source
chisquarerotation 0040B889 Unknown Unknown Unknown
Press any key to continue
Can anyone recommend me what to do in this case? How to overcome this barrier?
According to the error message, the read reached the end of the file.
Calling the function once is fine, but inside the loop the file is read repeatedly. The first call leaves the file positioned at its end, and because unit 1 is still connected to the same file, the OPEN in the next call does not reposition it. The next READ therefore starts at the end of the file and fails.
Put a REWIND(1) statement just before end function CHISQUARED so that the file is repositioned at its beginning after each call. Better still, OPEN the file once in the main program rather than in the function, to avoid repeated OPEN/CLOSE operations.
Don't forget to CLOSE the file when you are done with it.
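Another way to avoid re-reading the file is to load the table once and pass it to the function as an array. The sketch below illustrates that idea under stated assumptions: load_table, toy_model, the module name chisq_sketch, and the file name data_sketch.txt are hypothetical, and toy_model is only a stand-in for VELOCITYPROFILE. The grid deliberately starts above zero, since rc = 0 makes r/rc a division by zero, which is a plausible source of the NaN on the first output line.

```fortran
module chisq_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Read ndata rows of three columns (r, v, sigma) from a file, once.
  subroutine load_table(filename, ndata, table)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: ndata
    real(dp), intent(out) :: table(ndata, 3)
    integer :: u, i, ios
    open(newunit=u, file=filename, status='old', action='read', iostat=ios)
    if (ios /= 0) error stop 'cannot open data file'
    do i = 1, ndata
      read(u, *, iostat=ios) table(i, :)
      if (ios /= 0) error stop 'data file has fewer than ndata rows'
    end do
    close(u)
  end subroutine load_table

  ! Hypothetical stand-in for VELOCITYPROFILE, kept simple on purpose.
  elemental function toy_model(r, rc) result(v)
    real(dp), intent(in) :: r, rc
    real(dp) :: v
    v = 100.0_dp * r / (r + rc)
  end function toy_model

  ! Reduced chi-squared computed from the in-memory table: no file I/O.
  function chisquared(table, rc, nconst) result(chi2)
    real(dp), intent(in) :: table(:, :)
    real(dp), intent(in) :: rc
    integer, intent(in) :: nconst
    real(dp) :: chi2
    chi2 = sum(((table(:, 2) - toy_model(table(:, 1), rc)) / table(:, 3))**2) &
           / (size(table, 1) - nconst)
  end function chisquared
end module chisq_sketch

program chisq_grid_sketch
  use chisq_sketch
  implicit none
  integer, parameter :: ndata = 5, nconst = 3, kp = 4
  real(dp) :: table(ndata, 3), rc
  integer :: u, j

  ! Write the first rows of the question's data so the sketch is self-contained.
  open(newunit=u, file='data_sketch.txt', status='replace', action='write')
  write(u, '(a)') '0.24 37.31 6.15'
  write(u, '(a)') '0.28 37.92 5.5'
  write(u, '(a)') '0.46 47.12 3.9'
  write(u, '(a)') '0.64 53.48 2.8'
  write(u, '(a)') '0.73 55.14 3.3'
  close(u)

  call load_table('data_sketch.txt', ndata, table)   ! file is read exactly once

  do j = 1, kp
    rc = 0.5_dp * j              ! grid starts above zero to avoid r/rc with rc = 0
    write(*, *) rc, chisquared(table, rc, nconst)
  end do
end program chisq_grid_sketch
```

With the data held in memory, the function can be called any number of times on the grid without REWIND.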

The FFT result keeps diverging (Fortran/MKL)

I tried to compute a two-dimensional FFT with Visual Fortran and Intel MKL, but the result keeps blowing up. How can I fix it?
I would appreciate any help.
Here is my code:
First, I created a sinusoidal wave,
real(8), allocatable:: Wave_ini_2D(:,:) ! [nx, nt] dimensional
defined on x in [-40, 40] and t in [0, 10], with nx and nt grid points respectively.
And the wave is passed to a subroutine as follows,
subroutine FFTF2_R2C(nx, nt, L1, Wave_ini_2D, Wave_fft_2D)
integer, intent(in):: L1(2), nx, nt
real(8), intent(in):: Wave_ini_2D(nx, nt)
complex(8), intent(out):: Wave_fft_2D(L1(1), L1(2))
type(DFTI_DESCRIPTOR), POINTER :: My_Desc2_Handle ! FFT handler
real(8):: Win(nx*nt)
integer:: status, i
Win = reshape(Wave_ini_2D, shape(Win))
Status = DftiCreateDescriptor( My_Desc2_Handle, DFTI_SINGLE, DFTI_REAL, 2, L1)
Status = DftiCommitDescriptor( My_Desc2_Handle)
Status = DftiComputeForward( My_Desc2_Handle, Win)
Status = DftiFreeDescriptor(My_Desc2_Handle)
Wave_fft_2D = reshape(Win, shape(Wave_fft_2D))
end subroutine

Fortran subroutine delivers wrong result when called in C++ program

I have to write a Fortran routine returning the inverse matrix. If I run the code below in a Fortran program the inverse matrix is correct, but when I run the subroutine from C++ code my first value is a wrong value. It seems like a problem with the data types or the memory.
What am I doing wrong?
Here is the subroutine:
subroutine get_inverse_matrix( matrix, rows_matrix, cols_matrix, tmpMatrix, rows_tmpMatrix, cols_tmpMatrix) bind(c)
use iso_c_binding
integer :: m, n, lda, lwork, info, size_m
integer(c_int) :: rows_matrix, cols_matrix, rows_tmpMatrix, cols_tmpMatrix
real(c_double) :: matrix(rows_matrix, cols_matrix), tmpMatrix(rows_tmpMatrix, cols_tmpMatrix)
integer, dimension( rows_matrix ) :: ipiv
real, dimension( rows_matrix ) :: work
size_m = rows_matrix
m = size_m
n = size_m
lda = size_m
lwork = size_m
write(*,*) "Matrix: ", matrix
!tmpMatrix = matrix
write(*,*) "Temp matrix: ", tmpMatrix
! LU factorization (triangular decomposition) of the matrix
call sgetrf( m, n, tmpMatrix, lda, ipiv, info )
write(*,*) info
! Inverse of the LU-factorized matrix
call sgetri( n, tmpMatrix, lda, ipiv, work, lwork, info )
write(*,*) info
select case(info)
case(0)
write(*,*) "SUCCESS"
case(:-1)
write(*,*) "ILLEGAL VALUE"
case(1:)
write(*,*) "SINGULAR MATRIX"
end select
end subroutine get_inverse_matrix
Here is the declaration in the C++ code:
extern "C"
{
void get_inverse_matrix( double *matrix, int *rows_matrix, int *cols_matrix, double *tmpMatrix, int *rows_tmpMatrix, int *cols_tmpMatrix);}
Here is the call from my C++ program:
get_inverse_matrix(&lhs[0], &sz, &sz, &res[0], &sz, &sz);
My program only uses a 3x3 matrix. If I pass the identity matrix the result looks like:
5.29981e-315 0 0
0 1 0
0 0 1
You are declaring your arrays as type real with kind c_double, but you are calling LAPACK routines that expect single-precision arguments (i.e. c_float). To fix this, replace the calls to sgetrf and sgetri with dgetrf and dgetri. The work array, currently declared as default real, must then be double precision as well.
As noted by Vladimir F in the comments, these issues are caught more easily if you provide explicit interfaces.
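As a rough illustration of the interface suggestion, the sketch below declares explicit interfaces for dgetrf and dgetri, so that passing a single-precision or wrongly typed argument becomes a compile-time error. The module names lapack_interfaces and inverse_sketch and the routine invert_in_place are hypothetical. The sketch assumes that c_double matches double precision and that LAPACK's integers are default integers, which holds for common builds but is not guaranteed, and it must be linked against a LAPACK library.

```fortran
module lapack_interfaces      ! hypothetical module name
  implicit none
  interface
    subroutine dgetrf(m, n, a, lda, ipiv, info)
      integer, intent(in) :: m, n, lda
      double precision, intent(inout) :: a(lda, *)
      integer, intent(out) :: ipiv(*)
      integer, intent(out) :: info
    end subroutine dgetrf
    subroutine dgetri(n, a, lda, ipiv, work, lwork, info)
      integer, intent(in) :: n, lda, lwork
      double precision, intent(inout) :: a(lda, *)
      integer, intent(in) :: ipiv(*)
      double precision, intent(inout) :: work(*)
      integer, intent(out) :: info
    end subroutine dgetri
  end interface
end module lapack_interfaces

module inverse_sketch         ! hypothetical module name
  use iso_c_binding, only: c_int, c_double
  use lapack_interfaces
  implicit none
contains
  ! Invert the n-by-n matrix a in place; info follows the LAPACK convention.
  subroutine invert_in_place(a, n, info) bind(c)
    integer(c_int), intent(in) :: n
    real(c_double), intent(inout) :: a(n, n)
    integer(c_int), intent(out) :: info
    integer :: nn, linfo
    integer :: ipiv(n)
    real(c_double) :: work(n)   ! double precision workspace, not default real

    nn = n                      ! convert to default integer for LAPACK
    call dgetrf(nn, nn, a, nn, ipiv, linfo)
    if (linfo == 0) call dgetri(nn, a, nn, ipiv, work, nn, linfo)
    info = linfo
  end subroutine invert_in_place
end module inverse_sketch

program invert_demo
  use inverse_sketch
  implicit none
  real(c_double) :: a(2, 2)
  integer(c_int) :: info

  ! Column-major storage: a = [[4, 7], [2, 6]]
  a = reshape([4.0_c_double, 2.0_c_double, 7.0_c_double, 6.0_c_double], [2, 2])
  call invert_in_place(a, 2_c_int, info)
  print *, 'info =', info
  print *, 'inverse (column-major):', a
end program invert_demo
```

With these interfaces in scope, a mismatched argument such as a default real work array is rejected by the compiler instead of silently producing a corrupted first element.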

Error: unclassifiable statement in fortran

When I ran the following simple program
program test
! integer m,n,r,i
double precision x(2),y(3),z(4)
x=(/2.0,1.0/)
y=(/1.0,2.0,1.0/)
call polymul(x,2,y,3,z,4)
print *,z
end
subroutine polymul(x,m,y,n,z,r)
! polynomial multiply
integer i,j,k
do i=1,r
z(i)=0.0
end do
do i=1,m
do j=1,n
k=i+j-1
z(k)=z(k)+x(i)*y(j)
end do
end do
end
it showed
Error: Unclassifiable statement
You have not declared what x, y, and z are in the subroutine, so the compiler cannot tell whether z(i) refers to an array element or to a function that has not been defined. The fix is simple: declare the arrays explicitly in the subroutine:
subroutine polymul(x, m, y, n, z, r)
  implicit none
  integer m, n, r
  double precision x(m), y(n), z(r)
  integer i, j, k
  do i=1,r
    z(i)=0.0
  enddo
  do i=1,m
    do j=1,n
      k=i+j-1
      z(k)=z(k)+x(i)*y(j)
    enddo
  enddo
end subroutine
ifort reports the same problem for variable z: "This name has not been declared as an array or a function." You need to declare x, y, and z as arrays in subroutine polymul.
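A variation on the same fix is to place polymul in a module with implicit none, so that undeclared names are rejected immediately and the caller gets an explicit interface. The sketch below is one possible arrangement, not the answer's code: the module name poly_mod is hypothetical, and the argument list is changed to assumed-shape arrays so that the sizes no longer need to be passed.

```fortran
module poly_mod               ! hypothetical module name
  implicit none
contains
  ! Multiply two polynomials given by their coefficient arrays.
  subroutine polymul(x, y, z)
    double precision, intent(in) :: x(:), y(:)
    double precision, intent(out) :: z(:)
    integer :: i, j

    if (size(z) /= size(x) + size(y) - 1) error stop 'polymul: z has the wrong size'

    z = 0.0d0
    do i = 1, size(x)
      do j = 1, size(y)
        z(i + j - 1) = z(i + j - 1) + x(i) * y(j)
      end do
    end do
  end subroutine polymul
end module poly_mod

program test
  use poly_mod
  implicit none
  double precision :: x(2), y(3), z(4)

  x = [2.0d0, 1.0d0]
  y = [1.0d0, 2.0d0, 1.0d0]
  call polymul(x, y, z)
  print *, z                  ! coefficients 2, 5, 4, 1
end program test
```

Because the module supplies the interface, the compiler also checks the type and rank of every actual argument at the call site.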