I am currently accelerating a Fortran code where I have a main accelerated loop in subroutine sub. In the loop, I want to call subroutine subsub on the device with acc routine. The subroutine has an intent(out) argument val, which is private in the loop. As subsub has a loop itself, I want to use the vector clause:
module calc
implicit none
public :: sub
private
contains
subroutine sub()
integer :: i
integer :: array(10)
integer :: val
!$acc kernels loop independent private(val)
do i = 1, 10
call subsub(val)
array(i) = val
enddo
print "(10(i0, x))", array
endsubroutine
subroutine subsub(val)
!$acc routine vector
integer, intent(out) :: val
integer :: i
val = 0
!$acc loop independent reduction(+:val)
do i = 1, 10
val = val + 1
enddo
endsubroutine
endmodule
program test
use calc, only: sub
implicit none
call sub()
endprogram
When compiling with the PGI compiler version 20.9-0 and running the program, I get gibberish values in variable array. When I simply use acc routine for subsub, I get the correct behavior (10 in all values of array). What is wrong in my approach to parallelize this subroutine?
It does look like a compiler code generation issue on how val is getting handled in the main loop. Luckily the workaround is easy, just add the installation of val in the main loop.
% cat test.f90
module calc
implicit none
public :: sub
private
contains
subroutine sub()
integer :: i
integer :: array(10)
integer :: val
!$acc kernels loop independent private(val)
do i = 1, 10
val = 0
call subsub(val)
array(i) = val
enddo
print "(10(i0, x))", array
endsubroutine
subroutine subsub(val)
!$acc routine vector
integer, intent(out) :: val
integer :: i
val = 0
!$acc loop independent reduction(+:val)
do i = 1, 10
val = val + 1
enddo
endsubroutine
endmodule
program test
use calc, only: sub
implicit none
call sub()
endprogram
% nvfortran -acc -Minfo=accel test.f90 -V20.9 ; a.out
sub:
10, Generating implicit copyout(array(:)) [if not already present]
11, Loop is parallelizable
Generating Tesla code
11, !$acc loop gang ! blockidx%x
subsub:
18, Generating Tesla code
24, !$acc loop vector ! threadidx%x
Generating reduction(+:val)
Vector barrier inserted for vector loop reduction
24, Loop is parallelizable
10 10 10 10 10 10 10 10 10 10
Related
How can I access a function on device via a function pointer?
In below code I am trying to access init0 or init1 using function pointer init. The code does work as intended if OpenACC is not enabled during compilation. However, it fails when compiled with OpenACC. Below code is saved as stackOverflow2.f95:
module modTest2
use openacc
implicit none
type :: Container
sequence
integer :: n
integer, allocatable :: arr(:)
end type Container
interface Container
procedure :: new_Container
end interface
abstract interface
integer function function_template (i)
integer, intent (in) :: i
end function function_template
end interface
contains
type(Container) function new_Container(n)
integer, intent(in) :: n
allocate(new_Container%arr(n))
end function new_Container
end module modTest2
program test2
use modTest2
implicit none
integer :: n, x, i
type(Container) :: c
procedure(function_template), pointer :: init
print *, "Enter array size: "
read *, n
print *, "Allocating..."
c = Container(n)
print *, "Allocation complete!"
print *, "Enter initialization type (x): "
read *, x
print *, "Initializing..."
select case (x)
case (0)
init => init0
case default
init => init1
end select
!$acc data copyin(c) copyout(c%arr)
!$acc parallel loop present(c)
do i = 1, n
c%arr(i) = init(i)
end do
!$acc end data
print *, "Initialization complete..."
do i = 1, n
print *, i, c%arr(i)
end do
contains
integer function init0(i)
!$acc routine
integer, intent(in) :: i
init0 = 10*i
end function init0
integer function init1(i)
!$acc routine
integer, intent(in) :: i
init1 = 20*i
end function init1
end program test2
Correct output is seen without OpenACC:
$ gfortran -c stackOverflow2.f95
$ gfortran stackOverflow2.o -o a.out
$ ./a.out
Enter array size:
3
Allocating...
Allocation complete!
Enter initialization type (x):
0
Initializing...
Initialization complete...
1 10
2 20
3 30
Incorrect output is seen below with OpenACC (Note that NVIDIA compiler is used here):
$ /opt/nvidia/hpc_sdk/Linux_x86_64/22.1/compilers/bin/nvfortran stackOverflow2.f95 -acc; ./a.out
Enter array size:
3
Allocating...
Allocation complete!
Enter initialization type (x):
0
Initializing...
Initialization complete...
1 0
2 0
3 0
Sorry but function pointers (along with C++ virtual functions) are not yet supported on the device. Adding the compiler feedback flag (-Minfo=accel), you'll see the following message:
% nvfortran -acc -Minfo=accel test.f90
test2:
62, Generating copyout(c%arr(:)) [if not already present]
Generating copyin(c) [if not already present]
65, Accelerator restriction: Indirect function/procedure calls are not supported
The problem being that indirect functions require a device jump table and runtime dynamic linking which is currently unavailable. While I don't have a timeline, we are exploring options on how to offer this support in the future.
Using gfortran-11 with the below did the trick:
module modTest2
use openacc
implicit none
type :: Container
sequence
integer :: n
integer, allocatable :: arr(:)
end type Container
interface Container
procedure :: new_Container
end interface
abstract interface
integer function function_template (i)
integer, intent (in) :: i
end function function_template
end interface
contains
type(Container) function new_Container(n)
integer, intent(in) :: n
allocate(new_Container%arr(n))
end function new_Container
end module modTest2
program test2
use modTest2
implicit none
integer :: n, x, i
type(Container) :: c
procedure(function_template), pointer :: init
print *, "Enter array size: "
read *, n
print *, "Allocating..."
c = Container(n)
print *, "Allocation complete!"
print *, "Enter initialization type (x): "
read *, x
print *, "Initializing..."
select case (x)
case (0)
init => init0
case default
init => init1
end select
!$acc enter data copyin(c)
!$acc enter data create(c%arr)
!$acc parallel loop present(c)
do i = 1, n
c%arr(i) = init(i)
end do
!$acc exit data copyout(c%arr)
!$acc exit data delete(c)
print *, "Initialization complete..."
do i = 1, n
print *, i, c%arr(i)
end do
contains
integer function init0(i)
!$acc routine
integer, intent(in) :: i
init0 = 10*i
end function init0
integer function init1(i)
!$acc routine
integer, intent(in) :: i
init1 = 20*i
end function init1
end program test2
Here's the output:
$ gfortran-11 -fopenacc stackOverflow2.f95
$ gfortran-11 -fopenacc stackOverflow2.o -o stackOverflow2
$ ./stackOverflow2
Enter array size:
4
Allocating...
Allocation complete!
Enter initialization type (x):
0
Initializing...
Initialization complete...
1 10
2 20
3 30
4 40
$ ./stackOverflow2
Enter array size:
4
Allocating...
Allocation complete!
Enter initialization type (x):
9
Initializing...
Initialization complete...
1 20
2 40
3 60
4 80
I'm quite new to fortran, i'm trying to execute a function/subroutine but i'm getting an error Explicit interface required
This is my code:
function printmat(m)
integer, dimension(:,:) :: m
integer :: row,col
row = size(m,1)
col = size(m,2)
do k=1,row
print *, m(k,1:col)
enddo
end function printmat
program test
integer, dimension(5, 5) :: mat
integer :: i,j
do i=1,5
do j=1,5
mat(j,i) = real(i)/real(j)
enddo
enddo
call printmat(mat)
end program test
But when i execute it i get:
Error: Explicit interface required for 'printmat' at (1): assumed-shape argument
Any idea of what could it be? I tried wrapping it into a module, but when i use "use modulename" in the program it gives me an error (tries to read it from a file with the same name)
Wrap it into a module and make it a subroutine if you want to use it with CALL.
module printmat_module
contains
subroutine printmat(m)
integer, dimension(:,:) :: m
integer :: row,col
row = size(m,1)
col = size(m,2)
do k=1,row
print *, m(k,1:col)
enddo
end subroutine printmat
end module printmat_module
program test
use printmat_module
integer, dimension(5, 5) :: mat
integer :: i,j
do i=1,5
do j=1,5
mat(j,i) = real(i)/real(j)
enddo
enddo
call printmat(mat)
end program test
Alternatively you can just do what the compiler tells you and add an explicit interface to the program.
subroutine printmat(m)
integer, dimension(:,:) :: m
integer :: row,col
row = size(m,1)
col = size(m,2)
do k=1,row
print *, m(k,1:col)
enddo
end subroutine printmat
program test
interface
subroutine printmat(m)
integer, dimension(:,:) :: m
end subroutine printmat
end interface
integer, dimension(5, 5) :: mat
integer :: i,j
do i=1,5
do j=1,5
mat(j,i) = real(i)/real(j)
enddo
enddo
call printmat(mat)
end program test
Is there a Fortran library which has an implementation of an ordering function, i.e., a function ordering(list) (like Ordering[] in Mathematica) which gives the positions in list at which each successive element of the sorted list appears?
I can implement it but I don't want to reinvent the wheel (and my wheel could be far from perfect...). Since it is so basic I was searching for a lib containing such list operations but failed to find one.
Do you have any suggestions?
Since I had this already implemented a long time ago (which relies on and borrows heavily from the Numerical Recipes book of Bill Press et al), here is a self-contained implementation of it in Fortran:
module index_mod
use, intrinsic :: iso_fortran_env, only: IK=>int32, RK=>real64
implicit none
contains
subroutine indexArrayReal(n,Array,Index)
implicit none
integer(IK), intent(in) :: n
real(RK) , intent(in) :: Array(n)
integer(IK), intent(out) :: Index(n)
integer(IK), parameter :: nn=15, nstack=50
integer(IK) :: k,i,j,indext,jstack,l,r
integer(IK) :: istack(nstack)
real(RK) :: a
do j = 1,n
Index(j) = j
end do
jstack=0
l=1
r=n
do
if (r-l < nn) then
do j=l+1,r
indext=Index(j)
a=Array(indext)
do i=j-1,l,-1
if (Array(Index(i)) <= a) exit
Index(i+1)=Index(i)
end do
Index(i+1)=indext
end do
if (jstack == 0) return
r=istack(jstack)
l=istack(jstack-1)
jstack=jstack-2
else
k=(l+r)/2
call swap(Index(k),Index(l+1))
call exchangeIndex(Index(l),Index(r))
call exchangeIndex(Index(l+1),Index(r))
call exchangeIndex(Index(l),Index(l+1))
i=l+1
j=r
indext=Index(l+1)
a=Array(indext)
do
do
i=i+1
if (Array(Index(i)) >= a) exit
end do
do
j=j-1
if (Array(Index(j)) <= a) exit
end do
if (j < i) exit
call swap(Index(i),Index(j))
end do
Index(l+1)=Index(j)
Index(j)=indext
jstack=jstack+2
if (jstack > nstack) then
write(*,*) 'NSTACK too small in indexArrayReal()' ! xxx
error stop
end if
if (r-i+1 >= j-l) then
istack(jstack)=r
istack(jstack-1)=i
r=j-1
else
istack(jstack)=j-1
istack(jstack-1)=l
l=i
end if
end if
end do
contains
subroutine exchangeIndex(i,j)
integer(IK), intent(inout) :: i,j
integer(IK) :: swp
if (Array(j) < Array(i)) then
swp=i
i=j
j=swp
end if
end subroutine exchangeIndex
pure elemental subroutine swap(a,b)
implicit none
integer(IK), intent(inout) :: a,b
integer(IK) :: dum
dum=a
a=b
b=dum
end subroutine swap
end subroutine indexArrayReal
end module Index_mod
program Index_prog
use Index_mod, only: IK, RK, indexArrayReal
implicit none
integer(IK), parameter :: n = 5
integer(IK) :: Index(n)
real(RK) :: Array(n) = [ 1.,3.,4.,2.,-1. ]
call indexArrayReal(n,Array,Index)
write(*,*) "Index: ", Index
write(*,*) "Array(Index): ", Array(Index)
end program Index_prog
Compiled with GFortran 2008, here is the output:
$gfortran -std=f2008 *.f95 -o main
$main
Index: 5 1 4 2 3
Array(Index): -1.0000000000000000 1.0000000000000000 2.0000000000000000 3.0000000000000000 4.0000000000000000
The above routine was for sorting real-valued arrays. To sort integer arrays, simply change real(RK) :: Array(n) in the interface of subroutine indexArrayReal() to integer(IK) :: Array(n).
I'm trying to learn Fortran (unfortunately a necessity for my research group) - one of the tasks I set myself was to package one of the necessary functions (Associated Legendre polynomials) from the Numerical Recipes book into a fortran 03 compliant module. The original program (f77) has some error handling in the form of the following:
if(m.lt.0.or.m.gt.1.or.abs(x).gt.1)pause 'bad arguments in plgndr'
Pause seems to have been deprecated since f77 as using this line gives me a compiling error, so I tried the following:
module sha_helper
implicit none
public :: plgndr, factorial!, ylm
contains
! numerical recipes Associated Legendre Polynomials rewritten for f03
function plgndr(l,m,x) result(res_plgndr)
integer, intent(in) :: l, m
real, intent(in) :: x
real :: res_plgndr, fact, pll, pmm, pmmp1, somx2
integer :: i,ll
if (m.lt.0.or.m.gt.l.or.abs(x).gt.1) then
write (*, *) "bad arguments to plgndr, aborting", m, x
res_plgndr=-10e6 !return a ridiculous value
else
pmm = 1.
if (m.gt.0) then
somx2 = sqrt((1.-x)*(1.+x))
fact = 1.
do i = 1, m
pmm = -pmm*fact*somx2
fact = fact+2
end do
end if
if (l.eq.m) then
res_plgndr = pmm
else
pmmp1 = x*(2*m+1)*pmm
if(l.eq.m+1) then
res_plgndr = pmmp1
else
do ll = m+2, l
pll = (x*(2*ll-1)*pmmp1-(ll+m-1)*pmm)/(ll-m)
pmm = pmmp1
pmmp1 = pll
end do
res_plgndr = pll
end if
end if
end if
end function plgndr
recursive function factorial(n) result(factorial_result)
integer, intent(in) :: n
integer, parameter :: RegInt_K = selected_int_kind(20) !should be enough for the factorials I am using
integer (kind = RegInt_K) :: factorial_result
if (n <= 0) then
factorial_result = 1
else
factorial_result = n * factorial(n-1)
end if
end function factorial
! function ylm(l,m,theta,phi) result(res_ylm)
! integer, intent(in) :: l, m
! real, intent(in) :: theta, phi
! real :: res_ylm, front_block
! real, parameter :: pi = 3.1415926536
! front_block = sqrt((2*l+1)*factorial(l-abs(m))/(4*pi*))
! end function ylm
end module sha_helper
The main code after the else works, but if I execute my main program and call the function with bad values, the program freezes before executing the print statement. I know that the print statement is the problem, as commenting it out allows the function to execute normally, returning -10e6 as the value. Ideally, I would like the program to crash after giving a user readable error message, as giving bad values to the plgndr function is a fatal error for the program. The function plgndr is being used by the program sha_lmc. Currently all this does is read some arrays and then print a value of plgndr for testing (early days). The function ylm in the module sha_helper is also not finished, hence it is commented out. The code compiles using gfortran sha_helper.f03 sha_lmc.f03 -o sha_lmc, and
gfortran --version
GNU Fortran (GCC) 4.8.2
!Spherical Harmonic Bayesian Analysis testbed for Lagrangian Dynamical Monte Carlo
program sha_analysis
use sha_helper
implicit none
!Analysis Parameters
integer, parameter :: harm_order = 6
integer, parameter :: harm_array_length = (harm_order+1)**2
real, parameter :: coeff_lo = -0.1, coeff_hi = 0.1, data_err = 0.01 !for now, data_err fixed rather than heirarchical
!Monte Carlo Parameters
integer, parameter :: run = 100000, burn = 50000, thin = 100
real, parameter :: L = 1.0, e = 1.0
!Variables needed by the program
integer :: points, r, h, p, counter = 1
real, dimension(:), allocatable :: x, y, z
real, dimension(harm_array_length) :: l_index_list, m_index_list
real, dimension(:,:), allocatable :: g_matrix
!Open the file, allocate the x,y,z arrays and read the file
open(1, file = 'Average_H_M_C_PcP_boschi_1200.xyz', status = 'old')
read(1,*) points
allocate(x(points))
allocate(y(points))
allocate(z(points))
print *, "Number of Points: ", points
readloop: do r = 1, points
read(1,*) x(r), y(r), z(r)
end do readloop
!Set up the forwards model
allocate(g_matrix(harm_array_length,points))
!Generate the l and m values of spherical harmonics
hloop: do h = 0, harm_order
ploop: do p = -h,h
l_index_list(counter) = h
m_index_list(counter) = p
counter = counter + 1
end do ploop
end do hloop
print *, plgndr(1,2,0.1)
!print *, ylm(1,1,0.1,0.1)
end program sha_analysis
Your program does what is known as recursive IO - the initial call to plgndr is in the output item list of an IO statement (a print statement) [directing output to the console] - inside that function you then also attempt to execute another IO statement [that outputs to the console]. This is not permitted - see 9.11p2 and p3 of F2003 or 9.12p2 of F2008.
A solution is to separate the function invocation from the io statement in the main program, i.e.
REAL :: a_temporary
...
a_temporary = plgndr(1,2,0.1)
PRINT *, a_temporary
Other alternatives in F2008 (but not F2003 - hence the [ ] parts in the first paragraph) include directing the output from the function to a different logical unit (note that WRITE (*, ... and PRINT ... reference the same unit).
In F2008 you could also replace the WRITE statement with a STOP statement with a message (the message must be a constant - which wouldn't let you report the problematic values).
The potential for inadvertently invoking recursive IO is part of the reason that some programming styles discourage conducting IO in functions.
Try:
if (m.lt.0.or.m.gt.l.or.abs(x).gt.1) then
write (*, *) "bad arguments to plgndr, aborting", m, x
stop
else
...
end if
I have a piece of experimental code that works perfectly with serial compilation and execution. When I compile it with openmp option on ifort (on ubuntu), the compilation goes on fine but the execution stops abruptly. The code is as follows:
!!!!!!!! module
module array
implicit none
real(kind=8),allocatable :: y(:)
end module array
module nonarray
implicit none
real(kind=8):: aa
end module nonarray
use nonarray; use array
implicit none
integer(kind=8):: iter,i
integer(kind=8),parameter:: id=1
real(kind=8),allocatable:: yt(:)
allocate(y(id)); allocate(yt(id)); y=0.d0; yt=0.d0
aa=4.d0 !!A SYSTEM PARAMETER
!$OMP PARALLEL PRIVATE(y,yt,iter,i)
!$OMP DO
loop1: do iter=1,20 !! THE INITIAL CONDITION LOOP
call random_number(y)!! RANDOM INITIALIZATION OF THE VARIABLE
loop2: do i=1,10000 !! ITERATION OF THE SYSTEM
call evolve(yt)
y=yt
enddo loop2 !! END OF SYSTEM ITERATION
write(1,*)aa,yt
enddo loop1 !!INITIAL CONDITION ITERATION DONE
!$OMP ENDDO
!$OMP END PARALLEL
stop
end
recursive subroutine evolve(yevl)
use nonarray; use array
implicit none
integer(kind=8),parameter:: id=1
real(kind=8):: xf
real(kind=8),intent(out):: yevl(id)
xf=aa*y(1)*(1.d0-y(1))
yevl(1)=xf
end subroutine evolve
For compilation I use the following command:
ifort -openmp -fpp test.f90.
test.f90 being the name of the program.
Any suggestions or help is highly appreciated.
I am not an OMP expert, but I think if the subroutine evolve should see a different (private) y in each thread, you should pass it directly from within the parallelized code block to the subroutine instead of importing it from an external module:
module common
use iso_fortran_env
implicit none
integer, parameter :: dp = real64
real(dp) :: aa
contains
subroutine evolve(y, yevl)
implicit none
real(dp), intent(in) :: y(:)
real(dp), intent(out):: yevl(:)
yevl(1) = aa * y(1) * (1.0_dp - y(1))
end subroutine evolve
end module common
program test
use common
implicit none
integer :: iter, i
real(dp), allocatable :: yt(:), y(:)
allocate(yt(1), y(1))
y(:) = 0.0_dp
yt(:) = 0.0_dp
aa = 4.0_dp
!$OMP PARALLEL DO PRIVATE(y,yt,iter,i)
loop1: do iter = 1, 20
call random_number(y)
loop2: do i = 1, 10000
call evolve(y, yt)
y = yt
end do loop2
write(*,*) aa, yt
end do loop1
!$OMP END PARALLEL DO
end program test
Just an additional warning: the code above worked with various compilers (nagfor 5.3.1, gfortran 4.6.3, ifort 13.0.1), but not with ifort 12.1.6. So, although I can't see any obvious problems with it, I may have messed up something.