Global Variables in Fortran OpenMP

Why does the following Fortran code only work if I pass the loop variables 'i' and 'j' as input arguments to the subroutine 'mat_init'? The loop variables 'i' and 'j' are declared as private, so shouldn't they remain private inside the subroutine when I call it?
program main
use omp_lib
implicit none
real(8), dimension(:,:), allocatable:: A
integer:: i, j, n
n = 20
allocate(A(n,n)); A(:,:) = 0.0d+00
!$omp parallel do private(i, j)
do i=1,n
do j=1,n
call mat_init
end do
end do
do i=1,n
write(*,'(20f7.4)') (A(i,j), j=1,n)
end do
contains
subroutine mat_init
A(i,j) = 1.0d+00
end subroutine
end program main
I know this has something to do with the 'lexical' and 'dynamic' extent, but I don't understand why OpenMP is implemented so that it doesn't recognize private variables in the 'dynamic' extent of parallel regions. To me this does not seem logical, or am I doing something wrong?

First, I think the subroutine mat_init should take the values of i and j explicitly as input arguments. The values of i and j must be private, because each thread works on its own values of i and j. I also think OpenMP recognizes that i is private because it is the index of the parallelized loop. The same goes for j. However, this holds for the references to i and j that appear in the parallel loop itself, and not for the references made inside the subroutine. Thus, you have to pass i and j explicitly so that the subroutine works with each thread's private values.
I believe the problem is due to the reentrance of the subroutine mat_init. Indeed, what happens when multiple threads enter the subroutine at the same time with different values of i and j? Unless you do something special, the called subroutine might not see the private copies of i and j.
In general, calling a subroutine many times inside a loop is discouraged, because each call has a cost. I suggest writing a subroutine that is itself parallelized rather than calling a subroutine from within a parallelized section.
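The suggestion of passing the indices explicitly can be sketched as follows. This is a minimal version of the program from the question, assuming the dummy argument names ii and jj (which are made up here) and that A stays shared through host association while each thread writes to distinct elements:

```fortran
program main
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), allocatable :: A(:,:)
  integer :: i, j, n

  n = 20
  allocate(A(n,n))
  A = 0.0_dp

  ! i and j are private here; their values are handed to mat_init as arguments
  !$omp parallel do private(i, j)
  do i = 1, n
    do j = 1, n
      call mat_init(i, j)
    end do
  end do
  !$omp end parallel do

  do i = 1, n
    write(*,'(20f7.4)') (A(i,j), j = 1, n)
  end do

contains

  subroutine mat_init(ii, jj)
    ! ii and jj (hypothetical names) are associated with the caller's
    ! private i and j, so each thread uses its own index pair
    integer, intent(in) :: ii, jj
    A(ii,jj) = 1.0_dp
  end subroutine mat_init

end program main
```

The OpenMP directives are ordinary comments to a compiler run without OpenMP enabled, so the same source should also build and run serially.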

Related

Reference Argument Passing with Nested OpenACC Routines

I'm attempting to parallelize some Fortran 90 code using OpenACC, where a parallelized loop calls a sequential routine. When I attempt to run the code using the PGI Fortran compiler (2020.4), I obtain an error message saying that reference argument passing prevents parallelization.
My understanding is that this is likely because one routine exists on the Host while the other is on the Device, but I'm unclear on where I might be missing a pragma that would lead to this outcome.
The basic structure of the calling routine is:
subroutine OuterRoutine(F,G,X,Y)
real(wp), dimension(:,:), intent(IN) :: X
real(wp), dimension(:,:), intent(IN) :: Y
real(wp), dimension(1,PT), intent(OUT) :: F
real(wp), dimension(N_p,PT), intent(OUT) :: G
! Local Variables
integer :: t, i, j
!$acc data copyin(X,Y), copyout(F,G)
!$acc parallel loop
do t = 1,PT,1
!$acc loop collapse(2) reduction(+:intr)
do i = 1,N_int-1,1
do j = 1,N_int-1,1
G(i,j) = intgrdJ2(X(i,j),X(j,i),Y(i,j),Y(j,i),t)
end do
end do
!$acc end loop
!$acc end parallel loop
!$acc end data
end subroutine OuterRoutine
And the function being called is:
function intgrdJ2(z,mu,p,q,t)
!$acc routine seq
real(wp), intent(IN) :: z, mu, p, q
integer, intent(IN) :: t
real(wp) :: intgrdJ2
! Local Variables
real(wp) :: mu2
real(wp), dimension(N_p) :: nu_m2, psi_m2
integer :: i
mu2 = (mu*fh_pdf(z,mu,p))/f_pdf(z,mu,p)
do i = 1,N_p,1
nu_m2(i) = interpValue(mu2,mugrid,nu_knots(:,i,t))
psi_m2(i) = interpValue(mu2,mugrid,psi_knots(:,i,t))
end do
intgrdJ2 = nu_m2(i)*psi_m2(i)
end function intgrdJ2
The routines interpValue, fh_pdf, and f_pdf are all contained in a used module, and denoted as !$acc routine seq. The variables mugrid, nu_knots, and psi_knots are all module-level variables, which are copied-in to the Device prior to calling OuterRoutine.
When I run the code, I get this sort of output from the compiler:
intgrdj2:
576, Generating acc routine seq
Generating Tesla code
593, Reference argument passing prevents parallelization: mu2
Where 593 refers to the "nu_m2(i) = ..." line.
My understanding is that since the variable mu2 is a scalar declared inside of the sequential routine, each thread should have its own copy of the variable, and I don't need to explicitly declare it to be private when I declare the data region. From reading this post it seems that the problem may be related to where the routines are located (Host vs Device). However, it seems as though all of the relevant pieces should be on the device because I'm specifying that routines are sequential.
As a first-time OpenACC user, any explanations about what I might be overlooking would be greatly appreciated!
"My understanding is that since the variable mu2 is a scalar declared inside of the sequential routine, each thread should have its own copy of the variable, and I don't need to explicitly declare it to be private when I declare the data region"
This is true in most cases. But what's likely happening here is that, since Fortran by default passes variables by reference, the compiler must assume that its reference can be taken by a module variable. Unlikely, but possible.
The typical way to fix this is to pass the scalar by value, i.e. add the "value" attribute to the argument declaration in "interpValue". Alternately, you can explicitly privatize "mu2" by adding "!$acc loop seq private(mu2)" on the "i" loop.
Now the message may just be indicating that the compiler can't auto-parallelize this loop. But since it's in a sequential routine, that wouldn't matter and you can safely ignore the message. Though, I don't have the full context so can't be 100% certain of this.
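As a rough illustration of the "value" suggestion, here is a self-contained sketch. The function interp_value and its linear-interpolation body are hypothetical stand-ins for the interpValue routine, whose real implementation is not shown in the question, and wp is assumed to be double precision:

```fortran
module interp_mod
  implicit none
  integer, parameter :: wp = kind(1.0d0)   ! assumed working precision
contains

  ! Hypothetical stand-in for interpValue: piecewise-linear interpolation.
  ! The scalar x carries the value attribute, so the caller's variable
  ! (mu2 in the question) is passed by value instead of by reference.
  function interp_value(x, grid, knots) result(y)
    !$acc routine seq
    real(wp), value      :: x
    real(wp), intent(in) :: grid(:), knots(:)
    real(wp) :: y
    integer  :: k

    ! locate the interval [grid(k), grid(k+1)] that contains x
    k = 1
    do while (k < size(grid) - 1 .and. x > grid(k+1))
      k = k + 1
    end do
    y = knots(k) + (knots(k+1) - knots(k)) * (x - grid(k)) / (grid(k+1) - grid(k))
  end function interp_value

end module interp_mod

program demo
  use interp_mod
  implicit none
  real(wp) :: mu2

  mu2 = 0.5_wp
  print *, interp_value(mu2, [0.0_wp, 1.0_wp, 2.0_wp], [0.0_wp, 10.0_wp, 20.0_wp])
end program demo
```

A dummy argument with the value attribute needs an explicit interface at the call site, which the module provides here.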

Writing assumed-size array causes "upper bound shall not be omitted..."

I am writing code to add on a closed-source Finite-Element Framework that forces me (due to relying on some old F77 style approaches) in one place to rely on assumed-size arrays.
Is it possible to write an assumed-size array to the standard output, whatever its size may be?
This is not working:
module fun
implicit none
contains
subroutine writer(a)
integer, dimension(*), intent(in) :: a
write(*,*) a
end subroutine writer
end module fun
program test
use fun
implicit none
integer, dimension(2) :: a
a(1) = 1
a(2) = 2
call writer(a)
end program test
With the Intel Fortran compiler throwing
error #6364: The upper bound shall not be omitted in the last dimension of a reference to an assumed size array.
The compiler does not know how large an assumed-size array is. It has only the address of the first element. You are responsible for telling it how large the array is, for example by passing the size n and writing a section:
write(*,*) a(1:n)
Equivalently you can use an explicit-size array
integer, intent(in) :: a(n)
and then you can do
write(*,*) a
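Put together, the first option above might look like this complete program. It is only a sketch, assuming the size is passed as an extra argument named n:

```fortran
module fun
  implicit none
contains
  subroutine writer(a, n)
    integer, intent(in) :: n                    ! number of elements to print
    integer, dimension(*), intent(in) :: a      ! still assumed-size
    write(*,*) a(1:n)                           ! a section with an explicit upper bound
  end subroutine writer
end module fun

program test
  use fun
  implicit none
  integer, dimension(2) :: a
  a = [1, 2]
  call writer(a, size(a))
end program test
```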
An assumed-size array may not occur as a whole array reference when that reference requires the shape of the array. Appearing as an output item in a write statement is one such disallowed case.
So, in that sense the answer is: no, it is not possible to have the write statement as you have it.
From an assumed-size array, array sections and array elements may appear:
write (*,*) a(1:2)
write (*,*) a(1), a(2)
write (*,*) (a(i), i=1,2)
This leads simply to the question of how to get the value 2 into the subroutine; at other times the required value may be 7. Let's call it n.
Naturally, changing the subroutine is tempting:
subroutine writer (a,n)
integer n
integer a(n) ! or still a(*)
end subroutine
or even
subroutine writer (a)
integer a(:)
end subroutine
One often doesn't have a choice, alas, in particular when associating a procedure with a dummy procedure that has a specific interface. However, n can get into the subroutine in any of several other ways: as a module or host entity, or through a common block (avoid this one if possible). These methods do not require modifying the interface of the procedure. For example:
subroutine writer(a)
use aux_params, only : n
integer, dimension(*), intent(in) :: a
write(*,*) a(1:n)
end subroutine writer
or we could have n as an entity in the module fun and have it accessible in writer through host association. In either case, n's value must be set in the main program before writer is executed.
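The host-association variant could be sketched like this, assuming n is added as a module variable of fun (the initial value 0 is an arbitrary choice made here):

```fortran
module fun
  implicit none
  integer :: n = 0      ! must be set by the caller before writer is used
contains
  subroutine writer(a)
    integer, dimension(*), intent(in) :: a
    write(*,*) a(1:n)   ! n is reached through host association
  end subroutine writer
end module fun

program test
  use fun
  implicit none
  integer, dimension(2) :: a
  a = [1, 2]
  n = size(a)           ! interface of writer is unchanged
  call writer(a)
end program test
```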

Program stops due to array allocation in a function [duplicate]

The following code is returning a Segmentation Fault because the allocatable array I am trying to pass is not being properly recognized (size returns 1, when it should be 3). In this page (http://www.eng-tips.com/viewthread.cfm?qid=170599) a similar example seems to indicate that it should work fine in F95; my code file has a .F90 extension, but I tried changing it to F95, and I am using gfortran to compile.
My guess is that the problem should be in the way I am passing the allocatable array to the subroutine; What am I doing wrong?
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%!
PROGRAM test
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%!
IMPLICIT NONE
DOUBLE PRECISION,ALLOCATABLE :: Array(:,:)
INTEGER :: iii,jjj
ALLOCATE(Array(3,3))
DO iii=1,3
DO jjj=1,3
Array(iii,jjj)=iii+jjj
PRINT*,Array(iii,jjj)
ENDDO
ENDDO
CALL Subtest(Array)
END PROGRAM
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%!
SUBROUTINE Subtest(Array)
DOUBLE PRECISION,ALLOCATABLE,INTENT(IN) :: Array(:,:)
INTEGER :: iii,jjj
PRINT*,SIZE(Array,1),SIZE(Array,2)
DO iii=1,SIZE(Array,1)
DO jjj=1,SIZE(Array,2)
PRINT*,Array(iii,jjj)
ENDDO
ENDDO
END SUBROUTINE
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%!
If a procedure has a dummy argument that is an allocatable, then an explicit interface is required in any calling scope.
(There are numerous things that require an explicit interface, an allocatable dummy is but one.)
You can provide that explicit interface yourself by putting an interface block for your subroutine inside the main program. An alternative and far, far, far better option is to put the subroutine inside a module and then USE that module in the main program - the explicit interface is then automatically created. There is an example of this on the eng-tips site that you provided a link to - see the post by xwb.
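A minimal sketch of the module approach for the code in the question is shown below. The module name subtest_mod is made up for illustration; everything else follows the original program:

```fortran
module subtest_mod
  implicit none
contains
  subroutine Subtest(Array)
    double precision, allocatable, intent(in) :: Array(:,:)
    integer :: iii, jjj
    print *, size(Array,1), size(Array,2)
    do iii = 1, size(Array,1)
      do jjj = 1, size(Array,2)
        print *, Array(iii,jjj)
      end do
    end do
  end subroutine Subtest
end module subtest_mod

program test
  use subtest_mod       ! makes the explicit interface of Subtest visible
  implicit none
  double precision, allocatable :: Array(:,:)
  integer :: iii, jjj
  allocate(Array(3,3))
  do iii = 1, 3
    do jjj = 1, 3
      Array(iii,jjj) = iii + jjj
    end do
  end do
  call Subtest(Array)
end program test
```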
Note that it only makes sense for a dummy argument to have the allocatable attribute if you are going to do something related to its allocation status - query its status, reallocate it, deallocate it, etc.
Please also note that your allocatable dummy argument array is declared with intent(in), which means its allocation status will be that of the associated actual argument (and it may not be changed during the procedure). The actual argument passed to your subroutine may be unallocated and therefore illegal to reference, even with an explicit interface. The compiler will not know this and the behaviour of inquiries like size is undefined in such cases.
Hence, you first have to check the allocation status of array with allocated(array) before referencing its contents. I would further suggest looping over the full array with lbound and ubound, since in general you can't be sure about the array's bounds:
subroutine subtest(array)
double precision, allocatable, intent(in) :: array(:,:)
integer :: iii, jjj
if(allocated(array)) then
print*, size(array, 1), size(array, 2)
do iii = lbound(array, 1), ubound(array, 1)
do jjj = lbound(array, 2), ubound(array, 2)
print*, array(iii,jjj)
enddo
enddo
endif
end subroutine
This is a simple example that uses an allocatable array held in a module (a module variable rather than a dummy argument).
module arrayMod
real,dimension(:,:),allocatable :: theArray
end module arrayMod
program test
use arrayMod
implicit none
interface
subroutine arraySub
end subroutine arraySub
end interface
write(*,*) allocated(theArray)
call arraySub
write(*,*) allocated(theArray)
end program test
subroutine arraySub
use arrayMod
write(*,*) 'Inside arraySub()'
allocate(theArray(3,2))
end subroutine arraySub

Fortran arrays in hybrid MPI/OpenMP

I am facing the following issue when running a hybrid MPI/OpenMP
code with GNU and Intel compilers and OpenMPI. The code is big (commercial)
written in Fortran. It compiles and runs fine with GNU compilers
but crashes with Intel compilers.
I have monitored the part of the code when the program stops working,
it has the following structure:
subroutine test(n,dx,dy)
integer :: i
integer, parameter :: n=6
real*8 :: dx(num),dy(num), ener
ener=0.0
!$omp parallel num_threads(2)
!$omp do
do i=1,100
ener = ener + funct(n,dx,dy) + i
enddo
!$omp end do
!$omp end parallel
end subroutine test
and the function funct has this structure:
real*8 function funct(n,dx,dy)
integer :: n
real*8 :: dx(*),dy(*)
funct = 0.0
do i=1,n
funct = funct + dx(i)+dy(i)
enddo
end function funct
Specifically the code stops inside funct (with Intel). The
program is able to reach the end of funct but only one thread
of the two requested is able to return the value, I checked
that by printing the thread numbers.
This issue is only for Intel compilers, for GNU I don't get
the issue.
One way to avoid the issue, I found, is by using plain arrays
inside funct as follows:
real*8 function funct(n,dx,dy)
integer :: n
real*8 :: dx(n),dy(n)
but my point is that I don't understand what is happening.
My guess is that in the Intel case, the compiler cannot
figure out the length of dx and dy inside funct but I am
not sure. I tried to reproduce this issue with a small
Fortran program but I was not able to see that issue.
Any comment is welcome.
One update: I eliminated the issue with the race condition (this is
not the real problem, what I wrote here was the structure of the code).
I realized that subroutine test is being called from another subroutine
upper which defines dx,dy as pointers:
subroutine upper
real*8,save,pointer :: dx(:)=>Null(), dy(:)=>Null()
....
call test(n,dx,dy)
...
end subroutine upper
what I did now, was to replace pointers by allocatables:
subroutine upper
real*8,save,dimension(:),allocatable :: dx,dy
....
allocate(dx(n),dy(n))
call test(n,dx,dy)
...
end subroutine upper
and I don't get the issue with Intel. I don't know what could be the
difference between pointers and allocatables.
Thanks.

OpenMP and shared variable in Fortran which are not shared

I encounter a problem with OpenMP and shared variables I cannot understand. Everything I do is in Fortran 90/95.
Here is my problem: I have a parallel region defined in my main program, with the clause DEFAULT(SHARED), in which I call a subroutine that does some computation. I have a local variable (an array) I allocate and on which I do the computations. I was expecting this array to be shared (because of the DEFAULT(SHARED) clause), but it seems that it is not the case.
Here is an example of what I am trying to do; it reproduces the error I get:
program main
!$ use OMP_LIB
implicit none
integer, parameter :: nx=10, ny=10
real(8), dimension(:,:), allocatable :: array
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP SINGLE
allocate(array(nx,ny))
!$OMP END SINGLE
!$OMP WORKSHARE
array = 1.
!$OMP END WORKSHARE
call compute(array,nx,ny)
!$OMP SINGLE
deallocate(array)
!$OMP END SINGLE
!$OMP END PARALLEL
contains
!=============================================================================
! SUBROUTINES
!=============================================================================
subroutine compute(array, nx, ny)
!$ use OMP_LIB
implicit none
real(8), dimension(nx,ny) :: array
integer :: nx, ny
real(8), dimension(:,:), allocatable :: q
integer :: i, j
!$OMP SINGLE
allocate(q(nx,ny))
!$OMP END SINGLE
!$OMP WORKSHARE
q = 0.
!$OMP END WORKSHARE
print*, 'q before: ', q(1,1)
!$OMP DO SCHEDULE(RUNTIME)
do j = 1, ny
do i = 1, nx
if(mod(i,j).eq.0) then
q(i,j) = array(i,j)*2.
else
q(i,j) = array(i,j)*0.5
endif
end do
end do
!$OMP END DO
print*, 'q after: ', q(1,1)
!$OMP SINGLE
deallocate(q)
!$OMP END SINGLE
end subroutine compute
!=============================================================================
end program main
When I execute it like that, I get a segmentation fault, because the local array q is allocated on one thread but not on the others, and when the others try to access it in memory, it crashes.
If I get rid of the SINGLE region, the local array q is allocated (though sometimes it crashes, which makes sense if different threads try to allocate it when it is already allocated, and it actually puzzles me why it does not crash every time), but then it is clearly as if the array q is private (one thread returns the expected value, whereas the others return something else).
It really puzzles me why the q array is not shared although I declared my parallel region with the clause DEFAULT(SHARED). And since I am in an orphaned subroutine, I cannot declare q explicitly as shared, since it is known only in the subroutine compute... I am stuck with this problem so far and could not find a workaround.
Is it normal? Should I expect this behaviour? Is there a workaround? Do I miss something obvious?
Any help would be highly appreciated!
q is an entity that is "inside a region but not inside a construct" in OpenMP speak. The subroutine that q is local to is called during a parallel construct, but q itself does not lexically appear in between the PARALLEL and END PARALLEL directives.
The data sharing rules for such entities in OpenMP then dictate that q is private.
The data sharing clauses such as DEFAULT(SHARED), etc only apply to things that appear in the construct itself (things that lexically appear in between the PARALLEL and END PARALLEL). (They can't apply to things in the region generally - procedures called in the region may have been separately compiled and might be called outside of any parallel constructs.)
The array q is defined INSIDE the called subroutine. Every thread calls this subroutine independently and therefore every thread will have its own copy. The shared clause in the calling code cannot change this. Try declaring it with the save attribute.
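A hedged sketch of the save suggestion follows. It is a condensed version of the program from the question: array is allocated before the parallel region, and the SINGLE blocks rely on their implicit barriers so that no thread touches q before it is allocated or after it is deallocated.

```fortran
program main
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 10, ny = 10
  real(dp), allocatable :: array(:,:)

  allocate(array(nx,ny))
  array = 1.0_dp

  !$omp parallel default(shared)
  call compute(array, nx, ny)
  !$omp end parallel

contains

  subroutine compute(array, nx, ny)
    integer, intent(in)  :: nx, ny
    real(dp), intent(in) :: array(nx,ny)
    ! save gives q a single instance that all threads share
    real(dp), allocatable, save :: q(:,:)
    integer :: i, j

    !$omp single
    allocate(q(nx,ny))
    q = 0.0_dp
    !$omp end single
    ! implicit barrier above: q is allocated before any thread continues

    !$omp do
    do j = 1, ny
      do i = 1, nx
        if (mod(i,j) == 0) then
          q(i,j) = array(i,j)*2.0_dp
        else
          q(i,j) = array(i,j)*0.5_dp
        end if
      end do
    end do
    !$omp end do
    ! implicit barrier above: all of q is filled before it is read

    !$omp single
    print *, 'q(1,1) after: ', q(1,1)
    deallocate(q)
    !$omp end single
  end subroutine compute

end program main
```

A saved local variable keeps its state between calls, so this version is only suitable if compute is never active in two independent parallel regions at once.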