Should the use of the INTENT keyword speed the code up? - fortran

This question is based on an answer to the post Fortran intent(inout) versus omitting intent, namely the one by user Vladimyr (@Vladimyr).
He says that "<...> Fortran copies that data into a contiguous section of memory, and passes the new address to the routine. Upon returning, the data is copied back into its original location. By specifying INTENT, the compiler can know to skip one of the copying operations."
I did not know this at all, I thought Fortran passes by reference exactly as C.
The first question is, why would Fortran do so, what is the rationale behind this choice?
As a second point, I put this behaviour to the test. If I understood correctly, using INTENT(IN) would save the time spent copying the data back to the original location, as the compiler can be sure the data has not been changed.
I tried this little piece of code
function funco(inp) result(j)
!! integer, dimension(:), intent (in) :: inp
integer, dimension(:):: inp
integer, dimension(SIZE(inp)) :: j ! output
j = 0.0 !! clear whole vector
N = size(inp)
DO i = 1, N
j(i) = inp(i)
END DO
end function
program main
implicit none
interface
function funco(inp) result(j)
!! integer, dimension(:), intent (in) :: inp
integer, dimension(:) :: inp
integer, dimension(SIZE(inp)) :: j ! output
end function
end interface
integer, dimension(3000) :: inp , j
!! integer, dimension(3000) :: funco
integer :: cr, cm , c1, c2, m
real :: rate, t1, t2
! Initialize the system_clock
CALL system_clock(count_rate=cr)
CALL system_clock(count_max=cm)
CALL CPU_TIME(t1)
rate = REAL(cr)
WRITE(*,*) "system_clock rate ",rate
inp = 2
DO m = 1,1000000
j = funco(inp) + 1
END DO
CALL SYSTEM_CLOCK(c2)
CALL CPU_TIME(t2)
WRITE(*,*) "system_clock : ",(c2 - c1)/rate
WRITE(*,*) "cpu_time : ",(t2-t1)
end program
The function copies an array, and in the main body this is repeated many times.
According to the claim above, the time spent in copying back the array should somehow show up.
system_clock rate 1000.00000
system_clock : 2068.07910
cpu_time : 9.70935345
but the results are pretty much the same regardless of whether INTENT is used or not.
Could anybody shed some light on these two points: why does Fortran perform an additional copy (which seems inefficient at first sight) instead of passing by reference, and does INTENT really save the time of a copy operation?

The answer you are referring to is about passing a specific kind of array subsection, not the whole array. In that case a temporary copy might be necessary, depending on the function. Your function uses an assumed-shape array, and a temporary array will not be necessary even if you try quite hard.
An example of what the author (it wasn't me) might have had in mind is
module functions
implicit none
contains
function fun(a, n) result(res)
real :: res
! note the explicit shape !!!
integer, intent(in) :: n
real, intent(in) :: a(n, n)
integer :: i, j
do j = 1, n
do i = 1, n
res = res + a(i,j) *i + j
end do
end do
end function
end module
program main
use functions
implicit none
real, allocatable :: array(:,:)
real :: x, t1, t2
integer :: fulln
fulln = 400
allocate(array(1:fulln,1:fulln))
call random_number(array)
call cpu_time(t1)
x = fun(array(::2,::2),(fulln/2))
call cpu_time(t2)
print *,x
print *, t2-t1
end program
This program is somewhat faster with intent(in) than with intent(inout) in gfortran (not so much with Intel). However, it is much faster still with an assumed-shape array a(:,:), because then no copy is performed.
I am also getting some strange uninitialized accesses in gfortran when running without runtime checks. I do not understand why; one candidate is that res is never set to zero before the accumulation loop in fun.
Of course this is a contrived example and there are real cases in production programs where array copies are made and then intent(in) can make a difference.
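For comparison, the assumed-shape variant mentioned above could look like the following minimal sketch. The names functions_assumed and fun_assumed are hypothetical (they do not appear in the original answer), and the accumulator is zeroed explicitly, which is an addition of this sketch.

```fortran
module functions_assumed
  implicit none
contains
  ! Hypothetical assumed-shape counterpart of fun: the dummy a(:,:)
  ! carries its own strides, so a strided section such as
  ! array(::2,::2) can normally be passed without a contiguous temporary.
  function fun_assumed(a) result(res)
    real, intent(in) :: a(:,:)
    real :: res
    integer :: i, j
    res = 0.0   ! initialize the accumulator explicitly
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        res = res + a(i,j) * i + j
      end do
    end do
  end function fun_assumed
end module functions_assumed
```

The call in the main program would then be x = fun_assumed(array(::2,::2)), with no size argument needed.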

Related

array operation in fortran

I am writing code that uses a lot of 2D arrays and manipulates them. I would like the code to be as concise as possible, so I would like to use as many 'implicit' array operations as possible, but I don't really know how to write them for 2D arrays.
For example:
DO J=1,N
DO I=1,M
A(I,J)=B(J)*A(I,J)
ENDDO
ENDDO
easily becomes:
DO J=1,N
A(:,J)=B(J)*A(:,J)
ENDDO
Is there a way to eliminate the loop over J as well?
Thanks
For brevity and clarity, you could wrap these operations in a derived type. I wrote a minimal example which is not so concise because I need to initialise the objects, but once this initialisation is done, manipulating your arrays becomes very concise and elegant.
I stored in arrays_module.f90 a derived type arrays2d_T which can hold the array coefficients, plus useful information (number of rows and columns). This type contains procedures for initialisation, and the operation you are trying to perform.
module arrays_module
implicit none
integer, parameter :: dp = kind(0.d0) !double precision definition
type :: arrays2d_T
real(kind=dp), allocatable :: dat(:,:)
integer :: nRow, nCol
contains
procedure :: kindOfMultiply => array_kindOfMuliply_vec
procedure :: init => initialize_with_an_allocatable
end type
contains
subroutine initialize_with_an_allocatable(self, source_dat, nRow, nCol)
class(arrays2d_t), intent(inOut) :: self
real(kind=dp), allocatable, intent(in) :: source_dat(:,:)
integer, intent(in) :: nRow, nCol
allocate (self%dat(nRow, nCol), source=source_dat)
self%nRow = nRow
self%nCol = nCol
end subroutine
subroutine array_kindOfMuliply_vec(self, vec)
class(arrays2d_t), intent(inOut) :: self
real(kind=dp), allocatable, intent(in) :: vec(:)
integer :: iRow, jCol
do jCol = 1, self%nCol
do iRow = 1, self%nRow
self%dat(iRow, jCol) = vec(jCol)*self%dat(iRow, jCol)
end do
end do
end subroutine
end module arrays_module
Then, in main.f90, I check the behaviour of this multiplication on a simple example:
program main
use arrays_module
implicit none
type(arrays2d_T) :: A
real(kind=dp), allocatable :: B(:)
! auxiliary variables that are only useful for initialization
real(kind=dp), allocatable :: Aux_array(:,:)
integer :: M = 3
integer :: N = 2
! initialise the 2d array
allocate(Aux_array(M,N))
Aux_array(:,1) = [2._dp, -1.4_dp, 0.3_dp]
Aux_array(:,2) = [4._dp, -3.4_dp, 2.3_dp]
call A%init(aux_array, M, N)
! initialise vector
allocate (B(N))
B = [0.3_dp, -2._dp]
! compute the product
call A%kindOfMultiply(B)
print *, A%dat(:,1)
print *, A%dat(:,2)
end program main
Compilation can be as simple as gfortran -c arrays_module.f90 && gfortran -c main.f90 && gfortran -o main.out main.o arrays_module.o
Once this object-oriented machinery exists, call A%kindOfMultiply(B) is much clearer than a FORALL approach (and much less error prone).
No one has mentioned the do concurrent construct here, which has the potential to automatically parallelize and speed up your code:
do concurrent(j=1:n); A(:,j)=B(j)*A(:,j); end do
A one-line solution can be achieved by using FORALL:
FORALL(J=1:N) A(:,J) = B(J)*A(:,J)
Note that FORALL is obsolescent as of Fortran 2018, but as far as I know, it is the only way to perform that operation in a single line of code.
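Another whole-array formulation, not taken from the answers above, is to broadcast B with the SPREAD intrinsic. The sketch below wraps it in a hypothetical helper scale_columns and assumes that B has as many elements as A has columns. The compiler may create a temporary the size of A for the spread result, so this trades memory for brevity.

```fortran
module scale_columns_mod
  implicit none
contains
  ! Hypothetical helper: multiply column j of A by B(j) without explicit loops.
  subroutine scale_columns(A, B)
    real, intent(inout) :: A(:,:)
    real, intent(in)    :: B(:)   ! assumed to have size(A, 2) elements
    ! spread replicates B along a new first dimension, giving an array of
    ! shape (size(A,1), size(B)) whose element (i,j) equals B(j).
    A = A * spread(B, dim=1, ncopies=size(A, 1))
  end subroutine scale_columns
end module scale_columns_mod
```

Written inline, the same statement is A = A * spread(B, dim=1, ncopies=M).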

Why does a subroutine with an array from a "use module" statement give faster performance than the same subroutine with a locally sized array?

Related to this question, but I believe the issue is more clearly identified with this example.
I have some legacy code that looks like this:
subroutine ID_OG(N, DETERM)
use variables, only: ID
implicit real (A-H,O-Z)
implicit integer(I-N)
DETERM = 1.0
DO 1 I=1,N
1 ID(I)=0
DETERM = sum(ID)
end subroutine ID_OG
Replacing use variables, only: ID with real, dimension(N) :: ID or real, dimension(:), allocatable :: ID causes a noticeable performance loss. Why is this? Is this expected behavior? I am wondering if it has something to do with the program needing to repeatedly allocate memory for the local array ID, while the use statement allows the program to skip the memory allocation step.
In the legacy code ID is in module variables but it is only used within the subroutine ID_OG. It is not used anywhere else in the code - it is not an input or an output. To me, it seems like good programming practice for ID to be removed from module variables and defined locally in the subroutine. But perhaps that isn't the case.
Minimum working example (MWE):
compiling as gfortran -O3 test.f95 using gfortran 8.2.0
MODULE variables
implicit none
real, dimension(:), allocatable :: ID
END MODULE variables
program test
use variables
implicit none
integer :: N
integer :: loop_max = 1e6
integer :: ii ! loop index
real :: DETERM
real :: t1, t2
real :: t_ID_OG, t_ID_header, t_ID_no_ID, t_OG_no_ID, t_allocate
character(*), parameter :: format_header = '((A5, 1X), 20(A12,1X))'
character(*), parameter :: format_data = '((I5, 1X), 20(ES12.5, 1X))'
open(1, file = 'TimingSubroutines_ID.txt', status = 'unknown')
write(1,format_header) 'N', 't_Legacy', 't_header', 't_head_No_ID', 't_Leg_no_ID', &
& 't_allocate'
do N = 1, 100
allocate(ID(N))
call CPU_time(t1)
do ii = 1, loop_max
CALL ID_OG(N, DETERM)
end do
call CPU_time(t2)
t_ID_OG = t2 - t1
print*, N, DETERM
call CPU_time(t1)
do ii = 1, loop_max
CALL ID_header(N, DETERM)
end do
call CPU_time(t2)
t_ID_header = t2 - t1
print*, N, DETERM
call CPU_time(t1)
do ii = 1, loop_max
CALL ID_header_no_ID(N, DETERM)
end do
call CPU_time(t2)
t_ID_no_ID = t2 - t1
print*, N, DETERM
call CPU_time(t1)
do ii = 1, loop_max
CALL ID_OG_no_ID(N, DETERM)
end do
call CPU_time(t2)
t_OG_no_ID = t2 - t1
print*, N, DETERM
call CPU_time(t1)
do ii = 1, loop_max
CALL ID_OG_allocate(N, DETERM)
end do
call CPU_time(t2)
t_allocate = t2 - t1
print*, N, DETERM
deallocate(ID)
write(1,format_data) N, t_ID_OG, t_ID_header, t_ID_no_ID, t_OG_no_ID, t_allocate
end do
end program test
subroutine ID_OG(N, DETERM)
use variables, only: ID
implicit real (A-H,O-Z)
implicit integer(I-N)
DETERM = 1.0
DO 1 I=1,N
1 ID(I)=0
DETERM = sum(ID)
end subroutine ID_OG
subroutine ID_header(N, DETERM)
use variables, only: ID
implicit none
integer, intent(in) :: N
real, intent(out) :: DETERM
integer :: I
DETERM = 1.0
DO 1 I=1,N
1 ID(I)=0
DETERM = sum(ID)
end subroutine ID_header
subroutine ID_header_no_ID(N, DETERM)
implicit none
integer, intent(in) :: N
real, intent(out) :: DETERM
integer :: I
real, dimension(N) :: ID
DETERM = 1.0
DO 1 I=1,N
1 ID(I)=0
DETERM = sum(ID)
end subroutine ID_header_no_ID
subroutine ID_OG_no_ID(N, DETERM)
implicit real (A-H,O-Z)
implicit integer(I-N)
real, dimension(N) :: ID
DETERM = 1.0
DO 1 I=1,N
1 ID(I)=0
DETERM = sum(ID)
end subroutine ID_OG_no_ID
subroutine ID_OG_allocate(N, DETERM)
implicit real (A-H,O-Z)
implicit integer(I-N)
real, dimension(:), allocatable :: ID
allocate(ID(N))
DETERM = 1.0
DO 1 I=1,N
1 ID(I)=0
DETERM = sum(ID)
end subroutine ID_OG_allocate
Allocating the arrays takes time. The compiler is free to allocate local arrays wherever it wants, but this can typically be adjusted with compiler-specific flags. Use -fstack-arrays with gfortran to force local arrays onto the stack.
Allocating on the stack just changes the stack pointer; it is virtually free. Allocating on the heap, however, is more involved and requires some bookkeeping.
There are situations where local variables are in order and there are situations where global (module) variables are in order. One can also use local saved variables or variables that are components of some objects. One cannot say which one is better without seeing the complete design of the code in question.
FWIW, with -fstack-arrays I do not see much difference except when allocating explicitly using allocate():
Explicit allocate will always use the heap.
Without -fstack-arrays I do see some:
The graphs are quite noisy because my notebook is running many processes at the same time.
This is not to say that one should always use -fstack-arrays; I used it to demonstrate the difference. The option is useful, but care must be taken to avoid a stack overflow error. -fmax-stack-var-size may help with that.
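As an illustration of the "local saved variable" option mentioned above, here is a minimal sketch. The names id_saved_mod and ID_saved are hypothetical, and the reallocation policy (reallocate only when N changes) is an assumption of this sketch rather than something taken from the question. Note that a saved work array is shared between calls, so this pattern is not thread-safe.

```fortran
module id_saved_mod
  implicit none
contains
  ! Hypothetical variant of ID_OG that keeps its work array between calls.
  subroutine ID_saved(N, DETERM)
    integer, intent(in) :: N
    real, intent(out)   :: DETERM
    real, allocatable, save :: ID(:)   ! persists across calls
    ! (Re)allocate only when the requested size changes.
    if (allocated(ID)) then
      if (size(ID) /= N) deallocate(ID)
    end if
    if (.not. allocated(ID)) allocate(ID(N))
    ID = 0.0
    DETERM = sum(ID)
  end subroutine ID_saved
end module id_saved_mod
```

With a fixed N, the heap allocation then happens once on the first call instead of on every call.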
As your tests are pointing out, the additional overhead of all methods which do not use the module variable is due to the language's philosophy of not bothering the user with memory handling too much.
The compiler will decide where memory should be allocated, unless you start tinkering with compiler flags. You see allocation/freeing time as a drawback, but your analysis also shows:
Stack vs. heap memory handling overhead quickly gets smaller and smaller: for N>=100, it is already <50%. A dimension(100) array is a ridiculously small memory chunk on a modern computer.
Declaring a variable in a module just to speed up storage is a Fortran 90 way of making it a global and, as such, a deprecated coding style.
I think the best strategy to make the code well-coded and fast is:
Is N going to be constant through the whole runtime? Then, it would be a good idea to encapsulate it into a class:
module myCalculation
implicit none
type, public :: fancyMethod
integer :: N = 0
real, allocatable :: ID(:)
contains
procedure :: init
procedure :: compute
procedure :: is_init
end type fancyMethod
contains
elemental subroutine init(self,n)
class(fancyMethod), intent(inout) :: self
integer, intent(in) :: n
real, allocatable :: tmp(:)
self%N = n
allocate(tmp(N)); tmp(:) = 0
call move_alloc(from=tmp,to=self%ID)
end subroutine init
elemental logical function is_init(self)
class(fancyMethod), intent(in) :: self
is_init = allocated(self%ID) .and. size(self%ID)>0
end function is_init
real function compute(self,n,...) result(DETERM)
class(fancyMethod), intent(inout) :: self
integer, intent(in) :: n
....
if (.not.is_init(self)) call init(self,N)
DETERM = sum(self%ID(1:N))
end function compute
end module myCalculation
Is N going to be constant and small? Why not just use a PARAMETER to define its maximum size? If the size is a parameter, the compiler will perhaps always put the array on the stack:
real function computeWithMaxSize(N) result(DETERM)
integer, intent(in) :: N
integer, parameter :: MAX_SIZE = 1024
real :: ID(MAX_SIZE)
[...]
if (N>MAX_SIZE) stop ' N is too large! '
DETERM = sum(ID(1:N))
end function computeWithMaxSize
Is N going to be variable-sized and large? Then, the in-routine memory handling is fine, and its overhead is likely negligible, because the CPU time will be dominated by the calculation; use an allocatable version if you're not sure that the size can be so large to cause any stack issues:
real function computeWithAllocatable(N) result(DETERM)
integer, intent(in) :: N
real, allocatable :: ID(:)
allocate(ID(N))
[...]
DETERM = sum(ID(1:N))
end function computeWithAllocatable

Generalizing your operation for a specific declared type in Fortran

I have a Structure of arrays using the declared type in Fortran
e.g.
type t_data
integer :: N
real, allocatable :: x(:)
real, allocatable :: z(:)
real, allocatable :: y(:)
contains
procedure :: copy
procedure :: SWAP
procedure :: copy_element
end type
! constructor
interface t_data
module procedure constructor
end interface
contains
subroutine copy(this, old)
class(t_data), intent(inout) :: this
type(t_data), intent(in) :: old
do i = 1, old% N
this% x(i) = old% x(i)
etc ..
end do
end subroutine
function constructor(size_)
integer, intent(in) :: size_
type(t_data) :: constructor
allocate(constructor% x(size_))
allocate(constructor% y(size_) )
! etc
end function
subroutine swap(this, from, i1,i2)
class(t_data), intent(inout) :: this
type(t_data), intent(in) :: from
integer, intent(in) :: i1, i2
this% x(i1) = from% x(i2)
! etc
end subroutine
These are a set of examples of procedures that need to do the same operations on all arrays of the declared type t_data. My question is how to make it more maintainable to tackle the situation when we for example later want to add a new component to the declared type.
Currently, when I add a new array to my t_data, I need to go through all those procedures, constructors, and destructors, and add the component.
I am asking if there is a way to make this easier.
MY APPLICATION
Please note that this data type is used for particle simulation. Initially I allocate t_data with a large size. However, later during my simulation I might need more particles. Hence, I allocate a new t_data with more memory and copy over the old t_data up to its old size.
subroutine NEW_ALLOC(new, old)
type(t_data), intent(out) :: new
type(t_data), intent(inout) :: old
nsize = old% N * 2 ! allocate twice the old size
new = T_DATA(nsize)
call new% copy(old)
! deallocate old
end subroutine
Does anybody have a more clever way to do this, or is that even possible? I do not mind mixing this with C/C++.
My question is how to make it more maintainable to tackle the situation when we for example later want to add a new component to the declared type.
Here's how I would tackle the situation, and how many Fortran programmers have tackled the situation. I don't see the compelling need to have a derived type containing 3 arrays of coordinates, and approaching the problem that way does, as OP fears, require that adding another dimension to the problem requires code revision, such as adding a member array real, allocatable :: w(:) to t_data and recoding all the type-bound procedures operating on the type.
So drop that approach in favour of
TYPE t_data
REAL, DIMENSION(:,:), ALLOCATABLE :: elements
END TYPE t_data
let's have a couple of instances for exposition
TYPE(t_data) :: t1 ,t2, t3
and we can allocate the elements member of any of these this way
ALLOCATE(t1%elements(3,10))
which could just as easily be
ALLOCATE(t1%elements(6,100))
or whatever you wish. This has the advantage over the original derived type design that the dimensions of elements can be determined at run-time. It also makes it difficult to have different lengths for each of the coordinate arrays.
Now, copying t1 is as simple as
t2 = t1
Modern Fortran even takes care of automatically allocating the elements of t2. So I don't see any need for defining procedures for copying whole instances of t_data. As for swapping data around, slicing and dicing, this is as simple as
t2%elements(1,:) = t1%elements(2,:)
even
t2%elements(1,:) = t1%elements(1,6:1:-1)
or
t2%elements(1,:) = t1%elements(1,[1,3,5,2,4,6])
It should be obvious how to wrap these into a swap routine. But if not, ask another question.
Next, to the point about needing to allocate more space for elements during execution. First a temporary array
REAL, DIMENSION(:,:), ALLOCATABLE :: temp
then a little code like this, to double the size of elements.
ALLOCATE(temp(3,2*n))
temp(:,1:n) = t2%elements(:,1:n)
CALL MOVE_ALLOC(to=t2%elements,from=temp)
Again, you might care to wrap this into a procedure and if you need help doing that, ask for it.
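One way the resizing step above might be wrapped into a procedure is sketched below. The module name t_data_mod and the routine name grow are hypothetical, the routine assumes elements is already allocated, and zeroing the new columns is a choice made for this sketch rather than something the answer requires.

```fortran
module t_data_mod
  implicit none
  type :: t_data
    real, allocatable :: elements(:,:)
  end type t_data
contains
  ! Hypothetical helper: double the second extent of obj%elements,
  ! keeping the existing columns. Assumes elements is already allocated.
  subroutine grow(obj)
    type(t_data), intent(inout) :: obj
    real, allocatable :: temp(:,:)
    integer :: m, n
    m = size(obj%elements, 1)
    n = size(obj%elements, 2)
    allocate(temp(m, 2*n))
    temp(:, 1:n) = obj%elements
    temp(:, n+1:) = 0.0   ! new columns zeroed; an assumption of this sketch
    ! move_alloc deallocates the old storage and hands temp's storage over.
    call move_alloc(from=temp, to=obj%elements)
  end subroutine grow
end module t_data_mod
```

Because all coordinates live in one rank-2 array, adding another coordinate changes only the first extent at allocation time, and grow needs no edits.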
Finally, the lesson of all this is not to share how I would program the problem, but to share the idea to program in Fortran.

Is it possible to do OpenMP reduction over an element of a derived Fortran type?

I am trying to adapt a Fortran code (Gfortran) to make use of OpenMP. It is a particle based code where the index of arrays can correspond to particles or pairs. The code uses a derived type to store a number of matrices for each particle. It is very common to come across loops which require the use of a matrix stored in this derived type. This matrix may be accessed by multiple threads. The loop also requires a reduction over an element in this derived type. I currently have to write a temporary array in order to do this reduction and then I set the element of the derived type equal to this temporary reduction array. If not using OpenMP no temporary array is needed.
Question: Is it possible to do a reduction over an element of a derived type? I don't think I can do a reduction over the entire derived type as I need to access some of the elements in the derived type to do work, which means it needs to be SHARED. (From reading the specification I understand that when using REDUCTION a private copy of each list item is created.)
Complete minimal working example below. It could be more minimal but I feared that removing more components might oversimplify the problem.
PROGRAM TEST_OPEN_MP
USE, INTRINSIC :: iso_fortran_env
USE omp_lib
IMPLICIT NONE
INTEGER, PARAMETER :: dp = REAL64
INTEGER, PARAMETER :: ndim=3
INTEGER, PARAMETER :: no_partic=100000
INTEGER, PARAMETER :: len_array=1000000
INTEGER :: k, i, ii, j, jj
INTEGER, DIMENSION(1:len_array) :: pair_i, pair_j
REAL(KIND=dp), DIMENSION(1:len_array) :: pair_i_r, pair_j_r
REAL(KIND=dp), DIMENSION(1:no_partic) :: V_0
REAL(KIND=dp), DIMENSION(1:ndim,1:no_partic) :: disp, foovec
REAL(KIND=dp), DIMENSION(1:ndim,1:len_array) :: dvx
REAL(KIND=dp), DIMENSION(1:2*ndim,1:len_array):: vec
REAL(KIND=dp), DIMENSION(1:ndim) :: disp_ij,temp_vec1,temp_vec2
REAL(KIND=dp), DIMENSION(1:ndim,1:ndim) :: temp_ten1,temp_ten2
REAL(KIND=dp), DIMENSION(1:no_partic,1:ndim,1:ndim):: reduc_ten1
REAL(KIND=dp) :: sum_check1,sum_check2,cstart,cend
TYPE :: matrix_holder !<-- The derived type
REAL(KIND=dp), DIMENSION(1:ndim,1:ndim) :: mat1 !<-- The first element
REAL(KIND=dp), DIMENSION(1:ndim,1:ndim) :: mat2 !<-- The second element, etc.
END TYPE matrix_holder
TYPE(matrix_holder), DIMENSION(1:no_partic) :: matrix
! Setting "random" values to the arrays
DO k = 1, no_partic
CALL random_number(matrix(k)%mat1(1:ndim,1:ndim))
CALL random_number(matrix(k)%mat2(1:ndim,1:ndim))
END DO
CALL random_number(pair_i_r)
CALL random_number(pair_j_r)
CALL random_number(disp)
CALL random_number(vec)
CALL random_number(dvx)
CALL random_number(V_0)
disp = disp*10.d0
vec = vec*100.d0
dvx = dvx*200.d0
V_0 = V_0*10d0
pair_i = FLOOR(no_partic*pair_i_r)+1
pair_j = FLOOR(no_partic*pair_j_r)+1
! Doing the work
cstart = omp_get_wtime()
!$OMP PARALLEL DO DEFAULT(SHARED) &
!$OMP& PRIVATE(i,j,k,disp_ij,temp_ten1,temp_ten2,temp_vec1,temp_vec2,ii,jj), &
!$OMP& REDUCTION(+:foovec,reduc_ten1), SCHEDULE(static)
DO k= 1, len_array
i = pair_i(k)
j = pair_j(k)
disp_ij(1:ndim) = disp(1:ndim,i)-disp(1:ndim,j)
temp_vec1 = MATMUL(matrix(i)%mat2(1:ndim,1:ndim),&
vec(1:ndim,k))
temp_vec2 = MATMUL(matrix(j)%mat2(1:ndim,1:ndim),&
vec(1:ndim,k))
DO jj=1,ndim
DO ii = 1,ndim
temp_ten1(ii,jj) = -disp_ij(ii) * vec(jj,k)
temp_ten2(ii,jj) = disp_ij(ii) * vec(ndim+jj,k)
END DO
END DO
reduc_ten1(i,1:ndim,1:ndim)=reduc_ten1(i,1:ndim,1:ndim)+temp_ten1*V_0(j) !<--The temporary reduction array
reduc_ten1(j,1:ndim,1:ndim)=reduc_ten1(j,1:ndim,1:ndim)+temp_ten2*V_0(i)
foovec(1:ndim,i) = foovec(1:ndim,i) - temp_vec1(1:ndim)*V_0(j) !<--A generic reduction vector
foovec(1:ndim,j) = foovec(1:ndim,j) + temp_vec1(1:ndim)*V_0(i)
END DO
!$OMP END PARALLEL DO
cend = omp_get_wtime()
! Checking the results
sum_check1 = 0.d0
sum_check2 = 0.d0
DO i = 1,no_partic
matrix(i)%mat2(1:ndim,1:ndim)=reduc_ten1(i,1:ndim,1:ndim) !<--Writing the reduction back to the derived type element
sum_check1 = sum_check1+SUM(foovec(1:ndim,i))
sum_check2 = sum_check2+SUM(matrix(i)%mat2(1:ndim,1:ndim))
END DO
WRITE(*,*) sum_check1, sum_check2, cend-cstart
END PROGRAM TEST_OPEN_MP
The only other alternative I can think of would be to remove all the derived types and replace these with large arrays similar to reduc_ten1 in the example.
Unfortunately, what you want is not possible. At least if I understood your (very complicated for me!) code correctly.
The problem is that you have an array of derived types, each of which has an array component. You cannot reference that as a whole.
Consider this toy example:
type t
real :: mat(3)
end type
integer, parameter :: n = 100, nk = 1000
type(t) :: parts(n)
integer :: i
real :: array(3,n,nk)
do k = 1, nk
array(:,:,k) = k
end do
do i = 1, n
parts(i)%mat = 0
end do
!$omp parallel do reduction(+:parts%mat)
do k = 1, nk
do i = 1, n
parts(i)%mat = parts(i)%mat + array(:,i,k)
end do
end do
!$omp end parallel do
end
Intel Fortran gives a more concrete error:
reduction6.f90(23): error #6159: A component cannot be an array if the encompassing structure is an array. [MAT]
!$omp parallel do reduction(+:parts%mat)
--------------------------------------^
reduction6.f90(23): error #7656: Subobjects are not allowed in this OpenMP* clause; a named variable must be specified. [PARTS]
!$omp parallel do reduction(+:parts%mat)
--------------------------------^
Remember that it is not even allowed to do this, completely without OpenMP:
parts%mat = 0
Intel:
reduction6.f90(21): error #6159: A component cannot be an array if the encompassing structure is an array. [MAT]
gfortran:
Error: Two or more part references with nonzero rank must not be specified at (1)
You must do this:
do i = 1, n
parts(i)%mat = 0
end do
The reason for the error reported by Intel above is very similar.
Actually, no derived type components are allowed in the reduction clause; only variable names can be used. That is the reason for the syntax error reported by gfortran: it does not expect any % there. Intel again gives a clearer error message.
One possible workaround is to pass the component to a subroutine and do the reduction there.
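A minimal sketch of that subroutine workaround, under stated assumptions: the names reduce_component_mod and accumulate are hypothetical, and the routine handles the component of a single particle, parallelizing over the contributions k. Inside the routine the component is an ordinary named dummy array, so it can appear in the reduction clause. Whether this restructuring suits the pair loop in the question is not something the sketch settles.

```fortran
module reduce_component_mod
  implicit none
contains
  ! Hypothetical helper: sum the columns of contrib into m.
  ! m is a plain named array here, even if the caller passes a
  ! derived-type component such as parts(i)%mat.
  subroutine accumulate(m, contrib)
    real, intent(in)    :: contrib(:,:)          ! contrib(:,k) is the k-th contribution
    real, intent(inout) :: m(size(contrib, 1))
    integer :: k
    !$omp parallel do reduction(+:m)
    do k = 1, size(contrib, 2)
      m = m + contrib(:,k)
    end do
    !$omp end parallel do
  end subroutine accumulate
end module reduce_component_mod
```

For the toy example this would be called once per particle, e.g. call accumulate(parts(i)%mat, array(:,i,:)) inside a serial loop over i. Compiled without OpenMP, the directives are plain comments and the routine runs serially.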

changing array dimensions in fortran

There are basically two ways to pass arrays to a subroutine in Fortran 90/95:
PROGRAM ARRAY
INTEGER, ALLOCATABLE :: A(:,:)
INTEGER :: N
ALLOCATE(A(N,N))
CALL ARRAY_EXPLICIT(A,N)
! or
CALL ARRAY_ASSUMED(A)
END PROGRAM ARRAY
SUBROUTINE ARRAY_EXPLICIT(A,N)
INTEGER :: N
INTEGER :: A(N,N)
! bla bla
END SUBROUTINE ARRAY_EXPLICIT
SUBROUTINE ARRAY_ASSUMED(A)
INTEGER, ALLOCATABLE :: A(:,:)
N=SIZE(A,1)
! bla bla
END SUBROUTINE ARRAY_ASSUMED
where you need an explicit interface for the second, usually through the use of a module.
From FORTRAN77, I'm used to the first alternative, and I read this is also the most efficient if you pass the whole array.
The nice thing with the explicit shape is that I can also call a subroutine and treat the array as a vector instead of a matrix:
SUBROUTINE ARRAY_EXPLICIT(A,N)
INTEGER :: N
INTEGER :: A(N**2)
! bla bla
END SUBROUTINE ARRAY_EXPLICIT
I wondered if there is a nice way to do that kind of thing using the second, assumed shape interface, without copying it.
See the RESHAPE intrinsic, e.g.
http://gcc.gnu.org/onlinedocs/gfortran/RESHAPE.html
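A minimal sketch of the RESHAPE route, which copies the data: the module and function names (reshape_demo_mod, as_vector) are hypothetical and only illustrate the call.

```fortran
module reshape_demo_mod
  implicit none
contains
  ! Hypothetical helper: return a rank-1 copy of a rank-2 array,
  ! in column-major (array element) order.
  function as_vector(a) result(v)
    integer, intent(in) :: a(:,:)
    integer :: v(size(a))
    v = reshape(a, [size(a)])
  end function as_vector
end module reshape_demo_mod
```

The result is a new array, so changes to it are not reflected in the original.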
Alternatively, if you want to avoid the copy (in some cases an optimizing compiler might be able to do a reshape without copying, e.g. if the RHS array is not used afterwards, but I wouldn't count on it), as of Fortran 2003 you can assign pointers to targets of different rank, using bounds remapping. E.g. something like
program ptrtest
real, pointer :: a(:)
real, pointer :: b(:,:)
integer :: n = 10
allocate(a(n**2))
a = 42
b (1:n, 1:n) => a
end program ptrtest
I was looking to do the same thing and came across this discussion. None of the solutions suited my purposes, but I found that there is a way to reshape an array without copying the data using iso_c_binding, provided you are using the Fortran 2003 standard, which current Fortran 90/95 compilers tend to support. I know the discussion is old, but I figured I would add what I came up with for the benefit of others with this question.
The key is to use the function C_LOC to convert an array to a C pointer, and then use C_F_POINTER to convert this back into a Fortran array pointer with the desired shape. One challenge with using C_LOC is that it only works for arrays that have a directly specified shape. This is because arrays in Fortran with an incomplete size specification (i.e., that use a : for some dimension) include an array descriptor along with the array data. C_LOC does not give you the memory location of the array data, but the location of the descriptor. So an allocatable array or a pointer array doesn't work with C_LOC (unless you want the location of the compiler-specific array descriptor data structure). The solution is to create a subroutine or function that receives the array as an array of fixed size (the size really doesn't matter). This causes the array variable in the function (or subroutine) to point to the location of the array data rather than the location of the array descriptor. You then use C_LOC to get a pointer to the array data location and C_F_POINTER to convert this pointer back into an array with the desired shape. The desired shape must be passed into this function to be used with C_F_POINTER. Below is an example:
program arrayresize
implicit none
integer, allocatable :: array1(:)
integer, pointer :: array2(:,:)
! allocate and initialize array1
allocate(array1(6))
array1 = (/1,2,3,4,5,6/)
! This starts out initialized to 2
print *, 'array1(2) = ', array1(2)
! Point array2 to same data as array1. The shape of array2
! is passed in as an array of integers because C_F_POINTER
! uses an array of integers as a SHAPE parameter.
array2 => getArray(array1, (/2,3/))
! Change the value at array2(2,1) (same as array1(2))
array2(2,1) = 5
! Show that data in array1(2) was modified by changing
! array2(2,1)
print *, 'array2(2,1) = array1(2) = ', array1(2)
contains
function getArray(array, shape_) result(aptr)
use iso_c_binding, only: C_LOC, C_F_POINTER
! Pass in the array as an array of fixed size so that there
! is no array descriptor associated with it. This means we
! can get a pointer to the location of the data using C_LOC
integer, target :: array(1)
integer :: shape_(:)
integer, pointer :: aptr(:,:)
! Use C_LOC to get the start location of the array data, and
! use C_F_POINTER to turn this into a fortran pointer (aptr).
! Note that we need to specify the shape of the pointer using an
! integer array.
call C_F_POINTER(C_LOC(array), aptr, shape_)
end function
end program
@janneb has already answered regarding RESHAPE. RESHAPE is a function, usually used in an assignment statement, so there will be a copy operation. Perhaps it can be done without copying using pointers. Unless the array is huge, it is probably better to use RESHAPE.
I'm skeptical that the explicit shape array is more efficient than the assumed shape, in terms of runtime. My inclination is to use the features of the Fortran >=90 language and use assumed shape declarations ... that way you don't have to bother passing the dimensions.
EDIT:
I tested the sample program of @janneb with ifort 11, gfortran 4.5 and gfortran 4.6. Of these three, it only works in gfortran 4.6. Interestingly, going the other direction and connecting a 1-D array to an existing 2-D array requires another new feature of Fortran 2008, the "contiguous" attribute, at least according to gfortran 4.6.0 20110318. Without this attribute in the declaration, there is a compile-time error.
program test_ptrs
implicit none
integer :: i, j
real, dimension (:,:), pointer, contiguous :: array_twod
real, dimension (:), pointer :: array_oned
allocate ( array_twod (2,2) )
do i=1,2
do j=1,2
array_twod (i,j) = i*j
end do
end do
array_oned (1:4) => array_twod
write (*, *) array_oned
stop
end program test_ptrs
You can use assumed-size arrays, but it can mean multiple layers of wrapper routines:
program test
implicit none
integer :: test_array(10,2)
test_array(:,1) = (/1, 2, 3, 4, 5, 6, 7, 8, 9, 10/)
test_array(:,2) = (/11, 12, 13, 14, 15, 16, 17, 18, 19, 20/)
write(*,*) "Original array:"
call print_a(test_array)
write(*,*) "Reshaped array:"
call print_reshaped(test_array, size(test_array))
contains
subroutine print_reshaped(a, n)
integer, intent(in) :: a(*)
integer, intent(in) :: n
call print_two_dim(a, 2, n/2)
end subroutine
subroutine print_two_dim(a, n1, n2)
integer, intent(in) :: a(1:n1,1:*)
integer, intent(in) :: n1, n2
call print_a(a(1:n1,1:n2))
end subroutine
subroutine print_a(a)
integer, intent(in) :: a(:,:)
integer :: i
write(*,*) "shape:", shape(a)
do i = 1, size(a(1,:))
write(*,*) a(:,i)
end do
end subroutine
end program test
Using ifort 14.0.3 for a 2D-to-1D conversion, I could use an allocatable array for the 2D array and a pointer array for the 1D one:
integer,allocatable,target :: A(:,:)
integer,pointer :: AP(:)
allocate(A(3,N))
AP(1:3*N) => A
As @M.S.B mentioned, in case both A and AP have the pointer attribute, I had to use the contiguous attribute for A to guarantee the consistency of the conversion.
Gfortran is a bit paranoid with interfaces. It not only wants to know the type, kind, rank and number of arguments, but also the shape, the target attribute and the intent (although I agree with the intent part). I encountered a similar problem.
With gfortran, there are three different dimension definitions:
1. Fixed
2. Variable
3. Assumed-size
With ifort, categories 1 and 2 are considered the same, so you can just define any dimension size as 0 in the interface and it works.
program test
implicit none
integer, dimension(:), allocatable :: ownlist
interface
subroutine blueprint(sz,arr)
integer, intent(in) :: sz
integer, dimension(0), intent(in) :: arr
! This zero means that the size does not matter,
! as long as it is a one-dimensional integer array.
end subroutine blueprint
end interface
procedure(blueprint), pointer :: ptr
allocate(ownlist(3))
ownlist = (/3,4,5/)
ptr => rout1
call ptr(3,ownlist)
deallocate(ownlist)
allocate(ownlist(0:10))
ownlist = (/3,4,5,6,7,8,9,0,1,2,3/)
ptr => rout2
call ptr(3,ownlist)
deallocate(ownlist)
contains
! This one has a dimension size as input.
subroutine rout1(sz,arr)
implicit none
integer, intent(in) :: sz
integer, dimension(sz), intent(in) :: arr
write(*,*) arr
write(*,*) arr(1)
end subroutine rout1
! This one has a fixed dimension size.
subroutine rout2(sz,arr)
implicit none
integer, intent(in) :: sz
integer, dimension(0:10), intent(in) :: arr
write(*,*) "Ignored integer: ",sz
write(*,*) arr
write(*,*) arr(1)
end subroutine rout2
end program test
Gfortran complains about the interface. Changing the 0 into 'sz' solves the problem for 'rout1', but not for 'rout2'.
However, you can fool gfortran by writing dimension(0:10+0*sz) instead of dimension(0:10); gfortran then compiles the program and gives the same result as ifort.
This is a stupid trick and it relies on the existence of the integer 'sz' that may not be there. Another program:
program difficult_test
implicit none
integer, dimension(:), allocatable :: ownlist
interface
subroutine blueprint(arr)
integer, dimension(0), intent(in) :: arr
end subroutine blueprint
end interface
procedure(blueprint), pointer :: ptr
allocate(ownlist(3))
ownlist = (/3,4,5/)
ptr => rout1
call ptr(ownlist)
deallocate(ownlist)
allocate(ownlist(0:10))
ownlist = (/3,4,5,6,7,8,9,0,1,2,3/)
ptr => rout2
call ptr(ownlist)
deallocate(ownlist)
contains
subroutine rout1(arr)
implicit none
integer, dimension(3), intent(in) :: arr
write(*,*) arr
write(*,*) arr(1)
end subroutine rout1
subroutine rout2(arr)
implicit none
integer, dimension(0:10), intent(in) :: arr
write(*,*) arr
write(*,*) arr(1)
end subroutine rout2
end program difficult_test
This works under ifort for the same reasons as the previous example, but gfortran complains about the interface. I do not know how I can fix it.
The only thing I want to tell gfortran is 'I do not know the dimension size yet, but we will fix it.' But this needs a spare integer argument (or something else that we can turn into an integer) to fool gfortran.