Moving average with mask in Fortran

I have to calculate the moving average of a masked dataset with dimensions (7320,8520) in Fortran. I wrote a subroutine that receives the data (TS) and outputs the averaged data (TS_NEW). The problem is that the code takes too long to run (in fact it never finishes, although it does not run into memory issues). I wonder if there's a way to make the code more efficient. Below is the code I wrote:
SUBROUTINE avgwin(ts,winsize,size1,size2,sizelat,sizelon,ts_new)
  implicit none
  ! The sizes must be declared before they are used in the array declarations below
  integer, INTENT(IN) :: winsize, size1, size2, sizelat, sizelon
  double precision, dimension(size1,size2), INTENT(IN) :: ts
  double precision, dimension(sizelon,sizelat), INTENT(OUT) :: ts_new
  integer :: j, k
  integer :: A, B
  ! Allocatable rather than automatic, so the two large masks are not placed on the stack
  logical, dimension(:,:), allocatable :: mask, mask2
  double precision :: SUMVAR, COUNTVAR
  allocate(mask(size1,size2), mask2(size1,size2))
  A = 1
  B = 1
  mask = ts > 0  ! Mask to highlight all the OK values
  mask2 = ts < 0 ! Mask to highlight all the values to be discarded
  do j = 1, sizelat   ! Looping through latitude
    do k = 1, sizelon ! Looping through longitude
      if (ALL(mask2(k:k+winsize-1,j:j+winsize-1))) then
        ts_new(B,A) = -100 ! Adds a fill value if all the elements are to be discarded
        B = B + 1
      else
        SUMVAR = sum(ts(k:k+winsize-1,j:j+winsize-1), MASK=mask(k:k+winsize-1,j:j+winsize-1))
        COUNTVAR = count(mask(k:k+winsize-1,j:j+winsize-1))
        ts_new(B,A) = SUMVAR/COUNTVAR
        B = B + 1
      end if
    end do
    B = 1
    A = A + 1
  end do
END SUBROUTINE
program test
  implicit none
  double precision, dimension(7320,8520) :: DATA
  double precision, dimension(:,:), allocatable :: DATA_NEW
  integer :: sizelat, sizelon, i, j, w, len1, len2, winsize
  integer, dimension(3) :: sizes
  len1 = 7320
  len2 = 8520
  do i = 1, 8520
    do j = 1, 7320
      DATA(j,i) = i ! Just for testing purposes
    end do
  end do
  sizes(1:3) = (/300,301,302/)
  do w = 1, 3
    winsize = sizes(w)
    sizelon = len1-winsize+1
    sizelat = len2-winsize+1
    allocate(DATA_NEW(sizelon,sizelat))
    CALL avgwin(DATA,winsize,len1,len2,sizelat,sizelon,DATA_NEW)
    deallocate(DATA_NEW) ! so it can be allocated again for the next window size
  end do
end program test

I'm not sure whether this meets the OP's purpose, but how about first summing the data along one dimension and then summing those partial results along the other dimension (i.e., partial summation)? For example, for the simpler problem of summing data( 1:L, 1:L ) over a moving window of size w, there are at least three ways to do it:
program main
implicit none
real, allocatable, dimension(:,:) :: data, direct, part1, part2
integer :: i1, i2, L, S, w
real :: t1, t2
L = 2000
w = 50
S = L - w + 1
allocate( data( L, L ), direct( S, S ), &
part1( L, S ), part2( S, S ) )
!> test data
do i2 = 1, L
do i1 = 1, L
data( i1, i2 ) = mod( i1 + i2, 2 )
enddo
enddo
!> method 1: direct sum (cost = O( S^2 * w^2 ))
call cpu_time( t1 )
do i2 = 1, S
do i1 = 1, S
direct( i1, i2 ) = sum( data( i1:(i1 + w - 1), i2:(i2 + w - 1) ) )
enddo
enddo
call cpu_time( t2 )
print *, "time (s) = ", t2 - t1
!> method 2: partial sum (cost = O( S^2 * w * 2 ))
call cpu_time( t1 )
do i2 = 1, S
do i1 = 1, L
part1( i1, i2 ) = sum( data( i1, i2:(i2 + w - 1) ) )
enddo
enddo
do i2 = 1, S
do i1 = 1, S
part2( i1, i2 ) = sum( part1( i1:(i1 + w - 1), i2 ) )
enddo
enddo
call cpu_time( t2 )
print *, "time (s) = ", t2 - t1
print *, "error = ", maxval( abs( part2 - direct ) )
!> method 3: an improved version of method 2 (cost = O( S^2 ))
call cpu_time( t1 )
do i1 = 1, L
part1( i1, 1 ) = sum( data( i1, 1:w ) )
do i2 = 2, S
part1( i1, i2 ) = part1( i1, i2-1 ) &
- data( i1, i2-1 ) + data( i1, i2+w-1 )
enddo
enddo
do i2 = 1, S
part2( 1, i2 ) = sum( part1( 1:w, i2 ) )
do i1 = 2, S
part2( i1, i2 ) = part2( i1-1, i2 ) &
- part1( i1-1, i2 ) + part1( i1+w-1, i2 )
enddo
enddo
call cpu_time( t2 )
print *, "time (s) = ", t2 - t1
print *, "error = ", maxval( abs( part2 - direct ) )
end program
Then, gfortran-7.2 -O3 test.f90 seems to give some nice speedup:
time (s) = 9.64789867
time (s) = 0.345023155
error = 0.00000000
time (s) = 8.60958099E-02
error = 0.00000000
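For the moving average with a mask, a similar approach may work: run the same partial sums once over the data with the discarded values set to zero, and once over the mask itself (1 for a valid value, 0 otherwise), then divide the first result by the second wherever the count is nonzero. Searching the net may also turn up other (better) approaches or libraries for such moving averages, since this is a very common calculation.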
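A minimal sketch of that idea, assuming (as in the question) that values > 0 are valid and that a window with no valid values gets the fill value -100; note that this also covers a window containing only zeros and negatives, which the question's code would divide by zero. It reuses the method-3 running sums from above, once for the masked values and once for the mask. The module and procedure names are hypothetical.

```fortran
module masked_window
  implicit none
  integer, parameter :: dp = kind( 1.0d0 )
contains
  ! out(i, k) = sum( x(i, k:k+w-1) ), O(1) work per output element
  subroutine window_sum_dim2( x, w, out )
    real( dp ), intent( in )  :: x( :, : )
    integer,    intent( in )  :: w
    real( dp ), intent( out ) :: out( :, : )   ! size(x,1) x (size(x,2)-w+1)
    integer :: k
    out( :, 1 ) = sum( x( :, 1:w ), dim = 2 )
    do k = 2, size( out, 2 )
      out( :, k ) = out( :, k-1 ) - x( :, k-1 ) + x( :, k+w-1 )
    end do
  end subroutine window_sum_dim2

  ! out(i, j) = sum( x(i:i+w-1, j) ), O(1) work per output element
  subroutine window_sum_dim1( x, w, out )
    real( dp ), intent( in )  :: x( :, : )
    integer,    intent( in )  :: w
    real( dp ), intent( out ) :: out( :, : )   ! (size(x,1)-w+1) x size(x,2)
    integer :: i, j
    do j = 1, size( x, 2 )
      out( 1, j ) = sum( x( 1:w, j ) )
      do i = 2, size( out, 1 )
        out( i, j ) = out( i-1, j ) - x( i-1, j ) + x( i+w-1, j )
      end do
    end do
  end subroutine window_sum_dim1

  ! Masked w x w moving average; windows without valid values get fill
  subroutine masked_moving_average( ts, w, fill, avg )
    real( dp ), intent( in )  :: ts( :, : )
    integer,    intent( in )  :: w
    real( dp ), intent( in )  :: fill
    real( dp ), intent( out ) :: avg( :, : )   ! (size(ts,1)-w+1) x (size(ts,2)-w+1)
    real( dp ), allocatable :: vals( :, : ), ones( :, : ), tmp( :, : ), cnt( :, : )
    vals = merge( ts, 0.0_dp, ts > 0.0_dp )       ! invalid values contribute 0 to the sum
    ones = merge( 1.0_dp, 0.0_dp, ts > 0.0_dp )   ! 1 where valid, used for the count
    allocate( tmp( size( ts, 1 ), size( avg, 2 ) ), cnt( size( avg, 1 ), size( avg, 2 ) ) )
    call window_sum_dim2( vals, w, tmp )
    call window_sum_dim1( tmp, w, avg )           ! avg holds the masked sums for now
    call window_sum_dim2( ones, w, tmp )
    call window_sum_dim1( tmp, w, cnt )           ! integer counts, exact in double precision
    where ( cnt > 0.5_dp )
      avg = avg / cnt
    elsewhere
      avg = fill
    end where
  end subroutine masked_moving_average
end module masked_window

program demo_masked_average
  use masked_window
  implicit none
  integer, parameter :: n1 = 40, n2 = 30, w = 5
  real( dp ) :: ts( n1, n2 ), avg( n1-w+1, n2-w+1 ), direct, maxerr
  integer :: i, j, c
  call random_number( ts )
  ts = 2.0_dp * ts - 1.0_dp     ! roughly half of the values are "bad" (<= 0)
  call masked_moving_average( ts, w, -100.0_dp, avg )
  ! Compare with the direct O(w**2)-per-point formula from the question
  maxerr = 0.0_dp
  do j = 1, n2-w+1
    do i = 1, n1-w+1
      c = count( ts( i:i+w-1, j:j+w-1 ) > 0.0_dp )
      if ( c > 0 ) then
        direct = sum( ts( i:i+w-1, j:j+w-1 ), mask = ts( i:i+w-1, j:j+w-1 ) > 0.0_dp ) / c
      else
        direct = -100.0_dp
      end if
      maxerr = max( maxerr, abs( avg( i, j ) - direct ) )
    end do
  end do
  print *, 'max difference from direct formula: ', maxerr
end program demo_masked_average
```

Each output point then costs O(1) instead of O(winsize**2), at the price of a few full passes over the array. The add/subtract running sums accumulate a little rounding error along long rows, so it is worth comparing against the direct formula on a sample before trusting it at full size.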

Related

Is there any simple way to realize sum over series of permutations of array in Fortran?

I tried to rewrite the Python code from Optimizing array additions and multiplications with transposes in Fortran to see whether I can get any speedup (-O3 helps a lot; the approach in Ian Bush's answer to Transposition of a matrix by multithread in Fortran seems too complicated to me). E.g.,
0.1 * A(l1,l2,l3,l4) + 0.2*A(l1,l2,l4,l3) + 0.3 * A(l1,l3,l2,l4)+...
Suppose I try to extend the following code:
Program transpose
integer, parameter :: dp = selected_real_kind(15, 307)
real(dp), dimension(:, :, :, :), allocatable :: a, b
Integer :: n1, n2, n3, n4, n, m_iter
Integer :: l1, l2, l3, l4
Integer(8) :: start, finish, rate
real(dp) :: sum_time
Write(*, *) 'n1, n2, n3, n4?'
Read(*, *) n1, n2, n3, n4
Allocate( a ( 1:n1, 1:n2, 1:n3, 1:n4 ) )
Allocate( b ( 1:n1, 1:n2, 1:n3, 1:n4 ) )
call random_init(.true., .false.)
Call Random_number( a )
m_iter = 100
b = 0.0_dp
Call system_clock( start, rate )
do n = 1, m_iter
do l4 = 1, n4
do l3 = 1, n3
do l2 = 1, n2
do l1 = 1, n1
b(l1,l2,l3,l4) = 0.1_dp*a(l1,l2,l3,l4) + 0.2_dp*a(l1,l2,l4,l3)
end do
end do
end do
end do
end do
Call system_clock( finish, rate )
sum_time = real( finish - start, dp ) / rate
write (*,*) 'all loop', sum_time/m_iter
print *, b(1,1,1,1)
End
(I tried reshape, but it was slower than the nested loops.)
Is there any simple way to include A(l1,l3,l2,l4), A(l1,l3,l4,l2), etc.? I could use Python to generate a string containing all of them, with \ for the line breaks.
A potential complication is that if there is a term 0.0 * A(l4,l3,l2,l1) that I would like to skip, generating the string from Python becomes complicated. Is there a more Fortran-like solution?
Another issue is that if the array A has a different extent in each index, say n1 != n2 != n3 != n4, some permutations may go out of bounds. In this situation the prefactor will be zero. For example, if n1 = n2 = 10 and n3 = n4 = 20, it will be something like 0.1 * A(l1,l2,l3,l4) + 0.0 * A(l1,l3,l2,l4). In other words, b = 0.1*a + 0.0*reshape(a, (/n1, n2, n3, n4/), order = (/1,3,2,4/) ), or 0.1*a + 0.0 * P(2,3) a, where P is a permutation operator. The summation is supposed to skip such permutations, e.g., by checking whether the absolute value of the permutation prefactor is below some threshold.
Edit: a Python reference implementation is below. It includes a random and a non-random version, selected by the variable gen_random. The non-random version may be easier to check.
import numpy as np
import time
import itertools as it
ref_list = [0, 1, 2, 3]
p = it.permutations(ref_list)
transpose_list = tuple(p)
n_loop = 2
na = nb = nc = nd = 30
A = np.zeros((na,nb,nc,nd))
gen_random = False
if gen_random == False:
    n = 1
    for la in range(na):
        for lb in range(nb):
            for lc in range(nc):
                for ld in range(nd):
                    A[la,lb,lc,ld] = n
                    n = n + 1
else:
    A = np.random.random((na,nb,nc,nd))
factor_list = [(i+1)*0.1 for i in range(24)]
time_total = 0.0
for n in range(n_loop):
    sum_A = np.zeros((na,nb,nc,nd))
    start_0 = time.time()
    for m, t in enumerate(transpose_list):
        sum_A = np.add(sum_A, factor_list[m] * np.transpose(A, transpose_list[m] ), out = sum_A)
        #sum_A += factor_list[m] * np.transpose(A, transpose_list[m])
    finish_0 = time.time()
    time_total += finish_0 - start_0
print('level 4', time_total/n_loop)
print('Ref value', A[0,0,0,0], sum_A[0,0,0,0])
As a sanity check: A[0,0,0,0] is mapped to itself by every permutation, so if it is non-zero, sum_A[0,0,0,0]/A[0,0,0,0] = 0.1 + 0.2 + ... + 2.4 = (0.1 + 2.4)*24/2 = 30. The permutation factors can of course be different; these are just an example.
Here's one way to do it in Fortran which I think works, and which also skips terms for which the prefactor is zero. I make no claims that it is the best; there are many ways to do it. I am also hesitant to claim it is correct, as what you have provided makes that difficult to assess fully. It does pass the sanity test for the case when all the sizes are the same, but you give no way to check the more general case.
The main problem is that Fortran provides no way to create the permutations you require, so I've written a little module which I believe implements the same ordering as Python. This ordering is taken from the Python documentation and the algorithm to implement it from Wikipedia. Unit testing strongly suggests it does the job.
Once you have that, it is easy to loop over each permutation in turn, skipping those that have zero weight, either because the prefactor is zero or because the shapes are incompatible. So, with all the caveats above, here's my effort, along with the compiler version and a few tests, some with array bounds checking on and some without.
Note that even if this is correct, I certainly make no claims as to how optimal it is: the memory access pattern is very non-trivial, and optimising it will require much more thought than I am willing to give this now, though I suspect cache blocking will be required, as in the matrix transposition question that you reference.
Module permutations_module
! Little module to handle permutations of an arbitrary sized list of integers 1, 2, 3, ..., n
Implicit None
Type, Public :: permutation
Private
Integer, Dimension( : ), Allocatable, Private :: state
Contains
Procedure, Public :: init
Procedure, Public :: get
Procedure, Public :: next
End type permutation
Private
Contains
Subroutine init( p, n )
! Initialise a permutation
Class( permutation ), Intent( Out ) :: p
Integer , Intent( In ) :: n
Integer :: i
Allocate( p%state( 1:n ) )
p%state = [ ( i, i = 1, Size( p%state ) ) ]
End Subroutine init
Pure Function get( p ) Result( a )
! Get the current permutation
Class( permutation ), Intent( In ) :: p
Integer, Dimension( : ), Allocatable :: a
a = p%state
End Function get
Function next( p ) Result( finished )
! Move onto the next permutation, returning .True. if there are no more permutations in the list
Logical :: finished
Class( permutation ), Intent( InOut ) :: p
Integer :: k, l
Integer :: tmp
finished = .False.
Do k = Size( p%state ) - 1, 1, -1
If( p%state( k ) < p%state( k + 1 ) ) Exit
End Do
finished = k == 0
If( .Not. finished ) Then
Do l = Size( p%state ), k + 1, -1
If( p%state( k ) < p%state( l ) ) Exit
End Do
tmp = p%state( k )
p%state( k ) = p%state( l )
p%state( l ) = tmp
p%state( k + 1: ) = p%state( Size( p%state ):k + 1: - 1 )
End If
End Function next
End Module permutations_module
Program testit
Use, Intrinsic :: iso_fortran_env, Only : wp => real64
Use permutations_module, Only : permutation
Implicit None
Integer, Parameter :: n_iter = 100
Type( permutation ) :: p
Integer :: i
Real( wp ), Dimension( :, :, :, : ), Allocatable :: a
Real( wp ), Dimension( :, :, :, : ), Allocatable :: b
Real( wp ), Dimension( 1:Product( [ ( i, i = 1, Size( Shape( a ) ) ) ] ) ) :: c
Integer, Dimension( 1:Size( Shape( a ) ) ) :: this_permutation
Integer, Dimension( 1:Size( Shape( a ) ) ) :: sizes
Integer, Dimension( 1:Size( Shape( a ) ) ) :: permuted_sizes
Integer, Dimension( 1:Size( Shape( a ) ) ) :: indices
Integer, Dimension( 1:Size( Shape( a ) ) ) :: permuted_indices
Integer :: n1, n2, n3, n4
Integer :: l1, l2, l3, l4
Integer :: iter
Integer :: start, finish, rate
Logical :: finished
c = [ ( i * 0.1_wp, i = 1, Size( c ) ) ]
Write( *, * ) 'n1, n2, n3, n4?'
Read ( *, * ) n1, n2, n3, n4
Allocate( a ( 1:n1, 1:n2, 1:n3, 1:n4 ) )
Allocate( b, Mold = a )
Call Random_init( .true., .false. )
Call Random_number( a )
! Make sure a( 1, 1, 1, 1 ) is not zero for the sanity check
a( 1, 1, 1, 1 ) = a( 1, 1, 1, 1 ) + 0.1_wp
Call system_clock( start, rate )
sizes = Shape( a )
b = 0.0_wp
iter_loop: Do iter = 1, n_iter
Call p%init( Size( Shape( a ) ) )
i = 0
permutation_loop: Do
i = i + 1
! Only do it if it has any weight
If( Abs( c( i ) ) > Epsilon( c( i ) ) ) Then
! Get the current permutation; the first one is the identity, as in itertools.permutations
this_permutation = p%get()
! Check the shapes are compatible
permuted_sizes = sizes( this_permutation )
If( All( permuted_sizes == sizes ) ) Then
! Add in the current permutation
Do l4 = 1, n4
Do l3 = 1, n3
Do l2 = 1, n2
Do l1 = 1, n1
indices = [ l1, l2, l3, l4 ]
! Same convention as np.transpose: subscript k of b is subscript this_permutation( k ) of a
permuted_indices( this_permutation ) = indices
b( l1, l2, l3, l4 ) = b( l1, l2, l3, l4 ) + &
c( i ) * a( permuted_indices( 1 ), permuted_indices( 2 ), &
permuted_indices( 3 ), permuted_indices( 4 ) )
End Do
End Do
End Do
End Do
End If
End If
! Move onto the next permutation, leaving the loop after the last one has been used
finished = p%next()
If( finished ) Exit permutation_loop
End Do permutation_loop
End Do iter_loop
Call system_clock( finish, rate )
Write( *, * ) 'time per iteration = ', Real( finish - start ) / Real( rate ) / Real( n_iter )
Write( *, * ) 'Sanity ', b( 1, 1, 1, 1 ) / a( 1, 1, 1, 1 ) / n_iter
End Program testit
ijb@ijb-Latitude-5410:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ijb@ijb-Latitude-5410:~/work/stack$ gfortran -fcheck=all -Wall -Wextra -O3 -g -std=f2018 perm_mod_single_thread.f90
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
n1, n2, n3, n4?
10 10 10 10
time per iteration = 1.47000002E-03
Sanity 30.000000000000259
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
n1, n2, n3, n4?
11 10 9 12
time per iteration = 0.00000000
Sanity 0.0000000000000000
ijb@ijb-Latitude-5410:~/work/stack$ gfortran -Wall -Wextra -O3 -g -std=f2018 perm_mod_single_thread.f90
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
n1, n2, n3, n4?
30 30 30 30
time per iteration = 6.56599998E-02
Sanity 29.999999999999844
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
n1, n2, n3, n4?
60 60 60 60
time per iteration = 2.46800995
Sanity 30.000000000000036
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
n1, n2, n3, n4?
10 20 30 40
time per iteration = 2.00000013E-05
Sanity 0.0000000000000000
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
n1, n2, n3, n4?
30 30 15 15
time per iteration = 1.50999997E-03
Sanity 1.4000000000000050
ijb@ijb-Latitude-5410:~/work/stack$
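For comparison, here is a compact sketch in the reshape notation used in the question. np.transpose(A, axes) corresponds to Fortran's RESHAPE with SHAPE = sizes(p) and ORDER = the inverse of p, where p is the 1-based version of axes: ORDER says which result subscript varies fastest as the source is read in array element order. The helper names are hypothetical, and the question reports that reshape was slower than explicit loops, so this is about clarity rather than speed. Terms are skipped when the weight is below epsilon or the permuted shape does not match, as in the answer above.

```fortran
module perm_sum_sketch
  use, intrinsic :: iso_fortran_env, only : wp => real64
  implicit none
contains
  ! Inverse of a permutation p of 1..n: q( p( k ) ) = k
  pure function inverse_perm( p ) result( q )
    integer, intent( in ) :: p( : )
    integer :: q( size( p ) )
    integer :: k
    do k = 1, size( p )
      q( p( k ) ) = k
    end do
  end function inverse_perm

  ! b = b + w * (a transposed by p), p being the 1-based axes of np.transpose( a, axes ).
  ! Negligible weights and incompatible shapes are skipped.
  subroutine add_permuted( b, a, p, w )
    real( wp ), intent( inout ) :: b( :, :, :, : )
    real( wp ), intent( in )    :: a( :, :, :, : )
    integer,    intent( in )    :: p( 4 )
    real( wp ), intent( in )    :: w
    integer :: sizes( 4 )
    if ( abs( w ) <= epsilon( w ) ) return
    sizes = shape( a )
    if ( any( sizes( p ) /= sizes ) ) return
    b = b + w * reshape( a, sizes( p ), order = inverse_perm( p ) )
  end subroutine add_permuted
end module perm_sum_sketch

program demo_permuted_sum
  use perm_sum_sketch
  implicit none
  real( wp ) :: a( 3, 3, 3, 3 ), b( 3, 3, 3, 3 ), x( 2, 3, 4, 5 ), y( 2, 3, 4, 5 )
  real( wp ) :: maxerr
  logical :: skipped
  integer :: i, j, k, l
  call random_number( a )
  call random_number( x )
  b = 0.0_wp
  ! Corresponds to 0.5 * np.transpose( A, (1, 2, 0, 3) ), i.e. b(i,j,k,l) = 0.5 * a(k,i,j,l)
  call add_permuted( b, a, [ 2, 3, 1, 4 ], 0.5_wp )
  maxerr = 0.0_wp
  do l = 1, 3
    do k = 1, 3
      do j = 1, 3
        do i = 1, 3
          maxerr = max( maxerr, abs( b( i, j, k, l ) - 0.5_wp * a( k, i, j, l ) ) )
        end do
      end do
    end do
  end do
  ! Swapping the first two axes of a 2 x 3 x 4 x 5 array changes its shape, so it is skipped
  y = 0.0_wp
  call add_permuted( y, x, [ 2, 1, 3, 4 ], 1.0_wp )
  skipped = all( y == 0.0_wp )
  print *, 'max error vs explicit loop: ', maxerr, '  incompatible term skipped: ', skipped
end program demo_permuted_sum
```

Looping over all 24 permutations (for example with the permutation module above) and calling add_permuted once per permutation with its prefactor then gives the whole weighted sum.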

Expected list of ‘lower-bound :’ or list of ‘lower-bound : upper-bound’ specifications at (1) in Fortran pointer

The background of my question is related to Optimizing array additions and multiplications with transposes
I am thinking about optimizing 0.1*A + 0.1*transpose(A,(1,0)) (possibly with a more general transpose) by using Fortran pointers, where A is an array. (Transpose in the Python sense; this seems related to reshape in Fortran.)
I am not sure whether transposing/multiplying values via a pointer will be faster than using an array. I thought using a pointer might restrict the operation to the given memory locations. If I use
b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) )
then reshape may create its result in a different memory location.
Here is my code
Program transpose_test
use, Intrinsic :: iso_fortran_env, Only : wp => real64, li => int64
integer, parameter :: dp = selected_real_kind(15, 307)
! Implicit None
real(dp), Dimension( :, :, :, : ), Allocatable :: a, b
integer :: n1, n2, n3, n4, i, m, n, m_iter
integer :: l1, l2, l3, l4
integer(li) :: start, finish, rate
real(dp) :: sum_time
real(dp), target, allocatable :: at(:,:,:,:)
real(dp), pointer :: ap(:,:,:,:), bp(:,:,:,:)
Write( *, * ) 'n1, n2, n3, n4?'
Read( *, * ) n1, n2, n3, n4
Allocate( a ( 1:n1, 1:n2, 1:n3, 1:n4 ) )
i = 0
do l1 = 1, n1
do l2 = 1, n2
do l3 = 1, n3
do l4 = 1, n4
a(l1, l2, l3, l4) = i
i = i + 1
end do
end do
end do
end do
at = a
ap => at
bp => ap
!print *, at
print *, 'ap', ap
print *, 'bp', bp
sum_time = 0.0
do n = 1, m_iter
Call System_clock( start, rate )
do l2 = 1, n2
do l1 = 1, n1
bp(l1,l2,:,:) => 0.1*ap(l1,l2,:,:) + 0.1*ap(l2,l1,:,:)
end do
end do
Call System_clock( finish, rate )
sum_time = sum_time + Real( finish - start, dp ) / rate
end do
write (*,*) 'reshape pointer time', sum_time
print *, 'bp', bp
End
gfortran 9.3.0 gives
56 | bp(l1,l2,:,:) => 0.1*ap(l1,l2,:,:) + 0.1*ap(l2,l1,:,:)
| 1
Error: Expected list of ‘lower-bound :’ or list of ‘lower-bound : upper-bound’ specifications at (1)
What would be the solution for the above error message? In general, will the above approach outperform transposing the array, e.g.,
b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) )
and how would it compare with the NumPy implementation from the question linked in the first paragraph?
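A sketch of the usual fix for the error, assuming the goal is simply to fill b. The => operator is pointer assignment, and in that statement anything in parentheses after the pointer name is read as a bounds specification (lower-bound :) or a bounds remapping (lower-bound : upper-bound), which is what gfortran complains about. Moreover, the right-hand side of => must be a target, not an arithmetic expression, so a pointer cannot hold the result of 0.1*ap(...) + 0.1*ap(...). Ordinary assignment (=) does what the loop intends without any pointers. The sizes below are hypothetical (with n1 == n2), and the check compares the loop with the reshape form from the question; whether either is faster, or faster than NumPy, would have to be measured.

```fortran
program assign_not_pointer
  use, intrinsic :: iso_fortran_env, only : dp => real64
  implicit none
  integer, parameter :: n1 = 4, n2 = 4, n3 = 3, n4 = 2   ! hypothetical sizes, n1 == n2
  real( dp ), allocatable :: a( :, :, :, : ), b( :, :, :, : ), b_reshape( :, :, :, : )
  real( dp ) :: maxdiff
  integer :: l1, l2
  allocate( a( n1, n2, n3, n4 ), b( n1, n2, n3, n4 ) )
  call random_number( a )
  ! Ordinary assignment '=': each (l1, l2) slice of b receives the value of an expression
  do l2 = 1, n2
    do l1 = 1, n1
      b( l1, l2, :, : ) = 0.1_dp * a( l1, l2, :, : ) + 0.1_dp * a( l2, l1, :, : )
    end do
  end do
  ! The reshape version from the question: order = [2,1,3,4] swaps the first two subscripts
  b_reshape = 0.1_dp * a + 0.1_dp * reshape( a, [ n1, n2, n3, n4 ], order = [ 2, 1, 3, 4 ] )
  maxdiff = maxval( abs( b - b_reshape ) )
  print *, 'max difference between loop and reshape: ', maxdiff
end program assign_not_pointer
```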

How can I do a swap between two elements belonging to the same polymorphic variable?

What is the best method when you need to interchange the values of two polymorphic elements? (Using standard Fortran 2008.)
Here is an example (please try not to modify the type definitions).
The problems I get with the Intel compiler v19 and with gfortran 8.1 on Windows are different.
Below is a complete example. Look at the subroutine where I have defined the swap procedure. Currently the version that works with gfortran is active, but it gives an error with the Intel compiler. If you comment out this part and uncomment the lines for ifort, it works with Intel but not with gfortran....
Program Check
implicit none
!> Type definitions
Type :: Refl_Type
integer,dimension(:), allocatable :: H
integer :: Mult =0
End Type Refl_Type
Type :: RefList_Type
integer :: Nref
class(refl_Type), dimension(:), allocatable :: Reflections
end Type RefList_Type
Type(RefList_Type) :: List
Type(Refl_Type), dimension(3) :: Refl_Ini
!> Variables
integer :: i
!> Init
Refl_Ini(1)%H=[1, 0, 0]; Refl_Ini(1)%Mult=1
Refl_Ini(2)%H=[0, 2, 0]; Refl_Ini(2)%Mult=2
Refl_Ini(3)%H=[0, 0, 3]; Refl_Ini(3)%Mult=3
List%Nref=3
List%Reflections=Refl_Ini
!> Print Step:1
do i=1, List%Nref
print '(i3,2x,3i4,2x,i3)', i,List%Reflections(i)%H, List%Reflections(i)%Mult
end do
print*,' '
print*,' '
!> Swap
call Swap_Elements_List(List, 1, 3)
!> Print Step:2
do i=1, List%Nref
print '(i3,2x,3i4,2x,i3)', i,List%Reflections(i)%H, List%Reflections(i)%Mult
end do
Contains
Subroutine Swap_Elements_List(List, i, j)
!---- Argument ----!
type (RefList_Type), intent(in out) :: List
integer, intent(in) :: i,j
!---- Local Variables ----!
class(Refl_Type), allocatable :: tmp
!> IFort
!tmp=List%reflections(i)
!List%reflections(i)=List%reflections(j)
!List%reflections(j)=tmp
!> Gfortran
associate(t1 => list%reflections(i), t2 => list%reflections(j), tt => tmp)
tt=t1
t1=t2
t2=tt
end associate
End Subroutine Swap_Elements_List
End Program Check
Any suggestion?
Compiling the original code with gfortran-8.2 gives
test.f90:34:6:
List%reflections(i)=List%reflections(j) !!<---
1
Error: Nonallocatable variable must not be polymorphic in
intrinsic assignment at (1) - check that there is a
matching specific subroutine for '=' operator
I think this is because List % reflections(i) is not separately allocatable (even though List % reflections itself is allocatable as an array of uniform type). This point seems to be discussed in detail, e.g., in this Q/A page, which suggests two alternative approaches: (A) convince the compiler that all elements will be of the same type; or (B) use an (array) container.
If we use the "container" approach, I think we can use move_alloc() to swap two polymorphic objects (without knowing the dynamic type). For example, a bit modified version of the original code may be
program main
implicit none
type :: Refl_t
integer, allocatable :: H(:)
endtype
type, extends(Refl_t) :: ExtRefl_t
real :: foo
endtype
type :: RefList_t
class(Refl_t), allocatable :: refl
endtype
type(RefList_t) :: list( 3 )
call init()
print *, "Before:"
call output()
call swap( 1, 2 )
print *, "After:"
call output()
contains
subroutine swap( i, j )
integer, intent(in) :: i, j
class(Refl_t), allocatable :: tmp
call move_alloc( from= list( i )% refl, to= tmp )
call move_alloc( from= list( j )% refl, to= list( i )% refl )
call move_alloc( from= tmp, to= list( j )% refl )
end
subroutine init()
integer i
do i = 1, 3
allocate( ExtRefl_t :: list( i ) % refl )
select type( x => list( i ) % refl )
type is ( ExtRefl_t )
x % H = [ i, i * 10 ]
x % foo = i * 100
endselect
enddo
end
subroutine output()
integer i
do i = 1, 3
select type( x => list( i ) % refl )
type is ( ExtRefl_t )
print *, "i = ", i, " : H = ", x % H, " foo = ", x % foo
endselect
enddo
end
end program
Result (gfortran-8.2):
Before:
i = 1 : H = 1 10 foo = 100.000000
i = 2 : H = 2 20 foo = 200.000000
i = 3 : H = 3 30 foo = 300.000000
After:
i = 1 : H = 2 20 foo = 200.000000
i = 2 : H = 1 10 foo = 100.000000
i = 3 : H = 3 30 foo = 300.000000
I think we could also use polymorphic assignment for the above swap() routine, for example:
subroutine swap( i, j )
integer, intent(in) :: i, j
class(Refl_t), allocatable :: tmp
tmp = list( i ) % refl
list( i ) % refl = list( j ) % refl
list( j ) % refl = tmp
end
This compiles with gfortran-8.2 but gives a strange result (possibly a compiler bug?). I guess newer compilers like GCC 9 or Intel Fortran may give the expected result.
On the other hand, if we use a polymorphic array, we may need to use select type explicitly for swapping the two elements. (But I hope there is a different approach...) The code may then look like:
program main
implicit none
type :: Refl_t
integer, allocatable :: H(:)
endtype
type, extends(Refl_t) :: ExtRefl_t
real :: foo
endtype
class(Refl_t), allocatable :: refls( : )
allocate( ExtRefl_t :: refls( 3 ) )
call init()
print *, "Before:"
call output()
call swap( 1, 2 )
print *, "After:"
call output()
contains
subroutine swap( i, j )
integer, intent(in) :: i, j
selecttype ( refls )
type is ( ExtRefl_t )
block
type(ExtRefl_t) :: tmp
tmp = refls( i ) !<-- assignment of concrete type
refls( i ) = refls( j )
refls( j ) = tmp
endblock
class default
stop
endselect
end
subroutine init()
integer i
select type( refls )
type is ( ExtRefl_t )
do i = 1, 3
refls( i ) % H = [ i, i * 10 ]
refls( i ) % foo = i * 100
enddo
endselect
end
subroutine output()
integer i
select type( refls )
type is ( ExtRefl_t )
do i = 1, 3
print *, "i = ", i, " : H = ", refls( i ) % H, &
" foo = ", refls( i ) % foo
enddo
endselect
end
end program
(The result is the same as above.)
The answer by roygvib summarizes the problem well. If this assignment is to be performed in the user's code, where the types are known or are known to come from a small set of possible types, one can simply protect the assignment with a select type type guard.
The real problem arises in generic code that is written without knowledge of the user's derived types, and which therefore may have no access to possible user-defined assignments. I suggest a possible solution using a callback procedure: the user defines an assignment or swap procedure, which is then called by the library code.
subroutine sub_that_needs_assignments(array, assign)
class(*), intent(inout) :: array(:)
interface
subroutine assign(out, in)
class(*), intent(inout) :: out
class(*), intent(in) :: in
end subroutine
end interface
integer :: i
i = 1 ! for example
call assign(array(i), array(i+1))
!or you can even assign a new element from somewhere else
! possibly protect by same_type_as()
end subroutine
In the user's code:
subroutine assign_my_type(out, in)
class(*), ... :: out
class(*), ... :: in
select type (out)
type is (my_type)
select type (in) ! not always necessary
type is (my_type)
out = in
end select
end select
!add appropriate error checking
end subroutine
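A complete, runnable version of the callback idea, assuming a swap (rather than a single assignment) is what the library needs. The library routine copies element i with sourced allocation (allocate( tmp, source = array( i ) )), which preserves the dynamic type without knowing it, and the user supplies the type-specific assignment. All module and procedure names are hypothetical; refl_t only mirrors the question's Refl_Type.

```fortran
module generic_swap
  implicit none
  abstract interface
    ! Type-specific assignment supplied by the user: out = in
    subroutine assign_iface( out, in )
      class(*), intent( inout ) :: out
      class(*), intent( in )    :: in
    end subroutine assign_iface
  end interface
contains
  ! Library side: swaps array(i) and array(j) without knowing their dynamic type
  subroutine swap_elements( array, i, j, assign )
    class(*), intent( inout ) :: array( : )
    integer,  intent( in )    :: i, j
    procedure( assign_iface ) :: assign
    class(*), allocatable :: tmp
    if ( i == j ) return                    ! avoid passing the same element twice
    allocate( tmp, source = array( i ) )    ! copy that keeps the dynamic type
    call assign( array( i ), array( j ) )
    call assign( array( j ), tmp )
  end subroutine swap_elements
end module generic_swap

module user_types
  implicit none
  type :: refl_t
    integer, allocatable :: h( : )
    integer :: mult = 0
  end type refl_t
contains
  ! User side: knows the concrete type, so intrinsic assignment can be used
  subroutine assign_refl( out, in )
    class(*), intent( inout ) :: out
    class(*), intent( in )    :: in
    select type ( out )
    type is ( refl_t )
      select type ( in )
      type is ( refl_t )
        out = in
      class default
        error stop 'assign_refl: unexpected dynamic type of in'
      end select
    class default
      error stop 'assign_refl: unexpected dynamic type of out'
    end select
  end subroutine assign_refl
end module user_types

program demo_swap
  use generic_swap
  use user_types
  implicit none
  class( refl_t ), allocatable :: list( : )
  integer :: k
  allocate( list( 3 ) )
  do k = 1, 3
    list( k ) % h = [ k, 10 * k ]
    list( k ) % mult = k
  end do
  call swap_elements( list, 1, 3, assign_refl )
  do k = 1, 3
    print '(i3,2x,2i4,2x,i3)', k, list( k ) % h, list( k ) % mult
  end do
end program demo_swap
```

The design keeps all knowledge of the concrete type in assign_refl, so swap_elements can live in a library compiled before the user's types exist.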

Any way to avoid nested do loops in Fortran 90?

I am reading some velocities for molecules along a time trajectory. I am trying to calculate v_i(t)*v_j(t+n∆t) where i and j are not necessarily for the same atom.
I use nested do loops for the calculation, since by definition it runs over different time steps, different molecules, and different atoms. These deeply nested loops make the code slow and lead to memory issues. I would like to avoid these problems if possible. How can I improve my code using Fortran 90?
PROGRAM BUILD
IMPLICIT NONE
INTEGER :: I,K,L,L1,L2,M1,M2,T,T1,T2,NCON,NMOL,NSIT,SPLIT,LOOP
REAL(8) :: X,Y,Z,V1,V2,V3,R,TRASH
REAL(8),ALLOCATABLE :: VX(:,:,:),VY(:,:,:),VZ(:,:,:)
REAL(8),ALLOCATABLE :: NORM(:,:,:,:,:),V(:,:,:,:,:)
REAL(8),ALLOCATABLE :: VIVJ(:,:,:),N(:,:,:)
! Input
NCON = 100001 ! Number of configurations
NMOL = 524 ! Number of molecules
NSIT = 6 ! Number of sites on each molecule
SPLIT = 50 ! Number of subgroups of configurations
LOOP = (NCON-1)/SPLIT ! Number of configurations in each subgroup
! * * * * * * * * *
! Allocate memory
ALLOCATE ( VX(0:LOOP,NMOL,NSIT) )
ALLOCATE ( VY(0:LOOP,NMOL,NSIT) )
ALLOCATE ( VZ(0:LOOP,NMOL,NSIT) )
ALLOCATE ( V(0:LOOP,NMOL,NMOL,NSIT,NSIT) )
ALLOCATE ( NORM(0:LOOP,NMOL,NMOL,NSIT,NSIT) )
ALLOCATE ( VIVJ(0:LOOP,NSIT,NSIT) )
ALLOCATE ( N(0:LOOP,NSIT,NSIT) )
! Initialize
VX = 0.0D0
VY = 0.0D0
VZ = 0.0D0
V = 0.0D0
NORM = 0.0D0
VIVJ = 0.0D0
N = 0.0D0
! Read trajectories
OPEN(UNIT=15,FILE='HISTORY',STATUS='UNKNOWN',ACTION='READ')
DO I = 1,SPLIT
WRITE(*,*) I,SPLIT
DO T = 0,LOOP-1
DO L = 1,NMOL
DO K = 1,NSIT
READ(15,*) V1,V2,V3
VX(T,L,K) = V1
VY(T,L,K) = V2
VZ(T,L,K) = V3
END DO
END DO
END DO
! Calculate functions
DO T1 = 0,LOOP-1 ! same time range as the READ loop above
DO T2 = T1,LOOP-1
DO L1 = 1,NMOL
DO M1 = 1,NSIT
DO L2 = 1,NMOL
DO M2 = 1,NSIT
! Includes all atoms, both intermolecular and intramolecular
! Keep all of the molecules
V(T2-T1,L1,L2,M1,M2) = V(T2-T1,L1,L2,M1,M2) + &
VX(T1,L1,M1)*VX(T2,L2,M2) + &
VY(T1,L1,M1)*VY(T2,L2,M2) + &
VZ(T1,L1,M1)*VZ(T2,L2,M2)
! Accounting
NORM(T2-T1,L1,L2,M1,M2) = NORM(T2-T1,L1,L2,M1,M2) + 1.0D0
END DO
END DO
END DO
END DO
END DO
END DO
END DO ! closes DO I = 1,SPLIT
CLOSE(15)
DEALLOCATE(VX)
DEALLOCATE(VY)
DEALLOCATE(VZ)
DEALLOCATE(V)
DEALLOCATE(NORM)
DEALLOCATE(VIVJ)
DEALLOCATE(N)
END PROGRAM
Fortran stores arrays in column-major order (the first index varies fastest in memory), unlike C and most other languages, which use row-major order. So a loop nest like
do T = ...
do L = ...
do K = ...
array(T, L, K) = ...
end do
end do
end do
will usually be considerably slower than
do K = ...
do L = ...
do T = ...
array(T, L, K) = ...
end do
end do
end do
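all other things being equal, because in the second version the innermost loop runs over the first index, so consecutive iterations touch adjacent memory locations.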
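Building on the column-major point, here is one hedged way to restructure the question's inner work. It is a different technique from the loop reordering above: for a fixed lag, the sum over time origins of VX(T1,L1,M1)*VX(T1+lag,L2,M2) is a matrix product over the time index, so matmul (or a BLAS dgemm call) can do it with the contiguous time index innermost, and NORM needs only one entry per lag. The sketch assumes 0-based time origins T1 = 0..LOOP-1-lag and stores the lag as the last index of v; the sizes are small and hypothetical. It does not solve the memory problem by itself: at the question's sizes, V and NORM would each hold (LOOP+1)*NMOL**2*NSIT**2, about 2.0e10 doubles or roughly 158 GB, so in practice one would keep only the combinations that are needed (the VIVJ array in the question, for example, has no molecule indices).

```fortran
program lagged_correlation_sketch
  use, intrinsic :: iso_fortran_env, only : dp => real64
  implicit none
  ! Small hypothetical sizes (the question uses LOOP = 2000, NMOL = 524, NSIT = 6)
  integer, parameter :: nt = 50, nmol = 4, nsit = 2, maxlag = 5
  real( dp ) :: vx( 0:nt-1, nmol, nsit ), vy( 0:nt-1, nmol, nsit ), vz( 0:nt-1, nmol, nsit )
  ! v( l1, l2, m1, m2, lag ) plays the role of V( lag, L1, L2, M1, M2 ) in the question
  real( dp ) :: v( nmol, nmol, nsit, nsit, 0:maxlag ), norm( 0:maxlag )
  real( dp ) :: ref, maxerr
  integer :: lag, m1, m2, l1, l2, t
  call random_number( vx )
  call random_number( vy )
  call random_number( vz )
  do lag = 0, maxlag
    ! NORM only depends on the lag: there are nt - lag time origins t = 0 .. nt-1-lag
    norm( lag ) = real( nt - lag, dp )
    do m2 = 1, nsit
      do m1 = 1, nsit
        ! Sum over t of vx(t,l1,m1) * vx(t+lag,l2,m2) for all l1, l2 at once,
        ! as a matrix product over the time index; likewise for y and z
        v( :, :, m1, m2, lag ) = &
          matmul( transpose( vx( 0:nt-1-lag, :, m1 ) ), vx( lag:nt-1, :, m2 ) ) + &
          matmul( transpose( vy( 0:nt-1-lag, :, m1 ) ), vy( lag:nt-1, :, m2 ) ) + &
          matmul( transpose( vz( 0:nt-1-lag, :, m1 ) ), vz( lag:nt-1, :, m2 ) )
      end do
    end do
  end do
  ! Check one lag against the direct loops
  lag = 3
  maxerr = 0.0_dp
  do m2 = 1, nsit
    do m1 = 1, nsit
      do l2 = 1, nmol
        do l1 = 1, nmol
          ref = 0.0_dp
          do t = 0, nt-1-lag
            ref = ref + vx( t, l1, m1 ) * vx( t+lag, l2, m2 ) &
                      + vy( t, l1, m1 ) * vy( t+lag, l2, m2 ) &
                      + vz( t, l1, m1 ) * vz( t+lag, l2, m2 )
          end do
          maxerr = max( maxerr, abs( ref - v( l1, l2, m1, m2, lag ) ) )
        end do
      end do
    end do
  end do
  print *, 'max difference from direct loops at lag 3: ', maxerr
  print *, 'normalized correlation for l1=l2=1, m1=m2=1: ', v( 1, 1, 1, 1, : ) / norm
end program lagged_correlation_sketch
```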

When using r2c and c2r FFTW in Fortran, are the forward and backward dimensions the same?

Below is the main file
PROGRAM SPHEROID
USE nrtype
USE SUB_INFO
INCLUDE "/usr/local/include/fftw3.f"
INTEGER(I8B) :: plan_forward, plan_backward
INTEGER(I4B) :: i, t, int_N
REAL(DP) :: cth_i, sth_i, real_i, perturbation
REAL(DP) :: PolarEffect, dummy, x1, x2, x3
REAL(DP), DIMENSION(4096) :: dummy1, dummy2, gam, th, ph
REAL(DP), DIMENSION(4096) :: k1, k2, k3, k4, l1, l2, l3, l4, f_in
COMPLEX(DPC), DIMENSION(2049) :: output1, output2, f_out
CHARACTER(1024) :: baseOutputFilename
CHARACTER(1024) :: outputFile, format_string
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
int_N = 4096
! File Open Section
format_string = '(I5.5)'
! Write the coodinates at t = 0
do i = 1, N
real_i = real(i)
gam(i) = 2d0*pi/real_N
perturbation = 0.01d0*dsin(2d0*pi*real_i/real_N)
ph(i) = 2d0*pi*real_i/real_N + perturbation
th(i) = pi/3d0 + perturbation
end do
! Initialization Section for FFTW PLANS
call dfftw_plan_dft_r2c_1d(plan_forward, int_N, f_in, f_out, FFTW_ESTIMATE)
call dfftw_plan_dft_c2r_1d(plan_backward, int_N, f_out, f_in, FFTW_ESTIMATE)
! Runge-Kutta 4th Order Method Section
do t = 1, Iter_N
call integration(th, ph, gam, k1, l1)
do i = 1, N
dummy1(i) = th(i) + 0.5d0*dt*k1(i)
end do
do i = 1, N
dummy2(i) = ph(i) + 0.5d0*dt*l1(i)
end do
call integration(dummy1, dummy2, gam, k2, l2)
do i = 1, N
dummy1(i) = th(i) + 0.5d0*dt*k2(i)
end do
do i = 1, N
dummy2(i) = ph(i) + 0.5d0*dt*l2(i)
end do
call integration(dummy1, dummy2, gam, k3, l3)
do i = 1, N
dummy1(i) = th(i) + dt*k3(i)
end do
do i = 1, N
dummy2(i) = ph(i) + dt*l3(i)
end do
call integration(dummy1, dummy2, gam, k4, l4)
do i = 1, N
cth_i = dcos(th(i))
sth_i = dsin(th(i))
PolarEffect = (nv-sv)*dsqrt(1d0+a*sth_i**2) + (nv+sv)*cth_i
PolarEffect = PolarEffect/(sth_i**2)
th(i) = th(i) + dt*(k1(i) + 2d0*k2(i) + 2d0*k3(i) + k4(i))/6d0
ph(i) = ph(i) + dt*(l1(i) + 2d0*l2(i) + 2d0*l3(i) + l4(i))/6d0
ph(i) = ph(i) + dt*0.25d0*PolarEffect/pi
end do
!! Fourier Filtering Section
call dfftw_execute_dft_r2c(plan_forward, th, output1)
do i = 1, N/2+1
dummy = abs(output1(i))
if (dummy.lt.threshhold) then
output1(i) = dcmplx(0.0d0)
end if
end do
call dfftw_execute_dft_c2r(plan_backward, output1, th)
do i = 1, N
th(i) = th(i)/real_N
end do
call dfftw_execute_dft_r2c(plan_forward, ph, output2)
do i = 1, N/2+1
dummy = abs(output2(i))
if (dummy.lt.threshhold) then
output2(i) = dcmplx(0.0d0)
end if
end do
call dfftw_execute_dft_c2r(plan_backward, output2, ph)
do i = 1, N
ph(i) = ph(i)/real_N
end do
!! Data Writing Section
write(baseOutputFilename, format_string) t
outputFile = "xyz" // baseOutputFilename
open(unit=7, file=outputFile)
outputFile = "Fsptrm" // baseOutputFilename
open(unit=8, file=outputFile)
do i = 1, N
x1 = dsin(th(i))*dcos(ph(i))
x2 = dsin(th(i))*dsin(ph(i))
x3 = dsqrt(1d0+a)*dcos(th(i))
write(7,*) x1, x2, x3
end do
do i = 1, N/2+1
write(8,*) abs(output1(i)), abs(output2(i))
end do
close(7)
close(8)
do i = 1, N/2+1
output1(i) = dcmplx(0.0d0)
end do
do i = 1, N/2+1
output2(i) = dcmplx(0.0d0)
end do
end do
! Destroying Process for FFTW PLANS
call dfftw_destroy_plan(plan_forward)
call dfftw_destroy_plan(plan_backward)
END PROGRAM
Below is a subroutine file for integration
! We implemented Shelly's spectrally accurate convergence method
SUBROUTINE integration(in1,in2,in3,out1,out2)
USE nrtype
USE SUB_INFO
INTEGER(I4B) :: i, j
REAL(DP) :: th_i, th_j, gi, ph_i, ph_j, gam_j, v1, v2
REAL(DP), DIMENSION(N), INTENT(INOUT) :: in1, in2, in3, out1, out2
REAL(DP) :: ui, uj, part1, part2, gj, cph, sph
REAL(DP) :: denom, numer, temp
do i = 1, N
out1(i) = 0d0
end do
do i = 1, N
out2(i) = 0d0
end do
do i = 1, N
th_i = in1(i)
ph_i = in2(i)
ui = dcos(th_i)
part1 = dsqrt(1d0+a)/(dsqrt(-a)*ui+dsqrt(1d0+a-a*ui*ui))
part1 = part1**(dsqrt(-a))
part2 = (dsqrt(1d0+a-a*ui*ui)+ui)/(dsqrt(1d0+a-a*ui*ui)-ui)
part2 = dsqrt(part2)
gi = dsqrt(1d0-ui*ui)*part1*part2
do j = 1, N
if (mod(i+j,2).eq.1) then
th_j = in1(j)
ph_j = in2(j)
gam_j = in3(j)
uj = dcos(th_j)
part1 = dsqrt(1d0+a)/(dsqrt(-a)*uj+dsqrt(1d0+a-a*uj*uj))
part1 = part1**(dsqrt(-a))
part2 = (dsqrt(1d0+a-a*uj*uj)+uj)/(dsqrt(1d0+a-a*uj*uj)-uj)
part2 = dsqrt(part2)
gj = dsqrt(1d0-uj*uj)*part1*part2
cph = dcos(ph_i-ph_j)
sph = dsin(ph_i-ph_j)
numer = dsqrt(1d0-uj*uj)*sph
denom = (gj/gi*(1d0-ui*ui) + gi/gj*(1d0-uj*uj))*0.5d0
denom = denom - dsqrt((1d0-ui*ui)*(1d0-uj*uj))*cph
denom = denom + krasny_delta
v1 = -0.25d0*gam_j*numer/denom/pi
temp = dsqrt(1d0+(1d0-ui*ui)*a)
numer = -(gj/gi)*(temp+ui)
numer = numer + (gi/gj)*((1d0-uj*uj)/(1d0-ui*ui))*(temp-ui)
numer = numer + 2d0*ui*dsqrt((1d0-uj*uj)/(1d0-ui*ui))*cph
numer = 0.5d0*numer
v2 = -0.25d0*gam_j*numer/denom/pi
out1(i) = out1(i) + 2d0*v1
out2(i) = out2(i) + 2d0*v2
end if
end do
end do
END
Below is a module file
module nrtype
Implicit none
!integer
INTEGER, PARAMETER :: I8B = SELECTED_INT_KIND(20)
INTEGER, PARAMETER :: I4B = SELECTED_INT_KIND(9)
INTEGER, PARAMETER :: I2B = SELECTED_INT_KIND(4)
INTEGER, PARAMETER :: I1B = SELECTED_INT_KIND(2)
!real
INTEGER, PARAMETER :: SP = KIND(1.0)
INTEGER, PARAMETER :: DP = KIND(1.0D0)
!complex
INTEGER, PARAMETER :: SPC = KIND((1.0,1.0))
INTEGER, PARAMETER :: DPC = KIND((1.0D0,1.0D0))
!default logical
INTEGER, PARAMETER :: LGT = KIND(.true.)
!mathematical constants
REAL(DP), PARAMETER :: pi = 3.141592653589793238462643383279502884197_dp
!derived data type s for sparse matrices,single and double precision
!User-Defined Constants
INTEGER(I4B), PARAMETER :: N = 4096, Iter_N = 20000
REAL(DP), PARAMETER :: real_N = 4096d0
REAL(DP), PARAMETER :: a = -0.1d0, dt = 0.001d0, krasny_delta = 0.01d0
REAL(DP), PARAMETER :: nv = 0d0, sv = 0d0, threshhold = 0.00000000001d0
!N : The Number of Point Vortices, Iter_N * dt = Total time, dt : Time Step
!krasny_delta : Smoothing Parameter introduced by R.Krasny
!nv : Northern Vortex Strength, sv : Southern Vortex Strength
!a : The Eccentricity in the direction of z , threshhold : Filtering Threshhold
end module nrtype
Below is a subroutine info file
MODULE SUB_INFO
INTERFACE
SUBROUTINE integration(in1,in2,in3,out1,out2)
USE nrtype
INTEGER(I4B) :: i, j
REAL(DP) :: th_i, th_j, gi, ph_i, ph_j, gam_j, v1, v2
REAL(DP), DIMENSION(N), INTENT(INOUT) :: in1, in2, in3, out1, out2
REAL(DP) :: ui, uj, part1, part2, gj, cph, sph
REAL(DP) :: denom, numer, temp
END SUBROUTINE
END INTERFACE
END MODULE
I compiled them using the command below:
gfortran -o p0 -fbounds-check nrtype.f90 spheroid_sub_info.f90 spheroid_sub_integration.f90 spheroid_main.f90 -lfftw3 -lm -Wall -pedantic -pg
nohup ./p0 &
Note that 2049 = 4096 / 2 + 1
When making plan_backward, isn't it correct that we use 2049 instead of 4096 since the dimension of output is 2049?
But when I do that, it blows up (by blowing up I mean NaN errors).
If I use 4096 when making plan_backward, everything is fine except that some Fourier coefficients are abnormally large, which should not happen.
Please help me use FFTW in Fortran correctly. This issue has discouraged me for a long time.
First, although you claim your example is minimal, it is still pretty large, and I have no time to study it.
But I updated my gist code https://gist.github.com/LadaF/73eb430682ef527eea9972ceb96116c5 to also show the backward transform and to answer the title question about the transform dimensions.
The logical size of the transform is the size of the real array (see Real-data DFT Array Format in the FFTW manual), but the complex array is smaller, n/2+1 elements, because of the inherent (Hermitian) symmetry of the transform of real data.
So when you first do an r2c transform from a real array of size n to a complex array of size n/2+1, and then the opposite c2r transform back, the real array is again of size n, and n is the size you pass when creating both plans.
This is my minimal example from the gist:
module FFTW3
use, intrinsic :: iso_c_binding
include "fftw3.f03"
end module
use FFTW3
implicit none
integer, parameter :: n = 100
real(c_double), allocatable :: data_in(:)
complex(c_double_complex), allocatable :: data_out(:)
type(c_ptr) :: planf, planb
allocate(data_in(n))
allocate(data_out(n/2+1))
call random_number(data_in)
planf = fftw_plan_dft_r2c_1d(size(data_in), data_in, data_out, FFTW_ESTIMATE+FFTW_UNALIGNED)
planb = fftw_plan_dft_c2r_1d(size(data_in), data_out, data_in, FFTW_ESTIMATE+FFTW_UNALIGNED)
print *, "real input:", real(data_in)
call fftw_execute_dft_r2c(planf, data_in, data_out)
print *, "result real part:", real(data_out)
print *, "result imaginary part:", aimag(data_out)
call fftw_execute_dft_c2r(planb, data_out, data_in)
print *, "real output:", real(data_in)/n
call fftw_destroy_plan(planf)
call fftw_destroy_plan(planb)
end
Note that I am using the modern Fortran interface. I don't like using the old one.
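To illustrate why the complex side has only n/2+1 elements, here is a small check that does not need FFTW; it is a naive O(n**2) DFT for even n, written only for illustration. With FFTW's forward sign convention, X(k) = sum_j x(j) exp(-2 pi i j k / n), the transform of real data satisfies X(n-k) = conjg(X(k)), so X(0), ..., X(n/2) determine everything. Rebuilding x from just those n/2+1 values and dividing by n returns the n original real numbers, the same unnormalized c2r convention as the real(data_in)/n in the example above.

```fortran
program hermitian_check
  use, intrinsic :: iso_fortran_env, only : dp => real64
  implicit none
  integer, parameter :: n = 16                  ! even, like the 4096 in the question
  real( dp ), parameter :: pi = acos( -1.0_dp )
  real( dp ) :: x( 0:n-1 ), y( 0:n-1 ), maxsym, maxrec
  complex( dp ) :: xk( 0:n-1 )
  integer :: j, k
  call random_number( x )
  ! Forward DFT with FFTW's sign convention
  do k = 0, n-1
    xk( k ) = ( 0.0_dp, 0.0_dp )
    do j = 0, n-1
      xk( k ) = xk( k ) + x( j ) * exp( cmplx( 0.0_dp, -2.0_dp * pi * j * k / n, kind = dp ) )
    end do
  end do
  ! Hermitian symmetry: X(n-k) = conjg(X(k)) for k = 1 .. n-1
  maxsym = 0.0_dp
  do k = 1, n-1
    maxsym = max( maxsym, abs( xk( n-k ) - conjg( xk( k ) ) ) )
  end do
  ! Inverse using only X(0 .. n/2): interior bins stand for themselves and their mirror images
  do j = 0, n-1
    y( j ) = real( xk( 0 ) ) + real( xk( n/2 ) * exp( cmplx( 0.0_dp, pi * j, kind = dp ) ) )
    do k = 1, n/2 - 1
      y( j ) = y( j ) + 2.0_dp * real( xk( k ) * exp( cmplx( 0.0_dp, 2.0_dp * pi * j * k / n, kind = dp ) ) )
    end do
    y( j ) = y( j ) / n                          ! unnormalized inverse, hence the division by n
  end do
  maxrec = maxval( abs( y - x ) )
  print *, 'symmetry error: ', maxsym, '  reconstruction error: ', maxrec
end program hermitian_check
```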
One issue may be that dfftw_execute_dft_c2r can destroy the contents of its input array, as described on this page. The key excerpt is
FFTW_PRESERVE_INPUT specifies that an out-of-place transform must not change its input array. This is ordinarily the default, except for c2r and hc2r (i.e. complex-to-real) transforms for which FFTW_DESTROY_INPUT is the default...
We can verify this, for example, by modifying the sample code by @VladimirF so that it saves data_out to data_save right after the first FFT (r2c) call, and then computing their difference after the second FFT (c2r) call. So, in the case of the OP's code, it seems safer to copy output1 and output2 to separate arrays before the second FFT (c2r), since the code writes abs(output1) and abs(output2) to the Fsptrm files after those c2r calls.