Fortran low performance with allocatable arrays - fortran

I use Intel Visual Fortran, both IVF2013 and IVF2019. When using allocatable arrays, the program is much slower than the one using static memory allocation. That is to say, if I change from
Method 1: by using fixed array
do i = 1, 1000
call A
end do
subroutine A
real(8) :: x(30)
do things
end subroutine A
to something like
Method 2: by using allocatable arrays
module module_size_is_defined
n = 30
end module
do i = 1, 1000
call A
end do
subroutine A
use module_size_is_defined
real(8), allocatable :: x(:)
allocate(x(n))
do things
end subroutine A
The code is much slower. For my code, the static allocation takes 1 minutes 30 seconds while the dynamic allocation takes 2 minutes and 30 seconds. Then, I thought is might because that the allocation action was run takes too much time as it is in the loop, then I tried following two methods:
Method 3: by using the module to allocate the array only once
module module_x_is_allocated
n = 30
allocat(x(n))
end module
do i = 1, 1000
call A
end do
subroutine A
use module_x_is_allocated
do things
end subroutine A
Method 4: by using automatic array
module module_size_is_defined
n = 30
end module
do i = 1, 1000
call A
end do
subroutine A
use module_size_is_defined
real(x) :: x(n)
do things
end subroutine A
Both Method 3 and Method 4 take almost the same time of the one using dynamic allocated array Method 2. Both around 2 mins 30s. All cases are compiling with same optimization. I tried IVF 2013 and IVF 2019, and same results. I don't know why. Especially for Method 3, although the allocate is only run once, it still takes the same time. It seems that dynamic allocated array is stored at the place that is slower than the static allocated array, and allocation does not take extra time (since method 2 and 3 take the same time).
Any ideas and suggestions that to allocate the arrays in a more efficient manner to reduce the performance penalty? Thanks.
!=========================================================================
Edit 1:
My program is too long to post here. Thus, I tried a few small codes. The results are a little bit strange. I tried three cases,
Method 1: takes 28.98s
module module_size_is_defined
implicit none
integer(4) :: n
end module
program main
use module_size_is_defined
implicit none
integer(4) :: i
real(8) :: y(50,50),z(50,50),t
n = 50
do i =1,50000
t=dble(i) * 2.0D0
call A(y,t)
z = z + y
end do
write(*,*) z(1,1)
end
subroutine A(y,t)
use module_size_is_defined
implicit none
real(8),intent(out):: y(n,n)
real(8),intent(in) :: t
integer(4) :: j
real(8) :: x(1,50)
y=0.0D0
do j = 1, 200
call getX(x,t,j)
y = y + matmul( transpose(x) + dble(j)**2, x )
end do
endsubroutine A
subroutine getX(x,t,j)
use module_size_is_defined
implicit none
real(8),intent(out) :: x(1,n)
real(8),intent(in) :: t
integer(4),intent(in) :: j
integer(4) :: i
do i =1, n
x(1,i) = dble(i+j) * t ** (1.5D00)
end do
endsubroutine getX
Method 2: takes 30.56s
module module_size_is_defined
implicit none
integer(4) :: n
end module
program main
use module_size_is_defined
implicit none
integer(4) :: i
real(8) :: y(50,50),z(50,50),t
n = 50
do i =1,50000
t=dble(i) * 2.0D0
call A(y,t)
z = z + y
end do
write(*,*) z(1,1)
end
subroutine A(y,t)
use module_size_is_defined
implicit none
real(8),intent(out):: y(n,n)
real(8),intent(in) :: t
integer(4) :: j
real(8),allocatable :: x(:,:)
allocate(x(1,n))
y=0.0D0
do j = 1, 200
call getX(x,t,j)
y = y + matmul( transpose(x) + dble(j)**2, x )
end do
endsubroutine A
subroutine getX(x,t,j)
use module_size_is_defined
implicit none
real(8),intent(out) :: x(1,n)
real(8),intent(in) :: t
integer(4),intent(in) :: j
integer(4) :: i
do i =1, n
x(1,i) = dble(i+j) * t ** (1.5D00)
end do
endsubroutine getX
Method 3: takes 78.72s
module module_size_is_defined
implicit none
integer(4) :: n
endmodule
module module_array_is_allocated
use module_size_is_defined
implicit none
real(8), allocatable,save :: x(:,:)
contains
subroutine init
implicit none
allocate(x(1,n))
endsubroutine
endmodule module_array_is_allocated
program main
use module_size_is_defined
use module_array_is_allocated
implicit none
integer(4) :: i
real(8) :: y(50,50),z(50,50),t
n = 50
call init
do i =1,50000
t=dble(i) * 2.0D0
call A(y,t)
z = z + y
end do
write(*,*) z(1,1)
end
subroutine A(y,t)
use module_size_is_defined
use module_array_is_allocated
implicit none
real(8),intent(out):: y(n,n)
real(8),intent(in) :: t
integer(4) :: j
y=0.0D0
do j = 1, 200
call getX(x,t,j)
y = y + matmul( transpose(x) + dble(j)**2, x )
end do
endsubroutine A
subroutine getX(x,t,j)
use module_size_is_defined
implicit none
real(8),intent(out) :: x(1,n)
real(8),intent(in) :: t
integer(4),intent(in) :: j
integer(4) :: i
do i =1, n
x(1,i) = dble(i+j) * t ** (1.5D00)
end do
endsubroutine getX
Now, with samller size problem, Method 1 and Method 2 is almost same time. But Method 3 should be better than Method 2, since it only allocate x(1,n) once. But it is much slower. But in my previous program, Method 2 gives almost the same time as Method 3. It is strange.
I complied in both Windows and Linux, with release setup, -O2 Optimization, with different version of IVF.

Related

Dummy argument not agreeing with actual argument when passing function

I'm trying to implement Newton's method but I'm getting a confusing error message. In my code you'll see I called external with f1 and f2 which I assumes tells the computer to look for the function but it's treating them as variables based on the error message. I've read the stack overflow posts similar to my issue but none of the solutions seem to work. I've tried with and without the external but the issue still persists. Hoping someone could see what I'm missing.
implicit none
contains
subroutine solve(f1,f2,x0,n, EPSILON)
implicit none
real(kind = 2), external:: f1, f2
real (kind = 2), intent(in):: x0, EPSILON
real (kind = 2):: x
integer, intent(in):: n
integer:: iteration
x = x0
do while (abs(f1(x))>EPSILON)
iteration = iteration + 1
print*, iteration, x, f1(x)
x = x - (f1(x)/f2(x))
if (iteration >= n) then
print*, "No Convergence"
stop
end if
end do
print*, iteration, x
end subroutine solve
end module newton
Program Lab10
use newton
implicit none
integer, parameter :: n = 1000 ! maximum iteration
real(kind = 2), parameter :: EPSILON = 1.d-3
real(kind = 2):: x0, x
x0 = 3.0d0
call solve(f(x),fp(x),x0,n, EPSILON)
contains
real (kind = 2) function f(x) ! this is f(x)
implicit none
real (kind = 2), intent(in)::x
f = x**2.0d0-1.0d0
end function f
real (kind = 2) function fp(x) ! This is f'(x)
implicit none
real (kind = 2), intent(in)::x
fp = 2.0d0*x
end function fp
end program Lab10```
You seem to be passing function results to your subroutine and not the functions themselves. Remove (x) when calling solve() and the problem will be resolved. But more importantly, this code is a prime example of how to not program in Fortran. The attribute external is deprecated and you better provide an explicit interface. In addition, what is the meaning of kind = 2. Gfortran does not even comprehend it. Even if it comprehends the kind, it is not portable. Here is a correct portable modern implementation of the code,
module newton
use iso_fortran_env, only: RK => real64
implicit none
abstract interface
pure function f_proc(x) result(result)
import RK
real(RK), intent(in) :: x
real(RK) :: result
end function f_proc
end interface
contains
subroutine solve(f1,f2,x0,n, EPSILON)
procedure(f_proc) :: f1, f2
real(RK), intent(in) :: x0, EPSILON
integer, intent(in) :: n
real(RK) :: x
integer :: iteration
x = x0
do while (abs(f1(x))>EPSILON)
iteration = iteration + 1
print*, iteration, x, f1(x)
x = x - (f1(x)/f2(x))
if (iteration >= n) then
print*, "No Convergence"
stop
end if
end do
print*, iteration, x
end subroutine solve
end module newton
Program Lab10
use newton
integer, parameter :: n = 1000 ! maximum iteration
real(RK), parameter :: EPSILON = 1.e-3_RK
real(RK) :: x0, x
x0 = 3._RK
call solve(f,fp,x0,n, EPSILON)
contains
pure function f(x) result(result) ! this is f(x)
real (RK), intent(in) :: x
real (RK) :: result
result = x**2 - 1._RK
end function f
pure function fp(x) result(result) ! This is f'(x)
real (RK), intent(in) :: x
real (RK) :: result
result = 2 * x
end function fp
end program Lab10
If you expect to pass nonpure functions to the subroutine solve(), then remove the pure attribute. Note the use of real64 to declare 64-bit (double precision) real kind. Also notice how I have used _RK suffix to assign 64-bit precision to real constants. Also, notice I changed the exponents from real to integer as it is multiplication is more efficient than exponentiation computationally. I hope this answer serves more than merely the solution to Lab10.

MPI_WTIME is not giving me speedup as required

Program Main
implicit none
include 'mpif.h'
!Define parameters
integer::my_rank,p2,n2,ierr,source
integer, parameter :: n=3,m=3,o=m*n
real(kind=8) aaa(n),ddd(n),bbb(n),ccc(n),xxx(n),b(m,n),start, finish
integer i, j
real h
real(kind=8),dimension(:),allocatable::sol1
h=0.25
b=0
do i=1,m
b(i,i)=1/(1.2**i)
b(i,i-1)=-b(i,i)
enddo
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,p2,ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
allocate(sol1(o))
start=MPI_WTIME()
do i=1,n
aaa(i)=-1/h**2
bbb(i)=2/h**2+b(my_rank+1,my_rank+1)
ccc(i)=-1/h**2
ddd(i)=1/h**2
enddo
call thomas(aaa,bbb,ccc,ddd,xxx,n)
finish=MPI_WTIME()
print*, finish-start
write(*,*) xxx, my_rank
call MPI_GATHER(xxx,n, MPI_REAL, sol1,n,MPI_REAL8,0, MPI_COMM_WORLD,ierr)
print*,sol1
call MPI_FINALIZE(ierr)
end program main
subroutine thomas(ld,md,ud,rh,solution,n)
implicit none
integer,parameter :: r8 = kind(1.d0)
integer,intent(in) :: n
real(r8),dimension(n),intent(in) :: ld,md,ud,rh
real(r8),dimension(n),intent(out) :: solution
real(r8),dimension(n) :: P,Q
real(r8) :: m
integer i
P(1) = ud(1)/md(1)
Q(1) = rh(1)/md(1)
do i = 2,n
m = md(i)-p(i-1)*ld(i)
P(i) = ud(i)/m
Q(i) = (rh(i)-Q(i-1)*ld(i))/m
end do
solution(n) = Q(n)
do i = n-1, 1, -1
solution(i) = Q(i)-P(i)*solution(i+1)
end do
end subroutine thomas
Here I used MPI_WTIME() to find the execution time. It seems like when I increase the number of processor than I am not getting the speedup. In this code I have m=3 (I make m equal equal to no of processor). I run with mpirun -np 3 sp.exe). Now I change say m=10 and run with mpirun -np 10 sp.exe. I should get the less time, isn't it? or I am missing something here. The community helped me before with some issues and now I am getting another issue. I would really appreciate the help if somebody would point out something.Isn't the chunk of code starting with do loop done by invidual processors( which I want)?

Assumed Shape arrays require explicit interface in Fortran [duplicate]

This question already has an answer here:
Procedure with assumed-shape dummy argument must have an explicit interface [duplicate]
(1 answer)
Closed 4 years ago.
Im trying to write the code for a simple program that first asks for a number n, then creates an nxn matrix with 3s on its main diagonal, 1s over it and 0s under it and a vector (n) with 3 in the uneven positions and 2 in the even positions. It must then include a subroutine that multiplies both of them not using matmul()
program P13
implicit none
integer(4) :: n, i, j
integer, dimension(:), allocatable:: v
integer, dimension(:,:), allocatable :: m
integer, dimension(:), allocatable :: r
write(*,*) "Insert n"
read(*,*) n
allocate (v(1:n))
allocate (m(1:n,1:n))
v(1:n:2) = 3
v(2:n:2) = 2
m = 0
DO i=1,n,1
m (i,i:n)=1
END DO
Do i=1,n,1
m (i,i)=3
End do
call matrmul(n, m, v)
end program
subroutine matrmul(n, b, o, t)
implicit none
integer(4), intent(in) :: n
integer(4) :: i, j
integer, dimension(:), intent(in) :: b
integer, dimension(:,:),intent(in) :: o
integer, dimension(:), intent(out) :: t
DO i=1,n,1
t(i) = sum(b*o(:,i))
END DO
write(*,'(I2)') t
end subroutine
I get the error message Explicit interface required for ‘matrmul’ at (1): assumed-shape argument
How do I fix this?? Thanks
There are plenty examples here at stack overflow that will show you how to create explicit interfaces. However, since you allocate memory for all your arrays in the main program and you pass the size into the subroutine, just declare all your arrays in the subroutine with n.
subroutine matrmul(n, b, o, t)
implicit none
integer(4), intent(in) :: n
integer(4) :: i, j
integer, dimension(n), intent(in) :: b
integer, dimension(n,n),intent(in) :: o
integer, dimension(n), intent(out) :: t

Fortran character format string as subroutine argument

I am struggling with reading a text string in. Am using gfortran 4.9.2.
Below I have written a little subroutine in which I would like to submit the write format as argument.
Ideally I'd like to be able to call it with
call printarray(mat1, "F8.3")
to print out a matrix mat1 in that format for example. The numbers of columns should be determined automatically inside the subroutine.
subroutine printarray(x, udf_temp)
implicit none
real, dimension(:,:), intent(in) :: x ! array to be printed
integer, dimension(2) :: dims ! array for shape of x
integer :: i, j
character(len=10) :: udf_temp ! user defined format, eg "F8.3, ...
character(len = :), allocatable :: udf ! trimmed udf_temp
character(len = 10) :: udf2
character(len = 10) :: txt1, txt2
integer :: ncols ! no. of columns of array
integer :: udf_temp_length
udf_temp_length = len_trim(udf_temp)
allocate(character(len=udf_temp_length) :: udf)
dims = shape(x)
ncols = dims(2)
write (txt1, '(I5)') ncols
udf2 = trim(txt1)//adjustl(udf)
txt2 = "("//trim(udf2)//")"
do i = 1, dims(1)
write (*, txt2) (x(i, j), j = 1, dims(2)) ! this is line 38
end do
end suroutine printarray
when I set len = 10:
character(len=10) :: udf_temp
I get compile error:
call printarray(mat1, "F8.3")
1
Warning: Character length of actual argument shorter than of dummy argument 'udf_temp' (4/10) at (1)
When I set len = *
character(len=*) :: udf_temp
it compiles but at runtime:
At line 38 of file where2.f95 (unit = 6, file = 'stdout')
Fortran runtime error: Unexpected element '( 8
What am I doing wrong?
Is there a neater way to do this?
Here's a summary of your question that I will try to address: You want to have a subroutine that will print a specified two-dimensional array with a specified format, such that each row is printed on a single line. For example, assume we have the real array:
real, dimension(2,8) :: x
x = reshape([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], shape=[2,8], order=[2,1])
! Then the array is:
! 1.000 2.000 3.000 4.000 5.000 6.000 7.000 8.000
! 9.000 10.000 11.000 12.000 13.000 14.000 15.000 16.000
We want to use the format "F8.3", which prints floating point values (reals) with a field width of 8 and 3 decimal places.
Now, you are making a couple of mistakes when creating the format within your subroutine. First, you try to use udf to create the udf2 string. This is a problem because although you have allocated the size of udf, nothing has been assigned to it (pointed out in a comment by #francescalus). Thus, you see the error message you reported: Fortran runtime error: Unexpected element '( 8.
In the following, I make a couple of simplifying changes and demonstrate a few (slightly) different techniques. As shown, I suggest the use of * to indicate that the format can be applied an unlimited number of times, until all elements of the output list have been visited. Of course, explicitly stating the number of times to apply the format (ie, "(8F8.3)" instead of "(*(F8.3))") is fine, but the latter is slightly less work.
program main
implicit none
real, dimension(2,8) :: x
character(len=:), allocatable :: udf_in
x = reshape([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], shape=[2,8], order=[2,1])
udf_in = "F8.3"
call printarray(x, udf_in)
contains
subroutine printarray(x, udf_in)
implicit none
real, dimension(:,:), intent(in) :: x
character(len=*), intent(in) :: udf_in
integer :: ncols ! size(x,dim=2)
character(len=10) :: ncols_str ! ncols, stringified
integer, dimension(2) :: dims ! shape of x
character(len=:), allocatable :: udf0, udf1 ! format codes
integer :: i, j ! index counters
dims = shape(x) ! or just use: ncols = size(x, dim=2)
ncols = dims(2)
write (ncols_str, '(i0)') ncols ! use 'i0' for min. size
udf0 = "(" // ncols_str // udf_in // ")" ! create string: "(8F8.3)"
udf1 = "(*(" // udf_in // "))" ! create string: "(*(F8.3))"
print *, "Version 1:"
do i = 1, dims(1)
write (*, udf0) (x(i, j), j = 1,ncols) ! implied do-loop over j.
end do
print *, "Version 2:"
do i = 1, dims(1)
! udf1: "(*(F8.3))"
write (*, udf1) (x(i, j), j = 1,ncols) ! implied do-loop over j
end do
print *, "Version 3:"
do i = 1, size(x,dim=1) ! no need to create nrows/ncols vars.
write(*, udf1) x(i,:) ! let the compiler handle the extents.
enddo
end subroutine printarray
end program main
Observe: the final do-loop ("Version 3") is very simple. It does not need an explicit count of ncols because the * takes care of it automatically. Due to its simplicity, there is really no need for a subroutine at all.
besides the actual error (not using the input argument), this whole thing can be done much more simply:
subroutine printarray(m,f)
implicit none
character(len=*)f
real m(:,:)
character*10 n
write(n,'(i0)')size(m(1,:))
write(*,'('//n//f//')')transpose(m)
end subroutine
end
note no need for the loop constructs as fortran will automatically write the whole array , line wrapping as you reach the length of data specified by your format.
alternately you can use a loop construct, then you can use a '*' repeat count in the format and obviate the need for the internal write to construct the format string.
subroutine printarray(m,f)
implicit none
character(len=*)f
real m(:,:)
integer :: i
do i=1,size(m(:,1))
write(*,'(*('//f//'))')m(i,:)
enddo
end subroutine
end

Using MKL to solve a non-linear system of equations with an objective function stored in another module

I'm trying to use the MKL trust region algorithm to solve a nonlinear system of equations in a Fortran program. I started from the example provided online (ex_nlsqp_f90_x.f90 https://software.intel.com/en-us/node/501498) and everything works correctly. Now, because I have to use this in a much bigger program, I need the user defined objective function to be loaded from a separate module. Hence, I split the example into 2 separate files, but I'm not able to make it compile correctly.
So here is the code for module which contains user defined data structure and the objective function
module modFun
implicit none
private
public my_data, extended_powell
type :: my_data
integer a
integer sum
end type my_data
contains
subroutine extended_powell (m, n, x, f, user_data)
implicit none
integer, intent(in) :: m, n
real*8 , intent(in) :: x(n)
real*8, intent(out) :: f(m)
type(my_data) :: user_data
integer i
user_data%sum = user_data%sum + user_data%a
do i = 1, n/4
f(4*(i-1)+1) = x(4*(i-1)+1) + 10.0 * x(4*(i-1)+2)
f(4*(i-1)+2) = 2.2360679774998 * (x(4*(i-1)+3) - x(4*(i-1)+4))
f(4*(i-1)+3) = ( x(4*(i-1)+2) - 2.0 * x(4*(i-1)+3) )**2
f(4*(i-1)+4) = 3.1622776601684 * (x(4*(i-1)+1) - x(4*(i-1)+4))**2
end do
end subroutine extended_powell
end module modFun
and here the portion of the main program calling it
include 'mkl_rci.f90'
program EXAMPLE_EX_NLSQP_F90_X
use MKL_RCI
use MKL_RCI_type
use modFun
! user's objective function
! n - number of function variables
! m - dimension of function value
integer n, m
parameter (n = 4)
parameter (m = 4)
! precisions for stop-criteria (see manual for more details)
real*8 eps(6)
real*8 x(n)
real*8 fjac(m*n)
! number of iterations
integer fun
! Additional users data
type(my_data) :: m_data
m_data%a = 1
m_data%sum = 0
rs = 0.0
fun = djacobix(extended_powell,n,m,fjac,x,eps(1),%val(loc(m_data)))
end program EXAMPLE_EX_NLSQP_F90_X
Also djacobix code
INTERFACE
INTEGER FUNCTION DJACOBIX(fcn, n, m, fjac, x, eps, user_data)
USE, INTRINSIC :: ISO_C_BINDING
INTEGER, INTENT(IN) :: n
INTEGER, INTENT(IN) :: m
DOUBLE PRECISION, INTENT(IN) :: eps
DOUBLE PRECISION, INTENT(IN), DIMENSION(*) :: x
DOUBLE PRECISION, INTENT(OUT), DIMENSION(m, *) :: fjac
INTEGER(C_INTPTR_T) :: user_data
INTERFACE
SUBROUTINE fcn(m, n, x, f, user_data)
USE, INTRINSIC :: ISO_C_BINDING
INTEGER, INTENT(IN) :: n
INTEGER, INTENT(IN) :: m
DOUBLE PRECISION, INTENT(IN), DIMENSION(*) :: x
DOUBLE PRECISION, INTENT(OUT), DIMENSION(*) :: f
INTEGER(C_INTPTR_T), INTENT(IN) :: user_data
END SUBROUTINE
END INTERFACE
END FUNCTION
END INTERFACE
When i compile the following errors are generated:
mpiifort -g -t -mkl -I/apps/rhel6/intel/composer_xe_2015.3.187/mkl/include/intel64/lp64 -c modFun.f90
mpiifort -g -t -mkl -I/apps/rhel6/intel/composer_xe_2015.3.187/mkl/include/intel64/lp64 -c main.f90
main.f90(30): error #7065: The characteristics of dummy argument 5 of the associated actual procedure differ from the characteristics of dummy argument 5 of the dummy procedure. [EXTENDED_POWELL]
fun = djacobix(extended_powell,n,m,fjac,x,eps(1),%val(loc(m_data)))
-------------------^
I have the feeling I have to create an interface to override the check on the m_data, but I can't figure out where and how. Can anyone help me with this problem providing a working example?
I guess the reason is that the function djacobix passes the pointer instead of the true value of variable user_data.
You can check the manual at https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/nonlinear-optimization-problem-solvers/jacobian-matrix-calculation-routines/jacobix.html where a sentence shows that "You need to declare fcn as extern in the calling program."