If I have a tensor contraction
A[a,b] * B[b,c,d] = C[a,c,d]
If I use BLAS, I think I need DGEMM (assume real values), then I can
first reshape tensor B[b,c,d] as D[b,e] where e = c*d,
DGEMM, A[a,b] * D[b,e] = E[a,e]
reshape E[a,e] into C[a,c,d]
The problem is that reshape is not that fast. I saw the discussion in "Fortran: Which method is faster to change the rank of arrays? (Reshape vs. Pointer)", but the author there ran into error messages with everything except reshape itself.
Thus, I am asking if there is a convenient solution.
[Below, I have prefixed the sizes of the dimensions with the letter n, to avoid confusion between a tensor and the size of that tensor.]
As discussed in the comments, there is no need to reshape. Dgemm has no concept of tensors; it only knows about arrays, and all it cares about is that those arrays are laid out in the correct order in memory.
Fortran is column major, so a 3-dimensional array representing the 3-dimensional tensor B in the question is laid out in memory exactly as a 2-dimensional array representing the 2-dimensional tensor D would be: element B(b,c,d) sits at the same offset as D(b,e) with e = c + nc*(d-1). As far as the matrix multiplication is concerned, all that remains is to make the dot products that form the result the right length. So if you tell dgemm that B has a leading dimension of nb and a second dimension of nc*nd, you will get the right result. This leads us to
ian@eris:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ cat reshape.f90
Program reshape_for_blas

  Use, Intrinsic :: iso_fortran_env, Only : wp => real64, li => int64

  Implicit None

  Real( wp ), Dimension( :, :    ), Allocatable :: a
  Real( wp ), Dimension( :, :, : ), Allocatable :: b
  Real( wp ), Dimension( :, :, : ), Allocatable :: c1, c2
  Real( wp ), Dimension( :, :    ), Allocatable :: d
  Real( wp ), Dimension( :, :    ), Allocatable :: e

  Integer :: na, nb, nc, nd, ne

  Integer( li ) :: start, finish, rate

  Write( *, * ) 'na, nb, nc, nd ?'
  Read( *, * ) na, nb, nc, nd
  ne = nc * nd

  Allocate( a ( 1:na, 1:nb ) )
  Allocate( b ( 1:nb, 1:nc, 1:nd ) )
  Allocate( c1( 1:na, 1:nc, 1:nd ) )
  Allocate( c2( 1:na, 1:nc, 1:nd ) )
  Allocate( d ( 1:nb, 1:ne ) )
  Allocate( e ( 1:na, 1:ne ) )

  ! Set up some data
  Call Random_number( a )
  Call Random_number( b )

  ! With reshapes
  Call System_clock( start, rate )
  d = Reshape( b, Shape( d ) )
  Call dgemm( 'N', 'N', na, ne, nb, 1.0_wp, a, Size( a, Dim = 1 ), &
                                            d, Size( d, Dim = 1 ), &
                                    0.0_wp, e, Size( e, Dim = 1 ) )
  c1 = Reshape( e, Shape( c1 ) )
  Call System_clock( finish, rate )
  Write( *, * ) 'Time for reshaping method ', Real( finish - start, wp ) / rate

  ! Direct
  Call System_clock( start, rate )
  Call dgemm( 'N', 'N', na, ne, nb, 1.0_wp, a , Size( a , Dim = 1 ), &
                                            b , Size( b , Dim = 1 ), &
                                    0.0_wp, c2, Size( c2, Dim = 1 ) )
  Call System_clock( finish, rate )
  Write( *, * ) 'Time for straight method ', Real( finish - start, wp ) / rate

  Write( *, * ) 'Difference between result matrices ', Maxval( Abs( c1 - c2 ) )

End Program reshape_for_blas
ian@eris:~/work/stack$ cat in
40 50 60 70
ian@eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -fcheck=all reshape.f90 -lblas
ian@eris:~/work/stack$ ./a.out < in
na, nb, nc, nd ?
Time for reshaping method 1.0515256000000001E-002
Time for straight method 5.8608790000000003E-003
Difference between result matrices 0.0000000000000000
ian@eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra reshape.f90 -lblas
ian@eris:~/work/stack$ ./a.out < in
na, nb, nc, nd ?
Time for reshaping method 1.3585931000000001E-002
Time for straight method 1.6730429999999999E-003
Difference between result matrices 0.0000000000000000
That said, it is worth noting that the overhead of reshaping is O(N^2) while the time for the matrix multiply is O(N^3), so for large matrices the percentage overhead due to the reshape tends to zero. Code performance is not the only consideration; readability and maintainability are also very important. So if you find the reshape method much more readable, and the matrices you use are large enough that the overhead does not matter, you may well prefer the reshapes, since in that case readability might be more important than performance. Your call.
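If what you want from the reshape is mainly the rank-2 notation, a pointer with bounds remapping may give you that without copying any data. The following is only a sketch, not part of the timings above: it assumes a compiler that supports Fortran 2008 rank remapping onto a simply contiguous target, the names b_as_matrix and c_as_matrix are made up for the illustration, and Matmul stands in for dgemm so that the example needs no BLAS library.

```fortran
Program remap_without_copy

  Use, Intrinsic :: iso_fortran_env, Only : wp => real64

  Implicit None

  Integer, Parameter :: na = 4, nb = 5, nc = 6, nd = 7

  Real( wp ), Dimension( :, :    ), Allocatable         :: a
  Real( wp ), Dimension( :, :, : ), Allocatable, Target :: b, c
  ! Hypothetical names for the rank-2 views
  Real( wp ), Dimension( :, :    ), Pointer             :: b_as_matrix, c_as_matrix

  Real( wp ) :: diff

  Allocate( a( 1:na, 1:nb ) )
  Allocate( b( 1:nb, 1:nc, 1:nd ) )
  Allocate( c( 1:na, 1:nc, 1:nd ) )

  Call Random_number( a )
  Call Random_number( b )

  ! Rank-2 views onto the rank-3 storage; no data is copied
  b_as_matrix( 1:nb, 1:nc * nd ) => b
  c_as_matrix( 1:na, 1:nc * nd ) => c

  ! Assigning through the view fills the rank-3 array c itself
  c_as_matrix = Matmul( a, b_as_matrix )

  ! Compare with the copy-based route that uses Reshape
  diff = Maxval( Abs( c - Reshape( Matmul( a, Reshape( b, [ nb, nc * nd ] ) ), Shape( c ) ) ) )
  If( diff > 1.0e-12_wp ) Error Stop 'Views and reshapes disagree'

  Write( *, * ) 'Views and reshapes agree'

End Program remap_without_copy
```

Whether this reads better than passing b and c2 straight to dgemm is a matter of taste; the point is only that the rank-2 view and the rank-3 array share the same memory.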
I want to calculate D[a,d] = A[a,b,c] * B[b,c,d].
Method I: reshape A[a,b,c] => C1[a,e], B[b,c,d] => C2[e,d], e = b*c
Method II: call dgemm directly. This gives a run-time error:
" na, nb, nc, nd ?
2 3 5 7
Time for reshaping method 2.447600000000000E-002
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .
Time for straight method 1.838800000000000E-002
Difference between result matrices 5.46978468774136 "
Question: can BLAS contract two indices at once?
The approach from "How to speed up reshape in higher rank tensor contraction by BLAS in Fortran?" only works for one index. My test program:
Program reshape_for_blas

  Use, Intrinsic :: iso_fortran_env, Only : wp => real64, li => int64

  Implicit None

  Real( wp ), Dimension( :, :, : ), Allocatable :: a
  Real( wp ), Dimension( :, :, : ), Allocatable :: b
  Real( wp ), Dimension( :, :    ), Allocatable :: c1, c2
  Real( wp ), Dimension( :, :    ), Allocatable :: d
  Real( wp ), Dimension( :, :    ), Allocatable :: e

  Integer :: na, nb, nc, nd, ne

  Integer( li ) :: start, finish, rate

  Write( *, * ) 'na, nb, nc, nd ?'
  Read( *, * ) na, nb, nc, nd
  ne = nb * nc

  Allocate( a ( 1:na, 1:nb, 1:nc ) )
  Allocate( b ( 1:nb, 1:nc, 1:nd ) )
  Allocate( c1( 1:na, 1:ne ) )
  Allocate( c2( 1:ne, 1:nd ) )
  Allocate( d ( 1:na, 1:nd ) )
  Allocate( e ( 1:na, 1:nd ) )

  ! Set up some data
  Call Random_number( a )
  Call Random_number( b )

  ! With reshapes
  Call System_clock( start, rate )
  c1 = Reshape( a, Shape( c1 ) )
  c2 = Reshape( b, Shape( c2 ) )
  Call dgemm( 'N', 'N', na, nd, ne, 1.0_wp, c1, Size( c1, Dim = 1 ), &
                                            c2, Size( c2, Dim = 1 ), &
                                    0.0_wp, e, Size( e, Dim = 1 ) )
  Call System_clock( finish, rate )
  Write( *, * ) 'Time for reshaping method ', Real( finish - start, wp ) / rate

  ! Direct
  Call System_clock( start, rate )
  Call dgemm( 'N', 'N', na, nd, ne, 1.0_wp, a , Size( a , Dim = 1 ), &
                                            b , Size( b , Dim = 1 ), &
                                    0.0_wp, d, Size( d, Dim = 1 ) )
  Call System_clock( finish, rate )
  Write( *, * ) 'Time for straight method ', Real( finish - start, wp ) / rate

  Write( *, * ) 'Difference between result matrices ', Maxval( Abs( d - e ) )

End Program reshape_for_blas
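A possible reading of the error, offered as a sketch and not as a confirmed answer from the thread: parameter 10 of DGEMM is LDB, which for an untransposed B must be at least the contracted length k. In the direct call above k = nb*nc, but Size( b, Dim = 1 ) is only nb. Viewed as a matrix, B[b,c,d] is (nb*nc) x nd, so its leading dimension is nb*nc, while A[a,b,c] viewed as na x (nb*nc) keeps leading dimension na. The program below uses fixed sizes (the 2 3 5 7 from the failing run) and a made-up reference array ref filled by explicit loops; it must be linked against a BLAS library, for example with -lblas.

```fortran
Program contract_two_indices

  Use, Intrinsic :: iso_fortran_env, Only : wp => real64

  Implicit None

  Integer, Parameter :: na = 2, nb = 3, nc = 5, nd = 7
  Integer, Parameter :: ne = nb * nc   ! Length of the contracted (b,c) pair

  Real( wp ), Dimension( :, :, : ), Allocatable :: a, b
  Real( wp ), Dimension( :, :    ), Allocatable :: d, ref

  Integer :: ia, ib, ic, id

  Allocate( a( 1:na, 1:nb, 1:nc ) )
  Allocate( b( 1:nb, 1:nc, 1:nd ) )
  Allocate( d( 1:na, 1:nd ), ref( 1:na, 1:nd ) )

  Call Random_number( a )
  Call Random_number( b )

  ! Reference result from explicit loops (ref is a hypothetical helper array)
  ref = 0.0_wp
  Do id = 1, nd
     Do ic = 1, nc
        Do ib = 1, nb
           Do ia = 1, na
              ref( ia, id ) = ref( ia, id ) + a( ia, ib, ic ) * b( ib, ic, id )
           End Do
        End Do
     End Do
  End Do

  ! a is read as an na x ne matrix ( lda = na ),
  ! b is read as an ne x nd matrix ( ldb = ne, NOT Size( b, Dim = 1 ) )
  Call dgemm( 'N', 'N', na, nd, ne, 1.0_wp, a, na, &
                                            b, ne, &
                                    0.0_wp, d, na )

  If( Maxval( Abs( d - ref ) ) > 1.0e-12_wp ) Error Stop 'dgemm and loops disagree'

  Write( *, * ) 'dgemm and loops agree'

End Program contract_two_indices
```

This only works because the contracted indices b and c are adjacent and in the same order in both tensors; other index orders would still need a transpose or a copy.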
I have the following Fortran code (adapted from several answers on Stack Overflow):
Program blas

  integer, parameter :: dp = selected_real_kind(15, 307)

  Real( dp ), Dimension( :, :    ), Allocatable :: a
  Real( dp ), Dimension( :, :, : ), Allocatable :: b
  Real( dp ), Dimension( :, :, : ), Allocatable :: c1, c2

  Integer :: na, nb, nc, nd, ne
  Integer :: la, lb, lc, ld

  Write( *, * ) 'na, nb, nc, nd ?'
  Read( *, * ) na, nb, nc, nd
  ne = nc * nd

  Allocate( a ( 1:na, 1:nb ) )
  Allocate( b ( 1:nb, 1:nc, 1:nd ) )
  Allocate( c1( 1:na, 1:nc, 1:nd ) )
  Allocate( c2( 1:na, 1:nc, 1:nd ) )

  Call Random_number( a )
  Call Random_number( b )

  c1 = 0.0_dp
  c2 = 0.0_dp

  do ld = 1, nd
     do lc = 1, nc
        do lb = 1, nb
           do la = 1, na
              c1(la,lc,ld) = c1(la,lc,ld) + a(la,lb) * b(lb, lc, ld)
           end do
        end do
     end do
  end do

  Call dgemm( 'N', 'N', na, ne, nb, 1.0_dp, a , Size( a , Dim = 1 ), &
                                            b , Size( b , Dim = 1 ), &
                                    0.0_dp, c2, Size( c2, Dim = 1 ) )

  do la = 1, na
     do lc = 1, nc
        do ld = 1, nd
           if ( dabs(c2(la,lc,ld) - c1(la,lc,ld)) > 1.e-6 ) then
              write (*,*) '!!! c2', la,lc,ld, c2(la,lc,ld) - c1(la,lc,ld)
           endif
        enddo
     enddo
  enddo

End
(call it test.f90).
It works when compiled with gfortran -O3 test.f90 -L/opt/OpenBLAS/lib -lopenblas. Then I tried to link gfortran against MKL, using the link line suggested by https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
gfortran -O3 test.f90 -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
and I got
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .
My question is: what is wrong with parameter 10, and how do I fix it? The problem does not appear if I use ifort with -mkl.
You selected the ilp64 version of MKL, in which integers, longs and pointers are 64-bit. But you are not using gfortran with 64-bit integers; the default in all compilers I know is 32-bit integers, so the integer arguments your program passes to DGEMM are not what the library expects. Either you want a different version of MKL, like lp64, or you want to set up your gfortran to use 64-bit default integers (the -fdefault-integer-8 option). For the former, select the 32-bit integer interface layer in the Link Line Advisor.
See also https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
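If you are unsure which integer size your build is actually using, a small check along these lines may help. It is only a sketch: it assumes a compiler that provides the Fortran 2008 Storage_size intrinsic, and the wording of the messages is mine.

```fortran
Program integer_model

  Use, Intrinsic :: iso_fortran_env, Only : int32, int64

  Implicit None

  Integer :: n   ! Default integer, i.e. what Size( a, Dim = 1 ) and na, nb, ... are

  Write( *, * ) 'Bits in a default integer: ', Storage_size( n )
  Write( *, * ) 'Bits in an int32 integer : ', Storage_size( 1_int32 )
  Write( *, * ) 'Bits in an int64 integer : ', Storage_size( 1_int64 )

  ! The BLAS interface layer has to agree with the default integer size
  If( Storage_size( n ) == 64 ) Then
     Write( *, * ) 'Default integers are 64-bit: use an ilp64 interface'
  Else
     Write( *, * ) 'Default integers are not 64-bit: use an lp64 interface'
  End If

End Program integer_model
```

Compile it with exactly the same options as the real program, since an option such as -fdefault-integer-8 changes the answer.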
I need to implement an algorithm that operates on a list of matrices. The number of matrices and their sizes are not known in advance - a user is free to apply the algorithm to any finite number of matrices of any size. How can I implement such behavior in Fortran code? Is there a proper data structure available to do that? I am looking for a well-established Fortran programming pattern.
It is relatively easy to implement such an algorithm in Python using a combination of the list data structure and numpy matrices, but it works way too slowly.
Note I have assumed that all your matrices have elements of the same data type.
Here is a simplified (and, because of that, very slightly old fashioned) example of what I would do:
ian@eris:~/work/stack$ cat list_of_matrices.f90
Module numbers_module

  Implicit None

  Integer, Parameter, Public :: wp = Selected_real_kind( 12, 70 )

  Private

End Module numbers_module

Module matrix_module

  Use numbers_module, Only : wp

  Implicit None

  Type, Public :: matrix
     Real( wp ), Dimension( :, : ), Allocatable, Public :: data
  End type matrix

  Public :: matrix_allocate
  Public :: matrix_free
  Public :: matrix_set_with_random
  Public :: matrix_print

  Private

Contains

  Subroutine matrix_allocate( A, m, n )
    Type( matrix ), Intent( Out ) :: A
    Integer       , Intent( In  ) :: m
    Integer       , Intent( In  ) :: n
    Allocate( A%data( 1:m, 1:n ) )
  End Subroutine matrix_allocate

  Subroutine matrix_free( A )
    Type( matrix ), Intent( InOut ) :: A
    Deallocate( A%data )
  End Subroutine matrix_free

  Subroutine matrix_set_with_random( A )
    Type( matrix ), Intent( InOut ) :: A
    Call Random_number( A%data )
  End Subroutine matrix_set_with_random

  Subroutine matrix_print( A, format )
    Type( matrix )      , Intent( In ) :: A
    Character( Len = * ), Intent( In ) :: format
    Integer :: i
    Write( *, * ) 'The matrix has the shape: ', Shape( A%data )
    Do i = 1, Size( A%data, Dim = 1 )
       Write( *, format ) A%data( i, : )
    End Do
  End Subroutine matrix_print

End Module matrix_module

Program test_matrix

  Use matrix_module, Only : matrix, matrix_allocate, matrix_free, &
       matrix_set_with_random, matrix_print

  Implicit None

  Type( matrix ), Dimension( : ), Allocatable :: list_of_matrices

  Integer :: n_mats
  Integer :: n, m
  Integer :: i_mat

  Write( *, * ) 'How many matrices'
  Read ( *, * ) n_mats
  Allocate( list_of_matrices( 1:n_mats ) )

  Do i_mat = 1, n_mats
     Write( *, * ) 'Dimensions for matrix ', i_mat
     Read ( *, * ) m, n
     Call matrix_allocate( list_of_matrices( i_mat ), m, n )
     Call matrix_set_with_random( list_of_matrices( i_mat ) )
  End Do

  Do i_mat = 1, n_mats
     Write( *, * ) 'Data for matrix ', i_mat
     Call matrix_print( list_of_matrices( i_mat ), '( 20( f5.2, 1x ) )' )
  End Do

  Do i_mat = n_mats, 1, -1
     Call matrix_free( list_of_matrices( i_mat ) )
  End Do
  Deallocate( list_of_matrices )

End Program test_matrix
ian@eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -Wuse-without-only -Wsurprising -Wimplicit-interface -Werror -fcheck=all list_of_matrices.f90 -o list_of_matrices
ian@eris:~/work/stack$ ./list_of_matrices
How many matrices
3
Dimensions for matrix 1
2 1
Dimensions for matrix 2
4 3
Dimensions for matrix 3
5 6
Data for matrix 1
The matrix has the shape: 2 1
0.06
0.31
Data for matrix 2
The matrix has the shape: 4 3
0.02 0.63 0.08
0.26 0.84 0.75
0.85 0.67 0.34
0.85 0.91 0.33
Data for matrix 3
The matrix has the shape: 5 6
0.35 0.58 0.01 0.93 0.74 0.46
0.43 0.38 0.89 0.83 0.51 0.26
0.33 0.03 0.73 0.26 0.40 0.58
0.48 0.87 0.15 0.62 0.13 0.79
0.59 0.97 0.15 0.09 0.05 0.37
ian@eris:~/work/stack$
In practice I would keep the contents of the derived type private and accessible only through the module procedures, and nowadays I would use type bound procedures within the matrix type, but here I think that distracts from the point, hence the slightly older route. In production code I would also probably have a separate list_of_matrices type to hold the array of matrices, but it depends on exactly what you are doing.
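For comparison, here is roughly what the private-component, type-bound variant might look like. It is a cut-down sketch under my own naming (matrix_tbp_module and the bindings create, set_with_random and get_shape are invented for this illustration), with fixed sizes in place of user input so that it runs unattended.

```fortran
Module matrix_tbp_module

  Implicit None

  Private

  Integer, Parameter, Public :: wp = Selected_real_kind( 12, 70 )

  Type, Public :: matrix
     Private
     ! The data can only be reached through the bound procedures below
     Real( wp ), Dimension( :, : ), Allocatable :: data
   Contains
     Procedure, Public :: create          => matrix_create
     Procedure, Public :: set_with_random => matrix_set_with_random
     Procedure, Public :: get_shape       => matrix_get_shape
  End Type matrix

Contains

  Subroutine matrix_create( A, m, n )
    Class( matrix ), Intent( InOut ) :: A
    Integer        , Intent( In    ) :: m
    Integer        , Intent( In    ) :: n
    If( Allocated( A%data ) ) Deallocate( A%data )
    Allocate( A%data( 1:m, 1:n ) )
  End Subroutine matrix_create

  Subroutine matrix_set_with_random( A )
    Class( matrix ), Intent( InOut ) :: A
    Call Random_number( A%data )
  End Subroutine matrix_set_with_random

  Function matrix_get_shape( A ) Result( s )
    Class( matrix ), Intent( In ) :: A
    Integer, Dimension( 1:2 ) :: s
    s = Shape( A%data )
  End Function matrix_get_shape

End Module matrix_tbp_module

Program test_matrix_tbp

  Use matrix_tbp_module, Only : matrix

  Implicit None

  Type( matrix ), Dimension( : ), Allocatable :: list_of_matrices
  Integer :: i_mat

  Allocate( list_of_matrices( 1:3 ) )

  Do i_mat = 1, 3
     ! Every matrix in the list gets its own shape
     Call list_of_matrices( i_mat )%create( i_mat + 1, 2 * i_mat )
     Call list_of_matrices( i_mat )%set_with_random()
  End Do

  Do i_mat = 1, 3
     Write( *, * ) 'Matrix ', i_mat, ' has shape ', list_of_matrices( i_mat )%get_shape()
  End Do

  ! Allocatable components are released automatically with the list
  Deallocate( list_of_matrices )

End Program test_matrix_tbp
```

No explicit free routine is needed in this version, because deallocating the list also deallocates the allocatable component of each element.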
In fact I am currently working on something which is essentially a much more complicated version of this: routines to perform linear algebra on lists of matrices, where those matrices may be either real or complex, and the data within those matrices can be distributed across multiple processes. Having just berated somebody for asking us to download unknown files I feel somewhat guilty about this, but if you are interested you can find it on GitHub:
git clone https://github.com/drijbush/dmat2.git
I am new to Fortran programming, so I need some help with allocatable arrays.
This is my simple code:
PROGRAM MY_SIMPLE_CODE
IMPLICIT NONE
INTEGER :: N_TMP, ALLOC_ERR, DEALLOC_ERR
REAL, ALLOCATABLE, DIMENSION(:) :: P_POT
WRITE( *,* ) "ENTER THE VALUE FOR N_TMP:"
READ( *,* ) N_TMP
IF ( .NOT. ALLOCATED( P_POT) ) ALLOCATE( P_POT( N_TMP), STATUS = ALLOC_ERR )
IF ( ALLOC_ERR .NE. 0 ) STOP( "ERROR - ALLOCATION P_POT !!!")
IF ( ALLOCATED( P_POT) ) DEALLOCATE( P_POT, STATUS = DEALLOC_ERR )
IF ( DEALLOC_ERR .NE. 0 ) STOP( "ERROR - DEALLOCATION P_POT !!!")
END PROGRAM MY_SIMPLE_CODE
When I build this code I get this error message:
Allocate-object is neither a data pointer nor an allocatable variable
What is wrong with this code?
What kind of tricky stuff can be masked in this simple code?
IDE: Code::Blocks, TDM-GCC 5.1.0
OS: Win 10 X64
Just like @Steve said in the comment, the keyword for the status of allocation/deallocation is STAT, not STATUS. The error arises because the compiler doesn't recognize the name as a keyword and thinks it is a variable.
Moreover, there is a syntax error: there must be at least a space between the STOP statement and the opening parenthesis (or no parentheses at all).
IF ( .NOT. ALLOCATED( P_POT) ) ALLOCATE( P_POT( N_TMP), STAT = ALLOC_ERR )
IF ( ALLOC_ERR .NE. 0 ) STOP "ERROR - ALLOCATION P_POT !!!"
!(...)
IF ( ALLOCATED( P_POT) ) DEALLOCATE( P_POT, STAT = DEALLOC_ERR )
IF ( DEALLOC_ERR .NE. 0 ) STOP "ERROR - DEALLOCATION P_POT !!!"
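Put together, a complete version might look like the sketch below. The fixed value of N_TMP (so that it runs without input), the extra ERRMSG= specifier and the deliberate second DEALLOCATE are my additions for illustration, not part of the original code.

```fortran
PROGRAM MY_SIMPLE_CODE

  IMPLICIT NONE

  INTEGER :: N_TMP, ALLOC_ERR, DEALLOC_ERR
  CHARACTER( LEN = 256 ) :: MSG
  REAL, ALLOCATABLE, DIMENSION(:) :: P_POT

  N_TMP = 10   ! Fixed here; the original reads it from the keyboard
  MSG = ''     ! ERRMSG= leaves MSG untouched when no error occurs

  ALLOCATE( P_POT( N_TMP ), STAT = ALLOC_ERR, ERRMSG = MSG )
  IF ( ALLOC_ERR .NE. 0 ) THEN
     WRITE( *,* ) "ERROR - ALLOCATION P_POT: ", TRIM( MSG )
     STOP 1
  END IF
  WRITE( *,* ) "ALLOCATED ELEMENTS: ", SIZE( P_POT )

  DEALLOCATE( P_POT, STAT = DEALLOC_ERR, ERRMSG = MSG )
  IF ( DEALLOC_ERR .NE. 0 ) THEN
     WRITE( *,* ) "ERROR - DEALLOCATION P_POT: ", TRIM( MSG )
     STOP 1
  END IF

  ! Deallocating again is an error; because STAT= is present the
  ! program carries on and reports a nonzero status instead of aborting
  DEALLOCATE( P_POT, STAT = DEALLOC_ERR, ERRMSG = MSG )
  IF ( DEALLOC_ERR .NE. 0 ) WRITE( *,* ) "SECOND DEALLOCATE REPORTED AN ERROR"

END PROGRAM MY_SIMPLE_CODE
```

The text returned in MSG is compiler dependent, so it is only useful for diagnostics, not for program logic.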