I want to calculate D[a,d] = A[a,b,c] * B[b,c,d].
Method I: reshape A[a,b,c] => C1[a,e], B[b,c,d] => C2[e,d], e = b*c
Method II: call dgemm directly on A and B. This gives a run-time error.
" na, nb, nc, nd ?
2 3 5 7
Time for reshaping method 2.447600000000000E-002
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .
Time for straight method 1.838800000000000E-002
Difference between result matrices 5.46978468774136 "
Question: can BLAS contract two indices at once?
The approach in the following question only works for a single contracted index:
How to speed up reshape in higher rank tensor contraction by BLAS in Fortran?
Program reshape_for_blas
Use, Intrinsic :: iso_fortran_env, Only : wp => real64, li => int64
Implicit None
Real( wp ), Dimension( :, :, : ), Allocatable :: a
Real( wp ), Dimension( :, :, : ), Allocatable :: b
Real( wp ), Dimension( :, : ), Allocatable :: c1, c2
Real( wp ), Dimension( :, : ), Allocatable :: d
Real( wp ), Dimension( :, : ), Allocatable :: e
Integer :: na, nb, nc, nd, ne
Integer( li ) :: start, finish, rate
Write( *, * ) 'na, nb, nc, nd ?'
Read( *, * ) na, nb, nc, nd
ne = nb * nc
Allocate( a ( 1:na, 1:nb, 1:nc ) )
Allocate( b ( 1:nb, 1:nc, 1:nd ) )
Allocate( c1( 1:na, 1:ne ) )
Allocate( c2( 1:ne, 1:nd ) )
Allocate( d ( 1:na, 1:nd ) )
Allocate( e ( 1:na, 1:nd ) )
! Set up some data
Call Random_number( a )
Call Random_number( b )
! With reshapes
Call System_clock( start, rate )
c1 = Reshape( a, Shape( c1 ) )
c2 = Reshape( b, Shape( c2 ) )
Call dgemm( 'N', 'N', na, nd, ne, 1.0_wp, c1, Size( c1, Dim = 1 ), &
c2, Size( c2, Dim = 1 ), &
0.0_wp, e, Size( e, Dim = 1 ) )
Call System_clock( finish, rate )
Write( *, * ) 'Time for reshaping method ', Real( finish - start, wp ) / rate
! Direct
Call System_clock( start, rate )
Call dgemm( 'N', 'N', na, nd, ne, 1.0_wp, a , Size( a , Dim = 1 ), &
b , Size( b , Dim = 1 ), &
0.0_wp, d, Size( d, Dim = 1 ) )
Call System_clock( finish, rate )
Write( *, * ) 'Time for straight method ', Real( finish - start, wp ) / rate
Write( *, * ) 'Difference between result matrices ', Maxval( Abs( d - e ) )
End Program reshape_for_blas
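For what it is worth, parameter 10 of DGEMM is LDB, the leading dimension of the second matrix. In the direct call above it is passed as Size( b, Dim = 1 ), which is nb, whereas for an untransposed B the BLAS interface requires LDB to be at least K, here nb*nc. The following is a minimal sketch of the two index contraction under that reading, not a tested fix of the program above: it assumes a BLAS library is linked in (for example with -lblas), and the program name, the fixed sizes and the loop-based reference result are only for illustration.

```fortran
Program contract_two_indices
  Use, Intrinsic :: iso_fortran_env, Only : wp => real64
  Implicit None
  Real( wp ), Dimension( :, :, : ), Allocatable :: a, b
  Real( wp ), Dimension( :, :    ), Allocatable :: d, d_ref
  Integer :: na, nb, nc, nd, ne
  Integer :: ia, ib, ic, id
  External :: dgemm   ! Assumed to come from the linked BLAS library

  na = 2; nb = 3; nc = 5; nd = 7
  ne = nb * nc        ! Length of the combined (b,c) index

  Allocate( a( 1:na, 1:nb, 1:nc ) )
  Allocate( b( 1:nb, 1:nc, 1:nd ) )
  Allocate( d( 1:na, 1:nd ), d_ref( 1:na, 1:nd ) )
  Call Random_number( a )
  Call Random_number( b )

  ! Reference result from explicit loops over both contracted indices
  d_ref = 0.0_wp
  Do id = 1, nd
     Do ic = 1, nc
        Do ib = 1, nb
           Do ia = 1, na
              d_ref( ia, id ) = d_ref( ia, id ) + a( ia, ib, ic ) * b( ib, ic, id )
           End Do
        End Do
     End Do
  End Do

  ! a is viewed as na x ne (lda = na), b as ne x nd (ldb = ne, not nb)
  Call dgemm( 'N', 'N', na, nd, ne, 1.0_wp, a, na, &
                                            b, ne, &
                                    0.0_wp, d, na )

  Write( *, * ) 'Max difference from loops ', Maxval( Abs( d - d_ref ) )
End Program contract_two_indices
```

The only change relative to the failing call is the leading dimension passed for b; the arrays themselves are untouched, because the combined index (b,c) is already contiguous in both a and b.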
Related question Fortran: Which method is faster to change the rank of arrays? (Reshape vs. Pointer)
Suppose I have the tensor contraction
A[a,b] * B[b,c,d] = C[a,c,d]
With BLAS I think I need DGEMM (assuming real values), so I can:
first reshape the tensor B[b,c,d] as D[b,e], where e = c*d,
call DGEMM to compute A[a,b] * D[b,e] = E[a,e],
then reshape E[a,e] into C[a,c,d].
The problem is that reshape is not that fast :( I saw the discussion in Fortran: Which method is faster to change the rank of arrays? (Reshape vs. Pointer),
but the author there ran into error messages with everything except reshape itself.
So I am asking whether there is a convenient solution.
[I have prefaced the size of dimensions with the letter n to avoid confusion in the below between the tensor and the size of the tensor]
As discussed in the comments, there is no need to reshape. Dgemm has no concept of tensors; it only knows about arrays, and all it cares about is that those arrays are laid out in the correct order in memory. Fortran is column major, so a 3 dimensional array representing the 3 dimensional tensor B in the question is laid out in memory exactly like the 2 dimensional array used to represent the 2 dimensional tensor D. As far as the matrix multiply is concerned, all that remains is to make the dot products which form the result the right length. So if you tell dgemm that B has a leading dimension of nb and a second dimension of nc*nd, you will get the right result. This leads us to
ian#eris:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian#eris:~/work/stack$ cat reshape.f90
Program reshape_for_blas
Use, Intrinsic :: iso_fortran_env, Only : wp => real64, li => int64
Implicit None
Real( wp ), Dimension( :, : ), Allocatable :: a
Real( wp ), Dimension( :, :, : ), Allocatable :: b
Real( wp ), Dimension( :, :, : ), Allocatable :: c1, c2
Real( wp ), Dimension( :, : ), Allocatable :: d
Real( wp ), Dimension( :, : ), Allocatable :: e
Integer :: na, nb, nc, nd, ne
Integer( li ) :: start, finish, rate
Write( *, * ) 'na, nb, nc, nd ?'
Read( *, * ) na, nb, nc, nd
ne = nc * nd
Allocate( a ( 1:na, 1:nb ) )
Allocate( b ( 1:nb, 1:nc, 1:nd ) )
Allocate( c1( 1:na, 1:nc, 1:nd ) )
Allocate( c2( 1:na, 1:nc, 1:nd ) )
Allocate( d ( 1:nb, 1:ne ) )
Allocate( e ( 1:na, 1:ne ) )
! Set up some data
Call Random_number( a )
Call Random_number( b )
! With reshapes
Call System_clock( start, rate )
d = Reshape( b, Shape( d ) )
Call dgemm( 'N', 'N', na, ne, nb, 1.0_wp, a, Size( a, Dim = 1 ), &
d, Size( d, Dim = 1 ), &
0.0_wp, e, Size( e, Dim = 1 ) )
c1 = Reshape( e, Shape( c1 ) )
Call System_clock( finish, rate )
Write( *, * ) 'Time for reshaping method ', Real( finish - start, wp ) / rate
! Direct
Call System_clock( start, rate )
Call dgemm( 'N', 'N', na, ne, nb, 1.0_wp, a , Size( a , Dim = 1 ), &
b , Size( b , Dim = 1 ), &
0.0_wp, c2, Size( c2, Dim = 1 ) )
Call System_clock( finish, rate )
Write( *, * ) 'Time for straight method ', Real( finish - start, wp ) / rate
Write( *, * ) 'Difference between result matrices ', Maxval( Abs( c1 - c2 ) )
End Program reshape_for_blas
ian#eris:~/work/stack$ cat in
40 50 60 70
ian#eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -fcheck=all reshape.f90 -lblas
ian#eris:~/work/stack$ ./a.out < in
na, nb, nc, nd ?
Time for reshaping method 1.0515256000000001E-002
Time for straight method 5.8608790000000003E-003
Difference between result matrices 0.0000000000000000
ian#eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra reshape.f90 -lblas
ian#eris:~/work/stack$ ./a.out < in
na, nb, nc, nd ?
Time for reshaping method 1.3585931000000001E-002
Time for straight method 1.6730429999999999E-003
Difference between result matrices 0.0000000000000000
That said, I think it is worth noting that the overhead of reshaping is O(N^2) while the time for the matrix multiply is O(N^3), so for large matrices the percentage overhead due to the reshape tends to zero. Code performance is not the only consideration; readability and maintainability are also very important. So if you find the reshape method much more readable, and your matrices are large enough that the overhead does not matter, you may well prefer the reshapes. Your call.
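If readability is the concern, one possible middle ground (my suggestion, not something used in the timings above) is pointer bounds remapping, which gives a rank 2 name for the rank 3 data without copying it. The sketch below assumes a compiler that supports the Fortran 2008 rule allowing a simply contiguous target of rank greater than one; the names b2 and c2 are made up, and Matmul stands in for dgemm only so that the example needs no BLAS library.

```fortran
Program remap_view
  Use, Intrinsic :: iso_fortran_env, Only : wp => real64
  Implicit None
  Real( wp ), Dimension( :, :    ), Allocatable         :: a
  Real( wp ), Dimension( :, :, : ), Allocatable, Target :: b, c
  Real( wp ), Dimension( :, :    ), Pointer             :: b2, c2
  Integer :: na, nb, nc, nd

  na = 4; nb = 5; nc = 6; nd = 7
  Allocate( a( 1:na, 1:nb ) )
  Allocate( b( 1:nb, 1:nc, 1:nd ) )
  Allocate( c( 1:na, 1:nc, 1:nd ) )
  Call Random_number( a )
  Call Random_number( b )

  ! Rank 2 views of the rank 3 arrays; no data is copied
  b2( 1:nb, 1:nc * nd ) => b
  c2( 1:na, 1:nc * nd ) => c

  ! Matmul is used in place of dgemm purely to keep the sketch free of BLAS
  c2 = Matmul( a, b2 )

  ! c was filled through c2; check one element against the definition
  Write( *, * ) 'Difference ', &
       Abs( c( 2, 3, 4 ) - Dot_product( a( 2, : ), b( :, 3, 4 ) ) )
End Program remap_view
```

Element ( ib, ic, id ) of b appears as element ( ib, ic + ( id - 1 ) * nc ) of b2, which is the same correspondence the direct dgemm call relies on.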
I have the following Fortran code (adapted from several Stack Overflow answers):
Program blas
integer, parameter :: dp = selected_real_kind(15, 307)
Real( dp ), Dimension( :, : ), Allocatable :: a
Real( dp ), Dimension( :, :, : ), Allocatable :: b
Real( dp ), Dimension( :, :, : ), Allocatable :: c1, c2
Integer :: na, nb, nc, nd, ne
Integer :: la, lb, lc, ld
Write( *, * ) 'na, nb, nc, nd ?'
Read( *, * ) na, nb, nc, nd
ne = nc * nd
Allocate( a ( 1:na, 1:nb ) )
Allocate( b ( 1:nb, 1:nc, 1:nd ) )
Allocate( c1( 1:na, 1:nc, 1:nd ) )
Allocate( c2( 1:na, 1:nc, 1:nd ) )
Call Random_number( a )
Call Random_number( b )
c1 = 0.0_dp
c2 = 0.0_dp
do ld = 1, nd
do lc = 1, nc
do lb = 1, nb
do la = 1, na
c1(la,lc,ld) = c1(la,lc,ld) + a(la,lb) * b(lb, lc, ld)
end do
end do
end do
end do
Call dgemm( 'N', 'N', na, ne, nb, 1.0_dp, a , Size( a , Dim = 1 ), &
b , Size( b , Dim = 1 ), &
0.0_dp, c2, Size( c2, Dim = 1 ) )
do la = 1, na
do lc = 1, nc
do ld = 1, nd
if ( dabs(c2(la,lc,ld) - c1(la,lc,ld)) > 1.e-6 ) then
write (*,*) '!!! c2', la,lc,ld, c2(la,lc,ld) - c1(la,lc,ld)
endif
enddo
enddo
enddo
End
(call it test.f90).
It works when built with gfortran -O3 test.f90 -L/opt/OpenBLAS/lib -lopenblas. Then I tried to link gfortran against MKL, using the line suggested by https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
gfortran -O3 test.f90 -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
and I got
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .
My question is: what is wrong with parameter 10, and how do I fix it? If I use ifort with -mkl, the problem does not appear.
You selected the ilp64 version of MKL, in which integers, longs and pointers are 64-bit. But you are not compiling with 64-bit default integers: the default in every compiler I know is 32-bit integers. Either use a different version of MKL, such as lp64, or set up gfortran to use 64-bit default integers. For the former, select the 32-bit integer interface layer in the Link Line Advisor.
See also https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
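A quick way to see which case applies is to print the size of the default integer with the same compiler and flags used to build the program. This is only a small diagnostic sketch (the program name is made up): if it reports 32 bits, the lp64 interface layer is the matching one, which in the link line above would normally mean -lmkl_gf_lp64 in place of -lmkl_gf_ilp64; gfortran's -fdefault-integer-8 option is the other route, and it changes what this program prints.

```fortran
Program integer_size
  Use, Intrinsic :: iso_fortran_env, Only : int32, int64
  Implicit None
  Integer :: n   ! Default integer, the kind passed to dgemm in the question

  n = 0
  ! The value printed depends on the compiler and its flags
  Write( *, * ) 'Bits in a default integer: ', Storage_size( n )
  Write( *, * ) 'Bits in int32 and int64  : ', &
       Storage_size( 0_int32 ), Storage_size( 0_int64 )
End Program integer_size
```

Whichever route is taken, the integer kind used in the calling code and the interface layer of the library have to agree for every integer argument, not just the one named in the error message.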
I am working on a code that needs to extract the two variables of a binomial expression. Is there a way to split a string with a sign in the middle into two substrings?
For example, (x+y)^3 should give var1=x and var2=y, and (qwerty-asdf)^12 should give var1=qwerty and var2=asdf.
I have tried this:
character(100) :: str, var
var=' '
do i=1,len(str)
if (((str(i:i) == "+") .or. (str(i:i) == "-")) .or. &
((str(i:i) >= "a") .and. (str(i:i) <= "z"))) &
var=trim(var)//trim(str(i:i))
end do
But the only characters that get removed are the parentheses and the power.
Another approach: if I knew the position in the string where the sign appears, I could do this:
character(100) :: str, var
var=' '
do i=1,len(the_unknown_string_length)
if ((str(i:i) >= "a") .and. (str(i:i) <= "z")) &
var=trim(var)//trim(str(i:i))
end do
However, I don't know how to find the position in the string at which the sign appears.
I wouldn't mess about with loops - I'd use the available intrinsic functions. Something like
ijb#ijb-Latitude-5410:~/work/stack$ cat binom.f90
Program binom
Implicit None
Character( Len = 100 ) :: expression
Character( Len = : ), Allocatable :: var1, var2
Write( *, '( a )' ) 'Expression?'
Read ( *, '( a )' ) expression
Call split_it( expression, var1, var2 )
If( Len( var1 ) /= 0 .And. Len( var2 ) /= 0 ) Then
Write( *, '( a, a, t20, i0 )' ) 'Var1 = ', var1, Len( var1 )
Write( *, '( a, a, t20, i0 )' ) 'Var2 = ', var2, Len( var2 )
Else
Write( *, * ) 'No + or - in the string'
End If
Contains
Subroutine split_it( expression, var1, var2 )
Implicit None
Character( Len = * ), Intent( In ) :: expression
Character( Len = : ), Allocatable, Intent( Out ) :: var1
Character( Len = : ), Allocatable, Intent( Out ) :: var2
Integer :: split_pos
Integer :: paren_pos
split_pos = Scan( expression, '+-' )
If( split_pos /= 0 ) Then
var1 = Trim( Adjustl( expression( :split_pos - 1 ) ) )
paren_pos = Scan( var1, '(' )
var1 = Trim( Adjustl( var1( paren_pos + 1: ) ) )
var2 = Trim( Adjustl( expression( split_pos + 1: ) ) )
paren_pos = Scan( var2, ')' )
var2 = Trim( Adjustl( var2( :paren_pos - 1 ) ) )
Else
Allocate( Character( Len = 0 ) :: var1 )
Allocate( Character( Len = 0 ) :: var2 )
End If
End Subroutine split_it
End Program binom
ijb#ijb-Latitude-5410:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ijb#ijb-Latitude-5410:~/work/stack$ gfortran -Wall -Wextra -fcheck=all -std=f2008 -g binom.f90
ijb#ijb-Latitude-5410:~/work/stack$ ./a.out
Expression?
(x+y)^3
Var1 = x 1
Var2 = y 1
ijb#ijb-Latitude-5410:~/work/stack$ ./a.out
Expression?
(qwerty-asdf)^12
Var1 = qwerty 6
Var2 = asdf 4
ijb#ijb-Latitude-5410:~/work/stack$ ./a.out
Expression?
( aag + fg ) ^98
Var1 = aag 3
Var2 = fg 2
ijb#ijb-Latitude-5410:~/work/stack$ ./a.out
Expression?
wibble
No + or - in the string
I need to implement an algorithm that operates on a list of matrices. The number of matrices and their sizes are not known in advance - a user is free to apply the algorithm to any finite number of matrices of any size. How can I implement such behavior in Fortran code? Is there a proper data structure available to do that? I am looking for a well-established Fortran programming pattern.
It is relatively easy to implement such an algorithm in Python using a combination of the list data structure and numpy matrices, but it runs way too slowly.
Note I have assumed that all your matrices have elements of the same data type.
Here is a simplified (and, because of that, slightly old fashioned) example of what I would do
ian#eris:~/work/stack$ cat list_of_matrices.f90
Module numbers_module
Implicit None
Integer, Parameter, Public :: wp = Selected_real_kind( 12, 70 )
Private
End Module numbers_module
Module matrix_module
Use numbers_module, Only : wp
Implicit None
Type, Public :: matrix
Real( wp ), Dimension( :, : ), Allocatable, Public :: data
End type matrix
Public :: matrix_allocate
Public :: matrix_free
Public :: matrix_set_with_random
Public :: matrix_print
Private
Contains
Subroutine matrix_allocate( A, m, n )
Type( matrix ), Intent( Out ) :: A
Integer , Intent( In ) :: m
Integer , Intent( In ) :: n
Allocate( A%data( 1:m, 1:n ) )
End Subroutine matrix_allocate
Subroutine matrix_free( A )
Type( matrix ), Intent( InOut ) :: A
Deallocate( A%data )
End Subroutine matrix_free
Subroutine matrix_set_with_random( A )
Type( matrix ), Intent( InOut ) :: A
Call Random_number( A%data )
End Subroutine matrix_set_with_random
Subroutine matrix_print( A, format )
Type( matrix ) , Intent( In ) :: A
Character( Len = * ), Intent( In ) :: format
Integer :: i
Write( *, * ) 'The matrix has the shape: ', Shape( A%data )
Do i = 1, Size( A%data, Dim = 1 )
Write( *, format ) A%data( i, : )
End Do
End Subroutine matrix_print
End Module matrix_module
Program test_matrix
Use matrix_module, Only : matrix, matrix_allocate, matrix_free, &
matrix_set_with_random, matrix_print
Implicit None
Type( matrix ), Dimension( : ), Allocatable :: list_of_matrices
Integer :: n_mats
Integer :: n, m
Integer :: i_mat
Write( *, * ) 'How many matrices'
Read ( *, * ) n_mats
Allocate( list_of_matrices( 1:n_mats ) )
Do i_mat = 1, n_mats
Write( *, * ) 'Dimensions for matrix ', i_mat
Read ( *, * ) m, n
Call matrix_allocate( list_of_matrices( i_mat ), m, n )
Call matrix_set_with_random( list_of_matrices( i_mat ) )
End Do
Do i_mat = 1, n_mats
Write( *, * ) 'Data for matrix ', i_mat
Call matrix_print( list_of_matrices( i_mat ), '( 20( f5.2, 1x ) )' )
End Do
Do i_mat = n_mats, 1, -1
Call matrix_free( list_of_matrices( i_mat ) )
End Do
Deallocate( list_of_matrices )
End Program test_matrix
ian#eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -Wuse-without-only -Wsurprising -Wimplicit-interface -Werror -fcheck=all list_of_matrices.f90 -o list_of_matrices
ian#eris:~/work/stack$ ./list_of_matrices
How many matrices
3
Dimensions for matrix 1
2 1
Dimensions for matrix 2
4 3
Dimensions for matrix 3
5 6
Data for matrix 1
The matrix has the shape: 2 1
0.06
0.31
Data for matrix 2
The matrix has the shape: 4 3
0.02 0.63 0.08
0.26 0.84 0.75
0.85 0.67 0.34
0.85 0.91 0.33
Data for matrix 3
The matrix has the shape: 5 6
0.35 0.58 0.01 0.93 0.74 0.46
0.43 0.38 0.89 0.83 0.51 0.26
0.33 0.03 0.73 0.26 0.40 0.58
0.48 0.87 0.15 0.62 0.13 0.79
0.59 0.97 0.15 0.09 0.05 0.37
ian#eris:~/work/stack$
In practice I would have the contents of the derived type kept private and only accessible by the module procedures, and nowadays I would use type bound procedures within the matrix type, but for this I think that distracts from the point hence going the slightly older route. In production code I would also probably have a separate list_of_matrices type to hold the array of matrices, but it depends on exactly what you are doing.
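To give a flavour of the type bound, private data variant mentioned above, here is a minimal sketch. It is not taken from any production code: the module name, the binding names create, set_with_random and get_shape, and the fixed sizes in the test program are all made up for illustration.

```fortran
Module matrix_tbp_module
  Implicit None
  Private
  Integer, Parameter, Public :: wp = Selected_real_kind( 12, 70 )
  Type, Public :: matrix
     Private
     ! The data can only be reached through the bindings below
     Real( wp ), Dimension( :, : ), Allocatable :: data
   Contains
     Procedure, Public :: create          => matrix_create
     Procedure, Public :: set_with_random => matrix_set_with_random
     Procedure, Public :: get_shape       => matrix_get_shape
  End Type matrix
Contains
  Subroutine matrix_create( A, m, n )
    Class( matrix ), Intent( InOut ) :: A
    Integer        , Intent( In    ) :: m
    Integer        , Intent( In    ) :: n
    If( Allocated( A%data ) ) Deallocate( A%data )
    Allocate( A%data( 1:m, 1:n ) )
  End Subroutine matrix_create
  Subroutine matrix_set_with_random( A )
    Class( matrix ), Intent( InOut ) :: A
    Call Random_number( A%data )
  End Subroutine matrix_set_with_random
  Function matrix_get_shape( A ) Result( s )
    Class( matrix ), Intent( In ) :: A
    Integer, Dimension( 1:2 ) :: s
    s = Shape( A%data )
  End Function matrix_get_shape
End Module matrix_tbp_module

Program test_matrix_tbp
  Use matrix_tbp_module, Only : matrix
  Implicit None
  Type( matrix ), Dimension( : ), Allocatable :: list_of_matrices
  Integer :: i_mat
  Allocate( list_of_matrices( 1:3 ) )
  Do i_mat = 1, Size( list_of_matrices )
     ! Each matrix in the list can have a different shape
     Call list_of_matrices( i_mat )%create( i_mat + 1, 2 * i_mat )
     Call list_of_matrices( i_mat )%set_with_random()
  End Do
  Do i_mat = 1, Size( list_of_matrices )
     Write( *, * ) 'Matrix ', i_mat, ' has shape ', list_of_matrices( i_mat )%get_shape()
  End Do
  ! Deallocating the list also frees the allocatable component of each element
  Deallocate( list_of_matrices )
End Program test_matrix_tbp
```

Because the component is allocatable rather than a pointer, no explicit free routine is needed in this version; whether that is desirable depends on how much control over memory you want.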
In fact I am currently working on something which is essentially a much more complicated version of this: routines to perform linear algebra on lists of matrices, where those matrices may be either real or complex, and the data within them can be distributed across multiple processes. Having just berated somebody for asking us to download unknown files I feel somewhat guilty about this, but if you are interested you can find it on github:
git clone https://github.com/drijbush/dmat2.git
I am new to MPI programming with Fortran. I want to plot a 2D graph, and I am trying to have each process calculate one point of the graph and send it to the root, which writes it to a file. Can somebody tell me how to send two variables, namely x and f(x), with mpi_gather? Thanks for any help.
Just as an example of both what Hristo said and " Is there anything wrong with passing an unallocated array to a routine without an explicit interface? " here's how you might do it
Program gather
Use mpi
Implicit None
Integer, Dimension( :, : ), Allocatable :: result
Integer, Dimension( 1:2 ) :: buffer
Integer :: me, nprocs, error
Integer :: x, fx
Call mpi_init( error )
Call mpi_comm_rank( mpi_comm_world, me , error )
Call mpi_comm_size( mpi_comm_world, nprocs, error )
If( me == 0 ) Then
Allocate( result( 1:2, 1:nprocs ) ) !Naughty - should check stat
Else
Allocate( result( 1:0, 1:0 ) ) !Naughty - should check stat
End If
x = me
fx = x * x
buffer( 1 ) = x
buffer( 2 ) = fx
Call mpi_gather( buffer, 2, mpi_integer, &
result, 2, mpi_integer, &
0, mpi_comm_world, error )
If( me == 0 ) Then
Write( *, '( 99999( i3, 1x ) )' ) result( 1, : )
Write( *, '( 99999( i3, 1x ) )' ) result( 2, : )
End If
Call mpi_finalize( error )
End Program gather
Wot now? mpif90 gather.f90
Wot now? mpirun -np 7 ./a.out
0 1 2 3 4 5 6
0 1 4 9 16 25 36
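For plotting, x and f(x) will more likely be reals than integers. The same pattern carries over; below is a sketch of that variation, assuming double precision values, with a made-up spacing of 0.1 and Sin standing in for f. It needs an MPI installation to build and run, in the same way as the program above.

```fortran
Program gather_points
  Use mpi
  Implicit None
  Double Precision, Dimension( :, : ), Allocatable :: points
  Double Precision, Dimension( 1:2 ) :: buffer
  Integer :: me, nprocs, error
  Integer :: i
  Call mpi_init( error )
  Call mpi_comm_rank( mpi_comm_world, me    , error )
  Call mpi_comm_size( mpi_comm_world, nprocs, error )
  If( me == 0 ) Then
     Allocate( points( 1:2, 1:nprocs ) )   ! Only the root needs the full array
  Else
     Allocate( points( 1:0, 1:0 ) )        ! Zero sized, but allocated
  End If
  buffer( 1 ) = 0.1d0 * me                 ! x: hypothetical grid spacing
  buffer( 2 ) = Sin( buffer( 1 ) )         ! f(x): hypothetical function
  ! Each process sends its ( x, f(x) ) pair; the root receives one pair per column
  Call mpi_gather( buffer, 2, mpi_double_precision, &
                   points, 2, mpi_double_precision, &
                   0, mpi_comm_world, error )
  If( me == 0 ) Then
     Do i = 1, nprocs
        Write( *, '( 2( es14.6, 1x ) )' ) points( :, i )
     End Do
  End If
  Call mpi_finalize( error )
End Program gather_points
```

Writing one x, f(x) pair per line on the root gives a file that most plotting tools can read directly if the output is redirected.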