Hi, I am trying to compute an inner product in Fortran. I provide sample code below and explain the output I am getting and the output I expect. The code compiles with no errors, but the output is not what I am expecting. I think I am not coding the inner product properly. The code is below.
EDIT: I edited the code based on the help obtained in the comments below.
program inner_product
integer :: i,j
integer, parameter :: nx = 10, ny = 10
complex, dimension(-nx:nx,-ny:ny) :: A,v
real :: B
B = 0.0
do j = -ny+1,ny-1
do i = -nx+1,nx-1
A(i,j) = v(i+1,j)+v(i-1,j)+v(i,j+1)+v(i,j-1)-4*v(i,j)
B = B + conjg(A(i,j))*A(i,j) !computing the inner product
end do
end do
print *, 'Result of the inner product of A with itself', B
end program
Am I computing the inner product correctly now? Thanks.
Note: The trace of a matrix product is an inner product, e.g. the Frobenius inner product. It is just a generalization of the inner product to tensors of rank 2, and it acts identically to the product between rank 1 tensors.
Are you trying to compute the inner product of two matrices? Could you define that?
In any case, if you want to calculate the inner product of two vectors in Fortran, you could write
prod = sum( A * B )
Where, A and B are conformable arrays of a type for which multiplication is defined (real, complex, etc.), and prod is a variable of the same type.
If A and B are one-dimensional, this calculates their inner product. I don't know what it is called otherwise.
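For complex arrays the distinction between a plain sum of products and a conjugated one matters. A minimal sketch (the array values are made up for illustration) comparing sum(a*b), sum(conjg(a)*b) and the dot_product intrinsic, which conjugates its first argument when the arguments are complex:

```fortran
program dot_demo
  implicit none
  complex :: a(3), b(3)
  complex :: s_plain, s_conj, s_intrinsic

  ! Made-up sample vectors
  a = [(1.0, 2.0), (0.0, -1.0), (3.0, 0.0)]
  b = [(2.0, 0.0), (1.0, 1.0), (0.0, 4.0)]

  s_plain     = sum(a * b)           ! no conjugation
  s_conj      = sum(conjg(a) * b)    ! Hermitian inner product <a, b>
  s_intrinsic = dot_product(a, b)    ! intrinsic; conjugates a for complex input

  print *, 'sum(a*b)          =', s_plain
  print *, 'sum(conjg(a)*b)   =', s_conj
  print *, 'dot_product(a, b) =', s_intrinsic

  ! The conjugated sum and the intrinsic should agree
  if (abs(s_conj - s_intrinsic) > 1.0e-5) error stop 'mismatch'
end program dot_demo
```

For real arrays all three forms give the same number, so the choice only matters once the data is complex.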
EDIT
Based on the definition you provided ("Tr(A^\dagger A) = A_{ij}A^*_{ij} =Tr(AA^\dagger)"), you have got the bounds wrong. Put the inner product in a separate loop with
do i = -nx,nx
do j = -ny,ny
B = B + conjg(A(i,j))*A(i,j) !computing the inner product
end do
end do
Or use
B = sum( conjg(A)*A )
without a loop.
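As a rough check of that definition, here is a small sketch on a made-up 2x2 complex matrix. It computes the same quantity three ways: the conjugated element-wise sum, the sum of squared magnitudes, and the trace of A^\dagger A. The variable names are only illustrative.

```fortran
program frobenius_demo
  implicit none
  complex :: a(2,2), adag_a(2,2)
  real    :: b_sum, b_abs, b_trace

  ! Made-up sample matrix, filled row by row
  a(1,:) = [(1.0, 1.0), (2.0, 0.0)]
  a(2,:) = [(0.0, -1.0), (3.0, 2.0)]

  b_sum = real(sum(conjg(a) * a))          ! element-wise form; imaginary part is zero
  b_abs = sum(abs(a)**2)                   ! same thing via squared magnitudes

  adag_a  = matmul(conjg(transpose(a)), a) ! A^dagger A
  b_trace = real(adag_a(1,1) + adag_a(2,2))

  print *, 'sum(conjg(A)*A) =', b_sum
  print *, 'sum(abs(A)**2)  =', b_abs
  print *, 'Tr(A^dagger A)  =', b_trace

  if (abs(b_sum - b_trace) > 1.0e-4 .or. abs(b_sum - b_abs) > 1.0e-4) error stop 'mismatch'
end program frobenius_demo
```

All three values should agree up to rounding. Taking real(...) explicitly makes the complex-to-real conversion visible instead of leaving it to the assignment.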
My code below correctly solves a 1D heat equation for a function u(x,t). I now want to find the steady-state solution, the solution that no longer changes in time, so it should satisfy u(t+1)-u(t) = 0. What is the most efficient way to find the steady-state solution? I show three different attempts below, but I'm not sure whether any of them is actually doing what I want. The first and third have correct syntax; the second has a syntax error due to the if statement. The methods differ only in the structure of the if test.
Method 1 :
program parabolic1
integer, parameter :: n = 10, m = 20
real, parameter :: h = 0.1, k = 0.005 !step sizes
real, dimension (0:n) :: u,v
integer:: i,j
real::pi,pi2
u(0) = 0.0; v(0) = 0.0; u(n) = 0.0; v(n) =0.0
pi = 4.0*atan(1.0)
pi2 = pi*pi
do i=1, n-1
u(i) = sin( pi*real(i)*h)
end do
do j = 1,m
do i = 1, n-1
v(i) = 0.5*(u(i-1)+u(i+1))
end do
t = real(j)*k !increment in time, now check for steady-state
!steady-state check: this checks the solutions at every space point which I don't think is correct.
do i = 1,n-1
if ( v(i) - u(i) .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
end do
do i = 1, n-1 !updating solution
u(i) = v(i)
end do
end do
end program parabolic1
Method 2 :
program parabolic1
integer, parameter :: n = 10, m = 20
real, parameter :: h = 0.1, k = 0.005 !step sizes
real, dimension (0:n) :: u,v
integer:: i,j
real::pi,pi2
u(0) = 0.0; v(0) = 0.0; u(n) = 0.0; v(n) =0.0
pi = 4.0*atan(1.0)
pi2 = pi*pi
do i=1, n-1
u(i) = sin( pi*real(i)*h)
end do
do j = 1,m
do i = 1, n-1
v(i) = 0.5*(u(i-1)+u(i+1))
end do
t = real(j)*k !increment in time, now check for steady-state
!steady-state check: (This gives an error message since the if statement doesn't have a scalar logical expression, but I want to compare the full arrays v and u as shown.)
if ( v - u .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
do i = 1, n-1 !updating solution
u(i) = v(i)
end do
end do
end program parabolic1
Method 3 :
program parabolic1
integer, parameter :: n = 10, m = 20
real, parameter :: h = 0.1, k = 0.005 !step sizes
real, dimension (0:n) :: u,v
integer:: i,j
real::pi,pi2
u(0) = 0.0; v(0) = 0.0; u(n) = 0.0; v(n) =0.0
pi = 4.0*atan(1.0)
pi2 = pi*pi
do i=1, n-1
u(i) = sin( pi*real(i)*h)
end do
do j = 1,m
do i = 1, n-1
v(i) = 0.5*(u(i-1)+u(i+1))
end do
t = real(j)*k !increment in time, now check for steady-state
!steady-state check: Perhaps this is the correct expression I want to use
if( norm2(v) - norm2(u) .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
do i = 1, n-1 !updating solution
u(i) = v(i)
end do
end do
end program parabolic1
Without discussing which method to determine "closeness" is best or correct (not really being a programming problem) we can focus on what the Fortran parts of the methods are doing.
Method 1 and Method 2 are similar ideas (but broken in their execution), while Method 3 is different (and broken in another way).
Note also that in general one wants to compare the magnitude of the difference abs(v-u) rather than the (signed) difference v-u. With non-monotonic changes over iterations these are quite different.
Method 3 uses norm2(v) - norm2(u) to test whether the arrays u and v are similar. This isn't correct. Consider
norm2([1.,0.])-norm2([0.,1.])
instead of the more correct
norm2([1.,0.]-[0.,1.])
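A quick sketch that prints both expressions for these two vectors, plus the maximum absolute difference for comparison:

```fortran
program norm_demo
  implicit none
  real :: u(2), v(2)

  u = [1.0, 0.0]
  v = [0.0, 1.0]

  ! Both vectors have length 1, so the difference of norms vanishes
  print *, 'norm2(v) - norm2(u) =', norm2(v) - norm2(u)
  ! The norm of the difference is sqrt(2): the vectors are clearly not close
  print *, 'norm2(v - u)        =', norm2(v - u)
  ! Largest element-wise change, another common measure
  print *, 'maxval(abs(v - u))  =', maxval(abs(v - u))
end program norm_demo
```

The first expression reports zero even though every element differs, which is why it cannot be used as a convergence test.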
Method 2's
if ( v - u .LT. 1.0e-7 ) then
has the problem that v - u .LT. 1.0e-7 is an array-valued expression where if requires a scalar logical, but the "are all points close?" test can be written appropriately as
if ( ALL( v - u .LT. 1.0e-7 )) then
(You'll find other questions around here about such array reductions).
Method 1 tries something similar, but incorrectly:
do i = 1,n-1
if ( v(i) - u(i) .LT. 1.0e-7 ) then
print*, 'steady-state condition reached'
exit
end if
end do
This is incorrect in one big way, and one subtle way.
First, the loop is exited when the condition tests true the first time, with a message saying the steady state is reached. This is incorrect: you need all values close, while this is testing for any value close.
Second, when the condition is met, you exit. But you don't exit the time iteration loop, you exit the closeness testing loop (exit without a construct name leaves the innermost do construct). Execution then continues immediately after this innermost construct, exactly as it would if the condition were never met; the only difference is that the message is printed. You will need to use a construct name on the time loop.
I won't show how to do that (again there are other questions here about that), because you also need to fix the test condition, by which point you'll be better off using if(all(... (corrected Method 2) without that additional do construct.
For Methods 1 and 2 you'll have something like:
if (all(v-u .lt. 1e-7)) then
print *, "Converged"
exit
end if
And for Method 3:
if (norm2(v-u) .lt. 1e-7) then
print *, "Converged"
exit
end if
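Putting the pieces together, a sketch of the whole program might look like the following. It uses abs(v-u) as suggested in the note above, array sections instead of the inner loops, and a named time loop purely for readability. The iteration cap max_steps = 1000 and the tolerance are my assumptions; the original m = 20 is very likely too small for the difference to fall below 1e-7.

```fortran
program steady_state
  implicit none
  integer, parameter :: n = 10
  integer, parameter :: max_steps = 1000   ! assumed cap, not from the question
  real, parameter    :: h = 0.1, tol = 1.0e-7
  real    :: u(0:n), v(0:n), pi
  integer :: i, j
  logical :: converged

  pi = 4.0*atan(1.0)
  u = 0.0                                  ! also sets the boundary values u(0), u(n)
  v = 0.0
  do i = 1, n-1
    u(i) = sin(pi*real(i)*h)
  end do

  converged = .false.
  time_loop: do j = 1, max_steps
    ! Same update as in the question, written with array sections
    v(1:n-1) = 0.5*(u(0:n-2) + u(2:n))

    ! Converged only when every point changed by less than tol
    if (all(abs(v - u) < tol)) then
      converged = .true.
      exit time_loop
    end if

    u = v
  end do time_loop

  if (converged) then
    print *, 'steady-state condition reached at step', j
  else
    print *, 'no steady state within', max_steps, 'steps'
  end if
end program steady_state
```

Swapping the test for norm2(v - u) < tol gives the corrected Method 3; which measure is appropriate is a numerical question rather than a Fortran one.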
I would like to build a data structure for non tabular data. I am not sure what is the right way to do that in (modern) Fortran.
I have a data set of houses that includes their location (lat,lon) and price. I have another data of factories that include their location (lat,lon) and the amount of pollution they produce. For each house I need to create a list of factories which are within 5km radius of the house. Not just the number of these factories but the whole (lat,lon,pollution) vectors of these factories. Each house has a different number of factories close to it ranging from zero to about eighty.
MODULE someDefinitions
IMPLICIT NONE
INTEGER, PARAMETER :: N_houses=82390, N_factories=4215
TYPE house
REAL :: lat,lon,price
! a few more fields which are not important here
END TYPE
TYPE factory
REAL :: lat,lon,pollution
! a few more fields which are not important here
END TYPE
Contains
PURE FUNCTION haversine(deglat1,deglon1,deglat2,deglon2) RESULT (dist)
! Some code for computing haversine distance in meters
END FUNCTION haversine
END MODULE someDefinitions
PROGRAM createStructure
USE someDefinitions
IMPLICIT NONE
TYPE(factory), DIMENSION(N_factories) :: factories
TYPE(house), DIMENSION(N_houses) :: houses
INTEGER :: i,j
! more variables definitions as needed
! code to read houses data from the disk
! code to read factories data from the disk
DO i=1,N_houses
DO j=1,N_factories
!here I compute the distance between houses(i) and factories(j)
! If this distance<=5000 I want to add the index j to the list of indices
! associated with house i. How? What is the right data structure to do
! that? some houses have zero factories within 5000 meters from them.
! Some houses have about 80 factories around them. It's unbalanced.
END DO !j
END DO !i
END PROGRAM createStructure
The created structure will then be used in further calculations. A matrix of N_houses x N_factories is way too large to save in memory.
Note: I know Fortran 2008 if that is helpful in any way.
Using too many nested derived types can become tedious. Here is an example using 2D arrays for all data except the required list. This is similar to the K-Nearest Neighbors (KNN) algorithm naively implemented. There may be better algorithms, of course, but the following can be a good start.
program NoStructures
implicit none
! One allocatable table of (lat, lon, pollution) rows per house
type listi
real, allocatable :: item(:,:)
end type
integer, parameter :: N_houses=82390, N_factories=4215
real :: houses(N_houses,3)
real :: factories(N_factories,3)
real :: distance(N_factories)
type(listi) :: list(N_houses)
integer :: i, j, k, within5k
! Generating dummy data
call random_number(houses)
call random_number(factories)
houses = houses * 500000
factories = factories * 500000
do i = 1, N_houses
! Distance from house i to every factory (plain Euclidean for the dummy data)
distance = sqrt((houses(i,1)-factories(:,1))**2 + (houses(i,2)-factories(:,2))**2)
within5k = count( distance <= 5000 )
if (within5k > 0) then
! Allocate exactly as many rows as there are nearby factories
allocate(list(i)%item(within5k,3))
k = 0
do j = 1, N_factories
if (distance(j) <= 5000) then
k = k + 1
list(i)%item(k,:) = factories(j,:)
end if
end do
else
! Sentinel row marking "no factory within 5000"
list(i)%item = reshape([-1.0, -1.0, -1.0],[1,3])
end if
end do
do i=1,10
print *, list(i)%item
end do
end program NoStructures
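If you would rather store only the factory indices for each house, as the question asks, a variation on the same idea is a derived type holding an allocatable integer array. The sketch below uses tiny made-up sizes, random coordinates and a hypothetical cutoff radius, and relies on automatic allocation on assignment together with pack. The names index_list, nearby and radius are mine, not from the question.

```fortran
program neighbour_lists
  implicit none
  integer, parameter :: n_houses = 5, n_factories = 8   ! tiny sizes for illustration
  real, parameter    :: radius = 0.3                    ! hypothetical cutoff

  type index_list
    integer, allocatable :: idx(:)   ! indices of nearby factories; size may be 0
  end type index_list

  real :: house_xy(2, n_houses), factory_xy(2, n_factories)
  real :: dist(n_factories)
  type(index_list) :: nearby(n_houses)
  integer :: i, j

  ! Dummy coordinates in the unit square
  call random_number(house_xy)
  call random_number(factory_xy)

  do i = 1, n_houses
    dist = sqrt((factory_xy(1,:) - house_xy(1,i))**2 + (factory_xy(2,:) - house_xy(2,i))**2)
    ! pack keeps the indices j whose distance passes the test;
    ! the assignment allocates idx to exactly that length (possibly zero)
    nearby(i)%idx = pack([(j, j = 1, n_factories)], dist <= radius)
  end do

  do i = 1, n_houses
    print '(a,i0,a,i0,a,*(1x,i0))', 'house ', i, ': ', size(nearby(i)%idx), ' factories:', nearby(i)%idx
  end do
end program neighbour_lists
```

A house with no nearby factory simply gets a zero-length idx, so no sentinel row is needed, and the factory data can be looked up later as factories(nearby(i)%idx(k)).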
I am trying to calculate something similar to a weighted matrix inner product in Fortran. The current script that I am using for calculating the inner product is as follows
! --> In
real(kind=8), intent(in), dimension(ni, nj, nk, nVar) :: U1, U2
real(kind=8), intent(in), dimension(ni, nj, nk) :: intW
! --> Out
real(kind=8), intent(out) :: innerProd
! --> Local
integer :: ni, nj, nk, nVar, iVar
! --> Computing inner product
innerProd = 0.0d0 ! an intent(out) argument starts undefined, so initialize the accumulator
do iVar = 1, nVar
innerProd = innerProd + sum(U1(:,:,:,iVar)*U2(:,:,:,iVar)*intW)
enddo
But I found that the above script that I am currently using is not very efficient. The same operation can be performed in Python using NumPy as follows,
import numpy as np
import os
# --> Preventing numpy from multi-threading
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
innerProd = 0
# --> Toy matrices
U1 = np.random.random((ni,nj,nk,nVar))
U2 = np.random.random((ni,nj,nk,nVar))
intW = np.random.random((ni,nj,nk))
# --> Reshaping
U1 = np.reshape(np.ravel(U1), (ni*nj*nk, nVar))
U2 = np.reshape(np.ravel(U2), (ni*nj*nk, nVar))
intW = np.reshape(np.ravel(intW), (ni*nj*nk))
# --> Calculating inner product
for iVar in range(nVar):
innerProd = innerProd + np.dot(U1[:, iVar], U2[:, iVar]*intW)
The second method, using NumPy, seems to be far faster than the Fortran method. For a specific case of ni = nj = nk = nVar = 130, the times taken by the two methods are as follows
fortran_time = 25.8641 s
numpy_time = 6.8924 s
I tried improving my Fortran code with ddot from BLAS as follows,
do iVar = 1, nVar
do k = 1, nk
do j = 1, nj
innerProd = innerProd + ddot(ni, U1(:,j,k,iVar), 1, U2(:,j,k,iVar)*intW(:,j,k), 1)
enddo
enddo
enddo
But there was no considerable improvement in time. The time taken by the above method for the case of ni = nj = nk = nVar = 130 is ~24s. (I forgot to mention that I compiled the Fortran code with '-O2' option for optimizing the performance).
Unfortunately, there is no BLAS function for element-wise matrix multiplication in Fortran. And I don't want to use reshape in Fortran because, unlike in Python, reshaping in Fortran will copy my array to a new array, leading to more RAM usage.
Is there any way to speed up the performance in Fortran so as to get close to the performance of Numpy?
You may not be timing what you think you are timing. Here's a complete Fortran example
program test
use iso_fortran_env, r8 => real64
implicit none
integer, parameter :: ni = 130, nj = 130, nk = 130, nvar = 130
real(r8), allocatable :: u1(:,:,:,:), u2(:,:,:,:), w(:,:,:)
real(r8) :: sum, t0, t1
integer :: i,j,k,n
call cpu_time(t0)
allocate(u1(ni,nj,nk,nvar))
allocate(u2(ni,nj,nk,nvar))
allocate(w(ni,nj,nk))
call cpu_time(t1)
write(*,'("allocation time(s):",es15.5)') t1-t0
call cpu_time(t0)
call random_seed()
call random_number(u1)
call random_number(u2)
call random_number(w)
call cpu_time(t1)
write(*,'("random init time (s):",es15.5)') t1-t0
sum = 0.0_r8
call cpu_time(t0)
do n = 1, nvar
do k = 1, nk
do j = 1, nj
do i = 1, ni
sum = sum + u1(i,j,k,n)*u2(i,j,k,n)*w(i,j,k)
end do
end do
end do
end do
call cpu_time(t1)
write(*,'("Sum:",es15.5," time(s):",es15.5)') sum, t1-t0
end program
And the output:
$ gfortran -O2 -o inner_product inner_product.f90
$ time ./inner_product
allocation time(s): 3.00000E-05
random init time (s): 5.73293E+00
Sum: 3.57050E+07 time(s): 5.69066E-01
real 0m6.465s
user 0m4.634s
sys 0m1.798s
Computing the inner product is less than 10% of the runtime in this Fortran code. How and what you are timing is very important. Are you sure you are timing the same things in the Fortran and Python versions? Are you sure you are only timing the inner_product calculation?
This avoids making any copy. (Note that the BLAS ddot approach still needs to make a copy for the element-wise product.)
subroutine dot3(n,a,b,c,result)
implicit none
real(kind=..) a(*),b(*),c(*),result
integer i,n
result=0
do i=1,n
result=result+a(i)*b(i)*c(i)
enddo
end
dot3 is external, meaning it is not in a module or contains construct. The kind should obviously match the declaration in the main program.
in main code:
innerprod=0
do iVar = 1, nVar
call dot3(ni*nj*nk, U1(1,1,1,iVar),U2(1,1,1,iVar),intW,result)
innerProd=innerProd+result
enddo
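For reference, here is a self-contained sketch of the same idea that can be compiled and checked against the sum-based version. Unlike the external routine above, I have put dot3 in a module (my choice, to get an explicit interface); the assumed-size dummies a(*), b(*), c(*) still allow passing an array element such as u1(1,1,1,ivar) as the start of a contiguous sequence. The array sizes are small made-up values and real64 is assumed as the kind.

```fortran
module dot3_mod
  use iso_fortran_env, only: real64
  implicit none
contains
  ! Triple product sum over n contiguous elements, no temporaries
  subroutine dot3(n, a, b, c, res)
    integer, intent(in)       :: n
    real(real64), intent(in)  :: a(*), b(*), c(*)
    real(real64), intent(out) :: res
    integer :: i
    res = 0.0_real64
    do i = 1, n
      res = res + a(i)*b(i)*c(i)
    end do
  end subroutine dot3
end module dot3_mod

program dot3_demo
  use dot3_mod
  implicit none
  integer, parameter :: ni = 4, nj = 3, nk = 2, nvar = 5   ! small sizes for illustration
  real(real64) :: u1(ni,nj,nk,nvar), u2(ni,nj,nk,nvar), w(ni,nj,nk)
  real(real64) :: inner, ref, part
  integer :: ivar

  call random_number(u1)
  call random_number(u2)
  call random_number(w)

  inner = 0.0_real64
  ref   = 0.0_real64
  do ivar = 1, nvar
    ! Pass the first element of each 3D slab; dot3 walks ni*nj*nk elements from there
    call dot3(ni*nj*nk, u1(1,1,1,ivar), u2(1,1,1,ivar), w, part)
    inner = inner + part
    ! Reference value from the array-expression version in the question
    ref = ref + sum(u1(:,:,:,ivar)*u2(:,:,:,ivar)*w)
  end do

  print *, 'dot3 result:', inner, ' reference:', ref
  if (abs(inner - ref) > 1.0e-10_real64) error stop 'results differ'
end program dot3_demo
```

The two results can differ in the last few digits because the summation order is not the same; the tolerance in the check allows for that.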
I had the same observation comparing Numpy and Fortran code.
The difference turns out to be the version of BLAS. I found that using DGEMM from netlib is similar to looping and about three times slower than OpenBLAS (see profiles in this answer).
The most surprising thing for me was that OpenBLAS provides code which is so much faster than just compiling a Fortran triple nested loop. It seems this is the whole point of GotoBLAS, which was handwritten in assembly code for the processor architecture.
Even timing the right thing, ordering loops correctly, avoiding copies and using every optimising flag (in gfortran), the performance is still about three times slower than OpenBLAS. I've not tried ifort or pgi, but I wonder if this explains the upvoted comment by #kvantour "loop finishes in 0.6s for me" (note intrinsic matmul is replaced by BLAS in some implementations).
I have a solution to a discretized differential equation given by
f(i)
where i is a spatial index. How can I find the difference between the solution at each adjacent time step? To be more clear:
The solution is defined by an array
real,dimension(0:10) :: f
I discretize the differential equation and solve it by stepping forward in time. If the time index is k, a portion of my code is
do k=1,25
do i = 1,10
f(i) = f(i+1)+f(i-1)+f(i)
end do
end do
I can print the solution, f(i) at each time step k by the following code
print*, "print f(i) for k=", k
print "(//(5(5x,e22.14)))", f
How can I find the difference between the solution at each adjacent time step? That is, time steps k+1,k. I will store this value in a new array g, which has a dimension given by
real,dimension(0:10) :: g
So I am trying to find
!g(i)=abs(f(i;k+1)-f(i;k))...Not correct code.
How can I do this? What is the way to implement this code? I am not sure how to do this using if/then statements or whatever code would be needed. Thanks
Typically, in explicit time integration methods or iterative methods, you have to save the solution from the previous time step, the current time-step solution and possibly even some more.
So you have
real,dimension(0:10) :: f0, f
where f0 is the previous value
You iterate your Jacobi or Gauss-Seidel discretization:
f = f0
do k=1,25
do i = 1,9
f(i) = f(i+1)+f(i-1)+f(i)
end do
max_diff = maxval(abs(f-f0))
if (max_diff < tol) exit ! tol is a tolerance you choose, e.g. 1.0e-6
f0 = f
end do
If you have a time-evolving problem like a heat equation:
f = f0
do k=1,25
do i = 1,9
f(i) = f0(i) + dt * viscosity * (f0(i+1)+f0(i-1)+f0(i))
end do
max_diff = maxval(abs(f-f0))
f0 = f
end do
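Here is a complete sketch of the second variant that also fills the array g from the question with the point-wise change between consecutive time steps. Note that I use the standard second-difference stencil f0(i+1) - 2*f0(i) + f0(i-1) rather than the placeholder expression above. The coefficient r (standing for dt*diffusivity/dx**2) and the initial condition are assumptions made for illustration.

```fortran
program step_difference
  implicit none
  integer, parameter :: n = 10, nsteps = 25
  real, parameter    :: r = 0.25   ! assumed dt*diffusivity/dx**2; keep <= 0.5 for stability
  real    :: f(0:n), f0(0:n), g(0:n)
  integer :: i, k

  ! Made-up initial condition: a single spike in the middle, zero at both ends
  f0 = 0.0
  f0(n/2) = 1.0
  f = f0

  do k = 1, nsteps
    ! New values are computed only from the previous step stored in f0
    do i = 1, n-1
      f(i) = f0(i) + r*(f0(i+1) - 2.0*f0(i) + f0(i-1))
    end do

    ! g(i) = |f(i; k) - f(i; k-1)|, the change at every grid point
    g = abs(f - f0)
    print '(a,i3,a,es12.4)', 'step ', k, '  max |f_new - f_old| = ', maxval(g)

    ! The current solution becomes the previous one for the next step
    f0 = f
  end do
end program step_difference
```

Keeping f0 untouched during the inner loop is what makes the comparison meaningful; updating f in place would mix old and new values.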
You have a spatial mesh at each point in time. Transient problems require that you calculate the value at the end of a time step based on the values at the start:
f(i, j+1) = f(i, j) + f(dot)(i, j)*dt // Euler integration where f(dot) = df/dt derivative
i is the spatial index; j is the temporal one.
real*8, allocatable :: psi(:,:), h(:,:)
integer :: n
real*8 :: t, dt
complex*16 :: ci
write(*,*) ' number of grid points '
read(*,*) n
write(*,*) ' total time '
read(*,*) t
write(*,*) ' time step '
read(*,*) dt
ci = (0d0, 1d0)
allocate(psi(n,n), h(n,n))
do j = 0, t, dt
psi(n,j) = psi(n,j) - ci * dt * h *psi(n,j)
end do
I am basically trying to propagate a wave function in a bad way, but this project told me to propagate like this. So n is a constant and h is an n by n array.
Why does it keep telling me that the shapes of the arrays on the left and right sides do not conform in the do loop? How can I improve it to make the left and right sides conform?
Gfortran 5.2 emits a little more information in its error:
psi(n,j) = psi(n,j) - ci * dt * h *psi(n,j)
1
Error: Incompatible ranks 0 and 2 in assignment at (1)
The LHS of your assignment is rank 0, as psi(n,j) is a scalar. On the RHS both instances of psi(n,j) are rank 0 scalars, but h is a rank 2 array. You have:
scalar = scalar - scalar * scalar * rank 2 array * scalar
scalar = scalar - rank 2 array
scalar = rank 2 array
This is because scalars are promoted to arrays of the proper dimensions for the above operations so they act on all elements of h. To make the ranks equal on both sides you'll need to select a single element of h or perform some other operation on it that reduces it to a scalar.
It is also worth noting that it isn't clear whether your loop is correct. The value of n is fixed at the array bound the user entered, so the loop only ever touches row n of the array. Depending on the choices of t and dt, j will exceed the array bounds if t > n, and none of the row elements between multiples of dt will be changed. Seeing how few elements your calculation touches makes me suspect that it isn't right. Your initial value of j is also out of bounds for your arrays, because Fortran array indices begin at 1 unless the bounds are explicitly declared otherwise.
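One possible way to make the ranks conform, if what you intend is psi <- psi - i*dt*H*psi with psi a vector over the grid points, is to keep psi as a complex rank 1 array and reduce h with matmul. This is only a sketch under that assumption. The 2x2 matrix, the initial state and the time values are made up, and the loop uses an integer step counter instead of a real loop variable.

```fortran
program propagate
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2                      ! tiny grid for illustration
  complex(dp), parameter :: ci = (0.0_dp, 1.0_dp)
  real(dp)    :: h(n,n), t, dt
  complex(dp) :: psi(n)
  integer     :: step, nsteps

  ! Made-up symmetric matrix and initial state
  h   = reshape([1.0_dp, 0.5_dp, 0.5_dp, 2.0_dp], [n, n])
  psi = [(1.0_dp, 0.0_dp), (0.0_dp, 0.0_dp)]

  t  = 1.0_dp
  dt = 0.01_dp
  nsteps = nint(t/dt)                              ! integer number of time steps

  do step = 1, nsteps
    ! matmul(h, psi) is rank 1, so both sides of the assignment are rank 1
    psi = psi - ci*dt*matmul(h, psi)
  end do

  print *, 'psi after', nsteps, 'steps:', psi
  print *, 'norm:', sqrt(sum(abs(psi)**2))
end program propagate
```

Explicit Euler does not preserve the norm of psi, so the printed norm drifts away from 1. That is consistent with the "bad way" of propagating mentioned in the question and is not a bug in the sketch.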