I am doing some long-term simulations in which I'm trying to achieve the highest possible accuracy in the solution of systems of ODEs. I am trying to find out how much time quadruple (128-bit) precision calculations take versus double (64-bit) precision. I googled around a bit and saw several opinions about it: some say it will take 4 times longer, others 60-70 times... So I decided to take matters in my own hands and I wrote a simple Fortran benchmark program:
program QUAD_TEST
implicit none
integer,parameter :: dp = selected_int_kind(15)
integer,parameter :: qp = selected_int_kind(33)
integer :: cstart_dp,cend_dp,cstart_qp,cend_qp,crate
real :: time_dp,time_qp
real(dp) :: sum_dp,sqrt_dp,pi_dp,mone_dp,zero_dp
real(qp) :: sum_qp,sqrt_qp,pi_qp,mone_qp,zero_qp
integer :: i
! ==============================================================================
! == TEST 1. ELEMENTARY OPERATIONS ==
sum_dp = 1._dp
sum_qp = 1._qp
call SYSTEM_CLOCK(count_rate=crate)
write(*,*) 'Testing elementary operations...'
call SYSTEM_CLOCK(count=cstart_dp)
do i=1,50000000
sum_dp = sum_dp - 1._dp
sum_dp = sum_dp + 1._dp
sum_dp = sum_dp*2._dp
sum_dp = sum_dp/2._dp
end do
call SYSTEM_CLOCK(count=cend_dp)
time_dp = real(cend_dp - cstart_dp)/real(crate)
write(*,*) 'DP sum: ',sum_dp
write(*,*) 'DP time: ',time_dp,' seconds'
call SYSTEM_CLOCK(count=cstart_qp)
do i=1,50000000
sum_qp = sum_qp - 1._qp
sum_qp = sum_qp + 1._qp
sum_qp = sum_qp*2._qp
sum_qp = sum_qp/2._qp
end do
call SYSTEM_CLOCK(count=cend_qp)
time_qp = real(cend_qp - cstart_qp)/real(crate)
write(*,*) 'QP sum: ',sum_qp
write(*,*) 'QP time: ',time_qp,' seconds'
write(*,*)
write(*,*) 'DP is ',time_qp/time_dp,' times faster.'
write(*,*)
! == TEST 2. SQUARE ROOT ==
sqrt_dp = 2._dp
sqrt_qp = 2._qp
write(*,*) 'Testing square root ...'
call SYSTEM_CLOCK(count=cstart_dp)
do i = 1,10000000
sqrt_dp = sqrt(sqrt_dp)
sqrt_dp = 2._dp
end do
call SYSTEM_CLOCK(count=cend_dp)
time_dp = real(cend_dp - cstart_dp)/real(crate)
write(*,*) 'DP sqrt: ',sqrt_dp
write(*,*) 'DP time: ',time_dp,' seconds'
call SYSTEM_CLOCK(count=cstart_qp)
do i = 1,10000000
sqrt_qp = sqrt(sqrt_qp)
sqrt_qp = 2._qp
end do
call SYSTEM_CLOCK(count=cend_qp)
time_qp = real(cend_qp - cstart_qp)/real(crate)
write(*,*) 'QP sqrt: ',sqrt_qp
write(*,*) 'QP time: ',time_qp,' seconds'
write(*,*)
write(*,*) 'DP is ',time_qp/time_dp,' times faster.'
write(*,*)
! == TEST 3. TRIGONOMETRIC FUNCTIONS ==
pi_dp = acos(-1._dp); mone_dp = 1._dp; zero_dp = 0._dp
pi_qp = acos(-1._qp); mone_qp = 1._qp; zero_qp = 0._qp
write(*,*) 'Testing trigonometric functions ...'
call SYSTEM_CLOCK(count=cstart_dp)
do i = 1,10000000
mone_dp = cos(pi_dp)
zero_dp = sin(pi_dp)
end do
call SYSTEM_CLOCK(count=cend_dp)
time_dp = real(cend_dp - cstart_dp)/real(crate)
write(*,*) 'DP cos: ',mone_dp
write(*,*) 'DP sin: ',zero_dp
write(*,*) 'DP time: ',time_dp,' seconds'
call SYSTEM_CLOCK(count=cstart_qp)
do i = 1,10000000
mone_qp = cos(pi_qp)
zero_qp = sin(pi_qp)
end do
call SYSTEM_CLOCK(count=cend_qp)
time_qp = real(cend_qp - cstart_qp)/real(crate)
write(*,*) 'QP cos: ',mone_qp
write(*,*) 'QP sin: ',zero_qp
write(*,*) 'QP time: ',time_qp,' seconds'
write(*,*)
write(*,*) 'DP is ',time_qp/time_dp,' times faster.'
write(*,*)
end program QUAD_TEST
Results of a typical run, after compiling with gfortran 4.8.4, without any optimization flag:
Testing elementary operations...
DP sum: 1.0000000000000000
DP time: 0.572000027 seconds
QP sum: 1.00000000000000000000000000000000000
QP time: 4.32299995 seconds
DP is 7.55769205 times faster.
Testing square root ...
DP sqrt: 2.0000000000000000
DP time: 5.20000011E-02 seconds
QP sqrt: 2.00000000000000000000000000000000000
QP time: 2.60700011 seconds
DP is 50.1346169 times faster.
Testing trigonometric functions ...
DP cos: -1.0000000000000000
DP sin: 1.2246467991473532E-016
DP time: 2.79600000 seconds
QP cos: -1.00000000000000000000000000000000000
QP sin: 8.67181013012378102479704402604335225E-0035
QP time: 5.90199995 seconds
DP is 2.11087275 times faster.
Something must be going on here. My guess is that sqrt is computed with gfortran through an optimized algorithm, which probably hasn't been implemented for quadruple precision computations. This might not be the case for sin and cos, but why are elementary operations 7.6 times slower in quadruple precision while for trigonometric functions things slow down only by a factor of 2? If the algorithm used for the trigonometric functions would be the same for quad and double precision, I would expect to see a sevenfold increase in CPU time for them too.
What is the average slowdown in scientific calculations when 128-bit precision is used, compared to 64-bit?
I am running this on an Intel i7-4771 # 3.50GHz.
More an extended comment than an answer, but...
Current CPUs provide a lot of hardware acceleration for double precision floating point arithmetic. Some even provide facilities for extended precision.
Beyond that, you are limited to software implementations which are (as you noticed) considerably slower.
However, the exact factor of this slow-down is almost impossible to predict in generic cases.
It depends on your CPU (e.g. which acceleration it has built-in), and on the software stack.
For double-precision you typically use different math-libraries than for quadruple precision, and these might use different algorithms for basic operations.
For a particular operation/algorithm on a given hardware using the same algorithm you can probably derive a number, but this for sure will not be universally true.
It's interesting to note that if you change:
sqrt_qp = sqrt(sqrt_qp)
sqrt_qp = 2._qp
to
sqrt_qp = sqrt(2._qp)
the computation will be faster !
Related
I'm doing a program in Fortran90 which do a sum from i=1 to i=n where nis given. The sum is sum_{i=1}^{i=n}1/(i*(i+1)*(i+2)). This sum converges to 0.25. This is the code:
PROGRAM main
INTEGER n(4)
DOUBLE PRECISION s(4)
INTEGER i
OPEN(11,FILE='input')
OPEN(12,FILE='output')
DO i=1,4
READ(11,*) n(i)
END DO
PRINT*,n
CALL suma(n,s)
PRINT*, s
END
SUBROUTINE suma(n,s)
INTEGER n(4),j,k
DOUBLE PRECISION s(4),add
s=0
DO k=1,4
DO j=1,n(k)
add=1./(j*(j+1)*(j+2))
s(k)=s(k)+add
END DO
END DO
END SUBROUTINE
input
178
1586
18232
142705
The output file is now empty, I need to code it. I'm just printing the results, which are:
0.249984481688 0.249999400246 0.248687836759 0.247565846142
The problem comes with the variable add. When j is bigger, add turns negative, and the sum doesn't converge well. How can I fix it?
The problem is an integer overflow. 142705142706142707 is a number that is too large for a 4-byte integer.
What happens then is that the number overflows and loops back to negative numbers.
As #albert said in his comment, one solution is to convert it to double precision every step of the way: ((1.d0/j) / (j+1)) / (j+2). That way, it is calculating with floating point values.
Another option would be to use 8-byte integers:
integer, parameter :: int64 = selected_int_kind(17)
integer(kind=int64) :: j
You should be very careful with your calculations, though. Finer is not always better. I recommend that you look at how floating point arithmetic is performed by a computer, and what issues this can create. See for example here on wikipedia.
This is likely a better way to achieve what you want. I did remove the IO. The output from the program is
% gfortran -o z a.f90 && ./z
178 0.249984481688392
1586 0.249999801599584
18232 0.249999998496064
142705 0.249999999975453
program main
implicit none ! Never write a program without this statement
integer, parameter :: knd = kind(1.d0) ! double precision kind
integer n(4)
real(knd) s(4)
integer i
n = [178, 1586, 18232, 142705]
call suma(n, s)
do i = 1, 4
print '(I6,F18.15)', n(i), s(i)
end do
contains
!
! Recursively, sum a(j+1) = j * a(j) / (j + 1)
!
subroutine suma(n, s)
integer, intent(in) :: n(4)
real(knd), intent(out) :: s(4)
real(knd) aj
integer j, k
s = 0
do k = 1, 4
aj = 1 / real(1 * 2 * 3, knd) ! a(1)
do j = 1, n(k)
s(k) = s(k) + aj
aj = j * aj / (j + 3)
end do
end do
end subroutine
end program main
A small example serial code, which has the same structure as my code, is shown below.
PROGRAM MAIN
IMPLICIT NONE
INTEGER :: i, j
DOUBLE PRECISION :: en,ei,es
DOUBLE PRECISION :: ki(1000,2000), et(200),kn(2000)
OPEN(UNIT=3, FILE='output.dat', STATUS='UNKNOWN')
DO i = 1, 1000, 1
DO j = 1, 2000, 1
ki(i,j) = DBLE(i) + DBLE(j)
END DO
END DO
DO i = 1, 200, 1
en = 2.0d0/DBLE(200)*(i-1)-1.0d0
et(i) = en
es = 0.0d0
DO j = 1, 1000, 1
kn=ki(j,:)
CALL CAL(en,kn,ei)
es = es + ei
END DO
WRITE (UNIT=3, FMT=*) et(i), es
END DO
CLOSE(UNIT=3)
STOP
END PROGRAM MAIN
SUBROUTINE CAL (en,kn,ei)
IMPLICIT NONE
INTEGER :: i
DOUBLE PRECISION :: en, ei, gf,p
DOUBLE PRECISION :: kn(2000)
p = 3.14d0
ei = 0.0d0
DO i = 1, 2000, 1
gf = 1.0d0 / (en - kn(i) * p)
ei = ei + gf
END DO
RETURN
END SUBROUTINE CAL
I am running my code on the cluster, which has 32 CPUs on one node, and there are totally 250 GB memory shared by 32 CPUs on one node. I can use 32 nodes maximumly.
Every time when the inner Loop is done, there is one data to be collected. After all outer Loops are done, there are totally 200 data to be collected. If only the inner Loop is executed by one CPU, it would take more than 3 days (more than 72 hours).
I want to do the parallelization for both inner Loop and outer Loop respectively? Would anyone please suggest how to parallelize this code?
Can I use MPI technique for both inner Loop and outer Loop respectively? If so, how to differentiate different CPUs that execute different Loops (inner Loop and outer Loop)?
On the other hand, I saw someone mention the parallelization with hybrid MPI and OpenMP method. Can I use MPI technique for the outer Loop and OpenMP technique for the inner Loop? If so, how to collect one data to the CPU after every inner Loop is done each time and collect 200 data in total to CPU after all outer Loops are done. How to differentiate different CPUs that execute inner Loop and outer Loop respectively?
Alternatively, would anyone provide any other suggestion on parallelizing the code and enhance the efficiency? Thank you very much in advance.
As mentioned in the comments, a good answer will require more detailed question. However, at a first sight it seems that parallelizing the internal loop
DO j = 1, 1000, 1
kn=ki(j,:)
CALL CAL(en,kn,ei)
es = es + ei
END DO
should be enough to solve your problem, or at least it will be a good starter. First of all I guess that there is an error on the loop
DO i = 1, 1000, 1
DO j = 1, 2000, 1
ki(j,k) = DBLE(j) + DBLE(k)
END DO
END Do
since the k is set to 0 and and there is no cell with address corresponding to 0 (see your variable declaration). Also ki is declared ki(1000,2000) array while ki(j,i) is (2000,1000) array. Beside these error, I guess that ki should be calculated as
ki(i,j) = DBLE(j) + DBLE(i)
if true, I suggest you the following solution
PROGRAM MAIN
IMPLICIT NONE
INTEGER :: i, j, k,icr,icr0,icr1
DOUBLE PRECISION :: en,ei,es,timerRate
DOUBLE PRECISION :: ki(1000,2000), et(200),kn(2000)
INTEGER,PARAMETER:: nthreads=1
call system_clock(count_rate=icr)
timerRate=real(icr)
call system_clock(icr0)
call omp_set_num_threads(nthreads)
OPEN(UNIT=3, FILE='output.dat', STATUS='UNKNOWN')
DO i = 1, 1000, 1
DO j = 1, 2000, 1
ki(i,j) = DBLE(j) + DBLE(i)
END DO
END DO
DO i = 1, 200, 1
en = 2.0d0/DBLE(200)*(i-1)-1.0d0
et(i) = en
es = 0.0d0
!$OMP PARALLEL DO private(j,kn,ei) firstpribate(en) shared(ki) reduction(+:es)
DO j = 1, 1000, 1
kn=ki(j,:)
CALL CAL(en,kn,ei)
es = es + ei
END DO
!$OMP END PARALLEL DO
WRITE (UNIT=3, FMT=*) et(i), es
END DO
CLOSE(UNIT=3)
call system_clock(icr1)
write (*,*) (icr1-icr0)/timerRate ! return computing time
STOP
END PROGRAM MAIN
SUBROUTINE CAL (en,kn,ei)
IMPLICIT NONE
INTEGER :: i
DOUBLE PRECISION :: en, ei, gf,p
DOUBLE PRECISION :: kn(2000)
p = 3.14d0
ei = 0.0d0
DO i = 1, 2000, 1
gf = 1.0d0 / (en - kn(i) * p)
ei = ei + gf
END DO
RETURN
END SUBROUTINE CAL
I add some variables to check the computing time ;-).
This solution is computed in 5.14 s, for nthreads=1, and in 2.75 s, for nthreads=2. It does not divide the computing time by 2, but it seems to be a good deal for a first shot. Unfortunately, on this machine I have a core i3 proc. So I can't do better than nthreads=2. However, I wonder, how the code will behave with nthreads=16 ???
Please let me know
I hope that this helps you.
Finally, I warn about the choice of variables status (private, firstprivate and shared) that might be consider carefully in the real code.
I am trying to calculate something similar to a weighted matrix inner product in Fortran. The current script that I am using for calculating the inner product is as follows
! --> In
real(kind=8), intent(in), dimension(ni, nj, nk, nVar) :: U1, U2
real(kind=8), intent(in), dimension(ni, nj, nk) :: intW
! --> Out
real(kind=8), intent(out) :: innerProd
! --> Local
integer :: ni, nj, nk, nVar, iVar
! --> Computing inner product
do iVar = 1, nVar
innerProd = innerProd + sum(U1(:,:,:,iVar)*U2(:,:,:,iVar)*intW)
enddo
But I found that the above script that I am currently using is not very efficient. The same operation can be performed in Python using NumPy as follows,
import numpy as np
import os
# --> Preventing numpy from multi-threading
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
innerProd = 0
# --> Toy matrices
U1 = np.random.random((ni,nj,nk,nVar))
U2 = np.random.random((ni,nj,nk,nVar))
intW = np.random.random((ni,nj,nk))
# --> Reshaping
U1 = np.reshape(np.ravel(U1), (ni*nj*nk, nVar))
U2 = np.reshape(np.ravel(U1), (ni*nj*nk, nVar))
intW = np.reshape(np.ravel(intW), (ni*nj*nk))
# --> Calculating inner product
for iVar in range(nVar):
innerProd = innerProd + np.dot(U1[:, iVar], U2[:, iVar]*intW)
The second method using Numpy seems to be far more faster than the method using Fortran. For a specific case of ni = nj = nk = nVar = 130, the time taken by the two methods are as follows
fortran_time = 25.8641 s
numpy_time = 6.8924 s
I tried improving my Fortran code with ddot from BLAS as follows,
do iVar = 1, nVar
do k = 1, nk
do j = 1, nj
innerProd = innerProd + ddot(ni, U1(:,j,k,iVar), 1, U2(:,j,k,iVar)*intW(:,j,k), 1)
enddo
enddo
enddo
But there was no considerable improvement in time. The time taken by the above method for the case of ni = nj = nk = nVar = 130 is ~24s. (I forgot to mention that I compiled the Fortran code with '-O2' option for optimizing the performance).
Unfortunately, there is no BLAS function for element-wise matrix multiplication in Fortran. And I don't want to use reshape in Fortran because unlike python reshaping in Fortran will lead to copying my array to a new array leading to more RAM usage.
Is there any way to speed up the performance in Fortran so as to get close to the performance of Numpy?
You may not be timing what you think are timing. Here's a complete fortran example
program test
use iso_fortran_env, r8 => real64
implicit none
integer, parameter :: ni = 130, nj = 130, nk = 130, nvar = 130
real(r8), allocatable :: u1(:,:,:,:), u2(:,:,:,:), w(:,:,:)
real(r8) :: sum, t0, t1
integer :: i,j,k,n
call cpu_time(t0)
allocate(u1(ni,nj,nk,nvar))
allocate(u2(ni,nj,nk,nvar))
allocate(w(ni,nj,nk))
call cpu_time(t1)
write(*,'("allocation time(s):",es15.5)') t1-t0
call cpu_time(t0)
call random_seed()
call random_number(u1)
call random_number(u2)
call random_number(w)
call cpu_time(t1)
write(*,'("random init time (s):",es15.5)') t1-t0
sum = 0.0_r8
call cpu_time(t0)
do n = 1, nvar
do k = 1, nk
do j = 1, nj
do i = 1, ni
sum = sum + u1(i,j,k,n)*u2(i,j,k,n)*w(i,j,k)
end do
end do
end do
end do
call cpu_time(t1)
write(*,'("Sum:",es15.5," time(s):",es15.5)') sum, t1-t0
end program
And the output:
$ gfortran -O2 -o inner_product inner_product.f90
$ time ./inner_product
allocation time(s): 3.00000E-05
random init time (s): 5.73293E+00
Sum: 3.57050E+07 time(s): 5.69066E-01
real 0m6.465s
user 0m4.634s
sys 0m1.798s
Computing the inner product is less that 10% of the runtime in this fortran code. How/What you are timing is very important. Are you sure you are timing the same things in the fortran and python versions? Are you sure you are only timing the inner_product calculation?
This avoids making any copy. (note the blas ddot approach still needs to make a copy for the element-wise product)
subroutine dot3(n,a,b,c,result)
implicit none
real(kind=..) a(*),b(*),c(*),result
integer i,n
result=0
do i=1,n
result=result+a(i)*b(i)*c(i)
enddo
end
dot3 is external, meaning not in a module/contains construct. kind should obviously match main declaration.
in main code:
innerprod=0
do iVar = 1, nVar
call dot3(ni*nj*nk, U1(1,1,1,iVar),U2(1,1,1,iVar),intW,result)
innerProd=innerProd+result
enddo
I had the same observation comparing Numpy and Fortran code.
The difference turns out to be the version of BLAS, I found using DGEMM from netlib is similar to looping and about three times slower than OpenBLAS (see profiles in this answer).
The most surprising thing for me was that OpenBLAS provides code which is so much faster than just compiling a Fortran triple nested loop. It seems this is the whole point of GotoBLAS, which was handwritten in assembly code for the processor architecture.
Even timing the right thing, ordering loops correctly, avoiding copies and using every optimising flag (in gfortran), the performance is still about three times slower than OpenBLAS. I've not tried ifort or pgi, but I wonder if this explains the upvoted comment by #kvantour "loop finishes in 0.6s for me" (note intrinsic matmul is replaced by BLAS in some implementations).
I need make a dot product in Fortran. I can do with the intrinsic function dot_product from Fortran or use ddot from OpenBLAS. The problem is the ddot is slower. This is my code:
With BLAS:
program VectorBLAS
! time VectorBlas.e = 0.30s
implicit none
double precision, dimension(3) :: b
double precision :: result
double precision, external :: ddot
integer, parameter :: LargeInt_K = selected_int_kind (18)
integer (kind=LargeInt_K) :: I
DO I = 1, 10000000
b(:) = 3
result = ddot(3, b, 1, b, 1)
END DO
end program VectorBLAS
With dot_product
program VectorModule
! time VectorModule.e = 0.19s
implicit none
double precision, dimension (3) :: b
double precision :: result
integer, parameter :: LargeInt_K = selected_int_kind (18)
integer (kind=LargeInt_K) :: I
DO I = 1, 10000000
b(:) = 3
result = dot_product(b, b)
END DO
end program VectorModule
The two codes are compiled using:
gfortran file_name.f90 -lblas -o file_name.e
What am I doing wrong? BLAS not have to be faster?
While BLAS, and especially the optimized versions, are generally faster for larger arrays, the built-in functions are faster for smaller sizes.
This is especially visible from the linked source code of ddot, where additional work is spent on further functionality (e.g., different increments). For small array lengths, the work done here outweighs the performance gain of the optimizations.
If you make your vectors (much) larger, the optimized version should be faster.
Here is an example to illustrate this:
program test
use, intrinsic :: ISO_Fortran_env, only: REAL64
implicit none
integer :: t1, t2, rate, ttot1, ttot2, i
real(REAL64), allocatable :: a(:),b(:),c(:)
real(REAL64), external :: ddot
allocate( a(100000), b(100000), c(100000) )
call system_clock(count_rate=rate)
ttot1 = 0 ; ttot2 = 0
do i=1,1000
call random_number(a)
call random_number(b)
call system_clock(t1)
c = dot_product(a,b)
call system_clock(t2)
ttot1 = ttot1 + t2 - t1
call system_clock(t1)
c = ddot(100000,a,1,b,1)
call system_clock(t2)
ttot2 = ttot2 + t2 - t1
enddo
print *,'dot_product: ', real(ttot1)/real(rate)
print *,'BLAS, ddot: ', real(ttot2)/real(rate)
end program
The BLAS routines are quite a bit faster here:
OMP_NUM_THREADS=1 ./a.out
dot_product: 0.145999998
BLAS, ddot: 0.100000001
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Improve this question
I've been working for many days trying to find out what's wrong with this code. It's used for modelling water flow through non saturated soil. The equations system is in the form of a tridiagonal matrix, which is solved with the Thomas Algorithm. I have the solution, and the code is not representing it. For example, node A should be a curve that goes from the initial condition of aprox -100 cm to aprox -20 cm. It's a long code, but I'd be tremendously thankful if someone helped me in this one.
program EcuacionRichards
implicit none
!Declaring variables
integer, parameter :: nodos = 100
integer :: i, it, max_it, nodo_a, nodo_b, nodo_c, nodo_d, it_bajo, it_alto
double precision, dimension(1:nodos) :: H, H_ant, C, K, theta, theta_ant, aa, bb, cc, dd, rr, th_ant
double precision :: dz, zbot, tfin, dt, rz, Ksup, Kinf, t, th_lisimetro, h_lisimetro
double precision :: q_ent, tol_h, tol_th, cambio_h, cambio_th
double precision :: mult_alto, mult_bajo, maxdt, mindt, qlibre
logical lisimetro
!Hydraulic Parameters
double precision :: theta_sat=0.43 !cm/cm
double precision :: theta_res=0.078 !cm/cm
double precision :: alpha=0.0325 !1/cm
double precision :: n=1.346
double precision :: m
double precision :: K_sat=86.4 !cm/d
!Grid and iteration parameters
lisimetro=.true.
dt=0.01 !days
zbot=160 !depth of the column in cm
dz=zbot/nodos !cm
tfin=30 !days
max_it=500 !max number of Picard iterations
tol_h=0.1 !tolerance for H iteration, cm
tol_th=0.001 !tolerance for theta iteration, 1/1
it_bajo=3 !minimum recommended number of iterations
it_alto=7 !maximum recommended number of iterations
mult_bajo=1.3 !time multiplicator for low iterations
mult_alto=0.7 !time multiplicator for low iterations
maxdt=0.5 !max value for dt
mindt=0.001 !min value for dt
m=1-1/n
!Initializing other variables
th_lisimetro=0.32
h_lisimetro=HfTH(th_lisimetro)
nodo_a=nodos
nodo_b=2*nodos/3
nodo_c=nodos/3
nodo_d=1
!*********Initial Conditions************************************************************
call theta_ini(theta,nodos) !Fill array with initial moisture values
do i=1,nodos
H(i)=HfTH(theta(i))
call actualiza(H(i), theta(i), C(i), K(i))
end do
!************* OPEN WRITING FILES ************************************************
open(unit=1,file='succion2.txt')
open(unit=2,file='humedad2.txt')
open(unit=3,file='conducti2.txt')
open(unit=4,file='parametr2.txt')
write(4,'("dt(días) =",f7.4)') dt
write(4,'("dz(cm) =",f7.4)') dz
write(4,'("nodos =",i5)') nodos
write(4,'("altura(cm) =",f8.3)') zbot
write(4,'("tfin(días) =",f7.2)') tfin
write(4,'("theta_sat =",f7.4)') theta_sat
write(4,'("theta_res =",f7.4)') theta_res
write(4,'("K_saturada =",g11.3)') K_sat
write(4,'("n =",f7.4)') n
write(4,'("m =",f7.5)') m
write(4,'("alpha =",f7.5)') alpha
write(4,'("max_it =",i4)') max_it
close(4)
write(1,*) "T(días) H_a(cm) H_b(cm) H_c(cm) H_d(cm)"
write(2,*) "T(días) th_a(cm) th_b(cm) th_c(cm) th_d(cm)"
write(3,*) "T(días) K_a(cm/d) K_b(cm/d) K_c(cm/d) K_d(cm/d)"
!*************TIME LOOP**********************************************************************************************
t=0.d0
do while ((t.le.tfin).and.(dt.gt.0))
rz=dz/dt
t=t+dt
theta_ant=theta !Previous time
!Water flow that enters at the top (constant)
q_ent=0.1 !cm/dia
!************* PICARD LOOP ******************************************
Picard:do it=1,max_it
if(it.eq.max_it) pause "MAXIMUM ITERATIONS REACHED"
!Interior Nodes
do i=2, nodos-1
Ksup=2*(K(i+1)*K(i))/(K(i+1)+K(i))
Kinf=2*(K(i-1)*K(i))/(K(i-1)+K(i))
aa(i)=-Kinf/dz !K(i-1/2)
cc(i)=-Ksup/dz !K(i+1/2)
bb(i)=rz*C(i)-aa(i)-cc(i)
rr(i)=rz*C(i)*h(i)-rz*(theta(i)-theta_ant(i))+Ksup-Kinf
end do
!Inferior Node
if (lisimetro) then
!Changing inferior node
if (theta(1).lt.th_lisimetro) then
!Water flow 0, Neumann
Ksup=2*(K(1)*K(2))/(K(1)+K(2))
aa(1)=0
cc(1)=-Ksup/dz
bb(1)=-cc(1)
rr(1)=Ksup
else
!H(1)=0 condition, Dirichlet
Ksup=2*(K(1)*K(2))/(K(1)+K(2))
aa(1)=0
bb(1)=1
cc(1)=0
rr(1)=h_lisimetro
aa(2)=0
rr(2)=rr(2)+Ksup/dz*(h_lisimetro)
end if
else
!Inferior node, free drainage, Neumann
Ksup=2*(K(1)*K(2))/(K(1)+K(2))
qlibre=-K(1)
aa(1)=0
cc(1)=-Ksup/dz
bb(1)=-cc(1)
rr(1)=Ksup+qlibre
end if
!Superior node, known water flow
Kinf=2*(K(nodos)*K(nodos-1))/(K(nodos)+K(nodos-1))
aa(nodos)=-Kinf/dz
cc(nodos)=0
bb(nodos)=0.5*rz*C(nodos)-aa(nodos)
rr(nodos)=0.5*rz*C(nodos)*h(nodos)-0.5*rz*(theta(nodos)-theta_ant(nodos))-Kinf-q_ent
call tridiag(aa,bb,cc,rr,dd,nodos)
!Suction modification and H functions actualization
h_ant=h
th_ant=theta !Save iteration
h=dd !Advance to next iteration
do i=1,nodos
call actualiza(H(i),theta(i), C(i), K(i))
end do
!End of iterations condition
cambio_h=maxval(dabs(h-h_ant))
cambio_th=maxval(dabs(theta-th_ant))
if((cambio_h.lt.tol_h).and.(cambio_th.lt.tol_th)) then
if(.true.) then !(t.eq.tprint)
write (1,'(f8.3,f9.3,f9.3,f9.3,f9.3)') t,H(nodo_a),H(nodo_b),H(nodo_c),H(nodo_d)
write (2,'(f8.3,f7.4,f7.4,f7.4,f7.4)') t,theta(nodo_a),theta(nodo_b),theta(nodo_c),theta(nodo_d)
write (3,'(f8.3,g11.4,g11.4,g11.4,g11.4)') t,k(nodo_a),k(nodo_b),k(nodo_c),k(nodo_d)
end if
if (it.lt.it_bajo) dt=min(dt*mult_bajo,maxdt)
if (it.gt.it_alto) dt=max(dt*mult_alto,mindt)
exit Picard
else
cycle Picard
end if
end do Picard !Picard loop end
if ((tfin-t).le.1E-4) t=huge(1.d0)
end do
!Time Loop End***************************************************************
!******** Close files
close(1)
close(2)
close(3)
!********END OF PROGRAM**********************************************************
!******************************************************************************
!Subroutines and functions
contains
!Initial moistures assignment
subroutine theta_ini(theta,nodos)
integer :: nodos
double precision, dimension(1:nodos) :: theta
integer i
do i=1, nodos
theta(i)=0.30
end do
end subroutine theta_ini
!Subroutine that actualizes salues according to pressure
subroutine actualiza(p,theta,c,k)
double precision p, theta, c, k
double precision se, te
if(p.lt.0) then
te=1+(-alpha*p)**n
se=te**(-m)
theta=theta_res+(theta_sat-theta_res)*se
K=K_sat*se**(0.5)*(1-(1-se**(1/m))**m)**2
c=((alpha**n)*(theta_sat-theta_res)*n*m*(-p)**(n-1))/(te**(m+1)) !d(theta)/dh
else
theta=theta_sat
K=K_sat
c=0
end if
return
end subroutine actualiza
!Tridiag(alpha,beta, gamma, Resto, delta, nodos)
subroutine tridiag(a,b,c,d,x,n)
implicit none
! a - sub-diagonal (means it is the diagonal below the main diagonal)
! b - the main diagonal
! c - sup-diagonal (means it is the diagonal above the main diagonal)
! d - right part
! x - the answer
! n - number of equations
integer,intent(in) :: n
double precision,dimension(n),intent(in) :: a,b,c,d
double precision,dimension(n),intent(out) :: x
double precision,dimension(n) :: cp,dp
double precision :: m
integer i
! initialize c-prime and d-prime
cp(1) = c(1)/b(1)
dp(1) = d(1)/b(1)
! solve for vectors c-prime and d-prime
do i = 2,n
m = b(i)-cp(i-1)*a(i)
cp(i) = c(i)/m
dp(i) = (d(i)-dp(i-1)*a(i))/m
enddo
! initialize x
x(n) = dp(n)
! solve for x from the vectors c-prime and d-prime
do i = n-1, 1, -1
x(i) = dp(i)-cp(i)*x(i+1)
end do
end subroutine tridiag
!Head in terms of moisture
Function HfTH(humedad)
double precision HfTH
double precision humedad
if (humedad.lt.theta_sat) then
HfTH=-1/alpha*(((humedad-theta_res)/(theta_sat-theta_res))**(-1/m)-1)**(1/n) !cm
else
HfTH=0
end if
Return
end function HfTH
end program EcuacionRichards
I can see any number of problems with your code but my attention span is limited so here is just the most egregious
You declare a bunch of variables to be double precision, for example, theta_sat, yet you initialise them with literals of default kind. The statement
double precision :: theta_sat=0.43 !cm/cm
does not make 0.43 a double precision real. Well, to be accurate, it might but on most compilers, and whenever the compilation does not set default real variables to kind double precision, it doesn't. It is almost certain that 0.43 is a 4-byte real while theta_sat is an 8-byte real and you cannot rely on the compiler to set theta_sat to be the 8-byte value closest to 0.43.
In modern Fortran double precision is still available for backward compatibility, but deprecated in favour of specifying the kind of a variable with a kind type. SO is replete with suggestions of how to do this. My favourite is to use the constants defined in the intrinsic module iso_fortran_env, like this:
use, intrinsic :: iso_fortran_env
then declare variables like this:
real(real64) :: theta_sat=0.43_real64 !cm/cm
note the appending of the kind specification _real64 to the value.
Whether your algorithm is sensitive enough that this mistake on your part materially affects the results I don't know.
Finally, you tell us that the program is not correct but you are silent on the way(s) in which it is not correct.