Very small number turns negative in Fortran

I'm writing a program in Fortran 90 that computes a sum from i=1 to i=n, where n is given. The sum is $\sum_{i=1}^{n} \frac{1}{i(i+1)(i+2)}$, which converges to 0.25. This is the code:
PROGRAM main
INTEGER n(4)
DOUBLE PRECISION s(4)
INTEGER i
OPEN(11,FILE='input')
OPEN(12,FILE='output')
DO i=1,4
READ(11,*) n(i)
END DO
PRINT*,n
CALL suma(n,s)
PRINT*, s
END
SUBROUTINE suma(n,s)
INTEGER n(4),j,k
DOUBLE PRECISION s(4),add
s=0
DO k=1,4
DO j=1,n(k)
add=1./(j*(j+1)*(j+2))
s(k)=s(k)+add
END DO
END DO
END SUBROUTINE
Contents of the file input:
178
1586
18232
142705
The output file is empty for now (I still need to write that part); I'm just printing the results, which are:
0.249984481688 0.249999400246 0.248687836759 0.247565846142
The problem is the variable add: when j gets large, add turns negative, and the sum no longer converges properly. How can I fix it?

The problem is integer overflow. 142705*142706*142707 (about 2.9e15) is far too large for a 4-byte integer, whose maximum is huge(0) = 2147483647.
When that happens, the product typically wraps around to a negative number (strictly speaking, integer overflow is not permitted by the Fortran standard, so the behavior is processor dependent).
As @albert said in his comment, one solution is to do the arithmetic in double precision at every step: ((1.d0/j) / (j+1)) / (j+2). That way the calculation uses floating-point values and no integer product is ever formed.
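A minimal sketch of the double-precision approach, written as a module so it can be checked on its own. The module name series_dp and the function names partial_sum and closed_form are illustrative assumptions, not part of the question's code. The exact value 1/4 - 1/(2(n+1)(n+2)) is used only as a reference; it holds because each term equals half of 1/(j(j+1)) - 1/((j+1)(j+2)), so the sum telescopes.

```fortran
module series_dp
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Sum 1/(j*(j+1)*(j+2)) for j = 1..n, dividing step by step in
  ! double precision so that no integer product is ever formed.
  function partial_sum(n) result(s)
    integer, intent(in) :: n
    real(dp) :: s
    integer :: j
    s = 0.0_dp
    do j = 1, n
      s = s + ((1.0_dp / j) / (j + 1)) / (j + 2)
    end do
  end function partial_sum

  ! Exact value of the same partial sum, used as a reference:
  ! 1/4 - 1/(2*(n+1)*(n+2)), with the product formed in real arithmetic.
  function closed_form(n) result(s)
    integer, intent(in) :: n
    real(dp) :: s
    s = 0.25_dp - 0.5_dp / (real(n + 1, dp) * real(n + 2, dp))
  end function closed_form
end module series_dp
```

Comparing partial_sum against closed_form for the four n values of the question is a quick way to confirm that the overflow is gone.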
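Another option would be to use 8-byte integers. Also write the numerator as 1.d0: the literal 1. is a default (single-precision) real, so 1./(...) is evaluated in single precision even though add is double precision.
integer, parameter :: int64 = selected_int_kind(17) ! at least 17 decimal digits
integer(kind=int64) :: j
add = 1.d0/(j*(j+1)*(j+2)) ! the product is now formed in 64-bit arithmetic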
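A sketch of the 8-byte-integer route, assuming the int64 kind from the intrinsic module iso_fortran_env (in practice the same kind as selected_int_kind(17) above). first_overflow_j is a hypothetical helper that forms the product in 64-bit arithmetic and reports the first j for which j*(j+1)*(j+2) exceeds huge(0_int32) = 2147483647. That happens at j = 1290, so with 4-byte integers every n of 1290 or more is affected, which is consistent with the n = 1586 result in the question already being off.

```fortran
module overflow_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64, real64
  implicit none
contains
  ! Smallest j for which j*(j+1)*(j+2) no longer fits in a 32-bit
  ! integer. The product is formed in 64-bit arithmetic, so the test
  ! itself cannot overflow.
  function first_overflow_j() result(j)
    integer(int64) :: j
    j = 1
    do while (j * (j + 1) * (j + 2) <= int(huge(0_int32), int64))
      j = j + 1
    end do
  end function first_overflow_j

  ! The question's series with a 64-bit loop index. For n up to 142705
  ! the product stays below 2**53, so converting it to real64 is exact.
  ! The numerator is a real64 literal, so the division is not done in
  ! single precision.
  function partial_sum_int64(n) result(s)
    integer(int64), intent(in) :: n
    real(real64) :: s
    integer(int64) :: j
    s = 0.0_real64
    do j = 1, n
      s = s + 1.0_real64 / real(j * (j + 1) * (j + 2), real64)
    end do
  end function partial_sum_int64
end module overflow_demo
```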
You should be very careful with your calculations, though: more terms or more arithmetic do not automatically give a more accurate result. I recommend reading about how a computer performs floating-point arithmetic and what problems this can cause. See, for example, Wikipedia.

This is likely a better way to achieve what you want. I removed the file I/O. The output from the program is:
% gfortran -o z a.f90 && ./z
178 0.249984481688392
1586 0.249999801599584
18232 0.249999998496064
142705 0.249999999975453
program main
implicit none ! Never write a program without this statement
integer, parameter :: knd = kind(1.d0) ! double precision kind
integer n(4)
real(knd) s(4)
integer i
n = [178, 1586, 18232, 142705]
call suma(n, s)
do i = 1, 4
print '(I6,F18.15)', n(i), s(i)
end do
contains
!
! Recursively, with a(j) = 1/(j*(j+1)*(j+2)), sum a(j+1) = j * a(j) / (j + 3)
!
subroutine suma(n, s)
integer, intent(in) :: n(4)
real(knd), intent(out) :: s(4)
real(knd) aj
integer j, k
s = 0
do k = 1, 4
aj = 1 / real(1 * 2 * 3, knd) ! a(1)
do j = 1, n(k)
s(k) = s(k) + aj
aj = j * aj / (j + 3)
end do
end do
end subroutine
end program main
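The recurrence used in suma and the limit 0.25 both follow from the form of the terms; a short derivation:

$$a_j=\frac{1}{j(j+1)(j+2)},\qquad \frac{a_{j+1}}{a_j}=\frac{j(j+1)(j+2)}{(j+1)(j+2)(j+3)}=\frac{j}{j+3}$$

$$a_j=\frac{1}{2}\left(\frac{1}{j(j+1)}-\frac{1}{(j+1)(j+2)}\right)\quad\Rightarrow\quad\sum_{j=1}^{n}a_j=\frac{1}{4}-\frac{1}{2(n+1)(n+2)}\;\xrightarrow[n\to\infty]{}\;\frac{1}{4}$$

Starting from a(1) = 1/6, the loop therefore never forms the product j(j+1)(j+2), so it cannot overflow.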


Need to find when value is == 0, but I cannot due to numerical errors

I have two large lists of vectors (more than 10,000 vectors each, say vi and wi), and I am trying to find where the cross product vi x wi = 0.
The lists of vectors are computed beforehand (this is computational fluid dynamics, and the vectors represent properties of a fluid; I am doing research in vortex identification, and this calculation is necessary).
I am trying to find where the cross product == 0, but only 3 of the thousands of vector pairs pass the test. We are trying to automate a method that is done by hand, so we know for a fact that there should be more than 3.
We assume that, because the vectors are calculated with basic (low-order) numerical methods, errors build up.
TL;DR: in essence, this does not work because of numerical errors:
real :: cross1, cross2, cross3
logical :: check1, check2, check3
logical :: is_seed
check1 = cross1 == 0.0
check2 = cross2 == 0.0
check3 = cross3 == 0.0
is_seed = check1 .and. check2 .and. check3
So we have to do something like this instead:
real :: cross1, cross2, cross3
real :: tol
logical :: check1, check2, check3
logical :: is_seed
tol = 1.0e-4 ! NEED TO FIND OUT HOW TO CALCULATE
check1 = abs(cross1) <= tol ! abs() so that large negative values are not accepted
check2 = abs(cross2) <= tol
check3 = abs(cross3) <= tol
is_seed = check1 .and. check2 .and. check3
But I want to calculate tol automatically instead of hard-coding it. How can this be done?
Edit 1
As pointed out in the comments, the functions below are equivalent to the intrinsic function spacing(x) for positive x (at exact powers of two, rounding in the logarithm can make them return half of spacing(x)).
Edit 2
Use the following function ulp(x) to find the value of the least significant bit of the mantissa of a positive IEEE 754 number x:
32-bit
elemental function ulp32(x) result(d)
use, intrinsic :: iso_fortran_env, only: real32
real(real32), intent(in) :: x
real(real32) :: d
d = 2.0_real32**(-floor(-log(x)/log(2.0_real32))-24)
end function
64-bit
elemental function ulp64(x) result(d)
use, intrinsic :: iso_fortran_env, only: real64
real(real64), intent(in) :: x
real(real64) :: d
d = 2d0**(-floor(-log(x)/log(2d0))-53)
end function
Generic interface (place it and the two functions in a module):
interface ulp
procedure :: ulp32, ulp64
end interface
Some results for values between 1 and 1e9:
x ulp32(x) ulp64(x)
517.54 0.00006104 0.00000000000011369
1018.45 0.00006104 0.00000000000011369
1972.33 0.00012207 0.00000000000022737
5416.69 0.00048828 0.00000000000090949
11812.67 0.00097656 0.00000000000181899
13190.24 0.00097656 0.00000000000181899
18099.97 0.00195312 0.00000000000363798
28733.47 0.00195312 0.00000000000363798
86965.21 0.00781250 0.00000000001455192
135734.23 0.01562500 0.00000000002910383
203975.41 0.01562500 0.00000000002910383
780835.66 0.06250000 0.00000000011641532
2343924.58 0.25000000 0.00000000046566129
2552437.80 0.25000000 0.00000000046566129
6923904.28 0.50000000 0.00000000093132257
8929837.66 1.00000000 0.00000000186264515
29408286.38 2.00000000 0.00000000372529030
70054595.74 8.00000000 0.00000001490116119
231986024.46 16.00000000 0.00000002980232239
392724963.99 32.00000000 0.00000005960464478
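A sketch that packages ulp32, ulp64 and the generic ulp into one module so the fragments above compile together; the module name ulp_mod is an assumption. The accompanying check compares ulp with the intrinsic spacing for a few values from the table, none of which is an exact power of two. As written, the functions assume x > 0, because the logarithm of zero or a negative number is invalid.

```fortran
module ulp_mod
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  private
  public :: ulp

  ! Generic name that dispatches on the kind of x.
  interface ulp
    module procedure ulp32, ulp64
  end interface ulp

contains

  ! Value of the last mantissa bit of a positive real32 x
  ! (24-bit significand).
  elemental function ulp32(x) result(d)
    real(real32), intent(in) :: x
    real(real32) :: d
    d = 2.0_real32**(-floor(-log(x) / log(2.0_real32)) - 24)
  end function ulp32

  ! Value of the last mantissa bit of a positive real64 x
  ! (53-bit significand).
  elemental function ulp64(x) result(d)
    real(real64), intent(in) :: x
    real(real64) :: d
    d = 2.0_real64**(-floor(-log(x) / log(2.0_real64)) - 53)
  end function ulp64

end module ulp_mod
```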
It is recommended to pick a tol value that is a multiple of ulp, with a power of two as the multiplier: each step up in the power shifts the tolerance by one bit, doubling it. You can also expect each operation that propagates round-off errors to make the error larger; as a rule of thumb, the growth is proportional to 2**n, where n is the number of operations.
So, depending on the magnitude of the values compared, the tolerance can be approximated by tol = factor * abs(x) * 2**(-24) for 32-bit reals (2**(-53) for 64-bit reals).
For example, to compare the two values x=12418.16752 and y=12418.16774, pick a tolerance with
tol = 8*ulp(15000.0)
check = abs(x-y) <= tol
This gives tol=7.8125000E-03 and check=.true.
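A minimal sketch of the same comparison using the intrinsic spacing in place of ulp (see Edit 1). The module tol_mod, the function nearly_equal and its nbits argument (the power of two applied to spacing) are hypothetical names. Here the spacing is taken at the larger of the two magnitudes rather than at a hand-picked value such as 15000.0.

```fortran
module tol_mod
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
contains
  ! True when x and y differ by at most 2**nbits units in the last
  ! place, measured at the larger of the two magnitudes.
  elemental logical function nearly_equal(x, y, nbits)
    real(real32), intent(in) :: x, y
    integer, intent(in) :: nbits
    real(real32) :: tol
    tol = 2.0_real32**nbits * spacing(max(abs(x), abs(y)))
    nearly_equal = abs(x - y) <= tol
  end function nearly_equal
end module tol_mod
```

With nbits = 3 this reproduces the factor 8 used above, since 8*spacing(15000.0) is 7.8125E-03. Note that in single precision 12418.16752 and 12418.16774 actually round to the same value, because the spacing near 12418 is about 9.8E-04.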
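In the first place, you need to know the error on the vector components; otherwise no test for zero can be conclusive.
Now, to first order, the absolute error on the cross product is
$$(\mathbf{u}+\delta\mathbf{u})\times(\mathbf{v}+\delta\mathbf{v})-\mathbf{u}\times\mathbf{v}\approx\mathbf{u}\times\delta\mathbf{v}+\delta\mathbf{u}\times\mathbf{v}$$
and in the worst case the perturbations are orthogonal to the vectors, giving the estimate $|\mathbf{u}||\delta\mathbf{v}|+|\delta\mathbf{u}||\mathbf{v}|\le(|\mathbf{u}|+|\mathbf{v}|)\,\delta$, where $\delta$ bounds both $|\delta\mathbf{u}|$ and $|\delta\mathbf{v}|$. So a value of $|\mathbf{u}\times\mathbf{v}|$ below this bound could correspond to parallel vectors.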
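A sketch of this bound in Fortran. The names parallel_test, cross and maybe_parallel are hypothetical, and delta is an assumed bound on the error of each vector that has to come from your own error analysis of how u and v were computed; the code does not estimate it.

```fortran
module parallel_test
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Cross product of two 3-vectors.
  pure function cross(u, v) result(w)
    real(dp), intent(in) :: u(3), v(3)
    real(dp) :: w(3)
    w = [u(2)*v(3) - u(3)*v(2), &
         u(3)*v(1) - u(1)*v(3), &
         u(1)*v(2) - u(2)*v(1)]
  end function cross

  ! Treat u and v as possibly parallel when |u x v| does not exceed
  ! the first-order error bound (|u| + |v|) * delta.
  pure logical function maybe_parallel(u, v, delta)
    real(dp), intent(in) :: u(3), v(3), delta
    maybe_parallel = norm2(cross(u, v)) <= (norm2(u) + norm2(v)) * delta
  end function maybe_parallel
end module parallel_test
```

The bound scales with the vector magnitudes, so the same delta works for large and small vectors, which a fixed tol on the cross product components does not.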
I found a solution to my problem.
First, I take the magnitude of each vector, so I only have to work with one value instead of three. This works because ||v|| = 0 if and only if v = 0. I store the magnitudes in a new array called cross_mag (since each vector is the result of a cross product).
Then I find the smallest nonzero value in the array (zeros are excluded, since they may be outliers).
Written in scientific notation, that value's exponent of 10 gives a power that I can base the tolerance on. I get it with floor(log10(min_val)).
I then increase that power by 1, which increases the tolerance by a factor of 10.
I use this new value as the exponent of tol. (The result can of course be scaled; I used a factor of 1.5.)
In code:
real, allocatable :: cross_mag(:,:,:) ! magnitudes |vi x wi|, allocated and filled elsewhere
real :: min_val, ex, tol
integer :: i, j, k, imax, jmax, kmax
! Find a new "zero" based on the smallest values.
! This new zero is required because of the buildup of numerical errors.
! rrspacing(1.0) = 2**23 for default real; huge(min_val) would be a safer start value.
min_val = rrspacing(1.0)
do k = 1, kmax
do j = 1, jmax
do i = 1, imax
if ((cross_mag(i,j,k) < min_val) .and. (cross_mag(i,j,k) /= 0.0)) then
min_val = cross_mag(i,j,k)
end if
end do
end do
end do
ex = log10(abs(min_val))
ex = floor(ex)
tol = 1.5 * 10.0**(ex + 1.0)
write(*,*) 'min_val: ', min_val
write(*,*) 'tol: ', tol
I found this works well enough for my work and gives me a reasonable number of vectors to work with. Thank you all for helping me find the rrspacing() function, which I used to create a large starting value.
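The search loop can also be written with the mask argument of the intrinsic minval. This is a sketch assuming cross_mag is passed as an assumed-shape array; the module auto_tol_mod and function auto_tol are hypothetical names. Unlike the loop, it does not start from rrspacing(1.0) = 2**23, so larger values are not ignored, but it still assumes at least one nonzero element (with an all-false mask, minval returns huge(0.0)).

```fortran
module auto_tol_mod
  implicit none
contains
  ! Tolerance from the smallest nonzero magnitude: take its power of
  ! ten, raise it by one, and scale by 1.5, as described above.
  ! Assumes at least one element of cross_mag is nonzero.
  function auto_tol(cross_mag) result(tol)
    real, intent(in) :: cross_mag(:,:,:)
    real :: tol, min_val
    min_val = minval(cross_mag, mask = cross_mag /= 0.0)
    tol = 1.5 * 10.0**(floor(log10(abs(min_val))) + 1)
  end function auto_tol
end module auto_tol_mod
```

For example, if the smallest nonzero magnitude is 3.2e-5, floor(log10(3.2e-5)) is -5, so tol = 1.5e-4.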

Why midpoint rule turns out more accurate than Simpson's rule when doing riemann sum approximation on Fortran

I am experimenting with computing the integral of x^2 over [1, 2] using both the midpoint rule and Simpson's rule. I found that, with the same number of subintervals, the midpoint approximation seems more accurate than the Simpson approximation, which is really weird.
The source code of the midpoint rule approximation is:
program midpoint
implicit none ! Turn off implicit typing
Integer, parameter :: n=100 ! Number of subintervals
integer :: i ! Loop index
real :: xlow=1.0, xhi=2.0 ! Bounds of integral
real :: dx ! Variable to hold width of subinterval
real :: sum ! Variable to hold sum
real :: xi ! Variable to hold location of ith subinterval
real :: fi ! Variable to hold value of function at ith subinterval
dx = (xhi-xlow)/(1.0*n) ! Calculate width of subinterval
sum = 0.0 ! Initialize sum
xi = xlow+0.5*dx ! Initialize value of xi
do i = 1,n,1 ! Initiate loop
! xi = xlow+(1.0*i-0.5)*dx
write(*,*) "i,xi ",i,xi ! Print intermediate result
fi = xi**2 ! Evaluate function at ith point
sum = sum+fi*dx ! Accumulate sum
xi = xi+dx ! Increment location of ith point
end do ! Terminate loop
write(*,*) "sum =",sum
stop ! Stop execution of the program
end program midpoint
The corresponding output is:
...
i,xi 100 1.99499905
sum = 2.33332348
The source code of Simpson's rule approximation is:
program simpson
implicit none ! Turn off implicit typing
integer, parameter :: n=100 ! Number of subintervals
integer :: i=0 ! Loop index
real :: xlow=1.0, xhi=2.0 ! Bounds of integral
real :: h ! Variable to hold width of subinterval
real :: sum ! Variable to hold sum
real :: xi ! Variable to hold location of ith subinterval
real :: fi ! Variable to hold value of function at ith subinterval
real :: Psimp ! Variable of simpson polynomial of xi interval
h = (xhi-xlow)/(1.0*n) ! Calculate width of subinterval
sum = 0.0 ! Initialize sum
xi = xlow ! Initialize xi (otherwise it is undefined at the first loop test)
do while (xi<=xhi-h) ! Initiate loop
xi = xlow+i*2.0*h ! Increment of xi
i=i+1
write(*,*) "i,xi ",i,xi ! Print intermediate result
Psimp=xi**2+4.0*(xi+h)**2+(xi+2.0*h)**2
! Evaluate function at ith point
sum = sum+(h/3.0)*Psimp ! Accumulate sum
end do ! Terminate loop
write(*,*) "sum =",sum
end program simpson
The corresponding output is:
...
i,xi 101 2.00000000
sum = 2.37353396
To get the same number of correct digits as the midpoint result, I have to set the number of subintervals in the Simpson program to 100000, which is 1000 times more than in the midpoint program (I initially set both to 100).
I checked the Simpson code and can't find what's wrong.
If I remember correctly, Simpson's rule should converge faster than the midpoint rule.
Craig Burley once remarked that a WHILE loop looks as though it should exit as soon as its condition is violated. Here the condition is violated when xi reaches xhi, but the loop doesn't exit at that point; it exits only after another whole iteration has completed and the test is applied again at the top of the loop. More in keeping with Fortran idioms, you could convert the loop into a counted DO loop with something like
DO i = 0, n/2-1
and then comment out the
i=i+1
line. Or simply test the loop premise immediately after modifying xi:
xi = xlow+i*2.0*h ! Increment of xi
if(xi>xhi-h) exit ! Test loop premise
Either way gives the result expected from Simpson's rule, which is exact (up to round-off) for polynomials of degree 3 or less.
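A double-precision sketch of both rules with counted DO loops, so the Simpson loop runs exactly n/2 panels. The integrand is hard-coded as x**2, n is assumed even for Simpson's rule, and the module and function names (quad_x2, f, simpson, midpoint) are assumptions for illustration.

```fortran
module quad_x2
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Integrand used in the question.
  pure real(dp) function f(x)
    real(dp), intent(in) :: x
    f = x**2
  end function f

  ! Composite Simpson's rule with n (even) subintervals, written as a
  ! counted DO loop over the n/2 panels, so no extra panel is added.
  pure real(dp) function simpson(a, b, n)
    real(dp), intent(in) :: a, b
    integer, intent(in) :: n
    real(dp) :: h, x0
    integer :: k
    h = (b - a) / n
    simpson = 0.0_dp
    do k = 0, n/2 - 1
      x0 = a + 2*k*h   ! left end of panel k, computed from k, not accumulated
      simpson = simpson + (h/3) * (f(x0) + 4*f(x0 + h) + f(x0 + 2*h))
    end do
  end function simpson

  ! Composite midpoint rule with n subintervals, for comparison.
  pure real(dp) function midpoint(a, b, n)
    real(dp), intent(in) :: a, b
    integer, intent(in) :: n
    real(dp) :: h
    integer :: i
    h = (b - a) / n
    midpoint = 0.0_dp
    do i = 1, n
      midpoint = midpoint + f(a + (i - 0.5_dp)*h) * h
    end do
  end function midpoint
end module quad_x2
```

With n = 100, simpson(1.0_dp, 2.0_dp, 100) should match 7/3 to round-off, while the midpoint result is low by h**2/12, about 8.3E-06, for this integrand.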

'x' argument of 'log10' intrinsic at (1) must be real

I want to calculate the value of z at each coordinate in the range x: -50 to 50 and y: -50 to 50, as in the code below.
program test
implicit none
! --- [local entities]
real*8 :: rrr,th,U0,amp,alp,Ndiv
real*8 :: pi,alpR,NR,Rmin,Rmax,z
integer :: ir, i, j
do i=0, 50
do j=0, 50
th=datan2(i,j)
pi=datan(1.d0)*4.d0
!
Ndiv= 24.d0 !! Number of circumferential division
alp = 90.d0/180.d0*pi !! phase [rad]
U0 = 11.4d0 !! average velocity
amp = 0.5d0 !! amplitude of velocity
Rmin = 10 !! [m]
Rmax = 50 !! [m]
NR = 6.d0 !! Number of radial division
!
rrr=dsqrt(i**2+j**2)
ir=int((rrr-Rmin)/(Rmax-Rmin)*NR)
alpR=2.d0*pi/dble(Ndiv)*dble(mod(ir,2))
z=U0*(1.d0+amp*dsin(0.5d0*Ndiv*th+alp+alpR))
write(*,*) 'i, j, z'
write(*,*) i, j, z
end do
end do
stop
end program test
But it fails with the errors below. I think it is because i and j are integers passed to datan2(i,j). How should I change this code?
test.f90:10.16:
th=datan2(i,j)
1
Error: 'y' argument of 'datan2' intrinsic at (1) must be REAL
test.f90:21.16:
rrr=dsqrt(i**2+j**2)
1
Error: 'x' argument of 'dsqrt' intrinsic at (1) must be REAL
Inspired by the comments of @Rodrigo Rodrigues, @Ian Bush, and @Richard, here is a suggested rewrite of the code from @SW. Kim:
program test
use, intrinsic :: iso_fortran_env, only : real64
implicit none
! --- [local entities]
! Determine the kind of your real variables (select one):
! for specifying a given numerical precision
integer, parameter :: wp = selected_real_kind(15, 307) !15 digits, 10**307 range
! for specifying a given number of bits
! integer, parameter :: wp = real64
real(kind=wp), parameter :: pi = atan(1._wp)*4._wp
real(kind=wp) :: rrr, th, U0, amp, alp, Ndiv
real(kind=wp) :: alpR, NR, Rmin, Rmax, z
integer :: ir, i, j
do i = 0, 50
do j = 0, 50
th = atan2(real(i, kind=wp), real(j, kind=wp))
!
Ndiv= 24._wp !! Number of circumferential division
alp = 90._wp/180._wp*pi !! phase [rad]
U0 = 11.4_wp !! average velocity
amp = 0.5_wp !! amplitude of velocity
Rmin = 10 !! [m]
Rmax = 50 !! [m]
NR = 6._wp !! Number of radial division
!
rrr = sqrt(real(i, kind=wp)**2 + real(j, kind=wp)**2)
ir = int((rrr - Rmin) / (Rmax - Rmin) * NR)
alpR = 2._wp * pi / Ndiv * mod(ir, 2)
z = U0 * (1._wp + amp * sin(0.5_wp * Ndiv * th + alp + alpR))
!
write(*,*) 'i, j, z'
write(*,*) i, j, z
end do
end do
stop
end program test
Specifically, the following changes were made with respect to the original code posted:
Minimum change to answer the question: converting the integer variables i and j to real values so they can be passed to the real-valued intrinsics datan2 and dsqrt (atan2 and sqrt in the rewrite).
Using generic names for intrinsic procedures, i.e. sqrt instead of dsqrt, atan2 instead of datan2, atan instead of datan, and sin instead of dsin. One benefit of this approach is that the working precision wp can be changed in one place, without explicit changes elsewhere in the code.
Defining the kind of the real variables and calling it wp. Extended discussion of this topic, its implications and consequences can be found on this site, for example here and here. Steve Lionel also has an in-depth post on his blog, where his general advice is to use selected_real_kind.
Defining pi as a parameter, so its value is computed once instead of repeatedly inside the nested do loops.

Fortran error: size of variable is too large

I have a long program whose goal is to solve the linear system Ax=b. When I run it, I get "error: size of variable is too large".
program ddm
integer :: i,j,k
integer, parameter :: FN=1,FML=80,FMH=80
integer, parameter :: NBE=1*80*80 !NBE=FN*FML*FMH
double precision, dimension(1:3*NBE,1:3*NBE) :: AA
double precision, dimension(1:3*NBE) :: BB
double precision :: XX(3*NBE)
double precision, dimension(1:NBE) :: DSL,DSH,DNN
double precision, dimension(1:FML,1:FMH) :: DSL1,DSH1,DNN1
! Construct a block matrix
AA(1:NBE,1:NBE) = SLSL
AA(1:NBE,NBE+1:2*NBE) = SLSH
AA(1:NBE,2*NBE+1:3*NBE) = SLNN
AA(NBE+1:2*NBE,1:NBE) = SHSL
AA(NBE+1:2*NBE,NBE+1:2*NBE) = SHSH
AA(NBE+1:2*NBE,2*NBE+1:3*NBE) = SHNN
AA(2*NBE+1:3*NBE,1:NBE) = NNSL
AA(2*NBE+1:3*NBE,NBE+1:2*NBE) = NNSH
AA(2*NBE+1:3*NBE,2*NBE+1:3*NBE) = NNNN
! Construct a block matrix for boundary condition
BB(1:NBE) = SLBC
BB(NBE+1:2*NBE) = SHBC
BB(2*NBE+1:3*NBE) = NNBC
call GE(AA,BB,XX,3*NBE)
DSL = XX(1:NBE)
DSH = XX(NBE+1:2*NBE)
DNN = XX(2*NBE+1:3*NBE)
DSL1 = reshape(DSL,(/FML,FMH/))
DSH1 = reshape(DSH,(/FML,FMH/))
DNN1 = reshape(DNN,(/FML,FMH/))
open(unit=2, file='DNN2.txt', ACTION="write", STATUS="replace")
do i=1,80
write(2,'(*(F14.7))') real(DNN1(i,:))
end do
end program ddm
Note: GE(AA,BB,XX,3*NBE) is the subroutine that solves the linear system. Below is the GE subroutine.
subroutine GE(a,b,x,n)
!===========================================================
! Solutions to a system of linear equations A*x=b
! Method: Gauss elimination (with scaling and pivoting)
!-----------------------------------------------------------
! input ...
! a(n,n) - array of coefficients for matrix A
! b(n) - array of the right hand coefficients b
! n - number of equations (size of matrix A)
! output ...
! x(n) - solutions
! comments ...
! the original arrays a(n,n) and b(n) will be destroyed
! during the calculation
!===========================================================
implicit none
integer n
double precision a(n,n),b(n),x(n)
double precision s(n)
double precision c, pivot, store
integer i, j, k, l
! step 1: begin forward elimination
do k=1, n-1
! step 2: "scaling"
! s(i) will have the largest element from row i
do i=k,n ! loop over rows
s(i) = 0.0
do j=k,n ! loop over elements of row i
s(i) = max(s(i),abs(a(i,j)))
end do
end do
! step 3: "pivoting 1"
! find a row with the largest pivoting element
pivot = abs(a(k,k)/s(k))
l = k
do j=k+1,n
if(abs(a(j,k)/s(j)) > pivot) then
pivot = abs(a(j,k)/s(j))
l = j
end if
end do
! Check if the system has a singular matrix
if(pivot == 0.0) then
write(*,*) "The matrix is singular"
return
end if
! step 4: "pivoting 2" interchange rows k and l (if needed)
if (l /= k) then
do j=k,n
store = a(k,j)
a(k,j) = a(l,j)
a(l,j) = store
end do
store = b(k)
b(k) = b(l)
b(l) = store
end if
! step 5: the elimination (after scaling and pivoting)
do i=k+1,n
c=a(i,k)/a(k,k)
a(i,k) = 0.0
b(i)=b(i)- c*b(k)
do j=k+1,n
a(i,j) = a(i,j)-c*a(k,j)
end do
end do
end do
! step 6: back substitution
x(n) = b(n)/a(n,n)
do i=n-1,1,-1
c=0.0
do j=i+1,n
c= c + a(i,j)*x(j)
end do
x(i) = (b(i)- c)/a(i,i)
end do
end subroutine GE
Turn your arrays (at least AA, BB and XX) into allocatable arrays and allocate them yourself in the code. You are hitting the size limit for statically allocated arrays: with 3*NBE = 19200, AA alone needs 19200*19200*8 bytes, about 2.95 GB. There is a limit of 2 GB on some systems, if I remember correctly (experts will confirm or give the right numbers).
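A minimal sketch of the allocatable version. The module big_system and the helpers alloc_system and matrix_bytes are hypothetical names; the stat= argument turns an allocation failure into a flag instead of an abort, and matrix_bytes uses storage_size (Fortran 2008) to get the size of a double precision value. Allocatable arrays can be passed to GE unchanged, since its dummy arguments are explicit-shape.

```fortran
module big_system
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Bytes needed for an n-by-n double precision matrix, computed in
  ! 64-bit arithmetic (n*n*8 overflows a default integer for n = 19200).
  pure integer(int64) function matrix_bytes(n)
    integer, intent(in) :: n
    matrix_bytes = int(n, int64) * int(n, int64) * int(storage_size(1.0d0) / 8, int64)
  end function matrix_bytes

  ! Allocate AA(n,n), BB(n) and XX(n) on the heap instead of declaring
  ! them with fixed sizes; ok is .false. if the allocation failed.
  subroutine alloc_system(n, aa, bb, xx, ok)
    integer, intent(in) :: n
    double precision, allocatable, intent(out) :: aa(:,:), bb(:), xx(:)
    logical, intent(out) :: ok
    integer :: ierr
    allocate(aa(n, n), bb(n), xx(n), stat=ierr)
    ok = (ierr == 0)
  end subroutine alloc_system
end module big_system
```

In program ddm, the three declarations would then become, for example, double precision, allocatable :: AA(:,:), BB(:), XX(:), followed by call alloc_system(3*NBE, AA, BB, XX, ok) before AA is filled.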