Why define PI = 4*ATAN(1.d0) - fortran

What is the motivation for defining PI as
PI=4.D0*DATAN(1.D0)
within Fortran 77 code? I understand how it works, but, what is the reasoning?

This style ensures that the maximum precision available on ANY architecture is used when assigning a value to PI.

Fortran does not have a built-in constant for PI. Rather than typing the number in manually, and potentially making a mistake or not getting the maximum possible precision on the given implementation, letting the library calculate the result for you guarantees that neither of those downsides happens.
These are equivalent and you'll sometimes see them too:
PI=DACOS(-1.D0)
PI=2.D0*DASIN(1.D0)
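As a rough cross-check outside Fortran, the same three expressions can be evaluated in Python, whose float is normally an IEEE-754 double. This is only a sketch: how closely each expression matches the library's own constant depends on the platform's math library.

```python
import math

# Three library-based ways of obtaining pi in double precision.
pi_atan = 4.0 * math.atan(1.0)
pi_acos = math.acos(-1.0)
pi_asin = 2.0 * math.asin(1.0)

for name, value in [("4*atan(1)", pi_atan),
                    ("acos(-1)", pi_acos),
                    ("2*asin(1)", pi_asin)]:
    # repr() prints enough digits to identify the stored double uniquely.
    print(f"{name:10s} {value!r}  differs from math.pi by {value - math.pi:.3e}")
```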

I believe it's because this is the shortest series on pi. That also means it's the MOST ACCURATE.
The Gregory-Leibniz series (4/1 - 4/3 + 4/5 - 4/7...) equals pi.
atan(x) = x^1/1 - x^3/3 + x^5/5 - x^7/7...
So, atan(1) = 1/1 - 1/3 + 1/5 - 1/7 + 1/9...
4 * atan(1) = 4/1 - 4/3 + 4/5 - 4/7 + 4/9...
That equals the Gregory-Leibniz series, and therefore equals pi, approximately
3.1415926535 8979323846 2643383279 5028841971 6939937510.
Another way to use atan and find pi is:
pi = 16*atan(1/5) - 4*atan(1/239), but I think that's more complicated.
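To get a feel for the two formulas, here is a small Python sketch (the function names leibniz_pi and machin_pi are made up for this illustration). It sums the Gregory-Leibniz series term by term and compares it with the two-atan formula. Note that the series converges slowly, and library atan routines are generally not evaluated by summing it directly.

```python
import math

def leibniz_pi(terms):
    """Partial sum of 4/1 - 4/3 + 4/5 - ... using the given number of terms."""
    total = 0.0
    for k in range(terms):
        total += (-1.0) ** k * 4.0 / (2 * k + 1)
    return total

def machin_pi():
    """pi = 16*atan(1/5) - 4*atan(1/239), using the library atan."""
    return 16.0 * math.atan(1.0 / 5.0) - 4.0 * math.atan(1.0 / 239.0)

for n in (10, 1000, 100000):
    approx = leibniz_pi(n)
    print(n, approx, abs(approx - math.pi))
print("two-atan formula:", machin_pi(), abs(machin_pi() - math.pi))
```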
I hope this helps!
(To be honest, I think the Gregory-Leibniz series was based on atan, not 4*atan(1) based on the Gregory-Leibniz series. In other words, the REAL proof is:
sin^2 x + cos^2 x = 1 [Theorem]
If x = pi/4 radians, sin^2 x = cos^2 x, or sin^2 x = cos^2 x = 1/2.
Then, sin x = cos x = 1/(root 2), so tan x = sin x / cos x = 1. atan is the inverse function of tan (not 1 / tan x), so atan(1) = x.
So if tan(x) = 1 with x = pi/4, then atan(1) = pi/4.
Finally, 4*atan(1) = pi.)
Please don't load me with comments-I'm still a pre-teen.

There is more to this question than meets the eye. Why 4 arctan(1)? Why not any other representation such as 3 arccos(1/2)?
This will try to find an answer by exclusion.
Mathematical intro: When using inverse trigonometric functions such as arccos, arcsin and arctan, one can easily compute π in various ways:
π = 4 arctan(1) = arccos(-1) = 2 arcsin(1) = 3 arccos(1/2) = 6 arcsin(1/2)
= 3 arcsin(sqrt(3)/2) = 4 arcsin(sqrt(2)/2) = ...
There exist many other exact algebraic expressions for trigonometric values that could be used here.
Floating-point argument 1: it is well understood that a finite binary floating-point representation cannot represent all real numbers. Some examples of such numbers are 1/3, 0.97, π, sqrt(2), .... To this end, we should exclude any mathematical computation of π where the argument to the inverse trigonometric functions cannot be represented numerically. This leaves us the arguments -1,-1/2,0,1/2 and 1.
π = 4 arctan(1) = 2 arcsin(1)
= 3 arccos(1/2) = 6 arcsin(1/2)
= 2 arccos(0)
= 3/2 arccos(-1/2) = -6 arcsin(-1/2)
= -4 arctan(-1) = arccos(-1) = -2 arcsin(-1)
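Whether an argument is exactly representable can be checked directly. The sketch below uses Python's fractions module as a stand-in, assuming Python floats are IEEE-754 doubles: Fraction(x) converts the stored binary value without rounding, so comparing it with the intended rational value shows whether anything was lost. The helper name is_exact is hypothetical.

```python
from fractions import Fraction

# (label, exact rational value, the double obtained for it)
candidates = [
    ("1",    Fraction(1),        1.0),
    ("-1/2", Fraction(-1, 2),   -0.5),
    ("1/3",  Fraction(1, 3),     1.0 / 3.0),
    ("0.97", Fraction(97, 100),  0.97),
]

def is_exact(exact, as_double):
    # Fraction(float) is the exact value of the stored binary number.
    return Fraction(as_double) == exact

for label, exact, as_double in candidates:
    print(f"{label:5s} exactly representable: {is_exact(exact, as_double)}")
```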
Floating-point argument 2: in the binary representation, a number is represented as 0.b_n b_{n-1} ... b_0 × 2^m. If the inverse trigonometric function came up with the best numeric binary approximation for its argument, we do not want to lose precision by multiplication. To this end we should only multiply with powers of 2.
π = 4 arctan(1) = 2 arcsin(1)
= 2 arccos(0)
= -4 arctan(-1) = arccos(-1) = -2 arcsin(-1)
Note: this is visible in the IEEE-754 binary64 representation (the most common form of DOUBLE PRECISION or kind=REAL64). There we have
write(*,'(F26.20)') 4.0d0*atan(1.0d0) -> " 3.14159265358979311600"
write(*,'(F26.20)') 3.0d0*acos(0.5d0) -> " 3.14159265358979356009"
This difference is not there in IEEE-754 binary32 (the most common form of REAL or kind=REAL32) and IEEE-754 binary128 (the most common form of kind=REAL128)
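The effect of the multiplier can be illustrated with exact rational arithmetic. The following Python sketch assumes floats are binary64. It checks whether the floating-point product equals the exact product of the stored value, and prints how far the two results are apart in units of the spacing of doubles near π (2^-51). The printed digits depend on the math library in use.

```python
import math
from fractions import Fraction

x = math.atan(1.0)   # library approximation of pi/4
y = math.acos(0.5)   # library approximation of pi/3

# Multiplying by 4 only changes the exponent, so the product is exact.
times4_exact = Fraction(4.0 * x) == 4 * Fraction(x)
# Multiplying by 3 may need more mantissa bits than a double offers.
times3_exact = Fraction(3.0 * y) == 3 * Fraction(y)

print("4*atan(1)   =", repr(4.0 * x), "product exact:", times4_exact)
print("3*acos(1/2) =", repr(3.0 * y), "product exact:", times3_exact)
print("difference in units of 2**-51:", (3.0 * y - 4.0 * x) / 2.0 ** -51)
```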
Implementation argument: On Intel CPUs, atan2 is part of the x86 instruction set as FPATAN, while the other inverse trigonometric functions are derived from atan2. A potential derivation could be:
          mathematically                 numerically
ACOS(x) = ATAN2(SQRT(1-x*x),x) = ATAN2(SQRT((1+x)*(1-x)),x)
ASIN(x) = ATAN2(x,SQRT(1-x*x)) = ATAN2(x,SQRT((1+x)*(1-x)))
This can be seen in the assembly code of these instructions (See here). To this end I would argue the usage of:
π = 4 arctan(1)
Note: this is a fuzzy argument. I'm certain there are people with better opinions on this.
Interesting reads on FPATAN: How is arctan implemented?, x87 trigonometric instructions
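For illustration only, the textbook identities acos(x) = atan2(sqrt(1-x²), x) and asin(x) = atan2(x, sqrt(1-x²)) can be tried out in Python. This is a sketch of the idea, not a claim about what any particular math library or CPU actually does. The function names are hypothetical.

```python
import math

def acos_via_atan2(x):
    # (1+x)*(1-x) instead of 1-x*x avoids cancellation near |x| = 1.
    return math.atan2(math.sqrt((1.0 + x) * (1.0 - x)), x)

def asin_via_atan2(x):
    return math.atan2(x, math.sqrt((1.0 + x) * (1.0 - x)))

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x,
          acos_via_atan2(x) - math.acos(x),
          asin_via_atan2(x) - math.asin(x))
```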
The Fortran argument: why should we approximate π as :
integer, parameter :: sp = selected_real_kind(6, 37)
integer, parameter :: dp = selected_real_kind(15, 307)
integer, parameter :: qp = selected_real_kind(33, 4931)
real(kind=sp), parameter :: pi_sp = 4.0_sp*atan2(1.0_sp,1.0_sp)
real(kind=dp), parameter :: pi_dp = 4.0_dp*atan2(1.0_dp,1.0_dp)
real(kind=qp), parameter :: pi_qp = 4.0_qp*atan2(1.0_qp,1.0_qp)
and not :
real(kind=sp), parameter :: pi_sp = 3.14159265358979323846264338327950288_sp
real(kind=dp), parameter :: pi_dp = 3.14159265358979323846264338327950288_dp
real(kind=qp), parameter :: pi_qp = 3.14159265358979323846264338327950288_qp
The answer lies in the Fortran standard. The standard never states that a REAL of any kind should represent an IEEE-754 floating-point number. The representation of REAL is processor dependent. This implies that I could request selected_real_kind(33, 4931) expecting a binary128 floating-point number, but get back a kind that represents a floating-point number with much higher accuracy. Maybe 100 digits, who knows. In that case, my string of digits above is too short! Could one not use this, just to be sure? Even that file could be too short!
Interesting fact : sin(pi) is never zero
write(*,'(F17.11)') sin(pi_sp) => " -0.00000008742"
write(*,'(F26.20)') sin(pi_dp) => " 0.00000000000000012246"
write(*,'(F44.38)') sin(pi_qp) => " 0.00000000000000000000000000000000008672"
which is understood as:
pi = 4 ATAN2(1,1) = π + δ
SIN(pi) = SIN(π + δ) = -SIN(δ) ≈ -δ
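The relation between sin(pi) and the representation error can be checked numerically. The sketch below assumes Python floats are binary64 and uses the decimal module with the same 36-digit string for π as above. It computes δ as the stored double minus the true value and compares sin of the stored double with -δ, since sin(π + δ) = -sin(δ).

```python
import math
from decimal import Decimal

PI_DIGITS = Decimal("3.14159265358979323846264338327950288")

# delta = (double closest to pi) - (true pi); Decimal(float) is exact.
delta = Decimal(math.pi) - PI_DIGITS

print("delta      =", delta)
print("sin(pi_dp) =", math.sin(math.pi))
print("-delta     =", -float(delta))
```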
program print_pi
! use iso_fortran_env, sp=>real32, dp=>real64, qp=>real128
integer, parameter :: sp = selected_real_kind(6, 37)
integer, parameter :: dp = selected_real_kind(15, 307)
integer, parameter :: qp = selected_real_kind(33, 4931)
real(kind=sp), parameter :: pi_sp = 3.14159265358979323846264338327950288_sp
real(kind=dp), parameter :: pi_dp = 3.14159265358979323846264338327950288_dp
real(kind=qp), parameter :: pi_qp = 3.14159265358979323846264338327950288_qp
write(*,'("SP "A17)') "3.14159265358..."
write(*,'(F17.11)') pi_sp
write(*,'(F17.11)') acos(-1.0_sp)
write(*,'(F17.11)') 2.0_sp*asin( 1.0_sp)
write(*,'(F17.11)') 4.0_sp*atan2(1.0_sp,1.0_sp)
write(*,'(F17.11)') 3.0_sp*acos(0.5_sp)
write(*,'(F17.11)') 6.0_sp*asin(0.5_sp)
write(*,'("DP "A26)') "3.14159265358979323846..."
write(*,'(F26.20)') pi_dp
write(*,'(F26.20)') acos(-1.0_dp)
write(*,'(F26.20)') 2.0_dp*asin( 1.0_dp)
write(*,'(F26.20)') 4.0_dp*atan2(1.0_dp,1.0_dp)
write(*,'(F26.20)') 3.0_dp*acos(0.5_dp)
write(*,'(F26.20)') 6.0_dp*asin(0.5_dp)
write(*,'("QP "A44)') "3.14159265358979323846264338327950288419..."
write(*,'(F44.38)') pi_qp
write(*,'(F44.38)') acos(-1.0_qp)
write(*,'(F44.38)') 2.0_qp*asin( 1.0_qp)
write(*,'(F44.38)') 4.0_qp*atan2(1.0_qp,1.0_qp)
write(*,'(F44.38)') 3.0_qp*acos(0.5_qp)
write(*,'(F44.38)') 6.0_qp*asin(0.5_qp)
write(*,'(F17.11)') sin(pi_sp)
write(*,'(F26.20)') sin(pi_dp)
write(*,'(F44.38)') sin(pi_qp)
end program print_pi

It's because this is an exact way to compute pi to arbitrary precision. You can simply continue executing the function to get greater and greater precision and stop at any point to have an approximation.
By contrast, specifying pi as a constant provides you with exactly as much precision as was originally given, which may not be appropriate for highly scientific or mathematical applications (for which Fortran is frequently used).

That sounds an awful lot like a work-around for a compiler bug. Or it could be that this particular program depends on that identity being exact, and so the programmer made it guaranteed.

Related

Need to find when value is == 0, but I cannot due to numerical errors

I have 2 large lists of vectors (>10,000 vectors each, say vi and wi) and I am trying to find when vi cross-product wi = 0, or, when vi x wi = 0.
The lists of vectors are previously calculated (this is Computational Fluid Dynamics and the calculated vectors represent properties of a fluid. I am doing research in Vortex Identification and this calculation is necessary).
I am trying to find when the cross product == 0 but I only get 3 results out of the thousands where the cross product is satisfied. We are trying to automate a method done by hand so we know for a fact that there are more than 3 vectors.
Our assumption is that since we are using basic numerical methods (of low orders) to calculate the vectors, there is a build up of errors.
TLDR: In essence, this does not work due to numerical errors:
real :: cross1, cross2, cross3
logical :: check1, check2, check3
logical :: is_seed
check1 = cross1 == 0.0
check2 = cross2 == 0.0
check3 = cross3 == 0.0
is_seed = check1 .and. check2 .and. check3
so, we have to do this:
real :: cross1, cross2, cross3
real :: tol
logical :: check1, check2, check3
logical :: is_seed
tol = 1.0e-4 ! NEED TO FIND OUT HOW TO CALCULATE
check1 = abs(cross1) <= tol
check2 = abs(cross2) <= tol
check3 = abs(cross3) <= tol
is_seed = check1 .and. check2 .and. check3
but I want to know how to calculate tol automatically and not hard code it. How can this be done?
Edit 1
As pointed out in the comments, the function below is entirely equivalent to the built-in function spacing(x).
Edit 2
Use the following function ulp(x) to find the value of the least significant bit in the mantissa of an ieee754 number x
32-bit
elemental function ulp32(x) result(d)
real(real32), intent(in) :: x
real(real32) :: d
d = 2.0**(-floor(-log(x)/log(2e0))-24)
end function
64-bit
elemental function ulp64(x) result(d)
real(real64), intent(in) :: x
real(real64) :: d
d = 2d0**(-floor(-log(x)/log(2d0))-53)
end function
interface
interface ulp
procedure :: ulp32, ulp64
end interface
with some results given values between 1 and 1e9
x 32bit 64bit
517.54 0.00006104 0.00000000000011369
1018.45 0.00006104 0.00000000000011369
1972.33 0.00012207 0.00000000000022737
5416.69 0.00048828 0.00000000000090949
11812.67 0.00097656 0.00000000000181899
13190.24 0.00097656 0.00000000000181899
18099.97 0.00195312 0.00000000000363798
28733.47 0.00195312 0.00000000000363798
86965.21 0.00781250 0.00000000001455192
135734.23 0.01562500 0.00000000002910383
203975.41 0.01562500 0.00000000002910383
780835.66 0.06250000 0.00000000011641532
2343924.58 0.25000000 0.00000000046566129
2552437.80 0.25000000 0.00000000046566129
6923904.28 0.50000000 0.00000000093132257
8929837.66 1.00000000 0.00000000186264515
29408286.38 2.00000000 0.00000000372529030
70054595.74 8.00000000 0.00000001490116119
231986024.46 16.00000000 0.00000002980232239
392724963.99 32.00000000 0.00000005960464478
It is recommended to pick a tol value that is a factor of ulp, and this factor should be a power of two. Each power means shifting one bit over to increase the tolerance by a power of two. You can expect each operation that propagates round-off errors to also make the error larger proportionally to 2**n where n is the number of operations.
So depending on the magnitude of the values compared, the tolerance should be approximated by tol = factor * abs(x) * 2**(-24)
For example, comparing two values of x=12418.16752 and y=12418.16774 pick a tolerance with
tol = 8*ulp(15000.0)
check = abs(x-y) <= tol
I get a tol=7.8125000E-03 and the result check=.true.
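The same arithmetic can be reproduced in Python as a sanity check. The functions below are direct transcriptions of the ulp32 and ulp64 formulas above (valid for x > 0). Python itself stores x and y as doubles, so only the tolerance reflects single precision here.

```python
import math

def ulp32(x):
    # Same formula as ulp32 above: 2**(ceil(log2(x)) - 24), for x > 0.
    return 2.0 ** (-math.floor(-math.log2(x)) - 24)

def ulp64(x):
    # Same formula as ulp64 above: 2**(ceil(log2(x)) - 53), for x > 0.
    return 2.0 ** (-math.floor(-math.log2(x)) - 53)

x, y = 12418.16752, 12418.16774
tol = 8 * ulp32(15000.0)      # a power-of-two multiple of the ulp
check = abs(x - y) <= tol
print("tol =", tol, "check =", check)
print("ulp64(517.54) =", ulp64(517.54))
```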
In the first place, you should have knowledge of the error on the vector components, otherwise no test for zero can be conclusive.
Now the absolute error on the cross product is like
(u + δu) x (v + δv) - u x v ~ u x δv + δu x v
and in the worst case the vectors can be orthogonal, giving the estimate |u||δv|+|δu||v|=(|u|+|v|)δ, where δ bounds both |δu| and |δv|. So a value of |u x v| below this bound could correspond to parallel vectors.
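A minimal Python sketch of this test, assuming a single known bound delta on the error of each vector (the names cross, norm and maybe_parallel are made up for this example):

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def maybe_parallel(u, v, delta):
    """True if |u x v| is within the error bound (|u| + |v|) * delta."""
    return norm(cross(u, v)) <= (norm(u) + norm(v)) * delta

u = (1.0, 0.0, 0.0)
print(maybe_parallel(u, (2.0, 1.0e-6, 0.0), 1.0e-5))  # nearly parallel
print(maybe_parallel(u, (0.0, 1.0, 0.0), 1.0e-5))     # orthogonal
```

The tolerance then scales with the magnitude of the vectors instead of being a fixed constant.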
I found a solution to my problem.
First, I take the magnitude of the vector. I do this so I only have to work with one value instead of 3. This is possible since ||v|| = 0 if and only if v = 0. I save the magnitude of those vectors in a new array called cross_mag (since the vector is the result of a cross product).
Then I find the lowest value in the array that is not zero. (This is to discount outliers that may be equal to zero)
I found that when the number is written in scientific notation, the exponent of 10 will give me a power x that I can base my tolerance off of. I do this using log10( min_value ).
I then increase the power of the lowest value by 1, which increases the total tolerance directly by a factor of 10.
I use this new value as the exponent of my tol. (This can of course be scaled which I have done by a factor of 1.5).
Or:
real, dimension(:,:,:) :: cross_mag
real :: min_val, ex, tol
integer :: i, j, k, imax, jmax, kmax
! Find new "zero" that is based off of the lowest values.
! This new zero is required due to the buildup of numerical errors.
min_val = rrspacing(1.0)
do k = 1, kmax
do j = 1, jmax
do i = 1, imax
if ((cross_mag(i,j,k) < min_val) .and. (cross_mag(i,j,k) .ne. 0.0)) then
min_val = cross_mag(i,j,k)
end if
end do
end do
end do
ex = log10(abs(min_val))
ex = floor(ex)
tol = 1.5 * 10.0**(ex + 1.0)
write(*,*) 'min_val: ', min_val
write(*,*) 'tol: ', tol
I found this works well enough for my work and gives me a reasonable number of vectors to work with. I thank you all for helping me find the rrspacing() function, which helped me create an arbitrarily large number.

Discrete Fourier Transform seems to be printing incorrect answers?

I am attempting to write a program that calculates the discrete Fourier transform of a set of given data. I've sampled a sine wave, so my set is (pi/2,2*pi,3*pi/2,2*pi). Here is my program:
program DFT
implicit none
integer :: k, N, x, y, j, r, l
integer, parameter :: dp = selected_real_kind(15,300)
real, allocatable,dimension(:) :: h, rst
integer, dimension(:,:), allocatable :: W
real(kind=dp) :: pi
open(unit=100, file="dft.dat",status='replace')
N = 4
allocate(h(N))
allocate(rst(N))
allocate(W(-N/2:N/2,1:N))
pi = 3.14159265359
do k=1,N
h(k) = k*(pi*0.5)
end do
do j = -N/2,N/2
do k = 1, N
W(j,k) = EXP((2.0_dp*pi*cmplx(0.0_dp,1.0_dp)*j*k)/N)
end do
end do
rst = matmul(W,h)
!print *, h, w
write(100,*) rst
end program
And this prints out the array rst as:
0.00000000 0.00000000 15.7079639 0.00000000 0.00000000
Using an online calculator, the results should be:
15.7+0j -3.14+3.14j -3.14+0j -3.14-3.14j
I'm not sure why rst is 1 entry too long either.
Can anyone spot why it's printing out 0 for 3/4 of the results? I notice that 15.7 appears in both the actual answers and my result.
Thank you
Even though the question has been answered and accepted, the program given has so many problems that I had to say...
The input given is not a sine wave, it's a linear function of time. Kind of like a 1-based ramp input.
For DFTs the indices normally are considered to go from 0:N-1, not 1:N.
For W the Nyquist frequency is represented twice, as -N/2 and N/2. Again it would have been normal to number the rows 0:N-1, BTW, this is why you have an extra output in your rst vector.
pi is double precision but only initialized to 12 significant figures. It's hard to tell if there's a typo in your value of pi which is why many would use 4*atan(1.0_dp) or acos(-1.0_dp).
Notice that h(N) is actually going to end up as the zero time input, which is one reason the whole world indexes DFT vectors from zero.
The expression cmplx(0.0_dp,1.0_dp) is sort of futile because the CMPLX intrinsic always returns a single precision result if the third optional KIND= argument is not present. As a complex literal, (0.0_dp,1.0_dp) would be double precision. However, you could as well use (0,1) because it's exactly representable in single precision and would be converted to double precision when it gets multiplied by the growing product on its left. Also 2.0_dp could have been represented successfully as 2 with less clutter.
The expression EXP((2.0_dp*pi*cmplx(0.0_dp,1.0_dp)*j*k)/N) is appropriate for inverse DFT, disregarding normalization. Thus I would have written the whole thing more cleanly and correctly as EXP(-2*pi*(0,1)*j*k/N). Then the output should have been directly comparable to what the online calculator printed out.
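As a quick cross-check of the points above, here is a minimal Python sketch of a forward DFT with 0-based indices and the negative exponent. It is not a translation of the Fortran program. It is applied to the same four input values and should be comparable to the online calculator's output.

```python
import cmath
import math

def dft(h):
    """Forward DFT: X[j] = sum_k h[k] * exp(-2*pi*i*j*k/N), k = 0..N-1."""
    n = len(h)
    return [sum(h[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

# Same data as in the question: h = k*pi/2 for k = 1..4 (a ramp).
h = [k * math.pi / 2 for k in range(1, 5)]
for value in dft(h):
    print(f"{value.real:8.2f} {value.imag:+8.2f}j")
```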
Fortran does complex numbers for you but you must declare the appropriate variables as complex. Try
complex, allocatable,dimension(:) :: rst
complex, dimension(:,:), allocatable :: W

Deliberately simplifying fractional exponents

I have an expression involving fractional exponents that I want to make into a polynomial recognisable to sympy for solution. I could, if necessary, write the exponents using Rational but can't make that work.
What can I do?
>>> from sympy import *
>>> var('d x')
(d, x)
>>> (0.125567*(d + 0.04) - d**2.25*(2.51327*d + 6.72929)).subs(d,x**4)
0.125567*x**4 - (2.51327*x**4 + 6.72929)*(x**4)**2.25 + 0.00502268
SymPy does not combine exponents unless it knows it is safe to do so. For complex numbers it's only safe with integer exponents. Since we don't know if x is real or complex, the exponents are not combined.
Even for real x, (x**4)**(9/4) is not the same as x**9 (consider negative x). If x is declared real, using x = Symbol('x', real=True), then (x**4)**Rational(9, 4) correctly returns x**8*Abs(x) instead of x**9.
If x is declared positive, x = Symbol('x', positive=True), then (x**4)**Rational(9, 4) returns x**9.
It is not advisable to use floating point representation of rational numbers in SymPy, especially as exponents. This is why I replaced 2.25 by Rational(9, 4) above. With 2.25, the result is Abs(x)**9.0 when x is real, and x**9.0 if x is declared positive. The decimal dot indicates these are floating point numbers; so subsequent manipulations will have floating-point results instead of symbolic ones. For example (with x declared positive):
>>> solve((x**4)**Rational(9, 4) - 2)
[2**(1/9)]
>>> solve((x**4)**2.25 - 2)
[1.08005973889231]

CGAL: convert quotient to double

I am having a problem converting the objective value returned by the CGAL QP solver to a double:
typedef CGAL::Gmpzf ET;
...define a quadratic program qp here...
Solution s = CGAL::solve_quadratic_program(qp, ET());
assert (s.solves_quadratic_program(qp));
cout<<"QP objective = "<<s.objective_value()<<endl;
// The above returns a value of type CGAL::Quotient<ET>
// and I need to convert it to double
double n = s.objective_value_numerator().to_double();
double d = s.objective_value_denominator().to_double();
cout<<"QP objective 2 = "<<n/d<<endl;
I got:
QP objective = -2.57497e-22/2.01459e-22
QP objective 2 = -nan
I checked and observed that n = -inf and d = inf.
How do we properly convert a Quotient into double?
Thank you in advance for any suggestion!!
CGAL has a function CGAL::to_double that can be used on most number types and in particular on Quotient. It has special code exactly for this case where numerator and denominator would overflow. It does not have code for underflow, which cannot happen with a quotient of integers, but could happen with Gmpzf, yielding 0/0.
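The overflow problem is not specific to CGAL. As an analogy only, here is a Python sketch with made-up numerator and denominator values: converting each part to a floating-point number separately fails, while converting the quotient as a whole works. In C++ the separate conversions give -inf and inf rather than an exception, and hence the nan.

```python
from fractions import Fraction

# Hypothetical numerator and denominator, each far too large for a double.
n = -257497 * 10**400
d = 201459 * 10**400

try:
    float(n)
    overflowed = False
except OverflowError:
    # Python raises here; C++ to_double() on each part would yield -inf/inf.
    overflowed = True

# Converting the quotient as a whole stays within the range of a double.
ratio = float(Fraction(n, d))
print("separate conversion overflows:", overflowed)
print("quotient as double:", ratio)
```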

Tricky arithmetic or sleight of hand?

Vincent answered Fast Arc Cos algorithm by suggesting this function.
float arccos(float x)
{
x = 1 - (x + 1);
return pi * x / 2;
}
The question is, why x = 1 - (x + 1) and not x = -x?
It returns a different result only when (x + 1) causes a loss of precision, that is, x is many orders of magnitude larger or smaller than one.
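This can be tried out with a few values. The sketch below uses Python floats (binary64) rather than the C float of the original function, so the thresholds differ, but the effect is the same. The name vincent_form is made up.

```python
def vincent_form(x):
    return 1.0 - (x + 1.0)

for x in (0.5, 1e-20, 2.0 ** 53):
    a = vincent_form(x)
    # For 0.5 both forms agree; for the tiny and the huge value,
    # rounding in (x + 1.0) makes them differ.
    print(f"x={x!r}: 1-(x+1)={a!r}, -x={-x!r}, equal={a == -x}")
```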
But I don't think this is tricky or sleight of hand, I think it's just plain wrong.
cos(0) = 1 but f(1) = -pi/2
cos(pi/2) = 0 but f(0) = 0
cos(pi) = -1 but f(-1) = pi/2
where f(x) is Vincent's arccos implementation. All of them are off by pi/2. A linear approximation that gets at least these three points correct would be
g(x) = (1 - x) * pi / 2
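A short Python check of the three points, with f transcribed from the C function above and g being the corrected linear form:

```python
import math

def f(x):
    # Transcription of the suggested function.
    x = 1.0 - (x + 1.0)
    return math.pi * x / 2.0

def g(x):
    # Linear approximation that matches acos at x = 1, 0 and -1.
    return (1.0 - x) * math.pi / 2.0

for x in (1.0, 0.0, -1.0):
    print(f"x={x:5.1f}  acos={math.acos(x):.6f}  f={f(x):.6f}  g={g(x):.6f}")
```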
I don't see the details instantly, but think about what happens as x approaches 1 or -1 from either side, and consider roundoff error.
Addition causes both operands to be normalized (in this case, that matters for x). IIRC, in Knuth's volume 2, in the chapter on floating-point arithmetic, you can even see an expression like x+0.