Floating point math accuracy, c++ vs fortran [duplicate]

Floating point math accuracy, c++ vs fortran [duplicate] - c++

This question already has answers here:
Different precision in C++ and Fortran
(2 answers)
Closed 2 years ago.
I have tried to implement a recursive function, in both C++ and Fortran, which calculates the value of the n'th Legendre polynomial, at x. In Fortran I have
recursive function legendre(n, x) result(p)
integer, intent(in) :: n
real(8), intent(in) :: x
real(8) :: p
if(n == 0) then
p = 1.0
else if (n == 1) then
p = x
else
p = (2.0*real(n,dp)-1.0)*x*legendre(n-1,x)-(real(n,dp)-1.0)*legendre(n-2,x)
p = p / real(n,dp)
end if
end function legendre
and then in C++ I have
double legendre(int n, double x) {
double p;
if(n == 0) return 1.0;
else if(n == 1) return x;
else {
p = (2.0*(double)n - 1.0)*x*legendre(n-1,x)-((double)n - 1.0)*legendre(n-2,x);
p /= (double)n;
return p;
}
}
These two functions seem to be exactly the same to me, both using double precission, but the result from the Fortran function is substantial different from the C++ result. For example,
legendre(7,-0.2345) = 0.28876207107499049178814404296875 according to WolframAlpha. The two codes above, when compiled with no optimizations produce
Fortran : 0.28876206852410113
C++ : 0.28876207107499052285
I know that the answers should not be the same due to floating point arithmetic, but the difference in value here for double precision seems somewhat large to me. What is the reason that the Fortran value is so far off from the other two ?

Although the variables in your FORTRAN function are defined as double-precision (8 bytes), the constants you have specified are default (single-precision, 4-byte) values.
According to this discussion, that means the arithmetic is performed to single-precision accuracy:
Even if the variable that you are assigning the result to is defined
to be DP, the Fortran standard requires that the arithmetic on the
constants be performed using SP. That is because you are using default
real constants, since you do not have any kind type parameter at the
end of the constants. By rule, default real is SP.
And, further on in the same discussion:
...Starting with Fortran 90, published in June 1991, this practice of
"promoting" SP constants to DP is prohibited.
So, in order to force double-precision maths, specify the constants as DP: instead of, for example, 1.0, specify 1.0D0 (and so forth for the others).

Thanks to Adrian's response, I was able to fix the problem. Nothing was wrong with the code in the Fortran function, the issue was the value of x which I was passing to it. Even though in the main program, I had defined x as real(8) and latter assigned it the value with a simple
x = -0.2345
which I thought should be double precision. It should actually be
x = -0.2345_dp
This results in the two functions having the same answer. I believe it is likely due to the reason that Adrian pointed out.

Related

Double precision in Fortran for trignometric functions [duplicate]

This question already has answers here:
Result of GAMMA underflows its kind
(2 answers)
Is There a Better Double-Precision Assignment in Fortran 90?
(2 answers)
Best practice when working with double precision magic numbers
(1 answer)
Closed 1 year ago.
I am using the following code to calculate cos for pi/2 in Fortran.
program precision_Fortran
IMPLICIT NONE
!!integer, parameter :: dp = kind(1.0d0) !!Gives same result as line below
integer, parameter :: dp = selected_real_kind(15, 307)
Real(dp), parameter:: pi=4.0*atan(1.0)
Real(dp) :: angle
angle = cos(pi/2.0)
write(*,*)'pi = ', pi
write(*,*)'angle = ', angle
end program precision_Fortran
I compiled using gfortran and ftn95. From both, the output is
pi = 3.1415927410125732
angle = -4.3711390001862412E-008
How do I get a better precision for angle here? For instance in C++ I see it in order of E-18, for all declaration using double.
Please let me know if more information is needed to explain it better.
Extra : The main code I am using, with physical equations having trigonometric terms, is having precision issues, and am not entirely sure, but am suspecting it's because of this. So, want to check if above could be improved somehow. Not expert with Fortran so struggling to figure this out.

Maximum value of 64 bit floating point number for overflow detection

I have a seemingly simple problem: I want to detect whether a floating point addition in Fortran will overflow by doing something like the following:
real*8 :: a, b, c
a = ! some value
b = ! some value
if (b > DOUBLE_MAX - a) then
! handle overflow
else
c = a + b
The problem is that I don't know what DOUBLE_MAX should be. I'm aware of how floating point numbers are represented according to IEEE 754 but the largest value representable by a double precision floating point number seems to be too large for a variable of type real*8 (i.e. if I try to assign 1.7976931348623157e+308 to such a variable gfortran complains). C and C++ have predefined constants/generic functions for this purpose but I couldn't find a Fortran equivalent.
Note: I'm aware that real*8 is not really part of the standard but there seems to be no other way to reliably specify that a floating point number should use the double precision format.

Something like this?
real(REAL64) function func( a, b )
use, intrinsic :: iso_fortran_env, only: REAL64, INT64
use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_set_flag, IEEE_OVERFLOW, IEEE_QUIET_NAN
implicit none
real(REAL64), intent(in) :: a, b
real(REAL64), parameter :: MAX64 = huge(0.0_REAL64)
if ( b > MAX64-a ) then
! Set IEEE_OVERFLOW flag and return NaN
call ieee_set_flag(IEEE_OVERFLOW,.true.)
func = ieee_value(func,IEEE_QUIET_NAN)
else
func = a + b
end if
return
end function func
All I could find for intrinsic ieee_exceptions module is:
https://github.com/gcc-mirror/gcc/blob/master/libgfortran/ieee/ieee_exceptions.F90
For setting NaN value see post.

There are likely better ways to detect overflow, but the precise answer to your question is to use the huge function. HUGE(a) returns the largest possible number representable by the type a.

Apparent mixed-mode arithmetic from a Fortran intrinsic function

What I'm doing is very straightforward. Here are the relevant declarations:
USE, INTRINSIC :: ISO_Fortran_env, dp=>REAL64 !modern DOUBLE PRECISION
REAL(dp), PARAMETER :: G_H2_alpha = 1.57D+04, G_H2_beta = 5.3D+03, G_H2_gamma = 4.5D+03
REAL(dp) :: E_total_alpha, E_total_beta, E_total_gamma, P_H2_sed
Usage:
P_H2_sed = G_H2_alpha * E_total_alpha + G_H2_beta * E_total_beta * G_H2_gamma * E_total_gamma
where E_total_alpha, E_total_beta, and E_total_gamma are just running dp totals inside various loops. I ask for the nearest integer NINT(P_H2_sed) and get -2147483648, which looks like mixed-mode arithmetic. The float P_H2_sed returns 2529548272025.4888, so I would expect NINT to return 2529548272026. I didn't think it was possible to get this kind of result from an intrinsic function. I haven't seen this since my days with the old F77 compiler. I'm doing something bad, but what is the question.

NINT, by default, returns an integer with default type parameter, that usually is equivalent to int32.
An integer of this kind cannot represent a number as high as 2529548272026. The maximum representable number is 2^31-1, that is 2147483647. The result you are getting is similar to that, but is the lowest representable number, -2147483648 (equivalent o all 32 bits set to 1).
To get a result of other kind from NINT, pass an optional parameter named kind, like this: NINT(P_H2_sed, kind=int64).

Function type does not match the function definition

I am new to Fortran, writing some practice code with a function that returns Farenheit from Celsius
program Console1
implicit none
real, parameter :: ikind = selected_real_kind(p=15)
real (kind = ikind):: c,f,o,faren
print *, "enter a temperature in degrees celsius"
read *, c
write(*,10) "farenheit =", faren(c)
10 format(a,f10.8)
end program Console1
function faren(c)
real, parameter :: ikind = selected_real_kind(p=15)
real (kind = ikind):: c,f
faren = (9/5)*c + 32
end function faren
I get an error #7977 : The type of the function reference does not match the type of the function definition.
So with that if i change function faren(c) to real function faren(c)
I get the same error, but the types are the same?
Am i missing something? Do I have to define the function in the main program?

There are several issues in addition to the structural/code arrangement ones already noted.
First, KIND is an integer, so you want to change
real, parameter :: ikind = selected_real_kind(p=15)
to
integer, parameter :: ikind = selected_real_kind(p=15)
Ideally, you want to define that in only one place (i.e. in a module) and reference it from both your main program and the function, but the code should be fine as it is for test purposes.
A second issue that often trips up newcomers to Fortran (and Python2) is that real numbers and integers are distinct types and are not generally interchangeable.
faren = (9/5)*c + 32
simplifies to
faren = (1)*c + 32
because integer division has an integer result; 9/5 = 1
Fortran is picky about numerical values (that's sort of the whole point of the language) so what you probably want is:
faren = (9.0 / 5.0) * c + 32.0
Or more precisely, if faren is defined with a specific precision of ikind,
faren = (real(9.0,ikind) / real_(5.0,ikind)) * c + real(32.0,ikind)
or
faren = (9.0_ikind / 5.0_ikind) * c + 32.0_ikind
This syntax tends to make people's heads explode. Welcome to modern fortran ;)
The last issue deals with the horrors of Fortran I/O. From a design standpoint, you need to know what results the user expects and make sure the output format can display them. The legitimate range of input values for c is -273.15 (give or take) to some upper bound which relies on the use case for the code. If you're dealing with cooking temperatures, you probably won't exceed 400.0; if you're doing fusion research, you could be going much higher. Are 8 figures past the decimal useful or believable? In this case, we're just testing the code so we may not need a lot of precision in the output; you'll want to change the output format to something like:
10 format(a,es10.2)
or
10 format(a,g16.8)
You need to ensure the total field width (the number before the dot) can contain the decimal part (the number after the dot) along with the integer part of the number, plus the space needed to show sign and exponent. For scientific notation, four characters are eaten by mantissa sign, decimal point, 'E' and exponent sign. It may be safer just starting out to use an output format of *; it's frustrating to fight with numerics and formatting simultaneously.

That is a good effort and simple start to work through the nuance, so a good question.
Personally I would use reals for the math, rather the 9/5, and use a module. In this example you could pass in a real or a double to C2Faren and the interface/procedure will sort out whether to use the real or the double version. Then you have a few options in case you want different precision.
You could also use the ISO_C_BINDING if you do mixed language...
MODULE MyTEMPS
PRIVATE
DOUBLE PRECISION, PARAMETER :: C2F_ScaleFact = 1.8D0
DOUBLE PRECISION, PARAMETER :: F2C_ScaleFact = /(1.0D0 / 1.8D0)/
DOUBLE PRECISION, PARAMETER :: F2C_Offset = 32.0D0
PUBLIC Faren2C
INTERFACE C2Faren
MODULE PROCEDURE C2Faren_Real, C2Faren_DBL
END INTERFACE
CONTAINS
!========= REAL VERISON =========
REAL FUNCTION C2Faren_Real(c)
IMPLICIT NONE
real, INTENT(IN ) :: c
C2Faren_Real = ( C*F2C_ScaleFact ) + F2C_Offset
RETURN
END FUNCTION C2Faren_Real
!========= DOUBLE VERSION =========
DOUBLE PRECISION FUNCTION C2Faren_DBL(c)
IMPLICIT NONE
DOUBLE PRECISION , INTENT(IN ) :: c
C2Faren_DBL = ( C*F2C_ScaleFact ) + F2C_Offset
RETURN
END FUNCTION C2Faren_DBL
!========= REAL VERSION (Faren to Centigrade) =========
REAL FUNCTION faren2C(Faren)
IMPLICIT NONE
REAL, INTENT(IN ) :: Faren
faren2C = (faren - F2C_Offset) / F2C_ScaleFact
RETURN
END FUNCTION faren2C
END MODULE MyTEMPS
Then your program uses the module via USE n the second line...
program Console1
USE MyTEMPS !<== Here
implicit none
real :: c, f
DOUBLE PRECISION :: Dc, Df ! No way to get Df to C or DC in the module (yet)!
print *, "enter a temperature in degrees celsius"
read *, c
write(*,10) "farenheit =", C2faren(c)
10 format(a,f10.6)
Dc = C
write(*,12) "farenheit =", C2faren(Dc)
12 format("DBL:",A,f10.6)
F = Dc
write(*,14) "Centigrade =", faren2C(F)
14 format("DBL:",A,f10.6)
end program Console1
So/and the main advantage of the module is when you end up wanting to use this stuff in a variety of programs and test and sort out the module once... Usually people put this sort of stuff (lots of modules) in a library, when the module(s) have lot of functions.
You could also put just the real, parameter :: ikind = selected_real_kind(p=15) into a module and use that in both the program and the function and you would be there. You were real close, and it mostly a matter of style and utility.
For Intel Fortran you can use REAL(KIND=4) and REAL(KIND=8)... Which I do, but that is not portable to gfortran, so it is probably a better habit to use the ISO_C_BINDING or just use REAL and DOUBLE PRECISION.

Modules are great but if you have a very simple code another way to work is to put the subroutines and functions in your main program. The trick is to put them after the word contains:
program xxx
stuff
contains
subroutine yyy
function zzz
end program xxx
In this way the functions can see into the contents of the main program so you don't have to re-declare your parameters and you are likely to get more meaningful error messages.
Since you are new I have a great resource I learned a lot from to share:
http://www.uv.es/dogarcar/man/IntrFortran90.pdf

Infinity in Fortran

What is the safest way to set a variable to +Infinity in Fortran? At the moment I am using:
program test
implicit none
print *,infinity()
contains
real function infinity()
implicit none
real :: x
x = huge(1.)
infinity = x + x
end function infinity
end program test
but I am wondering if there is a better way?

If your compiler supports ISO TR 15580 IEEE Arithmetic which is a part of so-called Fortran 2003 standard than you can use procedures from ieee_* modules.
PROGRAM main
USE ieee_arithmetic
IMPLICIT NONE
REAL :: r
IF (ieee_support_inf(r)) THEN
r = ieee_value(r, ieee_negative_inf)
END IF
PRINT *, r
END PROGRAM main

I would not rely on the compiler to support the IEEE standard and do pretty much what you did, with two changes:
I would not add huge(1.)+huge(1.), since on some compilers you may end up with -huge(1.)+1 --- and this may cause a memory leak (don't know the reason, but it is an experimental fact, so to say).
You are using real types here. I personally prefer to keep all my floating-point numbers as real*8, hence all float constants are qualified with d0, like this: huge(1.d0). This is not a rule, of course; some people prefer using both real-s and real*8-s.

I'm not sure if the solution bellow works on all compilers, but it's a nice mathematical way of reaching infinity as -log(0).
program test
implicit none
print *,infinity()
contains
real function infinity()
implicit none
real :: x
x = 0
infinity=-log(x)
end function infinity
end program test
Also works nicely for complex variables.

I don't know about safest, but I can offer you an alternative method. I learned to do it this way:
PROGRAM infinity
IMPLICIT NONE
INTEGER :: inf
REAL :: infi
EQUIVALENCE (inf,infi) !Stores two variable at the same address
DATA inf/z'7f800000'/ !Hex for +Infinity
WRITE(*,*)infi
END PROGRAM infinity
If you are using exceptional values in expressions (I don't think this is generally advisable) you should pay careful attention to how your compiler handles them, you might get some unexpected results otherwise.

This seems to work for me.
Define a parameter
double precision,parameter :: inf = 1.d0/0.d0
Then use it in if tests.
real :: sng
double precision :: dbl1,dbl2
sng = 1.0/0.0
dbl1 = 1.d0/0.d0
dbl2 = -log(0.d0)
if(sng == inf) write(*,*)"sng = inf"
if(dbl1 == inf) write(*,*)"dbl1 = inf"
if(dbl2 == inf) write(*,*)"dbl2 = inf"
read(*,*)
When compiled with ifort & run, I get
sng = inf
dbl1 = inf
dbl2 = inf

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Floating point math accuracy, c++ vs fortran [duplicate] - c++

Related

Double precision in Fortran for trignometric functions [duplicate]

Maximum value of 64 bit floating point number for overflow detection

Apparent mixed-mode arithmetic from a Fortran intrinsic function

Function type does not match the function definition

Infinity in Fortran

Categories

Resources