Maximum value of 64 bit floating point number for overflow detection

I have a seemingly simple problem: I want to detect whether a floating point addition in Fortran will overflow by doing something like the following:
real*8 :: a, b, c
a = ! some value
b = ! some value
if (b > DOUBLE_MAX - a) then
    ! handle overflow
else
    c = a + b
end if
The problem is that I don't know what DOUBLE_MAX should be. I'm aware of how floating point numbers are represented according to IEEE 754 but the largest value representable by a double precision floating point number seems to be too large for a variable of type real*8 (i.e. if I try to assign 1.7976931348623157e+308 to such a variable gfortran complains). C and C++ have predefined constants/generic functions for this purpose but I couldn't find a Fortran equivalent.
Note: I'm aware that real*8 is not really part of the standard but there seems to be no other way to reliably specify that a floating point number should use the double precision format.

Something like this?
real(REAL64) function func( a, b )
    use, intrinsic :: iso_fortran_env, only: REAL64
    use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_set_flag, IEEE_OVERFLOW, IEEE_QUIET_NAN
    implicit none
    real(REAL64), intent(in) :: a, b
    real(REAL64), parameter :: MAX64 = huge(0.0_REAL64)
    ! Note: this test only catches overflow in the positive direction
    if ( b > MAX64-a ) then
        ! Set IEEE_OVERFLOW flag and return NaN
        call ieee_set_flag(IEEE_OVERFLOW,.true.)
        func = ieee_value(func,IEEE_QUIET_NAN)
    else
        func = a + b
    end if
end function func
The only reference I could find for the intrinsic ieee_exceptions module is:
https://github.com/gcc-mirror/gcc/blob/master/libgfortran/ieee/ieee_exceptions.F90
For setting NaN value see post.

There are likely better ways to detect overflow, but the precise answer to your question is to use the huge intrinsic function. HUGE(a) returns the largest finite number representable by the type and kind of a, so huge(0.0d0) gives the double precision maximum.
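As a minimal sketch of how huge can drive the check from the question in both directions, the module below tests whether a + b would leave the finite range. The module and function names (overflow_check, add_would_overflow) are made up for illustration, and the test is conservative: right at the edge of the range it may report overflow where rounding would still give a finite sum.

```fortran
module overflow_check
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: add_would_overflow   ! hypothetical helper name
contains
  pure logical function add_would_overflow(a, b) result(overflows)
    real(real64), intent(in) :: a, b
    real(real64), parameter :: max64 = huge(0.0_real64)
    if (a > 0.0_real64 .and. b > 0.0_real64) then
      ! Both positive: the sum can only overflow toward +huge
      overflows = b > max64 - a
    else if (a < 0.0_real64 .and. b < 0.0_real64) then
      ! Both negative: the sum can only overflow toward -huge
      overflows = b < -max64 - a
    else
      ! Mixed signs or a zero operand: a finite sum always fits
      overflows = .false.
    end if
  end function add_would_overflow
end module overflow_check
```

A caller would write something like if (add_would_overflow(a, b)) then ... else c = a + b. Infinite or NaN inputs are not handled here.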

Related

Apparent mixed-mode arithmetic from a Fortran intrinsic function

What I'm doing is very straightforward. Here are the relevant declarations:
USE, INTRINSIC :: ISO_Fortran_env, dp=>REAL64 !modern DOUBLE PRECISION
REAL(dp), PARAMETER :: G_H2_alpha = 1.57D+04, G_H2_beta = 5.3D+03, G_H2_gamma = 4.5D+03
REAL(dp) :: E_total_alpha, E_total_beta, E_total_gamma, P_H2_sed
Usage:
P_H2_sed = G_H2_alpha * E_total_alpha + G_H2_beta * E_total_beta * G_H2_gamma * E_total_gamma
where E_total_alpha, E_total_beta, and E_total_gamma are just running dp totals inside various loops. I ask for the nearest integer NINT(P_H2_sed) and get -2147483648, which looks like mixed-mode arithmetic. The float P_H2_sed is 2529548272025.4888, so I would expect NINT to return 2529548272025. I didn't think it was possible to get this kind of result from an intrinsic function. I haven't seen this since my days with the old F77 compiler. I'm doing something wrong, but what it is is the question.

NINT, by default, returns an integer of default kind, which is usually equivalent to int32.
An integer of this kind cannot represent a value of this magnitude (about 2.5 × 10^12). The maximum representable number is 2^31-1, that is 2147483647. The result you are getting is the lowest representable number, -2147483648 (only the sign bit set, bit pattern z'80000000'). What you get when the value does not fit in the result kind is compiler-dependent.
To get a result of another kind from NINT, pass the optional argument named kind, like this: NINT(P_H2_sed, kind=int64).
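A small sketch of that fix, wrapped in a function so the result kind is explicit. The module and function names (nint_kinds, nearest_int64) are hypothetical; only nint with the kind argument comes from the answer.

```fortran
module nint_kinds
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
contains
  pure function nearest_int64(x) result(n)
    real(real64), intent(in) :: x
    integer(int64) :: n
    ! kind=int64 makes NINT return a 64-bit integer instead of the default kind
    n = nint(x, kind=int64)
  end function nearest_int64
end module nint_kinds
```

With the value from the question, nearest_int64(2529548272025.4888_real64) fits comfortably in the 64-bit result, since the fractional part is below one half and the value rounds down.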

How to cast a real as a specific KIND?

I have multiple kinds I am using in Fortran and would like to add a real valued number where the real number is cast as that kind.
For example, something like:
program illsum
implicit none
#if defined(USE_SINGLE)
integer, parameter :: rkind = selected_real_kind(6,37)
#elif defined(USE_DOUBLE)
integer, parameter :: rkind = selected_real_kind(15,307)
#elif defined(USE_QUAD)
integer, parameter :: rkind = selected_real_kind(33, 4931)
#endif
integer :: Nmax = 100
integer :: i
real(kind = rkind) :: mysum = 0.0
do i = 1,Nmax
mysum = mysum + kind(rkind, 1.0)/kind(rkind, i)
enddo
end program illsum
So I want to make sure that 1.0 and the real valued expression of i are expressed as the proper kind that I have chosen before performing the division and addition.
How can I cast 1.0 as rkind?

To convert a numeric value to a real value, use the real intrinsic function. It takes an optional second argument that determines the kind of the result. So, for your named constant rkind:
real(i, rkind) ! Returns a real valued i of kind rkind
real(1.0, rkind) ! Returns a real valued 1 of kind rkind
which I think is what you mean by kind(rkind, 1.0). kind itself, however, is an intrinsic that returns the kind value of a numeric object.
However, there are other things to note.
First, there is the literal constant 1._rkind (note the . in there; 1.0_rkind may be clearer), which is of kind rkind and has a value approximating 1.
There's no comparable expression i_rkind, though, so the conversion above is necessary for a real result of kind rkind with value approximating i.
That said, for your example there is no need to convert the integer value explicitly. Under the rules of Fortran the expression 1._rkind/i involves an implicit conversion of i and is equivalent to 1._rkind/real(i,rkind) (and real(1.0, rkind)/real(i,rkind)).
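Putting these pieces together, the loop from the question could be sketched as follows. The module and function names (harmonic, harmonic_sum) are invented for the example, and rkind is fixed to the double precision request here instead of being selected by the preprocessor.

```fortran
module harmonic
  implicit none
  ! Assumption: double precision chosen directly instead of via #if defined(...)
  integer, parameter :: rkind = selected_real_kind(15, 307)
contains
  pure function harmonic_sum(nmax) result(total)
    integer, intent(in) :: nmax
    real(rkind) :: total
    integer :: i
    total = 0.0_rkind
    do i = 1, nmax
      ! The literal carries the kind; real(i, rkind) makes the conversion explicit
      total = total + 1.0_rkind / real(i, rkind)
    end do
  end function harmonic_sum
end module harmonic
```

Writing 1.0_rkind / i would give the same result, since i is converted to kind rkind implicitly.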

Hexadecimal floating point in fortran

Is there an equivalent for the 'a' format specifier known from C in Fortran?
C Example:
printf("%a\n",43.1e6); // 0x1.48d3bp+25
Exporting floating point numbers in hexadecimal format prevents rounding errors. While the rounding errors are usually negligible, it is still advantageous to be able to restore a saved value exactly. Note, that the hexadecimal representation produced by printf is portable and human readable.
How can I export and parse floating point numbers in Fortran like I do in C using the 'a' specifier?

If you want to have full precision, the best way is to use unformatted files, such as this:
program main
real :: r
integer :: i
r = -4*atan(1.)
open(20,access="stream")
write (20) r
close(20)
end program main
(I used stream access, which is new to Fortran 2003, because it is usually less confusing than normal unformatted access). You can then use, for example, od -t x1 fort.20 to look at this as a hex dump.
You can also use TRANSFER to copy the bit pattern to an integer and then use the Z edit descriptor.
If you really want to mimic the %a specifier, you'll have to roll your own. Most machines now use IEEE format. Use TRANSFER for copying the pattern to an integer, then pick that apart using IAND (and multiplications or divisions by powers of two for shifting).
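The TRANSFER plus Z edit descriptor route can be sketched like this. It does not reproduce the %a notation; it only saves and restores the raw bit pattern as 16 hexadecimal digits. The names (hex_dump, real64_to_hex, hex_to_real64) are hypothetical, and the sketch assumes a 64-bit IEEE real and a compiler that writes and reads the integer's bit pattern unchanged with Z editing (gfortran is assumed to behave this way).

```fortran
module hex_dump
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
contains
  function real64_to_hex(x) result(hex)
    real(real64), intent(in) :: x
    character(len=16) :: hex
    ! Copy the bit pattern into a 64-bit integer and write it as 16 hex digits
    write (hex, '(Z16.16)') transfer(x, 0_int64)
  end function real64_to_hex

  function hex_to_real64(hex) result(x)
    character(len=16), intent(in) :: hex
    real(real64) :: x
    integer(int64) :: bits
    ! Read the hex digits back into an integer, then reinterpret the bits
    read (hex, '(Z16)') bits
    x = transfer(bits, x)
  end function hex_to_real64
end module hex_dump
```

For an IEEE double, 1.0 has the bit pattern 3FF0000000000000, and hex_to_real64(real64_to_hex(x)) should give back x exactly, because no decimal conversion is involved.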

Another option would be to let the C library do your work for you and interface via C binding. This rather depends on a modern compiler (some F2003 features used).
module x
use, intrinsic :: iso_c_binding
private
public :: a_fmt
interface
subroutine doit(a, dest, n) bind(C)
import
real(kind=c_double), value :: a
character(kind=c_char), intent(out) :: dest(*)
integer(kind=c_int), value :: n
end subroutine doit
end interface
interface a_fmt
module procedure a_fmt_float, a_fmt_double
end interface a_fmt
contains
function a_fmt_float(a) result(res)
real(kind=c_float), intent(in) :: a
character(len=:), allocatable :: res
res = a_fmt_double (real(a, kind=c_double))
end function a_fmt_float
function a_fmt_double(a) result(res)
real(kind=c_double), intent(in) :: a
character(len=:), allocatable :: res
character(len=30) :: dest
integer :: n
call doit (a, dest, len(dest))
n = index(dest, achar(0))
res = dest(1:n-1) ! drop the terminating NUL character
end function a_fmt_double
end module x
program main
use x
implicit none
double precision :: r
integer :: i
r = -1./3.d0
do i=1,1030
print *,a_fmt(r)
r = - r * 2.0
end do
end program main
#include <stdio.h>
void doit (double a, char *dest, int n)
{
snprintf(dest, n-1, "%a", a);
}

Infinity in Fortran

What is the safest way to set a variable to +Infinity in Fortran? At the moment I am using:
program test
implicit none
print *,infinity()
contains
real function infinity()
implicit none
real :: x
x = huge(1.)
infinity = x + x
end function infinity
end program test
but I am wondering if there is a better way?

If your compiler supports ISO TR 15580 IEEE Arithmetic, which is part of the Fortran 2003 standard, then you can use procedures from the ieee_* modules. The example sets negative infinity; use ieee_positive_inf for +Infinity.
PROGRAM main
USE ieee_arithmetic
IMPLICIT NONE
REAL :: r
IF (ieee_support_inf(r)) THEN
r = ieee_value(r, ieee_negative_inf)
END IF
PRINT *, r
END PROGRAM main
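For the +Infinity case asked about, a minimal sketch along the same lines might look like this. The module and function names (inf_values, positive_infinity) are invented, and falling back to huge when infinities are unsupported is an assumption about what a caller would want, not part of the answer.

```fortran
module inf_values
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_support_inf, &
                                           ieee_positive_inf
  implicit none
contains
  function positive_infinity() result(inf)
    real :: inf
    if (ieee_support_inf(0.0)) then
      ! The first argument only selects the kind of the result
      inf = ieee_value(0.0, ieee_positive_inf)
    else
      ! Assumed fallback: largest finite value when infinities are unavailable
      inf = huge(0.0)
    end if
  end function positive_infinity
end module inf_values
```

On a processor with IEEE infinities, positive_infinity() compares greater than huge(0.0).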

I would not rely on the compiler to support the IEEE standard, and would do pretty much what you did, with two changes:
I would not add huge(1.)+huge(1.), since on some compilers you may end up with -huge(1.)+1, and this may cause a memory leak (I don't know the reason, but I have observed it).
You are using real types here. I personally prefer to keep all my floating-point numbers as real*8, hence all float constants are qualified with d0, like this: huge(1.d0). This is not a rule, of course; some people prefer using both real and real*8 variables.

I'm not sure if the solution below works on all compilers, but it's a nice mathematical way of reaching infinity as -log(0).
program test
implicit none
print *,infinity()
contains
real function infinity()
implicit none
real :: x
x = 0
infinity=-log(x)
end function infinity
end program test
Also works nicely for complex variables.

I don't know about safest, but I can offer you an alternative method. I learned to do it this way:
PROGRAM infinity
IMPLICIT NONE
INTEGER :: inf
REAL :: infi
EQUIVALENCE (inf,infi) !Stores two variable at the same address
DATA inf/z'7f800000'/ !Hex for +Infinity
WRITE(*,*)infi
END PROGRAM infinity
If you are using exceptional values in expressions (I don't think this is generally advisable), you should pay careful attention to how your compiler handles them. Otherwise you might get some unexpected results.

This seems to work for me.
Define a parameter
double precision,parameter :: inf = 1.d0/0.d0
Then use it in if tests.
real :: sng
double precision :: dbl1,dbl2
sng = 1.0/0.0
dbl1 = 1.d0/0.d0
dbl2 = -log(0.d0)
if(sng == inf) write(*,*)"sng = inf"
if(dbl1 == inf) write(*,*)"dbl1 = inf"
if(dbl2 == inf) write(*,*)"dbl2 = inf"
read(*,*)
When compiled with ifort & run, I get
sng = inf
dbl1 = inf
dbl2 = inf

Result of GAMMA underflows its kind

I would like to calculate gamma(-170.1) using the program below:
program arithmetic
! program to do a calculation
real(8) :: x
x = GAMMA(-170.1)
print *, x
end program
but I get the error:
test.f95:4.10:
x = GAMMA(-170.1)
1
Error: Result of GAMMA underflows its kind at (1)
when I compile with gfortran. According to Maple gamma(-170.1) = 5.191963205*10^(-172) which I think should be within the range of the exponent of the variable x as I've defined it.

The modification of your program below should work. Remember that in Fortran the RHS is evaluated before it is assigned to the LHS, and that floating point literals without a kind suffix are of default kind, that is, single precision. Making the argument to GAMMA double precision therefore makes the compiler choose the double precision GAMMA.
program arithmetic
! program to do a calculation
integer, parameter :: dp = kind(1.0d0)
real(dp) :: x
x = GAMMA(-170.1_dp)
print *, x
end program

-170.1 may be treated as a float (single precision). If so, changing it to a double should resolve the issue.
