Fortran: inexact value read from a text file [duplicate] - fortran

In the code below I am adding together 865398.78 and -865398.78. I expect to get 0, but instead I get -0.03.
Source Code:
program main
real(8) :: x
open(10,file="test.txt")
read(10,*)x
print *,"x=",x
x=x+865398.78
print *,"x+865398.78=",x
end program
Result:
x= -865398.780000000
x+865398.78= -3.000000002793968E-002
Am I using the read statement incorrectly, or is something else wrong?

The number 865398.78 is represented in single precision in your code. Single precision can handle about 7 significant digits, while your number has 8. You can make it double precision by writing
x=x+865398.78_8

I will make one big assumption in this answer: that real(8) corresponds to double precision.
You are probably assuming that your 865398.78 means the same thing wherever it occurs. In source code that is true: it is a default real literal constant which approximates 865398.78.
When you have
x=x+865398.78
for x double precision, then the default real constant is converted to a double precision value.
However, in the read statement
read(10,*)x
given input "-865398.78" then x takes a double precision approximation to that value.
Your non-zero answer comes from the fact that a default real/single precision approximation converted to a double precision value is not in general, and isn't in this case, the same thing as an initial double precision approximation.
This last fact is explained in more detail in other questions, as is the solution: use x=x+865398.78_8 (or better, don't use 8 as the kind value).
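The difference between the two approximations can be made visible with a short sketch. It assumes an IEEE 754 compiler where default real is 32-bit and double precision is 64-bit; the names dp, x_single and x_double are illustrative and not from the original question.

```fortran
program literal_kinds
  implicit none
  ! Assumption: dp is the double precision kind on this compiler.
  integer, parameter :: dp = kind(0.0d0)
  real(dp) :: x_single, x_double

  x_single = 865398.78      ! default real literal, widened to double afterwards
  x_double = 865398.78_dp   ! literal that is double precision from the start

  print '(a, f20.10)', 'default real literal: ', x_single
  print '(a, f20.10)', 'double literal:       ', x_double
  print '(a, es12.4)', 'difference:           ', x_single - x_double
end program literal_kinds
```

Near 865398 the spacing between adjacent 32-bit values is 0.0625, so under these assumptions the default real literal is stored as 865398.75, which accounts for the -0.03 seen in the question.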

Related

Why do I keep getting 0 as output? [duplicate]

Why does this Fortran program produce only zeros? When I print it out I get -0.00000 everywhere! What have I done wrong? In MATLAB it runs perfectly. I don't see any reason why it's not working, to be honest!
It seems like it's the fraction that messes it up. If I set x equal to some decimal number, it works.
program main
implicit none
integer iMax, jMax
double precision, dimension(:,:), allocatable :: T
double precision x, dx,f,L2old,L2norm,y
integer i, j,n,bc
n=10
allocate(T(1:n+2, 1:n+2))
T=0.0d0
do i=2,n+1
do j=2,n+1
x=(j+1)*1/24
y=(i+1)*1/24
T(i,j)= -18*(x**2+y**2)**2
Write(*,*)'T(',i,'',j,'', T(i,j)
end do
end do
Write(*,*)'T(1,1)',T(1,1)
end program main
x=(j+1)*1/24
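Every operand on the right-hand side is an integer, so the whole expression is evaluated with integer arithmetic: (j+1)*1 is divided by 24 using integer division, which truncates to 0 for every j in this loop. You should be able to force floating point division by making at least one of the operands floating point,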
e.g.
x=(j+1)*1.0/24.0
As was indicated by Jim Lewis, the answer to the OP's question was indeed the integer division used.
Nonetheless, I think it is important to point out that one should take care how the floating point fraction is written. As the OP's program shows, x is of type DOUBLE PRECISION, so the correct statement would be
x=(j+1)*1.0D0/24.0D0
The difference is that you now ensure that the division happens with the same precision with which x was declared.
The following program demonstrates the problem:
program test
WRITE(*,'(A43)') "0.0416666666666666666666666666666666..."
WRITE(*,'(F40.34)') REAL(1/24)   ! REAL() keeps the F descriptor valid for the integer result
WRITE(*,'(F40.34)') 1.0/24.0
WRITE(*,'(F40.34)') 1.0D0/24.0
WRITE(*,'(F40.34)') 1.0D0/24.0D0
end program test
which has the output
0.0416666666666666666666666666666666...
0.0000000000000000000000000000000000
0.0416666679084300994873046875000000
0.0416666666666666643537020320309239
0.0416666666666666643537020320309239
You can clearly see the differences. The first line is the mathematically correct result. The second line is the integer division, which yields zero. The third line shows the output when the division is computed as REAL, while the fourth and fifth lines are in DOUBLE PRECISION. Please take into account that in my case REAL is a 32-bit floating point number and DOUBLE PRECISION a 64-bit one. The precision and representation of both REAL and DOUBLE PRECISION are compiler dependent and not defined in the Standard, which only requires that DOUBLE PRECISION has a higher precision than REAL:
4.4.2.3 Real type
1 The real type has values that approximate the mathematical real numbers. The processor shall provide two or more approximation methods that define sets of values for data of type real. Each such method has a representation method and is characterized by a value for the kind type parameter KIND. The kind type parameter of an approximation method is returned by the intrinsic function KIND (13.7.89).
5 If the type keyword REAL is used without a kind type parameter, the
real type with default real kind is specified and the kind value is
KIND (0.0). The type specifier DOUBLE PRECISION specifies type real
with double precision kind; the kind value is KIND (0.0D0). The
decimal precision of the double precision real approximation method
shall be greater than that of the default real method.
This actually implies that, if you want to ensure that your computations are done using 32bit, 64bit or 128bit floating point representations, you are advised to use the correct KIND values as defined in the intrinsic module ISO_FORTRAN_ENV.
13.8.2.21 REAL32, REAL64, and REAL128
1 The values of these default integer scalar named constants shall be
those of the kind type parameters that specify a REAL type whose
storage size expressed in bits is 32, 64, and 128 respectively. If,
for any of these constants, the processor supports more than one kind
of that size, it is processor dependent which kind value is provided.
If the processor supports no kind of a particular size, that constant
shall be equal to −2 if the processor supports kinds of a larger size
and −1 otherwise.
So this would lead to the following code
PROGRAM main
USE iso_fortran_env, ONLY : DP => REAL64
IMPLICIT NONE
...
REAL(DP) :: x
...
x = (j+1)*1.0_DP/24.0_DP
...
END PROGRAM main
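To check what these kinds provide on a particular compiler, a small sketch like the following can help. It assumes the processor supports both REAL32 and REAL64 (otherwise the constants are negative and the declarations will not compile); the program and variable names are illustrative only.

```fortran
program show_kinds
  use iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: s
  real(real64) :: d

  ! precision() returns the decimal precision, storage_size() the size in bits.
  print *, 'real32: digits =', precision(s), ' bits =', storage_size(s)
  print *, 'real64: digits =', precision(d), ' bits =', storage_size(d)
  ! Kind values of the default real and double precision types, for comparison.
  print *, 'default real kind =', kind(0.0), ' double precision kind =', kind(0.0d0)
end program show_kinds
```

On a typical IEEE 754 system the decimal precisions come out as 6 and 15, but the kind values themselves are processor dependent, which is the reason for using the named constants rather than literal numbers.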

Retrieve the string “0.1” from after assignment to a double variable

I have a naive question about the high-precision number conversion (in C++ here).
Suppose the user assigns 0.1 to the double variable x_d with this statement,
x_d = 0.1
It is known that x_d thus obtained is no longer exactly 0.1, due to the inevitable machine rounding.
I wonder whether there is still a way to get back the original, exact string “0.1” from the double variable x_d. Clearly, std::to_string(x_d) is of no use here. Even a high-precision library like boost::multiprecision or MPFR seems unable to help. For example, std::to_string(boost:cpp_dec_float_10000(x_d) ) cannot recover the lost precision.
So my question is, can we retrieve back the string “0.1” from a double x_d that is assigned using the statement x_d = 0.1?
Let's assume that during the assignment, the decimal number 0.1 is rounded to a double value X that does not equal the decimal number 0.1. Now assume that some other computation produces the value X exactly, without any rounding. In order to distinguish those two cases, you would have to store the origin somewhere. There is simply no place for that in a double (assuming common implementations), so the answer to your question is "no".

How come some people append f to the end of variables?

In the tutorial I'm reading for OGRE3d, the programmer constantly adds f at the end of any value he initializes a variable with, like 200.00f or 0.00f. I decided to erase the f to see if it compiles, and it compiles just fine. What is the point of adding f at the end?
EDIT: So you're saying if I initialize a variable with 200.03 it won't initialize it as a floating point but if I were to do so with 200.03f it would? If not where does the f become useful then?
It's a way to specify that the number has to be interpreted as a "float", not a "double" (which is the default type for C++ floating-point literals and uses up twice the memory).
This discussion could be of help:
http://www.cplusplus.com/forum/beginner/24483/
Quoted from http://msdn.microsoft.com/en-us/library/w9bk1wcy.aspx
A floating-point constant without an f, F, l, or L suffix has type
double. If the letter f or F is the suffix, the constant has type
float. If suffixed by the letter l or L, it has type long double. For
example:
200.00f is not a variable. It can't vary.
It's a compile-time constant, with float representation. The f signifies that it's a float.
By comparison, 200.00 would be interpreted as a double.
The C standard states that unsuffixed floating-point constants are doubles, which promotes the operation to double.
float a,b,c;
...
a = b+7.1;   // this is a double precision operation
...
a = b+7.1f;  // this is a single precision operation
...
c = 7.1;     // double constant, converted to float on assignment
a = b + c;   // single all the way
The double precision version requires more storage for the constant, plus a conversion from single to double for the variable operand, then a conversion from double to single to assign the result. With all these conversions going on, if you are not in tune with how floating point works (rounding and such), you might not get the result you were expecting. The compiler may at some point optimize some of this behavior out, making the real problems harder to understand, and the FPU in the hardware might accept mixed-mode operands, also hiding what is really going on.
It is not just a speed problem but also an accuracy problem. There was a recent SO question about pretty much the same issue: why does this comparison work with one number and not another? Take the fraction 5/11 for example, 0.454545.... Let's say, hypothetically, you had a base 10 FPU with a single precision of 3 significant digits and a double precision of 6 significant digits.
float a = 0.45454545454;
...
if(a>0.4545454545) b=1;
...
Well, in our hypothetical system we can only store three digits in a, so a = .455 because by default we are using a round-up rounding mode. But our comparison will be done in double because we didn't put the f at the end of the number. The double version of the constant is 0.454545. a is converted to a double, which results in 0.455000, so:
if(0.455000>0.454545) b = 1;
0.455 is greater than 0.454545 so b would be a 1.
float a = 0.45454545454;
...
if(a>0.4545454545f) b=1;
...
So now the comparison is single precision, and we are comparing 0.455 to 0.455, which is not greater, so b=1 does not happen.
When you write floating point constants you write them in base 10, but the floating point numbers in the computer are base 2, and the two don't always convert cleanly, just as 5/11 would work just fine in base 11 but in base 10 gives an infinitely repeating digit pattern. 0.1 in decimal, for example, creates a repeating pattern in binary. Depending on where the mantissa cuts off, the rounding can make the least significant bit of the mantissa round up or not (this also depends on the rounding mode you are using, if the floating point format you are using even has rounding). That in itself creates problems depending on how you use the variable, as the comparison above shows.
For non-floating point the compiler usually saves you, but sometimes doesn't:
unsigned long long a;
...
a = ~3;
a = ~(3ULL);
...
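Depending on the compiler and computer, assignments like these can give you different results: one MIGHT give you 0x00000000FFFFFFFC, another MIGHT give 0xFFFFFFFFFFFFFFFC. Strictly speaking, with a signed int constant ~3 is -4, which converts to 0xFFFFFFFFFFFFFFFC; the zero-extended value appears when the operand is an unsigned 32-bit constant, as in ~3U.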
If you want something specific, you should be quite clear when you tell the compiler what you want; otherwise the compiler takes a guess and doesn't always make the guess that you wanted.
It means that the value is to be interpreted as a single-precision floating point value (type float). Without the f suffix, it is interpreted as a double-precision floating point value (type double).
This is usually done to silence compiler warnings about possible loss of precision when assigning a double value to a float variable. If you didn't receive such a warning, you may have switched off warnings in your compiler settings (which is bad!).
But it can also have a subtle semantic effect. As you know, C++ allows functions which have the same name but differ in the types of their parameters. In that case the f suffix can determine which function is called.

Value read from file is stored as a different value in Fortran

I have an input file and the first line contains the following decimal.
0.5053102074297753
I have a Fortran 90 program which reads the file and outputs the value.
read(*,*) answer
write(*,"(F20.16)") answer
This is the output:
0.5053101778030396
Apparently, what is stored is not the same as what is read. The question is, Why?
How is answer declared? If it is a single precision real you can only expect about 6 decimal digits of precision.
Also, values are converted to binary for internal storage and computations. This can cause rounding and other issues, but the difference here is too large for this to be the cause.
To declare answer as double precision, use the following:
integer, parameter :: DRK = selected_real_kind (14)
real (kind=DRK) :: answer
This will guarantee that answer has at least 14 decimal digits of precision. "DRK" can be used throughout your program. Depending on your compiler, you can try asking for even more digits; it may provide such a type. More than double precision is rarely necessary.
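As a rough illustration of the effect of the declaration, the sketch below reads the same text into a default real and into a variable of kind DRK. The internal read from a character constant stands in for the input file, and the names line, answer_default and answer_double are made up for this example.

```fortran
program read_kinds
  implicit none
  integer, parameter :: drk = selected_real_kind(14)
  ! Stand-in for the first line of the input file.
  character(len=*), parameter :: line = '0.5053102074297753'
  real           :: answer_default
  real(kind=drk) :: answer_double

  ! The same text is converted once per kind.
  read(line, *) answer_default
  read(line, *) answer_double

  write(*, '(a, f20.16)') 'default real: ', answer_default
  write(*, '(a, f20.16)') 'kind drk:     ', answer_double
end program read_kinds
```

Assuming default real is 32-bit IEEE single precision, the first line should show digits that diverge after about the seventh place, as in the question, while the second should agree with the input to roughly 16 digits.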
What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The default real precision is not enough to store the number with 16 decimal places in the fractional part.