Precision problems of real numbers in Fortran [duplicate] - fortran

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 6 years ago.
I've been trying to use Fortran for my research project, with the GNU Fortran compiler (gfortran), latest version,
but I've been encountering some problems in the way it processes real numbers. If you have for example the code:
program test
implicit none
real :: y = 23.234, z
z = y * 100000
write(*,*) y, z
end program
You'll get as output:
23.23999 2323400.0
I find this really strange.
Can someone tell me what exactly is happening here? Looking at z I can see that y does retain its precision, so for calculations that shouldn't be a problem, I suppose. But why is the output of y not exactly the same as the value that I've specified, and what can I do to make it exactly the same?

This is not a problem - all you see is floating-point representation of the number in the computer. The computer cannot handle real numbers exactly, but only approximations of them. A good read about this can be found here: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

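Simply by replacing real with double precision, you can increase the number of significant decimal digits from about six to about 15 on most platforms. The literal needs the same treatment: written as 23.234 it is still a single precision constant, so write 23.234d0 to get the extra digits.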

The general issue is not limited to Fortran; it concerns the representation of base-10 real numbers in another base with finite precision. This computer science question is asked many times here.
For the specifically Fortran aspects, the declaration "real" will likely give you a single precision floating-point variable, as will expressing a constant as "23.234" without a kind qualifier. The constant "100000" without a decimal point is an integer, so the expression "y * 100000" causes an implicit conversion of that integer to a real because "y" is a real variable.
For some previous discussions of these issues, see Extended double precision, Fortran: integer*4 vs integer(4) vs integer(kind=4) and Is There a Better Double-Precision Assignment in Fortran 90?

The problem here is not with Fortran, in fact it is not a problem at all. This is just a feature of floating-point arithmetic. If you think about how you would represent 23.234 as a 'single float' in binary, you would see that the number has to be saved to only so many decimals of precision.
The thing to remember about floating-point numbers is that numbers that look round and even in base 10 probably won't in binary.
For a brief overview of floating-point topics, check the Wikipedia article. And for a VERY thorough explanation, check out the canonical paper by Goldberg (PDF).

Related

How to set Integer and Fractional Precision independently?

I'm learning Fortran (with the Fortran 2008 standard) and would like to set the integer part precision and the fractional part precision of a real variable independently. How do I do this?
For example, let us say that I would like to declare a real variable with an integer part precision of 3 and a fractional part precision of 8.
An example number that fits this specification would be 123.12345678, but 1234.1234567 would not satisfy the requirement.
Fortran real numbers are FLOATING point numbers. Floating point numbers do not store the integer part and the decimal part. They store a significand and an exponent.
See how floating point numbers work http://en.wikipedia.org/wiki/Floating-point_arithmetic There is usually one floating point format which your CPU uses and you cannot simply choose a different one.
What you are asking for is more like the FIXED point arithmetic, but modern CPUs and Fortran do not support it natively. https://en.wikipedia.org/wiki/Fixed-point_arithmetic
Fixed-point types are available in various libraries (probably for Fortran too) and in other languages, but they are not the native REAL type. They are probably implemented in software rather than directly in the CPU, and so are slower.
I ended up writing a function for this in order to use floating points with the .gt./.lt./.ge./.le./.eq. operators without actually modifying the floating points.
function PreciseInt(arg1, arg2) result(scaled)
implicit none
real*8, intent(in) :: arg1 ! Input variable to be converted
integer*4, intent(in) :: arg2 ! Desired number of digits to the right of the decimal point
integer*4 :: scaled ! Integer representing the real value with the desired precision
! Scale by 10**arg2 in double precision, then round to the nearest integer
scaled = idnint(arg1 * 10.0d0**arg2)
end function

How to handle number with large exponent in c/c++? [duplicate]

This question already has answers here:
Is there a C++ equivalent to Java's BigDecimal?
(9 answers)
Closed 7 years ago.
I found myself needing to compute the exponential of a large number, e.g. exp(709). In double precision, that number is represented as 8.2184074615549724e+307.
It seems that numbers with larger exponents are simply turned into Inf, which creates problems in my code. I can only guess that this could be fixed by using more bits to represent the exponent, but I am not aware of a practical way to proceed.
Here is a code snippet:
double expon = exp(500); /*here I also tried `long double`, with no effect */
printf("%e\n", expon ); /*gives INF*/
double Wa = LambertW<0>( expon); /*gives error, as it can't handle inf*/
Is there a way to compute this?
This problem has been debated in general, but I did not find a useful answer. Also, it seems that GCC has supported multiple-precision floating-point arithmetic since version 4.3. How does that help?
Edit: The suggested possible-duplicate questions turned out to be irrelevant because I need huge decimals, not exact decimals. This is not a duplicate.
You should be able to perform your computation with adequate precision using long double arithmetic:
The maximum value of an 80-bit long double (as found on x86) is about 1.19×10^4932, much larger than e^709.
For the computation to be performed in long double, you must use expl instead of exp:
long double expon = expl(500);
printf("%Le\n", expon);
The LambertW function will handle the long double if it is properly overloaded for this type, otherwise expon will be converted to double and produce inf and the computation will fail as you mentioned.
I don't know which implementation of Lambert W function you use, Darko Veberic's does not support long double arguments, but it might be feasible to extend the implementation to type long double as it is available in source form: https://github.com/DarkoVeberic/LambertW . You might want to contact him directly.
Another approach is to consider that exp(709) is just too close to the maximum value of the double type, about 1.8×10^308. If you can alter your computation to use smaller exponents and a different formula, the computation might be done with regular double types.

gfortran REAL not accurate to 8 decimal places [duplicate]

This question already exists:
gfortran represents REAL incorrectly [duplicate]
Closed 8 years ago.
This question has not been previously answered. I am trying to represent a real or any number for that matter in Fortran correctly. What gfortran is doing for me is way off. For example when I declare the variable REAL pi=3.14159 fortran prints pi = 3.14159012 rather than say 3.14159000. See below:
PROGRAM Test
IMPLICIT NONE
REAL:: pi = 3.14159
PRINT *, "PI = ",pi
END PROGRAM Test
This prints:
PI = 3.14159012
I might have expected something like PI = 3.14159000 as a REAL is supposed to be accurate to at least 8 decimal places.
I'm in a good mood, so I'll try to answer this question, which is basic knowledge which can be easily googled (as already pointed out in the comments to this and your former question).
Luckily, Fortran provides some really interesting intrinsics to get some understanding of floating point numbers.
The 8 digits you are talking about are a rule of thumb and can be related to the function EPSILON(x), which returns the smallest deviation from 1 that can be represented within the chosen model (e.g. REAL(4)). This value is actually 1.19e-7, which means that your 8th digit is most likely wrong. I write most likely because some numbers can be represented exactly.
In the case of PI, the smallest representable deviation can be printed using the intrinsic SPACING(PI). This shows a value of 2.38e-7, which is twice the epsilon and still allows for 7 correct digits.
Now, why does your value of PI get stored as 3.14159012? When you store a floating point number, you always store the nearest representable number.
Using the value of spacing, we can get the possible values for your pi. Possible numbers and their differences to your value of 3.14159 are:
3.14158988 1.20E-007
3.14159012 -1.18E-007
3.14159036 -3.56E-007
As you can see, 3.14159012 is the nearest possible value to 3.14159 and is thus stored and printed.
It is common for the last two digits to be erroneous. This is called floating-point error.
Check this: Week 1 - Lecture 2: Binary storage and version control / Fixed and floating point real numbers (9-08).mp4
https://class.coursera.org/scicomp-002/lecture

Dividing two floats doesn't give exact result [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 9 years ago.
I had divided 9501/100.0f expecting to get result of 95.01f, but for some deviant reason the result was 95.01000000002f.
I am aware of rounding errors, and also that dividing two bigger floats can give an inexact result, but these two numbers are relatively small and should not give a bad answer.
I have changed floats to doubles, only to see the same result.
So my question is, why am I seeing this false output?
And is there a workaround that avoids copying the number to a string and back?
Floating point numbers are not precise, and dealing with them has lots of idiosyncrasies.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
I also enjoy Bruce Dawson's blog entries on floating point values.
Floating point numbers are numbers represented in binary with limited precision.
The error between the expected result and the actual result is caused by the fact that the number 95.01 has an infinitely repeating binary representation.
A double has only 52 stored fraction bits (53 significant bits), so the number has to be rounded before it is stored in double precision. Single precision has only 23 stored fraction bits (24 significant bits).
It is not possible to represent 95.01 in a finite-precision binary floating-point number without any error.
However, you can trust roughly the first 6 significant decimal digits of a single precision value and the first 15 of a double, so you should print the number with a format that shows only meaningful digits.
Ahh good, another one of us has become a man in the church of programming :)
Floating points are not exact in general, and the available precision depends on the type you use. 0.1f != 0.10000000000000000000000000000000000 and so on; the stored value is more like 0.100000001490116.

Value read from file is stored as a different value in Fortran

I have an input file and the first line contains the following decimal.
0.5053102074297753
I have a Fortran 90 program which reads the file and outputs the value.
read(*,*) answer
write(*,"(F20.16)") answer
This is the output:
0.5053101778030396
Apparently, what is stored is not the same as what is read. The question is, Why?
How is answer declared? If it is a single precision real you can only expect about 6 decimal digits of precision.
Also, values are converted to binary for internal storage and computations. This can cause rounding and other issues, but for a double precision variable the resulting difference would be far smaller than the one seen here, so binary conversion alone does not explain it.
To declare answer as double precision, use the following:
integer, parameter :: DRK = selected_real_kind (14)
real (kind=DRK) :: answer
This will guarantee that answer has at least 14 decimal digits. "DRK" can be used throughout your program. Depending on your compiler, you can try asking for even more digits; it may provide such a type. Rarely is more than double precision necessary.
What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The default real precision is not enough to store the number with 16 decimal places in the fractional part.