I am using gfortran v.9.3.0 and openmpi v.4.1.0.
When I compile my application in debug mode with this specific flag that traps floating-point exceptions,
-ffpe-trap=invalid,zero,overflow,underflow
I get a floating-point exception when invoking the routine:
use mpi
integer :: n, ierr
double precision :: phi(n)
call MPI_AllReduce(MPI_IN_PLACE, phi, n, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
I checked the minimum and maximum values of phi and both are within a reasonable range.
The only thing I can see is that different ranks might hold values of opposite sign, so the sum would be zero or close to zero. But would that really trigger a floating-point exception?
The error:
[server-0:3718 :0:3718] Caught signal 8 (Floating point exception: floating-point underflow)
[server-0:3720 :0:3720] Caught signal 8 (Floating point exception: floating-point underflow)
[server-0:3721 :0:3721] Caught signal 8 (Floating point exception: floating-point underflow)
==== backtrace (tid: 3718) ====
0 0x0000000000055365 ucs_debug_print_backtrace() /software/contrib/../src/ucs/debug/debug.c:656
1 0x00000000000c7fa5 ompi_op_base_2buff_sum_fortran_double_precision() op_base_functions.c:0
2 0x00000000000a9ada ompi_coll_base_allreduce_intra_recursivedoubling() ???:0
3 0x00000000000056c5 ompi_coll_tuned_allreduce_intra_dec_fixed() ???:0
4 0x0000000000064fb9 MPI_Allreduce() ???:0
5 0x0000000000046739 ompi_allreduce_f() ???:0
6 0x000000000061f881 __parallel_MOD_global_sum_dble_vector() mpifuncs.f90:490
7 0x0000000000f168ee sol_() tt1.f90:244
8 0x000000000102a033 MAIN__() tt.f90:66
9 0x000000000102a449 main() tt.f90:24
10 0x0000000000022555 __libc_start_main() ???:0
11 0x0000000000409c63 _start() ???:0
Related
My outputs are all NaN, and the standard error reports "IEEE_INVALID_FLAG". I debugged the code in gdb and found that the line where IEEE_INVALID_FLAG first occurs is line 281:
Program received signal SIGFPE, Arithmetic exception.
0x000055555555c830 in calcu () at SIMPLE-2D.f:281
281 & +(1.-URFU)*U(I,J)
Line 281 is the final continuation line of the update expression for U(I,J); the complete statement is:
U(I,J)=URFU/APU(I,J)*
& (AEEU(I,J)*U(I+2,J)+AEU(I,J)*U(I+1,J)
& +AWWU(I,J)*U(I-2,J)+AWU(I,J)*U(I-1,J)
& +ANNU(I,J)*U(I,J+2)+ANU(I,J)*U(I,J+1)
& +ASSU(I,J)*U(I,J-2)+ASU(I,J)*U(I,J-1)
& +(P(I,J)-P(I+1,J))*DY)
& +(1.-URFU)*U(I,J)
I = 1:79, J = 1:80. AEEU, AEU, ... are 79×80 matrices.
Could anyone give me some idea about this error? Many thanks!
Most of the time, NaNs result from invalid operations involving infinities, e.g., Infinity - Infinity = NaN or 0 × Infinity = NaN. As suggested by the compiler output, you have both overflow and underflow. Overflow happens when the variable type cannot hold the number because the exponent is positive and too large (a very large magnitude), and underflow happens when the exponent is negative and too large in magnitude (a number too close to zero). Try changing your code to use double-precision reals. In FORTRAN 77, this can be achieved using the DOUBLE PRECISION type:
DOUBLE PRECISION URFU
DOUBLE PRECISION U(:,:)
In modern Fortran, you can use something like this:
INTEGER, PARAMETER :: dp = KIND(1.D0)
REAL(KIND=dp) :: URFU, U(:,:)
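As a minimal illustration of the behaviour described above (shown in C++ here, but Fortran's REAL and DOUBLE PRECISION map to the same IEEE single and double formats; the 1e30 values are just illustrative): overflow turns a finite value into Infinity, an invalid operation on infinities then produces NaN, and doubling the precision avoids the overflow in this case.

#include <cstdio>

int main()
{
    float  f = 1.0e30f;
    double d = 1.0e30;

    float  f2 = f * f;   // 1e60 overflows single precision -> +Inf
    double d2 = d * d;   // 1e60 fits comfortably in double precision

    std::printf("float : %g\n", f2);          // inf
    std::printf("double: %g\n", d2);          // 1e+60
    std::printf("inf - inf = %g\n", f2 - f2); // nan: invalid operation on infinities
}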
This might be a really simple question for some, but I'm new to C++ and hope someone can answer this for me.
I'm using this online C++ compiler. Here's the simple code I'm running in it:
#include <iostream>

int main()
{
    int x = 1 - 2;
    std::cout << x << std::endl;
    return x;
}
The output is:
-1
...Program finished with exit code 255
Press ENTER to exit console.
That really puzzles me. Why would the main() function return 255 when the value of x is -1?
Doesn't main() return an int (not an unsigned int), so it should be able to return a negative number, right?
How does -1 get converted to 255? Something to do with an 8-bit variable? But isn't the int type 16-bit?
This is not really related to the C language. The operating system, or possibly just the C runtime (the small piece of code which sets things up for your program and actually calls your main function), limits the exit code of a program to an unsigned 8-bit number.
Very nearly all systems today use two's complement representation for negative numbers, and there the bit pattern for -1 has every bit set to 1. It doesn't matter how many bits there are; they are all set when the value is -1.
The simplest way to convert an int to an 8-bit number is to just take the 8 lowest bits (which are all 1 here, as per the above), so you end up with the binary number:
11111111
If interpreted as unsigned, the decimal value of this happens to be 255 (as a signed 8-bit number it is still -1), which you can check with any calculator that supports binary (such as the Windows 10 Calculator app when you switch it to Programmer mode).
Looking at this from the opposite direction: when trying to understand funny numbers related to computers or programming, it is often useful to convert them to binary. If you convert 255 to binary, you get 11111111, and if you know binary numbers, you should realize this is -1 when interpreted as a signed 8-bit number.
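A minimal sketch of the same idea in code (not the original program, just an illustration of keeping the low 8 bits):

#include <iostream>

int main()
{
    int x = 1 - 2;                                        // -1: all bits set in two's complement
    unsigned char low8 = static_cast<unsigned char>(x);   // keep only the lowest 8 bits: 11111111
    std::cout << static_cast<int>(low8) << std::endl;     // prints 255
    return x;                                             // the shell reports exit code 255
}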
According to The C++ Programming Language, 4th edition, section 6.2.5:
There are three floating-point types: float (single-precision), double (double-precision), and long double (extended-precision)
Refer to: http://en.wikipedia.org/wiki/Single-precision_floating-point_format
The true significand includes 23 fraction bits to the right of the binary point and an implicit leading bit (to the left of the binary point) with value 1 unless the exponent is stored with all zeros. Thus only 23 fraction bits of the significand appear in the memory format but the total precision is 24 bits (equivalent to log10(2^24) ≈ 7.225 decimal digits).
→ So the maximum number of decimal digits of a floating-point number in the binary32 interchange format (a computer number format that occupies 4 bytes (32 bits) in memory) is 7.
When I test on different compilers (like GCC and the VC compiler),
→ it always outputs 6 as the value.
Taking a look into float.h of each compiler,
→ I found that the value 6 is fixed.
Question:
Do you know why there is a difference here (between the theoretical value, 7, and the actual value, 6)?
It seems like 7 is more reasonable, because when I test using the code below, a 7-digit value still prints back correctly, while an 8-digit one does not.
Why don't the compilers derive the number of digits from the interchange format instead of using a fixed value?
Code:
#include <iostream>
#include <limits>
using namespace std;

int main()
{
    cout << numeric_limits<float>::digits10 << endl;
    float f = -9999999;
    cout.precision(10);
    cout << f << endl;
}
You're not reading the documentation.
std::numeric_limits<float>::digits10 is 6:
The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow. For base-radix types, it is the value of digits (digits-1 for floating-point types) multiplied by log10(radix) and rounded down.
The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.
std::numeric_limits<float>::max_digits10 is 9:
The value of std::numeric_limits<T>::max_digits10 is the number of base-10 digits that are necessary to uniquely represent all distinct values of the type T, such as necessary for serialization/deserialization to text. This constant is meaningful for all floating-point types.
Unlike most mathematical operations, the conversion of a floating-point value to text and back is exact as long as at least max_digits10 were used (9 for float, 17 for double): it is guaranteed to produce the same floating-point value, even though the intermediate text representation is not exact. It may take over a hundred decimal digits to represent the precise value of a float in decimal notation.
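A small sketch of that round-trip guarantee (the value 0.1f is just an arbitrary example):

#include <cassert>
#include <limits>
#include <sstream>

int main()
{
    float f = 0.1f;   // not exactly representable, but that does not matter here
    std::ostringstream out;
    out.precision(std::numeric_limits<float>::max_digits10);   // 9 for float
    out << f;

    float g = 0.0f;
    std::istringstream in(out.str());
    in >> g;
    assert(g == f);   // 9 significant digits are enough for an exact round trip
}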
std::numeric_limits<float>::digits10 equates to FLT_DIG, which is defined by the C standard:
number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits,
    p * log10(b)                  if b is a power of 10
    floor((p - 1) * log10(b))     otherwise
FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10
The reason for the value 6 (and not 7) is due to rounding errors - not all floating-point values with 7 decimal digits can be losslessly represented by a 32-bit float. Rounding errors are limited to 1 bit though, so the FLT_DIG value was calculated based on 23 bits (instead of the full 24):
23 * log10(2) = 6.92
which is rounded down to 6.
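To make that concrete, here is a small check using the counterexample quoted above (8.589973e9): 7 significant digits do not always survive the round trip, while 6 do.

#include <cstdio>

int main()
{
    // A 7-digit decimal that does not survive conversion to float and back:
    float f = 8.589973e9f;      // stored as the nearest representable float
    std::printf("%.7g\n", f);   // prints 8.589974e+09, not 8.589973e+09

    // A 6-digit decimal survives, which is what digits10/FLT_DIG promise:
    float g = 8.58997e9f;
    std::printf("%.6g\n", g);   // prints 8.58997e+09
}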
Let's say I have:
float dt;
I read dt from a text file as
inputFile >> dt;
Then I have a for loop:
for (float time=dt; time<=maxTime; time+=dt)
{
// some stuff
}
When dt=0.05 and I output std::cout << time << std::endl;, I get:
0.05
0.10
...
7.00001
7.05001
...
So why does the number of digits increase after a while?
Because not every number can be represented by IEEE754 floating point values. At some point, you'll get a number that isn't quite representable and the computer will have to choose the nearest one.
If you enter 0.05 into Harald Schmidt's excellent online converter and reference the Wikipedia entry on IEEE754-1985, you'll end up with the following bits (my explanation of that follows):
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111010 10011001100110011001101
|||||||| |||||||||||||||||||||||
128 -+||||||| ||||||||||||||||||||||+- 1 / 8388608
64 --+|||||| |||||||||||||||||||||+-- 1 / 4194304
32 ---+||||| ||||||||||||||||||||+--- 1 / 2097152
16 ----+|||| |||||||||||||||||||+---- 1 / 1048576
8 -----+||| ||||||||||||||||||+----- 1 / 524288
4 ------+|| |||||||||||||||||+------ 1 / 262144
2 -------+| ||||||||||||||||+------- 1 / 131072
1 --------+ |||||||||||||||+-------- 1 / 65536
||||||||||||||+--------- 1 / 32768
|||||||||||||+---------- 1 / 16384
||||||||||||+----------- 1 / 8192
|||||||||||+------------ 1 / 4096
||||||||||+------------- 1 / 2048
|||||||||+-------------- 1 / 1024
||||||||+--------------- 1 / 512
|||||||+---------------- 1 / 256
||||||+----------------- 1 / 128
|||||+------------------ 1 / 64
||||+------------------- 1 / 32
|||+-------------------- 1 / 16
||+--------------------- 1 / 8
|+---------------------- 1 / 4
+----------------------- 1 / 2
The sign, being 0, is positive. The exponent is indicated by the one-bits mapping to the numbers on the left: 64+32+16+8+2 = 122; subtracting the 127 bias gives -5, so the multiplier is 2^-5, or 1/32. The 127 bias is there to allow representation of very small numbers (as in close to zero, rather than negative numbers with a large magnitude).
The mantissa is a little more complicated. For each one-bit, you accumulate the numbers down the right hand side (after adding an implicit 1). Hence you can calculate the number as the sum of {1, 1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.
When you add all these up, you get 1.60000002384185791015625.
When you multiply that by the multiplier 1/32 (calculated previously from the exponent bits), you get 0.0500000001, so you can see that 0.05 is already not represented exactly. This bit pattern for the mantissa is actually the same as 0.1 but, with that, the exponent is -4 rather than -5, and it's why 0.1 + 0.1 + 0.1 is rarely equal to 0.3 (this appears to be a favourite interview question).
When you start adding them up, that small error will accumulate since not only will you see an error in the 0.05 itself, errors may also be introduced at each stage of the accumulation - not all of the numbers 0.1, 0.15, 0.2 and so on can be represented exactly either.
Eventually, the errors will get large enough that they'll start showing up in the number if you use the default precision. You can put this off for a bit by choosing your own precision with something like:
#include <iostream>
#include <iomanip>
:
std::cout << std::setprecision(2) << time << '\n';
It won't fix the variable value, but it will give you some more breathing space before the errors become visible.
As an aside, some people recommend avoiding std::endl since it forces a flush of the buffers. If your implementation is behaving itself, this will happen for terminal devices when you send a newline anyway. And if you've redirected standard output to a non-terminal, you probably don't want flushing on every line. Not really relevant to your question and it probably won't make a real difference in the vast majority of cases, just a point I thought I'd bring up.
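If you want to see the effect directly, here is a minimal sketch (the loop length of 140 just mirrors the dt=0.05, time≈7 range from the question; the exact accumulated value depends on the rounding at each step):

#include <cstdio>

int main()
{
    float dt = 0.05f;
    std::printf("%.20f\n", dt);   // 0.05000000074505805969... - not exactly 0.05

    float time = 0.0f;
    for (int i = 0; i < 140; ++i)  // nominally 140 * 0.05 = 7.0
        time += dt;

    std::printf("accumulated : %.6f\n", time);      // drifts slightly away from 7.000000
    std::printf("one rounding: %.6f\n", 140 * dt);  // 7.000000 - only a single rounding step
}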
IEEE floats use the binary number system and therefore can't store most decimal fractions exactly. When you add several of them together (sometimes just two is enough), the representational errors can accumulate and become visible.
Some numbers can't be precisely represented using floating point or base-2 at all. If I remember correctly, decimal 0.05 is one of them (in base 2 it results in an infinitely repeating fraction). Another issue is that if you print a floating-point value to a file (as a base-10 number) and then read it back, you may well get a different number, because the bases differ and converting a fractional base-2 number to a fractional base-10 number (and back) can introduce errors.
If you want better precision, you could try searching for a bignum library. It will be much slower than floating point, though. Another way to deal with precision problems would be to store numbers as common fractions with a numerator and denominator (i.e. 1/10 instead of 0.1, 1/3 instead of 0.333..., etc. - there is probably a library even for that, but I haven't heard of one), but that won't work with irrational numbers like pi or e.
There is an example in http://www.gotw.ca/gotw/067.htm
int main()
{
double x = 1e8;
//float x = 1e8;
while( x > 0 )
{
--x;
}
}
When you change the double to float, it becomes an infinite loop in VS2008.
According to the Gotw explanation:
What if float can't exactly represent all integer values from 0 to
1e8? Then the modified program will start counting down, but will
eventually reach a value N which can't be represented and for which
N-1 == N (due to insufficient floating-point precision)... and then
the loop will stay stuck on that value until the machine on which the
program is running runs out of power.
From what I understand, the IEEE 754 float is single precision (32 bits), the range of float should be roughly ±3.4e±38, and it should have about 7 significant decimal digits.
But I still don't understand how exactly this happens: "eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)." Can someone try to explain this part?
A bit of extra info: when I use double x = 1e8, it finishes in about 1 second; when I change it to float x = 1e8, it runs much longer (still running after 5 minutes); and if I change it to float x = 1e7;, it also finishes in about 1 second.
My testing environment is VS2008.
BTW I'm NOT asking for a basic explanation of the IEEE 754 format, as I already understand that.
Thanks
Well, for the sake of argument, let's assume we have a processor which represents a floating-point number with 7 significant decimal digits, and an exponent with, say, 2 decimal digits. So now the number 1e8 would be stored as
1.000 000 e 08
(where the "." and "e" need not be actually stored.)
So now you want to compute "1e8 - 1". 1 is represented as
1.000 000 e 00
Now, in order to do the subtraction, we first compute the result with infinite precision, then normalize so that the first digit before the "." is between 1 and 9, and finally round to the nearest representable value (with ties broken to even, say). The infinite precision result of "1e8 - 1" is
0.99 999 999 e 08
or normalized
9.9 999 999 e 07
As can be seen, the infinite precision result needs one more digit in the significand than what our architecture actually provides; hence we need to round (and re-normalize) the infinitely precise result to 7 significant digits, resulting in
1.000 000 e 08
Hence you end up with "1e8 - 1 == 1e8" and your loop never terminates.
Now, in reality you're using IEEE 754 binary floats, which are a bit different, but the principle is roughly the same.
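You can check the real binary case directly; a tiny sketch (using the exact values from the question, and assuming nothing beyond IEEE 754 floats):

#include <cassert>

int main()
{
    float x = 1e8f;            // 100,000,000 is exactly representable as a float
    assert(x - 1.0f == x);     // 99,999,999 is not: the subtraction rounds back up to 1e8f

    double y = 1e8;
    assert(y - 1.0 != y);      // double still has plenty of precision left at 1e8
}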
The operation x-- is (in this case) equivalent to x = x - 1. That means the original value of x is taken, 1 is subtracted (using infinite precision, as mandated by IEEE 754-1985), and then the result is rounded to the next value of the float value space.
The rounded result for the numbers 1.0e8f + i is given for i in [-10;10] below:
-10: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-9: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-8: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-7: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-6: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-5: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-4: 1.0E8 (binary +|10011001|01111101011110000100000)
-3: 1.0E8 (binary +|10011001|01111101011110000100000)
-2: 1.0E8 (binary +|10011001|01111101011110000100000)
-1: 1.0E8 (binary +|10011001|01111101011110000100000)
0: 1.0E8 (binary +|10011001|01111101011110000100000)
1: 1.0E8 (binary +|10011001|01111101011110000100000)
2: 1.0E8 (binary +|10011001|01111101011110000100000)
3: 1.0E8 (binary +|10011001|01111101011110000100000)
4: 1.0E8 (binary +|10011001|01111101011110000100000)
5: 1.00000008E8 (binary +|10011001|01111101011110000100001)
6: 1.00000008E8 (binary +|10011001|01111101011110000100001)
7: 1.00000008E8 (binary +|10011001|01111101011110000100001)
8: 1.00000008E8 (binary +|10011001|01111101011110000100001)
9: 1.00000008E8 (binary +|10011001|01111101011110000100001)
10: 1.00000008E8 (binary +|10011001|01111101011110000100001)
So you can see that 1.0e8f and 1.0e8f + 4 and some other numbers have the same representation. Since you already know the details of the IEEE 754-1985 floating point formats, you also know that the remaining digits must have been rounded away.
What is the result of n - 1 if n - 1 and n both have the same representation, due to the approximate nature of floating-point numbers?
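For reference, the table above can be reproduced with a few lines of code (assuming the usual 32-bit IEEE 754 float; the bit pattern is printed in hex rather than binary):

#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    for (int i = -10; i <= 10; ++i)
    {
        float f = static_cast<float>(1.0e8 + i);  // exact in double, then rounded to float
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);      // view the rounded value's bit pattern
        std::printf("%3d: %.9g (0x%08" PRIX32 ")\n", i, f, bits);
    }
}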
Regarding "reach" a value that can't be represented, I think Herb was including the possibility of quite esoteric floating point representations.
With any ordinary floating point representations, you will either start with such value (i.e. stuck on first value), or you will be somewhere in the contiguous range of integers centered around zero that can be represented exactly, so that the countdown succeeds.
For IEEE 754 the 32-bit representation, typically float in C++, has 23 bits mantissa, while the 64-bit representation, typically double in C++, has 52 bits mantissa. This means that with double you can at least represent exactly the integers in the range -(2^52-1) ... 2^52-1. I'm not quite sure if the range can be extended with another factor of 2. I get a bit dizzy thinking about it. :-)
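A quick way to convince yourself of those limits (the thresholds 2^24 for float and 2^53 for double follow from the 23-bit and 52-bit stored mantissas plus the implicit leading bit):

#include <cassert>

int main()
{
    float f = 16777216.0f;     // 2^24: all integers up to here are exact as float
    assert(f + 1.0f == f);     // 2^24 + 1 cannot be represented as a float

    double d = 9007199254740992.0;  // 2^53
    assert(d - 1.0 != d);           // integers below 2^53 are still exact in double
    assert(d + 1.0 == d);           // 2^53 + 1 cannot be represented as a double
}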
Cheers & hth.,