I'm trying to set float register values in GDB. set $f works in most cases, but not when NAN or INF is needed. Is there a way to set the actual raw bytes in the register?
For example:
(gdb) set $f31 = 4.4
(gdb) info reg $f31
f31 4.4000000000000004 (raw 0x401199999999999a)
I want to be able to directly manipulate the raw bytes to a NAN or INF.
edit:
Thanks to user #ssbassa! I'm still not used to Stack Overflow so I don't know how to mark that as a solution, but they recommended doing this:
set $f31=1./0 for INF and set $f31=0./0 for NAN.
That said, I'd also like to know how to set the specific bit patterns listed further below.
SOLUTION: use set with a (void *) cast:
(gdb) set (void *) $f1 = 0x1
(gdb) info reg $f1
f1 4.9406564584124654e-324 (raw 0x0000000000000001)
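For reference, here is a small host-side C++ sketch (my own, not a gdb feature) that prints the raw IEEE-754 bits of a double in the same form as gdb's raw field; it helps derive patterns like the ones listed below:

// Hypothetical host-side helper: print a double's raw IEEE-754 bits.
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <cmath>
#include <limits>

int main() {
    double values[] = { 4.4,
                        std::numeric_limits<double>::infinity(),
                        std::nan("") };
    for (double v : values) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);   // well-defined bit copy
        std::printf("%-12g raw 0x%016llx\n", v, (unsigned long long)bits);
    }
}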
double precision +/- pynan      0x(7)FFFC_0000_0000_0000
double precision +/- qnan       0x(7)FFFC_0000_0000_0001
double precision +/- snan       0x(7)FFF0_0000_0000_0001
double precision +/- inf        0x(7)FFF0_0000_0000_0000
double precision neg zero       0x8000_0000_0000_0000
single precision +/- pynan      0x(7)FFC0_0000
single precision +/- qnan       0x(7)FFC0_0001
single precision +/- snan       0x(7)FF80_0001
single precision +/- inf        0x(7)FF80_0000
single precision neg zero       0x8000_0000
(The parenthesized digit is presumably 7 for + and F for -, since the sign is the top bit.)
Any idea on these ones?
Related
I am testing some very simple equivalence errors when precision is an issue. I was hoping to perform the operations in extended double precision (so that I would know the answer to ~19 digits) and then perform the same operations in double precision (where there would be roundoff error in the 16th digit), but somehow my double precision arithmetic is maintaining 19 digits of accuracy.
When I perform the operations in extended double, then hardcode the numbers into another Fortran routine, I get the expected errors, but is there something strange going on when I assign an extended double precision variable to a double precision variable here?
program code_gen
  implicit none
  integer, parameter :: Edp = selected_real_kind(17)
  integer, parameter :: dp = selected_real_kind(8)
  real(kind=Edp) :: alpha10, x10, y10, z10
  real(kind=dp) :: alpha8, x8, y8, z8
  real(kind=dp) :: pi_dp = 3.1415926535897932384626433832795028841971693993751058209749445
  integer :: iter
  integer :: niters = 10

  print*, 'tiny(x10) = ', tiny(x10)
  print*, 'tiny(x8) = ', tiny(x8)
  print*, 'epsilon(x10) = ', epsilon(x10)
  print*, 'epsilon(x8) = ', epsilon(x8)

  do iter = 1,niters
    x10 = rand()
    y10 = rand()
    z10 = rand()
    alpha10 = x10*(y10+z10)

    x8 = x10
    x8 = x8 - pi_dp
    x8 = x8 + pi_dp
    y8 = y10
    y8 = y8 - pi_dp
    y8 = y8 + pi_dp
    z8 = z10
    z8 = z8 - pi_dp
    z8 = z8 + pi_dp
    alpha8 = alpha10

    write(*, '(a, es30.20)') 'alpha8 .... ', x8*(y8+z8)
    write(*, '(a, es30.20)') 'alpha10 ... ', alpha10

    if( alpha8 .gt. x8*(y8+z8) ) then
      write(*, '(a)') 'ERROR(.gt.)'
    elseif( alpha8 .lt. x8*(y8+z8) ) then
      write(*, '(a)') 'ERROR(.lt.)'
    endif
  enddo
end program code_gen
where rand() is the gfortran function found here.
If we are speaking about only one precision type (take, for example, double), then we can denote machine epsilon as E16, which is approximately 2.22E-16. If we take a simple addition of two real numbers, x+y, then the machine-representable result is (x+y)*(1+d1) where abs(d1) < E16. Likewise, if we then multiply that number by z, the resulting value is really (z*((x+y)*(1+d1)))*(1+d2), which is nearly z*(x+y)*(1+d1+d2) where abs(d1+d2) < 2*E16. If we now move to extended double precision, the only thing that changes is that E16 is replaced by E20, which has a value of around 1.08E-19.
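To make the bound concrete, here is a small C++ sketch (purely an illustration I put together; it assumes long double is wider than double, as on typical x86 Linux toolchains):

#include <cstdio>
#include <cmath>
#include <limits>

int main() {
    double x = 0.1, y = 0.2, z = 0.3;
    double approx = z * (x + y);   // each operation rounds once
    // Higher-precision reference using the same double inputs.
    long double exact = (long double)z * ((long double)x + (long double)y);
    double relerr = (double)std::fabs((exact - (long double)approx) / exact);
    // The relative error should stay below roughly 2 * machine epsilon.
    std::printf("relative error = %.3e, 2*eps = %.3e\n",
                relerr, 2.0 * std::numeric_limits<double>::epsilon());
}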
My hope was to perform the analysis in extended double precision so that I could compare two numbers which should be equal but show that, on occasion, roundoff error will cause comparisons to fail. By assigning x8=x10, I was hoping to create a double precision 'version' of the extended double precision value x10, where only the first ~16 digits of x8 conform to the values of x10, but upon printing out the values, it shows that all 20 digits are the same and the expected double precision roundoff error is not occurring as I would expect.
It should also be noted that before this attempt, I wrote a program which actually writes another program where the values of x, y, and z are 'hardcoded' to 20 decimal places. In this version of the program, the comparisons of .gt. and .lt. failed as expected, but I am not able to duplicate the same failures by casting an extended double precision value as a double precision variable.
In an attempt to further 'perturb' the double precision values and add roundoff error, I have subtracted, then added, pi to my double precision variables, which should leave them with some double precision roundoff error, but I am still not seeing that in the final result.
As the gfortran documentation you link states, the function result of rand is a default real value (single precision). Such a value can be represented exactly by each of your other real types.
That is, x10=rand() assigns a single precision value to the extended precision variable x10. It does so exactly. This same value now stored in x10 is assigned to the double precision variable x8, but this remains exactly representable as double precision.
There is sufficient precision in the single-as-double that the calculations using double and extended types return the same value. [See the note at the end of this answer.]
If you wish to see real effects of loss of precision, then start by using an extended or double precision value. For example, rather than using rand (returning a single precision value), use the intrinsic random_number
call random_number(x10)
(which has the advantage of being standard Fortran). Unlike a function, which in (nearly) all cases returns a value of a fixed type regardless of how the value is later used, this subroutine gives you a result with the precision of its argument. You will (hopefully) see much the same as you did in your "hard-coded" experiment.
Alternatively, as agentp commented, it may be more intuitive to start with a double precision value
call random_number(x8); x10=x8 ! x8 and x10 have the precision of double precision
call random_number(y8); y10=y8
call random_number(z8); z10=z8
and perform the calculations from that starting point: those extra bits will then start to show.
In summary, when you do x8=x10 you are getting the first few bits of x8 corresponding to those of x10, but many of those bits and those that follow in x10 are all zero.
When it comes to your pi_dp perturbation, you are again assigning a single precision (this time a literal constant) value to a double precision variable. Just having all those digits doesn't make it anything other than a default real literal. You can specify a different kind of literal with a _Edp suffix, as described in other answers.
Finally, one also then has to worry about what the compiler does with regards to optimization.
My thesis is that starting from the single precision value, the calculations performed are representable exactly in both double and extended precision (with the same values). For other calculations, or from a starting point with more bits set, or representations (for example, on some systems or with other compilers the numeric type with kind selected_real_kind(17) may have quite different characteristics such as a different radix) that needn't be the case.
This was largely based on guessing and hoping it explained the observation. Fortunately, there are ways to test the idea. As we are talking about IEEE arithmetic, we can consider the inexact flag: if that flag isn't raised during the computation, we can be happy.
With gfortran there is the compilation option -ffpe-trap=inexact, which makes the inexact exception signalling (trapping). With gfortran 5.0 the intrinsic module ieee_exceptions is supported, which can be used in a portable/standard manner.
You can consider this flag for further experimentation: if it is raised then you can expect to see differences between the two precisions.
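For readers coming from C++, the same inexact flag can be inspected with the standard <cfenv> facilities. A minimal sketch (my own, not part of the Fortran answer; reliably observing the flags may need extra compiler options such as -frounding-math):

#include <cfenv>
#include <cstdio>

int main() {
    volatile float a = 0.5f, b = 0.25f, c = 3.0f;  // volatile defeats constant folding

    std::feclearexcept(FE_INEXACT);
    volatile float sum = a + b;            // 0.75 is exact: flag stays clear
    std::printf("0.5 + 0.25 inexact? %s\n",
                std::fetestexcept(FE_INEXACT) ? "yes" : "no");

    std::feclearexcept(FE_INEXACT);
    volatile float q = 1.0f / c;           // 1/3 is not representable: flag raised
    std::printf("1.0 / 3.0 inexact? %s\n",
                std::fetestexcept(FE_INEXACT) ? "yes" : "no");
    (void)sum; (void)q;
}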
Q1: Will dividing an integer by its divisor lose precision?
int a = M*N, b = N; // M and N are random non-zero integers.
float c = float(a)/b;
if (c == M)
    cout << "accurate" << endl;
Q2: Will passing a float value lose precision?
float a = K; // K is a random float.
if (a == K)
    cout << "accurate" << endl;
Q1: Will dividing an integer by its divisor lose precision?
Yes. I used the following program to come up with some numbers:
#include <iostream>
#include <climits>

int main()
{
    int M = 10;
    int N = 7;
    int inaccurateCount = 0;
    for (; M < INT_MAX && inaccurateCount < 10; ++M)
    {
        int a = M*N;
        float c = float(a)/N;
        if (c != M)
        {
            std::cout << "Not accurate for M: " << M << " and N: " << N << std::endl;
            inaccurateCount++;
        }
    }
    return 0;
}
and here's the output:
Not accurate for M: 2396747 and N: 7
Not accurate for M: 2396749 and N: 7
Not accurate for M: 2396751 and N: 7
Not accurate for M: 2396753 and N: 7
Not accurate for M: 2396755 and N: 7
Not accurate for M: 2396757 and N: 7
Not accurate for M: 2396759 and N: 7
Not accurate for M: 2396761 and N: 7
Not accurate for M: 2396763 and N: 7
Not accurate for M: 2396765 and N: 7
Q2: Will passing a float value lose precision?
No, it shouldn't.
Q1: Will dividing an integer by its divisor lose precision?
You actually asked whether converting an int to a float will lose precision.
Yes, it typically will. On today's 32-bit (or wider) computer architectures an int stores 32 bits of data: 1 sign bit plus 31 value bits. A float also stores 32 bits of data, but these are: 1 sign bit, 8 exponent bits, and a 23-bit fractional part, cf. the IEEE 754 single-precision floating point format. (It might not lose precision on a 16-bit architecture, but I can't check that.)
Depending on the floating point number, it will be stored in different representations; one is the normalized form, where the fractional part is preceded by a hidden one, so that we get a 24-bit significand. This is fewer bits than an int can hold.
For example, the integer 01010101 01010101 01010101 01010101 (binary, spaces only for readability) cannot be expressed as a float without losing precision. In normalized form this would be 1.010101 01010101 01010101 01010101 * 2^30. So we have 30 significant binary digits after the radix point, which cannot be stored in the 23-bit fractional part without losing precision. The active rounding mode defines how the value is shortened.
Note that it does not depend on whether the value is actually "large". The integer 01000000 00000000 00000000 00000000 is, in normalized form, 1.000000 00000000 00000000 00000000 * 2^30. This number has zero significant bits after the radix point and can be stored without losing precision.
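A short C++ sketch of the two binary examples above (my own illustration) shows the round trip:

#include <cstdio>
#include <cstdint>

int main() {
    // 0101...0101: 31 significant bits do not fit a 24-bit significand.
    std::int32_t a = 0x55555555;
    float fa = (float)a;                   // rounds to the nearest float
    std::printf("a = %d, (int)(float)a = %d\n", a, (std::int32_t)fa);

    // 01000000...: a single set bit round-trips exactly.
    std::int32_t b = 0x40000000;           // 2^30
    float fb = (float)b;
    std::printf("b = %d, (int)(float)b = %d\n", b, (std::int32_t)fb);
}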
Q2: Will passing a float value lose precision?
No.
Q1: Will dividing an integer by its divisor lose precision?
If a is too large it might lose precision; otherwise (if a is small enough to be exactly represented as a float) it will not. The loss of precision may actually happen already when you convert a. The division will also lose precision, but sometimes these losses of precision cancel each other out.
For example, take N = 8388609 and M = 5. You have the (binary) significand 100...001, multiply it by 101, and end up with 101000...0000101; the last two bits are then rounded to zero, so you get an error in (float)(N*M). But when you then divide by five, you get 1000...00 with a remainder of 100, which means the quotient rounds up one step and you get back the original number.
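The same numbers in a C++ sketch (my own illustration of the cancellation):

#include <cstdio>

int main() {
    int N = 8388609;              // 2^23 + 1: uses all 24 significand bits
    int M = 5;
    float prod = (float)(N * M);  // exact product 41943045 needs 26 bits,
                                  // so it rounds down to 41943044.0f
    float c = prod / 5.0f;        // exact quotient 8388608.8 rounds up
                                  // to 8388609.0f: the two errors cancel
    std::printf("prod = %.1f (exact product %d)\n", prod, N * M);
    std::printf("c == N ? %s\n", (c == (float)N) ? "yes" : "no");
}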
Q2: Will passing a float value lose precision?
No, it will not lose precision. However, your code could still fail to identify it as accurate.
The case where this could happen is if K is a NaN (for example 0.0/0.0); then a will also become a NaN. However, NaN never compares equal, not even to itself. One could argue that you lost precision here, and I agree, but it's not the assignment a = K that loses precision - you already lost precision when producing K.
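A quick sketch demonstrating that corner case (my own illustration; std::nanf stands in for 0.0f/0.0f):

#include <cstdio>
#include <cmath>

int main() {
    float K = std::nanf("");   // quiet NaN, e.g. what 0.0f/0.0f produces
    float a = K;               // copying/passing loses nothing...
    // ...but NaN never compares equal, not even to itself:
    std::printf("a == K : %s\n", (a == K) ? "true" : "false");   // false
    std::printf("isnan(a): %s\n", std::isnan(a) ? "yes" : "no"); // yes
}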
It will not always be exact, but to get more accurate answers you can use the types double and long double.
Case 1: Yes, it loses precision in some cases. For small values of M it will be accurate.
Case 2: No, it doesn't lose its precision.
For safety reasons, I need to perform the same computation twice: once with only integer (int32) variables and once with only float (float32) variables. At the end of the computation the two results are compared.
I read the article about comparing floating point numbers.
There are a few things I don't understand:
I don't fully understand the following comparison for float numbers. Assuming a and b are floats, is this way of comparing them correct:
If !(a > b) && !(a < b) is true, then a and b are probably identical, otherwise not.
If I cast a float number to integer, I get the integer part of the number. Why, then, when I use a union object that defines the same memory as both int32 and float32, do I get a different result? Doesn't it cast the float number to int32 as well?
Why, by using a union object that defines the same memory as int32 and float32, do I get a different result?
The only reason the float/int union even makes sense is by virtue of the fact that both float and int share a storage size of 32 bits. What you are missing is an understanding that floats (in fact all floating point numbers) are stored in IEEE-754 Floating Point Format (floats are single-precision, doubles are double-precision, etc.).
When you use the float/int union trick, you see the integer value that is the integer equivalent of the IEEE-754 Single-Precision Floating Point Format for the float. The two values have nothing to do with representing the same numeric value. You can just look at the memory they occupy as either a float or as an integer by virtue of them both occupying 32 bits of memory. If you look through the float window, you see what those 32 bits mean as a float. If, on the other hand, you look at the same 32 bits as an integer, you see nothing more than what those same 32 bits would be if taken as an integer. An example looking at the binary representation usually helps.
Take for example, the float value 123.456. If you look at the 32-bits in memory you see:
The float value entered : 123.456001
binary value in memory : 01000010-11110110-11101001-01111001
As unsigned integer : 1123477881
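Output like the above can be produced with a short sketch (names and formatting are my own, using memcpy rather than a union to stay well-defined in C++):

#include <cstdio>
#include <cstring>
#include <cstdint>

static void print_binary32(std::uint32_t u) {
    for (int i = 31; i >= 0; --i) {
        std::printf("%c", ((u >> i) & 1u) ? '1' : '0');
        if (i % 8 == 0 && i != 0) std::printf("-");  // byte separators
    }
    std::printf("\n");
}

int main() {
    float f = 123.456f;
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);   // the same 32 bits, viewed as an integer
    std::printf("The float value entered : %f\n", f);
    std::printf("binary value in memory  : ");
    print_binary32(u);
    std::printf("As unsigned integer     : %u\n", u);
}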
The IEEE-754 Single Precision Floating Point Representation is a specific floating point format in memory composed of the following three components:
0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
|s| exp | mantissa |
Where s is the sign bit and exp is the biased exponent, with the remaining 23 bits called the mantissa. There is no way you can expect to cast the float 123.456 and get anything close to the integer 123; you are off by about 7 orders of magnitude.
Doesn't it cast the float number to int32 as well?
Answer: No
False. For instance, a and b could both be NaN, in which case a < b, a > b, and a == b are all false - so the test would wrongly report two NaNs as identical.
Because integers and floating point numbers are represented differently. When you cast a float to int (e.g., 1.0f to 1), their memory representations are not the same.
False.
Comparison in floating-point types works the same as in integer types for normal values. a == b returns true if the value of a is exactly equal to the value of b, i.e. 12.345 == 12.345 but 12.345 < 12.346. If you need to check that two values are merely close enough to each other, use an epsilon-based comparison instead of ==, as sketched below. However, for special numbers like NaN or Inf things are different: comparisons with NaN return false for all operators except !=.
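For instance, a minimal epsilon comparison could look like this (a sketch; the tolerance and the scaling policy are assumptions to tune per application):

#include <cmath>
#include <algorithm>

// Returns true when a and b differ by less than eps, scaled by the
// larger magnitude so the test behaves sensibly for big numbers too.
bool nearlyEqual(double a, double b, double eps = 1e-9) {
    double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
    return std::fabs(a - b) <= eps * scale;
}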
Because one works with values and the other works with the object representation.
A simple cast to int like (int)some_double_value will return the value of some_double_value, truncated towards zero. However, if you store it into a union and read out the integer member like that, you get the in-memory representation of the float.
For example, (int)2.5 == 2, but the binary representation of 2.5 in IEEE-754 double precision is 0x4004000000000000.
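Both views side by side (a sketch using memcpy, since reading an inactive union member is technically undefined behaviour in C++):

#include <cstdio>
#include <cstring>
#include <cstdint>

int main() {
    double d = 2.5;
    int value_view = (int)d;                      // value conversion: 2
    std::uint64_t bit_view;
    std::memcpy(&bit_view, &d, sizeof bit_view);  // object representation
    std::printf("(int)2.5    = %d\n", value_view);
    std::printf("bits of 2.5 = 0x%016llx\n",
                (unsigned long long)bit_view);    // 0x4004000000000000
}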
Someone wanting less precision would write
999 format ('The answer is x = ', F8.3)
Others wanting higher output precision may write
999 format ('The answer is x = ', F18.12)
Thus it totally depends on what the user desires. What is the format statement that exactly matches the precision used in the calculation? (Note, this may vary from system to system.)
It is a difficult question because you request "the precision of the calculation", which depends on so many factors. For example: if I solve f(x)=0 via Newton's method to a tolerance of 1E-6, would you want a format with seven digits?
On the other hand, if you mean the "highest precision attainable by the type" (e.g., double or single precision), then you can simply find the corresponding epsilon (machine epsilon, or precision) and use that to choose the format. If epsilon is 1E-15, then you can use a format that does not have more than 16 digits.
In Fortran you can use the EPSILON(X) function to get this number (the answer will depend on the type of X), then you can take the floor of the absolute value of the logarithm (base 10) of epsilon and make that the number of decimals in your float representation.
For example, if epsilon is 1E-12, the log is -12, the abs is 12, and the floor is 12, so you want a format like F15.12 (12 decimals + 1 point + the zero + the sign = 15 places).
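The same recipe, sketched in C++ purely for illustration (the Fortran version would use EPSILON(X) rather than numeric_limits):

#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    double eps = std::numeric_limits<double>::epsilon();        // ~2.22E-16
    int decimals = (int)std::floor(std::fabs(std::log10(eps))); // 15
    // decimals + decimal point + leading zero + sign
    std::printf("use %d decimals, total width %d\n", decimals, decimals + 3);
}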
The problem with floating point numbers is that there is no precision as such: only significant digits.
For instance, if you are calculating longitudes in real*4 (single precision), near the UK you'd be accurate to 6 decimal places, but in Colorado Springs it would only be accurate to 4 decimal places. It would not make any sense to print the number in F format; it is just rubbish after the 4th decimal place.
If you wish to print to maximum precision, print in E format. Since it is always n.nn..nEnn, you get all the significant digits.
Edit - user4050's query
Try the following example
program main
    real intpart, multiplier
    integer ii
    multiplier = 1
    do ii = 1, 6
        intpart = 9.87654321
        intpart = intpart * multiplier
        print '(F15.7, E15.7, G15.8)', intpart, intpart, intpart
        multiplier = multiplier * 10
    end do
    stop
end program
What you will get is something like
9.8765430 0.9876543E+01 9.8765430
98.7654266 0.9876543E+02 98.765427
987.6542969 0.9876543E+03 987.65430
9876.5429688 0.9876543E+04 9876.5430
98765.4296875 0.9876543E+05 98765.430
987654.3125000 0.9876543E+06 987654.31
Notice that the precision changes as the number gets bigger because a float only has 7 significant figures.
I am writing a piece of code in which I have to convert double values to float. I am using boost::numeric_cast to do the conversion, which will alert me to any overflow/underflow. However, I am also interested in knowing whether the conversion resulted in some precision loss.
For example
double source = 1988.1012;
float dest = numeric_cast<float>(source);
Produces dest which has value 1988.1
Is there any way in which I can detect this kind of precision loss/rounding?
You could cast the float back to a double and compare this double to the original - that should give you a fair indication as to whether there was a loss of precision.
float dest = numeric_cast<float>(source);
double residual = source - numeric_cast<double>(dest);
Hence, residual contains the "loss" you're looking for.
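Put together as a complete sketch (assuming Boost's numeric conversion header is available):

#include <boost/numeric/conversion/cast.hpp>
#include <iostream>

int main() {
    double source = 1988.1012;
    float dest = boost::numeric_cast<float>(source);
    // Widen back to double and subtract: any difference is the rounding.
    double residual = source - boost::numeric_cast<double>(dest);
    if (residual != 0.0)
        std::cout << "precision lost: " << residual << '\n';
}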
Look at these articles for single precision and double precision floats. First of all, floats have 8 bits for the exponent vs. 11 for a double. So anything bigger than about 2^128 (roughly 3.4E38) in magnitude will overflow, and anything smaller than about 2^-126 (roughly 1.2E-38) will underflow, as you mentioned. For the float, you have 23 bits for the actual digits of the number, vs. 52 bits for the double. So obviously, you have a lot more digits of precision for the double than for the float.
Say you have a number like 1.1123. This number may not actually be encoded as 1.1123, because the digit bits in a floating point number add up as fractions. For example, if the bits in the mantissa were 11001, then the value would be formed by 1 (implicit) + 1 * 1/2 + 1 * 1/4 + 0 * 1/8 + 0 * 1/16 + 1 * 1/32 + 0 * (1/64 + 1/128 + ...). The exact value cannot be encoded unless these fractions add up to exactly the desired number, which is rare. Therefore, there will almost always be a precision loss.
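A two-line check makes this visible (a sketch; the exact digits printed depend on the implementation):

#include <cstdio>

int main() {
    float f = 1.1123f;            // no finite binary expansion exists
    std::printf("%.10f\n", f);    // prints the stored approximation
    std::printf("exact? %s\n", (f == 1.1123) ? "yes" : "no");  // "no":
    // the float is compared against the (different) double approximation
}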
You're going to have a certain level of precision loss, as per Dave's answer. If, however, you want to focus on quantifying it and raising an exception when it exceeds a certain number, you will have to open up the floating point number itself and parse out the mantissa & exponent, then do some analysis to determine if you've exceeded your tolerance.
But the good news is that it's usually the standard IEEE floating-point format. :-)