I am coding in Qt C++.
I am declaring a double variable in my main routine and initialising it with the literal 0.1. I then pass this variable to a function that takes type double. When I hover the mouse over the variable in debug mode, I see 0.100000000000001
What is the cause of this change and how do I stop it please?
Method Definition:
void MyClass::MyMethod(double fNewCellSize)
{
// variable fNewCellSize appears as 0.10000000000001
}
Method Call with literal value of 0.1:
MyObj.MyMethod(0.1);
My environment is Windows 64 bit OS using Qt 5.2.1 and compiling using Microsoft 2010 Visual Studio compiler in 32 bit.
Many decimal numbers are not exactly representable in IEEE floating point (i.e. the binary representation used for double) and 0.1 falls in this camp. When you type 0.1 the C++ compiler is required to convert that into a double to be represented on your hardware. It does so by computing a near approximation of this and so you see a bit of error.
If you try a value that is a power of two, such as 0.5 or 0.25, it will be represented exactly.
See this link for a much more in-depth description of the idea.
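For example, here is a small sketch (my own illustration, not part of the original answer) that prints the values a double actually stores for 0.1 and 0.25:

#include <iomanip>
#include <iostream>
#include <limits>

int main()
{
    double a = 0.1;   // no exact binary representation
    double b = 0.25;  // a power of two, represented exactly

    // max_digits10 prints enough digits to show the value actually stored
    std::cout << std::setprecision(std::numeric_limits<double>::max_digits10)
              << a << '\n'    // 0.10000000000000001
              << b << '\n';   // 0.25
}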
This is normal behaviour in any computer. It's caused by the fact that decimal numbers are stored in a fixed-size binary format. The extra digits are the inherent error introduced when the decimal value is converted to binary and back.
You will not get rid of them. The best you can do is choose a precision (float or double) large enough that the error digits make no difference to your solution, and then chop them off when you display the number.
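A minimal sketch of that "chop them off when you display" step (my own example, assuming four decimal places are enough for the cell size):

#include <iomanip>
#include <iostream>

int main()
{
    double fNewCellSize = 0.1;

    // Print with a precision suited to the problem; the tiny binary
    // representation error never shows up in the output.
    std::cout << std::fixed << std::setprecision(4) << fNewCellSize << '\n';  // 0.1000
}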
Related
I read a double value from the keyboard using std::cin; let the value be 1.15. When I place a breakpoint after reading the value, Visual Studio shows it as 1.14999999, but if I print it, 1.15 appears on my console. Later I wrote the following code and it did not work as expected:
#include <iostream>

int main()
{
    long double valueA;
    int required;
    std::cin >> valueA;
    required = (valueA * 10000) - (((int)valueA) * 10000);
    std::cout << required;
}
When the input is 1.015, the output is 149, but the expected output is 150. Why does my compiler treat 1.015 as 1.014999999? How can I correct that error?
What you are describing is floating-point error. This happens because of the way floating point is represented at the hardware level (see here). Basically, a floating-point number is stored as three fields (a sign bit, a mantissa m, and an exponent e) and is reconstructed as s * m * 2 ^ e, where ^ is "to the power of" and s is 1 if the sign bit is 0 and -1 if the sign bit is 1.
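A rough sketch (my own, not part of the answer) that uses std::frexp to show the m and e parts for a value read from std::cin:

#include <cmath>
#include <iomanip>
#include <iostream>

int main()
{
    double x;
    std::cin >> x;                   // e.g. 1.015

    int e = 0;
    double m = std::frexp(x, &e);    // x == m * 2^e, with 0.5 <= |m| < 1

    std::cout << std::setprecision(17)
              << x << " = " << m << " * 2^" << e << '\n';
    // For 1.015 the printed x is slightly below 1.015, which is why the
    // truncating expression in the question yields 149 instead of 150.
}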
If you need that sort of accuracy, you can use a decimal arithmetic library. It is more or less the same idea, but instead of powers of 2 it uses powers of 10, and because such libraries are implemented in software they can offer arbitrary precision (more on that here).
Here's a list of libraries that implement decimal arithmetic that you can use:
gmp - https://gmplib.org/
qdecimal - https://code.google.com/p/qdecimal/
intel decimal - https://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library/
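If a full decimal library is overkill, one common workaround (my suggestion, not from the answer above) is to round to the nearest integer instead of truncating:

#include <cmath>
#include <iostream>

int main()
{
    long double valueA;
    std::cin >> valueA;                    // e.g. 1.015

    // valueA * 10000 comes out as 10149.999..., so truncation gives 149;
    // rounding to the nearest integer gives the expected 150.
    long required = std::lround(valueA * 10000.0L) - static_cast<long>(valueA) * 10000L;
    std::cout << required << '\n';
}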
When I debug my software in VS C++ by stepping through the code, I notice that some float calculations show up as a number with a trailing dot, e.g.:
1232432.
One operation that led to this result is:
float result = pow(10, a * 0.1f) / b;
where a is a large negative number around -50 to -100 and b is most often around 1. I read some articles about precision problems with floating point. My question is simply whether the trailing dot is Visual Studio's way of telling me that the precision of this number, i.e. the variable result, is very low. If not, what does it mean?
This came up at work today, and I remember that there was a problem for larger numbers, so this did not occur every time (and by "this" I mean that trailing dot). But I do remember that it happened when there were seven digits in the number. Here they write that the precision of floats is seven digits:
C++ Float Division and Precision
Can this be the thing and Visual Studio tells me this by putting a dot in the end?
I THINK I FOUND IT! It says "The mantissa is specified as a sequence of digits followed by a period". What does "mantissa" mean? Can this differ between a PC and a DSP running the same code? The thing is that I get different results, and the only part that looks strange to me is this period, since I don't know what it means.
http://msdn.microsoft.com/en-us/library/tfh6f0w2(v=vs.71).aspx
If you're referring to the "sig figs" convention where "4.0" means 4±0.1 and "4.00" means 4±0.01, then no, there's no such concept in float or double. Numbers are always* stored with 24 or 53 significant bits (7.22 or 15.95 decimal digits) regardless of how many are actually "significant".
The trailing dot is just a decimal point without any digits after it (which is a legal C literal). It either means that
The value is 1232432.0 and they trimmed the unnecessary trailing zero, OR
Everything is being rounded to 7 significant digits (in which case the true value might also be 1232431.5, 1232431.625, 1232431.75, 1232431.875, 1232432.125, 1232432.25, 1232432.375, or 1232432.5.)
The real question is, why are you using float? double is the "normal" floating-point type in C(++), and float is a memory-saving optimization.
* Pedants will be quick to point out denormals, x87 80-bit intermediate values, etc.
The precision is not variable; that is simply how VS formats it for display. The precision (or lack thereof) is always constant for a given floating-point number.
The MSDN page you linked to talks about the syntax of a floating-point literal in source code. It doesn't define how the number will be displayed by whatever tool you're using. If you print a floating-point number using either printf or std::cout << ..., the language standard specifies how it will be printed.
If you print it in the debugger (which seems to be what you're doing), it will be formatted in whatever way the developers of the debugger decided on.
There are a number of different ways that a given floating-point number can be displayed: 1.0, 1., 10.0E-001, and .1e+1 all mean exactly the same thing. A trailing . does not typically tell you anything about precision. My guess is that the developers of the debugger just used 1232432. rather than 1232432.0 to save space.
If you're seeing the trailing . for some values, and a decimal number with no . at all for others, that sounds like an odd glitch (possibly a bug) in the debugger.
If you're wondering what the actual precision is, for IEEE 32-bit float (the format most computers use these days), the next representable numbers before and after 1232432.0 are 1232431.875 and 1232432.125. (You'll get much better precision using double rather than float.)
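A quick sketch (my own, for verification) that prints those neighbouring representable floats using std::nextafter:

#include <cmath>
#include <iomanip>
#include <iostream>

int main()
{
    float x = 1232432.0f;

    // The closest representable floats on either side of x
    std::cout << std::setprecision(10)
              << std::nextafter(x, 0.0f) << '\n'    // 1232431.875
              << x << '\n'                          // 1232432
              << std::nextafter(x, 1e30f) << '\n';  // 1232432.125
}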
I'm working through C++ Primer Plus, using Xcode as my IDE and one of the exercises calls for assigning the value 0.0254 to a symbolic constant (converting inches to meters). The problem is, when I declare the constant I'm getting a value of 0.0253999997. I'm declaring the constant as seen below.
#include <iostream>
const float METERS_PER_INCH = .0254;
This is rounding error due to floats storing numbers in a base-2 number system (think about how we cannot write 1/3 exactly in our base-10 number system). It results in small rounding errors like the one you are seeing whenever you store a number that has no exact base-2 representation.
The solution is to use integers or a bignum library (I suggest the GNU Multiple Precision library). A bignum library uses integers to store arbitrary-precision numbers exactly.
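A minimal sketch of the integer approach (my own illustration): keep the conversion factor exact by working in micrometres instead of metres.

#include <iostream>

int main()
{
    // 0.0254 m per inch expressed exactly as an integer number of micrometres
    const long MICROMETRES_PER_INCH = 25400;

    long inches = 12;
    long micrometres = inches * MICROMETRES_PER_INCH;

    std::cout << micrometres << " um = "
              << micrometres / 1000000.0 << " m\n";   // 304800 um = 0.3048 m
}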
Don't worry, your Mac is OK.
The problem is that computers cannot represent most decimal values exactly as floats or doubles, and that is what you are seeing here.
Why comparing double and float leads to unexpected result?
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
strange output in comparison of float with float literal
Google is your friend!
My question has no practical application. I'm just interested. Suppose I have a double value and I want to obtain its string representation, similar to what the printf function produces. How would I do that without the C runtime library? Let's suppose I'm on the x86 architecture.
Given that you state your question has no practical application, I figure you're trying to learn about floating point number representations.
Thus, if you're looking for a solution without using any library support, start with the format specification. From that you can discern the various "special" values (Infinity, NaN, etc.) as well as decode and calculate the actual numeric value. Once you have the significand and exponent, you know where to put the decimal point. You'll have to write your own itoa-type routine. For radices that are a power of two, this can be as simple as a lookup table. For decimal, you'll have to do a little extra math.
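A sketch of that first step (my own example; iostream is used here only to display the extracted fields): pulling the sign, exponent and significand out of an IEEE 754 double.

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    double x = -6.25;

    // Reinterpret the 64-bit IEEE 754 double as raw bits.
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);

    std::uint64_t sign        = bits >> 63;                 // 1 bit
    std::uint64_t rawExponent = (bits >> 52) & 0x7FF;       // 11 bits, biased by 1023
    std::uint64_t fraction    = bits & 0xFFFFFFFFFFFFFull;  // 52 bits

    // For normal numbers the significand is 1.fraction (binary) and the
    // real exponent is rawExponent - 1023.
    std::cout << "sign=" << sign
              << " exponent=" << static_cast<long long>(rawExponent) - 1023
              << " fraction=0x" << std::hex << fraction << '\n';
    // -6.25 = -1.5625 * 2^2, so this prints: sign=1 exponent=2 fraction=0x9000000000000
}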
You can get the digits to the left of the decimal point by repeatedly taking the value modulo 10 (std::fmod for a double) and then dividing by 10; those digits come out right to left. To get the digits to the right of the decimal point, repeatedly multiply by 10 and take the result modulo 10; those digits come out left to right.
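A rough sketch of that digit-by-digit approach (my own code; note it is only approximate, since the multiplications accumulate the very representation error being discussed):

#include <cmath>
#include <iostream>
#include <string>

int main()
{
    double x = 3.14159;

    // Digits left of the decimal point: take the value modulo 10 (std::fmod),
    // then divide by 10; they come out right to left, so prepend each one.
    std::string left;
    double intPart = std::floor(x);
    do {
        left = char('0' + int(std::fmod(intPart, 10.0))) + left;
        intPart = std::floor(intPart / 10.0);
    } while (intPart >= 1.0);

    // Digits right of the decimal point: multiply by 10 and peel off the
    // leading digit; these come out left to right.
    std::string right;
    double frac = x - std::floor(x);
    for (int i = 0; i < 5; ++i) {
        frac *= 10.0;
        int d = int(frac);
        right += char('0' + d);
        frac -= d;
    }

    // Prints something close to 3.14159; the last digit may differ because
    // 3.14159 itself is not stored exactly.
    std::cout << left << '.' << right << '\n';
}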
If you want to do it simply with a "close enough" result, see my article http://www.exploringbinary.com/quick-and-dirty-floating-point-to-decimal-conversion/ . It describes a simple program that uses floating-point to convert from floating-point to decimal, and explains why that approach can never be accurate for all conversions. (The program doesn't do decimal rounding like printf, but that should be easy enough to add.)
I am porting Fortran code from Fortran PowerStation (version 4.0) to the Fortran 11 (2003) compiler. The old compiler (PowerStation) has 53-bit precision. After porting to the new compiler, I am not getting correct or exact values for my real/float variables. I hope the new compiler has 64-bit precision, so I think I need to change the FPU (floating point unit) from 53-bit to 64-bit precision. Is this correct? If so, how do I go about changing from 53-bit to 64-bit precision using the properties of the new compiler? If not, what should I be doing?
Thanks in advance.
The portable way to request floating-point precision in Fortran 90/95/2003 is with the selected_real_kind intrinsic function. For example,
integer, parameter :: DoubleReal_K = selected_real_kind (14)
will define an integer DoubleReal_K that specifies a floating-point variable with at least 14 decimal digits:
real (DoubleReal_K) :: MyFloat
Requesting 14 decimal digits will typically produce a double-precision float with 53 bits -- but the only guarantee is 14 decimal digits.
If you need more precision, use a larger value than 14 to specify a longer type -- 17 decimal digits might get extended precision (64 bits), or it might get quadruple precision, or nothing, depending on the compiler. If the compiler has a larger type available, it will provide it... otherwise, get a better compiler. Why are you using such an old and unsupported compiler? Also, don't expect exact results from floating-point calculations -- it is normal for changes to cause small changes in the results.
You hope the new compiler is 64-bit precision? I would rather expect you to read the manual and figure that out yourself, but if you cannot, tell us which compiler you are using and someone might help.
How different are the results of the old code and the code compiled with the new compiler? Of course the results won't be exactly the same if the precision has changed -- how could they be, unless you take special steps to ensure sameness?