Calculating pow with doubles gives wrong results - c++

I'm programming a calculator on an Arduino and I'm trying to calculate pow and writing it to a string (result). This is my code:
dtostrf(exp(n*log(x)), 0, 5, result); // x ^ n
2 ^ 2 = 4.00000 // works fine
10 ^ 5 = 99999.9770 // should be 100000
What's wrong with my code and how can I always get the right result?
I mean how can I round it but still be able to use doubles ( e.g. 5.2 ^ 3.123 )

You're just hitting rounding errors. There's nothing you can do about this, except revert to an integer-based approach whenever the inputs are integers.
You could condition on whether the inputs are integers, and if so then use integer arithmetic; if not, then use doubles. But using exp and log will always introduce rounding errors, so you can't expect exact answers with that approach.
More precisely, to use integer arithmetic, you need the base to be an integer and the exponent to be a non-negative integer.
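The conditioning described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the names `int_pow` and `calc_pow` are hypothetical, and the size guards (a bound of 9.0e15, which is below 2^53, and an exponent of at most 64) are assumptions chosen so that the integer result stays exact when converted back to a double.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from the original post): x^n for an integer base
// and a non-negative integer exponent, using exact integer multiplication.
// The caller must ensure that the result fits in int64_t.
std::int64_t int_pow(std::int64_t base, unsigned exponent) {
    std::int64_t result = 1;
    for (unsigned i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

// Hypothetical dispatcher: take the integer path when both inputs are
// integers, the exponent is non-negative, and the magnitudes are small enough
// to be exact in both int64_t and double; otherwise fall back to std::pow.
double calc_pow(double x, double n) {
    const bool integral = std::floor(x) == x && std::floor(n) == n;
    const double estimate = std::pow(x, n);
    const bool small_enough = std::fabs(x) < 9.0e15 &&
                              std::fabs(estimate) < 9.0e15 && n <= 64.0;
    if (integral && n >= 0.0 && small_enough) {
        return static_cast<double>(
            int_pow(static_cast<std::int64_t>(x), static_cast<unsigned>(n)));
    }
    return estimate;  // non-integer inputs keep the usual rounding error
}
```

With this sketch, `calc_pow(10, 5)` takes the integer path and yields exactly 100000, while `calc_pow(5.2, 3.123)` still goes through the floating-point routine.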

Since you are programming a calculator, speed is not your concern, but the number of reliable digits is. So you could try a double-precision library. It uses 64-bit doubles but manages only about 200 FLOPS at a 16 MHz CPU clock, and much less for higher-order calculations like exp(), log(), or sin(). A result will therefore take about a second to appear after you type in the digits and press the enter button, but that was also the case with the old 8-bit pocket calculators.
See this Link (only in German)

C++ set precision of a double (not for output)

Alright so I am trying to truncate actual values from a double with a given number of digits precision (total digits before and after, or without, decimal), not just output them, not just round them. The only built in functions I found for this truncates all decimals, or rounds to given decimal precision.
Other solutions I have found online, can only do it when you know the number of digits before the decimal, or the entire number.
This solution should be dynamic enough to handle any number. I whipped up some code that does the trick below, however I can't shake the feeling there is a better way to do it. Does anyone know of something more elegant? Maybe a built in function that I don't know about?
I should mention the reason for this. There are 3 different sources of observed values. All 3 of these sources agree to some level in precision. Such as below, they all agree within 10 digits.
4659.96751751236
4659.96751721355
4659.96751764253
However, I need to pull from only 1 of the sources. So the best approach is to use only up to the precision all 3 sources agree on. It's not as if I am manipulating numbers and then need to truncate precision; they are observed values. The desired result is
4659.967517
#include <algorithm>
#include <cstdlib>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

double truncate(double num, int digits)
{
    // check valid digits
    if (digits < 0)
        return num;
    // create string stream for full precision (string conversion rounds at 10)
    ostringstream numO;
    // read in number to stream, at 17+ precision things get wonky
    numO << setprecision(16) << num;
    // convert to string, for character manipulation
    string numS = numO.str();
    // check if we have a decimal (find returns string::npos if there is none)
    size_t decimalIndex = numS.find('.');
    // if we have a decimal, erase it for now, logging its position
    if (decimalIndex != string::npos)
        numS.erase(decimalIndex, 1);
    // make sure our target precision is not higher than current precision
    digits = min((int)numS.size(), digits);
    // replace unwanted precision with zeroes
    numS.replace(digits, numS.size() - digits, numS.size() - digits, '0');
    // if we had a decimal, add it back
    if (decimalIndex != string::npos)
        numS.insert(numS.begin() + decimalIndex, '.');
    return atof(numS.c_str());
}
This will never work since a double is not a decimal type. Truncating what you think are a certain number of decimal digits will merely introduce a new set of joke digits at the end. It could even be pernicious: e.g. 0.125 is an exact double, but neither 0.12 nor 0.13 are.
If you want to work in decimals, then use a decimal type, or a large integral type with a convention that part of it holds a decimal portion.
I disagree with "So the best approach, is to only use up to the precision all 3 sources agree on."
If these are different measurements of a physical quantity, or represent rounding error due to different ways of calculating from measurements, you will get a better estimate of the true value by taking their mean than by forcing the digits they disagree about to any arbitrary value, including zero.
The ultimate justification for taking the mean is the Central Limit Theorem, which suggests treating your measurements as a sample from a normal distribution. If so, the sample mean is the best available estimate of the population mean. Your truncation process will tend to underestimate the actual value.
It is generally better to keep every scrap of information you have through the calculations, and then remember you have limited precision when outputting results.
As well as giving a better estimate, taking the mean of three numbers is an extremely simple calculation.
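As a rough sketch of that advice, the three observed values from the question can be averaged at full precision and limited to the agreed number of digits only when formatting the output. The helper names `mean3` and `format_significant` are hypothetical, and the choice of 10 significant digits is taken from the question.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: arithmetic mean of three measurements,
// computed at full double precision.
double mean3(double a, double b, double c) {
    return (a + b + c) / 3.0;
}

// Hypothetical helper: limit precision only when producing output,
// using printf-style %g with the requested number of significant digits.
std::string format_significant(double value, int digits) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", digits, value);
    return buffer;
}
```

For the three values in the question, `format_significant(mean3(4659.96751751236, 4659.96751721355, 4659.96751764253), 10)` gives "4659.967517", while the stored mean keeps the remaining digits for later calculations.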

d0 when taking roots of numbers

So in general, I understand the difference between specifying 3. and 3.0d0 with the difference being the number of digits stored by the computer. When doing arithmetic operations, I generally make sure everything is in double precision. However, I am confused about the following operations:
64^(1./3.) vs. 64^(1.0d0/3.0d0)
It took me a couple of weeks to find an error where I was assigning the output of 64^(1.0d0/3.0d0) to an integer. Because 64^(1.0d0/3.0d0) returns 3.999999, the integer got the value 3 and not 4. However, 64^(1./3.) = 4.00000. Can someone explain to me why it is wise to use 1./3. vs. 1.0d0/3.0d0 here?
The issue isn't so much single versus double precision. All floating point calculations are subject to imprecision compared to true real numbers. In assigning a real to an integer, Fortran truncates. You probably want to use the Fortran intrinsic nint.
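The same pitfall exists in C++, where converting a floating-point value to an integer also truncates. The following is a minimal sketch that uses C++ as a stand-in for the Fortran in the question, with std::lround playing the role of nint; the function names are hypothetical.

```cpp
#include <cmath>

// Truncating conversion, comparable to assigning a real to an integer:
// a result just below a whole number loses a full unit.
int truncated_cube_root(double value) {
    return static_cast<int>(std::pow(value, 1.0 / 3.0));
}

// Nearest-integer conversion, comparable to Fortran's nint:
// a result that is off by a few units in the last place still rounds correctly.
long rounded_cube_root(double value) {
    return std::lround(std::pow(value, 1.0 / 3.0));
}
```

Whether `truncated_cube_root(64.0)` returns 3 or 4 depends on how the math library rounds the power, which is exactly why the rounding version is the safer choice.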
This is a peculiar, fortuitous case where the lower-precision calculation gives the exact result. You can see this without the integer conversion issue:
write(*,*)4.d0-64**(1./3.),4.d0-64**(1.d0/3.d0)
0.000000000 4.440892E-016
In general this does not happen. Here the double-precision value is "better":
write(*,*)13.d0-2197**(1./3.),13.d0-2197**(1.d0/3.d0)
-9.5367E-7 1.77E-015
Here, since the single-precision calculation comes out slightly high, it gives you the correct value on integer conversion, while the double-precision result gets truncated down and is therefore wrong, even though its floating-point error was smaller.
So in general, no, you should not consider single precision to be preferred.
In fact, 64 and 125 seem to be the only special cases where the single-precision calculation gives a perfect cube root while the double-precision calculation does not.

Why the digits after decimal are all zero?

I want to perform some calculations and I want the result correct up to some decimal places, say 12.
So I wrote a sample:
#define PI 3.1415926535897932384626433832795028841971693993751
double d, k, h;
k = 999999/(2*PI);
h = 999999;
d = PI*k*k*h;
printf("%.12f\n", d);
But it gives the output:
79577232813771760.000000000000
I even used setprecision(), but got the same answer, only in exponential form.
cout<<setprecision(12)<<d<<endl;
prints
7.95772328138e+16
Used long double also, but in vain.
Now is there any way other than storing the integer part and the fractional part separately in long long int types?
If so, what can be done to get the answer precisely?
A double has only about 16 decimal digits of precision. Everything after the decimal point would be nonsense. (In fact, the last digit or two left of the point may not agree with an infinite-precision calculation.)
The precision of long double is not fixed by the standard, AFAIK; it is left to the implementation. It may be that on your system it is the same as double, or no more precise. That would slightly surprise me, but it doesn't violate anything.
You need to read up on double-precision concepts again, more carefully.
A double gets its increased precision from using 64 bits.
The digits before the decimal point are more significant than those after it.
So when you have a large integer part, the low-order digits are dropped; this is what the various answers here describe as rounding off.
Update:
To increase precision, you'll need to use some library or change your language.
Check this other question: Best coding language for dealing with large numbers (50000+ digits)
Yet, I'll ask you to re-check your intent once more.
Do you really need 12 decimal places for numbers that have really high values
(over 10 digits in the integer part like in your example)?
Maybe you won't really have large integer parts
(in which case such code should work fine).
But if you are tracking a value like 10000000000.123456789,
I am really interested in exactly which application you are working on (astronomy?).
If the integer part of your values is some way under 10000, you should be fine here.
Update2:
If you must demonstrate the ability of a specific formula to work accurately within constrained error limits, the way to go is to arrange the evaluation of your formula so that the least error is introduced.
For example,
if you want to compute, say, (x * y) / z,
it would be prudent to try something like max(x,y)/z * min(x,y)
rather than the original form, which may overflow after (x * y), or lose precision if the product does not fit in the 16 decimal digits of a double.
If you had just 2-digit precision:
                2-digit         regular-precision
42 * 7          290             294
(42 * 7)/2      290/2           294/2
Result ==>      145             147
But ==>         42/2 = 21
                21 * 7 = 147
This is probably the intent of your contest.
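The reordering suggested above can be sketched as follows. The function names are hypothetical, and the sketch assumes positive operands and IEEE 754 doubles evaluated at double precision.

```cpp
#include <algorithm>

// Hypothetical helper: the original form. The intermediate product x * y
// can overflow even when the final quotient is representable.
double naive_mul_div(double x, double y, double z) {
    return (x * y) / z;
}

// Hypothetical helper: the reordered form from the answer. Dividing the
// larger operand first keeps the intermediate value small.
double reordered_mul_div(double x, double y, double z) {
    return std::max(x, y) / z * std::min(x, y);
}
```

With x = y = z = 1e200, the naive form overflows to infinity because 1e400 is not representable, while the reordered form returns 1e200.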
The double-precision binary format used by most computers can only hold about 16 digits, after that you'll get rounding. See http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Floating-point values have a limited number of digits. Just because your "PI" value has far more digits than a double will support doesn't alter the way the hardware works.
A typical (IEEE 754) double holds approximately 15-16 significant decimal digits, whether that's 0.12345678901235, 1234567.8901235, 12345678901235 or 12345678901235000000000, or some other variation.
In other words, yes, if you perform your calculation EXACTLY, you'll get lots of decimal places, because pi never ends. On a computer, you get about 15-16 digits, no matter what input values you use; all that changes is where in that sequence the decimal point sits. To get more, you need "big number support", such as the GNU Multiple Precision (GMP) library.
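To make the digit limit concrete, the gap between adjacent doubles near the question's result can be measured directly. This is a small sketch assuming IEEE 754 doubles; `spacing_above` is a hypothetical helper name.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: distance from value to the next larger
// representable double.
double spacing_above(double value) {
    return std::nextafter(value, std::numeric_limits<double>::infinity()) - value;
}
```

For the value printed in the question, `spacing_above(79577232813771760.0)` is 16, so neighbouring doubles at that magnitude are 16 apart and no fractional digits can be stored at all. `std::numeric_limits<double>::digits10` reports 15, the number of decimal digits that are always preserved.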
You're looking for std::fixed. That tells the ostream not to use exponential form.
cout << setprecision(12) << std::fixed << d << endl;

Preventing Rounding Errors

I was just reading about rounding errors in C++. So, if I'm making a math-intensive program (or doing any important calculations), should I just drop floats altogether and use only doubles, or is there an easier way to prevent rounding errors?
Obligatory lecture: What Every Programmer Should Know About Floating-Point Arithmetic.
Also, try reading IEEE Floating Point standard.
You'll always get rounding errors, unless you use an arbitrary-precision library like gmplib. You have to decide if your application really needs this kind of effort.
Or, you could use integer arithmetic, converting to floats only when needed. This is still hard to do, you have to decide if it's worth it.
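A minimal sketch of the integer approach, assuming amounts are stored as a count of hundredths. The `Hundredths` type and its helpers are hypothetical, not part of any library.

```cpp
#include <cstdint>

// Hypothetical fixed-point amount: an integer count of hundredths,
// so 0.10 is stored as 10 and 0.20 as 20.
struct Hundredths {
    std::int64_t units;
};

// Addition is exact because it is plain integer arithmetic.
Hundredths add(Hundredths a, Hundredths b) {
    return Hundredths{a.units + b.units};
}

// Convert to floating point only at the boundary, e.g. for display.
double to_double(Hundredths h) {
    return static_cast<double>(h.units) / 100.0;
}
```

Here `add(Hundredths{10}, Hundredths{20})` holds exactly 30 hundredths, whereas in double arithmetic `0.1 + 0.2` does not compare equal to `0.3`.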
Lastly, you can use float or double, taking care not to make assumptions about values at the limit of the representation's precision. I wish this Valgrind plugin was implemented (grep for float)...
The rounding errors are normally very insignificant, even using floats. Mathematically-intense programs like games, which do very large numbers of floating-point computations, often still use single-precision.
This might work if your highest number is less than 10 billion and you're using C++ double precision.
if ( ceil(10000*(x + 0.00001)) > ceil(100000*(x - 0.00001))) {
x = ceil(10000*(x + 0.00004)) / 10000;
}
This should allow at least the last digit to be off +/- 9. I'm assuming dividing by 10000 will always just move a decimal place. If not, then maybe it could be done in binary.
You would have to apply it after every operation that is not +, -, *, or a comparison. For example, you can't do two divisions in the same formula because you'd have to apply it to each division.
If that doesn't work, you could work in integers by scaling the numbers up and always using integer division. If you need advanced functions, maybe there is a package that does deterministic integer math. Integer division is required in a lot of financial settings because round-off error can be exploited, as in the movie "Office Space".

C++: How to Convert From Float to String Without Rounding, Truncation or Padding? [duplicate]

This question already has answers here:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
I am facing a problem and unable to resolve it. Need help from gurus. Here is sample code:-
float f=0.01f;
printf("%f",f);
If we check the value of the variable during debugging, f contains '0.0099999998', yet the output of printf is 0.010000.
a. Is there any way to force the compiler to assign exactly the same value to a variable of float type?
b. I want to convert a float to a string/character array. How can I make sure that exactly the same value is converted to the string/character array? I want to make sure that no zeros are padded, no unwanted values are padded, and no digits change as in the above example.
It is impossible to exactly represent most base-10 decimal fractions using base-2 values; the exceptions are values such as 0.25, whose fractional part is a sum of negative powers of two. To get what you need, you have to switch from the float/double built-in types to some kind of decimal number package.
You could use boost::lexical_cast in this way:
float blah = 0.01;
string w = boost::lexical_cast<string>( blah );
The variable w will contain the text value 0.00999999978. But I can't see when you really need it.
It is preferable to use boost::format to format a float as a string. The following code shows how to do it:
float blah = 0.01;
string w = str( boost::format("%d") % blah ); // w contains exactly "0.01" now
Have a look at this C++ reference. Specifically the section on precision:
float blah = 0.01;
printf ("%.2f\n", blah);
There are uncountably many real numbers.
There are only a finite number of values which the data types float, double, and long double can take.
That is, there will be uncountably many real numbers that cannot be represented exactly using those data types.
The reason that your debugger is giving you a different value is well explained in Mark Ransom's post.
Regarding printing a float without rounding or truncation and with fuller precision: you are missing the precision specifier. The default precision for printf is 6 fractional digits.
Try the following to get a precision of 10 digits:
float amount = 0.0099999998;
printf("%.10f", amount);
As a side note, a more C++ way (vs. C-style) to do things is with cout:
float amount = 0.0099999998;
cout.precision(10);
cout << amount << endl;
For (b), you could do
std::ostringstream os;
os << f;
std::string s = os.str();
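Note that a stream uses 6 significant digits by default, so the snippet above yields "0.01". If the goal is a decimal string that converts back to exactly the same float, one option is to request `std::numeric_limits<float>::max_digits10` digits. A sketch, with a hypothetical helper name:

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a float with enough significant decimal
// digits (max_digits10, which is 9 for an IEEE 754 float) that parsing
// the text again recovers the identical value.
std::string to_round_trip_string(float f) {
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<float>::max_digits10) << f;
    return os.str();
}
```

For `0.01f` this produces "0.00999999978", and `std::stof` applied to that string returns the original float.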
In truth, using the floating-point processor, co-processor, or floating-point section of the chip itself (most are now integrated into the CPU) will never give exact mathematical results, only a reasonable approximation. For more accurate results, you could consider defining a class "DecimalString", which uses nybbles as decimal characters and symbols, and attempt to mimic base-10 mathematics using strings. In that case, depending on how long you want to make the strings, you could even do away with the exponent part altogether: a string of 256 characters can represent 1x10^-254 up to 1x10^+255 in straight decimal using actual ASCII (shorter if you want a sign), but this may prove significantly slower. You could speed this up by reversing the digit order, so that from left to right they read
units,tens,hundreds,thousands....
Simple example
e.g. "0021" becomes 1200
This would need "shifting" left and right to make the decimal points line up before routines as well, the best bet is to start with the ADD and SUB functions, as you will then build on them in the MUL and DIV functions. If you are on a large machine, you could make them theoretically as long as your heart desired!
Equally, you could use the sprintf function from stdio.h, or the ecvt and fcvt functions from stdlib.h (or at least, they should be there!).
int sprintf(char* dst,const char* fmt,...);
char *ecvt(double value, int ndig, int *dec, int *sign);
char *fcvt(double value, int ndig, int *dec, int *sign);
sprintf returns the number of characters it wrote to the string, for example
float f=12.00;
char buffer[32];
sprintf(buffer,"%4.2f",f); // returns 5; on error it returns a negative value
ecvt and fcvt return pointers to static buffers containing the null-terminated decimal representation of the number, with no decimal point and the most significant digit first. The offset of the decimal point is stored in dec and the sign in sign (1 = negative, 0 = positive); ndig is the number of significant digits to store. If dec<0, you have to pad with -dec zeros between the decimal point and the digits. If you are unsure, and you are not working on a Windows 7 system (which will not run old DOS 3 programs sometimes), look for Turbo C version 2 for DOS 3. There are still one or two downloads available. It's a relatively small program from Borland, a small DOS C/C++ editor/compiler that even comes with TASM, the 16-bit 386/486 assembler. These functions are covered in its help files, as are many other useful nuggets of information.
The routines ecvt and fcvt are in "stdlib.h" and sprintf is in "stdio.h", or should be, though I have found that in Visual Studio 2010 they are anything but standard, often overloaded with functions dealing with WORD-sized characters and asking you to use its own specific functions instead... "So much for standard library," I mutter to myself almost each and every time. "Maybe they ought to get a better dictionary!"
You would need to consult your platform standards to determine how to best determine the correct format, you would need to display it as a*b^C, where 'a' is the integral component that holds the sign, 'b' is implementation defined (Likely fixed by a standard), and 'C' is the exponent used for that number.
Alternatively, you could just display it in hex, it'd mean nothing to a human, though, and it would still be binary for all practical purposes. (And just as portable!)
To answer your second question:
It IS possible to exactly and unambiguously represent floats as strings. However, this requires a hexadecimal representation. For instance, 1/16 = 0.1 and 10/16 is 0.A.
With hex floats, you can define a canonical representation. I'd personally use a fixed number of digits representing the underlying number of bits, but you could also decide to strip trailing zeroes. There's no confusion possible on which trailing digits are zero.
Since the representation is exact, the conversions are reversible: f==hexstring2float(float2hexstring(f))
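One possible way to realize this uses the C99/C++11 "%a" conversion and std::strtof, both of which understand hexadecimal floating-point notation. The function names follow the expression above; their implementations here are an assumption, not the answerer's code.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Sketch: print the float in hexadecimal floating-point notation.
// The float is promoted to double when passed to snprintf, which is exact.
std::string float2hexstring(float f) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%a", f);
    return buffer;
}

// Sketch: parse the hexadecimal notation back into a float.
float hexstring2float(const std::string& s) {
    return std::strtof(s.c_str(), nullptr);
}
```

For any finite value such as `0.01f`, `hexstring2float(float2hexstring(f))` compares equal to `f`. The exact text produced by "%a" can differ between C libraries, so it is better to compare round-tripped values than the strings themselves.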