The following code throws an std::out_of_range exception in Visual Studio 2013 where in my opinion it shouldn't:
#include <string>
#include <limits>
int main(int argc, char ** argv)
{
double maxDbl = std::stod(std::to_string(std::numeric_limits<double>::max()));
return 0;
}
I tested the code also with gcc 4.9.2 and there it does not throw an exception. The issue seems to be caused by an inaccurate string representation after the conversion to string. In Visual Studio std::to_string(std::numeric_limits<double>::max()) yields
179769313486231610000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000
which indeed seems too large. In gcc, however, it yields
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
which seems to be smaller than the passed value.
However, isn't std::numeric_limits<double>::max() supposed to return the
maximum finite representable floating-point number?
So why do the string representations get off? What am I missing here?
Direct answer
Gcc (and Clang and VS2105) correctly return the integer value of (21024 - 1) - (21024-53 - 1) that is what is represented with 52 one bits of significand and an unbiased exponent of 1023 (21024 - 1 would be the integer value with 1023 one bits, and I just substract all the bits below the 52 of the IEE754 format)
I can confirm that a large integer library give 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368L
The previous exact floating point would be 2971 lesser (971 = 1023 - 52) that is : 179769313486231550856124328384506240234343437157459335924404872448581845754556114388470639943126220321960804027157371570809852884964511743044087662767600909594331927728237078876188760579532563768698654064825262115771015791463983014857704008123419459386245141723703148097529108423358883457665451722744025579520L
The next non representable value would be 2971 greater that is:
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216L
But the value used by MSVC2013 and previous is near to 21024 + 2971, that is : 179769313486231610731333614426100589925524828262616317947942685512308090830973387504827396012048193870699768806228404251083258210739369062217227314575410731769485876273179688476358949112102859294830297395714877595371718127781702814782017661749531126051903195165027873311156314696040132728420308633064323416064L
. As it is greater than any value representable in IEEE754 double precision, it cannot be decoded to a double.
Because at most, one could say that any value between 21024 - 2971 (std::numeric_limits<double>::max()) and 21024 could be rounded to std::numeric_limits<double>::max(), but values greater than 21024 are clearly an overflow.
Discussion on accuracy
Only 16 decimal digits are accurate in a double and all other digits can be seen as garbage or random values since they do not depend on the value itself but only one the way you choose to calculate them. Just try to substract 1e+288 (that's already a big value) to maxDbl and look what happens :
maxLess = max Dbl - 1.e+288;
if (maxLess == maxDbl) {
std::cout << "Unchanged" << std::endl;
}
else std::cout << "Changed" << std::endl;
You should see ... Unchanged.
It just looks like VS 2013 is a little incoherent in the way it rounds floating point values : it rounded maxDbl by excess to one bit higher than the maximum actually representable value, and could not decode it later.
The problem is that the standard choosed to use a %f format which gives a false sentiment of accuracy. If you want to see an equivalent problem in gcc, just use :
#include <iostream>
#include <string>
#include <limits>
#include <iomanip>
#include <sstream>
int main() {
double max = std::numeric_limits<double>::max();
std::ostringstream ostr;
ostr << std::setprecision(16) << max;
std::string smax = ostr.str();
std::cout << smax << std::endl;
double m2 = std::stod(smax);
std::cout << m2 << std::endl;
return 0;
}
Rounded to 16 digits mxDbl writes (correctly) : 1.797693134862316e+308, but can no longer be decoded back
And this one :
#include <iostream>
#include <string>
#include <limits>
int main() {
double maxDbl = std::numeric_limits<double>::max();
std::string smax = std::to_string(maxDbl);
std::cout << smax << std::endl;
std::string smax2 = "179769313486231570800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000";
double max2 = std::stod(smax2);
if (max2 == maxDbl) {
std::cout << smax2 << " is same double as " << smax << std::endl;
}
return 0;
}
Displays :
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
179769313486231570800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000 is same double as 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
TL/DR : What I mean is that one big enoudh double value can of course be represented by an exact integer (per IEEE754). But it does represent all integers between half to the previous one and half to the next one. So any integer in that range could be an acceptable representation for the double, and one value rounded at 16 decimal digits should be acceptable, but current standard libraries only allow max floating point value to be truncated at 16 decimal digits. But VS2013 gave a number above the max of the range what was in any case an error.
Reference
IEEE floating point on wikipedia
Related
how would I make it so when i enter 2.785 for the input question the output will display the variable question as 2.79?
I tried using setprecision but for some reason it is not working unless i am doing it wrong
here is the user input question and what it should be:
Enter positive daily growth % (.1 must be entered as 10):
user enters "2.785"
output -> 0.02785
My desired output should look like:
desired output-> 2.79%
Any help is appreciated. I know it may seem simple to others but I have already tried looking online and everything I find just isn't making sense or doesn't work and I dont know what I am doing wrong.
Floating point arithmetic
The reason why it is challenging is that floating point cannot be represented accurately when you perform operations on them. See wikipedia article
It is a very intesting topic, if you have a bit of time, take a look at explanations about floating point and how its representation inside the computer.
If you are looking for the display only (only works for small decimals)
If you are just looking to display a small value you can use below code:
#include <cmath>
#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>
using namespace std;
string truncateAsString(double n, int precision) {
stringstream ss;
double remainder = static_cast<double>((int)floor((n - floor(n)) * precision) % precision);
ss << setprecision(numeric_limits<double> ::max_digits10 + __builtin_ctz(precision))<< floor(n);
if (remainder)
ss << "." << remainder;
cout << ss.str() << "%" << endl;
return ss.str();
}
int main(void) {
double a = 0.02785;
int precision = 100; // as many digits as you add zeroes. 3 zeroes means precision of 3.
string s = truncateAsString(a*100 + 0.5 / 100, precision);
return 0;
}
Looking for the true value?
Maybe you are looking for true value for your floating point, you can use boost multiprecision library
The Boost.Multiprecision library can be used for computations requiring precision exceeding that of standard built-in types such as float, double and long double. For extended-precision calculations, Boost.Multiprecision supplies a template data type called cpp_dec_float. The number of decimal digits of precision is fixed at compile-time via template parameter.
You need to use a custom library like boost/multiprecision because of the lack of precision for floating points, see my code below:
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>
#include <limits>
#include <cmath>
#include <iomanip>
using namespace std;
using boost::multiprecision::cpp_dec_float_50;
cpp_dec_float_50 truncate(cpp_dec_float_50 n, int precision) {
cpp_dec_float_50 remainder = static_cast<cpp_dec_float_50>((int)floor((n - floor(n)) * precision) % precision) / static_cast<cpp_dec_float_50>(precision);
return floor(n) + remainder;
}
int main(void) {
int precision = 100; // as many digits as you add zeroes. 5 zeroes means precision of 5.
cpp_dec_float_50 n = 0.02785 * 100;
n = truncate(n + 0.5/precision, precision); // first part is remainder, floor(n) is int value truncated.
cout << setprecision(numeric_limits<cpp_dec_float_50> ::max_digits10 + __builtin_ctz(precision)) << n << "%" << endl; // __builtin_ctz(precision) will equal the number of trailing 0, exactly the precision we need!
return 0;
}
Output (both cases)
2.79%
NB: I add 0.5 / precision to the truncate function to force it to act like a rounding.
For a number a = 1.263839, we can do -
float a = 1.263839
cout << fixed << setprecision(2) << a <<endl;
output :- 1.26
But what if i want set precision of a number and store it, for example-
convert 1.263839 to 1.26 without printing it.
But what if i want set precision of a number and store it
You can store the desired precision in a variable:
int precision = 2;
You can then later use this stored precision when converting the float to a string:
std::cout << std::setprecision(precision) << a;
I think OP wants to convert from 1.263839 to 1.26 without printing the number.
If this is your goal, then you first must realise, that 1.26 is not representable by most commonly used floating point representation. The closest representable 32 bit binary IEEE-754 value is 1.2599999904632568359375.
So, assuming such representation, the best that you can hope for is some value that is very close to 1.26. In best case the one I showed, but since we need to calculate the value, keep in mind that some tiny error may be involved beyond the inability to precisely represent the value (at least in theory; there is no error with your example input using the algorithm below, but the possibility of accuracy loss should always be considered with floating point math).
The calculation is as follows:
Let P bet the number of digits after decimal point that you want to round to (2 in this case).
Let D be 10P (100 in this case).
Multiply input by D
std::round to nearest integer.
Divide by D.
P.S. Sometimes you might not want to round to the nearest, but instead want std::floor or std::ceil to the precision. This is slightly trickier. Simply std::floor(val * D) / D is wrong. For example 9.70 floored to two decimals that way would become 9.69, which would be undesirable.
What you can do in this case is multiply with one magnitude of precision, round to nearest, then divide the extra magnitude and proceed:
Let P bet the number of digits after decimal point that you want to round to (2 in this case).
Let D be 10P (100 in this case).
Multiply input by D * 10
std::round to nearest integer.
Divide by 10
std::floor or std::ceil
Divide by D.
You would need to truncate it. Possibly the easiest way is to multiply it by a factor (in case of 2 decimal places, by a factor of 100), then truncate or round it, and lastly divide by the very same factor.
Now, mind you, that floating-point precision issues might occur, and that even after those operations your float might not be 1.26, but 1.26000000000003 instead.
If your goal is to store a number with a small, fixed number of digits of precision after the decimal point, you can do that by storing it as an integer with an implicit power-of-ten multiplier:
#include <stdio.h>
#include <math.h>
// Given a floating point value and the number of digits
// after the decimal-point that you want to preserve,
// returns an integer encoding of the value.
int ConvertFloatToFixedPrecision(float floatVal, int numDigitsAfterDecimalPoint)
{
return (int) roundf(floatVal*powf(10.0f, numDigitsAfterDecimalPoint));
}
// Given an integer encoding of your value (as returned
// by the above function), converts it back into a floating
// point value again.
float ConvertFixedPrecisionBackToFloat(int fixedPrecision, int numDigitsAfterDecimalPoint)
{
return ((float) fixedPrecision) / powf(10.0f, numDigitsAfterDecimalPoint);
}
int main(int argc, char ** arg)
{
const float val = 1.263839;
int fixedTwoDigits = ConvertFloatToFixedPrecision(val, 2);
printf("fixedTwoDigits=%i\n", fixedTwoDigits);
float backToFloat = ConvertFixedPrecisionBackToFloat(fixedTwoDigits, 2);
printf("backToFloat=%f\n", backToFloat);
return 0;
}
When run, the above program prints this output:
fixedTwoDigits=126
backToFloat=1.260000
If you're talking about storing exactly 1.26 in your variable, chances are you can't (there may be an off chance that exactly 1.26 works, but let's assume it doesn't for a moment) because floating point numbers don't work like that. There are always little inaccuracies because of the way computers handle floating point decimal numbers. Even if you could get 1.26 exactly, the moment you try to use it in a calculation.
That said, you can use some math and truncation tricks to get very close:
int main()
{
// our float
float a = 1.263839;
// the precision we're trying to accomplish
int precision = 100; // 3 decimal places
// because we're an int, this will keep the 126 but lose everything else
int truncated = a * precision; // multiplying by the precision ensures we keep that many digits
// convert it back to a float
// Of course, we need to ensure we're doing floating point division
float b = static_cast<float>(truncated) / precision;
cout << "a: " << a << "\n";
cout << "b: " << b << "\n";
return 0;
}
Output:
a: 1.26384
b: 1.26
Note that this is not really 1.26 here. But is is very close.
This can be demonstrated by using setprecision():
cout << "a: " << std:: setprecision(10) << a << "\n";
cout << "b: " << std:: setprecision(10) << b << "\n";
Output:
a: 1.263839006
b: 1.25999999
So again, it's not exactly 1.26, but very close, and slightly closer than you were before.
Using a stringstream would be an easy way to achieve that:
#include <iostream>
#include <iomanip>
#include <sstream>
using namespace std;
int main() {
stringstream s("");
s << fixed << setprecision(2) << 1.263839;
float a;
s >> a;
cout << a; //Outputs 1.26
return 0;
}
I'm trying to assign a big double value to a variable and print it on the console. The number I supply in is different than what is displayed as output. Is it possible to get the double value correctly assigned and output without the loss of precision? Here is the C++ code:
#include <iostream>
#include <limits>
int main( int argc, char *argv[] ) {
// turn off scientific notation on floating point numbers
std::cout << std::fixed << std::setprecision( 3 );
// maximum double value on my machine
std::cout << std::numeric_limits<double>::max() << std::endl;
// string representation of the double value I want to get
std::cout << "123456789123456789123456789123456789.01" << std::endl;
// value I supplied
double d = 123456789123456789123456789123456789.01;
// it's printing 123456789123456784102659645885120512.000 instead of 123456789123456789123456789123456789.01
std::cout << d << std::endl;
return EXIT_SUCCESS;
}
Could you, please, help me to understand the problem.
C++ built-in floating point types are finite in precision. double is usually implemented as IEEE-754 double precision, meaning it has 53 bits of mantissa (the "value") precision, 11 bits of exponent precision, and 1 sign bit.
The number 123456789123456789123456789123456789 requires way more than 53 bits to represent, meaning a typical double cannot possibly represent it accurately. If you want such large numbers with perfect precision, you need to use some sort of a "big number" library.
For more information on floating point formats and their inaccuracies, you can read What Every Programmer Should Know About Floating-Point Arithmetic.
I have a seemingly simple c++ issue that's bothering me. The output of the code
#include <iostream>
using namespace std;
int main() {
// your code goes here
double c = 9.43827 * 0.105952 ;
cout << c << endl ;
return 0;
}
is 1. Just 1. I guess this is due to precision loss based on how doubles are stored in c++ but surely there must be a way in c++ to get some sort of precision (2 or 3 decimal places) in the result.
It's not precision loss in storage, it's precision loss in converting to text. The stream inserter for double defaults to six significant digits. The product here, 1.000003583, rounded to six significant digits, is 1.00000. In addition, if you haven't set showpoint, the trailing zeros and the decimal point will be suppressed, so you'll see a bare 1. To get the decimal point to show, use std::cout << std::showpoint << c << '\n';. To see more significant digits, use std::cout << std::setprecision(whatever) << c << '\n';, where whatever is the number of digits you want the formatter to use.
#include <stdio.h>
int main() {
// your code goes here
double c = ((double)9.43827) * 0.105952 ;
for(int i = (sizeof(double)*8)-1; i >= 0; i-- ) {
printf("%ld", (*(long*)&c>>i)&1);
}
}
If you run that, you can clearly see the bit representation of your double is not the integer value 1. You're not losing any data.
0011111111110000000000000000001111000001110100001010001001001001
but it is very close to 1, so that's what gets printed out.
Try using cout<<setprecision(12)<<c<<endl;
setprecision sets the decimal precision to be used to format floating-point values on output operations.
source
I run this code but the output was different from what I expected.
The output:
c = 1324
v = 1324.99
I expected that the output should be 1324.987 for v. Why is the data in v different from output?
I'm using code lite on Windows 8 32.
#include <iostream>
using namespace std;
int main()
{
double v = 1324.987;
int n;
n = int (v);
cout << "c = " << n << endl;
cout << "v = " << v << endl;
return 0;
}
Floating point types inherit rounding errors as a result of their fixed width representations. For more information, see What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The default precision when printing with cout is 6, so only 6 decimal places will be displayed. The number is rounded to the nearest value, that's why you saw 1324.99. You need to set a higher precision to see the more "correct" value
However, setting the precision too high may print out a lot of garbage digits behind, because binary floating-point types cannot store all decimal floating-point values exactly.