I need to extract the significand and exponent of a double in C++ using GMP (gmplib).
Ex: double a = 1.234;
I would like to extract 1234 as the significand and -3 as the exponent, so that a = 1234e-3. I heard that GMP supports this kind of function, but I am not sure how to use the library.
Please share some sample code using this library.
It seems that you're looking for mpf_class::get_str(), which will break the floating-point value 1.234 up into the string "1234" and the exponent 1, because 1.234 == 0.1234 * 10^1.
You will need to subtract the length of that string from the exponent to fit your requirements.
#include <iostream>
#include <string>
#include <gmpxx.h>

int main()
{
    double a = 1.125; // 1.234 cannot be stored in a double exactly, try "1.234"
    mpf_class f(a);
    mp_exp_t exp;
    std::string significand = f.get_str(exp);
    std::cout << "significand = " << significand
              << " exponent = " << exp - (int)significand.size() << '\n';
}
This prints
~ $ ./test
significand = 1125 exponent = -3
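By the way, if your values actually start out as decimal text such as "1.234" (as the comment in the code above hints), here is a GMP-free sketch of my own that avoids binary rounding entirely by splitting the string; it only handles simple forms like 1.234 (no sign, no exponent notation):

#include <iostream>
#include <string>

int main()
{
    std::string text = "1.234";

    std::string significand = text;
    long exponent = 0;
    std::string::size_type dot = significand.find('.');
    if (dot != std::string::npos) {
        // The exponent is minus the number of digits after the decimal point.
        exponent = -static_cast<long>(significand.size() - dot - 1);
        significand.erase(dot, 1);   // remove the '.' to get "1234"
    }

    std::cout << "significand = " << significand
              << " exponent = " << exponent << '\n';
    // prints: significand = 1234 exponent = -3
}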
I am trying to extract the mantissa and exponent parts from a double.
For the test data '0.15625', the expected mantissa and exponent are '5' and '-5' respectively (5*2^-5).
double value = 0.15625;
int exp;
double mantissa = frexp(value, &exp);
Result: mantissa = 0.625 and exp = -2.
Here the mantissa returned is a fraction. For my use case (ASN.1 encoding), the mantissa should be an integer. I understand that by shifting the mantissa and adjusting the exponent I can convert the binary fraction to an integer. In the example, 0.625 base 10 is 0.101 base 2, so 3 bits have to be shifted to get the integer. But I am finding it difficult to come up with a generic algorithm.
So my question is: how do I calculate the number of bits to shift to convert the binary fraction to an integer?
#include <cmath>    // For frexp and scalbn.
#include <iomanip>  // For fixed and setprecision.
#include <iostream> // For cout.
#include <limits>   // For properties of the floating-point format.

int main(void)
{
    double value = 0.15625;

    // Separate value into significand in [.5, 1) and exponent.
    int exponent;
    double significand = std::frexp(value, &exponent);

    // Scale significand by number of digits in it, to produce an integer.
    significand = std::scalbn(significand, std::numeric_limits<double>::digits);

    // Adjust exponent to compensate for scaling.
    exponent -= std::numeric_limits<double>::digits;

    // Set stream to print significand in full.
    std::cout << std::fixed << std::setprecision(0);

    // Output triple with significand, base, and exponent.
    std::cout << "(" << significand << ", "
              << std::numeric_limits<double>::radix << ", " << exponent << ")\n";
}
Sample output:
(5629499534213120, 2, -55)
(If the value is zero, you might wish to force the exponent to zero, for aesthetic reasons. Mathematically, any exponent would be correct.)
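The question's expected output (5, 2, -5) is the same value with the trailing zero bits removed from the integer significand. A minimal sketch of that extra normalization step (my own addition on top of the code above):

#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>

int main(void)
{
    double value = 0.15625;

    // As above: integer significand and adjusted exponent.
    int exponent;
    double fraction = std::frexp(value, &exponent);
    std::int64_t significand = static_cast<std::int64_t>(
        std::scalbn(fraction, std::numeric_limits<double>::digits));
    exponent -= std::numeric_limits<double>::digits;

    // Strip trailing zero bits: halve the significand while it stays an integer,
    // bumping the exponent to compensate.
    if (significand != 0)
        while (significand % 2 == 0) {
            significand /= 2;
            ++exponent;
        }

    std::cout << "(" << significand << ", 2, " << exponent << ")\n"; // (5, 2, -5)
}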
How would I make it so that when I enter 2.785 at the input prompt, the output displays the variable as 2.79?
I tried using setprecision, but for some reason it is not working, unless I am doing it wrong.
Here is the input prompt and what the program currently does:
Enter positive daily growth % (.1 must be entered as 10):
user enters "2.785"
output -> 0.02785
My desired output should look like:
desired output-> 2.79%
Any help is appreciated. I know it may seem simple to others, but I have already tried looking online and everything I find either doesn't make sense or doesn't work, and I don't know what I am doing wrong.
Floating point arithmetic
The reason this is tricky is that floating-point values cannot represent most decimal numbers exactly, and small errors creep in when you perform operations on them. See the Wikipedia article on IEEE floating point.
It is a very interesting topic; if you have a bit of time, take a look at explanations of floating point and how it is represented inside the computer.
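As a quick illustration of that inexactness (a sketch of my own, not part of the original answer), printing the intermediate value with more digits shows it is not exactly 2.785:

#include <iomanip>
#include <iostream>

int main() {
    double pct = 0.02785 * 100; // what the program computes internally
    // With 17 significant digits the stored value shows up as something like
    // 2.7850000000000001 on an IEEE 754 system, not exactly 2.785.
    std::cout << std::setprecision(17) << pct << '\n';
    return 0;
}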
If you are looking for the display only (only works for small decimals)
If you are just looking to display a small value, you can use the code below:
#include <cmath>
#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>
using namespace std;

string truncateAsString(double n, int precision) {
    stringstream ss;
    double remainder = static_cast<double>((int)floor((n - floor(n)) * precision) % precision);
    ss << setprecision(numeric_limits<double>::max_digits10 + __builtin_ctz(precision)) << floor(n);
    if (remainder)
        ss << "." << remainder;
    cout << ss.str() << "%" << endl;
    return ss.str();
}

int main(void) {
    double a = 0.02785;
    int precision = 100; // as many digits as you add zeroes. 3 zeroes means precision of 3.
    string s = truncateAsString(a * 100 + 0.5 / 100, precision);
    return 0;
}
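For completeness, here is a plain standard-library sketch of my own (not part of the original answer): std::fixed together with std::setprecision(2) already rounds to two decimals when printing. On a typical IEEE 754 system 0.02785 * 100 is stored slightly above 2.785, so this prints 2.79, but since 2.785 is not exactly representable, the direction of such half-way cases is not guaranteed in general.

#include <iomanip>
#include <iostream>

int main() {
    double growth = 0.02785;            // the value as the program stores it
    std::cout << std::fixed << std::setprecision(2)
              << growth * 100 << "%\n"; // typically prints 2.79%
    return 0;
}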
Looking for the true value?
Maybe you are looking for the true value of your floating-point number; in that case you can use the Boost.Multiprecision library.
The Boost.Multiprecision library can be used for computations requiring precision exceeding that of standard built-in types such as float, double and long double. For extended-precision calculations, Boost.Multiprecision supplies a template data type called cpp_dec_float. The number of decimal digits of precision is fixed at compile-time via template parameter.
You need a library like Boost.Multiprecision here because of the limited precision of built-in floating point; see my code below:
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>
#include <limits>
#include <cmath>
#include <iomanip>
using namespace std;
using boost::multiprecision::cpp_dec_float_50;
cpp_dec_float_50 truncate(cpp_dec_float_50 n, int precision) {
    cpp_dec_float_50 remainder =
        static_cast<cpp_dec_float_50>((int)floor((n - floor(n)) * precision) % precision)
        / static_cast<cpp_dec_float_50>(precision);
    return floor(n) + remainder;
}

int main(void) {
    int precision = 100; // as many digits as you add zeroes. 5 zeroes means precision of 5.
    cpp_dec_float_50 n = 0.02785 * 100;
    n = truncate(n + 0.5 / precision, precision); // first part is remainder, floor(n) is int value truncated.
    cout << setprecision(numeric_limits<cpp_dec_float_50>::max_digits10 + __builtin_ctz(precision))
         << n << "%" << endl; // __builtin_ctz(precision) will equal the number of trailing 0, exactly the precision we need!
    return 0;
}
Output (both cases)
2.79%
NB: I add 0.5 / precision to the value passed to truncate to make the truncation act like rounding (e.g. 2.785 + 0.005 = 2.79, so truncating to two decimals gives 2.79 instead of 2.78).
The program
#include <climits>
#include <iostream>

int main()
{
    long long ll = LLONG_MAX;
    float f = ll;
    std::cout << ll << '\n';
    std::cout << std::fixed << f << '\n';
    return 0;
}
gives:
9223372036854775807
9223372036854775808.000000
How is it possible? If a 23-bit mantissa can only hold values up to 8,388,607, why does cout output a 64-bit number?
You stored 2^63 - 1 in a float, which was rounded to 2^63 = 9223372036854775808. Powers of 2 are exactly representable.
The next larger float that is exactly representable is 2^63 + 2^40 = 9223373136366403584.
long long for you is a 64-bit data type, so LLONG_MAX has a value of 2^63 - 1. You are right that this can't be stored exactly in a float, which has only 23 bits of mantissa, but 2^63, which is one more than LLONG_MAX, is easily stored in a float: it stores a significand of 1 and an exponent of 63 (1 × 2^63), and there you have it.
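A small sketch of my own that confirms this spacing: the next float above 2^63 really is 2^40 away, as std::nextafter shows.

#include <cfloat>
#include <climits>
#include <cmath>
#include <iostream>

int main()
{
    float f = LLONG_MAX;                              // rounds to 2^63
    std::cout << std::fixed;
    std::cout << f << '\n';                           // 9223372036854775808.000000
    std::cout << std::nextafter(f, FLT_MAX) << '\n';  // 9223373136366403584.000000 == 2^63 + 2^40
    return 0;
}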
I have code where I can get up to 9 digits after the decimal point, say something like 0.123456789. But I can also get a value like 10.123 or 1001.12, where there are only 3 or 2 digits after the decimal point. I am using
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    std::stringstream ss;
    double val = 1.234;
    ss.str(std::string());
    ss << std::fixed;
    ss << std::setprecision(9);
    ss << val;
    string number = ss.str();
    std::cout << number << "\n";
    return 0;
}
The above gives the output 1.234000000. Note that I would like the precision to be handled automatically depending on the number of digits after the decimal point. One way is for me to find the number of digits after the decimal point and set the precision every time; is there some standard method provided that takes care of it?
Thanks
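One possible approach (a sketch of my own, with a hypothetical helper name) is to format at the maximum precision you care about and then trim the trailing zeros from the string:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: print with up to maxPrecision decimals, then strip
// trailing zeros (and a dangling '.') from the result.
std::string trimTrailingZeros(double val, int maxPrecision = 9)
{
    std::ostringstream ss;
    ss << std::fixed << std::setprecision(maxPrecision) << val;
    std::string s = ss.str();
    s.erase(s.find_last_not_of('0') + 1);
    if (!s.empty() && s.back() == '.')
        s.pop_back();
    return s;
}

int main()
{
    std::cout << trimTrailingZeros(1.234) << "\n";   // 1.234
    std::cout << trimTrailingZeros(10.123) << "\n";  // 10.123
    std::cout << trimTrailingZeros(1001.12) << "\n"; // 1001.12
    return 0;
}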
The following code throws an std::out_of_range exception in Visual Studio 2013 where in my opinion it shouldn't:
#include <string>
#include <limits>

int main(int argc, char ** argv)
{
    double maxDbl = std::stod(std::to_string(std::numeric_limits<double>::max()));
    return 0;
}
I tested the code also with gcc 4.9.2 and there it does not throw an exception. The issue seems to be caused by an inaccurate string representation after the conversion to string. In Visual Studio std::to_string(std::numeric_limits<double>::max()) yields
179769313486231610000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000
which indeed seems too large. In gcc, however, it yields
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
which seems to be smaller than the passed value.
However, isn't std::numeric_limits<double>::max() supposed to return the
maximum finite representable floating-point number?
So why do the string representations get off? What am I missing here?
Direct answer
Gcc (and Clang and VS2015) correctly return the integer value of (2^1024 - 1) - (2^(1024-53) - 1), which is the value whose significand is all ones (52 stored one bits plus the implicit leading 1) with an unbiased exponent of 1023 (2^1024 - 1 would be the integer with 1024 one bits; I just subtract all the bits below the 53 significant bits of the IEEE 754 format).
I can confirm that a big-integer library gives 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368L
The previous exactly representable floating-point value would be 2^971 less (971 = 1023 - 52), that is: 179769313486231550856124328384506240234343437157459335924404872448581845754556114388470639943126220321960804027157371570809852884964511743044087662767600909594331927728237078876188760579532563768698654064825262115771015791463983014857704008123419459386245141723703148097529108423358883457665451722744025579520L
The next value, 2^971 greater, is no longer representable; it is:
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216L
But the value used by MSVC 2013 and previous is near to 2^1024 + 2^971, that is: 179769313486231610731333614426100589925524828262616317947942685512308090830973387504827396012048193870699768806228404251083258210739369062217227314575410731769485876273179688476358949112102859294830297395714877595371718127781702814782017661749531126051903195165027873311156314696040132728420308633064323416064L. As it is greater than any value representable in IEEE 754 double precision, it cannot be decoded back to a double.
At most, one could argue that any value between 2^1024 - 2^971 (std::numeric_limits<double>::max()) and 2^1024 could be rounded to std::numeric_limits<double>::max(), but values greater than 2^1024 are clearly an overflow.
Discussion on accuracy
Only about 16 decimal digits are accurate in a double, and all further digits can be seen as garbage or arbitrary values, since they do not depend on the value itself but only on the way you choose to compute them. Just try to subtract 1e+288 (already a big value) from maxDbl and see what happens:
double maxLess = maxDbl - 1.e+288;
if (maxLess == maxDbl) {
    std::cout << "Unchanged" << std::endl;
}
else std::cout << "Changed" << std::endl;
You should see ... Unchanged. That is expected: the gap between consecutive doubles near maxDbl is 2^971 (about 2e+292), so subtracting 1e+288 changes the value by far less than half a unit in the last place, and the result rounds back to maxDbl.
It just looks like VS 2013 is a little inconsistent in the way it rounds floating-point values: it rounded maxDbl up, past the maximum actually representable value, and then could not decode that value back.
The problem is that the standard chose a %f format for std::to_string, which gives a false sense of accuracy. If you want to see an equivalent problem in gcc, just use:
#include <iostream>
#include <string>
#include <limits>
#include <iomanip>
#include <sstream>

int main() {
    double max = std::numeric_limits<double>::max();
    std::ostringstream ostr;
    ostr << std::setprecision(16) << max;
    std::string smax = ostr.str();
    std::cout << smax << std::endl;
    double m2 = std::stod(smax); // throws std::out_of_range: the string rounded to 16 digits is above max
    std::cout << m2 << std::endl;
    return 0;
}
Rounded to 16 digits, maxDbl prints (correctly) as 1.797693134862316e+308, but that string can no longer be decoded back to a double.
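For comparison, a minimal sketch of my own: printing with std::numeric_limits<double>::max_digits10 (17 for IEEE 754 double) instead of 16 digits is guaranteed to round-trip, even for maxDbl.

#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

int main() {
    double max = std::numeric_limits<double>::max();
    std::ostringstream ostr;
    ostr << std::setprecision(std::numeric_limits<double>::max_digits10) << max;
    std::string smax = ostr.str();   // "1.7976931348623157e+308"
    double back = std::stod(smax);   // parses back exactly, no overflow
    std::cout << smax << " round-trips: " << std::boolalpha << (back == max) << std::endl;
    return 0;
}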
And this one:
#include <iostream>
#include <string>
#include <limits>
int main() {
double maxDbl = std::numeric_limits<double>::max();
std::string smax = std::to_string(maxDbl);
std::cout << smax << std::endl;
std::string smax2 = "179769313486231570800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000";
double max2 = std::stod(smax2);
if (max2 == maxDbl) {
std::cout << smax2 << " is same double as " << smax << std::endl;
}
return 0;
}
Displays:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
179769313486231570800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000 is same double as 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
TL/DR: What I mean is that a big enough double value can of course be written as an exact integer (per IEEE 754). But it also stands for every real number between halfway to the previous representable value and halfway to the next one. So any integer in that range could be an acceptable representation of the double, and a value rounded to 16 decimal digits ought to be acceptable, yet current standard libraries only accept the maximum floating-point value when it is truncated at 16 decimal digits rather than rounded up. VS2013, however, produced a number above the top of that range, which was in any case an error.
Reference
IEEE floating point on Wikipedia