Truncate a floating-point number to the leading N decimal digits - c++

What is the most efficient way to get the n leftmost significant digits (counting from the first non-zero digit) of a floating-point number (number >= 0.0)?
For example,
if n = 1:
0.014568 -> 0.01
0.246456 -> 0.2
if n = 2:
0.014568 -> 0.014
0.246456 -> 0.24
After schil227's comment:
Currently I am multiplying and dividing by 10 as necessary in order to bring the n digits into position.

Code could use sprintf(buf, "%e", ...) to do most of the heavy lifting.
There are so many corner cases where more direct code may fail that sprintf() is likely to be, at the least, a good, solid reference solution.
This code prints the double to DBL_DECIMAL_DIG places to ensure there is no rounding in digits that would make a difference. Then it zeros out various digits depending on n.
See Mark Dickinson's comment for reasons to use a value greater than DBL_DECIMAL_DIG, perhaps on the order of DBL_DECIMAL_DIG*2. As mentioned above, there are many corner cases.
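#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

double foo(double x, int n) {
  if (!isfinite(x)) {
    return x;
  }
  assert(n >= 1 && n <= DBL_DECIMAL_DIG + 1);
  printf("%g\n", x);
  // "%+.*e" yields sign, 1 leading digit, '.', DBL_DECIMAL_DIG fraction
  // digits, then the exponent, e.g. "+1.45680000000000000e-02".
  char buf[DBL_DECIMAL_DIG + 11];
  sprintf(buf, "%+.*e", DBL_DECIMAL_DIG, x);
  //puts(buf);
  // Keep the leading digit (buf[1]) and n-1 fraction digits; zero the rest.
  memset(buf + 2 + n, '0', DBL_DECIMAL_DIG - n + 1);
  //puts(buf);
  char *endptr;
  x = strtod(buf, &endptr);
  printf("%g\n", x);
  return x;
}

int main() {
  foo(0.014568, 1);
  foo(0.246456, 1);
  foo(0.014568, 2);
  foo(0.246456, 2);
  return 0;
}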
Output
0.014568
0.01
0.246456
0.2
0.014568
0.014
0.246456
0.24
This answer assumes the OP does not want a rounded answer (see 0.246456 -> 0.24).

If you want the result as a string, you should probably print to a string with extra precision, then chop that off yourself. (See chux's answer for details on how much extra precision you need for IEEE 64-bit double to avoid rounding up from a string of 9s, since you want truncation but all the usual to-string functions round to nearest.)
If you want a double result, then are you sure you really want this? Rounding / truncating early in the middle of a calculation usually just worsens the accuracy of the final result. Of course, there are uses in real algorithms for floor/ceil, trunc, and nearbyint, and this is just a scaled version of trunc.
If you just want a double, you can get fairly good results without ever going to a string. Use digits and floor(log10(fabs(x))) to work out a scale factor, then truncate the scaled value to an integer, then scale back.
Tested and working (with and without -ffast-math). See the asm on the Godbolt compiler explorer. This might run reasonably efficiently, especially with -ffast-math -msse4.1 (so floor and trunc can inline to roundsd).
If you care about speed, look into replacing pow() with something that takes advantage of the fact that the exponent is a small integer. I'm not sure how fast library pow() implementations are in that case. GNU C __builtin_powi(x, n) trades accuracy for speed, for integer exponents, doing a multiplication tree, which is less accurate than what pow() does.
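A minimal sketch of that idea, assuming IEEE 754 doubles (`pow10_int` is a hypothetical helper, not taken from either answer): 10^0 through 10^22 are exactly representable as doubles, so they can come from a table, a negative exponent costs one correctly rounded division, and anything outside that range falls back to `std::pow`.

```cpp
#include <cmath>

// Hypothetical replacement for pow(10., e) when e is a small integer.
// Every table entry is exact in IEEE double, so for |e| <= 22 the result
// is the double nearest to 10^e.
double pow10_int(int e)
{
    static const double table[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
        1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
    if (e >= 0 && e <= 22) return table[e];
    if (e < 0 && e >= -22) return 1.0 / table[-e]; // one rounding step
    return std::pow(10.0, e);                      // large magnitudes
}
```

With a C port of this, the `pow(10., floor(l10) + (1 - digits))` call in `truncate_n_digits` could become `pow10_int((int)floor(l10) + 1 - digits)`.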
#include <float.h>
#include <math.h>
#include <stdio.h>

double truncate_n_digits(double x, int digits)
{
    if (x == 0 || !isfinite(x))
        return x; // good idea stolen from Chux's answer :)
    double l10 = log10(fabs(x));
    double scale = pow(10., floor(l10) + (1 - digits)); // floor rounds towards -Inf
    double scaled = x / scale;
    double scaletrunc = trunc(scaled); // trunc rounds towards zero
    double truncated = scaletrunc * scale;
#if 1 // debugging code
    printf("%2d %24.14g =>\t%24.14g\t scale=%g, scaled=%.30g\n", digits, x, truncated, scale, scaled);
    // print with more accuracy to reveal the real behaviour
    printf(" %24.20g =>\t%24.20g\n", x, truncated);
#endif
    return truncated;
}
test cases:
int main() {
    truncate_n_digits(0.014568, 1);
    truncate_n_digits(0.246456, 1);
    truncate_n_digits(0.014568, 2);
    truncate_n_digits(-0.246456, 2);
    truncate_n_digits(1234.56789, 3);
    truncate_n_digits(1234.56789, 6);
    truncate_n_digits(99999999999, 6);
    truncate_n_digits(-99999999999, 6);
    truncate_n_digits(99999, 10);
    truncate_n_digits(-0.0000000001234567, 3);
    truncate_n_digits(1000, 6);
    truncate_n_digits(0.001, 6);
    truncate_n_digits(1e-312, 2); // denormal, and not exactly representable: 9.999...e-313
    truncate_n_digits(nextafter(1e-312, INFINITY), 2); // denormal, just above 1.00000e-312
    return 0;
}
Each result is shown twice: first with only %.14g, so that rounding gives the string we want, then again with %.20g to show enough places to reveal the realities of floating-point math. Most numbers are not exactly representable, so even with perfect rounding it's impossible to return a double that exactly represents the truncated decimal string. (Integers up to the size of the mantissa, i.e. up to 2^53 in magnitude, are exactly representable, and so are fractions whose denominator is a power of 2.)
1 0.014568 => 0.01 scale=0.01, scaled=1.45679999999999987281285029894
0.014567999999999999353 => 0.010000000000000000208
1 0.246456 => 0.2 scale=0.1, scaled=2.46456000000000008398615136684
0.2464560000000000084 => 0.2000000000000000111
2 0.014568 => 0.014 scale=0.001, scaled=14.5679999999999996163069226895
0.014567999999999999353 => 0.014000000000000000291
2 -0.246456 => -0.24 scale=0.01, scaled=-24.6456000000000017280399333686
-0.2464560000000000084 => -0.23999999999999999112
3 1234.56789 => 1230 scale=10, scaled=123.456789000000000555701262783
1234.567890000000034 => 1230
6 1234.56789 => 1234.56 scale=0.01, scaled=123456.789000000004307366907597
1234.567890000000034 => 1234.5599999999999454
6 99999999999 => 99999900000 scale=100000, scaled=999999.999990000040270388126373
99999999999 => 99999900000
6 -99999999999 => -99999900000 scale=100000, scaled=-999999.999990000040270388126373
-99999999999 => -99999900000
10 99999 => 99999 scale=1e-05, scaled=9999900000
99999 => 99999.000000000014552
3 -1.234567e-10 => -1.23e-10 scale=1e-12, scaled=-123.456699999999983674570103176
-1.234566999999999879e-10 => -1.2299999999999998884e-10
6 1000 => 1000 scale=0.01, scaled=100000
1000 => 1000
6 0.001 => 0.001 scale=1e-08, scaled=100000
0.0010000000000000000208 => 0.0010000000000000000208
2 9.9999999999847e-313 => 9.9999999996388e-313 scale=1e-314, scaled=100.000000003458453079474566039
9.9999999999846534143e-313 => 9.9999999996388074622e-313
2 1.0000000000034e-312 => 9.0000000001196e-313 scale=1e-313, scaled=9.9999999999011865980946822674
1.0000000000034059979e-312 => 9.0000000001195857973e-313
Since the result you want will often not be exactly representable, (and because of other rounding errors) the resulting double will sometimes be below the result you want, so printing it with full precision might give 1.19999999 instead of 1.20000011. You might want to use nextafter(result, copysign(INFINITY, original)) to get a result that's more likely to have a higher magnitude than what you want.
Of course, that could just make things worse in some cases. But since we truncate towards zero, most often we get a result that's just below (in magnitude) the unrepresentable exact value.
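A minimal sketch of that one-ulp adjustment, assuming both the truncated result and the original input are at hand (`nudge_away_from_zero` is a hypothetical name). It is the heuristic described above, nothing more: it helps when truncation landed just below the intended decimal and hurts otherwise.

```cpp
#include <cmath>

// Move `truncated` one representable step away from zero, in the direction
// of the sign of the original input.
double nudge_away_from_zero(double truncated, double original)
{
    return std::nextafter(truncated, std::copysign(INFINITY, original));
}
```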

OK, another one like Peter Cordes's answer, but more generic. Note that it rounds to nearest rather than truncating.
#include <cmath>
#include <concepts>
#include <iomanip>
#include <iostream>
#include <type_traits>
using namespace std;

/** Return \c digits significant digits of number \c x.
    \tparam T Type of number \c x; can be floating point or integral.
    \param x The number.
    \param digits The requested number of significant digits of number \c x.
    \return The number \c x rounded to \c digits significant digits. */
template<typename T>
requires(std::integral<T> || std::floating_point<T>)
T roundn(T x, unsigned int digits)
{
    if (!x || !std::isfinite(x)) return x;
    // Compute in T if it is a floating-point type, otherwise in double.
    typedef std::conditional_t<std::floating_point<T>, T, double> Tp;
    Tp mul = std::pow(10, std::floor(digits - std::log10(std::abs(x))));
    Tp y = std::round(x * mul) / mul;
    if constexpr (std::floating_point<T>) return y;
    else return std::round(y);
}

int main()
{
    cout << setprecision(100);
    cout << roundn(123.456789, 1) << "\n";
    cout << roundn(123.456789, 2) << "\n";
    cout << roundn(123.456789, 3) << "\n";
    cout << roundn(123.456789, 4) << "\n";
    cout << roundn(123.456789, 5) << "\n";
    cout << roundn(-123.456789, 1) << "\n";
    cout << roundn(-123.456789, 2) << "\n";
    cout << roundn(-123.456789, 3) << "\n";
    cout << roundn(-123.456789, 4) << "\n";
    cout << roundn(-123.456789, 5) << "\n";
    cout << roundn(-123.456789, 15) << "\n";
    cout << roundn(123456, 1) << "\n";
    cout << roundn(123456, 2) << "\n";
    cout << roundn(123456, 3) << "\n";
    cout << roundn(123456, 10) << "\n";
    cout << roundn(-123456, 1) << "\n";
    cout << roundn(-123456, 2) << "\n";
    cout << roundn(-123456, 3) << "\n";
    cout << roundn(-123456, 10) << "\n";
    cout << roundn(0.0123456789, 1) << "\n";
    cout << roundn(0.0123456789, 2) << "\n";
    cout << roundn(-0.0123456789, 1) << "\n";
    cout << roundn(-0.0123456789, 2) << "\n";
    return 0;
}
It returns
99.9999999999999857891452847979962825775146484375
120
123
123.5
123.4599999999999937472239253111183643341064453125
-99.9999999999999857891452847979962825775146484375
-120
-123
-123.5
-123.4599999999999937472239253111183643341064453125
-123.4567890000000005557012627832591533660888671875
100000
120000
123000
123456
-100000
-120000
-123000
-123456
0.01000000000000000020816681711721685132943093776702880859375
0.0120000000000000002498001805406602215953171253204345703125
-0.01000000000000000020816681711721685132943093776702880859375
-0.0120000000000000002498001805406602215953171253204345703125

Related

Double precision issues when converting it to a large integer

Precision is the number of digits in a number. Scale is the number of
digits to the right of the decimal point in a number. For example, the
number 123.45 has a precision of 5 and a scale of 2.
I need to convert a double with a maximum scale of 7 (i.e. it may have up to 7 digits after the decimal point) to an __int128. However, given a number, I don't know its actual scale in advance.
#include <iomanip>
#include <iostream>
#include <limits>
#include <string>
#include "json.hpp"
using json = nlohmann::json;

static std::ostream& operator<<(std::ostream& o, const __int128& x) {
    if (x == std::numeric_limits<__int128>::min()) return o << "-170141183460469231731687303715884105728";
    if (x < 0) return o << "-" << -x;
    if (x < 10) return o << (char)(x + '0');
    return o << x / 10 << (char)(x % 10 + '0');
}

int main()
{
    std::string str = R"({"time": [0.143]})";
    std::cout << "input: " << str << std::endl;
    json j = json::parse(str);
    std::cout << "output: " << j.dump(4) << std::endl;
    double d = j["time"][0].get<double>();
    __int128_t d_128_bad = d * 10000000;
    __int128_t d_128_good = __int128(d * 1000) * 10000;
    std::cout << std::setprecision(16) << std::defaultfloat << d << std::endl;
    std::cout << "d_128_bad: " << d_128_bad << std::endl;
    std::cout << "d_128_good: " << d_128_good << std::endl;
}
Output:
input: {"time": [0.143]}
output: {
"time": [
0.143
]
}
0.143
d_128_bad: 1429999
d_128_good: 1430000
As you can see, the converted value is 1429999 rather than the expected 1430000. I know the reason is that a floating-point number cannot always be represented exactly. The problem can be solved if I know the number of digits after the decimal point.
For example,
I can instead use __int128_t(d * 1000) * 10000. However, I don't know the scale of a given number, which may be at most 7.
Question: Is there a possible solution for this? Also, I need to do this conversion very fast.
I'm not familiar with this library, but it does appear to have a mechanism to get a json object's string representation (dump()). I would suggest you parse that into your value rather than going through the double intermediate representation, as in that case you will know the scale of the value as it was written.
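A sketch of that suggestion, assuming the value is available as a plain decimal string such as "0.143" (for example, the text that `dump()` produces for it) and that the maximum scale is 7. `parse_scaled_1e7` is a hypothetical helper: it rejects exponent notation, more than 7 fractional digits and malformed input, and it does not check for overflow of `__int128` (a GCC/Clang extension).

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Convert a decimal string to an integer scaled by 10^7 without going
// through double: "0.143" -> 1430000, "-12.5" -> -125000000.
std::optional<__int128> parse_scaled_1e7(std::string_view s)
{
    constexpr int kScale = 7;
    std::size_t i = 0;
    bool negative = false;
    if (i < s.size() && (s[i] == '-' || s[i] == '+')) {
        negative = (s[i] == '-');
        ++i;
    }
    __int128 value = 0;
    bool any_digit = false;
    // Integer part.
    while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
        value = value * 10 + (s[i] - '0');
        any_digit = true;
        ++i;
    }
    // Fractional part, at most kScale digits.
    int frac_digits = 0;
    if (i < s.size() && s[i] == '.') {
        ++i;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
            if (++frac_digits > kScale) return std::nullopt;
            value = value * 10 + (s[i] - '0');
            any_digit = true;
            ++i;
        }
    }
    // Reject empty input and anything left over (e.g. an exponent).
    if (!any_digit || i != s.size()) return std::nullopt;
    // Pad with zeros up to the fixed scale of 7 decimal places.
    for (; frac_digits < kScale; ++frac_digits) value *= 10;
    return negative ? -value : value;
}
```

Because every digit is handled as an integer, "0.143" becomes exactly 1430000 with no rounding step involved.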

How can I check for - and get a remainder using fmod (floats)?

My goal is to check if there is any remainder left when dividing 2 floats, and if there is, give that remainder back to the user.
Given the following code, I expected fmod(2, 0.2) to be 0; however, I get back 0.2. I read that this has to do with floating-point problems. But is there any way this can be done properly?
#include <cmath>
#include <iostream>

int main() {
    float a = 2.0;
    float b = 0.2;
    float rem = std::fmod(a, b);
    if (rem > 0) {
        std::cout << "There is a remainder: " << rem << std::endl;
    } else {
        std::cout << "No remainder: " << rem << std::endl;
    }
}
Output:
There is a remainder: 0.2
Yes, your hunch is correct. std::fmod is computing
std::fmod(2.0f, 0.20000000298023223876953125f)
where the second argument is the float closest to 0.2 (assuming your platform uses IEEE 754).
Luckily, though, the mathematical modulus distributes over multiplication ((k*a) mod (k*b) = k*(a mod b) for k > 0), so you could restate the problem as
double rem = (long long)std::round(a * 10) % (long long)std::round(b * 10) / 10.0;
using a larger power of 10 if more decimal places are needed to write the original operands exactly.
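The same idea as a small function, as a sketch: `decimal_fmod` and its `places` parameter are hypothetical, and the caller must know how many decimal places are needed to write both operands exactly. Rounding with `std::llround` (rather than truncating) absorbs the tiny representation error in values such as 0.2; very large operands would overflow `long long`.

```cpp
#include <cmath>
#include <optional>

// Remainder of a / b computed on integers scaled by 10^places.
// Returns std::nullopt if the scaled divisor rounds to zero.
std::optional<double> decimal_fmod(double a, double b, int places)
{
    const double scale = std::pow(10.0, places);
    const long long ia = std::llround(a * scale);
    const long long ib = std::llround(b * scale);
    if (ib == 0) return std::nullopt;
    // Integer % truncates toward zero, so the sign follows `a`, like fmod.
    return static_cast<double>(ia % ib) / scale;
}
```

For example, `decimal_fmod(2.0, 0.2, 1)` works on 20 % 2 and yields 0.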

How to avoid floating point format error

I am facing the following issue.
When I multiply two numbers, I get different results depending on the values of these numbers. I tried experimenting with types but didn't get the expected result.
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <iomanip>
#include <math.h>
int main()
{
const double value1_39 = 1.39;
const long long m_100000 = 100000;
const long long m_10000 = 10000;
const double m_10000double = 10000;
const long long longLongResult_1 = value1_39 * m_100000;
const double doubleResult_1 = value1_39 * m_100000;
const long long longLongResult_2 = value1_39 * m_10000;
const double doubleResult_2 = value1_39 * m_10000;
const long long longLongResult_3 = value1_39 * m_10000double;
const double doubleResult_3 = value1_39 * m_10000double;
std::cout << std::setprecision(6) << value1_39 << '\n';
std::cout << std::setprecision(6) << longLongResult_1 << '\n';
std::cout << std::setprecision(6) << doubleResult_1 << '\n';
std::cout << std::setprecision(6) << longLongResult_2 << '\n';
std::cout << std::setprecision(6) << doubleResult_2 << '\n';
std::cout << std::setprecision(6) << longLongResult_3 << '\n';
std::cout << std::setprecision(6) << doubleResult_3 << '\n';
return 0;
}
Result seen in the debugger:
Variable Value
value1_39 1.3899999999999999
m_100000 100000
m_10000 10000
m_10000double 10000
longLongResult_1 139000
doubleResult_1 139000
longLongResult_2 13899
doubleResult_2 13899.999999999998
longLongResult_3 13899
doubleResult_3 13899.999999999998
Result seen in cout:
1.39
139000
139000
13899
13900
13899
13900
I know the problem lies in how floating-point numbers are stored in a computer: as fractions in base 2.
My question is: how can I get 1.39 * 10000 as 13900 (since multiplying the same value by 100000 gives 139000)? Is there any trick that can help achieve this?
I have some ideas in mind but am not sure they are good enough:
1) parse the string to get the numbers to the left and right of the decimal point;
2) multiply the number by 100 and divide by 100 when the calculation is done.
But each of these solutions has its drawbacks. I am wondering whether there is a nice trick for this.
As the comments already said: no, there is no solution. This problem is due to floating-point numbers being stored in base 2 (as you already said). The floating-point formats are defined in IEEE 754. A number that is not a sum of powers of two (within the available precision) can't be stored exactly in base 2.
To be more specific:
You CAN store:
1.25 (2^0 + 2^-2)
0.75 (2^-1 + 2^-2)
because there is an exact representation.
You CAN'T store:
1.1
1.4
because they have infinitely repeating expansions in base 2 (for example, 1.1 is 1.000110011001100... in binary), just as 1/3 does in base 10. You can try to round, or use some sort of arbitrary-precision floating-point library (but even those have their limits [memory/speed]) with much greater precision than float, and then convert back to float after the multiplication.
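One way to see this for yourself, as a sketch: print the stored value with more decimals than it needs. `stored_digits` is a hypothetical helper built on `std::snprintf`; it assumes a C library that prints the exact stored binary value for long precisions, as glibc does.

```cpp
#include <cstdio>
#include <string>

// Format x with `places` digits after the decimal point, exposing the value
// that is actually stored in the double.
std::string stored_digits(double x, int places)
{
    char buf[512];
    std::snprintf(buf, sizeof buf, "%.*f", places, x);
    return buf;
}
```

With 20 places, 1.25 and 0.75 come out as all zeros after their last digit, while 1.1 comes out as 1.10000000000000008882.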
There are also a lot of other related problems when it comes to floating points. You will find out that the result of 10^20 + 2 is only 10^20 because you have a fixed digit resolution (6-7 digits for float and 15-16 digits for double). When you calculate with numbers that have huge differences in magnitude the smaller ones will just "disappear".
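A quick way to check this, as a sketch (`gap_above` is a hypothetical helper): measure the distance from a value to the next representable double with `std::nextafter`. Near 1e20 that distance is 2^14 = 16384, so adding 2 is less than half a step and is rounded away.

```cpp
#include <cmath>

// Distance from x to the next representable double above it (one ulp).
double gap_above(double x)
{
    return std::nextafter(x, INFINITY) - x;
}
```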
Question: Why does multiplying 1.39 * 10^5 give 139000, but multiplying 1.39 * 10^4 does not?
This comes down to how the exact product is rounded at each magnitude (note that all these variables are double, not float). The double nearest to 1.39 is about 1.38999999999999990230, roughly 9.8e-17 below 1.39. Multiplied by 100000, the exact product is about 9.8e-12 below 139000; doubles near 139000 are 2^-35 (about 2.9e-11) apart, so the error is less than half a step and the result rounds to exactly 139000. Multiplied by 10000, the exact product is about 9.8e-13 below 13900; doubles near 13900 are only 2^-39 (about 1.8e-12) apart, so the error is more than half a step and the result rounds to the double just below, 13899.999999999998. Converting that to long long truncates it to 13899.
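As a sketch, the behaviour in the question can be checked with assertions, and the usual workaround is to round to nearest with `std::llround` instead of truncating when converting to an integer. `scale_to_integer` is a hypothetical helper name, and rounding is only appropriate when the true result is known to be an integer.

```cpp
#include <cmath>

// Scale a decimal value to an integer, rounding to nearest instead of
// truncating, so 13899.999999999998 becomes 13900.
long long scale_to_integer(double value, long long factor)
{
    return std::llround(value * static_cast<double>(factor));
}
```

Here `scale_to_integer(1.39, 10000)` gives 13900, whereas a plain conversion to `long long` gives 13899.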

precision of double function based string

Let's say that you have a function:
std::string function() {
    double f = 2.48452;
    double g = 2;
    double h = 5482.48552;
    double i = -78.00;
    double j = 2.10;
    return x; // *** x stands for one of f, g, h, i, j, converted to a string
}
* For x we insert:
if we insert f, the function returns: 2.48
if we insert g, the function returns: 2
if we insert h, the function returns: 5482.49
if we insert i, the function returns: -78
if we insert j, the function returns: 2.1
These are only examples that show how function() should work. To be precise:
for a double k, the function returns k rounded to two decimal places (k.XX),
but for
k = 2.20
it returns "2.2" as a string.
How can this be implemented?
1) Just because you see two digits, it doesn't mean the underlying value was necessarily rounded to two digits.
The precision of the VALUE and the number of digits displayed in the FORMATTED OUTPUT are two completely different things.
2) If you're using cout, you can control formatting with "setprecision()":
http://www.cplusplus.com/reference/iomanip/setprecision/
EXAMPLE (from the above link):
// setprecision example
#include <iostream> // std::cout, std::fixed
#include <iomanip> // std::setprecision
int main () {
double f =3.14159;
std::cout << std::setprecision(5) << f << '\n';
std::cout << std::setprecision(9) << f << '\n';
std::cout << std::fixed;
std::cout << std::setprecision(5) << f << '\n';
std::cout << std::setprecision(9) << f << '\n';
return 0;
}
sample output:
3.1416
3.14159
3.14159
3.141590000
Mathematically, 2.2 is exactly the same as 2.20, 2.200, 2.2000, and so on. If you want to see more insignificant zeros, use setprecision together with fixed:
cout << fixed << setprecision(2);
cout << 2.2 << endl; // Prints 2.20
To show up to 2 decimal places without showing trailing zeros, you can do something such as:
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

std::string function(double value)
{
    // get fractional part (std::trunc avoids overflowing an integer type)
    double fracpart = value - std::trunc(value);
    // compute fractional part rounded to 2 decimal places as an int
    int decimal = static_cast<int>(100 * std::fabs(fracpart) + 0.5);
    if (decimal >= 100) decimal -= 100;
    // adjust precision based on the number of trailing zeros
    int precision = 2;                           // default 2 digits precision
    if (0 == decimal) precision = 0;             // 2 trailing zeros, don't show decimals
    else if (0 == (decimal % 10)) precision = 1; // 1 trailing zero, keep 1 decimal place
    // convert value to string
    std::stringstream str;
    str << std::fixed << std::setprecision(precision) << value;
    return str.str();
}
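An alternative sketch (a different technique from the answer above, and `format_up_to_2` is a hypothetical name): let printf-style formatting do the rounding to 2 decimals, then strip trailing zeros and a dangling decimal point. This keeps the rounding decision in one place, so the digits shown and the number of decimals can never disagree.

```cpp
#include <cstdio>
#include <string>

// Round to 2 decimals via "%.2f", then drop trailing '0's and a final '.'.
// Values that round to zero keep their sign, e.g. -0.001 gives "-0".
std::string format_up_to_2(double value)
{
    char buf[512]; // large enough for "%.2f" of any finite double
    std::snprintf(buf, sizeof buf, "%.2f", value);
    std::string s(buf);
    // "%.2f" always emits a '.', so this never eats integer digits.
    while (s.back() == '0') s.pop_back();
    if (s.back() == '.') s.pop_back();
    return s;
}
```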

Output float as three digits, or more to avoid exponent

I'm trying to output a float as three digits, or more if necessary to avoid an exponent.
Some examples:
0.12 // 0.123 would be ok
1.23
12.3
123
1234
12345
The closest I've gotten is
std::cout << std::setprecision(3) << f << std::endl;
but this prints things like
21 // rather than 21.0
1.23e+03 // rather than 1234
Combining std::setprecision with std::fixed means I always get the same number of post-decimal digits, which is not what I want.
Using std::setw, 123.456 would still print as 123.456 rather than 123.
Any suggestions?
As far as I can tell, the easiest way around this is to write your own function to handle it. I threw this together and it seems to work. I'm not sure whether you wanted large numbers to have only 3 significant digits or to keep all significant figures to the left of the decimal point, but it wouldn't be hard to make that modification:
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <iomanip>
#include <iostream>

void printDigits(float value, int numDigits = 3)
{
    if (value == 0 || !std::isfinite(value)) // log10 is not usable for these
    {
        std::cout << value << std::endl;
        return;
    }
    int log10ofValue = static_cast<int>(std::log10(std::abs(value)));
    if (log10ofValue >= 0) // true for |value| above about 0.1, since the cast truncates toward zero
    {
        ++log10ofValue; //add 1 because we're culling to the left of the decimal now
        //The difference between numDigits and the log10 will let us transition across the decimal
        // in cases like 12.345 or 1.23456 but cap it at 0 for ones greater than 10 ^ numDigits
        std::cout << std::setprecision(std::max(numDigits - log10ofValue, 0));
    }
    else
    {
        //We know log10ofValue is < 0, so set the precision to numDigits + the abs of that value
        std::cout << std::setprecision(numDigits + std::abs(log10ofValue));
    }
    //This is a floating point truncate -- multiply up into integer space, use trunc
    //(rounds toward zero, so negative values are truncated too), then divide back down
    float truncated = std::trunc(value * std::pow(10.0, numDigits - log10ofValue)) / std::pow(10.0, numDigits - log10ofValue);
    std::cout << std::fixed << truncated << std::endl;
}
Test:
int main(void)
{
printDigits(0.0000000012345);
printDigits(12345);
printDigits(1.234);
printDigits(12.345678);
printDigits(0.00012345);
printDigits(123456789);
return 0;
}
Output:
0.00000000123
12300
1.23
12.3
0.000123
123000000
Here's the solution I came up with. Ugly, but I believe it works.
if(f>=100) {
std::cout << std::fixed << std::setprecision(0) << f << std::endl;
std::cout.unsetf(std::ios_base::floatfield);
} else {
std::cout << std::showpoint << std::setprecision(3) << f << std::noshowpoint << std::endl;
}
If someone knows how to simplify this, please let me know!
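One possible simplification, as a sketch (`format_3_digits` is a hypothetical name): do the same split with printf-style formatting and return a string, so no stream flags need to be set and reset. The `#` flag on `%g` keeps trailing zeros, like `std::showpoint`.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// |f| >= 100: no decimals ("1234"). Otherwise 3 significant digits with
// trailing zeros kept ("21.0", "0.120"). Caveats: values just below 100
// that round up (e.g. 99.96) print as "100.", and |f| < 1e-4 switches to
// exponent notation, as %g does.
std::string format_3_digits(double f)
{
    char buf[512]; // large enough for "%.0f" of any finite double
    if (std::fabs(f) >= 100)
        std::snprintf(buf, sizeof buf, "%.0f", f);
    else
        std::snprintf(buf, sizeof buf, "%#.3g", f);
    return buf;
}
```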