Precision is the number of digits in a number. Scale is the number of
digits to the right of the decimal point in a number. For example, the
number 123.45 has a precision of 5 and a scale of 2.
I need to convert a double with a maximum scale of 7 (i.e. it may have up to 7 digits after the decimal point) to an __int128. However, I don't know in advance the actual scale of a given number.
#include <iomanip>
#include <iostream>
#include <limits>
#include <string>
#include "json.hpp"
using json = nlohmann::json;
static std::ostream& operator<<(std::ostream& o, const __int128& x) {
if (x == std::numeric_limits<__int128>::min()) return o << "-170141183460469231731687303715884105728";
if (x < 0) return o << "-" << -x;
if (x < 10) return o << (char)(x + '0');
return o << x / 10 << (char)(x % 10 + '0');
}
int main()
{
std::string str = R"({"time": [0.143]})";
std::cout << "input: " << str << std::endl;
json j = json::parse(str);
std::cout << "output: " << j.dump(4) << std::endl;
double d = j["time"][0].get<double>();
__int128_t d_128_bad = d * 10000000;
__int128_t d_128_good = __int128(d * 1000) * 10000;
std::cout << std::setprecision(16) << std::defaultfloat << d << std::endl;
std::cout << "d_128_bad: " << d_128_bad << std::endl;
std::cout << "d_128_good: " << d_128_good << std::endl;
}
Output:
input: {"time": [0.143]}
output: {
"time": [
0.143
]
}
0.143
d_128_bad: 1429999
d_128_good: 1430000
As you can see, the converted value is 1429999 instead of the expected 1430000. I know the reason is that a floating-point number cannot represent 0.143 exactly. The problem can be solved if I know the number of digits after the decimal point.
For example, I can instead use __int128_t(d * 1000) * 10000. However, I don't know the scale of a given number, only that it is at most 7.
Question> Is there a possible solution for this? Also, I need to do this conversion very fast.
I'm not familiar with this library, but it does appear to have a mechanism to get a json object's string representation (dump()). I would suggest you parse that into your value rather than going through the double intermediate representation, as in that case you will know the scale of the value as it was written.
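A minimal sketch of that idea, assuming the text of the number is available as a plain decimal string such as "0.143" (no exponent notation) with at most `scale` fractional digits. The helper name `parse_scaled` is hypothetical and not part of nlohmann::json, and `__int128` is a GCC/Clang extension:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: converts a plain decimal string ("0.143", "-12.5", "42")
// to an integer scaled by 10^scale, without going through double.
// Assumptions: no exponent notation, at most `scale` fractional digits.
__int128 parse_scaled(std::string_view text, int scale = 7) {
    assert(!text.empty());
    std::size_t pos = 0;
    bool negative = false;
    if (text[pos] == '-' || text[pos] == '+') {
        negative = (text[pos] == '-');
        ++pos;
    }
    __int128 value = 0;
    int fraction_digits = 0;
    bool seen_point = false;
    for (; pos < text.size(); ++pos) {
        const char c = text[pos];
        if (c == '.') {
            assert(!seen_point);            // only one decimal point allowed
            seen_point = true;
            continue;
        }
        assert(c >= '0' && c <= '9');       // digits only (no exponent)
        value = value * 10 + (c - '0');
        if (seen_point) ++fraction_digits;  // count digits after the point
    }
    assert(fraction_digits <= scale);
    // Pad with zeros up to the requested scale.
    for (int i = fraction_digits; i < scale; ++i) value *= 10;
    return negative ? -value : value;
}
```

Calling `parse_scaled("0.143")` yields the integer 1430000 exactly, because the digits are read as written. Whether the library's string output is always in this plain form is an assumption here; very small or very large values may be serialized with an exponent, which this sketch does not handle.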
I need to evaluate the function (1-exp(-x))/x for x from 10^-30 to 10^9. For very small x the output is 0, but it should be 1. The values of x are correct, but there is some issue with fx and I have no idea how to solve it.
#include <math.h>
#include <iomanip>
#include <iostream>
int main()
{
double fx ,x;
x=pow(10.0L,-30);
std::cout<<"fx\t\t\t\t"<<"x\t\t"<<std::endl;
for(double i=0;x<pow(10,9);i+=0.01)
{
std::cout << std::setprecision(20); // the manipulator must be streamed to take effect
x=pow(10.0L,-30+i);
fx =(double)((1-exp(-pow(10,-30+i)))/pow(10,-30+i));
std::cout<<x<<"\t\t\t\t\t"<<fx<<std::endl;
}
}
My expected Output is:
for x ==10^-30 fx==1
But the actual output is:
for x ==10^-30 fx==0
You are right that fx should be equal to about 1 for small x.
The issue is that double has a finite precision. Its relative rounding error is about 10^{-16}, much larger than 10^{-30}, so exp(-x) rounds to exactly 1.
A mantissa of at least 100 bits would be needed.
This is illustrated by the following relation:
(1 - (1-x)) = 0
for very small x. See the code hereafter.
In other words, because of rounding errors, the addition is no longer associative.
A workaround is to use a different formula, such as a truncated Taylor series, for very small x, for example less than 10^{-10} in absolute value.
#include <iostream>
#include <cmath>
#include <iomanip>
int main()
{
double fx ,x;
x = pow(10.0L,-30);
std::setprecision(20);
fx = (1.0 - exp(-x)) / x;
std::cout << x << "\t" << fx << std::endl;
fx = (1.0 - (1.0 - x))/x;
std::cout << x << "\t" << fx << std::endl;
// for very small x:
fx = 1 - x/2 + (x*x)/6;
std::cout << x << "\t" << fx << std::endl;
}
Output:
1e-30 0
1e-30 0
1e-30 1
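Another option that avoids choosing a threshold is std::expm1 from <cmath>, which computes exp(t) - 1 accurately for tiny t. A minimal sketch, where the function name `one_minus_exp_over_x` is hypothetical and x is assumed to be finite:

```cpp
#include <cassert>
#include <cmath>

// (1 - exp(-x)) / x without the cancellation in 1 - exp(-x).
// Assumption: x is finite. The limit value 1 is returned for x == 0.
double one_minus_exp_over_x(double x) {
    assert(std::isfinite(x));
    if (x == 0.0) return 1.0;
    // expm1(-x) == exp(-x) - 1, so the numerator is -expm1(-x).
    return -std::expm1(-x) / x;
}
```

The same expression can be used over the whole range from 10^-30 to 10^9, so no switch between formulas is needed.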
I'm trying to compute a series using C++.
The series is:
(for those wondering)
My code is the following:
#include <iostream>
#include <fstream>
#include <cmath> // exp
#include <iomanip> //setprecision, setw
#include <limits> //numeric_limits (http://en.cppreference.com/w/cpp/types/numeric_limits)
long double SminOneCenter(long double gamma)
{
using std::endl; using std::cout;
long double result=0.0l;
for (long double k = 1; k < 1000 ; k++)
{
if(isinf(pow(1.0l+pow(gamma,k),6.0l/4.0l)))
{
cout << "infinity reached for gamma equals: " << gamma << " value of k: " << k << endl;
cout << "maximum allowed: " << std::numeric_limits<long double>::max()<< endl;
break;
}
// CAS PAIR: -1^n = 1
if ((int)k%2 == 0)
{
result += pow(4.0l*pow(gamma,k),3.0l/4.0l) / pow(1+pow(gamma,k),6.0l/4.0l);
}
// CAS IMPAIR:-1^n = -1
else if ((int)k%2!=0)
{
result -= pow(4.0l*pow(gamma,k),3.0l/4.0l) / pow(1+pow(gamma,k),6.0l/4.0l);
//if (!isinf(pow(k,2.0l)*zeta/2.0l))
}
// cout << result << endl;
}
return 1.0l + 2.0l*result;
}
Output will be, for instance with gamma = 1.7 :
infinity reached for gamma equals: 1.7 value of k: 892
The maximum value a long double can represent, as provided by the STL numeric_limits, is: 1.18973e+4932.
However, (1+1.7^892)^(6/4) ≈ 2.19 × 10^308, which is far below 10^4932, so it shouldn't be considered infinite.
Provided my code is not wrong (but it very well might be), could anyone tell me why the discussed code evals to infinity when it should not?
You need to use powl rather than pow if you want to supply long double arguments.
Currently you are hitting the numeric_limits<double>::max() in your pow calls.
As an alternative, consider using std::pow which has appropriate overloads.
Reference http://en.cppreference.com/w/c/numeric/math/pow
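A minimal sketch of the difference, using hypothetical helper names for the denominator (1 + gamma^k)^(6/4). It assumes a platform where long double has a wider exponent range than double (the case for GCC/Clang on x86-64, but not everywhere), which the constant `long_double_is_wider` checks:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True when long double can hold larger values than double on this platform.
constexpr bool long_double_is_wider =
    std::numeric_limits<long double>::max_exponent >
    std::numeric_limits<double>::max_exponent;

// Denominator evaluated in double: overflows once the value exceeds ~1.8e308.
double denominator_double(double gamma, int k) {
    assert(k >= 0);
    return std::pow(1.0 + std::pow(gamma, static_cast<double>(k)), 1.5);
}

// Same expression, but every argument is long double, so std::pow selects
// its long double overload and the intermediate values stay in long double.
long double denominator_long_double(long double gamma, int k) {
    assert(k >= 0);
    return std::pow(1.0L + std::pow(gamma, static_cast<long double>(k)), 1.5L);
}
```

With gamma = 1.7 and k = 892 the double version returns infinity, while the long double version stays finite where long double is wider.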
What is the most efficient way to get the n leftmost non-zero digits of a floating-point number (number >= 0.0)?
For example,
if n = 1:
0.014568 -> 0.01
0.246456 -> 0.2
if n = 2:
0.014568 -> 0.014
0.246456 -> 0.24
After @schil227's comment:
Currently I am doing multiplications and divisions (by 10) as necessary in order to have the n digits at the decimal number field.
Code could use sprintf(buf, "%e",...) to do most of the heavy lifting.
There are so many corner cases in which other, more direct code may fail that sprintf() is likely to be, at the least, a good, solid reference solution.
This code prints the double to DBL_DECIMAL_DIG places to ensure there is no rounding in digits that would make a difference. Then it zeros out various digits depending on n.
See @Mark Dickinson's comment for reasons to use a greater value than DBL_DECIMAL_DIG, perhaps on the order of DBL_DECIMAL_DIG*2. As mentioned above, there are many corner cases.
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
double foo(double x, int n) {
if (!isfinite(x)) {
return x;
}
printf("%g\n", x);
char buf[DBL_DECIMAL_DIG + 11];
sprintf(buf, "%+.*e", DBL_DECIMAL_DIG, x);
//puts(buf);
assert(n >= 1 && n <= DBL_DECIMAL_DIG + 1);
memset(buf + 2 + n, '0', DBL_DECIMAL_DIG - n + 1);
//puts(buf);
char *endptr;
x = strtod(buf, &endptr);
printf("%g\n", x);
return x;
}
int main() {
foo(0.014568, 1);
foo(0.246456, 1);
foo(0.014568, 2);
foo(0.246456, 2);
return 0;
}
Output
0.014568
0.01
0.246456
0.2
0.014568
0.014
0.246456
0.24
This answer assumes OP does not want a rounded answer. Re: 0.246456 -> 0.24
If you want the result as a string, you should probably print to a string with extra precision, then chop that off yourself. (See @chux's answer for details on how much extra precision you need for IEEE 64-bit double to avoid rounding up from a string of 9s, since you want truncation but all the usual to-string functions round to nearest.)
If you want a double result, then are you sure you really want this? Rounding / truncating early in the middle of a calculation usually just worsens the accuracy of the final result. Of course, there are uses in real algorithms for floor/ceil, trunc, and nearbyint, and this is just a scaled version of trunc.
If you just want a double, you can get fairly good results without ever going to a string. Use ndigits and floor(log10(fabs(x))) to work out a scale factor, then truncate the scaled value to an integer, then scale back.
Tested and working (with and without -ffast-math). See the asm on the Godbolt compiler explorer. This might run reasonably efficiently, especially with -ffast-math -msse4.1 (so floor and trunc can inline to roundsd).
If you care about speed, look into replacing pow() with something that takes advantage of the fact that the exponent is a small integer. I'm not sure how fast library pow() implementations are in that case. GNU C __builtin_powi(x, n) trades accuracy for speed, for integer exponents, doing a multiplication tree, which is less accurate than what pow() does.
#include <float.h>
#include <math.h>
#include <stdio.h>
double truncate_n_digits(double x, int digits)
{
if (x==0 || !isfinite(x))
return x; // good idea stolen from Chux's answer :)
double l10 = log10(fabs(x));
double scale = pow(10., floor(l10) + (1 - digits)); // floor rounds towards -Inf
double scaled = x / scale;
double scaletrunc = trunc(scaled); // trunc rounds towards zero
double truncated = scaletrunc * scale;
#if 1 // debugging code
printf("%2d %24.14g =>\t%24.14g\t scale=%g, scaled=%.30g\n", digits, x, truncated, scale, scaled);
// print with more accuracy to reveal the real behaviour
printf(" %24.20g =>\t%24.20g\n", x, truncated);
#endif
return truncated;
}
test cases:
int main() {
truncate_n_digits(0.014568, 1);
truncate_n_digits(0.246456, 1);
truncate_n_digits(0.014568, 2);
truncate_n_digits(-0.246456, 2);
truncate_n_digits(1234.56789, 3);
truncate_n_digits(1234.56789, 6);
truncate_n_digits(99999999999, 6);
truncate_n_digits(-99999999999, 6);
truncate_n_digits(99999, 10);
truncate_n_digits(-0.0000000001234567, 3);
truncate_n_digits(1000, 6);
truncate_n_digits(0.001, 6);
truncate_n_digits(1e-312, 2); // denormal, and not exactly representable: 9.999...e-313
truncate_n_digits(nextafter(1e-312, INFINITY), 2); // denormal, just above 1.00000e-312
return 0;
}
Each result is shown twice: first with only %.14g, so rounding gives the string we want, then again with %.20g to show enough places to reveal the realities of floating-point math. Most numbers are not exactly representable, so even with perfect rounding it's impossible to return a double that exactly represents the truncated decimal string. (Integers up to about the size of the mantissa are exactly representable, and so are fractions whose denominator is a power of 2.)
1 0.014568 => 0.01 scale=0.01, scaled=1.45679999999999987281285029894
0.014567999999999999353 => 0.010000000000000000208
1 0.246456 => 0.2 scale=0.1, scaled=2.46456000000000008398615136684
0.2464560000000000084 => 0.2000000000000000111
2 0.014568 => 0.014 scale=0.001, scaled=14.5679999999999996163069226895
0.014567999999999999353 => 0.014000000000000000291
2 -0.246456 => -0.24 scale=0.01, scaled=-24.6456000000000017280399333686
-0.2464560000000000084 => -0.23999999999999999112
3 1234.56789 => 1230 scale=10, scaled=123.456789000000000555701262783
1234.567890000000034 => 1230
6 1234.56789 => 1234.56 scale=0.01, scaled=123456.789000000004307366907597
1234.567890000000034 => 1234.5599999999999454
6 99999999999 => 99999900000 scale=100000, scaled=999999.999990000040270388126373
99999999999 => 99999900000
6 -99999999999 => -99999900000 scale=100000, scaled=-999999.999990000040270388126373
-99999999999 => -99999900000
10 99999 => 99999 scale=1e-05, scaled=9999900000
99999 => 99999.000000000014552
3 -1.234567e-10 => -1.23e-10 scale=1e-12, scaled=-123.456699999999983674570103176
-1.234566999999999879e-10 => -1.2299999999999998884e-10
6 1000 => 1000 scale=0.01, scaled=100000
1000 => 1000
6 0.001 => 0.001 scale=1e-08, scaled=100000
0.0010000000000000000208 => 0.0010000000000000000208
2 9.9999999999847e-313 => 9.9999999996388e-313 scale=1e-314, scaled=100.000000003458453079474566039
9.9999999999846534143e-313 => 9.9999999996388074622e-313
2 1.0000000000034e-312 => 9.0000000001196e-313 scale=1e-313, scaled=9.9999999999011865980946822674
1.0000000000034059979e-312 => 9.0000000001195857973e-31
Since the result you want will often not be exactly representable (and because of other rounding errors), the resulting double will sometimes be below the result you want, so printing it with full precision might give 1.19999999 instead of 1.20000011. You might want to use nextafter(result, copysign(INFINITY, original)) to get a result that's more likely to have a higher magnitude than what you want.
Of course, that could just make things worse in some cases. But since we truncate towards zero, most often we get a result that's just below (in magnitude) the unrepresentable exact value.
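The nextafter adjustment described above could be sketched as follows. The helper name `nudge_away_from_zero` is hypothetical, and whether the nudge helps or hurts depends on the input, as noted:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: moves `truncated` one representable double away from
// zero, in the direction given by the sign of `original`.
// Assumption: `truncated` is finite and has the same sign as `original`.
double nudge_away_from_zero(double truncated, double original) {
    assert(std::isfinite(truncated));
    return std::nextafter(truncated, std::copysign(INFINITY, original));
}
```

It would be applied to the return value of truncate_n_digits, for example nudge_away_from_zero(truncate_n_digits(x, 2), x), when a result slightly above the exact decimal value in magnitude is preferred over one slightly below it.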
Ok, another one like @Peter Cordes's but more generic.
#include <cmath>
#include <concepts>
#include <iomanip>
#include <iostream>
#include <type_traits>
using namespace std;
/** Return \c x rounded to \c digits significant digits.
\tparam T Type of number \c x; can be floating point or integral.
\param x The number.
\param digits The requested number of significant digits of number \c x.
\return The number \c x rounded to \c digits significant digits. */
template<typename T>
requires(std::integral<T> || std::floating_point<T>)
T roundn(T x, unsigned int digits)
{
if (!x || !std::isfinite(x)) return x;
typedef std::conditional_t<std::floating_point<T>, T, double> Tp;
Tp mul = pow(10, floor(digits - log10(abs(x))));
Tp y = round(x * mul) / mul;
if constexpr (std::floating_point<T>) return y;
else return round(y);
}
int main()
{
cout << setprecision(100);
cout << roundn(123.456789, 1) << "\n";
cout << roundn(123.456789, 2) << "\n";
cout << roundn(123.456789, 3) << "\n";
cout << roundn(123.456789, 4) << "\n";
cout << roundn(123.456789, 5) << "\n";
cout << roundn(-123.456789, 1) << "\n";
cout << roundn(-123.456789, 2) << "\n";
cout << roundn(-123.456789, 3) << "\n";
cout << roundn(-123.456789, 4) << "\n";
cout << roundn(-123.456789, 5) << "\n";
cout << roundn(-123.456789, 15) << "\n";
cout << roundn(123456, 1) << "\n";
cout << roundn(123456, 2) << "\n";
cout << roundn(123456, 3) << "\n";
cout << roundn(123456, 10) << "\n";
cout << roundn(-123456, 1) << "\n";
cout << roundn(-123456, 2) << "\n";
cout << roundn(-123456, 3) << "\n";
cout << roundn(-123456, 10) << "\n";
cout << roundn(0.0123456789, 1) << "\n";
cout << roundn(0.0123456789, 2) << "\n";
cout << roundn(-0.0123456789, 1) << "\n";
cout << roundn(-0.0123456789, 2) << "\n";
return 0;
}
It returns
99.9999999999999857891452847979962825775146484375
120
123
123.5
123.4599999999999937472239253111183643341064453125
-99.9999999999999857891452847979962825775146484375
-120
-123
-123.5
-123.4599999999999937472239253111183643341064453125
-123.4567890000000005557012627832591533660888671875
100000
120000
123000
123456
-100000
-120000
-123000
-123456
0.01000000000000000020816681711721685132943093776702880859375
0.0120000000000000002498001805406602215953171253204345703125
-0.01000000000000000020816681711721685132943093776702880859375
-0.0120000000000000002498001805406602215953171253204345703125
My school gave me an assignment to calculate pi.
The result should be :
Question 4
Accuracy set at : 1000
term pi
1 4
100 3.13159
200 3.13659
300 3.13826
400 ...
... ...
The result in my program :
term pi
1 4
100 3
200 3
300 3
400 ...
... ...
I guess that when I do (4 / denominator), the result loses its fractional part, although I have changed some declarations from int to double.
(Some websites tell me to do this.)
Maybe I do it wrongly.
How can I deal with this problem?
The following is my program.
#include <iostream>
using namespace std;
class Four
{
private:
int inputedAccuracy;
double pi;
int denominator;
int doneTermCounter;
double oneTerm;
int negativeController;
public:
double question4()
{
cout << "Accuracy set at : " ;
cin >> inputedAccuracy;
cout << endl;
pi = 0.0;
denominator = 1.0;
doneTermCounter = 0;
negativeController = 1;
cout << "Term" << " " << "pi" << endl;
cout << "1 " << " " << "4" << endl;
for (inputedAccuracy; inputedAccuracy > 0; inputedAccuracy -= 100)
{
for (int doneTerm = 0; doneTerm < 100; doneTerm++)
{
pi = pi + (negativeController * 4 / denominator);
negativeController *= -1;
denominator += 2;
doneTermCounter++;
}
if (doneTermCounter >= 10000)
cout << doneTermCounter << " " << pi << endl;
else
if (doneTermCounter >= 1000)
cout << doneTermCounter << " " << pi << endl;
else
cout << doneTermCounter << " " << pi << endl;
}
return 0.0;
}
};
Thank you for your attention!
pi = pi + (negativeController * 4 / denominator);
The (negativeController * 4 / denominator) expression results in an int because both negativeController and denominator are int. In other words, you're doing an integer division here which explains why you don't get the expected result.
Declare either (or both) of them as double to force a floating-point division.
You should change
int denominator;
to
double denominator;
pi = pi + (negativeController * 4 / denominator);
In this line, you have an integer division (because both operands of / are of type int), meaning that the fractional part of the division's result is discarded.
To use floating-point division, at least one operand of / needs to be of type float or (long) double. The easiest way to achieve this here is to change 4 (a literal of type int) to 4.0 (a literal of type double):
pi = pi + (negativeController * 4.0 / denominator);
Then, when calculating the result of *, negativeController will also be converted to double (usual arithmetic conversions), yielding a double as the left-hand side operand of / which in turn causes denominator (the rhs) to also be converted into a double and so on.
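As a rough illustration of the fix, here is a minimal standalone sketch of the same series (4/1 - 4/3 + 4/5 - ...) with every operand of the division kept in double. The function name `leibniz_pi` is hypothetical and not taken from the assignment:

```cpp
#include <cassert>

// Sums the first `terms` terms of the Leibniz series for pi.
// Assumption: terms >= 0.
double leibniz_pi(int terms) {
    assert(terms >= 0);
    double pi = 0.0;
    double denominator = 1.0;  // double, so the division keeps its fraction
    double sign = 1.0;         // alternates between +1 and -1
    for (int i = 0; i < terms; ++i) {
        pi += sign * 4.0 / denominator;  // floating-point division
        sign = -sign;
        denominator += 2.0;
    }
    return pi;
}
```

With int operands, 4 / 3 evaluates to 1, which is why the original program only ever adds whole numbers to pi.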
I think changing negativeController and denominator to double would do the trick, as the sub-expression is currently evaluated on integers, thus losing precision.