I need to find some way to deal with infinitesimal double values.
For example:
exp(-0.00000000000000000000000000000100000000000000000003)= 0.99999999999999999999999999999899999999999999999997
But the exp function produces the result 1.000000000000000000000000000000
So my first thought was to write my own exp function. Unfortunately I am getting the same output.
double my_exp(double x)
{
bool minus = x < 0;
x = std::fabs(x); // std::fabs (from <cmath>) cannot fall back to the integer abs
double exp = (double)1 + x;
double temp = x;
for (int i = 2; i < 100000; i++)
{
temp *= x / (double)i;
exp = exp + temp;
}
return minus ? (double)1 / exp : exp; // exp holds e^|x|, so invert it for negative x
}
I found that the issue is that a number as small as 1.00000000000000000003e-030 gets lost: whether we add it to 1 or subtract it from 1, the result is always equal to 1.
Do you have any idea how to handle this?
Try using std::expm1
Computes the e (Euler's number, 2.7182818) raised to the given power
arg, minus 1.0. This function is more accurate than the expression
std::exp(arg)-1.0 if arg is close to zero.
#include <iostream>
#include <cmath>
int main()
{
std::cout << "expm1(-0.00000000000000000000000000000100000000000000000003) = " << std::expm1(-0.00000000000000000000000000000100000000000000000003) << '\n';
}
Run the example from the source below, changing the argument to your very small numbers.
Source: https://en.cppreference.com/w/cpp/numeric/math/expm1
I think the best way of dealing with such small numbers is to use existing libraries. You could try GMP starting with their example to calculate billions of digits of pi. Another library, MPFR which is based on GMP, seems to be a good choice. I don't know when to choose one over the other.
original outdated code:
Write an algorithm that compute the Euler's number until
My professor from Algorithms course gave me the following homework:
Write a C/C++ program that calculates the value of the Euler's number (e) with a given accuracy of eps > 0.
Hint: The number e = 1 + 1/1! + 1/2! + ... + 1/n! + ... = 2.7182... can be calculated as the sum of elements of the sequence x_0, x_1, x_2, ..., where x_0 = 1, x_1 = 1 + 1/1!, x_2 = 1 + 1/1! + 1/2!, ..., the summation continues as long as the condition |x_(i+1) - x_i| >= eps is valid.
As he further explained, eps is the precision of the algorithm. For example, the precision could be 1/100. Here |x_(i+1) - x_i| is the absolute value of (x_(i+1) - x_i).
Currently, my program looks in the following way:
#include<iostream>
#include<cstdlib>
#include<math.h>
// Euler's number
using namespace std;
double factorial(double n)
{
double result = 1;
for(double i = 1; i <= n; i++)
{
result = result*i;
}
return result;
}
int main()
{
long double euler = 2;
long double counter = 2;
long double epsilon = 1.0/1000;
long double moduloDifference;
do
{
euler+= 1 / factorial(counter);
counter++;
moduloDifference = (euler + 1 / factorial(counter+1) - euler);
} while(moduloDifference >= epsilon);
printf("%.35Lf ", euler );
return 0;
}
Issues:
It seems my epsilon value does not work properly. It is supposed to control the precision. For example, when I wish precision of 5 digits, I initialize it to 1.0/10000, and it outputs 3 digits before they get truncated after 8 (.7180).
When I use the long double data type and epsilon = 1/10000, my epsilon gets the value 0 and my program runs infinitely. Yet if I change the data type from long double to double, it works. Why does epsilon become 0 when using the long double data type?
How can I optimize the algorithm for finding Euler's number? I know I can get rid of the function and calculate Euler's value on the fly, but after each attempt to do that I get other errors.
One problem with computing Euler's number this way is pretty simple: you're starting with some fairly large numbers, but since the denominator in each term is N!, the amount added by each successive term shrinks very quickly. Using naive summation, you quickly reach a point where the value you're adding is small enough that it no longer affects the sum.
In the specific case of Euler's number, since the terms constantly decrease, one way we can deal with them quite a bit better is to compute and store all the terms, then add them up in reverse order (smallest first).
Another possibility that's more general is to use Kahan's summation algorithm instead. This keeps track of a running error while it's doing the summation, and takes the current error into account as it's adding each successive term.
For example, I've rewritten your code to use Kahan summation to compute to (approximately) the limit of precision of a typical (80-bit) long double:
#include<iostream>
#include<cstdlib>
#include<math.h>
#include <vector>
#include <iomanip>
#include <limits>
// Euler's number
using namespace std;
long double factorial(long double n)
{
long double result = 1.0L;
for(int i = 1; i <= n; i++)
{
result = result*i;
}
return result;
}
template <class InIt>
typename std::iterator_traits<InIt>::value_type accumulate(InIt begin, InIt end) {
typedef typename std::iterator_traits<InIt>::value_type real;
real sum = real();
real running_error = real();
for ( ; begin != end; ++begin) {
real difference = *begin - running_error;
real temp = sum + difference;
running_error = (temp - sum) - difference;
sum = temp;
}
return sum;
}
int main()
{
std::vector<long double> terms;
long double epsilon = 1e-19;
long double term; // long double, so the terms are not rounded to double precision
for (int i=0; (term=1.0L/factorial(i)) >= epsilon; i++)
terms.push_back(term);
int width = std::numeric_limits<long double>::digits10;
std::cout << std::setw(width) << std::setprecision(width) << accumulate(terms.begin(), terms.end()) << "\n";
}
Result: 2.71828182845904522
In fairness, I should actually add that I haven't checked what happens with your code using naive summation--it's possible the problem you're seeing is from some other source. On the other hand, this does fit fairly well with a type of situation where Kahan summation stands at least a reasonable chance of improving results.
#include<iostream>
#include<cmath>
#include<iomanip>
#define EPSILON 1.0/10000000
#define AMOUNT 6
using namespace std;
int main() {
long double e = 2.0, e0;
long double factorial = 1;
int counter = 2;
long double moduloDifference;
do {
e0 = e;
factorial *= counter++;
e += 1.0 / factorial;
moduloDifference = fabs(e - e0);
} while (moduloDifference >= EPSILON);
cout << "Wynik:" << endl;
cout << setprecision(AMOUNT) << e << endl;
return 0;
}
This is an optimized version that does not have a separate function to calculate the factorial.
Issue 1: I am still not sure how EPSILON manages the precision.
Issue 2: I do not understand the real difference between long double and double. Regarding my code, why does long double require a decimal point (1.0/someNumber) when double doesn't (1/someNumber)?
I'm new to Stack Overflow and to coding. I was learning about functions in C++ and how the stack frame works.
As part of that I wrote a function for factorials and used it to calculate binomial coefficients. It worked fine for small values like n=10 and r=5, but for a medium-sized value like 23C12 it gave 4 as the answer.
I don't know what is wrong with the code, or whether I forgot to add something.
My code:
#include <iostream>
using namespace std;
int fact(int n)
{
int a = 1;
for (int i = 1; i <= n; i++)
{
a *= i;
}
return a;
}
int main()
{
int n, r;
cin >> n >> r;
if (n >= r)
{
int coeff = fact(n) / (fact(n - r) * fact(r));
cout << coeff << endl;
}
else
{
cout << "please enter valid values. i.e n>=r." << endl;
}
return 0;
}
Thanks for your help!
You're not doing anything "wrong" per se. It's just that factorials quickly become huge numbers.
In your example you're using ints, which are typically 32-bit variables. If you take a look at a table of factorials, you'll note that log2(13!) = 32.535.... So the largest factorial that will fit in a 32-bit number is 12!. For a 64-bit variable, the largest factorial you can store is 20! (since log2(21!) = 65.469...).
When you get 4 as the result that's because of overflow.
If you need to be able to calculate such huge numbers, I suggest a bignum library such as GMP.
Factorials overflow easily. In practice you rarely need bare factorials, but they almost always appear in fractions. In your case:
int coeff = fact(n) / (fact(n - r) * fact(r));
Note that the first min(n,n-r,r) factors of the denominator and numerator are identical. I am not going to provide you the code, but I hope an example will help to understand what to do instead.
Consider n=5, r=3 then coeff is
(5*4*3*2*1) / ((2*1) * (3*2*1))
And before actually carrying out any calculations you can reduce that to
(5*4) / (2*1)
If you are certain that the final result coeff does fit in an int, you can also calculate it using ints. You just need to take care not to overflow the intermediate terms.
I have written the following routine, which is supposed to truncate a C++ double at the n'th decimal place.
double truncate(double number_val, int n)
{
double factor = 1;
double previous = std::trunc(number_val); // remove integer portion
number_val -= previous;
for (int i = 0; i < n; i++) {
number_val *= 10;
factor *= 10;
}
number_val = std::trunc(number_val);
number_val /= factor;
number_val += previous; // add back integer portion
return number_val;
}
Usually, this works great... but I have found that some numbers, most notably those that do not seem to have an exact representation as a double, cause issues.
For example, if the input is 2.0029, and I want to truncate it at the fifth place, internally, the double appears to be stored as something somewhere between 2.0028999999999999996 and 2.0028999999999999999, and truncating this at the fifth decimal place gives 2.00289, which might be right in terms of how the number is being stored, but is going to look like the wrong answer to an end user.
If I were rounding instead of truncating at the fifth decimal, everything would be fine, of course, and if I give a double whose decimal representation has more than n digits past the decimal point it works fine as well, but how do I modify this truncation routine so that inaccuracies due to imprecision in the double type and its decimal representation will not affect the result that the end user sees?
I think I may need some sort of rounding/truncation hybrid to make this work, but I'm not sure how I would write it.
Edit: thanks for the responses so far but perhaps I should clarify that this value is not producing output necessarily but this truncation operation can be part of a chain of many different user specified actions on floating point numbers. Errors that accumulate within the double precision over multiple operations are fine, but no single operation, such as truncation or rounding, should produce a result that differs from its actual ideal value by more than half of an epsilon, where epsilon is the smallest magnitude represented by the double precision with the current exponent. I am currently trying to digest the link provided by iinspectable below on floating point arithmetic to see if it will help me figure out how to do this.
Edit: well the link gave me one idea, which is sort of hacky but it should probably work which is to put a line like number_val += std::numeric_limits<double>::epsilon() right at the top of the function before I start doing anything else with it. Dunno if there is a better way, though.
Edit: I had an idea while I was on the bus today, which I haven't had a chance to thoroughly test yet, but it works by rounding the original number to 16 significant decimal digits, and then truncating that:
double truncate(double number_val, int n)
{
bool negative = false;
if (number_val == 0) {
return 0;
} else if (number_val < 0) {
number_val = -number_val;
negative = true;
}
int pre_digits = std::log10(number_val) + 1;
if (pre_digits < 17) {
int post_digits = 17 - pre_digits;
double factor = std::pow(10, post_digits);
number_val = std::round(number_val * factor) / factor;
factor = std::pow(10, n);
number_val = std::trunc(number_val * factor) / factor;
} else {
number_val = std::round(number_val);
}
if (negative) {
number_val = -number_val;
}
return number_val;
}
Since a double precision floating point number only can have about 16 digits of precision anyways, this just might work for all practical purposes, at a cost of at most only one digit of precision that the double would otherwise perhaps support.
I would like to further note that this question differs from the suggested duplicate above in that a) this is using C++, and not Java... I don't have a DecimalFormatter convenience class, and b) I am wanting to truncate, not round, the number at the given digit (within the precision limits otherwise allowed by the double datatype), and c) as I have stated before, the result of this function is not supposed to be a printable string... it is supposed to be a native floating point number that the end user of this function might choose to further manipulate. Accumulated errors over multiple operations due to imprecision in the double type are acceptable, but any single operation should appear to perform correctly to the limits of the precision of the double datatype.
OK, if I understand this right, you've got a floating point number and you want to truncate it to n digits:
10.099999
^^ n = 2
becomes
10.09
^^
But your function is truncating the number to an approximately close value:
10.08999999
^^
Which is then displayed as 10.08?
How about you keep your truncate formula, which does truncate as well as it can, and use std::setprecision and std::fixed to round the truncated value to the required number of decimal places? (Assuming it is std::cout you're using for output?)
#include <iostream>
#include <iomanip>
using std::cout;
using std::setprecision;
using std::fixed;
using std::endl;
int main() {
double foo = 10.08995; // let's imagine this is the output of `truncate`
cout << foo << endl; // displays 10.0899
cout << setprecision(2) << fixed << foo << endl; // rounds to 10.09
}
I've set up a demo on wandbox for this.
I've looked into this. It's hard because you have inaccuracies due to the floating point representation, then further inaccuracies due to the decimal conversion. 0.1 cannot be represented exactly in binary floating point. However, you can use the standard library function sprintf with a %g format, which should round accurately for you (note that the precision given to %g counts significant digits, not decimal places).
char out[64];
double x = 0.11111111;
int n = 3;
double xrounded;
sprintf(out, "%.*g", n, x);
xrounded = strtod(out, 0);
Get double as a string
If you are looking just to print the output, then it is very easy and straightforward using stringstream:
#include <cmath>
#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>
using namespace std;
string truncateAsString(double n, int precision) {
stringstream ss;
double remainder = static_cast<double>((int)floor((n - floor(n)) * precision) % precision);
ss << setprecision(numeric_limits<double> ::max_digits10 + __builtin_ctz(precision))<< floor(n);
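if (remainder) // pad with leading zeros so that e.g. .059 is not printed as .59
ss << "." << setw(__builtin_ctz(precision)) << setfill('0') << remainder;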
cout << ss.str() << endl;
return ss.str();
}
int main(void) {
double a = 9636346.59235;
int precision = 1000; // as many digits as you add zeroes. 3 zeroes means precision of 3.
string s = truncateAsString(a, precision);
return 0;
}
Getting the divided floating point with an exact value
If you are looking for the true value of your floating point number, you can use the Boost.Multiprecision library.
The Boost.Multiprecision library can be used for computations requiring precision exceeding that of standard built-in types such as float, double and long double. For extended-precision calculations, Boost.Multiprecision supplies a template data type called cpp_dec_float. The number of decimal digits of precision is fixed at compile-time via template parameter.
Demonstration
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>
#include <limits>
#include <cmath>
#include <iomanip>
using boost::multiprecision::cpp_dec_float_50;
cpp_dec_float_50 truncate(cpp_dec_float_50 n, int precision) {
cpp_dec_float_50 remainder = static_cast<cpp_dec_float_50>((int)floor((n - floor(n)) * precision) % precision) / static_cast<cpp_dec_float_50>(precision);
return floor(n) + remainder;
}
int main(void) {
int precision = 100000; // as many digits as you add zeroes. 5 zeroes means precision of 5.
cpp_dec_float_50 n = 9636346.59235789;
n = truncate(n, precision); // first part is remainder, floor(n) is int value truncated.
std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::max_digits10 + __builtin_ctz(precision)) << n << std::endl; // for a power of ten, __builtin_ctz(precision) equals the number of trailing zeroes, exactly the precision we need
return 0;
}
Output:
9636346.59235
NB: Requires sudo apt-get install libboost-all-dev
I am at the moment trying to code a titration curve simulator. But I am running into some trouble with comparing two values.
I have created a small working example that perfectly replicates the bug that I encounter:
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
double a, b;
a = 5;
b = 0;
for(double i = 0; i<=(2*a); i+=0.1){
b = i;
cout << "a=" << a << "; b="<<b;
if(a==b)
cout << "Equal!" << endl;
else
cout << endl;
}
return 0;
}
The output at the relevant section is
a=5; b=5
However, if I change the iteration increment from i+=0.1 to i+=1 or i+=0.5 I get an output of
a=5; b=5Equal!
as you would expect.
I am compiling with g++ on linux using no further flags and I am frankly at a loss how to solve this problem. Any pointers (or even a full-blown solution to my problem) are very appreciated.
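Unlike with integers, multiplying floats/doubles and adding them up repeatedly don't produce exactly the same results: 50 additions of 0.1 need not equal 50 * 0.1, because every operation is rounded.
So the best practice is to check whether the absolute value of their difference is small enough.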
If you have some idea on the size of the numbers, you can use a constant:
if (fabs(a - b) < EPS) // equal
If you don't (much slower!):
double a1 = fabs(a), b1 = fabs(b); // double, so the inputs are not narrowed to float
double mn = min(a1,b1), mx = max(a1,b1);
if (mx == 0 || mn / mx > (1 - EPS)) // equal (mx == 0 means both are zero)
Note:
In your code, you can use std::abs instead. Same for std::min/max. The code is clearer/shorter when using the C functions.
I would recommend restructuring your loop to iterate using integers and then converting the integers into doubles, like this:
double step = 0.1;
for(int i = 0; i*step<=2*a; ++i){
b = i*step;
cout << "a=" << a << "; b="<<b;
if(a==b)
cout << "Equal!" << endl;
else
cout << endl;
}
This still isn't perfect. You possibly have some loss of precision in the multiplication; however, the floating point errors don't accumulate like they do when iterating using floating point values.
Floating point arithmetic is... interesting. Testing equality is annoying with floats/doubles in most languages because many numbers (0.1, for example) cannot be represented exactly in IEEE floating point. Basically, where you might expect an expression to evaluate to 5.0, the computed result might be 4.9999999, because every intermediate value was rounded to the closest representable number.
Because these numbers are slightly different, you end up with an inequality. Because it's unmaintainable to try and predict which number you will see at compile time, you can't/shouldn't attempt to hard code either one of them into your source to test equality with. As a hard rule, avoid directly checking equality of floating point numbers.
Instead, test that they are extremely close to being equal with something like the following:
template<typename T>
bool floatEqual(const T& a, const T& b) {
auto delta = std::abs(a * 0.03); // abs (from <cmath>) keeps the range valid when a is negative
auto minAccepted = a - delta;
auto maxAccepted = a + delta;
return b > minAccepted && b < maxAccepted;
}
This checks whether b is within a range of + or - 3% of the value of a.
I am trying to generate a number of series of double random numbers with high precision. For example, 0.856365621 (has 9 digits after decimal).
I've found some methods on the internet; they do generate double random numbers, but the precision is not as good as I need (only 6 digits after the decimal point).
Thus, may I know how to achieve my goal?
In C++11 you can use the <random> header; in this specific example, using std::uniform_real_distribution, I am able to generate random numbers with more than 6 digits. To set the number of digits that will be printed via std::cout we need to use std::setprecision:
#include <iostream>
#include <random>
#include <iomanip>
int main()
{
std::random_device rd;
std::mt19937 e2(rd());
std::uniform_real_distribution<> dist(1, 10);
for( int i = 0 ; i < 10; ++i )
{
std::cout << std::fixed << std::setprecision(10) << dist(e2) << std::endl ;
}
return 0 ;
}
You can use std::numeric_limits<double>::digits10 (from <limits>) to determine the precision available.
std::cout << std::numeric_limits<double>::digits10 << std::endl;
In a typical system, RAND_MAX is 2^31-1 or something similar to that. So your "precision" from using a method like:
double r = (double)rand() / RAND_MAX; // cast first: rand()/RAND_MAX alone is integer division
would be 1/(2^31-1); this should give you 8-9 digits of "precision" in the random number. Make sure you print with high enough precision:
cout << r << endl;
will not do. This will work better:
cout << fixed << setprecision(15) << r << endl;
Of course, there are some systems out there with much smaller RAND_MAX, in which case the results may be less "precise" - however, you should still get digits down in the 9-12 range, just that they are more likely to be "samey".
Why not create your value out of multiple calls of the random function instead?
For instance:
const int numDecimals = 9;
double result = 0.0;
double div = 1.0;
double mul = 1.0;
for (int n = 0; n < numDecimals; ++n)
{
int t = rand() % 10;
result += t * mul;
mul *= 10.0;
div /= 10.0;
}
result = result * div;
I would personally try a new implementation of the rand function though or at least multiply with the current time or something..
In my case, I'm using MQL5, a very close derivative of C++ for a specific market, whose only random generator produces a random integer from 0 to 32767 (= (2^15)-1). Far too low precision.
So I've adapted his idea -- randomly generate a string of digits any length I want -- to solve my problem, more reliably (and arguably more randomly also), than anything else I can find or think of. My version builds a string and converts it to a double at the end -- avoids any potential math/rounding errors along the way (because we all know 0.1 + 0.2 != 0.3 😉 )
Posting it here in case it helps anyone.
(Disclaimer: The following is valid MQL5. MQL5 and C++ are very close, but some differences. eg. No RAND_MAX constant (so I've hard-coded the 32767). I'm not entirely sure of all the differences, so there may be C++ syntax errors here. Please adapt accordingly).
const int RAND_MAX_INCL = 32767;
const int RAND_MAX_EXCL = RAND_MAX_INCL + 1;
int iRandomDigit() {
const double dRand = (double)rand()/RAND_MAX_EXCL; // double 0.0 <= dRand < 1.0 (cast first to avoid integer division)
return (int)(dRand * 10); // int 0 <= result < 10
};
double dRandom0IncTo1Exc(const int iPrecisionDigits) {
int iPrecisionDigits2 = iPrecisionDigits;
if ( iPrecisionDigits > DBL_DIG ) { // DBL_DIG == "Number of significant decimal digits for double type"
Print("WARNING: Can't generate random number with precision > ", DBL_DIG, ". Adjusted precision to ", DBL_DIG, " accordingly.");
iPrecisionDigits2 = DBL_DIG;
};
string sDigits = "";
for (int i = 0; i < iPrecisionDigits2; i++) {
sDigits += (string)iRandomDigit();
};
const string sResult = "0." + sDigits;
const double dResult = StringToDouble(sResult);
return dResult;
}
As noted in a comment on @MasterPlanMan's answer, the other answers use more "official" methods designed for the question, from the standard library, etc. However, I think conceptually it's a good solution when faced with limitations that the other answers can't address.