Is there a function in GMP that computes a power with only mpf_t arguments? I want to do this:
mpf_t s;
mpf_init(s);
mpf_set_d(s, boost::lexical_cast<double>(sec));

mpf_t ten, mil;
mpf_init(ten);
mpf_init(mil);
mpf_set_d(ten, 10.0);
mpf_set_d(mil, 0.001);

mpf_div(s, s, ten);
mpf_pow_ui(s, ten, s); // <- doesn't work: mpf_pow_ui needs an unsigned int as its third argument, but I need an mpf_t
mpf_mul(s, s, mil);
I don't think so, at least not with the GNU Multiple Precision library alone. But you could use MPFR, which is based on GMP and provides an mpfr_pow (mpfr_t rop, mpfr_t op1, mpfr_t op2, mpfr_rnd_t rnd) function. See here.
If you decide to do that, this could also be helpful.
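For illustration, here is a minimal sketch of the original computation (0.001 * 10^(s/10)) written with MPFR; the 128-bit precision and the stand-in value 42.0 for sec are my own assumptions:

#include <cstdio>
#include <mpfr.h>

int main() {
    mpfr_t s, ten, mil;
    mpfr_init2(s, 128);   // 128-bit precision, chosen arbitrarily
    mpfr_init2(ten, 128);
    mpfr_init2(mil, 128);

    mpfr_set_d(s, 42.0, MPFR_RNDN);   // stand-in for boost::lexical_cast<double>(sec)
    mpfr_set_d(ten, 10.0, MPFR_RNDN);
    mpfr_set_d(mil, 0.001, MPFR_RNDN);

    mpfr_div(s, s, ten, MPFR_RNDN);   // s = s / 10
    mpfr_pow(s, ten, s, MPFR_RNDN);   // s = 10^s -- mpfr_pow accepts an mpfr_t exponent
    mpfr_mul(s, s, mil, MPFR_RNDN);   // s = s * 0.001

    mpfr_printf("%.20Rf\n", s);
    mpfr_clears(s, ten, mil, (mpfr_ptr) 0);
    return 0;
}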
There is one interesting workaround using the square root function mpf_sqrt. From math we know that x^y = Sqrt(x)^(2*y), so we can multiply Y by 2 many times and take the square root of X the same number of times.
Thus by multiplying Y by 2 often enough you can make it almost a whole integer. And as you know, there is mpf_pow_ui that raises to a whole-integer power.
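Applying this identity k times gives
x^y = ( x^(1/2^k) )^( 2^k * y )
so after k = 48 square roots the scaled exponent 2^48 * y can be rounded to a whole integer with only a negligible change to the result.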
The following code does all this. Don't forget that b needs to be set to high precision only so that it survives that many square roots.
For simplicity I used mpf_class, the C++ interface to mpf.
I print to the console both the actual mpf result value and a reference value computed through std::pow from <cmath>.
Avoiding the high-precision setting is a bit more difficult, but possible, e.g. through a Taylor series like
Sqrt(1 + x) = 1 + 1/2*x - 1/8*x^2 + 1/16*x^3 - 5/128*x^4 + ...
Try it online!
#include <cmath>
#include <iostream>
#include <iomanip>
#include <gmpxx.h>
int main() {
    mpf_class const b0 = 9.87654321, p0 = 1.23456789;
    mpf_class b = b0, p = p0;
    b.set_prec(1 << 7); p.set_prec(1 << 7); // 128-bit precision, needed to survive repeated square roots
    int const sqrt_cnt = 48;
    for (int i = 0; i < sqrt_cnt; ++i)
        mpf_sqrt(b.get_mpf_t(), b.get_mpf_t());            // b = b0^(1/2^48)
    mpf_mul_2exp(p.get_mpf_t(), p.get_mpf_t(), sqrt_cnt);  // p = p0 * 2^48
    mpf_pow_ui(b.get_mpf_t(), b.get_mpf_t(), std::lround(p.get_d())); // b = b^round(p)
    std::cout << std::fixed << std::setprecision(12) << "Actual "
        << b.get_d() << ", Reference " << std::pow(b0.get_d(), p0.get_d())
        << std::endl;
}
Output:
Actual 16.900803674719, Reference 16.900803674719
I need to find some way to deal with infinitesimal double values.
For example:
exp(-0.00000000000000000000000000000100000000000000000003) = 0.99999999999999999999999999999899999999999999999997
But the exp function produces the result 1.000000000000000000000000000000.
So my first thought was to make my own exp function. Unfortunately, I am getting the same output.
#include <cmath>

double my_exp(double x)
{
    bool minus = x < 0;
    x = std::fabs(x);        // work with |x|; plain abs() could truncate to int
    double exp = 1.0 + x;
    double temp = x;
    for (int i = 2; i < 100000; i++)
    {
        temp *= x / i;       // temp = x^i / i!
        exp += temp;
    }
    return minus ? 1.0 / exp : exp;  // e^-|x| = 1 / e^|x|
}
I found that the issue is that numbers as small as 1.00000000000000000003e-30 don't survive addition or subtraction: whether we add or subtract such a small number to or from 1, the result is always exactly 1.
Do you have any idea how to deal with this?
Try using std::expm1
Computes the e (Euler's number, 2.7182818) raised to the given power
arg, minus 1.0. This function is more accurate than the expression
std::exp(arg)-1.0 if arg is close to zero.
#include <iostream>
#include <cmath>

int main()
{
    std::cout << "expm1(-0.00000000000000000000000000000100000000000000000003) = " << std::expm1(-0.00000000000000000000000000000100000000000000000003) << '\n';
}
Run the example at the source link below, changing the argument to your very small numbers.
Source: https://en.cppreference.com/w/cpp/numeric/math/expm1
I think the best way of dealing with such small numbers is to use existing libraries. You could try GMP, starting with their example that calculates billions of digits of pi. Another library, MPFR, which is based on GMP, seems to be a good choice. I don't know when to choose one over the other.
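As an illustration of the MPFR route (a minimal sketch; the 256-bit precision and the variable names are my own choices), computing exp of the tiny argument directly keeps all the digits the question asks for:

#include <cstdio>
#include <mpfr.h>

int main() {
    mpfr_t x, r;
    mpfr_init2(x, 256);  // 256 bits of precision, roughly 77 decimal digits
    mpfr_init2(r, 256);

    // Parse the tiny argument from a string so it never passes through a double.
    mpfr_set_str(x, "-0.00000000000000000000000000000100000000000000000003", 10, MPFR_RNDN);
    mpfr_exp(r, x, MPFR_RNDN);  // r = e^x

    mpfr_printf("%.50Rf\n", r);
    mpfr_clears(x, r, (mpfr_ptr) 0);
    return 0;
}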
I'm trying to improve the performance of surf.cpp. Starting at line 140 of that file, you can find this function:
inline float calcHaarPattern( const int* origin, const SurfHF* f, int n )
{
    double d = 0;
    for( int k = 0; k < n; k++ )
        d += (origin[f[k].p0] + origin[f[k].p3] - origin[f[k].p1] - origin[f[k].p2])*f[k].w;
    return (float)d;
}
Running an Intel Advisor vectorization analysis shows "1 Data type conversions present", which could be inefficient (especially for vectorization).
But my question is: looking at this function, why would the authors have created d as a double and then cast it to float? If they wanted a decimal number, float would be fine. The only reason that comes to mind is that since double is more precise than float, it can represent smaller numbers, while the final value is big enough to be stored in a float; but I haven't run any tests on the value of d.
Any other possible reason?
Because the author wants higher precision during the calculation and only rounds the final result. This preserves more significant digits throughout the computation.
More precisely, errors can accumulate during addition and subtraction, and this error can be considerable when a large number of floating-point values is involved.
You questioned the answer that said it's to use higher precision during the summation, but I don't see why you doubt it. That answer is correct. Consider this simplified version with completely made-up numbers:
#include <iostream>
#include <iomanip>

float w = 0.012345;

float calcFloat(const int* origin, int n)
{
    float d = 0;
    for (int k = 0; k < n; k++)
        d += origin[k] * w;
    return d;
}

float calcDouble(const int* origin, int n)
{
    double d = 0;
    for (int k = 0; k < n; k++)
        d += origin[k] * w;
    return (float)d;
}

int main()
{
    int o[] = { 1111, 22222, 33333, 444444, 5555 };
    std::cout << std::setprecision(9) << calcFloat(o, 5) << '\n';
    std::cout << std::setprecision(9) << calcDouble(o, 5) << '\n';
}
The results are:
6254.77979
6254.7793
So even though the inputs are the same in both cases, you get a different result using double for the intermediate summation. Changing calcDouble to use (double)w doesn't change the output.
This suggests that the calculation of (origin[f[k].p0] + origin[f[k].p3] - origin[f[k].p1] - origin[f[k].p2])*f[k].w is high-enough precision, but the accumulation of errors during the summation is what they're trying to avoid.
This is because of how errors are propagated when working with floating point numbers. Quoting The Floating-Point Guide: Error Propagation:
In general:
Multiplication and division are “safe” operations
Addition and subtraction are dangerous, because when numbers of different magnitudes are involved, digits of the smaller-magnitude number are lost.
So you want the higher-precision type for the sum, which involves addition. Multiplying the integer by a double instead of a float doesn't matter nearly as much: you will get something that is approximately as accurate as the float value you start with (as long as the result isn't very, very large or very, very small). But summing float values that can have very different orders of magnitude, even when the individual numbers themselves are representable as float, will accumulate errors and deviate further and further from the true answer.
To see that in action:
float f1 = 1e4, f2 = 1e-4;
std::cout << (f1 + f2) << '\n';
std::cout << (double(f1) + f2) << '\n';
Or equivalently, but closer to the original code:
float f1 = 1e4, f2 = 1e-4;
float f = f1;
f += f2;
double d = f1;
d += f2;
std::cout << f << '\n';
std::cout << d << '\n';
The result is:
10000
10000.0001
Adding the two floats loses precision. Adding the float to a double gives the right answer, even though the inputs were identical. You need nine significant digits to represent the correct value, and that's too many for a float.
I'm trying to predict the numbers generated by the C++ rand() function. Here's a link to the code it possibly uses: click
And here's my code that emulates rand():
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;

int main() {
    srand(time(0));
    unsigned a = rand();
    unsigned b = rand();
    cout << (a * 1103515245U + 12345U) % 0x7fffffffU << '\n';
    cout << b << '\n'; // they should match, right? But they don't...
    return 0;
}
Why doesn't my value match b?
glibc only uses the old linear congruential generator if the TYPE_0 generator is chosen, as you can see in the code you linked. (By default, it uses the TYPE_3 generator.) TYPE_0 is selected only if the RNG state buffer is 8 bytes large. You can force the old behavior with initstate:
char state[8];
initstate(time(0), state, 8);
unsigned a = rand();
unsigned b = rand();
cout << (a * 1103515245u + 12345u) % 0x7fffffffu << '\n';
Then you often get the same numbers, and when you don't, the result is only off by one. I haven't, at a cursory glance, been able to figure out precisely why that difference happens (may edit later), but I suspect carry-bit shenanigans.
EDIT: Okay, I figured it out. glibc's rand uses signed arithmetic inside, and it uses & rather than % for the modulus. This makes the one-bit difference if (a * 1103515245 + 12345) becomes negative. If you write
int a = rand();
int b = rand();
cout << ((a * 1103515245 + 12345) & 0x7fffffff) << '\n'; // parentheses needed: << binds tighter than &
then you get the same results all the time. Well, really a and b should be int32_t for maximum portability, but I suspect that's not a concern here, since library internals and portability are kind of a lost cause anyway.
Adding to Wintermute's response: By default, it uses the TYPE_3 generator.
I am at the moment trying to code a titration curve simulator. But I am running into some trouble with comparing two values.
I have created a small working example that perfectly replicates the bug that I encounter:
#include <iostream>
#include <math.h>
using namespace std;

int main()
{
    double a, b;
    a = 5;
    b = 0;
    for (double i = 0; i <= (2*a); i += 0.1) {
        b = i;
        cout << "a=" << a << "; b=" << b;
        if (a == b)
            cout << "Equal!" << endl;
        else
            cout << endl;
    }
    return 0;
}
The output at the relevant section is
a=5; b=5
However, if I change the iteration increment from i+=0.1 to i+=1 or i+=0.5 I get an output of
a=5; b=5Equal!
as you would expect.
I am compiling with g++ on Linux using no further flags, and I am frankly at a loss how to solve this problem. Any pointers (or even a full-blown solution to my problem) would be much appreciated.
Unlike integer arithmetic, multiplying and adding up floats/doubles doesn't produce exactly the results you would compute on paper.
So the best practice is to check whether the absolute value of their difference is small enough.
If you have some idea on the size of the numbers, you can use a constant:
if (fabs(a - b) < EPS) // equal
If you don't (much slower!):
float a1 = fabs(a), b1 = fabs(b);
float mn = min(a1,b1), mx = max(a1,b1);
if (mn / mx > (1- EPS)) // equal
Note:
In your code you can use std::abs instead, and likewise std::min/std::max; I used the C functions here because the code is clearer/shorter with them.
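For example, here is a self-contained sketch applying the constant-EPS check to the loop from the question (the value EPS = 1e-9 is my own choice, suitable for magnitudes around 1 to 10):

#include <cmath>
#include <iostream>

int main()
{
    const double EPS = 1e-9;  // tolerance, chosen for values of magnitude ~1..10
    double a = 5;
    for (double i = 0; i <= 2 * a; i += 0.1) {
        if (std::fabs(a - i) < EPS)
            std::cout << "Equal at i=" << i << '\n';  // fires near i == 5 despite rounding error
    }
}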
I would recommend restructuring your loop to iterate using integers and then converting the integers into doubles, like this:
double step = 0.1;
for (int i = 0; i*step <= 2*a; ++i) {
    b = i*step;
    cout << "a=" << a << "; b=" << b;
    if (a == b)
        cout << "Equal!" << endl;
    else
        cout << endl;
}
This still isn't perfect. You possibly have some loss of precision in the multiplication; however, the floating point errors don't accumulate like they do when iterating using floating point values.
Floating point arithmetic is... interesting. Testing equality is annoying with floats/doubles in most languages because it is impossible to accurately represent many numbers in IEEE floating point math. Basically, where you might compute an expression to be 5.0, the compiler might compute it to be 4.9999999, because it's the closest representable number in the IEEE standard.
Because these numbers are slightly different, you end up with an inequality. Because it's unmaintainable to try to predict which number you will see at compile time, you can't/shouldn't attempt to hard-code either one of them into your source to test equality with. As a hard rule, avoid directly checking equality of floating-point numbers.
Instead, test that they are extremely close to being equal with something like the following:
#include <cmath>

template<typename T>
bool floatEqual(const T& a, const T& b) {
    auto delta = std::fabs(a) * 0.03;  // use |a| so delta is non-negative even for negative a
    auto minAccepted = a - delta;
    auto maxAccepted = a + delta;
    return b >= minAccepted && b <= maxAccepted;  // inclusive, so a == b always matches
}
This checks whether b is within a range of + or - 3% of the value of a.
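A quick usage sketch against the loop from the earlier question (repeating the template so the example is self-contained; note that a purely relative tolerance never matches when a is exactly zero, so handle that case separately if it can occur):

#include <cmath>
#include <iostream>

template<typename T>
bool floatEqual(const T& a, const T& b) {
    auto delta = std::fabs(a) * 0.03;  // +/- 3% of a's magnitude
    return b >= a - delta && b <= a + delta;
}

int main()
{
    double a = 5.0;
    for (double i = 0; i <= 2 * a; i += 0.1) {
        if (floatEqual(a, i))
            std::cout << "Equal at i=" << i << '\n';  // matches for several i near 5, since 3% is a wide band
    }
}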
I am trying to generate a number of series of random doubles with high precision. For example, 0.856365621 (which has 9 digits after the decimal point).
I've found some methods on the internet; however, while they do generate random doubles, the precision is not as good as I need (only 6 digits after the decimal).
Thus, may I know how to achieve my goal?
In C++11 you can use the <random> header, and in this specific example std::uniform_real_distribution, with which I am able to generate random numbers with more than 6 digits. In order to set the number of digits that will be printed via std::cout, we need to use std::setprecision:
#include <iostream>
#include <random>
#include <iomanip>

int main()
{
    std::random_device rd;
    std::mt19937 e2(rd());
    std::uniform_real_distribution<> dist(1, 10);
    for (int i = 0; i < 10; ++i)
    {
        std::cout << std::fixed << std::setprecision(10) << dist(e2) << std::endl;
    }
    return 0;
}
You can use std::numeric_limits<double>::digits10 to determine the number of decimal digits of precision available.
std::cout << std::numeric_limits<double>::digits10 << std::endl;
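Related, when printing: digits10 (15 for an IEEE-754 double) is how many decimal digits are guaranteed exact, while max_digits10 (17) is how many you need to print to round-trip a value. A quick check:

#include <iostream>
#include <limits>

int main()
{
    std::cout << std::numeric_limits<double>::digits10 << '\n';      // 15: digits guaranteed exact
    std::cout << std::numeric_limits<double>::max_digits10 << '\n';  // 17: digits needed to round-trip
}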
On a typical system, RAND_MAX is 2^31 - 1 or something similar to that. So your "precision" from using a method like:
double r = rand() / (double)RAND_MAX;
would be 1/(2^31 - 1), which should give you 8-9 digits of "precision" in the random number. Make sure you print with high enough precision:
cout << r << endl;
will not do. This will work better:
cout << fixed << setprecision(15) << r << endl;
Of course, there are some systems out there with a much smaller RAND_MAX, in which case the results may be less "precise"; however, you should still get digits down in the 9-12 range, just that they are more likely to be "samey".
Why not create your value out of multiple calls of the random function instead?
For instance:
const int numDecimals = 9;
double result = 0.0;
double div = 1.0;
double mul = 1.0;
for (int n = 0; n < numDecimals; ++n)
{
    int t = rand() % 10;  // one random decimal digit
    result += t * mul;    // place it at the next power of ten
    mul *= 10.0;
    div /= 10.0;          // div ends up as 10^-numDecimals
}
result = result * div;    // scale the 9-digit integer down into [0, 1)
I would personally try a different implementation of the rand function, though, or at least mix the current time into it.
In my case, I'm using MQL5, a very close derivative of C++ for a specific market, whose only random generator produces a random integer from 0 to 32767 (= 2^15 - 1). That is far too low precision.
So I've adapted his idea of randomly generating a string of digits of any length I want, to solve my problem more reliably (and arguably more randomly) than anything else I could find or think of. My version builds a string and converts it to a double at the end, which avoids any potential math/rounding errors along the way (because we all know 0.1 + 0.2 != 0.3 😉).
Posting it here in case it helps anyone.
(Disclaimer: The following is valid MQL5. MQL5 and C++ are very close, but there are some differences, e.g. no RAND_MAX constant (so I've hard-coded the 32767). I'm not entirely sure of all the differences, so there may be C++ syntax errors here. Please adapt accordingly.)
const int RAND_MAX_INCL = 32767;
const int RAND_MAX_EXCL = RAND_MAX_INCL + 1;

int iRandomDigit() {
    const double dRand = rand() / (double)RAND_MAX_EXCL; // double 0.0 <= dRand < 1.0 (cast avoids integer division)
    return (int)(dRand * 10);                            // int 0 <= result < 10
}

double dRandom0IncTo1Exc(const int iPrecisionDigits) {
    int iPrecisionDigits2 = iPrecisionDigits;
    if (iPrecisionDigits > DBL_DIG) { // DBL_DIG == "Number of significant decimal digits for double type"
        Print("WARNING: Can't generate random number with precision > ", DBL_DIG, ". Adjusted precision to ", DBL_DIG, " accordingly.");
        iPrecisionDigits2 = DBL_DIG;
    }
    string sDigits = "";
    for (int i = 0; i < iPrecisionDigits2; i++) {
        sDigits += (string)iRandomDigit();
    }
    const string sResult = "0." + sDigits;
    const double dResult = StringToDouble(sResult);
    return dResult;
}
As noted in a comment on MasterPlanMan's answer, the other answers use more "official" methods designed for the question, from the standard library, etc. However, I think this is conceptually a good solution when faced with limitations that the other answers can't address.
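For anyone who wants the same digit-string approach in plain C++, here is a minimal sketch (my own adaptation, not the MQL5 author's code; the function name randomDigits is mine):

#include <cfloat>
#include <cstdlib>
#include <iostream>
#include <string>

// Build a random value in [0, 1) one decimal digit at a time,
// converting the digit string to a double only at the very end.
double randomDigits(int precisionDigits)
{
    if (precisionDigits > DBL_DIG)    // a double can't hold more significant decimal digits
        precisionDigits = DBL_DIG;
    std::string digits = "0.";
    for (int i = 0; i < precisionDigits; ++i)
        digits += static_cast<char>('0' + rand() % 10);
    return std::stod(digits);
}

int main()
{
    srand(42);  // fixed seed for a reproducible demo; seed properly in real use
    std::cout.precision(15);
    std::cout << randomDigits(9) << '\n';
}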