Here is my code for the problem : Check If It Is a Straight Line
bool checkStraightLine(vector<vector<int>>& a) {
double m = (((double)a[1][1]-a[0][1])/(a[1][0]-a[0][0]));
for(int i=1;i<a.size()-1;i++)
{
double slope=(((double)a[i][1]-a[i+1][1])/(a[i][0]-a[i+1][0]));
if( m!=slope)
return false;
}
return true;
}
My doubt is why does my code produce an error when I replace :
double m = (((double)a[1][1]-a[0][1])/(a[1][0]-a[0][0]));
with
double m = (double)((a[1][1]-a[0][1])/(a[1][0]-a[0][0]));
and
double slope=(((double)a[i][1]-a[i+1][1])/(a[i][0]-a[i+1][0]));
with
double slope=(double)((a[i][1]-a[i+1][1])/(a[i][0]-a[i+1][0]));
Given:
int a = 1 , b = 2;
this line of code:
double d = (double)(a/b); // d is 0.
is not the same as:
double d = ((double)a/b); // d is 0.5
In the first case, you are doing an integer division before converting the result.
In the second case, you are converting the numerator to a double, and then dividing, which does a floating point division.
In your case, the error might be coming about because the code expects a non-zero slope, but the integer division gives you a zero.
Note that this comparison:
if( m!=slope)
is fundamentally flawed. You should never compare floating point numbers for equality. Use a threshold for the comparison instead.
Here's what I want to do:
Take a double (which is between -1 and 1) and cast it to a float. But I want to make sure that the float is ALWAYS less than the double.
Is there any straightforward way to do this?
For reference, here's something I came up with.
float DoubleToSmallerFloat (double X) // ex. X = 0.79828470019999997
{
float Y = X; // 0.79828471 -> note this is greater than X
double Diff = X - Y;
return Y - Abs (Diff) * 10;
}
If you are able to use C++11 then you can use nextafter() for this:
float doubleToSmallerFloat(double x) {
float f = x;
return f < x ? f : nextafter(f, -1.0f);
}
I think that is a good question. Look at IEEE 754 single-precision and double-precision binary floating-point format
.
The real value assumed by a given 32 bit binary32 data with a given biased sign s, exponent e (the 8 bit unsigned integer), and a 23 bit fraction (mantissa) is
s * m * (2 ^(e-127)),
where m is
For double use 1023 instead of 127: s * m * (2 ^(e-1023))
First case is exponent e and sign s save its values after double-float cast. Then float mantissa is almost first digits of the double mantissa. You need to slightly decrease the value of float mantissa.
Second case. Exponent (e-127) from float is greater than exponent (e-1023) from double. Then I hope that fraction part is 23 zeros. Ok. Decrease exponent part and set fraction part to 23 ones. To get access for the fields use union.
union {
float fl;
uint32_t dw;
} f;
int s = ( f.dw >> 31 ) ? -1 : 1; /* sign */
int e = ( f.dw >> 23 ) & 0xFF; /* exponent */
int fract = f.dw & 0x7FFFFF; /* fraction */
I am trying to calculate the numerical gradient of a smooth function in c++. And the parameter value could vary from zero to a very large number(maybe 1e10 to 1e20?)
I used the function f(x,y) = 10*x^3 + y^3 as a testbench, but I found that if x or y is too large, I can't get correct gradient.
Here is my code to calculate the graidient:
#include <iostream>
#include <cmath>
#include <cassert>
using namespace std;
double f(double x, double y)
{
// black box expensive function
return 10 * pow(x, 3) + pow(y, 3);
}
int main()
{
// double x = -5897182590.8347721;
// double y = 269857217.0017581;
double x = 1.13041e+19;
double y = -5.49756e+14;
const double epsi = 1e-4;
double f1 = f(x, y);
double f2 = f(x, y+epsi);
double f3 = f(x, y-epsi);
cout << f1 << endl;
cout << f2 << endl;
cout << f3 << endl;
cout << f1 - f2 << endl; // 0
cout << f2 - f3 << endl; // 0
return 0;
}
If I use the above code to calculate the gradient, the gradient would be zero!
The testbench function, 10*x^3 + y^3, is just a demo, the real problem I need to solve is actually a black box function.
So, is there any "standard" way to calculate the numerical gradient?
In the first place, you should use the central difference scheme, which is more accurate (by cancellation of one more term of the Taylor develoment).
(f(x + h) - f(x - h)) / 2h
rather than
(f(x + h) - f(x)) / h
Then the choice of h is critical and using a fixed constant is the worst thing you can do. Because for small x, h will be too large so that the approximation formula no more works, and for large x, h will be too small, resulting in severe truncation error.
A much better choice is to take a relative value, h = x√ε, where ε is the machine epsilon (1 ulp), which gives a good tradeoff.
(f(x(1 + √ε)) - f(x(1 - √ε))) / 2x√ε
Beware that when x = 0, a relative value cannot work and you need to fall back to a constant. But then, nothing tells you which to use !
You need to consider the precision needed.
At first glance, since |y| = 5.49756e14 and epsi = 1e-4, you need at least ⌈log2(5.49756e14)-log2(1e-4)⌉ = 63 bits of significand precision (that is the number of bits used to encode the digits of your number, also known as mantissa) for y and y+epsi to be considered different.
The double-precision floating-point format only has 53 bits of significand precision (assuming it is 8 bytes). So, currently, f1, f2 and f3 are exactly the same because y, y+epsi and y-epsi are equal.
Now, let's consider the limit : y = 1e20, and the result of your function, 10x^3 + y^3. Let's ignore x for now, so let's take f = y^3. Now we can calculate the precision needed for f(y) and f(y+epsi) to be different : f(y) = 1e60 and f(epsi) = 1e-12. This gives a minimum significand precision of ⌈log2(1e60)-log2(1e-12)⌉ = 240 bits.
Even if you were to use the long double type, assuming it is 16 bytes, your results would not differ : f1, f2 and f3 would still be equal, even though y and y+epsi would not.
If we take x into account, the maximum value of f would be 11e60 (with x = y = 1e20). So the upper limit on precision is ⌈log2(11e60)-log2(1e-12)⌉ = 243 bits, or at least 31 bytes.
One way to solve your problem is to use another type, maybe a bignum used as fixed-point.
Another way is to rethink your problem and deal with it differently. Ultimately, what you want is f1 - f2. You can try to decompose f(y+epsi). Again, if you ignore x, f(y+epsi) = (y+epsi)^3 = y^3 + 3*y^2*epsi + 3*y*epsi^2 + epsi^3. So f(y+epsi) - f(y) = 3*y^2*epsi + 3*y*epsi^2 + epsi^3.
The only way to calculate gradient is calculus.
Gradient is a vector:
g(x, y) = Df/Dx i + Df/Dy j
where (i, j) are unit vectors in x and y directions, respectively.
One way to approximate derivatives is first order differences:
Df/Dx ~ (f(x2, y)-f(x1, y))/(x2-x1)
and
Df/Dy ~ (f(x, y2)-f(x, y1))/(y2-y1)
That doesn't look like what you're doing.
You have a closed form expression:
g(x, y) = 30*x^2 i + 3*y^2 j
You can plug in values for (x, y) and calculate the gradient exactly at any point. Compare that to your differences and see how well your approximation is doing.
How you implement it numerically is your responsibility. (10^19)^3 = 10^57, right?
What is the size of double on your machine? Is it a 64 bit IEEE double precision floating point number?
Use
dx = (1+abs(x))*eps, dfdx = (f(x+dx,y) - f(x,y)) / dx
dy = (1+abs(y))*eps, dfdy = (f(x,y+dy) - f(x,y)) / dy
to get meaningful step sizes for large arguments.
Use eps = 1e-8 for one-sided difference formulas, eps = 1e-5 for central difference quotients.
Explore automatic differentiation (see autodiff.org) for derivatives without difference quotients and thus much smaller numerical errors.
We can examine the behaviour of the error in the derivative using the following program - it calculates the 1-sided derivative and the central difference based derivative using a varying step size. Here I'm using x and y ~ 10^10, which is smaller than what you were using, but should illustrate the same point.
#include <iostream>
#include <cmath>
#include <cassert>
using namespace std;
double f(double x, double y) {
return 10 * pow(x, 3) + pow(y, 3);
}
double f_x(double x, double y) {
return 3 * 10 * pow(x,2);
}
double f_y(double x, double y) {
return 3 * pow(y,2);
}
int main()
{
// double x = -5897182590.8347721;
// double y = 269857217.0017581;
double x = 1.13041e+10;
double y = -5.49756e+10;
//double x = 10.1;
//double y = -5.2;
double epsi = 1e8;
for(int i=0; i<60; ++i) {
double dfx_n = (f(x+epsi,y) - f(x,y))/epsi;
double dfx_cd = (f(x+epsi,y) - f(x-epsi,y))/(2*epsi);
double dfx = f_x(x,y);
cout<<epsi<<" "<<fabs(dfx-dfx_n)<<" "<<fabs(dfx - dfx_cd)<<std::endl;
epsi/=1.5;
}
return 0;
}
The output shows that a 1-sided difference gets us an optimal error of about 1.37034e+13 at a step length of about 100.0. Note that while this error looks large, as a relative error it is 3.5746632302764072e-09 (since the exact value is 3.833e+21)
In comparison the 2-sided difference gets an optimal error of about 1.89493e+10 with a step size of about 45109.3. This is three-orders of magnitude better, (with a much larger step-size).
How can we work out the step size? The link in the comments of Yves Daosts answer gives us a ballpark value:
h=x_c sqrt(eps) for 1-Sided, and h=x_c cbrt(eps) for 2-Sided.
But either way, if the required step size for decent accuracy at x ~ 10^10 is 100.0, the required step size with x ~ 10^20 is going to be 10^10 larger too. So the problem is simply that your step size is way too small.
This can be verified by increasing the starting step-size in the above code and resetting the x/y values to the original values.
Then expected derivative is O(1e39), best 1-sided error of about O(1e31) occurs near a step length of 5.9e10, best 2-sided error of about O(1e29) occurs near a step length of 6.1e13.
As numerical differentiation is ill conditioned (which means a small error could alter your result significantly) you should consider to use Cauchy's integral formula. This way you can calculate the n-th derivative with an integral. This will lead to less problems with considering accuracy and stability.
here is a sample code that doesnt seem to work. I get the same value of X and Y ( both of them equal to zero ) for all the iteration. Can someone help me with this mistake of mine?
#include <iostream>
using namespace std;
int main()
{
double coord[4][2];
int div_x, div_y;
coord[1][0]=2;
coord[1][1]=0;
coord[2][0]=2;
coord[2][1]=4;
coord[3][0]=0;
coord[3][1]=4;
div_x = 4;
div_y = 3;
double a =0,b=0,c=0,d=0,e=0,f=0,g=0,h=0;
a = coord[1][0]+coord[2][0]+coord[3][0];
b = coord[1][0]+coord[2][0]-coord[3][0];
c = coord[2][0]-coord[1][0]+coord[3][0];
d = coord[2][0]-coord[1][0]-coord[3][0];
e = coord[1][1]+coord[2][1]+coord[3][1];
f = coord[1][1]+coord[2][1]-coord[3][1];
g = coord[2][1]-coord[1][1]+coord[3][1];
h = coord[2][1]-coord[1][1]-coord[3][1];
for (int i=0; i<div_y+1; i++){ // loop all rows (blue)
for (int j=0; j<div_x+1; j++){ // loop all nodes of one row (green)
double w = -1 + (2/div_x)*j;
double s = -1 + (2/div_y)*i;
double X = (a+b*w+c*s+(w*s*d));
double Y = (e+f*w+g*s+(h*s*w));
cout<<"\nX "<<X<<endl;
cout<<"\nY "<<Y<<endl;
}
}
return 0;
}
.
Your problem is with the divisions here:
double w = -1 + (2/div_x)*j;
double s = -1 + (2/div_y)*i;
2/div_x and 2/div_y are integer divisions. When you divide two values of type integer in C++, the division is carried out as an integer division. Since div_x is 4 and div_y is 3, the result of both of them is 0. As an integer division:
2 / 4 = 0
2 / 3 = 0
The easiest way to fix this is to use a double value for one of the two values, which results in a double division. For example:
double w = -1.0 + (2.0/div_x)*j;
double s = -1.0 + (2.0/div_y)*i;
You may also want to consider using the float type instead of double, unless you really need more than float precision, which is about 7 decimal digits.
In your code, div_x and div_y are int, thus, (2/div_x) and (2/div_y) are integer divisions, and evaluate to 0.
So w and s are both always equal to -1.
You can force float evaluation by using : (2.0/div_x) and (2.0/div_y)
For given values 'a' and 'y' , how to find max x value such that x = a^b < y for b∈N and a>0.
For example, y=14 and a = 2 is given, then x must be 8. In other words,for all values of y in [8.15] , x must be 8.
Similarly, for all values of y in [9,26] , x must be 9.
You can use log with base a. Such a function does not exist in <cmath>, but if you remember the formula that
log (base a, c) = log (base e, c) / log (base e, a)
You can do it with cmath's log (natural logarithm) function.
int exponent = log(y)/log(a); //truncates to the floor, just what we need.
int answer = a to the power of exponent
Quite an obvious-to-algorithmize task... Take the integer part of the base-a logaritm of y and raise a to that power:
#include <cmath>
int exponent = (int)(log(y) / log(a)); // base-a logarithm of y, truncated to a whole number
int x = (int)pow(a, exponent); // a raised to that power