I'm having an issue with floating point arithmetic in C++ (using doubles) that I've never had before, and so I'm wondering how people usually deal with this type of problem.
I'm trying to represent a curve in polar coordinates as a series of Point objects (Point is just a class that holds the coordinates of a point in 3D). The collection of Points representing the curve are stored in a vector (of Point*). The curve I'm representing is a function r(theta), which I can compute. This function is defined on the range of theta contained in [0,PI]. I am representing PI as 4.0*atan(1.0), storing it as a double.
To represent the curve, I specify the desired number of points (n+1), for which I am currently using n=80, and then I determine the interval in theta required to divide [0,PI] into 80 equal intervals (represented by n+1=81 Points). So dTheta = PI / n. dTheta is a double. I next assign coordinates to my Points. (See sample code below.)
double theta0 = 0.0; // Beginning of interval in theta
double thetaF = PI; // End of interval in theta
double dTheta = (thetaF - theta0)/double(nSegments); // segment width
double theta = theta0; // Initialize theta to start at beginning of interval
vector<Point*> pts; // Declare a variable to hold the Points.
while (theta <= thetaF)
{
// Store Point corresponding to current theta and r(theta) in the vector.
pts.push_back(new Point(theta, rOfTheta(theta), 0.0));
theta += dTheta; // Increment theta
}
rOfTheta(theta) is some function that computes r(theta). Now the problem is that the very last point somehow doesn't satisfy the (theta <= thetaF) requirement to enter the loop one final time. After the last pass through the loop, theta is very slightly greater than PI (something like PI + 1e-15). How should I deal with this? The function is not defined for theta > PI. One idea is to test for ((theta > PI) and (theta < (PI+delta))), where delta is very small. If that's true, I could set theta=PI, compute and store the coordinates of the corresponding Point, and exit the loop. This seems like a reasonable problem to have, but interestingly I have never faced it before. I had been using gcc 4.4.2, and now I'm using gcc 4.8.2. Could that be the problem? What is the normal way to handle this kind of problem? Thanks!
Never iterate over a range with a floating point value (theta) by adding increments if you have the alternative of computing the next value by
theta = theta0 + idx*dTheta.
Control the iteration using the integer number of steps and compute the float as indicated.
If dTheta is small compared to the entire interval, repeated addition accumulates rounding error.
Do not insert the computed last value of the range [theta0, thetaF]. That value is actually theta0 + n * (dTheta + error), which can land slightly past thetaF. Skip that last calculated value and use thetaF instead.
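A minimal sketch of this advice, assuming only the sample angles are needed. The helper name `sampleAngles` is hypothetical and not part of the original code; the loop is driven by an integer index, each angle is computed fresh from that index, and the end of the range is stored as `thetaF` itself rather than as a computed value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): returns nSegments + 1
// sample angles covering [theta0, thetaF].
std::vector<double> sampleAngles(double theta0, double thetaF, int nSegments)
{
    assert(nSegments > 0);
    const double dTheta = (thetaF - theta0) / nSegments;

    std::vector<double> thetas;
    thetas.reserve(static_cast<std::size_t>(nSegments) + 1);

    // The integer index controls the iteration count; theta is recomputed
    // from it each time, so rounding error is not carried between steps.
    for (int idx = 0; idx < nSegments; ++idx) {
        thetas.push_back(theta0 + idx * dTheta);
    }

    // The end point is not computed: thetaF is used directly.
    thetas.push_back(thetaF);
    return thetas;
}
```

Each returned angle can then be passed to rOfTheta to build the Points; the last one is never greater than thetaF, because it is thetaF.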
What I might try:
while (theta <= thetaF)
{
// Store Point corresponding to current theta and r(theta) in the vector.
pts.push_back(new Point(theta, rOfTheta(theta), 0.0));
theta += dTheta; // Increment theta
}
if (theta >= thetaF) {
pts.push_back(new Point(thetaF, rOfTheta(thetaF), 0.0));
}
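You might want to check pts.size() == nSegments in the if statement instead (std::vector has size(), not length()), so that thetaF is added only when the loop stopped one point short. Experiment and see which produces the better results.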
If you know that there would be 81 values of theta, then why not run a for loop 81 times?
int i;
theta = theta0;
for(i = 0; i <= nSegments; i++) { // nSegments + 1 iterations, i.e. 81 for nSegments = 80
pts.push_back(new Point(theta, rOfTheta(theta), 0.0));
theta += dTheta;
}
First of all: get rid of the naked pointer :-)
You know the number of segments you have, so instead of using the value of theta in the while-block:
for (auto idx = 0; idx != nSegments; ++idx) {
// Store Point corresponding to current theta and r(theta) in the vector.
pts.emplace_back(theta, rOfTheta(theta), 0.0);
theta += dTheta; // Increment theta
}
pts.emplace_back(thetaF, /* rOfTheta(PI) calculated exactly */, 0.0);
for (int i = 0; i <= nSegments; ++i)
{
theta = (double) i / nSegments * PI;
…
}
This:
produces the correct number of iterations (since the loop counter is maintained as an integer),
does not accumulate any error (since theta is calculated freshly each time), and
produces exactly the desired value (well, PI, not π) in the final iteration (since (double) i / nSegments will be exactly one).
Unfortunately, it contains a division, which is typically a time-consuming instruction.
(The loop counter could also be a double, which avoids the cast from int to double inside the loop. As long as integer values are used, the arithmetic for it will be exact, until you get beyond 2^53 iterations.)
I have such a function that calculates weights according to Gaussian distribution:
const float dx = 1.0f / static_cast<float>(points - 1);
const float sigma = 1.0f / 3.0f;
const float norm = 1.0f / (sqrtf(2.0f * static_cast<float>(M_PI)) * sigma);
const float divsigma2 = 0.5f / (sigma * sigma);
m_weights[0] = 1.0f;
for (int i = 1; i < points; i++)
{
float x = static_cast<float>(i)* dx;
m_weights[i] = norm * expf(-x * x * divsigma2) * dx;
m_weights[0] -= 2.0f * m_weights[i];
}
In the calculation above, the actual numbers do not matter. The only thing that matters is that m_weights[0] starts at 1.0f, and each time I calculate m_weights[i] I subtract it twice from m_weights[0], like this:
m_weights[0] -= 2.0f * m_weights[i];
to ensure that w[0] + 2 * w[i] (for i = 1..N) sums to exactly 1.0f. But it does not. This assert fails:
float wSum = 0.0f;
for (size_t i = 0; i < m_weights.size(); ++i)
{
float w = m_weights[i];
if (i == 0) {
wSum += w;
} else {
wSum += (w + w);
}
}
assert(wSum == 1.0 && "Weights sum is not 1.");
How can I ensure the sum to be 1.0f on all platforms?
You can't. Floating point isn't like that. Even adding the same values can produce different results according to the cpu used.
All you can do is define some accuracy value and ensure that you end up with 1.0 +/- that value.
See: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Because the precision of float is only 24 bits, 23 of them stored explicitly (see e.g. https://en.wikipedia.org/wiki/Single-precision_floating-point_format ), rounding error accumulates quickly. Even if the rest of the code is correct, your sum becomes something like 1.0000001 or 0.9999999 (have you watched it in the debugger or tried to print it to the console, by the way?). To improve precision you can replace float with double, but the sum still will not be exactly 1.0: the error will just be smaller, something like 1e-16 instead of 1e-7.
The second thing to do is to replace strict comparison to 1.0 with a range comparison, like:
assert(fabs(wSum - 1.0) <= 1e-13 && "Weights sum is not 1.");
Here 1e-13 is the epsilon within which you consider two floating-point numbers equal. If you choose to go with float (not double), you may need epsilon like 1e-6 .
Depending on how large your weights are and how many points there are, accumulated error can become larger than that epsilon. In that case you would need special algorithms for keeping the precision higher, such as sorting the numbers by their absolute values prior to summing them up starting with the smallest numbers.
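A small sketch of the tolerance check described above, assuming the weights are stored as in the question (w[0] counted once, every other weight counted twice). The names `nearlyEqual` and `symmetricWeightSum` are hypothetical, and accumulating in double is an added assumption that only reduces the error; it does not make the sum exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when a and b differ by at most eps.
bool nearlyEqual(double a, double b, double eps)
{
    return std::fabs(a - b) <= eps;
}

// Hypothetical helper: computes w[0] + 2 * (w[1] + ... + w[N]).
// The running total is kept in double to limit accumulated rounding error.
double symmetricWeightSum(const std::vector<float>& weights)
{
    assert(!weights.empty());
    double sum = weights[0];
    for (std::size_t i = 1; i < weights.size(); ++i) {
        sum += 2.0 * weights[i];
    }
    return sum;
}
```

The assert from the question would then become something like assert(nearlyEqual(symmetricWeightSum(m_weights), 1.0, 1e-6)), with the epsilon chosen to suit float precision.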
How can I ensure the sum to be 1.0f on all platforms?
As the other answers (and comments) have stated, you can't achieve this, due to the inexactness of floating point calculations.
One solution is, instead of using float or double, to use a fixed point or multi-precision library such as GMP, Boost Multiprecision Library, or one of the many others out there.
I'm trying to calculate the angle between two edges in a graph. To do that, I translate both edges to the origin and then use the dot product to calculate the angle. My problem is that for some edges like e1 and e2, the output of angle(e1,e2) is -1.#IND00.
What is this output? Is it an error?
Here is my code:
double angle(Edge e1, Edge e2){
Edge t1 = e1, t2 = e2;
Point tail1 = t1.getTail(), head1 = t1.getHead();
Point u(head1.getX() - tail1.getX(), head1.getY() - tail1.getY());
Point tail2 = t2.getTail(), head2 = t2.getHead();
Point v(head2.getX() - tail2.getX(), head2.getY() - tail2.getY());
double dotProduct = u.getX()*v.getX() + u.getY()*v.getY();
double cosAlpha = dotProduct / (e1.getLength()*e2.getLength());
return acos(cosAlpha);
}
Edge is a class that holds two Points, and Point is a class that holds two double numbers as x and y.
I'm using angle(e1,e2) to calculate the orthogonal projection length of a vector b onto a vector a:
double orthogonalProjectionLength(Edge b, Edge a){
return (b.getLength()*sin(angle(b, a) * (PI / 180)));
}
This function also sometimes gives me -1.#IND00. You can see the implementation of Point and Edge here.
My input is a set S of n Points in 2D space. I've constructed all edges between p and q (p, q are in S) and then tried to calculate the angle like this:
for (int i = 0; i < E.size(); i++)
for (int j = 0; j < E.size(); j++){
if (i == j)
cerr << fixed << angle(E[i], E[j]) << endl; //E : set of all edges
}
If the problem comes from the cos() and sin() functions, how can I fix it? Are there other libraries that calculate sin and cos in a more efficient way?
Look at this example.
The inputs in this example are two distinct points (p and q), and there are two Edges between them (pq and qp). Shouldn't angle(pq, qp) always be 180? And angle(pq, pq) and angle(qp, qp) should be 0. My program shows two different kinds of behavior: sometimes angle(qp, qp) == angle(pq, pq) == 0 and angle(pq, qp) == angle(qp, pq) == 180.0, and sometimes the answer is -1.#IND00 for all four pairs.
Here is a code example.
Run it several times and you will see the error.
You want the projection and you go via all this trig? You just need to dot b with the unit vector in the direction of a. So the final answer is
(Xa.Xb + Ya.Yb) / square_root(Xa^2 + Ya^2)
Did you check that cosAlpha doesn't reach 1.000000000000000000000001? That would explain the results, and provide another reason not to go all around the houses like this.
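The formula above can be sketched as follows, assuming the vectors are given by their components. The name `scalarProjection` is hypothetical, and the assertion rejects a zero-length a, for which the projection is undefined.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: scalar projection of b = (bx, by) onto a = (ax, ay),
// i.e. the dot product of b with the unit vector along a.
// No trigonometric functions are involved.
double scalarProjection(double bx, double by, double ax, double ay)
{
    const double lengthA = std::hypot(ax, ay);
    assert(lengthA > 0.0);  // a must not be the zero vector
    return (ax * bx + ay * by) / lengthA;
}
```

The result is signed: it is negative when b points away from a.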
It seems like dividing by zero. Make sure that your vectors always have 0< length.
Answer moved from my comment:
Check whether your normalized dot product (cosAlpha) is in the <-1,+1> range ...
Due to float rounding it can be, for example, 1.000002045, which will cause acos to fail.
So add two ifs and clamp to this range.
Or use a faster way: acos(0.99999*dot)
but that lowers the precision for all angles,
and if the 0.99999 constant is too close to 1, the error is still present.
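The clamping suggested here might look like the sketch below. The name `safeAcos` is hypothetical. The assertion covers a separate failure mode: a NaN input (for example from dividing by a zero length) cannot be repaired by clamping.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: acos that tolerates inputs pushed slightly outside
// [-1, 1] by rounding. Values above 1 are treated as 1, values below -1 as -1.
double safeAcos(double x)
{
    assert(!std::isnan(x));  // NaN would pass through the clamp unchanged
    const double clamped = std::min(1.0, std::max(-1.0, x));
    return std::acos(clamped);
}
```

In the angle function from the question, the last line would then read return safeAcos(cosAlpha);.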
A recommended way to compute angles is by means of the atan2 function, taking two arguments. It returns the angle on four quadrants.
You can use it in two ways:
compute the angles of u and v separately and subtract: atan2(Vy, Vx) - atan2(Uy, Ux).
compute the cross- and dot-products: atan2(Ux.Vy - Uy.Vx, Ux.Vx + Uy.Vy).
The only case of failure is (0, 0).
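A sketch of the cross/dot variant, using the standard 2D cross product ux*vy - uy*vx and dot product ux*vx + uy*vy. The name `signedAngle` is hypothetical. The result is in radians, in the range [-pi, pi], and no division or acos is involved.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: signed angle in radians from u = (ux, uy) to
// v = (vx, vy), positive when v is counter-clockwise from u.
double signedAngle(double ux, double uy, double vx, double vy)
{
    // Neither vector may be (0, 0); the angle is undefined in that case.
    assert((ux != 0.0 || uy != 0.0) && (vx != 0.0 || vy != 0.0));
    const double cross = ux * vy - uy * vx;
    const double dot = ux * vx + uy * vy;
    return std::atan2(cross, dot);
}
```

Because the vectors are not normalized first, rounding cannot push an argument outside a valid domain the way it can with acos.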
I am working on a project. Every variable used here is stored as a double.
I have a 2D velocity vector with both coordinates, and I want to compute the angle between the vector and the OX axis, so I use Theta1 = atan(v1y / v1x);. With this approach I can only get angles in the range (-PI/2, PI/2), so in order to extend the range I added
#define PI 3.14159265358979323846
double Theta1;
Theta1 = atan(v1y / v1x);
if (v1x < 0.0)
if (v1y > 0.0)
Theta1 = Theta1 + (PI/2.0);
else
Theta1 = Theta1 - (PI/2.0);
else;
When I then try to use Theta1, it does not seem to be modified by the if statements. I mean it never adds the (+/-)PI/2.0, yet if I just try
cout << Theta1 + PI/2.0 << endl;
it prints Theta1 modified. What am I doing wrong? It seems like either there's some pitfall or I just don't see something simple.
The period of the tangent is π, so your adjustment is not correct, it should be ±π. As is, when both coordinates are negative, the quotient, and the result of atan will be positive, a value between 0 and π/2. Then you subtract π/2 and get a negative value between -π/2 and 0, but you should get one between -π and -π/2 geometrically.
Also, you should use atan2, which gives you the correct angle without adjustment.
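For illustration, a sketch of the atan2 approach, with a hypothetical wrapper named `headingAngle`. atan2 takes the y component first and returns an angle in [-PI, PI], so no quadrant adjustment is needed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: angle in radians between the vector (vx, vy) and the
// positive OX axis, covering all four quadrants.
double headingAngle(double vx, double vy)
{
    assert(vx != 0.0 || vy != 0.0);  // the zero vector has no direction
    return std::atan2(vy, vx);       // note the argument order: y, then x
}
```

With both components negative, for example (-1, -1), this gives -3*PI/4, which is the value the manual adjustment in the question was trying to reach.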
If v1x is greater than or equal to zero, Theta1 will not be modified because it then enters the else clause of the outer if statement, which contains no code.
If v1x is negative then, short of a compiler bug, Theta1 will change. I would suggest placing:
std::cout << Theta1 << '\n';
immediately before and after the if statement for verification.
I currently have an algorithm that generates a point grid. It takes input from the user in the form of a length of x (lx) and a length of y (ly), and what increment, or delta, (dx and dy) to space the points by. I need the points to always start and finish on the edges of the bounding square defined by lx and ly. I've tried a few methods:
The start edge of the bounding square is defined as:
double startx = lx / -2.0, starty = ly / -2.0;
My first method determines the number of points and rounds:
int numintervalx = round(lx / dx), numintervaly = round(ly / dy);
My second method determines the number of points and uses the closest integer greater than the number of points:
int numintervalx = ceil(lx / dx), numintervaly = ceil(ly / dy);
My third method determines the number of points and uses the closest integer less than the number of points:
int numintervalx = floor(lx / dx), numintervaly = floor(ly / dy);
The delta is then recalculated to fit the bounding box:
dx = lx / double(numintervalx);
dy = ly / double(numintervaly);
These are then fed into a for loop that generates the points themselves:
for (int i = 0; i <= numintervaly; i++)
for (int j = 0; j <= numintervalx; j++)
{
double point[3] = {startx + dx * j, starty + dy * i, 0};
}
Is there another, more accurate, method that would make the actual grid closer to the user specified grid that still always starts and finishes on the edges?
Think of the integer conversion as adding error. In that case, the way to minimize the error added when converting to an integer is rounding. The worst case is if the user inputs values such that lx/dx is something.5, which means a rounding error of 0.5. Given your problem, this is the best you can do.
Consider renaming numpoints to numintervals or something, as you actually create one more point than numpoints, which is strange.
Require the users to give you values of lx and ly that are multiples of dx and dy, respectively. This will, of course, require some basic input validation, but it will guarantee the actual grid will be exactly the same as the user-specified grid, with points always starting and finishing on the edges.
Your input is obviously not guaranteed to make lx an integer multiple of dx, hence your problems. You must therefore require inputs that are compatible, ideally by taking nx as input and either dx or lx, whichever makes more sense in your application.
Alternatively, you can treat the user input as guide only, i.e. take
nx = int(ceil(lx/dx)); // get suitable number of points
dx = lx/nx; // set suitable spacing to fit range exactly
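A sketch of the "treat the input as a guide" approach for one axis, assuming the grid is centred on zero as in the question. The name `gridCoordinates` is hypothetical. Each coordinate is computed from its integer index rather than by repeated addition, and the last one is set to the far edge directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: coordinates of a 1D grid spanning
// [-length/2, +length/2], with a spacing no larger than requestedSpacing.
std::vector<double> gridCoordinates(double length, double requestedSpacing)
{
    assert(length > 0.0 && requestedSpacing > 0.0);

    // Number of intervals, rounded up so the actual spacing never exceeds
    // the requested one.
    const int n = static_cast<int>(std::ceil(length / requestedSpacing));
    const double start = -length / 2.0;

    std::vector<double> coords;
    coords.reserve(static_cast<std::size_t>(n) + 1);
    for (int i = 0; i < n; ++i) {
        coords.push_back(start + length * i / n);
    }
    coords.push_back(start + length);  // far edge, not accumulated
    return coords;
}
```

Replacing std::ceil with std::round gives the variant that stays closest to the requested spacing, at the cost of sometimes exceeding it.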
I have a one-dimensional grid. Its spacing is a floating point value. I have a point with a floating point coordinate as well. I need to find its distance to the closest grid point.
For example:
0.12
|
*
|---------|---------|---------|---------|---------|
0 0.1 0.2 0.3 0.4 0.5
The result would be -0.02 since the closest point is behind it.
However if it was
-0.66
|
*
|---------|---------|---------|---------|---------|
-1 -0.8 -0.6 -0.4 -0.2 0
The result will be 0.06. As you can see, it is floating point and can be negative.
I tried the following:
float spacing = ...;
float point = ...;
while(point >= spacing) point -= spacing;
while(point < 0) point += spacing;
if(std::abs(point - spacing) < point) point -= spacing;
It works, but I'm sure there is a way without loops.
Let us first compute the nearest points on the left and right as follows:
leftBorder = spacing * floor(point/spacing);
rightBorder = leftBorder + spacing;
Then the distance is straightforward:
if ((point - leftBorder) < (rightBorder - point))
distance = leftBorder - point;
else
distance = rightBorder - point;
Note that, we could find the nearest points alternatively by ceiling:
rightBorder = spacing * ceil(point/spacing);
leftBorder = rightBorder - spacing;
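As an alternative sketch, the two borders can be collapsed into a single rounding step: std::round picks the nearest multiple of the spacing directly. The name `offsetToNearestGridPoint` is hypothetical, and the grid is assumed to pass through zero, as in the question's examples.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: signed offset from point to the nearest grid point,
// where grid points are the integer multiples of spacing.
// Negative when the nearest grid point lies behind (below) the point.
double offsetToNearestGridPoint(double point, double spacing)
{
    assert(spacing > 0.0);
    const double nearest = std::round(point / spacing) * spacing;
    return nearest - point;
}
```

For the examples in the question this gives approximately -0.02 for point 0.12 with spacing 0.1, and approximately 0.06 for point -0.66 with spacing 0.2, up to rounding error.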
std::vector<float> spacing = ...;
float point = ...;
float result;
Since you say the spacing isn't (linear), I would cache the sums:
std::vector<float> sums(1, 0.0);
float sum=0;
for(int i=0; i<spacing.size(); ++i)
sums.push_back(sum+=spacing[i]);
//This only needs doing once.
//sums needs to be in increasing order.
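Then do a binary search to find the point to the left:
std::vector<float>::iterator iter;
// upper_bound returns the first sum greater than point, so step back one to get the left neighbour.
// Assumes sums.front() <= point.
iter = std::upper_bound(sums.begin(), sums.end(), point) - 1;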
Then find the result from there:
if (iter+1 == sums.end())
result = point - *iter; // point lies at or beyond the last grid position
else {
float midpoint = (*iter + *(iter+1))/2;
if (point < midpoint)
result = point - *iter;
else
result = *(iter+1) - point;
}
[EDIT] Don't I feel silly. You said the spacing wasn't constant. I interpreted that as not-linear. But then your sample code is linear, just not a compile-time constant. My bad. I'll leave this answer as a more general solution, though your (linear) question is solvable much faster.
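For the general (non-uniform) case, the binary search idea can be sketched as one function. The name `offsetToNearest` is hypothetical; the grid is assumed to be sorted in increasing order, and the result uses the question's sign convention (nearest grid position minus point). Points outside the grid are matched to the first or last position.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: signed offset from point to the nearest entry of a
// sorted, non-empty list of grid positions.
double offsetToNearest(const std::vector<double>& grid, double point)
{
    assert(!grid.empty());

    // First grid position that is not less than point.
    const auto right = std::lower_bound(grid.begin(), grid.end(), point);
    if (right == grid.begin()) {
        return *right - point;  // point is at or before the first position
    }
    const auto left = right - 1;
    if (right == grid.end()) {
        return *left - point;   // point is beyond the last position
    }
    // Pick the closer neighbour; ties go to the left one.
    return (point - *left <= *right - point) ? *left - point : *right - point;
}
```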
Here is my first blush attempt, note that this is not tested at all.
float remainder = fmod(point, spacing); // This is the fractional difference of the spaces
int num_spaces = point/spacing; // This is the number of "spaces" down you are, rounded down
// If our fractional part is greater than half of the space length, increase the number of spaces.
// Not sure what you want to do when the point is equidistant to both grid points
if(remainder > .5 * spacing)
{
++num_spaces;
}
float closest_value = num_spaces*spacing;
float distance = closest_value - point;
You should just round the number using this:
float spacing = ...;
float point = ...;
float nearest = spacing * ((point > 0.0f) ? floor(point / spacing + 0.5f) : ceil(point / spacing - 0.5f)); // nearest grid point; the distance is nearest - point
Much, much more generally, for arbitrary spacing, dimensions, and measures of distance (metric), the structure you're looking for would be a Voronoi Diagram.