Stable Cotangent - C++

Is there a more stable implementation for the cotangent function than return 1.0/tan(x);?

cot(x) = cos(x)/sin(x) should be more numerically stable close to π/2 than cot(x) = 1/tan(x). You can implement that efficiently using sincos on platforms that have it.
Another possibility is cot(x) = tan(M_PI_2 - x). This should be faster than the above (even if sincos is available), but it may also be less accurate, because M_PI_2 is of course only an approximation of the transcendental number π/2, so the difference M_PI_2 - x will not be accurate to the full width of a double mantissa -- in fact, if you get unlucky, it may have only a few meaningful bits.
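For illustration, a minimal sketch of the two alternatives (sincos is left as a comment, since it is a platform extension rather than standard C++):

#include <cmath>

// cot(x) = cos(x) / sin(x): well behaved near odd multiples of pi/2.
// On platforms that provide sincos() (e.g. glibc with _GNU_SOURCE),
// both values can be obtained with a single call instead of two.
double cot_cos_sin(double x) {
    return std::cos(x) / std::sin(x);
}

// cot(x) = tan(pi/2 - x): usually cheaper, but M_PI_2 is only an
// approximation of pi/2, so the reduction M_PI_2 - x loses accuracy
// when x is close to M_PI_2 (or very small in magnitude).
double cot_shifted_tan(double x) {
    return std::tan(M_PI_2 - x);
}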

TL;DR No.
As a rule of thumb, when looking for sources of inaccuracy one should be concerned first and foremost about additions and subtractions, which can lead to the issue of subtractive cancellation. Multiplications and divisions are typically harmless for accuracy beyond contributing additional rounding error, but they may cause problems through overflow and underflow in intermediate computations.
No machine number x can get close enough to multiples of π/2 to cause tan(x) to overflow, therefore tan(x) is well-defined and finite for all floating-point encodings for any of the IEEE-754 floating-point formats, and by extension, so is cot(x) = 1.0 / tan(x).
This is easily demonstrated by performing an exhaustive test over all numerical float encodings; an exhaustive test using double is not feasible, except perhaps with the largest supercomputers in existence today.
Using a math library with an accurate implementation of tan() with a maximum error of ~= 0.5 ulp, we find that computing cot(x) = 1.0 / tan(x) incurs a maximum error of less than 1.5 ulp, where the additional error compared to tan() itself is contributed by the rounding error of the division.
Repeating this exhaustive test over all float values with cot(x) = cos(x) / sin(x), where sin() and cos() are computed with a maximum error of ~= 0.5 ulp, we find that the maximum error in cot() is less than 2.0 ulps, so slightly larger. This is easily explained by having three sources of error instead of two in the previous formula.
Lastly, cot(x) = tan (M_PI_2 - x) suffers from the issue of subtractive cancellation mentioned earlier when x is near M_PI_2, and from the issue that in finite-precision floating-point arithmetic, M_PI_2 - x == M_PI_2 when x is sufficiently small in magnitude. This can lead to very large errors that leave us with no valid bits in the result.
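A minimal sketch of such an exhaustive float test follows. It is not the harness that produced the numbers above: it uses long double as the reference (assuming long double is wider than float), whereas a correctly rounded reference library would be more rigorous. Since cot is odd, checking the roughly two billion positive finite float encodings suffices; expect a runtime of minutes.

#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    double max_err = 0.0;
    for (uint32_t bits = 1; bits < 0x7F800000u; ++bits) {   // all positive finite floats
        float x;
        std::memcpy(&x, &bits, sizeof x);

        float y = 1.0f / std::tan(x);                       // implementation under test
        long double ref = 1.0L / std::tan((long double)x);  // higher-precision reference
        if (std::fabs(ref) > FLT_MAX) continue;             // true result overflows float; skip

        // size of one float ulp in the neighborhood of the reference value
        float above = std::nextafterf((float)ref, INFINITY);
        float below = std::nextafterf((float)ref, -INFINITY);
        long double ulp = ((long double)above - (long double)below) / 2.0L;

        double err = (double)(std::fabs((long double)y - ref) / ulp);
        if (err > max_err) max_err = err;
    }
    std::printf("max observed error: %.3f ulp\n", max_err);
    return 0;
}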

If you consider the angle between two vectors v and w, you can also obtain the cotangent as follows (using Eigen::Vector3d):
inline double cot(const Eigen::Vector3d& v, const Eigen::Vector3d& w) {
    return v.dot(w) / v.cross(w).norm();
}
With theta the angle between v and w, the above function is correct because:
|v × w| = |v| · |w| · sin(theta)
v · w = |v| · |w| · cos(theta)
cot(theta) = cos(theta) / sin(theta) = (v · w) / |v × w|
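For example, a quick sanity check of the above with a 45° angle (example values chosen for illustration):

#include <Eigen/Dense>
#include <cstdio>

inline double cot(const Eigen::Vector3d& v, const Eigen::Vector3d& w) {
    return v.dot(w) / v.cross(w).norm();   // cot() as defined above
}

int main() {
    Eigen::Vector3d v(1.0, 0.0, 0.0);
    Eigen::Vector3d w(1.0, 1.0, 0.0);      // 45 degrees away from v
    std::printf("%f\n", cot(v, w));        // prints 1.000000, since cot(45°) = 1
    return 0;
}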

Related

The asin() function from cmath library returns inaccurate values

I'm using the asin and acos functions from the cmath library to find angles from their corresponding sine and cosine values, but they sometimes return inaccurate values. For example, the result of the code below is not zero:
cout << acos(-1) / 6 - asin(0.5) << endl;
So what can I do? I use acos(-1) as the value of Pi, and then somewhere in my code I want to see, for example, how many asin(0.5) are there in Pi, but the current method doesn't work.
Inexactness lurks in various places.
double is very commonly encoded using base 2, such as with binary64. As a floating-point number, it can be expected to be a dyadic rational. The key point is that a finite floating-point number is a rational number.
Mathematically, arccos(-1) is π. π is an irrational number, and even with many digits of precision a double cannot represent it exactly. Instead, acos(-1) yields a nearby value, sometimes called machine pi (typically the same value as M_PI).
Mathematically, arcsin(0.5) is π/6. As above, asin(0.5) is a nearby value to π/6.
Division by 6 introduces further potential inexactness: roughly two out of every three double values, when divided by 3 (and likewise by 6), yield an inexact quotient that must be rounded.
So of course the result is not zero. acos(-1) / 6 and asin(0.5) are two nearby, but not identical, approximations of π/6. (With acos(-1) / 3 instead, the difference would be about 0.52, i.e. roughly π/6, rather than nearly zero.)
Here the subtraction of nearby values incurs catastrophic cancellation, and a small but non-zero difference is the likely outcome.
So what can I do?
I want to see for example how many asin(0.5) are there in Pi
Tolerate marginal error in the calculation and employ the desired rounding.
double my_pi = acos(-1);                             // machine approximation of π
double target = asin(0.5);                           // machine approximation of π/6
long close_integer_quotient = lround(my_pi/target);  // yields 6

Python trig functions not returning matched results

I came across something rather interesting while playing around with the math module for trigonometric calculations using tan, sin, and cos.
As stated in all math textbooks, online sources, and courses, the following is true:
tan(x) = sin(x) / cos(x)
However, I came across some precision errors while using the three trig functions, as in the following:
from math import tan, sin, cos
theta = -30
alpha = tan(theta)
omega = sin(theta) / cos(theta)
print(alpha, omega)
print(alpha == omega)
>>> (6.405331196646276, 6.4053311966462765)
>>> (False)
I have tried a couple of different values for theta and the last digit of the results has been off by a tiny bit.
Is there something that I am missing?
This issue arises from finite floating-point precision (not all real numbers can be represented exactly, and not all calculations with them are exact). An accessible guide is in the Python docs.
Using the default "double precision" floating-point representation, you can never hope for better than about 15 decimal digits of precision, and calculations involving such numbers will tend to degrade that precision (the rounding error referred to in the above comment). In the same way, you get False from the following:
In [1]: 0.01 == (0.1)**2
Out[1]: False
because Python isn't squaring 0.1 but the "nearest representable number" to 0.1, and the resulting square is neither 0.01 nor the nearest representable number to 0.01.
D Stanley has given the correct way to test for "equality" within some absolute tolerance: (abs(a-b) < tol) where tol is some small number you choose to fit your expected precision.
As you have discovered, there is a level of imprecision when comparing floating point numbers. A common way to test for "equality" is to determine a reasonable amount of difference you are willing to accept (commonly called "epsilon") and compare the difference between the two numbers against that maximum error:
epsilon = 1E-14
print(alpha, omega)
print(alpha == omega)
print(abs(alpha - omega) < epsilon)
First you should notice that the arguments of the trigonometric functions are given in radians (arc length on the unit circle), not in degrees. Thus theta = -30 refers to an angle of -30·180/π ≈ -1718.9 degrees.
Second, the processor, and thus the calling math library, has separate internal procedures for computing tan and for computing (sin, cos). The extra division operation loses 1/2 to 1 bit of precision, which explains the difference in the results.

What is the numerical stability of std::pow() compared to iterated multiplication?

What sort of stability issues arise or are resolved by using std::pow()?
Will it be more stable (or faster, or at all different) in general to implement a simple function to perform log(n) iterated multiplies if the exponent is known to be an integer?
How does std::sqrt(x) compare, stability-wise, to something of the form std::pow(x, k/2)? Would it make sense to choose the method preferred for the above to raise to an integer power, then multiply in a square root, or should I assume that std::pow() is fast and accurate to machine precision for this? If k = 1, is there a difference from std::sqrt()?
How would std::pow(x, k/2) or the method above compare, stability-wise, to an integer exponentiation of std::sqrt(x)?
And as a bonus, what are the speed differences likely to be?
Will it be more stable (or faster, or at all different) in general to implement a simple function to perform log(n) iterated multiplies if the exponent is known to be an integer?
The result of exponentiation by squaring for integer exponents is in general less accurate than pow, but both are stable in the sense that close inputs produce close results. You can expect exponentiation by squaring to introduce up to 0.5 ULP of relative error per multiplication (for instance, about 1 ULP of error for computing x^3 as x * x * x).
When the second argument n is statically known to be 2, then by all means implement x^n as x * x. In that case it is faster and more accurate than any possible alternative.
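For the integer-exponent case, a minimal exponentiation-by-squaring helper (a sketch, not a drop-in replacement for std::pow) performs O(log n) multiplications, each contributing up to 0.5 ULP of rounding error:

#include <cstdint>

double pow_uint(double x, uint64_t n) {
    double result = 1.0;
    while (n > 0) {
        if (n & 1) result *= x;   // fold in the factor selected by the current bit
        x *= x;                   // square the base for the next bit
        n >>= 1;
    }
    return result;
}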
How does std::sqrt(x) compare, stability-wise, to something of the form std::pow(x, k/2)
First, the accuracy of sqrt cannot be beat for an IEEE 754 implementation, because sqrt is one of the basic operations that this standard mandates to be as accurate as possible.
But you are not asking about sqrt, you are asking (I think) about <computation of x^n> * sqrt(x) as opposed to pow(x, n + 0.5). Again, in general, for a quality implementation of pow, you can expect pow(x, n + 0.5) to be more accurate than the alternatives. Although sqrt(x) would be computed to 0.5 ULP, the multiplication introduces its own approximation of up to 0.5 ULP, and all in all, it is better to obtain the result you are interested in with a single call to a well-implemented function. A quality implementation of pow will give you 1 ULP of accuracy for its result, and the best implementations will “guarantee” 0.5 ULP.
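A tiny illustration of the two routes being compared (example values chosen for illustration); they may disagree in the last bit or two:

#include <cmath>
#include <cstdio>

int main() {
    double x = 1.2345;
    double via_pow  = std::pow(x, 2.5);      // a single call to pow (about 1 ULP for a quality implementation)
    double via_expr = x * x * std::sqrt(x);  // x^2 * sqrt(x): three separately rounded steps
    std::printf("%.17g\n%.17g\n", via_pow, via_expr);  // may differ in the final bits
    return 0;
}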
And as a bonus, what are the speed differences likely to be?
If you know in advance that the exponent is going to be a small integer or multiple of 0.5, then you have information that the implementer of pow did not have, so you can beat them by at least the cost of the test to determine that the second argument is a small integer. Plus, the implementer of a quality implementation is aiming for a more accurate result than simple exponentiation by squaring provides. On the other hand, the implementer of pow can use extremely sophisticated techniques to minimize the average execution time despite the better accuracy: see for instance CRlibm's implementation. I put the verb “guarantee” above inside quotes when talking about the best implementations of pow because pow is one function for which CRlibm's 0.5 ULP accuracy guarantee is only “with astronomical probability”.

Should I combine multiplication and division steps when working with floating point values?

I am aware of the precision problems with floats and doubles, which is why I am asking this:
If I have a formula such as: (a/PI)*180.0 (where PI is a constant)
Should I combine the division and multiplication, so I can use only one division: a/0.017453292519943295769236, in order to avoid loss of precision?
Does this make it more precise when it has less steps to calculate the result?
Short answer
Yes, you should in general combine as many multiplications and divisions by constants as possible into one operation. It is (in general(*)) faster and more accurate at the same time.
Neither π nor π/180 nor their inverses are representable exactly as floating-point. For this reason, the computation will involve at least one approximate constant (in addition to the approximation of each of the operations involved).
Because two operations introduce one approximation each, it can be expected to be more accurate to do the whole computation in one operation.
In the case at hand, is division or multiplication better?
Apart from that, it is a question of “luck” whether the relative accuracy to which π/180 can be represented in the floating-point format is better or worse than that of 180/π.
My compiler provides additional precision with the long double type, so I am able to use it as a reference for answering this question for double:
~ $ cat t.c
#define PIL 3.141592653589793238462643383279502884197L
#include <stdio.h>
int main() {
    long double heop = 180.L / PIL;
    long double pohe = PIL / 180.L;
    printf("relative acc. of π/180: %Le\n", (pohe - (double) pohe) / pohe);
    printf("relative acc. of 180/π: %Le\n", (heop - (double) heop) / heop);
}
~ $ gcc t.c && ./a.out
relative acc. of π/180: 1.688893e-17
relative acc. of 180/π: -3.469703e-17
In usual programming practice, one wouldn't bother and would simply multiply by (the floating-point representation of) 180/π, because multiplication is so much faster than division.
As it turns out, in the case of the binary64 floating-point type double almost always maps to, π/180 can be represented with better relative accuracy than 180/π, so π/180 is the constant one should use to optimize accuracy: a / ((double) (π / 180)). With this formula, the total relative error would be approximately the sum of the relative error of the constant (1.688893e-17) and of the relative error of the division (which will depend on the value of a but never be more than 2^-53).
Alternative methods for faster and more accurate results
Note that division is so expensive that you could get an even more accurate result faster by using one multiplication and one fma: let heop1 be the best double approximation of 180/π, and heop2 the best double approximation of 180/π - heop1. Then the best value for the result can be computed as:
double r = fma(a, heop1, a * heop2);
The fact that the above is the absolute best possible double approximation to the real computation is a theorem (in fact, a theorem with exceptions; the details can be found in the “Handbook of Floating-Point Arithmetic”). But even when the real constant you want to multiply a double by in order to get a double result is one of the exceptions to the theorem, the above computation is still very accurate and only differs from the best double approximation for a few exceptional values of a.
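A sketch of how heop1 and heop2 might be obtained, assuming (as in the test program above) that long double is wider than double. Note that with the common 80-bit x87 long double this recovers only about 11 extra bits for heop2; a full-precision heop2 requires higher-precision arithmetic or a precomputed constant.

#include <math.h>
#include <stdio.h>

int main(void) {
    long double heopl = 180.0L / 3.141592653589793238462643383279502884197L;
    double heop1 = (double) heopl;                         /* best double approximation of 180/π */
    double heop2 = (double) (heopl - (long double) heop1); /* the part heop1 leaves out */

    double a = 1.234;                                      /* example angle in radians */
    double r = fma(a, heop1, a * heop2);                   /* close to the best double result for a·180/π */
    printf("%.17g\n", r);
    return 0;
}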
If, like mine, your compiler provides more precision for long double than for double, you can also use one long double multiplication:
// this is more accurate than double division:
double r = (double)((long double) a * 57.295779513082320876798L);
This is not as good as the solution based on fma, but it is good enough that for most values of a, it produces the optimal double approximation to the real computation.
A counter-example to the general claim that operations should be grouped as one
(*) The claim that it is better to group constants into a single one is only statistically true: it holds for most constants, but not all.
If you happened to wish to multiply a by, say, the real constant 0.0000001 * DBL_MIN, you would be better off multiplying first by 0.0000001, then by DBL_MIN, and the end result (which can be a normalized number if a is larger than 1000000 or so) would be more precise than if you had multiplied by the best double representation of 0.0000001 * DBL_MIN. This is because the relative accuracy when representing 0.0000001 * DBL_MIN as a single double value is much worse than the accuracy for representing 0.0000001.
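A small demonstration of this effect (the values are chosen for illustration): 0.0000001 * DBL_MIN lands in the subnormal range, where only about 29 significant bits survive, so the result computed through the fused constant can be wrong in roughly its last 7 or 8 decimal digits, while the two-step version stays accurate to about 1 ulp.

#include <cfloat>
#include <cstdio>

int main() {
    double a = 12345678.9;                 // > 1000000, so the true product is a normal number
    double c_fused = 0.0000001 * DBL_MIN;  // subnormal: carries only ~29 significant bits
    double r1 = a * c_fused;               // inherits the representation error of c_fused
    double r2 = (a * 0.0000001) * DBL_MIN; // both steps stay in the normal range, each adds at most 0.5 ulp
    std::printf("fused constant : %.17g\nseparate steps : %.17g\n", r1, r2);
    return 0;
}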

Is the floating point implementation of exp() function equivalent to a truncated Taylor series expansion?

Is the floating point implementation of exp() function in cmath equivalent to a truncated Taylor series expansion of a very high order? One possible source of error to keep in mind is the finite number of bits available to represent the answer.
Is the floating point implementation of exp() function in cmath equivalent to a truncated Taylor series expansion of a very high order?
Equivalent to? Yes. That's because any decent implementation of exp() has an error of half an ULP (unit of least precision) or so. Ignoring problems with finite precision arithmetic, one can always construct a truncated Taylor series that does the same.
However, no decent implementation of exp() will use a Taylor expansion. That would be very, very slow and wouldn't achieve the desired accuracy. It would be a downright stupid implementation. Much better is to use the fact that there is a strong relation between 2^x and e^x, and the fact that 2^x is fairly easy to compute given the almost universal base-2 representation of floating-point numbers.
Just an example how you could calculate exp (x):
If x is quite large, the result is +inf. If x is very negative, the result is 0.
Let k = round(x / ln 2). Then exp(x) = 2^k * exp(x - k ln 2), and 2^k is very easy to calculate. A small problem is calculating x - k ln 2 without any rounding error. That's quite easy: let L1 be ln 2 rounded to, say, 35 bits, and L2 = ln 2 - L1. k is a smallish integer, so k * L1 has no rounding error, and neither does x - k * L1; we then subtract k * L2, which is small and therefore introduces little rounding error.
To do this more quickly (without a division), we calculate k = round(x * (1 / ln 2)). We also check whether x is so close to zero that the whole reduction isn't needed. Either way, we now calculate exp of a reduced argument for which the result lies between sqrt(1/2) and sqrt(2).
You could calculate exp of that reduced argument using a Taylor polynomial. Instead you would probably use a Chebyshev polynomial, minimising the truncation error with a much lower degree. With some care you can find a polynomial whose truncation error is substantially less than the lowest bit of the result.
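A toy version of this scheme (a sketch, not a library implementation): k = round(x * (1/ln 2)), reduction with a two-part ln 2, a polynomial on the reduced range, then scaling by 2^k. For clarity the polynomial is a plain degree-13 Taylor polynomial, whose truncation error on |r| <= ln(2)/2 is below one ulp; a production library would use a minimax (Chebyshev-style) polynomial of lower degree. The result is typically within a few ulps of the correctly rounded value.

#include <cmath>

double toy_exp(double x) {
    if (x > 710.0)  return INFINITY;  // result certainly overflows a double
    if (x < -746.0) return 0.0;       // result certainly underflows to zero

    const double INV_LN2 = 1.4426950408889634;          // 1 / ln 2
    const double LN2_HI  = 6.93147180369123816490e-01;  // ln 2, high part (trailing zero bits)
    const double LN2_LO  = 1.90821492927058770002e-10;  // ln 2 - LN2_HI

    double k = std::nearbyint(x * INV_LN2);             // nearest integer to x / ln 2
    double r = (x - k * LN2_HI) - k * LN2_LO;           // reduced argument, |r| <= ln(2)/2

    // exp(r) via Horner evaluation of the Taylor coefficients 1/n!, n = 0..13
    static const double c[] = {
        1.0, 1.0, 1.0/2, 1.0/6, 1.0/24, 1.0/120, 1.0/720, 1.0/5040, 1.0/40320,
        1.0/362880, 1.0/3628800, 1.0/39916800, 1.0/479001600, 1.0/6227020800.0
    };
    double p = 0.0;
    for (int n = 13; n >= 0; --n) p = p * r + c[n];

    return std::ldexp(p, (int)k);                       // exp(x) = 2^k * exp(r)
}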
It depends on the implementation of the compiler, the C runtime, and the processor. However, whoever computes the exponential is unlikely to use a Taylor expansion, since better methods exist.
As for glibc, it may use its own implementation, which says this in a comment (from sysdeps/ieee754/dbl-64/e_exp.c):
/* An ultimate exp routine. Given an IEEE double machine number x */
/* it computes the correctly rounded (to nearest) value of e^x */
Or it may use hardware-supported processor instructions for floating-point computations, as with the x86 FPU. In both cases you are likely to get a correctly rounded value with full precision.
That depends on which C library implementation you're using. In the very popular glibc, it isn't.