Restoring the exact angle from std::cos(angle) using std::acos - c++

Is it guaranteed by the C++ standard that angle == std::acos(std::cos(angle)) if angle is in the range [0, Pi], or in other words is it possible to restore the exact original value of angle from the result of std::cos using std::acos given the mentioned range limit?
Edge cases where angle is infinity or NaN are excluded.

Answer by StoryTeller:
The standard cannot make that guarantee, simply because the result of std::cos may not be representable exactly as a double. The rounding error in the cosine then affects the result of std::acos.

From cppreference.com:
"If no errors occur, [acos returns] the arc cosine of arg (arccos(arg)) in the range [0, π]."
In degrees, that's 0 to 180, inclusive, corresponding to cosine values 1 down through -1, inclusive.
Outside that range you can't even get an approximate correspondence. Computing the cosine discards information about which angle you had outside of that range. There's no way to get that information back.
How information is discarded:
First, in degrees, cos(x) = cos(K*360 + x), for arbitrary integer K. Secondly, cos(x) = cos(-x). This adds up to an awful lot of angle values that produce the same cosine value.
Also, even though all readers likely know this, for completeness: since sines and cosines are generally irrational numbers, not simple fractions, you can't expect exact results except in special cases, such as cosine 1, which corresponds to 0 degrees.
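A quick illustration (the exact digits depend on the compiler and math library, so your output may differ):

#include <cmath>
#include <cstdio>

int main()
{
    double angle = 1.0;  // inside [0, pi]
    // The round trip is close, but may differ in the last bit or two.
    std::printf("angle            = %.17g\n", angle);
    std::printf("acos(cos(angle)) = %.17g\n", std::acos(std::cos(angle)));

    double outside = 4.0 * std::acos(-1.0);  // 4*pi, outside [0, pi]
    // cos(4*pi) is (nearly) 1, so acos maps it back to (nearly) 0,
    // not to 4*pi: the original angle is unrecoverable.
    std::printf("acos(cos(4*pi))  = %.17g\n", std::acos(std::cos(outside)));
}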

According to the standard:
This International Standard imposes no requirements on the accuracy
of floating-point operations; see also 18.3.2. — end note ]
http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/n4606.pdf

Even mathematically this is impossible outside the restricted range. For example, cos(2*PI) is 1, but so is cos(4*PI).

Related

How can I effectively calculate the phase angle of a complex number that is (essentially) equal to zero?

I'm writing a C++ program that takes the FFT of a real input signal containing double values and returns a vector X containing std::complex<double> values. Once I have the vector of results I then attempt to calculate the magnitude and phase of the result.
I am running into an issue with calculating the phase angle when one of the outputs is "zero". Zero is in quotes because when a calculation that results in 0 returns a double, the returned value will be very near zero, but not quite exactly zero.
For example, at index 3 my output array has the calculated "zero" value:
X[3] = 3.0531133177191805e-16 - i*5.5511151231257827e-17
I am trying to use the standard library std::arg function, which is supposed to return the phase angle of a complex number: std::arg(X[3]).
While X[3] is essentially 0, it is not EXACTLY 0, and this causes a problem because the phase calculation uses the ratio of the imaginary part to the real part, which is far from 0!
Doing the actual calculation results in a far from desirable result.
How can I make C++ realize that the result is really 0 so I can get the correct phase angle?
I'm looking for a more elegant solution than using an arbitrary hard-coded "epsilon" value to compare the double to, but so far searching online I haven't had any luck coming up with something better.
If you are computing the floating-point FFT of a measured input signal, then that signal will include noise (sensor noise, thermal noise, quantization noise, timing jitter, etc.) and thus has a finite signal-to-noise ratio.
Thus the threshold for discarding FFT results as below your noise floor most likely isn't a matter of computational mathematics, but part of your physical or electronic data acquisition analysis. You will have to plug that number in, and set the phase to 0.0 or NaN or whatever your default flagging value is for a non-useful (at or below the noise floor) FFT result.
It was brought to my attention that my original answer will not work when the input to the FFT has been scaled. I believe I have an actual valid solution now... The original answer is kept below so that the comments still make sense.
From the comments on this answer and others, I've gathered that calculating the exact rounding error in the language may technically be possible but it is definitely not practical. The best practical solution seems to be to allow the user to provide their own noise threshold (in dB) and ignore any data points whose power level falls below that threshold. It would be impossible to come up with a generic threshold for all situations, but the user can provide a reasonable threshold based on the signal-to-noise ratio of the signal being analyzed and pass that in.
A generic phase calculation function is shown below that calculates the phase angles for a vector of complex data points.
std::vector<double> Phase(const std::vector<std::complex<double>>& X, double threshold, double amplitude)
{
    size_t N = X.size();
    std::vector<double> X_phase(N);
    std::transform(X.begin(), X.end(), X_phase.begin(), [threshold, amplitude](const std::complex<double>& value) {
        // Power of this data point in dB relative to the maximum possible amplitude.
        double level = 10.0 * std::log10(std::norm(value) / std::pow(amplitude, 2.0));
        // Keep the phase only for points above the noise threshold.
        return level > threshold ? std::arg(value) : 0.0;
    });
    return X_phase;
}
This function takes 3 arguments:
The vector of complex signal data you want to calculate the phase of.
A sensible threshold -- Can be calculated from the signal-to-noise ratio of whatever measurement device was used to capture the signal. If your signal contains no noise other than the rounding errors of the language itself you can set this to some arbitrary really low value, like -120dB.
The maximum possible amplitude of your input signal. If your signal is calculated, this should simply be set to the amplitude of your signal. If your signal is measured, this should be set to the maximum amplitude the measuring device is capable of measuring (If your signal comes from reading an audio file, often its data will be normalized between -1.0 and 1.0. In this case you would just set the amplitude value to 1.0).
This new implementation still provides me with the correct results, but is much more robust. By leaving the threshold calculation to the user they can set the most sensible value themselves based on the characteristics of the measurement device used to measure their input signal.
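For example, combined with the Phase function above (the sample values are illustrative; a normalized, essentially noise-free signal can use amplitude 1.0 and a very low threshold such as -120 dB):

#include <algorithm>
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

// ... Phase() as defined above ...

int main()
{
    // A strong bin and a numerically "zero" bin from a hypothetical FFT.
    std::vector<std::complex<double>> X = {
        {0.0, 0.5},
        {3.0531133177191805e-16, -5.5511151231257827e-17}
    };
    std::vector<double> phase = Phase(X, -120.0, 1.0);
    for (double p : phase)
        std::printf("%g\n", p);  // ~1.5708 (pi/2) for the strong bin, 0 for the discarded one
}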
Please let me know if you notice any errors or any ways I can further improve the design!
Original Answer
I found a solution that seems generic enough.
In the #include <limits> header, there is a constant value for std::numeric_limits<double>::digits10.
According to the documentation:
The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many significant decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow.
Using this I can filter out any output values that have a magnitude lower than this limit:
Calculate the phase of X[3]:
auto N = X.size();
auto tmp = std::abs(X[3]) / N > std::pow(10, -std::numeric_limits<double>::digits10)
    ? X[3]
    : std::complex<double>(0.0);
double phase = std::arg(tmp);
This effectively filters out any values that are not precisely zero due to rounding errors within the C++ language itself. It will NOT however filter out garbage data caused by noise in the input signal.
After adding this to my phase calculation I get the expected results.
The map from complex numbers to magnitude and phase is discontinuous at 0.
This is a discontinuity caused by the choice of coordinates you are using.
The solution will depend on why you chose those coordinates in a situation where values near the discontinuity are possible.
It isn't "really" zero. If you factored in error bars properly, your answer would really be a small magnitude (hopefully) and a unconstrained angle.

Fortran round-off errors

I have a simple code which flags nodes within a region enclosed by a cylinder. On running the code, a mild tilt of the cylinder is observed in the 90-degree case.
The actual issue:
The above algorithm is implemented in Fortran. The code checks whether points of a Cartesian grid lie inside the cylinder. The following is the test case:
The cylinder makes an angle of 90 degrees in the yz-plane with respect to the y-axis. Therefore, the orientation vector $\vec{o}$ is (0, 1, 0).
Case 1:
The orientation vector is assigned directly with $\vec{o}=(0.0, 1.0, 0.0)$. This results in a perfect cylinder with $\theta=90$.
Case 2:
The orientation vector is specified with the intrinsic double-precision Fortran functions dsin and dcos as $\vec{o}=(0.0, \sin(\pi/2.0), \cos(\pi/2.0))$, with the value of $\pi$ assigned to more than 20 significant decimal digits. The resulting cylinder shows a mild tilt.
The highlighted region indicates the extra material due to the tilt of the cylinder with respect to the Cartesian axes. I also tried an architecture-specific maximum-precision value of pi; this results in the same problem.
This suggests that the actual angle made by the cylinder is not exactly 90 degrees. Can anyone suggest a valid solution to this problem? I need to use the built-in trigonometric functions for arbitrary angles and am looking for an accurate cell-flagging method.
Note: All operations are performed with double precision accuracy.
The actual function is below. rk is a defined parameter with value 8.
pure logical function in_particle(p,px,x)
   type(md_particle_type),intent(in) :: p
   real(kind=rk),intent(in) :: px(3),x(3)
   real(kind=rk) :: r(3),rho(3),rop(2),ro2,rdiff,u

   rop = particle_radii(p) ! (/R_orth,R_para/)
   ro2 = rop(1)**2
   rdiff = rop(2) - rop(1)
   r = x-px

   ! Case 1:
   ! u = dot_product((/0.0_rk,-1.0_rk,0.0_rk/),r)
   ! rho = r-u*(/0.0_rk,-1.0_rk,0.0_rk/)

   ! Case 2:
   u = dot_product((/0.0_rk,-dsin(pi/2.0_rk),dcos(pi/2.0_rk)/),r)
   rho = r-u*(/0.0_rk,-dsin(pi/2.0_rk),dcos(pi/2.0_rk)/)

   if((u.le.rdiff).and.(u.ge.-rdiff)) then
      in_particle = dot_product(rho,rho) < ro2
   else
      in_particle = .false.
   end if
end function in_particle
Note: The trigonometric operations are done inside the code to explain the problem better. However, the original code reads the orientation as a vector from the user, then converts this information to quaternions for particle-particle collision operations. On converting the quaternions back to an orientation vector, this error is amplified even more. Even before the collisions start, the orientation of the cylinder is off by 2 lattice cells.
cos(pi/2) is not necessarily going to give you exactly 0, no matter how exact you make the cos calculation, and no matter how many digits of pi you have, because:
pi, as an irrational number, will contain up to 1/2 ulp of error when represented as an FP number; and
sin and cos are not guaranteed by the IEEE-754 standard to be correctly rounded (or even implemented).
Now, sin(pi/2) is extremely likely to come out as 1 regardless of precision and FP architecture, simply because sin has such a low derivative around pi/2 (where its derivative is zero); with single-precision floats, it should come out to 1 if you're anywhere within about 3e-4 of the exact value of pi/2. The problematic call is the cos, which has lots of precision to play with around 0 and a derivative of about -1 in the neighborhood of pi/2.
Still, we're talking about extremely small values here. I think what's really potentiating the problem here is the in/out test you're doing, combined with ordinary FP rounding rules. I would guess, in fact, that if you were to bias your test points by, say, a quarter of the grid quantum, you'd see all straight verticals in your voxelization (though it might not be symmetrical around the minor axes).
Another option would be to actually discard some precision from your sin/cos calculation before doing the dot product, effectively quantizing your axes.
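The effect is easy to see in any language; here is a C++ sketch (exact digits vary by platform and math library):

#include <cmath>
#include <cstdio>

int main()
{
    const double pi = std::acos(-1.0);  // pi rounded to the nearest double
    // The argument is near, but not equal to, the real pi/2, so the cosine
    // comes out as a tiny nonzero value (about 6.12e-17 on typical platforms).
    std::printf("cos(pi/2) = %.17g\n", std::cos(pi / 2.0));
    std::printf("sin(pi/2) = %.17g\n", std::sin(pi / 2.0));  // 1, as expected
}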
Short answer: Create a table of sin and cos of common angles (0, pi/6, pi/4, pi/3, pi/2, pi and their multiples) and compute only for uncommon angles. The reason is that errors with uncommon angles will be tolerated by most people, while errors with common angles likely will not.
Explanation:
Because floating point computation is not exact (that is its nature), you sometimes need a little bit of compromise between the accuracy and the readability of the code.
One way to do that is to avoid computing something that is known exactly. You can check the value of the angle and do the actual computation only if it is not an obvious angle. For example, angles of 0, 90, 180 and 270 degrees have obvious values of sin and cos. More generally, the cos and sin of common angles (0, pi/6, pi/4, pi/3, pi/2, pi and their multiples) are known exactly (even if they are irrational numbers).
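A sketch of that idea in C++, handling only multiples of pi/2 (the function name and the tolerance are illustrative choices; the same approach extends to pi/6, pi/4, and pi/3):

#include <cmath>

// Return cos(angle), substituting exact values when angle is within
// a small tolerance of a multiple of pi/2.
double cos_exact(double angle)
{
    const double pi = std::acos(-1.0);
    const double tol = 1e-12;                   // illustrative tolerance
    double quarters = angle / (pi / 2.0);       // how many quarter turns
    long long n = std::llround(quarters);
    if (std::fabs(quarters - n) < tol) {
        switch (((n % 4) + 4) % 4) {            // n mod 4, safe for negative n
        case 0: return 1.0;
        case 1: return 0.0;
        case 2: return -1.0;
        case 3: return 0.0;
        }
    }
    return std::cos(angle);                     // uncommon angle: compute normally
}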

Can you get a "nan" from overflow in C++?

I'm writing a program that uses a very long recursion (about 50,000 levels) and some very large vectors (also 50,000 elements of type double) to store the result of each recursion before averaging them. At the end of the program, I expect a numeric output.
However, some of the results I got were "nan". The mysterious thing is that if I reduce the number of recursions, the program works just fine. So I'm guessing this might have something to do with the size of the vector. So my question is: if you get an overflow in a very long vector (or array), what will be the effect? Will you get a "nan" just like in my case?
Another mysterious thing about my program is that I have tried some even larger recursions (100,000), but the output was normal. But when I changed a parameter value so that the numbers stored in the vector became larger (although still of type double), the output became "nan". Does the maximum capacity of a vector depend on the size of the numbers it stores?
You didn't tell us what your recursion is, but it is fairly easy to generate NaNs with a long sequence of operations if you are using square root, pow, inverse sine, or inverse cosine.
Suppose your calculation produces a quantity, call it x, that is supposed to be the sine of some angle θ, and suppose the underlying math dictates that x must always be between -1 and 1, inclusive. You calculate θ by taking the inverse sine of x.
Here's the problem: Arithmetic done on a computer is but an approximation of the arithmetic of the real numbers. Addition and multiplication with IEEE floating point numbers are not associative. You might well get a value of 1.0000000000000002 for x instead of 1. Take the inverse sine of this value and you get a NaN.
A standard trick is to protect against those near misses that result from numerical errors. Don't use the built-in asin, acos, sqrt, and pow. Use wrappers that protect against things like asin(1.0000000000000002) and sqrt(-1e-16). Make the former pi/2 rather than NaN, and make the latter zero. This is admittedly a kludge, and doing this can get you in trouble. What if the problem is that your calculations are formulated incorrectly? It's legitimate to treat 1.0000000000000002 as 1, but it's best not to treat a value of 100 as if it were 1. A value of 100 passed to your asin wrapper is best treated by throwing an exception rather than truncating to 1.
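A minimal sketch of such a wrapper (the name safe_asin and the 1e-10 tolerance are illustrative, not a standard API):

#include <cmath>
#include <stdexcept>

// Forgive tiny excursions outside [-1, 1] caused by rounding,
// but treat large excursions as a logic error in the caller.
double safe_asin(double x)
{
    const double tol = 1e-10;
    if (x > 1.0) {
        if (x > 1.0 + tol) throw std::domain_error("asin argument far outside [-1, 1]");
        x = 1.0;
    } else if (x < -1.0) {
        if (x < -1.0 - tol) throw std::domain_error("asin argument far outside [-1, 1]");
        x = -1.0;
    }
    return std::asin(x);
}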
There's one other problem with infinities and NaNs: They propagate. An Inf or NaN in one single computation quickly becomes an Inf or a NaN in hundreds, then thousands of values. I usually make the floating point machinery raise a floating point exception on obtaining an Inf or NaN instead of continuing on. (Note well: Floating point exceptions are not C++ exceptions.) When you do this, your program will bomb unless you have a signal handler in place. That's not necessarily a bad thing. You can run the program in the debugger and find exactly where the problem arose. Without these floating point exceptions it is very hard to find the source of the problem.
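Raising a hardware trap is platform-specific (for example, feenableexcept on glibc), but standard C++ can at least test for these events after the fact via <cfenv>:

#include <cfenv>
#include <cmath>
#include <cstdio>

// Some compilers require '#pragma STDC FENV_ACCESS ON' for reliable
// access to the floating-point status flags.

int main()
{
    std::feclearexcept(FE_ALL_EXCEPT);
    double x = std::asin(1.0000000000000002);  // just above 1.0, yields NaN
    if (std::fetestexcept(FE_INVALID))
        std::printf("FE_INVALID raised, x = %g\n", x);
}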
It depends on the exact nature of your computations. If you just add up numbers which aren't NaN, the result shouldn't be NaN either. It might be +infinity, though.
But you will get NaN if e.g. some part of your computation yields +infinity, another -infinity, and you later add those two results.
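For instance:

#include <cstdio>
#include <limits>

int main()
{
    double pos = std::numeric_limits<double>::infinity();
    double neg = -pos;
    std::printf("%g\n", pos + neg);  // prints nan: +inf plus -inf has no meaningful value
}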
Assuming that your architecture conforms to IEEE 754, this http://en.wikipedia.org/wiki/NaN#Creation tells the situations in which arithmetic operations return NaN.

"lossless" float to BYTE conversion

I'm using libnoise to generate Perlin noise on a 1024x1024 terrain grid. I want to convert its float output to a BYTE between 0 and 255. The question is ultimately a math one: how can I convert a value in the real interval (-1,1) to the integer one (0,255) minimizing loss?
This formula will give you a number in [0, 255] if your input range excludes the endpoints:
(int)((x + 1.0) * (256.0 / 2.0))
If you are including the endpoints (in which case it is usually written [-1,1] rather than (-1,1)), you will need a special case for x == 1.0 to round that down to 255.
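A sketch handling both cases by clamping (the function name is illustrative):

#include <algorithm>
#include <cstdint>

// Map x in [-1, 1] to [0, 255]; the clamp absorbs x == 1.0 as well as
// any stray values slightly outside the expected range.
std::uint8_t FloatToByte(float x)
{
    int v = static_cast<int>((x + 1.0f) * 128.0f);
    return static_cast<std::uint8_t>(std::max(0, std::min(255, v)));
}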
The best way might depend on the distribution of floats in (-1,1); if some areas contain more of them and others fewer, you might want to increase the "precision" in the former at the expense of the latter. Basically, if you have a probability function for the output defined on this interval, you can split (0,1), the interval of probability values, into 256 equal sub-intervals, and for each given float calculate into which sub-interval its probability function value falls. For noise, the probability function is (or at least should be) close to linear, so perhaps the answer of Mark Byers is the way to go.

How do I check and handle numbers very close to zero

I have some math (in C++) which seems to be generating some very small, near zero, numbers (I suspect the trig function calls are my real problem), but I'd like to detect these cases so that I can study them in more detail.
I'm currently trying out the following, is it correct?
if ( std::abs(x) < DBL_MIN ) {
log_debug("detected small num, %Le, %Le", x, y);
}
Second, the nature of the mathematics is trigonometric in nature (aka using a lot of radian/degree conversions and sin/cos/tan calls, etc), what sort of transformations can I do to avoid mathematical errors?
Obviously for multiplications I can use a log transform - what else?
Contrary to widespread belief, DBL_MIN is not the smallest positive double value but the smallest positive normalized double value. Typically - for 64-bit IEEE-754 doubles - it's 2^(-1022), while the smallest positive double value is 2^(-1074). Therefore
I'm currently trying out the following, is it correct?
if ( std::abs(x) < DBL_MIN ) {
log_debug("detected small num, %Le, %Le", x, y);
}
may have an affirmative answer. The condition checks whether x is a denormalized (also called subnormal) number or ±0.0. Without knowing more about your specific situation, I cannot tell if that test is appropriate. Denormalized numbers can be legitimate results of calculations or the consequence of rounding where the correct result would be 0. It is also possible that rounding produces numbers of far greater magnitude than DBL_MIN when the mathematically correct result would be 0, so a much larger threshold could be sensible.
If x is a double, then one problem with this approach is that you can't distinguish between x being legitimately zero, and x being a positive value smaller than DBL_MIN. So this will work if you know x can never be legitimately zero, and you want to see when underflow occurs.
You could also try catching the SIGFPE signal, which a POSIX-compliant system can raise on math errors, including floating-point underflow (typically only if trapping for those exceptions has been enabled). See: http://en.wikipedia.org/wiki/SIGFPE
EDIT: To be clear, DBL_MIN is NOT the largest negative value that a double can hold, it is the smallest positive normalized value that a double can hold. So your approach is fine as long as the value can't be zero.
Another useful constant is DBL_EPSILON, which is the difference between 1.0 and the next representable double greater than 1.0. Note that this is a much larger value than DBL_MIN. But it may be useful to you since you're doing trigonometric functions that may tend toward 1 instead of tending toward 0.
Since you are using C++, the most idiomatic is to use std::numeric_limits from header <limits>.
For instance:
#include <cmath>
#include <limits>

template <typename T>
bool is_close_to_zero(T x)
{
    return std::abs(x) < std::numeric_limits<T>::epsilon();
}
The actual tolerance to be used heavily depends on your problem. Please complete your question with a concrete use case so that I can enhance my answer.
There is also std::numeric_limits<T>::min() and std::numeric_limits<T>::denorm_min() that may be useful. The first one is the smallest positive non-denormalized value of type T (equal to FLT/DBL/LDBL_MIN from <cfloat>), the second one is the smallest positive value of type T (no <cfloat> equivalent).
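A quick check of how these constants compare on a typical IEEE-754 platform:

#include <cstdio>
#include <limits>

int main()
{
    std::printf("min():        %g\n", std::numeric_limits<double>::min());         // ~2.22507e-308
    std::printf("denorm_min(): %g\n", std::numeric_limits<double>::denorm_min());  // ~4.94066e-324
    std::printf("epsilon():    %g\n", std::numeric_limits<double>::epsilon());     // ~2.22045e-16
}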
[You may find this document useful to read if you aren't at ease with floating point numbers representation.]
The first if check will actually only be true when your value is zero or a denormalized (subnormal) number.
For your second question: you mention lots of conversions. Instead, pick one unit (degrees or radians) and do all your computations in that unit. Then at the very end do a single conversion to the other unit if you need to.