I have an application that requires very low precision (within 2 degrees) and very high speed to determine the angle of a line given its rise/run. Specifically, the precision is really only needed closer to the x-axis (below 45 or above 135 degrees), which I think is easier to accomplish because the slope approaches an undefined value as the angle nears 90 degrees. Currently I use atan2 from the math.h library, but I would like something faster.
I have seen this example and think a lookup table for atan would suffice; however, it's much trickier to make one for arctan than for tan, as I have to think in terms of the slope and how it can be represented as an integer so it can be used as an index into the table.
Has anyone done this before? I'm thinking I need some sort of scale factor: when I take rise/run and get my slope as a decimal, I may have to multiply it by a constant value, otherwise everything below 45 degrees will map to 0 degrees. In that case, though, I sacrifice a lot of accuracy above 45 degrees. Really, I do not need to distinguish between anything in the 75-105 degree range, but in the 30/160 degree range it would be good to have accuracy of 2-3 degrees.
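Here is a rough sketch of the lookup-table idea in C++ (the names and table size are illustrative, and whether it actually beats atan2 is something you would have to benchmark on your platform). It indexes the table with the smaller of the two ratios so the index always stays in [0, 1], then mirrors the result into the other quadrants:

#include <cmath>

constexpr int TABLE_SIZE = 256;            // ~0.18 degree steps near the x-axis
static float atanTable[TABLE_SIZE + 1];    // atan of slopes in [0, 1], in degrees

void initAtanTable()
{
    for (int i = 0; i <= TABLE_SIZE; ++i)
        atanTable[i] = std::atan(static_cast<float>(i) / TABLE_SIZE) * 180.0f / 3.14159265f;
}

// Roughly matches atan2(rise, run) in degrees, in (-180, 180].
float fastAtan2Deg(float rise, float run)
{
    float ax = std::fabs(run), ay = std::fabs(rise);
    if (ax == 0.0f && ay == 0.0f) return 0.0f;       // angle undefined; pick 0
    float deg = (ay <= ax)
        ? atanTable[static_cast<int>(ay / ax * TABLE_SIZE)]          // 0..45 degrees
        : 90.0f - atanTable[static_cast<int>(ax / ay * TABLE_SIZE)]; // 45..90 degrees
    if (run < 0.0f)  deg = 180.0f - deg;   // mirror into quadrants II/III
    if (rise < 0.0f) deg = -deg;           // mirror into quadrants III/IV
    return deg;
}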
Related
I'm writing a C++ program that takes the FFT of a real input signal containing double values and returns a vector X containing std::complex<double> values. Once I have the vector of results I then attempt to calculate the magnitude and phase of the result.
I am running into an issue with calculating the phase angle when one of the outputs is "zero". Zero is in quotes because when a calculation that results in 0 returns a double, the returned value will be very near zero, but not quite exactly zero.
For example, at index 3 my output array has the calculated "zero" value:
X[3] = 3.0531133177191805e-16 - i*5.5511151231257827e-17
I am trying to use the standard library std::arg function, which is supposed to return the phase angle of a complex number: std::arg(X[3])
While X[3] is essentially 0, it is not EXACTLY 0, and the way phase is calculated this causes a problem, because the calculation uses the ratio of the imaginary part to the real part, which is far from 0!
Doing the actual calculation results in a far from desirable result.
How can I make C++ realize that the result is really 0 so I can get the correct phase angle?
I'm looking for a more elegant solution than using an arbitrary hard-coded "epsilon" value to compare the double to, but so far searching online I haven't had any luck coming up with something better.
If you are computing the floating-point FFT of an input signal, then that signal will include noise -- sensor noise, thermal noise, quantization noise, timing jitter, etc. -- and thus have a finite signal-to-noise ratio.
Thus the threshold for discarding FFT results as below your noise floor most likely isn't a matter of computational mathematics, but part of your physical or electronic data acquisition analysis. You will have to plug that number in, and set the phase to 0.0 or NaN or whatever your default flagging value is for a non-useful (at or below the noise floor) FFT result.
It was brought to my attention that my original answer will not work when the input to the FFT has been scaled. I believe I have an actual valid solution now... The original answer is kept below so that the comments still make sense.
From the comments on this answer and others, I've gathered that calculating the exact rounding error in the language may technically be possible but it is definitely not practical. The best practical solution seems to be to allow the user to provide their own noise threshold (in dB) and ignore any data points whose power level falls below that threshold. It would be impossible to come up with a generic threshold for all situations, but the user can provide a reasonable threshold based on the signal-to-noise ratio of the signal being analyzed and pass that in.
A generic phase calculation function is shown below that calculates the phase angles for a vector of complex data points.
#include <algorithm>
#include <cmath>
#include <complex>
#include <vector>

std::vector<double> Phase(const std::vector<std::complex<double>>& X,
                          double threshold, double amplitude)
{
    std::vector<double> X_phase(X.size());
    std::transform(X.begin(), X.end(), X_phase.begin(),
        [threshold, amplitude](const std::complex<double>& value) {
            // Power of this bin in dB relative to the maximum possible amplitude.
            double level = 10.0 * std::log10(std::norm(value) / std::pow(amplitude, 2.0));
            // Keep the phase only for bins above the noise threshold.
            return level > threshold ? std::arg(value) : 0.0;
        });
    return X_phase;
}
This function takes 3 arguments:
The vector of complex signal data you want to calculate the phase of.
A sensible threshold -- Can be calculated from the signal-to-noise ratio of whatever measurement device was used to capture the signal. If your signal contains no noise other than the rounding errors of the language itself you can set this to some arbitrary really low value, like -120dB.
The maximum possible amplitude of your input signal. If your signal is calculated, this should simply be set to the amplitude of your signal. If your signal is measured, this should be set to the maximum amplitude the measuring device is capable of measuring (If your signal comes from reading an audio file, often its data will be normalized between -1.0 and 1.0. In this case you would just set the amplitude value to 1.0).
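For example, a call might look like this (the -90 dB threshold and 1.0 amplitude are hypothetical values for a normalized signal captured by a device with about 90 dB of SNR):

std::vector<double> X_phase = Phase(X, -90.0, 1.0);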
This new implementation still provides me with the correct results, but is much more robust. By leaving the threshold calculation to the user they can set the most sensible value themselves based on the characteristics of the measurement device used to measure their input signal.
Please let me know if you notice any errors or any ways I can further improve the design!
Original Answer
I found a solution that seems generic enough.
The <limits> header (#include <limits>) defines the constant std::numeric_limits<double>::digits10.
According to the documentation:
The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many significant decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow.
Using this I can filter out any output values that have a magnitude lower than this limit:
Calculate the phase of X[3]:
size_t N = X.size();
// Treat the bin as zero when its normalized magnitude falls below double's
// decimal precision limit (10^-digits10).
std::complex<double> tmp = std::abs(X[3]) / N > std::pow(10, -std::numeric_limits<double>::digits10)
    ? X[3]
    : std::complex<double>(0.0);
double phase = std::arg(tmp);
This effectively filters out any values that are not precisely zero due to rounding errors within the C++ language itself. It will NOT however filter out garbage data caused by noise in the input signal.
After adding this to my phase calculation I get the expected results.
The map from complex numbers to magnitude and phase is discontinuous at 0.
This is a discontinuity caused by the choice of coordinates you are using.
The solution will depend on why you chose those coordinates in a situation where values near the discontinuity are possible.
It isn't "really" zero. If you factored in error bars properly, your answer would really be a small magnitude (hopefully) and a unconstrained angle.
I have simple code that flags nodes within a region enclosed by a cylinder. On running the code, I observe a mild tilt of the cylinder in the 90-degree case.
The actual issue:
The above algorithm is implemented in Fortran. The code checks whether points of a Cartesian grid lie inside the cylinder. The following is the test case:
The cylinder makes an angle of 90 degrees in the yz-plane with respect to the y-axis. Therefore, the orientation vector $\vec{o}$ is (0, 1, 0).
Case 1:
Orientation vector is assigned directly as $\vec{o}=(0.0,1.0,0.0)$. This results in a perfect cylinder with $\theta=90$.
Case 2:
Orientation vector is specified with the intrinsic Fortran double-precision functions dsin and dcos as $\vec{o}=(0.0, \sin(\pi/2.0), \cos(\pi/2.0))$, with the value of $\pi$ assigned to more than 20 significant decimal digits. The resulting cylinder shows a mild tilt.
The highlighted region indicates the extra material due to the tilt of the cylinder with respect to the Cartesian axes. I also tried the architecture-specific maximum-precision value of "pi"; this results in the same problem.
It looks as if the actual angle made by the cylinder is not exactly 90 degrees. Can anyone suggest a valid solution to this problem? I need to use the built-in trigonometric functions for arbitrary angles, and I am looking for an accurate cell-flagging method.
Note: All operations are performed with double precision accuracy.
The actual function is below. rk is a defined parameter with value 8.
pure logical function in_particle(p,px,x)
  type(md_particle_type),intent(in) :: p
  real(kind=rk),intent(in) :: px(3),x(3)
  real(kind=rk) :: r(3),rho(3),rop(2),ro2,rdiff,u

  rop = particle_radii(p) ! (/R_orth,R_para/)
  ro2 = rop(1)**2
  rdiff = rop(2) - rop(1)
  r = x-px
  ! Case 1:
  ! u = dot_product((/0.0_rk,-1.0_rk,0.0_rk/),r)
  ! rho = r-u*(/0.0_rk,-1.0_rk,0.0_rk/)
  ! Case 2:
  u = dot_product((/0.0_rk,-dsin(pi/2.0_rk),dcos(pi/2.0_rk)/),r)
  rho = r-u*(/0.0_rk,-dsin(pi/2.0_rk),dcos(pi/2.0_rk)/)

  if((u.le.rdiff).and.(u.ge.-rdiff)) then
    in_particle = dot_product(rho,rho) < ro2
  else
    in_particle = .false.
  end if
end function in_particle
Note: The trigonometric operations are done inside the code here to explain the problem better. The original code, however, reads the orientation as a vector from user input and converts it to quaternions for particle-particle collision operations. On converting the quaternions back to an orientation vector, this error is amplified even more: even before the collisions begin, the orientation of the cylinder is off by 2 lattice cells.
cos(pi/2) is not necessarily going to give you exactly 0, no matter how exact you make the cos calculation, and no matter how many digits of pi you have, because:
pi, as an irrational number, will contain up to 1/2 ulp of error when represented as an FP number; and
sin and cos are not guaranteed by the IEEE-754 standard to be correctly rounded (or even implemented).
Now, sin(pi/2) is extremely likely to come out as exactly 1 regardless of precision and FP architecture, simply because sin has such a small derivative near pi/2; with single-precision floats, it should come out to 1 if you're anywhere within about 3e-4 of the exact value of pi/2. The problematic call is the cos, which has lots of precision to play with around 0 and a derivative of about -1 in that neighborhood.
Still, we're talking about extremely small values here. I think what's really potentiating the problem here is the in/out test you're doing, combined with ordinary FP rounding rules. I would guess, in fact, that if you were to bias your test points by, say, a quarter of the grid quantum, you'd see all straight verticals in your voxelization (though it might not be symmetrical around the minor axes).
Another option would be to actually discard some precision from your sin/cos calculation before doing the dot product, effectively quantizing your axes.
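As a quick illustration of why the cos call is the problem (a small C++ sketch; the same effect occurs with Fortran's dcos), note that cos(pi/2) comes out as a tiny residual rather than 0, and that quantizing it snaps the axis back to exact zero:

#include <cmath>
#include <cstdio>

int main()
{
    const double pi = std::acos(-1.0);
    std::printf("%.17g\n", std::cos(pi / 2.0));  // ~6.1e-17, not 0: pi/2 carries rounding error
    std::printf("%.17g\n", std::round(std::cos(pi / 2.0) * 1.0e12) / 1.0e12);  // exactly 0
    return 0;
}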
Short answer: Create a table of sin and cos of common angles (0, pi/6, pi/4, pi/3, pi/2, pi and their multiples) and compute only for uncommon angles. The reason being that errors with uncommon angles will be tolerated by most people while errors with common angles will likely not be tolerated.
Explanation:
Because floating-point computation is not exact (that is its nature), you sometimes need a little compromise between the accuracy and the readability of the code.
One way of doing that is to avoid computing something that is known exactly. To do that, you can check the value of the angle and do the actual computation only if it is not an obvious angle. For example, the angles 0, 90, 180 and 270 degrees have obvious values of sin and cos. More generally, the cos and sin of common angles (0, pi/6, pi/4, pi/3, pi/2, pi and their multiples) are known exactly (even when they are irrational numbers).
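A minimal sketch of that idea in C++ (the helper name, the tolerance, and the restriction to multiples of pi/2 are my own simplifications):

#include <cmath>

// Return cos(angle) exactly for multiples of pi/2; compute only for other angles.
double cos_exact(double angle)
{
    const double pi = std::acos(-1.0);
    double q = angle / (pi / 2.0);              // how many quarter-turns?
    double r = std::round(q);
    if (std::abs(q - r) < 1.0e-12) {            // numerically a "common" angle
        switch ((static_cast<long long>(r) % 4 + 4) % 4) {  // quarter-turns mod 4
            case 0: return 1.0;   // 0, 2*pi, ...
            case 1: return 0.0;   // pi/2, ...
            case 2: return -1.0;  // pi, ...
            case 3: return 0.0;   // 3*pi/2, ...
        }
    }
    return std::cos(angle);                     // uncommon angle: compute normally
}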
I'm using floats to represent a position in my game:
struct Position
{
float x;
float y;
};
I'm wondering if this is the best choice and what the consequences will be as the position values continue to grow larger. I took some time to brush up on how floats are stored and realized that I am a little confused.
(I'm using Microsoft Visual C++ compiler.)
In float.h, FLT_MAX is defined as follows:
#define FLT_MAX 3.402823466e+38F /* max value */
which is 340282346600000000000000000000000000000.
That value is much greater than UINT_MAX which is defined as:
#define UINT_MAX 0xffffffff
and corresponds to the value 4294967295.
Based on this, it seems like a float would be a good choice to store a very large number like a position. Even though FLT_MAX is very large, I'm wondering how the precision issues will come into play.
Based on my understanding, a float uses 1 bit to store the sign, 8 bits to store the exponent, and 23 bits to store the mantissa (a leading 1 is assumed):
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
That means FLT_MAX looks like:
0 11111110 11111111111111111111111
(note that the exponent field is 11111110, i.e. 254, not all ones; the all-ones exponent pattern is reserved for infinity and NaN). With the bias of 127, that is the equivalent of:
1.11111111111111111111111 x 2^127
which, written out in binary, is 24 ones followed by 104 zeros.
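One way to double-check the layout is to print the bit fields of FLT_MAX directly (a small sketch using std::memcpy):

#include <cfloat>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    float f = FLT_MAX;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // reinterpret the float's bit pattern
    std::printf("sign=%u exponent=0x%02X mantissa=0x%06X\n",
                (unsigned)(bits >> 31), (unsigned)((bits >> 23) & 0xFF),
                (unsigned)(bits & 0x7FFFFF));
    // Prints: sign=0 exponent=0xFE mantissa=0x7FFFFF (exponent 254 - 127 bias = 127)
    return 0;
}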
Even knowing this, I have trouble visualizing the loss of precision and I'm getting confused thinking about what will happen as the values continue to increase.
Is there any easier way to think about this? Are floats or doubles generally used to store very large numbers over something like an unsigned int?
A way of thinking about the precision of a float is to budget roughly 5-6 significant decimal digits (the format stores 24 significand bits, about 7 decimal digits, but it is unwise to count on all of them). So if your units are meters and you have something 1 km away, that's 1000 m; attempting to deal with that object at a resolution of 10 cm (0.1 m) or less may be problematic.
The usual approach in a game would be to use floats, but to divide the world up such that positions are relative to local co-ordinate systems (for example, divide the world into a grid, and for each grid square have a translation value). Everything will have enough precision until it gets transformed relative to the camera for rendering, at which point the imprecision for far away things is not a problem.
As an example, imagine a game set in the solar system. If the origin of your co-ordinate system is in the heart of the sun, then co-ordinates on the surface of planets will be impossible to represent accurately in a float. However if you instead have a co-ordinate system relative to the planet's surface, which in turn is relative to the center of the planet, and then you know where the planet is relative to the sun, you can operate on things in a local space with accuracy, and then transform into whatever space you want for rendering.
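A bare-bones sketch of that grid scheme (the names and cell size are made up for illustration):

#include <cmath>

// World position = integer grid cell + small local float offset inside the cell.
// The offset stays small, so float precision is uniform across the whole world.
struct GridPosition
{
    int   cellX, cellY;      // which grid square (exact integers)
    float localX, localY;    // offset within the cell, in [0, CELL_SIZE)
};

constexpr float CELL_SIZE = 1024.0f;  // meters per grid square (assumed)

// Re-home the offset into its cell after movement so it never grows large.
void normalize(GridPosition& p)
{
    float shiftX = std::floor(p.localX / CELL_SIZE);
    float shiftY = std::floor(p.localY / CELL_SIZE);
    p.cellX  += static_cast<int>(shiftX);
    p.cellY  += static_cast<int>(shiftY);
    p.localX -= shiftX * CELL_SIZE;
    p.localY -= shiftY * CELL_SIZE;
}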
No, they're not.
Let's say your position needs to increase by 10 cm for a certain frame since the game object moved.
Assuming a game world scaled in meters, this is 0.10. But if your float value is large enough it won't be able to represent a difference of 0.10 any more, and your attempt to increase the value will simply fail.
Do you need to store a value greater than about 16.7 million (2^24) with a fractional part? Then float will be too small; in fact, float already cannot represent any fractional part for values at or above 2^23 (about 8.4 million).
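You can watch the failed increment happen (a tiny sketch; the values are chosen for illustration):

#include <cstdio>

int main()
{
    float position = 16777216.0f;     // 2^24: float's spacing here is 2.0
    position += 0.1f;                 // the 10 cm step is rounded away entirely
    std::printf("%.1f\n", position);  // still prints 16777216.0
    return 0;
}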
This series by Bruce Dawson may help.
If you really need to handle very large numbers, then consider using an arbitrary-precision arithmetic library. You will have to profile your code because these libraries are slower than the arithmetics of built-in types.
It is possible that you do not really need very large coordinate values. For example, you could wrap around the edges of your world, and use modulo arithmetic for handling positions.
Say I have a huge floating-point number, say a trillion decimal places out. Obviously a long double can't hold this. Let's also assume I have a computer with more than enough memory to hold it. How do you do something like this?
You need arbitrary-precision arithmetic.
Arbitrary-precision math.
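For instance, a library like GMP can hold integers limited only by available memory. A minimal sketch using its C++ interface (gmpxx.h; link with -lgmpxx -lgmp):

#include <gmpxx.h>
#include <iostream>

int main()
{
    mpz_class big = 1;
    for (int i = 2; i <= 1000; ++i)
        big *= i;                                       // big = 1000!
    std::cout << big.get_str().size() << " digits\n";   // prints: 2568 digits
    return 0;
}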
It's easy to say "arbitrary precision arithmetic" (or something similar), but I think it's worth adding that it's difficult to conceive of ways to put numbers anywhere close to this size to use.
Just for example: the current estimates of the size of the universe are somewhere in the vicinity of 150-200 billion light years. At the opposite end of the spectrum, the diameter of a single electron is estimated at a little less than 1 attometer. 1 light year is roughly 9.46x10^15 meters (for simplicity, we'll treat it as 10^16 meters).
So, let's take 1 attometer as our unit, and figure out the size of the number for the diameter of the universe in that unit: 10^18 units/meter * 10^16 meters/light year * 10^11 light years/universe diameter = about a 45-digit number to express the diameter of the universe in units of roughly the diameter of an electron.
Even if we went the next step, and expressed it in terms of the theorized size of a superstring, and added a few extra digits just in case the current estimates are off by a couple orders of magnitude, we'd still end up with a number around 65 digits or so.
This means, for example, that if we knew the diameter of the universe to the size of a single superstring, and we wanted to compute something like the volume of the universe in terms of superstring diameters, our largest intermediate result would still be only around 200 digits or so (the cube of a 65-digit number has about 195 digits).
Consider another salient point: if you were to program a 64-bit computer running at, say, 10 GHz to do nothing but count -- increment a register once per clock cycle -- it would take roughly 58 years just to cycle through all the 64-bit numbers so the register wrapped around to 0 again.
The bottom line is that it's incredibly difficult to come up with excuses (much less real reasons) to carry out calculations to anywhere close to millions, billions/milliards or trillions/billions of digits. The universe isn't that big, doesn't contain that many atoms, etc.
Sounds like what logarithms were invented for.
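That is, instead of storing the number itself, you can often store its logarithm and work with that; multiplication then becomes addition of logs (a sketch with made-up magnitudes):

#include <cstdio>

int main()
{
    // Represent x and y only by log10(x) and log10(y); the numbers themselves
    // would have ~10^12 and ~2.5*10^11 digits and are never materialized.
    double log_x = 1.0e12;
    double log_y = 2.5e11;
    double log_product = log_x + log_y;   // x * y, as a logarithm
    std::printf("x*y ~= 10^%.0f\n", log_product);
    return 0;
}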
Without knowing what you intend to do with the number, it's impossible to accurately say how to represent it.
I've implemented a plotting class that is currently capable of handling integer values only. I would like advice about techniques/mechanisms for handling floating-point numbers. The library used is GDI.
Thanks,
Adi
At some point, they need to be converted to integers to draw actual pixels.
Generally speaking, however, you do not want to just cast each float to int, and draw -- you'll almost certainly get a mess. Instead, you need/want to scale the floats, then round the scaled value to an integer. In most cases, you'll want to make the scaling factor variable so the user can zoom in and out as needed.
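Something along these lines (a sketch; the scale and offset would come from your zoom level and viewport, and the names are illustrative):

#include <cmath>

// Map a data-space x coordinate to a pixel column.
int toPixelX(float x, float scale, float offsetPixels)
{
    // Scale first, then round to the nearest pixel -- don't just truncate.
    return static_cast<int>(std::lround(x * scale + offsetPixels));
}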
Another possibility is to let the hardware handle most of the work -- you could use OpenGL (for one example) to render your points, leaving them as floating point internally, and letting the driver/hardware handle issues like scaling and conversion to integers. This has a rather steep cost up-front (learning enough OpenGL to get it to do anything useful), but can have a fairly substantial payoff as well, such as fast, hardware-based rendering, and making it relatively easy to handle some things like scaling and (if you ever need it) being able to display 3D points as easily as 2D.
Edit (mostly in response to a comment): Ultimately it comes down to this: the resolution of a screen is lower than the resolution of a floating-point number. For example, a really high-resolution screen might display 2048 pixels horizontally -- that's 11 bits of resolution. Even a single-precision floating-point number has around 24 bits of precision. No matter how you do it, reducing 24-bit resolution to 11-bit resolution is going to lose something -- usually a lot.
That's why you pretty nearly have to make your scaling factor variable -- so the user can choose whether to zoom out and see the whole picture with reduced resolution, or zoom in to see a small part at high resolution.
Since sub-pixel resolution was mentioned: it does help, but only a little. It's not going to resolve a thousand different items that map to a single pixel.
What do these float values represent? I will assume they are some co-ordinates. You will need to know two things:
The source resolution (i.e. the dpi at which these co-ordinates are drawn)
The range that you need to address
After that, this becomes a problem of scaling the points to suitable integer co-ordinates (based on your screen-resolution).
Edit: A simple formula will be:
X(dst) = X(src) * DPI(dst) / DPI(src)
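Or the same formula as code (a direct, illustrative transcription):

#include <cmath>

// Convert a source coordinate to device pixels given both resolutions.
int toDeviceX(float xSrc, float dpiDst, float dpiSrc)
{
    return static_cast<int>(std::lround(xSrc * dpiDst / dpiSrc));
}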
You'll have to convert them to integers and then pass them to functions like MoveTo() and LineTo().
Scale. For example, multiply all the integral values by 10. Multiply the floating point values by 10.0 and then truncate or round (your choice). Now plot as normal.
This will give you extra precision in your graphing. Just remember the scale factor when you look at the picture.
Otherwise convert the floats to int before plotting.
You can try to use GDI+ instead of GDI; it has functions that use float coordinates.