There are analytical expressions that permit calculating the curve resulting from the overlap of three intersecting spheres. There are also approximate methods that, using grids or other approaches, compute points belonging to this intersection with more or less accuracy. I wonder whether, for the latter, the calculation can somehow use special hardware functions of the GPU, with CUDA or OpenGL. I need it for a very compute-intensive number-crunching program, so trivial implementations are not acceptable because they are too slow; that is why I am considering the GPU option.
To test if a point (x, y, z) is in a sphere centered on (a, b, c) with radius r, the test is:
(x - a)^2 + (y - b)^2 + (z - c)^2 < r^2
Testing if the point is in multiple spheres is just a logical AND of similar expressions. This requires only subtraction, multiplication, and comparison; no special hardware functions are needed. You can write a CUDA kernel that does this with no problem.
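For illustration, a minimal sketch of that test in plain C++ (the names Sphere and insideAll are made up here; the same body could be dropped into a CUDA kernel, one thread per candidate point):

struct Sphere { float a, b, c, r; };

// True if (x, y, z) lies strictly inside every sphere in the array.
bool insideAll(float x, float y, float z, const Sphere* s, int n) {
    for (int i = 0; i < n; ++i) {
        float dx = x - s[i].a, dy = y - s[i].b, dz = z - s[i].c;
        if (dx * dx + dy * dy + dz * dz >= s[i].r * s[i].r)
            return false;   // outside (or on) sphere i
    }
    return true;
}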
The closest "specialized hardware" that might be applicable is the rsqrtf() function in CUDA, which computes 1/sqrt(x) in single precision to good accuracy with a single hardware instruction. You might use this to help calculate z values given x and y values of spheres; that could be useful for more sophisticated point-generation algorithms for this problem.
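As a sketch of that idea (assuming a sphere centered at (a, b, c) with radius r): the z values of the surface above a given (x, y) follow from the sphere equation, and the square root can be written as s * (1/sqrt(s)), which is exactly the factor rsqrtf would compute on the device. Here plain C++ with 1.0f/std::sqrt(s) stands in for it:

#include <cmath>

// Sketch: z values of the sphere surface above (x, y), if any.
bool sphereZ(float x, float y, float a, float b, float c, float r,
             float* zLow, float* zHigh) {
    float dx = x - a, dy = y - b;
    float s = r * r - dx * dx - dy * dy;
    if (s <= 0.0f) return false;          // (x, y) misses the sphere
    float h = s * (1.0f / std::sqrt(s));  // == sqrt(s); use rsqrtf(s) in a CUDA kernel
    *zLow  = c - h;
    *zHigh = c + h;
    return true;
}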
Using spherical harmonics for lighting, I faced a problem for big enough bandwidths. The accuracy of an approximation by the first n^2 terms became worse and worse starting from n = 7. I looked into the definition of the associated Legendre polynomials and found that there is a ratio of factorials (l - m)! / (l + m)! in the normalization constant. For n = 7, (l + m)! can be as large as 12!. I have to use float (the IEEE-754 32-bit floating-point type) due to the nature of GPUs.
Now I think that tgamma from C/C++ might be more appropriate than a naive calculation of the factorial by definition. Even more: maybe there is a good (approximation) formula for the ratio of gamma functions (of two large arguments).
Is there a good, stable approach to calculating the gamma function (for large positive integers) in shaders?
Of course I could just save a lookup table (matrix) for all the possible combinations of values in the numerator and denominator, but I want an alternative (space-efficient) approach.
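As one host-side C++ illustration of the "ratio of gamma functions" idea (using std::lgamma; a shader would need its own approximation or a small table), the ratio can be formed as a difference of log-gammas so the huge intermediate factorials never appear:

#include <cmath>

// Sketch: (l - m)! / (l + m)! as exp(lgamma(l-m+1) - lgamma(l+m+1)),
// so neither factorial is ever formed explicitly.
float factorialRatio(int l, int m) {
    double logRatio = std::lgamma(double(l - m + 1)) - std::lgamma(double(l + m + 1));
    return (float)std::exp(logRatio);
}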
I have simple code which flags nodes within the region enclosed by a cylinder. On implementing the code, I observe a mild tilt of the cylinder in the case of a 90 degree orientation.
The actual issue:
The algorithm is implemented in Fortran. The code checks whether the points of a Cartesian grid lie inside the cylinder. The following is the test case:
The cylinder makes an angle of 90 degrees in the yz-plane with respect to the y-axis. Therefore, the orientation vector $\vec{o}$ is (0, 1, 0).
Case 1:
The orientation vector is assigned directly with $\vec{o}=(0.0,1.0,0.0)$. This results in a perfect cylinder with $\theta=90$.
Case 2:
The orientation vector is specified using the double-precision Fortran intrinsics dsin and dcos as $\vec{o}=(0.0, \sin(\pi/2.0), \cos(\pi/2.0))$, with the value of $\pi$ assigned to more than 20 significant decimal digits. The resulting cylinder shows a mild tilt.
The highlighted region (in the figure) indicates the extra material due to the tilt of the cylinder with respect to the Cartesian axes. I also tried an architecture-specific maximum-precision value of pi; this results in the same problem.
It looks as if the actual angle made by the cylinder is not 90 degrees. Can anyone suggest a valid solution to this problem? I need to use the built-in trigonometric functions for arbitrary angles and am looking for an accurate cell-flagging method.
Note: All operations are performed with double-precision accuracy.
The actual function is below; rk is a parameter defined with value 8.
pure logical function in_particle(p,px,x)
  type(md_particle_type),intent(in) :: p
  real(kind=rk),intent(in) :: px(3),x(3)
  real(kind=rk) :: r(3),rho(3),rop(2),ro2,rdiff,u

  rop = particle_radii(p) ! (/R_orth,R_para/)
  ro2 = rop(1)**2
  rdiff = rop(2) - rop(1)
  r = x-px

  ! Case 1:
  ! u = dot_product((/0.0_rk,-1.0_rk,0.0_rk/),r)
  ! rho = r-u*(/0.0_rk,-1.0_rk,0.0_rk/)

  ! Case 2:
  u = dot_product((/0.0_rk,-dsin(pi/2.0_rk),dcos(pi/2.0_rk)/),r)
  rho = r-u*(/0.0_rk,-dsin(pi/2.0_rk),dcos(pi/2.0_rk)/)

  if((u.le.rdiff).and.(u.ge.-rdiff)) then
    in_particle = dot_product(rho,rho) < ro2
  else
    in_particle = .false.
  end if
end function in_particle
Note: The trigonometric operations are done inside the code here to explain the problem better. However, the original code reads the orientation in vector form from the user and then converts this information to quaternions for the particle-particle collision operations. On converting the quaternions back to an orientation vector, this error is amplified even further. Even before the start of a collision, the orientation of the cylinder tends to be off by 2 lattice cells.
cos(pi/2) is not necessarily going to give you exactly 0, no matter how exact you make the cos calculation, and no matter how many digits of pi you have, because:
pi, as an irrational number, will contain up to 1/2 ulp of error when represented as an FP number; and
sin and cos are not guaranteed by the IEEE-754 standard to be correctly rounded (or even implemented).
Now, sin(pi/2) is extremely likely to come out as 1 regardless of precision and FP architecture, simply because sin has such a small derivative around pi/2 (where its value is 1); with single-precision floats, it should come out to 1 if you're anywhere within about 3e-4 of the exact value of pi/2. The problematic call is the cos, which has lots of precision to play with around 0 and a derivative of about -1 in that neighborhood.
Still, we're talking about extremely small values here. I think what's really potentiating the problem here is the in/out test you're doing, combined with ordinary FP rounding rules. I would guess, in fact, that if you were to bias your test points by, say, a quarter of the grid quantum, you'd see all straight verticals in your voxelization (though it might not be symmetrical around the minor axes).
Another option would be to actually discard some precision from your sin/cos calculation before doing the dot product, effectively quantizing your axes.
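A tiny C++ sketch of what such quantization could look like (the epsilon is an arbitrary illustrative choice, not a recommendation):

#include <cmath>

// Snap near-zero and near-±1 axis components produced by sin/cos
// before using them, effectively quantizing the axis.
double quantizeComponent(double v, double eps = 1e-12) {
    if (std::fabs(v) < eps)       return 0.0;
    if (std::fabs(v - 1.0) < eps) return 1.0;
    if (std::fabs(v + 1.0) < eps) return -1.0;
    return v;
}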
Short answer: Create a table of sin and cos of common angles (0, pi/6, pi/4, pi/3, pi/2, pi and their multiples) and compute only for uncommon angles. The reason being that errors with uncommon angles will be tolerated by most people while errors with common angles will likely not be tolerated.
Explanation:
Because floating-point computation is not exact (that is its nature), you sometimes need a bit of compromise between the accuracy and the readability of the code.
One way of doing that is to avoid computing something that is known exactly. To do that, you can check the value of the angle and do the actual computation only if it is not an obvious angle. For example, angles of 0, 90, 180 and 270 degrees have obvious values of sin and cos. More generally, the cos and sin of the common angles (0, pi/6, pi/4, pi/3, pi/2, pi and their multiples) are known exactly (even if they are irrational numbers).
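A minimal C++ sketch of that idea, handling only multiples of pi/2 (the tolerance is illustrative; extending the table to pi/6, pi/4, pi/3 works the same way):

#include <cmath>

// Return exact sin/cos for the "obvious" angles, fall back to the library otherwise.
void sinCosExact(double angle, double pi, double* s, double* c) {
    double k = angle / (pi / 2.0);          // number of quarter turns
    double kRound = std::floor(k + 0.5);
    if (std::fabs(k - kRound) < 1e-9) {     // an obvious angle
        static const double sinTab[4] = { 0.0, 1.0, 0.0, -1.0 };
        static const double cosTab[4] = { 1.0, 0.0, -1.0, 0.0 };
        int idx = ((int)kRound % 4 + 4) % 4;
        *s = sinTab[idx];
        *c = cosTab[idx];
    } else {
        *s = std::sin(angle);
        *c = std::cos(angle);
    }
}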
A small amount of background: I am working on a converter that bridges between a map maker (Tiled), which outputs XML, and an engine (Angel2D), which takes Lua tables as input. Most of this is straightforward.
However, Tiled outputs in pixel offsets (integers of absolute values), while Angel2D inputs OpenGL units (floats of relative values); a conversion factor between these two is needed (for example, 32px = 1gu). Since OpenGL units are abstract, and the camera can zoom in or out if the objects are too small or big, the actual conversion factor isn't important; I could use a random number, and the user would merely have to zoom in or out.
But it would be best if the conversion factor was selected such that most numbers outputted were small and whole (or fractions of small whole numbers), because that makes it easier to work with (and the whole point of the OpenGL units is that they are easy to work with).
How would I find such a conversion factor reliably?
My first attempt was to use the smallest number given; this resulted in no fractions below 1, but often led to lots of decimal places where the factors didn't line up.
Then I tried the mode of the sequence, which led to the largest number of 1's possible, but often led to very long floats for background images.
My current approach takes the GCD of the whole sequence, which, when it works, works great, but can easily be thrown off course by a single bad apple.
Note that while I could easily just pass the numbers I am given along, or pick some fixed factor, or use one of the conversions I specified above, I am looking for a method to reliably scale this list of integers to small, whole numbers or simple fractions, because this would most likely be unsurprising to the end user; this is not a one off conversion.
The end users tend to use 1.0 as their "base" for manipulations (because it's simple and obvious), so it would make more sense for the sizes of entities to cluster around this.
How about the 'largest number which is a factor of some % of the values'?
So the GCD is the 'largest number which is a factor of 100%' of the values.
You could pick the largest number which is a factor of, say, 60% of the values. I don't know if it's a technical term, but it's sort of a 'rough GCD if not a precise GCD'.
You might have to do trial and error to find it (possibly a binary search). But you could also consider sampling, i.e. if you have a million data points, just pick 100 or 1000 at random to find a number which divides evenly into your goal percentage of the sample set; that might be good enough.
Some crummy pseudocode, cleaned up into compilable C++:

#include <algorithm>
#include <vector>

/** Return the fraction of values in sampleset for which x is a factor. */
double percentIsFactorOf(int x, const std::vector<int>& sampleset) {
    int factorCount = 0;
    for (int sample : sampleset)
        if (sample % x == 0) factorCount++;
    return (double)factorCount / sampleset.size();
}

/** Find the largest value which is a factor of goalPercentage of sampleset. */
int findGoodEnoughCommonFactor(const std::vector<int>& sampleset, double goalPercentage) {
    // Slow O(n^2) algorithm here - add binary search, sampling, or something
    // smarter to improve it if you like.
    int start = *std::max_element(sampleset.begin(), sampleset.end());
    while (percentIsFactorOf(start, sampleset) < goalPercentage)
        start--;
    return start;
}
Suppose your input is in N^2 (two-dimensional space over the natural numbers, i.e. non-negative integers), and you need to output to R^2 (two-dimensional space over the real numbers, which in this case will be represented/approximated with a float).
Forget about scaling for a minute and let the output be of the same scale as the input. The first step is to realize that the input coordinate <0, 0> does not represent <0, 0> in the output; it represents <0.5f, 0.5f>, the center of the pixel. Similarly, the input <2, 3> becomes <2.5, 3.5>. In general the conversion can be performed like this:
float x_prime = (float)x + 0.5f;
float y_prime = (float)y + 0.5f;
Next, you probably want to pick a scaling factor, as you have mentioned. I've always found it useful to pick some real-world unit, usually meters. This way you can reason about other physical aspects of what you're trying to model, because they have units; e.g. speeds and accelerations can now be in meters per second, or meters per second squared. How many meters tall or wide is the thing you are making? How many meters is a pixel? Pick something that makes sense, and then your formula becomes this:
float x_prime = ((float)x + 0.5f) * (float)units_per_pixel;
float y_prime = ((float)y + 0.5f) * (float)units_per_pixel;
You may not want all of your output coordinates to be in the positive quadrant; that is, you may want the origin to be in the center of the object. If so, you probably want your starting coordinate system to include negative integers, or to provide some offset to the true center. Let's say you provide a pixel offset to the true center. Your conversion then becomes this:
float x_prime = ((float)x + 0.5f - (float)x_offset) * (float)units_per_pixel;
float y_prime = ((float)y + 0.5f - (float)y_offset) * (float)units_per_pixel;
Discarding your background information, I understand that the underlying problem you are trying to solve is the following:
Given a finite number of (positive) integers {x_1, ... x_N} find some (rational) number f such that all x_i / f are "nice".
If you insist on "nice" meaning integer and as small as possible, then f = GCD is the (mathematically) exact answer to this question. There just is nothing "better"; if the GCD is 1, tough luck.
If "nice" is supposed to mean rational with small numerator and denominator, the question gets more interesting and depending on what "small" means, find your trade off between small absolute value (f = max) and small denominator (f = GCD). Notice, however, that small numerator/denominator does not mean small floating point representation, e.g. 1/3 = 0.333333... in base 10.
If you want short floating points, make sure that f is a power of your base, i.e. 10 or 2, depending on whether the numbers should look short to the user or actually have a reasonable machine representation. This is what is used for scientific representation of floating points, which might be the best answer to the question of how to make decimal numbers look nice in the first place.
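For reference, a small C++ sketch of computing f = GCD over the whole list (the name gcdOfAll is made up):

#include <numeric>   // std::gcd (C++17)
#include <vector>

// The GCD of a list of positive integers, i.e. the largest f such that
// every x_i / f is still an integer.
int gcdOfAll(const std::vector<int>& values) {
    int g = 0;                  // gcd(0, x) == x, so 0 is the identity
    for (int v : values)
        g = std::gcd(g, v);
    return g;
}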
I have no idea what you are talking about with "GL units".
At the most abstract level, GL has no unit. Vertex coordinates are in object-space initially, and go through half a dozen user-defined transformations before they eventually produce coordinates (window-space) with familiar units (pixels).
You are absolutely correct that even in window-space, coordinates are still not whole numbers. You would not want this in fact, or triangles would jump all over the place and generally would not resemble triangles if their vertex positions were snapped to integer pixel coordinates.
Instead, GL throws sub-pixel precision into the mix. Coordinates still ultimately wind up quantized to integer values, but each integer may cover 1/256th of a pixel given 8-bit sub-pixel precision. Pixel coverage testing is done at the sub-pixel level (the original answer illustrated this with a pixel-coverage diagram from microsoft.com).
GL never attempts to find any conversion factor like you are discussing, it just splits the number space for pixel coordinates up into a fixed division between integral and fractional... fixed-point in other words. You might consider doing the same thing.
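A tiny sketch of that fixed-point split, using the 8 sub-pixel bits mentioned above (plain C++, illustrative only):

#include <cmath>

// Convert a floating-point pixel coordinate to 24.8 fixed point
// (8 fractional bits, i.e. each step is 1/256th of a pixel) and back.
int toFixed8(float coord)   { return (int)std::lround(coord * 256.0f); }
float fromFixed8(int fixed) { return fixed / 256.0f; }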
You can recycle the code you probably already use for vector normalisation to normalise the values to fit within a maximum value of 1; for example:
The formula for 3D normalisation of a vector works fine here.
Get the length first:
|a| = sqrt((ax * ax) + (ay * ay) + (az * az))
Then you will need to divide the values of each component by the length:
x = ax/|a|
y = ay/|a|
z = az/|a|
Now all the x, y, z values will fall into the range -1 to 1, the same as the OpenGL base coordinate system.
I know this does not generate the whole-number system you would like; however, it does give a smaller, more unified feel to the range.
Say you want to limit the range to whole numbers only: simply use a function like the following, which will take the normalised value and convert it to an int-only range value:
#include <algorithm> // this allows the use of std::min

int maxVal = 256;

unsigned char convertToSpread(float floatValueToConvert){
    return (unsigned char) (std::min((maxVal-1), (int) (floatValueToConvert * maxVal)));
}

The above will spread your values between 0 and 255. Simply increase the value of maxVal to what you need, and change the unsigned char to a datatype which suits your needs.
So if you want 1024 values, simply change maxVal to 1024 and unsigned char to unsigned int.
Hope this helps, however, let me know if you need more information as well, and I can elaborate:)
I am implementing a conventional (that is, not fast), separable Fourier transform for images. I know that in floating point a sum over one period of sin or cos at equally spaced samples is not perfectly zero, and that this is more of a problem with the conventional transform than with the fast one.
The algorithm works with 2D double arrays and is correct. The inverse is handled inside the routine (via a sign flag of type double and a conditional check when using the asymmetric formula), not outside with conjugations. The results are almost exactly as expected, so this is a question about details:
When I perform a forward transform, save logarithmed magnitude and angle to images, reload them, and do an inverse transform, I experience different types of rounding errors with different types of implemented formulas:
F(u,v) = Sum(x=0->M-1) Sum(y=0->N-1) f(x,y) * e^(-i*2*pi*u*x/M) * e^(-i*2*pi*v*y/N)
f(x,y) = 1/(M*N) * (same double sum, with the sign of the exponents flipped)

F(u,v) = 1/sqrt(M*N) * (like the first line)
f(x,y) = 1/sqrt(M*N) * (same double sum, with the sign of the exponents flipped)
So the first one is the asymmetric transform pair, and the second one the symmetric pair. With the asymmetric pair, the rounding errors occur more in the bright spots of the image (some pixels are rounded slightly outside the value range, e.g. to 256). With the symmetric pair, the errors occur more in the constant mid-range areas of the image (no exceeding of the value range!). In total, the symmetric pair seems to produce a bit more rounding error.
It also depends on the input: when the image is stored in [0,255], the rounding errors differ from those when it is stored in [0,1].
So my question: how should an optimal, most accurate algorithm be implemented (theoretically, no code)? The asymmetric or the symmetric pair? An input value range of [0,255] or [0,1]? And how should the result be linearly scaled before saving the logarithmed magnitude to a file?
Edit:
My algorithm simply computes the separable asymmetric or symmetric DFT formula. The factors are decomposed into real and imaginary parts using Euler's identity, then expanded and summed up separately as real and imaginary parts:
sum_re += f_re * cos(-mode*pi*((2.0*v*y)/N)) - // mode = 1 for forward, -1
f_im * sin(-mode*pi*((2.0*v*y)/N)); // for inverse transform
// sum_im is formed analogously (terms swapped in the known way), with + instead of -
In my eyes, this grouping of values inside cos and sin should give the lowest rounding error (compared to e.g. cos(-mode*2*pi*v*y/N)), because the inexactly rounded transcendental pi is multiplied/divided only once instead of several times. Is that right?
The scale factor 1/(M*N) or 1/sqrt(M*N) is applied separately after each separable pass, outside of the innermost sum. Would it be better inside? Or applied entirely at the end of both passes?
For some deeper analysis, I have dropped the input -> transform -> save-to-file -> read-from-file -> inverse transform -> output workflow and chosen to compare directly in double precision: input -> transform -> inverse transform -> output.
Here are the results for a real-life 704x528 8-bit image (delta = maximum absolute difference between the real parts of input and output):
with input inside [0,1] and asymmetric formula: delta = 2.6609e-13 (corresponds to 6.785295e-11 for [0,255] range).
with input inside [0,1] and symmetric formula: delta = 2.65232e-13 (corresponds to 6.763416e-11 for [0,255] range).
with input inside [0,255] and asymmetric formula: delta = 6.74731e-11.
with input inside [0,255] and symmetric formula: delta = 6.7871e-11.
These are not really significant differences; however, the full-range input with the asymmetric transform performs best. I think the values may get worse with 16-bit input.
But in general I see that the issues I experienced are caused more by the scaling-before-saving-to-file (and its inverse) rounding errors than by the rounding errors of the transform itself.
However, I am curious: which implementation of the Fourier transform is the most commonly used, the symmetric or the asymmetric one? Which value range is generally used for the input, [0,1] or [0,255]? And for the spectra usually shown in log scale: is, e.g., the range [0, M*N] after an asymmetric transform of a [0,1] input directly log-scaled to [0,255], or first linearly scaled to [0, 255*M*N]?
The errors you report are tiny, normal, and generally can be ignored. Simply scale your results and clamp any results outside the target interval to the endpoints.
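For example, the scale-and-clamp step could look like this sketch (the [0,255] target interval is just an example):

#include <algorithm>   // std::clamp (C++17)

// Scale a reconstructed sample and clamp it to the target interval.
double scaleAndClamp(double value, double scale) {
    return std::clamp(value * scale, 0.0, 255.0);
}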
In library implementations of FFTs (that is, FFT routines written to be used generally by diverse applications, not custom designed for a single application), little regard is given to scaling; the routine often simply returns data that has been naturally scaled by the arithmetic, with no additional multiplication operations used to adjust the scale. This is because the scale is often either irrelevant for the application (e.g., finding the frequencies with the largest energies works no matter what the scale is) or that the scale may be distributed through multiply operations and performed just once (e.g., instead of scaling in a forward transform and in an inverse transform, the application can get the same effect by explicitly scaling just once). So, since scaling is often not needed, there is no point in including it in a library routine.
The target interval that data are scaled to depends on the application.
Regarding the question on what transform to use (logarithmic or linear) for showing spectra, I cannot advise; I do not work with visualizing spectra.
Scaling causes roundoff errors. Hence, solution 1 (which scales once) is better than solution 2 (which does it twice). Similarly, scaling once after summation is better than scaling everything before summation.
Do you run y from 0 to 2*N or from -N to +N? Mathematically it's the same, but you have an extra bit of precision in the latter case.
BTW, what is mode doing in cos(-mode * stuff)?
I am writing a physics simulator in C++ and am concerned about robustness. I've read that catastrophic cancellation can occur in floating point arithmetic when the difference of two numbers of almost equal magnitude is calculated.
It occurred to me that this may happen in the simulator when the dot product of two almost orthogonal vectors is calculated.
However, the references I have looked at only discuss solving the problem by rewriting the equation concerned (e.g. the quadratic formula can be rewritten to eliminate the problem), but this doesn't seem to apply when calculating a dot product?
I guess I'd be interested to know if this is typically an issue in physics engines and how it is addressed.
One common trick is to make the accumulator variable a type with higher precision than the vectors themselves.
Alternatively, one can use Kahan summation when summing the terms (a sketch follows below).
Another approach is to use one of the various blocked dot product algorithms instead of the canonical algorithm.
One can of course combine both of the above approaches.
Note that the above concerns the general error behaviour of dot products, not specifically catastrophic cancellation.
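A minimal sketch of the Kahan-summation dot product mentioned above (plain C++, illustrative names):

#include <cstddef>

// Dot product with Kahan (compensated) summation: c carries the low-order
// bits lost when each product is added to the running sum.
float kahanDot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f, c = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        float y = a[i] * b[i] - c;   // next term, corrected by the previous error
        float t = sum + y;           // add it (low bits of y may be lost here)
        c = (t - sum) - y;           // recover what was lost
        sum = t;
    }
    return sum;
}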
You say in a comment that you have to calculate x1*x2 + y1*y2, where all variables are floats. So if you do the calculation in double-precision, you lose no accuracy at all, because double-precision has more than twice as many bits of precision as float (assuming your target uses IEEE-754).
Specifically: let xx, yy be the real numbers represented by the float variables x, y. Let xxyy be their product, and let xy be the result of the double-precision multiplication x * y. Then in all cases, xxyy is the real number represented by xy.
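In code, that argument boils down to something like this sketch (assuming IEEE-754 float and double):

// Each float*float product is exact when carried out in double,
// so only the final addition rounds, and it rounds once.
double dot2d(float x1, float y1, float x2, float y2) {
    return (double)x1 * (double)x2 + (double)y1 * (double)y2;
}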