Optimizing sphere fold function (GLSL) - opengl

There is a sphere fold function, which transforms space based on the distance from origin. There are two constant parameters: fR2 and mR2. The function looks like this:
const float fR2 = 1.0;
const float mR2 = 0.25;
vec3 s_fold(in vec3 v) {
float mag = dot(v, v);
if (mag < mR2)
{
v = v * fR2 / mR2;
}
else if (mag < fR2)
{
v = v * fR2 / mag;
}
return v;
}
It works, but the if-else branch makes it slow. I can get rid of the branching by using the step function:
float a = step(mR2 , mag);
float b = step(fR2 , mag);
float sc = dot( vec3(
1.0-a,
a*(1.0-b),
b
), vec3(
fR2 / mR2,
fR2 / mag,
1.0)
);
return sc*v;
It is now a bit faster, but I would like to optimize it further. I've found a one-line solution, but it gives me a different result:
return v*clamp(max(mR2/mag,mR2),0.0,fR2);
It is unclear to me how this can be calculated with a single clamp.

Your if-else branch is a function like res= v * F where F is one of three possible values: a constant (fR2/mR2), a (hyperbolic) variable (fR2/mag), and a constant again (1.0).
clamp(x, minVal, maxVal) does this same logic (const-var-const). So you can write:
res = v * clamp(fR2/mag, 1.0, fR2/mR2); //with fR2 >= mR2
Let's write it in other way:
res = v * fR2 * clamp(1.0/mag, 1.0/fR2, 1.0/mR2);
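We can avoid two divisions: fR2 >= mR2 > 0 implies 1/fR2 <= 1/mR2, and 1/x is decreasing for x > 0, so clamping 1/mag to [1/fR2, 1/mR2] is the same as taking the reciprocal of mag clamped to [mR2, fR2]: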
res = v * fR2 / clamp(mag, mR2, fR2);
The final code is
const float fR2 = 1.0;
const float mR2 = 0.25;
vec3 s_fold(in vec3 v) {
return v * fR2 / clamp(dot(v, v), mR2, fR2);
}
Both your step-based solution and the posted one-liner can divide by zero when mag = 0. The clamp version above cannot, because its divisor is never smaller than mR2.
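If you want to convince yourself that the branchy and the clamped versions agree, a quick CPU-side check is one option. The following is only a sketch in C++, not shader code: Vec3, fold_scale_branch and fold_scale_clamp are names made up for this illustration, and std::min/std::max stand in for GLSL's clamp.

```cpp
#include <algorithm>
#include <cassert>

// CPU-side stand-in for GLSL vec3 (hypothetical, only what the fold needs).
struct Vec3 { float x, y, z; };

const float fR2 = 1.0f;   // same constants as in the question
const float mR2 = 0.25f;

// Scale factor chosen by the original if/else version.
float fold_scale_branch(float mag) {
    if (mag < mR2) return fR2 / mR2;
    if (mag < fR2) return fR2 / mag;
    return 1.0f;
}

// Scale factor of the single-clamp version.
// The divisor is never below mR2, so mag == 0 is harmless.
float fold_scale_clamp(float mag) {
    assert(mR2 > 0.0f && fR2 >= mR2);
    return fR2 / std::min(std::max(mag, mR2), fR2);
}

// The fold itself, using the clamp form.
Vec3 s_fold(Vec3 v) {
    const float k = fold_scale_clamp(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{v.x * k, v.y * k, v.z * k};
}
```

Comparing fold_scale_branch(m) with fold_scale_clamp(m) for m below mR2, between mR2 and fR2, and above fR2 covers all three cases of the original function.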

Related

How to port and debug math functions into shader code?

I'm trying to learn to apply the techniques to sdf render a landscape from Painting a Landscape with Maths. In the video, Quilez shows this formula:
But I seem to be implementing it incorrectly in glsl. I'm very new to shaders. This is what I have so far:
#define PI 3.141593
// a_ij = 2 {uv(u+v)} - 1
float uv_to_cooef(in vec2 uv) {
return 2.0 * fract(uv.x * uv.y * (uv.x + uv.y)) - 1.0;
}
// S(a,b,x) = 3λ^2 - 2λ^3
float smoothstep01(in float x) {
return smoothstep(0.0, 1.0, x);
}
float terrain(in vec2 p) {
vec2 ij = floor(p);
vec2 i = vec2(1.0, 0.0);
vec2 j = vec2(0.0, 1.0);
float s = 500.0;
float a = uv_to_cooef(s * fract(ij / PI));
float b = uv_to_cooef(s * fract((ij + i) / PI));
float c = uv_to_cooef(s * fract((ij + j) / PI));
float d = uv_to_cooef(s * fract((ij + i + j) / PI));
float s_xi = smoothstep01(p.x - ij.x);
float s_yj = smoothstep01(p.y - ij.y);
return a
+ (b - a) * s_xi
+ (c - a) * s_yj;
+ (a - b - c + d) * s_xi * s_yj;
}
// Use raymarching setup in https://www.shadertoy.com/view/tdS3DG but change map to:
vec2 map( in vec3 p, int id ) {
return vec2(p.y - terrain(p.xz), 1.0);
}
I get discontinuous junk:
What are some good ways to debug converting math functions to glsl?
Replacing map with a simpler continuous function produces nice results:
vec2 map( in vec3 p, int id ) {
float d2 = p.y + sin(p.x / 3.0) * cos(p.y / 7.0);
return vec2(d2,2.0);
}
So I think that means my function isn't continuous, but how to debug that? Break it into 2d functions and graph it on desmos?
The completed sdf scene is on shadertoy, but I want to understand how to get from math to that.
Turns out my problem was an extra semicolon and glsl didn't complain about the pointless statement on the next line.
return a
+ (b - a) * s_xi
+ (c - a) * s_yj; // bad semicolon
+ (a - b - c + d) * s_xi * s_yj;
The steps I took to try to debug:
Visualize the function in another format. I ported the function to desmos and it was discontinuous.
I fixed my desmos (the definition of a(i,j) was wrong), but couldn't see how the fixed version was different from my glsl code. My next step was going to be to reduce the function to a simpler continuous one in desmos and then apply the same changes to my glsl code.
Instead, I noodled around with other code and noticed the semicolon by accident. So luck?
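Another way to debug this kind of thing is to port the function to the CPU and probe it numerically. Below is a rough C++ sketch of that idea, with the semicolon fixed. The helpers corner and jump_across_x are names invented for this example, and double is used instead of float, so it is not a bit-exact copy of the shader.

```cpp
#include <cassert>
#include <cmath>

const double PI_APPROX = 3.141593;  // same value as the shader's #define PI

double fract(double x) { return x - std::floor(x); }

// a_ij = 2 {uv(u+v)} - 1
double uv_to_coef(double u, double v) {
    return 2.0 * fract(u * v * (u + v)) - 1.0;
}

// Equivalent of GLSL smoothstep(0.0, 1.0, x)
double smoothstep01(double x) {
    x = std::fmin(std::fmax(x, 0.0), 1.0);
    return x * x * (3.0 - 2.0 * x);
}

// Pseudo-random coefficient attached to the lattice corner (i, j).
double corner(double i, double j) {
    const double s = 500.0;
    return uv_to_coef(s * fract(i / PI_APPROX), s * fract(j / PI_APPROX));
}

double terrain(double px, double py) {
    const double i = std::floor(px), j = std::floor(py);
    const double a = corner(i, j),     b = corner(i + 1.0, j);
    const double c = corner(i, j + 1.0), d = corner(i + 1.0, j + 1.0);
    const double sx = smoothstep01(px - i), sy = smoothstep01(py - j);
    // One expression, no stray semicolon before the last term.
    return a + (b - a) * sx + (c - a) * sy + (a - b - c + d) * sx * sy;
}

// Height difference across the cell edge x = edge, sampled at height y.
double jump_across_x(double edge, double y, double eps) {
    assert(eps > 0.0);
    return std::fabs(terrain(edge + eps, y) - terrain(edge - eps, y));
}
```

If jump_across_x(1.0, 0.5, 1e-6) comes out tiny, the function is continuous across that cell edge. Dropping the last term of the return expression (which is what the stray semicolon did) and running the same probe is a cheap way to see the difference.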

Efficient floating point scaling in C++

I'm working on my fast (and accurate) sin implementation in C++, and I have a problem regarding the efficient angle scaling into the +- pi/2 range.
My sin function for +-pi/2 using Taylor series is the following
(Note: FLOAT is a macro expanded to float or double just for the benchmark)
/**
* Sin for 'small' angles, accurate on [-pi/2, pi/2], fairly accurate on [-pi, pi]
*/
// To switch between float and double
#define FLOAT float
FLOAT
my_sin_small(FLOAT x)
{
constexpr FLOAT C1 = 1. / (7. * 6. * 5. * 4. * 3. * 2.);
constexpr FLOAT C2 = -1. / (5. * 4. * 3. * 2.);
constexpr FLOAT C3 = 1. / (3. * 2.);
constexpr FLOAT C4 = -1.;
// Correction for sin(pi/2) = 1, due to the ignored taylor terms
constexpr FLOAT corr = -1. / 0.9998431013994987;
const FLOAT x2 = x * x;
return corr * x * (x2 * (x2 * (x2 * C1 + C2) + C3) + C4);
}
So far so good... The problem comes when I try to scale an arbitrary angle into the +-pi/2 range. My current solution is:
FLOAT
my_sin(FLOAT x)
{
constexpr FLOAT pi = 3.141592653589793238462;
constexpr FLOAT rpi = 1 / pi;
// convert to +-pi/2 range
int n = std::nearbyint(x * rpi);
FLOAT xbar = (n * pi - x) * (2 * (n & 1) - 1);
// (2 * (n & 1) - 1) is a sign correction (see below)
return my_sin_small(xbar);
};
I made a benchmark, and I'm losing a lot for the +-pi/2 scaling.
The int(angle/pi + 0.5) trick is a no-go: it is limited to int precision and also requires branching on the sign, and I try to avoid branches...
What should I try to improve the performance for this scaling? I'm out of ideas.
Benchmark results for float. (In the benchmark the angle could be out of the validity range for my_sin_small, but for the bench I don't care about that...):
Benchmark results for double.
Sign correction for xbar in my_sin():
Algo accuracy compared to python sin() function:
Candidate improvements
Convert the radians x to rotations by dividing by 2*pi.
Retain only the fraction so we have an angle (-1.0 ... 1.0). This simplifies the OP's modulo step to a simple "drop the whole number" step instead. Going forward with different angle units simply involves a co-efficient set change. No need to scale back to radians.
For positive values, subtract 0.5 so we have (-0.5 ... 0.5) and then flip the sign. This centers the possible values about 0.0 and makes for better convergence of the approximating polynomial as compared to the math sine function. For negative values - see below.
Call my_sin_small1() that uses this (-0.5 ... 0.5) rotations range rather than [-pi ... +pi] radians.
In my_sin_small1(), fold constants together to drop the corr * step.
Rather than use the truncated Taylor's series, use a more optimal set. IMO, this will provide better answers, especially near +/-pi.
Notes: No int to/from float code. With more analysis, possible to get a better set of coefficients that fix my_sin(+/-pi) closer to 0.0. This is just a quick set of code to demo less FP steps and good potential results.
C like code for OP to port to C++
FLOAT my_sin_small1(FLOAT x) {
static const FLOAT A1 = -5.64744881E+01;
static const FLOAT A2 = +7.81017968E+01;
static const FLOAT A3 = -4.11145353E+01;
static const FLOAT A4 = +6.27923581E+00;
const FLOAT x2 = x * x;
return x * (x2 * (x2 * (x2 * A1 + A2) + A3) + A4);
}
FLOAT my_sin1(FLOAT x) {
static const FLOAT pi = 3.141592653589793238462;
static const FLOAT pi2i = 1/(pi * 2);
x *= pi2i;
FLOAT xfraction = 0.5f - (x - truncf(x));
return my_sin_small1(xfraction);
}
For negative values, use -my_sin1(-x) or like code to flip the sign - or add 0.5 in the above minus 0.5 step.
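A sketch of the first option, written as C++ with float in place of the FLOAT macro. my_sin1_any is a name made up here, and the sign handling uses a plain conditional, which the compiler may or may not turn into a branch.

```cpp
#include <cassert>
#include <cmath>

// Polynomial from above, valid on (-0.5 ... 0.5) rotations.
float my_sin_small1(float x) {
    const float A1 = -5.64744881E+01f;
    const float A2 = +7.81017968E+01f;
    const float A3 = -4.11145353E+01f;
    const float A4 = +6.27923581E+00f;
    const float x2 = x * x;
    return x * (x2 * (x2 * (x2 * A1 + A2) + A3) + A4);
}

// Non-negative x only, same steps as my_sin1() above.
float my_sin1(float x) {
    const float pi2i = 1.0f / (2.0f * 3.14159265358979323846f);
    x *= pi2i;                                   // radians -> rotations
    return my_sin_small1(0.5f - (x - std::trunc(x)));
}

// Hypothetical wrapper: sine is odd, so reduce |x| and restore the sign.
float my_sin1_any(float x) {
    assert(std::isfinite(x));
    const float y = my_sin1(std::fabs(x));
    return x < 0.0f ? -y : y;
}
```

For example, my_sin1_any(-1.0f) returns the negated value of my_sin1_any(1.0f), so the accuracy for negative inputs matches the table for positive ones.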
Test
#include <math.h>
#include <stdio.h>
int main(void) {
for (int d = 0; d <= 360; d += 20) {
FLOAT x = d / 180.0 * M_PI;
FLOAT y = my_sin1(x);
printf("%12.6f %11.8f %11.8f\n", x, sin(x), y);
}
}
Output
0.000000 0.00000000 -0.00022483
0.349066 0.34202013 0.34221691
0.698132 0.64278759 0.64255589
1.047198 0.86602542 0.86590189
1.396263 0.98480775 0.98496443
1.745329 0.98480775 0.98501128
2.094395 0.86602537 0.86603642
2.443461 0.64278762 0.64260530
2.792527 0.34202022 0.34183803
3.141593 -0.00000009 0.00000000
3.490659 -0.34202016 -0.34183764
3.839724 -0.64278757 -0.64260519
4.188790 -0.86602546 -0.86603653
4.537856 -0.98480776 -0.98501128
4.886922 -0.98480776 -0.98496443
5.235988 -0.86602545 -0.86590189
5.585053 -0.64278773 -0.64255613
5.934119 -0.34202036 -0.34221727
6.283185 0.00000017 -0.00022483
Alternate code below makes for better results near 0.0, yet might cost a tad more time. OP seems more inclined to speed.
FLOAT xfraction = 0.5f - (x - truncf(x));
// vs.
FLOAT xfraction = x - truncf(x);
if (xfraction >= 0.5f) xfraction -= 1.0f;
[Edit]
Below is a better set with about 10% reduced error.
-56.0833765f
77.92947047f
-41.0936875f
6.278635918f
Yet another approach:
Spend more time (code) to reduce the range to ±pi/4 (±45 degrees); then it is possible to use only 3 or 2 terms of a polynomial resembling the usual Taylor series.
float sin_quick_small(float x) {
const float x2 = x * x;
#if 0
// max error about 7e-7
static const FLOAT A2 = +0.00811656036940792f;
static const FLOAT A3 = -0.166597759850666f;
static const FLOAT A4 = +0.999994132743861f;
return x * (x2 * (x2 * A2 + A3) + A4);
#else
// max error about 0.00016
static const FLOAT A3 = -0.160343346851626f;
static const FLOAT A4 = +0.999031566686144f;
return x * (x2 * A3 + A4);
#endif
}
float cos_quick_small(float x) {
return cosf(x); // TBD code.
}
float sin_quick(float x) {
if (x < 0.0) {
return -sin_quick(-x);
}
int quo;
float x90 = remquof(fabsf(x), 3.141592653589793238462f / 2, &quo);
switch (quo % 4) {
case 0:
return sin_quick_small(x90);
case 1:
return cos_quick_small(x90);
case 2:
return sin_quick_small(-x90);
case 3:
return -cos_quick_small(x90);
}
return 0.0;
}
int main() {
float max_x = 0.0;
float max_error = 0.0;
for (int d = -45; d <= 45; d += 1) {
FLOAT x = d / 180.0 * M_PI;
FLOAT y = sin_quick(x);
double err = fabs(y - sin(x));
if (err > max_error) {
max_x = x;
max_error = err;
}
printf("%12.6f %11.8f %11.8f err:%11.8f\n", x, sin(x), y, err);
}
printf("x:%.6f err:%.6f\n", max_x, max_error);
return 0;
}
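cos_quick_small() is left as TBD above. One possible stand-in, purely as an assumption and not a fitted coefficient set like the sine ones, is the truncated Taylor series 1 - x^2/2 + x^4/24. On ±pi/4 the alternating-series remainder bounds its error by x^6/720, roughly 3.3e-4 at the ends of the range. cos_taylor_small is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the TBD cos_quick_small():
// cos(x) ~ 1 - x^2/2 + x^4/24, intended for |x| <= pi/4 only.
float cos_taylor_small(float x) {
    assert(std::fabs(x) <= 0.7854f);  // pi/4, rounded up slightly
    const float x2 = x * x;
    return 1.0f + x2 * (-0.5f + x2 * (1.0f / 24.0f));
}
```

A fitted set of coefficients, like the ones used for the sine above, would likely do better for the same number of terms.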

GLSL for Minecraft: How I set a position for clouds?

I'm using gl_FragCoord.xy, but the clouds are locked to the player's eye or camera position, so they move with you when you look around. I want the clouds to have their own static position. I tried worldposition = gbufferModelViewInverse * fragposition, but the result was ugly: the clouds stretched when I looked toward the sunrise or sunset, and there was a mirror effect, so looking at the sunset gave the same sample as looking at the sunrise.
Now my question: how can I make it so that when the player looks left, the clouds do not move with the player's eye? It's hard to explain, so I tried to create a simple picture:
Here is a part from my code:
float hash( float n )
{
return fract(sin(n)*43758.5453);
}
float noise( in vec2 x )
{
vec2 p = floor(x);
vec2 f = fract(x);
f = f*f*(3.0-2.0*f);
float n = p.x + p.y*57.0;
float res = mix(mix( hash(n+ 0.0), hash(n+ 1.0),f.x), mix( hash(n+ 57.0), hash(n+ 58.0),f.x),f.y);
return res;
}
float fbm( vec2 p )
{
float f = 0.0;
f += 0.50000*noise( p ); p = p*2.02;
f += 0.25000*noise( p ); p = p*2.03;
f += 0.12500*noise( p ); p = p*2.01;
f += 0.06250*noise( p ); p = p*2.04;
f += 0.03125*noise( p );
return f/0.984375;
}
#define CLOUD_COVER 0.55
#define CLOUD_SHARPNESS 0.015
// Wind - Used to animate the clouds
vec2 wind_vec = vec2(0.001 + frameTimeCounter*0.005, 0.003 + frameTimeCounter * 0.005);
// Set up domain
vec2 q = (gl_FragCoord.xy / 2024);
vec2 p = -1.0 + 3.0 * q + wind_vec;
// Create noise using fBm
float f = fbm( 4.0*p );
float cover = CLOUD_COVER;
float sharpness = CLOUD_SHARPNESS;
float c = f - (1.0 - cover);
if ( c < 0.0 )
c = 0.0;
f = 1.0 - (pow(sharpness, c));
color += f/15*TimeDay;
color += vec3(0.03, 0.19, 0.99)*f/1000*TimeMidnight;

Rotate a vector about another vector

I am writing a 3d vector class for OpenGL. How do I rotate a vector v1 about another vector v2 by an angle A?
You may find quaternions to be a more elegant and efficient solution.
After seeing this answer bumped recently, I thought I'd provide a more robust answer, one that can be used without necessarily understanding the full mathematical implications of quaternions. I'm going to assume (given the C++ tag) that you have something like a Vector3 class with 'obvious' functions like inner, cross, and *= scalar operators, etc...
#include <cfloat>
#include <cmath>
...
void make_quat (float quat[4], const Vector3 & v2, float angle)
{
// BTW: there's no reason you can't use 'doubles' for angle, etc.
// there's not much point in applying a rotation outside of [-PI, +PI];
// as that covers the practical 2.PI range.
// any time graphics / floating point overlap, we have to think hard
// about degenerate cases that can arise quite naturally (think of
// pathological cancellation errors that are *possible* in seemingly
// benign operations like inner products - and other running sums).
Vector3 axis (v2);
float rl = sqrt(inner(axis, axis));
if (rl < FLT_EPSILON) // we'll handle this as no rotation:
{
quat[0] = 0.0, quat[1] = 0.0, quat[2] = 0.0, quat[3] = 1.0;
return; // the 'identity' unit quaternion.
}
float ca = cos(angle);
// we know a maths library is never going to yield a value outside
// of [-1.0, +1.0] right? Well, maybe we're using something else -
// like an approximating polynomial, or a faster hack that's a little
// rough 'around the edge' cases? let's *ensure* a clamped range:
ca = (ca < -1.0f) ? -1.0f : ((ca > +1.0f) ? +1.0f : ca);
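// now we find cos / sin of a half-angle. we can use a faster identity
// for this, secure in the knowledge that 'sqrt' will be valid....
float cq = sqrt((1.0f + ca) / 2.0f); // cos(acos(ca) / 2.0);
float sq = sqrt((1.0f - ca) / 2.0f); // sin(acos(ca) / 2.0);
// cos(angle) discards the sign of the angle, so restore it here;
// this holds for angles in [-PI, +PI], as assumed above.
if (angle < 0.0f) sq = -sq;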
axis *= sq / rl; // i.e., scaling each element, and finally:
quat[0] = axis[0], quat[1] = axis[1], quat[2] = axis[2], quat[3] = cq;
}
Thus float quat[4] holds a unit quaternion that represents the axis and angle of rotation, given the original arguments (v2, A).
Here's a routine for quaternion multiplication. SSE/SIMD can probably speed this up, but complicated transform & lighting are typically GPU-driven in most scenarios. If you remember complex number multiplication as a little weird, quaternion multiplication is more so. Complex number multiplication is a commutative operation: a*b = b*a. Quaternions don't even preserve this property, i.e., q*p != p*q :
static inline void
qmul (float r[4], const float q[4], const float p[4])
{
// quaternion multiplication: r = q * p
float w0 = q[3], w1 = p[3];
float x0 = q[0], x1 = p[0];
float y0 = q[1], y1 = p[1];
float z0 = q[2], z1 = p[2];
r[3] = w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1;
r[0] = w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1;
r[1] = w0 * y1 + y0 * w1 + z0 * x1 - x0 * z1;
r[2] = w0 * z1 + z0 * w1 + x0 * y1 - y0 * x1;
}
Finally, rotating a 3D 'vector' v (or if you prefer, the 'point' v that the question has named v1, represented as a vector), using the quaternion: float q[4] has a somewhat strange formula: v' = q * v * conjugate(q). Quaternions have conjugates, similar to complex numbers. Here's the routine:
static inline void
qrot (float v[3], const float q[4])
{
// 3D vector rotation: v = q * v * conj(q)
float r[4], p[4];
r[0] = + v[0], r[1] = + v[1], r[2] = + v[2], r[3] = +0.0;
qmul(r, q, r);
p[0] = - q[0], p[1] = - q[1], p[2] = - q[2], p[3] = q[3];
qmul(r, r, p);
v[0] = r[0], v[1] = r[1], v[2] = r[2];
}
Putting it all together. Obviously you can make use of the static keyword where appropriate. Modern optimising compilers may ignore the inline hint depending on their own code generation heuristics. But let's just concentrate on correctness for now:
How do I rotate a vector v1 about another vector v2 by an angle A?
Assuming some sort of Vector3 class, and (A) in radians, we want the quaternion representing the rotation by the angle (A) about the axis v2, and we want to apply that quaternion rotation to v1 for the result:
float q[4]; // we want to find the unit quaternion for `v2` and `A`...
make_quat(q, v2, A);
// what about `v1`? can we access elements with `operator [] (int)` (?)
// if so, let's assume the memory: `v1[0] .. v1[2]` is contiguous.
// you can figure out how you want to store and manage your Vector3 class.
qrot(& v1[0], q);
// `v1` has been rotated by `(A)` radians about the direction vector `v2` ...
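For anyone who wants to try this without a Vector3 class, here is a compact standalone C++ sketch using plain float arrays. It keeps the (x, y, z, w) layout and the qmul / qrot structure from above, but as a simplifying assumption make_quat takes sin and cos of the half angle directly rather than going through the half-angle identities.

```cpp
#include <cassert>
#include <cmath>

// Unit quaternion (x, y, z, w) for `angle` radians about `axis`.
// Assumption: sin/cos of the half angle are computed directly.
void make_quat(float quat[4], const float axis[3], float angle) {
    const float len = std::sqrt(axis[0] * axis[0] + axis[1] * axis[1] +
                                axis[2] * axis[2]);
    assert(len > 0.0f);  // a zero axis has no direction
    const float s = std::sin(0.5f * angle) / len;
    quat[0] = axis[0] * s;
    quat[1] = axis[1] * s;
    quat[2] = axis[2] * s;
    quat[3] = std::cos(0.5f * angle);
}

// r = q * p; inputs are copied to locals first, so r may alias q or p.
void qmul(float r[4], const float q[4], const float p[4]) {
    const float w0 = q[3], x0 = q[0], y0 = q[1], z0 = q[2];
    const float w1 = p[3], x1 = p[0], y1 = p[1], z1 = p[2];
    r[3] = w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1;
    r[0] = w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1;
    r[1] = w0 * y1 + y0 * w1 + z0 * x1 - x0 * z1;
    r[2] = w0 * z1 + z0 * w1 + x0 * y1 - y0 * x1;
}

// v = q * v * conj(q)
void qrot(float v[3], const float q[4]) {
    float r[4] = {v[0], v[1], v[2], 0.0f};
    const float c[4] = {-q[0], -q[1], -q[2], q[3]};
    qmul(r, q, r);
    qmul(r, r, c);
    v[0] = r[0], v[1] = r[1], v[2] = r[2];
}
```

Rotating (1, 0, 0) by a quarter turn about (0, 0, 1) with these routines should land on (0, 1, 0) up to rounding, which makes a handy sanity check for the sign conventions.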
Is this the sort of thing that folks would like to see expanded upon in the Beta Documentation site? I'm not altogether clear on its requirements, expected rigour, etc.
This may prove useful:
double c = cos(A);
double s = sin(A);
double C = 1.0 - c;
double Q[3][3];
Q[0][0] = v2[0] * v2[0] * C + c;
Q[0][1] = v2[1] * v2[0] * C + v2[2] * s;
Q[0][2] = v2[2] * v2[0] * C - v2[1] * s;
Q[1][0] = v2[1] * v2[0] * C - v2[2] * s;
Q[1][1] = v2[1] * v2[1] * C + c;
Q[1][2] = v2[2] * v2[1] * C + v2[0] * s;
Q[2][0] = v2[0] * v2[2] * C + v2[1] * s;
Q[2][1] = v2[2] * v2[1] * C - v2[0] * s;
Q[2][2] = v2[2] * v2[2] * C + c;
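// v2 must be a unit vector. Copy v1 first so each output uses the
// original components (row vector times Q).
double x = v1[0], y = v1[1], z = v1[2];
v1[0] = x * Q[0][0] + y * Q[1][0] + z * Q[2][0];
v1[1] = x * Q[0][1] + y * Q[1][1] + z * Q[2][1];
v1[2] = x * Q[0][2] + y * Q[1][2] + z * Q[2][2];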
Use a 3D rotation matrix.
The easiest-to-understand way would be rotating the coordinate axis so that vector v2 aligns with the Z axis, then rotate by A around the Z axis, and rotate back so that the Z axis aligns with v2.
When you have written down the rotation matrices for the three operations, you'll probably notice that you apply three matrices after each other. To reach the same effect, you can multiply the three matrices.
I found this here:
http://steve.hollasch.net/cgindex/math/rotvec.html
let
[v] = [vx, vy, vz]   the vector to be rotated
[l] = [lx, ly, lz]   the vector to rotate about (the axis)

      |  1   0   0 |
[i] = |  0   1   0 |   the identity matrix
      |  0   0   1 |

      |   0   lz  -ly |
[L] = | -lz    0   lx |
      |  ly  -lx    0 |

d = sqrt(lx*lx + ly*ly + lz*lz)
a   the angle of rotation
then matrix operations give:
[v] = [v]x{[i] + sin(a)/d*[L] + ((1 - cos(a))/(d*d)*([L]x[L]))}
I wrote my own Matrix3 class and Vector3Library that implemented this vector rotation. It works absolutely perfectly. I use it to avoid drawing models outside the field of view of the camera.
I suppose this is the "use a 3d rotation matrix" approach. I took a quick look at quaternions, but have never used them, so stuck to something I could wrap my head around.
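For reference, the formula above can be written without building any matrices: with that [L], the row-vector product [v]x[L] equals the cross product l × v, and [v]x[L]x[L] equals l × (l × v). A small C++ sketch under that reading follows; V3, cross and rotate_about are names invented for this example, not the Matrix3 / Vector3Library mentioned above.

```cpp
#include <cassert>
#include <cmath>

struct V3 { double x, y, z; };

V3 cross(V3 a, V3 b) {
    return V3{a.y * b.z - a.z * b.y,
              a.z * b.x - a.x * b.z,
              a.x * b.y - a.y * b.x};
}

// v' = v + (sin a / d) (l x v) + ((1 - cos a) / d^2) (l x (l x v))
// l does not need to be normalized; d is its length.
V3 rotate_about(V3 v, V3 l, double a) {
    const double d = std::sqrt(l.x * l.x + l.y * l.y + l.z * l.z);
    assert(d > 0.0);  // the axis must have a direction
    const V3 lv  = cross(l, v);
    const V3 llv = cross(l, lv);
    const double k1 = std::sin(a) / d;
    const double k2 = (1.0 - std::cos(a)) / (d * d);
    return V3{v.x + k1 * lv.x + k2 * llv.x,
              v.y + k1 * lv.y + k2 * llv.y,
              v.z + k1 * lv.z + k2 * llv.z};
}
```

As a check, rotating (1, 0, 0) by pi/2 about (0, 0, 2) gives (0, 1, 0) up to rounding, so the axis length really does cancel out.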

glRotate divide-by-zero

I think I understand why calling glRotate(#, 0, 0, 0) results in a divide-by-zero. The rotation vector, a, is normalized: a' = a/|a| = a/0
Is that the only situation glRotate could result in a divide-by-zero? Yes, I know glRotate is deprecated. Yes, I know the matrix is on the OpenGL manual. No, I don't know linear algebra enough to confidently answer the question from the matrix. Yes, I think it would help. Yes, I asked this already in #opengl (can you tell?). And no, I didn't get an answer.
I would say yes, and I think you are right about the normalization step as well. The matrix shown in the OpenGL manual contains no divisions, only multiplications and additions, and multiplying a vector by it is division-free too. Of course, it does strange things if you pass a vector of (0,0,0). The same manual states that ||(x,y,z)|| = 1 is expected (otherwise OpenGL will normalize).
So IF it didn't normalize, a zero vector would leave only the cos(angle) terms on the diagonal (c = cos(angle)):
c 0 0 0
0 c 0 0
0 0 c 0
0 0 0 1
That scales your object uniformly by cos(angle), imploding it completely at 90 degrees. So DON'T call this function with a zero vector. If you would like to, tell me why.
And I recommend using a library like GLM to do your matrix calculations if it gets too complicated for some simple glRotates.
Why should it divide by zero when you can check for that?:
/**
* Generate a 4x4 transformation matrix from glRotate parameters, and
* post-multiply the input matrix by it.
*
* \author
* This function was contributed by Erich Boleyn (erich#uruk.org).
* Optimizations contributed by Rudolf Opalla (rudi#khm.de).
*/
void
_math_matrix_rotate( GLmatrix *mat,
GLfloat angle, GLfloat x, GLfloat y, GLfloat z )
{
GLfloat xx, yy, zz, xy, yz, zx, xs, ys, zs, one_c, s, c;
GLfloat m[16];
GLboolean optimized;
s = (GLfloat) sin( angle * DEG2RAD );
c = (GLfloat) cos( angle * DEG2RAD );
memcpy(m, Identity, sizeof(GLfloat)*16);
optimized = GL_FALSE;
#define M(row,col) m[col*4+row]
if (x == 0.0F) {
if (y == 0.0F) {
if (z != 0.0F) {
optimized = GL_TRUE;
/* rotate only around z-axis */
M(0,0) = c;
M(1,1) = c;
if (z < 0.0F) {
M(0,1) = s;
M(1,0) = -s;
}
else {
M(0,1) = -s;
M(1,0) = s;
}
}
}
else if (z == 0.0F) {
optimized = GL_TRUE;
/* rotate only around y-axis */
M(0,0) = c;
M(2,2) = c;
if (y < 0.0F) {
M(0,2) = -s;
M(2,0) = s;
}
else {
M(0,2) = s;
M(2,0) = -s;
}
}
}
else if (y == 0.0F) {
if (z == 0.0F) {
optimized = GL_TRUE;
/* rotate only around x-axis */
M(1,1) = c;
M(2,2) = c;
if (x < 0.0F) {
M(1,2) = s;
M(2,1) = -s;
}
else {
M(1,2) = -s;
M(2,1) = s;
}
}
}
if (!optimized) {
const GLfloat mag = SQRTF(x * x + y * y + z * z);
if (mag <= 1.0e-4) {
/* no rotation, leave mat as-is */
return;
}
x /= mag;
y /= mag;
z /= mag;
/*
* Arbitrary axis rotation matrix.
*
* This is composed of 5 matrices, Rz, Ry, T, Ry', Rz', multiplied
* like so: Rz * Ry * T * Ry' * Rz'. T is the final rotation
* (which is about the X-axis), and the two composite transforms
* Ry' * Rz' and Rz * Ry are (respectively) the rotations necessary
* from the arbitrary axis to the X-axis then back. They are
* all elementary rotations.
*
* Rz' is a rotation about the Z-axis, to bring the axis vector
* into the x-z plane. Then Ry' is applied, rotating about the
* Y-axis to bring the axis vector parallel with the X-axis. The
* rotation about the X-axis is then performed. Ry and Rz are
* simply the respective inverse transforms to bring the arbitrary
* axis back to its original orientation. The first transforms
* Rz' and Ry' are considered inverses, since the data from the
* arbitrary axis gives you info on how to get to it, not how
* to get away from it, and an inverse must be applied.
*
* The basic calculation used is to recognize that the arbitrary
* axis vector (x, y, z), since it is of unit length, actually
* represents the sines and cosines of the angles to rotate the
* X-axis to the same orientation, with theta being the angle about
* Z and phi the angle about Y (in the order described above)
* as follows:
*
* cos ( theta ) = x / sqrt ( 1 - z^2 )
* sin ( theta ) = y / sqrt ( 1 - z^2 )
*
* cos ( phi ) = sqrt ( 1 - z^2 )
* sin ( phi ) = z
*
* Note that cos ( phi ) can further be inserted to the above
* formulas:
*
* cos ( theta ) = x / cos ( phi )
* sin ( theta ) = y / cos ( phi )
*
* ...etc. Because of those relations and the standard trigonometric
* relations, it is possible to reduce the transforms down to what
* is used below. It may be that any primary axis chosen will give the
* same results (modulo a sign convention) using this method.
*
* Particularly nice is to notice that all divisions that might
* have caused trouble when parallel to certain planes or
* axis go away with care paid to reducing the expressions.
* After checking, it does perform correctly under all cases, since
* in all the cases of division where the denominator would have
* been zero, the numerator would have been zero as well, giving
* the expected result.
*/
xx = x * x;
yy = y * y;
zz = z * z;
xy = x * y;
yz = y * z;
zx = z * x;
xs = x * s;
ys = y * s;
zs = z * s;
one_c = 1.0F - c;
/* We already hold the identity-matrix so we can skip some statements */
M(0,0) = (one_c * xx) + c;
M(0,1) = (one_c * xy) - zs;
M(0,2) = (one_c * zx) + ys;
/* M(0,3) = 0.0F; */
M(1,0) = (one_c * xy) + zs;
M(1,1) = (one_c * yy) + c;
M(1,2) = (one_c * yz) - xs;
/* M(1,3) = 0.0F; */
M(2,0) = (one_c * zx) - ys;
M(2,1) = (one_c * yz) + xs;
M(2,2) = (one_c * zz) + c;
/* M(2,3) = 0.0F; */
/*
M(3,0) = 0.0F;
M(3,1) = 0.0F;
M(3,2) = 0.0F;
M(3,3) = 1.0F;
*/
}
#undef M
matrix_multf( mat, m, MAT_FLAG_ROTATION );
}
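The relevant part for the question is the guard in front of the normalization. Stripped of everything else, the idea looks roughly like the following C++ sketch. normalize_axis is a hypothetical helper written for illustration, not a Mesa or OpenGL function; the 1.0e-4 threshold is copied from the code above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: normalize (x, y, z) in place. Returns false and
// leaves the inputs untouched when the axis is too short to normalize,
// mirroring the `mag <= 1.0e-4` early return above.
bool normalize_axis(float* x, float* y, float* z) {
    assert(x != nullptr && y != nullptr && z != nullptr);
    const float mag = std::sqrt(*x * *x + *y * *y + *z * *z);
    if (mag <= 1.0e-4f) {
        return false;  // treat as "no rotation", never divide
    }
    *x /= mag;
    *y /= mag;
    *z /= mag;
    return true;
}
```

A caller would skip building the rotation matrix when this returns false, which is the "leave mat as-is" behaviour in the listing.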