Faster quaternion vector multiplication doesn't work - c++

I need a faster quaternion-vector multiplication routine for my math library. Right now I'm using the canonical v' = qv(q^-1), which produces the same result as multiplying the vector by a matrix made from the quaternion, so I'm confident in its correctness.
So far I've implemented 3 alternative "faster" methods:
#1, I have no idea where I got this one from:
v' = (q.xyz * 2 * dot(q.xyz, v)) + (v * (q.w*q.w - dot(q.xyz, q.xyz))) + (cross(q.xyz, v) * q.w * 2)
Implemented as:
vec3 rotateVector(const quat& q, const vec3& v)
{
vec3 u(q.x, q.y, q.z);
float s = q.w;
return vec3(u * 2.0f * vec3::dot(u, v))
+ (v * (s*s - vec3::dot(u, u)))
+ (vec3::cross(u, v) * s * 2.0f);
}
#2, courtesy of this fine blog
t = 2 * cross(q.xyz, v);
v' = v + q.w * t + cross(q.xyz, t);
Implemented as:
__m128 rotateVector(__m128 q, __m128 v)
{
__m128 temp = _mm_mul_ps(vec4::cross(q, v), _mm_set1_ps(2.0f));
return _mm_add_ps(
_mm_add_ps(v, _mm_mul_ps(_mm_shuffle_ps(q, q, _MM_SHUFFLE(3, 3, 3, 3)), temp)),
vec4::cross(q, temp));
}
And #3, from numerous sources,
v' = v + 2.0 * cross(cross(v, q.xyz) + q.w * v, q.xyz);
implemented as:
__m128 rotateVector(__m128 q, __m128 v)
{
//return v + 2.0 * cross(cross(v, q.xyz) + q.w * v, q.xyz);
return _mm_add_ps(v,
_mm_mul_ps(_mm_set1_ps(2.0f),
vec4::cross(
_mm_add_ps(
_mm_mul_ps(_mm_shuffle_ps(q, q, _MM_SHUFFLE(3, 3, 3, 3)), v),
vec4::cross(v, q)),
q)));
}
All 3 of these produce incorrect results. I have, however, noticed some interesting patterns. First of all, #1 and #2 produce the same result. #3 produces the same result that I get from multiplying the vector by a derived matrix if said matrix is transposed (I discovered this by accident, previously my quat-to-matrix code assumed row-major matrices, which was incorrect).
The data storage of my quaternions is defined as:
union
{
__m128 data;
struct { float x, y, z, w; };
float f[4];
};
Are my implementations flawed, or am I missing something here?

The main issue: to rotate a 3D vector by a quaternion, you need all 9 scalars of the rotation matrix. In your examples the rotation matrix is computed IMPLICITLY, and the order of the calculations may not be optimal.
If you generate the 3x3 matrix from the quaternion and multiply the vector by it, you should end up with the same number of arithmetic operations (see the code at the bottom).
What I recommend:
Generate the 3x3 matrix and multiply your vector by it, then measure the speed and compare with your previous versions.
Analyze the explicit solution and try to optimize it for your target architecture.
Try implementing the alternative quaternion multiplication, and a multiplication derived from the equation q*v*q'.
//============
alternative multiplication pseudocode
/**
alternative way of quaternion multiplication,
can speedup multiplication for some systems (PDA for example)
http://mathforum.org/library/drmath/view/51464.html
http://www.lboro.ac.uk/departments/ma/gallery/quat/src/quat.ps
the code at these URLs has many bugs and had to be rewritten.
*/
inline xxquaternion mul_alt( const xxquaternion& q) const {
float t0 = (x-y)*(q.y-q.x);
float t1 = (w+z)*(q.w+q.z);
float t2 = (w-z)*(q.y+q.x);
float t3 = (x+y)*(q.w-q.z);
float t4 = (x-z)*(q.z-q.y);
float t5 = (x+z)*(q.z+q.y);
float t6 = (w+y)*(q.w-q.x);
float t7 = (w-y)*(q.w+q.x);
float t8 = t5 + t6 + t7;
float t9 = (t4 + t8)*0.5;
return xxquaternion ( t3+t9-t6,
t2+t9-t7,
t1+t9-t8,
t0+t9-t5 );
// 9 multiplications, 27 additions, 8 variables
// but of course we can eliminate 4 variables
/*
float r = w, i = z, j = y, k =x;
float br = q.w, bi = q.z, bj = q.y, bk =q.x;
float t0 = (k-j)*(bj-bk);
float t1 = (r+i)*(br+bi);
float t2 = (r-i)*(bj+bk);
float t3 = (k+j)*(br-bi);
float t4 = (k-i)*(bi-bj);
float t5 = (k+i)*(bi+bj);
float t6 = (r+j)*(br-bk);
float t7 = (r-j)*(br+bk);
float t8 = t5 + t6 + t7;
float t9 = (t4 + t8)*0.5;
float rr = t0+t9-t5;
float ri = t1+t9-t8;
float rj = t2+t9-t7;
float rk = t3+t9-t6;
return xxquaternion ( rk, rj, ri, rr );
*/
}
//============
explicit vector rotation variants
/**
rotate vector by quaternion
*/
inline vector3 rotate(const vector3& v)const{
xxquaternion q(v.x * w + v.z * y - v.y * z,
v.y * w + v.x * z - v.z * x,
v.z * w + v.y * x - v.x * y,
v.x * x + v.y * y + v.z * z);
return vector3(w * q.x + x * q.w + y * q.z - z * q.y,
w * q.y + y * q.w + z * q.x - x * q.z,
w * q.z + z * q.w + x * q.y - y * q.x)*( 1.0f/norm() );
// 29 multiplications, 20 additions, 4 variables
// 5
/*
// reference implementation
xxquaternion r = (*this)*xxquaternion(v.x, v.y, v.z, 0)*this->inverted();
return vector3( r.x, r.y, r.z );
*/
/*
// alternative implementation
float wx, wy, wz, xx, yy, yz, xy, xz, zz, x2, y2, z2;
x2 = q.x + q.x; y2 = q.y + q.y; z2 = q.z + q.z;
xx = q.x * x2; xy = q.x * y2; xz = q.x * z2;
yy = q.y * y2; yz = q.y * z2; zz = q.z * z2;
wx = q.w * x2; wy = q.w * y2; wz = q.w * z2;
return vector3( v.x - v.x * (yy + zz) + v.y * (xy - wz) + v.z * (xz + wy),
v.y + v.x * (xy + wz) - v.y * (xx + zz) + v.z * (yz - wx),
v.z + v.x * (xz - wy) + v.y * (yz + wx) - v.z * (xx + yy) )*( 1.0f/norm() );
// 18 multiplications, 21 additions, 12 variables
*/
};
Good luck.
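Before measuring speed, it may help to check each fast formula against the reference in plain scalar code, without SSE. The following is a minimal sketch, assuming a unit quaternion stored as x, y, z, w and a right-handed cross product; Vec3, Quat, rotateReference, rotateFast and maxAbsDiff are illustrative names, not part of the asker's library. Under those assumptions method #2 should agree with q*v*conj(q).

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };  // same x, y, z, w layout as in the question

// Right-handed cross product a x b.
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Hamilton product a * b.
static Quat mul(const Quat& a, const Quat& b) {
    return { a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
             a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z };
}

// Reference: v' = q * (v, 0) * conj(q); assumes q has unit length.
static Vec3 rotateReference(const Quat& q, const Vec3& v) {
    Quat p{ v.x, v.y, v.z, 0.0f };
    Quat c{ -q.x, -q.y, -q.z, q.w };
    Quat r = mul(mul(q, p), c);
    return { r.x, r.y, r.z };
}

// Method #2: t = 2 * cross(q.xyz, v); v' = v + q.w * t + cross(q.xyz, t).
static Vec3 rotateFast(const Quat& q, const Vec3& v) {
    Vec3 u{ q.x, q.y, q.z };
    Vec3 t = cross(u, v);
    t = { 2.0f * t.x, 2.0f * t.y, 2.0f * t.z };
    Vec3 c = cross(u, t);
    return { v.x + q.w * t.x + c.x,
             v.y + q.w * t.y + c.y,
             v.z + q.w * t.z + c.z };
}

// Largest per-component difference between two vectors.
static float maxAbsDiff(const Vec3& a, const Vec3& b) {
    return std::fmax(std::fabs(a.x - b.x),
                     std::fmax(std::fabs(a.y - b.y), std::fabs(a.z - b.z)));
}
```

Calling maxAbsDiff(rotateReference(q, v), rotateFast(q, v)) for a few unit quaternions should give values near zero. If the scalar versions agree but the SSE versions do not, the operand order or lane layout inside the SIMD cross product would be one place to look.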

Related

Quaternion rotation works fine with y/z rotation but gets messed up when I add x rotation

So I've been learning about quaternions recently and decided to make my own implementation. I tried to make it simple but I still can't pinpoint my error. x/y/z axis rotation works fine on its own and y/z rotation works as well, but the second I add the x axis to any of the others I get a strange stretching output. I'll attach the important code for the rotations below (be warned, I'm quite new to cpp).
Here is how I describe a quaternion (as I understand since they are unit quaternions imaginary numbers aren't required):
struct Quaternion {
float w, x, y, z;
};
The multiplication rules of quaternions:
Quaternion operator* (Quaternion n, Quaternion p) {
Quaternion o;
// implements quaternion multiplication rules:
o.w = n.w * p.w - n.x * p.x - n.y * p.y - n.z * p.z;
o.x = n.w * p.x + n.x * p.w + n.y * p.z - n.z * p.y;
o.y = n.w * p.y - n.x * p.z + n.y * p.w + n.z * p.x;
o.z = n.w * p.z + n.x * p.y - n.y * p.x + n.z * p.w;
return o;
}
Generating the rotation quaternion to multiply the total rotation by:
Quaternion rotate(float w, float x, float y, float z) {
Quaternion n;
n.w = cosf(w/2);
n.x = x * sinf(w/2);
n.y = y * sinf(w/2);
n.z = z * sinf(w/2);
return n;
}
And finally, the matrix calculations which turn the quaternion into an x/y/z position:
inline vector<float> quaternion_matrix(Quaternion total, vector<float> vec) {
float x = vec[0], y = vec[1], z = vec[2];
// implementation of 3x3 quaternion rotation matrix:
vec[0] = (1 - 2 * pow(total.y, 2) - 2 * pow(total.z, 2))*x + (2 * total.x * total.y - 2 * total.w * total.z)*y + (2 * total.x * total.z + 2 * total.w * total.y)*z;
vec[1] = (2 * total.x * total.y + 2 * total.w * total.z)*x + (1 - 2 * pow(total.x, 2) - 2 * pow(total.z, 2))*y + (2 * total.y * total.z + 2 * total.w * total.x)*z;
vec[2] = (2 * total.x * total.z - 2 * total.w * total.y)*x + (2 * total.y * total.z - 2 * total.w * total.x)*y + (1 - 2 * pow(total.x, 2) - 2 * pow(total.y, 2))*z;
return vec;
}
That's pretty much it (I also have a normalize function to deal with floating point errors). I initialize every object's quaternion to w = 1, x = 0, y = 0, z = 0. I rotate a quaternion using an expression like this:
obj.rotation = rotate(angle, x-axis, y-axis, z-axis) * obj.rotation
where obj.rotation is the object's total quaternion rotation value.
I appreciate any help I can get on this issue, if anyone knows what's wrong or has experienced this issue before. Thanks
EDIT: multiplying total by these quaternions output the expected rotation:
rotate(angle,1,0,0)
rotate(angle,0,1,0)
rotate(angle,0,0,1)
rotate(angle,0,1,1)
However, any rotations such as these make the model stretch oddly:
rotate(angle,1,1,0)
rotate(angle,1,0,1)
EDIT2: here is the normalize function I use to normalize the quaternions:
Quaternion normalize(Quaternion n, double tolerance) {
// adds all squares of quaternion values, if normalized, total will be 1:
double total = pow(n.w, 2) + pow(n.x, 2) + pow(n.y, 2) + pow(n.z, 2);
if (total > 1 + tolerance || total < 1 - tolerance) {
// normalizes value of quaternion if it exceeds a certain tolerance value:
n.w /= (float) sqrt(total);
n.x /= (float) sqrt(total);
n.y /= (float) sqrt(total);
n.z /= (float) sqrt(total);
}
return n;
}
To implement two rotations in sequence you need the quaternion product of the two elementary rotations. Each elementary rotation is specified by an axis and an angle. But in your code you did not make sure you have a unit vector (direction vector) for the axis.
Do the following modification
Quaternion rotate(float w, float x, float y, float z) {
Quaternion n;
float f = 1/sqrtf(x*x+y*y+z*z); // scale factor that makes the axis unit length
n.w = cosf(w/2);
n.x = f * x * sinf(w/2);
n.y = f * y * sinf(w/2);
n.z = f * z * sinf(w/2);
return n;
}
and then use it as follows
Quaternion n = rotate(angle1,1,0,0) * rotate(angle2,0,1,0);
for the combined rotation of angle1 about the x-axis, and angle2 about the y-axis.
As pointed out in comments, you are not initializing your quaternions correctly.
The following quaternions are not rotations:
rotate(angle,0,1,1)
rotate(angle,1,1,0)
rotate(angle,1,0,1)
The reason is that the axis is not normalized; for example, the vector (0,1,1) does not have unit length. Also make sure your angles are in radians.
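Separately from axis normalization, it may be worth comparing quaternion_matrix from the question against the standard quaternion rotation matrix. In the standard form the row 1, column 2 entry is 2yz - 2wx and the row 2, column 1 entry is 2yz + 2wx, which is the opposite of the signs used in the question. The two entries coincide when the quaternion's x component is zero, which would be consistent with stretching that only shows up for axes with a nonzero x component. The following is a minimal sketch under those assumptions; axisAngle and rotateByMatrix are illustrative names, and axisAngle assumes a nonzero axis.

```cpp
#include <array>
#include <cmath>

struct Quaternion { float w, x, y, z; };

// Unit quaternion for a rotation of `angle` radians about (x, y, z).
// Assumes the axis is not the zero vector; it is normalized here.
Quaternion axisAngle(float angle, float x, float y, float z) {
    float len = std::sqrt(x * x + y * y + z * z);
    float s = std::sin(angle / 2) / len;
    return { std::cos(angle / 2), x * s, y * s, z * s };
}

// Standard rotation matrix of a unit quaternion applied to a column vector.
std::array<float, 3> rotateByMatrix(const Quaternion& q,
                                    const std::array<float, 3>& v) {
    float w = q.w, x = q.x, y = q.y, z = q.z;
    return {
        (1 - 2 * y * y - 2 * z * z) * v[0] + (2 * x * y - 2 * w * z) * v[1] + (2 * x * z + 2 * w * y) * v[2],
        (2 * x * y + 2 * w * z) * v[0] + (1 - 2 * x * x - 2 * z * z) * v[1] + (2 * y * z - 2 * w * x) * v[2],
        (2 * x * z - 2 * w * y) * v[0] + (2 * y * z + 2 * w * x) * v[1] + (1 - 2 * x * x - 2 * y * y) * v[2]
    };
}
```

A quick sanity check is that the length of the rotated vector equals the length of the input for any unit quaternion; a matrix that stretches the model fails that check.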

How does this lighting calculation work?

I have this piece of code that is responsible for lighting a pyramid.
float Geometric3D::calculateLight(int vert1, int vert2, int vert3) {
float ax = tabX[vert2] - tabX[vert1];
float ay = tabY[vert2] - tabY[vert1];
float az = tabZ[vert2] - tabZ[vert1];
float bx = tabX[vert3] - tabX[vert1];
float by = tabY[vert3] - tabY[vert1];
float bz = tabZ[vert3] - tabZ[vert1];
float Nx = (ay * bz) - (az * by);
float Ny = (az * bx) - (ax * bz);
float Nz = (ax * by) - (ay * bx);
float Lx = -300.0f;
float Ly = -300.0f;
float Lz = -1000.0f;
float lenN = sqrtf((Nx * Nx) + (Ny * Ny) + (Nz * Nz));
float lenL = sqrtf((Lx * Lx) + (Ly * Ly) + (Lz * Lz));
float res = ((Nx * Lx) + (Ny * Ly) + (Nz * Lz)) / (lenN * lenL);
if (res < 0.0f)
res = -res;
return res;
}
I cannot understand the calculations at the end. Can someone explain the maths behind them? I know that the program first calculates two edge vectors of a face to compute its normal (the vector N). The vector L stands for the light, but what happens next? Why do we calculate the lengths of the normal and the light vector, multiply them, and divide the dot product by the result?
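For what it is worth, the last lines match the usual formula for the cosine of the angle between two vectors. A short sketch of that identity, assuming N is the face normal and L the light vector exactly as named in the code, and theta the angle between them:

```latex
\cos\theta = \frac{\mathbf{N}\cdot\mathbf{L}}{\lVert\mathbf{N}\rVert\,\lVert\mathbf{L}\rVert},
\qquad
\mathrm{res} = \lvert\cos\theta\rvert \in [0, 1]
```

Dividing the dot product by both lengths removes the effect of the vectors' magnitudes, so only the angle remains. The result is 1 when the face points straight along the light direction and 0 when it is edge-on, and taking the absolute value makes the result independent of which way the normal happens to point.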

How to speed up bilinear interpolation of image?

I'm trying to rotate an image with interpolation, but it's too slow for real time on big images.
The code is something like:
for(int y=0;y<dst_h;++y)
{
for(int x=0;x<dst_w;++x)
{
//do inverse transform
fPoint pt(Transform(Point(x, y)));
//in coor of src
int x1= (int)floor(pt.x);
int y1= (int)floor(pt.y);
int x2= x1+1;
int y2= y1+1;
if((x1>=0&&x1<src_w&&y1>=0&&y1<src_h)&&(x2>=0&&x2<src_w&&y2>=0&&y2<src_h))
{
Mask[y][x]= 1; //show pixel
float dx1= pt.x-x1;
float dx2= 1-dx1;
float dy1= pt.y-y1;
float dy2= 1-dy1;
//bilinear
pd[x].blue= (dy2*(ps[y1*src_w+x1].blue*dx2+ps[y1*src_w+x2].blue*dx1)+
dy1*(ps[y2*src_w+x1].blue*dx2+ps[y2*src_w+x2].blue*dx1));
pd[x].green= (dy2*(ps[y1*src_w+x1].green*dx2+ps[y1*src_w+x2].green*dx1)+
dy1*(ps[y2*src_w+x1].green*dx2+ps[y2*src_w+x2].green*dx1));
pd[x].red= (dy2*(ps[y1*src_w+x1].red*dx2+ps[y1*src_w+x2].red*dx1)+
dy1*(ps[y2*src_w+x1].red*dx2+ps[y2*src_w+x2].red*dx1));
//nearest neighbour
//pd[x]= ps[((int)pt.y)*src_w+(int)pt.x];
}
else
Mask[y][x]= 0; //transparent pixel
}
pd+= dst_w;
}
How can I speed up this code? I tried to parallelize it, but there seems to be no speedup because of the memory access pattern (?).
The key is to do most of your computations as ints. The only thing that is necessary to do as a float is the weighting. See here for a good resource.
From that same resource:
int px = (int)x; // floor of x
int py = (int)y; // floor of y
const int stride = img->width;
const Pixel* p0 = img->data + px + py * stride; // pointer to first pixel
// load the four neighboring pixels
const Pixel& p1 = p0[0 + 0 * stride];
const Pixel& p2 = p0[1 + 0 * stride];
const Pixel& p3 = p0[0 + 1 * stride];
const Pixel& p4 = p0[1 + 1 * stride];
// Calculate the weights for each pixel
float fx = x - px;
float fy = y - py;
float fx1 = 1.0f - fx;
float fy1 = 1.0f - fy;
int w1 = fx1 * fy1 * 256.0f;
int w2 = fx * fy1 * 256.0f;
int w3 = fx1 * fy * 256.0f;
int w4 = fx * fy * 256.0f;
// Calculate the weighted sum of pixels (for each color channel)
int outr = p1.r * w1 + p2.r * w2 + p3.r * w3 + p4.r * w4;
int outg = p1.g * w1 + p2.g * w2 + p3.g * w3 + p4.g * w4;
int outb = p1.b * w1 + p2.b * w2 + p3.b * w3 + p4.b * w4;
int outa = p1.a * w1 + p2.a * w2 + p3.a * w3 + p4.a * w4;
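The weights above are scaled by 256, so each weighted sum still has to be scaled back down before it fits a colour channel again. A minimal sketch of that last step, assuming 8-bit channels; the Pixel struct and blend4 are illustrative names and not taken from the linked resource:

```cpp
#include <cstdint>

struct Pixel { std::uint8_t r, g, b, a; };

// Blend four neighbouring pixels with 8-bit fixed-point weights (256 == 1.0).
// fx and fy are the fractional offsets in [0, 1) inside the 2x2 cell.
Pixel blend4(const Pixel& p1, const Pixel& p2,
             const Pixel& p3, const Pixel& p4, float fx, float fy) {
    float fx1 = 1.0f - fx;
    float fy1 = 1.0f - fy;
    int w1 = static_cast<int>(fx1 * fy1 * 256.0f);
    int w2 = static_cast<int>(fx  * fy1 * 256.0f);
    int w3 = static_cast<int>(fx1 * fy  * 256.0f);
    int w4 = static_cast<int>(fx  * fy  * 256.0f);
    // Integer weighted sum, then >> 8 to undo the 256 scaling.
    auto channel = [&](int c1, int c2, int c3, int c4) {
        return static_cast<std::uint8_t>((c1 * w1 + c2 * w2 + c3 * w3 + c4 * w4) >> 8);
    };
    return { channel(p1.r, p2.r, p3.r, p4.r),
             channel(p1.g, p2.g, p3.g, p4.g),
             channel(p1.b, p2.b, p3.b, p4.b),
             channel(p1.a, p2.a, p3.a, p4.a) };
}
```

Because the weights are truncated to integers, they can sum to slightly less than 256, so the result may come out a shade darker than the exact float blend.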
Wow, you are doing a lot inside the innermost loop:
1. float to int conversions
You can do everything in floats; they are pretty fast these days. The conversion is what is killing you. You are also mixing floats and ints together (if I read it right), which amounts to the same thing.
2. Transform(x, y)
Any unnecessary call in the inner loop adds overhead and slows things down. Instead, add two variables xx, yy and interpolate them inside your for loops.
3. if ...
Why the heck are you adding an if? Limit the for ranges before the loop, not inside it. The background can be filled by separate for loops before or afterwards.
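Point 2 can be sketched as follows, assuming the inverse transform is affine (a rotation plus translation, for instance). The Affine struct and sourceCoords are hypothetical names; the question does not show what Transform does internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical affine inverse transform: src = (a*x + b*y + tx, c*x + d*y + ty).
struct Affine { float a, b, c, d, tx, ty; };

// Compute the source coordinate of every destination pixel incrementally:
// one step in x adds (a, c), one step in y adds (b, d).
void sourceCoords(const Affine& m, int dst_w, int dst_h,
                  std::vector<float>& sx, std::vector<float>& sy) {
    sx.resize(static_cast<std::size_t>(dst_w) * dst_h);
    sy.resize(static_cast<std::size_t>(dst_w) * dst_h);
    float rowX = m.tx, rowY = m.ty;      // source position of pixel (0, y)
    for (int y = 0; y < dst_h; ++y) {
        float xx = rowX, yy = rowY;
        for (int x = 0; x < dst_w; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * dst_w + x;
            sx[i] = xx;
            sy[i] = yy;
            xx += m.a;                   // two additions instead of a call
            yy += m.c;
        }
        rowX += m.b;
        rowY += m.d;
    }
}
```

In a real loop the coordinates would be consumed directly rather than stored. Repeated float additions accumulate rounding error, so for very wide images it may be safer to recompute the row start from y each row, as done conceptually with rowX and rowY here.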

Compute gradient for voxel data efficiently

What is the most efficient way of computing the gradient for fixed sized voxel data, such as the source code below. Note that I need the gradient at any point in space. The gradients will be used for estimating normals in a marching cubes implementation.
#include <array>
struct VoxelData {
VoxelData(float* data, unsigned int xDim, unsigned int yDim, unsigned int zDim)
:data(data), xDim(xDim), yDim(yDim), zDim(zDim)
{}
std::array<float,3> get_gradient(float x, float y, float z){
std::array<float,3> res;
// compute gradient efficiently
return res;
}
float get_density(int x, int y, int z){
if (x<0 || y<0 || z<0 || x >= xDim || y >= yDim || z >= zDim){
return 0;
}
return data[get_element_index(x, y, z)];
}
int get_element_index(int x, int y, int z){
return x * zDim * yDim + y*zDim + z;
}
const float* const data;
const unsigned int xDim;
const unsigned int yDim;
const unsigned int zDim;
};
Update 1
A demo project of the problem can be found here:
https://github.com/mortennobel/OpenGLVoxelizer
Currently the output is like the picture below (based on MooseBoys code):
Update 2
The solution that I'm looking for must give fairly accurate gradients, since they are used as normals in a visualisation and visual artefacts like the ones below must be avoided.
Update 3
Solution from the user example is:
The following produces a linearly interpolated gradient field:
std::array<float,3> get_gradient(float x, float y, float z){
std::array<float,3> res;
// x
int xi = (int)(x + 0.5f);
float xf = x + 0.5f - xi;
float xd0 = get_density(xi - 1, (int)y, (int)z);
float xd1 = get_density(xi, (int)y, (int)z);
float xd2 = get_density(xi + 1, (int)y, (int)z);
res[0] = (xd1 - xd0) * (1.0f - xf) + (xd2 - xd1) * xf; // lerp
// y
int yi = (int)(y + 0.5f);
float yf = y + 0.5f - yi;
float yd0 = get_density((int)x, yi - 1, (int)z);
float yd1 = get_density((int)x, yi, (int)z);
float yd2 = get_density((int)x, yi + 1, (int)z);
res[1] = (yd1 - yd0) * (1.0f - yf) + (yd2 - yd1) * yf; // lerp
// z
int zi = (int)(z + 0.5f);
float zf = z + 0.5f - zi;
float zd0 = get_density((int)x, (int)y, zi - 1);
float zd1 = get_density((int)x, (int)y, zi);
float zd2 = get_density((int)x, (int)y, zi + 1);
res[2] = (zd1 - zd0) * (1.0f - zf) + (zd2 - zd1) * zf; // lerp
return res;
}
One important technique for optimization in many implementations involves time/space trade off. As a suggestion, anywhere you can pre-calc and cache your results may be worth looking at.
In general, Sobel filters provide slightly nicer results than simple central differences, but take longer to compute (the Sobel is essentially a smoothing filter combined with central differences). A classic Sobel requires weighting 26 samples, while central differences only require 6. However, there is a trick: with GPUs you can get hardware-based trilinear interpolation for free. That means you can compute a Sobel with 8 texture reads, and this can be done in parallel across the GPU. The following page illustrates this technique using GLSL:
http://www.mccauslandcenter.sc.edu/mricrogl/notes/gradients
For your project you would probably want to compute the gradients on the GPU and use GPGPU methods to copy the results back from the GPU to the CPU for further processing.
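For comparison, the 6-sample central difference mentioned above is short enough to sketch on the CPU. This is a minimal version at integer voxel positions, assuming unit voxel spacing and the same index layout and zero-outside-the-volume rule as get_density in the question; VoxelGrid and centralDifference are illustrative names.

```cpp
#include <array>
#include <vector>

struct VoxelGrid {
    int nx, ny, nz;
    std::vector<float> data;  // index = (x * ny + y) * nz + z

    // Density at an integer position; 0 outside the volume.
    float at(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) {
            return 0.0f;
        }
        return data[(x * ny + y) * nz + z];
    }
};

// Central difference: (f(p + 1) - f(p - 1)) / 2 along each axis, 6 samples total.
std::array<float, 3> centralDifference(const VoxelGrid& g, int x, int y, int z) {
    return { 0.5f * (g.at(x + 1, y, z) - g.at(x - 1, y, z)),
             0.5f * (g.at(x, y + 1, z) - g.at(x, y - 1, z)),
             0.5f * (g.at(x, y, z + 1) - g.at(x, y, z - 1)) };
}
```

Precomputing this once per voxel into a gradient volume and then interpolating the stored gradients trilinearly is one way to trade memory for smoother normals at arbitrary positions.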
MooseBoys already posted a component-wise linear interpolation. It is discontinuous in the y and z components, though, wherever (int)x changes from one value to the next (and likewise for the other components). This might cause the rough picture you are seeing. If you have enough performance to spare, you can improve this by considering not just (int)x but (int)(x+1) as well. This might look like the following:
std::array<float,3> get_gradient(float x, float y, float z){
std::array<float,3> res;
int xim = (int)(x + 0.5f);
float xfm = x + 0.5f - xim;
int yim = (int)(y + 0.5f);
float yfm = y + 0.5f - yim;
int zim = (int)(z + 0.5f);
float zfm = z + 0.5f - zim;
int xi = (int)x;
float xf = x - xi;
int yi = (int)y;
float yf = y - yi;
int zi = (int)z;
float zf = z - zi;
float xd0 = yf*( zf *get_density(xim - 1, yi+1, zi+1)
+ (1.0f - zf)*get_density(xim - 1, yi+1, zi))
+(1.0f - yf)*(zf *get_density(xim - 1, yi , zi+1)
+ (1.0f - zf)*get_density(xim - 1, yi , zi));
float xd1 = yf*( zf *get_density(xim, yi+1, zi+1)
+ (1.0f - zf)*get_density(xim, yi+1, zi))
+(1.0f - yf)*(zf *get_density(xim, yi , zi+1)
+ (1.0f - zf)*get_density(xim, yi , zi));
float xd2 = yf*( zf *get_density(xim + 1, yi+1, zi+1)
+ (1.0f - zf)*get_density(xim + 1, yi+1, zi))
+(1.0f - yf)*(zf *get_density(xim + 1, yi , zi+1)
+ (1.0f - zf)*get_density(xim + 1, yi , zi));
res[0] = (xd1 - xd0) * (1.0f - xfm) + (xd2 - xd1) * xfm;
float yd0 = xf*( zf *get_density(xi+1, yim-1, zi+1)
+ (1.0f - zf)*get_density(xi+1, yim-1, zi))
+(1.0f - xf)*(zf *get_density(xi , yim-1, zi+1)
+ (1.0f - zf)*get_density(xi , yim-1, zi));
float yd1 = xf*( zf *get_density(xi+1, yim , zi+1)
+ (1.0f - zf)*get_density(xi+1, yim , zi))
+(1.0f - xf)*(zf *get_density(xi , yim , zi+1)
+ (1.0f - zf)*get_density(xi , yim , zi));
float yd2 = xf*( zf *get_density(xi+1, yim+1, zi+1)
+ (1.0f - zf)*get_density(xi+1, yim+1, zi))
+(1.0f - xf)*(zf *get_density(xi , yim+1, zi+1)
+ (1.0f - zf)*get_density(xi , yim+1, zi));
res[1] = (yd1 - yd0) * (1.0f - yfm) + (yd2 - yd1) * yfm;
float zd0 = xf*( yf *get_density(xi+1, yi+1, zim-1)
+ (1.0f - yf)*get_density(xi+1, yi , zim-1))
+(1.0f - xf)*(yf *get_density(xi, yi+1, zim-1)
+ (1.0f - yf)*get_density(xi, yi , zim-1));
float zd1 = xf*( yf *get_density(xi+1, yi+1, zim)
+ (1.0f - yf)*get_density(xi+1, yi , zim))
+(1.0f - xf)*(yf *get_density(xi, yi+1, zim)
+ (1.0f - yf)*get_density(xi, yi , zim));
float zd2 = xf*( yf *get_density(xi+1, yi+1, zim+1)
+ (1.0f - yf)*get_density(xi+1, yi , zim+1))
+(1.0f - xf)*(yf *get_density(xi, yi+1, zim+1)
+ (1.0f - yf)*get_density(xi, yi , zim+1));
res[2] = (zd1 - zd0) * (1.0f - zfm) + (zd2 - zd1) * zfm;
return res;
}
This can probably be written a bit more concisely, but maybe this way you can still see what is happening. If this is still not smooth enough, you will have to look into cubic / spline interpolation or similar.

Rotate a vector about another vector

I am writing a 3d vector class for OpenGL. How do I rotate a vector v1 about another vector v2 by an angle A?
You may find quaternions to be a more elegant and efficient solution.
After seeing this answer bumped recently, I thought I'd provide a more robust answer, one that can be used without necessarily understanding the full mathematical implications of quaternions. I'm going to assume (given the C++ tag) that you have something like a Vector3 class with 'obvious' functions like inner, cross, and *= scalar operators, etc...
#include <cfloat>
#include <cmath>
...
void make_quat (float quat[4], const Vector3 & v2, float angle)
{
// BTW: there's no reason you can't use 'doubles' for angle, etc.
// there's not much point in applying a rotation outside of [-PI, +PI];
// as that covers the practical 2.PI range.
// any time graphics / floating point overlap, we have to think hard
// about degenerate cases that can arise quite naturally (think of
// pathological cancellation errors that are *possible* in seemingly
// benign operations like inner products - and other running sums).
Vector3 axis (v2);
float rl = sqrt(inner(axis, axis));
if (rl < FLT_EPSILON) // we'll handle this as no rotation:
{
quat[0] = 0.0, quat[1] = 0.0, quat[2] = 0.0, quat[3] = 1.0;
return; // the 'identity' unit quaternion.
}
float ca = cos(angle);
// we know a maths library is never going to yield a value outside
// of [-1.0, +1.0] right? Well, maybe we're using something else -
// like an approximating polynomial, or a faster hack that's a little
// rough 'around the edge' cases? let's *ensure* a clamped range:
ca = (ca < -1.0f) ? -1.0f : ((ca > +1.0f) ? +1.0f : ca);
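// now we find cos / sin of a half-angle. we can use a faster identity
// for this, secure in the knowledge that 'sqrt' will be valid....
float cq = sqrt((1.0f + ca) / 2.0f); // cos(acos(ca) / 2.0);
float sq = sqrt((1.0f - ca) / 2.0f); // sin(acos(ca) / 2.0);
// 'sqrt' drops the sign of sin(angle / 2): restore it so that a negative
// angle in [-PI, +PI] rotates the other way instead of like |angle|.
if (angle < 0.0f) sq = -sq;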
axis *= sq / rl; // i.e., scaling each element, and finally:
quat[0] = axis[0], quat[1] = axis[1], quat[2] = axis[2], quat[3] = cq;
}
Thus float quat[4] holds a unit quaternion that represents the axis and angle of rotation, given the original arguments (v2, A).
Here's a routine for quaternion multiplication. SSE/SIMD can probably speed this up, but complicated transform & lighting are typically GPU-driven in most scenarios. If you remember complex number multiplication as a little weird, quaternion multiplication is more so. Complex number multiplication is a commutative operation: a*b = b*a. Quaternions don't even preserve this property, i.e., q*p != p*q :
static inline void
qmul (float r[4], const float q[4], const float p[4])
{
// quaternion multiplication: r = q * p
float w0 = q[3], w1 = p[3];
float x0 = q[0], x1 = p[0];
float y0 = q[1], y1 = p[1];
float z0 = q[2], z1 = p[2];
r[3] = w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1;
r[0] = w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1;
r[1] = w0 * y1 + y0 * w1 + z0 * x1 - x0 * z1;
r[2] = w0 * z1 + z0 * w1 + x0 * y1 - y0 * x1;
}
Finally, rotating a 3D 'vector' v (or if you prefer, the 'point' v that the question has named v1, represented as a vector), using the quaternion: float q[4] has a somewhat strange formula: v' = q * v * conjugate(q). Quaternions have conjugates, similar to complex numbers. Here's the routine:
static inline void
qrot (float v[3], const float q[4])
{
// 3D vector rotation: v = q * v * conj(q)
float r[4], p[4];
r[0] = + v[0], r[1] = + v[1], r[2] = + v[2], r[3] = +0.0;
qmul(r, q, r); // safe to alias: qmul reads its inputs before writing r
p[0] = - q[0], p[1] = - q[1], p[2] = - q[2], p[3] = q[3];
qmul(r, r, p);
v[0] = r[0], v[1] = r[1], v[2] = r[2];
}
Putting it all together. Obviously you can make use of the static keyword where appropriate. Modern optimising compilers may ignore the inline hint depending on their own code generation heuristics. But let's just concentrate on correctness for now:
How do I rotate a vector v1 about another vector v2 by an angle A?
Assuming some sort of Vector3 class, and (A) in radians, we want the quaternion representing the rotation by the angle (A) about the axis v2, and we want to apply that quaternion rotation to v1 for the result:
float q[4]; // we want to find the unit quaternion for `v2` and `A`...
make_quat(q, v2, A);
// what about `v1`? can we access elements with `operator [] (int)` (?)
// if so, let's assume the memory: `v1[0] .. v1[2]` is contiguous.
// you can figure out how you want to store and manage your Vector3 class.
qrot(& v1[0], q);
// `v1` has been rotated by `(A)` radians about the direction vector `v2` ...
Is this the sort of thing that folks would like to see expanded upon in the Beta Documentation site? I'm not altogether clear on its requirements, expected rigour, etc.
This may prove useful:
double c = cos(A);
double s = sin(A);
double C = 1.0 - c;
double Q[3][3];
Q[0][0] = v2[0] * v2[0] * C + c;
Q[0][1] = v2[1] * v2[0] * C + v2[2] * s;
Q[0][2] = v2[2] * v2[0] * C - v2[1] * s;
Q[1][0] = v2[1] * v2[0] * C - v2[2] * s;
Q[1][1] = v2[1] * v2[1] * C + c;
Q[1][2] = v2[2] * v2[1] * C + v2[0] * s;
Q[2][0] = v2[0] * v2[2] * C + v2[1] * s;
Q[2][1] = v2[2] * v2[1] * C - v2[0] * s;
Q[2][2] = v2[2] * v2[2] * C + c;
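// Q is laid out for a row vector (v' = v * Q); v2 must have unit length.
// Copy the components first so the in-place update does not read new values.
double x = v1[0], y = v1[1], z = v1[2];
v1[0] = x * Q[0][0] + y * Q[1][0] + z * Q[2][0];
v1[1] = x * Q[0][1] + y * Q[1][1] + z * Q[2][1];
v1[2] = x * Q[0][2] + y * Q[1][2] + z * Q[2][2];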
Use a 3D rotation matrix.
The easiest-to-understand way would be rotating the coordinate axis so that vector v2 aligns with the Z axis, then rotate by A around the Z axis, and rotate back so that the Z axis aligns with v2.
When you have written down the rotation matrices for the three operations, you'll probably notice that you apply three matrices after each other. To reach the same effect, you can multiply the three matrices.
I found this here:
http://steve.hollasch.net/cgindex/math/rotvec.html
let
[v] = [vx, vy, vz] the vector to be rotated.
[l] = [lx, ly, lz] the vector to rotate about
      | 1  0  0 |
[i] = | 0  1  0 |   the identity matrix
      | 0  0  1 |
      |  0   lz  -ly |
[L] = | -lz  0    lx |
      |  ly -lx   0  |
d = sqrt(lx*lx + ly*ly + lz*lz)
a the angle of rotation
then
matrix operations gives:
[v] = [v]x{[i] + sin(a)/d*[L] + ((1 - cos(a))/(d*d)*([L]x[L]))}
I wrote my own Matrix3 class and Vector3Library that implemented this vector rotation. It works absolutely perfectly. I use it to avoid drawing models outside the field of view of the camera.
I suppose this is the "use a 3d rotation matrix" approach. I took a quick look at quaternions, but have never used them, so stuck to something I could wrap my head around.
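With [v] treated as a row vector, [v]x[L] works out to the cross product l x v, so the matrix formula above can also be written without building any matrices. A minimal sketch of that vector form, assuming a nonzero axis and an angle in radians; Vec3d, cross3 and rotateAbout are illustrative names:

```cpp
#include <array>
#include <cmath>

using Vec3d = std::array<double, 3>;

// Right-handed cross product a x b.
static Vec3d cross3(const Vec3d& a, const Vec3d& b) {
    return { a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0] };
}

// Rotate v about the axis l by angle a (radians), right-hand rule.
// Assumes l is not the zero vector; it is normalized to k = l / d here.
Vec3d rotateAbout(const Vec3d& v, const Vec3d& l, double a) {
    double d = std::sqrt(l[0] * l[0] + l[1] * l[1] + l[2] * l[2]);
    Vec3d k = { l[0] / d, l[1] / d, l[2] / d };
    Vec3d kv = cross3(k, v);     // corresponds to [v]x[L] / d
    Vec3d kkv = cross3(k, kv);   // corresponds to [v]x[L]x[L] / (d*d)
    double s = std::sin(a);
    double c1 = 1.0 - std::cos(a);
    return { v[0] + s * kv[0] + c1 * kkv[0],
             v[1] + s * kv[1] + c1 * kkv[1],
             v[2] + s * kv[2] + c1 * kkv[2] };
}
```

For example, rotating (1, 0, 0) about (0, 0, 2) by a quarter turn should give approximately (0, 1, 0).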