I am working on a project where I'm basically preforming PCA millions of times on sets of 20-100 points. Currently, we are using some legacy code that is using GNU's GSL linear algebra pack to do SVD on covariance matrix. This works, but is very slow.
I was wondering if there are any simple methods to do eigen decompositions on a 3x3 symmetric matrix, so that I can just put it on the GPU and let it run in parallel.
Since the matrices themselves are so small, I wasn't sure what kind of algorithm to use, because it seems like they were designed for large matrices or data sets. There's also the choice of doing a straight SVD on the data set, but I'm not sure what would be the best option.
I have to admit, I'm not stellar at Linear Algebra, especially when considering algorithm advantages. Any help would be greatly appreciated.
(I'm working in C++ right now)
Using the characteristic polynomial works, but it tends to be somewhat numerically unstable (or at the very least inaccurate).
A standard algorithm to compute eigensystems for symmetric matrices is the QR method. For 3x3 matrices, a very slick implementation is possible by building the orthogonal transform out of rotations and representing them as a Quaternion. A (quite short!) implementation of this idea in C++, assuming you have a 3x3 matrix and a Quaternion class, can be found here. The algorithm should be fairly suitable for GPU implementation because it's iterative (and thus self-correcting), can make reasonably good use of fast low-dimensional vector math primitives when they're available and uses a fairly small number of vector registers (so it allows for more active threads).
Most methods are efficient for bigger matrices. For small ones the analytical method ist the quickest and simplest, but is in some cases inaccurate.
Joachim Kopp developed a optimized "hybrid" method for a 3x3 symmetric matrix, which relays on the analytical mathod, but falls back to QL algorithm.
An other solution for 3x3 symmetric matrices can be found here (symmetric tridiagonal QL algorithm).
// Slightly modified version of Stan Melax's code for 3x3 matrix diagonalization (Thanks Stan!)
// source: http://www.melax.com/diag.html?attredirects=0
void Diagonalize(const Real (&A)[3][3], Real (&Q)[3][3], Real (&D)[3][3])
{
// A must be a symmetric matrix.
// returns Q and D such that
// Diagonal matrix D = QT * A * Q; and A = Q*D*QT
const int maxsteps=24; // certainly wont need that many.
int k0, k1, k2;
Real o[3], m[3];
Real q [4] = {0.0,0.0,0.0,1.0};
Real jr[4];
Real sqw, sqx, sqy, sqz;
Real tmp1, tmp2, mq;
Real AQ[3][3];
Real thet, sgn, t, c;
for(int i=0;i < maxsteps;++i)
{
// quat to matrix
sqx = q[0]*q[0];
sqy = q[1]*q[1];
sqz = q[2]*q[2];
sqw = q[3]*q[3];
Q[0][0] = ( sqx - sqy - sqz + sqw);
Q[1][1] = (-sqx + sqy - sqz + sqw);
Q[2][2] = (-sqx - sqy + sqz + sqw);
tmp1 = q[0]*q[1];
tmp2 = q[2]*q[3];
Q[1][0] = 2.0 * (tmp1 + tmp2);
Q[0][1] = 2.0 * (tmp1 - tmp2);
tmp1 = q[0]*q[2];
tmp2 = q[1]*q[3];
Q[2][0] = 2.0 * (tmp1 - tmp2);
Q[0][2] = 2.0 * (tmp1 + tmp2);
tmp1 = q[1]*q[2];
tmp2 = q[0]*q[3];
Q[2][1] = 2.0 * (tmp1 + tmp2);
Q[1][2] = 2.0 * (tmp1 - tmp2);
// AQ = A * Q
AQ[0][0] = Q[0][0]*A[0][0]+Q[1][0]*A[0][1]+Q[2][0]*A[0][2];
AQ[0][1] = Q[0][1]*A[0][0]+Q[1][1]*A[0][1]+Q[2][1]*A[0][2];
AQ[0][2] = Q[0][2]*A[0][0]+Q[1][2]*A[0][1]+Q[2][2]*A[0][2];
AQ[1][0] = Q[0][0]*A[0][1]+Q[1][0]*A[1][1]+Q[2][0]*A[1][2];
AQ[1][1] = Q[0][1]*A[0][1]+Q[1][1]*A[1][1]+Q[2][1]*A[1][2];
AQ[1][2] = Q[0][2]*A[0][1]+Q[1][2]*A[1][1]+Q[2][2]*A[1][2];
AQ[2][0] = Q[0][0]*A[0][2]+Q[1][0]*A[1][2]+Q[2][0]*A[2][2];
AQ[2][1] = Q[0][1]*A[0][2]+Q[1][1]*A[1][2]+Q[2][1]*A[2][2];
AQ[2][2] = Q[0][2]*A[0][2]+Q[1][2]*A[1][2]+Q[2][2]*A[2][2];
// D = Qt * AQ
D[0][0] = AQ[0][0]*Q[0][0]+AQ[1][0]*Q[1][0]+AQ[2][0]*Q[2][0];
D[0][1] = AQ[0][0]*Q[0][1]+AQ[1][0]*Q[1][1]+AQ[2][0]*Q[2][1];
D[0][2] = AQ[0][0]*Q[0][2]+AQ[1][0]*Q[1][2]+AQ[2][0]*Q[2][2];
D[1][0] = AQ[0][1]*Q[0][0]+AQ[1][1]*Q[1][0]+AQ[2][1]*Q[2][0];
D[1][1] = AQ[0][1]*Q[0][1]+AQ[1][1]*Q[1][1]+AQ[2][1]*Q[2][1];
D[1][2] = AQ[0][1]*Q[0][2]+AQ[1][1]*Q[1][2]+AQ[2][1]*Q[2][2];
D[2][0] = AQ[0][2]*Q[0][0]+AQ[1][2]*Q[1][0]+AQ[2][2]*Q[2][0];
D[2][1] = AQ[0][2]*Q[0][1]+AQ[1][2]*Q[1][1]+AQ[2][2]*Q[2][1];
D[2][2] = AQ[0][2]*Q[0][2]+AQ[1][2]*Q[1][2]+AQ[2][2]*Q[2][2];
o[0] = D[1][2];
o[1] = D[0][2];
o[2] = D[0][1];
m[0] = fabs(o[0]);
m[1] = fabs(o[1]);
m[2] = fabs(o[2]);
k0 = (m[0] > m[1] && m[0] > m[2])?0: (m[1] > m[2])? 1 : 2; // index of largest element of offdiag
k1 = (k0+1)%3;
k2 = (k0+2)%3;
if (o[k0]==0.0)
{
break; // diagonal already
}
thet = (D[k2][k2]-D[k1][k1])/(2.0*o[k0]);
sgn = (thet > 0.0)?1.0:-1.0;
thet *= sgn; // make it positive
t = sgn /(thet +((thet < 1.E6)?sqrt(thet*thet+1.0):thet)) ; // sign(T)/(|T|+sqrt(T^2+1))
c = 1.0/sqrt(t*t+1.0); // c= 1/(t^2+1) , t=s/c
if(c==1.0)
{
break; // no room for improvement - reached machine precision.
}
jr[0 ] = jr[1] = jr[2] = jr[3] = 0.0;
jr[k0] = sgn*sqrt((1.0-c)/2.0); // using 1/2 angle identity sin(a/2) = sqrt((1-cos(a))/2)
jr[k0] *= -1.0; // since our quat-to-matrix convention was for v*M instead of M*v
jr[3 ] = sqrt(1.0f - jr[k0] * jr[k0]);
if(jr[3]==1.0)
{
break; // reached limits of floating point precision
}
q[0] = (q[3]*jr[0] + q[0]*jr[3] + q[1]*jr[2] - q[2]*jr[1]);
q[1] = (q[3]*jr[1] - q[0]*jr[2] + q[1]*jr[3] + q[2]*jr[0]);
q[2] = (q[3]*jr[2] + q[0]*jr[1] - q[1]*jr[0] + q[2]*jr[3]);
q[3] = (q[3]*jr[3] - q[0]*jr[0] - q[1]*jr[1] - q[2]*jr[2]);
mq = sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
q[0] /= mq;
q[1] /= mq;
q[2] /= mq;
q[3] /= mq;
}
}
I am not stellar at linear algebra either but since Murphy stated that "When you don't know what you're talking about, everything is possible", it is possible that the CULA pack might be relevant to your needs. They do SVD and Eigenvalues decomposition
Related
Yes, I know that it is a popular problem. But I found nowhere the full clear implementing code without using OpenGL classes or a lot of headers files.
Okay, the math solution is to transfer ellipsoid to sphere. Then find intersections dots (if they exist of course) and make inverse transformation. Because affine transformation respect intersection.
But I have difficulties when trying to implement this.
I tried something for sphere but it is completely incorrect.
double CountDelta(Point X, Point Y, Sphere S)
{
double a = 0.0;
for(int i = 0; i < 3; i++){
a += (Y._coordinates[i] - X._coordinates[i]) * (Y._coordinates[i] - X._coordinates[i]);
}
double b = 0.0;
for(int i = 0; i < 3; i++)
b += (Y._coordinates[i] - X._coordinates[i]) * (X._coordinates[i] - S._coordinates[i]);
b *= 2;
double c = - S.r * S.r;
for(int i = 0; i < 3; i++)
c += (X._coordinates[i] - S._coordinates[i]) * (X._coordinates[i] - S._coordinates[i]);
return b * b - 4 * a * c;
}
Let I have start point P = (Px, Py, Pz), direction V = (Vx, Vy, Vz), ellipsoid = (Ex, Ey, Ec) and (a, b, c). How to construct clear code?
Let a line from P to P + D intersecting a sphere of center C and radius R.
WLOG, C can be the origin and R unit (otherwise translate by -C and scale by 1/R). Now using the parametric equation of the line and the implicit equation of the sphere,
(Px + t Dx)² + (Py + t Dy)² + (Pz + t Dz)² = 1
or
(Dx² + Dy² + Dz²) t² + 2 (Dx Px + Dy Py + Dz Pz) t + Px² + Py² + Pz² - 1 = 0
(Vectorially, D² t² + 2 D P t + P² - 1 = 0 and t = (- D P ±√((D P)² - D²(P² - 1))) / D².)
Solve this quadratic equation for t and get the two intersections as P + t D. (Don't forget to invert the initial transformations.)
For the ellipsoid, you can either plug the parametric equation of the line directly into the implicit equation of the conic, or reduce the conic (and the points simultaneously) and plug in the reduced equation.
A line in the 2D plane can be represented with the implicit equation
f(x,y) = a*x + b*y + c = 0
= dot((a,b,c),(x,y,1))
If a^2 + b^2 = 1, then f is considered normalized and f(x,y) gives you the Euclidean (signed) distance to the line.
Say you are given a 3xK matrix (in Eigen) where each column represents a line:
Eigen::Matrix<float,3,Eigen::Dynamic> lines;
and you wish to normalize all K lines. Currently I do this a follows:
for (size_t i = 0; i < K; i++) { // for each column
const float s = lines.block(0,i,2,1).norm(); // s = sqrt(a^2 + b^2)
lines.col(i) /= s; // (a, b, c) /= s
}
I know there must be a more clever and efficient method for this that does not require looping. Any ideas?
EDIT: The following turns out being slower for optimized code... hmmm..
Eigen::VectorXf scales = lines.block(0,0,2,K).colwise().norm().cwiseInverse()
lines *= scales.asDiagonal()
I assume that this as something to do with creating KxK matrix scales.asDiagonal().
P.S. I could use Eigen::Hyperplane somehow, but the docs seem little opaque.
I have this function to reach a certain 1 dimensional value accelerated and damped with overshoot. That is: given an inital value, a velocity and a acceleration (force/mass), the target value is attained by accelerating to it and gets increasingly damped while getting closer to the target value.
This all works fine, howver If i want to know what the TotalAngle is after time 't' I have to run this function say N steps with a 'small' dt to find the 'limit'.
I was wondering If i can (and how) to intergrate over dt so that the TotalAngle can be determined given a time 't' initially.
Regards, Tanks for any help.
dt = delta time step per frame
input = 1
TotalAngle = 0 at t=0
Velocity = 0 at t=0
void FAccelDampedWithOvershoot::Update(float dt, float input, float& Velocity, float& TotalAngle)
{
const float Force = 500000.f;
const float DampForce = 5000.f;
const float MaxAngle = 45.f;
const float InvMass = 1.f / 162400.f;
float target = MaxAngle * input;
float ratio = (target - TotalAngle) / MaxAngle;
float fMove = Force * ratio;
float fDamp = -Velocity * DampForce;
Velocity += (fMove + fDamp) * invMass * dt;
TotalAngle += Velocity * dt;
}
Updated with fixed bugs in math
Originally I've lost mass and MaxAngle a few times. This is why you should first solve it on a paper and then enter to the SO rather than trying to solve it in the text editor.
Anyway, I've fixed the math and now it seems to work reasonably well. I put fixed solution just over previous one.
Well, this looks like a Newtonian mechanics which means differential equations. Let's try to solve them.
SO is not very friendly to math formulas and I'm a bit bored to type characters so here is what I use:
F = Force
Fd = DampForce
MA = MaxAngle
A= TotalAngle
v = Velocity
m = 1 / InvMass
' for derivative i.e. something' is 1-st derivative of something by t and something'' is 2-nd derivative
if I divide you last two lines of code by dt and merge in all the other lines I can get (I also assume that input = 1 as other case is obviously symmetrical)
v' = ([F * (1 - A / MA)] - v * Fd) / m
and applying A' = v we get
m * A'' = F(1 - A/MA) - Fd * A'
or moving to one side we get a simple 2-nd order differential equation
m * A'' + Fd * A' + F/MA * A = F
IIRC, the way to solve it is to first solve characteristic equation which here is
m * x^2 + Fd * x + F/MA = 0
x[1,2] = (-Fd +/- sqrt(Fd^2 - 4*F*m/MA))/ (2*m)
I expect that part under sqrt i.e. (Fd^2 - 4*F*m/MA) is negative thus solution should be of the following form. Let
Dm = Fd/(2*m)
K = sqrt(F/MA/m - Dm^2)
(note the negated value under sqrt so it works now) then
A(t) = e^(-Dm*t) * [P * sin(K*t) + Q * cos(K*t)] + C
where P, Q and C are some constants.
The solution is easier to find as a sum of two solutions: some specific solution for
m * A'' + Fd * A' + F/MA * A = F
and a general solution for homogeneou
m * A'' + Fd * A' + F/MA * A = 0
that makes original conditions fit. Obviously specific solution A(t) = MA works and thus C = MA. So now we need to fit P and Q of general solution to match starting conditions. To find them we need
A(0) = - MA
A'(0) = V(0) = 0
Given that e^0 = 1, sin(0) = 0 and cos(0) = 1 you get something like
Q = -MA
P = 0
or
P = 0
Q = - MA
C = MA
thus
A(t) = MA * [1 - e^(-Dm*t) * cos(K*t)]
where
Dm = Fd/(2*m)
K = sqrt(F/MA/m - Dm^2)
which kind of makes sense given your task.
Note also that this equation assumes that everything happens in radians rather than degrees (i.e. derivative of [sin(t)]' is just cos(t)) so you should transform all your constants accordingly or transform the solution.
const float Force = 500000.f * M_PI / 180;
const float DampForce = 5000.f * M_PI / 180;
const float MaxAngle = M_PI_4;
which on my machine produces
Dm = 0.000268677541
K = 0.261568546
This seems to be similar to original funcion is I step with dt = 0.01f and the main obstacle seems to be precision loss because of float
Hope this helps!
This is not a full answer and I am sure someone else can work it out, but there is no room in the comments and it may help you find a better solution.
The image below shows the velocity (blue) as your function integrates at time steps 1. The red shows the function below that calculates the value for time t
The function F(t)
F(t) = sin((t / f) * pi * 2) * (1 / (((t / f) + a) ^ c)) * b
With f = 23.7, a = 1.4, c = 2, and b= 50 that give the red plot in the image above
All the values are just approximation.
f determines the frequency and is close to a match,
a,b,c control the falloff in amplitude and are a by eye guestimate.
If it does not matter that you have a perfect match then this will work for you. totalAngle uses the same function but t has 0.25 added to it. Unfortunately I did not get any values for a,b,c for totalAngle and I did notice that it was offset so you will have to add the offset value d (I normalised everything so have no idea what the range of totalAngle was)
Function F(t) for totalAngle
F(t) = sin(((t+0.25) / f) * pi * 2) * (1 / ((((t+0.25) / f) + a) ^ c)) * b + d
Sorry only have f = 23.7, c= 2, a~1.4 nothing for b=? d=?
On modern processors, float division is a good order of magnitude slower than float multiplication (when measured by reciprocal throughput).
I'm wondering if there are any algorithms out there for computating a fast approximation to x/y, given certain assumptions and tolerance levels. For example, if you assume that 0<x<y, and are willing to accept any output that is within 10% of the true value, are there algorithms faster than the built-in FDIV operation?
I hope that this helps because this is probably as close as your going to get to what you are looking for.
__inline__ double __attribute__((const)) divide( double y, double x ) {
// calculates y/x
union {
double dbl;
unsigned long long ull;
} u;
u.dbl = x; // x = x
u.ull = ( 0xbfcdd6a18f6a6f52ULL - u.ull ) >> (unsigned char)1;
// pow( x, -0.5 )
u.dbl *= u.dbl; // pow( pow(x,-0.5), 2 ) = pow( x, -1 ) = 1.0/x
return u.dbl * y; // (1.0/x) * y = y/x
}
See also:
Another post about reciprocal approximation.
The Wikipedia page.
FDIV is usually exceptionally slower than FMUL just b/c it can't be piped like multiplication and requires multiple clk cycles for iterative convergence HW seeking process.
Easiest way is to simply recognize that division is nothing more than the multiplication of the dividend y and the inverse of the divisor x. The not so straight forward part is remembering a float value x = m * 2 ^ e & its inverse x^-1 = (1/m)*2^(-e) = (2/m)*2^(-e-1) = p * 2^q approximating this new mantissa p = 2/m = 3-x, for 1<=m<2. This gives a rough piece-wise linear approximation of the inverse function, however we can do a lot better by using an iterative Newton Root Finding Method to improve that approximation.
let w = f(x) = 1/x, the inverse of this function f(x) is found by solving for x in terms of w or x = f^(-1)(w) = 1/w. To improve the output with the root finding method we must first create a function whose zero reflects the desired output, i.e. g(w) = 1/w - x, d/dw(g(w)) = -1/w^2.
w[n+1]= w[n] - g(w[n])/g'(w[n]) = w[n] + w[n]^2 * (1/w[n] - x) = w[n] * (2 - x*w[n])
w[n+1] = w[n] * (2 - x*w[n]), when w[n]=1/x, w[n+1]=1/x*(2-x*1/x)=1/x
These components then add to get the final piece of code:
float inv_fast(float x) {
union { float f; int i; } v;
float w, sx;
int m;
sx = (x < 0) ? -1:1;
x = sx * x;
v.i = (int)(0x7EF127EA - *(uint32_t *)&x);
w = x * v.f;
// Efficient Iterative Approximation Improvement in horner polynomial form.
v.f = v.f * (2 - w); // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
// v.f = v.f * ( 4 + w * (-6 + w * (4 - w))); // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
// v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w))))))); // Third Iteration, Err = +-6.8e-8 * 2^(-flr(log2(x)))
return v.f * sx;
}
I'm trying to get fourier transforms to work, I have to do it for an assignment and I think I have it to where it should be working and i'm not sure why it's not. I think it has something to do with the complex numbers since 'i' is involved. I've looked at many references and I understand the formula but i'm having trouble programming it. this is what i have so far
void NaiveDFT::Apply( Image & img )
{
//make the fourier transform using the naive method and set that to the image.
Image dft(img);
Pixel ** dftData = dft.GetImageData();
Pixel ** imgData = img.GetImageData();
for(unsigned u = 0; u < img.GetWidth(); ++u)
{
for(unsigned v = 0; v < img.GetHeight(); ++v)
{
std::complex<double> sum = 0;
for(unsigned x = 0; x < img.GetWidth(); ++x)
{
for(unsigned y = 0; y < img.GetHeight(); ++y)
{
std::complex<double> i = sqrt(std::complex<double>(-1));
std::complex<double> theta = 2 * M_PI * (((u * x) / img.GetWidth()) + ((v * y) / img.GetHeight()));
sum += std::complex<double>(imgData[x][y]._red) * cos(theta) + (-i * sin(theta));
//sum += std::complex<double>(std::complex<double>(imgData[x][y]._red) * pow(EULER, -i * theta));
}
}
dftData[u][v] = (sum.imag() / (img.GetWidth() * img.GetHeight()));
}
}
img = dft;
}
I have a few test images i'm testing this with and i'm either getting like an all black image or like, an all gray image.
I've also tried the sum of e^(-i*2*PI*(x*u*width + y*v*height) * 1/width * height which gets the same result as expected although it's still not the desiered output.
I've also tried the sum.real() number and that doesn't look right either
if anyone has any tips or can point me in the right direction, that'd be great, at this point, i just keep trying different things and checking the output until I get what I should be getting.
thanks.
I think that there can be a problem during the multiplication with the complex term. The line:
sum += std::complex<double>(imgData[x][y]._red) * cos(theta) + (-i * sin(theta));
should be:
sum += std::complex<double>(imgData[x][y]._red) * ( cos(theta) + -i * sin(theta));
Moreover, while calculating theta you need to use double precision:
std::complex<double> theta = 2 * M_PI * ((((double)u * x) / (double)(img.GetWidth())) + (((double)v * y) / (double)(img.GetHeight())));