How to do reverse perspective with OpenGL? - c++

I'm looking for a way to do reverse perspective with OpenGL and C++. For now, I'm using glFrustum to get a classic perspective, but I would like to know if a reverse perspective, as presented here (https://en.wikipedia.org/wiki/Reverse_perspective and below), is possible. If not, is there any other way to achieve this with OpenGL?

This question really intrigued me. I'm not entirely convinced that what I'm going to refer to now as 'Byzantine perspective' can be accommodated by a transform analogous to that provided by (pre-core profile) glFrustum. I have done some work on it though, inspired by derivations in Computer Graphics: Principles and Practice (2nd Ed), and the formulation for the OpenGL CCS / NDCS.
Unfortunately, the original S.O. site doesn't allow embedded LaTeX, so the matrices will not be pretty. Consider this answer a work in progress.
So far, I've derived the matrix transform resulting in what is sometimes referred to as the 'normalized frustum': the far plane lies at Z = -1, the near plane at Z = - N / F, and the R, L, T, B planes have unit slope. (This would be very clear given a good diagram.)
[ 2N / (R - L) 0 (R + L) / (R - L) 0 ]
[ 0 2N / (T - B) (T + B) / (T - B) 0 ]
[ 0 0 1 0 ]
[ 0 0 0 F ]
Call this matrix: [F.p]. For any point: P = (x, y, - N, 1)^T on the near plane, it's easy to demonstrate that the transformed homogeneous point lies on the Z = - N / F plane. (Note: ^T is the 'transpose' operator to make it clear that this is in fact a column vector.)
Likewise, given a point: P = (x, y, - F, 1)^T on the far plane, the transformed homogeneous point lies on the Z = -1 plane.
A Byzantine perspective requires another constraint - we'll use the variable D, where Z = - D is a point analogous to the 'eye' at Z = 0, or the 'projection reference point' (PRP).
As you have already gathered from the image you've provided, parallel lines converge at Z = - D, rather than at the 'eye'. You don't want the image from the point of convergence, however; you want to visualise the effect from the 'front'. The question is: can we construct an OpenGL matrix similar to that provided by glFrustum which yields a Byzantine perspective? And can it be made to fit within a GL pipeline?
What I've derived so far is the 'normalized Byzantine frustum'. Again, the far plane is at Z = -1, the near plane at Z = - N / F, and the R, L, T, B planes have unit slope - albeit the negative of the regular frustum's slope. (Again, a clear picture would be worth a thousand words here.)
[ 2N / (R - L) 0 (R + L) / (R - L) 0 ]
[ 0 2N / (T - B) (T + B) / (T - B) 0 ]
[ 0 0 N / F 0 ]
[ 0 0 0 N ]
Call this matrix: [F.b]. The X,Y coordinate transforms are identical, but the Z,W component transforms differ. This is somewhat intuitive given that this is, in a sense, a 'reverse' of the orthodox view volume.
Again, given a point: P = (x, y, - N, 1)^T on the near plane, the transformed homogeneous point lies on the Z = - N / F plane, and the homogeneous transform of a point: P = (x, y, - F, 1)^T on the far plane lies on the Z = -1 plane.
Given the similarities in the matrices, and the fact that only simple perspective transforms (and a few trivial scaling and translation matrices) are required to yield parallel projection matrices that conform to OpenGL clip coordinate space (CCS), and its NDCS projection, it seems likely that an OpenGL 'Byzantine' projection could be made to work. I just need more time to work on it...
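As a sanity check on the matrices above, here is a minimal C++ sketch (my own; no OpenGL required, and the sample N, F, R, L, T, B values are arbitrary, chosen purely for illustration). It applies [F.b] to points on the near and far planes and prints Z after the homogeneous divide:

#include <array>
#include <cstdio>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>; // row-major

// 4x4 matrix times homogeneous column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

int main() {
    double N = 1, F = 4, R = 1, L = -1, T = 1, B = -1; // illustration only

    // The 'normalized Byzantine frustum' matrix [F.b] from above.
    Mat4 Fb = {{{2*N/(R-L), 0, (R+L)/(R-L), 0},
                {0, 2*N/(T-B), (T+B)/(T-B), 0},
                {0, 0, N/F, 0},
                {0, 0, 0, N}}};

    for (double z : {-N, -F}) { // a point on the near plane, then the far plane
        Vec4 p = mul(Fb, {0.5, 0.5, z, 1});
        std::printf("Z = %g maps to %g (expected %g)\n",
                    z, p[2] / p[3], z == -N ? -N / F : -1.0);
    }
}

Swapping in the rows of [F.p] verifies the regular normalized frustum the same way.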

Related

OpenGL sutherland-hodgman polygon clipping algorithm in homogeneous coordinates (4D, CCS)

I have two questions (marked 1 and 2 below).
In OpenGL, clipping is done with the Sutherland-Hodgman algorithm.
However, I wonder how the Sutherland-Hodgman algorithm works in a homogeneous system (4D).
I set up an example situation.
In the VCS, there is a line with end points R = (0, 3, -2, 1) and S = (0, 0, 1, 1).
The frustum is: right = 1, left = -1, near = 1, far = 3, top = 4, bottom = -4.
Therefore, the projection matrix P is
1 0 0 0
0 1/4 0 0
0 0 -2 -3
0 0 -1 0
If we transform each end point with P, we get
R' = (0, 3/4, 1, 2), S' = (0, 0, -5, -1)
I know that perspective division should not be done yet, because if we divide now, the clipping result is not correct.
1. Here I am curious: what makes the clipping correct when we have not done the perspective division? What mathematical properties are at work here?
2. How do I calculate the clipping result in the above situation?
(The fact that two intersections occur in the w-y coordinate system confuses me. I thought the result would be one line, not two separate parts.)
I'm not quite sure whether you understood the Sutherland-Hodgman algorithm correctly (or at least I didn't get your example). Thus I will prove here that it doesn't make any difference whether clipping happens before or after the perspective divide. The proof is only shown for one plane (clipping has to be done against all 6 planes), since applying multiple such clipping operations one after another makes no difference here.
Let's assume we have two points (as you described) R' and S' in clip space, and a clipping plane P given in Hessian normal form [n, p] (for the left plane this is [1,0,0,1]).
If we were calculating in pure 3D space (R^3), then checking whether a line crosses this plane would be done by calculating the signed distance of both points to the plane and checking whether the signs differ. The signed distance for a point X = [x/w, y/w, z/w] is given by
D = dot(n, X) + p
Let's write down the actual equation we have (including the perspective divide):
d = n_x * x/w + n_y * y/w + n_z * z/w + p
In order to find the exact intersection point, we would, again in R^3 space, calculate for both points (A = R'/R'w, B = S'/S'w) the distance to the plane (da, db) and perform a linear interpolation (I will only write the equations for the x-coordinate here since y and z are working similar):
x = A_x * (1 - da/(da - db)) + B_x * (da/(da - db))
x = R'x/R'w * (1 - da/(da - db)) + S'x/S'w * (da/(da-db))
And w = 1 (since we interpolate between two points both having w = 1)
Now we already know from the previous discussion that clipping has to happen before the perspective divide, thus we have to adapt this equation. This means that for each point, the clipping cube has a different scaling w. Let's see what happens when we try to perform the same operations in P^3 (before the perspective divide):
First, we "revert" the perspective divide to get to X=[x,y,z,w] for which the distance to the plane is given by
d = n_x * x/w + n_y * y/w + n_z * z/w + p
d = (n_x * x + n_y * y + n_z * z) / w + p
d * w = n_x * x + n_y * y + n_z * z + p * w
d * w = dot([n, p], [x,y,z,w])
d * w = dot(P, X)
Since we are only interested in the sign of the whole calculation, which our operations have not changed, we can compare the d*w values and get the same inside-outside result as in R^3.
For the two points R' and S', the calculated distances in P^3 are dr = da * R'w and ds = db * S'w. When we now use the same interpolation equation as above but for R' and S' we get for x:
x' = R'x * (1 - (da * R'w)/(da * R'w - db * S'w)) + S'x * (da * R'w)/(da * R'w - db * S'w)
At first view this looks rather different from the result we got in R^3, but since we are still in P^3 (thus x'), we still have to do the perspective divide on the result (this is allowed here, since the interpolated point will always be at the border of the view frustum, and thus dividing by w will not introduce any problems). The interpolated w component is given as:
w' = R'w * (1 - (da * R'w)/(da * R'w - db * S'w)) + S'w * (da * R'w)/(da * R'w - db * S'w)
And when calculating x/w we get
x = x' / w';
x = R'x/R'w * (1 - da/(da - db)) + S'x/S'w * (da/(da-db))
which is exactly the same result as when calculating everything in R^3.
Conclusion: The interpolation gives the same result, no matter if we perform the perspective divide first and interpolation afterwards or interpolating first and dividing then. But with the second variant we avoid the problem with points flipping from behind the viewer to the front since we are only dividing points that are guaranteed to be inside (or on the border) of the viewing frustum.
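To make the proof concrete, here is a small C++ sketch (my own illustration, not part of the original answer) that clips the question's segment R' = (0, 3/4, 1, 2), S' = (0, 0, -5, -1) against a single clip-space plane, using exactly the d*w = dot(P, X) quantities from above. I use the near plane [0, 0, 1, 1] here, since the interpolation then lands neatly on z/w = -1:

#include <array>
#include <cstdio>

using Vec4 = std::array<double, 4>; // clip-space point (x, y, z, w)

// d*w of a clip-space point against plane [n, p], as derived above:
// d*w = dot([n_x, n_y, n_z, p], [x, y, z, w])
double dw(const Vec4& plane, const Vec4& v) {
    return plane[0]*v[0] + plane[1]*v[1] + plane[2]*v[2] + plane[3]*v[3];
}

int main() {
    Vec4 R = {0, 0.75, 1, 2};       // R' from the question
    Vec4 S = {0, 0, -5, -1};        // S' from the question
    Vec4 nearPlane = {0, 0, 1, 1};  // z + w >= 0 means "inside"

    double dr = dw(nearPlane, R);   // +3: inside
    double ds = dw(nearPlane, S);   // -6: outside
    if ((dr < 0) != (ds < 0)) {
        double t = dr / (dr - ds);  // interpolate BEFORE the divide
        Vec4 I{};
        for (int i = 0; i < 4; ++i) I[i] = R[i] + t * (S[i] - R[i]);
        // The clipped point lies exactly on the plane, so dividing by w
        // is now safe: here I = (0, 0.5, -1, 1), i.e. NDC z = -1.
        std::printf("t = %g, NDC z = %g\n", t, I[2] / I[3]);
    }
}

A full Sutherland-Hodgman implementation repeats this test and interpolation against all six planes [±1, 0, 0, 1], [0, ±1, 0, 1] and [0, 0, ±1, 1].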
You speak of polygon clipping in a homogeneous system (4D), but from your question I assume that you actually mean homogeneous coordinates, which makes a lot more sense. (There are many possible homogeneous systems.)
Ok, so you want to use "4D" coordinates, which are really "3D coordinates and a w term". In projection transformations, the w term is the projective term that partially relates the screen-space coordinate to the original world-space position. Assuming that you are NOT interested in projective-space clipping, this term is not relevant.
I'm assuming this because the clipping box you describe is axis-aligned on planes in 3D. Even if it was rotated or scaled in 3D space, each of the planes would still be a 3D plane, the 4th coordinate always being '1'.
So how to clip:
clip line segment L against each of the planes of the clipping box, i.e. 6 clipping planes in total (you describe the normals of each clipping plane aptly), and see if any intersection point v is shared by the line and the tested plane P so that
v lies on the line segment (i.e. a t between 0 and 1)
v lies within the bounds of the plane P (i.e. the coordinate should not lie beyond any of the adjacent planes. Since you are using axis-aligned clipping planes, this is easy to check.)
Any of these intersections between a (3D + w) line and one of the 3D planes occurs in 3D, and the intersection points have to be 3D coordinates. You can extend each of these coordinates with a 4th w coordinate into a "4D" coordinate so that you can further transform them using 4x4 matrices for view and projection processing.

Understand Translation Matrix in OpenGL

Assume we want to translate a point p(1, 2, 3, w=1) with a vector v(a, b, c, w=0) to a new point p'
Note: w=0 represents a vector and w=1 represent a point in OpenGL, please correct me if I'm wrong.
In Affine transformation definition, we have:
p + v = p'
=> p(1, 2, 3, 1) + v(a, b, c, 0) = p(1 + a, 2 + b, 3 + c, 1)
=> point + vector = point (everything works as expected)
In OpenGL, the translation matrix is as following:
1 0 0 a
0 1 0 b
0 0 1 c
0 0 0 1
I assume (a, b, c, 1) is the vector from the affine transformation definition.
Why do we have w=1, and not w=0, such as
1 0 0 a
0 1 0 b
0 0 1 c
0 0 0 0
Note: w=0 represents a vector and w=1 represent a point in OpenGL, please correct me if I'm wrong.
You are wrong. First of all, this doesn't really have anything to do with OpenGL; it is about homogeneous coordinates, which is a purely mathematical concept. It works by embedding an n-dimensional vector space into an (n+1)-dimensional vector space. In the 3D case, we use 4D homogeneous coordinates, with the definition that the homogeneous vector (x, y, z, w) represents the 3D point (x/w, y/w, z/w) in Cartesian coordinates.
As a result, for any w != 0 you get a certain finite point, and for w = 0 you are describing an infinitely far away point in a specific direction. This means that homogeneous coordinates are more powerful in the regard that they can actually describe infinitely far away points with finite coordinates (which comes in very handy for perspective transformations, where infinitely far away points are mapped to finite points, and vice versa).
You can, as a shortcut, imagine (x, y, z, 0) as some direction vector. But a point is not just w=1; any w value not equal to 0 will do. Conceptually, this means that any Cartesian 3D point is represented by a line in homogeneous space (we did go up one dimension, so this actually makes sense).
I assume (a, b, c, 1) is the vector from Affine transformation definition why we have w=1, but not w=0?
Your assumption is wrong. One thing about homogeneous coordinates is that we do not apply a translation in the 4D space. We get the effect of a translation in the 3D space by actually doing a shearing operation in 4D space.
So what we really want to do in homogeneous space is
(x + w*a, y + w*b, z + w*c, w)
since the 3D interpretation of the resulting vector will then be
(x + w*a) / w == x/w + a
(y + w*b) / w == y/w + b
(z + w*c) / w == z/w + c
which will represent the translation that we were after.
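A quick numeric check (a sketch of my own, not part of the original answer): written out, the product of the 4x4 translation matrix with a homogeneous vector is exactly (x + w*a, y + w*b, z + w*c, w), so a point (w = 1) is translated while a direction (w = 0) passes through unchanged:

#include <array>
#include <cstdio>

using Vec4 = std::array<double, 4>;

// Multiplying the translation matrix from the question by (x, y, z, w)
// yields (x + w*a, y + w*b, z + w*c, w) - the 4D shear described above.
Vec4 translate(double a, double b, double c, const Vec4& v) {
    return { v[0] + v[3]*a, v[1] + v[3]*b, v[2] + v[3]*c, v[3] };
}

int main() {
    Vec4 point     = {1, 2, 3, 1}; // w = 1: a point
    Vec4 direction = {1, 2, 3, 0}; // w = 0: a direction

    Vec4 p = translate(10, 20, 30, point);     // -> (11, 22, 33, 1)
    Vec4 d = translate(10, 20, 30, direction); // -> (1, 2, 3, 0): unchanged

    std::printf("point:     (%g, %g, %g, %g)\n", p[0], p[1], p[2], p[3]);
    std::printf("direction: (%g, %g, %g, %g)\n", d[0], d[1], d[2], d[3]);
}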
So to try to make this even more clear:
What you wrote in your question:
p(1, 2, 3, 1) + v(a, b, c, 0) = p(1 + a, 2 + b, 3 + c, 1)
is explicitly not what we want to do. What you describe is an affine translation with respect to the 4D vector space.
But what we actually want is a translation in the 3D cartesian coordinates, so
(1, 2, 3) + (a, b, c) = (1 + a, 2 + b, 3 + c)
Applying your formula would actually mean doing a translation in the homogeneous space, which would have the effect of a translation scaled by the w coordinate, while the formula I gave will always translate the point by (a, b, c), no matter what w we choose for the point.
This is of course not true if we choose w=0. Then we will get no change at all, which is also correct, because a translation never changes directions - your formula would change the direction. Your formula is correct only for w=1, which is only a special case. But the key point here is that we are not doing a vector addition after all, but a matrix * vector multiplication. And homogeneous coordinates just allow us (among other, more powerful things) to represent a translation via matrix multiplication. But this does not mean that we can just interpret the last column as a translation vector as if we were doing vector addition.
Simple Answer
The reason is the way matrix multiplication works. If you multiply a matrix by a vector, then the w-component of the result is the inner product of the fourth row of the matrix with the vector. After applying the transformation, a point should still be a point and a direction should still be a direction. If you set that fourth row to all zeros, the resulting w will always be 0, and thus every position (w=1) would be turned into a direction (w=0).
More detailed answer
The definition of an affine transformation is:
x' = A * x + t,
where A is a linear map and t a translation vector. Traditionally, linear maps are written by mathematicians in matrix form. Note that t is here, like x, a 3-dimensional vector. It would be cumbersome (and less general, thinking of projective mappings) if we always had to handle the linear mapping matrix and the translation vector separately. This can be solved by introducing an additional dimension to the mapping, the so-called homogeneous coordinate, which allows us to store the linear mapping as well as the translation vector in a combined 4x4 matrix. This is called an augmented matrix, and by definition,
[ x' ]   [ A  t ]   [ x ]
[ 1  ] = [ 0  1 ] * [ 1 ]
It should also be noted that affine transformations can now be combined very easily by just multiplying their augmented matrices, which would be hard to do in matrix-plus-vector notation.
One should also note, that the bottom-right 1 is not part of the translation vector, which is still 3-dimensional, but of the matrix augmentation.
You might also want to read the section about "Augmented matrix" here: https://en.wikipedia.org/wiki/Affine_transformation#Augmented_matrix

How to fit a plane to a 3D point cloud?

I want to fit a plane to a 3D point cloud. I use a RANSAC approach, where I sample several points from the point cloud, calculate the plane, and store the plane with the smallest error. The error is the distance between the points and the plane. I want to do this in C++, using Eigen.
So far, I sample points from the point cloud and center the data. Now, I need to fit the plane to the sampled points. I know I need to solve Mx = 0, but how do I do this? So far I have M (my samples); I want to find x (the plane), and this fit needs to be as close to 0 as possible.
I have no idea where to continue from here. All I have are my sampled points and I need more data.
From your question I assume that you are familiar with the RANSAC algorithm, so I will spare you the lengthy introduction.
In a first step, you sample three random points. You can use a random number generator for that, but picking them not truly at random usually gives better results. To those points, you can simply fit a plane using Hyperplane::Through.
In the second step, you repeatedly cross out some points with a large Hyperplane::absDistance and perform a least-squares fit on the remaining ones. It may look like this:
Vector3f mu = mean(points);
Matrix3f covar = covariance(points, mu);
// The normal of the least-squares plane is the eigenvector of the
// covariance matrix with the smallest eigenvalue, i.e. the last
// column of U in the covariance matrix's SVD:
JacobiSVD<Matrix3f> svd(covar, ComputeFullU);
Vector3f normal = svd.matrixU().col(2);
Hyperplane<float, 3> result(normal, mu);
Unfortunately, the functions mean and covariance are not built-in, but they are rather straightforward to code.
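For completeness, here is one straightforward way to write those two helpers, assuming the cloud is stored as a std::vector<Eigen::Vector3f> (a sketch; the names match the snippet above):

#include <Eigen/Dense>
#include <vector>

Eigen::Vector3f mean(const std::vector<Eigen::Vector3f>& points) {
    Eigen::Vector3f m = Eigen::Vector3f::Zero();
    for (const auto& p : points) m += p;
    return m / static_cast<float>(points.size());
}

Eigen::Matrix3f covariance(const std::vector<Eigen::Vector3f>& points,
                           const Eigen::Vector3f& mu) {
    Eigen::Matrix3f c = Eigen::Matrix3f::Zero();
    for (const auto& p : points) {
        Eigen::Vector3f d = p - mu;
        c += d * d.transpose(); // outer product of the centered point
    }
    return c / static_cast<float>(points.size());
}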
Recall that the equation for a plane passing through origin is Ax + By + Cz = 0, where (x, y, z) can be any point on the plane and (A, B, C) is the normal vector perpendicular to this plane.
The equation for a general plane (that may or may not pass through origin) is Ax + By + Cz + D = 0, where the additional coefficient D represents how far the plane is away from the origin, along the direction of the normal vector of the plane. [Note that in this equation (A, B, C) forms a unit normal vector.]
Now, we can apply a trick here and fit the plane using only the provided point coordinates. Divide both sides by D and move the constant term to the right-hand side. This leads to (A/D)x + (B/D)y + (C/D)z = -1. [Note that in this equation (A/D, B/D, C/D) forms a normal vector with length 1/D.]
We can set up a system of linear equations accordingly, and then solve it by an Eigen solver as follows.
#include <Eigen/Dense>
#include <cmath>

// Example for 5 points
Eigen::Matrix<double, 5, 3> matA; // rows: the 5 points; columns: x, y, z
// ... fill each row of matA with one point's coordinates here ...
Eigen::Matrix<double, 5, 1> matB = -1 * Eigen::Matrix<double, 5, 1>::Ones();

// Find the plane normal
Eigen::Vector3d normal = matA.colPivHouseholderQr().solve(matB);

// Check if the fitting is healthy
double D = 1 / normal.norm();
normal.normalize(); // normal is a unit vector from now on

bool planeValid = true;
for (int i = 0; i < 5; ++i) { // ideally Ax + By + Cz + D = 0 for every point
    if (std::fabs(normal(0)*matA(i, 0) + normal(1)*matA(i, 1) + normal(2)*matA(i, 2) + D) > 0.2) {
        planeValid = false; // 0.2 is an experimental threshold; can be tuned
        break;
    }
}
This method is equivalent to the typical SVD-based method, but much faster. It is suitable when the points are known to be roughly planar. However, the SVD-based method is more numerically stable (when the plane is very far away from the origin) and more robust to outliers.

Why do we need perspective division?

I know perspective division is done by dividing x,y, and z by w, to get normalized device coordinates. But I am not able to understand the purpose of doing that. Also, does it have anything to do with clipping?
Some details that complement the general answers:
The idea is to project a point (x,y,z) on screen to have (xs,ys,d).
The next figure shows this for the y coordinate.
We know from school that
tan(alpha) = ys / d = y / z
This means that the projection is computed as
ys = d*y/z = y / w
w = z / d
This is enough to apply a projection.
However in OpenGL, you want (xs,ys,zs) to be normalized device coordinates in [-1,1] and yes this has something to do with clipping.
The extrema values for (xs,ys,zs) represent the unit cube and everything outside it will be clipped.
So a projection matrix usually takes into consideration the clipping limits (Frustum) to make a single transformation that, with the perspective division, simultaneously apply a projection and transform the projected coordinates along with the z to normalized device coordinates.
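As a minimal numeric illustration (my own sketch; it reuses the near = 1, far = 3 values from the clipping example earlier on this page), the z row of a glFrustum-style matrix plus the perspective division maps eye-space depth into [-1, 1], and does so non-linearly:

#include <cstdio>

int main() {
    // glFrustum-style depth mapping: z_clip = A*z + B, w_clip = -z,
    // with A = -(F+N)/(F-N) and B = -2*F*N/(F-N).
    const double N = 1.0, F = 3.0;
    const double A = -(F + N) / (F - N);   // -2 for this frustum
    const double B = -2 * F * N / (F - N); // -3 for this frustum

    for (double z : {-N, -2.0, -F}) { // eye-space depths
        double z_clip = A * z + B;
        double w_clip = -z;
        // near maps to -1, far to +1, and the midpoint lands at 0.5 (not 0):
        std::printf("z_eye = %g -> z_ndc = %g\n", z, z_clip / w_clip);
    }
}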
I mean why do we need that?
In layman's terms: to make perspective distortion work. In a perspective projection matrix, the Z coordinate gets "mixed" into the W output component. So the smaller the value of the Z coordinate, i.e. the closer to the origin, the more things get scaled up, i.e. bigger on screen.
To really distill it to the basic concept, and why the op is division (instead of e.g. square root or some such), consider that an object twice as far should appear with dimensions exactly one half as large. Obtain 1/2 from 2 by... division.
There are many geometric ways to arrive at the same conclusion. A diagram serves as visual proof for this, really.
Dividing x, y, z by w is a "trick" you do with "homogeneous coordinates": it converts an R⁴ vector back to R³ by dividing by the 4th component (the w component, as you said). The process is called dehomogenizing.
Why use homogeneous coordinates? That topic is a little more involved; I'll try to explain it and hope I do it justice.
However I will use the x1, x2, x3, x4 as the components of a vector instead of x, y, z, w:
Consider a 3x3 Matrix M and column vectors x, a, b, c of R³. x=(x1, x2, x3) and x1,x2,x3 being scalars or components of x.
With the 3x3 matrix you can do all the linear transformations on a vector x that you could do with the linear combination:
x' = x1*a + x2*b + x3*c (1).
(x' is the transformed vector that holds the result of transforming x).
Khan Academy's Linear Algebra course has a section explaining the fact that every linear transformation can be written as a matrix product with a vector.
You can try this out for example by putting the column vectors a, b, c in the columns of the Matrix M = [ a b c ].
So with the matrix product you essentially get the upper linear combination:
x' = M * x = [a b c] * x = a*x1 + b*x2 + c*x3 (2).
However this operation only accounts for rotation, scaling and shearing transformations. The origin (0, 0, 0) will always stay at (0, 0, 0).
For this you need another kind of transformation named "translation" (moving a vector or adding a vector to the vector).
Consider the translation column vector t = (t1, t2, t3) and the linear combination
x' = x1*a + x2*b + x3*c + t (3).
With this linear combination you can translate, rotate, scale and shear a vector. As you can see this Linear Combination does actually move the origin vector (0, 0, 0) to (0+t1, 0+t2, 0+t3).
However you can't put this translation into a 3x3 Matrix.
So what Graphics Programmers or Mathematicians came up with is adding another dimension to the Matrix and Vectors like this:
M is a 4x4 matrix and x~ a vector in R⁴ with x~ = (x1, x2, x3, x4). a, b, c, t are also column vectors of R⁴ (the last components of a, b, c being 0 and the last component of t being 1; I keep the names the same to later show the similarity between the homogeneous linear combination and (3)). x~ is the homogeneous coordinate of x.
Now watch what happens if we take a vector x of R³ and put it into x~ of R⁴.
This vector will be in homogeneous coordinates in R⁴ x~=(x1, x2, x3, 1). The last component simply being 1 if it is a point and 0 if it's simply a direction (which couldn't be translated anyway).
So you have the linear combination:
x~' = M * x~ = [a b c t] * x~ = x1*a + x2*b + x3*c + x4*t (4).
(x~' is the result vector when transforming the homogeneous vector x~)
Since we took a vector from R³ and put it into R⁴, the x4 component is 1, and we have:
x~' = x1*a + x2*b + x3*c + 1*t
<=> x~' = x1*a + x2*b + x3*c + t (5).
which is exactly the upper linear transformation (3) with the translation t. This is called an affine transformation (linear transf. + translation).
So with a 3x3 matrix and a vector of R³ you couldn't do translations. However, by adding another dimension, with a vector in R⁴ and a matrix in R^(4x4), you actually can.
However, when you want to return to R³, you have to divide the first three components by the last one, the x4 component (the w component in your naming). This is called "dehomogenizing". So let x be the original coordinate in R³, x~ the homogeneous vector of x in R⁴, and x' the R³ vector recovered from x~:
x' = (x1/x4, x2/x4, x3/x4) (6).
Then x' is the dehomogenized vector of the vector x~.
Coming back to perspective division:
(I will leave out the derivation, because many here have already explained the divide by z. It comes from similar right triangles: with a given focal length f, a point at depth z with y coordinate y lands at y' = f*y/z. Also, since you stated that you already know why this is done (I hope I didn't misread that), I simply leave a pointer to a YT video of the course lecture CMU 15-462/662 here; I find it very well explained there.)
When dehomogenizing, the division by the w component is a pretty handy property. When you apply a homogeneous 4x4 perspective matrix to a vector, you simply put the z component into the w component and let the dehomogenizing process (as in (6)) perform the perspective divide. You can set up the w component in such a way that the division by w divides by z and also maps the depth values into a fixed range (basically, you map the z-near to z-far range into a range where floating-point values are precise).
This is also described by Ravi Ramamoorthi in his course CSE167, where he explains how to set up the perspective projection matrix.
I hope this helped you understand the rationale of putting z into the w component. Sorry for the horrible formatting and lengthy text; I hope it helped more than it confused.
Best of luck!
Actually, via standard notational convention from a 4x4 perspective matrix with sightline along a 'z' direction, 'w' differs by 1 from the distance ratio. Also that ratio, though interpreted correctly, is normally expressed as -z/d where 'z' is negative (therefore producing the correct ratio) because, again, in common notational convention, the camera is looking in the negative 'z' direction.
The reason for the offset by 1 needs to be explained. Many references put the origin at the image plane rather than the center of projection. With that convention (again with the camera looking along the negative 'z' direction) the distance labeled 'z' in the similar triangles diagram is thereby replaced by (d-z). Then substituting that for 'z' the expression for 'w' becomes, instead of 'z/d', (d-z)/d = [1-z/d]. To some these conventions may seem unorthodox but they are quite popular among analysts.

How to project a point onto a plane in 3D?

I have a 3D point (point_x, point_y, point_z) and I want to project it onto a 2D plane in 3D space, where the plane is defined by a point (orig_x, orig_y, orig_z) and a unit normal vector (normal_dx, normal_dy, normal_dz).
How should I handle this?
Make a vector from your orig point to the point of interest:
v = point-orig (in each dimension);
Take the dot product of that vector with the unit normal vector n:
dist = vx*nx + vy*ny + vz*nz; (dist is the scalar distance from the point to the plane along the normal)
Multiply the unit normal vector by the distance, and subtract that vector from your point.
projected_point = point - dist*normal;
Edit with picture:
I've modified your picture a bit. Red is v. dist is the length of blue and green, equal to v dot normal. Blue is normal*dist. Green is the same vector as blue, they're just plotted in different places. To find planar_xyz, start from point and subtract the green vector.
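The steps above translate almost one-to-one into code. A minimal C++ sketch (the struct and function names are mine; the test values come from the next answer's example):

#include <cstdio>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Project 'point' onto the plane through 'orig' with unit normal 'n'.
Vec3 projectOntoPlane(Vec3 point, Vec3 orig, Vec3 n) {
    Vec3 v = point - orig;   // vector from orig to the point
    double dist = dot(v, n); // signed distance to the plane along the normal
    return { point.x - dist * n.x, // step back by 'dist' along the normal
             point.y - dist * n.y,
             point.z - dist * n.z };
}

int main() {
    // Plane y = 10 (normal (0,1,0) through (0,10,0)), point (10,20,-5):
    Vec3 p = projectOntoPlane({10, 20, -5}, {0, 10, 0}, {0, 1, 0});
    std::printf("(%g, %g, %g)\n", p.x, p.y, p.z); // prints (10, 10, -5)
}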
This is really easy: all you have to do is find the perpendicular (abbreviated here |_) distance from the point P to the plane, then translate P back by that distance in the direction of the plane normal. The translated P then sits in the plane.
Taking an easy example (that we can verify by inspection):
Set n = (0,1,0), and P = (10,20,-5).
The projected point should be (10,10,-5). You can see by inspection that P is 10 units from the plane along the perpendicular, and once moved into the plane it must have y = 10.
So how do we find this analytically?
The plane equation is Ax+By+Cz+d=0. What this equation means is "in order for a point (x,y,z) to be in the plane, it must satisfy Ax+By+Cz+d=0".
What is the Ax+By+Cz+d=0 equation for the plane drawn above?
The plane has normal n=(0,1,0). The d is found simply by using a test point already in the plane:
(0)x + (1)y + (0)z + d = 0
The point (0,10,0) is in the plane. Plugging it in above, we find d = -10. The plane equation is then 0x + 1y + 0z - 10 = 0 (which simplifies to y = 10).
A nice interpretation of d is it speaks of the perpendicular distance you would need to translate the plane along its normal to have the plane pass through the origin.
Anyway, once we have d, we can find the |_ distance of any point p = (x,y,z) to the plane as dist = Ax + By + Cz + d, i.e. dist = n ⋅ p + d.
There are 3 possible classes of results for the |_ distance to the plane:
dist = 0: ON PLANE EXACTLY (almost never happens with floating point inaccuracy issues)
dist > 0: IN FRONT of the plane (on the normal's side)
dist < 0: BEHIND the plane (on the side opposite the normal)
Anyway, the projected point is then Pproj = P - dist*n,
which you can verify as correct by inspection in the diagram above.
This answer is an addition to two existing answers.
I aim to show how the explanations by @tmpearce and @bobobobo boil down to the same thing, while at the same time providing quick answers to those who are merely interested in copying the equation best suited to their situation.
Method for planes defined by normal n and point o
This method was explained in the answer by @tmpearce.
Given a point-normal definition of a plane with normal n and point o on the plane, a point p', being the point on the plane closest to the given point p, can be found by:
p' = p - (n ⋅ (p - o)) × n
Method for planes defined by normal n and scalar d
This method was explained in the answer by @bobobobo.
Given a plane defined by normal n and scalar d, a point p', being the point on the plane closest to the given point p, can be found by:
p' = p - (n ⋅ p + d) × n
If instead you've got a point-normal definition of a plane (the plane is defined by normal n and point o on the plane), @bobobobo suggests finding d:
d = -n ⋅ o
and insert this into equation 2. This yields:
p' = p - (n ⋅ p - n ⋅ o) × n
A note about the difference
Take a closer look at equations 1 and 4. By comparing them, you'll see that equation 1 uses n ⋅ (p - o) where equation 4 uses n ⋅ p - n ⋅ o. That's actually two ways of writing down the same thing:
n ⋅ (p - o) = n ⋅ p - n ⋅ o = n ⋅ p + d
One may thus choose to interpret the scalar d as if it were a 'pre-calculation'. I'll explain: if a plane's n and o are known, but o is only used to calculate n ⋅ (p - o),
we may as well define the plane by n and d and calculate n ⋅ p + d instead, because we've just seen that that's the same thing.
Additionally for programming using d has two advantages:
Finding p' now is a simpler calculation, especially for computers. Compare:
using n and o: 3 subtractions + 3 multiplications + 2 additions
using n and d: 0 subtractions + 3 multiplications + 3 additions.
Using d limits the definition of a plane to only 4 real numbers (3 for n + 1 for d), instead of 6 (3 for n + 3 for o). This saves ⅓ memory.
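In code, the 'pre-calculation' view looks like this (a sketch with hypothetical names; it implements equation 2, computing d once via d = -n ⋅ o):

#include <cstdio>

struct Vec3 { double x, y, z; };
double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// A plane stored as a unit normal n and a scalar d: 4 numbers instead of 6.
struct Plane { Vec3 n; double d; };

// Pre-calculate d from a point o on the plane: d = -n . o
Plane makePlane(Vec3 n, Vec3 o) { return { n, -dot(n, o) }; }

// Equation 2: p' = p - (n . p + d) * n
Vec3 project(const Plane& pl, Vec3 p) {
    double dist = dot(pl.n, p) + pl.d;
    return { p.x - dist * pl.n.x, p.y - dist * pl.n.y, p.z - dist * pl.n.z };
}

int main() {
    Plane pl = makePlane({0, 1, 0}, {0, 10, 0}); // the plane y = 10
    Vec3 p = project(pl, {10, 20, -5});
    std::printf("(%g, %g, %g)\n", p.x, p.y, p.z); // (10, 10, -5)
}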
It's not sufficient to provide only the plane origin and the normal vector. This does define the 3D plane; however, it does not define a coordinate system on the plane.
Think that you may rotate your plane around the normal vector with regard to its origin (i.e. put the normal vector at the origin and "rotate").
You may however find the distance of the projected point to the origin (which is obviously invariant to rotation).
Subtract the origin from the 3D point, then take the cross product with the normal direction. If your normal vector is normalized, the resulting vector's length equals the needed value.
EDIT
A complete answer would need an extra parameter. Say you also supply the vector that denotes the x-axis of your plane.
So we have vectors n and x. Assume they're normalized.
The origin is denoted by O, your 3D point is p.
Then the 2D plane coordinates of your point are given by:
u = (p - O) dot x
v = (p - O) dot (n cross x)
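A sketch of that in C++ (my own helper names; x_axis is the extra parameter discussed above, assumed to be unit length and perpendicular to n):

#include <cstdio>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// 2D coordinates of p within the plane (origin O, unit normal n,
// unit in-plane axis x_axis).
void planeCoords(Vec3 p, Vec3 O, Vec3 n, Vec3 x_axis, double& u, double& v) {
    Vec3 d = p - O;
    u = dot(d, x_axis);           // along the chosen x-axis
    v = dot(d, cross(n, x_axis)); // along the implied y-axis
}

int main() {
    double u, v;
    // Plane z = 5 with normal (0,0,1); the plane's x-axis is world x:
    planeCoords({3, 4, 7}, {0, 0, 5}, {0, 0, 1}, {1, 0, 0}, u, v);
    std::printf("u = %g, v = %g\n", u, v); // u = 3, v = 4
}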
Let V = (point_x, point_y, point_z) - (orig_x, orig_y, orig_z)
and N = (normal_dx, normal_dy, normal_dz).
Let d = V.dotproduct(N);
Projected point P = (point_x, point_y, point_z) - d*N
I think you should slightly change the way you describe the plane. Indeed, the best way to describe the plane is via a vector n and a scalar c
(x, n) = c
The (absolute value of the) constant c is the distance of the plane from the origin, and is equal to (P, n), where P is any point on the plane.
So, let P be your orig point and A' be the projection of a new point A onto the plane. What you need to do is find a such that A' = A - a*n satisfies the equation of the plane, that is
(A - a*n, n) = (P, n)
Solving for a, you find that
a = (A, n) - (P, n) = (A, n) - c
which gives
A' = A - [(A, n) - c]n
Using your names, this reads
c = orig_x*normal_dx + orig_y*normal_dy + orig_z*normal_dz;
a = point_x*normal_dx + point_y*normal_dy + point_z*normal_dz - c;
planar_x = point_x - a*normal_dx;
planar_y = point_y - a*normal_dy;
planar_z = point_z - a*normal_dz;
Note: your code would save one scalar product if instead of the orig point P you store c=(P, n), which means basically 25% less flops for each projection (in case this routine is used many times in your code).
Let r be the point to project, d the direction along which to project, and p the result of the projection. Let c be any point on the plane and let n be a normal to the plane (not necessarily normalised). Write p = r + m*d for some scalar m, which will be seen to be indeterminate if there is no solution.
Since (p - c).n = 0 (all points on the plane satisfy this restriction), one has (r - c).n + m*(d.n) = 0, and so m = [(c - r).n] / [d.n], where (.) denotes the dot product. But if d.n = 0 there is no solution: for example, if d and n are perpendicular to one another, no solution is available.