OpenGL custom rendering pipeline: Perspective matrix

I am attempting to work in LWJGL to display a simple quad using my own matrices. I've been looking around for a while and have found a few perspective matrix implementations, these two in particular:
[cot(fov/2)/a 0 0 0]
[0 cot(fov/2) 0 0]
[0 0 -f/(f-n) -1]
[0 0 -f*n/(f-n) 0]
and:
[cot(fov/2)/a 0 0 0]
[0 cot(fov/2) 0 0]
[0 0 -(f+n)/(f-n) -1]
[0 0 -(2*f*n)/(f-n) 0]
Both of these provide the same effect, as expected (I got them from here and here, respectively). The issue is in my understanding of how multiplying this by the modelview matrix, then by a vertex, then dividing each x, y, and z value by its w value gives a screen coordinate. More specifically, if I multiply either of these by the modelview matrix and then by a vertex (10, 10, 0, 1), it gives w=0. That in itself is a big smack in the face. I conclude that either the matrices are wrong, or I am missing something completely. In my actual test program, the vertices don't even end up on screen, even though with the camera at (0, 0, 0) and no rotation they should. I have even tried many different z values, positive and negative, to see if it was just a clipping plane issue. Am I missing something here?
EDIT: After a lot of checking over, I've narrowed down the problem I am facing. The biggest issue is that the z-axis does not appear to be remapped to the range I specify (n to f). Any object just zooms in or out a little bit when I translate it along the z-axis then pops out of existence as it moves past the range [-1, 1]. I think this is also making me more confused. I set my far plane to 100 and my near to 0.1, and it behaves like anything but.

Both of these provide the same effect, as expected
While the second projection matrix form is very standard, the first one gives a different effect. If you have z==1 and w==0, the projection will be:
Matrix 1: (-f/(f-n)) / (-f*n/(f-n)) = f/(f*n) = 1/n
Matrix 2: (-(f+n)/(f-n)) / (-(2*f*n)/(f-n)) = (f+n)/(2*f*n)
The result is clearly different; you should always use the second form. For example, with n = 1 and f = 100, the first form puts such a point at depth 1/n = 1, while the second puts it at (f+n)/(2*f*n) = 101/200 = 0.505.
if I multiply either of these by the modelview matrix then by a vertex (10, 10, 0, 1), it gives a w=0. That in itself is a big smack in the face
For a focal length d the projection is computed as (ignoring aspect ratio):
x' = d*x/z = x/w
y' = d*y/z = y/w
where
w = z/d
If you have z==0, this means you are trying to project a point that is already in the eye; only points beyond d are visible. In practice this point will be clipped anyway, because z is not within the range n (near) to f (far) (n is expected to be a positive constant).
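To make the divide concrete, here is a minimal hedged sketch (all names are mine, not from the question). It builds the second matrix form in OpenGL's column-major layout, transforms an eye-space point, and performs the perspective divide. It shows directly that the fourth row copies -z_eye into w, so a vertex with eye-space z == 0 ends up with w == 0 and cannot be divided:

package main

import (
	"fmt"
	"math"
)

// perspective fills a column-major 4x4 matrix with the second form above.
func perspective(fovY, a, n, f float64) [16]float64 {
	var m [16]float64
	s := 1 / math.Tan(fovY/2)
	m[0] = s / a              // x scale
	m[5] = s                  // y scale
	m[10] = (f + n) / (n - f) // remaps eye-space z into [-1, 1]
	m[14] = (2 * f * n) / (n - f)
	m[11] = -1 // copies -z_eye into w_clip
	return m
}

// transform multiplies the column-major matrix by a column vector.
func transform(m [16]float64, x, y, z, w float64) (cx, cy, cz, cw float64) {
	cx = m[0]*x + m[4]*y + m[8]*z + m[12]*w
	cy = m[1]*x + m[5]*y + m[9]*z + m[13]*w
	cz = m[2]*x + m[6]*y + m[10]*z + m[14]*w
	cw = m[3]*x + m[7]*y + m[11]*z + m[15]*w
	return
}

func main() {
	m := perspective(math.Pi/4, 1, 0.1, 100)
	// The question's vertex: eye-space z == 0 sits in the camera plane.
	_, _, _, cw := transform(m, 10, 10, 0, 1)
	fmt.Println("w for z=0:", cw) // prints 0 -- the divide is undefined
	// A point in front of the camera (negative z in eye space) works:
	cx, cy, cz, cw2 := transform(m, 10, 10, -50, 1)
	fmt.Println("NDC:", cx/cw2, cy/cw2, cz/cw2)
}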

Related

OpenGL sutherland-hodgman polygon clipping algorithm in homogeneous coordinates (4D, CCS)

I have two questions. (I marked 1, 2 below)
In OpenGL, clipping is done with the Sutherland-Hodgman algorithm.
However, I wonder how the Sutherland-Hodgman algorithm works in a homogeneous system (4D).
I made up a situation:
In VCS, there is a line, R= (0, 3, -2, 1), S = (0, 0, 1, 1) (End points of the line)
And a frustum is right = 1, left = -1, near = 1, far = 3, top = 4, bottom = -4
Therefore, the projection matrix P is
1 0 0 0
0 1/4 0 0
0 0 -2 -3
0 0 -1 0
If we transform each endpoint of the line by P, we get
R' = (0, 3/4, 1, 2), S' = (0, 0, -5, -1)
I know that perspective division should not be done now, because if we do perspective division, the clipping result is not correct.
Here I am curious. What makes the clipping correct even though we have not done the perspective division yet? What mathematical properties are at work here?
How to calculate the clipping result in above situation?
(The fact that two intersections occur in the w-y coordinate system confuses me. I thought the result would be one line, not two separate parts.)
I'm not quite sure whether you understood the Sutherland-Hodgman algorithm correctly (or at least I didn't get your example). Thus I will prove here that it doesn't make any difference whether clipping happens before or after the perspective divide. The proof is only shown for one plane (clipping has to be done against all 6 planes), since applying multiple such clipping operations after each other makes no difference here.
Let's assume we have two points (as you described) R' and S' in clip space, and we have a clipping plane P given in Hessian normal form [n, p] (if we take the left plane, this is [1,0,0,1]).
If we were calculating in pure 3D space (R^3), then checking whether a line crosses this plane would be done by calculating the signed distance of both points to the plane and checking whether the signs differ. The signed distance for a point X = [x/w, y/w, z/w] is given by
D = dot(n, X) + p
Let's write down the actual equation we have (including the perspective divide):
d = n_x * x/w + n_y * y/w + n_z * z/w + p
In order to find the exact intersection point, we would, again in R^3 space, calculate for both points (A = R'/R'w, B = S'/S'w) the distance to the plane (da, db) and perform a linear interpolation (I will only write the equations for the x-coordinate here, since y and z work similarly):
x = A_x * (1 - da/(da - db)) + B_x * (da/(da-db))
x = R'x/R'w * (1 - da/(da - db)) + S'x/S'w * (da/(da-db))
And w = 1 (since we interpolate between two points both having w = 1)
Now we already know from the previous discussion that clipping has to happen before the perspective divide, thus we have to adapt this equation. This means that, for each point, the clipping cube has a different scaling w. Let's see what happens when we try to perform the same operations in P^3 (before the perspective divide):
First, we "revert" the perspective divide to get to X=[x,y,z,w] for which the distance to the plane is given by
d = n_x * x/w + n_y * y/w + n_z * z/w + p
d = (n_x * x + n_y * y + n_z * z) / w + p
d * w = n_x * x + n_y * y + n_z * z + p * w
d * w = dot([n, p], [x,y,z,w])
d * w = dot(P, X)
Since we are only interested in the sign of the whole calculation, which our operations have not changed, we can compare the d*w values and get the same inside/outside result as in R^3.
For the two points R' and S', the calculated distances in P^3 are dr = da * R'w and ds = db * S'w. When we now use the same interpolation equation as above but for R' and S' we get for x:
x' = R'x * (1 - (da * R'w)/(da * R'w - db * S'w)) + S'x * (da * R'w)/(da * R'w - db * S'w)
At first glance this looks rather different from the result we got in R^3, but since we are still in P^3 (hence x'), we still have to do the perspective divide on the result (this is allowed here, since the interpolated point will always be on the border of the view frustum, and thus dividing by w will not introduce any problems). The interpolated w component is given as:
w' = R'w * (1 - (da * R'w)/(da * R'w - db * S'w)) + S'w * (da * R'w)/(da * R'w - db * S'w)
And when calculating x/w we get
x = x' / w';
x = R'x/R'w * (1 - da/(da - db)) + S'x/S'w * (da/(da-db))
which is exactly the same result as when calculating everything in R^3.
Conclusion: The interpolation gives the same result, no matter if we perform the perspective divide first and interpolation afterwards or interpolating first and dividing then. But with the second variant we avoid the problem with points flipping from behind the viewer to the front since we are only dividing points that are guaranteed to be inside (or on the border) of the viewing frustum.
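As a hedged illustration of this conclusion, here is a small Go sketch (the type and function names are mine): it clips a segment against a single plane in clip space, using d*w = dot([n, p], X) for the sign test and interpolating before any divide. Fed the question's R' = (0, 3/4, 1, 2) and S' = (0, 0, -5, -1) with the near plane [0, 0, 1, 1] (inside means z + w >= 0), it returns the intersection (0, 0.5, -1, 1), which lies exactly on the near plane:

package main

import "fmt"

type Vec4 struct{ X, Y, Z, W float64 }

// dist returns dot([n, p], X): the signed plane distance scaled by w.
func dist(plane [4]float64, v Vec4) float64 {
	return plane[0]*v.X + plane[1]*v.Y + plane[2]*v.Z + plane[3]*v.W
}

// lerp interpolates all four components, w included, staying in P^3.
func lerp(a, b Vec4, t float64) Vec4 {
	return Vec4{a.X + (b.X-a.X)*t, a.Y + (b.Y-a.Y)*t,
		a.Z + (b.Z-a.Z)*t, a.W + (b.W-a.W)*t}
}

// clipSegment keeps the part of segment rs on the inside of the plane.
func clipSegment(plane [4]float64, r, s Vec4) (Vec4, Vec4, bool) {
	dr, ds := dist(plane, r), dist(plane, s)
	if dr >= 0 && ds >= 0 {
		return r, s, true // fully inside
	}
	if dr < 0 && ds < 0 {
		return r, s, false // fully outside
	}
	i := lerp(r, s, dr/(dr-ds)) // intersection point, still homogeneous
	if dr < 0 {
		return i, s, true
	}
	return r, i, true
}

func main() {
	rp := Vec4{0, 0.75, 1, 2}      // R' from the question
	sp := Vec4{0, 0, -5, -1}       // S' from the question
	near := [4]float64{0, 0, 1, 1} // near plane: z + w >= 0 is inside
	a, b, ok := clipSegment(near, rp, sp)
	fmt.Println(a, b, ok) // {0 0.75 1 2} {0 0.5 -1 1} true
}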
You speak of polygon clipping in a homogeneous system (4D), but from your question I assume that you actually mean homogeneous coordinates, which makes a lot more sense. (There are many possible homogeneous systems.)
Ok, so you want to use "4D" coordinates, which are really "3D coordinates and a w term". In projection transformations, the w term is the projective term that partially relates the screen-space coordinate to the original world-space position. Assuming that you are NOT interested in projective-space clipping, this term is not relevant.
I'm assuming this because the clipping box you describe is axis-aligned on planes in 3D. Even if it was rotated or scaled in 3D space, each of the planes would still be a 3D plane, the 4th coordinate always being '1'.
So how to clip:
clip line segment L against each of the planes of the clipping box, i.e. 6 clipping planes in total (you describe the normals of each clipping plane aptly), and see if any intersection point v is shared by the line and the tested plane P so that
v lies on the line segment (i.e. a t between 0 and 1)
v lies within the bounds of the plane P (i.e. the coordinate should not lie beyond any of the adjacent planes. Since you are using axis-aligned clipping planes, this is easy to check.)
Any of these intersections between a (3D + w) line and one of the 3D planes occurs in 3D, and the intersection points have to be 3D coordinates. You can extend each of these coordinates with a 4th w coordinate into a "4D" coordinate so that you can further transform them using 4x4 matrices for view and projection processing.

Get camera matrix from OpenGL

I render a 3D mesh model using OpenGL with perspective camera – gluPerspective(fov, aspect, near, far).
Then I use rendered image in a computer vision algorithm.
At some point that algorithm requires camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate camera position: rotation matrix R and translation vector t. I can estimate R and t by using any algorithm which solves Perspective-n-Point problem.
I construct K from the OpenGL projection matrix (see how here)
K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]
If I want to project a model point 'by hand' I can compute:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]
Anyway, I pass this camera matrix to a PnP algorithm and it works just fine.
But then I had to change perspective projection to orthographic one.
As far as I understand when using orthographic projection the camera matrix becomes:
K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]
So I changed gluPerspective to glOrtho. Following the same procedure, I constructed K from the OpenGL projection matrix, and it turned out that fX and fY are not ones but 0.0037371. Is this a scaled orthographic projection or what?
Moreover, in order to project model vertices 'by hand' I managed to do the following:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2
Which is not what I expected (adding width and height divided by 2 seems strange...). I tried to pass this camera matrix to the POSIT algorithm to estimate R and t, and it doesn't converge. :(
So here are my questions:
How to get orthographic camera matrix from OpenGL?
If the way I did it is correct, is it a true orthographic camera matrix? And why doesn't POSIT work?
Orthographic projection will not use the depth to scale down farther points. It will, however, scale the points to fit inside the NDC, which means it will scale the values to fit inside the range [-1,1].
The orthographic projection matrix (the glOrtho matrix, as shown on Wikipedia) makes this clear:
[2/(r-l), 0, 0, -(r+l)/(r-l)]
[0, 2/(t-b), 0, -(t+b)/(t-b)]
[0, 0, -2/(f-n), -(f+n)/(f-n)]
[0, 0, 0, 1]
So, it is correct to have numbers other than 1.
As for your way of computing by hand, I believe the problem is that it is not scaling back to screen coordinates, and that is what makes it wrong. As I said, the output of projection matrices will be in the range [-1,1], and if you want to get the pixel coordinates, I believe you should do something similar to this:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1]*width/2 + width / 2
y_pixel = X_proj[2]*height/2 + height / 2
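In code, that mapping back to pixel coordinates would look something like this small sketch (the function name is mine, assuming NDC input in [-1, 1] and a width-by-height viewport):

package main

import "fmt"

// ndcToPixel maps normalized device coordinates in [-1, 1] to pixels.
func ndcToPixel(xNDC, yNDC, width, height float64) (float64, float64) {
	return xNDC*width/2 + width/2, yNDC*height/2 + height/2
}

func main() {
	fmt.Println(ndcToPixel(0, 0, 640, 480))   // center of screen: 320 240
	fmt.Println(ndcToPixel(-1, -1, 640, 480)) // lower-left corner: 0 0
}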
Anyway, I think you'd be better off using modern OpenGL with libraries like GLM. In that case, you would have the exact projection matrices used at hand.

Why does graphics pipeline need mapping to clip coordinates and normalized device coordinates?

On perspective projection, if I use a simple projection matrix like:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 1/near 0
, which just projects onto the image plane. View-space coordinates can easily be recovered by discarding and normalizing, I think.
With orthographic projection, it does not even need the projection matrix.
But the OpenGL graphics pipeline has the above process, even though the perspective projection causes a depth precision error.
Why does it need mapping to clip coordinates and normalized device coordinates?
Added
If I use the above projection matrix,
    [1 0 0   0]
p = [0 1 0   0]
    [0 0 1   0]
    [0 0 1/n 0]
v_eye = (x y z 1)
v_clip = p * v_eye = (x y z z/n)
v_ndc = v_clip / v_clip.w = (nx/z ny/z n 1)
Then, v_ndc can be clipped by discarding values over top, bottom, left, right.
Values over far also can be clipped in the same way before multiplying the projection matrix.
Well, it may look silly, but I think it's easier than before.
ps. I noticed that the depth buffer can't be written in this way. Then, can't it be written before the projection?
Sorry for silly question and gibberish...
In case of orthographic projections, you are right: the perspective divide is not required, but it does not introduce any error, since it is a division by 1. (An orthographic projection matrix always contains [0, 0, 0, 1] in the last row.)
For perspective projection, this is a bit more complex:
Let's look at the simplest perspective projection:
    [1 0 0 0]
P = [0 1 0 0]
    [0 0 1 0]
    [0 0 1 0]
Then a vector v=[x,y,z,1] (in view space) gets projected to
v_p = P * v = [x, y, z, z],
which is in projective space.
Now the perspective divide is needed to get the perspective effect (objects closer to the viewer look larger):
v_ndc = v_p / v_p.w = [x/z, y/z, 1, 1]
I don't see how this could be achieved without the perspective divide.
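To see that effect numerically, a tiny hedged sketch (the values are mine): two view-space points with identical x and y but different depth end up at different NDC positions after the divide, which is the foreshortening that no matrix multiplication alone can produce:

package main

import "fmt"

func main() {
	// P above maps [x, y, z, 1] to [x, y, z, z].
	near := [4]float64{2, 2, 2, 2} // view-space [2, 2, 2, 1] after P
	far := [4]float64{2, 2, 8, 8}  // view-space [2, 2, 8, 1] after P
	fmt.Println(near[0]/near[3], near[1]/near[3]) // 1 1       -- close, appears large
	fmt.Println(far[0]/far[3], far[1]/far[3])     // 0.25 0.25 -- far, appears small
}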
Why does it need mapping to clip coordinates and normalized device coordinates?
The space where the programmer leaves the vertices to the GL to be taken care of is the clip space. It's the 4D homogeneous space where the vertices exist before normalization / perspective division. This division, useful to perform perspective projection, is the mapping needed to transform the vertices from clip space to NDC (3D). Why? Similar triangles.
View Space Point
*
/ |
Proj /- |
Y ^ Plane /-- |
| /-- |
| *-- |y
| /-- | |
| /-- |y' |
| /--- | |
<-----+------------+------------+-------
Z O |
|-----d------| |
|------------z------------|
Perspective projection is where rays from the eye/origin cut through a projection plane, hitting the points present in the space. The point where a ray intersects the plane is the projection of the point it hits. Let's say we want to project point P onto the projection plane, where all points have z = d. The projected location of P, i.e. P', needs to be found. We know that z' will be d (since the projection plane lies there). To find y', we know
y ⁄ z = y' ⁄ z' (similar triangles)
y ⁄ z = y' ⁄ d (z' = d by defn. of proj. plane)
y' = (d * y) ⁄ z
This division by z is called the perspective division. It shows that in perspective projection objects farther away, with larger z, appear smaller, and objects closer, with smaller z, appear larger. For instance, with d = 1, a point with y = 4 projects to y' = 1 at z = 4 but to y' = 2 at z = 2.
Another thing that is convenient to perform in clip space is, obviously, clipping. In 4D, clipping is just checking whether the points lie within a range, as opposed to the costlier division.
In case of orthographic projection, the viewing volume isn't a frustum but a cuboid: parallel rays come from infinity, not from the origin. Hence for a point P = (x, y, z), the Z value is simply dropped, giving P' = (x, y). Thus the perspective division does nothing (divides by 1) in this case.
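A hedged sketch of that range check (the function name is mine): in clip space, "inside" is just -w <= x, y, z <= w, which needs only comparisons and no division:

package main

import "fmt"

// insideClipVolume tests a clip-space point against all six planes at once.
func insideClipVolume(x, y, z, w float64) bool {
	return -w <= x && x <= w &&
		-w <= y && y <= w &&
		-w <= z && z <= w
}

func main() {
	fmt.Println(insideClipVolume(0.5, 0.5, 1, 2)) // true: NDC (0.25, 0.25, 0.5)
	fmt.Println(insideClipVolume(3, 0, 0, 2))     // false: x/w would be 1.5
}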

Ways to "invert Z-axis" in shader-based core-profile OpenGL?

In my hobbyist shader-based (non-FFP) GL (3.2+ core) "engine", everything in world-space and model-space is by design "left-handed" (and to stay that way), so X-axis goes from -1 ("left") to 1 ("right"), Y from -1 ("bottom") to 1 ("top") and Z from -1 ("near") to 1 ("far").
Now, by default in OpenGL the NDC-space works the same but the clip space doesn't, from what I gather, here z extends from 1 ("near") to -1 ("far").
At the same time I want to ideally keep using the "kinda-sorta unofficial quasi-standard" matrix functions for lookat and perspective, currently defined as:
func (me *Mat4) Lookat(eyePos, lookTarget, upVec *Vec3) {
	l := lookTarget.Sub(eyePos)
	l.Normalize()
	s := l.Cross(upVec)
	s.Normalize()
	u := s.Cross(l)
	me[0], me[4], me[8], me[12] = s.X, u.X, -l.X, -eyePos.X
	me[1], me[5], me[9], me[13] = s.Y, u.Y, -l.Y, -eyePos.Y
	me[2], me[6], me[10], me[14] = s.Z, u.Z, -l.Z, -eyePos.Z
	me[3], me[7], me[11], me[15] = 0, 0, 0, 1
}
// a: aspect ratio. n: near-plane. f: far-plane.
func (me *Mat4) Perspective(fovY, a, n, f float64) {
	s := 1 / math.Tan(DegToRad(fovY)/2) // scaling
	me[0], me[4], me[8], me[12] = s/a, 0, 0, 0
	me[1], me[5], me[9], me[13] = 0, s, 0, 0
	me[2], me[6], me[10], me[14] = 0, 0, (f+n)/(n-f), (2*f*n)/(n-f)
	me[3], me[7], me[11], me[15] = 0, 0, -1, 0
}
So, for the lookat-part to have my world-space camera (positive-Z) work with lookat (negative-Z) as per this pseudocode:
// world-space position:
camPos := cam.Pos
// normalized direction-vector, up/right/forward are 1 not -1:
camTarget := cam.Dir
// lookat-target:
camTarget.Add(&camPos)
// invert both Z:
camPos.Z, camTarget.Z = -camPos.Z, -camTarget.Z
// compute lookat-matrix:
cam.mat.Lookat(&camPos, &camTarget, &Vec3{0, 1, 0})
That works well. Moving the camera in all 6 degrees of freedom produces correct on-screen movement and correct new camera world-space coords.
But geometry is still inverted on the Z-axis. When I position two boxes, A at (-2, 1, -2), to appear near-left and B (2, 1, 2) to appear far-right, then A appears far-left and B appears near-right. Z is still inverted here.
Now, these nodes have their own world-space coords and update from those their own model-to-world matrices. I shouldn't invert posZ there as they form a hierarchy of sub-nodes multiplying with their parents transforms and all that. They're still in model or world space, which as per my decree is to remain left-handed.
Their world-to-camera calculation happens on the CPU at my end, not in a vertex shader which just gets a single final (mvp/clip-space) matrix.
When that happens -- multiplication of world-space-object-matrix with clip-space lookat-and-projection matrix -- at that point I need to somehow invert Z.
What's the best way to do this? Or, more generally speaking, what's a common way that works? Do I have to modify the projection to accept left-handed but output-to-GL right-handed? If so, how? And then wouldn't I also have to modify lookat? Is there a smart way to do all this without having to modify the somewhat-standard lookat/projection matrices while also keeping model-transform-matrices in left-handed coords?
In Perspective, changing me[11] from -1 to 1 should invert the z axis the way you're describing. If that alone isn't correct, try negating me[10] as well. Of course, because the z axis is inverted, the directions of your rotations will be affected too. If I recall correctly, rotations around the y axis, and possibly the x axis, will be inverted. If that is the case, you should be able to just negate those rotations to counteract it.
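A hedged sketch of that suggestion in the question's own layout (the function name is mine, not part of the asker's codebase): negating the third column (me[8] through me[11]) of Perspective is equivalent to right-multiplying by diag(1, 1, -1, 1), i.e. flipping the incoming left-handed Z before the usual right-handed projection applies:

func (me *Mat4) PerspectiveLH(fovY, a, n, f float64) {
	s := 1 / math.Tan(DegToRad(fovY)/2) // scaling
	me[0], me[4], me[8], me[12] = s/a, 0, 0, 0
	me[1], me[5], me[9], me[13] = 0, s, 0, 0
	me[2], me[6], me[10], me[14] = 0, 0, (f+n)/(f-n), (2*f*n)/(n-f) // me[10] negated
	me[3], me[7], me[11], me[15] = 0, 0, 1, 0                       // me[11] now +1
}

With this variant, eye-space z == n maps to NDC -1 and z == f maps to +1 for points in front of the left-handed camera, so the model and world matrices can stay left-handed.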

Z Value after Perspective Divide is always less than -1

So I'm writing my own custom 3D transformation pipeline in order to gain a better understanding of how it all works. I can get everything rendering to the screen properly and I'm now about to go back and look at clipping.
From my understanding, I should be clipping a vertex point if the x or y value after the perspective divide is outside the bounds of [-1, 1] and in my case if the z value is outside the bounds of [0, 1].
When I implement that, however, my z value is always -1.xxxxxxxxxxx, where xxxxxxx is a very small number.
This is a bit long, and I apologize, but I wanted to make sure I gave all the information I could.
First conventions:
I'm using a left-handed system where a Matrix looks like this:
[m00, m01, m02, m03]
[m10, m11, m12, m13]
[m20, m21, m22, m23]
[m30, m31, m32, m33]
And my vectors are columns looking like this:
[x]
[y]
[z]
[w]
My camera is set up with:
A vertical FOV in radians of PI/4.
An aspect ratio of 1. (Square viewport.)
A near clip value of 1.
A far clip value of 1000.
An initial world x position of 0.
An initial world y position of 0.
An initial world z position of -500.
The camera is looking down the positive Z axis (0, 0, 1).
Given a vertex, the pipeline works like this:
Step 1: Multiply the vertex by the camera matrix.
Step 2: Multiply the vertex by the projection matrix.
Projection matrix is:
[2.41421, 0, 0, 0]
[0, 2.41421, 0, 0]
[0, 0, 1.001001, 1]
[0, 0, -1.001001, 0]
Step 3: Multiply the x, y and z components by 1/w.
Step 4: [This is where the problem is] Clip the vertex if outside bounds.
Step 5: Convert to screen coordinates.
An example vertex that I have is
(-100, -100, 0, 1)
After multiplying by the camera matrix I get:
(-100, -100, 500, 1)
Which makes sense because relative to the camera, that vertex is 100 units to the left and down and 500 units ahead. It is also between the near clip of 1 and the far clip of 1000. W is still 1.
After multiplying by the projection matrix I get:
(-241.42135, -241.42135, 601.600600, -600.600600)
I'm not sure whether this makes sense. The x and y seem to be correct, but I'm iffy about the z and w, since the next step, the perspective divide, is odd.
After the perspective divide I get:
(0.401966, 0.401966, -1.001665, 1)
Again the x and y make sense; they are both within the bounds of [-1, 1]. But the z value is clearly outside the bounds, even though I believe it should still be within the frustum. W is back to 1, which again makes sense.
Again apologies for the novel, but I'm hoping someone can help me figure out what I'm doing incorrectly.
Thanks!
OK, it looks like I figured out what the problem was.
My projection matrix was:
[2.41421, 0, 0, 0]
[0, 2.41421, 0, 0]
[0, 0, 1.001001, 1]
[0, 0, -1.001001, 0]
But it really should be transposed and be:
[2.41421, 0, 0, 0]
[0, 2.41421, 0, 0]
[0, 0, 1.001001, -1.001001]
[0, 0, 1, 0]
When using this matrix, my x and y values stay the same as expected, and now my z values are constrained to [0, 1], only exceeding that range if they are outside the near or far clip plane.
The only issue now is that I'm quite confused as to whether I'm using a right or left handed system.
All I know is that now it works...
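A quick hedged check (all names are mine) of the corrected, transposed matrix: with n = 1 and f = 1000, the third and fourth rows give z_clip = 1.001001*z - 1.001001 and w_clip = z, so z_ndc comes out 0 at the near plane and 1 at the far plane:

package main

import "fmt"

// projectZ runs eye-space z through rows 3 and 4 of the corrected matrix.
func projectZ(n, f, z float64) float64 {
	zClip := f/(f-n)*z - f*n/(f-n) // row 3: [0, 0, 1.001001, -1.001001]
	wClip := z                     // row 4: [0, 0, 1, 0]
	return zClip / wClip
}

func main() {
	fmt.Println(projectZ(1, 1000, 1))    // 0 at the near plane
	fmt.Println(projectZ(1, 1000, 1000)) // 1 at the far plane
	fmt.Println(projectZ(1, 1000, 500))  // ~0.999: inside (0, 1), but non-linear
}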
I may be out of my league here, but I thought that the purpose of the projection matrix and perspective divide was to discover the 2D position of that point on the screen. In that case, the left-over z value would not necessarily have any meaning any more, since the math is all geared towards finding those two x and y values.
Update: I think I have it figured out. Your math is all correct. The camera and frustum you describe have a near clipping plane at Z=1, so your example point at (-100, -100, 0) is actually outside of the clipping plane, so that z-buffer value of just below -1 makes perfect sense.
Try a sample point with a z-coordinate inside your frustum, say with a z-coordinate of 2.