Z Value after Perspective Divide is always less than -1

So I'm writing my own custom 3D transformation pipeline in order to gain a better understanding of how it all works. I can get everything rendering to the screen properly and I'm now about to go back and look at clipping.
From my understanding, I should be clipping a vertex if its x or y value after the perspective divide is outside the bounds of [-1, 1] and, in my case, if its z value is outside the bounds of [0, 1].
When I implement that, however, my z value is always -1.xxxxxxxxxxx, where the fractional part is very small.
This is a bit long, and I apologize, but I wanted to make sure I gave all the information I could.
First conventions:
I'm using a left-handed system where a Matrix looks like this:
[m00, m01, m02, m03]
[m10, m11, m12, m13]
[m20, m21, m22, m23]
[m30, m31, m32, m33]
And my vectors are columns looking like this:
[x]
[y]
[z]
[w]
My camera is set up with:
A vertical FOV in radians of PI/4.
An aspect ratio of 1 (square viewport).
A near clip value of 1.
A far clip value of 1000.
An initial world x position of 0.
An initial world y position of 0.
An initial world z position of -500.
The camera is looking down the positive Z axis (0, 0, 1).
Given a vertex, the pipeline works like this:
Step 1: Multiply the vertex by the camera matrix.
Step 2: Multiply the vertex by the projection matrix.
Projection matrix is:
[2.41421, 0, 0, 0]
[0, 2.41421, 0, 0]
[0, 0, 1.001001, 1]
[0, 0, -1.001001, 0]
Step 3: Multiply the x, y and z components by 1/w.
Step 4: [This is where the problem is] Clip the vertex if outside bounds.
Step 5: Convert to screen coordinates.
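For reference, here is a rough sketch of those five steps (illustrative C++ using GLM types, not my actual pipeline code; the top-left window origin is just an assumption):
#include <glm/glm.hpp>
#include <optional>

// One vertex through the five steps; returns screen x, y and depth,
// or nothing if the vertex is clipped.
std::optional<glm::vec3> transformVertex(const glm::vec4& vertex,
                                         const glm::mat4& camera,
                                         const glm::mat4& projection,
                                         float screenW, float screenH)
{
    glm::vec4 v = camera * vertex;  // Step 1: world -> camera space
    v = projection * v;             // Step 2: camera -> clip space
    v /= v.w;                       // Step 3: perspective divide -> NDC

    // Step 4: clip if outside bounds (x/y in [-1, 1], z in [0, 1] per my convention)
    if (v.x < -1 || v.x > 1 || v.y < -1 || v.y > 1 || v.z < 0 || v.z > 1)
        return std::nullopt;

    // Step 5: NDC -> screen coordinates (assumes origin at the top-left)
    return glm::vec3((v.x * 0.5f + 0.5f) * screenW,
                     (1.0f - (v.y * 0.5f + 0.5f)) * screenH,
                     v.z);
}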
An example vertex that I have is
(-100, -100, 0, 1)
After multiplying by the camera matrix I get:
(-100, -100, 500, 1)
Which makes sense because relative to the camera, that vertex is 100 units to the left and down and 500 units ahead. It is also between the near clip of 1 and the far clip of 1000. W is still 1.
After multiplying by the projection matrix I get:
(-241.42135, -241.42135, 601.600600, -600.600600)
I'm not sure this makes sense. The x and y seem to be correct, but I'm iffy about the z and w, since the result of the next step, the perspective divide, is odd.
After the perspective divide I get:
(0.401966, 0.401966, -1.001665, 1)
Again the x and y make sense; they are both within the bounds of [-1, 1]. But the z value is clearly outside the bounds, even though I believe it should still be within the frustum. W is back to 1, which again makes sense.
Again apologies for the novel, but I'm hoping someone can help me figure out what I'm doing incorrectly.
Thanks!

OK, it looks like I figured out what the problem was.
My projection matrix was:
[2.41421, 0, 0, 0]
[0, 2.41421, 0, 0]
[0, 0, 1.001001, 1]
[0, 0, -1.001001, 0]
But it really should be transposed and be:
[2.41421, 0, 0, 0]
[0, 2.41421, 0, 0]
[0, 0, 1.001001, -1.001001]
[0, 0, 1, 0]
When using this matrix, my x and y values stay the same as expected, and now my z values are constrained to [0, 1], exceeding that range only if they are outside the near or far clip plane.
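A quick numeric check with the corrected matrix and the camera-space vertex (-100, -100, 500, 1) from above (throwaway C++, values hard-coded):
#include <cstdio>

int main() {
    const double n = 1.0, f = 1000.0;
    const double a = f / (f - n); // 1.001001
    const double v[4] = {-100, -100, 500, 1};

    // Rows of the corrected (transposed) projection matrix times the column vector:
    double x = 2.41421 * v[0];          // -241.421
    double y = 2.41421 * v[1];          // -241.421
    double z = a * v[2] - a * n * v[3]; // 499.4995
    double w = v[2];                    // 500 -- w now carries the camera-space z

    std::printf("ndc z = %f\n", z / w); // 0.998999, inside [0, 1]
    return 0;
}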
The only issue now is that I'm quite confused as to whether I'm using a right- or left-handed system.
All i know is that now it works...

I may be out of my league here, but I thought that the purpose of the projection matrix and perspective divide was to discover the 2D position of that point on the screen. In that case, the left-over z value would not necessarily have any meaning any more, since the math is all geared towards finding those two x and y values.
Update: I think I have it figured out. Your math is all correct. The camera and frustum you describe have a near clipping plane at Z=1, so your example point at (-100, -100, 0) is actually outside of the clipping plane, so that z-buffer value of just below -1 makes perfect sense.
Try a sample point with a z-coordinate inside your frustum, say z = 2.

Related

Get camera matrix from OpenGL

I render a 3D mesh model using OpenGL with a perspective camera – gluPerspective(fov, aspect, near, far).
Then I use the rendered image in a computer vision algorithm.
At some point that algorithm requires the camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate the camera position: rotation matrix R and translation vector t. I can estimate R and t using any algorithm which solves the Perspective-n-Point problem.
I construct K from the OpenGL projection matrix (see how here)
K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]
If I want to project a model point 'by hand' I can compute:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]
Anyway, I pass this camera matrix in a PnP algorithm and it works just fine.
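For reference, a sketch of how K can be assembled from gluPerspective-style parameters (this assumes a centred principal point; the helper name is mine):
#include <opencv2/core.hpp>
#include <cmath>

// Pixel-unit intrinsics from a vertical FOV (degrees) and image size.
// fY = cot(fov/2) * height/2, and fX == fY when aspect == width/height.
cv::Matx33d intrinsicsFromPerspective(double fovyDeg, int width, int height) {
    double cotHalf = 1.0 / std::tan(fovyDeg * CV_PI / 360.0);
    double fY = cotHalf * height / 2.0;
    double fX = fY;
    return cv::Matx33d(fX, 0,  width / 2.0,
                       0,  fY, height / 2.0,
                       0,  0,  1);
}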
But then I had to change the perspective projection to an orthographic one.
As far as I understand, when using an orthographic projection the camera matrix becomes:
K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]
So I changed gluPerspective to glOrtho. Following the same approach, I constructed K from the OpenGL projection matrix, and it turned out that fX and fY are not 1 but 0.0037371. Is this a scaled orthographic projection or what?
Moreover, in order to project model vertices 'by hand' I managed to do the following:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2
Which is not what I expected (that plus width and height divided by 2 seems strange...). I tried to pass this camera matrix to the POSIT algorithm to estimate R and t, but it doesn't converge. :(
So here are my questions:
How to get orthographic camera matrix from OpenGL?
If the way I did it is correct, is it a true orthographic projection? Why doesn't POSIT work?
Orthographic projection does not use depth to scale down farther points. It will, however, scale the points to fit inside the NDC, which means it will scale the values to fit inside the range [-1, 1].
The orthographic projection matrix from Wikipedia shows what this means:
[2/(r-l), 0, 0, -(r+l)/(r-l)]
[0, 2/(t-b), 0, -(t+b)/(t-b)]
[0, 0, -2/(f-n), -(f+n)/(f-n)]
[0, 0, 0, 1]
So, it is correct to have numbers other than 1: the diagonal entries are 2/(r-l) and 2/(t-b) for the glOrtho volume, not 1.
For your way of computing by hand, I believe it's not scaling back to screen coordinates and that makes it wrong. As I said, the output of projection matrices will be in the range [-1,1], and if you want to get the pixel coordinates, I believe you should do something similar to this:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1]*width/2 + width / 2
y_pixel = X_proj[2]*height/2 + height / 2
Anyway, I think you'd be better off using modern OpenGL with a library like GLM. In that case, you have the exact projection matrices used at hand.
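A sketch of the corrected "by hand" projection with GLM (the view matrix packing R and t, and the near/far values, are placeholders; any volume matching your glOrtho call works):
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Project a model point orthographically, then map NDC [-1, 1] to pixels.
glm::vec2 projectOrtho(const glm::vec3& Xmodel, const glm::mat4& view,
                       float width, float height)
{
    glm::mat4 proj = glm::ortho(-width / 2.0f, width / 2.0f,
                                -height / 2.0f, height / 2.0f,
                                0.1f, 1000.0f); // placeholder near/far
    glm::vec4 ndc = proj * view * glm::vec4(Xmodel, 1.0f);
    // No perspective divide needed: w stays 1 under an orthographic projection.
    return glm::vec2(ndc.x * width / 2.0f + width / 2.0f,
                     ndc.y * height / 2.0f + height / 2.0f);
}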

Screen position unprojection without W

I understand the basic concept of how to unproject:
let mut z: f32 = 0.0;
unsafe {
    // read the depth-buffer value under the window position (gl::FLOAT expects an f32)
    gl::ReadPixels(x as i32, y as i32, 1, 1, gl::DEPTH_COMPONENT, gl::FLOAT,
                   &mut z as *mut f32 as *mut std::ffi::c_void);
}
// window position to screen position
let screen_position = self.to_screen_position(x, y);
// screen position to world position
let world_position = self.projection_matrix().invert() *
    Vector4::new(screen_position.x, screen_position.y, z, 1.0);
But this doesn't handle the W coordinate properly - when I render things from world space to screen space, they end up with W != 1 because of the perspective transformation (https://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml). When I transform back from screen space to world space (with an assumption of W=1), the objects end up in the wrong position.
As I understand it, W is a scaling factor for all the other coordinates. If this is the case, doesn't it mean screen vectors (0, 0, -1, 1) and (0, 0, -2, 2) will map to the same window coordinates, and that unprojecting doesn't necessarily produce unique results without further work?
Thanks!
Because of the perspective transformation, you can't really ignore W.
I would suggest looking at the source code for the gluUnproject function here: http://www.opengl.org/wiki/GluProject_and_gluUnProject_code. You'll see that what this does is:
Calculate projection*modelView matrix and invert it.
Multiply this inverse by a vector made from the screen position (in the code, winZ=0 would correspond to the near plane, winZ=1 to the far plane of your perspective projection; W will always be 1).
Divide the calculated vector's X, Y and Z by W.
Note that if you do it like this, the result's W should be ignored (i.e. assumed to be 1).
You can also look here to see how the transformations in OpenGL work - if you're not familiar with this, I'd suggest reading about Clip coordinates and Normalized Device coordinates.
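The three steps above, written out with GLM for illustration (this takes NDC coordinates directly; a depth-buffer winZ in [0, 1] must first be mapped with z = 2 * winZ - 1):
#include <glm/glm.hpp>

// ndc: x, y, z each in [-1, 1]
glm::vec3 unproject(const glm::vec3& ndc,
                    const glm::mat4& projection,
                    const glm::mat4& modelView)
{
    glm::mat4 inv = glm::inverse(projection * modelView); // step 1
    glm::vec4 world = inv * glm::vec4(ndc, 1.0f);         // step 2: W goes in as 1
    return glm::vec3(world) / world.w;                    // step 3: divide by W
}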

OpenGL custom rendering pipeline: Perspective matrix

I am attempting to work in LWJGL to display a simple quad using my own matrices. I've been looking around for a while and have found a few perspective matrix implementations, these two in particular:
[cot(fov/2)/a 0 0 0]
[0 cot(fov/2) 0 0]
[0 0 -f/(f-n) -1]
[0 0 -f*n/(f-n) 0]
and:
[cot(fov/2)/a 0 0 0]
[0 cot(fov/2) 0 0]
[0 0 -(f+n)/(f-n) -1]
[0 0 -(2*f*n)/(f-n) 0]
Both of these provide the same effect, as expected (got them from here and here, respectively). The issue is in my understanding of how multiplying this by the modelview matrix, then by a vertex, then dividing the x, y, and z values by w gives a screen coordinate. More specifically, if I multiply either of these by the modelview matrix and then by the vertex (10, 10, 0, 1), I get w=0. That in itself is a big smack in the face. I conclude that either the matrices are wrong, or I am missing something completely. In my actual test program, the vertices don't even end up on screen, even though a camera position of (0, 0, 0) with no rotation should make them visible. I have even tried many different z values, positive and negative, to see if it was just a clipping plane. Am I missing something here?
EDIT: After a lot of checking over, I've narrowed down the problem I am facing. The biggest issue is that the z-axis does not appear to be remapped to the range I specify (n to f). Any object just zooms in or out a little bit when I translate it along the z-axis, then pops out of existence as it moves past the range [-1, 1]. I think this is also making me more confused. I set my far plane to 100 and my near plane to 0.1, and it behaves like anything but.
Both of these provide the same effect, as expected
While the second projection matrix form is very standard, the first one gives a different effect. If you have z==1 and w==0, the projection will be:
Matrix 1: (-f/(f-n)) / (-f*n/(f-n)) = f / (f*n) = 1/n
Matrix 2: (-(f+n)/(f-n)) / (-(2*f*n)/(f-n)) = (f+n) / (2*f*n)
The result is clearly different. You should always use the second form.
if I multiply either of these by the modelview matrix then by a vertex (10, 10, 0, 1), it gives a w=0. That in itself is a big smack in the face
For a focal length d the projection is computed as (ignoring aspect ratio):
x'= d*x/z = x / w
y'= d*y/z = y / w
where
w = z / d
If you have z==0, this means you are trying to project a point that is already in the eye; only points beyond d are visible. In practice, this point will be clipped, because z is not within the range between n (near) and f (far) (n is expected to be a positive constant).
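A quick numeric check of the second matrix (in OpenGL's column-vector convention, where the eye looks down -Z) showing the remap that the EDIT asks about:
#include <cstdio>

// NDC depth for an eye-space z, using the standard perspective z row.
double ndcZ(double zEye, double n, double f) {
    double zClip = -(f + n) / (f - n) * zEye - 2 * f * n / (f - n);
    double wClip = -zEye; // the eye looks down -Z
    return zClip / wClip;
}

int main() {
    std::printf("%f\n", ndcZ(-0.1, 0.1, 100.0));   // -1.0 at the near plane
    std::printf("%f\n", ndcZ(-100.0, 0.1, 100.0)); //  1.0 at the far plane
    return 0;
}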

Ways to "invert Z-axis" in shader-based core-profile OpenGL?

In my hobbyist shader-based (non-FFP) GL (3.2+ core) "engine", everything in world-space and model-space is by design "left-handed" (and to stay that way), so X-axis goes from -1 ("left") to 1 ("right"), Y from -1 ("bottom") to 1 ("top") and Z from -1 ("near") to 1 ("far").
Now, by default in OpenGL, the NDC space works the same, but the clip space doesn't: from what I gather, here z extends from 1 ("near") to -1 ("far").
At the same time I want to ideally keep using the "kinda-sorta unofficial quasi-standard" matrix functions for lookat and perspective, currently defined as:
func (me *Mat4) Lookat(eyePos, lookTarget, upVec *Vec3) {
    l := lookTarget.Sub(eyePos)
    l.Normalize()
    s := l.Cross(upVec)
    s.Normalize()
    u := s.Cross(l)
    me[0], me[4], me[8], me[12] = s.X, u.X, -l.X, -eyePos.X
    me[1], me[5], me[9], me[13] = s.Y, u.Y, -l.Y, -eyePos.Y
    me[2], me[6], me[10], me[14] = s.Z, u.Z, -l.Z, -eyePos.Z
    me[3], me[7], me[11], me[15] = 0, 0, 0, 1
}

// a: aspect ratio. n: near-plane. f: far-plane.
func (me *Mat4) Perspective(fovY, a, n, f float64) {
    s := 1 / math.Tan(DegToRad(fovY)/2) // scaling
    me[0], me[4], me[8], me[12] = s/a, 0, 0, 0
    me[1], me[5], me[9], me[13] = 0, s, 0, 0
    me[2], me[6], me[10], me[14] = 0, 0, (f+n)/(n-f), (2*f*n)/(n-f)
    me[3], me[7], me[11], me[15] = 0, 0, -1, 0
}
So, for the lookat-part to have my world-space camera (positive-Z) work with lookat (negative-Z) as per this pseudocode:
// world-space position:
camPos := cam.Pos
// normalized direction-vector, up/right/forward are 1 not -1:
camTarget := cam.Dir
// lookat-target:
camTarget.Add(&camPos)
// invert both Z:
camPos.Z, camTarget.Z = -camPos.Z, -camTarget.Z
// compute lookat-matrix:
cam.mat.Lookat(&camPos, &camTarget, &Vec3{0, 1, 0})
That works well. Moving the camera in all 6 degrees of freedom produces correct on-screen movement and correct new camera world-space coords.
But geometry is still inverted on the Z-axis. When I position two boxes, A at (-2, 1, -2) to appear near-left, and B at (2, 1, 2) to appear far-right, then A appears far-left and B appears near-right. Z is still inverted here.
Now, these nodes have their own world-space coords and update from those their own model-to-world matrices. I shouldn't invert posZ there as they form a hierarchy of sub-nodes multiplying with their parents transforms and all that. They're still in model or world space, which as per my decree is to remain left-handed.
Their world-to-camera calculation happens on the CPU at my end, not in a vertex shader which just gets a single final (mvp/clip-space) matrix.
When that happens -- multiplication of world-space-object-matrix with clip-space lookat-and-projection matrix -- at that point I need to somehow invert Z.
What's the best way to do this? Or, more generally speaking, what's a common way that works? Do I have to modify the projection to accept left-handed but output-to-GL right-handed? If so, how? And then wouldn't I also have to modify lookat? Is there a smart way to do all this without having to modify the somewhat-standard lookat/projection matrices while also keeping model-transform-matrices in left-handed coords?
In Perspective, changing me[11] from -1 to 1 should invert the z axis the way you're describing. If that isn't correct, try negating me[10]. Of course, because the z axis is inverted, the directions of your rotations will be affected as well. If I recall right, rotations around the y axis, and maybe the x axis, will be inverted. If this is the case, you should be able to just negate the rotations to counteract it.
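In matrix terms, that suggestion amounts to negating the projection's third column, i.e. post-multiplying the right-handed projection by a Z-flip. A C++/GLM sketch of the same idea (the helper name is mine; modern GLM expects fovy in radians):
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Left-handed perspective: flip incoming z, then apply the usual RH projection.
// Equivalent to negating me[8..11] (the third column) in the Go code above.
glm::mat4 perspectiveLH(float fovyRad, float aspect, float n, float f) {
    glm::mat4 flipZ = glm::scale(glm::mat4(1.0f), glm::vec3(1.0f, 1.0f, -1.0f));
    return glm::perspective(fovyRad, aspect, n, f) * flipZ;
}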

How can I calculate camera position by comparing two photographs?

I'm trying to calculate the camera's position for an image. I have 2 images of a Rubik's cube. The first image is considered to be the base image and the next image is the image after the camera has moved. So for the first image I assume that the camera is at (0, 0, 0). On this image I then identify the 4 corners of the front face of the Rubik's cube, as shown here (4 corners identified by the 4 blue circles).
Then for the next image (after camera movement), I identify the same face of the Rubik's cube, as shown here.
So, taking the first image as the base image, does anyone know if/how I can calculate how much the camera has moved for image 2, as shown here:
I would suggest you use OpenCV for this. I also think this question would be more suited to Stack Overflow.
The textbook on this subject would be "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. http://www.robots.ox.ac.uk/~vgg/hzbook/ (There is a sample chapter on the Fundamental Matrix on that website.)
Basically, first find the Fundamental Matrix, then by knowing the intrinsic parameters of the camera, find a solution to the position.
Fundamental Matrix: http://en.wikipedia.org/wiki/Fundamental_matrix_%28computer_vision%29
Intrinsic Parameters: Stuff like the focal length and where the principal point is on the image plane. If you have F, then E = K^t * F * K, where K is the intrinsic matrix (assumed to be the same for both images).
How to find a solution to the camera position: http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E
Algorithm
This is how I would do it in OpenCV. I have done this before, so it ought to work.
1. Run a feature detector and descriptor extractor on both images.
2. Match features.
3. Use F = cv::findFundamentalMat with RANSAC.
4. E = K.t() * F * K. // K needs to be found beforehand.
5. Do SingularValueDecomposition of E such that E = U * S * V.t()
6. R = U * W.inv() * V.t() // W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
7. Tx = V * Z * V.t() // Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
8. get t from Tx (matrix version of cross product)
9. Find the correct solution. R.t() and -t are possibilities.
10. Get the overall scale by knowing the length of the sides of the Rubik's cube.
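A compilable sketch of steps 1-9 with OpenCV's C++ API (my own substitutions: ORB with a brute-force matcher for steps 1-2, and cv::recoverPose, available in OpenCV 3+, which collapses steps 5-9; K must be known beforehand and be CV_64F):
#include <opencv2/features2d.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

void relativePose(const cv::Mat& img1, const cv::Mat& img2,
                  const cv::Mat& K, cv::Mat& R, cv::Mat& t)
{
    // Steps 1-2: detect, describe and match features.
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat d1, d2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, d1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, d2);
    cv::BFMatcher matcher(cv::NORM_HAMMING, true); // cross-check matches
    std::vector<cv::DMatch> matches;
    matcher.match(d1, d2, matches);

    std::vector<cv::Point2f> p1, p2;
    for (const cv::DMatch& m : matches) {
        p1.push_back(kp1[m.queryIdx].pt);
        p2.push_back(kp2[m.trainIdx].pt);
    }

    // Step 3: fundamental matrix with RANSAC.
    cv::Mat F = cv::findFundamentalMat(p1, p2, cv::FM_RANSAC);

    // Step 4: essential matrix.
    cv::Mat E = K.t() * F * K;

    // Steps 5-9: recoverPose does the SVD, builds the candidate (R, t)
    // pairs and picks the one with the points in front of both cameras.
    cv::recoverPose(E, p1, p2, K, R, t);
}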
Alternative Solutions
I am certain that a more straightforward approach can also work. The benefit of this approach is that no human input is needed (unsupervised). This is not true for the optional step 10 (determining scale).
A different solution would exploit knowledge of the geometry of the Rubik's cube. For example, six (5.5) points are needed to estimate the position of the camera if the points' 3D positions are known.
Unfortunately, I am not aware of any software that does this for you automatically.
So here is the alternative algorithm:
Write down the coordinates of the corners of the cube as (X_i, Y_i, Z_i), and possibly also points with other knowable positions.
Mark the corresponding points u_i = (x_i, y_i).
For every correspondence create two rows in a matrix A:
(X_i, Y_i, Z_i, 1, 0, 0, 0, 0, -x_i*X_i, -x_i*Y_i, -x_i*Z_i, -x_i)
(0, 0, 0, 0, X_i, Y_i, Z_i, 1, -y_i*X_i, -y_i*Y_i, -y_i*Z_i, -y_i)
Then find p such that Ap = 0. I.e., p is the right kernel of A, or the least-squares solution to Ap = 0.
De-flatten p to create a 3x4 matrix P.
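A sketch of that solve with OpenCV's SVD (the helper is my own; it needs at least six correspondences, and P is recovered only up to scale):
#include <opencv2/core.hpp>
#include <vector>

// Direct Linear Transform: stack two rows per 3D<->2D correspondence,
// take the right singular vector of the smallest singular value as p,
// and de-flatten it into the 3x4 projection matrix P.
cv::Mat dltProjection(const std::vector<cv::Point3d>& X,  // known 3D points
                      const std::vector<cv::Point2d>& u)  // their 2D images
{
    cv::Mat A(2 * (int)X.size(), 12, CV_64F);
    for (size_t i = 0; i < X.size(); ++i) {
        double r1[12] = { X[i].x, X[i].y, X[i].z, 1, 0, 0, 0, 0,
                          -u[i].x * X[i].x, -u[i].x * X[i].y, -u[i].x * X[i].z, -u[i].x };
        double r2[12] = { 0, 0, 0, 0, X[i].x, X[i].y, X[i].z, 1,
                          -u[i].y * X[i].x, -u[i].y * X[i].y, -u[i].y * X[i].z, -u[i].y };
        cv::Mat(1, 12, CV_64F, r1).copyTo(A.row(2 * (int)i));
        cv::Mat(1, 12, CV_64F, r2).copyTo(A.row(2 * (int)i + 1));
    }
    cv::SVD svd(A, cv::SVD::FULL_UV);
    cv::Mat p = svd.vt.row(11);     // right kernel / least-squares solution of Ap = 0
    return p.reshape(1, 3).clone(); // de-flatten into P (3x4)
}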