glReadPixels() how to get actual depth instead of normalized values? - opengl

I'm using pyopengl to get a depth map.
I am able to get a normalized depth map using glReadPixels(). How can I revert the normalized values to the actual depth in world coordinates?
I've tried playing with glDepthRange(), but it always performs some normalization. Can I disable the normalization at all?

When you draw your geometry, your vertex shader is supposed to transform everything into normalized device coordinates (where each component is between -1 and 1) via the view/projection matrix. There is no way to avoid it, everything outside of this range will get clipped (or clamped, if you enable depth clamping). Then, these device coordinates are transformed into window coordinates - X and Y coordinates are mapped into range specified with glViewport and Z into range set with glDepthRange.
You can't disable normalization, because the final values are required to be in 0..1 range. But you can apply the reverse transformation: first, map your depth values back to -1..1 range (if you didn't use glDepthRange, all you have to do is multiply them by 2 and subtract 1). Then, you need to apply the inverse of your projection matrix - you can either do that explicitly by calculating its inverse, or avoid matrix operations by looking into how your perspective matrix is calculated. For a typical matrix, the inverse transform will be
zNorm = 2 * zBuffer - 1
zView = 2 * near * far / ((far - near) * zNorm - near - far)
(Note that zView will be negative, between -near and -far, because in OpenGL your Z axis normally points towards the camera).
Although normally you don't want only depth - you want the full 3D points, so you might as well reconstruct the vector in normalized coordinates and then apply the inverse projection/view transform.
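For reference, here is a minimal PyOpenGL/NumPy sketch of that reverse mapping. It assumes the default glDepthRange(0, 1), a standard perspective projection, and hypothetical near/far and viewport values:
import numpy as np
from OpenGL.GL import glReadPixels, GL_DEPTH_COMPONENT, GL_FLOAT

near, far = 0.1, 100.0      # hypothetical values: use your own projection's planes
width, height = 800, 600    # hypothetical viewport size

# Read the depth buffer; values are window-space depths in [0, 1].
# PyOpenGL may return bytes or a NumPy array; frombuffer handles both.
raw = glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT)
z_buffer = np.frombuffer(raw, dtype=np.float32).reshape(height, width)

# Undo the default glDepthRange mapping: [0, 1] -> NDC [-1, 1].
z_ndc = 2.0 * z_buffer - 1.0

# Invert the perspective mapping: view-space Z, negative, between -near and -far.
z_view = 2.0 * near * far / ((far - near) * z_ndc - near - far)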

After the projection to the viewport, the coordinates of the scene are normalized device coordinates (NDC). The normalized device space is a cube, with the left, bottom, front coordinate of (-1, -1, -1) and the right, top, back coordinate of (1, 1, 1). The geometry in this cube is "visible" on the viewport (unless it is covered).
The Z coordinate of the normalized device space is mapped to the depth range (glDepthRange), which in general is [0, 1].
How the z-coordinate of the view space is transformed to a normalized device Z-coordinate and further a depth, depends on the projection matrix.
With an Orthographic Projection the Z component is calculated by a linear function, while with a Perspective Projection it is calculated by a rational function.
See How to render depth linearly in modern OpenGL with gl_FragCoord.z in fragment shader?.
This means that, to convert from the depth of the depth buffer to the original Z coordinate, the projection (Orthographic or Perspective) and the near and far planes have to be known.
In the following it is assumed that the depth range is [0, 1] and depth is a value in this range:
Orthographic Projection
n = near, f = far
z_eye = depth * (f-n) + n;
z_linear = z_eye
Perspective Projection
n = near, f = far
z_ndc = 2 * depth - 1.0;
z_eye = 2 * n * f / (f + n - z_ndc * (f - n));
If the perspective projection matrix is known this can be done as follows:
A = prj_mat[2][2]
B = prj_mat[3][2]
z_eye = B / (A + z_ndc)
Note that, in any case, a transformation by the inverse projection matrix transforms a normalized device coordinate to a coordinate in view space.
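To make the perspective case concrete, here is a small NumPy sketch with hypothetical near/far and depth values; A and B are taken from the standard column-major perspective matrix quoted further down this page:
import numpy as np

n, f = 0.1, 100.0   # hypothetical near and far planes
depth = 0.83        # hypothetical value read from the depth buffer, depth range [0, 1]

z_ndc = 2.0 * depth - 1.0
z_eye = 2.0 * n * f / (f + n - z_ndc * (f - n))   # distance in front of the camera

# Same result via the projection matrix entries A = prj_mat[2][2], B = prj_mat[3][2]:
A = -(f + n) / (f - n)
B = -2.0 * f * n / (f - n)
assert np.isclose(B / (A + z_ndc), z_eye)

# Note: the view-space Z coordinate itself is -z_eye, since OpenGL looks down -Z.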

Related

How to undo camera transformation and perspective?

I am trying to orient a 3d object at the world origin such that it doesn't change its position wrt camera when I move the camera OR change its field of view. I tried doing this
Object Transform = Inverse(CameraProjectionMatrix)
How do I undo the perspective divide, because when I change the fov the object is affected by it?
In detail it looks like this:
origin(0.0, 0.0, 0.0, 1.0f);
projViewInverse = Camera.projViewMatrix().inverse();
projectionMatrix = Camera.projViewMatrix();
projectedOrigin = projectionMatrix * origin;
topRight(0.5f, 0.5f, 0.f);
scaleFactor = 1.0/projectedOrigin.z();
scale(scaleFactor,scaleFactor,scaleFactor);
finalMatrix = projViewInverse * Scaling(w) * Translation(topRight);
If you use a gfx pipeline where positions (w=1.0) and vectors (w=0.0) are transformed to NDC like this:
(x',y',z',w') = M*(x,y,z,w) // applying transforms
(x'',y'') = (x',y')/w' // perspective divide
where M are all your 4x4 homogeneous transform matrices multiplied together in their order. If you want to go back to the original (x,y,z) you need to know w', which can be computed from z. The equation depends on your projection. In such a case you can do this:
w' = f(z') // z' is usually the value encoded in the depth buffer and can be obtained
(x',y') = (x'',y'')*w' // screen -> camera
(x,y) = Inverse(M)*(x',y',z',w') // camera -> world
However this can be used only if you know the z' and can derive w' from it. So what is usually done (if we can not) is to cast a ray from the camera focal point through the (x'',y'') and stop at the wanted perpendicular distance to the camera. For perspective projection you can look at it as triangle similarity:
So for each vertex you want to transform you need its projected x'',y'' position on the znear plane (screen) and then just scale the x'',y'' by the ratio between the distances to the camera focal point (*z1/z0). Now all we need is the focal length z0. That one depends on the kind of projection matrix you use. I usually encounter 2 versions: when you are in the camera coordinate system, the point (0,0,0) is either the focal point or on the znear plane. However the projection matrix can be anything, hence the focal point position can also vary ...
Now when you have to deal with aspect ratio, the first method deals with it internally as it's inside M. The second method needs to apply the inverse of the aspect ratio correction before conversion. So apply it directly on x'',y''
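The first route (when z' is known) is the usual un-project. A NumPy sketch with illustrative names, assuming M is the combined projection*view matrix stored row-major and the default depth range:
import numpy as np

def unproject(px, py, depth, m, viewport):
    """Window position (px, py) plus window-space depth -> world-space point.
    m is the combined projection*view matrix (row-major NumPy 4x4),
    viewport is (x, y, width, height).  Names are illustrative, not from the post."""
    vx, vy, vw, vh = viewport
    # window -> NDC
    ndc = np.array([(px - vx) / vw * 2.0 - 1.0,
                    (py - vy) / vh * 2.0 - 1.0,
                    depth * 2.0 - 1.0,
                    1.0])
    # NDC -> world: Inverse(M), then undo the perspective divide with the recovered w'
    p = np.linalg.inv(m) @ ndc
    return p[:3] / p[3]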

OpenGL converting between different right hand notations

I'm displaying an array of 3D points with OpenGL. The problem is the 3D points are from a sensor where X is forward, Y is to the left, Z is up. From my understanding OpenGL has X to the right, Y up, Z out of screen. So when I use a lot of the examples of projection matrices, and cameras the points are obviously not viewed the right way, or the way that makes sense.
So to compare the two (S for sensor, O for OpenGL):
Xs == -Zo, Ys == -Xo, Zs == Yo.
Now my questions are:
How can I rotate the points from S to O? I tried rotating by 90 degrees around X, then Z, but it doesn't appear to be working.
Do I even need to rotate to the OpenGL convention? Can I make up my own axes (use the sensor's orientation) and change the camera code? Or will some assumptions break somewhere in the graphics pipeline?
My implementation based on the answer below:
glm::mat4 model = glm::mat4(0.0f);
model[0][1] = -1;
model[1][2] = 1;
model[2][0] = -1;
// My input to the shader was a mat4 for the model matrix so need to
// make sure the bottom right element is 1
model[3][3] = 1;
The one line in the shader:
// Note that the above matrix is OpenGL to Sensor frame conversion
// I want Sensor to OpenGL so I need to take the inverse of the model matrix
// In the real implementation I will change the code above to
// take inverse before sending to shader
" gl_Position = projection * view * inverse(model) * vec4(lidar_pt.x, lidar_pt.y, lidar_pt.z, 1.0f);\n"
In order to convert the sensor data's coordinate system into OpenGL's right-handed world-space, where the X axis points to the right, Y points up and Z points towards the user in front of the screen (i.e. "out of the screen"), you can very easily come up with a 3x3 rotation matrix that will perform what you want:
Since you said that in the sensor's coordinate system X points into the screen (which is equivalent to OpenGL's -Z axis), we will map the sensor's (1, 0, 0) axis to (0, 0, -1).
And your sensor's Y axis points to the left (as you said), so that will be OpenGL's (-1, 0, 0). And likewise, the sensor's Z axis points up, so that will be OpenGL's (0, 1, 0).
With this information, we can build the rotation matrix:
/ 0 -1  0 \
| 0  0  1 |
\-1  0  0 /
Simply multiply your sensor data vertices with this matrix before applying OpenGL's view and projection transformation.
So, when you multiply that out with a vector (Sx, Sy, Sz), you get:
Ox = -Sy
Oy = Sz
Oz = -Sx
(where Ox/y/z is the point in OpenGL coordinates and Sx/y/z is the sensor coordinates).
Now, you can just build a transformation matrix (right-multiply against your usual model-view-projection matrix) and let a shader transform the vertices by that or you simply pre-transform the sensor vertices before uploading to OpenGL.
You hardly ever need angles in OpenGL when you know your linear algebra math.
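To make the mapping concrete, a small NumPy sketch (with hypothetical points) that applies that rotation before the data is uploaded:
import numpy as np

# Rows are OpenGL's axes expressed in sensor coordinates: Ox = -Sy, Oy = Sz, Oz = -Sx.
sensor_to_gl = np.array([[ 0, -1,  0],
                         [ 0,  0,  1],
                         [-1,  0,  0]], dtype=np.float32)

# Hypothetical sensor points, one per row.
sensor_points = np.array([[1.0, 0.0, 0.0],    # sensor "forward"
                          [0.0, 1.0, 0.0]])   # sensor "left"

gl_points = sensor_points @ sensor_to_gl.T    # -> [[0, 0, -1], [-1, 0, 0]]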

Display recursively rendered scene into a plane

I have to render 2 scenes separately and embed one of them into another scene as a plane. The sub scene that is rendered as a plane will use a view matrix calculated from the relative camera position, and a perspective matrix that accounts for the distance and the calculated skew, so that the sub scene is rendered as if it were actually placed at that point.
For describing more detail, this is a figure to describe the simpler case.
(In this case, we have the sub scene on the center line of the main frustum)
It is easy to calculate the perspective matrix, visualized as the red frustum, by using these parameters.
However, it is very difficult for me to solve the other case. If the sub scene lies outside of the center line, I have to skew the projection matrix to correspond with it.
I think this is kind of oblique perspective projection. And also this is very similar to render mirror. How do I calculate this perspective matrix?
As @Rabbid76 already pointed out, this is just a standard asymmetric frustum. For that, you just need to know the coordinates of the rectangle on the near plane you are going to use, in eye-space.
However, there is also another option: you can also modify the existing projection matrix. That approach will be easier if you know the position of your rectangle in window coordinates or normalized device coordinates. You can simply pre-multiply scale and translation matrices to select any sub-region of your original frustum.
Let's assume that your viewport is w * h pixels wide, and starts at (0,0) in the window. And you want to create a frustum which just renders a sub-rectangle which starts at the lower left corner of pixel (x,y), and which is a pixels wide and b pixels tall.
Convert to NDC:
x_ndc = (x / w) * 2 - 1 and y_ndc = (y / h) * 2 - 1
a_ndc = (a / w) * 2 and b_ndc = (b / h) * 2
Create a scale and translation transform which maps the range [x_ndc, x_ndc+a_ndc] to [-1,1], and similarly for y:
    ( 2/a_ndc    0       0   -2*x_ndc/a_ndc - 1 )
M = (    0    2/b_ndc    0   -2*y_ndc/b_ndc - 1 )
    (    0       0       1            0         )
    (    0       0       0            1         )
(Note that the factor 2 is going to be canceled out. Instead of going to [-1,1] NDC space in step 1, we could also just have used the normalized [0,1] range; I just wanted to use the standard spaces.)
Pre-Multiply M to the original projection matrix P:
P' = M * P
Note that even though we defined the transformation in NDC space, and P works in clip space before the division, the math will still work out. By using the homogeneous coordinates, the translation part of M will be scaled by w accordingly. The resulting matrix will just be a general asymmetric projection matrix.
Now this does not adjust the near and far clipping planes of the original projection. But you can adjust them in the very same way by adding appropriate scale and translation to the z coordinate.
Also note that using this approach, you are not even restricted to selecting an axis-parallel rectangle, you can also rotate or skew it arbitrarily, so basically, you can select an arbitrary parallelogram in window space.
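A NumPy sketch of those three steps, assuming P is stored row-major (transpose first if you keep OpenGL-style column-major matrices); the function name and arguments are just illustrative:
import numpy as np

def subregion_projection(P, x, y, a, b, w, h):
    """Pre-multiply a scale/translation onto projection P so that the pixel
    rectangle starting at (x, y), a pixels wide and b pixels tall, in a
    w-by-h viewport fills the whole [-1, 1] range."""
    x_ndc = x / w * 2.0 - 1.0            # step 1: window -> NDC
    y_ndc = y / h * 2.0 - 1.0
    a_ndc = a / w * 2.0
    b_ndc = b / h * 2.0
    M = np.array([                       # step 2: map [x_ndc, x_ndc+a_ndc] -> [-1, 1]
        [2.0 / a_ndc, 0.0,         0.0, -2.0 * x_ndc / a_ndc - 1.0],
        [0.0,         2.0 / b_ndc, 0.0, -2.0 * y_ndc / b_ndc - 1.0],
        [0.0,         0.0,         1.0,  0.0],
        [0.0,         0.0,         0.0,  1.0]])
    return M @ P                         # step 3: P' = M * P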
How do I calculate this perspective matrix?
An asymmetric perspective (column major order) projection matrix is set up like this:
m[16] = [
2*n/(r-l), 0, 0, 0,
0, 2*n/(t-b), 0, 0,
(r+l)/(r-l), (t+b)/(t-b), -(f+n)/(f-n), -1,
0, 0, -2*f*n/(f-n), 0];
Where l, r, b, and t are the left, right, bottom, and top distances to the frustum planes on the near plane. n and f are the distances to the near and far plane.
Commonly, in a framework or a library, a projection matrix like this is set up by a function called frustum.
e.g.
OpenGL Mathematics: glm::frustum
OpenGL fixed function pipeline: glFrustum
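In Python this could look roughly like the following sketch, mirroring the column-major flat layout quoted above:
def frustum(l, r, b, t, n, f):
    """Asymmetric perspective projection matrix as a column-major flat list
    (same layout as the m[16] above, cf. glFrustum / glm::frustum)."""
    return [
        2*n/(r-l),   0,           0,             0,
        0,           2*n/(t-b),   0,             0,
        (r+l)/(r-l), (t+b)/(t-b), -(f+n)/(f-n), -1,
        0,           0,           -2*f*n/(f-n),  0,
    ]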

OpenGL Perspective Projection pixel perfect drawing

The target is to draw a shape, let's say a triangle, pixel-perfect (vertices shall be specified in pixels) and be able to transform it in the 3rd dimension.
I've tried it with an orthographic projection matrix and everything works fine, but the shape doesn't have any depth - if I rotate it around the Y axis it looks like I would just scale it along the X axis (because an orthographic projection obviously behaves like this). Now I want to try it with a perspective projection. But with this projection, the coordinate system changes completely, and due to this I can't specify my triangle's vertices in pixels. Also, if the size of my window changes, the size of the shape changes too (because of the changed coordinate system).
Is there any way to change the coordinate system of the perspective projection so that I can specify my vertices like I would with the orthographic projection? Or does anyone have an idea how to achieve the target described in the first sentence?
The projection matrix describes the mapping from 3D points of a scene, to 2D points of the viewport. It transforms from eye space to the clip space, and the coordinates in the clip space are transformed to the normalized device coordinates (NDC) by dividing with the w component of the clip coordinates. The NDC are in range (-1,-1,-1) to (1,1,1).
At Perspective Projection the projection matrix describes the mapping from 3D points in the world as they are seen from a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix:
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l) 0 0 0
0 2*n/(t-b) 0 0
(r+l)/(r-l) (t+b)/(t-b) -(f+n)/(f-n) -1
0 0 -2*f*n/(f-n) 0
where:
aspect = w / h
tanFov = tan( fov_y * 0.5 );
prjMat[0][0] = 2*n/(r-l) = 1.0 / (tanFov * aspect)
prjMat[1][1] = 2*n/(t-b) = 1.0 / tanFov
I assume that the view matrix is the identity matrix, and thus the view space coordinates are equal to the world coordinates.
If you want to draw a polygon where the vertex coordinates are translated 1:1 into pixels, then you have to draw the polygon in a plane parallel to the viewport. This means all points have to be drawn with the same depth. The depth has to be chosen in such a way that the transformation of a point in normalized device coordinates by the inverse projection matrix gives the vertex coordinates in pixels. Note, the homogeneous coordinates given by the transformation with the inverse projection matrix have to be divided by the w component of the homogeneous coordinates to get cartesian coordinates.
This means that the depth of the plane depends on the field of view angle of the projection:
Assuming you set up a perspective projection like this:
float vp_w = .... // width of the viewport in pixel
float vp_h = .... // height of the viewport in pixel
float fov_y = ..... // field of view angle (y axis) of the view port in degrees < 180°
gluPerspective( fov_y, vp_w / vp_h, 1.0, vp_h*2.0f );
Then the depthZ of the plane with a 1:1 relation of vertex coordinates and pixels, will be calculated like this:
float angRad = fov_y * PI / 180.0;
float depthZ = -vp_h / (2.0 * tan( angRad / 2.0 ));
Note, the center point of the projection to the viewport is (0,0), so the bottom left corner point of the plane is (-vp_w/2, -vp_h/2, depthZ) and the top right corner point is (vp_w/2, vp_h/2, depthZ). Ensure that the near plane of the perspective projection is less than -depthZ and the far plane is greater than -depthZ.
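Putting the numbers together, a small Python sketch with hypothetical viewport and field-of-view values:
import math

vp_w, vp_h = 800.0, 600.0   # hypothetical viewport size in pixels
fov_y = 60.0                # hypothetical field of view (y axis) in degrees

ang_rad = math.radians(fov_y)
depth_z = -vp_h / (2.0 * math.tan(ang_rad / 2.0))   # plane with a 1:1 pixel mapping

bottom_left = (-vp_w / 2.0, -vp_h / 2.0, depth_z)
top_right   = ( vp_w / 2.0,  vp_h / 2.0, depth_z)
# With gluPerspective(fov_y, vp_w/vp_h, near, far), near < -depth_z < far must hold.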
See further:
Both depth buffer and triangle face orientation are reversed in OpenGL
Transform the modelMatrix

A Depth buffer with two different projection matrices

I am using the default OpenGL values like glDepthRangef(0.0,1.0);, glDepthFunc(GL_LESS); and glClearDepthf(1.f); because my projection matrices change the right-handed coordinates to left-handed coordinates. I mean, my near plane and far plane z-values are supposed to be [-1, 1] in NDC.
The problem is, when I draw two objects to one FBO sharing the same RBOs, for example like the code below,
glEnable(GL_DEPTH_TEST);
glClearDepthf(1.f);
glClearColor(0.0,0.0,0.0,0.0);
glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
drawObj1(); // this uses 1) the orthogonal projection below
drawObj2(); // this uses 2) the perspective projection below
glDisable(GL_DEPTH_TEST);
object1 is always drawn above object2.
1) orthogonal
2) perspective
However, when they use same projection whatever it is, it works fine.
Which part do you think I should go over?
--Updated--
Converting eye coordinates to NDC to screen coordinates, what really happens?
My understanding is because after both of projections, its NDC shape is same as images below, its z-value after multiplying 2) perspective matrix doesn't have to be distorted. However, according to the derbass's good answer, if z-value in the view coordinate is multiplied by the perspective matrix, the z-value would be hyperbolically distorted in NDC.
If so, if one vertex position, for example, is [-240.0, 0.0, -100.0] in the eye(view) coordinate with [w:480.0,h:320.0], and I clipped it with [-0.01,-100], would it be [-1,0,-1] or [something>=-1,0,-1] in NDC ? And its z value is still same as -1, isn't it? when its z-value is distorted?
1) Orthogonal
2) Perspective
You can't expect that the z values of your vertices are projected to the same window space z value just because you use the same near and far values for a perspective and an orthogonal projection matrix.
In the perspective case, the eye space z value will be hyperbolically distorted to the NDC z value. In the orthogonal case, it is just linearly scaled and shifted.
If your "Obj2" lies just in a flat plane z_eye=const, you can pre-calculate the distorted depth it should have in the perspective case. But if it has a non-zero extent into depth, this will not work. I can think of different approaches to deal with the situation:
"Fix" the depth of object two in the fragment shader by adjusting the gl_FragDepth according to the hyperbolic distortion your z buffer expects.
Use a linear z-buffer, aka. a w buffer.
These approaches are conceptually the inverse of each other. In both cases, you have to play with gl_FragDepth so that it matches the conventions of the other render pass.
UPDATE
My understanding is because after both of projections, its NDC shape is same as images below, its z-value after multiplying 2) perspective matrix doesn't have to be distorted.
Well, these images show the conversion from clip space to NDC. And that transformation is what the projection matrix followed by the perspective divide does. When it is in normalized device coordinates, no further distortion occurs. It is just linearly transformed to window space z according to the glDepthRange() setup.
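That last linear step is simply the following (a sketch; the defaults shown are the default glDepthRange values):
def ndc_z_to_window_depth(z_ndc, d_near=0.0, d_far=1.0):
    """NDC Z -> window-space depth; d_near/d_far are the glDepthRange values."""
    return (z_ndc * 0.5 + 0.5) * (d_far - d_near) + d_near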
However, according to the derbass's good answer, if z-value in the view coordinate is multiplied by the perspective matrix, the z-value would be hyperbolically distorted in NDC.
The perspective matrix is applied to the complete 4D homogeneous eye space vector, so it is applied to z_eye as well as to x_eye, y_eye and also w_eye (which is typically just 1, but doesn't have to be).
So the resulting NDC coordinates for the perspective case are hyperbolically distorted to
z_ndc = (f + n) / (f - n) + 2 * f * n / ((f - n) * z_eye) = A + B / z_eye
while, in the orthogonal case, they are just linearly transformed to
z_ndc = -2 / (f - n) * z_eye - (f + n) / (f - n) = C * z_eye + D
For n=1 and f=10, it will look like this (note that I plotted the range partly outside of the frustum; clipping will prevent these values from occurring in the GL, of course).
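Since the plot itself is not reproduced here, the two mappings for n=1 and f=10 can be evaluated with a short NumPy sketch:
import numpy as np

n, f = 1.0, 10.0
z_eye = np.linspace(-n, -f, 5)   # sample view-space depths inside the frustum

z_ndc_perspective = (f + n) / (f - n) + 2*f*n / ((f - n) * z_eye)   # hyperbolic
z_ndc_orthogonal  = -2.0 / (f - n) * z_eye - (f + n) / (f - n)      # linear

# Both give -1 at z_eye = -n and +1 at z_eye = -f, but differ everywhere in between.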
If so, if one vertex position, for example, is [-240.0, 0.0, -100.0] in the eye(view) coordinate with [w:480.0,h:320.0], and I clipped it with [-0.01,-100], would it be [-1,0,-1] or [something>=-1,0,-1] in NDC? And its z value is still same as -1, isn't it? when its z-value is distorted?
Points at the far plane are always transformed to z_ndc=1, and points at the near plane to z_ndc=-1. This is how the projection matrices were constructed, and this is exactly where the two graphs in the plot above intersect. So for these trivial cases, the different mappings do not matter at all. But for all other distances, they will.