GLSL compute world coordinate from eye depth and screen position - opengl

I'm trying to recover the WORLD position of a point knowing its depth in EYE space, computed as follows (in a vertex shader):
float depth = -(uModelView * vec4(inPos, 1.0)).z;
where inPos is a point in world space (obviously, I don't want to recover this particular point, but a point whose depth is expressed in that format).
And its normalized screen position (between 0 and 1), computed as follows (in a fragment shader):
vec2 screen_pos = ( vec2( gl_FragCoord.xy ) - vec2( 0.5 ) ) / uScreenSize.xy ;
I have access to the following info:
uScreenSize : as its name suggests, the screen width and height
uCameraPos : camera position in WORLD space
and standard matrices :
uModelView : model view camera matrix
uModelViewProj : model view projection matrix
uProjMatrix : projection matrix
How can I compute the position (X,Y,Z) of a point in WORLD space? (not in EYE space)
I don't have access to anything else (I can't use near, far, left, right, ...) because the projection matrix is not restricted to perspective or orthogonal.
Thanks in advance.

If I get your question right, you have x and y in window space (which can easily be converted to normalized device space [-1,1]), but z in eye space, and want to reconstruct the world space position.
"I can't have access to other (I can't use near, far, left, right, ...) because projection matrix is not restricted to perspective or orthogonal."
Well, actually, there is not much besides an orthogonal or projective mapping that can be achieved by matrix multiplication in homogeneous space. However, the projection matrix is sufficient, as long as it is invertible. (In theory, a projection matrix could transform all points to a plane, a line or a single point. In that case, some information is lost and it will never be possible to reconstruct the original data. But that would be a very atypical case.)
So what you can get from the projection matrix and your 2D position is actually a ray in eye space. And you can intersect this with the z=depth plane to get the point back.
So what you have to do is calculate the two points
vec4 p = inverse(uProjMatrix) * vec4 (ndc_x, ndc_y, -1, 1);
vec4 q = inverse(uProjMatrix) * vec4 (ndc_x, ndc_y, 1, 1);
which will mark two points on the ray in eye space. Do not forget to divide p and q by the respective w component to get the 3D coordinates. Now, you simply need to intersect this with your z=depth plane and get the eye space x and y. Finally, you can use the inverse of the uModelView matrix to project that point back to object space.
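The intersection step can be sketched on the CPU side in C++ as follows. This is only a minimal illustration: the names Vec3 and intersectDepthPlane are made up for this example, p and q are assumed to be already divided by their w component, and depth is assumed to be the positive value -z_eye from the question.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// p and q are two eye-space points on the view ray, i.e. the results of
// inverse(uProjMatrix) * vec4(ndc_x, ndc_y, -1 or 1, 1) after dividing by w.
// depth is the positive eye-space depth (depth = -z_eye).
Vec3 intersectDepthPlane(Vec3 p, Vec3 q, float depth) {
    // The ray must not be parallel to the plane z = -depth.
    assert(q.z != p.z);
    // Ray parameter at which p + t * (q - p) reaches z = -depth.
    float t = (-depth - p.z) / (q.z - p.z);
    return { p.x + t * (q.x - p.x),
             p.y + t * (q.y - p.y),
             -depth };
}
```

The returned point is in eye space; multiplying it by the inverse of uModelView is still needed to leave eye space.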
However, you said that you want world space. But that is impossible. You would need the view matrix to do that, but you have not listed that as a given. All you have is the composition of the model and view matrix, and you need to know at least one of these to reconstruct the world space position. The camera position is not enough. You also need the orientation.

Related

OpenGL vertex shader for pinhole camera model

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model (as defined for example here). Currently I use the vertex shader to map the 3D vertices to the clip space, where K in the shader contains [focal length x, focal length y, principal point x, principal point y] and zrange is the depth range of the vertices.
#version 330 core
layout (location = 0) in vec3 vin;
layout (location = 1) in vec3 cin;
layout (location = 2) in vec3 nin;
out vec3 shader_pos;
out vec3 shader_color;
out vec3 shader_normal;
uniform vec4 K;
uniform vec2 zrange;
uniform vec2 imsize;
void main() {
vec3 uvd;
uvd.x = (K[0] * vin.x + K[2] * vin.z) / vin.z;
uvd.y = (K[1] * vin.y + K[3] * vin.z) / vin.z;
uvd.x = 2 * uvd.x / (imsize[0]) - 1;
uvd.y = 2 * uvd.y / (imsize[1]) - 1;
uvd.z = 2 * (vin.z - zrange[0]) / (zrange[1] - zrange[0]) - 1;
shader_pos = uvd;
shader_color = cin;
shader_normal = nin;
gl_Position = vec4(uvd.xyz, 1.0);
}
I verify the renderings with a simple ray-tracer, however there seems to be an offset stemming from my OpenGL implementation. The depth values are different, but not by an affine offset as it would be caused by a wrong remapping (see the slanted surface on the tetrahedron, ignoring the errors on the edges).
"I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model."
A standard perspective projection matrix already implements a pinhole camera model. What you're doing here is just having more calculations per vertex, which could all be pre-calculated on the CPU and put in a single matrix.
The only difference is the z range. But a "pinhole camera" does not have a z range, all points are projected to the image plane. So what you want here is a pinhole camera model for x and y, and a linear mapping for z.
However, your implementation is wrong. A GPU will interpolate z linearly in window space. That means it will calculate the barycentric coordinates of each fragment with respect to the 2D projection of the triangle in the window. However, when using a perspective projection, and when the triangle is not exactly parallel to the image plane, those barycentric coordinates will not be those the respective 3D point would have had with respect to the actual 3D primitive before the projection.
The trick here is that, since in screen space we typically have x/z and y/z as the vertex coordinates, and we interpolate linearly between those, we also have to interpolate 1/z for the depth. However, in reality, we don't divide by z, but by w (and let the projection matrix set w_clip = [+/-]z_eye for us). After the division by w_clip, we get a hyperbolic mapping of the z value, but with the nice property that it can be linearly interpolated in window space.
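To make the difference concrete, here is a small C++ sketch comparing the two ways of interpolating depth between two projected vertices. The function names are invented for this illustration, and z0 and z1 are assumed to be positive eye-space distances.

```cpp
#include <cassert>

// Depth at parameter t (0..1, measured in screen space) between two projected
// vertices with eye-space depths z0 and z1: interpolate 1/z, then invert.
float perspectiveCorrectDepth(float z0, float z1, float t) {
    assert(z0 > 0.0f && z1 > 0.0f);
    float invZ = (1.0f - t) / z0 + t / z1;
    return 1.0f / invZ;
}

// Naive linear interpolation of z in screen space, for comparison.
float naiveLinearDepth(float z0, float z1, float t) {
    return (1.0f - t) * z0 + t * z1;
}
```

Halfway across the screen-space segment between depths 1 and 3, the first function gives 1.5 while the naive one gives 2, which is the kind of mismatch described above.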
What this means is that, with your linear z mapping, your primitives would now have to be bent along the z dimension to get the correct result. Look at the following top-down view of the situation. The "lines" represent flat triangles, viewed from straight above:
In eye space, the view rays would all go from the origin through each pixel (we could imagine the 2D pixel raster on the near plane, for example). In NDC, we have transformed this to an orthographic projection. The pixels can still be imagined at the near plane, but all view rays are now parallel.
In the standard hyperbolic mapping, the point in the middle of the frustum is compressed strongly towards the far end. However, the triangle is still flat.
If you use a linear mapping instead, your triangle would no longer be flat. Look for example at the intersection point between the two triangles. It must have the same x (and y) coordinate as in the hyperbolic case for the result to be correct.
However, you only transform the vertices according to a linear z value, and the GPU will still linearly interpolate the result. So in your case, you get straight connections between your transformed points, the intersection point between the two triangles is moved, and your depth values are all wrong except at the vertices themselves.
If you want to use a linear depth buffer, you have to correct the depth of each fragment in the fragment shader, to implement the required non-linear interpolation on your own. Doing so would break a lot of the clever depth test optimizations GPUs do, notably early Z and hierarchical Z, so while it is possible, you'll lose some performance.
The much better solution is: Just use a standard hyperbolic depth value. Just linearize the depth values after you read them back. Also, don't do the z Division in the vertex shader. You do not only break z this way, you also break the perspective-corrected interpolation of the varyings, so your shading will also be wrong. Let the GPU do the division, just shuffle the correct value into gl_Position.w. The GPU will internally not only do the divide, the perspective corrected interpolation also depends on w.
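Linearizing after read-back could look roughly like the C++ sketch below. It assumes the z row of the projection matrix is the standard OpenGL perspective one (as built by glFrustum or gluPerspective) with planes zNear and zFar, and the default glDepthRange of [0, 1]; the function names are hypothetical.

```cpp
#include <cassert>

// Window-space depth d in [0, 1] (default glDepthRange) -> NDC z in [-1, 1].
float depthToNdc(float d) {
    return 2.0f * d - 1.0f;
}

// NDC z in [-1, 1] -> positive eye-space distance along the view axis,
// assuming a standard OpenGL perspective projection with planes zNear, zFar.
float linearizeNdcDepth(float zNdc, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    return 2.0f * zNear * zFar / ((zFar + zNear) - zNdc * (zFar - zNear));
}
```

With this mapping, NDC z = -1 comes back as zNear and NDC z = 1 as zFar. A projection with a different z row would need its own inverse.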

Negative values for gl_Position.w?

Is the w component of gl_Position required to be greater than zero? Because when I set it to a negative number nothing is drawn but positive numbers are fine.
gl_Position = vec4(vPos,0,-1);
Face culling is not enabled btw.
"Is the w component of gl_Position required to be greater than zero?"
No, but the result of gl_Position.xyz / gl_Position.w has to be in the range (-1,-1,-1) to (1,1,1), the normalized device space. This means each component (x, y and z) of the result, has to be >= -1.0 and <= 1.0.
But if the w component is negative, nothing is drawn anyway, because gl_Position defines the clip space position. The condition for a homogeneous coordinate to be inside the clip volume is
-w <= x, y, z <= w.
If w = -1 this would mean:
1 <= x, y, z <= -1.
and that can never be fulfilled.
(see Why does GL divide gl_Position by W for you rather than letting you do it yourself?)
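The clip condition can be written as a small C++ predicate to see why w = -1 rejects everything. Vec4, insideClipVolume and checkExamples are names chosen for this sketch only.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// True when -w <= x, y, z <= w holds for the clip-space coordinate c.
bool insideClipVolume(Vec4 c) {
    return -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           -c.w <= c.z && c.z <= c.w;
}

// A few sample coordinates, including the w = -1 case from the question.
void checkExamples() {
    assert(insideClipVolume({0.5f, 0.5f, 0.0f, 1.0f}));    // w = 1: inside
    assert(!insideClipVolume({0.5f, 0.5f, 0.0f, -1.0f}));  // w = -1: rejected
    assert(!insideClipVolume({2.0f, 0.0f, 0.0f, 1.0f}));   // x > w: outside
}
```

For any negative w the lower bound -w is larger than the upper bound w, so no x, y or z can satisfy both inequalities.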
Explanation:
The coordinates which are set to gl_Position are Homogeneous coordinates. If the w component of a Homogeneous coordinate is 1, it is equal to the Cartesian coordinate built of the components xyz.
Homogeneous coordinates are used for the representation of the perspective projection.
In a rendering, each mesh of the scene usually is transformed by the model matrix, the view matrix and the projection matrix.
The projection matrix describes the mapping from 3D points of a scene, to 2D points of the viewport. The projection matrix transforms from view space to the clip space, and the coordinates in the clip space are transformed to the normalized device coordinates (NDC) in the range (-1, -1, -1) to (1, 1, 1) by dividing with the w component of the clip coordinates. Every geometry which is out of the NDC is clipped.
With orthographic projection, the coordinates in eye space are linearly mapped to normalized device coordinates. (Commonly the w component is 1.0.)
With perspective projection, the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).

Matrix Hell - Transforming a Point in a 3D Texture to World Space

Recently I have decided to add volumetric fog to my 3D game in DirectX. The technique I am using is from the book GPU Pro 6, but it is not necessary for you to own a copy of the book in order to help me :). Basically, the volumetric fog information is stored in a 3D texture. Now, I need to transform each texel of that 3D texture to world space. The texture is view-aligned, and by that I mean the X and the Y of that texture map to the X and Y of the screen, and the Z of that texture extends forwards in front of the camera. So essentially I need a function:
float3 CalculateWorldPosition(uint3 Texel)
{
//Do math
}
I know the view matrix, and the dimensions of the 3D texture (190x90x64 or 190x90x128), the projection matrix for the screen, etc.
However that is not all, unfortunately.
The depth buffer in DirectX is not linear, as you may know. This same effect needs to be applied to my 3D texture - texels need to be skewed so there are more near the camera than far, since detail near the camera must be better than further away. However, I think I have got a function to do this, correct me if I'm wrong:
//Where depth = 0, the texel is closest to the camera.
//Where depth = 1, the texel is the furthest from the camera.
//This function returns a new Z value between 0 and 1, skewing it
// so more Z values are near the camera.
float GetExponentialDepth(float depth /*0 to 1*/)
{
depth = 1.0f - depth;
//Near and far planes
float near = 1.0f;
//g_WorldDepth is the depth of the 3D texture in world/view space
float far = g_WorldDepth;
float linearZ = -(near + depth * (far - near));
float a = (2.0f * near * far) / (near - far);
float b = (far + near) / (near - far);
float result = (a / -linearZ) - b;
return -result * 0.5f + 0.5f;
}
Here is my current function that tries to find the world position from the texel (note that it is wrong):
float3 CalculateWorldPos(uint3 texel)
{
//Divide the texel by the dimensions, to get a value between 0 and 1 for
// each of the components
float3 pos = (float3)texel * float3(1.0f / 190.0f, 1.0f / 90.0f, 1.0f / (float)(g_Depth-1));
pos.xy = 2.0f * pos.xy - float2(1.0f, 1.0f);
//Skew the depth
pos.z = GetExponentialDepth(pos.z);
//Multiply this point, which should be in NDC coordinates,
// by the inverse of (View * Proj)
return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
}
However, projection matrices are also a little confusing to me, so here is the line that gets the projection matrix for the 3D texture, so one can correct me if it's incorrect:
//Note that the X and Y of the texture is 190 and 90 respectively.
//m_WorldDepth is the depth of the cuboid in world space.
XMMatrixPerspectiveFovLH(pCamera->GetFovY(), 190.0f / 90.0f, 1.0f, m_WorldDepth)
Also, I have read that projection matrices are not invertible (their inverse does not exist). If that is true, then maybe finding the inverse of (View * Proj) is incorrect, I'm not sure.
So, just to reiterate the question, given a 3D texture coordinate to a view-aligned cuboid, how can I find the world position of that point?
Thanks so much in advance, this problem has eaten up a lot of my time!
Let me first explain what the perspective projection matrix does.
The perspective projection matrix transforms a vector from view space to clip space, such that the x/y coordinates correspond to the horizontal/vertical position on the screen and the z coordinate corresponds to the depth. A vertex that is positioned znear units away from the camera is mapped to depth 0. A vertex that is positioned zfar units away from the camera is mapped to depth 1. The depth values right behind znear increase very quickly, whereas the depth values right in front of zfar only change slowly.
Specifically, given a z-coordinate, the resulting depth is:
depth = zfar / (zfar - znear) * (z - znear) / z
If you draw the frustum with lines after even spaces in depth (e.g. after every 0.1), you get cells. And the cells in the front are thinner than those in the back. If you draw enough cells, these cells map to your texels. In this configuration, it is exactly as you wish. There are more cells in the front (resulting in a higher resolution) than in the back. So you can just use the texel coordinate as the depth value (normalized to the [0,1] range). Here is the standard back projection for a given depth value into view space (assuming znear=1, zfar=10)
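In code, the depth formula above and its inverse could be sketched in C++ like this. The function names are made up for the example, and z is assumed to be the positive view-space distance in front of the camera, as in a left-handed DirectX setup.

```cpp
#include <cassert>

// View-space z (distance in front of the camera) -> depth in [0, 1],
// following depth = zfar / (zfar - znear) * (z - znear) / z.
float viewZToDepth(float z, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear && z > 0.0f);
    return zFar / (zFar - zNear) * (z - zNear) / z;
}

// Inverse mapping: depth in [0, 1] -> view-space z.
float depthToViewZ(float depth, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    return zNear * zFar / (zFar - depth * (zFar - zNear));
}
```

With znear = 1 and zfar = 10, a depth of 0.5 maps back to a view-space z of about 1.82, which shows how strongly evenly spaced depth values cluster near the camera.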
Your code doesn't work because of this line:
return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
There is a reason why we use 4D vectors and matrices. If you just throw the fourth dimension away, you get the wrong result. Instead, do the w-divide:
float4 transformed = mul(float4(pos, 1.0f), g_InverseViewProj);
return (transformed / transformed.w).xyz;
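The same fix, written out as a CPU-side C++ sketch for checking values outside the shader. Float3, Float4, mulRowVector and transformPoint are hypothetical names; the matrix is indexed m[row][col] and multiplied as row vector times matrix, mirroring mul(v, M) in HLSL.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Row vector times 4x4 matrix, with m indexed as m[row][col].
Float4 mulRowVector(Float4 v, const float m[4][4]) {
    return {
        v.x * m[0][0] + v.y * m[1][0] + v.z * m[2][0] + v.w * m[3][0],
        v.x * m[0][1] + v.y * m[1][1] + v.z * m[2][1] + v.w * m[3][1],
        v.x * m[0][2] + v.y * m[1][2] + v.z * m[2][2] + v.w * m[3][2],
        v.x * m[0][3] + v.y * m[1][3] + v.z * m[2][3] + v.w * m[3][3]
    };
}

// Transform a point and divide by w instead of discarding it.
Float3 transformPoint(Float3 p, const float m[4][4]) {
    Float4 t = mulRowVector({p.x, p.y, p.z, 1.0f}, m);
    assert(t.w != 0.0f);
    return { t.x / t.w, t.y / t.w, t.z / t.w };
}
```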
Btw, the 4D perspective projection matrix is perfectly invertible. Only if you remove one dimension do you get a non-square matrix, which is not invertible. But that's not what we usually do in computer graphics. However, these matrices are also called projections (but in a different context).

openGL ray pick

The general ray picking process should be as follows (experiments confirmed that it gives the right result):
transform screen point to normalized device space direction vector:
float x = (2.0f * mouse_x) / width - 1.0f;
float y = 1.0f - (2.0f * mouse_y) / height;
float z = 1.0f;
vec3 ray_nds = vec3 (x, y, z);
transform direction vector to Homogeneous Clip Coordinates
vec4 ray_clip = vec4 (ray_nds.xy, -1.0, 1.0);
transform direction vector to eye space direction vector
vec4 ray_eye = inverse (projection_matrix) * ray_clip;
transform direction vector to world space, get a pick ray with world space camera position and the direction vector
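The first step of the list above can be sketched as a standalone C++ helper. The names Vec2 and mouseToNdc are invented here, and mouse coordinates are assumed to have their origin at the top-left corner of the window, which is why y is flipped.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Map a mouse position in pixels (origin top-left) to normalized device
// coordinates in [-1, 1], with y pointing up as in OpenGL.
Vec2 mouseToNdc(float mouseX, float mouseY, float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    return { 2.0f * mouseX / width - 1.0f,
             1.0f - 2.0f * mouseY / height };
}
```

For an 800x600 window, the top-left pixel corner maps to (-1, 1) and the window center to (0, 0).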
My problem is, in normalized device space, why the z component of the direction vector is 1.0?
I mean, in OpenGL normalized device space, xyz component should all be in the range of -1~1, so the camera should be in the center of the plane z=-1. So the direction vector should be: view target position - camera position, and the z component should be 1-(-1)=2.0f. (in DirectX normalized device space, xy component is in the range of -1~1, z component is in the range of 0~1, the camera position should be in the center of the plane z=0, say, (0,0,0), and the z component of the direction vector should be 1-0=1)
ray_nds.z is completely irrelevant, because you don't use it anyway. That's because you don't know the pixel's depth.
ray_clip is not a direction, but a position on the near clipping plane (z=-1) after projection. If you undo this projection (with the inverse projection matrix) you end up with the same point in camera space. In camera space, the camera is centered at (0, 0, 0). The direction vector of the ray can be calculated with ray_eye - (0, 0, 0), which is essentially ray_eye. So if we ignore the w-component, we can use the position as a direction. This does only work in camera space! Both clip space and world space are most likely to have the projection center somewhere else.
Don't mix up the camera position in the different spaces. In camera space, it is at the origin. In clip space it can be assumed to be at (0, 0, -infinity). The point (x, y, ...) is just an arbitrary point that is covered by the according pixel. And you need any of them to define the ray.
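For the special case of a symmetric perspective projection of the kind gluPerspective builds, the eye-space ray direction can also be written in closed form without inverting the matrix. The C++ sketch below relies on that assumption (it does not hold for arbitrary projection matrices), and eyeRayDirection is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Normalized eye-space direction of the ray through (ndcX, ndcY), assuming a
// symmetric perspective projection with vertical field of view fovyRadians
// and the given aspect ratio (width / height). The camera looks down -z.
Vec3 eyeRayDirection(float ndcX, float ndcY, float fovyRadians, float aspect) {
    assert(fovyRadians > 0.0f && aspect > 0.0f);
    float tanHalf = std::tan(0.5f * fovyRadians);
    // Point on the plane z = -1 in eye space that projects to (ndcX, ndcY).
    Vec3 d = { ndcX * aspect * tanHalf, ndcY * tanHalf, -1.0f };
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return { d.x / len, d.y / len, d.z / len };
}
```

Because the camera sits at the origin in eye space, this point can be used directly as a direction, exactly as described for ray_eye.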
The camera is NOT located at z=-1 (or 0); it is even further behind that.
The near clip plane is located at z=-1. This is where much of the complexity of this kind of math comes from: if you plotted the equations involved, the curves would not pass through 0, so we always carry around lots of zn and zf terms.
Check out equation 4.2 here: http://www.arcsynthesis.org/gltut/Positioning/Tut04%20Perspective%20Projection.html.
even scarier but more complete: http://www.songho.ca/opengl/gl_projectionmatrix.html
more links:
http://unspecified.wordpress.com/2012/06/21/calculating-the-gluperspective-matrix-and-other-opengl-matrix-maths/
http://schabby.de/projection-matrix/

C++/OpenGL convert world coords to screen(2D) coords

I am making a game in OpenGL where I have a few objects within the world space. I want to make a function where I can take in an object's location (3D) and transform it to the screen's location (2D) and return it.
I know the 3D location of the object, the projection matrix and the view matrix in the following variables:
Matrix projectionMatrix;
Matrix viewMatrix;
Vector3 point3D;
To do this transform, you must first take your model-space positions and transform them to clip-space. This is done with matrix multiplies. I will use GLSL-style code to make it obvious what I'm doing:
vec4 clipSpacePos = projectionMatrix * (viewMatrix * vec4(point3D, 1.0));
Notice how I convert your 3D vector into a 4D vector before the multiplication. This is necessary because the matrices are 4x4, and you cannot multiply a 4x4 matrix with a 3D vector. You need a fourth component.
The next step is to transform this position from clip-space to normalized device coordinate space (NDC space). NDC space is on the range [-1, 1] in all three axes. This is done by dividing the first three coordinates by the fourth:
vec3 ndcSpacePos = clipSpacePos.xyz / clipSpacePos.w;
Obviously, if clipSpacePos.w is zero, you have a problem, so you should check that beforehand. If it is zero, then that means that the object is in the plane of projection; its view-space depth is zero. And such vertices are automatically clipped by OpenGL.
The next step is to transform from this [-1, 1] space to window-relative coordinates. This requires the use of the values you passed to glViewport. The first two parameters are the offset from the bottom-left of the window (vec2 viewOffset), and the second two parameters are the width/height of the viewport area (vec2 viewSize). Given these, the window-space position is:
vec2 windowSpacePos = ((ndcSpacePos.xy + 1.0) / 2.0) * viewSize + viewOffset;
And that's as far as you go. Remember: OpenGL's window-space is relative to the bottom-left of the window, not the top-left.
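Putting the three steps together, a self-contained C++ version might look like the sketch below. It does not use the asker's Matrix and Vector3 types, whose interfaces are unknown; instead it assumes a column-major std::array<float, 16> layout as OpenGL uses, and all names (Mat4, mul, worldToWindow) are illustrative.

```cpp
#include <array>
#include <cassert>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;  // column-major, as in OpenGL

// Matrix times column vector.
Vec4 mul(const Mat4& m, Vec4 v) {
    return {
        m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w,
        m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w,
        m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
        m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w
    };
}

// Writes the window-space position to out. Returns false when clip-space w
// is zero, in which case no screen position exists.
bool worldToWindow(const Mat4& proj, const Mat4& view, Vec3 p,
                   Vec2 viewOffset, Vec2 viewSize, Vec2& out) {
    assert(viewSize.x > 0.0f && viewSize.y > 0.0f);
    Vec4 clip = mul(proj, mul(view, {p.x, p.y, p.z, 1.0f}));
    if (clip.w == 0.0f) return false;
    float ndcX = clip.x / clip.w;
    float ndcY = clip.y / clip.w;
    out = { (ndcX + 1.0f) / 2.0f * viewSize.x + viewOffset.x,
            (ndcY + 1.0f) / 2.0f * viewSize.y + viewOffset.y };
    return true;
}
```

A caller would typically also check that the resulting NDC values lie within [-1, 1] before treating the point as visible. To get a top-left origin, the y coordinate can be flipped against the viewport height.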