C++/OpenGL convert world coords to screen (2D) coords

I am making a game in OpenGL where I have a few objects within the world space. I want to make a function where I can take in an object's location (3D) and transform it to the screen's location (2D) and return it.
I know the 3D location of the object, the projection matrix, and the view matrix in the following variables:
Matrix projectionMatrix;
Matrix viewMatrix;
Vector3 point3D;

To do this transform, you must first take your model-space positions and transform them to clip-space. This is done with matrix multiplies. I will use GLSL-style code to make it obvious what I'm doing:
vec4 clipSpacePos = projectionMatrix * (viewMatrix * vec4(point3D, 1.0));
Notice how I convert your 3D vector into a 4D vector before the multiplication. This is necessary because the matrices are 4x4, and you cannot multiply a 4x4 matrix with a 3D vector. You need a fourth component.
The next step is to transform this position from clip-space to normalized device coordinate space (NDC space). NDC space is on the range [-1, 1] in all three axes. This is done by dividing the first three coordinates by the fourth:
vec3 ndcSpacePos = clipSpacePos.xyz / clipSpacePos.w;
Obviously, if clipSpacePos.w is zero, you have a problem, so you should check that beforehand. If it is zero, that means the object is in the plane of projection; its view-space depth is zero. Such vertices are automatically clipped by OpenGL.
The next step is to transform from this [-1, 1] space to window-relative coordinates. This requires the values you passed to glViewport. The first two parameters are the offset from the bottom-left of the window (vec2 viewOffset), and the last two are the width/height of the viewport area (vec2 viewSize). Given these, the window-space position is:
vec2 windowSpacePos = ((ndcSpacePos.xy + 1.0) / 2.0) * viewSize + viewOffset;
And that's as far as you go. Remember: OpenGL's window-space is relative to the bottom-left of the window, not the top-left.
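For completeness, here is the same chain of steps gathered into one CPU-side function. This is only a sketch in C++ using GLM (the question's Matrix and Vector3 types are unspecified, so GLM stands in for them); viewOffset and viewSize are the values passed to glViewport:
#include <glm/glm.hpp>
#include <optional>
// World/model space -> window space, mirroring the GLSL-style steps above.
// Returns nothing when clipSpacePos.w is zero (point in the plane of projection).
std::optional<glm::vec2> worldToScreen(const glm::vec3& point3D,
                                       const glm::mat4& viewMatrix,
                                       const glm::mat4& projectionMatrix,
                                       const glm::vec2& viewOffset,
                                       const glm::vec2& viewSize)
{
    // Model space -> clip space.
    glm::vec4 clipSpacePos = projectionMatrix * (viewMatrix * glm::vec4(point3D, 1.0f));
    if (clipSpacePos.w == 0.0f)
        return std::nullopt;
    // Clip space -> normalized device coordinates ([-1, 1] on every axis).
    glm::vec3 ndcSpacePos = glm::vec3(clipSpacePos) / clipSpacePos.w;
    // NDC -> window space, relative to the bottom-left of the window.
    return (glm::vec2(ndcSpacePos) + 1.0f) / 2.0f * viewSize + viewOffset;
}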

Related

Understanding the OpenGL projection matrix

I've been writing a program to display 3D models using OpenGL, and until now I've used an orthographic projection, but I want to switch to a perspective projection so that as the camera moves toward the model it appears to get larger. I understand that I have to multiply three matrices (model, view, and projection) together to correctly apply all of my transformations. As you can see in the following code, I have attempted to do that, and was able to correctly create the model and view matrices. I know these work properly because when I multiply the model and view matrices together I can rotate and translate the object, as well as change the position and angle of the camera. My problem is that when I multiply that product by the projection matrix I can no longer see the object on the screen.
The default value for the camera struct here is {0,0,-.5} but I manipulate that value with the keyboard to move the camera around.
I am using GLFW+glad, and linmath.h for the matrix math.
//The model matrix controls where the object is positioned. The
//identity matrix means no transformations.
mat4x4_identity(m);
//Apply model transformations here.
//The view matrix controls camera position and angle.
vec3 eye={camera.x,camera.y,camera.z};
vec3 center={camera.x,camera.y,camera.z+1};
vec3 up={0,1,0};
mat4x4_look_at(v,eye,center,up);
//The projection matrix flattens the world to 2d to be rendered on a
//screen.
mat4x4_perspective(p, 1.57, width/(float)height, 1,10); //FOV of 90°
//mat4x4_ortho(p, -ratio, ratio, -1.f, 1.f, 1.f, -1.f);
//Apply the transformations. mvp=p*v*m.
mat4x4_mul(mvp, p, v);
mat4x4_mul(mvp, mvp, m);
When the perspective projection matrix is set up, the distances to the near plane and the far plane are defined. In your case this is 1 for the near plane and 10 for the far plane:
mat4x4_perspective(p, 1.57, width/(float)height, 1,10);
The model is being clipped by the near plane; to be rendered, the geometry has to end up inside the clip-space volume.
The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
All the geometry which is not in the volume of the frustum is clipped.
This means the distance of the model to the camera has to be greater than the distance to the near plane (1) and less than the distance to the far plane (10).
Since you can "see" the model when you don't use any projection matrix, the actual distance to the model is in range [-1, 1] (normalize device space). Note if you don't use a projection matrix, then the projection matrix is the identity matrix. This behaves like an orthographic projection, with a near plane distance of -1 and a far plane distance of 1.
Change the position of the camera to solve the issue:
e.g.
vec3 eye = {camera.x, camera.y, camera.z - 5}; // <--- 5 is in range [1, 10]
vec3 center = {camera.x, camera.y, camera.z};
vec3 up = {0, 1, 0};
mat4x4_look_at(v, eye, center, up);
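As a quick sanity check for this reasoning, here is a sketch in C++ with GLM rather than linmath.h (the function name is mine): a point only survives the depth clipping if its distance along the viewing direction, after the view transform, lies between the near and far values passed to mat4x4_perspective (1 and 10 here):
#include <glm/glm.hpp>
// True if the point's eye-space depth lies between the near and far planes.
// OpenGL eye space looks down -z, so the forward distance is -eye.z.
bool insideDepthRange(const glm::vec3& worldPoint, const glm::mat4& view,
                      float zNear = 1.0f, float zFar = 10.0f)
{
    glm::vec4 eye = view * glm::vec4(worldPoint, 1.0f);
    float distance = -eye.z;
    return distance > zNear && distance < zFar;
}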

OpenGL vertex shader for pinhole camera model

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model (as defined for example here). Currently I use the vertex shader to map the 3D vertices to the clip space, where K in the shader contains [focal length x, focal length y, principal point x, principal point y] and zrange is the depth range of the vertices.
#version 330 core
layout (location = 0) in vec3 vin;
layout (location = 1) in vec3 cin;
layout (location = 2) in vec3 nin;
out vec3 shader_pos;
out vec3 shader_color;
out vec3 shader_normal;
uniform vec4 K;
uniform vec2 zrange;
uniform vec2 imsize;
void main() {
    vec3 uvd;
    uvd.x = (K[0] * vin.x + K[2] * vin.z) / vin.z;
    uvd.y = (K[1] * vin.y + K[3] * vin.z) / vin.z;
    uvd.x = 2 * uvd.x / (imsize[0]) - 1;
    uvd.y = 2 * uvd.y / (imsize[1]) - 1;
    uvd.z = 2 * (vin.z - zrange[0]) / (zrange[1] - zrange[0]) - 1;
    shader_pos = uvd;
    shader_color = cin;
    shader_normal = nin;
    gl_Position = vec4(uvd.xyz, 1.0);
}
I verify the renderings with a simple ray tracer; however, there seems to be an offset stemming from my OpenGL implementation. The depth values are different, but not by an affine offset as would be caused by a wrong remapping (see the slanted surface on the tetrahedron, ignoring the errors on the edges).
I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model.
A standard perspective projection matrix already implements a pinhole camera model. What you're doing here is just having more calculations per vertex, which could all be pre-calculated on the CPU and put in a single matrix.
The only difference is the z range. But a "pinhole camera" does not have a z range, all points are projected to the image plane. So what you want here is a pinhole camera model for x and y, and a linear mapping for z.
However, your implementation is wrong. A GPU will interpolate z linearly in window space. That means it will calculate the barycentric coordinates of each fragment with respect to the 2D projection of the triangle in the window. However, when using a perspective projection, and when the triangle is not exactly parallel to the image plane, those barycentric coordinates are not the ones the respective 3D point would have had with respect to the actual 3D primitive before the projection.
The trick here is that in screen space we typically have x/z and y/z as the vertex coordinates, and when we interpolate linearly in between those, we also have to interpolate 1/z for the depth. However, in reality, we don't divide by z but by w (and let the projection matrix set w_clip = [+/-]z_eye for us). After the division by w_clip, we get a hyperbolic mapping of the z value, but with the nice property that it can be linearly interpolated in window space.
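To make the difference concrete, here is a tiny numeric illustration (the values are arbitrary, not taken from the question): interpolating z directly in window space versus interpolating 1/z and inverting, for the midpoint between two projected vertices:
#include <cstdio>
int main()
{
    const float z0 = 1.0f;   // eye-space depth of the first vertex
    const float z1 = 10.0f;  // eye-space depth of the second vertex
    const float t  = 0.5f;   // halfway between the projected vertices on screen
    // Naive linear interpolation of z in window space:
    const float zLinear  = z0 + t * (z1 - z0);                               // 5.5
    // Perspective-correct: interpolate 1/z linearly, then invert:
    const float zCorrect = 1.0f / (1.0f / z0 + t * (1.0f / z1 - 1.0f / z0)); // ~1.82
    std::printf("linear: %.2f  perspective-correct: %.2f\n", zLinear, zCorrect);
    return 0;
}
The screen-space midpoint thus corresponds to a point much closer to the near vertex in eye space, which is exactly what the hyperbolic depth mapping accounts for.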
What this means is that with your linear z mapping, your primitives would now have to be bent along the z dimension to get the correct result. Picture a top-down view of the situation, where the "lines" represent flat triangles seen from straight above.
In eye space, the view rays would all go from the origin through each pixel (we could imagine the 2D pixel raster on the near plane, for example). In NDC, we have transformed this to an orthographic projection: the pixels can still be imagined at the near plane, but all view rays are now parallel.
In the standard hyperbolic mapping, a point in the middle of the frustum is pushed far towards the back, but the triangle is still flat.
If you use a linear mapping instead, your triangles could not stay flat anymore. Look, for example, at the intersection point between the two triangles: for the correct result, it must have the same x (and y) coordinate as in the hyperbolic case.
However, since you only transform the vertices according to a linear z value and the GPU still interpolates linearly in between, you get straight connections between the transformed points: the intersection point between the two triangles is moved, and the depth values are all wrong except at the actual vertices themselves.
If you want to use a linear depth buffer, you have to correct the depth of each fragment in the fragment shader, implementing the required non-linear interpolation on your own. Doing so would break a lot of the clever depth-test optimizations GPUs do, notably early Z and hierarchical Z, so while it is possible, you'll lose some performance.
The much better solution is to just use a standard hyperbolic depth value and linearize the depth values after you read them back. Also, don't do the z division in the vertex shader. Not only do you break z this way, you also break the perspective-correct interpolation of the varyings, so your shading will be wrong as well. Let the GPU do the division; just shuffle the correct value into gl_Position.w. Internally, the GPU does not only perform the divide, the perspective-correct interpolation also depends on w.
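To illustrate the suggested approach, here is a sketch (not from the original post) that folds the intrinsics K = (fx, fy, cx, cy) into a single OpenGL-style projection matrix on the CPU, so the GPU performs the divide via gl_Position.w and the depth stays hyperbolic. It assumes the same convention as the shader above (camera looking down +z in eye space, no y flip), with (w, h) being the image size from imsize and (zn, zf) standing in for zrange:
#include <glm/glm.hpp>
// Build a projection matrix from pinhole intrinsics given in pixels.
glm::mat4 projectionFromIntrinsics(float fx, float fy, float cx, float cy,
                                   float w, float h, float zn, float zf)
{
    glm::mat4 P(0.0f);                   // GLM is column-major: P[col][row]
    P[0][0] = 2.0f * fx / w;             // x_clip = (2*fx/w)*x + (2*cx/w - 1)*z
    P[2][0] = 2.0f * cx / w - 1.0f;
    P[1][1] = 2.0f * fy / h;             // y_clip = (2*fy/h)*y + (2*cy/h - 1)*z
    P[2][1] = 2.0f * cy / h - 1.0f;
    P[2][2] = (zf + zn) / (zf - zn);     // hyperbolic depth: z in [zn, zf] -> [-1, 1]
    P[3][2] = -2.0f * zf * zn / (zf - zn);
    P[2][3] = 1.0f;                      // w_clip = +z_eye, so the GPU divides by z
    return P;
}
The vertex shader then reduces to gl_Position = P * vec4(vin, 1.0), and both the depth mapping and the perspective-correct interpolation of the varyings are handled by the hardware.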

Matrix Hell - Transforming a Point in a 3D Texture to World Space

Recently I have decided to add volumetric fog to my 3D game in DirectX. The technique I am using is from the book GPU Pro 6, but it is not necessary for you to own a copy of the book in order to help me :). Basically, the volumetric fog information is stored in a 3D texture. Now, I need to transform each texel of that 3D texture to world space. The texture is view-aligned, and by that I mean the X and the Y of that texture map to the X and Y of the screen, and the Z of that texture extends forwards in front of the camera. So essentially I need a function:
float3 CalculateWorldPosition(uint3 Texel)
{
    //Do math
}
I know the view matrix, and the dimensions of the 3D texture (190x90x64 or 190x90x128), the projection matrix for the screen, etc.
However that is not all, unfortunately.
The depth buffer in DirectX is not linear, as you may know. This same effect needs to be applied to my 3D texture - texels need to be skewed so there are more near the camera than far, since detail near the camera must be better than further away. However, I think I have got a function to do this, correct me if I'm wrong:
//Where depth = 0, the texel is closest to the camera.
//Where depth = 1, the texel is the furthest from the camera.
//This function returns a new Z value between 0 and 1, skewing it
// so more Z values are near the camera.
float GetExponentialDepth(float depth /*0 to 1*/)
{
    depth = 1.0f - depth;
    //Near and far planes
    float near = 1.0f;
    //g_WorldDepth is the depth of the 3D texture in world/view space
    float far = g_WorldDepth;
    float linearZ = -(near + depth * (far - near));
    float a = (2.0f * near * far) / (near - far);
    float b = (far + near) / (near - far);
    float result = (a / -linearZ) - b;
    return -result * 0.5f + 0.5f;
}
Here is my current function that tries to find the world position from the texel (note that it is wrong):
float3 CalculateWorldPos(uint3 texel)
{
    //Divide the texel by the dimensions, to get a value between 0 and 1 for
    // each of the components
    float3 pos = (float3)texel * float3(1.0f / 190.0f, 1.0f / 90.0f, 1.0f / (float)(g_Depth-1));
    pos.xy = 2.0f * pos.xy - float2(1.0f, 1.0f);
    //Skew the depth
    pos.z = GetExponentialDepth(pos.z);
    //Multiply this point, which should be in NDC coordinates,
    // by the inverse of (View * Proj)
    return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
}
However, projection matrices are also a little confusing to me, so here is the line that gets the projection matrix for the 3D texture, so one can correct me if it's incorrect:
//Note that the X and Y of the texture is 190 and 90 respectively.
//m_WorldDepth is the depth of the cuboid in world space.
XMMatrixPerspectiveFovLH(pCamera->GetFovY(), 190.0f / 90.0f, 1.0f, m_WorldDepth)
Also, I have read that projection matrices are not invertible (their inverse does not exist). If that is true, then maybe finding the inverse of (View * Proj) is incorrect, I'm not sure.
So, just to reiterate the question, given a 3D texture coordinate to a view-aligned cuboid, how can I find the world position of that point?
Thanks so much in advance, this problem has eaten up a lot of my time!
Let me first explain what the perspective projection matrix does.
The perspective projection matrix transforms a vector from view space to clip space, such that the x/y coordinates correspond to the horizontal/vertical position on the screen and the z coordinate corresponds to the depth. A vertex that is positioned znear units away from the camera is mapped to depth 0. A vertex that is positioned zfar units away from the camera is mapped to depth 1. The depth values right behind znear increase very quickly, whereas the depth values right in front of zfar only change slowly.
Specifically, given a z-coordinate, the resulting depth is:
depth = zfar / (zfar - znear) * (z - znear) / z
If you slice the frustum with planes at evenly spaced depth values (e.g. every 0.1), you get cells, and the cells in the front are thinner than those in the back. If you draw enough cells, these cells map to your texels. In this configuration, it is exactly as you wish: there are more cells in the front (resulting in a higher resolution) than in the back. So you can just use the texel coordinate as the depth value (normalized to the [0, 1] range). The standard back projection of such a depth value into view space is z = znear * zfar / (zfar - depth * (zfar - znear)), which for znear = 1 and zfar = 10 becomes z = 10 / (10 - 9 * depth).
Your code doesn't work because of this line:
return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
There is a reason why we use 4D vectors and matrices. If you just throw the fourth dimension away, you get the wrong result. Instead, do the perspective divide by w:
float4 transformed = mul(float4(pos, 1.0f), g_InverseViewProj);
return (transformed / transformed.w).xyz;
Btw, the 4D perspective projection matrix is perfectly invertible. Only if you remove one dimension do you get a non-square matrix, which is not invertible. But that's not what we usually do in computer graphics. However, those matrices are also called projections (in a different context).
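For reference, here is the corrected back projection as a CPU-side sketch in C++/GLM (the shader code above is HLSL and uses row-vector multiplication, so treat this purely as an illustration of the divide by w):
#include <glm/glm.hpp>
// A point in normalized device coordinates -> world space.
glm::vec3 ndcToWorld(const glm::vec3& ndcPos, const glm::mat4& inverseViewProj)
{
    glm::vec4 world = inverseViewProj * glm::vec4(ndcPos, 1.0f);
    return glm::vec3(world) / world.w;   // the divide the original code was missing
}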

OpenGL shadow mapping with deferred rendering, position transformation

I am using deferred rendering where I store the eye space position in a texture accordingly:
vertex:
gl_Position = vec4(vertex_position, 1.0);
geometry:
vertexOut.position = vec3(viewMatrix * modelMatrix * gl_in[i].gl_Position);
fragment:
positionOut = vec3(vertexIn.position);
Now, in the second pass (lighting pass) I am trying to sample my shadow map, using UV coordinates calculated from this vec4
vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * lightModelMatrix * vec4(position, 1.0);
The position used is the same position stored and sampled from the position texture.
Do I need to transform the position with the inverse camera view matrix before doing this calculation, to bring it back to world space, or how should I proceed?
Typically shadow mapping is done by comparing the window-space Z coordinate (this is what a depth texture stores) of your current fragment vs. your light's. This must be done in a common reference frame, so it involves re-projecting your current fragment's position from the perspective of your light.
You have the view-space position right now, which is relative to your current camera and not particularly useful. To do this effectively you want world-space position. You can get that if you transform the view-space position by the inverse view matrix.
Given world-space position, transform into clip-space from light's perspective:
// This will be in clip-space
vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * vec4 (worldPos, 1.0);
// Transform it into NDC-space by dividing by w
lightSpacePos /= lightSpacePos.w;
// Range is now [-1.0, 1.0], but you need [0.0, 1.0]
lightSpacePos = lightSpacePos * vec4 (0.5) + vec4 (0.5);
Assuming default depth range, lightSpacePos is now ready for use. xy contains the texture coordinates to sample from your shadow map and z contains the depth to use for comparison.
For a more thorough explanation, see the following answer.
Incidentally, you will want to eliminate your position texture from your G-Buffer to achieve reasonable performance. It is very easy to reconstruct the world- or view-space position given only the depth and the projection and view matrices, and the arithmetic involved is much quicker than an extra texture fetch. Storing an additional texture with adequate precision to represent position in 3D space will burn through tons of memory bandwidth each frame and is completely unnecessary.
This article from the OpenGL Wiki explains how to do this. You can take it one step further and work back to world space, which is more desirable than view space. You may need to tweak your depth buffer a little bit to get adequate precision, but it will still be quicker than storing position separately.
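As a rough sketch of that reconstruction (written here in C++ with GLM for illustration; in practice it runs in the lighting shader), given the fragment's texture coordinate, its depth-buffer value with the default [0, 1] depth range, and the inverse of projection * view:
#include <glm/glm.hpp>
// uv and depth are in [0, 1]; returns the world-space position of the fragment.
glm::vec3 positionFromDepth(const glm::vec2& uv, float depth,
                            const glm::mat4& inverseViewProjection)
{
    // Back to NDC: all three components in [-1, 1].
    glm::vec4 ndc(uv * 2.0f - 1.0f, depth * 2.0f - 1.0f, 1.0f);
    // Undo projection and view, then do the perspective divide.
    glm::vec4 world = inverseViewProjection * ndc;
    return glm::vec3(world) / world.w;
}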

GLSL compute world coordinate from eye depth and screen position

I'm trying to recover the WORLD position of a point knowing its depth in EYE space, computed as follows (in a vertex shader):
float depth = -( uModelView * vec4( inPos, 1.0 ) ).z;
where inPos is a point in world space (Obviously, I don't want to recover this particular point, but a point where depth is expressed in that format).
And its normalized screen position (between 0 and 1), computed as follows (in a fragment shader):
vec2 screen_pos = ( vec2( gl_FragCoord.xy ) - vec2( 0.5 ) ) / uScreenSize.xy ;
I have access to the following info:
uScreenSize : as its name suggests, the screen width and height
uCameraPos : camera position in WORLD space
and standard matrices :
uModelView : model view camera matrix
uModelViewProj : model view projection matrix
uProjMatrix : projection matrix
How can I compute position (X,Y,Z) of a point in WORLD space ? (not in EYE space)
I don't have access to anything else (I can't use near, far, left, right, ...) because the projection matrix is not restricted to being perspective or orthographic.
Thanks in advance.
If I get your question right, you have x and y in window space (already converted to normalized device space [-1, 1]), but z in eye space, and you want to reconstruct the world-space position.
I don't have access to anything else (I can't use near, far, left, right, ...) because the projection matrix is not restricted to being perspective or orthographic.
Well, actually, there is not much besides an orthographic or projective mapping that can be achieved by matrix multiplication in homogeneous space. However, the projection matrix is sufficient, as long as it is invertible. (In theory, a projection matrix could transform all points to a plane, a line or a single point. In that case, some information is lost and it will never be possible to reconstruct the original data. But that would be a very atypical case.)
So what you can get from the projection matrix and your 2D position is actually a ray in eye space. And you can intersect this with the z=depth plane to get the point back.
So what you have to do is calculate the two points
vec4 p = inverse(uProjMatrix) * vec4 (ndc_x, ndc_y, -1, 1);
vec4 q = inverse(uProjMatrix) * vec4 (ndc_x, ndc_y, 1, 1);
which will mark two points on the ray in eye space. Do not forget to divide p and q by the respective w component to get the 3D coordinates. Now, you simply need to intersect this with your z=depth plane and get the eye space x and y. Finally, you can use the inverse of the uModelView matrix to project that point back to object space.
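Putting those steps together, here is a sketch in C++/GLM (the snippets above are GLSL-style; the uniform names are taken from the question, the function name is mine):
#include <glm/glm.hpp>
// ndc_x/ndc_y in [-1, 1], depth as stored by the vertex shader (-z_eye).
glm::vec3 reconstructObjectSpace(float ndc_x, float ndc_y, float depth,
                                 const glm::mat4& uProjMatrix,
                                 const glm::mat4& uModelView)
{
    glm::mat4 invProj = glm::inverse(uProjMatrix);
    // Two points spanning the view ray in eye space (do not forget the w divide).
    glm::vec4 ph = invProj * glm::vec4(ndc_x, ndc_y, -1.0f, 1.0f);
    glm::vec4 qh = invProj * glm::vec4(ndc_x, ndc_y,  1.0f, 1.0f);
    glm::vec3 p = glm::vec3(ph) / ph.w;
    glm::vec3 q = glm::vec3(qh) / qh.w;
    // Intersect the ray with the plane of constant eye-space depth.
    // The question stores depth with a minus sign, i.e. as the positive
    // forward distance -z_eye, so the plane sits at eye-space z = -depth.
    float t = (-depth - p.z) / (q.z - p.z);
    glm::vec3 eyePos = p + t * (q - p);
    // Back from eye space to object space with the inverse model-view matrix.
    return glm::vec3(glm::inverse(uModelView) * glm::vec4(eyePos, 1.0f));
}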
However, you said that you want world space. But that is impossible. You would need the view matrix to do that, and you have not listed it as a given. All you have is the composition of the model and view matrix, and you need to know at least one of these to reconstruct the world-space position. The camera position is not enough; you also need its orientation.