OpenGL ray pick - C++

The general ray picking process should be as follows (confirmed correct by experiment):
Transform the screen point to a normalized device space direction vector:
float x = (2.0f * mouse_x) / width - 1.0f;
float y = 1.0f - (2.0f * mouse_y) / height;
float z = 1.0f;
vec3 ray_nds = vec3 (x, y, z);
Transform the direction vector to homogeneous clip coordinates:
vec4 ray_clip = vec4 (ray_nds.xy, -1.0, 1.0);
Transform the direction vector to an eye space direction vector:
vec4 ray_eye = inverse (projection_matrix) * ray_clip;
Transform the direction vector to world space to get a pick ray defined by the world space camera position and the direction vector.
My problem is: in normalized device space, why is the z component of the direction vector 1.0?
I mean, in OpenGL normalized device space, the x, y and z components should all be in the range -1 to 1, so the camera should be at the center of the plane z=-1. So the direction vector should be view target position - camera position, and the z component should be 1-(-1)=2.0f. (In DirectX normalized device space, the x and y components are in the range -1 to 1 and the z component is in the range 0 to 1, so the camera position should be at the center of the plane z=0, say (0,0,0), and the z component of the direction vector should be 1-0=1.)

ray_nds.z is completely irrelevant, because you never use it. You cannot use it, because you don't know the pixel's depth.
ray_clip is not a direction, but a position on the near clipping plane (z=-1) after projection. If you undo this projection (with the inverse projection matrix), you end up with the same point in camera space. In camera space, the camera is centered at (0, 0, 0). The direction vector of the ray can be calculated as ray_eye - (0, 0, 0), which is simply ray_eye. So if we ignore the w component, we can use the position as a direction. This only works in camera space! Both clip space and world space most likely have the projection center somewhere else.
Don't mix up the camera position in the different spaces. In camera space, it is at the origin. In clip space, it can be assumed to be at (0, 0, -infinity). The point (x, y, ...) is just an arbitrary point that is covered by the corresponding pixel. You need any one of these points to define the ray.
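The steps from the question, combined with the point that the unprojected position doubles as a direction only in camera space, might be sketched in plain C++ as follows. This is a minimal sketch, not the asker's code: it assumes a symmetric gluPerspective-style projection (so the inverse projection can be written out by hand instead of inverting a matrix) and a camera described by orthonormal world-space basis vectors. Vec3, eyeRayDir, worldRayDir, fovy, right, up and forward are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Eye-space direction of the pick ray through a pixel.
// Assumes a symmetric perspective projection with vertical field of view
// fovy (radians); for such a matrix, the xyz part of
// inverse(projection) * (x, y, -1, 1) is proportional to the vector returned here.
Vec3 eyeRayDir(float mouse_x, float mouse_y, float width, float height, float fovy) {
    assert(width > 0.0f && height > 0.0f);
    // Window coordinates (origin top-left) to NDC x, y in [-1, 1].
    float x = (2.0f * mouse_x) / width - 1.0f;
    float y = 1.0f - (2.0f * mouse_y) / height;
    float t = std::tan(0.5f * fovy);
    float aspect = width / height;
    // The camera sits at the eye-space origin, so this point is also a direction.
    return Vec3{x * t * aspect, y * t, -1.0f};
}

// Rotate the eye-space direction into world space and normalize it.
// right, up, forward: the camera's orthonormal basis in world space
// (forward is the viewing direction, i.e. eye-space -z).
Vec3 worldRayDir(Vec3 d, Vec3 right, Vec3 up, Vec3 forward) {
    Vec3 w{right.x * d.x + up.x * d.y - forward.x * d.z,
           right.y * d.x + up.y * d.y - forward.y * d.z,
           right.z * d.x + up.z * d.y - forward.z * d.z};
    float len = std::sqrt(w.x * w.x + w.y * w.y + w.z * w.z);
    return Vec3{w.x / len, w.y / len, w.z / len};
}
```

The pick ray then starts at the world-space camera position and runs along the returned direction. Note that no NDC z value appears anywhere in the computation, which matches the remark that ray_nds.z is irrelevant.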

The camera is NOT located at z=-1 (or 0); it is even further behind that.
The near clip plane is located at z=-1. This is where most of the complexity of this kind of maths comes from: if you plotted the equations involved, the curves would not pass through 0, and because of that we always carry lots of zn and zm terms around.
See equation 4.2: http://www.arcsynthesis.org/gltut/Positioning/Tut04%20Perspective%20Projection.html.
Even scarier, but more complete: http://www.songho.ca/opengl/gl_projectionmatrix.html
More links:
http://unspecified.wordpress.com/2012/06/21/calculating-the-gluperspective-matrix-and-other-opengl-matrix-maths/
http://schabby.de/projection-matrix/

Related

Understanding the OpenGL projection matrix

I've been writing a program to display 3D models using OpenGL and until now I've used orthographic projection, but I want to switch to a perspective projection so that as the camera goes toward the model it appears to get larger. I understand that I have to multiply three matrices (model, view, and projection) together to correctly apply all of my transformations. As you can see in the following code, I have attempted to do that, and was able to correctly create the model and view matrices. I know these work properly because when I multiply the model and view matrices together I can rotate and translate the object, as well as change the position and angle of the camera. My problem is that when I multiply that product by the projection matrix I can no longer see the object on the screen.
The default value for the camera struct here is {0,0,-.5} but I manipulate that value with the keyboard to move the camera around.
I am using GLFW+glad, and linmath.h for the matrix math.
//The model matrix controls where the object is positioned. The
//identity matrix means no transformations.
mat4x4_identity(m);
//Apply model transformations here.
//The view matrix controls camera position and angle.
vec3 eye={camera.x,camera.y,camera.z};
vec3 center={camera.x,camera.y,camera.z+1};
vec3 up={0,1,0};
mat4x4_look_at(v,eye,center,up);
//The projection matrix flattens the world to 2d to be rendered on a
//screen.
mat4x4_perspective(p, 1.57, width/(float)height, 1,10); //FOV of 90°
//mat4x4_ortho(p, -ratio, ratio, -1.f, 1.f, 1.f, -1.f);
//Apply the transformations. mvp=p*v*m.
mat4x4_mul(mvp, p, v);
mat4x4_mul(mvp, mvp, m);
When the perspective projection matrix is set up, the distances to the near plane and the far plane are set. In your case this is 1 for the near plane and 10 for the far plane:
mat4x4_perspective(p, 1.57, width/(float)height, 1,10);
The model is clipped by the near plane. The model has to be in clip space.
The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
All the geometry which is not in the volume of the frustum is clipped.
This means the distance of the model to the camera has to be greater than the distance to the near plane (1) and less than the distance to the far plane (10).
Since you can "see" the model when you don't use any projection matrix, the actual distance to the model is in the range [-1, 1] (normalized device space). Note that if you don't use a projection matrix, the projection matrix is effectively the identity matrix. This behaves like an orthographic projection with a near plane distance of -1 and a far plane distance of 1.
Change the position of the camera to solve the issue:
e.g.
vec3 eye = {camera.x, camera.y, camera.z - 5}; // <--- 5 is in range [1, 10]
vec3 center = {camera.x, camera.y, camera.z};
vec3 up = {0, 1, 0};
mat4x4_look_at(v, eye, center, up);
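To see why the distance matters, the depth mapping of an OpenGL-style perspective projection can be written out directly. The following is a minimal sketch, assuming the standard perspective matrix layout (the kind mat4x4_perspective is expected to produce); ndcDepth and survivesDepthClip are hypothetical helper names, and d is the distance of a point in front of the camera along the viewing direction.

```cpp
#include <cassert>

// NDC z of a point at distance d in front of the camera, for an
// OpenGL-style perspective projection with near plane n and far plane f.
// Derived from z_clip = ((f + n) * d - 2 * f * n) / (f - n) and w_clip = d.
float ndcDepth(float d, float n, float f) {
    assert(d > 0.0f && n > 0.0f && f > n);
    return (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * d);
}

// A point survives depth clipping only if its NDC z lies in [-1, 1],
// which is equivalent to n <= d <= f.
bool survivesDepthClip(float d, float n, float f) {
    float z = ndcDepth(d, n, f);
    return z >= -1.0f && z <= 1.0f;
}
```

With near = 1 and far = 10, a distance of 0.5 (roughly what the default camera at z = -0.5 gives, assuming the model sits near the origin) yields an NDC z of about -3.2, so the model is clipped. A distance of 5 yields about 0.78, which is inside the visible range.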

Negative values for gl_Position.w?

Is the w component of gl_Position required to be greater than zero? Because when I set it to a negative number nothing is drawn but positive numbers are fine.
gl_Position = vec4(vPos,0,-1);
Face culling is not enabled btw.
Is the w component of gl_Position required to be greater than zero?
No, but the result of gl_Position.xyz / gl_Position.w has to be in the range (-1,-1,-1) to (1,1,1), the normalized device space. This means each component (x, y and z) of the result has to be >= -1.0 and <= 1.0.
But if the w component is negative, nothing is drawn anyway, because gl_Position is a clip space coordinate. The condition for a homogeneous coordinate to be inside the clip volume is
-w <= x, y, z <= w.
If w = -1 this would mean:
1 <= x, y, z <= -1.
and that can never be fulfilled.
(see Why does GL divide gl_Position by W for you rather than letting you do it yourself?)
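The clip condition above can be written as a small predicate. This is only an illustrative sketch of the per-vertex test, not how a driver implements clipping; insideClipVolume is a hypothetical name.

```cpp
// True if a clip-space position (x, y, z, w) lies inside the OpenGL
// clip volume, i.e. -w <= x, y, z <= w.
bool insideClipVolume(float x, float y, float z, float w) {
    return -w <= x && x <= w &&
           -w <= y && y <= w &&
           -w <= z && z <= w;
}
```

For any negative w the lower bound -w is greater than the upper bound w, so the predicate is false for every x, y and z, which is why vec4(vPos, 0, -1) never produces visible output.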
Explanation:
The coordinates which are set to gl_Position are Homogeneous coordinates. If the w component of a Homogeneous coordinate is 1, it is equal to the Cartesian coordinate built of the components xyz.
Homogeneous coordinates are used for the representation of the perspective projection.
In a rendering, each mesh of the scene usually is transformed by the model matrix, the view matrix and the projection matrix.
The projection matrix describes the mapping from 3D points of a scene to 2D points of the viewport. The projection matrix transforms from view space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) in the range (-1, -1, -1) to (1, 1, 1) by dividing by the w component of the clip coordinates. All geometry outside the NDC range is clipped.
In orthographic projection, the coordinates in eye space are linearly mapped to normalized device coordinates. (Commonly the w component is 1.0.)
In perspective projection, the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).

Matrix Hell - Transforming a Point in a 3D Texture to World Space

Recently I have decided to add volumetric fog to my 3D game in DirectX. The technique I am using is from the book GPU Pro 6, but it is not necessary for you to own a copy of the book in order to help me :). Basically, the volumetric fog information is stored in a 3D texture. Now, I need to transform each texel of that 3D texture to world space. The texture is view-aligned, and by that I mean the X and the Y of that texture map to the X and Y of the screen, and the Z of that texture extends forwards in front of the camera. So essentially I need a function:
float3 CalculateWorldPosition(uint3 Texel)
{
//Do math
}
I know the view matrix, and the dimensions of the 3D texture (190x90x64 or 190x90x128), the projection matrix for the screen, etc.
However that is not all, unfortunately.
The depth buffer in DirectX is not linear, as you may know. This same effect needs to be applied to my 3D texture - texels need to be skewed so there are more near the camera than far, since detail near the camera must be better than further away. However, I think I have got a function to do this, correct me if I'm wrong:
//Where depth = 0, the texel is closest to the camera.
//Where depth = 1, the texel is the furthest from the camera.
//This function returns a new Z value between 0 and 1, skewing it
// so more Z values are near the camera.
float GetExponentialDepth(float depth /*0 to 1*/)
{
depth = 1.0f - depth;
//Near and far planes
float near = 1.0f;
//g_WorldDepth is the depth of the 3D texture in world/view space
float far = g_WorldDepth;
float linearZ = -(near + depth * (far - near));
float a = (2.0f * near * far) / (near - far);
float b = (far + near) / (near - far);
float result = (a / -linearZ) - b;
return -result * 0.5f + 0.5f;
}
Here is my current function that tries to find the world position from the texel (note that it is wrong):
float3 CalculateWorldPos(uint3 texel)
{
//Divide the texel by the dimensions, to get a value between 0 and 1 for
// each of the components
float3 pos = (float3)texel * float3(1.0f / 190.0f, 1.0f / 90.0f, 1.0f / (float)(g_Depth-1));
pos.xy = 2.0f * pos.xy - float2(1.0f, 1.0f);
//Skew the depth
pos.z = GetExponentialDepth(pos.z);
//Multiply this point, which should be in NDC coordinates,
// by the inverse of (View * Proj)
return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
}
However, projection matrices are also a little confusing to me, so here is the line that gets the projection matrix for the 3D texture, so one can correct me if it's incorrect:
//Note that the X and Y of the texture is 190 and 90 respectively.
//m_WorldDepth is the depth of the cuboid in world space.
XMMatrixPerspectiveFovLH(pCamera->GetFovY(), 190.0f / 90.0f, 1.0f, m_WorldDepth)
Also, I have read that projection matrices are not invertible (their inverse does not exist). If that is true, then maybe finding the inverse of (View * Proj) is incorrect, I'm not sure.
So, just to reiterate the question, given a 3D texture coordinate to a view-aligned cuboid, how can I find the world position of that point?
Thanks so much in advance, this problem has eaten up a lot of my time!
Let me first explain what the perspective projection matrix does.
The perspective projection matrix transforms a vector from view space to clip space, such that the x/y coordinates correspond to the horizontal/vertical position on the screen and the z coordinate corresponds to the depth. A vertex that is positioned znear units away from the camera is mapped to depth 0. A vertex that is positioned zfar units away from the camera is mapped to depth 1. The depth values right behind znear increase very quickly, whereas the depth values right in front of zfar only change slowly.
Specifically, given a z-coordinate, the resulting depth is:
depth = zfar / (zfar - znear) * (z - znear) / z
If you draw the frustum with lines at even intervals in depth (e.g. every 0.1), you get cells, and the cells in the front are thinner than those in the back. If you draw enough cells, they map to your texels. This configuration is exactly what you want: there are more cells in the front (resulting in a higher resolution) than in the back. So you can just use the texel coordinate as the depth value (normalized to the [0,1] range). Here is the standard back projection for a given depth value into view space (assuming znear=1, zfar=10)
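In code, the depth formula and its back projection might look like the sketch below. The inverse is obtained by solving the depth formula above for z; depthFromViewZ and viewZFromDepth are hypothetical names, and z is the positive view-space distance in front of the camera.

```cpp
#include <cassert>

// Depth in [0, 1] for a view-space distance z, as in the formula above.
float depthFromViewZ(float z, float znear, float zfar) {
    assert(z > 0.0f && znear > 0.0f && zfar > znear);
    return zfar / (zfar - znear) * (z - znear) / z;
}

// Inverse mapping: view-space distance z for a depth value in [0, 1].
float viewZFromDepth(float depth, float znear, float zfar) {
    assert(znear > 0.0f && zfar > znear);
    return znear * zfar / (zfar - depth * (zfar - znear));
}
```

With znear = 1 and zfar = 10, a depth of 0.5 maps back to a view-space distance of about 1.82, so the first half of the depth range covers less than a tenth of the frustum's length. That is the non-linear distribution the question asks for, without any extra skewing function.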
Your code doesn't work because of this line:
return mul(float4(pos, 1.0f), g_InverseViewProj).xyz;
There is a reason why we use 4D vectors and matrices. If you just throw the fourth dimension away, you get the wrong result. Instead, do the perspective divide (divide by w):
float4 transformed = mul(float4(pos, 1.0f), g_InverseViewProj);
return (transformed / transformed.w).xyz;
Btw, the 4D perspective projection matrix is perfectly invertible. Only if you remove one dimension do you get a non-square matrix, which is not invertible. But that's not what we usually do in computer graphics. However, such matrices are also called projections (in a different context).
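The effect of skipping the divide can be seen on the projection alone. The sketch below hand-writes the inverse of a D3D-style left-handed perspective projection for row vectors (the layout XMMatrixPerspectiveFovLH produces) and leaves out the view matrix entirely, so it only reaches view space, not world space. Vec4, unproject, divideByW, xs and ys are hypothetical names for illustration.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Inverse of a D3D-style left-handed perspective projection, applied to a
// row vector. xs and ys are the x/y scale factors of the projection matrix.
// The forward projection maps view-space (x, y, z, 1) to
// (x * xs, y * ys, q * (z - znear), z) with q = zfar / (zfar - znear).
Vec4 unproject(Vec4 c, float xs, float ys, float znear, float zfar) {
    assert(znear > 0.0f && zfar > znear);
    float q = zfar / (zfar - znear);
    return Vec4{c.x / xs, c.y / ys, c.w, (q * c.w - c.z) / (q * znear)};
}

// Perspective divide: turn the homogeneous result into a 3D position.
Vec4 divideByW(Vec4 v) {
    return Vec4{v.x / v.w, v.y / v.w, v.z / v.w, 1.0f};
}
```

For an NDC input (x, y, depth, 1), the z component before the divide is always 1, whatever the depth; taking .xyz at that point places every texel at the same distance. Only after dividing by w does depth 0 land on znear and depth 1 on zfar.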

GLSL compute world coordinate from eye depth and screen position

I'm trying to recover the WORLD position of a point knowing its depth in EYE space, computed as follows (in a vertex shader):
float depth = -(uModelView * vec4(inPos, 1.0)).z;
where inPos is a point in world space (obviously, I don't want to recover this particular point, but a point whose depth is expressed in that format),
and its normalized screen position (between 0 and 1), computed as follows (in a fragment shader):
vec2 screen_pos = ( vec2( gl_FragCoord.xy ) - vec2( 0.5 ) ) / uScreenSize.xy ;
I have access to the following info:
uScreenSize : as its name suggests, the screen width and height
uCameraPos : camera position in WORLD space
and standard matrices :
uModelView : model view camera matrix
uModelViewProj : model view projection matrix
uProjMatrix : projection matrix
How can I compute the position (X,Y,Z) of a point in WORLD space? (not in EYE space)
I don't have access to anything else (I can't use near, far, left, right, ...) because the projection matrix is not restricted to perspective or orthogonal.
Thanks in advance.
If I get your question right, you have x and y in window space (already converted to normalized device space [-1,1]), but z in eye space, and want to reconstruct the world space position.
I can't have access to other (I can't use near, far, left, right, ...)
because projection matrix is not restricted to perspective or
orthogonal.
Well, actually, there is not much besides an orthogonal or projective mapping that can be achieved by matrix multiplication in homogeneous space. However, the projection matrix is sufficient, as long as it is invertible. (In theory, a projection matrix could transform all points to a plane, a line or a single point. In that case, some information is lost and it is impossible to reconstruct the original data. But that would be a very atypical case.)
So what you can get from the projection matrix and your 2D position is actually a ray in eye space. And you can intersect this with the z=depth plane to get the point back.
So what you have to do is calculate the two points
vec4 p = inverse(uProjMatrix) * vec4 (ndc_x, ndc_y, -1, 1);
vec4 q = inverse(uProjMatrix) * vec4 (ndc_x, ndc_y, 1, 1);
which will mark two points on the ray in eye space. Do not forget to divide p and q by the respective w component to get the 3D coordinates. Now, you simply need to intersect this with your z=depth plane and get the eye space x and y. Finally, you can use the inverse of the uModelView matrix to project that point back to object space.
However, you said that you want world space, and that is impossible with the given inputs. You would need the view matrix to do that, but you have not listed it as a given. All you have is the composition of the model and view matrices, and you need to know at least one of them to reconstruct the world space position. The camera position is not enough; you also need the orientation.
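The ray/plane intersection described above might be sketched in C++ like this. It assumes p and q have already been divided by their w components and that depth is the positive distance in front of the camera (depth = -z in eye space, as in the question); Vec3 and pointAtEyeDepth are hypothetical names.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Intersect the eye-space ray through p and q (both already divided by w)
// with the plane z = -depth, where depth is the positive distance in front
// of the camera (depth = -z_eye, as in the question).
Vec3 pointAtEyeDepth(Vec3 p, Vec3 q, float depth) {
    assert(q.z != p.z);  // the ray must not be parallel to the plane
    float t = (-depth - p.z) / (q.z - p.z);
    return Vec3{p.x + t * (q.x - p.x), p.y + t * (q.y - p.y), -depth};
}
```

Because the function only interpolates between two points on the ray, it should work for perspective and orthographic projections alike: in the orthographic case p and q share the same x and y, and the result simply keeps them.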

C++/OpenGL convert world coords to screen(2D) coords

I am making a game in OpenGL where I have a few objects within the world space. I want to make a function where I can take in an object's location (3D) and transform it to the screen's location (2D) and return it.
I know the 3D location of the object, the projection matrix and the view matrix in the following variables:
Matrix projectionMatrix;
Matrix viewMatrix;
Vector3 point3D;
To do this transform, you must first take your model-space positions and transform them to clip-space. This is done with matrix multiplies. I will use GLSL-style code to make it obvious what I'm doing:
vec4 clipSpacePos = projectionMatrix * (viewMatrix * vec4(point3D, 1.0));
Notice how I convert your 3D vector into a 4D vector before the multiplication. This is necessary because the matrices are 4x4, and you cannot multiply a 4x4 matrix with a 3D vector. You need a fourth component.
The next step is to transform this position from clip-space to normalized device coordinate space (NDC space). NDC space is on the range [-1, 1] in all three axes. This is done by dividing the first three coordinates by the fourth:
vec3 ndcSpacePos = clipSpacePos.xyz / clipSpacePos.w;
Obviously, if clipSpacePos.w is zero, you have a problem, so you should check that beforehand. If it is zero, the object is in the plane of projection; its view-space depth is zero. Such vertices are automatically clipped by OpenGL.
The next step is to transform from this [-1, 1] space to window-relative coordinates. This requires the use of the values you passed to glViewport. The first two parameters are the offset from the bottom-left of the window (vec2 viewOffset), and the second two parameters are the width/height of the viewport area (vec2 viewSize). Given these, the window-space position is:
vec2 windowSpacePos = ((ndcSpacePos.xy + 1.0) / 2.0) * viewSize + viewOffset;
And that's as far as you go. Remember: OpenGL's window-space is relative to the bottom-left of the window, not the top-left.
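Put together in C++ rather than GLSL, the three steps might look like the sketch below. The question's Matrix and Vector3 types are not shown, so this uses a stand-in column-major 4x4 array; Mat4, Vec2, Vec4, mul and worldToWindow are hypothetical names, and viewOffset and viewSize are assumed to hold the values passed to glViewport.

```cpp
#include <array>

struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;  // column-major, as in OpenGL

// Matrix * column vector.
Vec4 mul(const Mat4& m, Vec4 v) {
    return Vec4{m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w,
                m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w,
                m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
                m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w};
}

// World position -> window position (origin at the bottom-left).
// Returns false if the point cannot be projected (clip-space w is zero).
bool worldToWindow(const Mat4& projection, const Mat4& view,
                   float px, float py, float pz,
                   Vec2 viewOffset, Vec2 viewSize, Vec2& out) {
    // Step 1: world space -> clip space.
    Vec4 clip = mul(projection, mul(view, Vec4{px, py, pz, 1.0f}));
    if (clip.w == 0.0f) return false;
    // Step 2: clip space -> NDC (perspective divide).
    float ndcX = clip.x / clip.w;
    float ndcY = clip.y / clip.w;
    // Step 3: NDC [-1, 1] -> window coordinates.
    out.x = (ndcX + 1.0f) / 2.0f * viewSize.x + viewOffset.x;
    out.y = (ndcY + 1.0f) / 2.0f * viewSize.y + viewOffset.y;
    return true;
}
```

The sketch does not check whether the point is actually inside the view volume; a point behind the camera has a negative clip.w and still produces a (meaningless) window position, so callers that care should test clip.w > 0 as well. For a UI system with a top-left origin, the y value would additionally need to be flipped against the viewport height.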