Calculate clipspace.w from clipspace.xyz and (inv) projection matrix - OpenGL

I'm using a logarithmic depth algorithm which results in someFunc(clipspace.z) being written to the depth buffer and no implicit perspective divide.
I'm doing RTT / postprocessing so later on in a fragment shader I want to recompute eyespace.xyz, given ndc.xy (from the fragment coordinates) and clipspace.z (from someFuncInv() on the value stored in the depth buffer).
Note that I do not have clipspace.w, and my stored value is not clipspace.z / clipspace.w (as it would be when using fixed function depth) - so something along the lines of ...
float clip_z = ...; /* [-1 .. +1] */
vec2 ndc = vec2(FragCoord.xy / viewport * 2.0 - 1.0);
vec4 clipspace = InvProjMatrix * vec4(ndc, clip_z, 1.0);
clipspace /= clipspace.w;
... does not work here.
So is there a way to calculate clipspace.w out of clipspace.xyz, given the projection matrix or its inverse?

clipspace.xy = FragCoord.xy / viewport * 2.0 - 1.0;
This is wrong in terms of nomenclature. "Clip space" is the space that the vertex shader (or whatever the last Vertex Processing stage is) outputs. Between clip space and window space is normalized device coordinate (NDC) space. NDC space is clip space divided by the clip space W coordinate:
vec3 ndcspace = clipspace.xyz / clipspace.w;
So the first step is to take our window space coordinates and get NDC space coordinates. Which is easy:
vec3 ndcspace = vec3(FragCoord.xy / viewport * 2.0 - 1.0, depth);
Now, I'm going to assume that your depth value is the proper NDC-space depth. I'm assuming that you fetch the value from a depth texture, then used the depth range near/far values it was rendered with to map it into a [-1, 1] range. If you didn't, you should.
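As a rough sketch of that mapping (assuming the value comes from a depth texture, and that dr_near/dr_far are the glDepthRange values the scene was rendered with, 0.0 and 1.0 by default; depthTex and texCoord are illustrative names):
float win_z = texture(depthTex, texCoord).r;
float ndc_z = (2.0 * win_z - dr_near - dr_far) / (dr_far - dr_near);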
So, now that we have ndcspace, how do we compute clipspace? Well, that's obvious:
vec4 clipspace = vec4(ndcspace * clipspace.w, clipspace.w);
Obvious and... not helpful, since we don't have clipspace.w. So how do we get it?
To get this, we need to look at how clipspace was computed the first time:
vec4 clipspace = Proj * cameraspace;
This means that clipspace.w is computed by taking the dot product of cameraspace with the fourth row of Proj.
Well, that's not very helpful. It gets more helpful if we actually look at the fourth row of Proj. Granted, you could be using any projection matrix, and if you're not using the typical projection matrix, this computation becomes more difficult (potentially impossible).
The fourth row of Proj, using the typical projection matrix, is really just this:
[0, 0, -1, 0]
This means that the clipspace.w is really just -cameraspace.z. How does that help us?
It helps by remembering this:
ndcspace.z = clipspace.z / clipspace.w;
ndcspace.z = clipspace.z / -cameraspace.z;
Well, that's nice, but it just trades one unknown for another; we still have an equation with two unknowns (clipspace.z and cameraspace.z). However, we do know something else: clipspace.z comes from dot-producting cameraspace with the third row of our projection matrix. The traditional projection matrix's third row looks like this:
[0, 0, T1, T2]
Where T1 and T2 are non-zero numbers. We'll ignore what these numbers are for the time being. Therefore, clipspace.z is really just T1 * cameraspace.z + T2 * cameraspace.w. And if we know cameraspace.w is 1.0 (as it usually is), then we can remove it:
ndcspace.z = (T1 * cameraspace.z + T2) / -cameraspace.z;
So, we still have a problem. Actually, we don't. Why? Because there is only one unknown in this equation. Remember: we already know ndcspace.z. We can therefore use ndcspace.z to compute cameraspace.z:
ndcspace.z = -T1 + (-T2 / cameraspace.z);
ndcspace.z + T1 = -T2 / cameraspace.z;
cameraspace.z = -T2 / (ndcspace.z + T1);
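For reference, with the standard symmetric perspective matrix (the glFrustum / gluPerspective form), these constants are, in terms of the camera near/far planes n and f:
float T1 = -(f + n) / (f - n);
float T2 = -(2.0 * f * n) / (f - n);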
T1 and T2 come right out of our projection matrix (the one the scene was originally rendered with). And we already have ndcspace.z. So we can compute cameraspace.z. And we know that:
clipspace.w = -cameraspace.z;
Therefore, we can do this:
vec4 clipspace = vec4(ndcspace * clipspace.w, clipspace.w);
Obviously you'll need a float for clipspace.w rather than the literal code, but you get my point. Once you have clipspace, to get camera space, you multiply by the inverse projection matrix:
vec4 cameraspace = InvProj * clipspace;
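Putting the whole answer together, a minimal sketch of the reconstruction might look like the following. It assumes the typical projection matrix, that both the projection matrix (here called Proj) and its inverse (InvProjMatrix, as in the question) are available as uniforms, and that the depth value has already been mapped back to the NDC [-1, 1] range:
uniform mat4 Proj;          // projection matrix the scene was rendered with
uniform mat4 InvProjMatrix; // its inverse
uniform vec2 viewport;

vec3 reconstructCameraSpace(vec2 fragCoord, float ndc_depth)
{
    vec3 ndcspace = vec3(fragCoord / viewport * 2.0 - 1.0, ndc_depth);

    // T1 and T2 from the third row of the projection matrix.
    // GLSL matrices are column-major, so indexing is Proj[col][row].
    float T1 = Proj[2][2];
    float T2 = Proj[3][2];

    // cameraspace.z = -T2 / (ndcspace.z + T1), and clipspace.w = -cameraspace.z
    float camera_z = -T2 / (ndcspace.z + T1);
    float clip_w = -camera_z;

    vec4 clipspace = vec4(ndcspace * clip_w, clip_w);
    vec4 cameraspace = InvProjMatrix * clipspace;
    return cameraspace.xyz;
}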

Related

Motion Vector - how to calculate it properly?

I'm trying to wrap my head around calculating motion vectors (also called a velocity buffer). I found this tutorial, but I'm not satisfied with its explanation of how motion vectors are calculated. Here is the code:
vec2 a = (vPosition.xy / vPosition.w) * 0.5 + 0.5;
vec2 b = (vPrevPosition.xy / vPrevPosition.w) * 0.5 + 0.5;
oVelocity = a - b;
Why are we multiplying our position vectors by 0.5 and then adding 0.5? I'm guessing that we're trying to get from clip space to NDC, but why? I completely don't understand that.
This is a mapping from the [-1, 1] NDC range onto the [0, 1] texture space. Since lookups in the blur shader have to read from a texture at a position offset by the velocity vector, it's necessary to perform this conversion.
Note that the + 0.5 part is actually unnecessary, since it cancels out in a - b anyway. So the same result would have been achieved by using something like
vec2 a = (vPosition.xy / vPosition.w);
vec2 b = (vPrevPosition.xy / vPrevPosition.w);
oVelocity = (a - b) * 0.5;
I don't know if there is any reason to prefer the first over the second, but my guess is that this code is written in the way it is because it builds up on a previous tutorial where the calculation had been the same.
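For context, a rough sketch of how the velocity might then be consumed in the blur pass (names like uColor, uVelocity and numSamples are illustrative, not from the tutorial):
uniform sampler2D uColor;
uniform sampler2D uVelocity;
in vec2 vTexCoord;
out vec4 fragColor;

void main()
{
    // Velocity is stored in [0, 1] texture-space units, so it can be added
    // directly to the texture coordinate used for the lookups.
    vec2 velocity = texture(uVelocity, vTexCoord).xy;
    const int numSamples = 8;
    vec4 color = vec4(0.0);
    for (int i = 0; i < numSamples; ++i)
    {
        vec2 offset = velocity * (float(i) / float(numSamples - 1) - 0.5);
        color += texture(uColor, vTexCoord + offset);
    }
    fragColor = color / float(numSamples);
}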

Clipping triangles in screen space per pixel

As I understand it, in OpenGL polygons are usually clipped in clip space, and only those triangles (or the parts of triangles that remain after the clipping process splits them) that survive the comparison against ±w are rasterized. This then requires implementing a polygon clipping algorithm such as Sutherland-Hodgman.
I am implementing my own CPU rasterizer and for now would like to avoid doing that. I have the NDC coordinates of the vertices available (not really normalized, since I did not clip anything, so the positions may not be in the range [-1, 1]). I would like to interpolate these values for all pixels and only draw the pixels whose NDC coordinates fall within [-1, 1] in the x, y and z dimensions. I would then additionally perform the depth test.
Would this work? If yes, what would the interpolation look like? Can I use the formula for attribute interpolation from the OpenGL spec (equation 14.9, page 427) as described here? Alternatively, should I use formula 14.10, which is used for depth (z) interpolation, for all 3 coordinates (I don't really understand why a different one is used there)?
Update:
I have tried interpolating the NDC values per pixel by two methods:
w0, w1, w2 are the barycentric weights of the vertices.
1) float x_ndc = w0 * v0_NDC.x + w1 * v1_NDC.x + w2 * v2_NDC.x;
float y_ndc = w0 * v0_NDC.y + w1 * v1_NDC.y + w2 * v2_NDC.y;
float z_ndc = w0 * v0_NDC.z + w1 * v1_NDC.z + w2 * v2_NDC.z;
2)
float x_ndc = (w0*v0_NDC.x/v0_NDC.w + w1*v1_NDC.x/v1_NDC.w + w2*v2_NDC.x/v2_NDC.w) /
(w0/v0_NDC.w + w1/v1_NDC.w + w2/v2_NDC.w);
float y_ndc = (w0*v0_NDC.y/v0_NDC.w + w1*v1_NDC.y/v1_NDC.w + w2*v2_NDC.y/v2_NDC.w) /
(w0/v0_NDC.w + w1/v1_NDC.w + w2/v2_NDC.w);
float z_ndc = w0 * v0_NDC.z + w1 * v1_NDC.z + w2 * v2_NDC.z;
The clipping + depth test always looks like this:
if (-1.0f < z_ndc && z_ndc < 1.0f && z_ndc < currentDepth &&
-1.0f < y_ndc && y_ndc < 1.0f &&
-1.0f < x_ndc && x_ndc < 1.0f)
Case 1) corresponds to using equation 14.10 for their interpolation. Case 2) corresponds to using equation 14.9 for interpolation.
Results documented in gifs on imgur.
1) Strange things happen when the second cube is behind the camera or when I go into a cube.
2) Strange artifacts are not visible, but as the camera approaches vertices, they start disappearing. And since this is perspective-correct attribute interpolation, vertices (nearer to the camera?) have greater weight, so as soon as a vertex gets clipped, its information is interpolated with a strong weight into the triangle's pixels.
Is all of this expected or have I done something wrong?
Clipping against the near plane is not strictly necessary, unless the triangle goes to or past 0 in the camera-space Z. Once that happens, the homogeneous coordinate math gets weird.
Most hardware only bothers to clip triangles if they extend more than a screen's width outside the clip space or if they cross the camera-Z of zero. This kind of clipping is called "guard-band clipping", and it saves a lot of performance, since clipping isn't cheap.
So yes, the math can work fine. The main thing you have to do, when setting up your scan lines, is figure out where each of them start/end on screen. The interpolation math is the same either way.
I don't see any reason why this wouldn't work, but it will be way slower than traditional clipping. Note that you might get into trouble with triangles close to the center of projection, since they will be vanishingly small and might cause problems in the barycentric coordinate calculation.
The difference between equations 14.9 and 14.10 is that depth is basically z/w (remapped to [0, 1]). Since the perspective divide has already happened for it, it has to be left out during interpolation.
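To make the difference concrete, here is a small GLSL sketch (illustrative names) of the two schemes, where w holds the screen-space barycentric weights and invW holds 1/clip.w for each vertex:
// Equation 14.10 style: linear in window space (what depth uses).
float interpLinear(vec3 attr, vec3 w)
{
    return dot(attr, w);
}

// Equation 14.9 style: perspective-correct (what ordinary attributes use).
float interpPerspective(vec3 attr, vec3 w, vec3 invW)
{
    return dot(attr * invW, w) / dot(invW, w);
}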

DirectX Converting Pixel World Position to Shadow Map Position Gives Weird, Tiled Results

I've been trying for some time now to get a screen-space pixel (provided by a deferred HLSL shader) to convert to light space. The results have been surprising to me as my light rendering seems to be tiling the depth buffer.
Importantly, the scene camera (or eye) and the light being rendered from start in the same position.
First, I extract the world position of the pixel using the code below:
float3 eye = Eye;
float4 position = {
IN.texCoord.x * 2 - 1,
(1 - IN.texCoord.y) * 2 - 1,
zbuffer.r,
1
};
float4 hposition = mul(position, EyeViewProjectionInverse);
position = float4(hposition.xyz / hposition.w, hposition.w);
float3 eyeDirection = normalize(eye - position.xyz);
The result seems to be correct as rendering the XYZ position as RGB respectively yields this (apparently correct) result:
The red component seems to be correctly outputting X as it moves to the right, and blue shows Z moving forward. The Y factor also looks correct as the ground is slightly below the Y axis.
Next (and to be sure I'm not going crazy), I decided to output the original depth buffer. Normally I keep the depth buffer in a Texture2D called DepthMap passed to the shader as input. In this case, however, I try to undo the pixel transformation by offsetting it back into the proper position and multiplying it by the eye's view-projection matrix:
float4 cpos = mul(position, EyeViewProjection);
cpos.xyz = cpos.xyz / cpos.w;
cpos.x = cpos.x * 0.5f + 0.5f;
cpos.y = 1 - (cpos.y * 0.5f + 0.5f);
float camera_depth = pow(DepthMap.Sample(Sampler, cpos.xy).r, 100); // Power 100 just to visualize the map since scales are really tiny
return float4(camera_depth, camera_depth, camera_depth, 1);
This yields a correct looking result as well (though I'm not 100% sure about the Z value). Also note that I've made the results exponential to better visualize the depth information (this is not done when attempting live comparisons):
So theoretically, I can use the same code to convert that pixel world position to light space by multiplying by the light's view-projection matrix. Correct? Here's what I tried:
float4 lpos = mul(position, ShadowLightViewProjection[0]);
lpos.xyz = lpos.xyz / lpos.w;
lpos.x = lpos.x * 0.5f + 0.5f;
lpos.y = 1 - (lpos.y * 0.5f + 0.5f);
float shadow_map_depth = pow(ShadowLightMap[0].Sample(Sampler, lpos.xy).r, 100); // Power 100 just to visualize the map since scales are really tiny
return float4(shadow_map_depth, shadow_map_depth, shadow_map_depth, 1);
And here's the result:
And another to show better how it's mapping to the world:
I don't understand what is going on here. It seems it might have something to do with the projection matrix, but I'm not good enough with math to know for sure what is happening. It's definitely not the width/height of the light map, as I've tried multiple map sizes, and the projection matrix is calculated using the FOV and aspect ratio, never the width/height.
Finally, here's some C++ code showing how my perspective matrix (used for both eye and light) is calculated:
const auto ys = std::tan((T)1.57079632679f - (fov / (T)2.0));
const auto xs = ys / aspect;
const auto& zf = view_far;
const auto& zn = view_near;
const auto zfn = zf - zn;
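// Note: the rows below appear to build a left-handed, D3D-style perspective
// matrix for the row-vector mul(position, matrix) convention: clip w comes
// from +camera-space z, and depth maps to [0, 1] rather than [-1, 1].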
row1(xs, 0, 0, 0);
row2(0, ys, 0, 0);
row3(0, 0, zf / zfn, 1);
row4(0, 0, -zn * zf / zfn, 0);
return *this;
I'm completely at a loss here. Any guidance or recommendations would be greatly appreciated!
EDIT - I also forgot to mention that the tiled image is upside down as if the y flip broke it. That's strange to me as it's required to get it back to eye texture space correctly.
I did some tweaking and fixed things here and there. Ultimately, my biggest issue was an unexpectedly transposed matrix. It's a bit complicated as to how the matrix got transposed, but that's why things were flipped. I also changed to D32 depth buffers (though I'm not sure that helped any) and made sure that any positions divided by their W affected all components (including W).
So code like this: hposition.xyz = hposition.xyz / hposition.w
became this: hposition = hposition / hposition.w
After all this tweaking, it's starting to look more like a shadow map.
Oh and the transposed matrix was the ViewProjection of the light.

What does (gl_FragCoord.z / gl_FragCoord.w) represent?

I want actual world space distance, and I get the feeling from experimentation that
(gl_FragCoord.z / gl_FragCoord.w)
is the depth in world space? But I'm not too sure.
EDIT I've just found where I had originally located this snippet of code. Apparently it is the actual depth from the camera?
This was asked (by the same person) and answered elsewhere. I'm paraphrasing and embellishing the answer here:
As stated in section 15.2.2 of the OpenGL 4.3 core profile specification (PDF), gl_FragCoord.w is 1 / clip.w, where clip.w is the W component of the clip-space position (ie: what you wrote to gl_Position).
gl_FragCoord.z is generated by the following process, assuming the usual transforms:
Camera-space to clip-space transform, via projection matrix multiplication in the vertex shader. clip.z = (projectionMatrix * cameraPosition).z
Transform to normalized device coordinates. ndc.z = clip.z / clip.w
Transform to window coordinates, using the glDepthRange near/far values. win.z = ((dfar-dnear)/2) * ndc.z + (dfar+dnear)/2.
Now, using the default depth range of near=0, far=1, we can define win.z in terms of clip-space: (clip.z/clip.w)/2 + 0.5. If we then divide this by gl_FragCoord.w, that is the equivalent of multiplying by clip.w, thus giving us:
(gl_FragCoord.z / gl_FragCoord.w) = clip.z/2 + clip.w/2 = (clip.z + clip.w) / 2
Using the standard projection matrix, clip.z represents a scale and offset from camera-space Z component. The scale and offset are defined by the camera's near/far depth values. clip.w is, again in the standard projection matrix, just the negation of the camera-space Z. Therefore, we can redefine our equation in those terms:
(gl_FragCoord.z / gl_FragCoord.w) = (A * cam.z + B - cam.z) / 2 = C * cam.z + D
Where A and B represent the scale and offset based on near/far, and C = (A - 1)/2 and D = B/2.
Therefore, gl_FragCoord.z / gl_FragCoord.w is not the camera-space (or world-space) distance to the camera. Nor is it the camera-space planar distance to the camera. But it is a linear transform of the camera-space depth. You could use it as a way to compare two depth values together, if they came from the same projection matrix and so forth.
To actually compute the camera-space Z, you need to either pass the camera near/far planes your matrix was built with (OpenGL already gives you the depth-range near/far) and compute those A and B values from them, or you need to use the inverse of the projection matrix. Alternatively, you could just use the projection matrix directly yourself, since fragment shaders can use the same uniforms available to vertex shaders. You can pick the A and B terms directly from that matrix: A = projectionMatrix[2][2], and B = projectionMatrix[3][2].
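For illustration, a minimal fragment-shader sketch of that last option (assuming the projection matrix is available as a uniform named projectionMatrix and the default [0, 1] depth range):
uniform mat4 projectionMatrix;

float cameraSpaceZ()
{
    // A and B, picked straight out of the projection matrix
    // (column-major GLSL indexing).
    float A = projectionMatrix[2][2];
    float B = projectionMatrix[3][2];

    // gl_FragCoord.z is in [0, 1] with the default depth range; map to NDC.
    float ndc_z = gl_FragCoord.z * 2.0 - 1.0;

    // Invert ndc_z = (A * cam_z + B) / -cam_z  =>  cam_z = -B / (ndc_z + A)
    return -B / (ndc_z + A);
}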
According to the docs:
Available only in the fragment language, gl_FragDepth is an output variable that is used to establish the depth value for the current fragment. If depth buffering is enabled and no shader writes to gl_FragDepth, then the fixed function value for depth will be used (this value is contained in the z component of gl_FragCoord); otherwise, the value written to gl_FragDepth is used.
So, it looks like gl_FragDepth should just be gl_FragCoord.z unless you've set it somewhere else in your shaders.
Since
gl_FragCoord.w = 1.0 / gl_Position.w
and (most likely) your projection matrix gets clip w from -z (i.e. its fourth row is 0, 0, -1, 0), then:
float distanceToCamera = 1.0 / gl_FragCoord.w;

GLSL gl_FragCoord.z Calculation and Setting gl_FragDepth

So, I've got an imposter (the real geometry is a cube, possibly clipped, and the imposter geometry is a Menger sponge) and I need to calculate its depth.
I can calculate the amount to offset in world space fairly easily. Unfortunately, I've spent hours failing to perturb the depth with it.
The only correct results I can get are when I go:
gl_FragDepth = gl_FragCoord.z
Basically, I need to know how gl_FragCoord.z is calculated so that I can:
Take the inverse transformation from gl_FragCoord.z to eye space
Add the depth perturbation
Transform this perturbed depth back into the same space as the original gl_FragCoord.z.
I apologize if this seems like a duplicate question; there are a number of other posts here that address similar things. However, after implementing all of them, none works correctly. Rather than trying to pick one to get help with, at this point, I'm asking for complete code that does it. It should just be a few lines.
For future reference, the key code is:
float far=gl_DepthRange.far; float near=gl_DepthRange.near;
vec4 eye_space_pos = gl_ModelViewMatrix * /*something*/
vec4 clip_space_pos = gl_ProjectionMatrix * eye_space_pos;
float ndc_depth = clip_space_pos.z / clip_space_pos.w;
float depth = (((far-near) * ndc_depth) + near + far) / 2.0;
gl_FragDepth = depth;
For another future reference, this is the same formula as given by imallett, which was working for me in an OpenGL 4.0 application:
vec4 v_clip_coord = modelview_projection * vec4(v_position, 1.0);
float f_ndc_depth = v_clip_coord.z / v_clip_coord.w;
gl_FragDepth = (1.0 - 0.0) * 0.5 * f_ndc_depth + (1.0 + 0.0) * 0.5;
Here, modelview_projection is the 4x4 modelview-projection matrix and v_position is the object-space position of the pixel being rendered (in my case calculated by a raymarcher).
The equation comes from the window coordinates section of this manual. Note that in my code, near is 0.0 and far is 1.0, which are the default values of gl_DepthRange. Note that gl_DepthRange is not the same thing as the near/far distances in the formula for the perspective projection matrix! The only trick is using 0.0 and 1.0 (or gl_DepthRange in case you actually need to change it); I had been struggling for an hour with the other depth range, but that one is already "baked" into my (perspective) projection matrix.
Note that this way, the equation really contains just a single multiply by a constant ((far - near) / 2) and a single addition of another constant ((far + near) / 2). Compare that to the multiply, add and divide (the divide possibly converted to a multiply by an optimizing compiler) required in imallett's code.
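Putting the snippets above together, a minimal sketch of the full round trip might look like this (u_projection and perturbed_eye_pos are illustrative names for the projection matrix uniform and the eye-space position after the depth perturbation has been applied):
uniform mat4 u_projection;

void writePerturbedDepth(vec3 perturbed_eye_pos)
{
    vec4 clip_pos = u_projection * vec4(perturbed_eye_pos, 1.0);
    float ndc_depth = clip_pos.z / clip_pos.w;

    // Map NDC [-1, 1] to window depth using gl_DepthRange
    // (near = 0.0, far = 1.0 by default).
    gl_FragDepth = ((gl_DepthRange.far - gl_DepthRange.near) * ndc_depth
                    + gl_DepthRange.near + gl_DepthRange.far) / 2.0;
}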