Motion Vector - how to calculate it properly? - opengl

I'm trying to wrap my head around calculating motion vectors (also called velocity buffer). I found this tutorial, but I'm not satisfied with explanations of how motion vector are calculated. Here is the code:
vec2 a = (vPosition.xy / vPosition.w) * 0.5 + 0.5;
vec2 b = (vPrevPosition.xy / vPrevPosition.w) * 0.5 + 0.5;
oVelocity = a - b;
Why are we multiplying our position vectors by 0.5 and then adding 0.5? I'm guessing that we're trying to get from clip space to NDC, but why? I completly don't understand that.

This is a mapping from the [-1, 1] clip space onto the [0, 1] texture space. Since lookups in the blur shader have to read from a textured at a position offset by the velocity vector, it's necessary to perform this conversion.
Note, that the + 0.5 part is actually unnecessary, since it cancels out in a-b anyway. So the same result would have been achieved by using something like
vec2 a = (vPosition.xy / vPosition.w);
vec2 b = (vPrevPosition.xy / vPrevPosition.w);
oVelocity = (a - b) * 0.5;
I don't know if there is any reason to prefer the first over the second, but my guess is that this code is written in the way it is because it builds up on a previous tutorial where the calculation had been the same.

Related

SceneKit - get position in screen coordinate

How can I get the coordinate of a point in the screen referential from the geometry shader entry point? SceneKit exposes a bunch of matrix transformations and their inverses (https://developer.apple.com/reference/scenekit/scnshadable > Using Inputs Provided by SceneKit) but I could not figure out how to use them...
for instance, in the geometry entry point that's executed in the vertex shader, I've tried:
varying vec2 screen_coords;
#pragma body
screen_coords = (u_inverseModelViewProjectionTransform * _geometry.position).xy
but that doesn't seem to work :(
To clarify, I'd like screen_coords to be in [0, 1] * [0, 1] if the point is in the screen depending on it's exact position, and something else if not.
It took me a while, but here it is:
vec4 screenPos = u_modelViewProjectionTransform * _geometry.position;
screen_coords = (screenPos.xy / screenPos.w + 1.0) * 0.5;
Explanation:
screenPos roughly corresponds to the coordinate in screen space. screenPos.xy / screenPos.w rescale those coordinate. Then weirdly this gives input [-1, 1] * [-1, 1] so I finally added an offset and scale to get the result in [0, 1] * [0, 1]
For anyone that stumbles on this in the future, in metal I think you can get the screen coords in the surface shader just by grabbing
in.fragmentPosition.xy * scn_frame.inverseResolution.xy
See here for more info

DirectX Converting Pixel World Position to Shadow Map Position Gives Weird, Tiled Results

I've been trying for some time now to get a screen-space pixel (provided by a deferred HLSL shader) to convert to light space. The results have been surprising to me as my light rendering seems to be tiling the depth buffer.
Importantly, the scene camera (or eye) and the light being rendered from start in the same position.
First, I extract the world position of the pixel using the code below:
float3 eye = Eye;
float4 position = {
IN.texCoord.x * 2 - 1,
(1 - IN.texCoord.y) * 2 - 1,
zbuffer.r,
1
};
float4 hposition = mul(position, EyeViewProjectionInverse);
position = float4(hposition.xyz / hposition.w, hposition.w);
float3 eyeDirection = normalize(eye - position.xyz);
The result seems to be correct as rendering the XYZ position as RGB respectively yields this (apparently correct) result:
The red component seems to be correctly outputting X as it moves to the right, and blue shows Z moving forward. The Y factor also looks correct as the ground is slightly below the Y axis.
Next (and to be sure I'm not going crazy), I decided to output the original depth buffer. Normally I keep the depth buffer in a Texture2D called DepthMap passed to the shader as input. In this case, however, I try to undo the pixel transformation by offsetting it back into the proper position and multiplying it by the eye's view-projection matrix:
float4 cpos = mul(position, EyeViewProjection);
cpos.xyz = cpos.xyz / cpos.w;
cpos.x = cpos.x * 0.5f + 0.5f;
cpos.y = 1 - (cpos.y * 0.5f + 0.5f);
float camera_depth = pow(DepthMap.Sample(Sampler, cpos.xy).r, 100); // Power 100 just to visualize the map since scales are really tiny
return float4(camera_depth, camera_depth, camera_depth, 1);
This yields a correct looking result as well (though I'm not 100% sure about the Z value). Also note that I've made the results exponential to better visualize the depth information (this is not done when attempting live comparisons):
So theoretically, I can use the same code to convert that pixel world position to light space by multiplying by the light's view-projection matrix. Correct? Here's what I tried:
float4 lpos = mul(position, ShadowLightViewProjection[0]);
lpos.xyz = lpos.xyz / lpos.w;
lpos.x = lpos.x * 0.5f + 0.5f;
lpos.y = 1 - (lpos.y * 0.5f + 0.5f);
float shadow_map_depth = pow(ShadowLightMap[0].Sample(Sampler, lpos.xy).r, 100); // Power 100 just to visualize the map since scales are really tiny
return float4(shadow_map_depth, shadow_map_depth, shadow_map_depth, 1);
And here's the result:
And another to show better how it's mapping to the world:
I don't understand what is going on here. It seems it might have something to do with the projection matrix, but I'm not that good with math to know for sure what is happening. It's definitely not the width/height of the light map as I've tried multiple map sizes and the projection matrix is calculated using FOV and aspect ratios never inputing width/height ever.
Finally, here's some C++ code showing how my perspective matrix (used for both eye and light) is calculated:
const auto ys = std::tan((T)1.57079632679f - (fov / (T)2.0));
const auto xs = ys / aspect;
const auto& zf = view_far;
const auto& zn = view_near;
const auto zfn = zf - zn;
row1(xs, 0, 0, 0);
row2(0, ys, 0, 0);
row3(0, 0, zf / zfn, 1);
row4(0, 0, -zn * zf / zfn, 0);
return *this;
I'm completely at a loss here. Any guidance or recommendations would be greatly appreciated!
EDIT - I also forgot to mention that the tiled image is upside down as if the y flip broke it. That's strange to me as it's required to get it back to eye texture space correctly.
I did some tweaking and fixed things here and there. Ultimately, my biggest issue was an unexpectedly transposed matrix. It's a bit complicated as to how the matrix got transposed, but that's why things were flipped. I also changed to D32 depth buffers (though I'm not sure that helped any) and made sure that any positions divided by their W affected all component (including W).
So code like this: hposition.xyz = hposition.xyz / hposition.w
became this: hposition = hposition / hposition.w
After all this tweaking, it's starting to look more like a shadow map.
Oh and the transposed matrix was the ViewProjection of the light.

Best way to interpolate triangle surface using 3 positions and normals for ray tracing

I am working on conventional Whitted ray tracing, and trying to interpolate surface of hitted triangle as if it was convex instead of flat.
The idea is to treat triangle as a parametric surface s(u,v) once the barycentric coordinates (u,v) of hit point p are known.
This surface equation should be calculated using triangle's positions p0, p1, p2 and normals n0, n1, n2.
The hit point itself is calculated as
p = (1-u-v)*p0 + u*p1 + v*p2;
I have found three different solutions till now.
Solution 1. Projection
The first solution I came to. It is to project hit point on planes that come through each of vertexes p0, p1, p2 perpendicular to corresponding normals, and then interpolate the result.
vec3 r0 = p0 + dot( p0 - p, n0 ) * n0;
vec3 r1 = p1 + dot( p1 - p, n1 ) * n1;
vec3 r2 = p2 + dot( p2 - p, n2 ) * n2;
p = (1-u-v)*r0 + u*r1 + v*r2;
Solution 2. Curvature
Suggested in a paper of Takashi Nagata "Simple local interpolation of surfaces using normal vectors" and discussed in question "Local interpolation of surfaces using normal vectors", but it seems to be overcomplicated and not very fast for real-time ray tracing (unless you precompute all necessary coefficients). Triangle here is treated as a surface of the second order.
Solution 3. Bezier curves
This solution is inspired by Brett Hale's answer. It is about using some interpolation of the higher order, cubic Bezier curves in my case.
E.g., for an edge p0p1 Bezier curve should look like
B(t) = (1-t)^3*p0 + 3(1-t)^2*t*(p0+n0*adj) + 3*(1-t)*t^2*(p1+n1*adj) + t^3*p1,
where adj is some adjustment parameter.
Computing Bezier curves for edges p0p1 and p0p2 and interpolating them gives the final code:
float u1 = 1 - u;
float v1 = 1 - v;
vec3 b1 = u1*u1*(3-2*u1)*p0 + u*u*(3-2*u)*p1 + 3*u*u1*(u1*n0 + u*n1)*adj;
vec3 b2 = v1*v1*(3-2*v1)*p0 + v*v*(3-2*v)*p2 + 3*v*v1*(v1*n0 + v*n2)*adj;
float w = abs(u-v) < 0.0001 ? 0.5 : ( 1 + (u-v)/(u+v) ) * 0.5;
p = (1-w)*b1 + w*b2;
Alternatively, one can interpolate between three edges:
float u1 = 1.0 - u;
float v1 = 1.0 - v;
float w = abs(u-v) < 0.0001 ? 0.5 : ( 1 + (u-v)/(u+v) ) * 0.5;
float w1 = 1.0 - w;
vec3 b1 = u1*u1*(3-2*u1)*p0 + u*u*(3-2*u)*p1 + 3*u*u1*( u1*n0 + u*n1 )*adj;
vec3 b2 = v1*v1*(3-2*v1)*p0 + v*v*(3-2*v)*p2 + 3*v*v1*( v1*n0 + v*n2 )*adj;
vec3 b0 = w1*w1*(3-2*w1)*p1 + w*w*(3-2*w)*p2 + 3*w*w1*( w1*n1 + w*n2 )*adj;
p = (1-u-v)*b0 + u*b1 + v*b2;
Maybe I messed something in code above, but this option does not seem to be very robust inside shader.
P.S. The intention is to get more correct origins for shadow rays when they are casted from low-poly models. Here you can find the resulted images from test scene. Big white numbers indicates number of solution (zero for original image).
P.P.S. I still wonder if there is another efficient solution which can give better result.
Keeping triangles 'flat' has many benefits and simplifies several stages required during rendering. Approximating a higher order surface on the other hand introduces quite significant tracing overhead and requires adjustments to your BVH structure.
When the geometry is being treated as a collection of facets on the other hand, the shading information can still be interpolated to achieve smooth shading while still being very efficient to process.
There are adaptive tessellation techniques which approximate the limit surface (OpenSubdiv is a great example). Pixar's Photorealistic RenderMan has a long history using subdivision surfaces. When they switched their rendering algorithm to path tracing, they've also introduced a pretessellation step for their subdivision surfaces. This stage is executed right before rendering begins and builds an adaptive triangulated approximation of the limit surface. This seems to be more efficient to trace and tends to use less resources, especially for the high-quality assets used in this industry.
So, to answer your question. I think the most efficient way to achieve what you're after is to use an adaptive subdivision scheme which spits out triangles instead of tracing against a higher order surface.
Dan Sunday describes an algorithm that calculates the barycentric coordinates on the triangle once the ray-plane intersection has been calculated. The point lies inside the triangle if:
(s >= 0) && (t >= 0) && (s + t <= 1)
You can then use, say, n(s, t) = nu * s + nv * t + nw * (1 - s - t) to interpolate a normal, as well as the point of intersection, though n(s, t) will not, in general, be normalized, even if (nu, nv, nw) are. You might find higher order interpolation necessary. PN-triangles were a similar hack for visual appeal rather than mathematical precision. For example, true rational quadratic Bezier triangles can describe conic sections.

Calculate clipspace.w from clipspace.xyz and (inv) projection matrix

I'm using a logarithmic depth algorithmic which results in someFunc(clipspace.z) being written to the depth buffer and no implicit perspective divide.
I'm doing RTT / postprocessing so later on in a fragment shader I want to recompute eyespace.xyz, given ndc.xy (from the fragment coordinates) and clipspace.z (from someFuncInv() on the value stored in the depth buffer).
Note that I do not have clipspace.w, and my stored value is not clipspace.z / clipspace.w (as it would be when using fixed function depth) - so something along the lines of ...
float clip_z = ...; /* [-1 .. +1] */
vec2 ndc = vec2(FragCoord.xy / viewport * 2.0 - 1.0);
vec4 clipspace = InvProjMatrix * vec4(ndc, clip_z, 1.0));
clipspace /= clipspace.w;
... does not work here.
So is there a way to calculate clipspace.w out of clipspace.xyz, given the projection matrix or it's inverse?
clipspace.xy = FragCoord.xy / viewport * 2.0 - 1.0;
This is wrong in terms of nomenclature. "Clip space" is the space that the vertex shader (or whatever the last Vertex Processing stage is) outputs. Between clip space and window space is normalized device coordinate (NDC) space. NDC space is clip space divided by the clip space W coordinate:
vec3 ndcspace = clipspace.xyz / clipspace.w;
So the first step is to take our window space coordinates and get NDC space coordinates. Which is easy:
vec3 ndcspace = vec3(FragCoord.xy / viewport * 2.0 - 1.0, depth);
Now, I'm going to assume that your depth value is the proper NDC-space depth. I'm assuming that you fetch the value from a depth texture, then used the depth range near/far values it was rendered with to map it into a [-1, 1] range. If you didn't, you should.
So, now that we have ndcspace, how do we compute clipspace? Well, that's obvious:
vec4 clipspace = vec4(ndcspace * clipspace.w, clipspace.w);
Obvious and... not helpful, since we don't have clipspace.w. So how do we get it?
To get this, we need to look at how clipspace was computed the first time:
vec4 clipspace = Proj * cameraspace;
This means that clipspace.w is computed by taking cameraspace and dot-producting it by the fourth row of Proj.
Well, that's not very helpful. It gets more helpful if we actually look at the fourth row of Proj. Granted, you could be using any projection matrix, and if you're not using the typical projection matrix, this computation becomes more difficult (potentially impossible).
The fourth row of Proj, using the typical projection matrix, is really just this:
[0, 0, -1, 0]
This means that the clipspace.w is really just -cameraspace.z. How does that help us?
It helps by remembering this:
ndcspace.z = clipspace.z / clipspace.w;
ndcspace.z = clipspace.z / -cameraspace.z;
Well, that's nice, but it just trades one unknown for another; we still have an equation with two unknowns (clipspace.z and cameraspace.z). However, we do know something else: clipspace.z comes from dot-producting cameraspace with the third row of our projection matrix. The traditional projection matrix's third row looks like this:
[0, 0, T1, T2]
Where T1 and T2 are non-zero numbers. We'll ignore what these numbers are for the time being. Therefore, clipspace.z is really just T1 * cameraspace.z + T2 * cameraspace.w. And if we know cameraspace.w is 1.0 (as it usually is), then we can remove it:
ndcspace.z = (T1 * cameraspace.z + T2) / -cameraspace.z;
So, we still have a problem. Actually, we don't. Why? Because there is only one unknown in this euqation. Remember: we already know ndcspace.z. We can therefore use ndcspace.z to compute cameraspace.z:
ndcspace.z = -T1 + (-T2 / cameraspace.z);
ndcspace.z + T1 = -T2 / cameraspace.z;
cameraspace.z = -T2 / (ndcspace.z + T1);
T1 and T2 come right out of our projection matrix (the one the scene was originally rendered with). And we already have ndcspace.z. So we can compute cameraspace.z. And we know that:
clispace.w = -cameraspace.z;
Therefore, we can do this:
vec4 clipspace = vec4(ndcspace * clipspace.w, clipspace.w);
Obviously you'll need a float for clipspace.w rather than the literal code, but you get my point. Once you have clipspace, to get camera space, you multiply by the inverse projection matrix:
vec4 cameraspace = InvProj * clipspace;

GLSL gl_FragCoord.z Calculation and Setting gl_FragDepth

So, I've got an imposter (the real geometry is a cube, possibly clipped, and the imposter geometry is a Menger sponge) and I need to calculate its depth.
I can calculate the amount to offset in world space fairly easily. Unfortunately, I've spent hours failing to perturb the depth with it.
The only correct results I can get are when I go:
gl_FragDepth = gl_FragCoord.z
Basically, I need to know how gl_FragCoord.z is calculated so that I can:
Take the inverse transformation from gl_FragCoord.z to eye space
Add the depth perturbation
Transform this perturbed depth back into the same space as the original gl_FragCoord.z.
I apologize if this seems like a duplicate question; there's a number of other posts here that address similar things. However, after implementing all of them, none work correctly. Rather than trying to pick one to get help with, at this point, I'm asking for complete code that does it. It should just be a few lines.
For future reference, the key code is:
float far=gl_DepthRange.far; float near=gl_DepthRange.near;
vec4 eye_space_pos = gl_ModelViewMatrix * /*something*/
vec4 clip_space_pos = gl_ProjectionMatrix * eye_space_pos;
float ndc_depth = clip_space_pos.z / clip_space_pos.w;
float depth = (((far-near) * ndc_depth) + near + far) / 2.0;
gl_FragDepth = depth;
For another future reference, this is the same formula as given by imallett, which was working for me in an OpenGL 4.0 application:
vec4 v_clip_coord = modelview_projection * vec4(v_position, 1.0);
float f_ndc_depth = v_clip_coord.z / v_clip_coord.w;
gl_FragDepth = (1.0 - 0.0) * 0.5 * f_ndc_depth + (1.0 + 0.0) * 0.5;
Here, modelview_projection is 4x4 modelview-projection matrix and v_position is object-space position of the pixel being rendered (in my case calculated by a raymarcher).
The equation comes from the window coordinates section of this manual. Note that in my code, near is 0.0 and far is 1.0, which are the default values of gl_DepthRange. Note that gl_DepthRange is not the same thing as the near/far distance in the formula for perspective projection matrix! The only trick is using the 0.0 and 1.0 (or gl_DepthRange in case you actually need to change it), I've been struggling for an hour with the other depth range - but that is already "baked" in my (perspective) projection matrix.
Note that this way, the equation really contains just a single multiply by a constant ((far - near) / 2) and a single addition of another constant ((far + near) / 2). Compare that to multiply, add and divide (possibly converted to a multiply by an optimizing compiler) that is required in the code of imallett.