Why is gl_FragCoord.w 1/w? [duplicate] - opengl

The gl_FragCoord variable stores four components: (x, y, z, 1/w)
What is the w coordinate and why is it stored as 1/w?

The GLSL and OpenGL specifications are needlessly confusing in this regard. The OpenGL spec is easier to understand: gl_FragCoord stores the X, Y, and Z components of the window-space vertex position. The X, Y, and Z values are calculated as described for computing the window-space position (though the pixel-center and upper-left origin can modify the X and Y values). This is described in the Coordinate Transforms section of the spec.
The W component of gl_FragCoord is (1 / Wc), where Wc is the W component of the clip-space vertex position, i.e. gl_Position.w from your vertex shader.
The only useful purpose for keeping Wc around is to reverse-transform gl_FragCoord to get the clip-space position back. Which, as that page shows, requires multiplying by Wc. But since gl_FragCoord only stores the inverse of this value, it now requires dividing by gl_FragCoord.w.
Therefore, we can assume that OpenGL stores it this way because OpenGL isn't allowed to make too much sense ;) See, it's a rule that every part of the OpenGL specification must have something that's a bit nonsensical. The XYZ components made too much sense, so they decided to have it store the inverse of the value you actually want.
OK, technically this is a historical artifact from the days when 3D Labs created GLSL. I'm sure they did it for purely selfish hardware reasons, but I have no real proof of that.

A homogeneous coordinate is given by: (x, y, z, w), which projects to: (x/w, y/w, z/w). gl_FragCoord stores this projection, but rather than storing the (useless) (w/w) = (1) for the last component, it stores (1/w), to preserve useful information.
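To make the reverse transform concrete, here is a minimal fragment-shader sketch. It assumes the default [-1,1] NDC convention and glDepthRange(0, 1); the u_viewport uniform holding the glViewport values is an assumption of this sketch, not something GLSL provides:

    #version 330 core
    uniform vec4 u_viewport;   // (x, y, width, height) as passed to glViewport - assumed uniform
    out vec4 fragColor;

    void main()
    {
        // Window space -> NDC: undo the viewport transform (default glDepthRange(0, 1) assumed).
        vec3 ndc;
        ndc.xy = 2.0 * (gl_FragCoord.xy - u_viewport.xy) / u_viewport.zw - 1.0;
        ndc.z  = 2.0 * gl_FragCoord.z - 1.0;

        // NDC -> clip space: multiply by Wc. Since gl_FragCoord.w stores 1/Wc,
        // this is a division by gl_FragCoord.w.
        float Wc = 1.0 / gl_FragCoord.w;
        vec4 clipPos = vec4(ndc * Wc, Wc);

        fragColor = vec4(clipPos.xyz, 1.0); // output something so the computation isn't dead code
    }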

Related

When is perspective division necessary?

I'm very confused about when it is necessary for a homogeneous coordinate (x, y, z, w) to get divided by w (i.e. converted to (x/w, y/w, z/w, 1)).
According to this page, when a homogeneous vertex coordinate is passed through an orthographic or perspective projection matrix, the coordinate will be in clip coordinates. Perspective division is necessary to get from clip coordinates to NDC. It will produce x, y, and z values in the range (-1, 1).
I was following WebGL tutorial pages on orthographic and perspective projection. These tutorials didn't mention a single word about perspective division. I'm not sure if the division is still necessary after multiplying by the projection matrix. Perhaps the division is performed automatically during the matrix multiplication?
The perspective divide (the conversion from clip space to NDC space) is neither necessary nor unnecessary; it is a part of the graphics pipeline. It happens automatically to every vertex that passes through the system.
You can make it into a no-op by setting a vertex's W component to 1.0 (which is what the orthographic projection matrix does, assuming the input position had a W of 1.0). But the division always happens.
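Conceptually, the step the hardware performs after the vertex shader looks like this (a sketch of the fixed-function divide, not code you write yourself; the function name is hypothetical):

    // What happens to gl_Position after the vertex shader, conceptually.
    // If clipPos.w == 1.0 (the usual orthographic case), the divide is a no-op.
    vec3 clipToNdc(vec4 clipPos)
    {
        return clipPos.xyz / clipPos.w;   // perspective divide: clip space -> NDC
    }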

OpenGL: confused by the unproject function settings

I need to implement raycasting. For this I need to convert the mouse cursor to world space.
I use the unproject function for this. I need to first find a point on the near plane, then one on the far plane. After that, subtract the first from the second and we get a ray.
But I don't understand how to set winZ correctly. Because in some implementations I see two ways: winZ = 0 (near plane), winZ = 1 (far plane) or winZ = -1 (near plane), winZ = 1 (far plane).
What's the difference in these ranges?
If it really is window-space z, then there is only the [0,1] range. Note that 0 doesn't necessarily mean the near plane and 1 the far plane, though. One can set up a projection matrix where near will end up at 1 and far at 0 (reversed z, which, in combination with the [0,1] z clip condition explained below, has some precision advantages).
Also note that glDepthRange can further modify the values (inside [0,1]) to which these two planes will be mapped.
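Concretely, with glDepthRange(n, f) and the default [-1,1] NDC depth convention, the viewport transform maps NDC depth to window depth as

    z_window = (f - n) / 2 * z_ndc + (f + n) / 2

so the defaults n = 0, f = 1 give z_window = (z_ndc + 1) / 2.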
To understand the unproject operation, you first need to understand the different coordinate spaces. Typically in a render API, you have to deal with these 3 spaces at the end of the transform chain:
clip space: This is what the output of the vertex shader is in, and where the actual clipping happens, at least on a conceptual level. This space is still homogeneous, with an arbitrary value for the w coordinate.
normalized device coordinates (NDC). This is what you get after the perspective division by the clip space w coordinate (after the clipping has been applied, which will eliminate the w<=0 cases completely).
window space. The 2D xy part is the actual pixel coordinates inside your output window (or your render target), and the transformation from NDC xy to window-space xy is defined by the viewport settings. The z coordinate is the value that goes into the depth test and the depth buffer; it is in the range [0,1] or some sub-range of that (controlled via glDepthRange in the GL).
Other spaces before these in this list, like model/object space, world space, eye/view space, are completely up to you and do not concern the GPU directly at all (legacy fixed-function GL did care about eye space for lighting and fog, but nowadays, implementing this is all your job, and you can use whatever spaces you see fit).
Having established the spaces, the next relevant thing here is the viewing volume. This is the volume of the space which will actually be mapped to your viewport, and it is a 3D volume bounded by the six planes: left, right, bottom, top, near, far.
However, the actual view volume is set up by pure convention in the various render APIs (and the actual GPU hardware).
Since you tagged this question with "OpenGL", I'm going to begin with the default GL convention first:
The standard GL convention is that the view volume is the completely symmetrical [-1,1] cube in NDC. This means that the clip condition in clip space is -w <= x, y, z <= w.
Direct3D uses a different convention: it uses [-1,1] in NDC for x and y just like GL does, but [0,1] for the z dimension. This also means that the depth range transformation into window space can be the identity in many cases (you seldom need to limit it to a sub-range of [0,1]). This convention has some numerical advantages because the GL convention of moving the [-1,1] range to [0,1] for window space makes it lose precision around the (NDC) zero point.
Modern GL since GL 4.5 optionally allows you to switch to the [0,1] convention for z via glClipControl. Vulkan also supports both conventions, but uses [0,1] as the default.
There is not "the" unproject function, but the concept of "unprojecting" a point means calculating these transformations in reverse, going from window space back to some space before clip space, undoing the projection matrix. For implementing an unproject function, you need to know which conventions were used.
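As an illustration, a minimal unproject under the default GL conventions ([-1,1] NDC cube, glDepthRange(0, 1)) could look like the following. The same math works on the CPU; it is written here as a GLSL function, and the names winPos, viewport and invViewProj are assumptions of this sketch:

    // winPos:      window-space position (x, y in pixels, z in [0,1])
    // viewport:    (x, y, width, height) as passed to glViewport
    // invViewProj: inverse(projection * view)
    vec3 unproject(vec3 winPos, vec4 viewport, mat4 invViewProj)
    {
        // Window space -> NDC (default depth range: 0 = near, 1 = far).
        vec3 ndc;
        ndc.xy = 2.0 * (winPos.xy - viewport.xy) / viewport.zw - 1.0;
        ndc.z  = 2.0 * winPos.z - 1.0;

        // NDC -> clip -> world: apply the inverse matrix, then undo the
        // perspective divide by dividing by the resulting w.
        vec4 world = invViewProj * vec4(ndc, 1.0);
        return world.xyz / world.w;
    }

Unprojecting the mouse position once with winPos.z = 0 and once with winPos.z = 1 gives a point on the near and far plane; their difference is the ray direction.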
Because in some implementations I see two ways: winZ = 0 (near plane), winZ = 1 (far plane) or winZ = -1 (near plane), winZ = 1 (far plane). What's the difference in these ranges?
Maybe they are not taking in a window space Z, but NDC z directly. Maybe the parameters are just named in a confusing or wrong manner. Maybe some of the implementations out there are just flat-out wrong.

Perspective divide: Why use the w component?

In OpenGL, I have read that a vertex should be represented by (x,y,z,w), where w = z. This is to enable perspective divide, whereby (x,y,z) are divided by w in order to determine their screen position due to the perspective effect. And if they were just divided by the original z value, then the z would be 1 everywhere.
My question is: Why do you need to divide the z component by w at all? Why can you not just divide the x and y components by z, such that the screen coordinates have the perspective effect applied, and then just use the original unmodified z component for depth testing? In this way, you would not have to use the w component at all....
Obviously I am missing something!
3D computer graphics is typically handled with homogeneous coordinates and in a projective vector space. The math behind this is a bit more than "just divide by w".
Using 4D homogeneous vectors and 4x4 matrices has the nice advantage that all sorts of affine transformations (and this includes especially the translation, which also relies on w) and projective transforms can be represented by simple matrix multiplications.
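For example, a translation only fits into a matrix multiplication because of the homogeneous w = 1 component:

    | 1 0 0 tx |   | x |   | x + tx |
    | 0 1 0 ty | * | y | = | y + ty |
    | 0 0 1 tz |   | z |   | z + tz |
    | 0 0 0 1  |   | 1 |   |   1    |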
In OpenGL, I have read that a vertex should be represented by
(x, y, z, w), where w = z.
That is not true. A vertex should be represented by (x, y, z, w), where w is just w. In your typical case, the input w is actually 1, so it is usually not stored in the vertex data, but added on demand in the shaders etc.
Your typical projection matrix will set w_clip = -z_eye. But that is a different thing. This means that you just project along the -z direction in eye space. You could also put w_clip = 2*x_eye - 3*y_eye + 4*z_eye there, and your axis of projection would have the direction (2, -3, 4, 0).
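For reference, the matrix built by gluPerspective(fovy, aspect, zn, zf) is (with c = cot(fovy / 2)):

    | c/aspect  0   0                  0                |
    | 0         c   0                  0                |
    | 0         0   (zf+zn)/(zn-zf)    2*zf*zn/(zn-zf)  |
    | 0         0   -1                 0                |

and the last row (0, 0, -1, 0) is exactly what produces w_clip = -z_eye.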
My question is: Why do you need to divide the z component by w at all? Why can you not just divide the x and y components by z, such that the screen coordinates have the perspective effect applied, and then just used the original unmodified z component for depth testing?
Conceptually, the space is distorted along all 3 dimensions, not just in x and y. Furthermore, in the beginning, GPUs had just 16 bit or 24 bit integer precision for the depth buffer. In such a scenario, you definitely want to have a denser representation near the camera and a sparser one far away.
Nowadays, with programmable vertex shaders and floating-point depth buffer formats, you can basically just store the z_eye value in the depth buffer, and use this for depth testing. However, this is typically referred to as W buffering, because the (clip space) w component is used.
There is another conceptual issue if you divided by z: you wouldn't be able to use an orthographic projection; you would always force some kind of perspective. Now one might argue that the division by z doesn't have to happen automatically, but could be applied when needed in the vertex shader. But this won't work either. You must not apply the perspective divide in the vertex shader, because that would project points which lie behind the camera to in front of the camera. As the vertex shader does not work on whole primitives, this would completely screw up any primitive where at least one vertex lies behind the camera and another one lies in front of it. To deal with that situation, the clipping has to be applied before the divide - hence the name clip space.
In this way, you would not have to use the w component at all.
That is also not true. The w component is further used down the pipeline. It is essential for perspective-correct attribute interpolation.
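To sketch what that means: when an attribute a is interpolated in screen space between two vertices with clip-space w values w0 and w1, the perspective-correct value at screen-space fraction t is

    a(t) = ((1 - t) * a0 / w0 + t * a1 / w1) / ((1 - t) / w0 + t / w1)

i.e. the rasterizer interpolates a/w and 1/w linearly and divides, which is why it needs the per-vertex w (and why gl_FragCoord.w hands you 1/w in the fragment shader).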

Why do I divide Z by W in a perspective projection in OpenGL?

I guess this is more a math question than it is an OpenGL one, but I digress. Anyways, if the whole purpose of the perspective divide is to get usable x and y coordinates, why bother dividing z by w? Also how do I get w in the first place?
Actually, the explanation has much more to do with the limitations of the depth buffer than it does math.
At its simplest, "the depth buffer is a texture in which each on-screen pixel is assigned a grayscale value depending on its distance from the camera. This allows visual effects to easily alter with distance." Source
More accurately, a depth buffer is a texture containing the value of z/w for each fragment, where:
Z is the distance from the near clipping plane to the fragment.
W is the distance from the camera to the fragment.
In the relationship between z, w, and z/w, n is equal to the zNear parameter passed to gluPerspective, or an equivalent function, and f is equal to the zFar parameter passed to the same function.
At a glance, this system looks unintuitive. But as a result, z/w is always a floating-point value between 0 (0/n, at the near plane) and 1 (f/f, at the far plane), and can therefore be represented as a single channel of a texture.
A second important note: the depth buffer is nonlinear, meaning an object exactly in between the near and far clipping planes is nowhere near a value of 0.5 in the depth buffer. For typical near/far values it corresponds to a value of roughly 0.999 in the depth buffer. Depending on your view, this could be good or bad; you may want the depth buffer to be more detailed close-up (which it is), or offer even detail throughout (which it doesn't).
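As a worked example (assuming the standard gluPerspective matrix and the default glDepthRange(0, 1)), the depth value for a fragment at eye-space distance d is

    z/w = f * (d - n) / (d * (f - n))

so with n = 1, f = 1000 and a point exactly halfway between the planes (d = 500.5), this gives about 0.999.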
TL;DR:
You divide z by w so it is always in the range [0, 1].
W is the distance from the camera to the fragment.

What does the 1/w coordinate stand for in gl_FragCoord?
