Manual perspective division - glsl

I needed to do a manual perspective division on something other than gl_Position, when I noticed my results were off, so I did some experiments.
I've noticed that if I let it use the default transformation, the resulting depth value (gl_Position.z) is correct:
void main() {
    gl_Position = MVP * vertexPosition;
}
However, when trying to do the perspective division manually, the depth value is not the same, even though it should be equivalent(?):
void main() {
    gl_Position = MVP * vertexPosition;
    gl_Position /= gl_Position.w;
    gl_Position.w = 1.0;
}
So, does the perspective division work differently in Vulkan? If so, how?
The depth range for the viewport is set to [0,1], in case that has any effect on this.

Vulkan has a number of invariance rules, specified in Appendix D. None of these rules state that the fixed-function vertex post-processing steps outlined in Chapter 23 will be invariant with the vertex shader attempting to do the same thing. As such, you cannot expect these two operations to yield invariant results.
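For the original goal (a manual division on something other than gl_Position), a minimal sketch could look like the following. It leaves gl_Position undivided so the fixed-function stage still does its own division, and writes the manually divided value to a separate output. The push-constant block and the name ndcPosition are assumptions for illustration; the question does not show how MVP is bound. The manually divided value may still differ from the fixed-function result in the low bits, for the reason given above.

```glsl
#version 450

layout(location = 0) in vec4 vertexPosition;

// Hypothetical push-constant block; how MVP is bound is not shown in the question.
layout(push_constant) uniform PushConstants {
    mat4 MVP;
} pc;

// Hypothetical output. noperspective makes it interpolate linearly in
// window space, which is how the already-divided position behaves.
layout(location = 0) noperspective out vec3 ndcPosition;

void main() {
    vec4 clip = pc.MVP * vertexPosition;

    // Leave gl_Position in clip space; the fixed-function stage divides by w.
    gl_Position = clip;

    // Manual perspective division on a copy, for use elsewhere.
    ndcPosition = clip.xyz / clip.w;
}
```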


Understanding shadow maps in OpenGL

I'm trying to implement omni-directional shadow mapping by following this tutorial from learnOpenGL. The idea is simple: in the shadow pass, we capture the scene from the light's perspective into a cubemap (the shadow map), and we can use a geometry shader to build the depth cubemap in a single render pass. Here's the shader code for generating the shadow map:
vertex shader
#version 330 core
layout (location = 0) in vec3 aPos;
uniform mat4 model;
void main() {
    gl_Position = model * vec4(aPos, 1.0);
}
geometry shader
#version 330 core
layout (triangles) in;
layout (triangle_strip, max_vertices=18) out;
uniform mat4 shadowMatrices[6];
out vec4 FragPos; // FragPos from GS (output per emitvertex)
void main() {
    for (int face = 0; face < 6; ++face) {
        gl_Layer = face; // built-in variable that specifies to which face we render.
        for (int i = 0; i < 3; ++i) { // for each triangle vertex
            FragPos = gl_in[i].gl_Position;
            gl_Position = shadowMatrices[face] * FragPos;
            EmitVertex();
        }
        EndPrimitive();
    }
}
fragment shader
#version 330 core
in vec4 FragPos;
uniform vec3 lightPos;
uniform float far_plane;
void main() {
    // get distance between fragment and light source
    float lightDistance = length(FragPos.xyz - lightPos);
    // map to [0;1] range by dividing by far_plane
    lightDistance = lightDistance / far_plane;
    // write this as modified depth
    gl_FragDepth = lightDistance;
}
Compared to classic shadow mapping, the main difference here is that we are explicitly writing to the depth buffer, with linear depth values between 0.0 and 1.0. Using this code I can correctly cast shadows in my own scene, but I cannot fully understand the fragment shader, and I think this code is flawed, here is why:
Imagine that we have 3 spheres sitting on a floor, and a point light above the spheres. Looking down at the floor from the point light, we can see the -z slice of the shadow map (in RenderDoc textures are displayed bottom up, sorry for that).
If we write gl_FragDepth = lightDistance in the fragment shader, we are manually updating the depth buffer so the hardware cannot perform the early z test, as a result, every fragment will go through our shader code to update the depth buffer, no fragment is discarded early to save performance. Now what if we draw the floor after the spheres?
The sphere fragments will write to the depth buffer first (per sample), followed by the floor fragments, but since the floor is farther away from the point light, it will overwrite the depth values of the sphere with larger values, and the shadow map will be incorrect. In this case, the order of drawing is important, distant objects must be drawn first, but it's not always possible to sort depth values for complex geometry. Perhaps we need something like order-independent transparency here?
To make sure that only the closest depth values are written to the shadow map, I modified the fragment shader a little bit:
// solution 1
gl_FragDepth = min(gl_FragDepth, lightDistance);
// solution 2
if (lightDistance < gl_FragDepth) {
    gl_FragDepth = lightDistance;
}
// solution 3
gl_FragDepth = 1.0;
gl_FragDepth = min(gl_FragDepth, lightDistance);
However, according to the OpenGL specification, none of them is going to work. Solution 2 cannot work because, if we were to update gl_FragDepth manually, we must update it in all execution paths. As for solution 1, when we clear the depth buffer using glClearNamedFramebufferfv(id, GL_DEPTH, 0, &clear_depth), the depth buffer will be filled with value clear_depth, which is usually 1.0, but the default value of gl_FragDepth variable is not the same as clear_depth, it is actually undefined, so could be anything between 0 and 1. On my driver the default value is 0, so gl_FragDepth = min(0.0, lightDistance) is 0, the shadow map will be completely black. Solution 3 also won't work because we are still overwriting the previous depth value.
I learned that for OpenGL 4.2 and above, we can enforce the early z test by redeclaring the gl_FragDepth variable using:
layout (depth_<condition>) out float gl_FragDepth;
Since my depth comparison function is the default glDepthFunc(GL_LESS), the condition needs to be depth_greater in order for the hardware to do early z. Unfortunately, this also won't work, as we are writing linear depth values to the buffer, which are always less than the default non-linear depth value gl_FragCoord.z, so the condition is really depth_less. Now I'm completely stuck; the depth buffer seems to be way more difficult than I thought.
Where might my reasoning be wrong?
You said:
The sphere fragments will write to the depth buffer first (per sample),
followed by the floor fragments, but since the floor is farther away from the
point light, it will overwrite the depth values of the sphere with larger
values, and the shadow map will be incorrect.
But if your fragment shader is not using early depth tests, then the hardware will perform depth testing after the fragment shader has executed.
From the OpenGL 4.6 specification, section 14.9.4:
When...the active program was linked with early fragment tests disabled,
these operations [including depth buffer test] are performed only after
fragment program execution
So if you write to gl_FragDepth in the fragment shader, the hardware cannot take advantage of the speed gain of early depth testing, as you said, but that doesn't mean that depth testing won't occur. So long as you are using GL_LESS or GL_LEQUAL for the depth test, objects that are further away won't obscure objects that are closer.
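To see why keeping only the smallest lightDistance per texel is what matters, here is a rough sketch of how a lighting pass might consume the cubemap. The names WorldPos and depthMap and the bias value are assumptions for illustration, not taken from the question.

```glsl
#version 330 core

in vec3 WorldPos;              // hypothetical: world-space fragment position
uniform samplerCube depthMap;  // hypothetical name for the depth cubemap
uniform vec3 lightPos;
uniform float far_plane;
out vec4 FragColor;

void main() {
    vec3 fragToLight = WorldPos - lightPos;

    // The depth test kept the nearest lightDistance / far_plane per texel,
    // so scaling by far_plane recovers the distance to the closest occluder.
    float closestDepth = texture(depthMap, fragToLight).r * far_plane;
    float currentDepth = length(fragToLight);

    float bias = 0.05; // illustrative value, scene dependent
    float shadow = (currentDepth - bias > closestDepth) ? 1.0 : 0.0;

    // Visualize: white where lit, black where shadowed.
    FragColor = vec4(vec3(1.0 - shadow), 1.0);
}
```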

OpenGL vertex shader for pinhole camera model

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model (as defined for example here). Currently I use the vertex shader to map the 3D vertices to the clip space, where K in the shader contains [focal length x, focal length y, principal point x, principal point y] and zrange is the depth range of the vertices.
#version 330 core
layout (location = 0) in vec3 vin;
layout (location = 1) in vec3 cin;
layout (location = 2) in vec3 nin;
out vec3 shader_pos;
out vec3 shader_color;
out vec3 shader_normal;
uniform vec4 K;
uniform vec2 zrange;
uniform vec2 imsize;
void main() {
vec3 uvd;
uvd.x = (K[0] * vin.x + K[2] * vin.z) / vin.z;
uvd.y = (K[1] * vin.y + K[3] * vin.z) / vin.z;
uvd.x = 2 * uvd.x / (imsize[0]) - 1;
uvd.y = 2 * uvd.y / (imsize[1]) - 1;
uvd.z = 2 * (vin.z - zrange[0]) / (zrange[1] - zrange[0]) - 1;
shader_pos = uvd;
shader_color = cin;
shader_normal = nin;
    gl_Position = vec4(uvd, 1.0);
}
I verify the renderings with a simple ray-tracer, however there seems to be an offset stemming from my OpenGL implementation. The depth values are different, but not by an affine offset as it would be caused by a wrong remapping (see the slanted surface on the tetrahedron, ignoring the errors on the edges).
I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model.
A standard perspective projection matrix already implements a pinhole camera model. What you're doing here is just having more calculations per vertex, which could all be pre-calculated on the CPU and put in a single matrix.
The only difference is the z range. But a "pinhole camera" does not have a z range, all points are projected to the image plane. So what you want here is a pinhole camera model for x and y, and a linear mapping for z.
However, your implementation is wrong. A GPU will interpolate z linearly in window space. That means it will calculate the barycentric coordinates of each fragment with respect to the 2D projection of the triangle in the window. However, when using a perspective projection, and when the triangle is not exactly parallel to the image plane, those barycentric coordinates will not be the ones the respective 3D point would have had with respect to the actual 3D primitive before the projection.
The trick here is that since in screen space we typically have x/z and y/z as the vertex coordinates, and we interpolate linearly in between those, we also have to interpolate 1/z for the depth. In reality, however, we don't divide by z but by w (and let the projection matrix set w_clip = [+/-]z_eye for us). After the division by w_clip, we get a hyperbolic mapping of the z value, but with the nice property that it can be linearly interpolated in window space.
What this means is that with your linear z mapping, your primitives would now have to be bent along the z dimension to get the correct result. Look at the following top-down view of the situation. The "lines" represent flat triangles, viewed from straight above:
In eye space, the view rays would all go from the origin through each pixel (we could imagine the 2D pixel raster on the near plane, for example). In NDC, we have transformed this to an orthographic projection. The pixels can still be imagined at the near plane, but all view rays are now parallel.
In the standard hyperbolic mapping, the point in the middle of the frustum is compressed strongly towards the far end. However, the triangle is still flat.
If you use a linear mapping instead, your triangle would no longer be flat. Look for example at the intersection point between the two triangles. It must have the same x (and y) coordinate as in the hyperbolic case, for the correct result.
However, you only transform the vertices according to a linear z value, and the GPU will still linearly interpolate the result. So in your case, you get straight connections between your transformed points, the intersection point between the two triangles is moved, and your depth values are all wrong except at the actual vertices themselves.
If you want to use a linear depth buffer, you have to correct the depth of each fragment in the fragment shader, to implement the required non-linear interpolation on your own. Doing so would break a lot of the clever depth test optimizations GPUs do, notably early Z and hierarchical Z, so while it is possible, you'll lose some performance.
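A minimal sketch of such a per-fragment correction is given here. It assumes the vertex shader writes a proper clip-space w (so that varyings are interpolated perspective-correctly) and passes the eye-space depth in a varying; the names eyeZ and fragColor are hypothetical, while zrange is the uniform from the question.

```glsl
#version 330 core

in float eyeZ;        // hypothetical varying: eye-space depth, perspective-correct
uniform vec2 zrange;  // [near, far], as in the question
out vec4 fragColor;

void main() {
    // Linear depth in [0,1], computed per fragment instead of per vertex.
    float linearDepth = (eyeZ - zrange[0]) / (zrange[1] - zrange[0]);

    // Writing gl_FragDepth disables early Z for this shader.
    gl_FragDepth = clamp(linearDepth, 0.0, 1.0);

    fragColor = vec4(1.0);
}
```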
The much better solution is: just use a standard hyperbolic depth value, and linearize the depth values after you read them back. Also, don't do the z division in the vertex shader. You not only break z this way, you also break the perspective-corrected interpolation of the varyings, so your shading will also be wrong. Let the GPU do the division; just put the correct value into gl_Position.w. The GPU will internally not only do the divide, the perspective-corrected interpolation also depends on w.
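As a rough sketch of that advice, the vertex shader from the question could be rearranged as follows. It assumes the camera looks along +z (as the division by vin.z in the question implies) and that zrange holds positive near and far values; the hyperbolic z term is one possible choice that maps near to -1 and far to +1 after the division.

```glsl
#version 330 core

layout (location = 0) in vec3 vin;

uniform vec4 K;       // [fx, fy, cx, cy], as in the question
uniform vec2 zrange;  // [near, far], assumed 0 < near < far
uniform vec2 imsize;

void main() {
    float n = zrange[0];
    float f = zrange[1];

    vec4 clip;
    // Same x/y mapping as in the question, multiplied through by vin.z
    // so that the GPU's division by w reproduces it.
    clip.x = 2.0 * (K[0] * vin.x + K[2] * vin.z) / imsize[0] - vin.z;
    clip.y = 2.0 * (K[1] * vin.y + K[3] * vin.z) / imsize[1] - vin.z;

    // Hyperbolic depth: clip.z / clip.w is -1 at z = n and +1 at z = f.
    clip.z = ((f + n) * vin.z - 2.0 * n * f) / (f - n);

    // Let the GPU divide by the eye-space depth.
    clip.w = vin.z;

    gl_Position = clip;
}
```

With this mapping and the default depth range, a depth value d read back from the buffer can be linearized as z = 2nf / ((f + n) - (2d - 1)(f - n)).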

Smooth gradient in fragment shader

I am looking for a way to get a smooth gradient with a fragment shader. I have a palette with 11 colors and a value that is used to select the color (it lies in the range from 0.0 to 1.0).
I am trying to get a smooth color transition with this fragment shader:
#version 150 core
in float value;
uniform vec3 colors[11];
out vec4 out_Color;
void main(void) {
    int index = int(round(value * 10));
    int floorIndex = 0;
    if (index != 0) floorIndex = index - 1;
    out_Color = vec4(mix(colors[floorIndex], colors[index], value), 1.0f);
}
But with this approach I only get a stepped color distribution.
And my desirable result looks like:
I know how to get this with a pass-through shader by just passing the color as an attribute, but that is not what I want. I want to get such a smooth distribution with a single float value passed to the fragment shader.
Your mixing function is not really usefully applied here:
mix(colors[floorIndex], colors[index], value)
The problem is that, while value is in [0,1], it is not the proper mixing factor. You need it to be scaled to [0,1] for the sub-range you have selected. E.g., your code uses floorIndex=2 and index=3 when value is in [0.25, 0.35), so now you need a mix factor which is 0.0 when value is 0.25, 0.5 when it is 0.3, and approaching 1.0 as it reaches 0.35 (at 0.35 the round will switch to the next step, of course).
So you will need something like:
float blend=(value * 10.0) - float(floorIndex) - 0.5;
mix(colors[floorIndex], colors[index], blend)
That can probably be optimized a bit by using the fract() operation.
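One way this simplification might look is sketched here. Note that it is a variant, not the exact formula above: it uses truncation instead of round(), so palette entry i sits exactly at value = i / 10, and the blend factor is the fractional part of the scaled value.

```glsl
#version 150 core

in float value;
uniform vec3 colors[11];
out vec4 out_Color;

void main(void) {
    // Scale [0,1] to [0,10]; each unit interval spans two adjacent colors.
    float scaled = clamp(value, 0.0, 1.0) * 10.0;

    // Lower palette index, capped at 9 so that lo + 1 stays in range.
    int lo = min(int(scaled), 9);

    // Fractional position inside the interval (equals fract(scaled),
    // except at value == 1.0, where it is 1.0).
    float blend = scaled - float(lo);

    out_Color = vec4(mix(colors[lo], colors[lo + 1], blend), 1.0);
}
```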
Another thing which comes to mind is that you could use a (1D) texture for your palette and enable GL_LINEAR filtering. In that case, you can directly use value as texture coordinate and will get the result you need. This will be much simpler, and likely also more efficient as it moves most of the operations to the specialised texture sampling hardware. So if you don't have a specific reason for not using a texture, I strongly recommend doing that.
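A minimal sketch of the texture-based variant follows. It assumes the 11 colors have been uploaded to a 1D texture with GL_LINEAR filtering; the sampler name palette is hypothetical. Using value directly as the coordinate works, and the small remapping shown here is an optional refinement so that 0.0 and 1.0 land exactly on the centers of the first and last texels.

```glsl
#version 150 core

in float value;
uniform sampler1D palette;  // hypothetical: 11-texel palette, GL_LINEAR filtering
out vec4 out_Color;

void main(void) {
    // Number of texels in the palette at mip level 0.
    float n = float(textureSize(palette, 0));

    // Map [0,1] onto the span between the first and last texel centers.
    float coord = (clamp(value, 0.0, 1.0) * (n - 1.0) + 0.5) / n;

    // The texture unit performs the linear blend between adjacent colors.
    out_Color = vec4(texture(palette, coord).rgb, 1.0);
}
```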

OpenGL GLSL SSAO Implementation

I try to implement Screen Space Ambient Occlusion (SSAO) based on the R5 Demo found here:
In Fact I try to adapt their SSAO - Linear shader to fit into my own little engine.
1) I calculate View Space surface normals and Linear depth values.
I Store them in a RGBA texture using the following shader:
// vertex shader
varNormalVS = normalize(vec3(vmtInvTranspMatrix * vertexNormal));
depth = (modelViewMatrix * vertexPosition).z;
depth = (-depth - nearPlane) / (farPlane - nearPlane);
gl_Position = pvmtMatrix * vertexPosition;
// fragment shader
gl_FragColor = vec4(varNormalVS.x, varNormalVS.y, varNormalVS.z, depth);
For my linear depth calculation I referred to:
Is it correct?
Texture seem to be correct, but maybe it is not?
2) The actual SSAO Implementation:
As mentioned above the original can be found here:
or faster: on pastebin
In contrast to the original I only use 2 input textures, since one of my textures stores both the normals as RGB and the linear depth as alpha.
My second Texture, the random normal texture, looks like this:
I use almost exactly the same implementation but my results are wrong.
Before going into detail I want to clear some questions first:
1) The SSAO shader uses projectionMatrix and its inverse matrix.
Since it is a post processing effect rendered onto a screen aligned quad via orthographic projection, the projectionMatrix is the orthographic matrix. Correct or Wrong?
2) Having a combined normal and depth texture instead of two separate ones.
In my opinion this is the biggest difference between the R5 implementation and my implementation attempt. I think this should not be a big problem; however, because the depth textures differ, this is the most likely cause of problems.
Please note that R5_clipRange looks like this
vec4 R5_clipRange = vec4(nearPlane, farPlane, nearPlane * farPlane, farPlane - nearPlane);
float GetDistance (in vec2 texCoord) {
    //return texture2D(R5_texture0, texCoord).r * R5_clipRange.w;
    const vec4 bitSh = vec4(1.0 / 16777216.0, 1.0 / 65535.0, 1.0 / 256.0, 1.0);
    return dot(texture2D(R5_texture0, texCoord), bitSh) * R5_clipRange.w;
}
I have to admit I do not understand the code snippet. My depth is stored in the alpha channel of my texture, and I thought it should be enough to just do this:
return texture2D(texSampler0, texCoord).a * R5_clipRange.w;
Correct or Wrong?
Your normal texture seems wrong. My guess is that your vmtInvTranspMatrix is a model-view matrix. However it should be model-view-projection matrix (note you need screen space normals, not view space normals). The depth calculation is correct.
I've implemented SSAO once and the normal texture looks like this (note there is no blue here):
1) The SSAO shader uses projectionMatrix and its inverse matrix.
Since it is a post processing effect rendered onto a screen aligned quad via orthographic projection, the projectionMatrix is the orthographic matrix. Correct or Wrong ?
If you mean the second pass where you are rendering a quad to compute the actual SSAO, yes. You can avoid the multiplication by the orthographic projection matrix altogether. If you render a screen quad with [x,y] dimensions ranging from -1 to 1, you can use a really simple vertex shader:
const vec2 madd = vec2(0.5, 0.5);
void main(void) {
    gl_Position = vec4(in_Position, -1.0, 1.0);
    texcoord = in_Position.xy * madd + madd;
}
2) Having a combined normal and depth texture instead of two separate ones.
Nah, that won't cause problems. It's a common practice to do so.

In OpenGL vertex shader, gl_Position doesn't get homogenized

I was expecting gl_Position to automatically get homogenized (divided by w), but that does not seem to happen. Why do the following two versions produce different results?
// 1
void main() {
    vec4 p;
    ... omitted ...
    gl_Position = projectionMatrix * p;
}
// 2
void main() {
    ... same as above ...
    p = projectionMatrix * p;
    gl_Position = p / p.w;
}
I think the two are supposed to generate the same results, but that does not seem to be the case. 1 doesn't work while 2 works as expected. Could it possibly be a precision problem? Am I missing something? This is driving me almost crazy. Help needed. Many thanks in advance!
The perspective divide cannot be done before clipping, which happens after the vertex shader has completed. So there is no way you could observe the w divide in the vertex shader.
The GL will, however, do the perspective divide before the triangles are rasterized, and thus before the fragment shader runs.
What are you trying to do that does not work?
From the GLSL spec 1.2:
The variable gl_Position is available
only in the vertex language and is
intended for writing the
homogeneous vertex position.
So it's not automatically homogenized.