I'm looking for a way to access OpenGL states from a shader. The GLSL Quick Reference Guide, an awesome resource, couldn't really help me out on this one.
In the example I'm working on, I have the following two shaders:
Vertex:
void main()
{
    gl_FrontColor = gl_Color;
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
Fragment:
uniform sampler2D tex;
uniform float flattening;
void main( void )
{
    vec4 texel;
    texel = texture2D(tex, gl_TexCoord[0].st);
    texel.r *= gl_Color.r;
    texel.g *= gl_Color.g;
    texel.b *= gl_Color.b;
    texel.a *= gl_Color.a;
    gl_FragColor = texel;
}
When I'm rendering polygons that are not textured, their alpha values are correct, but they're assigned the color black.
1. What conditional check can I set up so that the variable texel is set to vec4(1.0, 1.0, 1.0, 1.0) rather than sampled from a texture when GL_TEXTURE_2D is disabled?
2. Would processing be faster if I wrote different shaders for the different texturing modes and switched between them where I would otherwise call glEnable/glDisable(GL_TEXTURE_2D)?
Sorry, but you can't access that kind of state from GLSL, period.
In fact, in future GLSL versions you have to send all uniforms/attributes yourself, i.e. there is no automagic gl_ModelViewMatrix, gl_LightPosition, gl_Normal or the like. Only basic stuff like gl_Position and gl_FragColor will be available.
That sort of voids your second question, but you could always use #ifdefs to enable/disable parts of your shader if you find that more convenient than writing separate shaders for different texture modes.
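For example, here's a minimal sketch of the #ifdef approach (the TEXTURED macro name is just an illustration; the application would prepend a "#define TEXTURED" line to the fragment source before compiling the textured variant):
uniform sampler2D tex;
void main( void )
{
#ifdef TEXTURED
    // textured variant: sample as before
    vec4 texel = texture2D(tex, gl_TexCoord[0].st);
#else
    // untextured variant: behave as if a plain white texture were bound
    vec4 texel = vec4(1.0);
#endif
    gl_FragColor = texel * gl_Color;
}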
Related: do note that branching is generally rather slow, so if you have the need for speed, avoid it as much as possible. (This matters especially for irregular dynamic branching, since fragments are processed in SIMD blocks and all fragments in a block must execute the same instructions, even if they only apply to one or a few fragments.)
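If you do want a single shader with a runtime switch, one branch-free option is a uniform flag combined with mix() (a sketch; useTexture is a hypothetical uniform your application sets to 0.0 or 1.0):
uniform sampler2D tex;
uniform float useTexture; // hypothetical flag: 1.0 when a texture is bound, 0.0 otherwise
void main( void )
{
    // both operands are evaluated, but there is no divergent branch
    vec4 texel = mix(vec4(1.0), texture2D(tex, gl_TexCoord[0].st), useTexture);
    gl_FragColor = texel * gl_Color;
}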
One method I've seen is to bind a 1x1 texture with a single texel of (1.0, 1.0, 1.0, 1.0) when rendering polygons that are not textured. A texture switch should be less expensive than a shader switch and a 1x1 texture will fit entirely inside the texture cache.
I'm trying to implement omni-directional shadow mapping by following this tutorial from LearnOpenGL. The idea is simple: in the shadow pass we capture the scene from the light's perspective into a cubemap (the shadow map), and we can use a geometry shader to build the depth cubemap in a single render pass. Here's the shader code for generating the shadow map:
vertex shader
#version 330 core
layout (location = 0) in vec3 aPos;
uniform mat4 model;
void main() {
    gl_Position = model * vec4(aPos, 1.0);
}
geometry shader
#version 330 core
layout (triangles) in;
layout (triangle_strip, max_vertices=18) out;
uniform mat4 shadowMatrices[6];
out vec4 FragPos; // FragPos from GS (output per emitvertex)
void main() {
    for (int face = 0; face < 6; ++face) {
        gl_Layer = face; // built-in variable that specifies to which face we render.
        for (int i = 0; i < 3; ++i) { // for each triangle vertex
            FragPos = gl_in[i].gl_Position;
            gl_Position = shadowMatrices[face] * FragPos;
            EmitVertex();
        }
        EndPrimitive();
    }
}
fragment shader
#version 330 core
in vec4 FragPos;
uniform vec3 lightPos;
uniform float far_plane;
void main() {
    // get distance between fragment and light source
    float lightDistance = length(FragPos.xyz - lightPos);
    // map to [0;1] range by dividing by far_plane
    lightDistance = lightDistance / far_plane;
    // write this as modified depth
    gl_FragDepth = lightDistance;
}
Compared to classic shadow mapping, the main difference here is that we are explicitly writing to the depth buffer, with linear depth values between 0.0 and 1.0. Using this code I can correctly cast shadows in my own scene, but I don't fully understand the fragment shader, and I think this code is flawed. Here is why:
Imagine that we have 3 spheres sitting on a floor, and a point light above the spheres. Looking down at the floor from the point light, we can see the -z slice of the shadow map (in RenderDoc textures are displayed bottom-up, sorry for that).
If we write gl_FragDepth = lightDistance in the fragment shader, we are manually updating the depth buffer, so the hardware cannot perform the early-z test. As a result, every fragment goes through our shader code to update the depth buffer; no fragment is discarded early to save performance. Now what if we draw the floor after the spheres?
The sphere fragments will write to the depth buffer first (per sample), followed by the floor fragments, but since the floor is farther away from the point light, it will overwrite the depth values of the sphere with larger values, and the shadow map will be incorrect. In this case the drawing order matters: distant objects must be drawn first, but it's not always possible to sort complex geometry by depth. Perhaps we need something like order-independent transparency here?
To make sure that only the closest depth values are written to the shadow map, I modified the fragment shader a little bit:
// solution 1
gl_FragDepth = min(gl_FragDepth, lightDistance);
// solution 2
if (lightDistance < gl_FragDepth) {
    gl_FragDepth = lightDistance;
}
// solution 3
gl_FragDepth = 1.0;
gl_FragDepth = min(gl_FragDepth, lightDistance);
However, according to the OpenGL specification, none of them is going to work. Solution 2 cannot work because, if we update gl_FragDepth manually, we must update it in all execution paths. As for solution 1: when we clear the depth buffer with glClearNamedFramebufferfv(id, GL_DEPTH, 0, &clear_depth), the depth buffer is filled with clear_depth, which is usually 1.0, but the initial value of the gl_FragDepth variable is not clear_depth; it is actually undefined, so it could be anything between 0 and 1. On my driver the initial value is 0, so gl_FragDepth = min(0.0, lightDistance) is 0 and the shadow map comes out completely black. Solution 3 also won't work because we are still overwriting the previous depth value.
I learned that for OpenGL 4.2 and above, we can enforce the early z test by redeclaring the gl_FragDepth variable using:
layout (depth_<condition>) out float gl_FragDepth;
Since my depth comparison function is the default glDepthFunc(GL_LESS), the condition needs to be depth_greater in order for the hardware to do early z. Unfortunately, this also won't work, because we are writing linear depth values to the buffer, which are always less than the default non-linear depth value gl_FragCoord.z, so the condition is really depth_less. Now I'm completely stuck; the depth buffer seems to be far more difficult than I thought.
Where might my reasoning be wrong?
You said:
The sphere fragments will write to the depth buffer first (per sample), followed by the floor fragments, but since the floor is farther away from the point light, it will overwrite the depth values of the sphere with larger values, and the shadow map will be incorrect.
But if your fragment shader is not using early depth tests, then the hardware will perform depth testing after the fragment shader has executed.
From the OpenGL 4.6 specification, section 14.9.4:
When...the active program was linked with early fragment tests disabled, these operations [including depth buffer test] are performed only after fragment program execution
So if you write to gl_FragDepth in the fragment shader, the hardware cannot take advantage of the speed gain of early depth testing, as you said, but that doesn't mean that depth testing won't occur. So long as you are using GL_LESS or GL_LEQUAL for the depth test, objects that are further away won't obscure objects that are closer.
I'm rendering light spheres for a deferred renderer and I'm in the process of switching to instancing for better performance. I have the following vertex shader:
in vec3 position;
uniform int test_index;
uniform mat4 projectionMatrix;
uniform mat4 viewMatrix;
uniform mat4 modelMatrix[256];
void main(void) {
    gl_Position = projectionMatrix * viewMatrix * modelMatrix[test_index] * vec4(position, 1.0);
}
I upload the matrices to the shader with
val location = glGetUniformLocation(programId, "modelMatrix[$i]")
glUniformMatrix4fv(location, false, buf)
When I use the int uniform in the index (Hardcoded to a 0 for debugging purposes), the sphere disappears, except when I clip into geometry (in which case it renders as a white circle). The same happens when I use gl_InstanceID as my index.
Weirdly I noticed that the problem also occurs when I pass an int from vertex to fragment shader and use it there for something completely different, regardless of what I use as the index.
The problem disappears instantly and rendering is completely fine when I hardcode modelMatrix[0] in the shader instead of modelMatrix[test_index].
I've got a different shader (for skeletal animation) which uploads a mat4 uniform array the exact same way, also being indexed with an int, but I've got no such problems there...
I don't really know what to make of this, so any advice on how I can debug this is much appreciated. I'm using OpenGL 3.3 with Kotlin + LWJGL.
Edit: This probably has nothing to do with the uniform. The following also does not work:
int i = 0;
gl_Position = projectionMatrix * viewMatrix * modelMatrix[i] * vec4(position, 1.0f);
OpenGL has a limit on how many uniform components a shader stage can use (the same applies to attributes, but that's not the problem here). An array of 256 mat4s is 4096 float components, which is very likely to exceed the allowed amount (GL_MAX_VERTEX_UNIFORM_COMPONENTS).
The reason the code only breaks when using the int uniform is that GLSL compilers do a lot of optimization under the hood, for example removing unused uniforms. So if you hardcode the array index in the shader, the compiler notices that only a single matrix is ever used and may remove all the others.
When you need more uniforms than OpenGL allows, you have to use a uniform buffer object (UBO) or a shader storage buffer object (SSBO).
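On the GLSL side, a minimal sketch of the UBO route might look like this (block and variable names are illustrative; on the host you would fill a buffer with the 256 matrices and attach it to the block's binding point via glUniformBlockBinding and glBindBufferBase):
#version 330 core
// A std140 uniform block is backed by a buffer object, so it is not
// counted against the default uniform block's component limit.
layout (std140) uniform ModelMatrices {
    mat4 modelMatrix[256];
};
in vec3 position;
uniform mat4 projectionMatrix;
uniform mat4 viewMatrix;
void main(void) {
    gl_Position = projectionMatrix * viewMatrix * modelMatrix[gl_InstanceID] * vec4(position, 1.0);
}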
I would like to use shaders to increase my in-game performance.
As you can see, I am cutting the far-away pixels out of rendering using discard.
However, for some reason this is not increasing FPS. I get the exact same 5 FPS that I got without any shaders at all! The low FPS is because I chose to render an absolute ton of trees, but since they are not seen... then why are they causing lag?!
My fragment shader code:
varying vec4 position_in_view_space;
uniform sampler2D color_texture;
void main()
{
    float dist = distance(position_in_view_space, vec4(0.0, 0.0, 0.0, 1.0));
    if (dist < 1000.0)
    {
        gl_FragColor = gl_Color;
        // color near origin
    }
    else
    {
        //kill;
        discard;
        //gl_FragColor = vec4(0.3, 0.3, 0.3, 1.0);
        // color far from origin
    }
}
My vertex shader code:
varying vec4 position_in_view_space;
void main()
{
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_FrontColor = gl_Color;
    position_in_view_space = gl_ModelViewMatrix * gl_Vertex;
    // transformation of gl_Vertex from object coordinates
    // to view coordinates
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
(I have a very slightly modified fragment shader for the terrain texturing and a lower distance for the trees; the ship in that screenshot is too far from the trees. However, it is basically the same idea.)
So, could someone please tell me why discard is not improving performance? If it can't, is there another way to make unwanted vertices not render (or render fewer vertices farther away) to increase performance?
Using discard will actually prevent the GPU from using the depth buffer before invoking the fragment shader.
It's much simpler to adjust the far plane of the projection matrix so unwanted vertices fall outside the viewing volume.
Also consider not calling glDraw* for distant trees in the first place.
You can, for example, group the trees per island, then check per island whether it is close enough for its trees to be visible, and simply skip the glDraw* call when it's not.
I recently read (for the first time) that passing an array of texture coordinates to the fragment shader for multiple lookups of a sampler is much quicker than calculating them in the fragment shader, because OpenGL can then prefetch these pixels. Is that true? If so, what are the limitations? Do I have to tell OpenGL that these coordinates will be used as texture coordinates?
As with most performance questions, the answer is dependent on the OpenGL implementation. I know of at least one implementation which does have this optimization and one which doesn't.
In the implementations I know anything about, the prefetch happens only if you use a fragment shader input (i.e. a variable declared with varying or in) directly as an argument to a texture function. So:
uniform sampler2D diffuse_texture;
in vec2 tex_coord;
void main()
{
    gl_FragColor = texture( diffuse_texture, tex_coord );
}
would qualify, but:
gl_FragColor = texture( diffuse_texture, 0.5 * tex_coord + 0.5 );
would not. Note that in this case the calculation could easily be done in the vertex shader. Also the compiler might be smart enough to calculate the texture coordinate early. The case most likely to hurt performance is where one set of texture coordinates is dependent on the result of another texture look-up:
vec2 tex_coord2 = texture( offset_texture, tex_coord ).rg;
gl_FragColor = texture( diffuse_texture, tex_coord2 );
Now the second texture look up can't happen until the first one completes.
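As noted above, the scale-and-bias case can simply be moved into the vertex shader, so the fragment shader goes back to passing an unmodified input straight to texture() (a sketch; the attribute, varying, and uniform names are made up):
// vertex shader: do the per-vertex math here
uniform mat4 mvp;       // assumed model-view-projection matrix
in vec3 in_position;
in vec2 in_tex_coord;
out vec2 tex_coord;
void main()
{
    tex_coord = 0.5 * in_tex_coord + 0.5; // scale and bias once per vertex
    gl_Position = mvp * vec4(in_position, 1.0);
}
The fragment shader then stays exactly as in the first snippet, keeping the prefetch-friendly form.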
You don't have to tell OpenGL that the coordinates will be used as texture coordinates.
I am trying to make a custom light shader and was trying a lot of different things over time.
Some of the solutions I found work better, others worse. For this question I'm using the solution which worked best so far.
My problem is, that if I move the "camera" around, the light positions seems to move around, too. This solution has very slight but noticeable movement in it and the light position seems to be above where it should be.
Default OpenGL lighting (w/o any shaders) works fine (steady light positions) but I need the shader for multitexturing and I'm planning on using portions of it for lighting effects once it's working.
Vertex Source:
varying vec3 vlp, vn;
void main(void)
{
    gl_Position = ftransform();
    vn = normalize(gl_NormalMatrix * -gl_Normal);
    vlp = normalize(vec3(gl_LightSource[0].position.xyz) - vec3(gl_ModelViewMatrix * -gl_Vertex));
    gl_TexCoord[0] = gl_MultiTexCoord0;
}
Fragment Source:
uniform sampler2D baseTexture;
uniform sampler2D teamTexture;
uniform vec4 teamColor;
varying vec3 vlp, vn;
void main(void)
{
    vec4 newColor = texture2D(teamTexture, vec2(gl_TexCoord[0]));
    newColor = newColor * teamColor;
    float teamBlend = newColor.a;
    // mixing the textures and colorizing them. this works, I tested it w/o lighting!
    vec4 outColor = mix(texture2D(baseTexture, vec2(gl_TexCoord[0])), newColor, teamBlend);
    // apply lighting
    outColor *= max(dot(vn, vlp), 0.0);
    outColor.a = texture2D(baseTexture, vec2(gl_TexCoord[0])).a;
    gl_FragColor = outColor;
}
What am I doing wrong?
I can't be certain any of these are the problem, but they could cause one.
First, you need to normalize your per-vertex vn and vlp (BTW, try to use more descriptive variable names. viewLightPosition is a lot easier to understand than vlp). I know you normalized them in the vertex shader, but the fragment shader interpolation will denormalize them.
Second, this isn't particularly wrong so much as redundant: vec3(gl_LightSource[0].position.xyz). The position.xyz is already a vec3, since the swizzle mask (.xyz) only has 3 components, so you don't need to construct a vec3 from it again.
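Not necessarily the whole fix, but for the first point, here's a sketch of the fragment shader with the vectors re-normalized after interpolation (the texture blending is just condensed, not changed):
uniform sampler2D baseTexture;
uniform sampler2D teamTexture;
uniform vec4 teamColor;
varying vec3 vlp, vn;
void main(void)
{
    vec4 newColor = texture2D(teamTexture, gl_TexCoord[0].st) * teamColor;
    vec4 outColor = mix(texture2D(baseTexture, gl_TexCoord[0].st), newColor, newColor.a);
    // re-normalize: linear interpolation across the triangle shortens varying vectors
    vec3 n = normalize(vn);
    vec3 l = normalize(vlp);
    outColor.rgb *= max(dot(n, l), 0.0);
    outColor.a = texture2D(baseTexture, gl_TexCoord[0].st).a;
    gl_FragColor = outColor;
}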