How Many Shader Programs Do I Really Need?

Let's say I have a shader set up to use 3 textures, and I need to render some polygon that needs all the same shader attributes except that it requires only 1 texture. I have noticed on my own graphics card that I can simply call glDisableVertexAttribArray() to disable the other two textures, and that doing so apparently causes the disabled texture data received by the fragment shader to be all white (1.0f). In other words, if I have a fragment shader instruction (pseudo-code)...
final_red = tex0.red * tex1.red * tex2.red
...the operation produces the desired final value regardless whether I have 1, 2, or 3 textures enabled. From this comes a number of questions:
Is it legit to disable expected textures like this, or is it a coincidence that my particular graphics card has this apparent mathematical safeguard?
Is the "best practice" to create a separate shader program that only expects a single texture for single texture rendering?
If either approach is valid, is there a benefit to creating a second shader program? I'm thinking it would cost less time to make 2 glDisableVertexAttribArray() calls than to make a glUseProgram() call plus 5-6 glGetUniform() calls, but maybe #4 addresses that issue.
When changing the active shader program with glUseProgram() do I need to call glGetUniform... functions every time to re-establish the location of each uniform in the program, or is the location of each expected to be consistent until the shader program is deallocated?

Disabling vertex attributes would not really disable your textures; it would just give you undefined texture coordinates. That might produce an effect similar to disabling a certain texture, but to do this properly you should use a uniform, or possibly subroutines (if you have dozens of variations of the same shader).
As for the time taken to disable a vertex array state, that is probably going to be slower than changing a uniform value. Setting uniform values doesn't really affect the render pipeline state; they're just small writes to memory. Likewise, constantly swapping the current GLSL program does things like invalidate the shader cache, so that is also significantly more expensive than setting a uniform value.
If you're on a modern GL implementation (GL 4.1+ or one that implements GL_ARB_separate_shader_objects) you can even set uniform values without binding a GLSL program at all, simply by calling glProgramUniform* (...)
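For illustration, a minimal sketch of that, assuming a linked program handle prog and a hypothetical uniform u_texture_count that the fragment shader uses to decide how many textures to sample:
GLint loc = glGetUniformLocation(prog, "u_texture_count"); // query once after linking
glProgramUniform1i(prog, loc, 1); // update the value without calling glUseProgram first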
I am most concerned with the fact that you think you need to call glGetUniformLocation (...) each time you set a uniform's value. The only time the location of a uniform in a GLSL program changes is when you link it. Assuming you don't constantly re-link your GLSL program, you only need to query those locations once and store them persistently.
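A minimal sketch of that pattern (the uniform names here are hypothetical):
// right after glLinkProgram() succeeds: query once and keep the locations around
GLint loc_mvp  = glGetUniformLocation(prog, "u_mvp");
GLint loc_tex0 = glGetUniformLocation(prog, "u_tex0");

// every frame: no further glGetUniformLocation calls are needed
glUseProgram(prog);
glUniformMatrix4fv(loc_mvp, 1, GL_FALSE, mvp);
glUniform1i(loc_tex0, 0);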

OpenGL: using instanced drawing to draw with the framebuffer I'm drawing into [duplicate]

I'm trying to basically "add" framebuffers, or rather the color texture attachments of framebuffers. I found that one way to do this is to have a shader which takes all the textures as inputs and renders their combination.
But to improve performance, wouldn't it be better to have just one shader and framebuffer, and then, through instanced drawing, have the shader draw onto the same framebuffer color texture attachment it is using for drawing?
A bit better explained:
I have 2 framebuffers: Default and Framebuffer1.
I bind Framebuffer1
and give the color texture attachment of Framebuffer1 as uniform "Fb1_cta" to the following fragment shader:
out vec4 FragColor;
in vec2 TexCoords;

uniform sampler2D Fb1_cta;

void main()
{
    vec3 text = texture(Fb1_cta, TexCoords).rgb;
    FragColor = vec4(vec3(0.5) + text, 1.0);
}
So I draw into Framebuffer1, but also use its current color texture attachment for the drawing.
Now I call glDrawArraysInstanced with an instance count of 2.
The first render pass should draw the whole texture in grey (rgb = (0.5, 0.5, 0.5)) and the second should add another vec3(0.5) to that, so the result will be white. That, however, didn't really work, so I split the glDrawArraysInstanced into 2 glDrawArrays calls and checked the 2 results.
Now while the first pass works as intended [result of first rendering], the second didn't [result of second rendering] (this is, by the way, the same result as with glDrawArraysInstanced).
To me this pretty much looks like the two render passes aren't done sequentially, but in parallel. So I reran my code, this time with a bit of time passing between the calls, and that seemed to solve the issue.
Now I wonder: is there any way to tell OpenGL that those calls should truly be sequential, and might there even be a way to do it with glDrawArraysInstanced to improve performance?
Is there in general a more elegant solution to this kind of problem?
In general, you cannot read from a texture image that is also being rendered to. To achieve the level of performance necessary for real-time rendering, it is essential to take advantage of parallelism wherever possible. Fragment shader invocations are generally not processed sequentially. On a modern GPU, there will be thousands and thousands of fragment shader invocations running concurrently during rendering. Even fragment shader invocations from separate draw calls. OpenGL and GLSL are designed specifically to enable this sort of parallelization.
From the OpenGL 4.6 specification, section 9.3.1:
Specifically, the values of rendered fragments are undefined if any shader stage fetches texels and the same texels are written via fragment shader outputs, even if the reads and writes are not in the same draw call, unless any of the following exceptions apply:
- The reads and writes are from/to disjoint sets of texels (after accounting for texture filtering rules).
- There is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);).
If a texel has been written, then in order to safely read the result a texel fetch must be in a subsequent draw call separated by the command
void TextureBarrier( void );
TextureBarrier will guarantee that writes have completed and caches have been invalidated before subsequent draw calls are executed.
The OpenGL implementation is allowed to (and, as you have noticed, will actually) run multiple drawcalls concurrently if possible. Across all the fragment shader invocations that your two drawcalls produce, you do have some that read and write from/to the same sets of texels. There is more than a single read and write of each texel from different fragment shader invocations. The drawcalls are not separated by a call to glTextureBarrier(). Thus, your code produces undefined results.
A drawcall alone does not constitute a rendering pass. A rendering pass is usually understood as the whole set of operations that produce a certain piece of output (like a particular image in a framebuffer) that is then usually again consumed as an input into another pass. To make your two draw calls "truly sequential", you could call glTextureBarrier() between issuing the draw calls.
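A minimal sketch of that, assuming GL 4.5 or ARB_texture_barrier, and using placeholder names for the framebuffer and geometry:
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer1); // its color attachment is also bound as Fb1_cta
glDrawArrays(GL_TRIANGLES, 0, 6);                // first pass writes the attachment
glTextureBarrier();                              // writes completed, texture caches invalidated
glDrawArrays(GL_TRIANGLES, 0, 6);                // second pass may now safely read the first pass's result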
But if all you want to do is draw two triangles, one after the other, on top of each other into the same framebuffer, then all you have to do is draw two triangles and use additive blending. You don't need instancing. You don't need separate draw calls. Just draw two triangles. OpenGL requires blending to take place in the order in which the triangles that produced the fragments were specified. Be aware that if you happen to have depth testing enabled, chances are your depth test will prevent the second triangle from ever being drawn, unless you change the depth test function to something other than the default.
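A sketch of the blending setup, under the assumption that both triangles output vec3(0.5) and should simply be summed:
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);     // the default, shown for clarity
glBlendFunc(GL_ONE, GL_ONE);      // additive: result = src + dst
glDisable(GL_DEPTH_TEST);         // or pick a depth func that lets the second triangle through
glDrawArrays(GL_TRIANGLES, 0, 6); // two overlapping triangles in a single draw call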
The downside of blending is that you're limited to a set of a few fixed functions that you can select as your blend function. But add is one of them. If you need more complex blending functions, there are vendor-specific extensions that enable what is typically called "programmable blending" on some GPUs…
Note that all of the above only concerns draw calls that read from and write to the same target. Draw calls that read from a target that earlier draw calls rendered to are guaranteed to be sequenced after the draw calls that produced their input.

OpenGL GLSL: Output from shader to memory

I need to have some variable/object in graphics memory that can be accessed from within my fragment shader and from my normal C# code, preferably a 16-byte vec4.
What I want to do:
[In C#] Read variable from graphic memory to cpu memory
[In C#] Set variable to zero
[In C#] Execute normal drawing of my scene
[In Shader] One of the fragment passes writes something to the variable (UPDATE)
Restart the loop
(UPDATE)
I pass the current mouse coordinates to the fragment shader with uniform variables. The fragment shader then checks whether it is processing the corresponding pixel. If yes, it writes a certain color for color picking into the variable. The reason I don't write to a fragment shader output is that I simply didn't find any solution on the internet for how to get this output into my normal memory. Additionally, I would have an output for each pixel instead of one.
What I want is basically a uniform variable that a shader can write to.
Is there any kind of variable/object that fits my needs and if so how performant will it be?
A "uniform" that your shader can write to is the wrong term. Uniform means uniform (as in the same value everywhere). If a specific shader invocation is changing the value, it is not uniform anymore.
You can use atomic counters for this sort of thing; increment the counter for every test that passes and later check for non-zero state. This is vastly simpler than setting up a general-purpose Shader Storage Buffer Object and then worrying about making memory access to it coherent.
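A rough sketch of the client side of the atomic counter route; the GLSL side would declare something like layout(binding = 0) uniform atomic_uint hits; and call atomicCounterIncrement(hits) for every passing fragment. All names here are placeholders:
GLuint acbo;
GLuint zero = 0;
glGenBuffers(1, &acbo);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, acbo);
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), &zero, GL_DYNAMIC_READ);
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, acbo); // binding point 0 matches the shader declaration

// ... draw the scene ...

GLuint hits = 0; // ideally read back a frame or two later, see the note on latency below
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &hits);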
Occlusion queries are also available for older hardware. They work surprisingly similarly to atomic counters, in that you can (very roughly) count the number of fragments that pass the depth test. Do not count on their accuracy; use discard in your fragment shader for any pixel that does not pass your test condition, and then check for a non-zero fragment count in the query readback.
As for performance, as long as you can deal with a couple frames worth of latency between issuing a command and later using the result, you should be fine.
If you try to use either an atomic counter or an occlusion query and read back the result during the same frame, you will stall the pipeline and eliminate CPU/GPU parallelism.
I would suggest inserting a fence sync object into the command stream and then checking the status of the fence once per frame before attempting to read the results back. This will prevent stalling.
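A sketch of that polling pattern (variable names are hypothetical):
// right after the draw call that produces the result:
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// once per frame, later:
GLenum status = glClientWaitSync(fence, 0, 0); // timeout of 0: poll without blocking
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED)
{
    // safe to read the atomic counter / query result back now
    glDeleteSync(fence);
}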

OpenGL Texture Usage

So recently I started reading through the OpenGL wiki articles. This is how I picture OpenGL texturing as described there. Several points are still not clear to me, though.
Are the following statements true, false, or "it depends"?
Binding two textures to same texture unit is impossible.
Binding two samplers to same texture unit is impossible.
Binding one texture to two different texture units is impossible.
Binding one sampler to two different texture units is impossible.
It is application's responsibility to be clear about what sampler type is passed to what uniform variable.
It is shader program's responsibility to make sure to take sampler as correct type of uniform variable.
number of texture units are large enough. Let each mesh loaded to application occupy as much texture unit as it please.
Some Sampler parameters are duplicate of texture parameters. They will override texture parameter setting.
Some Sampler parameters are duplicate of sampler description in shader program. Shader program's description will override samplers parameters.
I'll go through your statements below. Sometimes I will argue with quotes from the OpenGL 4.5 core profile specification. None of that is specific to GL 4.5; I just chose it because it is the most recent version.
1. Binding two textures to same texture unit is impossible.
If I said "false", it would probably be misleading. The exact statement would be "Binding two textures to the same target of the same texture unit is impossible." Technically, you can, say, bind a 2D texture and a 3D texture to the same unit. But you cannot use both in the same draw call. Note that this is a dynamic error condition which depends on what values you set the sampler uniforms to. Quote from section 7.10 "Samplers" of the GL spec:
It is not allowed to have variables of different sampler types pointing to the same texture image unit within a program object. This situation can only be detected at the next rendering command issued which triggers shader invocations, and an INVALID_OPERATION error will then be generated.
So the GL will detect this error condition as soon as you actually try to draw something (or otherwise trigger shader invocations) with that shader while it is configured such that two sampler uniforms reference different targets of the same unit. But it is not an error before that. If you temporarily set both uniforms to the same value but do not try to draw in that state, no error is ever generated.
2. Binding two samplers to same texture unit is impossible.
You probably mean Sampler Objects (as opposed to just "sampler" types in GLSL), so this is true.
3. Binding one texture to two different texture units is impossible.
False. You can bind the same texture object to as many units as there are available. However, that is a fairly useless operation. Back in the days of the fixed-function pipeline, there were some corner cases where this was of limited use. For example, I've seen someone bind the same texture twice and use register combiners to multiply the two together, because he needed a square operation. However, with shaders, you can sample the texture once and do anything you want with the result, so there is really no use case left for this.
4. Binding one sampler to two different texture units is impossible.
False. A single sampler object can be referenced by multiple texture units. You can just create a sampler object for each sampling state you need, no need to create redundant ones.
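For example, a sketch of one sampler object shared by two texture units (names are placeholders):
GLuint sampler;
glGenSamplers(1, &sampler);
glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glSamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glBindSampler(0, sampler); // same sampler object on texture unit 0...
glBindSampler(1, sampler); // ...and on texture unit 1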
5. It is application's responsibility to be clear about what sampler type is passed to what uniform variable.
6. It is shader program's responsibility to make sure to take sampler as correct type of uniform variable.
I'm not really sure what exactly you are asking here. The sampler variable in your shader selects the texture target and must also match the internal data format of the texture object you want to use (i.e. for isampler or usampler, you'll need unnormalized integer texture formats, otherwise results are undefined).
But I don't know what "what sampler type is passed to what uniform variable" is supposed to mean here. As far as the GL client side is concerned, the opaque sampler uniforms are just something which can be set to the index of the texture unit which is to be used, and that is done as an integer via glUniform1i or the like. There is no "sampler type" passed to a uniform variable.
7. number of texture units are large enough. Let each mesh loaded to application occupy as much texture unit as it please.
Not in the general case. The required GL_MAX_TEXTURE_IMAGE_UNITS (which defines how many different texture units a fragment shader can access) is just 16 in the GL 4.5 spec. (There are separate limits per shader stage, so there is GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS and so on. They are all required to be at least 16 in the current spec.)
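You can query the limits of the actual implementation at run time, e.g.:
GLint max_fs_units = 0, max_combined_units = 0;
glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &max_fs_units);                // fragment shader limit
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &max_combined_units); // combined limit across all stages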
Usually, you have to switch textures between draw calls. The use of array textures and texture atlases might allow one to further reduce the number of necessary state switches (and, ultimately, draw calls).
Very modern GPUs also support GL_ARB_bindless_texture, which completely bypasses the "texture unit" layer of indirection and allows the shader to directly reference a texture object by some opaque handle (which basically boils down to referencing some virtual GPU memory address under the hood). However, that feature is not yet part of the OpenGL standard.
8. Some Sampler parameters are duplicate of texture parameters. They will override texture parameter setting.
Yes. Traditionally, there were no separate sampler objects in the GL. Instead, the sampler states like filtering or wrap modes were part of the texture object itself. But modern hardware does not operate this way, so the sampler objects API has been introduced as the GL_ARB_sampler_objects extension (nowadays, a core feature of GL). If a sampler object is bound to a texture unit, its settings will override the sampler state present in the texture object.
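As a sketch of that override behavior (tex and sampler are placeholder object names):
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);      // state stored in the texture object

glBindSampler(0, sampler);
glSamplerParameteri(sampler, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE); // state stored in the sampler object
// while "sampler" is bound to unit 0, sampling "tex" through unit 0 uses GL_CLAMP_TO_EDGE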
9. Some Sampler parameters are duplicate of sampler description in shader program. Shader program's description will override samplers parameters.
I'm not sure what you mean by that. What "sampler descriptions" does a shader program define? There is only the declaration of the uniform and possibly its initialization via layout(binding=...). However, that is just the initial value. The client can update it at any time by setting the uniform to another value, so that is not really "overriding" anything. But I'm not sure if that is what you mean.

Tessellation Shaders

I am trying to learn tessellation shaders in OpenGL 4.1. I understood most of it, but I have one question.
What is gl_InvocationID?
Can any body please explain in some easy way?
gl_InvocationID has two current uses, but it represents the same concept in both.
In Geometry Shaders, you can have GL run your geometry shader multiple times per-primitive. This is useful in scenarios where you want to draw the same thing from several perspectives. Each time the shader runs on the same set of data, gl_InvocationID is incremented.
The common theme between Geometry and Tessellation Shaders is that each invocation shares the same input data. A Tessellation Control Shader can read every single vertex in the input patch primitive, and you actually need gl_InvocationID to make sense of which data point you are supposed to be processing.
This is why you generally see Tessellation Control Shaders written something like this:
gl_out [gl_InvocationID].gl_Position = gl_in [gl_InvocationID].gl_Position;
gl_in and gl_out are potentially very large arrays in Tessellation Control Shaders (equal in size to the GL_PATCH_VERTICES patch parameter), and you have to know which vertex you are interested in.
Also, keep in mind that you are not allowed to write to any index other than gl_out [gl_InvocationID] from a Tessellation Control Shader. That property keeps invoking Tessellation Control Shaders in parallel sane (it avoids order dependencies and prevents overwriting data that a different invocation already wrote).

OpenGL 4.1 GL_ARB_separate_shader_objects usefulness

I have been reading this OpenGL 4.1 new features review. I don't really understand the idea behind GL_ARB_separate_shader_objects usage, at least based on how the post author puts it:
It allows to independently use shader stages without changing others shader stages. I see two mains reasons for it: Direct3D, Cg and even the old OpenGL ARB program does it but more importantly it brings some software design flexibilities allowing to see the graphics pipeline at a lower granularity. For example, my best enemy the VAO, is a container object that links buffer data, vertex layout data and GLSL program input data. Without a dedicated software design, this means that when I change the material of an object (a new fragment shader), I need different VAO... It's fortunately possible to keep the same VAO and only change the program by defining a convention on how to communicate between the C++ program and the GLSL program. It works well even if some drawbacks remains.
Now, this line:
For example, my best enemy the VAO, is a container object that links buffer data, vertex layout data and GLSL program input data.Without a dedicated software design, this means that when I change the material of an object (a new fragment shader), I need different VAO...
makes me wonder. In my OpenGL programs I use VAO objects and I can switch between different shader programs without making any change to the VAO itself. So, have I misunderstood the whole idea? Maybe he means we can switch shaders for the same program without re-linking?
I'm breaking this answer up into multiple parts.
What the purpose of ARB_separate_shader_objects is
The purpose of this functionality is to be able to easily mix-and-match between vertex/fragment/geometry/tessellation shaders.
Currently, you have to link all shader stages into one monolithic program. So I could be using the same vertex shader code with two different fragment shaders. But this results in two different programs.
Each program has its own set of uniforms and other state. Which means that if I want to change some uniform data in the vertex shader, I have to change it in both programs. I have to use glGetUniformLocation on each (since they could have different locations). I then have to set the value on each one individually.
That's a big pain, and it's highly unnecessary. With separate shaders, you don't have to. You have a program that contains just the vertex shader, and two programs that contain the two fragment shaders. Changing vertex shader uniforms doesn't require two glGetUniformLocation calls. Indeed, it's easier to cache the data, since there's only one vertex shader.
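A sketch of what that looks like with separate shader objects; vs_src, fs_a_src, fs_b_src, mvp_loc and mvp are placeholders:
// one vertex-only program, two fragment-only programs
GLuint vs_prog   = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vs_src);
GLuint fs_prog_a = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_a_src);
GLuint fs_prog_b = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_b_src);

GLuint pipeline;
glGenProgramPipelines(1, &pipeline);
glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT,   vs_prog);
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_prog_a);
glBindProgramPipeline(pipeline);

// vertex shader uniforms live in vs_prog only, no matter which fragment program is attached
glProgramUniformMatrix4fv(vs_prog, mvp_loc, 1, GL_FALSE, mvp);

// changing the material only swaps the fragment stage
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_prog_b);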
Also, it deals with the combinatorial explosion of shader combinations.
Let's say you have a vertex shader that does simple rigid transformations: it takes a model-to-camera matrix and a camera-to-clip matrix. Maybe a matrix for normals too. And you have a fragment shader that will sample from some texture, do some lighting computations based on the normal, and return a color.
Now let's say you add another fragment shader that takes extra lighting and material parameters. It doesn't have any new inputs from the vertex shaders (no new texture coordinates or anything), just new uniforms. Maybe it's for projective lighting, which the vertex shader isn't involved with. Whatever.
Now let's say we add a new vertex shader that does vertex weighted skinning. It provides the same outputs as the old vertex shader, but it has a bunch of uniforms and input weights for skinning.
That gives us 2 vertex shaders and 2 fragment shaders. A total of 4 program combinations.
What happens when we add 2 more compatible fragment shaders? We get 8 combinations. If we have 3 vertex and 10 fragment shaders, we have 30 total program combinations.
With separate shaders, 3 vertex and 10 fragment shaders need 30 program pipeline objects, but only 13 program objects. That's over 50% fewer program objects than the non-separate case.
Why the quoted text is wrong
Now ,this line [...] makes me wonder.
It should make you wonder; it's wrong in several ways. For example:
the VAO, is a container object that links buffer data, vertex layout data and GLSL program input data.
No, it does not. It ties buffer objects that provide vertex data to the vertex format for that data. And it specifies which vertex attribute indices that data goes to. But how tightly coupled this is to "GLSL program input data" is entirely up to you.
Without a dedicated software design, this means that when I change the material of an object (a new fragment shader), I need different VAO...
Unless this line equates "a dedicated software design" with "reasonable programming practice", this is pure nonsense.
Here's what I mean. You'll see example code online that does things like this when they set up their vertex data:
glBindBuffer(GL_ARRAY_BUFFER, buffer_object);
glEnableVertexAttribArray(glGetAttribLocation(prog, "position"));
glVertexAttribPointer(glGetAttribLocation(prog, "position"), ...);
There is a technical term for this: terrible code. The only reason to do this is if the shader specified by prog is somehow not under your direct control. And if that's the case... how do you know that prog has an attribute named "position" at all?
Reasonable programming practice for shaders is to use conventions. That's how you know prog has an attribute named "position". But if you know that every program is going to have an attribute named "position", why not take it one step further? When it comes time to link a program, do this:
GLuint prog = glCreateProgram();
glAttachShader(prog, ...); //Repeat as needed.
glBindAttribLocation(prog, 0, "position");
After all, you know that this program must have an attribute named "position"; you're going to assume that when you get its location later. So cut out the middle man and tell OpenGL what location to use.
This way, you don't have to use glGetAttribLocation; just use 0 when you mean "position".
Even if prog doesn't have an attribute named "position", this will still link successfully. OpenGL doesn't mind if you bind attribute locations that don't exist. So you can just apply a series of glBindAttribLocation calls to every program you create, without problems. Indeed, you can have multiple conventions for your attribute names, and as long as you stick to one set or the other, you'll be fine.
Even better, stick it in the shader and don't bother with the glBindAttribLocation solution at all:
#version 330
layout(location = 0) in vec4 position;
In short: always use conventions for your attribute locations. If you see glGetAttribLocation in a program, consider that a code smell. That way, you can use any VAO for any program, since the VAO is simply written against the convention.
I don't see how having a convention equates to "dedicated software design", but hey, I didn't write that line either.
I can switch between different shader programs
Yes, but you have to replace whole programs altogether. Separate shader objects allow you to replace only one stage (e.g. only vertex shader).
If you have, for example, N vertex shaders and M fragment shaders, using conventional linking you would need N * M program objects (to cover all possible combinations). Using separate shader objects, the stages are decoupled from each other, and thus you need to keep only N + M program objects. That's a significant improvement in complex scenarios.