OpenGL - switch shaders

I'm trying to get my head around the following:
I have an object that I want to be able to render with two different sets of vertex/fragment shaders, each with its own uniforms and textures, switching back and forth between those two settings. (I know in this case I could have just one shader with a uniform dictating which logic to run, but this is part of a larger thing where I can't do that.)
Should I use one or two gl programs (created by glCreateProgram())?
If I use two programs, is it fine to discard the one not being used, and rebuild it if needed later on? Or is it too slow?
If I use just one program:
can I compile shaders just once at the beginning?
when switching should I detach old shaders, attach new ones and link the program again?
should I recompute all uniforms locations after linking?
should I rebind array buffers after linking?
should I do something to remove previously attached textures?

Should I use one or two gl programs (created by glCreateProgram())?
It depends on the case. A general rule is to avoid branching inside shader code. So if you have 2 different shaders for 2 different effects, just compile 2 programs and bind the desired one.
If I use two programs, is it fine to discard the one not being used, and rebuild it if needed later on? Or is it too slow?
This is generally unnecessary and wrong (unless perhaps you have severe memory constraints). Shader compilation is a slow process.
It's more common to compile all the necessary resources once, at the application startup or the first time needed, and leave them allocated and ready to use.
can I compile shaders just once at the beginning?
Yes.
For all the remaining questions: I think you are taking the wrong approach.
I'd say (a minimal sketch follows this list):
Compile all shader variants at the beginning (or the first time you need a program)
Bind a shader and its related resources, and set uniforms, before drawing with that shader
When changing shaders, re-bind resources and update uniform values only where they actually differ
Release resources and programs at the end of the application
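A minimal sketch of that approach, assuming a hypothetical compileProgram() helper and placeholder uniform names (u_mvp, u_texture) and source strings that stand in for whatever your shaders actually declare:

    // Hypothetical helper: compiles a vertex/fragment pair, links them into
    // a program, and returns its name (assumed to exist elsewhere in the app).
    GLuint compileProgram(const char* vsSource, const char* fsSource);

    struct Effect {
        GLuint program;
        GLint  mvpLoc;   // uniform locations queried once, right after linking
        GLint  texLoc;
    };

    Effect makeEffect(const char* vs, const char* fs) {
        Effect e;
        e.program = compileProgram(vs, fs);
        e.mvpLoc  = glGetUniformLocation(e.program, "u_mvp");
        e.texLoc  = glGetUniformLocation(e.program, "u_texture");
        return e;
    }

    // At startup: compile both variants once and keep them alive.
    Effect effectA = makeEffect(vsSourceA, fsSourceA);
    Effect effectB = makeEffect(vsSourceB, fsSourceB);

    // Per draw: bind the desired program and set its uniforms; no relinking,
    // no detaching, no recomputing locations.
    void drawWith(const Effect& e, const float* mvp, GLuint texture) {
        glUseProgram(e.program);
        glUniformMatrix4fv(e.mvpLoc, 1, GL_FALSE, mvp);
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, texture);
        glUniform1i(e.texLoc, 0);
        // ... bind the VAO and issue the draw call ...
    }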

Related

OO architecture for simple shader-based GL program

I'm designing an OO program (in C++) which deals with moderately simple graphics implemented with OpenGL. I've written a couple of vertex shader programs for my Drawable objects to use. In addition, I've encapsulated shader management (compilation, linking and usage) in a Shader class.
My question is, since all my classes will be rendering visual objects using this small set of shaders and they all have a pointer to a Shader object, does it make sense to provide a common reference for all of them to use, so that it would avoid having "the same" shader code compiled more than once?
If so, why? Is it really important to prevent duplication of shader code? (My program will likely have thousands of independent visual elements that need to be rendered together). I'm new to OpenGL and performance/efficiency issues are still very obscure to me...
EDIT: Moreover, I wonder what will then happen with my shader uniforms; will they be shared as well? How is that supposed to allow me to, e.g., rotate my elements at different rates? Is it better to write per-element uniforms (i.e. the model matrix) every time I want to draw each element than to have replicated shader code?
I would wager that in most if not all OpenGL implementations, compiling and linking the same shader multiple times would result in multiple copies of the shader binaries and space for uniforms, etc. Calling glUseProgram to switch between your duplicate copies will still cause a state change, despite the same code being run on your GPU cores before and after the call. With a sufficiently complex scene, you'll probably be switching textures as well, so there will be a state change anyway.
It may not be your bottleneck, but it certainly is wasteful. A good pattern for static content like shaders and textures is to have one or more Manager classes (AssetManager, TextureManager, etc.) that will lazy-load (or pre-load) all of your assets and give you a shared pointer (or some other memory-management strategy) when you ask for it, typically by some string ID.
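As a sketch of that pattern (the class names and the Shader stub here are assumptions, not a fixed API):

    #include <map>
    #include <memory>
    #include <string>

    // Stand-in for the question's Shader class (compilation/linking/usage).
    class Shader { /* compile, link, use ... */ };

    // Lazy-loading manager: the same shader ID is only ever compiled once;
    // every caller shares the one instance via shared_ptr.
    class ShaderManager {
    public:
        std::shared_ptr<Shader> get(const std::string& id) {
            auto it = cache_.find(id);
            if (it != cache_.end())
                return it->second;              // already compiled: share it
            auto shader = std::make_shared<Shader>(/* load sources for id */);
            cache_[id] = shader;
            return shader;
        }
    private:
        std::map<std::string, std::shared_ptr<Shader>> cache_;
    };

A manager like this also gives you one natural place to release every program at shutdown.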
About the edit:
Yes, your uniforms will be shared and will also remain loaded after you unbind. This is the preferred way to do it because updating a uniform is more than an order of magnitude faster than binding a new shader. You would just set the model matrix uniforms for every new object but keep the same shader.
Uniforms are stored with the shader, so switching shaders means loading in all of the uniforms anyways.
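In code, that per-object update looks something like this (sharedProgram, the element fields, and the u_model uniform name are all assumptions):

    // Bind the shared program once, then touch only the per-object uniform.
    glUseProgram(sharedProgram);
    GLint modelLoc = glGetUniformLocation(sharedProgram, "u_model");
    for (const auto& element : elements) {
        // element.modelMatrix is assumed to point at 16 floats.
        glUniformMatrix4fv(modelLoc, 1, GL_FALSE, element.modelMatrix);
        glBindVertexArray(element.vao);
        glDrawElements(GL_TRIANGLES, element.indexCount, GL_UNSIGNED_INT, nullptr);
    }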

Correct usage / purpose of OpenGL Program Pipeline Objects

With OpenGL 4.1 and ARB_separate_shader_objects, we are able to store different stages of the shading pipeline in shader programs. As we know, to use these, we need to attach them to a Program Pipeline Object, which is then bound.
My question is, why do we need program pipeline objects at all? In my renderer, I have only one of these, and I change its attachments to change shaders. I can't think of any case where you'd actually want more than one of these. If you store many pipeline objects, each containing different combinations of shader programs, then things end up even messier than not using separate shaders at all.
So, what is the purpose of the pipeline object? Is changing attachments (much) more expensive than binding a different pipeline object? What's the reason that the spec has this, rather than, say, having glUseProgramStages operate in the same way as glUseProgram?
The principal reason pipeline objects exist is that linking stages together in a program object did have certain advantages. For example, there are a lot of inter-shader-stage validation rules, and if the sequence of separate programs isn't valid, people need to know.
With a program that links all stages together, you can detect these validation failures at link time. All of these tests are done precisely once and no more.
If you made "glUseProgramStages operate in the same way as glUseProgram", then every single time you render with a new set of shaders, the system will have to do inter-stage validation tests. Pipelines represent a convenient way to cache such tests. If you set their programs once and never modify them afterwards, then the result of validation for a pipeline will never change. Thus validation happens exactly once, just as it did for multi-stage programs.
Another issue is that implementations may need to do some minor shader fixup work when associating certain programs with each other. Pipeline objects represent a convenient place to cache such fix-up work. Without them, they'd have to be done every single time you change shaders.
Why do we need the program pipeline objects?
We don't need program pipeline objects; they are purely optional. Using one program object for every shader combination that is in use is the easiest and most common way to do it.
So, what is the purpose of the pipeline object?
From https://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt:
[...] Many developers build their shader content around the mix-and-match approach where they can use a single vertex shader with multiple fragment shaders (or vice versa). This extension adopts a "mix-and-match" shader stage model for GLSL allowing multiple different GLSL program objects to be bound at once each to an individual rendering pipeline stage independently of other stage bindings. This allows program objects to contain only the shader stages that best suit the application's needs. [...]
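A minimal sketch of that mix-and-match model, keeping one pipeline object and swapping only the fragment stage (the source-string variables are placeholders):

    // Build separable programs, one per stage.
    GLuint vertProg  = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vsSource);
    GLuint fragProgA = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSourceA);
    GLuint fragProgB = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSourceB);

    GLuint pipeline;
    glGenProgramPipelines(1, &pipeline);
    glBindProgramPipeline(pipeline);
    glUseProgram(0);  // a program bound with glUseProgram overrides the pipeline

    // Attach the vertex stage once, then mix and match fragment stages.
    glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT,   vertProg);
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fragProgA);
    // ... draw ...
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fragProgB);
    // ... draw ...

Per the first answer above, keeping one pipeline object per stage combination instead of rebinding stages like this lets the driver cache the inter-stage validation.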

Will updating a uniform value stall the whole rendering pipeline?

The glBufferSubData manpage's notes section contains the following paragraph:
Consider using multiple buffer objects to avoid stalling the rendering pipeline during data store updates. If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated.
While the glUniform* manpage doesn't mention the pipeline at all.
However, I would have thought that uniforms are just as important as buffers, given that they're supposed to be uniform across all shader invocations.
So, if I perform a draw call, change a uniform value and then perform another draw call on the same shader, will both draw calls run concurrently with different uniform values, or will the second draw call have to wait until every stage (vert/geom/frag) is complete on the first one?
The question in its general form is pretty much unanswerable. However consider this:
Since the advent of GLSL, and ARB's assembly language before that, uniform/parameter state has always been stored in the program object. Only since uniform blocks and buffer objects has it been possible to separate uniform state from programs. So until that point, a good 5+ years, the only way to change a uniform was to change it in the program.
This means that pretty much every program that uses GLSL uses it in the standard way: bind a program, change uniforms, render, change uniforms, render, etc.
Now, imagine if doing this simple and obvious thing, which hundreds of OpenGL programs did, induced a full pipeline stall.
Driver developers are not stupid; even Intel's driver developers aren't that stupid. Whatever their hardware looks like, they can find a way to make uniform changes not induce a pipeline stall.
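A minimal sketch of the standard pattern being described (the u_color uniform and vertexCount are assumptions):

    glUseProgram(program);
    GLint colorLoc = glGetUniformLocation(program, "u_color");
    glUniform4f(colorLoc, 1.0f, 0.0f, 0.0f, 1.0f);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);     // draw call 1
    glUniform4f(colorLoc, 0.0f, 1.0f, 0.0f, 1.0f);  // change the uniform...
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);     // ...and draw again
    // Drivers are expected to handle this without a full pipeline stall.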

How exactly is GLSL's "coherent" memory qualifier interpreted by GPU drivers for multi-pass rendering?

The GLSL specification states, for the "coherent" memory qualifier: "memory variable where reads and writes are coherent with reads and writes from other shader invocations".
In practice, I'm unsure how this is interpreted by modern-day GPU drivers with regards to multiple rendering passes. When the GLSL spec states "other shader invocations", does that refer to shader invocations running only during the current pass, or any possible shader invocations in past or future passes? For my purposes, I define a pass as a "glBindFramebuffer-glViewPort-glUseProgram-glDrawABC-glDrawXYZ-glDraw123" cycle; where I'm currently executing 2 such passes per "render loop iteration" but may have more per iteration later on.
When the GLSL spec states "other shader invocations", does that refer to shader invocations running only during the current pass, or any possible shader invocations in past or future passes?
It means exactly what it says: "other shader invocations". It could be the same program code. It could be different code. It doesn't matter: shader invocations that aren't this one.
Normally, OpenGL handles synchronization for you, because OpenGL can track this fairly easily. If you map a range of a buffer object, modify it, and unmap it, OpenGL knows how much stuff you've (potentially) changed. If you use glTexSubImage2D, it knows how much stuff you changed. If you do transform feedback, it can know exactly how much data was written to the buffer.
If you do transform feedback into a buffer, then bind it as a source for vertex data, OpenGL knows that this will stall the pipeline. That it must wait until the transform feedback has completed, and then clear some caches in order to use the buffer for vertex data.
When you're dealing with image load/store, you lose all of this. Because so much could be written in a completely random, unknown, and unknowable fashion, OpenGL generally plays really loose with the rules in order to allow you flexibility to get the maximum possible performance. This triggers a lot of undefined behavior.
In general, the only rules you can follow are those outlined in section 2.11.13 of the OpenGL 4.2 specification. The biggest one (for shader-to-shader talk) is the rule on stages. If you're in a fragment shader, it is safe to assume that the vertex shader(s) that specifically computed the point/line/triangle for your fragment have completed. Therefore, you can freely load values that were stored by them. But only from the ones that made you.
Your shaders cannot make assumptions that shaders executed in previous rendering commands have completed (I know that sounds odd, given what was just said, but remember: "only from the ones that made you"). Your shaders cannot make assumptions that other invocations of the same shader, using the same uniforms, images, textures, etc, in the same rendering command have completed, except where the above applies.
The only thing you can assume is that writes your shader instance itself made are visible... to itself. So if you do an imageStore and do an imageLoad to the same memory location through the same variable, then you are guaranteed to get the same value back.
Well, unless someone else wrote to it in the meantime.
Your shaders cannot assume that a later rendering command will certainly fetch values written (via image store or atomic updates) by a previous one. No matter how much later! It doesn't matter what you've bound to the context. It doesn't matter what you've uploaded or downloaded (technically. Odds are you'll get correct behavior in some cases, but undefined behavior is still undefined).
If you need that guarantee, if you need to issue a rendering command that will fetch values written by image store/atomic updates, you must explicitly ask OpenGL to synchronize memory sometime after issuing the writing call and before issuing the reading call. This is done with glMemoryBarrier.
Therefore, if you render something that does image storing, you cannot render something that uses the stored data until an appropriate barrier has been sent (either explicitly in the shader or explicitly in OpenGL code). This could be an image load operation. But it could be rendering from a buffer object written by shader code. It could be a texture fetch. It could be doing blending to an image attached to the FBO. It doesn't matter; you can't do it.
Note that this applies for all operations that deal with image load/store/atomic, not just shader operations. So if you use image store to write to an image, you won't necessarily read the right data unless you use a GL_TEXTURE_UPDATE_BARRIER_BIT barrier.
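A sketch of the two-pass case in client code (writeProgram, readProgram, tex, and vertexCount are placeholders; pass 1's fragment shader is assumed to write via imageStore):

    // Pass 1: a shader writes into the texture through image unit 0.
    glBindImageTexture(0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);
    glUseProgram(writeProgram);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);

    // Make those image stores visible to subsequent texture fetches.
    glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);

    // Pass 2: a shader samples the same texture with an ordinary sampler.
    glUseProgram(readProgram);
    glBindTexture(GL_TEXTURE_2D, tex);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);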

Organizing GLSL shaders in OpenGL engine

Which is better?
1. Have one shader program with a lot of uniforms specifying which lights to use, or which mappings to do (e.g. I need one mesh to be parallax mapped, and another one parallax/specular mapped). I'd keep a cached list of uniforms for lazy transfers, and just change a couple of uniforms for every next mesh if needed.
2. Have a lot of shader programs, one for every needed case, each with a small number of uniforms, and do a lazy bind with glUseProgram for every mesh if needed. Here I assume that meshes are properly batched, to avoid redundant switches.
Most modern engines I know have a "shader cache" and use the second option, because apparently it's faster.
You can also take a look at ARB_shader_subroutine, which allows dynamic linkage. But I think it's only available on DX11-class hardware.
Generally, option 2 will be faster/better unless you have a truly huge number of programs. You can also use buffer objects shared across programs so that you need not reset any values when you change programs.
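For example, a uniform block shared across programs might look like this (the PerFrame block name and viewProjMatrix are assumptions):

    // GLSL side, in every program: layout(std140) uniform PerFrame { mat4 viewProj; };
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, 16 * sizeof(float), nullptr, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  // binding point 0

    // Once per program, at init: point its PerFrame block at binding 0.
    GLuint blockIndex = glGetUniformBlockIndex(program, "PerFrame");
    glUniformBlockBinding(program, blockIndex, 0);

    // One update is now seen by every program that uses the block.
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, 16 * sizeof(float), viewProjMatrix);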
In addition, once you link a program, you can free all of the shaders that you linked into the program. This will free up all the source code and any pre-link info the driver is keeping around, leaving just the fully-linked program in memory.
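In code, that cleanup is just (assuming vertexShader and fragmentShader were attached to program before linking):

    // After a successful link the shader objects are no longer needed.
    glDetachShader(program, vertexShader);
    glDetachShader(program, fragmentShader);
    glDeleteShader(vertexShader);   // actually freed once detached everywhere
    glDeleteShader(fragmentShader);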
I would tend to believe that it depends on the specific application. Even if it might be more efficient to run, say, 100 programs with about 2-16 uniforms each, a trade-off between the two approaches may be better. I would think that having maybe 10-20 programs for your most common shading techniques would be sufficient, or a few more. For example, you might want one program/shader to do all your bump mapping, one to do all of your fog effects, one to do reflections, and one to do refractions.
Now, outside the scope of your question, but I think it pertains here as well: one thing to incorporate into your engine would be a BatchProcess & BatchManager class setup to reduce the number of CPU-to-GPU calls over the bus, as this would prove efficient as well. I don't think there is a one-size-fits-all solution to your question; I believe it is application specific, just like setting up the relationship between how many batches (buckets) of vertices (primitives) your engine has and how many vertices each of those batches contains.
To try to make this a bit clearer: one game might have 4 containers or batches where each batch can hold up to 10,000 vertices before the BatchManager considers that bucket full and empties it, sending all of those vertices to the graphics card for the rendering pipeline to process and draw, while a different game may have 10 buckets with 5,000 vertices, and another game might have 8 buckets with 12,000 vertices.
So there could be a trade-off combining the two according to your needs. If you have one single program with 100s of uniforms, the single program is easier to manage within the pipeline, but the shaders would be overly cumbersome to read and manage. Then again, shaders with very few uniforms are quite easy to read and manage, but having 100s of programs is a little harder to manage on the CPU before linking and sending them to be rendered properly. I would personally try to find a middle ground where I have enough programs to do each specific task that is completely unique from the others, such as fog density in one and volumetric shadow mapping in another, where each program has just enough uniforms to do the required calculations.
The next step would then be to do some benchmark testing to see where your efficiency and overhead balance out, and to make the appropriate adjustments.