Correct usage / purpose of OpenGL Program Pipeline Objects

With OpenGL 4.1 and ARB_separate_shader_objects, we can store individual stages of the shading pipeline in separate shader programs. To use these, we need to attach them to a Program Pipeline Object, which is then bound.
My question is: why do we need program pipeline objects at all? In my renderer I have only one, and I change its attachments to switch shaders. I can't think of a case where you'd actually want more than one. If you store many pipeline objects, each containing a different combination of shader programs, things end up even messier than not using separate shaders at all.
So, what is the purpose of the pipeline object? Is changing attachments (much) more expensive than binding a different pipeline object? What's the reason that the spec has this, rather than, say, having glUseProgramStages operate in the same way as glUseProgram?

The principal reason pipeline objects exist is that linking stages together in a program object did have certain advantages. For example, there are a lot of inter-shader-stage validation rules, and if a sequence of separate programs isn't valid, people need to know.
With a program that links all stages together, you can detect these validation failures at link time. All of these tests are done precisely once and no more.
If you made glUseProgramStages operate in the same way as glUseProgram, then every single time you rendered with a new set of shaders, the system would have to run those inter-stage validation tests again. Pipelines represent a convenient way to cache the results of such tests. If you set a pipeline's programs once and never modify them afterwards, then the result of validating that pipeline will never change. Thus validation happens exactly once, just as it did for multi-stage programs.
Another issue is that implementations may need to do some minor shader fix-up work when associating certain programs with each other. Pipeline objects represent a convenient place to cache that fix-up work too; without them, it would have to be redone every single time you changed shaders.
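As a rough sketch of that caching pattern (the source strings and variable names here are illustrative, not from the question): build one pipeline object per stage combination up front, so the driver validates each combination exactly once, and then switching combinations is just a cheap bind:

    // One separable single-stage program per shader, built once.
    GLuint vsProg  = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vsSource);
    GLuint fsProgA = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSourceA);
    GLuint fsProgB = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSourceB);

    // One pipeline per combination, also built once.
    GLuint pipelines[2];
    glGenProgramPipelines(2, pipelines);
    glUseProgramStages(pipelines[0], GL_VERTEX_SHADER_BIT,   vsProg);
    glUseProgramStages(pipelines[0], GL_FRAGMENT_SHADER_BIT, fsProgA);
    glUseProgramStages(pipelines[1], GL_VERTEX_SHADER_BIT,   vsProg);
    glUseProgramStages(pipelines[1], GL_FRAGMENT_SHADER_BIT, fsProgB);

    // Per draw: no re-validation or fix-up work, just a bind.
    glBindProgramPipeline(pipelines[1]);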

Why do we need the program pipeline objects?
We don't need them; program pipeline objects are purely optional. Using one program object for every shader combination in use is the easiest and most common way to do it.
So, what is the purpose of the pipeline object?
From https://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt:
[...] Many developers build their shader content around the mix-and-match approach where they can use a single vertex shader with multiple fragment shaders (or vice versa). This extension adopts a "mix-and-match" shader stage model for GLSL allowing multiple different GLSL program objects to be bound at once each to an individual rendering pipeline stage independently of other stage bindings. This allows program objects to contain only the shader stages that best suit the application's needs. [...]


When does it make sense to turn off the rasterization step?

In Vulkan there is a struct required for pipeline creation named VkPipelineRasterizationStateCreateInfo, and in this struct there is a member named rasterizerDiscardEnable. If this member is set to VK_TRUE, then all primitives are discarded before the rasterization step, which disables any output to the framebuffer.
I cannot think of a scenario where this might make any sense. In which cases could it be useful?
It would be for any case where you're executing the rendering pipeline solely for the side effects of the vertex-processing stage(s). For example, you could use a geometry shader to feed data into a buffer, which you later render from.
Now, in many cases you could use a compute shader to do something similar. But you can't use a compute shader to efficiently implement tessellation; that's best done by the hardware tessellator. So if you want to capture data generated by tessellation (presumably because you'll be rendering with it multiple times), you have to use a rendering process.
A useful side effect (though not necessarily the intended use case) of this parameter is for benchmarking and finding the bottleneck of your Vulkan application: if discarding all primitives before the rasterization stage (and thus before any fragment shaders ever execute) does not improve your frame rate, you can rule out your application being fragment-stage-bound.
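For reference, this is roughly how that flag is set when creating such a vertex-processing-only pipeline (the rest of the pipeline setup is omitted):

    VkPipelineRasterizationStateCreateInfo rasterState{};
    rasterState.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
    // Discard all primitives before rasterization: no fragments are shaded
    // and nothing is written to the framebuffer; only vertex processing runs.
    rasterState.rasterizerDiscardEnable = VK_TRUE;
    rasterState.polygonMode = VK_POLYGON_MODE_FILL;
    rasterState.lineWidth   = 1.0f;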

OO architecture for simple shader-based GL program

I'm designing an OO program (in C++) which deals with moderately simple graphics implemented with OpenGL. I've written a couple of vertex shader programs for my Drawable objects to use. In addition, I've encapsulated shader management (compilation, linking and usage) in a Shader class.
My question is, since all my classes will be rendering visual objects using this small set of shaders and they all have a pointer to a Shader object, does it make sense to provide a common reference for all of them to use, so that "the same" shader code is not compiled more than once?
If so, why? Is it really important to prevent duplication of shader code? (My program will likely have thousands of independent visual elements that need to be rendered together). I'm new to OpenGL and performance/efficiency issues are still very obscure to me...
EDIT: Moreover, I wonder what will then happen with my shader uniforms; will they be shared as well? How's that supposed to allow me to, e.g. rotate my elements at a different rate? Is it better to write element-uniforms (i.e. the model matrix) every time I want to draw each element, than to have replicated shader code?
I would wager that in most if not all OpenGL implementations, compiling and linking the same shader multiple times results in multiple copies of the shader binaries, space for uniforms, and so on. Calling glUseProgram to switch between your duplicate copies still causes a state change, despite the same code running on your GPU cores before and after the call. With a sufficiently complex scene you'll probably be switching textures as well, so there will be a state change anyway.
It may not be your bottleneck, but it certainly is wasteful. A good pattern for static content like shaders and textures is to have one or more Manager classes (AssetManager, TextureManager, etc.) that will lazy-load (or pre-load) all of your assets and give you a shared pointer (or some other memory-management strategy) when you ask for it, typically by some string ID.
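A minimal sketch of such a manager, assuming a GL loader header is already included and that a compileProgram helper exists (the helper, the ".vert"/".frag" naming convention, and the omission of GL object cleanup are all simplifications):

    #include <memory>
    #include <string>
    #include <unordered_map>

    // Assumed helper: compiles and links a GL program from two source files.
    GLuint compileProgram(const std::string& vert, const std::string& frag);

    class ShaderManager {
    public:
        // Lazy-load: compile each program at most once, keyed by string ID;
        // every caller then shares the same underlying GL program handle.
        std::shared_ptr<GLuint> get(const std::string& id) {
            auto it = cache_.find(id);
            if (it != cache_.end())
                return it->second;
            auto program = std::make_shared<GLuint>(
                compileProgram(id + ".vert", id + ".frag"));
            cache_.emplace(id, program);
            return program;
        }
    private:
        std::unordered_map<std::string, std::shared_ptr<GLuint>> cache_;
    };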
About the edit:
Yes, your uniforms will be shared and will also remain loaded after you unbind. This is the preferred way to do it because updating a uniform is more than an order of magnitude faster than binding a new shader. You would just set the model matrix uniforms for every new object but keep the same shader.
Uniforms are stored with the shader, so switching shaders means loading in all of the uniforms anyway.
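In code, that per-object pattern looks something like this (the uniform name "uModel" and the object interface are illustrative):

    // One shared program; only the model-matrix uniform changes per object.
    glUseProgram(sharedProgram);
    GLint modelLoc = glGetUniformLocation(sharedProgram, "uModel");
    for (const auto& obj : objects) {
        glUniformMatrix4fv(modelLoc, 1, GL_FALSE, obj.modelMatrix);
        obj.draw();  // the object's own draw call
    }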

OpenGL - switch shaders

I'm trying to get my head around the following:
I have an object that I want to render with two different sets of vertex/fragment shaders, each with its own uniforms and textures, and to switch back and forth between those two setups. (I know in this case I could have just one shader with a uniform dictating which logic to run, but this is part of a larger system where I can't do that.)
Should I use one or two gl programs (created by glCreateProgram())?
If I use two programs, is it fine to discard the one not being used and rebuild it if needed later on? Or is that too slow?
If I use just one program:
can I compile shaders just once at the beginning?
when switching should I detach old shaders, attach new ones and link the program again?
should I recompute all uniforms locations after linking?
should I rebind array buffers after linking?
should I do something to remove previously attached textures?
Should I use one or two gl programs (created by glCreateProgram())?
It depends on the case. A general rule is to avoid branching inside shader code, so if you have two different shaders for two different effects, just compile two programs and bind the desired one.
If I use two programs, is it fine to discard the one not being used and rebuild it if needed later on? Or is that too slow?
This is generally unnecessary and wrong (unless, perhaps, you have severe memory constraints). Shader compilation is a slow process.
It's more common to compile all the necessary resources once, at the application startup or the first time needed, and leave them allocated and ready to use.
can I compile shaders just once at the beginning?
Yes.
For all the remaining questions: I think you are taking the wrong approach.
I'd say:
Compile all shader variants at the beginning (or the first time a program is needed), as sketched below
Bind a shader and its related resources, and set uniforms, before drawing with that shader
When changing shaders, re-bind resources and update uniform values only where they differ and are needed
Release resources and programs at the end of the application
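A sketch of that approach, with a hypothetical linkProgram helper standing in for the usual compile-and-link boilerplate (all variable names here are illustrative):

    struct Effect {
        GLuint program;
        GLint  mvpLoc;  // uniform locations queried once, right after linking
    };

    Effect makeEffect(const char* vsSrc, const char* fsSrc) {
        Effect e;
        e.program = linkProgram(vsSrc, fsSrc);  // assumed compile+link helper
        e.mvpLoc  = glGetUniformLocation(e.program, "uMvp");
        return e;
    }

    // At startup: compile both variants once.
    Effect effectA = makeEffect(vsSource, fsSourceA);
    Effect effectB = makeEffect(vsSource, fsSourceB);

    // Per frame: switching is just a bind plus whichever uniforms differ.
    glUseProgram(effectA.program);
    glUniformMatrix4fv(effectA.mvpLoc, 1, GL_FALSE, mvp);
    // ... draw with effect A ...
    glUseProgram(effectB.program);
    glUniformMatrix4fv(effectB.mvpLoc, 1, GL_FALSE, mvp);
    // ... draw with effect B ...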

Will updating a uniform value stall the whole rendering pipeline?

The glBufferSubData manpage's notes section contains the following paragraph:
Consider using multiple buffer objects to avoid stalling the rendering pipeline during data store updates. If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated.
While the glUniform* manpage doesn't mention the pipeline at all.
However, I would have thought that uniforms are just as important as buffers, given that they're supposed to be uniform across all shader invocations.
So, if I perform a draw call, change a uniform value and then perform another draw call on the same shader, will both draw calls run concurrently with different uniform values, or will the second draw call have to wait until every stage (vert/geom/frag) is complete on the first one?
The question in its general form is pretty much unanswerable. However consider this:
Since the advent of GLSL, and the ARB assembly language before it, uniform/parameter state has always been stored in the shader object. Only since uniform blocks and buffer objects has it been possible to separate uniform state from programs. So until that point, a good five-plus years, the only way to change a uniform was to change it in the program.
This means that pretty much every program that uses GLSL uses it in the standard way: bind a program, change uniforms, render, change uniforms, render, etc.
Now, imagine if doing this simple and obvious thing, which hundreds of OpenGL programs did, induced a full pipeline stall.
Driver developers are not stupid; even Intel's driver developers aren't that stupid. Whatever their hardware looks like, they can find a way to make uniform changes not induce a pipeline stall.

Organizing GLSL shaders in OpenGL engine

Which is better?
To have one shader program with a lot of uniforms specifying lights to use or mappings to do (e.g. I need one mesh to be parallax mapped, and another one parallax/specular mapped). I'd make a cached list of uniforms for lazy transfers, and just change a couple of uniforms for each next mesh that needs it.
To have a lot of shader programs, one for every needed case, each with a small number of uniforms, and do a lazy bind with glUseProgram for every mesh that needs it. Here I assume that meshes are properly batched to avoid redundant switches.
Most modern engines I know have a "shader cache" and use the second option, because apparently it's faster.
Also you can take a look at the ARB_shader_subroutine which allows dynamic linkage. But I think it's only available on DX11 class hardware.
Generally, option 2 will be faster/better unless you have a truly huge number of programs. You can also use buffer objects shared across programs so that you need not reset any values when you change programs.
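For example, camera data can live in one uniform buffer bound to a fixed binding point so it is visible to every program; the CameraData struct, block name "Camera", and binding point 0 are all illustrative:

    // Created once; shared by all programs via binding point 0.
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, sizeof(CameraData), nullptr, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

    // Once per program: point its "Camera" block at binding 0.
    GLuint blockIndex = glGetUniformBlockIndex(program, "Camera");
    glUniformBlockBinding(program, blockIndex, 0);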
In addition, once you link a program, you can free all of the shaders that you linked into the program. This will free up all the source code and any pre-link info the driver is keeping around, leaving just the fully-linked program in memory.
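In other words, something like this after a successful link:

    // After glLinkProgram succeeds, the shader objects themselves are no
    // longer needed; detaching and deleting them frees the source code and
    // pre-link data the driver keeps around.
    glDetachShader(program, vertexShader);
    glDetachShader(program, fragmentShader);
    glDeleteShader(vertexShader);
    glDeleteShader(fragmentShader);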
I would tend to believe that it depends on the specific application. Even if it would be more efficient to, say, run 100 programs with about 2-16 uniforms each, a trade-off between the two options may be better. I would think that roughly 10-20 programs covering your most common shading techniques would be sufficient, or a few more. For example, you might have one program/shader to do all your bump mapping, one for all of your fog effects, one for reflections, and one for refractions.
Though slightly outside the scope of your question, it pertains here as well: one thing to incorporate into your engine would be a BatchProcess & BatchManager class setup to reduce the number of CPU-GPU calls over the bus, as this would prove efficient too. I don't think there is a one-size-fits-all solution to your question; it is application-specific, just like setting up the relationship between how many batches (buckets) of vertices (primitives) your engine has and how many vertices each of those batches contains.
To make this a bit clearer: one game might have 4 containers or batches, where each batch holds up to 10,000 vertices before the BatchManager considers it full, empties that bucket, and sends all of its vertices to the graphics card for the rendering pipeline to process and draw; a different game may have 10 buckets of 5,000 vertices, and another game 8 buckets of 12,000 vertices.
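The bucket idea might be sketched like this (the Vertex type, the capacity, and the flush strategy are all placeholders, and the VBO and vertex attributes are assumed to be set up by the caller):

    #include <vector>

    struct Batch {
        std::vector<Vertex> vertices;           // assumed vertex type
        static const size_t kCapacity = 10000;  // bucket size; tune per app

        void add(const Vertex* v, size_t count) {
            if (vertices.size() + count > kCapacity)
                flush();                        // bucket full: send to GPU
            vertices.insert(vertices.end(), v, v + count);
        }

        void flush() {
            if (vertices.empty()) return;
            // Upload and draw in one call each, instead of per primitive.
            glBufferSubData(GL_ARRAY_BUFFER, 0,
                            vertices.size() * sizeof(Vertex), vertices.data());
            glDrawArrays(GL_TRIANGLES, 0, (GLsizei)vertices.size());
            vertices.clear();
        }
    };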
So there is a trade-off in combining the two according to your needs. A single program with hundreds of uniforms is easier to manage within the pipeline, but its shaders become cumbersome to read and maintain. Conversely, shaders with very few uniforms are easy to read and manage, but hundreds of programs are harder to manage on the CPU side before linking and rendering. Personally, I would look for a middle ground: enough programs that each performs a specific task completely distinct from the others, such as fog density in one and volumetric shadow mapping in another, with each program carrying just enough uniforms for the calculations it needs.
The next step would then be to do some benchmark testing to see where your efficiency and your overhead are balanced, and make the appropriate adjustments.