OO architecture for a simple shader-based GL program - C++

I'm designing an OO program (in C++) which deals with moderately-simple graphics implemented with OpenGL. I've written a couple of vertex shader programs for my Drawable objects to use. In addition, I've encapsulated shader management (compilation, linking and usage) in a Shader class.
My question is, since all my classes will be rendering visual objects using this small set of shaders and they all have a pointer to a Shader object, does it make sense to provide a common reference for all of them to use, so that it would avoid having "the same" shader code compiled more than once?
If so, why? Is it really important to prevent duplication of shader code? (My program will likely have thousands of independent visual elements that need to be rendered together). I'm new to OpenGL and performance/efficiency issues are still very obscure to me...
EDIT: Moreover, I wonder what will then happen with my shader uniforms; will they be shared as well? How is that supposed to allow me to, e.g., rotate each element at a different rate? Is it better to set per-element uniforms (e.g. the model matrix) every time I draw each element than to have duplicated shader code?

I would wager that in most if not all OpenGL implementations, compiling and linking the same shader multiple times will result in multiple copies of the shader binaries, space for uniforms, and so on. Calling glUseProgram to switch between your duplicate copies will still cause a state change, despite the same code being run on your GPU cores before and after the call. With a sufficiently complex scene, you'll probably be switching textures as well, so there will be a state change anyway.
It may not be your bottleneck, but it certainly is wasteful. A good pattern for static content like shaders and textures is to have one or more Manager classes (AssetManager, TextureManager, etc.) that will lazy-load (or pre-load) all of your assets and give you a shared pointer (or some other memory-management strategy) when you ask for it, typically by some string ID.
About the edit:
Yes, your uniforms will be shared and will also remain loaded after you unbind. This is the preferred way to do it, because updating a uniform is more than an order of magnitude faster than binding a new shader. You would just set the model matrix uniform for each new object while keeping the same shader bound.
Uniforms are stored with the shader, so switching shaders means loading all of the uniforms anyway.

Related

OpenGL - switch shaders

I'm trying to get my head around the following:
I have an object that I want to be able to render with two different sets of vertex/fragment shaders, each with its own uniforms and textures, and to go back and forth between those two setups. (I know that in this case I could have just one shader with a uniform dictating which logic to run, but this is part of a larger system where I can't do that.)
Should I use one or two gl programs (created by glCreateProgram())?
If I use two programs, is it fine to discard the one not being used and rebuild it if needed later on? Or is that too slow?
If I use just one program:
can I compile shaders just once at the beginning?
when switching should I detach old shaders, attach new ones and link the program again?
should I recompute all uniforms locations after linking?
should I rebind array buffers after linking?
should I do something to remove previously attached textures?
Should I use one or two gl programs (created by glCreateProgram())?
It depends on the case. A general rule is to avoid branching inside shader code, so if you have two different shaders for two different effects, just compile two programs and bind the desired one.
If I use two programs, is it fine to discard the one not being used and rebuild it if needed later on? Or is it too slow?
This is generally unnecessary and counterproductive (unless perhaps you have severe memory constraints). Shader compilation is a slow process.
It's more common to compile all the necessary resources once, at the application startup or the first time needed, and leave them allocated and ready to use.
can I compile shaders just once at the beginning?
Yes.
For all the remaining questions: I think you are taking the wrong approach.
I'd say:
Compile all shader variants at the beginning (or when you need a program)
Bind a shader and the relative resources, set uniforms before drawing with that shader.
When changing shaders, re-bind resources and update uniform values only when they differ and are needed
Release resources and programs at the end of the application

OpenGL Bind same program again

Does OpenGL check whether the program I want to bind is already bound? Or do I have to do this myself?
I want to switch shaders depending on whether the object has a normal map.
Binding a different GLSL program every time you draw an object would definitely be inefficient. FBOs and GLSL programs have some of the highest validation cost of all object types. Any smart implementation is going to know when you bind the same program and avoid any of that extra work, but state caching to avoid redundant binds is still useful.
However, real performance gains are possible if you sort all draws in such a way that opaque objects without normal maps are all drawn together and then opaque objects with them are drawn together. Opaque geometry does not have a strict order dependence, so you can minimize shader changes doing something like that. That is what you should be aiming at, rather than trying to minimize redundant binds (which the driver probably already does).

Should I sort by buffer use when rendering?

I'm designing the sorting part of my rendering engine. I know that changing the render target, shader program, texture bindings, and more are expensive and therefore one should sort the draw order based on them to reduce state changes. However, what about sorting based on what index buffer is bound, and which vertex buffers are used for attributes?
I'm confused about these because VAOs are mandatory and they encapsulate all of that state. So should I peek behind the scenes of vertex array objects (VAOs), see what state they set and sort based on it? Or should I just not care in what order VAOs are called?
This is what confuses me so much about vertex array objects. It makes sense to me to not be switching which buffers are in use over and over and yet VAOs just seem to force one to not care about that.
Is there a general vague or not agreed on order on which to sort stuff for rendering/game engines?
I know that binding a buffer simply changes some global state but surely it must be beneficial to the hardware to draw from the same buffer multiple times, maybe some small cache coherency?
While VAOs are mandated in GL 3.1 without GL_ARB_compatibility or core 3.2+, you do not have to use them the way they are intended... that is to say, you can bind a single VAO for the duration of your application's lifetime and continue to bind and unbind VBOs, etc. the traditional way if this somehow makes your life easier. Valve is famous for advocating doing this in their presentation on porting the Source engine from D3D to GL... I tend to disagree with them on some points though. A lot of things that they mention in their presentation make me cringe as someone who has years of experience with both D3D and OpenGL; they are making suggestions on how to port something to an API they have a minimal working knowledge of.
Getting back to your performance concern though, there can be validation overhead for changing bound resources frequently, so it is actually more than just "simply changing a global state." All GL commands have to do validation in order to determine if they need to set an error state. They will validate your input parameters (which is pretty trivial), as well as the state of any resource the command needs to use (this can be complicated).
Other types of GL objects like FBOs, textures and GLSL programs have more rigorous validation and more complicated memory dependencies than buffer objects and vertex arrays do. Swapping a vertex pointer should be cheaper in the grand scheme of things than most other kinds of object bindings, especially since a lot of stuff can be deferred by an implementation until you actually issue a glDrawElements (...) command.
Nevertheless, the best way to tackle this problem is just to increase reuse of vertex buffers. Object reuse is pretty high to begin with for vertex buffers; if you have 200 instances of the same opaque model in a scene, you can potentially draw all 200 of them back-to-back and never have to change a vertex pointer. Materials tend to change far more frequently than actual vertex buffers, and so you would generally sort your drawing first and foremost by material (sub-sorted by associated states like opaque/translucent, texture(s), shader(s), etc.). You can add another level to batch sorting to draw all batches that share the same vertex data after they have been sorted by material. The ultimate goal is usually to minimize the number of draw commands necessary to complete your frame, and using priority/hierarchy-based sorting with emphasis on material often delivers the best results.
Furthermore, if you can fit multiple LODs of your model into a single vertex buffer, instead of swapping between different vertex buffers sometimes you can just draw different sets of indices or even just a different range of indices from a single index buffer. In a very similar way, texture swapping pressure can be alleviated by using packed texture atlases / sprite sheets instead of a single texture object for each texture.
You can definitely squeeze out some performance by reducing the number of changes to vertex array state, but the takeaway message here is that vertex array state is pretty cheap compared to a lot of other states that change frequently. If you can quickly implement a secondary sort to reduce vertex state changes then go for it, but I would not invest a lot of time in anything more sophisticated unless you know it is a bottleneck. Prioritize texture, shader and framebuffer state first as a general rule.

Front to back rendering vs shaders swapping

Let's consider the following situation.
The scene contains these objects: A B C D E,
where the order from the camera (nearest to farthest) is
A E B D C.
Objects A and C use shader 1, E and D use shader 2, and B uses shader 3.
Objects A and C use the same shader but different textures.
How should I deal with this situation?
Render everything from front to back (5 shader swaps).
Render by shader group, with groups sorted (3 shader swaps).
Merge all shader programs into one (1 swap).
Do instructions like glUniform, glBindTexture, etc., which change values in an already-bound program, cause overhead?
There is no one answer to this question. Does changing OpenGL state cause overhead? Of course it does; nothing is free. The question is whether the overhead caused by state changes will be worse than the cost of a less effective depth test (i.e., more overdraw).
That cannot be answered, because the answer depends on how much overdraw there is, how costly your fragment shaders are, how many state changes a particular sequence of draw calls will require, and numerous other intangibles that cannot be known beforehand.
That's why profiling before optimization is important.
Profile, profile and even more profile :)
I would like to add one thing though:
In your situation you can use idea of a rendering queue. It is some sort of manager for drawing objects. Instead of drawing an object you call renderQueue.add(myObject). Then, when you add() all the needed objects you can call renderQueue.renderAll(). This method can handle all the sorting (by the distance, by the shader, by material, etc) and that way it can be more useful when profiling (and then changing the way you render).
Of course this is only a rough idea.

How exactly is GLSL's "coherent" memory qualifier interpreted by GPU drivers for multi-pass rendering?

The GLSL specification states, for the "coherent" memory qualifier: "memory variable where reads and writes are coherent with reads and writes from other shader invocations".
In practice, I'm unsure how this is interpreted by modern-day GPU drivers with regards to multiple rendering passes. When the GLSL spec states "other shader invocations", does that refer to shader invocations running only during the current pass, or any possible shader invocations in past or future passes? For my purposes, I define a pass as a "glBindFramebuffer-glViewport-glUseProgram-glDrawABC-glDrawXYZ-glDraw123" cycle; I'm currently executing 2 such passes per "render loop iteration" but may have more per iteration later on.
When the GLSL spec states "other shader invocations", does that refer to shader invocations running only during the current pass, or any possible shader invocations in past or future passes?
It means exactly what it says: "other shader invocations". It could be the same program code. It could be different code. It doesn't matter: shader invocations that aren't this one.
Normally, OpenGL handles synchronization for you, because OpenGL can track this fairly easily. If you map a range of a buffer object, modify it, and unmap it, OpenGL knows how much stuff you've (potentially) changed. If you use glTexSubImage2D, it knows how much stuff you changed. If you do transform feedback, it can know exactly how much data was written to the buffer.
If you do transform feedback into a buffer, then bind it as a source for vertex data, OpenGL knows that this will stall the pipeline. That it must wait until the transform feedback has completed, and then clear some caches in order to use the buffer for vertex data.
When you're dealing with image load/store, you lose all of this. Because so much could be written in a completely random, unknown, and unknowable fashion, OpenGL generally plays really loose with the rules in order to allow you flexibility to get the maximum possible performance. This triggers a lot of undefined behavior.
In general, the only rules you can follow are those outlined in section 2.11.13 of the OpenGL 4.2 specification. The biggest one (for shader-to-shader talk) is the rule on stages. If you're in a fragment shader, it is safe to assume that the vertex shader(s) that specifically computed the point/line/triangle for your fragment have completed. Therefore, you can freely load values that were stored by them. But only from the ones that made you.
Your shaders cannot make assumptions that shaders executed in previous rendering commands have completed (I know that sounds odd, given what was just said, but remember: "only from the ones that made you"). Your shaders cannot make assumptions that other invocations of the same shader, using the same uniforms, images, textures, etc, in the same rendering command have completed, except where the above applies.
The only thing you can assume is that writes your shader instance itself made are visible... to itself. So if you do an imageStore and do an imageLoad to the same memory location through the same variable, then you are guaranteed to get the same value back.
Well, unless someone else wrote to it in the meantime.
Your shaders cannot assume that a later rendering command will certainly fetch values written (via image store or atomic updates) by a previous one. No matter how much later! It doesn't matter what you've bound to the context. It doesn't matter what you've uploaded or downloaded (technically. Odds are you'll get correct behavior in some cases, but undefined behavior is still undefined).
If you need that guarantee, if you need to issue a rendering command that will fetch values written by image store/atomic updates, you must explicitly ask to synchronize memory sometime after issuing the writing call and before issuing the reading call. This is done with glMemoryBarrier.
Therefore, if you render something that does image storing, you cannot render something that uses the stored data until an appropriate barrier has been sent (either explicitly in the shader or explicitly in OpenGL code). This could be an image load operation. But it could be rendering from a buffer object written by shader code. It could be a texture fetch. It could be doing blending to an image attached to the FBO. It doesn't matter; you can't do it.
Note that this applies for all operations that deal with image load/store/atomic, not just shader operations. So if you use image store to write to an image, you won't necessarily read the right data unless you use a GL_TEXTURE_UPDATE_BARRIER_BIT barrier.