Does OpenGL check whether the program I want to bind is already bound? Or do I have to do this myself?
I want to switch shaders depending on whether the object has a normal map.
Binding a different GLSL program every time you draw an object would definitely be inefficient. FBOs and GLSL programs have some of the highest validation cost of all object types. Any smart implementation is going to know when you bind the same program and avoid any of that extra work, but state caching to avoid redundant binds is still useful.
However, real performance gains are possible if you sort all draws in such a way that opaque objects without normal maps are all drawn together and then opaque objects with them are drawn together. Opaque geometry does not have a strict order dependence, so you can minimize shader changes doing something like that. That is what you should be aiming at, rather than trying to minimize redundant binds (which the driver probably already does).
Related
I'm designing an OO program (in C++) which deals with moderately-simple graphics implemented with OpenGL. I've written a couple of vertex shader programs for my Drawable objects to use. In addition, I've encapsulated shader management (compilation, linking and usage) in a Shader class.
My question is, since all my classes will be rendering visual objects using this small set of shaders and they all have a pointer to a Shader object, does it make sense to provide a common reference for all of them to use, so that it would avoid having "the same" shader code compiled more than once?
If so, why? Is it really important to prevent duplication of shader code? (My program will likely have thousands of independent visual elements that need to be rendered together). I'm new to OpenGL and performance/efficiency issues are still very obscure to me...
EDIT: Moreover, I wonder what will then happen with my shader uniforms; will they be shared as well? How's that supposed to allow me to, e.g. rotate my elements at a different rate? Is it better to write element-uniforms (i.e. the model matrix) every time I want to draw each element, than to have replicated shader code?
I would wager that in most if not all OpenGL implementations, compiling and linking the same shader multiple times will result in multiple copies of the shader binaries, space for uniforms, and so on. Calling glUseProgram to switch between your duplicate copies will still cause a state change, despite the same code running on your GPU cores before and after the call. With a sufficiently complex scene, you'll probably be switching textures as well, so there will be a state change anyway.
It may not be your bottleneck, but it certainly is wasteful. A good pattern for static content like shaders and textures is to have one or more Manager classes (AssetManager, TextureManager, etc.) that will lazy-load (or pre-load) all of your assets and give you a shared pointer (or some other memory-management strategy) when you ask for it, typically by some string ID.
About the edit:
Yes, your uniforms will be shared and will also remain loaded after you unbind. This is the preferred way to do it because updating a uniform is more than an order of magnitude faster than binding a new shader. You would just set the model matrix uniforms for every new object but keep the same shader.
Uniforms are stored with the shader, so switching shaders means loading in all of the uniforms anyways.
I'm designing the sorting part of my rendering engine. I know that changing the render target, shader program, texture bindings, and more are expensive and therefore one should sort the draw order based on them to reduce state changes. However, what about sorting based on what index buffer is bound, and which vertex buffers are used for attributes?
I'm confused about these because VAOs are mandatory and they encapsulate all of that state. So should I peek behind the scenes of vertex array objects (VAOs), see what state they set and sort based on it? Or should I just not care in what order VAOs are called?
This is what confuses me so much about vertex array objects. It makes sense to me to not be switching which buffers are in use over and over and yet VAOs just seem to force one to not care about that.
Is there a general vague or not agreed on order on which to sort stuff for rendering/game engines?
I know that binding a buffer simply changes some global state but surely it must be beneficial to the hardware to draw from the same buffer multiple times, maybe some small cache coherency?
While VAOs are mandated in GL 3.1 without GL_ARB_compatibility and in core profile 3.2+, you do not have to use them the way they were intended... that is to say, you can bind a single VAO for the duration of your application's lifetime and continue to bind and unbind VBOs, etc. the traditional way if that somehow makes your life easier. Valve famously advocated doing this in their presentation on porting the Source engine from D3D to GL... though I tend to disagree with them on some points. A lot of things they mention in that presentation make me cringe as someone with years of experience in both D3D and OpenGL; they are making suggestions about porting to an API they have only a minimal working knowledge of.
Getting back to your performance concern though, there can be validation overhead for changing bound resources frequently, so it is actually more than just "simply changing a global state." All GL commands have to do validation in order to determine if they need to set an error state. They will validate your input parameters (which is pretty trivial), as well as the state of any resource the command needs to use (this can be complicated).
Other types of GL objects like FBOs, textures and GLSL programs have more rigorous validation and more complicated memory dependencies than buffer objects and vertex arrays do. Swapping a vertex pointer should be cheaper in the grand scheme of things than most other kinds of object bindings, especially since a lot of stuff can be deferred by an implementation until you actually issue a glDrawElements (...) command.
Nevertheless, the best way to tackle this problem is just to increase reuse of vertex buffers. Object reuse is pretty high to begin with for vertex buffers, if you have 200 instances of the same opaque model in a scene you can potentially draw all 200 of them back-to-back and never have to change a vertex pointer. Materials tend to change far more frequently than actual vertex buffers, and so you would generally sort your drawing first and foremost by material (sub-sorted by associated states like opaque/translucent, texture(s), shader(s), etc.). You can add another level to batch sorting to draw all batches that share the same vertex data after they have been sorted by material. The ultimate goal is usually to minimize the number of draw commands necessary to complete your frame, and using priority/hierarchy-based sorting with emphasis on material often delivers the best results.
Furthermore, if you can fit multiple LODs of your model into a single vertex buffer, instead of swapping between different vertex buffers sometimes you can just draw different sets of indices or even just a different range of indices from a single index buffer. In a very similar way, texture swapping pressure can be alleviated by using packed texture atlases / sprite sheets instead of a single texture object for each texture.
You can definitely squeeze out some performance by reducing the number of changes to vertex array state, but the takeaway message here is that vertex array state is pretty cheap compared to a lot of other states that change frequently. If you can quickly implement a secondary sort to reduce vertex state changes then go for it, but I would not invest a lot of time in anything more sophisticated unless you know it is a bottleneck. Prioritize texture, shader and framebuffer state first as a general rule.
Let's consider the following situation.
The scene contains these objects: A B C D E
ordered from the camera (nearest to farthest):
A E B D C
Objects A and C use shader 1, E and D use shader 2, and B uses shader 3.
Objects A and C use the same shader but different textures.
How should such a situation be handled?
Render everything front to back (5 shader swaps).
Render grouped by shader, with the groups sorted (3 shader swaps).
Merge all shader programs into one (1 swap).
Do instructions like glUniform and glBindTexture, which change values in an already-bound program, cause overhead?
There is no one answer to this question. Does changing OpenGL state cause overhead? Of course it does; nothing is free. The question is whether the overhead caused by the state changes will be worse than the overhead of less effective depth testing.
That cannot be answered, because the answer depends on how much overdraw there is, how costly your fragment shaders are, how many state changes a particular sequence of draw calls will require, and numerous other intangibles that cannot be known beforehand.
That's why profiling before optimization is important.
Profile, profile and even more profile :)
I would like to add one thing though:
In your situation you can use the idea of a rendering queue. It is a sort of manager for drawing objects. Instead of drawing an object directly, you call renderQueue.add(myObject). Then, once you have add()ed all the needed objects, you call renderQueue.renderAll(). This method can handle all the sorting (by distance, by shader, by material, etc.), and centralizing it this way also makes it easier to profile and later change how you render.
Of course this is only a rough idea.
I've worked on a variety of demo projects with OpenGL and C++, but they've all involved simply rendering a single cube (or similarly simple mesh) with some interesting effects. For a simple scene like this, the vertex data for the cube could be stored in an inelegant global array. I'm now looking into rendering more complex scenes, with multiple objects of different types.
I think it makes sense to have different classes for different types of objects (Rock, Tree, Character, etc), but I'm wondering how to cleanly break up the data and rendering functionality for objects in the scene. Each class will store its own array of vertex positions, texture coordinates, normals, etc. However I'm not sure where to put the OpenGL calls. I'm thinking that I will have a loop (in a World or Scene class) that iterates over all the objects in the scene and renders them.
Should rendering them involve calling a render method in each object (Rock::render(), Tree::render(),...) or a single render method that takes in an object as a parameter (render(Rock), render(Tree),...)? The latter seems cleaner, since I won't have duplicate code in each class (although that could be mitigated by inheriting from a single RenderableObject class), and it allows the render() method to be easily replaced if I want to later port to DirectX. On the other hand, I'm not sure if I can keep them separate, since I might need OpenGL specific types stored in the objects anyway (vertex buffers, for example). In addition, it seems a bit cumbersome to have the render functionality separate from the object, since it will have to call lots of Get() methods to get the data from the objects. Finally, I'm not sure how this system would handle objects that have to be drawn in different ways (different shaders, different variables to pass in to the shaders, etc).
Is one of these designs clearly better than the other? In what ways can I improve upon them to keep my code well-organised and efficient?
Firstly, don't even bother with platform independence right now. Wait until you have a much better idea of your architecture.
Doing a lot of draw calls/state changes is slow. The way you do it in an engine is that you generally want a renderable class that can draw itself. This renderable will be associated with whatever buffers it needs (e.g. vertex buffers) and other information (such as vertex format, topology, index buffers, etc.). Shader input layouts can be associated with vertex formats.
You will want to have some primitive geometry classes, but defer anything complex to some type of mesh class which handles indexed tris. For a performant app, you will want to batch up calls (and potentially data) for similar input types in your shading pipeline to minimise unnecessary state changes and pipeline flushes.
Shaders parameters and textures are generally controlled via some material class that is associated to the renderable.
Each renderable in a scene itself is usually a component of a node in a hierarchical scene graph, where each node usually inherits the transform of its ancestors through some mechanism. You will probably want a scene culler that uses a spatial partitioning scheme to do fast visibility determination and avoid draw call overhead for things out of view.
The scripting/behaviour part of most interactive 3d apps is tightly connected or hooked into its scene graph node framework and an event/messaging system.
This all fits together in a high level loop where you update each subsystem based on time and draw the scene at current frame.
Obviously there are tonnes of little details left out but it can become very complex depending on how generalised and performant you want to be and what kind of visual complexity you are aiming for.
Your question of draw(renderable), vs renderable.draw() is more or less irrelevant until you determine how all the parts fit together.
[Update] After working in this space a bit more, some added insight:
Having said that, in commercial engines, it's usually more like draw(renderBatch), where each render batch is an aggregation of objects that are homogeneous in some way meaningful to the GPU, since iterating over heterogeneous objects (in a "pure" OOP scene graph via polymorphism) and calling obj.draw() one-by-one has horrible cache locality and is generally an inefficient use of GPU resources. It is very useful to take a data-oriented approach to designing how an engine talks to its underlying graphics API(s) in the most efficient way possible, batching things up as much as possible without negatively affecting the code structure/readability.
A practical suggestion is to write a first engine using a naive/"pure" approach to get really familiar with the domain space. Then on a second pass (or probably rewrite), focus on the hardware: things like memory representation, cache locality, pipeline state, bandwidth, batching, and parallelism. Once you really start considering these things, you will realise that most of your initial design goes out the window. Good fun.
I think OpenSceneGraph is kind of an answer. Take a look at it and its implementation. It should provide you with some interesting insights on how to use OpenGL, C++ and OOP.
Here is what I implemented for a physical simulation; it worked pretty well and sat at a good level of abstraction. First, I separated the functionality into classes such as:
Object - container that holds all the necessary object information
AssetManager - loads the models and textures, owns them (unique_ptr), returns a raw pointer to the resources to the object
Renderer - handles all OpenGL calls etc., allocates the buffers on the GPU and returns render handles for the resources to the object (when I want the renderer to draw an object, I call the renderer with the model render handle, texture handle and model matrix); the renderer should aggregate such information to be able to draw in batches
Physics - calculations that use the object along with its resources (especially the vertices)
Scene - connects all the above, can also hold some scene graph, depends on the nature of the application (can have multiple graphs, BVH for collisions, other representations for draw optimization etc.)
The problem is that the GPU is now a GPGPU (general-purpose GPU), so OpenGL or Vulkan is no longer only a rendering framework. For example, physics calculations can be performed on the GPU. Therefore the renderer might transform into something like a GPUManager, with other abstractions above it. Also, the most optimal way to draw is in a single call; in other words, one big buffer for the whole scene that can also be edited via compute shaders to prevent excessive CPU<->GPU communication.
So I'm rendering my scene in batches, to try and minimize state changes.
Since my shaders usually need multiple textures, I have a few of them bound at the same time to different texture units. Some of these textures are used in multiple batches, so could even stay bound.
Now my question is, is it fine to just re-bind all of the textures that I need during a batch, even though some of them may already be bound? Or should I check which ones are bound, and only bind new ones? How expensive is glBindTexture? I'm using shaders, is it bad to have unused textures bound to texture units that the shader won't sample from, or should I unbind them?
I'm mostly asking "how to make it fast", not "how to".
EDIT: Would it help if I supplied code?
The holy rule of GPU programming is that the less you talk to the GPU, the more she loves you.
glBindTexture is much more expensive than a single conditional check on the CPU, so for frequent state changes I think you should check whether you actually need to send a new texture binding to the GPU.
The answer is difficult, because any decent GL implementation should make binding an already-bound texture a no-op as an optimization. You should benchmark to see whether adding the check makes a difference.
The same applies to unused textures in texture units: since the GL implementation knows which texture units are actually sampled, leaving unused textures bound should not affect performance.
Track which textures you've bound (don't use glGet(GL_TEXTURE_BINDING_2D)) and only rebind when it changes. Some drivers will do this for you; some won't.
If you know what platform you're targeting in advance, it's pretty easy to do a simple benchmark to see how much difference it makes.
You can also sort your batches by what textures they are using to minimize texture changes.
I'd say the best way is to bind all the textures at once using glBindTextures (https://www.khronos.org/opengl/wiki/GLAPI/glBindTextures), or use bindless textures :)