OpenGL - Most efficient way to update model info (e.g. model matrix) - C++

So let's say I have abstracted OpenGL's general buffer-object workflow into a Model class. All I need to do for a 3D model to appear in the OpenGL context is to initialise a Model object, add it to a container, and draw all Models in the container in the render loop.
Let's say I have 1000 Models in my scene. The way in which I set a model's global coordinates now becomes very important.
I know of a couple of ways I can go about updating Model info such as its model matrix. One would be sharing one shader program for every model, and using glUniformMatrix4fv to set the model matrix for the shader just before drawing each Model in the render loop. Another way would be for each Model object to contain its own shader program, with the model matrix for that shader set upon initialisation of the Model object. Then, in the render loop, glUseProgram is called with each Model's shader program just before it is drawn.
What is the most efficient way of updating Model info such as its model matrix? (I suspect both of the methods I know are quite inefficient.)

Since your question is somewhat general, I'll keep the answer general as well.
In general, changing state between OpenGL draw calls is costly, so the best performance is achieved by minimizing state changes. However, not all state changes are equal. An exhaustive list of which state changes are more costly than others is not really possible because their cost varies between vendors, driver versions, etc. A good understanding of the complete OpenGL pipeline and of how the underlying hardware works can provide an intuition for which code paths are better. It is also really helpful to read Nvidia and AMD presentations (from GDC, SIGGRAPH, etc.) that focus on optimizing graphics engine performance.
For the specific question you asked, it will most likely be much slower to use different shaders, each containing its own matrix, than to share a single shader and set the matrix in a uniform before each draw call. Changing the active shader requires much more work by the driver to reconfigure the GPU pipeline than simply writing 16 floats to GPU memory. There are also other techniques that would allow you to store all your matrices in a single buffer and issue a single draw call for all your models if they use the same shader.
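A hedged sketch of the shared-shader path, assuming a minimal Model struct, a "uModel" uniform name, and GLM for the matrix type (none of these come from the question):

    #include <vector>
    #include <glad/glad.h>             // or whichever GL loader you use
    #include <glm/glm.hpp>
    #include <glm/gtc/type_ptr.hpp>

    struct Model {                     // hypothetical minimal Model
        GLuint vao;
        GLsizei indexCount;
        glm::mat4 modelMatrix;
    };

    void drawAll(GLuint sharedProgram, const std::vector<Model>& models) {
        glUseProgram(sharedProgram);   // bind the shared program once
        GLint loc = glGetUniformLocation(sharedProgram, "uModel");
        for (const Model& m : models) {
            // 16 floats per draw: far cheaper than a program switch
            glUniformMatrix4fv(loc, 1, GL_FALSE, glm::value_ptr(m.modelMatrix));
            glBindVertexArray(m.vao);
            glDrawElements(GL_TRIANGLES, m.indexCount, GL_UNSIGNED_INT, nullptr);
        }
    }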
One way is to store all matrices in an array in a single SSBO or UBO. Each mesh then has an index indicating which matrix it should use. The index could come from a vertex attribute (for example, the w component of the vertex position).
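A sketch of what the vertex shader for that buffer-of-matrices approach could look like, assuming GL 4.3+ with SSBOs available in the vertex stage (a UBO with a fixed-size array works similarly on older hardware); the block and attribute names are illustrative:

    // GLSL source kept as a C++ raw string literal
    const char* vsSource = R"(
    #version 430 core
    layout(location = 0) in vec3 aPos;
    layout(location = 1) in int  aMatrixIndex;   // which model matrix to use

    layout(std430, binding = 0) buffer ModelMatrices {
        mat4 uModel[];                           // one matrix per model
    };

    uniform mat4 uViewProj;

    void main() {
        // alternatively, the index could be packed into the w of the position
        gl_Position = uViewProj * uModel[aMatrixIndex] * vec4(aPos, 1.0);
    }
    )";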

Related

OpenGL per-mesh material (shader)

So, I'm working on a simple game-engine with C++ and OpenGL 4. Right now I'm struggling with rendering imported models.
I'm using the FBX SDK to import FBX models with a very naive approach: basically, I visit each node of the FBX and append the mesh data to a single big structure that is later used for rendering. However, I want to be able to specify a different fragment shader for each material used by the model (for example, different shaders for a car's rims and lights).
As a reference, UE4 has a material system that allows the user to define a simple shader using a blueprint-like editor.
I would like to apply a similar concept to my engine, allowing the user to create a material object that specifies a piece of fragment-shader code and a set of textures to use.
The problems I'm facing are:
It is clear that I must separate the draw calls for each model part that uses a different material, since I cannot swap programs in the middle of a draw call (can I?). At this point, is it better to have a separate VAO/VBO/EBO for each part, or a single one and keep track of where a part ends and the next one begins? (I guess the latter is the best option.)
Is it good practice to pre-compile just the fragment shader and attach it to the current program on the fly (i.e. glAttachShader + glLinkProgram + glUseProgram), or is it better to pre-link an entire program for each material, considering that the vertex shader is always the same?
No, you cannot change the program in the middle of a draw call. There are different opinions and tests on how the GPU will perform based on the layout of your data. My experience is that, if you are not going to modify your mesh data after you upload it the first time, the most efficient way is to have a single VAO with two VBOs: one for indices and one for the rest of the data. When issuing draw calls, you offset into the index buffer based on the mesh data (which you should keep track of), as well as offsetting the configuration of the vertex attributes. This approach allows for more cache-friendly and efficient memory access, as the block of memory will be contiguous. However, as I mentioned, there are cases where this won't be the most efficient approach (although I believe it will still be efficient enough). It depends on your hardware and driver.
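A minimal sketch of the bookkeeping this implies, assuming GL 3.2's glDrawElementsBaseVertex; the MeshRange struct is hypothetical:

    #include <glad/glad.h>

    // One VAO, one VBO with all vertex data, one element buffer with all
    // indices; per mesh only the draw-call offsets change.
    struct MeshRange {
        GLsizei indexCount;      // number of indices for this sub-mesh
        GLsizeiptr indexOffset;  // byte offset into the shared index buffer
        GLint baseVertex;        // first vertex of this sub-mesh in the VBO
    };

    void drawMesh(const MeshRange& m) {
        glDrawElementsBaseVertex(GL_TRIANGLES, m.indexCount, GL_UNSIGNED_INT,
                                 reinterpret_cast<const void*>(m.indexOffset),
                                 m.baseVertex);
    }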
Precompile and link all your programs before entering the render loop. It's the most efficient approach.
As an extra, I would recommend you look into the uber-shader technique. This methodology is based on writing one shader that covers the different possible inputs, together with a set of #defines or subroutines that let you compile different versions of the same shader. (For instance, you might have a model with a normal texture, to which you will probably want to apply bump mapping, but other models might not have this texture, so executing the exact same shader would result in undefined behaviour or a crash.) A sketch of the compilation side follows.
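A hedged sketch of how such variants are typically compiled, assuming the shader source wraps the optional features in #ifdef blocks and does not contain its own #version line; the function and macro names are made up:

    #include <string>
    #include <glad/glad.h>

    GLuint compileVariant(const std::string& source, bool hasNormalMap) {
        std::string header = "#version 330 core\n";
        if (hasNormalMap)
            header += "#define HAS_NORMAL_MAP\n"; // enables the #ifdef'd bump-mapping path
        const char* parts[] = { header.c_str(), source.c_str() };
        GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(shader, 2, parts, nullptr); // the driver concatenates both strings
        glCompileShader(shader);
        return shader; // check GL_COMPILE_STATUS in real code
    }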

Render multiple models in OpenGL with a single draw call

I built a 2D graphics engine and created a batching system for it, so if I have 1000 sprites with the same texture, I can draw them with one single call to OpenGL.
This is achieved by putting the vertices of all the sprites with the same texture into a single VBO.
Instead of "print these vertices, print these vertices, print these vertices", I do "put all the vertices together, print", just to be very clear.
Easy enough, but now I'm trying to achieve the same thing in 3D, and I'm having a big problem.
The problem is that I'm using a Model View Projection matrix to place and render my models, which is the common approach to render a model in 3D space.
For each model on screen, I need to pass the MVP matrix to the shader, so that I can use it to transform each vertex to the correct position.
If I did the transformation outside the shader, it would be executed by the CPU, which is not a good idea, for obvious reasons.
But the problem lies there. I need to pass the matrix to the shader, but for each model the matrix is different.
So I cannot do the same thing I did with 2D sprites, because changing a shader uniform requires a separate draw call every time.
I hope I've been clear; maybe you have a good idea I haven't had, or you've already had the same problem. I know for a fact that there is a solution somewhere, because in engines like Unity you can use the same shader for multiple models and get away with one draw call.
There exists a feature exactly like what you're looking for, and it's called instancing. With instancing, you store n matrices (or whatever else you need) in a Uniform Buffer and call glDrawElementsInstanced to draw n copies. In the shader, you get an extra input gl_InstanceID, with which you index into the Uniform Buffer to fetch the matrix you need for that particular instance.
You can read more about instancing here: https://www.opengl.org/wiki/Vertex_Rendering#Instancing
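A minimal sketch of that setup; the block name, array size, and attribute layout are assumptions (std140 UBOs are only guaranteed 16 KB, i.e. 256 mat4s):

    // GLSL 3.3 vertex shader, kept as a C++ raw string literal
    const char* vsInstanced = R"(
    #version 330 core
    layout(location = 0) in vec3 aPos;

    layout(std140) uniform Matrices {
        mat4 uMVP[256];   // one matrix per instance
    };

    void main() {
        gl_Position = uMVP[gl_InstanceID] * vec4(aPos, 1.0);
    }
    )";

    // After uploading the matrices to the UBO bound to this block:
    // glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
    //                         nullptr, instanceCount);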
The answer depends on whether the vertex data for each item is identical or not. If it is, you can use instancing as in @orost's answer, using glDrawElementsInstanced, and gl_InstanceID within the vertex shader, and that method should be preferred.
However, if each 3D model requires different vertex data (which is frequently the case), you can still render them using a single draw call. To do this, you would add another stream to your vertex data with glVertexAttribPointer (and glEnableVertexAttribArray). This extra stream would contain the index of the matrix within the uniform buffer that the vertex should use when rendering, so each mesh within the VBO would have an identical index in the extra stream. The uniform buffer contains the same data as in the instancing setup.
Note this method may require some extra CPU processing if you need to redo the batching, for example when an object within a batch should no longer be rendered. If that happens frequently, you should determine whether batching the items is actually beneficial.
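A sketch of setting up that extra stream, assuming the index lives in its own VBO and attribute slot 3 (both arbitrary choices):

    #include <glad/glad.h>

    void setupMatrixIndexStream(GLuint matrixIndexVbo) {
        // every vertex of a given mesh stores the same integer index
        glBindBuffer(GL_ARRAY_BUFFER, matrixIndexVbo);
        glEnableVertexAttribArray(3);
        // glVertexAttribIPointer (note the I) keeps the value an integer
        // instead of converting it to float
        glVertexAttribIPointer(3, 1, GL_INT, sizeof(GLint), nullptr);
    }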
Besides instancing and adding another vertex attribute as some object ID, I'd like to also mention another strategy (which requires modern OpenGL, though):
The extension ARB_multi_draw_indirect (in core since GL 4.3) adds indirect drawing commands. These commands do source their parameters (number of vertices, starting index and so on) directly from another buffer object. With these functions, many different objects can be drawn with a single draw call.
However, as you still want some per-object state like transformation matrices, that feature alone is not enough. But in combination with ARB_shader_draw_parameters (core since GL 4.6), you get the gl_DrawID input, which is incremented by one for each object in a single multi-draw-indirect call. That way, you can index into some UBO, TBO, or SSBO (or whatever) where you store the per-object data.
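A sketch of the indirect-draw side, with the command layout fixed by the OpenGL spec; it assumes the commands were already uploaded to the buffer bound to GL_DRAW_INDIRECT_BUFFER:

    #include <glad/glad.h>

    struct DrawElementsIndirectCommand { // field layout mandated by the spec
        GLuint count;         // number of indices
        GLuint instanceCount; // usually 1
        GLuint firstIndex;    // offset into the shared index buffer
        GLuint baseVertex;    // offset into the shared vertex buffer
        GLuint baseInstance;
    };

    void submitAll(GLsizei objectCount) {
        // one call draws every object; gl_DrawID tells the shader which one it is
        glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                    nullptr, objectCount, 0);
    }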

Issues about shaders and transformations in OpenGL

If I'm not wrong, shaders are programs that run on the GPU, right?
Do we send data to these programs using glUniformMatrix*?
I don't know if it's right, but if I send an MVP matrix to the shader, the vertices of the object I want to render will use the position calculated by the shader right before calling the render function.
If I want to render a lot of objects, I must send the MVP matrix and then render the object right after, so I will have code that does send-to-GPU -> render a lot of times. However, if I'm not wrong again, this is not good practice, because I lose performance: the cost of sending information to the GPU is very high. So a way to get better performance is to send all the information to the GPU first, then render all the objects.
And the million-dollar question is: how can the shader program identify which MVP matrix is used by one object and not by another?
If I'm not wrong, shaders are programs that run on the GPU, right?
Possibly. Many implementations of OpenGL have software renderers that they can fall back to if resources on the GPU are constrained. But usually, yes, they're run on the GPU.
Do we send data to these programs using glUniformMatrix*?
That's the usual way. You also set things like texture coordinates either via immediate mode methods like glTexCoord*() (in legacy OpenGL), or via buffer objects.
I don't know if it's right, but if I send an MVP matrix to the shader, the vertices of the object I want to render will use the position calculated by the shader right before calling the render function.
There are different types of shaders. A vertex shader is called once for each vertex. A fragment shader is called once per fragment (roughly once per output screen-space pixel that actually gets drawn). Generally you will probably want to send the model, view, and projection matrices separately to the vertex shader. (Or possibly in some combination that lifts some computations out of the shader.) Then you'll multiply each vertex by the appropriate matrix (or combo of matrices).
And there are other types of shaders beyond those, but those 2 are the most common.
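For concreteness, a minimal vertex shader along those lines; the uniform and attribute names are assumptions:

    // GLSL 3.3 vertex shader, kept as a C++ raw string literal
    const char* vs = R"(
    #version 330 core
    layout(location = 0) in vec3 aPos;

    uniform mat4 uModel;       // typically set per object with glUniformMatrix4fv
    uniform mat4 uView;        // typically set once per frame
    uniform mat4 uProjection;  // typically set once per resize

    void main() {
        // this runs once per vertex
        gl_Position = uProjection * uView * uModel * vec4(aPos, 1.0);
    }
    )";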
If I want to render a lot of objects, I must send the MVP matrix and then render the object right after, so I will have code that does send-to-GPU -> render a lot of times. However, if I'm not wrong again, this is not good practice, because I lose performance: the cost of sending information to the GPU is very high. So a way to get better performance is to send all the information to the GPU first, then render all the objects.
I wouldn't get overly worried about performance until you have shaders working properly. Performance can be dependent on a lot of different factors. One is how often you send or receive data to or from the GPU and how much data you're transferring. Another is how many passes you do for each shader, and another is the size of your textures, geometry, and other stuff.
And the million-dollar question is: how can the shader program identify which MVP matrix is used by one object and not by another?
The way I've done that in the past is to set the current shader program and uniforms via glUseProgram() and glUniform*(), then upload my geometry for an object, and repeat as necessary for each object or set of objects as needed.

Is there a way to use GLSL programs as filters?

Assume that we have different shader programs for different objects in a game. For example, the player model has a shader that drives the skeletal system (bone-matrix multiplication, etc.), a particle has a shader for sparkling effects, a wall has parallax mapping, etc.
But what if I want to add fog to the game that must affect every one of these objects? For example, I have a room that will have red fog. Should I change EVERY GLSL program to add fog code, or is there a possible way to make global filters? Should I change every GLSL program whenever I want to add a feature?
The typical process for this type of thing is to use a full-screen shader in post processing using the depth buffer from your fully rendered scene, or using a z-pass, which renders only to the depth buffer. You can chain them together and create any number of effects. It typically involves some render-to-texture work, and is not a real trivial task (too much to post code here), but it's not THAT difficult either.
If you want to take a look at a decent post-processing system, take a look at the PostFx system in Torque3D:
https://github.com/GarageGames/Torque3D
And here is an example of creating fog with GLSL in post:
http://isnippets.blogspot.com/2010/10/real-time-fog-using-post-processing-in.html
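To give a flavor of such a pass, here is a hedged sketch of a full-screen fog fragment shader; it assumes the scene color and depth were rendered to textures, that vUV comes from a full-screen-quad vertex shader (not shown), and that the uniform names are made up:

    const char* fogFs = R"(
    #version 330 core
    in vec2 vUV;
    out vec4 fragColor;

    uniform sampler2D uSceneColor;
    uniform sampler2D uSceneDepth;
    uniform vec3  uFogColor;    // e.g. red for that one room
    uniform float uNear, uFar;  // camera clip planes

    void main() {
        float z = texture(uSceneDepth, vUV).r;
        // linearize the non-linear depth-buffer value
        float dist = (2.0 * uNear * uFar)
                   / (uFar + uNear - (2.0 * z - 1.0) * (uFar - uNear));
        float fog = clamp(dist / uFar, 0.0, 1.0);
        fragColor = vec4(mix(texture(uSceneColor, vUV).rgb, uFogColor, fog), 1.0);
    }
    )";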

Is it possible to reuse GLSL vertex shader output later?

I have a huge mesh (100k triangles) that needs to be drawn a few times and blended together every frame. Is it possible to reuse the vertex-shader output from the first pass over the mesh and skip the vertex stage in later passes? I am hoping to save some cost in the vertex pipeline and rasterization.
Targeting OpenGL 3.0; I can use features like transform feedback.
I'll answer your basic question first, then answer your real question.
Yes, you can store the output of vertex transformation for later use. This is called Transform Feedback. It requires OpenGL 3.x-class hardware or better (aka: DX10-hardware).
The way it works is in two stages. First, you have to set your program up to have feedback-based varyings. You do this with glTransformFeedbackVaryings. This must be done before linking the program, in a similar way to things like glBindAttribLocation.
Once that's done, you need to bind buffers (given how you set up your transform feedback varyings) to GL_TRANSFORM_FEEDBACK_BUFFER with glBindBufferRange, thus setting up which buffers the data are written into. Then you start your feedback operation with glBeginTransformFeedback and proceed as normal. You can use a primitive query object to get the number of primitives written (so that you can draw it later with glDrawArrays), or if you have 4.x-class hardware (or AMD 3.x hardware, all of which supports ARB_transform_feedback2), you can render without querying the number of primitives. That would save time.
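Putting those steps together, a hedged sketch of the capture side; the varying name "outPosition" and the assumption of one vec4 per vertex are illustrative:

    #include <glad/glad.h>

    void captureTransformedVertices(GLuint program, GLuint feedbackVbo,
                                    GLsizei vertexCount) {
        const char* varyings[] = { "outPosition" };
        // must happen BEFORE linking, like glBindAttribLocation
        glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
        glLinkProgram(program);

        glUseProgram(program);
        glBindBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, 0, feedbackVbo,
                          0, vertexCount * 4 * sizeof(float)); // one vec4 each

        glBeginTransformFeedback(GL_TRIANGLES); // mode must match the draw
        glDrawArrays(GL_TRIANGLES, 0, vertexCount);
        glEndTransformFeedback();
    }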
Now for your actual question: it's probably not going to buy you any real performance.
You're drawing terrain. And terrain doesn't really get any transformation. Typically you have a matrix multiplication or two, possibly with normals (though if you're rendering for shadow maps, you don't even have that). That's it.
Odds are very good that if you shove 100,000 vertices down the GPU with such a simple shader, you've probably saturated the GPU's ability to render them all. You'll likely bottleneck on primitive assembly/setup, and that's not getting any faster.
So you're probably not going to get much out of this. Feedback is generally used for either generating triangle data for later use (effectively pseudo-compute shaders), or for preserving the results from complex transformations like matrix palette skinning with dual-quaternions and so forth. A simple matrix multiply-and-go will barely be a blip on the radar.
You can try it if you like. But odds are you won't see much benefit. Generally, the best solution is to employ some form of deferred rendering, so that you only have to render an object once + X for every shadow it casts (where X is determined by the shadow mapping algorithm). And since shadow maps require different transforms, you wouldn't gain anything from feedback anyway.