Batching arbitrary vertex data in OpenGL batch renderer - C++

I'm making a 2D batch renderer in OpenGL inspired by the XNA/MonoGame interface, but I've run into a small design problem and I'm looking for some input. Currently, you can submit vertex data in four general ways:
void Render(const Sprite& sprite);
void Render(const Shape& shape);
void Render(const Vertex* vertices, unsigned int length);
void Render(const Vertex* vertices, unsigned int length, const Texture* texture);
A sprite contains four vertices with color and texture coordinates, while the other three can contain an arbitrary number of vertices (the sprite and shape carry their own transformations). Everything can be textured or untextured. I want to batch everything to reduce the number of state changes and OpenGL draw calls. I feel it is reasonable to assume that most submissions will have shared vertices, so that I can use glDrawElements instead of glDrawArrays, but I have trouble figuring out how to batch things properly given what I described above.
The XNA/MonoGame sprite batchers work because they deal solely with textured quads/triangles and not arbitrary shapes. Alternatively, I could do what the SFML renderer does and issue a draw call for each drawable object, but that defeats the purpose of batch rendering.
I feel like my renderer is trying to "do everything" which is something I want to avoid since it usually quickly becomes too complex in my experience.
What I'm asking is: How could I redesign my renderer? Could I keep separate batch lists for different submissions? Could I modularize my renderer somehow? Should I just allow only textured objects as done in XNA/MonoGame?

Alright, so we need to minimize the number of state changes and draw calls. I'm assuming you're using modern OpenGL, including Vertex Buffer Objects and shaders.
One approach is to ensure that all vertex data has the same format. For example, each vertex has a position, color and texture coordinate (xyz, rgba, uv). If we interleave our vertex data in a VBO, we only need a single set of glVertexAttribPointer and glEnableVertexAttribArray calls before rendering.
This means some redundant data for untextured objects, but we get to cram everything into a single batch, which is nice.
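For illustration, a possible C++ layout of that shared format and the matching one-time attribute setup might look like this (the struct layout and attribute locations are assumptions, not something from the original post):
struct Vertex {
    float x, y, z;    // position
    float r, g, b, a; // color
    float u, v;       // texture coordinates
};

// One-time setup; offsetof requires <cstddef>.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, x));
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, r));
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, u));
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);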
To handle the untextured objects, you could either bind a blank white texture and treat everything as textured, or have a uniform variable (a float between 0 and 1) in your fragment shader and blend between the texture color and the vertex color using the mix function.
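A minimal sketch of the second option, assuming a uniform named u_texBlend (the GLSL is shown as a C++ string for glShaderSource; all names are illustrative):
const char* fragmentSrc = R"(
#version 330 core
in vec4 v_color;
in vec2 v_uv;
uniform sampler2D u_texture;
uniform float u_texBlend; // 0 = pure vertex color, 1 = pure texture color
out vec4 fragColor;
void main() {
    fragColor = mix(v_color, texture(u_texture, v_uv), u_texBlend);
}
)";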
To batch sprites and shapes we should first handle the transformations on the CPU, so that we always upload "world"-coordinates to the GPU. This saves us from having to set a transformation uniform for each sprite, which would in turn require an individual draw call per sprite.
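A sketch of that pre-transform step, assuming glm, a glm::mat3 2D transform, and illustrative member names:
for (int i = 0; i < 4; ++i) {
    Vertex v = sprite.vertices[i]; // local-space corner, plus color and uv
    glm::vec3 world = sprite.transform * glm::vec3(v.x, v.y, 1.0f); // 2D affine transform
    v.x = world.x;
    v.y = world.y;
    batch.push_back(v); // now in world coordinates
}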
Furthermore, we need to sort by texture whenever possible, as texture bindings are among the more expensive operations you can do.
Our approach basically boils down to the following:
Maintain a single Vertex- and Index Buffer Object to store the data
Keep all vertex data in a single format and interleave the data in the VBO
Sort by texture
Flush the data (draw elements/arrays) in the buffers whenever we change texture (or set the texture-blend uniform, if we go with that option)
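A minimal sketch of that flush policy, with hypothetical helpers and counters (Flush, Append, currentTexture, usedVertices) standing in for the real implementation:
void Renderer::Render(const Vertex* vertices, unsigned int length, const Texture* texture) {
    // A new texture (or a full buffer) ends the current batch.
    if (texture != currentTexture || usedVertices + length > maxVertices)
        Flush(); // upload pending data and call glDrawElements
    currentTexture = texture;
    Append(vertices, length); // also generates indices for the new vertices
}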
Getting the data from CPU to GPU memory can be done in different ways. For example, you can first allocate a large enough, empty buffer on the GPU, and then use glBufferSubData to upload a subset of vertex/index data whenever one of your Render calls comes in.
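In code, that could look like the following (MAX_VERTICES and usedVertices are illustrative):
// Once, at startup: reserve GPU storage without filling it.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, MAX_VERTICES * sizeof(Vertex), nullptr, GL_DYNAMIC_DRAW);

// Per Render call: append the new vertices at the current write offset.
glBufferSubData(GL_ARRAY_BUFFER, usedVertices * sizeof(Vertex),
                length * sizeof(Vertex), vertices);
usedVertices += length;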
Remember to profile!
It is very important to do profiling when doing this kind of work. For example, to compare the performance between batching and individual draw calls, or glDrawArrays vs glDrawElements. I recommend using gDebugger, which is a free and very good OpenGL profiler.
Also note that too big of a VBO can hurt your performance. So keep it to a reasonable size, and flush it with a draw call whenever it fills up.

Related

How would I store vertex, color, and index data separately? (OpenGL)

I'm starting to learn openGL (working with version 3.3) with intent to get a small 3d falling sand simulation up, akin to this:
https://www.youtube.com/watch?v=R3Ji8J2Kprw&t=41s
I have a little experience with setting up a voxel environment like Minecraft from some Udemy tutorials for Unity, but I want to build something simple from the ground up and not deal with all the systems already laid on top of things with Unity.
The first issue I've run into comes early. I want to build a system for rendering quads, because instancing a ton of cubes is ridiculously inefficient. I also want to be efficient with storage of vertices, colors, etc. Thus far in the opengl tutorials I've worked with the way to do this is to store each vertex in a float array with both position and color data, and set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that for each neighboring quad, the same vertices will be repeated because if they are made of different "blocks" they will be different colors, and I want to avoid this.
What I want to do instead to make things more efficient is store the vertices of a cube in one int array (positions will all be ints), then add each quad out of the terrain to an indices array (which will probably turn into each chunk's mesh later on). The indices array will store each quad's position, and a separate array will store each quad's color. I'm a little confused on how to set this up since I am rather new to opengl, but I know this should be doable based on what other people have done with minecraft clones, if not even easier since I don't need textures.
I just really want to get the framework for the chunks, blocks, world, etc. up and running so that I can get to the fun stuff like adding new elements. If anyone is able to verify that this is a sensible way to do it (lol) and offer guidance on how to set it up in the rendering, I would very much appreciate it.
Thus far in the opengl tutorials I've worked with the way to do this is to store each vertex in a float array with both position and color data, and set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that for each neighboring quad, the same vertices will be repeated because if they are made of different "blocks" they will be different colors, and I want to avoid this.
Yes, and perhaps there's a reason for that. You seem to be trying to save... what, a few bytes of RAM? Your graphics card has 8GB of RAM on it; what it doesn't have is a general processing unit or an unlimited bus to do random lookups in other buffers for every single rendered pixel.
The indices array will store each quad's position, and a separate array will store each quad's color.
If you insist on doing it this way, nothing's stopping you. You don't even need the quad vertices; you can synthesize them in a geometry shader.
Just fill a buffer with X|Y|Width|Height|Color(RGB) entries with glVertexAttribPointer like you already know, have your vertex shader convert the positions to world units (you mentioned integers, so you're not in world units initially), run a geometry shader to synthesize two triangles (a quad) for each entry in your input buffer, and then have your fragment shader color each rasterized pixel according to its color entry.
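To make that concrete, here is a hedged sketch of such a geometry shader (GLSL as a C++ string; the varying names and the u_projection uniform are invented for illustration):
const char* geometrySrc = R"(
#version 330 core
layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

in vec2 g_size[];  // quad width/height, passed through by the vertex shader
in vec3 g_color[];
out vec3 f_color;

uniform mat4 u_projection; // assumed orthographic projection

void main() {
    vec4 base = gl_in[0].gl_Position; // lower-left corner in world units
    vec2 s = g_size[0];
    f_color = g_color[0];
    gl_Position = u_projection * base;                              EmitVertex();
    gl_Position = u_projection * (base + vec4(s.x, 0.0, 0.0, 0.0)); EmitVertex();
    gl_Position = u_projection * (base + vec4(0.0, s.y, 0.0, 0.0)); EmitVertex();
    gl_Position = u_projection * (base + vec4(s.x, s.y, 0.0, 0.0)); EmitVertex();
    EndPrimitive();
}
)";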
ridiculously inefficient
Indeed, if that sounds ridiculously inefficient to you, it's because it is. You're essentially packing your data on the CPU, transferring it to the GPU, unpacking it and then processing it as normal. You can skip at least two of the steps, and even more if you consider that vertex shader outputs get cached within rasterized primitives.
There are many more variations of this insanity, like:
store vertex positions unpacked as normal, and store an index for the colors. Then store the colors in a linear buffer of some kind (texture, SSBO, generic buffer, etc) and look up each color index. That's even more inefficient, but it's closer to the algorithm you were suggesting.
store vertex positions for one quad and set up instanced rendering with a multi-draw command and a buffer to feed individual instance data (positions and colors); a minimal sketch follows this list. If you also have textures, you can use bindless textures for each quad instance. It's still rendering multiple objects, but it's slightly more optimized by your graphics driver.
or just store per-vertex data in a buffer and render it. Done. No pre-computations, no unlimited expansions, no crazy code, you have your vertex data and you render it.
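Here is a hedged sketch of that instanced variant, with an invented Instance struct and buffer names (the four shared quad vertices are assumed to already be set up on attribute 0 as usual):
struct Instance { float x, y, z; float r, g, b; }; // hypothetical per-quad data

glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
glBufferData(GL_ARRAY_BUFFER, instances.size() * sizeof(Instance), instances.data(), GL_DYNAMIC_DRAW);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Instance), (void*)offsetof(Instance, x));
glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, sizeof(Instance), (void*)offsetof(Instance, r));
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);
glVertexAttribDivisor(1, 1); // advance these attributes once per instance, not per vertex
glVertexAttribDivisor(2, 1);
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, (GLsizei)instances.size());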

OpenGL best practice for putting two different meshes in the same vertex VBO

After some searching, I read that when separate VAOs share exactly the same shader attribute layout, you can merge them into one VAO and put all the data into one VBO, so that the objects can be drawn with only one draw call.
This makes perfect sense, but what about uniform variables? Say I want to draw a tree and a ball. They have different vertex counts and different transforms, but share exactly the same shader program.
Until now, my program looked like this:
// generate VAOs, called only once
glGenVertexArrays(1, &treeVaoId);
// generate VBO and bind tree vertices
// enable vertex attributes and set them up for the shader program
glGenVertexArrays(1, &ballVaoId);
// repeat for ball

// draw, called each frame
// give the tree's transform to the shader as a uniform
glDrawArrays(...); // first draw call
// repeat for ball
glDrawArrays(...); // second draw call
And with the vertices merged into one VAO and VBO, it becomes:
glGenVertexArrays(1, &treeBallId);
// generate a large enough VBO and upload the tree vertices, then the ball vertices after them
// enable vertex attributes and set them up for the shader program
// to draw, I have to give the tree and the ball each their own transform uniform, but how?
glDrawArrays(...);
Sorry for the poor example, but the point is: is there a way to give different uniform values while drawing from one VAO? Or is my approach totally wrong?
The purpose of batching is to improve performance by minimizing state changes between draw calls (batching reduces them to zero, since there is nothing between the draws in a batch). However, there are degrees of performance improvement, and not all state changes are equal.
On the scale of the costs of state changes, changing program uniforms is the least expensive state change. That's not to say that it's meaningless, but you should consider how much effort you really want to spend compared to the results you get out of it. Especially if you're not pushing hardware as fast as possible.
VAO changes (non-buffer-only changes, that is) are among the more expensive state changes, so you gained a lot by eliminating them.
As the name suggests, uniform variables cannot be changed from one shader instance to another within a draw call. Even a multi-draw call. But that doesn't mean that there's nothing that can be done.
Multi-draw functionality allows you to issue multiple draw calls in a single function call. These individual draws can get their vertex data from different parts of the vertex and index buffers. What you need is a way to communicate to your vertex shader which draw call it is taking part in, so that it can index an array of some sort to extract that draw call's per-object data.
Semi-recent hardware has access to gl_DrawID, which is the index into a multi-draw command of the particular draw call being executed. The ARB_shader_draw_parameters extension is fairly widely implemented. You can use that index to fetch per-object data from a UBO or SSBO.
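A hedged sketch of what that can look like under GL 4.6, where gl_DrawID is core (with only the ARB_shader_draw_parameters extension it is spelled gl_DrawIDARB). The buffer names and vertex counts are invented:
struct DrawArraysIndirectCommand {
    GLuint count, instanceCount, first, baseInstance;
};
DrawArraysIndirectCommand cmds[2] = {
    { treeVertexCount, 1, 0,               0 }, // draw 0: the tree
    { ballVertexCount, 1, treeVertexCount, 0 }, // draw 1: the ball
};
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(cmds), cmds, GL_STATIC_DRAW);

// SSBO holding one model matrix per draw, bound to binding point 0.
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, modelMatrixBuffer);

glBindVertexArray(treeBallId);
glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, 2, 0);

// Vertex shader side (GLSL), fetching this draw's matrix:
//   layout(std430, binding = 0) buffer Models { mat4 model[]; };
//   gl_Position = u_viewProj * model[gl_DrawID] * vec4(a_position, 1.0);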

OpenGL big projects, VAO-s and more

So I've been learning OpenGL 3.3 on https://open.gl/ and I got really confused about some stuff.
VAO-s. By my understanding they are used to store the glVertexAttribPointer calls.
VBO-s. They store vertices. So if I am making something with multiple objects, do I need a VBO for every object?
Shader Programs - Why do we need multiple ones, and what exactly do they do?
What exactly does this line do: glBindFragDataLocation(shaderProgram, 0, "outColor");
The most important thing is: how does all of this fit into a big program? What exactly are VAO-s used for? Most tutorials just cover drawing a cube or two with hard-coded vertices, so how would one go about managing scenes with a lot of objects? I've read this thread and got a little bit of understanding of how the scene management happens and all, but I still can't figure out how to connect the OpenGL stuff to it all.
1-Yes. VAOs store vertex array bindings in general. When you see that you're doing lots of calls that enable, disable and change GPU state, you can do all of that at some early point in the program and then use a VAO to take a "snapshot" of what is bound and what isn't at that point in time. Later, during your actual draw calls, all you need to do is bind that VAO again to set all the vertex states to what they were then. Just like VBOs are faster than immediate mode because they send all vertices at once, VAOs work faster by changing many vertex states at once.
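A minimal sketch of that "snapshot" idea (the attribute layout and names are illustrative; note that it is the attribute pointer calls, not the GL_ARRAY_BUFFER binding itself, that the VAO records):
// One-time setup: record the vertex state in a VAO.
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo); // captured via the attrib pointer calls below
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)0);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)(3 * sizeof(float)));
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glBindVertexArray(0);

// Each frame: one bind restores all of the above.
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);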
2-VBOs are just another way to send your position, color, etc. data to the GPU to render on screen. Unlike immediate mode, where you send your vertex data one by one with the gl*Attribute* calls, the idea is to upload all your vertices to the GPU in advance and keep their location as an ID. At render time, you only point the GPU to that location (you bind the VBO id to something like GL_ARRAY_BUFFER, and use glVertexAttribPointer to specify the details of how you stored the vertex data) and issue your order to render. That obviously saves a lot of time by doing things ahead of time, and so it's much faster.
As for whether one should have one VBO per object or even one VBO for all the objects is up to the programmer and the structure of the objects they want to render. After all, VBOs themselves are just a bunch of data you stored in the GPU, and you tell the computer how they're arranged using the glVertexAttribPointer calls.
3-Shaders are used to define a pipeline - a routine - of what happens to the vertices, colors, normals, etc. after they've been sent to the GPU, until they're rendered as fragments or pixels on the screen. When you send vertices over to the GPU, they're often still 3D coordinates, but the screen is a 2D sheet of pixels. There still comes the process of re-positioning these vertices according to the projection/model-view matrices (the job of the vertex shader), optionally generating or modifying whole primitives (the geometry shader), and then rasterizing the 3D geometry into a 2D plane (a fixed stage). Then follows coloring the rasterized fragments (the fragment shader), and finally lighting the pixels on your screen accordingly. In OpenGL versions 1.5 and below, you didn't have much control over those stages, as it was all fixed (hence the term fixed pipeline). Just think about what you could do in any of these shader stages and you will see that there are a lot of awesome things you can do with them. For example, in the fragment shader, just before you output the fragment color, negate it and add 1 to have the colors of objects rendered with that shader inverted!
As for how many shaders one needs, again, it's up to the programmer to decide. They could merge all the functionality they need into one giant shader (an uber shader) and switch features on and off with boolean uniforms (very often considered a bad practice), or have every shader do one certain thing and bind the right one according to what they need.
What exactly does this line do:
glBindFragDataLocation(shaderProgram, 0, "outColor");
It binds the fragment shader's out variable "outColor" to color number 0, the default draw buffer, meaning that whatever is stored in "outColor" at the end of the fragment shader's execution becomes the final fragment color.
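One detail worth a sketch: the binding only takes effect when the program is linked, so it has to happen before glLinkProgram:
glBindFragDataLocation(shaderProgram, 0, "outColor"); // color number 0 = default draw buffer
glLinkProgram(shaderProgram);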
The most important thing is: how does all of this fit into a big program? What exactly are VAO-s used for? Most tutorials just cover drawing a cube or two with hard-coded vertices, so how would one go about managing scenes with a lot of objects? I've read this thread and got a little bit of understanding of how the scene management happens and all, but I still can't figure out how to connect the OpenGL stuff to it all.
They all work together to draw your nice colored shapes on the screen. VBOs are the buffers where the vertices of your scene are stored (all packed together), glVertexAttribPointer calls tell the GPU how the data in the VBO is arranged, VAOs store all those glVertexAttribPointer instructions ahead of time so you can replay them by simply binding the VAO during rendering in your main loop, and shaders give you more control during the process of drawing your scene on the screen.
All of this can sound overwhelming at first, but with practice you will get used to it.

Render multiple objects in OpenGL

So I read somewhere that I should use as few VBOs as possible and render many objects from a single VBO.
I get the point, but isn't that less organized?
I planned on creating 3D shape classes where each instance of the class has its own VBO that will be bound when it needs to be drawn. This stands in contrast to what I wrote earlier.
How do 3D applications (video games etc.) get this done?
Firstly, a VBO is just a collection of vertices stored in your video card's memory. You can use a single VBO and a model matrix to transform it, and you can also apply different shaders and textures on top of it.
This is best explained with a 2D game in mind where every graphic is basically a quad; 4 vertices.
Would you really need to flood your video card memory with repeated allocations for a VBO of 4 vertices?
Instead, we can keep a handle to the VBO and a model matrix that transforms it. The VBO is usually stored in a resource manager, so it stays in memory for as long as we like.
So now we're reusing a single VBO for everything, applying different transformations and textures to it. It looks like there are lots and lots of VBOs, but actually we're being very efficient with our video card memory.
Here's some pseudo code:
class Sprite {
    mat4 transform;                        // per-sprite world transform
    GLuint VBO = ResourceManager::quadVBO; // shared quad, owned by the resource manager
};
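A hypothetical draw loop over such sprites, assuming a VAO set up once around the shared quad VBO and a model-matrix uniform at location modelLoc:
glBindVertexArray(quadVao); // wraps the shared quad VBO
for (const Sprite& s : sprites) {
    glUniformMatrix4fv(modelLoc, 1, GL_FALSE, &s.transform[0][0]);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}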

OpenGL - drawArrays or drawElements?

I'm making a small 2D game demo, and from what I've read it's better to use drawElements() to draw an indexed triangle list than to use drawArrays() to draw an unindexed triangle list.
But as far as I know, it doesn't seem possible to draw multiple elements that are not connected in a single draw call with drawElements().
So for my 2D game demo, where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end up with one draw call per object?
Yes, it's better to use indices in many cases, since you don't have to store, transfer or process duplicate vertices (the vertex shader only needs to run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Cutting the vertex data to two thirds is quite a good improvement really, especially if your vertex data is more than just position.
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can affect cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes which are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP you could use glPrimitiveRestartIndex to draw multiple strips of triangles with the one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad. Your vertex array then needs to store all the vertices for all your quads. If they're moving you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones. You could also store static vertex data in a separate array.
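For the quad case, generating the indices is mechanical. A sketch, assuming four vertices per quad laid out counter-clockwise:
#include <vector>

// 4 vertices and 6 indices per quad: two triangles sharing one diagonal.
std::vector<GLuint> indices;
indices.reserve(quadCount * 6);
for (GLuint q = 0; q < quadCount; ++q) {
    GLuint v = q * 4; // index of this quad's first vertex
    GLuint idx[6] = { v, v + 1, v + 2,   v + 2, v + 3, v };
    indices.insert(indices.end(), idx, idx + 6);
}
// later: glDrawElements(GL_TRIANGLES, quadCount * 6, GL_UNSIGNED_INT, nullptr);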
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before you get to the stage where it'll be a problem. However, if you do, instancing or some similar idea such as particle systems is where you should look.
Perhaps only go down the following track if the draw calls or data transfer becomes a problem as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/colour in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between another texture and update the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads with the attribute-data texture coordinates for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).