Fastest way to draw many textured quads in OpenGL 3+

Since GL_QUADS has been removed from OpenGL 3.1 and above, what is the fastest way to draw lots of quads without using it? I've tried several different methods (below) and have ranked them on speed on my machine, but I was wondering if there is some better way, since the fastest way still seems wasteful and inelegant. I should mention that in each of these methods I'm using VBOs with interleaved vertex and texture coordinates, since I believe that to be best practice (though I may be wrong). Also, I should say that I can't reuse any vertices between separate quads because they will have different texture coordinates.
1. glDrawElements with GL_TRIANGLE_STRIP using a primitive restart index, so that the index array looks like {0, 1, 2, 3, PRI, 4, 5, 6, 7, PRI, ...}. This takes the first 4 vertices in my VBO, treats them as a triangle strip to make a rectangle, and then treats the next 4 vertices as a separate strip. The problem here is just that the index array seems like a waste of space. The nice thing about GL_QUADS in earlier versions of OpenGL is that it automatically restarts primitives every 4 vertices. Still, this is the fastest method I can find.
2. Geometry shader. I pass in 1 vertex for each rectangle and then construct the appropriate triangle strip of 4 vertices in the shader. This seems like it would be the fastest and most elegant, but I've read, and now seen, that geometry shaders are not that efficient compared to passing in redundant data.
3. glDrawArrays with GL_TRIANGLES. I just draw every triangle independently, reusing no vertices.
4. glMultiDrawArrays with GL_TRIANGLE_STRIP, an array of all multiples of 4 for the "first" array, and an array of a bunch of 4's for the "count" array. This tells the video card to draw the first 4 starting at 0, then the first 4 starting at 4, and so on. The reason this is so slow, I think, is that you can't put these index arrays in a VBO.
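For reference, the index array for the primitive-restart approach and the "first"/"count" arrays for glMultiDrawArrays can be generated on the CPU roughly as follows. This is only a sketch: the names kRestart, buildStripIndices and buildMultiDrawArgs are made up for illustration, and 16-bit indices are an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical restart marker. It must match the value passed to
// glPrimitiveRestartIndex() and must never be a real vertex index.
constexpr std::uint16_t kRestart = 0xFFFF;

// Builds {0, 1, 2, 3, PRI, 4, 5, 6, 7, PRI, ...} for quadCount quads.
std::vector<std::uint16_t> buildStripIndices(std::uint16_t quadCount) {
    assert(quadCount <= 0xFFFF / 4);  // keeps every vertex index below kRestart
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 5u);
    for (std::uint16_t q = 0; q < quadCount; ++q) {
        for (std::uint16_t v = 0; v < 4; ++v)
            indices.push_back(static_cast<std::uint16_t>(q * 4 + v));
        indices.push_back(kRestart);  // end of this quad's strip
    }
    return indices;
}

// "first" and "count" arrays as consumed by glMultiDrawArrays.
struct MultiDrawArgs {
    std::vector<int> first;  // 0, 4, 8, ...
    std::vector<int> count;  // 4, 4, 4, ...
};

MultiDrawArgs buildMultiDrawArgs(int quadCount) {
    assert(quadCount >= 0);
    MultiDrawArgs args;
    for (int q = 0; q < quadCount; ++q) {
        args.first.push_back(q * 4);
        args.count.push_back(4);
    }
    return args;
}
```

The index array costs 5 indices per quad (10 bytes with 16-bit indices), which is the "waste of space" mentioned above, but unlike the glMultiDrawArrays arrays it can live in a buffer object.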

You've covered all the typical good ways, but I'd like to suggest a few less typical ones that I suspect may have higher performance. Based on the wording of the question, I shall assume that you're trying to draw an m*n array of tiles, and they all need different texture coordinates.
A geometry shader is not the right tool to add and remove vertices. It's capable of doing that, but it's really intended for cases when you actually change the number of primitives you're rendering dynamically (e.g. shadow volume generation). If you just want to draw a whole bunch of adjacent different primitives with different texture coordinates, I suspect the absolute fastest way would be to use tessellation shaders. Just pass in a single quad and have the tessellator compute texture coordinates procedurally.
A similar and more portable method would be to look up each quad's texture coordinate. This is trivial: say you're drawing 50x20 quads, you would have a 50x20 texture that stores all your texture coordinates. Tap this texture in your vertex program (or perhaps more efficiently in your geometry program) and send the result in a varying to the fragment program for actual rendering.
Note that in both of the above cases, you can reuse vertices. In the first method, the intermediate vertices are generated on the fly. In the second, the vertices' texture coordinates are replaced in the shader with cached values from the texture.
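As a rough illustration of the lookup-texture idea, the texel data for such a texture could be filled on the CPU like this. The sketch assumes something the answer does not state: that every quad shows one tile from a texture atlas of equally sized tiles, identified by a tile id. The function name buildTexCoordLookup and all parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds texel data for a gridW x gridH two-channel float texture.
// Texel (x, y) holds the (u, v) origin of the atlas tile shown by quad (x, y).
// ASSUMPTION: tiles sit in an atlas of atlasCols x atlasRows equal cells and
// tileIds is stored row by row, one id per quad.
std::vector<float> buildTexCoordLookup(const std::vector<int>& tileIds,
                                       int gridW, int gridH,
                                       int atlasCols, int atlasRows) {
    assert(atlasCols > 0 && atlasRows > 0);
    assert(tileIds.size() == static_cast<std::size_t>(gridW) * gridH);
    std::vector<float> texels;
    texels.reserve(tileIds.size() * 2);
    for (int id : tileIds) {
        assert(id >= 0 && id < atlasCols * atlasRows);
        texels.push_back(static_cast<float>(id % atlasCols) / atlasCols);  // u
        texels.push_back(static_cast<float>(id / atlasCols) / atlasRows);  // v
    }
    return texels;
}
```

The shader would then fetch the texel for its quad and add the corner offset within the tile. Changing a tile becomes a one-texel update instead of rewriting four vertices.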

Related

OpenGL geometry shader, setting input size

I've written my first geometry shader with success. It takes in lines and outputs a little triangle at the center of each.
I could do the same thing for triangles easily enough, but what about a cube? Is there a way to get a geometry shader to operate on an arbitrary number of points, or at the very least more than 3? I know I could compute the center myself and do another drawing operation, but I'd like to know if it's possible inside the shader.
Thanks.
Geometry shaders take as input a primitive, not a number of vertices. I mean yes, a specific primitive is made of a specific number of vertices. But GS's don't take vertex counts; they take primitives.
There are a number of special primitive types that allow GS's to access more vertices than those in the base primitive type. But these are for referencing vertices adjacent to the main primitive's vertices, and it's difficult to try to make them work as a general mechanism for consuming X vertices.
So you can only use a vertex count that matches a primitive's vertex count: 1, 2, 3, 4, or 6. Outside of these specific vertex counts, you can't make a GS do what you're trying to do.
You can attempt to employ tessellation, as patch vertex counts are user-specified (though limited by the implementation). But tessellation is more restrictive in terms of generating vertices.

Draw multiple shapes in one vbo

I want to render multiple 3D cubes from one vbo. Each cube has a uniform color.
At this time, I create a vbo where each vertex has a color information.
Is it possible to upload only one color for one shape (a list of vertices)?
I also want to mix GL_TRIANGLES and GL_LINES in the glDrawElements call with the same shader. Is that possible?
//Edit : I only have OpenGL 2.1. Later I want to build this project on Android.
//Edit 2:
I want to render a large number of cubes (up to 150,000). One cube has 24 vertices of geometry and color and 34 indices. Now my idea is to create several VBOs (maybe 50) and distribute the cubes among them. I hope that this minimizes the overhead.
Drawing lots of cubes
Yes, if you want to draw a bunch of cubes, you can specify the color for each cube once.
Create a VBO containing the vertexes for one cube.
// cube = 36 vertexes with glDrawArrays(GL_TRIANGLES)
vbo1 = [v1] [v2] [v3] ... [v36]
Create another VBO with the model matrix and color for each cube, and use an attribute divisor of 1. (You can use the same VBO, but I would use a separate one.)
vbo2 = [cube 1 mat, color] [cube 2 mat, color] ... [cube N mat, color]
Call glDrawElementsInstanced() or glDrawArraysInstanced(). This will draw the cube over and over again.
Alternatively, you can use glUniform() for each cube, but this will limit the number of cubes you can draw. The above method will let you draw thousands, easily.
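A possible CPU-side layout for the per-cube buffer (vbo2) is sketched below. The struct and function names are invented for illustration, and the translation-only matrices are just placeholder data. The one detail worth showing is that a 4x4 matrix attribute occupies four consecutive attribute locations, so each column needs its own offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One record of vbo2: a column-major 4x4 matrix followed by an RGBA color.
struct CubeInstance {
    float matrix[16];
    float color[4];
};

// Byte offset of one matrix column inside a CubeInstance record. Each column
// is set up as its own vec4 attribute with stride sizeof(CubeInstance).
std::size_t matrixColumnOffset(int column) {
    assert(column >= 0 && column < 4);
    return offsetof(CubeInstance, matrix) + column * 4 * sizeof(float);
}

// Placeholder data: white cubes lined up along the x axis, 'spacing' apart.
std::vector<CubeInstance> buildInstances(int count, float spacing) {
    assert(count >= 0);
    std::vector<CubeInstance> instances(count);
    for (int i = 0; i < count; ++i) {
        CubeInstance& inst = instances[i];
        for (int k = 0; k < 16; ++k)
            inst.matrix[k] = (k % 5 == 0) ? 1.0f : 0.0f;  // identity matrix
        inst.matrix[12] = i * spacing;  // x translation in column-major order
        for (float& channel : inst.color) channel = 1.0f;
    }
    return instances;
}
```

With this layout, glVertexAttribDivisor(location, 1) would be set for the four matrix columns and the color, and a single glDrawArraysInstanced(GL_TRIANGLES, 0, 36, count) draws every cube. Note that these calls need instancing support, which plain OpenGL 2.1 does not have.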
Mixing GL_TRIANGLES and GL_LINES
You will have to call glDraw????() once for each type of primitive. You can use the same shader for both times, if you like.
Regarding your questions:
Is it possible to upload only one color for one shape?
Yes, you can use a uniform instead of a vertex attribute (of course this means changes in more places). However, you will need to set the uniform for each shape, and issue a separate draw call for each differently colored shape.
Is it possible to mix GL_TRIANGLES and GL_LINES in the glDrawElements?
Yes and no. You can draw both from the same VBO, but you will need a separate draw call for each primitive type. You cannot draw some shapes with GL_TRIANGLES and some with GL_LINES in the same draw call.
In pseudocode this will look like this :
draw shapes 1,2,10 from the vbo using color red and GL_TRIANGLES
draw shapes 3,4,6 from the vbo using color blue and GL_LINES
draw shapes 7,8,9 from the vbo using color blue and GL_TRIANGLES
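The grouping in that pseudocode could be computed along these lines. This is only a sketch; Shape, Batches and groupIntoBatches are made-up names, and colors and primitive types are kept as strings purely for readability.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Shape {
    int id;
    std::string color;      // e.g. "red"
    std::string primitive;  // e.g. "GL_TRIANGLES"
};

// Key: (color, primitive). Value: ids of the shapes that share one batch.
using Batches =
    std::map<std::pair<std::string, std::string>, std::vector<int>>;

// Shapes with the same color uniform and primitive type end up together.
Batches groupIntoBatches(const std::vector<Shape>& shapes) {
    Batches batches;
    for (const Shape& s : shapes) {
        assert(!s.primitive.empty());
        batches[{s.color, s.primitive}].push_back(s.id);
    }
    return batches;
}
```

Each map entry then corresponds to one uniform update followed by one draw (or a few draws, if the shapes in the batch are not contiguous in the VBO).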
With OpenGL 2.1, I don't think there's a reasonable way of specifying the color only once per cube, and still draw everything in a single draw call.
The most direct approach is that, instead of having the color attribute in a VBO, you specify it directly before the draw call. Assuming that you're using generic vertex attributes, where you would currently have:
glEnableVertexAttribArray(colorLoc);
glVertexAttribPointer(colorLoc, ...);
you do this:
glDisableVertexAttribArray(colorLoc);
glVertexAttrib3f(colorLoc, r, g, b);
where glDisableVertexAttribArray() is only needed if the array was previously enabled for the location.
The big disadvantage is that you can only draw cubes with the same color in one draw call. In the extreme case, that's one draw call per cube. Of course if you have multiple cubes with the same color, you could still batch those into a single draw call.
You wonder whether this is more efficient than having a color for each vertex in the VBO? Impossible to say in general. You'll always get the same answer in cases like this: try both, and benchmark. I'm skeptical that you will find it beneficial. In my experience, it's fairly rare for fetching vertex data to be a major performance bottleneck, so cutting out one attribute will likely not give you much of a gain. On the other hand, making many small draw calls absolutely can (and often will) hurt performance.
There is one option you can use that is sort of a hybrid. I'm not necessarily recommending it, but just in the interest of brainstorming. If you use a fairly limited number of colors, you can use a single scalar attribute in the VBO that encodes a "color index". Then in the vertex shader, you can use a texture lookup to translate the "color index" to the actual color.
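To make the "color index" idea a bit more concrete, here is one way the palette and the per-cube indices might be built on the CPU. The Rgb and Palette types are hypothetical, and the 256-entry limit is an assumption that follows from storing the index in one byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

bool operator==(const Rgb& a, const Rgb& b) {
    return a.r == b.r && a.g == b.g && a.b == b.b;
}

// colors[i] becomes texel i of a small lookup texture. The returned index is
// what gets stored as the scalar per-vertex attribute.
struct Palette {
    std::vector<Rgb> colors;

    std::uint8_t indexOf(const Rgb& c) {
        auto it = std::find(colors.begin(), colors.end(), c);
        if (it != colors.end())
            return static_cast<std::uint8_t>(it - colors.begin());
        assert(colors.size() < 256);  // index must fit in one byte
        colors.push_back(c);
        return static_cast<std::uint8_t>(colors.size() - 1);
    }
};
```

This shrinks the color from three or four components per vertex to a single byte, at the cost of a texture fetch in the vertex shader.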
The really good options are beyond OpenGL 2.1. @DietrichEpp nicely explained instanced rendering, which is an elegant solution for cases like this.
And no, you can not have lines and triangles in the same draw call. Even the most flexible draw calls in OpenGL 4.x, like glDrawElementsIndirect(), still take only one primitive type.

OpenGL- drawarrays or drawelements?

I'm making a small 2D game demo and from what I've read, it's better to use drawElements() to draw an indexed triangle list than using drawArrays() to draw an unindexed triangle list.
But it doesn't seem possible as far as I know to draw multiple elements that are not connected in a single draw call with drawElements().
So for my 2D game demo where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end having one draw call per object?
Yes, it's better to use indices in many cases since you don't have to store or transfer duplicate vertices and you don't have to process duplicate vertices (vertex shader only needs to be run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Two thirds is quite a good improvement really, especially if your vertex data is more than just position.
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can affect cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes which are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP you could use glPrimitiveRestartIndex to draw multiple strips of triangles with the one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad. Your vertex array then needs to store all the vertices for all your quads. If they're moving you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones. You could also store static vertex data in a separate array.
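The "4 vertices with 6 indices for each quad" layout can be generated once up front, since it never changes even when the quads move. A minimal sketch, assuming 16-bit indices and a particular corner order (both are my assumptions, as is the name buildQuadIndices):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ASSUMPTION: each quad's corners are stored consecutively in counter-clockwise
// order: 0 = bottom-left, 1 = bottom-right, 2 = top-right, 3 = top-left.
std::vector<std::uint16_t> buildQuadIndices(std::uint16_t quadCount) {
    assert(quadCount <= 65536 / 4);  // vertex indices must fit in 16 bits
    static const int pattern[6] = {0, 1, 2, 2, 3, 0};  // two triangles per quad
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6u);
    for (int q = 0; q < quadCount; ++q)
        for (int offset : pattern)
            indices.push_back(static_cast<std::uint16_t>(q * 4 + offset));
    return indices;
}
```

All quads can then be drawn with one glDrawElements(GL_TRIANGLES, ...) call, and only the vertex array needs updating when sprites move.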
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before you get to the stage where it'll be a problem. However, if you do, instancing or some similar idea such as particle systems is where you should look.
Perhaps only go down the following track if the draw calls or data transfer becomes a problem as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/colour in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between another texture and update the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads with the attribute-data texture coordinates for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).

OpenGL: Multi-texturing an array of "linked" quads

I recently completed my system for loading an array of quads into VBOs. This system allows quads to share vertices in order to save a substantial amount of memory. For example, an array of 100x100 quads would use 100x100x4=40000 vertices normally (4 vertices per quad), but with this system, it would only use 101x101=10201 vertices. That is a huge amount of space saving when you get into even larger scales.
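The arithmetic behind those numbers, written out as a small sketch (function names are made up, and the row-by-row storage order of the shared vertices is an assumption):

```cpp
#include <cassert>
#include <cstddef>

// Every quad owns its 4 corners.
std::size_t unsharedVertexCount(std::size_t cols, std::size_t rows) {
    return cols * rows * 4;
}

// Neighbouring quads share corners, leaving one vertex per grid point.
std::size_t sharedVertexCount(std::size_t cols, std::size_t rows) {
    return (cols + 1) * (rows + 1);
}

// Index of the shared vertex at grid point (x, y), stored row by row.
std::size_t sharedVertexIndex(std::size_t x, std::size_t y, std::size_t cols) {
    assert(x <= cols);
    return y * (cols + 1) + x;
}
```

For a 100x100 grid this gives 40000 versus 10201 vertices, matching the figures above.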
My problem is that in order to texture each quad individually, each vertex needs a "UV" coordinate pair (or "ST" coordinate) to map one part of the texture to. This leads to the problem: how do I texture each quad independently of the others? Even if two identically textured quads are next to each other, I cannot use the same texture coordinate for both of them. This is illustrated below:
*Each quad being 16x16 pixels in dimension and the texture coordinates having a range of 0 to 1.
To make things even more complicated, some quads in the array might not even be there (because that part of the terrain is just an empty block). So as you might have guessed, this is for a rendering engine for those 2D tile games everyone is trying to make.
Is there a way to texture quads using the vertex saving technique or will I just have to trash this method and just use the way less efficient way?
You can't.
Vertices in OpenGL are a collection of data. They may contain positions, but they also contain texture coordinates or other things. Every vertex, every collection of position/coordinate/etc, must be unique. So if you need to pair the same position with different texture coordinates, then you have different vertices.
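One way to see this is to deduplicate vertices by their full attribute tuple rather than by position alone. The sketch below uses hypothetical Vertex and VertexPool types with just a 2D position and a UV pair.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vertex {
    float x, y;  // position
    float u, v;  // texture coordinate
};

// Hands out one index per distinct (position, texcoord) combination.
class VertexPool {
public:
    std::uint32_t add(const Vertex& vert) {
        const auto key = std::make_tuple(vert.x, vert.y, vert.u, vert.v);
        const auto found = lookup_.find(key);
        if (found != lookup_.end()) return found->second;  // exact match: reuse
        assert(vertices_.size() < 0xFFFFFFFFu);
        const auto index = static_cast<std::uint32_t>(vertices_.size());
        vertices_.push_back(vert);
        lookup_.emplace(key, index);
        return index;
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }

private:
    std::vector<Vertex> vertices_;
    std::map<std::tuple<float, float, float, float>, std::uint32_t> lookup_;
};
```

Two tiles that meet at the same position but need different UVs there produce two entries, so sharing only helps where every attribute matches.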

GLSL: How to access nearby vertex colors? (bilinear interpolation without uniforms)

I'm trying to do bilinear color interpolation on a quad. I succeeded with the help of my previous question here, but performance is bad because it requires me to repeat glBegin() and glEnd(), plus 4 glUniform() calls before each glBegin().
The question is: is it anyhow possible to apply bilinear color interpolation on a quad like this:
glBegin(GL_QUADS);
glColor4f(...); glVertexAttrib2f(uv, 0, 0); glTexCoord2f(...); glVertex3f(...);
glColor4f(...); glVertexAttrib2f(uv, 1, 0); glTexCoord2f(...); glVertex3f(...);
glColor4f(...); glVertexAttrib2f(uv, 1, 1); glTexCoord2f(...); glVertex3f(...);
glColor4f(...); glVertexAttrib2f(uv, 0, 1); glTexCoord2f(...); glVertex3f(...);
... // here can be any amount of quads without repeating glBegin()/glEnd()
glEnd();
To do this, I think I should somehow access the nearby vertex colors, but how? Or are there any other solutions for this?
I need this to work this way so I can easily switch between different interpolation shaders.
Any other solution that works with one glBegin() command is good too, but sending all corner colors per vertex isn't acceptable, unless that's the only solution here?
Edit: The example code uses immediate mode for clarity only. Even with vertex arrays/buffers the problem would be the same: I would have to split the rendering calls into 4-vertex chunks, which is what causes the whole speed drop here!
Long story short: You cannot do this with a vertex shader.
The interpolator (or rasterizer) is one of the components of the graphics pipeline that is not programmable. Given how the graphics pipe works, neither a vertex shader nor a fragment shader are allowed access to anything but their vertex (or fragment, respectively), for reasons of speed, simplicity, and parallelism.
The workaround is to use a texture lookup, which has already been noted in previous answers.
In newer versions of OpenGL (core since 3.2) there is the concept of a geometry shader. Geometry shaders are more complicated to implement than the relatively simple vertex and fragment shaders, but geometry shaders are given topological information. That is, they execute on a primitive (point, line, triangle, etc.) rather than a single vertex. With that information, they could create additional geometry in order to resolve your alternate color interpolation method.
However, that's far more complicated than necessary. I'd stick with a 4 texel texture map and implement your logic in your fragment lookup.
Under the hood, OpenGL (and all the hardware that it drives) will do everything as triangles, so if you choose to blend colors via vertex interpolation, it will be triangular interpolation because the hardware doesn't work any other way.
If you want "quad" interpolation, you should put your colors into a texture, because in hardware a texture is always "quad" shaped.
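For comparison, this is what "quad" (bilinear) interpolation of four corner colors computes, written as plain CPU code. It is a sketch of the math only, not of how the hardware filters textures internally; Color, lerp and bilerp are illustrative names.

```cpp
#include <cassert>

struct Color {
    float r, g, b, a;
};

// Linear blend between p (t = 0) and q (t = 1).
Color lerp(const Color& p, const Color& q, float t) {
    return {p.r + (q.r - p.r) * t, p.g + (q.g - p.g) * t,
            p.b + (q.b - p.b) * t, p.a + (q.a - p.a) * t};
}

// c00 is the color at uv (0,0), c10 at (1,0), c01 at (0,1), c11 at (1,1).
Color bilerp(const Color& c00, const Color& c10, const Color& c01,
             const Color& c11, float u, float v) {
    assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
    const Color bottom = lerp(c00, c10, u);  // blend along the v = 0 edge
    const Color top = lerp(c01, c11, u);     // blend along the v = 1 edge
    return lerp(bottom, top, v);
}
```

All four corners contribute at every interior point. With a quad split into two triangles, each fragment only sees the three corners of its own triangle, which is why the results differ.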
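If you really think it's the number of draw calls that causes your performance drop, you can try instancing (glDrawArraysInstanced plus glVertexAttribDivisor). glDrawArraysInstanced is core since GL 3.1; glVertexAttribDivisor is core since GL 3.3 and available earlier through ARB_instanced_arrays.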
An alternative might be point sprites, depending on your usage model (mostly, maximum size of your quads, and are they always perpendicular to the view). That's available since GL 2.0 core.
Linear interpolation with colours specified per vertex can be set up efficiently using glColorPointer. Similarly you should use glTexCoordPointer/glVertexAttribPointer/glVertexPointer to replace all those individual per-vertex calls with a single call referencing the data in an array. Then render all your quads with a single (or at most a handful of) glDrawArrays or glDrawElements call. You'll see a huge improvement from this even without VBOs (which just change where the arrays are stored).
You mention you want to change shaders (between ShaderA and ShaderB say) on a quad by quad basis. You should either:
Arrange things so you can batch all of the ShaderA quads together and all the ShaderB quads together and render all of each together with a single call. Changing shader is generally quite expensive so you want to minimise the number of changes.
or
Implement all the different shader logic you want in a single "unified" shader, but selected by another vertex attribute which selects between the different codepaths. Whether this is anywhere near as efficient as the batching approach (which is preferable) depends on whether or not each "tile" of SIMD shaders tends to have to run a mixture of paths or just one.
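The batching option can be sketched as a stable sort on the shader id, with a helper that counts how many shader binds a given draw order would need. Quad, countShaderSwitches and sortByShader are hypothetical names used only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Quad {
    int id;
    int shader;  // hypothetical shader id: 0 = ShaderA, 1 = ShaderB, ...
};

// Number of shader binds needed when drawing the quads in the given order.
int countShaderSwitches(const std::vector<Quad>& quads) {
    int switches = 0;
    int current = -1;  // nothing bound yet
    for (const Quad& q : quads) {
        assert(q.shader >= 0);
        if (q.shader != current) {
            ++switches;
            current = q.shader;
        }
    }
    return switches;
}

// Groups quads by shader while keeping their relative order within a group.
void sortByShader(std::vector<Quad>& quads) {
    std::stable_sort(quads.begin(), quads.end(),
                     [](const Quad& a, const Quad& b) {
                         return a.shader < b.shader;
                     });
}
```

Keep in mind that reordering changes the draw order, which matters if the quads overlap and are blended.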