Using more than one index list in a single VAO - OpenGL

I'm probably going about this all wrong, but hey.
I am rendering a large number of wall segments (for argument's sake, let's say 200). Every segment is one unit high, even and straight with no diagonals. All changes in direction are a change of 90 degrees.
I am representing each one as a four-point triangle fan, AKA a quad. Each vertex has a three-dimensional texture coordinate associated with it, such as (0,0,0), (0,1,7) or (10,1,129).
This all works fine, but I can't help but think it could be so much better. For instance, every point is duplicated at least twice (every wall is a contiguous line of segments, and there are some three- and four-way intersections), and the starting corner texture coordinates (0,0,X and 0,1,X) are duplicated for every wall with texture number X on it. This could be compressed even further by moving the third texture coordinate into its own attribute and indexing the S and T coordinates separately.
The problem is, I can't seem to work out how to do this. VAOs only seem to allow one index, and taken as a lump, each position and texture coordinate combination forms a unique snowflake, never to be repeated. (Admittedly, this may not be true on certain corners, but that's very much an edge case.)
Is what I want to do possible, or am I going to have to stick with the (admittedly fine) method I currently use?

It depends on how much work you want to do.
OpenGL does not directly allow you to use multiple indices. But you can get the same effect.
The most direct way is to use a Buffer Texture to access an index list (using gl_VertexID), which you then use to access a second Buffer Texture containing your positions or texture coordinates. Basically, you'd be manually doing what OpenGL automatically does. This will naturally be slower per-vertex, as attributes are designed to be fast to access. You also lose some of the compression features, as Buffer Textures don't support as many formats.
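A cheaper and more common alternative is to collapse the separate index lists on the CPU before upload: build one combined index list, duplicating attribute data only where the (position index, texcoord index) pair is actually unique. A minimal sketch of that step, in Python for brevity (all names are illustrative):

```python
def merge_index_lists(pos_indices, tex_indices, positions, texcoords):
    """Collapse separate per-attribute index lists into the single
    index list OpenGL expects, duplicating attribute data only where
    the (position, texcoord) combination is actually unique."""
    combined = {}  # (pos_idx, tex_idx) -> index in the merged arrays
    out_pos, out_tex, out_indices = [], [], []
    for p, t in zip(pos_indices, tex_indices):
        key = (p, t)
        if key not in combined:
            combined[key] = len(out_pos)
            out_pos.append(positions[p])
            out_tex.append(texcoords[t])
        out_indices.append(combined[key])
    return out_pos, out_tex, out_indices
```

Any combination that repeats is emitted only once; truly unique combinations still get their own vertex. This is exactly what most model loaders do when importing formats (like OBJ) that index positions and texture coordinates separately.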

each position and texture coordinate form a unique snowflake never to be repeated
A vertex is not just a position, but the whole tuple formed by position, texture coordinates and all the other attributes. That is what you referred to as a "snowflake".
And for only 200 walls, I wouldn't worry about memory consumption. It comes down to cache efficiency, and GPUs use vertices – that means the whole vector of position and other attributes – as the caching key. Any "optimization" of the kind you propose will probably have a negative effect on performance.
But having some duplicated vertices doesn't hurt much if they're not too far apart in the primitive index list. Today's GPUs can hold roughly 30 to 1000 vertices (that is, after transformation, i.e. after the vertex shader stage) in their cache, depending on the number of vertex attributes fed in and the number of varying variables delivered to the fragment shader stage. If a vertex (the input key) is in the cache, the shader isn't executed again; the cached result is fed to fragment processing directly.
So the optimization you should really aim for is cache locality, i.e. batching things up in such a way that shared/duplicated vertices are sent to the GPU in quick succession.
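To see how much an index ordering benefits from the post-transform cache, you can estimate its miss count with a simple FIFO cache model – a rough approximation, since real GPU caches differ:

```python
def cache_misses(indices, cache_size=32):
    """Count post-transform cache misses for an index list using a
    FIFO cache model (the classic approximation used by
    vertex-cache optimizers)."""
    cache, misses = [], 0
    for i in indices:
        if i not in cache:
            misses += 1          # vertex shader would run here
            cache.append(i)
            if len(cache) > cache_size:
                cache.pop(0)     # evict the oldest entry
    return misses
```

Vertex-cache optimizers (e.g. Tom Forsyth's linear-speed algorithm) reorder triangles to minimize exactly this number.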

Related

How would I store vertex, color, and index data separately? (OpenGL)

I'm starting to learn OpenGL (working with version 3.3) with the intent of getting a small 3D falling-sand simulation up, akin to this:
https://www.youtube.com/watch?v=R3Ji8J2Kprw&t=41s
I have a little experience with setting up a voxel environment like Minecraft from some Udemy tutorials for Unity, but I want to build something simple from the ground up and not deal with all the systems already laid on top of things with Unity.
The first issue I've run into comes early. I want to build a system for rendering quads, because instancing a ton of cubes is ridiculously inefficient. I also want to be efficient with storage of vertices, colors, etc. Thus far in the OpenGL tutorials I've worked with, the way to do this is to store each vertex in a float array with both position and color data, and to set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that the same vertices will be repeated for neighboring quads, because if they belong to different "blocks" they will be different colors, and I want to avoid this.
What I want to do instead, to make things more efficient, is store the vertices of a cube in one int array (positions will all be ints), then add each quad of the terrain to an indices array (which will probably turn into each chunk's mesh later on). The indices array will store each quad's position, and a separate array will store each quad's color. I'm a little confused about how to set this up since I am rather new to OpenGL, but I know this should be doable based on what other people have done with Minecraft clones, if not even easier since I don't need textures.
I just really want to get the framework for the chunks, blocks, world, etc. up and running so that I can get to the fun stuff like adding new elements. If anyone is able to verify that this is a sensible way to do this (lol) and offer guidance on how to set up the rendering, I would very much appreciate it.
Thus far in the OpenGL tutorials I've worked with, the way to do this is to store each vertex in a float array with both position and color data, and to set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that the same vertices will be repeated for neighboring quads, because if they belong to different "blocks" they will be different colors, and I want to avoid this.
Yes, and perhaps there's a reason for that. You seem to be trying to save... what, a few bytes of RAM? Your graphics card has 8 GB of RAM on it; what it doesn't have is a general processing unit or an unlimited bus for doing random lookups in other buffers for every single rendered pixel.
The indices array will store each quad's position, and a separate array will store each quad's color.
If you insist on doing it this way, nothing's stopping you. You don't even need the quad vertices, you can synthesize them in a geometry shader.
Just fill a buffer with X|Y|Width|Height|Color(RGB) using glVertexAttribPointer like you already know, then run a geometry shader to synthesize two triangles (a quad) for each entry in your input buffer, then your vertex shader projects it to world units (you mentioned integers, so you're not in world units initially), and then your fragment shader colors each rasterized pixel according to its color entry.
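The expansion the geometry shader would perform can be sketched on the CPU like this (Python for brevity; the real work would happen per-primitive in GLSL):

```python
def expand_quad(x, y, w, h, color):
    """What the geometry shader does per input entry: emit the six
    vertices (two triangles) of an axis-aligned quad, each carrying
    the per-quad color."""
    a, b = (x, y), (x + w, y)           # bottom-left, bottom-right
    c, d = (x + w, y + h), (x, y + h)   # top-right, top-left
    return [(p, color) for p in (a, b, c, c, d, a)]
```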
ridiculously inefficient
Indeed, if that sounds ridiculously inefficient to you, it's because it is. You're essentially packing your data on the CPU, transferring it to the GPU, unpacking it and then processing it as normal. You can skip at least two of the steps, and even more if you consider that vertex shader outputs get cached within rasterized primitives.
There are many more variations of this insanity, like:
store vertex positions unpacked as normal, and store an index for the colors. Then store the colors in a linear buffer of some kind (texture, SSBO, generic buffer, etc) and look up each color index. That's even more inefficient, but it's closer to the algorithm you were suggesting.
store vertex positions for one quad and set up instanced rendering with a multi-draw command and a buffer to feed individual instance data (positions and colors). If you also have textures, you can use bindless textures for each quad instance. It's still rendering multiple objects, but it's slightly more optimized by your graphics driver.
or just store per-vertex data in a buffer and render it. Done. No pre-computations, no unlimited expansions, no crazy code, you have your vertex data and you render it.
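For the last option, the usual layout is one interleaved buffer. A sketch of the packing and the resulting stride/offsets you would hand to glVertexAttribPointer (Python for brevity; the layout itself is the standard x y z r g b interleaving):

```python
FLOAT_SIZE = 4  # sizeof(GLfloat)

def interleave(positions, colors):
    """Flatten parallel position/color lists into one interleaved
    float array: x y z r g b | x y z r g b | ...  With this layout
    glVertexAttribPointer would use stride = 6 * FLOAT_SIZE, with the
    position attribute at offset 0 and color at offset 3 * FLOAT_SIZE."""
    data = []
    for pos, col in zip(positions, colors):
        data.extend(pos)
        data.extend(col)
    return data, 6 * FLOAT_SIZE
```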

OpenGL - How to render many different models?

I'm currently struggling with finding a good approach to render many (thousands) slightly different models. The model itself is a simple cube with some vertex offset, think of a skewed quad face. Each 'block' has a different offset of its vertices, so basically I have a voxel engine on steroids as each block is not a perfect cube but rather a skewed cuboid. To render this shape 48 vertices are needed but can be cut to 24 vertices as only 3 faces are visible. With indexing we are at 12 vertices (4 for each face).
But, now that I have the vertices for each block in the world, how do I render them?
What I've tried:
Instanced Rendering. Sounds good, doesn't work as my models are not the same.
I could simplify distant blocks to a cube and render them with glDrawArraysInstanced/glDrawElementsInstanced.
Put everything in one giant VBO. This has better performance than rendering each cube individually, but has the downside of being one large mesh. This is not desirable, as I need every cube to have different textures, lighting, etc... Selecting a single cube within that huge mesh is not possible.
I am aware of frustum culling and occlusion culling, but I already have problems with some cubes in front of me (tested with a 128x128 world).
My requirements:
Draw some thousand models.
Each model has vertex offsets to make the block less cubic, stored in another VBO.
Each block has to be an individual object, as you should be able to place/remove blocks.
Any good performance advices?
This is not desirable as I need every cube to have different textures, lighting, etc... Selecting a single cube within that huge mesh is not possible.
Programmers should avoid declaring that something is "impossible"; it limits your thinking.
Giving each face of these cubes different textures has many solutions. The Minecraft approach uses texture atlases. Each "texture" is really just a sub-section of one large texture, and you use texture coordinates to select which sub-section a particular face uses. But you can get more complex.
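The atlas lookup boils down to mapping a tile index to a UV sub-rectangle. A sketch, assuming a square atlas of equally sized tiles (names are illustrative):

```python
def atlas_uv(tile_index, tiles_per_row, tile_uv_size):
    """Map a tile index in a texture atlas to the UV rectangle of
    that tile. tile_uv_size is the tile's extent in [0,1] UV space,
    e.g. 1/16 for a 16x16-tile atlas. Returns (u0, v0, u1, v1)."""
    u = (tile_index % tiles_per_row) * tile_uv_size
    v = (tile_index // tiles_per_row) * tile_uv_size
    return (u, v, u + tile_uv_size, v + tile_uv_size)
```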
Array textures allow for a more direct way to solve this problem. Here, the texture coordinates would be the same, but you use a per-vertex integer to select the correct texture for a face: all of the vertices for a particular face would carry the same index. And if you're clever, you don't even really need texture coordinates. You can generate them in your vertex shader, based on per-vertex values like gl_VertexID and the like.
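That texcoord generation can be sketched on the CPU; in a shader the same computation would use gl_VertexID. It assumes quad corners are emitted in a fixed order (here: bottom-left, bottom-right, top-right, top-left):

```python
def corner_uv(vertex_id):
    """Derive a quad corner's texture coordinate from the vertex
    index alone - the CPU-side equivalent of gl_VertexID % 4 in the
    vertex shader."""
    corner = vertex_id % 4
    u = 1.0 if corner in (1, 2) else 0.0  # right-hand corners
    v = 1.0 if corner in (2, 3) else 0.0  # top corners
    return (u, v)
```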
Lighting parameters would work the same way: use some per-vertex data to select parameters from a UBO or SSBO.
As for the "individual object" bit, that's merely a matter of how you're thinking about the problem. Do not confuse what happens in the player's mind with what happens in your code. Games are an elaborate illusion; just because something appears to the user to be an "individual object" doesn't mean it is one to your rendering engine.
What you need is the ability to modify your world's data to remove and add new blocks. And if you need to show a block as "selected" or something, then you simply need another per-block value (like the lighting parameters and index for the texture) which tells you whether to draw it as a "selected" block or as an "unselected" one. Or you can just redraw that specific selected block. There are many ways of handling it.
Any decent graphics card (since about 2010) is able to render a few million vertices in the blink of an eye.
The approach differs depending on how much changes per frame – in other words, how much data must be transferred to the GPU per frame.
For a small number of changes, storing the data in one big VBO or in many smaller VBOs (with their VAOs), sending the changes via uniforms, and issuing several glDraw* calls all show similar performance; different hardware behaves with little difference. Indexed data may improve speed.
When most of the data changes every frame and those changes are hard or impossible to do in the shaders, your app is memory-transfer bound. Streaming is good advice.

OpenGL- drawarrays or drawelements?

I'm making a small 2D game demo, and from what I've read it's better to use drawElements() to draw an indexed triangle list than to use drawArrays() to draw an unindexed one.
But as far as I know, it doesn't seem possible to draw multiple disconnected elements in a single drawElements() call.
So for my 2D game demo, where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end up with one draw call per object?
Yes, it's better to use indices in many cases, since you don't have to store, transfer, or process duplicate vertices (the vertex shader only needs to run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Cutting a third of the vertex data is quite a good improvement, especially if your vertex data holds more than just a position.
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can affect cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes which are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP, you can use glPrimitiveRestartIndex to draw multiple strips of triangles with one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad. Your vertex array then needs to store all the vertices for all your quads. If they're moving, you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones. You could also store static vertex data in a separate array.
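Generating the 6-indices-per-4-vertices pattern for N quads is mechanical; a sketch (Python, illustrative):

```python
def quad_indices(quad_count):
    """Index buffer for quad_count quads stored as 4 vertices each:
    two GL_TRIANGLES per quad, 6 indices referencing 4 vertices."""
    indices = []
    for q in range(quad_count):
        base = 4 * q
        indices += [base, base + 1, base + 2,   # first triangle
                    base + 2, base + 3, base]   # second triangle
    return indices
```

The resulting list would be uploaded once to a GL_ELEMENT_ARRAY_BUFFER and reused every frame, since it never changes even when the vertices move.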
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before you get to the stage where it'll be a problem. However, if you do, instancing or some similar idea such as particle systems is where you should look.
Perhaps only go down the following track if the draw calls or data transfer becomes a problem as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/colour in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between another texture and update the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads with the attribute-data texture coordinates for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).

Fastest way to upload streaming points, and removing occasionally

So I have a system (using OpenGL 4.x) where I am receiving a stream of points (potentially with color and/or normal) from an external source, and I need to draw these points as GL_POINTS, running custom switchable shaders for coloring (color could be procedurally generated, or come from vertex color or normal direction).
The stream delivers groups of points (with or without normal or color) of arbitrary count (typically 1k to 70k points) at a fairly regular interval (4 to 10 Hz). I need to add these points to my current points and draw all the points received so far.
I am guaranteed that my vertex type will not change; I am told at the beginning of the stream which to expect. So I am using an interleaved vertex with pos+normal+color, pos+normal, pos+color, or just pos.
My current solution is to allocate interleaved vertex VBOs (with surrounding VAOs) of the appropriate vertex type, at a max vertex count specified in a config file (allocated with the DYNAMIC hint).
As new points come in, I fill up my current non-filled VBO via glBufferSubData. I keep a count (activePoints) of how many vertices the current frontier VBO holds so far, and use glBufferSubData to fill a range starting at activePoints. If the current update group has more vertices than fit in the frontier buffer (since I limit the vertex count per VBO), I allocate a new VBO and fill the range starting at 0 and ending with the number of points left over from the update group; if I still have points left, I do this again. It is rare that an update group straddles more than two buffers.
When rendering, I render all my VBOs except the frontier one with glDrawArrays(m_DrawMode, 0, numVertices), where numVertices equals the max allowed buffer size, and the frontier buffer with glDrawArrays(m_DrawMode, startElem, numElems) to account for it not being completely filled with valid vertices.
Of course at some point I will have more points than I can draw interactively, so I have an LRU mechanism that deallocates the oldest sets of VBOs (according to the LRU algorithm) as needed.
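The frontier-buffer filling scheme described above can be sketched like this (Python for brevity; capacity stands in for the config-file vertex limit, and each list models one VBO's contents):

```python
def split_into_buffers(points, buffers, capacity):
    """Distribute an incoming group of points across fixed-capacity
    buffers: fill the current frontier buffer first, allocating a new
    one whenever it is full."""
    if not buffers:
        buffers.append([])
    for p in points:
        if len(buffers[-1]) == capacity:
            buffers.append([])   # frontier buffer full: "allocate" a new VBO
        buffers[-1].append(p)
    return buffers
```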
Is there a more optimal method for doing this? Buffer orphaning? Streaming hint? Map vs SubData? Something else?
The second issue is that I am now asked to remove points (at irregular intervals), ranging from 10 to 2000 at a time. These points are irregularly spaced within the order I received them initially. I can find out which offsets in which buffers they currently exist at, but it's more of a scattering than a range. I have been "removing" them by finding their offsets into the right buffers and, one by one, calling glBufferSubData with a range of 1 (it's rare that they are beside each other in a buffer), changing their position to somewhere far off where they will never be seen. Eventually I guess buffers should be deleted as these remove requests add up, but I don't currently do that.
What would be a better way to handle that?
Mapping may be more efficient than glBufferSubData, especially when having to "delete" points. Explicit flushing may be of particular help. Also, mapping allows you to offload the filling of a buffer to another thread.
Be positively sure to get the access bits correct (or performance will be abysmal); in particular, do not map a region for reading if all you do is write.
Deleting points from a vertex buffer is not easily possible, as you probably know. For "few" points (e.g. 10 or 20) I would just set w = 0, which moves them to infinity, and keep drawing the whole thing as before. If your clip plane is not at infinity, this will just discard them. With explicit flushing, you would not even need to keep a separate copy in memory.
For "many" points (e.g. 1,000), you may consider using glCopyBufferSubData to remove the "holes". Moving memory on the GPU is fast, and for thousands of points it's probably worth the trouble. You then need to maintain a count for every vertex buffer, so you draw fewer points after removing some.
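The hole compaction that glCopyBufferSubData would perform can be sketched on the CPU: move live vertices from the end of the buffer into the holes and shrink the draw count (Python for brevity; buffer here models one VBO's vertex array):

```python
def compact(buffer, count, holes):
    """Remove scattered dead vertices by moving live vertices from the
    end of the buffer into the holes (the CPU analogue of a few
    glCopyBufferSubData calls). Returns the new live-vertex count."""
    holes = sorted(set(holes))
    last = count - 1
    for h in holes:
        # skip trailing vertices that are themselves holes
        while last in holes and last > h:
            last -= 1
        if h >= last:
            break                # remaining holes lie in the dead tail
        buffer[h] = buffer[last]
        last -= 1
    return count - len(holes)
```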
To "delete" entire vertex buffers, you should just orphan them (and reuse). OpenGL will do the right thing on its own behalf then, and it's the most efficient way to keep drawing and reusing memory.
Using glDrawElements instead of glDrawArrays, as suggested in Andon M. Coleman's comment, is usually good advice, but it will not help you in this case. The reason one would want to do that is that the post-transform cache tags vertices by their index, so drawing elements takes advantage of the post-transform cache whereas drawing arrays does not. However, the post-transform cache is only useful on connected geometry such as triangle lists or triangle strips. You're drawing points, so you will not use the post-transform cache in any case – and using indices only increases memory bandwidth, both on the GPU and on the PCIe bus.

How to batch same square in a single glVertexPointer

I've read that to optimize drawing, one can draw a set of figures using the same texture in one pass. But how do I connect my single squares together to form one figure to send to glVertexPointer?
(read in PowerVR MBX.3D Application Development Recommendations.1.0.67a - page 5)
I take it you are pondering the "software transform" step. I assume you have some kind of vertex array for ONE square and would like to concatenate several instances of that array into one big array containing all the vertices for your squares, and then finally draw the big array.
The big array in this case would contain pre-transformed vertex data that is sent to the GPU in one draw call. Since you know how many squares you are going to draw, say N of them, that array must be able to hold N*4*3 elements (assuming you are working in 3D and therefore have 3 coordinates per vertex).
The software transform step would then iterate over all the squares and append the transformed data into the big array, which in turn can be drawn by a call to glVertexPointer().
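The software-transform step described above can be sketched like this (Python for brevity; a simple per-square translation stands in for the full transform):

```python
def build_batch(square, offsets):
    """Software-transform step: append a translated copy of one
    square's vertices for every instance offset, producing the single
    big array handed to glVertexPointer. For N squares with 3
    coordinates per vertex, the result holds N * 4 * 3 floats."""
    big = []
    for ox, oy, oz in offsets:
        for vx, vy, vz in square:
            big += [vx + ox, vy + oy, vz + oz]
    return big
```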
However, I'm a bit sceptical of this software transform step. It means you are going to take care of all these transformations on your own, which means you are using CPU power rather than GPU power to get your transformed coordinates. Personally, I'd start off by creating the texture atlas and then generating all the texture coordinates for each single square. This is only needed once, since the texture coordinates will not change. You can then bind the texture coordinates with a call to glTexCoordPointer() before you start drawing your squares (push matrix, draw quad, pop matrix, etc.).
Edit:
yes this is what i want. Are you sure it's slower on the cpu? From what they say, the overhead of calling glVertexPointer and glDrawArrays multiple times would be slower.
I'm taking back my scepticism! :) Why don't you do some measurements, just for the heck of it? The trade-off must at least be that you have to shuffle a whole lot more data, so if the GPU can't hold all that data, normal transformations might be a necessity.
Oh, there's one more thing. As soon as a butterfly moves, its data has to be manually retransformed; this is not needed when you let the GPU transform the data for you. So you must flag each instance as dirty when it moves, and retransform it before doing the draw call.
Looks like you want to use indices? See glDrawElements.