OpenGL rendering several mesh instances

I've started learning OpenGL, and now that I'm trying to develop something on my own, I've gotten stuck on a question.
I'm going to render models that have about 50k primitives (cylinders, cubes, cones, etc.). Fewer than a quarter of them are 'unique'; the rest have the same dimensions but different positions and rotations. So I thought I could fill a data buffer with only the basic vertices and then draw each copy with its own transformation matrix.
From what I've read, I should use a buffer for vertices and a buffer for indices, so I don't waste memory storing repeated vertices.
All of them are stored in a single big buffer (I read that one single buffer is more efficient, as long as it doesn't exceed roughly 1 to 3 MB).
To draw them I'm trying to use glDrawElements, but since they are all in a single buffer, I can't update the matrix for each individual mesh, so the shader can't draw each one in the correct position.
One solution would be to use thousands of small buffers and then update the matrices between the glDrawElements calls.
Another would be to discard the index buffer and store only the vertices, so I can draw them using glDrawArrays, which allows me to draw only a small part of the buffer.
Is anything I said above wrong? Which option would perform better? Is there a better way to do this?
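For what it's worth, one shared buffer does not by itself rule out per-mesh matrices: the last argument of glDrawElements is a byte offset into the bound element array buffer, and glDrawElementsBaseVertex (GL 3.2+) adds a base vertex so each mesh's indices can stay local. The sketch below only computes those per-mesh draw parameters on the CPU; the `DrawRange` record and the mesh data are hypothetical, and 16-bit (GL_UNSIGNED_SHORT) indices are assumed.

```python
from dataclasses import dataclass

INDEX_SIZE = 2  # bytes per index, assuming GL_UNSIGNED_SHORT indices


@dataclass
class DrawRange:
    """Hypothetical per-mesh record describing one draw into the shared buffers."""
    name: str
    index_count: int   # 'count' argument of the draw call
    index_offset: int  # byte offset passed in place of the 'indices' pointer
    base_vertex: int   # 'basevertex' argument of glDrawElementsBaseVertex


def pack_meshes(meshes):
    """Concatenate (name, vertices, indices) meshes into one vertex list and one
    index list. Each mesh keeps its local indices (starting at 0); base_vertex
    says where that mesh's vertices begin in the shared vertex buffer."""
    all_vertices, all_indices, ranges = [], [], []
    for name, vertices, indices in meshes:
        ranges.append(DrawRange(name=name,
                                index_count=len(indices),
                                index_offset=len(all_indices) * INDEX_SIZE,
                                base_vertex=len(all_vertices)))
        all_vertices.extend(vertices)
        all_indices.extend(indices)
    return all_vertices, all_indices, ranges


# Two toy meshes: a unit quad (4 vertices, 6 indices) and a triangle.
quad = ("quad", [(0, 0), (1, 0), (1, 1), (0, 1)], [0, 1, 2, 2, 3, 0])
tri = ("tri", [(0, 0), (1, 0), (0, 1)], [0, 1, 2])
vertices, indices, ranges = pack_meshes([quad, tri])

for r in ranges:
    # Per draw (C side): upload this mesh's model matrix with glUniformMatrix4fv,
    # then glDrawElementsBaseVertex(GL_TRIANGLES, r.index_count,
    #     GL_UNSIGNED_SHORT, (void*)r.index_offset, r.base_vertex);
    print(r)
```

With thousands of copies of the same mesh, instanced drawing (glDrawElementsInstanced plus a per-instance matrix attribute) would avoid one draw call per copy.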

Related

How would I store vertex, color, and index data separately? (OpenGL)

I'm starting to learn OpenGL (working with version 3.3) with the intent of getting a small 3D falling sand simulation up, akin to this:
https://www.youtube.com/watch?v=R3Ji8J2Kprw&t=41s
I have a little experience with setting up a voxel environment like Minecraft from some Udemy tutorials for Unity, but I want to build something simple from the ground up and not deal with all the systems already laid on top of things with Unity.
The first issue I've run into comes early. I want to build a system for rendering quads, because instancing a ton of cubes is ridiculously inefficient. I also want to be efficient with storage of vertices, colors, etc. So far, in the OpenGL tutorials I've worked through, the way to do this is to store each vertex in a float array with both position and color data, and to set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that for neighboring quads, the same vertices will be repeated: if the quads belong to different "blocks", they will have different colors, and I want to avoid this.
What I want to do instead, to make things more efficient, is store the vertices of a cube in one int array (positions will all be ints), then add each quad from the terrain to an indices array (which will probably turn into each chunk's mesh later on). The indices array will store each quad's position, and a separate array will store each quad's color. I'm a little confused about how to set this up since I'm rather new to OpenGL, but I know it should be doable based on what other people have done with Minecraft clones, if not even easier since I don't need textures.
I just really want to get the framework for the chunks, blocks, world, etc. up and running so that I can get to the fun stuff like adding new elements. If anyone can verify that this is a sensible way to do this (lol) and offer guidance on how to set it up in the renderer, I'd very much appreciate it.
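The interleaved layout described in the question (six floats per vertex, three for position and three for color) can be sketched as follows. It shows the stride and offsets the two glVertexAttribPointer calls would use, and why a corner shared by two differently colored quads has to be stored twice: color lives in the same record as position. Attribute locations 0 and 1 are assumptions.

```python
import struct

FLOAT_SIZE = 4                              # sizeof(GLfloat)
FLOATS_PER_VERTEX = 6                       # x, y, z, r, g, b
STRIDE = FLOATS_PER_VERTEX * FLOAT_SIZE     # 24 bytes from one vertex to the next
POSITION_OFFSET = 0                         # position starts at byte 0
COLOR_OFFSET = 3 * FLOAT_SIZE               # color starts after x, y, z (byte 12)


def interleave(positions, colors):
    """Pack [(x, y, z)] and [(r, g, b)] into one interleaved float32 byte blob
    (native byte order), ready to upload with glBufferData."""
    if len(positions) != len(colors):
        raise ValueError("need one color per position")
    data = bytearray()
    for p, c in zip(positions, colors):
        data += struct.pack("=6f", *p, *c)
    return bytes(data)


# Matching attribute setup on the C side (locations 0 and 1 are assumed):
# glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, STRIDE, (void*)POSITION_OFFSET);
# glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, STRIDE, (void*)COLOR_OFFSET);
blob = interleave([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                  [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
print(len(blob), STRIDE, COLOR_OFFSET)  # 48 24 12
```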

"Thus far in the opengl tutorials I've worked with the way to do this is to store each vertex in a float array with both position and color data, and set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that for each neighboring quad, the same vertices will be repeated because if they are made of different "blocks" they will be different colors, and I want to avoid this."
Yes, and perhaps there's a reason for that. You seem to be trying to save... what, a few bytes of RAM? Your graphics card has 8GB of RAM on it; what it doesn't have is a general processing unit or an unlimited bus to do random lookups in other buffers for every single rendered pixel.
"The indices array will store each quad's position, and a separate array will store each quad's color."
If you insist on doing it this way, nothing's stopping you. You don't even need the quad vertices; you can synthesize them in a geometry shader.
Just fill a buffer with X|Y|Width|Height|Color(RGB) entries and describe them with glVertexAttribPointer like you already know. The vertex shader, which runs before the geometry shader, can simply pass each entry through. Then, for each entry in your input buffer (a quad), a geometry shader synthesizes two triangles and projects them to world units (you mentioned integers, so you're not in world units initially). Finally, your fragment shader can color each rasterized pixel according to its color entry.
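To make the expansion step concrete, here is a CPU-side emulation of what such a geometry shader would do per input entry: turn one (x, y, width, height, color) record into two triangles. It is only a sketch; a real geometry shader would more typically emit a 4-vertex triangle strip (`layout(triangle_strip, max_vertices = 4) out;`), and the projection to world units is left out here.

```python
def expand_quads(entries):
    """CPU emulation of the per-entry geometry-shader step: each
    (x, y, width, height, (r, g, b)) entry becomes two counter-clockwise
    triangles (six vertices), each vertex carrying the entry's color."""
    out = []
    for x, y, w, h, color in entries:
        corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
        # Triangle 1: bottom-left, bottom-right, top-right
        # Triangle 2: top-right, top-left, bottom-left
        for i in (0, 1, 2, 2, 3, 0):
            out.append((corners[i], color))
    return out


verts = expand_quads([(2, 3, 1, 1, (255, 0, 0)), (0, 0, 4, 2, (0, 0, 255))])
print(len(verts))  # 12
print(verts[2])    # ((3, 4), (255, 0, 0))
```

Every entry gets unpacked into six full vertices on the GPU each frame, which is exactly the pack/transfer/unpack round trip the answer is warning about.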
"ridiculously inefficient"
Indeed, if that sounds ridiculously inefficient to you, it's because it is. You're essentially packing your data on the CPU, transferring it to the GPU, unpacking it, and then processing it as normal. You can skip at least two of those steps, and even more if you consider that vertex shader outputs get cached and reused by primitives that share vertices.
There are many more variations of this insanity, like:
store vertex positions unpacked as normal, and store an index for the colors. Then store the colors in a linear buffer of some kind (texture, SSBO, generic buffer, etc) and look up each color index. That's even more inefficient, but it's closer to the algorithm you were suggesting.
store vertex positions for one quad and set up instanced rendering with a multi-draw command and a buffer to feed individual instance data (positions and colors). If you also have textures, you can use bindless textures for each quad instance. It's still rendering multiple objects, but it's slightly more optimized by your graphics driver.
or just store per-vertex data in a buffer and render it. Done. No pre-computations, no unlimited expansions, no crazy code, you have your vertex data and you render it.
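The instancing variation can be illustrated with a small CPU emulation: one quad is stored once, and a per-instance attribute (set up with glVertexAttribDivisor(location, 1) in real GL) supplies each copy's offset and color. This shows plain glDrawElementsInstanced-style replay rather than the multi-draw command mentioned above; the instance data is made up.

```python
QUAD_VERTICES = [(0, 0), (1, 0), (1, 1), (0, 1)]  # one unit quad, uploaded once
QUAD_INDICES = [0, 1, 2, 2, 3, 0]


def draw_instanced(instances):
    """Emulate an instanced indexed draw: the same 6 indices are replayed once
    per instance, and per-instance data (divisor 1) supplies offset and color."""
    emitted = []
    for instance_id, (offset, color) in enumerate(instances):
        for index in QUAD_INDICES:
            vx, vy = QUAD_VERTICES[index]
            # What a vertex shader would compute: per-vertex position + per-instance offset
            emitted.append(((vx + offset[0], vy + offset[1]), color, instance_id))
    return emitted


out = draw_instanced([((0, 0), "red"), ((5, 2), "blue")])
print(len(out))  # 12
print(out[6])    # ((5, 2), 'blue', 1)
```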

Using Index buffers in DirectX 11; how does it know?

Let's say I create two vertex buffers, for two different meshes.
(I'm assuming creating separate buffers for separate meshes is how it's usually done)
Now, let's say I want to draw one of the meshes using an index buffer.
Looking at the book Practical Rendering and Computation with Direct3D 11, it doesn't seem like the creation of an index buffer references a vertex buffer in any way, so how does the index buffer know (during input assembly) which vertex buffer to act on?
I've done some googling without answers, which leads me to assume there's something obvious about it that I'm missing.

You are right, index buffers do not reference specific vertex buffers. During DrawIndexed, the active index buffer supplies indices into the active vertex buffers (the ones you set using IASetIndexBuffer/IASetVertexBuffers).

Indeed, Index Buffers and Vertex Buffers are completely independent.
The index buffer only becomes associated with the vertex buffers at draw time (i.e., when both are bound to the pipeline).
You can think of the index buffer as a "lookup table", where you keep a list of element indices to draw.
That also means you can attach two completely "logically unrelated" buffers to the pipeline and draw them; nothing will prevent you from doing that, but you will of course get some strange visual results.
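A toy model of that lookup-table behavior, assuming a triangle list and ignoring details such as base vertex location: the indices are resolved against whatever vertex buffer is supplied at draw time, so one index buffer can be used with two different vertex buffers of the same vertex count.

```python
def input_assembler(vertex_buffer, index_buffer):
    """Toy model of indexed triangle-list assembly: the index buffer is just a
    lookup table into whichever vertex buffer is bound when the draw happens.
    Trailing indices that don't form a full triangle are ignored."""
    triangles = []
    for i in range(0, len(index_buffer) - 2, 3):
        triangles.append(tuple(vertex_buffer[j] for j in index_buffer[i:i + 3]))
    return triangles


indices = [0, 1, 2, 2, 3, 0]                 # a quad's index buffer, created once
grid_a = [(0, 0), (1, 0), (1, 1), (0, 1)]    # two unrelated vertex buffers...
grid_b = [(0, 0), (2, 0), (2, 3), (0, 3)]    # ...with the same vertex count
print(input_assembler(grid_a, indices)[1])   # ((1, 1), (0, 1), (0, 0))
print(input_assembler(grid_b, indices)[1])   # ((2, 3), (0, 3), (0, 0))
```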
Decoupling both has many advantages, here are a few examples:
You can reuse an index buffer (for example, two displaced grids with identical resolution can share the same index buffer). That can be a decent memory gain.
You can draw your vertex buffer on its own and do some processing per vertex (draw a point list for sprites, for example, or apply skinning/displacement into a Stream Output buffer, then draw the resulting vertex buffer using DrawIndexed).
Both vertex and index buffers can also be bound as ByteAddressBuffer, so you can also process your geometry in a compute shader and build another, optimized index buffer (with culled triangles, for example), then issue the indexed draw with the optimized buffer. Applying culling to indices rather than vertices is often faster, since you move much less memory.
This is a niche case, but sometimes I have to draw a mesh as a set of triangles and then draw it as a set of lines (some form of wireframe). If you take a single box, for example, you will not want to draw the face diagonals as lines, so I have a shared vertex buffer with the box vertices, then one index buffer dedicated to drawing a triangle list and another to drawing a line list. For large models, this can also be an effective memory gain.
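The box case can be sketched like this: eight shared vertices, one index buffer for a triangle list (36 indices) and one for a line list containing only the 12 real edges (24 indices). The bit-to-axis vertex numbering is an illustrative choice, not taken from the answer.

```python
# Eight box corners; bit 0 -> x, bit 1 -> y, bit 2 -> z.
box_vertices = [(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]


def triangle_indices():
    """Triangle list: 6 faces x 2 triangles x 3 indices = 36 indices,
    wound counter-clockwise when viewed from outside the box."""
    indices = []
    # (axis bit, u bit, v bit), ordered so that u x v points along +axis
    for axis, u, v in ((1, 2, 4), (2, 4, 1), (4, 1, 2)):
        for side in (0, 1):
            base = axis if side else 0
            quad = [base, base | u, base | u | v, base | v]
            if side == 0:
                quad.reverse()  # the face on the negative side faces the other way
            indices += [quad[0], quad[1], quad[2], quad[2], quad[3], quad[0]]
    return indices


def line_indices():
    """Line list: the 12 box edges only (24 indices), no face diagonals.
    An edge joins two corners that differ in exactly one coordinate bit."""
    indices = []
    for a in range(8):
        for bit in (1, 2, 4):
            if (a & bit) == 0:
                indices += [a, a | bit]
    return indices


tris, lines = triangle_indices(), line_indices()
print(len(box_vertices), len(tris), len(lines))  # 8 36 24
```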

OpenGL: drawArrays or drawElements?

I'm making a small 2D game demo and from what I've read, it's better to use drawElements() to draw an indexed triangle list than using drawArrays() to draw an unindexed triangle list.
But as far as I know, it isn't possible to draw multiple unconnected elements in a single drawElements() call.
So for my 2D game demo where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end having one draw call per object?

Yes, it's better to use indices in many cases since you don't have to store or transfer duplicate vertices and you don't have to process duplicate vertices (vertex shader only needs to be run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Two thirds is quite a good improvement really, especially if your vertex data is more than just position.
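The memory side of that claim can be checked with simple arithmetic. The sketch assumes a 20-byte vertex (for example, two float32 position components, two float32 texture coordinates and an RGBA8 color) and 16-bit indices; both are illustrative choices.

```python
def quad_bytes(vertex_size, quads, index_size=2):
    """Bytes needed for `quads` quads drawn as GL_TRIANGLES,
    returned as (unindexed, indexed)."""
    unindexed = quads * 6 * vertex_size                         # 6 vertices per quad
    indexed = quads * 4 * vertex_size + quads * 6 * index_size  # 4 vertices + 6 indices
    return unindexed, indexed


print(quad_bytes(20, 1000))  # (120000, 92000)
```

The saving grows with the vertex size, since the index cost per quad stays fixed at 6 indices.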
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can affect cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes that are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP, you could enable primitive restart (glEnable(GL_PRIMITIVE_RESTART)) and use glPrimitiveRestartIndex to draw multiple strips of triangles with one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad. Your vertex array then needs to store all the vertices for all your quads. If they're moving, you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones. You could also store static vertex data in a separate array.
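Both index layouts can be generated up front for a whole batch of quads, so all of them go out in one glDrawElements call. The sketch assumes 16-bit indices, with 0xFFFF as the value passed to glPrimitiveRestartIndex; note that the two functions expect different vertex orders within each quad.

```python
RESTART = 0xFFFF  # restart value for glPrimitiveRestartIndex (16-bit indices)


def quad_list_indices(quad_count):
    """GL_TRIANGLES: 6 indices per quad. Quad i owns vertices 4*i .. 4*i+3,
    stored as bottom-left, bottom-right, top-right, top-left."""
    indices = []
    for i in range(quad_count):
        b = 4 * i
        indices += [b, b + 1, b + 2, b + 2, b + 3, b]
    return indices


def quad_strip_indices(quad_count):
    """GL_TRIANGLE_STRIP with primitive restart: 4 indices per quad plus a
    restart marker between quads. Assumes strip vertex order instead:
    bottom-left, bottom-right, top-left, top-right."""
    indices = []
    for i in range(quad_count):
        if i:
            indices.append(RESTART)
        b = 4 * i
        indices += [b, b + 1, b + 2, b + 3]
    return indices


print(quad_list_indices(2))   # [0, 1, 2, 2, 3, 0, 4, 5, 6, 6, 7, 4]
print(quad_strip_indices(2))  # [0, 1, 2, 3, 65535, 4, 5, 6, 7]
```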
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before you get to the stage where it'll be a problem. However, if you do, instancing or some similar idea such as particle systems is where you should look.
Perhaps only go down the following track if the draw calls or data transfer becomes a problem as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/colour in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between another texture and update the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads with the attribute-data texture coordinates for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).
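The "ping-pong" idea is easier to see without the GPU machinery. In this CPU analogy, two Python lists stand in for the two attribute textures and `step` stands in for the fragment-shader update pass; the particle layout (x, y, vx, vy) and the gravity value are illustrative assumptions.

```python
def step(src, dst, dt):
    """One update pass: read particle state from src, write the result to dst.
    Each entry is (x, y, vx, vy); gravity is an illustrative constant."""
    for i, (x, y, vx, vy) in enumerate(src):
        vy -= 9.8 * dt
        dst[i] = (x + vx * dt, y + vy * dt, vx, vy)


state_a = [(0.0, 10.0, 1.0, 0.0)]  # plays the role of texture A
state_b = [None] * len(state_a)     # plays the role of texture B
read, write = state_a, state_b
for _ in range(3):
    step(read, write, dt=0.5)
    read, write = write, read       # swap: this frame's output is next frame's input
print(read[0])
```

The point of the swap is that no pass ever reads and writes the same storage, which is also why the GPU version needs two textures.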

Do VBOs boost performance even when all data changes frequently

I'm doing a 2D turn-based RTS game with 32x32 tiles (400-500 tiles per frame). I could use a VBO for this, but I may have to change almost all of the VBO data each frame, since the background scrolls and the visible tiles change every time the map scrolls. Will using VBOs rather than client-side vertex arrays still yield a performance benefit here? Also, if using VBOs, which data format is most efficient (float, int16, or ...)?

If you are simply scrolling, you can use the vertex shader to manipulate the position rather than update the vertices themselves. Pass in a 'scroll' value as a uniform to your background and simply add that value to the x (or y, or whatever applies to your case) value of each vertex.
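A small emulation of that approach: the tile vertices are uploaded once, and only a scroll offset changes per frame. In GLSL this would be something like a hypothetical `uniform vec2 scroll;` set with glUniform2f and added to the position in the vertex shader; the function below just mimics that addition.

```python
TILE = 32
# Static tile quad vertices, uploaded once (e.g. with GL_STATIC_DRAW).
static_vertices = [(0, 0), (TILE, 0), (TILE, TILE), (0, TILE)]


def vertex_shader(position, scroll):
    """Emulates `position + scroll` in a vertex shader with a uniform offset."""
    return (position[0] + scroll[0], position[1] + scroll[1])


# Scrolling only changes the uniform; the buffer contents never change.
for scroll in [(0, 0), (-16, 0), (-48, 8)]:
    print([vertex_shader(v, scroll) for v in static_vertices])
```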
Update:
If you intend to modify the VBO often, you can tell the driver this using the usage param of glBufferData. This page has a good description of how that works: http://www.opengl.org/wiki/Vertex_Buffer_Object, under Accessing VBOs. In your case, it looks like you should specify GL_DYNAMIC_DRAW to glBufferData so that the driver puts your VBO in the best place in memory for your application.

The regular approach is to move the camera and perform culling instead of updating the content of the VBOs. For a 2D game, culling will use a simple rectangle intersection algorithm, which you will need anyway for unit selection in the game. As a bonus, manipulating the camera will allow you to rotate the camera and zoom in and out. You could also combine several tiles (4, 9 or 16) into one VBO.
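For a tile map, that rectangle test reduces to computing the range of tile columns and rows that overlap the camera rectangle; everything outside it is culled. The sketch assumes an unrotated, unzoomed camera; the 640x480 view size is only an example.

```python
import math


def visible_tiles(cam_x, cam_y, view_w, view_h, tile=32, map_w=None, map_h=None):
    """Return (first_col, last_col, first_row, last_row) for the tiles that
    intersect the camera rectangle [cam_x, cam_x + view_w) x [cam_y, cam_y + view_h).
    If map dimensions (in tiles) are given, the range is clamped to the map."""
    first_col = math.floor(cam_x / tile)
    first_row = math.floor(cam_y / tile)
    last_col = math.ceil((cam_x + view_w) / tile) - 1
    last_row = math.ceil((cam_y + view_h) / tile) - 1
    if map_w is not None:
        first_col, last_col = max(first_col, 0), min(last_col, map_w - 1)
    if map_h is not None:
        first_row, last_row = max(first_row, 0), min(last_row, map_h - 1)
    return first_col, last_col, first_row, last_row


print(visible_tiles(40, 0, 640, 480))  # (1, 21, 0, 14)
```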
I would strongly advise against writing logic to move the tiles instead of the camera. It will take you longer, have more bugs, and be less flexible.
The format will depend on what data you are storing in the VBOs. When in doubt, just use uint8 for color and float32 for everything else. That said, for a 2D game your VBOs or vertex arrays are going to be very small compared to 3D applications, so it's highly unlikely that using VBOs will make any difference.
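To see what the uint8-color suggestion saves, compare two illustrative vertex layouts: position and texture coordinates as float32 with an RGBA8 color, versus everything as float32. On the GL side the byte color would be described with glVertexAttribPointer(location, 4, GL_UNSIGNED_BYTE, GL_TRUE, stride, offset), so it arrives in the shader normalized to 0..1.

```python
import struct

# Illustrative 2D tile vertex: x, y, u, v as float32 plus RGBA as four uint8.
COMPACT = struct.Struct("=4f4B")   # 16 + 4 = 20 bytes
# Same data with the color also stored as four float32 values.
ALL_FLOAT = struct.Struct("=8f")   # 32 bytes

v = COMPACT.pack(10.0, 20.0, 0.0, 1.0, 255, 128, 0, 255)
print(COMPACT.size, ALL_FLOAT.size, len(v))  # 20 32 20
```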

OpenGL: Buffer object performance issue

I have a question related to buffer object performance. I rendered a mesh using standard vertex arrays (not interleaved) and wanted to switch to buffer objects to get a performance boost. When I introduced buffer objects, I was shocked to find that they lowered performance by a factor of four. I thought buffers should increase performance. Is that true? I think I must be doing something wrong...
I render a 3D tiled map, and to reduce the amount of memory needed I use only a single tile (vertex set) to render the whole map. I change only the texture coordinates and the y value of the vertex positions for each tile of the map. The buffers for positions and texture coords are created with GL_DYNAMIC_DRAW. The buffer for indices is created with GL_STATIC_DRAW because it doesn't change during map rendering. So, for each tile of the map, the buffers are mapped and unmapped at least once. Should I use only one buffer for texture coords and positions?
Thanks,

Try moving your vertex/texture coordinates with the GL_MODELVIEW/GL_TEXTURE matrices, and leave the buffer data alone (GL_STATIC_DRAW). E.g. if a tile is 1x1, create the rect (0, 0)-(1, 1) and set its position in the world with glTranslate. Do the same with texture coordinates.
VBOs are not there to increase performance of drawing few quads. Their true power is seen when drawing meshes with thousands of polygons using shaders. If you don't need any forward compatibility with newer opengl versions, I see little use in using them to draw dynamically changing data.

If you need to update the buffer(s) every frame, you should use GL_STREAM_DRAW (which hints that the buffer contents will likely be used only once) rather than GL_DYNAMIC_DRAW (which hints that they will be used a couple of times before being updated).
As far as my experience goes, buffers created with GL_STREAM_DRAW will be treated similarly to plain ol' arrays, so you should expect about the same performance as for arrays when using it.
Also make sure that you call glMapBuffer with the access parameter set to GL_WRITE_ONLY, assuming you don't need to read the contents of the buffer. Otherwise, if the buffer is in video memory, it may have to be transferred from video memory to main memory and back again for each map call (though that's really up to the driver). Transferring too much data over the bus is a very real bottleneck that's quite easy to bump into.
