I'm currently struggling to find a good approach for rendering many (thousands of) slightly different models. The model itself is a simple cube with some vertex offsets; think of a skewed quad face. Each 'block' has a different offset for its vertices, so basically I have a voxel engine on steroids: each block is not a perfect cube but a skewed cuboid. Rendered as triangles, this shape takes 36 vertices, which can be cut to 18 because only 3 faces are visible at a time. With indexing we are down to 12 unique vertices (4 for each face).
But, now that I have the vertices for each block in the world, how do I render them?
What I've tried:
Instanced rendering. Sounds good, but doesn't work since my models are not all identical.
I could simplify distant blocks to a cube and render them with glDrawArraysInstanced/glDrawElementsInstanced.
Putting everything in one giant VBO. This performs better than rendering each cube individually, but has the downside of being one large mesh. That is not desirable, as I need every cube to have different textures, lighting, etc., and selecting a single cube within that huge mesh is not possible.
I am aware of frustum culling and occlusion culling, but I already have performance problems with just the cubes in front of me (tested with a 128x128 world).
My requirements:
Draw some thousand models.
Each model has per-vertex offsets to make the block less cubic, stored in a separate VBO.
Each block has to be an individual object, as you should be able to place/remove blocks.
Any good performance advice?
"This is not desirable as I need every cube to have different textures, lighting, etc. Selecting a single cube within that huge mesh is not possible."
Programmers should avoid declaring that something is "impossible"; it limits your thinking.
Giving each face of these cubes different textures has many solutions. The Minecraft approach uses texture atlases. Each "texture" is really just a sub-section of one large texture, and you use texture coordinates to select which sub-section a particular face uses. But you can get more complex.
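To make the atlas idea concrete, here is a minimal sketch of the arithmetic involved (the row-major grid layout and the helper name are my own illustration, not from the answer):

    // Compute the UV rectangle for one tile in a square texture atlas.
    // Assumes tiles are laid out row-major in a grid of
    // tilesPerSide x tilesPerSide equally sized tiles.
    struct UVRect { float u0, v0, u1, v1; };

    UVRect atlasRect(int tileIndex, int tilesPerSide)
    {
        const float step = 1.0f / tilesPerSide;        // one tile in UV space
        const int   col  = tileIndex % tilesPerSide;   // column within the grid
        const int   row  = tileIndex / tilesPerSide;   // row within the grid
        return { col * step,       row * step,         // lower-left corner
                 (col + 1) * step, (row + 1) * step }; // upper-right corner
    }

Each face's four vertices then get the four corners of its tile's rectangle as texture coordinates.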
Array textures allow for a more direct way to solve this problem. Here, the texture coordinates would be the same, but you use a per-vertex integer to select the correct texture for a face; all of the vertices for a particular face would carry the same index. And if you're clever, you don't even really need texture coordinates: you can generate them in your vertex shader, based on per-vertex values like gl_VertexID and the like.
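For illustration, a minimal array-texture setup might look like this (the texture size, layer count, vertex layout, and attribute location are all placeholder assumptions; error checking is omitted):

    #include <cstddef>   // offsetof

    // Hypothetical vertex layout with a per-vertex layer index.
    struct Vertex { float pos[3]; float uv[2]; GLint texLayer; };

    // Allocate a 2D array texture: 64x64 texels, 32 layers.
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D_ARRAY, tex);
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
                 64, 64, 32,                      // width, height, layers
                 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    // Per-vertex layer index at (assumed) attribute location 3. Note
    // glVertexAttribIPointer with a capital I: it keeps the attribute
    // an integer instead of converting it to float.
    glVertexAttribIPointer(3, 1, GL_INT, sizeof(Vertex),
                           (void*)offsetof(Vertex, texLayer));
    glEnableVertexAttribArray(3);

The shader then declares the attribute as an integer (e.g. in int texLayer;) and uses it as the layer coordinate when sampling the sampler2DArray.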
Lighting parameters would work the same way: use some per-vertex data to select parameters from a UBO or SSBO.
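On the shader side, the selection might look something like this sketch (embedded here as a C++ string; the uniform block layout, array size, and all names are illustrative, not from the answer):

    const char* fragSrc = R"(
        #version 330 core
        // Per-block parameters in a UBO, selected by a flat integer
        // passed along from the vertex shader.
        layout(std140) uniform BlockParams {
            vec4 tintAndBrightness[256];   // xyz = tint, w = brightness
        };
        flat in int blockId;               // same value for the whole face
        in vec2 uv;
        uniform sampler2DArray atlas;
        out vec4 color;
        void main() {
            vec4 p = tintAndBrightness[blockId];
            color = texture(atlas, vec3(uv, float(blockId)))
                  * vec4(p.rgb * p.w, 1.0);
        }
    )";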
As for the "individual object" bit, that's merely a matter of how you're thinking about the problem. Do not confuse what happens in the player's mind with what happens in your code. Games are an elaborate illusion; just because something appears to the user to be an "individual object" doesn't mean it is one to your rendering engine.
What you need is the ability to modify your world's data to remove and add new blocks. And if you need to show a block as "selected" or something, then you simply need another per-block value (like the lighting parameters and index for the texture) which tells you whether to draw it as a "selected" block or as an "unselected" one. Or you can just redraw that specific selected block. There are many ways of handling it.
Any decent graphics card (since about 2010) can render a few million vertices in the blink of an eye.
The right approach depends on how much changes per frame; in other words, on how much data must be transferred to the GPU each frame.
When the number of changes is small, storing the data in one big VBO or in many smaller VBOs (with their VAOs), sending the changes as uniforms, and issuing several glDraw* calls all show similar performance, and different hardware behaves with little variation. Indexed data may improve the speed.
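A minimal sketch of the "many small VAOs plus uniforms" variant described above (the Chunk struct, offsetLoc, and the bound shader's uniform vec3 offset are hypothetical):

    #include <vector>

    struct Chunk { GLuint vao; GLsizei indexCount; float offset[3]; };

    void drawChunks(const std::vector<Chunk>& chunks, GLint offsetLoc)
    {
        for (const Chunk& c : chunks) {
            glBindVertexArray(c.vao);               // per-object geometry
            glUniform3fv(offsetLoc, 1, c.offset);   // per-object change
            glDrawElements(GL_TRIANGLES, c.indexCount,
                           GL_UNSIGNED_INT, nullptr);
        }
    }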
When most of the data changes every frame and those changes are hard or impossible to do in the shaders, then your app is memory-transfer bound, and streaming the buffer data is good advice.
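"Streaming" here usually means re-specifying the buffer store each frame so the driver never has to stall. A common pattern is buffer orphaning, sketched below (a minimal example; the function name is mine):

    // Re-specifying the store with glBufferData(..., nullptr, ...) lets
    // the driver hand us fresh memory while the GPU may still be reading
    // the previous frame's copy.
    void streamVertices(GLuint vbo, const void* verts, GLsizeiptr bytes)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, bytes, nullptr, GL_STREAM_DRAW); // orphan
        glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);             // upload
    }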
Related
A cube with differently colored faces is very simple in immediate mode. But doing the same thing with shaders seems to be quite a challenge.
I have read that in order to create a cube with different coloured faces, I should create 24 vertices instead of 8; in other words, I visualise this as 6 squares that don't quite touch.
Is another (perhaps better?) solution to texture the faces of the cube using a really simple texture of a flat colour, perhaps a 1x1 pixel texture?
My texturing idea seems simpler to me from a coder's point of view, but which method would be the more efficient from a GPU/graphics card perspective?
I'm not sure what your overall goal is (e.g. what you're learning to do in the long term), but generally for high performance applications (e.g. games) your goal is to reduce GPU load. Every time you switch certain states (e.g. change textures, render targets, shader uniform values, etc..) the GPU stalls reconfiguring itself to meet your demands.
So, you can pass in a 1x1 pixel texture for each face, but then you'd need six draw calls (usually not so bad, but there is some prep work and potential cache misses) and six texture sets (can be very bad, often as bad as changing shader uniform values).
Suppose you wanted to pass in one texture and use that as a texture map for the cube. This is a little less trivial than it sounds: you need to express each face on the texture in a way that maps to the vertices. Often you need to pass in a texture coordinate for each vertex, and due to the spatial configuration of the texture this normally doesn't end up meaning one texture coordinate per spatial vertex.
However, if you use an environmental/reflection map, the complexities of mapping are handled for you. In this way, you could draw a single texture on all sides of your cube (or on your sphere, or whatever sphere-mapped shape you wanted). I'm not sure I'd call this easier, since you have to form the environmental texture carefully, and you still have to set a different texture for each new color you want to represent, or modify the texture on the GPU or in step with the GPU, which is tricky and usually not performant.
Which brings us back to the canonical way of doing it, as you mentioned: use per-vertex values. They're fast, you can draw many, many cubes very quickly just by specifying different vertex data, and it's easy to understand. It really is the best way, and it's what GPUs are designed to do quickly.
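A minimal sketch of the per-vertex-color approach, assuming a shader with position at attribute location 0 and color at location 1 (only one face of the 24-vertex cube is shown):

    // Interleaved position + per-vertex color. Each face repeats its
    // 4 corner positions with that face's color, 24 vertices in total.
    struct ColoredVertex { float pos[3]; float rgb[3]; };

    ColoredVertex front[] = {
        { {-1, -1,  1}, {1, 0, 0} },   // red front face
        { { 1, -1,  1}, {1, 0, 0} },
        { { 1,  1,  1}, {1, 0, 0} },
        { {-1,  1,  1}, {1, 0, 0} },
    };

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(ColoredVertex),
                          (void*)offsetof(ColoredVertex, pos));
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(ColoredVertex),
                          (void*)offsetof(ColoredVertex, rgb));
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);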
Additionally: yes, you can do this with just shaders, but it'd be ugly and slow, and the GPU would end up computing it per pixel. You'd pass the object-space coordinates to the fragment shader, then test in the fragment shader which side you're on and output the corresponding color. Highly not recommended: it's not particularly easier, and it's definitely not faster for the GPU, since to change colors you'd again end up changing uniform values for the shaders.
I just got stuck texturing my cubes. I searched the web and realized that the only way to give my cube 6 different textures (with glDrawElements) is to create about 24 indexed vertices. That is still faster than plain glDrawArrays, but it seems quite illogical and horribly slow. I understand that the purpose of glDrawElements is to deal with complex models where many triangles share vertices with the same texture coordinates.
But I am still pretty confused, because glDrawElements gave me a performance boost (without any effects, just shader coloring): from about 50-67 ms with 10,000 cubes (glDrawArrays) to 25-33 ms with 100,000 cubes.
My question is: do I just have to accept this, or is there some way to get around it?
I do not know what you want to achieve, but you can try to reduce the number of vertices in the scene; this will make it much faster. If your cube needs more than one texture, you can store them in a texture atlas and assign each vertex of the cube the corresponding texture coordinate for its image in the atlas. This reduces the number of texture binds, as there is only one texture. In addition, you only need to render the vertices/triangles which are visible to the user, so you do not need to texture the back of the cube.
I've found a good site which explains a lot about speeding up the rendering of thousands of cubes. They generate one big object on the CPU by culling a lot of vertices before uploading them to the GPU. For your problem, I think you can only try to use as few textures as possible (a texture atlas), which will increase performance; but for a whole cube you need vertices that each carry a specific texture coordinate.
I'm making a small 2D game demo and from what I've read, it's better to use drawElements() to draw an indexed triangle list than using drawArrays() to draw an unindexed triangle list.
But as far as I know, it doesn't seem possible to draw multiple elements that are not connected in a single draw call with drawElements().
So for my 2D game demo, where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end up with one draw call per object?
Yes, it's better to use indices in many cases since you don't have to store or transfer duplicate vertices and you don't have to process duplicate vertices (vertex shader only needs to be run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Two thirds is quite a good improvement really, especially if your vertex data is more than just position.
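To make the 6-to-4 reduction concrete, here is a minimal sketch of one quad drawn indexed (buffer creation and upload are omitted; the index data is assumed to be in the bound GL_ELEMENT_ARRAY_BUFFER):

    // One quad: 4 unique 2D vertices referenced by 6 indices
    // (two triangles). Drawn unindexed, the same quad needs 6 full
    // vertices, two of them duplicates.
    float verts[] = {
        0.0f, 0.0f,   // 0: bottom-left
        1.0f, 0.0f,   // 1: bottom-right
        1.0f, 1.0f,   // 2: top-right
        0.0f, 1.0f,   // 3: top-left
    };
    unsigned short idx[] = { 0, 1, 2,   2, 3, 0 };

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, nullptr);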
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can hurt cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes which are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP you could use glPrimitiveRestartIndex to draw multiple strips of triangles with the one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad. Your vertex array then needs to store all the vertices for all your quads. If they're moving you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones. You could also store static vertex data in a separate array.
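A sketch of the primitive-restart variant (requires GL 3.1+; the index data here is illustrative and assumed to be uploaded to the bound element buffer):

    // A reserved index value tells the GPU to begin a new strip, so many
    // disconnected strips fit into one glDrawElements call.
    glEnable(GL_PRIMITIVE_RESTART);
    glPrimitiveRestartIndex(0xFFFF);   // reserved marker value

    // Two 4-vertex strips separated by the restart marker:
    unsigned short idx[] = { 0, 1, 2, 3,   0xFFFF,   4, 5, 6, 7 };
    glDrawElements(GL_TRIANGLE_STRIP, 9, GL_UNSIGNED_SHORT, nullptr);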
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before you get to the stage where it'll be a problem. However, if you do, instancing or some similar idea such as particle systems is where you should look.
Perhaps only go down the following track if the draw calls or data transfer becomes a problem as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/colour in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between another texture and update the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads with the attribute-data texture coordinates for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).
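For what it's worth, the ping-pong step itself is small. This sketch assumes two FBO/texture pairs already set up, plus a hypothetical drawFullscreenQuad() helper and a bound update shader:

    #include <utility>   // std::swap

    void drawFullscreenQuad();   // hypothetical: draws a viewport-filling quad

    // One update step: read last frame's particle attributes from srcTex,
    // write the new ones into dstTex via the fragment shader, then swap
    // the roles for the next frame.
    void pingPongStep(GLuint& srcFbo, GLuint& dstFbo,
                      GLuint& srcTex, GLuint& dstTex)
    {
        glBindFramebuffer(GL_FRAMEBUFFER, dstFbo);   // render into dstTex
        glBindTexture(GL_TEXTURE_2D, srcTex);        // sample old attributes
        drawFullscreenQuad();                        // shader computes updates
        std::swap(srcFbo, dstFbo);
        std::swap(srcTex, dstTex);
    }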
What is the best way to texture terrain made from quads in OpenGL? I have around 30 different textures I want to have for my terrains (1 texture per terrain type, so 30 terrain types) and would like to have smooth transitions between any two of the terrains.
I have been doing some browsing on the web and found that there are many different methods, including 3D texturing, alpha channels, blending, and shaders. However, which of these is the most efficient, and which can handle the number of textures I am looking to use? For example, this popular answer describes how to use some techniques, but since the mixmap has only 4 channels (RGBA), it can only support 4 textures.
I should also note that I know nothing about shaders, so non-shader required techniques would be preferable.
Since you linked to an answer that describes texture splatting, and its question mentions the game Oblivion, I can provide some additional insight into that.
Basic texture splatting with an RGBA mixmap only supports four textures per terrain quad, but you can use different sets of textures for different quads. Oblivion divides its terrain into squares (called "cells") of 32 grid points (192 feet) per side, and each cell defines its own set of four terrain textures. So you can't have lots of texture diversity within a small area, but you can easily vary your textures over larger regions. If you prefer, you can define texture sets for smaller regions, even individual quads, at the expense of using more memory.
If you really need more than four textures in a quad, you can use multiple mixmaps. For each additional one, you just do another texture lookup to get four more blending factors, and blend in four more textures on top of the results from the previous mixmap. You can scale up to as many textures as you want, again at the expense of memory.
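As a sketch of what that blending looks like in a fragment shader (embedded here as a C++ string; all sampler and variable names are illustrative), one RGBA mixmap weights four textures and a second mixmap layers four more on top:

    const char* splatFrag = R"(
        #version 330 core
        uniform sampler2D mix0, mix1;                  // blend weights
        uniform sampler2D t0, t1, t2, t3, t4, t5, t6, t7;
        in vec2 uv;
        out vec4 color;
        void main() {
            vec4 m0 = texture(mix0, uv);
            vec4 m1 = texture(mix1, uv);
            // First mixmap: weighted sum of four base textures.
            vec3 c = texture(t0, uv).rgb * m0.r
                   + texture(t1, uv).rgb * m0.g
                   + texture(t2, uv).rgb * m0.b
                   + texture(t3, uv).rgb * m0.a;
            // Second mixmap: blend four more textures on top, fading
            // the first layer out by the total weight of the second
            // (assumes the m1 weights sum to at most 1).
            c = c * (1.0 - dot(m1, vec4(1.0)))
              + texture(t4, uv).rgb * m1.r
              + texture(t5, uv).rgb * m1.g
              + texture(t6, uv).rgb * m1.b
              + texture(t7, uv).rgb * m1.a;
            color = vec4(c, 1.0);
        }
    )";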
Texture splatting can be tricky to combine with LOD techniques on the height map, because when a single low-detail terrain quad represents a group of high-detail quads, you have to sample several different mixmaps for different regions of the big quad. Oblivion sidesteps that problem by using texture splatting only for full-detail terrain; distant cells, rendered at lower resolution, use precomputed textures produced by the editor, which does the splatting and downscaling in advance.
One alternative to texture splatting is to use a clipmap to render a "megatexture". With this approach, you have a single large texture that represents your entire terrain, and you avoid filling up your RAM by loading different parts of it with only as much detail as is actually needed to render it based on the viewer's current position. (Distant parts of the terrain can't be seen at full detail, so there's no need to load them at full detail.)
The advantage of this approach is its artistic freedom: you can place fine details anywhere you want in the texture, without regard to the vertex grid. The disadvantage is that it's rather complex to implement, and the entire clipmap has to be stored somewhere, probably in a big file on disk, so that you can load parts of it into RAM as needed.
I have a question about VBOs. Let's say, just as an example, I'm trying to build a voxel-style engine that renders even a 16x16x16 chunk.
Do I store the map information in the VBO? How do I get the vertices for a cube? The way I'm thinking about it, the VBO would require 24 vector3 variables per cube at each location. That seems like a lot.
Is there some way to have a single 'cube' VBO template, then somehow change the coordinates for each cube I want to draw, calling the template (I hope that makes sense) and using glBufferData to update it for every location? Do I have to actually store those 24 vectors for every single location in the 16x16x16 chunk? Or would I just store the map coordinates and have the cube's polygons generated in a shader?
It seems expensive memory-wise to load up something that stores 24 vectors per location, and resource-intensive to call glBufferData 16x16x16 times per frame... so the last option, using the vertex shader, seems the most viable, but I'm new to shaders, so is something like that possible?
What is the most common method used?
Geometry shaders can, indeed, emit multiple primitives for a single input primitive. So drawing all 6 faces of a cube from a single input point is certainly possible. Though for "voxel" engines you might be better served by point sprites, as often the orientation of the cube isn't useful. A point sprite draws a single screen-aligned quad from an input point. Beyond that you'll need to be more specific about what you're doing.
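A minimal sketch of the point-sprite path (the size attenuation formula and voxelCount are placeholder choices):

    const char* pointVert = R"(
        #version 330 core
        layout(location = 0) in vec3 pos;
        uniform mat4 mvp;
        void main() {
            gl_Position  = mvp * vec4(pos, 1.0);
            gl_PointSize = 100.0 / gl_Position.w;   // shrink with distance
        }
    )";

    // One vertex per voxel; the GPU expands each point into a
    // screen-aligned quad. The shader's gl_PointSize only takes effect
    // once GL_PROGRAM_POINT_SIZE is enabled.
    GLsizei voxelCount = 16 * 16 * 16;
    glEnable(GL_PROGRAM_POINT_SIZE);
    glDrawArrays(GL_POINTS, 0, voxelCount);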