I am currently working on a new renderer using DX11. To batch multiple meshes I would like to use geometry instancing together with Texture2DArrays, to avoid texture atlases.
This would be the pseudocode for rendering:
foreach effect in effects
    foreach batch in batches
        SetTexture2DArray()
        SetInstanceBuffer()   // Transform & Material (cbuffer)
        SetVertexBuffer()
        SetIndexBuffer()
        DrawIndexed()
Each mesh consists of 3 textures plus its geometry. Meshes with the same input layout would get combined into one batch. One batch can hold up to ~300 meshes, giving a Texture2DArray of 900 textures per batch. Is it possible to combine textures of different sizes into one texture array? If not, I could only combine meshes that share both input layout and texture sizes. Do you think this is a good system in general?
About texture arrays: every slice needs to be the same size (and the same format and mip level count).
About merging models: fewer draw calls is generally better, but having one single buffer holding different subsets with a varying number of primitives each can lead to some overdraw, and frustum culling will be harder to apply in that case if it's needed (depending on the amount of geometry you might just send it all anyway; modern cards can eat up geometry rather easily). If all your geometry is visible at all times, then merging is definitely a good option.
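For reference, here's a minimal sketch (C++/D3D11; the sizes are hypothetical, and the device pointer and per-slice init data are assumed to exist) of creating a Texture2DArray. The single description is what forces every slice to share the same width, height, format and mip count:

    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 512;   // every slice must be 512x512
    desc.Height           = 512;
    desc.MipLevels        = 1;
    desc.ArraySize        = 900;   // e.g. 300 meshes * 3 textures per batch
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;

    ID3D11Texture2D* texArray = nullptr;
    // initData: one D3D11_SUBRESOURCE_DATA entry per slice (and per mip)
    HRESULT hr = device->CreateTexture2D(&desc, initData, &texArray);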
Related
I have a lot of cases in my application where I make draw calls using the same shader with different uniform values, and I thought about instancing those draw calls. However, the draw calls have a varying number of triangles in my case.
As far as I understand DrawIndexedInstanced, it only permits drawing multiple instances with the same number of triangles/indices, so I guess I can't use it.
I thought that DrawIndexedInstancedIndirect might help, but that basically only seems to execute multiple calls to DrawIndexedInstanced.
Is there a way in DirectX 11 to draw instanced geometry with a different number of triangles for each instance, or will I have to stay with normal draw calls?
As stated in the documentation, instanced drawing is for
[...] reusing the same geometry to draw multiple objects in a scene.
It improves performance by not swapping the vertex data but reusing it, which does not seem to be the case for your data, where the vertex sources differ for each draw call.
So you'll have to stick to single draw calls, but to improve performance you can order them deliberately. Every state change submitted to the GPU has a certain cost; since your shader is the same for all of these draw calls, you can save some of that cost by issuing all draw calls that share the same shader and uniform values one after another and only switching state when it is actually needed.
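As a rough illustration (the handles and names here are hypothetical, not any engine's actual API), sorting the draw list by state before submission might look like this:

    #include <algorithm>
    #include <tuple>
    #include <vector>

    struct DrawItem {
        int shaderId;    // hypothetical state handles
        int materialId;  // uniform values / cbuffer contents
        // ... vertex/index buffers, draw parameters ...
    };

    void Submit(std::vector<DrawItem>& items) {
        // Group draws so identical state is bound only once.
        std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
            return std::tie(a.shaderId, a.materialId) < std::tie(b.shaderId, b.materialId);
        });
        int shader = -1, material = -1;
        for (const DrawItem& d : items) {
            if (d.shaderId != shader)     { /* bind shader */  shader = d.shaderId; material = -1; }
            if (d.materialId != material) { /* set uniforms */ material = d.materialId; }
            /* issue the draw call */
        }
    }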
I'm currently struggling to find a good approach to render many (thousands of) slightly different models. The model itself is a simple cube with some vertex offsets; think of a skewed quad face. Each 'block' has a different offset for its vertices, so basically I have a voxel engine on steroids: each block is not a perfect cube but rather a skewed cuboid. To render this shape 48 vertices are needed, but that can be cut to 24 vertices as only 3 faces are visible, and with indexing we are down to 12 vertices (4 for each face).
But, now that I have the vertices for each block in the world, how do I render them?
What I've tried:
Instanced Rendering. Sounds good, doesn't work as my models are not the same.
I could simplify distant blocks to a cube and render them with glDrawArraysInstanced/glDrawElementsInstanced.
Put everything in one giant VBO. This has better performance than rendering each cube individually, but has the downside of producing one large mesh. This is not desirable, as I need every cube to have different textures, lighting, etc.; selecting a single cube within that huge mesh is not possible.
I am aware of frustum culling and occlusion culling, but I already have performance problems with just the cubes directly in front of me (tested with a 128x128 world).
My requirements:
Draw several thousand models.
Each model has vertex offsets, stored in another VBO, to make the block less cubic.
Each block has to be an individual object, as you should be able to place/remove blocks.
Any good performance advice?
This is not desirable as I need every cube to have different textures, lighting, etc... Selecting a single cube within that huge mesh is not possible.
Programmers should avoid declaring that something is "impossible"; it limits your thinking.
Giving each face of these cubes different textures has many solutions. The Minecraft approach uses texture atlases. Each "texture" is really just a sub-section of one large texture, and you use texture coordinates to select which sub-section a particular face uses. But you can get more complex.
Array textures allow for a more direct way to solve this problem. Here, the texture coordinates would be the same, but you use a per-vertex integer to select the correct texture layer for a face; all of the vertices for a particular face would share the same index. And if you're clever, you don't even really need texture coordinates: you can generate them in your vertex shader, based on per-vertex values like gl_VertexID and the like.
Lighting parameters would work the same way: use some per-vertex data to select parameters from a UBO or SSBO.
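A minimal sketch of that idea (C++ with OpenGL; the struct name, layout, and attribute location are made up for illustration): each vertex carries an integer slice index, uploaded with glVertexAttribIPointer so it reaches the shader as an integer rather than being converted to float.

    #include <cstddef>
    #include <cstdint>

    struct BlockVertex {
        float   position[3];
        int32_t textureLayer;  // which slice of the GL_TEXTURE_2D_ARRAY this face samples
    };

    // Integer attributes must use glVertexAttribIPointer; the plain
    // glVertexAttribPointer variant would convert the index to float.
    // Attribute location 1 is an arbitrary choice for this sketch.
    glVertexAttribIPointer(1, 1, GL_INT, sizeof(BlockVertex),
                           (void*)offsetof(BlockVertex, textureLayer));
    glEnableVertexAttribArray(1);

The vertex shader then forwards the index with the "flat" qualifier, and the fragment shader uses it to pick the slice when sampling the array texture.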
As for the "individual object" bit, that's merely a matter of how you're thinking about the problem. Do not confuse what happens in the player's mind with what happens in your code. Games are an elaborate illusion; just because something appears to the user to be an "individual object" doesn't mean it is one to your rendering engine.
What you need is the ability to modify your world's data to remove and add new blocks. And if you need to show a block as "selected" or something, then you simply need another per-block value (like the lighting parameters and index for the texture) which tells you whether to draw it as a "selected" block or as an "unselected" one. Or you can just redraw that specific selected block. There are many ways of handling it.
Any decent graphics card (since about 2010) is able to render a few million vertices in the blink of an eye.
The right approach depends on how much changes per frame; in other words, how much data must be transferred to the GPU per frame.
For a small number of changes, storing the data in one big VBO or in many smaller VBOs (with their VAOs), sending the changes via uniforms, and issuing several glDraw* calls all show similar performance; different hardware behaves with little difference. Indexed data may improve the speed.
When most of the data changes every frame and these changes are hard or impossible to do in the shaders, then your app is memory-transfer bound, and streaming is good advice.
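For the streaming case, one common pattern is buffer "orphaning". A minimal sketch (C++/OpenGL; vbo, vertices, and sizeInBytes are assumed to exist): re-specifying the storage each frame lets the driver hand you fresh memory instead of stalling on data the GPU is still reading.

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // Orphan the old storage: passing nullptr allocates a new block, so the
    // driver doesn't have to wait for in-flight draws using the old one.
    glBufferData(GL_ARRAY_BUFFER, sizeInBytes, nullptr, GL_STREAM_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, sizeInBytes, vertices);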
A cube with different colored faces in immediate mode is very simple. But doing the same thing with shaders seems to be quite a challenge.
I have read that in order to create a cube with differently colored faces, I should create 24 vertices instead of 8; in other words, I visualize this as 6 squares that don't quite touch.
Is another (perhaps better?) solution to texture the faces of the cube with a really simple texture, a flat color, perhaps a 1x1 pixel texture?
My texturing idea seems simpler to me from a coder's point of view, but which method would be the most efficient from a GPU/graphics card perspective?
I'm not sure what your overall goal is (e.g. what you're learning to do in the long term), but generally for high-performance applications (e.g. games) your goal is to reduce the load on the GPU. Every time you switch certain states (e.g. change textures, render targets, shader uniform values, etc.), the GPU stalls while reconfiguring itself to meet your demands.
So, you can pass in a 1x1 pixel texture for each face, but then you'd need six draw calls (usually not so bad, but there is some prep work and potential cache misses) and six texture binds (which can be very bad, often as bad as changing shader uniform values).
Suppose you wanted to pass in one texture and use it as a texture map for the whole cube. This is a little less trivial than it sounds: you need to lay out each face on the texture in a way that maps to the vertices. Usually you pass in a texture coordinate for each vertex, and due to the spatial configuration of the texture this normally doesn't end up meaning one texture coordinate per spatial vertex.
However, if you use an environment/reflection map, the complexities of mapping are handled for you. In this way, you could draw a single texture on all sides of your cube (or on your sphere, or whatever sphere-mapped shape you wanted). I'm not sure I'd call this easier, since you have to build the environment texture carefully, and you still have to set a different texture for each new color you want to represent, or change the texture on the GPU or in step with it, which is tricky and usually not performant.
Which brings us back to the canonical way of doing it, as you mentioned: use per-vertex values. They're fast, you can draw many, many cubes very quickly by only specifying different vertex data, and it's easy to understand. It really is the best way, and it's how GPUs are designed to run quickly.
Additionally:
Yes, you can do this with just shaders, but it would be ugly and slow, and the GPU would end up computing it per pixel: pass the object-space coordinates to the fragment shader, and in the fragment shader test which side you're on and output the corresponding color. Not recommended; it's not particularly easier, and it's definitely not faster for the GPU, since to change colors you'd again end up changing shader uniform values.
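To make the per-vertex approach concrete, here's a minimal sketch (C++; the positions and colors are illustrative values). Because color is a vertex attribute, a corner shared by two faces must be duplicated, which is exactly where the 24 vertices come from:

    struct Vertex { float x, y, z; float r, g, b; };

    // One face (the +Z face, colored red); the other five faces repeat this
    // pattern with their own four vertices and their own flat color.
    Vertex front[4] = {
        { -1, -1, 1,   1, 0, 0 },
        {  1, -1, 1,   1, 0, 0 },
        {  1,  1, 1,   1, 0, 0 },
        { -1,  1, 1,   1, 0, 0 },
    };
    unsigned short frontIndices[6] = { 0, 1, 2,   0, 2, 3 };  // two triangles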
I'm making a small 2D game demo and from what I've read, it's better to use drawElements() to draw an indexed triangle list than using drawArrays() to draw an unindexed triangle list.
But as far as I know, it doesn't seem possible to draw multiple primitives that are not connected in a single draw call with drawElements().
So for my 2D game demo, where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end up with one draw call per object?
Yes, it's better to use indices in many cases, since you don't have to store, transfer, or process duplicate vertices (the vertex shader only needs to run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Cutting the data to two thirds is quite a good improvement really, especially if your vertex data is more than just a position.
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can affect cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes which are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP, you could use glPrimitiveRestartIndex to draw multiple strips of triangles with one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad, as in the sketch below. Your vertex array then needs to store all the vertices for all your quads. If they're moving, you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones, and you could also store static vertex data in a separate array.
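For example, a small helper (C++; the function name is made up) that builds the 6-indices-per-quad index buffer for N quads, after which all quads go out in a single glDrawElements call:

    #include <cstdint>
    #include <vector>

    std::vector<uint16_t> BuildQuadIndices(size_t quadCount) {
        std::vector<uint16_t> indices;
        indices.reserve(quadCount * 6);
        for (size_t q = 0; q < quadCount; ++q) {
            uint16_t base = static_cast<uint16_t>(q * 4);  // 4 vertices per quad
            // two triangles: (0,1,2) and (0,2,3)
            indices.push_back(base + 0); indices.push_back(base + 1); indices.push_back(base + 2);
            indices.push_back(base + 0); indices.push_back(base + 2); indices.push_back(base + 3);
        }
        return indices;  // 16-bit indices cap you at 16384 quads; use uint32_t beyond that
    }

    // glDrawElements(GL_TRIANGLES, quadCount * 6, GL_UNSIGNED_SHORT, nullptr);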
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before it becomes a problem. However, if you do, instancing or some similar idea, such as particle systems, is where you should look.
Perhaps only go down the following track if draw calls or data transfer become a problem, as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/color in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between two textures, updating the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads along with the attribute-data texture coordinates, for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).
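A rough sketch of that ping-pong setup (C++/OpenGL; W, H, and the update shader are assumed to exist): two float textures hold the particle attributes, and each frame one is read while the other is written.

    GLuint tex[2], fbo[2];
    glGenTextures(2, tex);
    glGenFramebuffers(2, fbo);
    for (int i = 0; i < 2; ++i) {
        glBindTexture(GL_TEXTURE_2D, tex[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, W, H, 0, GL_RGBA, GL_FLOAT, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo[i]);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex[i], 0);
    }

    int src = 0;
    // Each frame: read tex[src], write tex[1 - src], then swap roles.
    glBindFramebuffer(GL_FRAMEBUFFER, fbo[1 - src]);
    glBindTexture(GL_TEXTURE_2D, tex[src]);
    /* draw a fullscreen quad with the update shader */
    src = 1 - src;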
What is the best way to texture terrain made from quads in OpenGL? I have around 30 different textures I want to have for my terrains (1 texture per terrain type, so 30 terrain types) and would like to have smooth transitions between any two of the terrains.
I have been doing some browsing on the web and found that there are many different methods, including 3D texturing, alpha channels, blending, and shaders. However, which of these is the most efficient, and which can handle the number of textures I am looking to use? For example, this popular answer describes some of the techniques, but its mixmap has only 4 channels (RGBA) and so can only support 4 textures.
I should also note that I know nothing about shaders, so non-shader required techniques would be preferable.
Since you linked to an answer that describes texture splatting, and its question mentions the game Oblivion, I can provide some additional insight into that.
Basic texture splatting with an RGBA mixmap only supports four textures per terrain quad, but you can use different sets of textures for different quads. Oblivion divides its terrain into squares (called "cells") of 32 grid points (192 feet) per side, and each cell defines its own set of four terrain textures. So you can't have lots of texture diversity within a small area, but you can easily vary your textures over larger regions. If you prefer, you can define texture sets for smaller regions, even individual quads, at the expense of using more memory.
If you really need more than four textures in a quad, you can use multiple mixmaps. For each additional one, you just do another texture lookup to get four more blending factors, and blend in four more textures on top of the results from the previous mixmap. You can scale up to as many textures as you want, again at the expense of memory.
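To illustrate the blend (plain C++ standing in for what would normally be fragment-shader math; the names are hypothetical): the mixmap's four channels weight the four terrain textures, and a second mixmap simply layers four more textures over the first result.

    struct Color { float r, g, b; };

    // One splat pass: the weights (wr..wa) come from the mixmap's RGBA
    // channels and are assumed to sum to 1; t0..t3 are samples from the
    // four terrain textures at this point.
    Color Splat(Color t0, Color t1, Color t2, Color t3,
                float wr, float wg, float wb, float wa) {
        return { t0.r * wr + t1.r * wg + t2.r * wb + t3.r * wa,
                 t0.g * wr + t1.g * wg + t2.g * wb + t3.g * wa,
                 t0.b * wr + t1.b * wg + t2.b * wb + t3.b * wa };
    }

    // A second mixmap adds textures t4..t7: compute Splat(t4..t7, ...) and
    // blend that over the first result using the second layer's total weight.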
Texture splatting can be tricky to combine with LOD techniques on the height map, because when a single low-detail terrain quad represents a group of high-detail quads, you have to sample several different mixmaps for different regions of the big quad. Oblivion sidesteps that problem by using texture splatting only for full-detail terrain; distant cells, rendered at lower resolution, use precomputed textures produced by the editor, which does the splatting and downscaling in advance.
One alternative to texture splatting is to use a clipmap to render a "megatexture". With this approach, you have a single large texture that represents your entire terrain, and you avoid filling up your RAM by loading different parts of it with only as much detail as is actually needed to render it based on the viewer's current position. (Distant parts of the terrain can't be seen at full detail, so there's no need to load them at full detail.)
The advantage of this approach is its artistic freedom: you can place fine details anywhere you want in the texture, without regard to the vertex grid. The disadvantage is that it's rather complex to implement, and the entire clipmap has to be stored somewhere, probably in a big file on disk, so that you can load parts of it into RAM as needed.