I've just started learning vulkan and looked through several tutorials/samples and I notices something: everyone uses 1 struct for uniform loading, containing all the relevant data(model, view, projection matrix etc.). These structs are updated every frame ENTIRELY.
Now my question: Is that because the writers were "lazy" and instead of creating seperate unforms for seperate frequencies (e.g. projection, since it only needs to be loaded once) they just used 1 struct because it doesn't require additional setup, or is it because the performance is better when loading 1 "bigger" struct more frequenty than creating multiple sets.
Most demo/example code is lazy. Typically it's to avoid over-complicating the code with concepts that aren't needed for the result.
Also demos generally have 1 draw call (only rendering one model) so there is no need to change descriptor sets in them.
Also frequency of change is within recording of a single command buffer. So if you have a view matrix which will remain constant for the duration of the frame then you can put it at the start of the layout together with the projection matrix.
Related
I'm working on my own D3D12 wrapper, and I'm in the beginning 'schema' phase.
I understand the purpose of the PSO in a very simple rendering pipeline, but say I've multiple objects, meshes, models, whatever terminology works best, and I would like to use a different pixel shader for each,
for clarification, I would make multiple PSOs for each of these objects correct?
Sorry for the simple question, it's just a clarification I really need, thank you.
You need a distinct Pipeline State Object for every unique combination of all states:
VS, PS, GS, etc. Shader Objects
Blend, Depth, and Raster state
Render Target format
Number of render targets (MRT vs. 1)
Sample count (MSAA vs. not)
In practice that means at least one PSO per unique material in your entire scene.
I am in the middle of rendering different textures on multiple meshes of a model, but I do not have much clues about the procedures. Someone suggested for each mesh, create its own descriptor sets and call vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for rendering like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, the above approach is quite different from the chopper sample and vulkan's samples that makes me have no idea where to start the change. I really appreciate any help to guide me to a correct direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
Employ array textures. Each individual mesh fetches its data from a particular layer in the array texture. Of course, this restricts you to having each sub-mesh use textures of the same size. But it works on all Vulkan-capable hardware (up to 128 array elements, minimum). The array layer for a particular mesh can be provided as a push-constant, or a base instance if that's available.
Note that if you manage to be able to do it by base instance, then you can render the entire object with a multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of drawing commands into a command buffer.
Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push-constant or a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to ask for that, and if it's not available, then you've got to work around that with #1 or #2. Or just don't run on that hardware. But the last one means that you can't run on any mobile hardware, since most of them don't support this feature as of yet.
Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
The person that suggested the above code fragment was me I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh e.g. uses 4 different textures, you could bind all of them at once to different binding points and select them in the shader.
And if you a take a look at NVIDIA's chopper sample, they do it pretty much the same way only with some more abstraction.
The example also sets up descriptor sets for the textures used :
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
binds them a few lines later :
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
and then renders the mesh with the bound descriptor sets :
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similar, though if I'm not mistaken they never use more than one texture?
Voxel engine (like Minecraft) optimization suggestions?
As a fun project (and to get my Minecraft-adict son excited for programming) I am building a 3D Minecraft-like voxel engine using C# .NET4.5.1, OpenGL and GLSL 4.x.
Right now my world is built using chunks. Chunks are stored in a dictionary, where I can select them based on a 64bit X | Z<<32 key. This allows to create an 'infinite' world that can cache-in and cache-out chunks.
Every chunk consists of an array of 16x16x16 block segments. Starting from level 0, bedrock, it can go as high as you want (unlike minecraft where the limit is 256, I think).
Chunks are queued for generation on a separate thread when they come in view and need to be rendered. This means that chunks might not show right away. In practice you will not notice this. NOTE: I am not waiting for them to be generated, they will just not be visible immediately.
When a chunk needs to be rendered for the first time a VBO (glGenBuffer, GL_STREAM_DRAW, etc.) for that chunk is generated containing the possibly visible/outside faces (neighboring chunks are checked as well). [This means that a chunk potentially needs to be re-tesselated when a neighbor has been modified]. When tesselating first the opaque faces are tesselated for every segment and then the transparent ones. Every segment knows where it starts within that vertex array and how many vertices it has, both for opaque faces and transparent faces.
Textures are taken from an array texture.
When rendering;
I first take the bounding box of the frustum and map that onto the chunk grid. Using that knowledge I pick every chunk that is within the frustum and within a certain distance of the camera.
Now I do a distance sort on the chunks.
After that I determine the ranges (index, length) of the chunks-segments that are actually visible. NOW I know exactly what segments (and what vertex ranges) are 'at least partially' in view. The only excess segments that I have are the ones that are hidden behind mountains or 'sometimes' deep underground.
Then I start rendering ... first I render the opaque faces [culling and depth test enabled, alpha test and blend disabled] front to back using the known vertex ranges. Then I render the transparent faces back to front [blend enabled]
Now... does anyone know a way of improving this and still allow dynamic generation of an infinite world? I am currently reaching ~80fps#1920x1080, ~120fps#1024x768 (screenshots: http://i.stack.imgur.com/t4k30.jpg, http://i.stack.imgur.com/prV8X.jpg) on an average 2.2Ghz i7 laptop with a ATI HD8600M gfx card. I think it must be possible to increase the number of frames. And I think I have to, as I want to add entity AI, sound and do bump and specular mapping. Could using Occlusion Queries help me out? ... which I can't really imagine based on the nature of the segments. I already minimized the creation of objects, so there is no 'new Object' all over the place. Also as the performance doesn't really change when using Debug or Release mode, I don't think it's the code but more the approach to the problem.
edit: I have been thinking of using GL_SAMPLE_ALPHA_TO_COVERAGE but it doesn't seem to be working?
gl.Enable(GL.DEPTH_TEST);
gl.Enable(GL.BLEND); // gl.Disable(GL.BLEND);
gl.Enable(GL.MULTI_SAMPLE);
gl.Enable(GL.SAMPLE_ALPHA_TO_COVERAGE);
To render a lot of similar objects, I strongly suggest you take a look into instanced draw : glDrawArraysInstanced and/or glDrawElementsInstanced.
It made a huge difference for me. I'm talking from 2 fps to over 60 fps to render 100000 similar icosahedrons.
You can parametrize your cubes by using Attribs ( glVertexAttribDivisor and friends ) to make them differents. Hope this helps.
It's on ~200fps currently, should be OK. The 3 main things that I've done are:
1) generation of both chunks on a separate thread.
2) tessellation the chunks on a separate thread.
3) using a Deferred Rendering Pipeline.
Don't really think the last one contributed much to the overall performance but had to start using it because of some of the shaders. Now the CPU is sort of falling asleep # ~11%.
This question is pretty old, but I'm working on a similar project. I approached it almost exactly the same way as you, however I added in one additional optimization that helped out a lot.
For each chunk, I determine which sides are completely opaque. I then use that information to do a flood fill through the chunks to cull out the ones that are underground. Note, I'm not checking individual blocks when I do the flood fill, only a precomputed bitmask for each chunk.
When I'm computing the bitmask, I also check to see if the chunk is entirely empty, since empty chunks can obviously be ignored.
In my program I have a texture which is used several times in different situations. In each situation I need to apply a certain set of parameters.
I want to avoid having to create an additional buffer and essentially creating a copy of the texture for every time I need to use it for something else, so I'd like to know if there's a better way?
This is what sampler objects are for (available in core since version 3.3, or using ARB_sampler_objects). Sampler objects separate the texture image from its parameters, so you can use one texture with several parameter sets. That functionality was created with exactly your problem in mind.
Quote from ARB_sampler_objects extension spec:
In unextended OpenGL textures are considered to be sets of image data (mip-chains, arrays, cube-map face sets, etc.) and sampling state (sampling mode, mip-mapping state, coordinate wrapping and clamping rules, etc.) combined into a single object. It is typical for an application to use many textures with a limited set of sampling states that are the same between them. In order to use textures in this way, an application must generate and configure many texture names, adding overhead both to applications and to implementations. Furthermore, should an application wish to sample from a texture in more than one way (with and without mip-mapping, for example) it must either modify the state of the texture or create two textures, each with a copy of the same image data. This can introduce runtime and memory costs to the application.
I am working on a simple CAD program which uses OpenGL to handle on-screen rendering. Every shape drawn on the screen is constructed entirely out of simple line segments, so even a simple drawing ends up processing thousands of individual lines.
What is the best way to communicate changes in this collection of lines between my application and OpenGL? Is there a way to update only a certain subset of the lines in the OpenGL buffers?
I'm looking for a conceptual answer here. No need to get into the actual source code, just some recommendations on data structure and communication.
You can use a simple approach such as using a display list (glNewList/glEndList)
The other option, which is slightly more complicated, is to use Vertex Buffer Objects (VBOs - GL_ARB_vertex_buffer_object). They have the advantage that they can be changed dynamically whereas a display list can not.
These basically batch all your data/transformations up and them execute on the GPU (assuming you are using hardware acceleration) resulting in higher performance.
Vertex Buffer Objects are probably what you want. Once you load the original data set in, you can make modifications to existing chunks with glBufferSubData().
If you add extra line segments and overflow the size of your buffer, you'll of course have to make a new buffer, but this is no different than having to allocate a new, larger memory chunk in C when something grows.
EDIT: A couple of notes on display lists, and why not to use them:
In OpenGL 3.0, display lists are deprecated, so using them isn't forward-compatible past 3.0 (2.1 implementations will be around for a while, of course, so depending on your target audience this might not be a problem)
Whenever you change anything, you have to rebuild the entire display list, which defeats the entire purpose of display lists if things are changed often.
Not sure if you're already doing this, but it's worth mentioning you should try to use GL_LINE_STRIP instead of individual GL_LINES if possible to reduce the amount of vertex data being sent to the card.
My suggestion is to try using a scene graph, some kind of hierarchical data structure for the lines/curves. If you have huge models, performance will be affected if you have plain list of lines. With a graph/tree structure you can check easily which items are visible and which are not by using bounding volumes. Also with a scenegraph you can apply transformation easily and reuse geometries.