Using image load and store, I would like to do the following in GLSL 4.2:
vec3 someColor = ...;
vec4 currentPixel = imageLoad(myImage, uv);
float a = currentPixel.a/(currentPixel.a+1.0f);
vec4 newPixel = vec4(currentPixel.rgb*a+someColor*(1.0f-a),currentPixel.a+1.0f);
imageStore(myImage, uv, newPixel);
The value of 'uv' can be the same for multiple rasterized pixels. In order to get the proper result, I of course want no other shader invocation to write to my pixel in between the calls to imageLoad() and imageStore().
Is this somehow possible with memoryBarrier()? If so, how would it have to be used in this code?
the value for 'uv' can be the same for multiple rasterized pixels.
Then you can't do it.
memoryBarrier is not a way to create an atomic operation. It only guarantees ordering for a single shader invocation's own operations. So if a particular shader invocation reads an image, writes it, and then reads it again, you need a memoryBarrier to ensure that what it reads is what it wrote before. If some other shader invocation wrote to it, then you're out of luck (unless it was a dependent invocation; the rules for this stuff are complex).
If you're trying to do programmatic blending, then you need to make certain that each fragment shader invocation reads/writes to a unique value. Otherwise, it's not going to work.
You don't say what it is you're trying to actually achieve, so it's not possible to provide a better way of getting what you want. All I can say is that this way is not going to work.
You would need to implement a locking system (lock/mutex).
For this purpose, it is good to use imageAtomicCompSwap or, if a buffer is used, atomicCompSwap. Of course you would need to use a global resource (say, a texture or buffer), not a local one.
For implementation purposes, I think this question answers a large part of your problem: Is my spin lock implementation correct and optimal?
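As a very rough illustration, a per-pixel spin lock in GLSL could look something like the sketch below. This assumes a separate r32ui lock image ("lockImage") of the same size as myImage, cleared to zero before the draw; the names are made up for the example. Also note that naive spin locks like this can deadlock on some hardware, because invocations in the same warp/wavefront run in lockstep.
layout(binding = 0, rgba32f) coherent uniform image2D myImage;
layout(binding = 1, r32ui) coherent uniform uimage2D lockImage;

void accumulate(ivec2 uv, vec3 someColor)
{
    bool done = false;
    while (!done)
    {
        // Try to acquire the lock for this pixel: swap 0 -> 1.
        if (imageAtomicCompSwap(lockImage, uv, 0u, 1u) == 0u)
        {
            vec4 currentPixel = imageLoad(myImage, uv);
            float a = currentPixel.a / (currentPixel.a + 1.0);
            imageStore(myImage, uv, vec4(currentPixel.rgb * a + someColor * (1.0 - a), currentPixel.a + 1.0));
            memoryBarrier();                          // make the store visible before releasing
            imageAtomicExchange(lockImage, uv, 0u);   // release the lock
            done = true;
        }
    }
}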
I am in the middle of rendering different textures on multiple meshes of a model, but I do not have much of a clue about the procedure. Someone suggested creating, for each mesh, its own descriptor set and calling vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for rendering, like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, the above approach is quite different from the chopper sample and Vulkan's samples, which leaves me with no idea where to start making changes. I would really appreciate any help guiding me in the right direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
Employ array textures. Each individual mesh fetches its data from a particular layer in the array texture. Of course, this restricts you to having each sub-mesh use textures of the same size. But it works on all Vulkan-capable hardware (up to 128 array elements, minimum). The array layer for a particular mesh can be provided as a push-constant, or a base instance if that's available (see the shader sketch after these options).
Note that if you manage to be able to do it by base instance, then you can render the entire object with a multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of drawing commands into a command buffer.
Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push-constant or a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to ask for that, and if it's not available, then you've got to work around that with #1 or #2. Or just don't run on that hardware. But the last one means that you can't run on any mobile hardware, since most of them don't support this feature as of yet.
Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
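To make the array-texture option (#2) concrete, here is a minimal Vulkan GLSL fragment-shader sketch; the names (materialTextures, layerIndex) and the push-constant layout are illustrative, not taken from any particular sample. Host-side, you would record one vkCmdPushConstants call per sub-mesh before its draw.
#version 450
layout(set = 0, binding = 0) uniform sampler2DArray materialTextures;
layout(push_constant) uniform PushConsts { uint layerIndex; } pc;

layout(location = 0) in vec2 inUV;
layout(location = 0) out vec4 outColor;

void main()
{
    // The third coordinate selects the array layer for this sub-mesh.
    outColor = texture(materialTextures, vec3(inUV, float(pc.layerIndex)));
}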
The person who suggested the above code fragment was me, I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh uses, e.g., 4 different textures, you could bind all of them at once to different binding points and select between them in the shader.
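As a rough sketch of that idea (binding numbers and names are illustrative, assuming four combined image samplers in one descriptor set and a push constant to select between them):
#version 450
layout(set = 0, binding = 0) uniform sampler2D tex0;
layout(set = 0, binding = 1) uniform sampler2D tex1;
layout(set = 0, binding = 2) uniform sampler2D tex2;
layout(set = 0, binding = 3) uniform sampler2D tex3;
layout(push_constant) uniform PushConsts { int texIndex; } pc;

layout(location = 0) in vec2 inUV;
layout(location = 0) out vec4 outColor;

void main()
{
    // Select the texture for this draw; pc.texIndex is dynamically uniform here.
    switch (pc.texIndex) {
        case 0:  outColor = texture(tex0, inUV); break;
        case 1:  outColor = texture(tex1, inUV); break;
        case 2:  outColor = texture(tex2, inUV); break;
        default: outColor = texture(tex3, inUV); break;
    }
}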
And if you take a look at NVIDIA's chopper sample, they do it in pretty much the same way, only with some more abstraction.
The example also sets up descriptor sets for the textures used:
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
binds them a few lines later:
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
and then renders the mesh with the bound descriptor sets:
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similarly, though if I'm not mistaken they never use more than one texture?
I've just started learning Vulkan and have looked through several tutorials/samples, and I noticed something: everyone uses one struct for uniform loading, containing all the relevant data (model, view, projection matrix, etc.). These structs are updated every frame ENTIRELY.
Now my question: Is that because the writers were "lazy", and instead of creating separate uniforms for separate update frequencies (e.g. the projection matrix, since it only needs to be loaded once) they just used one struct because it doesn't require additional setup, or is it because performance is better when uploading one "bigger" struct more frequently than creating multiple sets?
Most demo/example code is lazy. Typically it's to avoid over-complicating the code with concepts that aren't needed for the result.
Also, demos generally have one draw call (only rendering one model), so there is no need to change descriptor sets in them.
Also, "frequency of change" refers to changes within the recording of a single command buffer. So if you have a view matrix that remains constant for the duration of the frame, you can put it at the start of the layout together with the projection matrix.
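A minimal GLSL sketch of splitting uniforms by update frequency (block names, sets, and bindings are illustrative):
#version 450
layout(location = 0) in vec3 inPosition;

// Written once per frame (or less often), bound once at the start of the command buffer.
layout(set = 0, binding = 0) uniform PerFrame {
    mat4 projection;
    mat4 view;
} frame;

// Written and rebound per draw (or addressed via a dynamic offset).
layout(set = 1, binding = 0) uniform PerObject {
    mat4 model;
} object;

void main()
{
    gl_Position = frame.projection * frame.view * object.model * vec4(inPosition, 1.0);
}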
What is the best way to render complex meshes? I wrote different solutions below and wonder what is your opinion about them.
Let's take an example: how to render the 'Crytek-Sponza' mesh?
PS: I do not use an ubershader, only separate shaders.
If you download the mesh on the following link:
http://graphics.cs.williams.edu/data/meshes.xml
and load it in Blender, you'll see that the whole mesh is composed of about 400 sub-meshes, each with its own material/texture.
A dummy renderer (version 1) will render each of the 400 sub-meshes separately! That means (to simplify the situation) 400 draw calls, each of them with a binding to a material/texture. Very bad for performance. Very slow!
pseudo-code version_1:
foreach mesh in meshList //400 iterations :(!
    mesh->BindVBO();
    Material material = mesh->GetMaterial();
    Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
    bsdf->Bind();
    bsdf->SetMaterial(material);
    bsdf->SetTexture(material->GetTexture()); //Bind texture
    mesh->Render();
Now, if we take care of the materials being loaded, we can notice that the Sponza is in reality composed of ONLY (if my memory is good :)) 25 different materials!
So a smarter solution (version 2) would be to gather all the vertex/index data into batches (25 in our example) and store the VBO/IBO not in the sub-mesh classes but in a new class called Batch.
pseudo-code version_2:
foreach batch in batchList //25 iterations :)!
    batch->BindVBO();
    Material material = batch->GetMaterial();
    Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
    bsdf->Bind();
    bsdf->SetMaterial(material);
    bsdf->SetTexture(material->GetTexture()); //Bind texture
    batch->Render();
In this case each VBO contains data that shares exactly the same texture/material settings!
It's so much better! But now I think 25 VBOs to render the Sponza is still too many! The problem is the number of buffer bindings needed to render the Sponza! I think a good solution would be to allocate a new VBO once the first one is 'full' (for example, let's assume that the maximum size of a VBO, a value defined as an attribute of the VBO class, is 4MB or 8MB).
pseudo-code version_3:
foreach vbo in vboList //for example 5 VBOs (depends on the maxVBOSize)
    vbo->Bind();
    BatchList batchList = vbo->GetBatchList();
    foreach batch in batchList
        Material material = batch->GetMaterial();
        Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
        bsdf->Bind();
        bsdf->SetMaterial(material);
        bsdf->SetTexture(material->GetTexture()); //Bind texture
        batch->Render();
In this case each VBO does not necessarily contain data that shares exactly the same texture/material settings! It depends on the sub-mesh loading order!
So OK, there are fewer VBO/IBO bindings but not necessarily fewer draw calls! (Do you agree with that statement?) But in general I think this version 3 is better than the previous one! What do you think about it?
Another optimization would be to store all the textures (or groups of textures) of the Sponza model in array texture(s)! But if you download the Sponza package you will see that the textures all have different sizes! So I think they can't be grouped together because of their size/format differences.
But if it's possible, version 4 of the renderer would use far fewer texture bindings rather than 25 bindings for the whole mesh! Do you think it's possible?
So, in your opinion, what is the best way to render the Sponza mesh? Do you have another suggestion?
You are focused on the wrong things. In two ways.
First, there's no reason you can't stick all of the mesh's vertex data into a single buffer object. Note that this has nothing to do with batching. Remember: batching is about the number of draw calls, not the number of buffers you use. You can render 400 draw calls out of the same buffer.
This "maximum size" that you seem to want to have is a fiction, based on nothing from the real world. You can have it if you want. Just don't expect it to make your code faster.
So when rendering this mesh, there is no reason to be switching buffers at all.
Second, batching is not really about the number of draw calls (in OpenGL). It's really about the cost of the state changes between draw calls.
This video clearly spells out (about 31 minutes in) the relative cost of different state changes. Issuing two draw calls with no state changes between them is cheap (relatively speaking). But different kinds of state changes have different costs.
The cost of changing buffer bindings is quite small (assuming you're using separate vertex formats, so that changing buffers doesn't mean changing vertex formats). The cost of changing programs and even texture bindings is far greater. So even if you had to make multiple buffer objects (which again, you don't have to), that's not going to be the primary bottleneck.
So if performance is your goal, you'd be better off focusing on the expensive state changes, not the cheap ones. Make a single shader that can handle all of the material settings for the entire mesh, so that you only need to change uniforms between draws. Use array textures so that you only have one texture binding call. This turns a texture bind into a uniform setting, which is a much cheaper state change.
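A minimal GLSL sketch of that idea, assuming all the material textures share one size/format and live in a single GL_TEXTURE_2D_ARRAY; the uniform names are illustrative:
#version 420 core
uniform sampler2DArray materialTextures;
uniform float materialLayer;   // updated between draws instead of rebinding a texture

in vec2 uv;
out vec4 fragColor;

void main()
{
    fragColor = texture(materialTextures, vec3(uv, materialLayer));
}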
There are even fancier things you can do, involving base instance counts and the like. But that's overkill for a trivial example like this.
In computer graphics it's a common technique to apply jittering to sampling positions in order to avoid visible sampling patterns.
What's the proper way to apply jittering to sample positions in a fragment shader? One way I can think of would be to feed a noise texture into the shader and then, depending on the texel value of this noise texture, alter the sampling positions of whatever one wants to sample.
Is there a better way of implementing jittering?
The various hardware driver AA schemes usually are jitter-based already -- each vendor has their own favored patterns. And unfortunately the user/player can often mess with the settings and turn things on and off.
This might make you think that there's no "correct" way -- and you'd be right! There are ways, some of which are correct for some tasks. Learn and use what you like.
For surface-texture jittering, you might try something like so:
In Photoshop, make a 256x512 RGB image (or 256x511, for some versions of the NVIDIA DDS exporter), fill it ALL with random noise, and save it as DDS with "use existing mip maps." This will give you roughly uniform noise regardless of how much the texture is scaled (at least through many useful size ranges).
In your shader, read the noise texture, then apply it to the UVs, and then read your other texture(s), e.g.:
float4 noise = tex2D(noiseSampler,OriginalUV);
float2 newUV = OriginalUV + someSmallScale*(noise.xy-0.5);
float4 jitteredColor = tex2D(colorSampler,newUV);
Try other variations as you like. Be aware that strong texture jittering can cause a lot of texture cache misses, which may have a performance cost, and be careful around normal maps, since you are introducing high-frequency components into a signal that's already been filtered.
In C, I can debug code like:
fprintf(stderr, "blah: %f", some_var);
in GLSL ... is there any way for me to just dump out a value in a vertex or fragment shader? I don't care if it's slow; I just want to dump out the value. Ideally, I want a setup like the following:
normal state = run GLSL shader normally
press key 'd' = next frame is generated in ULTRA slow mode, where the "printfs" in the Vertex/Fragment shader are executed and dumped out.
Is this feasible? (I don't care about performance; I just want to do this for one frame).
Thanks!
Unfortunately it's not possible directly. One possible solution, though, that I end up using a lot (and I'm sure it's pretty common among GLSL developers) is to "print" values as colors, in place of your intended final result.
Of course this has many limitations; for one, you have to make sure that your value maps into the (0.0, 1.0) range. Functions such as mod, fract, etc. turn out to be useful in these cases. But, in general, this is what I see as the "printf" equivalent in GLSL.
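For example, a minimal fragment-shader sketch of this idea (here gl_FragCoord.z stands in for whatever value you actually want to inspect):
#version 330 core
out vec4 fragColor;

void main()
{
    // Replace this with the value you want to "print"; remap it into [0,1] first.
    float v = clamp(gl_FragCoord.z, 0.0, 1.0);
    fragColor = vec4(v, v, v, 1.0);   // grayscale visualization of the value
}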
Instead of printing values, have you thought of trying a GLSL debugger?
For example, glslDevil will let you step through your shader's execution and examine the variables at each step.
Check out AMD CodeXL. It will let you step frame by frame to inspect OpenGL state values, shader code, and texture memory.
http://developer.amd.com/tools-and-sdks/heterogeneous-computing/codexl/
Note that uniforms are read-only inside a shader, so you can't copy a value into one and read it back with glGetUniformfv (that only returns what the application last set). If you want to read a shader variable back on the CPU, write it to a buffer instead (for example a shader storage buffer, or via transform feedback from a vertex shader) and read the buffer back afterwards.
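A minimal sketch of the buffer approach, assuming OpenGL 4.3+ shader storage buffers; the buffer and variable names are made up for the example, and with many fragment invocations the last one to write wins:
#version 430 core
layout(std430, binding = 0) buffer DebugOutput {
    vec4 debugValue;   // read back on the CPU with glGetBufferSubData
};
out vec4 fragColor;

void main()
{
    float someVar = gl_FragCoord.z;   // the value you want to dump (illustrative)
    debugValue = vec4(someVar);
    fragColor = vec4(1.0);
}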