SSBO vs VBO performance - opengl

The standard approach to rendering is to first define VBO and then use layout in
layout (location = 0) in vec4 point;
void main(){
gl_Position = point;
}
However, the exact same result could be achieved with SSBO
layout(std430, binding = 0) buffer Points{
vec4 points[];
};
void main(){
gl_Position = points[gl_VertexID];
}
I am wondering whether there is any performance difference between those two approaches.
Couldn't we just use SSBO all the time and completely get rid of VBO?
Of course the older hardware might not support SSBO, but if I don't care about such compatibility, then what's really stopping me?

Related

Can I modify vertex buffer in GPU through the vertex shader?

For some reason cannot find the answer on the web. I want to update vertex attributes in GPU through the shader in similar form:
#version 330 core
layout(location = 0) in vec4 position;
uniform mat4 someTransformation;
void main()
{
position = position * someTransformation;
gl_Position = position;
}
Is it possible?
Can you write the code you have written? Yes, that is legal code.
Will that change the contents of any GPU storage? No.
While there are ways for a VS to directly manipulate the contents of a buffer, if the buffer region being manipulated is also potentially being used as an attribute array for a rendering command, then you will have undefined behavior.
You can use SSBOs to manipulate other storage which is not being used as the input for rendering. And you can use transform feedback to accumulate data output from vertex processing. But you cannot have a VS directly modify its own input array.

Batch model matrixes to single Uniform Buffer Object

I've decided to take a look at uniform buffer objects. But i am not sure when and when not use it.
I have tried to batch all models transformations into single array that i would send to the shader at once. But it has it's consequences. I have to also send to shader id for every vertex to match those transformations.
So my question is: Is it worth? Or should i prefer in that particular case to use regular glUniform calls?
Below is my shader program.
#version 330 core
layout (location = 0) in vec3 position;
layout (location = 1) in int id;
layout (std140) uniform scene_data
{
mat4 ViewProjectionMatrix;
mat4 ModelMatrix[128];
};
void main()
{
gl_Position = ViewProjectionMatrix * ModelMatrix[id] * vec4(position,1.0);
}
Here is how i create Uniform Buffer Object:
Dot::UniformBuffer::UniformBuffer(const void*data, unsigned int size, unsigned int index)
:m_Index(index),m_size(size)
{
glGenBuffers(1, &m_UBO);
glBindBuffer(GL_UNIFORM_BUFFER, m_UBO);
glBufferData(GL_UNIFORM_BUFFER, size, data, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, index, m_UBO);
glBindBuffer(GL_UNIFORM_BUFFER, 0);
}
Dot::UniformBuffer::~UniformBuffer()
{
glDeleteBuffers(1, &m_UBO);
}
void Dot::UniformBuffer::Update(const void* data)
{
glBindBuffer(GL_UNIFORM_BUFFER, m_UBO);
GLvoid* p = glMapBuffer(GL_UNIFORM_BUFFER, GL_WRITE_ONLY);
memcpy(p, data, m_size);
glUnmapBuffer(GL_UNIFORM_BUFFER);
}
Everything has its tradeoffs, and would require performance measurements to verify.
Think of glUniform as pass-by-value, whereas UBO is pass-by-reference. Except, there is [usually] no perf penalty in the GPU shader code for additional indirection to the UBO.
UBOs' main benefit is reducing CPU overhead at the "bind" stage, since all the values are already written to memory. A render can prepare many UBOs up front at loading time (one UBO per "state vector") to avoid having to transfer the data during the main rendering loop. Typically you don't want to modify a UBO in-place in your renderer, because the glMapBuffer will wait for previous draws that are using that UBO to complete.
In this particular example, the shader is performing an "indexed constant lookup" using the input attribute id, which is slower than a uniform constant lookup.
Other considerations:
layout (location = 1) in int id; requires the CPU to prepare an additional vertex buffer and index buffer. This may also imply making data copies of the raw geometry, which is more CPU and memory intensive.
glDrawArraysInstanced or glDrawElementsInstanced can let OpenGL generate the id for you instead. From the shader, use gl_InstanceID. Removes the need for additional vertex buffers. That would be similar to https://learnopengl.com/Advanced-OpenGL/Instancing
glUniform will cause the data to be transferred from application code --> driver --> GPU on every call.
UBO requires more object management and planning, but usually converges on the fastest performing renderer. Vulkan and D3D12 support only UBO style programming.

how can i load an array of uniform buffer objects into the shader?

shader code:
// UBO for MVP matrices
layout (binding = 0) uniform UniformBufferObject {
mat4 model;
mat4 view;
mat4 proj;
} ubo;
this works fine because its just one struct and I can set the VkWriteDescriptorSet.descriptorCount to 1. But how can I create an array of those structs?
// Want to do something like this
// for lighting calculations
layout (binding = 2) uniform Light {
vec3 position;
vec3 color;
} lights[4];
I have the data for all of the four lights stored in one buffer. When I set the VkWriteDescriptorSet.descriptorCount to four,
Do I have to create four VkDescriptorBufferInfo? If so, I dont know what to put into offset and range.
All of the blocks in a uniform buffer array live in the same descriptor. However, they are still different blocks; they get a different VkDescriptorBufferInfo info object. So those blocks don't have to come from sequential regions of storage.
Note: The KHR_vulkan_glsl extension gets this wrong, if you look pay close attention. It notes that arrays of opaque types should go into a single descriptor, but arrays of interface blocks don't. The actual glslangValidator compiler (and SPIR-V and Vulkan) do handle it as I described.
However, you cannot access the elements of an interface block array with anything other than dynamically uniform expressions. And even that requires having a certain feature available; without that feature, you can only access the array with constant expressions.
What you probably want is an array within the block, not an array of blocks:
struct Light
{
vec4 position; //(NEVER use `vec3` in blocks)
vec4 color;
};
layout (set = 0, binding = 2, std140) uniform Lights {
Light lights[4];
};
This means that you have a single descriptor in binding slot 2 (descriptorCount is 1). And the buffer's data should be 4 sequential structs.

OpenGL Bindless Textures: Bind to uniform sampler2D array

I am looking into using bindless textures to rapidly display a series of images. My reference is the OpenGL 4.5 redbook. The book says I can sample bindless textures in a shader with this fragment shader:
#version 450 core
#extension GL_ARB_bindless_texture : require
in FS_INPUTS {
vec2 i_texcoord;
flat int i_texindex;
};
layout (binding = 0) uniform ALL_TEXTURES {
sampler2D fs_textures[200];
};
out vec4 color;
void main(void) {
color = texture(fs_textures[i_texindex], i_texcoord);
};
I created a vertex shader that looks like this:
#version 450 core
in vec2 vert;
in vec2 texcoord;
uniform int texindex;
out FS_INPUTS {
vec2 i_texcoord;
flat int i_texindex;
} tex_data;
void main(void) {
tex_data.i_texcoord = texcoord;
tex_data.i_texindex = texindex;
gl_Position = vec4(vert.x, vert.y, 0.0, 1.0);
};
As you may notice, my grasp of whats going on is a little weak.
In my OpenGL code, I create a bunch of textures, get their handles, and make them resident. The function I am using to get the texture handles is 'glGetTextureHandleARB'. There is another function that could be used instead, 'glGetTextureSamplerHandleARB' where I can pass in a sampler location. Here is what I did:
Texture* textures = new Texture[load_limit];
GLuint64* tex_handles = new GLuint64[load_limit];
for (int i=0; i<load_limit; ++i)
{
textures[i].bind();
textures[i].data(new CvImageFile(image_names[i]));
tex_handles[i] = glGetTextureHandleARB(textures[i].id());
glMakeTextureHandleResidentARB(tex_handles[i]);
textures[i].unbind();
}
My question is how do I bind my texture handles to the ALL_TEXTURES uniform attribute of the fragment shader? Also, what should I use to update the vertex attribute 'texindex' - an actual index into my texture handle array or a texture handle?
It's bindless texturing. You do not "bind" such textures to anything.
In bindless texturing, the data value of a sampler is a number. Specifically, the number returned by glGetTextureHandleARB. Texture handles are 64-bit unsigned integer.
In a shader, values of sampler types in buffer-backed interface blocks (UBOs and SSBOs) are 64-bit unsigned integers. So an array of samplers is equivalent in structure to an array of 64-bit unsigned integers.
So in C++, a struct equivalent to your ALL_TEXTURES block would be:
struct AllTextures
{
GLuint64 textures[200];
};
Well, assuming you properly use std140 layout, of course. Otherwise, you'd have to query the layout of the structure.
At this point, you treat the buffer as no different from any other UBO usage. Build the data for the shader by sticking an AllTextures into a buffer object, then bind that buffer as a UBO to binding 0. You just need to fill the array in with the actual texture handles.
Also, what should I use to update the vertex attribute 'texindex' - an actual index into my texture handle array or a texture handle?
Well, neither one will work. Not the way you've written it.
See, ARB_bindless_texture does not allow you to access any texture you want in any way at any time from any shader invocation. Unless you are using NV_gpu_shader5, the code leading to the texture access must be based on dynamically uniform expressions.
So unless every vertex in your rendering command gets the same index or handle... you cannot use them to pick which texture to use. Even instancing will not save you, since dynamically uniform expressions don't care about instancing.
If you want to render a bunch of quads without having to change uniforms between them (and without having to rely on an NVIDIA extension), then you have a few options. Most hardware that supports bindless texture also supports ARB_shader_draw_parameters. This gives you access to gl_DrawID, which represents the current index of a rendering command within a glMultiDraw-style command. And that extension explicitly declares that gl_DrawID is dynamically uniform.
So you could use that to select which texture to render. You simply need to issue a multi-draw command where you render the same mesh data over and over, but it gets a different gl_DrawID index in each case.

How do I properly declare a uniform array of structs in GLSL so that I can point a UBO at it?

The following glsl code appears in my fragment shader. The struct definition causes no problems, but my attempt to use it as the type of a uniform array causes an "invalid operation" error, which is not particularly helpful.
struct InstanceData
{
vec3 rotation;
vec3 scale;
mat4 position;
};
layout (std140) uniform InstanceData instances[100];
How do I structure this code correctly so that it compiles without error, and is thus ready for me to populate with data? Note that I am using core profile version 4.5.
Edit: It seems to be something to do with the use of layout (std140). Removal of that part allows the code to compile, though don't I need that to ensure that the glsl compiler packs the struct data in a predictable way?
Edit: Still not working. My entire vertex shader code looks like this:
#version 450
layout(location=0) in vec4 in_Position;
layout(location=1) in vec4 in_Color;
out vec4 ex_Color;
flat out int ex_Instance;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;
// ------ preliminary addition of uniform block to be used soon ------
struct layout (std140) InstanceData
{
vec3 rotation;
vec3 scale;
mat4 position;
};
layout (std140, binding = 0) uniform InstanceData
{
InstanceData instances[100];
};
// -------------------------------------------------------------------
void main(void)
{
gl_Position = (projectionMatrix * viewMatrix * modelMatrix) * in_Position;
ex_Color = in_Color;
}
Note that I haven't written any code externally to populate the uniform buffer yet, nor, as you can see above, have I adjusted my code to make use of the data either. I just want it to compile and work initially as-is (i.e. declared but not used), at which point I will add additional code to start making use of it. There's no point going even that far if the program doesn't like the declaration to begin with.
Edit: Finally figured out the problem by using the shader info log, like so:
GLint infoLogLength;
glGetShaderiv(id, GL_INFO_LOG_LENGTH, &infoLogLength);
GLchar* strInfoLog = new GLchar[infoLogLength + 1];
glGetShaderInfoLog(id, infoLogLength, NULL, strInfoLog);
In a nutshell, as Anton said, I was using layout incorrectly, and my uniform block had the same name as my struct, creating all kinds of confusion for the compiler.
To do this portably, you should use a Uniform Buffer Object.
Presently, your struct uses 3+3+16=22 floating-point components and you are trying to build an array of 100 of these. OpenGL implementations are only required to support 1024 floating-point uniform components in any stage, and your array requires 2200.
Uniform Buffer Objects will allow you to store up to 64 KiB (minimum) of data; well exceeding the limitation above. However, you need to be mindful of data alignment when using UBOs and that is what the layout (std140) qualifier you tried to use is for.
struct InstanceData
{
vec3 rotation;
vec3 scale;
mat4 position;
};
// Uniform block named InstanceBlock, follows std140 alignment rules
layout (std140, binding = 0) uniform InstanceBlock {
InstanceData instances [100];
};
The struct above is not correctly aligned for std140, you will need to be careful when using it.
This is how the data is actually laid out:
struct InstanceData
{
vec3 rotation; // 0,1,2
float padding03; // 3
vec3 scale; // 4,5,6
float padding07; // 7
mat4 position; // 8-23
} // Size: 24 * sizeof (float)
vec3 types are treated the same as vec4 in GLSL and mat4 is effectively an array of 4 vec4, so that means they all need to begin on 4-float boundaries. GLSL automatically inserts padding to satisfy these alignment rules; the changes I made above are to show you the correct way to write this data structure in C. You have to account for 2 floats worth of implicit padding in your structure.
Regarding your edit, you do not have to worry about how GLSL packs a struct until you start using buffer objects.
Without using a Uniform Buffer Object, you have to use functions like glUniform3f (...) to set uniforms. Those functions do not directly expose the data structure to you, so packing does not matter. To set the values for an instance N you would call glUniform3f (...) using the location of "instances [N].rotation" and "instances [N].scale" and glUniformMatrix4fv (...) on "instances [N].position".
That would require 300 API calls to initialize all 100 instances of your struct, so you can see why UBOs are more practical (even ignoring the above mentioned 1024 limitation).