opengl - use large (100k) uniform array in shader - c++

In my program I am doing a single render of one model. I have a generated array of unsigned chars where all bits in each byte can be used. There is an element in the array for each triangle in the model. To get the color for the triangle I use gl_PrimitiveID which gives you the location of the triangle in the buffer being rendered.
My problem is that my GPU is 4.2 meaning I can only use UBOs and not SSBOs. The max array size (byte array) is a little over 16,000 and I need 100,000, The smallest required UBO size is 16KB. Using a standard uniform float[N] has the same limit as the UBO in my case.
I have been looking at this: https://www.opengl.org/registry/specs/NV/shader_buffer_load.txt
But I would like to know if there are other option before I use something so device specific.
My current Frag-Shader if you would like to see:
#version 420 core
out vec3 color;
layout (std140) uniform ColorBlock{
unsigned char colors[16000]; // this need to be 100,000
};
void main(){
float r, g, b;
r = colors[1.0f / gl_PrimitiveID];
g = colors[1.0f / gl_PrimitiveID];
b = colors[1.0f / gl_PrimitiveID];
color = vec3(r, g, b);
}

You can use texture buffer objects (TBOs).
Note that although they are exposed via the texture interface, the data access is totally different from textures, the data is directly fetched from the underlying buffer object, with no sampler overhead.
Also note that the guaranteed minimum size for TBOs is only 65536 texels. However, on all desktop implementations, it is much larger. Also note that you can pack up to 4 floats into an texel, so that 100000 values will be possible that way, even if only the minimum size is available.

Related

Arrays in uniform blocks cannot be indexed with vertex attributes, right?

I was wondering if I can index into a uniform buffer array with a value contained in the vertices I draw, like:
layout (location = 0) in vec3 position;
layout (location = 1) in flat int idx;
layout(binding = 0, std140) uniform uniformValues
{
float values[100];
};
void main()
{
values[idx];
}
My understanding is that this is not possible because
'in flat int idx'
Is most likely not a 'dynamically uniform expression', and according to the documentation cannot be used to index into a uniform buffer array:
There are places where GLSL requires the expression to be dynamically
uniform. All of the following must use a dynamically uniform
expression:
-The index to buffer-backed interface block arrays.
However I came across information from the same source regarding how to access an array of samplers holding texture handles for 'bindless textures', and it says (emphasis mine):
Sampler and image types used in default block uniform variables can be
populated from handles rather than the index of a binding point.
These types can also now be passed as Shader Stage inputs/outputs (using the flat interpolation qualifier where needed). They can be
used as Vertex Attributes, where they are treated as 64-bit integers
on the OpenGL side. And they can be used in Interface Blocks of all
kinds; buffer-backed interface blocks treat them as 64-bit integers.
It's saying, I believe, that instead of doing this:
layout (location = 0) in flat int textureBinding;
layout (binding = 0) uniform sampler2D textures[16];
void main()
{
textures[textureBinding];
}
You do this:
layout (location = 0) in flat int bindlessTextureHandle;
layout (binding = 0) uniform textureBuffer
{
sampler2D textures[200];
}
void main()
{
textures[bindlessTextureHandle];
}
'bindlessTextureHandle' isn't a 'dynamically uniform expression', how can it be used to index into uniform buffer?
All of the following must use a dynamically uniformexpression:
-The index to buffer-backed interface block arrays.
So why is it saying that you can index into 'interface blocks' of all kinds with values from vertex inputs?
Also are you allowed to index into:
'uniform sampler2D[16] textures;'
with a 'non-dynamically uniform expression'?
My understanding is that this is not possible because
in flat int idx
Is most likely not a 'dynamically uniform expression', and according
to the documentation cannot be used to index into a uniform buffer
array.
You are right that you need a dynamically uniform value to index into an uniform buffer array. However, this:
layout(binding = 0, std140) uniform uniformValues
{
float values[100];
};
is not a uniform buffer array. That is an array inside a single uniform buffer object, and you can index with non-uniform values into this array as you like. A uniform buffer array would be:
layout(binding = 0, std140) uniform myUBO
{
float value;
} myUBOArray[4];
The rest of your question gets even more obscure. You sometimes reference "The index to buffer-backed interface block arrays", which your code never uses. This is talking about SSBOs, which use interface blocks of the form layout(...) buffer foo {...}.
So why is it saying that you can index into 'interface blocks' of all kinds with values from vertex inputs?
Because that is how it is. You just need to understand that indexing into an array of interface blocks is not the same as indexing some other array (which might or might not be defined inside an interface block, doesn't matter).
Also are you allowed to index into:
uniform sampler2D[16] textures; with a 'non-dynamically uniform expression'?
No, not in standard GL.
The first thing about bindless textures is that this is not a core feature of any OpenGL version released to date (which is 4.6 at the time of writing this).It is only defined as an extension GL_ARB_bindless_textures which some modern GPUs and drivers expose, but the availability of that feature is quite far from being universal.
Second, the extension spec above explains: "Sampler and image handles passed to texture built-in functions must be dynamically uniform", so it still doesn't get you there. However, the extension GL_NV_gpu_shader5 removes that restriction. So on recent NVIDIA GPUs, you can get a non-dymically uniform index into an array of bindless texture samplers - but performance will still suffer by a significant amount if you do so.
There are a myriad of separate, overlapping issues here.
Indexing an array within a uniform block has never been limited to dynamically uniform expressions (generally, see below). Even in GL 3.x, you can index an array within a buffer-backed block with an arbitrary index.
However, you're not asking about a general array; you're asking about arrays of textures. Or to be more general, the entire sequence of operations leading to the computation of a sampler type through bindless textures.
That entire sequence must be dynamically uniform (unless you're on NVIDIA, which allows arbitrary expressions). It doesn't matter if you're indexing an SSBO array, using an input variable to pass a texture handle directly, or anything else. The value that leads to the acquisition of a sampler type must be dynamically uniform.
So why is it saying that you can index into 'interface blocks' of all kinds with values from vertex inputs?
Because you can.
A common misunderstanding of what "dynamically uniform" means is that it is a static property. That an expression by itself is either dynamically uniform or not. This is close to true, but it's not actually true.
Some expressions are dynamically uniform by their nature. You could call these "statically dynamically uniform" expressions. A constant expression is always dynamically uniform, for example.
However, being dynamically uniform is about the value of the expression. All invocations (within the rendering command) must result in the same value. An in variable for a shader stage can be dynamically uniform so long as it just so happens to always have the same value within the rendering command. For example, a VS could access a value from a uniform array using gl_DrawID (which is dynamically uniform), pass that as an in to the FS, and the FS can use that value as a sampler. Or to access an array of samplers. Or whatever. All FS invocations will get the same value within the draw command, so that value is dynamically uniform.

How to initialize matrices in a shader storage buffer?

I have a shader storage block in the vertex shader, like this:
layout(std430,binding=0) buffer buf {mat3 rotX, rotY, rotZ; } b;
I initialized those 3 matrices with identity matrix like this:
float mats[]={ 1,0,0,0,1,0,0,0,1,
1,0,0,0,1,0,0,0,1,
1,0,0,0,1,0,0,0,1 };
GLuint ssbos;
glGenBuffers(1,&ssbos);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,0,ssbos);
glBufferData(GL_SHADER_STORAGE_BUFFER,sizeof(mats),mats,GL_DYNAMIC_DRAW);
But it doesn't seem to work (I'm using Opengl 4.3 core profile). Am I doing something wrong?
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,0,ssbos);
glBufferData(GL_SHADER_STORAGE_BUFFER,sizeof(mats),mats,GL_DYNAMIC_DRAW);
glBindBufferBase binds the entire range of the buffer. But it's not a magic "bind whatever the buffer happens to store" function. It binds the entire range of the buffer as it currently exists.
And since you haven't allocated any storage for that buffer object, its current state is empty: a size of 0. And that's what you bind: a range of 0 bytes of memory.
Oh sure, in the next statement, you give the buffer memory. But that doesn't change the fact that it didn't have memory when you bound it.
So you need to create storage for the buffer before binding a range of it.
Also, don't use vec3 or any types related to vec3 in buffer-backed interface blocks. And you really shouldn't be passing axial rotation matrices like that.
The std430 layout is essentially std140 with tighter packing of structs and arrays. The data you are supplying does not respect the layout rules.
From section 7.6.2.2 Standard Uniform Block Layout of the OpenGL spec:
If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.
If the member is a column-major matrix with C columns and R rows, the matrix is stored identically to an array of C column vectors with R components each, according to rule (4).
So your mat3 matrices are treated as 3 vec3 each (one for each column). According to the rule (4), a vec3 is padded to occupy the same memory as a vec4.
In essence, when using a mat3 in an SSBO, you need to supply the same amount of data as if you were using a mat4 mat3x4 with the added benefit of a more confusing memory layout. Therefore, it is best to use mat3x4 (or mat4) in an SSBO and only use its relevant portions in the shader. Similar advice also stands for vec3 by the way.
It is easy to get smaller matrices from a larger one:
A wide range of other possibilities exist, to construct a matrix from vectors and scalars, as long as enough components are present to initialize the matrix. To construct a matrix from a matrix:
mat3x3(mat4x4); // takes the upper-left 3x3 of the mat4x4
mat2x3(mat4x2); // takes the upper-left 2x2 of the mat4x4, last row is 0,0
mat4x4(mat3x3); // puts the mat3x3 in the upper-left, sets the lower right
// component to 1, and the rest to 0
This should give you proper results:
float mats[]={ 1,0,0,0, 0,1,0,0, 0,0,1,0,
1,0,0,0, 0,1,0,0, 0,0,1,0,
1,0,0,0, 0,1,0,0, 0,0,1,0, };

OpenGL Bindless Textures: Bind to uniform sampler2D array

I am looking into using bindless textures to rapidly display a series of images. My reference is the OpenGL 4.5 redbook. The book says I can sample bindless textures in a shader with this fragment shader:
#version 450 core
#extension GL_ARB_bindless_texture : require
in FS_INPUTS {
vec2 i_texcoord;
flat int i_texindex;
};
layout (binding = 0) uniform ALL_TEXTURES {
sampler2D fs_textures[200];
};
out vec4 color;
void main(void) {
color = texture(fs_textures[i_texindex], i_texcoord);
};
I created a vertex shader that looks like this:
#version 450 core
in vec2 vert;
in vec2 texcoord;
uniform int texindex;
out FS_INPUTS {
vec2 i_texcoord;
flat int i_texindex;
} tex_data;
void main(void) {
tex_data.i_texcoord = texcoord;
tex_data.i_texindex = texindex;
gl_Position = vec4(vert.x, vert.y, 0.0, 1.0);
};
As you may notice, my grasp of whats going on is a little weak.
In my OpenGL code, I create a bunch of textures, get their handles, and make them resident. The function I am using to get the texture handles is 'glGetTextureHandleARB'. There is another function that could be used instead, 'glGetTextureSamplerHandleARB' where I can pass in a sampler location. Here is what I did:
Texture* textures = new Texture[load_limit];
GLuint64* tex_handles = new GLuint64[load_limit];
for (int i=0; i<load_limit; ++i)
{
textures[i].bind();
textures[i].data(new CvImageFile(image_names[i]));
tex_handles[i] = glGetTextureHandleARB(textures[i].id());
glMakeTextureHandleResidentARB(tex_handles[i]);
textures[i].unbind();
}
My question is how do I bind my texture handles to the ALL_TEXTURES uniform attribute of the fragment shader? Also, what should I use to update the vertex attribute 'texindex' - an actual index into my texture handle array or a texture handle?
It's bindless texturing. You do not "bind" such textures to anything.
In bindless texturing, the data value of a sampler is a number. Specifically, the number returned by glGetTextureHandleARB. Texture handles are 64-bit unsigned integer.
In a shader, values of sampler types in buffer-backed interface blocks (UBOs and SSBOs) are 64-bit unsigned integers. So an array of samplers is equivalent in structure to an array of 64-bit unsigned integers.
So in C++, a struct equivalent to your ALL_TEXTURES block would be:
struct AllTextures
{
GLuint64 textures[200];
};
Well, assuming you properly use std140 layout, of course. Otherwise, you'd have to query the layout of the structure.
At this point, you treat the buffer as no different from any other UBO usage. Build the data for the shader by sticking an AllTextures into a buffer object, then bind that buffer as a UBO to binding 0. You just need to fill the array in with the actual texture handles.
Also, what should I use to update the vertex attribute 'texindex' - an actual index into my texture handle array or a texture handle?
Well, neither one will work. Not the way you've written it.
See, ARB_bindless_texture does not allow you to access any texture you want in any way at any time from any shader invocation. Unless you are using NV_gpu_shader5, the code leading to the texture access must be based on dynamically uniform expressions.
So unless every vertex in your rendering command gets the same index or handle... you cannot use them to pick which texture to use. Even instancing will not save you, since dynamically uniform expressions don't care about instancing.
If you want to render a bunch of quads without having to change uniforms between them (and without having to rely on an NVIDIA extension), then you have a few options. Most hardware that supports bindless texture also supports ARB_shader_draw_parameters. This gives you access to gl_DrawID, which represents the current index of a rendering command within a glMultiDraw-style command. And that extension explicitly declares that gl_DrawID is dynamically uniform.
So you could use that to select which texture to render. You simply need to issue a multi-draw command where you render the same mesh data over and over, but it gets a different gl_DrawID index in each case.

Opengl - instanced attributes

I use oglplus - it's a c++ wrapper for OpenGL.
I have a problem with defining instanced data for my particle renderer - positions work fine but something goes wrong when I want to instance a bunch of ints from the same VBO.
I am going to skip some of the implementation details to not make this problem more complicated. Assume that I bind VAO and VBO before described operations.
I have an array of structs (called "Particle") that I upload like this:
glBufferData(GL_ARRAY_BUFFER, sizeof(Particle) * numInstances, newData, GL_DYNAMIC_DRAW);
Definition of the struct:
struct Particle
{
float3 position;
//some more attributes, 9 floats in total
//(...)
int fluidID;
};
I use a helper function to define the OpenGL attributes like this:
void addInstancedAttrib(const InstancedAttribDescriptor& attribDesc, GLSLProgram& program, int offset=0)
{
//binding and some implementation details
//(...)
oglplus::VertexArrayAttrib attrib(program, attribDesc.getName().c_str());
attrib.Pointer(attribDesc.getPerVertVals(), attribDesc.getType(), false, sizeof(Particle), (void*)offset);
attrib.Divisor(1);
attrib.Enable();
}
I add attributes for positions and fluidids like this:
InstancedAttribDescriptor posDesc(3, "InstanceTranslation", oglplus::DataType::Float);
this->instancedData.addInstancedAttrib(posDesc, this->program);
InstancedAttribDescriptor fluidDesc(1, "FluidID", oglplus::DataType::Int);
this->instancedData.addInstancedAttrib(fluidDesc, this->program, (int)offsetof(Particle,fluidID));
Vertex shader code:
uniform vec3 FluidColors[2];
in vec3 InstanceTranslation;
in vec3 VertexPosition;
in vec3 n;
in int FluidID;
out float lightIntensity;
out vec3 sphereColor;
void main()
{
//some typical MVP transformations
//(...)
sphereColor = FluidColors[FluidID];
gl_Position = projection * vertexPosEye;
}
This code as whole produces this output:
As you can see, the particles are arranged in the way I wanted them to be, which means that "InstanceTranslation" property is setup correctly. The group of the particles to the left have FluidID value of 0 and the ones to the right equal to 1. The second set of particles have proper positions but index improperly into FluidColors array.
What I know:
It's not a problem with the way I set up the FluidColors uniform. If I hard-code the color selection in the shader like this:
sphereColor = FluidID == 0? FluidColors[0] : FluidColors1;
I get:
OpenGL returns GL_NO_ERROR from glGetError so there's no problem with the enums/values I provide
It's not a problem with the offsetof macro. I tried using hard-coded values and they didn't work either.
It's not a compatibility issue with GLint, I use simple 32bit Ints (checked this with sizeof(int))
I need to use FluidID as a instanced attrib that indexes into the color array because otherwise, if I were to set the color for a particle group as a simple vec3 uniform, I'd have to batch the same particle types (with the same FluidID) together first which means sorting them and it'd be too costly of an operation.
To me, this seems to be an issue of how you set up the fluidID attribute pointer. Since you use the type int in the shader, you must use glVertexAttribIPointer() to set up the attribute pointer. Attributes you set up with the normal glVertexAttribPointer() function work only for float-based attribute types. They accept integer input, but the data will be converted to float when the shader accesses them.
In oglplus, you apparently have to use VertexArrayAttrib::IPointer() instead of VertexArrayAttrib::Pointer() if you want to work with integer attributes.

Opengl Vertex attribute stride

Hey.
I new to OpenGL ES but I've had my share of experience with normal OpenGL.
I've been told that using interlaced arrays for the vertex buffers is a lot faster due to the optimisation for avoiding cache misses.
I've developed a vertex format that I will use that looks like this
struct SVertex
{
float x,y,z;
float nx,ny,nz;
float tx,ty,tz;
float bx,by,bz;
float tu1,tv1;
float tu2,tv2;
};
Then I used "glVertexAttribPointer(index,3,GL_FLOAT,GL_FALSE,stride,v);" to point to the vertex array. The index is the one of the attribute I want to use and everything else is ok except the stride. It worked before I decided to add this into the equation. I passed the stride both as sizeof(SVertex) and like 13*4 but none of them seem to work.
If it has any importance I draw the primitives like this glDrawElements(GL_TRIANGLES,surface->GetIndexCount()/3,GL_UNSIGNED_INT,surface->IndPtr());
In the OpenGL specs it's written that the stride should be the size in bytes from the end of the attribute( in this case z) to the next attribute of the same kind(in this case x). So by my calculations this should be 13(nx,ny,nz,tx,ty....tuv2,tv2) times 4 (the size of a float).
Oh and one more thing is that the display is just empty.
Could anyone please help me with this?
Thanks a lot.
If you have a structure like this, then stride is just sizeof SVertex and it's the same for every attribute. There's nothing complicated here.
If this didn't work, look for your error somewhere else.
For example here:
surface->GetIndexCount()/3
This parameter should be the number of vertices, not primitives to be sent - hence I'd say that this division by three is wrong. Leave it as:
surface->GetIndexCount()
Then I used
"glVertexAttribPointer(index,3,GL_FLOAT,GL_FALSE,stride,v);"
to point to the vertex array. The
index is the one of the attribute I
want to use and everything else is ok
except the stride
This does not work for texcoord (you have 2x 2 floats or 1x 4 floats).
About the stride, like Kos said, I think you should pass a stride of 16 * sizeof(float) (the size of your SVertex).
Also another thing worth mentioning. You say you want to optimize for performance. Why dont you compress your vertex to the max, and suppress redundant values? This would save a lot of bandwidth.
x, y, z are OK, but nx and ny are self sufficient if your normals are normalized (which may be the case). You can extract in the vertex shader nz (assuming you have shader capabilities). The same thing applies for tx and ty. You don't need bx, by, bz at all since you know it's the cross product of normal and tangent.
struct SPackedVertex
{
float x,y,z,w; //pack all on vector4
float nx,ny,tx,ty;
float tu1,tv1;tu2,tv2;
};