I want to be able to input a bunch of vertices to my graphics program and then I want to be able to do the following on them:
Use them in the graphics part of OpenGL, especially in the Vertex Shader.
Do physics calculations on them in a Compute Shader.
Given these requirements, I figured that I need some structure in which I can store my vertices and access them correctly. I thought of the following:
ArrayBuffers
Textures (as in storing the information, not for texturing itself)
However, I have come up with drawbacks for both variants:
ArrayBuffers:
I'm unsure how my Compute Shader can read, let alone modify, the vertices. Yet I do know how to draw them.
Textures:
I know how to modify them in Compute Shaders, however I am unsure how to draw from a texture. More specifically, the number of elements needed to be drawn depends on the number of written (data not zero) elements in the texture.
I might have overlooked some other important feature that would meet my needs, so the real question is:
How do I create Vertices that reside on the GPU and which I can both access in the Vertex and in the Compute Shader?
Hopefully this will clear up a few misconceptions, and give you a slightly better understanding of how general purpose shader storage is set up.
What you have to understand is how buffer objects really work in GL. You often hear people distinguish between things like "Vertex Buffer Objects" and "Uniform Buffer Objects". In reality, there is no fundamental distinction – a buffer object is treated the same way no matter what it stores. It is just a generic data store, and it only takes on special meaning while it is bound to a specific point (e.g. GL_ARRAY_BUFFER or GL_UNIFORM_BUFFER).
Do not think of special purpose vertex buffers residing on the GPU, think more generally – it is actually unformatted memory that you can read/write if you know the structure. Calls like glVertexAttribPointer (...) describe the data structure of the buffer object sufficiently for glDrawArrays (...) to meaningfully pull vertex attributes from the buffer object's memory for each vertex shader invocation.
You need to do the same thing yourself for compute shaders, as demonstrated below. You need to familiarize yourself with the rules discussed in 7.6.2.2 - Standard Uniform Block Layout to fully understand the following data structure.
Description of a vertex data structure using Shader Storage Blocks can be done like so:
// Compute Shader SSB Data Structure and Buffer Definition
struct VtxData {
vec4 vtx_pos; // 4N [GOOD] -- Largest base alignment
vec3 vtx_normal; // 3N [BAD]
float vtx_padding7; // N (such that vtx_st begins on a 2N boundary)
vec2 vtx_st; // 2N [BAD]
vec2 vtx_padding10; // 2N (in order to align the entire thing to 4N)
}; // ^^ 12 * sizeof (GLfloat) per-vtx
// std140 is pretty important here: without a standard layout qualifier there
// is no guarantee that the data structure is aligned as described above or
// that elements in verts[] are tightly packed (48 bytes apart, no extra gap).
layout (std140, binding = 1) buffer VertexBuffer {
VtxData verts [];
};
This allows you to use an interleaved vertex buffer in a compute shader, with the data structure defined above. You have to be careful with data alignment when you do this... you could haphazardly use any alignment/stride you wanted for an interleaved vertex array ordinarily, but here you want to conform to the std140 layout rules. This means using 3-component vectors is not always a wise use of memory; you need things to be aligned on N (float), 2N (vec2) or 4N (vec3/vec4) boundaries and this often necessitates the insertion of padding and/or clever packing of data. In the example above, you could fit an entire 3-component vector worth of data in all the space wasted by alignment padding.
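To make the alignment rules concrete, here is a minimal sketch that walks the three real members of VtxData through the std140 rules and computes where each one lands. The helper names (align_up, Member, place, std140_layout_of_vtx_data) are made up for illustration and are not part of OpenGL; the sketch also assumes a 4-byte float.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (both in bytes).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical helper type: base alignment and size, in bytes, of one member.
struct Member {
    std::size_t alignment;
    std::size_t size;
};

constexpr std::size_t N = sizeof(float);  // "N" in the text above, assumed to be 4 bytes

// std140 values for the member types used in VtxData.
constexpr Member kVec2{2 * N, 2 * N};
constexpr Member kVec3{4 * N, 3 * N};  // aligned like a vec4, but only 3N large
constexpr Member kVec4{4 * N, 4 * N};

// Place `m` at the first suitably aligned offset at or after `cursor`,
// return that offset, and advance `cursor` past the member.
constexpr std::size_t place(std::size_t& cursor, Member m) {
    const std::size_t offset = align_up(cursor, m.alignment);
    cursor = offset + m.size;
    return offset;
}

struct VtxLayout {
    std::size_t pos, normal, st, stride;
};

// Lay out vtx_pos, vtx_normal and vtx_st WITHOUT the explicit padding members.
VtxLayout std140_layout_of_vtx_data() {
    std::size_t cursor = 0;
    VtxLayout l{};
    l.pos    = place(cursor, kVec4);
    l.normal = place(cursor, kVec3);
    l.st     = place(cursor, kVec2);      // skips the N bytes left over after the vec3
    l.stride = align_up(cursor, 4 * N);   // a struct is rounded up to a vec4 boundary
    return l;
}
```

The computed offsets come out the same as in the padded struct, which suggests that the padding members only spell out gaps that std140 would insert anyway; writing them explicitly just keeps the CPU-side struct in sync.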
Pseudo-code showing how the buffer would be created and bound for dual-use:
struct Vertex {
GLfloat pos [4];
GLfloat normal [3];
GLfloat padding7;
GLfloat st [2];
GLfloat padding10 [2];
} *verts;
[... code to allocate and fill verts ...]
GLuint vbo;
glGenBuffers (1, &vbo);
glBindBuffer (GL_ARRAY_BUFFER, vbo);
glBufferData (GL_ARRAY_BUFFER, sizeof (Vertex) * num_verts, verts, GL_STATIC_DRAW);
glVertexAttribPointer (0, 4, GL_FLOAT, GL_FALSE, 48, (const void *)0); // Vertex Attrib. 0
glVertexAttribPointer (1, 3, GL_FLOAT, GL_FALSE, 48, (const void *)16); // Vertex Attrib. 1
glVertexAttribPointer (2, 2, GL_FLOAT, GL_FALSE, 48, (const void *)32); // Vertex Attrib. 2
glBindBufferBase (GL_SHADER_STORAGE_BUFFER, 1, vbo); // Buffer Binding 1
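The stride and offsets (48, 0, 16, 32) passed to glVertexAttribPointer are easy to get wrong by hand. As a small sketch, they can be checked at compile time against the CPU-side struct. This version uses float in place of GLfloat so that it compiles without the OpenGL headers, which is an assumption on my part (GLfloat is a 32-bit float on the usual platforms).

```cpp
#include <cassert>
#include <cstddef>

// Mirror of the Vertex struct above; float stands in for GLfloat (assumption).
struct Vertex {
    float pos[4];
    float normal[3];
    float padding7;
    float st[2];
    float padding10[2];
};

// These must agree with the stride/offset arguments of glVertexAttribPointer
// and with the std140 layout of VtxData in the compute shader.
static_assert(sizeof(Vertex) == 48, "stride must be 12 floats");
static_assert(offsetof(Vertex, pos) == 0, "attribute 0 starts at byte 0");
static_assert(offsetof(Vertex, normal) == 16, "attribute 1 starts at byte 16");
static_assert(offsetof(Vertex, st) == 32, "attribute 2 starts at byte 32");
```

If a member is added or reordered later, the build fails instead of the shader silently reading garbage.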
I am working on a 3D mesh I am storing in an array: each element of the array is a 4D point in homogeneous coordinates (x, y, z, w). I use OpenCL to do some calculations on these data, which later I want to visualise, therefore I set up an OpenCL/GL interop context. I have created a shared buffer between OpenCL and OpenGL by using the clCreateFromGLBuffer function on a GL_ARRAY_BUFFER:
...
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, size_of_data, data, GL_DYNAMIC_DRAW);
vbo_buff = clCreateFromGLBuffer(ctx, CL_MEM_READ_WRITE, vbo, &err);
...
In the vertex shader, I access data this way:
layout (location = 0) in vec4 data;
out VS_OUT
{
vec4 vertex;
} vs_out;
void main(void)
{
vs_out.vertex = data;
}
Then in the geometry shader I do something like this:
layout (points) in;
layout (triangle_strip, max_vertices = MAX_VERT) out;
in VS_OUT
{
vec4 vertex;
} gs_in[];
void main()
{
gl_Position = gs_in[0].vertex;
EmitVertex();
...etc...
}
This gives me the ability to generate geometry based on the position of each point stored in the data array.
This way, the geometry I can generate is only based on the current point being processed by the geometry shader: e.g. I am able to construct a small cube (voxel) around each point.
Now I would like to be able to access to the position of other points in the data array within the geometry shader: e.g. I would like to be able to retrieve the coordinates of another point (indexed by another shared buffer of an arbitrary length) besides the one which is currently processed in order to draw a line connecting them.
The problem I have is that in the geometry shader gs_in[0].vertex gives me the position of each point, but I don't know which one it is at the time (which index?). Moreover, I don't know how to access the position of other points besides that one at the same time.
In hypothetical pseudo-code, I would like to be able to do something like this:
point_A = gs_in[0].vertex[index_A];
point_B = gs_in[0].vertex[index_B];
draw_line_between_A_and_B(point_A, point_B);
It is not clear to me whether this is possible or not, or how to achieve this within a geometry shader. I would like to stick to this approach because the calculations I do in the OpenCL kernels implement a cellular automaton, hence it is convenient for me to organise my code (neutrino) in terms of central nodes and related neighbours.
All suggestions are welcome.
Thanks.
but I don't know which one it is at the time (which index?)
See gl_PrimitiveIDIn
I don't know how to access the position of other points besides that one at the same time.
You can bind the same source buffer twice, as a vertex source and as GL_TEXTURE_BUFFER. If your OpenGL implementation supports it, you'll then be able to read from it in the geometry shader.
Unlike Direct3D, in GL the support for the feature is optional: the spec says GL_MAX_GEOMETRY_SHADER_STORAGE_BLOCKS can be zero.
"This gives me the ability of generating geometry based on the position of each point the stored in the data array."
No it does not. The input to the geometry shader are not all the vertex attributes in the buffer. Let me quote the Geometry Shader wiki page:
Geometry shaders take a primitive as input; each primitive is composed of some number of vertices, as defined by the input primitive type in the shader.
Primitives are a single point, line primitive or triangle primitive. For instance, if the primitive type is GL_POINTS, then the size of the input array is 1 and you can only access the vertex attributes of that point; if the primitive type is GL_TRIANGLES, the size of the input array is 3 and you can only access the attributes of the 3 vertices (the corners) which form the triangle.
If you want to access more data, then you have to use a Shader Storage Buffer Object (or a texture).
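As a rough sketch of the SSBO route, the geometry shader below reads the point array and a neighbour index array from two shader storage blocks and emits one line per input point. It is kept as a C++ string constant, the way shader source is usually handed to glShaderSource. The block names (Points, Neighbours), the binding indices and the one-neighbour-per-point scheme are assumptions for illustration. It also assumes OpenGL 4.3, a non-zero GL_MAX_GEOMETRY_SHADER_STORAGE_BLOCKS, and a draw call of the form glDrawArrays(GL_POINTS, 0, n) so that gl_PrimitiveIDIn matches the index into the array.

```cpp
#include <cassert>
#include <string>

// Geometry shader source: connects each point to one neighbour with a line.
const std::string kLineGeometryShader = R"glsl(
#version 430 core
layout (points) in;
layout (line_strip, max_vertices = 2) out;

// The same buffer that feeds the vertex attribute, bound a second time as an SSBO.
layout (std430, binding = 0) readonly buffer Points     { vec4 points[]; };
// Hypothetical index buffer: neighbour[i] is the point connected to point i.
layout (std430, binding = 1) readonly buffer Neighbours { int neighbour[]; };

void main()
{
    int index_A = gl_PrimitiveIDIn;      // index of the point being processed
    int index_B = neighbour[index_A];    // index of the point to connect to

    gl_Position = points[index_A];
    EmitVertex();
    gl_Position = points[index_B];
    EmitVertex();
    EndPrimitive();
}
)glsl";
```

On the host side the vertex buffer would additionally be bound with glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, vbo), and the index buffer to binding 1. Since the array holds vec4 elements, the std430 array stride is 16 bytes and matches the tightly packed (x, y, z, w) data.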
I've decided to take a look at uniform buffer objects, but I am not sure when to use them and when not to.
I have tried to batch all model transformations into a single array that I send to the shader at once. But it has its consequences: I also have to send the shader an id for every vertex to match those transformations.
So my question is: Is it worth it? Or should I prefer regular glUniform calls in this particular case?
Below is my shader program.
#version 330 core
layout (location = 0) in vec3 position;
layout (location = 1) in int id;
layout (std140) uniform scene_data
{
mat4 ViewProjectionMatrix;
mat4 ModelMatrix[128];
};
void main()
{
gl_Position = ViewProjectionMatrix * ModelMatrix[id] * vec4(position,1.0);
}
Here is how I create the Uniform Buffer Object:
Dot::UniformBuffer::UniformBuffer(const void*data, unsigned int size, unsigned int index)
:m_Index(index),m_size(size)
{
glGenBuffers(1, &m_UBO);
glBindBuffer(GL_UNIFORM_BUFFER, m_UBO);
glBufferData(GL_UNIFORM_BUFFER, size, data, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, index, m_UBO);
glBindBuffer(GL_UNIFORM_BUFFER, 0);
}
Dot::UniformBuffer::~UniformBuffer()
{
glDeleteBuffers(1, &m_UBO);
}
void Dot::UniformBuffer::Update(const void* data)
{
glBindBuffer(GL_UNIFORM_BUFFER, m_UBO);
GLvoid* p = glMapBuffer(GL_UNIFORM_BUFFER, GL_WRITE_ONLY);
memcpy(p, data, m_size);
glUnmapBuffer(GL_UNIFORM_BUFFER);
}
Everything has its tradeoffs, and would require performance measurements to verify.
Think of glUniform as pass-by-value, whereas UBO is pass-by-reference. Except, there is [usually] no perf penalty in the GPU shader code for additional indirection to the UBO.
UBOs' main benefit is reducing CPU overhead at the "bind" stage, since all the values are already written to memory. A renderer can prepare many UBOs up front at loading time (one UBO per "state vector") to avoid having to transfer the data during the main rendering loop. Typically you don't want to modify a UBO in-place in your renderer, because the glMapBuffer will wait for previous draws that are using that UBO to complete.
In this particular example, the shader is performing an "indexed constant lookup" using the input attribute id, which is slower than a uniform constant lookup.
Other considerations:
layout (location = 1) in int id; requires the CPU to prepare an additional vertex buffer and index buffer. This may also imply making data copies of the raw geometry, which is more CPU and memory intensive.
glDrawArraysInstanced or glDrawElementsInstanced can let OpenGL generate the id for you instead. From the shader, use gl_InstanceID. Removes the need for additional vertex buffers. That would be similar to https://learnopengl.com/Advanced-OpenGL/Instancing
glUniform will cause the data to be transferred from application code --> driver --> GPU on every call.
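UBO requires more object management and planning, but usually converges on the fastest performing renderer. Vulkan and D3D12 are built around this buffer-based style; they have no glUniform equivalent beyond small push constants (Vulkan) or root constants (D3D12).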
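One more practical point when batching matrices into a single block: the block has to fit within GL_MAX_UNIFORM_BLOCK_SIZE, for which the spec only guarantees 16384 bytes. Here is a small sketch of a CPU-side mirror of the scene_data block from the question. The struct name SceneData is my own, and float stands in for GLfloat so that it compiles without the OpenGL headers.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of the std140 block `scene_data`.
// A mat4 is four vec4 columns (64 bytes), so no padding is needed here.
struct SceneData {
    float ViewProjectionMatrix[16];
    float ModelMatrix[128][16];
};

// Smallest value an implementation may report for GL_MAX_UNIFORM_BLOCK_SIZE.
constexpr std::size_t kGuaranteedUniformBlockSize = 16384;

static_assert(offsetof(SceneData, ModelMatrix) == 64, "ModelMatrix follows one mat4");
static_assert(sizeof(SceneData) == 64 + 128 * 64, "8256 bytes in total");
static_assert(sizeof(SceneData) <= kGuaranteedUniformBlockSize,
              "block fits in the guaranteed minimum");
```

With 128 matrices the block takes 8256 bytes, so it fits everywhere; at 256 matrices it would exceed the guaranteed minimum and the actual limit would have to be queried with glGetIntegerv.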
I have a shader storage block in the vertex shader, like this:
layout(std430,binding=0) buffer buf {mat3 rotX, rotY, rotZ; } b;
I initialized those 3 matrices with identity matrix like this:
float mats[]={ 1,0,0,0,1,0,0,0,1,
1,0,0,0,1,0,0,0,1,
1,0,0,0,1,0,0,0,1 };
GLuint ssbos;
glGenBuffers(1,&ssbos);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,0,ssbos);
glBufferData(GL_SHADER_STORAGE_BUFFER,sizeof(mats),mats,GL_DYNAMIC_DRAW);
But it doesn't seem to work (I'm using Opengl 4.3 core profile). Am I doing something wrong?
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,0,ssbos);
glBufferData(GL_SHADER_STORAGE_BUFFER,sizeof(mats),mats,GL_DYNAMIC_DRAW);
glBindBufferBase binds the entire range of the buffer. But it's not a magic "bind whatever the buffer happens to store" function. It binds the entire range of the buffer as it currently exists.
And since you haven't allocated any storage for that buffer object, its current state is empty: a size of 0. And that's what you bind: a range of 0 bytes of memory.
Oh sure, in the next statement, you give the buffer memory. But that doesn't change the fact that it didn't have memory when you bound it.
So you need to create storage for the buffer before binding a range of it.
Also, don't use vec3 or any types related to vec3 in buffer-backed interface blocks. And you really shouldn't be passing axial rotation matrices like that.
The std430 layout is essentially std140 with tighter packing of structs and arrays. The data you are supplying does not respect the layout rules.
From section 7.6.2.2 Standard Uniform Block Layout of the OpenGL spec:
If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.
If the member is a column-major matrix with C columns and R rows, the matrix is stored identically to an array of C column vectors with R components each, according to rule (4).
So your mat3 matrices are treated as 3 vec3 each (one for each column). Rule (4) as quoted is the std140 wording; std430 drops the rounding up to vec4 for arrays, but a vec3 already has the base alignment of a vec4 (rule 3), so each column still occupies the same memory as a vec4.
In essence, when using a mat3 in an SSBO, you need to supply the same amount of data as if you were using a mat3x4, with the added "benefit" of a more confusing memory layout. Therefore, it is best to use mat3x4 (or mat4) in an SSBO and only use its relevant portions in the shader. Similar advice also stands for vec3, by the way.
It is easy to get smaller matrices from a larger one:
A wide range of other possibilities exist, to construct a matrix from vectors and scalars, as long as enough components are present to initialize the matrix. To construct a matrix from a matrix:
mat3x3(mat4x4); // takes the upper-left 3x3 of the mat4x4
mat2x3(mat4x2); // takes the upper-left 2x2 of the mat4x4, last row is 0,0
mat4x4(mat3x3); // puts the mat3x3 in the upper-left, sets the lower right
// component to 1, and the rest to 0
This should give you proper results:
float mats[]={ 1,0,0,0, 0,1,0,0, 0,0,1,0,
1,0,0,0, 0,1,0,0, 0,0,1,0,
1,0,0,0, 0,1,0,0, 0,0,1,0, };
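If the matrices are computed at run time rather than written out by hand, the padding can be added by a small helper. This is a sketch with a made-up name (pad_mat3_columns); it assumes the input is a tightly packed, column-major 3x3 matrix.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Expand a tightly packed column-major 3x3 matrix (9 floats) into the
// 12-float form a mat3 occupies in a std140/std430 block: each column is
// followed by one float of padding.
std::array<float, 12> pad_mat3_columns(const std::array<float, 9>& m) {
    std::array<float, 12> out{};  // zero-initialised, so the padding is 0
    for (std::size_t col = 0; col < 3; ++col) {
        for (std::size_t row = 0; row < 3; ++row) {
            out[col * 4 + row] = m[col * 3 + row];
        }
    }
    return out;
}
```

Calling it on the identity matrix gives exactly one 12-float row of the mats[] array above.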
I use oglplus - it's a c++ wrapper for OpenGL.
I have a problem with defining instanced data for my particle renderer - positions work fine but something goes wrong when I want to instance a bunch of ints from the same VBO.
I am going to skip some of the implementation details to not make this problem more complicated. Assume that I bind VAO and VBO before described operations.
I have an array of structs (called "Particle") that I upload like this:
glBufferData(GL_ARRAY_BUFFER, sizeof(Particle) * numInstances, newData, GL_DYNAMIC_DRAW);
Definition of the struct:
struct Particle
{
float3 position;
//some more attributes, 9 floats in total
//(...)
int fluidID;
};
I use a helper function to define the OpenGL attributes like this:
void addInstancedAttrib(const InstancedAttribDescriptor& attribDesc, GLSLProgram& program, int offset=0)
{
//binding and some implementation details
//(...)
oglplus::VertexArrayAttrib attrib(program, attribDesc.getName().c_str());
attrib.Pointer(attribDesc.getPerVertVals(), attribDesc.getType(), false, sizeof(Particle), (void*)offset);
attrib.Divisor(1);
attrib.Enable();
}
I add attributes for positions and fluidids like this:
InstancedAttribDescriptor posDesc(3, "InstanceTranslation", oglplus::DataType::Float);
this->instancedData.addInstancedAttrib(posDesc, this->program);
InstancedAttribDescriptor fluidDesc(1, "FluidID", oglplus::DataType::Int);
this->instancedData.addInstancedAttrib(fluidDesc, this->program, (int)offsetof(Particle,fluidID));
Vertex shader code:
uniform vec3 FluidColors[2];
in vec3 InstanceTranslation;
in vec3 VertexPosition;
in vec3 n;
in int FluidID;
out float lightIntensity;
out vec3 sphereColor;
void main()
{
//some typical MVP transformations
//(...)
sphereColor = FluidColors[FluidID];
gl_Position = projection * vertexPosEye;
}
This code as a whole produces this output:
As you can see, the particles are arranged in the way I wanted them to be, which means that the "InstanceTranslation" property is set up correctly. The particles in the group on the left have a FluidID value of 0 and the ones on the right have a value of 1. The second set of particles have proper positions but index improperly into the FluidColors array.
What I know:
It's not a problem with the way I set up the FluidColors uniform. If I hard-code the color selection in the shader like this:
sphereColor = FluidID == 0 ? FluidColors[0] : FluidColors[1];
I get:
OpenGL returns GL_NO_ERROR from glGetError so there's no problem with the enums/values I provide
It's not a problem with the offsetof macro. I tried using hard-coded values and they didn't work either.
It's not a compatibility issue with GLint, I use simple 32bit Ints (checked this with sizeof(int))
I need to use FluidID as a instanced attrib that indexes into the color array because otherwise, if I were to set the color for a particle group as a simple vec3 uniform, I'd have to batch the same particle types (with the same FluidID) together first which means sorting them and it'd be too costly of an operation.
To me, this seems to be an issue of how you set up the fluidID attribute pointer. Since you use the type int in the shader, you must use glVertexAttribIPointer() to set up the attribute pointer. Attributes you set up with the normal glVertexAttribPointer() function work only for float-based attribute types. They accept integer input, but the data will be converted to float when the shader accesses them.
In oglplus, you apparently have to use VertexArrayAttrib::IPointer() instead of VertexArrayAttrib::Pointer() if you want to work with integer attributes.
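To illustrate why FluidID 0 can appear to work while FluidID 1 does not, here is a sketch of one plausible outcome. What a driver actually does when a float-converted attribute is read through an int input is not defined, so this is an assumption rather than a description of any particular implementation: the value is converted to float, and the shader then sees the raw bits of that float as an integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bit pattern of a float as a 32-bit signed integer.
std::int32_t float_bits_as_int(float value) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects a 32-bit float");
    std::int32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under that assumption, an ID of 0 becomes 0.0f, whose bits are still 0, so FluidColors[0] is picked correctly. An ID of 1 becomes 1.0f, whose bits are 0x3F800000, which is far outside the two-element FluidColors array.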
I want to pass a single float or unsigned int type variable to vertex shader but you can only pass vec or struct as an attribute variable. So, I used a vec2 type attribute variable and later used it to access the content.
glBindAttribLocation(program, 0, "Bid");
glEnableVertexAttribArray(0);
glVertexAttribIPointer(0, 1, GL_UNSIGNED_INT, sizeof(strideStructure), (const GLvoid*)0);
The vertex shader contains this code:
attribute ivec2 Bid;
void main()
{
int x = Bid.x;
int y = Bid.y;
}
So, when I pass value each time, doesn't the value get stored in x-component of vec2 Bid? In the second run of the loop, will the passed data be stored in x- component of different vector attribute? Also, if I change the size parameter to 2 for example, what would be the order in which data are stored in the vector attribute?
You can use scalar types for attributes. From the GLSL 1.50 spec (which corresponds to OpenGL 3.2):
Vertex shader inputs can only be float, floating-point vectors, matrices, signed and unsigned integers and integer vectors. Vertex shader inputs can also form arrays of these types, but not structures.
No matter if you use vector or scalar values, the types have to match. In your example, you're specifying GL_UNSIGNED_INT as the attribute type, but the type in the shader is ivec2, which is a signed value. It should be uvec2 to match the specified attribute type.
Yes, if you declare the type in the shader as uvec2 Bid, but pass only one value, that value will be in Bid.x. Bid.y will be 0. If you pass two values per vertex, the first one will be in Bid.x, and the second one in Bid.y.
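The filling rule can be modelled on the CPU in a few lines. This is only a sketch with a made-up function name (expand_attribute): components not supplied by the attribute array take the defaults (0, 0, 1) for y, z and w.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of how an unsigned integer attribute with `count` supplied components
// (1 to 4) is widened to four components before the shader reads it.
std::array<std::uint32_t, 4> expand_attribute(const std::uint32_t* values,
                                              std::size_t count) {
    std::array<std::uint32_t, 4> v{0, 0, 0, 1};  // defaults for missing components
    for (std::size_t i = 0; i < count && i < 4; ++i) {
        v[i] = values[i];
    }
    return v;
}
```

A uvec2 input simply sees the first two of these four components, so with a size of 1 it reads (value, 0), and with a size of 2 it reads the two values in the order they are stored in memory.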
You sound a little unclear about how vertex shaders are invoked, particularly when you talk about "run of the loop". There is no looping here. The vertex shader is invoked once for each vertex, and the corresponding attribute values for this specific vertex will be passed in the attribute variables. They will be in the same attribute variables, and in the same place within these variables, for each vertex.
I guess you can picture a "loop" in the sense of the vertex shader being invoked for each vertex. In reality, a lot of processing on the GPU will happen in parallel. This means that the vertex shader for a bunch of vertices will be invoked at the same time. Each one has its own variable instances, so they will not get in each others way, and the attributes for each one will be passed in exactly the same way.
An additional note on your code. You need to be careful with this call:
glBindAttribLocation(program, 0, "Bid");
glBindAttribLocation() needs to be called before linking the shader program. Otherwise it will have no effect.