OpenGL glMultiDrawElementsIndirect with Interleaved Buffers - c++

I was originally using glDrawElementsInstancedBaseVertex to draw the scene meshes. All the meshes' vertex attributes are interleaved in a single buffer object. In total there are only 30 unique meshes, so I've been issuing 30 draw calls with instance counts, etc. Now I want to batch the draw calls into one using glMultiDrawElementsIndirect. Since I have no experience with this command, I've been reading articles here and there to understand the implementation, with little success. (For testing purposes all meshes are instanced only once.)
The command structure from the OpenGL reference page.
struct DrawElementsIndirectCommand
{
    GLuint vertexCount;
    GLuint instanceCount;
    GLuint firstVertex;
    GLuint baseVertex;
    GLuint baseInstance;
};

DrawElementsIndirectCommand commands[30];

// Populate commands.
for (size_t index { 0 }; index < 30; ++index)
{
    const Mesh* mesh{ m_meshes[index] };
    commands[index].vertexCount   = mesh->elementCount;
    commands[index].instanceCount = 1; // Just testing with 1 instance, ATM.
    commands[index].firstVertex   = mesh->elementOffset();
    commands[index].baseVertex    = mesh->verticeIndex();
    commands[index].baseInstance  = 0; // Shouldn't impact testing?
}

// Create and populate the GL_DRAW_INDIRECT_BUFFER buffer... bla bla
Then later down the line, after setup I do some drawing.
// Some prep before drawing like bind VAO, update buffers, etc.

// Draw?
if (RenderMode == MULTIDRAW)
{
    // Bind, Draw, Unbind
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, m_indirectBuffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, 30, 0);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);
}
else
{
    for (size_t index { 0 }; index < 30; ++index)
    {
        const Mesh* mesh { m_meshes[index] };
        glDrawElementsInstancedBaseVertex(
            GL_TRIANGLES,
            mesh->elementCount,
            GL_UNSIGNED_INT,
            reinterpret_cast<GLvoid*>(mesh->elementOffset()),
            1,
            mesh->verticeIndex());
    }
}
Now the glDrawElements... path still works fine, as before, when I switch to it. But trying glMultiDraw... gives an indistinguishable mess of meshes; when I set firstVertex to 0 for all commands, the meshes look almost correct (at least distinguishable) but are still largely wrong in places. I feel I'm missing something important about indirect multi-drawing.

//Indirect data
commands[index].firstVertex = mesh->elementOffset();
//Direct draw call
reinterpret_cast<GLvoid*>(mesh->elementOffset()),
That's not how it works for indirect rendering. The firstVertex is not a byte offset; it's the first vertex index. So you have to divide the byte offset by the size of the index to compute firstVertex:
commands[index].firstVertex = mesh->elementOffset() / sizeof(GLuint);
The result of that should be a whole number. If it wasn't, then you were doing unaligned reads, which probably hurt your performance. So fix that ;)
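For completeness, a rough sketch of the corrected setup, assuming (as in the question) that elementOffset() returns a byte offset into the index buffer and that the indices are GLuint; the indirect buffer name below is illustrative:
DrawElementsIndirectCommand commands[30];
for (size_t index { 0 }; index < 30; ++index)
{
    const Mesh* mesh{ m_meshes[index] };
    commands[index].vertexCount   = mesh->elementCount;
    commands[index].instanceCount = 1;
    // Index offset (in elements), not a byte offset.
    commands[index].firstVertex   = mesh->elementOffset() / sizeof(GLuint);
    commands[index].baseVertex    = mesh->verticeIndex();
    commands[index].baseInstance  = 0;
}

// Upload once; a stride of 0 in glMultiDrawElementsIndirect means the
// commands are tightly packed in the buffer.
GLuint indirectBuffer = 0;
glGenBuffers(1, &indirectBuffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(commands), commands, GL_STATIC_DRAW);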

Related

Write code which does not cause many page faults

I have code for rendering an OpenGL scene. This code causes many page faults when the application is started without Visual Studio. The code shown in paintGL() is only a fraction of what happens there, but it takes the most time.
Example code:
void prepareData() {
    std::vector<int> m_indices;    // vector of point indices that should be connected
    std::vector<float> m_vertices; // vector of the 3d points
    /*
       fill the vectors
    */
}

void MyGLWidget::paintGL() {
    glBegin(GL_TRIANGLE_STRIP);
    for (unsigned int i=0; i < m_indices.size(); i++)
    {
        // search end of strip
        if (m_indices[i] < 0)
        {
            // store end of strip
            endStrip = i;
            // we need at least three vertices for a triangle strip
            if (startStrip+2 < endStrip)
            {
                // draw strip
                for (unsigned int j=startStrip; j<endStrip; j++) {
                    idx = 3 * m_indices[j];
                    glVertex3fv(&m_vertices[idx]);
                }
            }
            // store start of next strip
            startStrip = i+1;
        }
    }
    glEnd();
}
So here is the problem: when the data changes and gets calculated, the next call of paintGL() is very slow, because accessing the new values causes a lot of page faults.
When the data does not change, paintGL() is as fast as it should be.
Both data vectors can get really big; normally we have sizes like 10 million indices and 15 million vertices.
My question is: how can I make paintGL() faster when the values to display are freshly calculated?
When the application is started from Visual Studio (both as Release builds), there aren't that many page faults and it is faster than normal. How does Visual Studio achieve that, and can I do this too, without Visual Studio monitoring my application?
The problem was already described here, but now I have found out the root cause for the problem: Release Build is faster, when started from Visual Studio than started “normally”
Additional information: C++/opengl application running smoother with debugger attached
The increased page fault load is just a secondary symptom of the really poor rendering loop. Modern GPUs operate on (large-ish) buffers of vertex and index data. When using glBegin…glEnd immediate mode, the driver is forced to create such buffers in situ. To speed things up there are a lot of heuristics, including the driver marking pages so that it gets notified if their contents change, so that buffers are recreated only when needed.
Rewrite it to use indexed triangles in a vertex array; this is the mode GPUs and OpenGL drivers work best with.
Even a client-side vertex array will massively speed things up, since the driver can then coalesce the buffer copy. Of course the best thing would be if you could just place m_vertices in a Vertex Buffer Object.
#include <vector>
#include <utility>

// overloaded wrappers to deduce the
// GL type from the type of the index buffer vector
namespace gl_wrap {

void DrawElements(
    GLenum mode, GLsizei count,
    std::vector<GLubyte> const &idx_buffer,
    size_t offset )
{
    glDrawElements(mode, count, GL_UNSIGNED_BYTE, idx_buffer.data()+offset);
}

void DrawElements(
    GLenum mode, GLsizei count,
    std::vector<GLushort> const &idx_buffer,
    size_t offset )
{
    glDrawElements(mode, count, GL_UNSIGNED_SHORT, idx_buffer.data()+offset);
}

void DrawElements(
    GLenum mode, GLsizei count,
    std::vector<GLuint> const &idx_buffer,
    size_t offset )
{
    glDrawElements(mode, count, GL_UNSIGNED_INT, idx_buffer.data()+offset);
}

}
void MyGLWidget::paintGL() {
    // collect (offset, length) of each strip; negative indices mark strip ends
    std::vector<std::pair<size_t,size_t>> strips;
    size_t i_prev = 0, i = 0;
    for( auto idx : m_indices ){
        ++i;
        if( idx < 0 ){
            // exclude the negative separator itself from the strip
            strips.push_back(std::make_pair(i_prev, i-1-i_prev));
            i_prev = i;
        }
    }

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, m_vertices.data());
    // note: m_indices must be (or be copied into) a std::vector<GLuint>
    // for the GL_UNSIGNED_INT overload above to match
    for( auto const &s : strips ){
        gl_wrap::DrawElements(GL_TRIANGLE_STRIP, s.second, m_indices, s.first);
    }
    glDisableClientState(GL_VERTEX_ARRAY);
}
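And if the vertex data does not change every frame, the logical next step is the Vertex Buffer Object mentioned above, so the driver no longer has to copy m_vertices on every draw. A rough sketch, assuming m_vertices stays a std::vector<float> with three floats per vertex; m_vbo is a hypothetical member added here purely for illustration:
// One-time (or on data change) upload of the vertex data into a VBO.
GLuint m_vbo = 0;                 // hypothetical member holding the buffer handle
glGenBuffers(1, &m_vbo);
glBindBuffer(GL_ARRAY_BUFFER, m_vbo);
glBufferData(GL_ARRAY_BUFFER, m_vertices.size() * sizeof(float),
             m_vertices.data(), GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);

// Per frame: source the vertex array from the VBO instead of client memory.
glBindBuffer(GL_ARRAY_BUFFER, m_vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, nullptr);   // offset 0 into the bound VBO
// ... the gl_wrap::DrawElements loop from above ...
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);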

OpenGL render multiple objects using single VBO and update objects' matrices using another VBO

So, I need a way to render multiple objects (not instances) using one draw call. I actually know how to do this: just place the data into a single VBO/IBO and render using glDrawElements.
The question is: what is an efficient way to update the uniform data without setting it up for every single object using glUniform...?
How can I set up one buffer containing all the uniform data of dozens of objects, including MVP matrices, bind it, and perform the render using a single draw call?
I tried to use UBOs, but it's not what I need at all.
For rendering instances we just place the per-instance data, including matrices, in another VBO and set up an attribute divisor using glVertexAttribDivisor, but that only works for instances.
Is there a way to do what I want in OpenGL? If not, what can I do to reduce the overhead of setting uniform data for dozens of objects?
For example like this:
{
    // setting up VBO
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(..., data_size);

    // setup buffer
    for(int i = 0; i < objects_num; i++)
        glBufferSubData(...offset, size, &(objects[i]));

    // the same for IBO
    .........

    // then set up some buffer that will store all uniforms for every object
    .........

    glDrawElements(...);
}
Thanks in advance for helping.
If you're ok with requiring OpenGL 4.3 or higher, I believe you can render this with a single draw call using glMultiDrawElementsIndirect(). This allows you to essentially make multiple draw calls with a single API call. Each sub-call is defined by values in a struct of the form:
typedef struct {
    GLuint count;
    GLuint instanceCount;
    GLuint firstIndex;
    GLuint baseVertex;
    GLuint baseInstance;
} DrawElementsIndirectCommand;
Since you do not want to draw multiple instances of the same vertices, you use 1 for instanceCount in each sub-draw. The key idea is that you can still use the instancing machinery by specifying a different baseInstance value for each one. baseInstance does not change gl_InstanceID (which will simply be 0 here), but it does offset where instanced vertex attributes are fetched from, so you can use instanced attributes for the values (matrices, etc.) that you want to vary per object.
So if you currently have a rendering loop:
for (int k = 0; k < objectCount; ++k) {
    // set uniforms for object k.
    glDrawElements(GL_TRIANGLES, object[k].indexCount, GL_UNSIGNED_INT,
                   (const GLvoid*)(object[k].indexOffset * sizeof(GLuint)));
}
you would instead fill an array of the struct defined above with the arguments:
DrawElementsIndirectCommand cmds[objectCount];
for (int k = 0; k < objectCount; ++k) {
    cmds[k].count         = object[k].indexCount;
    cmds[k].instanceCount = 1;
    cmds[k].firstIndex    = object[k].indexOffset;
    cmds[k].baseVertex    = 0;
    cmds[k].baseInstance  = k;
}
// Rest of setup.
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, objectCount, 0);
I didn't provide code for the full setup above; a rough sketch follows the list below. The key steps include:
Drop the cmds array into a buffer, and bind it as GL_DRAW_INDIRECT_BUFFER.
Store the per-object values in a VBO. Set up the corresponding vertex attributes, which includes marking them as instanced by giving each one a divisor of 1 with glVertexAttribDivisor().
Set up the per-vertex attributes as usual.
Set up the index buffer as usual.
For this to work, the indices for all the objects will have to be in the same index buffer, and the values for each attribute will have to be in the same VBO across all objects.
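Here is that sketch, under the assumption that the per-object data is a single model matrix stored as a glm::mat4 and that the matrix attribute lives at locations 4 through 7 (adjust to your shader); cmds and objectCount come from above, while modelMatrices and the buffer names are hypothetical:
// Upload the indirect commands and keep the buffer bound for the draw call.
GLuint indirectBuf = 0;
glGenBuffers(1, &indirectBuf);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(cmds), cmds, GL_STATIC_DRAW);

// Per-object model matrices in a VBO, read as instanced attributes.
// baseInstance selects which element each object reads, since it offsets
// instanced attribute fetches even though instanceCount is 1.
GLuint matrixBuf = 0;
glGenBuffers(1, &matrixBuf);
glBindBuffer(GL_ARRAY_BUFFER, matrixBuf);
glBufferData(GL_ARRAY_BUFFER, objectCount * sizeof(glm::mat4),
             modelMatrices, GL_DYNAMIC_DRAW);

// A mat4 attribute occupies four consecutive locations, one vec4 column each.
for (int col = 0; col < 4; ++col) {
    glEnableVertexAttribArray(4 + col);
    glVertexAttribPointer(4 + col, 4, GL_FLOAT, GL_FALSE, sizeof(glm::mat4),
                          (const GLvoid*)(col * sizeof(glm::vec4)));
    glVertexAttribDivisor(4 + col, 1);   // advance once per instance, i.e. per object
}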

OpenGL: Shader storage buffer mapping/binding

I'm currently working on a program which supports depth-independent (also known as order-independent) alpha blending. To do that, I implemented a per-pixel linked list, using a texture for the head pointers (which point, for every pixel, to the first entry of the linked list) and a texture buffer object for the linked list itself. While this works fine, I would like to swap the texture buffer object for a shader storage buffer as an exercise.
I think I almost got it, but it took me about a week to get to a point where I could actually use the shader storage buffer. My questions are:
Why can't I map the shader storage buffer?
Why is it a problem to bind the shader storage buffer again?
For debugging, I just display the contents of the shader storage buffer (which doesn't contain a linked list yet). I created the shader storage buffer in the following way:
glm::vec4* bufferData     = new glm::vec4[windowOptions.width * windowOptions.height];
glm::vec4* readBufferData = new glm::vec4[windowOptions.width * windowOptions.height];
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
    for(unsigned int x = 0; x < windowOptions.width; ++x)
    {
        // Set the whole buffer to red
        bufferData[x + y * windowOptions.width] = glm::vec4(1,0,0,1);
    }
}

GLuint ssb;
// Get a handle
glGenBuffers(1, &ssb);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
// Create buffer
glBufferData(GL_SHADER_STORAGE_BUFFER, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData, GL_DYNAMIC_COPY);
// Now bind the buffer to the shader
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
In the shader, the shader storage buffer is defined as:
layout (std430, binding = 0) buffer BufferObject
{
    vec4 points[];
};
In the rendering loop, I do the following:
glUseProgram(defaultProgram);

for(unsigned int y = 0; y < windowOptions.height; ++y)
{
    for(unsigned int x = 0; x < windowOptions.width; ++x)
    {
        // Create a green/red color gradient
        bufferData[x + y * windowOptions.width] =
            glm::vec4((float)x / (float)windowOptions.width,
                      (float)y / (float)windowOptions.height, 0.0f, 1.0f);
    }
}

glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData);

// Retrieving the buffer also works fine
// glMemoryBarrier(GL_ALL_BARRIER_BITS);
// glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), readBufferData);

glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution

// Draw a quad which fills the screen
// ...
This code works, but when I replace glBufferSubData with the following code,
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height, GL_WRITE_ONLY);
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
    for(unsigned int y = 0; y < windowOptions.height; ++y)
    {
        p[x + y * windowOptions.width] = glm::vec4(0,1,0,1);
    }
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
the mapping fails, returning GL_INVALID_OPERATION. It seems like the shader storage buffer is still bound to something, so it can't be mapped. I read something about glGetProgramResourceIndex (http://www.opengl.org/wiki/GlGetProgramResourceIndex) and glShaderStorageBlockBinding (http://www.opengl.org/wiki/GlShaderStorageBlockBinding), but I don't really get it.
My second question is why I can neither call
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
, nor
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
in the render loop after glBufferSubData and glMemoryBarrier. This code should not change a thing, since these calls are the same as during the creation of the shader storage buffer. If I can't bind different shader storage buffers, I can only use one. But I know that more than one shader storage buffer is supported, so I think I'm missing something else (like "releasing" the buffer).
First of all, the glMapBufferRange call fails simply because GL_WRITE_ONLY is not a valid argument to it. That enum was used for the old glMapBuffer; glMapBufferRange instead takes a collection of flags for more fine-grained control. In your case you need GL_MAP_WRITE_BIT. And since you seem to completely overwrite the whole buffer, without caring for the previous values, an additional optimization would probably be GL_MAP_INVALIDATE_BUFFER_BIT. Also note that the length argument is a byte count, so it needs the sizeof(glm::vec4) factor, just like your glBufferData and glBufferSubData calls. So replace that call with:
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
    windowOptions.width * windowOptions.height * sizeof(glm::vec4),
    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
The other error is not described that well in the question. But fix this one first and maybe it will already help with the following error.
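If it helps, here is a rough per-frame sketch with those flags applied, using the names from the question (again, the length passed to glMapBufferRange is a byte count):
// Per-frame update via mapping (a sketch; ssb is the buffer created above).
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
const size_t byteSize = windowOptions.width * windowOptions.height * sizeof(glm::vec4);
glm::vec4* p = static_cast<glm::vec4*>(
    glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, byteSize,
                     GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT));
if (p != nullptr)
{
    for (unsigned int y = 0; y < windowOptions.height; ++y)
        for (unsigned int x = 0; x < windowOptions.width; ++x)
            p[x + y * windowOptions.width] = glm::vec4(0, 1, 0, 1);
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}
// Re-attach to binding point 0 before drawing (matches "binding = 0" in the shader).
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);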

(DirectX 11) Dynamic Vertex/Index Buffers implementation with constant scene content changes

Been delving into unmanaged DirectX 11 for the first time (bear with me), and there's an issue that, although asked several times over the forums, still leaves me with questions.
I am developing an app in which objects are added to the scene over time. On each render loop I want to collect all vertices in the scene and render them reusing a single vertex and index buffer, for performance and best practice. My question is regarding the usage of dynamic vertex and index buffers. I haven't been able to fully understand their correct usage when scene content changes.
vertexBufferDescription.Usage = D3D11_USAGE_DYNAMIC;
vertexBufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
vertexBufferDescription.MiscFlags = 0;
vertexBufferDescription.StructureByteStride = 0;
Should I create the buffers when the scene is initialized and somehow update their content in every frame? If so, what ByteSize should I set in the buffer description? And what do I initialize it with?
Or, should I create it the first time the scene is rendered (frame 1) using the current vertex count as its size? If so, when I add another object to the scene, don't I need to recreate the buffer, changing the buffer description's ByteWidth to the new vertex count? If my scene keeps updating its vertices on each frame, the usage of a single dynamic buffer would lose its purpose this way...
I've been testing initializing the buffer on the first time the scene is rendered, and from there on, using Map/Unmap on each frame. I start by filling in a vector list with all the scene objects and then update the resource like so:
void Scene::Render()
{
    (...)

    std::vector<VERTEX> totalVertices;
    std::vector<int> totalIndices;
    int totalVertexCount = 0;
    int totalIndexCount = 0;

    for (shapeIterator = models.begin(); shapeIterator != models.end(); ++shapeIterator)
    {
        Model* currentModel = (*shapeIterator);
        // totalVertices gets filled here...
    }

    // At this point totalVertices and totalIndices have all scene data
    if (isVertexBufferSet)
    {
        // This is where it copies the new vertices to the buffer.
        // but it's causing flickering in the entire screen...
        D3D11_MAPPED_SUBRESOURCE resource;
        context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
        memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
        context->Unmap(vertexBuffer, 0);
    }
    else
    {
        // This is run in the first frame. But what if new vertices are added to the scene?
        vertexBufferDescription.ByteWidth = sizeof(VERTEX) * totalVertexCount;

        UINT stride = sizeof(VERTEX);
        UINT offset = 0;

        D3D11_SUBRESOURCE_DATA resourceData;
        ZeroMemory(&resourceData, sizeof(resourceData));
        resourceData.pSysMem = &totalVertices[0];

        device->CreateBuffer(&vertexBufferDescription, &resourceData, &vertexBuffer);

        context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);

        isVertexBufferSet = true;
    }
At the end of the render loop, while keeping track of the buffer position of the vertices for each object, I finally invoke Draw():
context->Draw(objectVertexCount, currentVertexOffset);
}
My current implementation is causing my whole scene to flicker. But no memory leaks. Wonder if it has anything to do with the way I am using the Map/Unmap API?
Also, in this scenario, when would it be ideal to invoke buffer->Release()?
Tips or code sample would be great! Thanks in advance!
At the memcpy into the vertex buffer you do the following:
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
sizeof(totalVertices) is just the size of the std::vector<VERTEX> object itself, not the size of its contents, which is not what you want.
Try the following code:
memcpy(resource.pData, &totalVertices[0], sizeof( VERTEX ) * totalVertices.size() );
Also, you don't appear to be calling IASetVertexBuffers when isVertexBufferSet is true. Make sure you do so.
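Putting both fixes together, the per-frame branch could look roughly like this (a sketch based on the names in the question; it assumes the buffer was created with D3D11_USAGE_DYNAMIC and is large enough for the current frame's vertices):
// Per-frame upload into the dynamic vertex buffer.
D3D11_MAPPED_SUBRESOURCE resource = {};
HRESULT hr = context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
if (SUCCEEDED(hr))
{
    memcpy(resource.pData, totalVertices.data(), sizeof(VERTEX) * totalVertices.size());
    context->Unmap(vertexBuffer, 0);
}

// Bind the buffer for this frame as well, not only on creation.
UINT stride = sizeof(VERTEX);
UINT offset = 0;
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);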

Using vertex buffers in jogl, crash when too many triangles

I have written a simple application in Java using JOGL which draws 3D geometry. The camera can be rotated by dragging the mouse. The application works fine, but drawing the geometry with glBegin(GL_TRIANGLES) … glEnd() calls is too slow.
So I started to use vertex buffers. This also works fine until the number of triangles gets larger than 1000000. If that happens, the display driver suddenly crashes and my monitor goes dark. Is there a limit on how many triangles fit in the buffer? I hoped to get 1000000 triangles rendered at a reasonable frame rate.
I have no idea how to debug this problem. The nasty thing is that I have to reboot Windows after each attempt, since I have no other way to get my display working again. Could anyone give me some advice?
The vertices, triangles and normals are stored in arrays float[][] m_vertices, int[][] m_triangles, float[][] m_triangleNormals.
I initialized the buffer with:
// generate a VBO pointer / handle
if (m_vboHandle <= 0) {
    int[] vboHandle = new int[1];
    m_gl.glGenBuffers(1, vboHandle, 0);
    m_vboHandle = vboHandle[0];
}

// interleave vertex / normal data
FloatBuffer data = Buffers.newDirectFloatBuffer(m_triangles.length * 3*3*2);
for (int t=0; t<m_triangles.length; t++)
    for (int j=0; j<3; j++) {
        int v = m_triangles[t][j];
        data.put(m_vertices[v]);
        data.put(m_triangleNormals[t]);
    }
data.rewind();

// transfer data to VBO
int numBytes = data.capacity() * 4;
m_gl.glBindBuffer(GL.GL_ARRAY_BUFFER, m_vboHandle);
m_gl.glBufferData(GL.GL_ARRAY_BUFFER, numBytes, data, GL.GL_STATIC_DRAW);
m_gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
Then, the scene gets rendered with:
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, m_vboHandle);
gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
gl.glEnableClientState(GL2.GL_NORMAL_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 6*4, 0);
gl.glNormalPointer(GL.GL_FLOAT, 6*4, 3*4);
gl.glDrawArrays(GL.GL_TRIANGLES, 0, 3*m_triangles.length);
gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
gl.glDisableClientState(GL2.GL_NORMAL_ARRAY);
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
Try checking for errors after calling glBufferData. The function itself returns void, but it will raise a GL_OUT_OF_MEMORY error, retrievable with glGetError(), if it cannot satisfy numBytes.
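A minimal sketch of that check, shown with the C API (in JOGL the equivalent call is m_gl.glGetError() and the GL.GL_* constants):
// glBufferData returns void; allocation failures are reported via the GL error state.
glBindBuffer(GL_ARRAY_BUFFER, vboHandle);
glBufferData(GL_ARRAY_BUFFER, numBytes, data, GL_STATIC_DRAW);
if (glGetError() == GL_OUT_OF_MEMORY) {
    // The driver could not allocate numBytes; consider splitting the geometry
    // into several smaller VBOs and drawing them in sequence.
}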