I'm trying to implement batch rendering for 3D objects in an engine I'm building, and I can't get the indices right.
In a 3D Renderer class I have a Renderer3DData structure that looks like this:
static const uint MaxQuads = 20000;
static const uint MaxVertices = MaxQuads * 4;
static const uint MaxIndices = MaxQuads * 6;
uint IndicesDrawCount = 0; // Debug var
std::vector<uint> Indices;
Ref<IndexBuffer> IBuffer = nullptr;
// Other data like a VBuffer, VArray...
The Indices vector stores the indices to draw in each batch, while IBuffer is the index buffer class that handles all the OpenGL operations ("Ref" is a typedef for a shared pointer).
A static Renderer3DData* s_3DData is created in the init function, and the index buffer is initialized as follows:
uint* indices = new uint[s_3DData->MaxIndices];
s_3DData->IBuffer = IndexBuffer::Create(indices, s_3DData->MaxIndices);
It is then bound together with the Vertex Array and the Vertex Buffer. The initialization itself is fine, since everything works without batching.
On each new batch the VArray gets bound and the Indices vector gets cleared, and for each mesh drawn it is filled like this:
uint offset = 0;
std::vector<uint> indices = mesh->m_Indices;
for (uint i = 0; i < indices.size(); i += 6)
{
s_3DData->Indices.push_back(offset + 0 + indices[i]);
s_3DData->Indices.push_back(offset + 1 + indices[i]);
s_3DData->Indices.push_back(offset + 2 + indices[i]);
s_3DData->Indices.push_back(offset + 3 + indices[i]);
s_3DData->Indices.push_back(offset + 4 + indices[i]);
s_3DData->Indices.push_back(offset + 5 + indices[i]);
offset += 4;
s_3DData->IndicesDrawCount += 6;
}
I'm not sure how I came up with this way of filling the index buffer; I was just testing things, and pushing only the indices, or the indices + offset, doesn't work either. Finally, on each draw, I do the following:
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, BufferID);
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0, s_3DData->Indices.size(), s_3DData->Indices.data());
// With the vArray bound:
glDrawElements(GL_TRIANGLES, s_3DData->IndicesDrawCount, GL_UNSIGNED_INT, nullptr);
As I mentioned, when I'm not batching, the drawing (which doesn't go through all this process) works, so the mesh data and the vertex/index buffers must be fine. What I think is wrong is the way I fill the index buffer, since I'm not sure how to set it up at all (unlike the other rendering parts).
Here is the result (it should be a solid sphere):
The way that "sphere" is rendered makes me think the indices are wrong. The objects in the center are drawn without batching, so I can confirm the initial setup isn't what's wrong. Does anybody see what I'm doing wrong?
I finally solved it (I'm crying; I had been stuck on this for a long time).
There were a few problems:
First: the function s_3DData->IBuffer = IndexBuffer::Create(indices, s_3DData->MaxIndices); that I posted was doing the following:
glCreateBuffers(1, &m_BufferID);
glBindBuffer(GL_ARRAY_BUFFER, m_BufferID);
glBufferData(GL_ARRAY_BUFFER, count * sizeof(uint), nullptr, GL_STATIC_DRAW);
So the first problem was that I was creating index buffers with GL_STATIC_DRAW instead of GL_DYNAMIC_DRAW, which is the appropriate usage hint for batching since we update the buffer every frame. Strictly speaking the usage argument is only a performance hint to the driver, so this was the least important of the three fixes. (My bad for not posting the whole function; I was half asleep when I wrote the question.)
Second: The function glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0, s_3DData->Indices.size(), s_3DData->Indices.data()); was wrong on the size parameter.
The size parameter is a byte count: it must be the size of the data being uploaded, which is not the vector's size() but size() multiplied by sizeof(uint) (uint here because the vector holds uints).
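To make the units mismatch concrete, here is a small C++ sketch. byteSize is a hypothetical helper (not part of the engine or of OpenGL), and the GL call appears only as a comment so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// glBufferSubData's `size` argument is a byte count, so a vector of indices
// must be converted from "number of elements" to "number of bytes".
template <typename T>
std::size_t byteSize(const std::vector<T>& v)
{
    return v.size() * sizeof(T);
}

// Hypothetical call site (shown as a comment only):
// glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0, byteSize(indices), indices.data());
```

For six 32-bit indices, size() is 6 but the upload needs 24 bytes; passing 6 uploads only the first index and a half.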
Third, and finally: the loop that filled the indices vector for each mesh was wrong, because I had written it with 2D quads in mind (I had previously been testing batching in 2D).
The corrected loop is:
const std::vector<uint>& indices = mesh->m_Indices; // reference: avoids copying the mesh's indices on every draw
for (uint i = 0; i < indices.size(); ++i)
{
s_3DData->Indices.push_back(s_3DData->IndicesCurrentOffset + indices[i]);
++s_3DData->IndicesDrawCount;
++s_3DData->RendererStats.IndicesCount; // Debug Purpose
}
s_3DData->IndicesCurrentOffset += mesh->m_MaxIndex;
Each mesh now stores its max index + 1 in m_MaxIndex (for a quad with indices 0 to 3, this is 4), i.e. the number of vertices it adds to the batch.
This way I can walk through all of a mesh's indices, shifting each one by the current offset so it points at that mesh's vertices in the shared vertex buffer, and then advance the offset so the next mesh's indices are stored correctly after it.
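A self-contained C++ sketch of the same offsetting idea, with no OpenGL involved. MeshData and IndexBatch are hypothetical names for illustration: vertexCount plays the role of m_MaxIndex (max index + 1) and vertexOffset the role of IndicesCurrentOffset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the engine's mesh: local indices start at 0, and
// vertexCount is how many vertices the mesh appends to the shared vertex buffer.
struct MeshData
{
    std::vector<std::uint32_t> indices;
    std::uint32_t vertexCount;
};

// Accumulates several meshes into one index list for a single draw call.
struct IndexBatch
{
    std::vector<std::uint32_t> indices;
    std::uint32_t vertexOffset = 0; // vertices already in the batch

    void add(const MeshData& mesh)
    {
        // Shift every local index by the number of vertices already batched,
        // so it refers to this mesh's vertices in the shared vertex buffer.
        for (std::uint32_t i : mesh.indices)
            indices.push_back(vertexOffset + i);
        vertexOffset += mesh.vertexCount;
    }

    // Start a new batch: both the indices and the running offset reset.
    void clear()
    {
        indices.clear();
        vertexOffset = 0;
    }
};
```

Adding the same quad (indices 0,1,2,2,3,0; 4 vertices) twice yields 0,1,2,2,3,0 followed by 4,5,6,6,7,4, and the draw count is simply indices.size().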
Again, I'm not intending this to be fast or performant; I was just learning how to do this (and I did :) ).
The result:
I was originally using glDrawElementsInstancedBaseVertex to draw the scene meshes. All the meshes' vertex attributes are interleaved in a single buffer object, and there are only 30 unique meshes in total, so I've been calling the draw function 30 times with instance counts, etc. Now I want to batch those draw calls into one using glMultiDrawElementsIndirect. Since I have no experience with this function, I've been reading articles here and there to understand the implementation, with little success. (For testing purposes, all meshes are instanced only once.)
The command structure, adapted from the OpenGL reference page (which names the fields count, instanceCount, firstIndex, baseVertex and baseInstance):
struct DrawElementsIndirectCommand
{
GLuint vertexCount;
GLuint instanceCount;
GLuint firstVertex;
GLint baseVertex; // signed (int) in the reference page
GLuint baseInstance;
};
DrawElementsIndirectCommand commands[30];
// Populate commands.
for (size_t index { 0 }; index < 30; ++index)
{
const Mesh* mesh{ m_meshes[index] };
commands[index].vertexCount = mesh->elementCount;
commands[index].instanceCount = 1; // Just testing with 1 instance, ATM.
commands[index].firstVertex = mesh->elementOffset();
commands[index].baseVertex = mesh->verticeIndex();
commands[index].baseInstance = 0; // Shouldn't impact testing?
}
// Create and populate the GL_DRAW_INDIRECT_BUFFER buffer... bla bla
Then later down the line, after setup I do some drawing.
// Some prep before drawing like bind VAO, update buffers, etc.
// Draw?
if (RenderMode == MULTIDRAW)
{
// Bind, Draw, Unbind
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, m_indirectBuffer);
glMultiDrawElementsIndirect (GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, 30, 0);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);
}
else
{
for (size_t index { 0 }; index < 30; ++index)
{
const Mesh* mesh { m_meshes[index] };
glDrawElementsInstancedBaseVertex(
GL_TRIANGLES,
mesh->elementCount,
GL_UNSIGNED_INT,
reinterpret_cast<GLvoid*>(mesh->elementOffset()),
1,
mesh->verticeIndex());
}
}
The glDrawElementsInstancedBaseVertex path still works as before when I switch to it. But glMultiDrawElementsIndirect gives indistinguishable meshes; when I set firstVertex to 0 for all commands, the meshes look almost correct (at least distinguishable) but are still badly wrong in places. Am I missing something important about indirect multi-drawing?
//Indirect data
commands[index].firstVertex = mesh->elementOffset();
//Direct draw call
reinterpret_cast<GLvoid*>(mesh->elementOffset()),
That's not how it works for indirect rendering. firstVertex (firstIndex in the specification) is not a byte offset; it is the position of the first index in the element buffer, counted in indices. So you have to divide the byte offset by the size of one index to compute firstVertex:
commands[index].firstVertex = mesh->elementOffset() / sizeof(GLuint);
The result of that should be a whole number. If it wasn't, then you were doing unaligned reads, which probably hurt your performance. So fix that ;)
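A C++ sketch of filling the commands with that conversion applied. The struct follows the field order from the OpenGL specification but uses fixed-width types so it compiles without GL headers; MeshRange and buildCommands are hypothetical stand-ins for the question's Mesh accessors (elementCount, elementOffset(), verticeIndex()).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field order as in the specification; 5 x 4 bytes, no padding.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;   // measured in indices, NOT bytes
    std::int32_t  baseVertex;   // signed in the specification
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must match the GL layout");

// Hypothetical per-mesh description: the byte offset is what the direct path
// passes as the `indices` pointer argument.
struct MeshRange
{
    std::uint32_t elementCount;
    std::size_t   elementByteOffset;
    std::int32_t  baseVertex;
};

std::vector<DrawElementsIndirectCommand> buildCommands(const std::vector<MeshRange>& meshes)
{
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(meshes.size());
    for (const MeshRange& m : meshes)
    {
        // With GL_UNSIGNED_INT each index is 4 bytes; a non-multiple of 4
        // would mean the byte offset itself is wrong.
        assert(m.elementByteOffset % sizeof(std::uint32_t) == 0);
        commands.push_back({m.elementCount,
                            1, // one instance per mesh, as in the question
                            static_cast<std::uint32_t>(m.elementByteOffset / sizeof(std::uint32_t)),
                            m.baseVertex,
                            0});
    }
    return commands;
}
```

A mesh whose indices start 144 bytes into the element buffer gets firstIndex 36.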
I'm currently trying to teach myself some OpenGL using some tutorials and LWJGL. Naturally, I'm only at the stage of rendering cubes.
What I've done so far, which works, is the following for each cube:
glUniformMatrix4(RenderProgram.ModelMatrixID, false,
renderobject.getTransformationBuffer());
glDrawElements(GL_TRIANGLES, renderobject.Model.countIndices(),
GL_UNSIGNED_INT, renderobject.Model.indexOffset);
Since that only gives me about 50-55 FPS with about 70k cubes, I decided to try instanced rendering, like so:
glDrawElementsInstanced(GL_TRIANGLES, Model.countIndices(),
GL_UNSIGNED_INT, 0, instanceCount);
Of course I've created another buffer for that beforehand, filling it with renderobject.getTransformationBuffer() of each cube and I'm binding this buffer before I try to draw instanced.
I also added it to my vertex shader as layout(location = 12) in mat4 mModel; and initialized the attribute pointers like so:
for (int i = 0; i < 4; i++) {
glEnableVertexAttribArray(12 + i);
glVertexAttribPointer(12 + i, 4, GL_FLOAT, false, Float.BYTES * 16,
Float.BYTES * 4 * i);
glVertexAttribDivisor(InstanceBufferID, 1);
}
I get no errors, and although I don't see anything on screen, it is rendering (I see an FPS increase of about 350%), so I think I'm not getting the right model matrix in the shader.
Unfortunately I can't debug variable contents within the shader :) So I'm a little bit stumped as to what I might be missing or how I could unravel this... Also, obviously, Google didn't help me much either and SO just comes up with glDrawElements not working for people.
Edit: The accepted answer fixed the one error that could be determined from the code provided. However, I had another error in the code, which needed fixing before anything was finally visible on the screen, and I'd like to share it as well: I unbound the VAO before populating the VBO with the matrix data. As soon as I moved the unbinding to after loading the data into the VBO, it worked!
Edit 2: Interestingly, the performance increase is even more immense now that something IS rendered. With my blank screen I got around 170 FPS for around 70k cubes. Now that it renders correctly I'm getting around 350-400 FPS for around 270k cubes! I didn't expect that.
The first argument to glVertexAttribDivisor should be the index of the vertex attribute that you want to use as an instanced array and not InstanceBufferID.
This should thus become:
for (int i = 0; i < 4; i++) {
glEnableVertexAttribArray(12 + i);
glVertexAttribPointer(12 + i, 4, GL_FLOAT, false, Float.BYTES * 16,
Float.BYTES * 4 * i);
glVertexAttribDivisor(12 + i, 1);
}
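The same layout arithmetic written in C++ rather than the question's Java (sizeof(float) plays the role of Float.BYTES). AttribSlot and mat4Slots are hypothetical helpers; the point is that each of the four matrix columns gets its own attribute location, and that location, not a buffer ID, is what both glVertexAttribPointer and glVertexAttribDivisor receive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One vec4 column of a per-instance mat4 attribute.
struct AttribSlot
{
    unsigned location;   // first argument to glVertexAttribPointer AND glVertexAttribDivisor
    std::size_t offset;  // byte offset of this column within one matrix
    std::size_t stride;  // bytes from one instance's matrix to the next
};

// A mat4 at `baseLocation` occupies baseLocation .. baseLocation + 3.
std::vector<AttribSlot> mat4Slots(unsigned baseLocation)
{
    const std::size_t columnBytes = 4 * sizeof(float); // one vec4 column
    std::vector<AttribSlot> slots;
    for (unsigned i = 0; i < 4; ++i)
        slots.push_back({baseLocation + i, columnBytes * i, columnBytes * 4});
    return slots;
}
```

With base location 12 this gives locations 12 to 15, column offsets 0, 16, 32 and 48, and a 64-byte stride, matching the corrected loop.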
I'm currently working on a program that supports depth-independent (also known as order-independent) alpha blending. To do that, I implemented a per-pixel linked list, using a texture for the header (for every pixel, it points to the first entry in the linked list) and a texture buffer object for the linked list itself. While this works fine, I would like to replace the texture buffer object with a shader storage buffer as an exercise.
I think I've almost got it, but it took me about a week to get to a point where I could actually use the shader storage buffer. My questions are:
Why can't I map the shader storage buffer?
Why is it a problem to bind the shader storage buffer again?
For debugging, I just display the contents of the shader storage buffer (which doesn't contain a linked list yet). I created the shader storage buffer in the following way:
glm::vec4* bufferData = new glm::vec4[windowOptions.width * windowOptions.height];
glm::vec4* readBufferData = new glm::vec4[windowOptions.width * windowOptions.height];
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
// Set the whole buffer to red
bufferData[x + y * windowOptions.width] = glm::vec4(1,0,0,1);
}
}
GLuint ssb;
// Get a handle
glGenBuffers(1, &ssb);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
// Create buffer
glBufferData(GL_SHADER_STORAGE_BUFFER, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData, GL_DYNAMIC_COPY);
// Now bind the buffer to the shader
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
In the shader, the shader storage buffer is defined as:
layout (std430, binding = 0) buffer BufferObject
{
vec4 points[];
};
In the rendering loop, I do the following:
glUseProgram(defaultProgram);
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
// Create a green/red color gradient
bufferData[x + y * windowOptions.width] =
glm::vec4((float)x / (float)windowOptions.width,
(float)y / (float)windowOptions.height, 0.0f, 1.0f);
}
}
glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData);
// Retrieving the buffer also works fine
// glMemoryBarrier(GL_ALL_BARRIER_BITS);
// glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), readBufferData);
glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
// Draw a quad which fills the screen
// ...
This code works, but when I replace glBufferSubData with the following code,
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height, GL_WRITE_ONLY);
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
p[x + y * windowOptions.width] = glm::vec4(0,1,0,1);
}
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
the mapping fails, and glGetError reports GL_INVALID_OPERATION. It seems as if the shader storage buffer is still bound to something, so it can't be mapped. I read something about glGetProgramResourceIndex (http://www.opengl.org/wiki/GlGetProgramResourceIndex) and glShaderStorageBlockBinding (http://www.opengl.org/wiki/GlShaderStorageBlockBinding), but I don't really get it.
My second question is why I can call neither
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
, nor
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
in the render loop after glBufferSubData and glMemoryBarrier. This code should not change a thing, since these calls are the same as during the creation of the shader storage buffer. If I can't bind different shader storage buffers, I can only use one. But I know that more than one shader storage buffer is supported, so I think I'm missing something else (like "releasing" the buffer).
First of all, the glMapBufferRange fails simply because GL_WRITE_ONLY is not a valid argument to it. That was used for the old glMapBuffer, but glMapBufferRange uses a collection of flags for more fine-grained control. In your case you need GL_MAP_WRITE_BIT instead. And since you seem to completely overwrite the whole buffer, without caring for the previous values, an additional optimization would probably be GL_MAP_INVALIDATE_BUFFER_BIT. So replace that call with:
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
    windowOptions.width * windowOptions.height * sizeof(glm::vec4), // length is in bytes, not elements
    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
The other error is not described that well in the question. But fix this one first and maybe it will already help with the following error.
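Note also that glMapBufferRange's length argument is a byte count, so a length of width * height maps only 1/16 of the buffer while the loop writes width * height vec4s. A C++ sketch of the size computation and the fill, assuming a 16-byte Vec4 standing in for glm::vec4 and a plain pointer standing in for the memory glMapBufferRange returns (both are assumptions for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec4 { float x, y, z, w; }; // stand-in for glm::vec4 (16 bytes)

// glMapBufferRange's `length` is in bytes: one Vec4 per pixel.
std::size_t mapLengthBytes(std::size_t width, std::size_t height)
{
    return width * height * sizeof(Vec4);
}

// Fills a mapped range row by row; `mapped` stands in for the pointer returned
// by glMapBufferRange(..., GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT).
void fillGreen(void* mapped, std::size_t width, std::size_t height)
{
    Vec4* p = static_cast<Vec4*>(mapped);
    for (std::size_t y = 0; y < height; ++y)     // rows outermost: sequential writes
        for (std::size_t x = 0; x < width; ++x)
            p[x + y * width] = Vec4{0.0f, 1.0f, 0.0f, 1.0f};
}
```

Iterating y in the outer loop writes memory in order, which is friendlier to write-combined mapped memory than the column-by-column loop in the question, though both produce the same contents.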
I've been delving into unmanaged DirectX 11 for the first time (bear with me), and there's an issue that, although asked about several times on the forums, still leaves me with questions.
I am developing an app in which objects are added to the scene over time. On each render loop I want to collect all vertices in the scene and render them, reusing a single vertex and index buffer for performance and best practice. My question is about the usage of dynamic vertex and index buffers: I haven't been able to fully understand their correct usage when the scene content changes.
vertexBufferDescription.Usage = D3D11_USAGE_DYNAMIC;
vertexBufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
vertexBufferDescription.MiscFlags = 0;
vertexBufferDescription.StructureByteStride = 0;
Should I create the buffers when the scene is initialized and somehow update their content every frame? If so, what ByteWidth should I set in the buffer description? And what do I initialize it with?
Or should I create it the first time the scene is rendered (frame 1), using the current vertex count as its size? If so, when I add another object to the scene, don't I need to recreate the buffer and change the buffer description's ByteWidth to the new vertex count? If my scene keeps updating its vertices on each frame, a single dynamic buffer would lose its purpose this way...
I've been testing creating the buffer the first time the scene is rendered and, from then on, using Map/Unmap on each frame. I start by filling a vector with the vertices of all scene objects and then update the resource like so:
void Scene::Render()
{
(...)
std::vector<VERTEX> totalVertices;
std::vector<int> totalIndices;
int totalVertexCount = 0;
int totalIndexCount = 0;
for (shapeIterator = models.begin(); shapeIterator != models.end(); ++shapeIterator)
{
Model* currentModel = (*shapeIterator);
// totalVertices gets filled here...
}
// At this point totalVertices and totalIndices have all scene data
if (isVertexBufferSet)
{
// This is where it copies the new vertices to the buffer.
// but it's causing flickering in the entire screen...
D3D11_MAPPED_SUBRESOURCE resource;
context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
context->Unmap(vertexBuffer, 0);
}
else
{
// This is run in the first frame. But what if new vertices are added to the scene?
vertexBufferDescription.ByteWidth = sizeof(VERTEX) * totalVertexCount;
UINT stride = sizeof(VERTEX);
UINT offset = 0;
D3D11_SUBRESOURCE_DATA resourceData;
ZeroMemory(&resourceData, sizeof(resourceData));
resourceData.pSysMem = &totalVertices[0];
device->CreateBuffer(&vertexBufferDescription, &resourceData, &vertexBuffer);
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
isVertexBufferSet = true;
}
At the end of the render loop, while keeping track of each object's vertex position in the buffer, I finally call Draw():
context->Draw(objectVertexCount, currentVertexOffset);
}
My current implementation causes the whole scene to flicker, but there are no memory leaks. I wonder if it has anything to do with the way I am using the Map/Unmap API?
Also, in this scenario, when would it be ideal to invoke buffer->Release()?
Tips or code sample would be great! Thanks in advance!
At the memcpy into the vertex buffer you do the following:
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
sizeof(totalVertices) is the size of the std::vector<VERTEX> object itself (typically just a few pointers), not the size of the vertex data it holds, which is not what you want.
Try the following code:
memcpy(resource.pData, &totalVertices[0], sizeof( VERTEX ) * totalVertices.size() );
Also, you don't appear to be calling IASetVertexBuffers when isVertexBufferSet is true. Make sure you do so.
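Regarding the questions about ByteWidth and when to call Release(), one common pattern (not part of the answer above, and only a sketch) is to keep a dynamic buffer with spare capacity, map it with D3D11_MAP_WRITE_DISCARD each frame while the data fits, and recreate it larger only when the scene outgrows it. The C++ below shows just the bookkeeping: VERTEX is a placeholder layout, nextCapacity and copyVertices are hypothetical helpers, the 1024-vertex starting size and doubling policy are arbitrary assumptions, and the D3D11 calls are described in comments only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct VERTEX { float x, y, z; }; // placeholder vertex layout for the sketch

// Capacity (in vertices) the dynamic vertex buffer should have this frame.
std::size_t nextCapacity(std::size_t currentCapacity, std::size_t needed)
{
    if (needed <= currentCapacity)
        return currentCapacity; // still fits: keep the buffer and Map it with D3D11_MAP_WRITE_DISCARD
    std::size_t cap = currentCapacity != 0 ? currentCapacity : 1024; // arbitrary starting size
    while (cap < needed)
        cap *= 2; // geometric growth keeps recreation rare as objects are added
    // Caller (not shown): Release() the old buffer, then CreateBuffer with
    // ByteWidth = cap * sizeof(VERTEX) and bind it with IASetVertexBuffers.
    return cap;
}

// Copies the frame's vertices into mapped memory (resource.pData).
// The byte count is element count * element size, never sizeof(std::vector).
std::size_t copyVertices(void* mappedData, const std::vector<VERTEX>& vertices)
{
    const std::size_t bytes = sizeof(VERTEX) * vertices.size();
    if (bytes != 0)
        std::memcpy(mappedData, vertices.data(), bytes);
    return bytes;
}
```

With this arrangement Release() is needed only when an old buffer is replaced by a larger one, and once more at shutdown.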
I have written a simple application in Java using JOGL that draws 3D geometry. The camera can be rotated by dragging the mouse. The application works fine, but drawing the geometry with glBegin(GL_TRIANGLES) ... calls is too slow.
So I started using vertex buffers. This also works fine until the number of triangles exceeds 1,000,000. When that happens, the display driver suddenly crashes and my monitor goes dark. Is there a limit to how many triangles fit in a buffer? I hoped to render 1,000,000 triangles at a reasonable frame rate.
I have no idea on how to debug this problem. The nasty thing is that I have to reboot Windows after each launch, since I have no other way to get my display working again. Could anyone give me some advice?
The vertices, triangles and normals are stored in arrays float[][] m_vertices, int[][] m_triangles, float[][] m_triangleNormals.
I initialized the buffer with:
// generate a VBO pointer / handle
if (m_vboHandle <= 0) {
int[] vboHandle = new int[1];
m_gl.glGenBuffers(1, vboHandle, 0);
m_vboHandle = vboHandle[0];
}
// interleave vertex / normal data
FloatBuffer data = Buffers.newDirectFloatBuffer(m_triangles.length * 3*3*2);
for (int t=0; t<m_triangles.length; t++)
for (int j=0; j<3; j++) {
int v = m_triangles[t][j];
data.put(m_vertices[v]);
data.put(m_triangleNormals[t]);
}
data.rewind();
// transfer data to VBO
int numBytes = data.capacity() * 4;
m_gl.glBindBuffer(GL.GL_ARRAY_BUFFER, m_vboHandle);
m_gl.glBufferData(GL.GL_ARRAY_BUFFER, numBytes, data, GL.GL_STATIC_DRAW);
m_gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
Then, the scene gets rendered with:
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, m_vboHandle);
gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
gl.glEnableClientState(GL2.GL_NORMAL_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 6*4, 0);
gl.glNormalPointer(GL.GL_FLOAT, 6*4, 3*4);
gl.glDrawArrays(GL.GL_TRIANGLES, 0, 3*m_triangles.length);
gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
gl.glDisableClientState(GL2.GL_NORMAL_ARRAY);
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
Try checking glGetError() right after calling glBufferData (glBufferData itself returns nothing). It will report GL_OUT_OF_MEMORY if the driver cannot allocate numBytes.
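For scale, a C++ sketch of the numBytes arithmetic implied by the question's interleaved layout (3 vertices per triangle, 3 position floats plus 3 normal floats per vertex, 4 bytes per float). interleavedVboBytes is a hypothetical helper, not part of JOGL.

```cpp
#include <cassert>
#include <cstdint>

// Size of the interleaved VBO built in the question. A 64-bit type is used so
// large triangle counts cannot overflow the multiplication.
std::uint64_t interleavedVboBytes(std::uint64_t triangles)
{
    const std::uint64_t vertsPerTri = 3;
    const std::uint64_t floatsPerVertex = 3 + 3; // position + normal
    const std::uint64_t bytesPerFloat = 4;
    return triangles * vertsPerTri * floatsPerVertex * bytesPerFloat;
}
```

For 1,000,000 triangles this gives 72,000,000 bytes (roughly 69 MiB) in a single allocation, which is the figure to compare against the GPU memory available when deciding whether GL_OUT_OF_MEMORY is plausible.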