I have written a simple application in Java using JOGL which draws 3D geometry. The camera can be rotated by dragging the mouse. The application works fine, but drawing the geometry with glBegin(GL_TRIANGLES) ... calls is too slow.
So I started to use vertex buffers. This also works fine until the number of triangles exceeds about 1,000,000. When that happens, the display driver crashes and my monitor goes dark. Is there a limit on how many triangles fit in the buffer? I had hoped to render 1,000,000 triangles at a reasonable frame rate.
I have no idea how to debug this problem. The nasty part is that I have to reboot Windows after each launch, since I have no other way to get my display working again. Could anyone give me some advice?
The vertices, triangles, and normals are stored in the arrays float[][] m_vertices, int[][] m_triangles, and float[][] m_triangleNormals.
I initialized the buffer with:
// generate a VBO pointer / handle
if (m_vboHandle <= 0) {
    int[] vboHandle = new int[1];
    m_gl.glGenBuffers(1, vboHandle, 0);
    m_vboHandle = vboHandle[0];
}

// interleave vertex / normal data
FloatBuffer data = Buffers.newDirectFloatBuffer(m_triangles.length * 3*3*2);
for (int t = 0; t < m_triangles.length; t++)
    for (int j = 0; j < 3; j++) {
        int v = m_triangles[t][j];
        data.put(m_vertices[v]);
        data.put(m_triangleNormals[t]);
    }
data.rewind();

// transfer data to VBO
int numBytes = data.capacity() * 4;
m_gl.glBindBuffer(GL.GL_ARRAY_BUFFER, m_vboHandle);
m_gl.glBufferData(GL.GL_ARRAY_BUFFER, numBytes, data, GL.GL_STATIC_DRAW);
m_gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
Then, the scene gets rendered with:
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, m_vboHandle);
gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
gl.glEnableClientState(GL2.GL_NORMAL_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 6*4, 0);
gl.glNormalPointer(GL.GL_FLOAT, 6*4, 3*4);
gl.glDrawArrays(GL.GL_TRIANGLES, 0, 3*m_triangles.length);
gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
gl.glDisableClientState(GL2.GL_NORMAL_ARRAY);
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
Try checking for errors right after calling glBufferData (it has no return value, so call glGetError()). The driver will report GL_OUT_OF_MEMORY if it cannot satisfy numBytes.
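For illustration, such a check could look like the following. This is only a sketch in C-style OpenGL (JOGL exposes the same calls on its GL object); the loader header and the vbo/numBytes/data parameters stand in for the values built above.

#include <GL/glew.h>   // or whichever GL loader/binding is already in use
#include <cstdio>

// Uploads numBytes of interleaved data into vbo and reports GL_OUT_OF_MEMORY
// (or any other GL error) instead of letting the failure pass silently.
bool uploadVertexData(GLuint vbo, GLsizeiptr numBytes, const void* data)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, numBytes, data, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    GLenum err = glGetError();
    if (err == GL_OUT_OF_MEMORY) {
        std::fprintf(stderr, "VBO upload of %lld bytes failed: GL_OUT_OF_MEMORY\n",
                     (long long)numBytes);
        return false;
    }
    if (err != GL_NO_ERROR) {
        std::fprintf(stderr, "glBufferData failed with GL error 0x%X\n", err);
        return false;
    }
    return true;
}

If the allocation fails, the usual fallback is to split the geometry across several smaller VBOs or to reduce the per-vertex data.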
I am writing a rasterizer for real-time 3D rendering with OpenCL.
My current architecture:
vertex shader: 1 thread per vertex
rasterizer: 1 thread per face, which loops over all pixels covered by that face
fragment shader: 1 thread per pixel
This works well when the faces occupy only a small screen-space area, but when one face covers a large portion of the screen the frame rate tanks, because the single rasterization thread for that face must loop over every pixel it covers.
I think this could be solved by a tiled approach: the screen is divided into subsections (tiles), one thread is launched per tile, and only the faces whose bounding box overlaps that tile are processed.
I have some questions about this method though:
Should I find each tile's overlapping faces on the CPU or on the GPU?
What data structure should be used to store the face lists? They will have variable length; however, I believe OpenCL buffers are fixed length.
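One option I am considering for the fixed-length problem is to give every tile a fixed-capacity slice of one big face-index buffer plus a per-tile counter that a binning kernel bumps with atomic_inc, dropping (or flagging) faces once a tile is full. A rough host-side sketch, with all names, the TileBins struct and the capacities made up for illustration:

#include <CL/cl.hpp>   // same C++ wrapper already used by the host code below
#include <vector>

// Fixed-capacity per-tile face lists plus one counter per tile. A binning
// kernel (one thread per face) appends a face index with
// atomic_inc(&tileCounts[tile]) and skips the tile once it is full.
struct TileBins {
    cl::Buffer faceLists;     // numTiles * maxFacesPerTile face indices
    cl::Buffer counts;        // numTiles counters
    int numTiles = 0;
    int maxFacesPerTile = 0;
};

TileBins createTileBins(cl::Context& context, cl::CommandQueue& queue,
                        int width, int height,
                        int tileSize = 16, int maxFacesPerTile = 1024)
{
    TileBins bins;
    int numTilesX = (width  + tileSize - 1) / tileSize;
    int numTilesY = (height + tileSize - 1) / tileSize;
    bins.numTiles        = numTilesX * numTilesY;
    bins.maxFacesPerTile = maxFacesPerTile;

    bins.faceLists = cl::Buffer(context, CL_MEM_READ_WRITE,
                                bins.numTiles * maxFacesPerTile * sizeof(cl_uint));
    bins.counts    = cl::Buffer(context, CL_MEM_READ_WRITE,
                                bins.numTiles * sizeof(cl_uint));

    // The counters have to be reset to zero before each frame's binning pass.
    std::vector<cl_uint> zeros(bins.numTiles, 0);
    queue.enqueueWriteBuffer(bins.counts, CL_TRUE, 0,
                             bins.numTiles * sizeof(cl_uint), zeros.data());
    return bins;
}

I don't know whether this is the best structure, which is part of what I'm asking.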
A sample of the host code of my current implementation:
// set up vertex shader args
queue.enqueueNDRangeKernel(vertexShader, cl::NullRange, numVerts, cl::NullRange);
// set up rasterizer args
queue.enqueueNDRangeKernel(rasterizer, cl::NullRange, numFaces, cl::NullRange);
// set up fragment shader args
queue.enqueueNDRangeKernel(fragmentShader, cl::NullRange, numPixels, cl::NullRange);
// read frame buffer to draw to screen
queue.enqueueReadBuffer(buffer_screen, CL_TRUE, 0, width * height * 3 * sizeof(unsigned char), screen);
Sample of rasterizer kernel:
float2 bboxmin = (float2)(INFINITY, INFINITY);
float2 bboxmax = (float2)(-INFINITY, -INFINITY);
float2 clampCoords = (float2)(width-1, height-1);

// get bounding box
for (int i=0; i<3; i++) {
    for (int j=0; j<2; j++) {
        bboxmin[j] = max(0.f, min(bboxmin[j], vs[i][j]));
        bboxmax[j] = min(clampCoords[j], max(bboxmax[j], vs[i][j]));
    }
}

// loop over all pixels in bounding box
// this is the part that needs to be improved
int2 pix;
for (pix.x=bboxmin.x; pix.x<=bboxmax.x; pix.x++) {
    for (pix.y=bboxmin.y; pix.y<=bboxmax.y; pix.y++) {
        float3 bc_screen = barycentric(vs[0].xy, vs[1].xy, vs[2].xy, (float2)(pix.x,pix.y), offset);
        float3 bc_clip = (float3)(bc_screen.x/vsVP[0][3], bc_screen.y/vsVP[1][3], bc_screen.z/vsVP[2][3]);
        bc_clip = bc_clip/(bc_clip.x+bc_clip.y+bc_clip.z);
        float frag_depth = dot(homoZs, bc_clip);
        int pixInd = pix.x+pix.y*width;
        if (bc_screen.x<0 || bc_screen.y<0 || bc_screen.z<0 || zbuffer[pixInd]>frag_depth) continue;
        zbuffer[pixInd] = frag_depth;
    }
}
A workaround is to cancel rasterization if a face gets too large and just return. This will lead to some visual artifacts, but at least the frame rate won't suffer.
I was originally using glDrawElementsInstancedBaseVertex to draw the scene meshes. All the meshes' vertex attributes are interleaved in a single buffer object. In total there are only 30 unique meshes, so I have been calling draw 30 times with instance counts, etc. Now I want to batch the draw calls into one using glMultiDrawElementsIndirect. Since I have no experience with this function, I have been reading articles here and there to understand the implementation, with little success. (For testing purposes, all meshes are instanced only once.)
The command structure, from the OpenGL reference page:
struct DrawElementsIndirectCommand
{
    GLuint vertexCount;
    GLuint instanceCount;
    GLuint firstVertex;
    GLuint baseVertex;
    GLuint baseInstance;
};

DrawElementsIndirectCommand commands[30];

// Populate commands.
for (size_t index { 0 }; index < 30; ++index)
{
    const Mesh* mesh{ m_meshes[index] };
    commands[index].vertexCount   = mesh->elementCount;
    commands[index].instanceCount = 1;   // Just testing with 1 instance, ATM.
    commands[index].firstVertex   = mesh->elementOffset();
    commands[index].baseVertex    = mesh->verticeIndex();
    commands[index].baseInstance  = 0;   // Shouldn't impact testing?
}
// Create and populate the GL_DRAW_INDIRECT_BUFFER buffer... bla bla
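Roughly, that step just creates the indirect buffer and uploads the commands array; a simplified sketch (the exact usage flag shouldn't matter for the problem):

// Sketch: create the GL_DRAW_INDIRECT_BUFFER and upload the 30 commands above.
glGenBuffers(1, &m_indirectBuffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, m_indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(commands), commands, GL_STATIC_DRAW);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);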
Then later down the line, after setup I do some drawing.
// Some prep before drawing like bind VAO, update buffers, etc.

// Draw?
if (RenderMode == MULTIDRAW)
{
    // Bind, Draw, Unbind
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, m_indirectBuffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, 30, 0);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);
}
else
{
    for (size_t index { 0 }; index < 30; ++index)
    {
        const Mesh* mesh { m_meshes[index] };
        glDrawElementsInstancedBaseVertex(
            GL_TRIANGLES,
            mesh->elementCount,
            GL_UNSIGNED_INT,
            reinterpret_cast<GLvoid*>(mesh->elementOffset()),
            1,
            mesh->verticeIndex());
    }
}
Now, the glDrawElements... path still works fine, like before, when I switch to it. But glMultiDraw... gives indistinguishable meshes; when I set firstVertex to 0 for all commands, the meshes look almost correct (at least distinguishable), but they are still largely wrong in places. I feel I'm missing something important about indirect multi-drawing.
//Indirect data
commands[index].firstVertex = mesh->elementOffset();
//Direct draw call
reinterpret_cast<GLvoid*>(mesh->elementOffset()),
That's not how it works for indirect rendering. firstVertex is not a byte offset; it is the offset measured in indices (i.e. which index in the element buffer to start reading from). So you have to divide the byte offset by the size of one index to compute firstVertex:
commands[index].firstVertex = mesh->elementOffset() / sizeof(GLuint);
The result of that should be a whole number. If it wasn't, then you were doing unaligned reads, which probably hurt your performance. So fix that ;)
I'm currently trying to teach myself some OpenGL using tutorials and LWJGL. Right now I'm just rendering cubes.
What I've done up until now, and what works, is that for each cube I do:
glUniformMatrix4(RenderProgram.ModelMatrixID, false,
        renderobject.getTransformationBuffer());
glDrawElements(GL_TRIANGLES, renderobject.Model.countIndices(),
        GL_UNSIGNED_INT, renderobject.Model.indexOffset);
Since that only gives me about 50-55 FPS with about 70k cubes, I decided to try instanced rendering, like so:
glDrawElementsInstanced(GL_TRIANGLES, Model.countIndices(),
GL_UNSIGNED_INT, 0, instanceCount);
Of course, I've created another buffer for that beforehand, filled it with renderobject.getTransformationBuffer() of each cube, and I bind this buffer before the instanced draw call.
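For reference, that buffer setup amounts to something like the following sketch, written with the C-style GL calls (LWJGL exposes the same function names); instanceVBO, instanceCount and modelMatrices are placeholder names:

// Per-instance matrix buffer: one mat4 (16 floats) per cube.
GLuint instanceVBO = 0;
glGenBuffers(1, &instanceVBO);
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glBufferData(GL_ARRAY_BUFFER, instanceCount * 16 * sizeof(float),
             modelMatrices, GL_DYNAMIC_DRAW);
// It stays bound while the four mat4 attribute columns are set up below.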
I also added it to my vertex shader, like so: layout(location = 12) in mat4 mModel, and I've initialized the attribute pointers like this:
for (int i = 0; i < 4; i++) {
    glEnableVertexAttribArray(12 + i);
    glVertexAttribPointer(12 + i, 4, GL_FLOAT, false, Float.BYTES * 16,
            Float.BYTES * 4 * i);
    glVertexAttribDivisor(InstanceBufferID, 1);
}
I get no errors, and while I don't see anything on screen, it is rendering (I see an FPS increase of about 350%), so I suspect the shader isn't receiving the right model matrix.
Unfortunately I can't debug variable contents within the shader :) So I'm a little stumped as to what I might be missing or how I could unravel this... Also, obviously, Google didn't help me much either, and SO just turns up questions about glDrawElements not working for people.
Edit: The accepted answer was the one error that could be determined from the code provided. However, I had another error in the code which needed fixing before anything finally became visible on screen, and which I'd like to share as well: I unbound the VAO before populating the VBO with the matrix data. As soon as I moved that unbinding to after loading the data into the VBO, it worked!
Edit 2: Interestingly, the performance increase is even more immense now that something IS rendered. With my blank screen I got around 170 FPS for around 70k cubes. Now that it renders correctly, I'm getting around 350-400 FPS for around 270k cubes! I didn't expect that.
The first argument to glVertexAttribDivisor should be the index of the vertex attribute that you want to use as an instanced array and not InstanceBufferID.
This should thus become:
for (int i = 0; i < 4; i++) {
    glEnableVertexAttribArray(12 + i);
    glVertexAttribPointer(12 + i, 4, GL_FLOAT, false, Float.BYTES * 16,
            Float.BYTES * 4 * i);
    glVertexAttribDivisor(12 + i, 1);
}
I'm currently working on a program which supports depth-independent (also known as order-independent) alpha blending. To do that, I implemented a per-pixel linked list, using a texture for the head pointers (pointing, for every pixel, to the first entry of its linked list) and a texture buffer object for the linked list itself. While this works fine, I would like to replace the texture buffer object with a shader storage buffer as an exercise.
I think I almost got it, but it took me about a week to get to a point where I could actually use the shader storage buffer. My questions are:
Why can't I map the shader storage buffer?
Why is it a problem to bind the shader storage buffer again?
For debugging, I just display the contents of the shader storage buffer (which doesn't contain a linked list yet). I created the shader storage buffer in the following way:
glm::vec4* bufferData = new glm::vec4[windowOptions.width * windowOptions.height];
glm::vec4* readBufferData = new glm::vec4[windowOptions.width * windowOptions.height];
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
    for(unsigned int x = 0; x < windowOptions.width; ++x)
    {
        // Set the whole buffer to red
        bufferData[x + y * windowOptions.width] = glm::vec4(1,0,0,1);
    }
}

GLuint ssb;
// Get a handle
glGenBuffers(1, &ssb);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
// Create buffer
glBufferData(GL_SHADER_STORAGE_BUFFER, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData, GL_DYNAMIC_COPY);
// Now bind the buffer to the shader
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
In the shader, the shader storage buffer is defined as:
layout (std430, binding = 0) buffer BufferObject
{
    vec4 points[];
};
In the rendering loop, I do the following:
glUseProgram(defaultProgram);

for(unsigned int y = 0; y < windowOptions.height; ++y)
{
    for(unsigned int x = 0; x < windowOptions.width; ++x)
    {
        // Create a green/red color gradient
        bufferData[x + y * windowOptions.width] =
            glm::vec4((float)x / (float)windowOptions.width,
                      (float)y / (float)windowOptions.height, 0.0f, 1.0f);
    }
}

glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData);

// Retrieving the buffer also works fine:
// glMemoryBarrier(GL_ALL_BARRIER_BITS);
// glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), readBufferData);

glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution

// Draw a quad which fills the screen
// ...
This code works, but when I replace glBufferSubData with the following code,
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height, GL_WRITE_ONLY);
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
    for(unsigned int y = 0; y < windowOptions.height; ++y)
    {
        p[x + y * windowOptions.width] = glm::vec4(0,1,0,1);
    }
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
the mapping fails with GL_INVALID_OPERATION. It seems like the shader storage buffer is still bound to something, so it can't be mapped. I read something about glGetProgramResourceIndex (http://www.opengl.org/wiki/GlGetProgramResourceIndex) and glShaderStorageBlockBinding (http://www.opengl.org/wiki/GlShaderStorageBlockBinding), but I don't really get it.
My second question is why I can call neither
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
, nor
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
in the render loop after glBufferSubData and glMemoryBarrier. These calls should not change a thing, since they are the same as during the creation of the shader storage buffer. If I can't bind different shader storage buffers, I can only ever use one. But I know that more than one shader storage buffer is supported, so I think I'm missing something else (like "releasing" the buffer).
First of all, the glMapBufferRange call fails simply because GL_WRITE_ONLY is not a valid argument to it. That token was used for the old glMapBuffer, but glMapBufferRange takes a combination of flags for more fine-grained control. In your case you need GL_MAP_WRITE_BIT instead. And since you seem to completely overwrite the whole buffer, without caring for the previous values, an additional optimization would probably be GL_MAP_INVALIDATE_BUFFER_BIT. Also note that the length argument is in bytes, so it should include the sizeof(glm::vec4) factor just like your glBufferSubData call does. So replace that call with:
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
                   windowOptions.width * windowOptions.height * sizeof(glm::vec4),
                   GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
The other error is not described in enough detail in the question, but fix this one first; it may well resolve the subsequent problem too.
I've been delving into unmanaged DirectX 11 for the first time (bear with me), and there's an issue that, although asked about several times on the forums, still leaves me with questions.
I am developing an app in which objects are added to the scene over time. In each render loop I want to collect all vertices in the scene and render them, reusing a single vertex and index buffer, for performance and as a best practice. My question is about the usage of dynamic vertex and index buffers: I haven't been able to fully understand their correct usage when the scene content changes.
vertexBufferDescription.Usage = D3D11_USAGE_DYNAMIC;
vertexBufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
vertexBufferDescription.MiscFlags = 0;
vertexBufferDescription.StructureByteStride = 0;
Should I create the buffers when the scene is initialized and somehow update their content every frame? If so, what ByteWidth should I set in the buffer description? And what do I initialize it with?
Or should I create them the first time the scene is rendered (frame 1), using the current vertex count as their size? If so, when I add another object to the scene, don't I need to recreate the buffer, changing the buffer description's ByteWidth to the new vertex count? If my scene keeps updating its vertices each frame, a single dynamic buffer would lose its purpose this way...
I've been testing initializing the buffer the first time the scene is rendered and, from then on, using Map/Unmap each frame. I start by filling a vector with all the scene objects' vertices and then update the resource like so:
void Scene::Render()
{
    (...)

    std::vector<VERTEX> totalVertices;
    std::vector<int> totalIndices;
    int totalVertexCount = 0;
    int totalIndexCount = 0;

    for (shapeIterator = models.begin(); shapeIterator != models.end(); ++shapeIterator)
    {
        Model* currentModel = (*shapeIterator);
        // totalVertices gets filled here...
    }

    // At this point totalVertices and totalIndices have all scene data
    if (isVertexBufferSet)
    {
        // This is where it copies the new vertices to the buffer,
        // but it's causing flickering in the entire screen...
        D3D11_MAPPED_SUBRESOURCE resource;
        context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
        memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
        context->Unmap(vertexBuffer, 0);
    }
    else
    {
        // This is run in the first frame. But what if new vertices are added to the scene?
        vertexBufferDescription.ByteWidth = sizeof(VERTEX) * totalVertexCount;

        UINT stride = sizeof(VERTEX);
        UINT offset = 0;

        D3D11_SUBRESOURCE_DATA resourceData;
        ZeroMemory(&resourceData, sizeof(resourceData));
        resourceData.pSysMem = &totalVertices[0];

        device->CreateBuffer(&vertexBufferDescription, &resourceData, &vertexBuffer);
        context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);

        isVertexBufferSet = true;
    }
At the end of the render loop, while keeping track of the buffer position of the vertices for each object, I finally invoke Draw():
    context->Draw(objectVertexCount, currentVertexOffset);
}
My current implementation is causing my whole scene to flicker, though there are no memory leaks. I wonder if it has anything to do with the way I am using the Map/Unmap API?
Also, in this scenario, when would it be ideal to invoke buffer->Release()?
Tips or code sample would be great! Thanks in advance!
At the memcpy into the vertex buffer you do the following:
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
sizeof(totalVertices) just gives the size of the std::vector<VERTEX> object itself, which is not what you want.
Try the following code:
memcpy(resource.pData, &totalVertices[0], sizeof( VERTEX ) * totalVertices.size() );
Also, you don't appear to be calling IASetVertexBuffers when isVertexBufferSet is true. Make sure you do so.
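Putting both fixes together, the update path could look roughly like this (a sketch using only the names from your Render() code above):

// Corrected copy size, and the vertex buffer bound every frame, not just the first one.
D3D11_MAPPED_SUBRESOURCE resource;
context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
memcpy(resource.pData, totalVertices.data(), sizeof(VERTEX) * totalVertices.size());
context->Unmap(vertexBuffer, 0);

UINT stride = sizeof(VERTEX);
UINT offset = 0;
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);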