Improve Particle system OpenGL - opengl

I'm looking for a way to improve my particle system performance, since it is very costly in terms of FPS.
This is because I call:
glDrawElements(GL_TRIANGLE_STRIP, mNumberOfIndices,
GL_UNSIGNED_SHORT, 0);
I call this method for every particle in my application (which could be between 1000 and 5000 particles). Note that when increasing to over 1000 particles my application starting to drop down in FPS. I'm using VBO:s, but the performance when calling this method is too costly.
Any ideas how I can make the particle system more efficient?
Edit: This is how my particle system draw things:
glBindTexture(GL_TEXTURE_2D, textureObject);
glBindBuffer(GL_ARRAY_BUFFER, vboVertexBuffer[0]);
glVertexPointer(3, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, vboTextureBuffer[0]);
glTexCoordPointer(2, GL_FLOAT, 0, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vboIndexBuffer[0]);
Vector3f partPos;
for (int i = 0; i < m_numParticles; i++) {
partPos = m_particleList[i].m_pos;
glTranslatef(partPos.x, partPos.y, partPos.z);
glDrawElements(GL_TRIANGLE_STRIP, mNumberOfIndices,
GL_UNSIGNED_SHORT, 0);
gl.glTranslatef(-partPos.x, -partPos.y, -partPos.z);
}

The way you describe it, it sounds like you have a own VBO for each particle. This is not how it should be done. Put all particles into a single VBO and draw them all at once using a single glDrawElements or glDrawArrays call. Or even better, if available: Use instancing.

Expanding a bit on what datenwolf said, just pack all your particle indices into a single index buffer and draw all particles with a single glDrawElements call. This means you cannot use triangle strips anymore but a triangle set, but that shouldn't be too much of a problem.
Otherwise, if your hardware supports instanced rendering (or better, instanced arrays), you can do it by just rendering a single particle n times with the position and texCoord data taken from the respective arrays for each particle. You then still need to compute the four corner's position and texCoord data in the vertex shader (assuming you draw a quad for each particle), as with the instanced arrays you only get one attribute per instance (particle).
You might also use the geometry shader to create the particle's quad and just render a single point set, but I assume this might be slower than instancing, considering that SM4/GL3 hardware is quite likely to support instancing, too.

Related

Is it necessary to bind all VBOs (and textures) each frame?

I'm following basic tutorial on OpenGL 3.0. What is not clear to me why/if I have to bind, enable and unbind/disable all vertex buffers and textures each frame.
To me it seems too much gl**** calls which I guess have some overhead. For example here you see each frame several blocks like:
// do this for each mesh in scene
// vertexes
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);
glVertexAttribPointer( 0, 3, GL_FLOAT,GL_FALSE,0,(void*)0);
// normals
glEnableVertexAttribArray(1);
glBindBuffer(GL_ARRAY_BUFFER, normal_buffer );
glVertexAttribPointer( 1, 3, GL_FLOAT,GL_FALSE,0,(void*)0);
// UVs
glEnableVertexAttribArray(2);
glBindBuffer(GL_ARRAY_BUFFER, uv_buffer );
glVertexAttribPointer( 2, 2, GL_FLOAT,GL_FALSE,0,(void*)0);
// ...
glDrawArrays(GL_TRIANGLES, 0, nVerts );
// ...
glDisableVertexAttribArray(0);
glDisableVertexAttribArray(1);
glDisableVertexAttribArray(2);
imagine you have not just one but 100 different meshes each with it's own VBOs for vertexes,normas,UVs. Should I really do this procedure each frame for each of them? Sure I can encapsulate that complexity into some function/objects, but I worry about overheads of this gl**** function calls.
Is it not possible some part of this machinery to move from per frame loop into scene setup ?
Also I read that VAO is a way how to pack corresponding VBOs for one object together. And that binding VAO automatically binds corresponding VBOs. So I was thinking that maybe one VAO for each mesh (not instance) is how it should be done - but according to this answer it does not seems so?
First things first: Your concerns about GL call overhead have been addressed with the introduction of Vertex Array Objects (see #Criss answer). However the real problem with your train of thought is, that you equate VBOs with geometry meshes, i.e. give each geometry its own VBO.
That's not how you should see and use VBOs. VBOs are chunks of memory and you can put the data of several objects into a single VBO; you don't have to draw the whole thing, you can limit draw calls to subsets of a VBO. And you can coalesce geometries with similar or even identical drawing setup and draw them all at once with a single draw call. Either by having the right vertex index list, or by use of instancing.
When it comes to the binding state of textures… well, yeah, that's a bit more annoying. You really have to do the whole binding dance when switching textures. That's why in general you sort geometry by texture/shader before drawing, so that the amount of texture switches is minimized.
The last 3 or 4 generations of GPUs (as of late 2016) do support bindless textures though, where you can access textures through a 64 bit handle (effectively the address of the relevant data structure in some address space) in the shader. However bindless textures did not yet make it into the core OpenGL standard and you have to use vendor extensions to make use of it.
Another interesting approach (popularized by Id Tech 4) is virtual textures. You can allocate sparsely populated texture objects that are huge in their addressable size, but only part of them actually populated with data. During program execution you determine which areas of the texture are required and swap in the required data on demand.
You should use vertex array object (generated by glGenVertexArrays). Thanks to it you don't have to perform those calls everytime. Vertex buffer object stores:
Calls to glEnableVertexAttribArray or glDisableVertexAttribArray.
Vertex attribute configurations via glVertexAttribPointer.
Vertex buffer objects associated with vertex attributes by calls to
glVertexAttribPointer.
Maybe this will be better tutorial.
So that you can generate vao object, then bind it, perform the calls and unbind. Now in drawing loop you just have to bind vao.
Example:
glUseProgram(shaderId);
glBindVertexArray(vaoId);
glDrawArrays(GL_TRIANGLES, 0, 3);
glBindVertexArray(0);
glUseProgram(0);

Is there a way to create the verts/indices of a cube that can be well represented by line drawing mode as well as render correctly in triangles mode?

Attempting to switch drawing mode to GL_LINE, GL_LINE_STRIP or GL_LINE_LOOP when your cube's vertex data is constructed mainly for use with GL_TRIANGLES presents some interesting results but none that provide a good wireframe representation of the cube.
Is there a way to construct the cube's vertex and index data so that simply toggling the draw mode between GL_LINES/GL_LINE_STRIP/GL_LINE_LOOP and GL_TRIANGLES provides nice results? Or is the only way to get a good wireframe to re-create the vertices specifically for use with one of the line modes?
The most practical approach is most likely the simplest one: Use separate index arrays for line and triangle rendering. There is certainly no need to replicate the vertex attributes, but drawing entirely different primitive types with the same indices sounds highly problematic.
To implement this, you could use two different index (GL_ELEMENT_ARRAY_BUFFER) buffers. Or, more elegantly IMHO, use a single buffer, and store both sets of indices in it. Say you need triIdxCount indices for triangle rendering, and lineIdxCount for line rendering. You can then set up you index buffer with:
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuf);
glBufferData(GL_ELEMENT_ARRAY_BUFFER,
(triIdxCount + lineIdxCount) * sizeof(GLushort), 0,
GL_STATIC_DRAW);
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER,
0, triIdxCount * sizeof(GLushort), triIdxArray);
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER,
triIdxCount * sizeof(GLushort), lineIdxCount * sizeof(GLushort),
lineIdxArray);
Then, when you're ready to draw, set up all your state, including the index buffer binding (ideally using a VAO for all of the state setup) and then render conditionally:
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuf);
if (renderTri) {
glDrawElements(GL_TRIANGLES, triIndexCount, GL_UNSIGNED_SHORT, 0);
} else {
glDrawElements(GL_LINES, lineIdxCount, GL_UNSIGNED_SHORT,
triIndexCount * sizeof(GLushort));
}
From a memory usage point of view, having two sets of indices is a moderate amount of overhead. The actual vertex attribute data is normally much bigger than the index data, and the key point here is that the attribute data is not replicated.
If you don't strictly want to render lines, but just have a requirement for wireframe types of rendering, there are other options. There is for example an elegant approach (never implemented it myself, but it looks clever) where you only draw pixels close to the boundary of polygons, and discard the interior pixels in the fragment shader based on the distance to the polygon edge. This question (where I contributed an answer) elaborates on the approach: Wireframe shader - Issue with Barycentric coordinates when using shared vertices.

Can OpenGL be used to draw real valued triangles into buffer?

I need to implement an image reconstruction which involves drawing triangles in a buffer representing the pixels in an image. These triangles are assigned some floating point value to be filled with. If triangles are drawn such that they overlap, the values of the overlapping regions must be added together.
Is it possible to accomplish this with OpenGL? I would like to take advantage of the fact that rasterizing triangles is a basic graphics task that can be accelerated on the graphics card. I have a cpu-only implementation of this algorithm already but it is not fast enough for my purposes. This is due to the huge number of triangles that need to be drawn.
Specifically my questions are:
Can I draw triangles with a real value using openGL? (Or can I come up with a hack using color etc?)
Can OpenGL add the values where triangles overlap? (Once again I could deal with a hack, like color mixing)
Can I recover the real values for the pixels as an array of floats or similar to be further processed?
Do I have misconceptions about the idea that drawing in OpenGL -> using GPU to draw -> likely faster execution?
Additionally, I would like to run this code on a virtual machine so getting acceleration to work with OpenGL is more feasible than rolling my own implementation in something like Cuda as far as I understand. Is this true?
EDIT: Is an accumulation buffer an option here?
If 32-bit floats are sufficient then it looks like the answer is yes: http://www.opengl.org/wiki/Image_Format#Required_formats
Even under the old fixed pipeline you could use a blending mode with the function GL_FUNC_ADD, though I'm sure fragment shaders can do it more easily now.
glReadPixels() will get you the data back out of the buffer after drawing.
There are software implementations of OpenGL, but you get to choose when you set up the context. Using the GPU should be much faster than the CPU.
No idea. I've never used OpenGL or CUDA on a VM. I've never used CUDA at all.
I guess giving pieces of code as an answer wouldn't be appropriate here as your question is extremely broad. So I'll simply answer your questions individually with bits of hints.
Yes, drawing triangles with openGL is a piece of cake. You provide 3 vertice per triangle and with the proper shaders your can draw triangles, with filling or just edges, whatever you want. You seem to require a large set (bigger than [0, 255]) since a lot of triangles may overlap, and the value of each may be bigger than one. This is not a problem. You can fill a 32bit precision one channel frame buffer. In your case only one channel may suffice.
Yes, the blending exists since forever on openGL. So whatever the version of openGL you choose to use, there will be a way to add up the value of the trianlges overlapping.
Yes, depending on you implement it you may have to use glGetSubData() or glReadPixels or something else. However, depending on the size of the matrix you're filling, it may be a bit long to download the full buffer (2000x1000 pixels for a one channel at 32bit would be around 4-5ms). It may be more efficient to do all your processing on the GPU and extract only few valuable information instead of continuing the processing on the CPU.
The execution will be undoubtedly faster. However, the download of data from the GPU memory is often not optimized (upload is). So the time you will win on the processing may be lost on the download. I've never worked with openGL on VM so the additional loss of performance is unknown to me.
//Struct definition
struct Triangle {
float[2] position;
float intensity;
};
//Init
glGenBuffers(1, &m_buffer);
glBindBuffer(GL_ARRAY_BUFFER, 0, m_buffer);
glBufferData(GL_ARRAY_BUFFER,
triangleVector.data() * sizeof(Triangle),
triangleVector.size(),
GL_DYNAMIC_DRAW);
glBindBufferBase(GL_ARRAY_BUFFER, 0, 0);
glGenVertexArrays(1, &m_vao);
glBindVertexArray(m_vao);
glBindBuffer(GL_ARRAY_BUFFER, m_buffer);
glVertexAttribPointer(
POSITION,
2,
GL_FLOAT,
GL_FALSE,
sizeof(Triangle),
0);
glVertexAttribPointer(
INTENSITY,
1,
GL_FLOAT,
GL_FALSE,
sizeof(Triangle),
sizeof(float)*2);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
glGenTextures(1, &m_texture);
glBindTexture(GL_TEXTURE_2D, m_texture);
glTexImage2D(
GL_TEXTURE_2D,
0,
GL_R32F,
width,
height,
0,
GL_RED,
GL_FLOAT,
NULL);
glBindTexture(GL_FRAMEBUFFER, 0);
glGenFrameBuffers(1, &m_frameBuffer);
glBindFrameBuffer(GL_FRAMEBUFFER, m_frameBuffer);
glFramebufferTexture(
GL_FRAMEBUFFER,
GL_COLOR_ATTACHMENT0,
m_texture,
0);
glBindFrameBuffer(GL_FRAMEBUFFER, 0);
After you just need to write your render function. A simple glDraw*() should be enough. Just remember to bind your buffers correctly. To enable the blending with the proper function. You might also want to disable the anti aliasing for your case. At first I'd say you need an ortho projection but I don't have all the element of your problem so it's up to you.
Long story short, if you never worked with openGL, the piece of code above will be relevant only after you read few documentation/tutorials on openGL/GLSL.

glColorPointer trying to improve fps

I've finnaly reached the point that I can add some color to my vertices. But now I want to improve my FPS rate. Here is the current situation. I have a large number of vertices (~200000) , and each of them can be in one of ~150 classes. Each class is differenced by it`s color. I'm currently drawing my vertices like:
glEnableClientState(GL_VERTEX_ARRAY)
glBindBufferARB(GL_ARRAY_BUFFER_ARB, self.bufferVertices)
glVertexPointer(3, GL_FLOAT, 0, None)
glEnableClientState(GL_NORMAL_ARRAY);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, self.bufferNormals)
glNormalPointer(GL_FLOAT, 0, None)
glEnableClientState(GL_COLOR_ARRAY)
glBindBufferARB(GL_ARRAY_BUFFER_ARB, self.bufferColors)
glColorPointer(3, GL_FLOAT, 0, None)
glDrawArrays( GL_POINTS, 0, len(self.vertices) / 3 )
Everything works fine and the FPS is ~60. All my buffers are generated only at init time. But now the colors that represent each class will change rapidly, once about every millisecond. With that the colors will also change so I would have to recreate my color buffer for each draw, and for 200000 vertices I have a feeling the FPS will be close to 0. The fix I'm trying to implement is to keep a fixed color buffer, but instead of the actual colors, they would retain a pointer to the class it`s represented by. That way, only the 200 colors of the classes will change. Problem is I don't know how this could be implemented in OpenGL. Is this doable ? Any pointers in how to do this?
Instead of colors the classes should set a 1D texture coordinate, then instead of changing that huge dataset you have to replace only parts of the texture. To avoid texture sampling interpolation set the texture into GL_NEAREST filtering mode, and sample the texture in the vertex shader per vertex (i.e. no per fragment texture sampling as you'd do usually).
I think you should seriously take a look at shaders. You shouldn't need to recreate your buffer every time. http://en.wikipedia.org/wiki/GLSL. The shaders will use your buffer to perform all kinds of operations in color and geometry.

Replicate in space an object with position and orientation from GPU kernel

Context
I am doing swarm simulation using GPU programming (both OpenCL and CUDA,
but not at the same time of course) for scientific purpose.
I use OpenGL for display.
Goal
I would like to draw the same object —namely the swarming particle, can be a simple triangle in 2D— N times at different positions and with
different orientations in the most efficient way knowing that:
the object is always exactly the same
the positions and orientations are calculated on the GPU and thus stored in the GPU memory
the number of particles N can be large
Current solution
So far, to avoid sending back the data to the CPU, I store the position and
orientation arrays in a VBO and use:
glBindBuffer(GL_ARRAY_BUFFER, position_vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, velocity_vbo);
glEnableClientState(GL_COLOR_ARRAY);
glColorPointer(4, GL_FLOAT, 0, 0);
glDrawArrays(GL_POINTS, 0, N);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
to draw a set of points with color-coded velocity without copying back the arrays to the CPU.
What I would like to do is something like drawing a full object instead of a simple point
using a similar way ie without copying back the VBO's to the CPU.
Basically I would like to store on the GPU the model of an object
(a Display List? a Vertex Array?) and to use the positions and orientations on the GPU
to draw the object N times without sending data back to the CPU.
Is it possible and how? Else, how should I do it?
PS: I like keeping the code clean so I would rather separate the display issues from the swarming kernel.
I believe you can do this with a geometry shader (available in OpenGL 3.2). See this tutorial for specific information.
In your case, you need to make the input type and output type of the geometry shader to GL_POINTS and GL_TRIANGLES respectively, and in your geometry shader, emit the 3 vertices of your triangle for each incoming point vertex.