2D Particle System - Performance - c++

I have implemented a 2D Particle System based on the ideas and concepts outlined in "Building an Advanced Particle System" (John van der Burg, Game Developer Magazine, March 2000).
Now I am wondering what performance I should expect from this system. I am currently testing it within the context of my simple (unfinished) SDL/OpenGL platformer, where all particles are updated every frame. Drawing is done as follows:
// Bind Texture
glBindTexture(GL_TEXTURE_2D, *texture);
// for all particles
glBegin(GL_QUADS);
glTexCoord2d(0,0); glVertex2f(x,y);
glTexCoord2d(1,0); glVertex2f(x+w,y);
glTexCoord2d(1,1); glVertex2f(x+w,y+h);
glTexCoord2d(0,1); glVertex2f(x,y+h);
glEnd();
where one texture is used for all particles.
It runs smoothly up to about 3000 particles. To be honest I was expecting a lot more, particularly since this is meant to be used with more than one system on screen. What number of particles should I expect to be displayed smoothly?
PS: I am relatively new to both C++ and OpenGL, so it might well be that I messed up somewhere!?
EDIT: Using GL_POINT_SPRITE
glEnable(GL_POINT_SPRITE);
glBindTexture(GL_TEXTURE_2D, *texture);
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE);
// for all particles
glPointSize(size); // glPointSize is not allowed between glBegin and glEnd
glBegin(GL_POINTS);
glVertex2f(x,y);
glEnd();
glDisable( GL_POINT_SPRITE );
Can't see any performance difference to using GL_QUADS at all!?
EDIT: Using vertex arrays
// Setup
glEnable (GL_POINT_SPRITE);
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE);
glPointSize(20);
// A big array to hold all the points
const int NumPoints = 2000;
Vector2 ArrayOfPoints[NumPoints];
for (int i = 0; i < NumPoints; i++) {
ArrayOfPoints[i].x = 350 + rand()%201;
ArrayOfPoints[i].y = 350 + rand()%201;
}
// Rendering
glEnableClientState(GL_VERTEX_ARRAY); // Enable vertex arrays
glVertexPointer(2, GL_FLOAT, 0, ArrayOfPoints); // Specify data
glDrawArrays(GL_POINTS, 0, NumPoints); // draw with points, starting from the 0'th point in my array and draw exactly NumPoints
Using VAs made a performance difference compared to the above. I then tried VBOs, but don't really see a further performance difference there?

I can't say how much you can expect from that solution, but there are some ways to improve it.
Firstly, by using glBegin() and glEnd() you are using immediate mode, which is, as far as I know, the slowest way of doing things. Furthermore, it has been removed from the core profile of current OpenGL versions.
For OpenGL 2.1
Point Sprites:
You might want to use point sprites. I implemented a particle system using them and got nice performance (for my level of knowledge back then, at least). With point sprites you make fewer OpenGL calls per frame and send less data to the graphics card (or may even keep the data stored on the graphics card, I'm not sure about that). A short Google search should give you some implementations to look at.
Vertex Arrays:
If using point sprites doesn't help, you should consider using vertex arrays in combination with point sprites (to save a bit of memory). Basically, you store the vertex data of the particles in an array. You then enable vertex array support by calling glEnableClientState() with GL_VERTEX_ARRAY as the parameter. After that, you call glVertexPointer() (the parameters are explained in the OpenGL documentation) and then glDrawArrays() to draw the particles. This reduces your OpenGL calls to only a handful per frame, instead of about ten per particle with the quad code above (glBegin, four glTexCoord2d/glVertex2f pairs and glEnd), which is roughly 30,000 calls per frame for 3000 particles.
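As a sketch of the vertex array idea applied to the question's textured quads: the CPU side only has to pack every particle into one interleaved array per frame. The Particle struct and buildQuadArray function below are hypothetical names, and the x, y, u, v vertex layout is an assumption chosen to match the question's glTexCoord2d/glVertex2f calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle type: top-left corner plus width/height, matching the
// question's glVertex2f(x, y) ... glVertex2f(x, y + h) calls.
struct Particle {
    float x, y, w, h;
};

// Floats per vertex in the interleaved array: x, y, u, v.
constexpr std::size_t kFloatsPerVertex = 4;

// Writes one textured quad (4 vertices) per particle, using the same corner
// order and texture coordinates as the question's glBegin(GL_QUADS) block.
void buildQuadArray(const std::vector<Particle>& particles, std::vector<float>& out) {
    out.clear();  // reuse the existing allocation from the previous frame
    out.reserve(particles.size() * 4 * kFloatsPerVertex);
    for (const Particle& p : particles) {
        const float quad[4][4] = {
            {p.x,       p.y,       0.0f, 0.0f},
            {p.x + p.w, p.y,       1.0f, 0.0f},
            {p.x + p.w, p.y + p.h, 1.0f, 1.0f},
            {p.x,       p.y + p.h, 0.0f, 1.0f},
        };
        for (const auto& vertex : quad)
            out.insert(out.end(), vertex, vertex + kFloatsPerVertex);
    }
    assert(out.size() == particles.size() * 4 * kFloatsPerVertex);
}
```

With a GL context, the whole array could then be submitted with glVertexPointer(2, GL_FLOAT, 4 * sizeof(float), data), glTexCoordPointer(2, GL_FLOAT, 4 * sizeof(float), data + 2) (with GL_TEXTURE_COORD_ARRAY enabled) and a single glDrawArrays(GL_QUADS, 0, count), where data and count stand for out.data() and out.size() / 4.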
For OpenGL 3.3 and above
Instancing:
If you are programming against OpenGL 3.3 or above, you can even consider using instancing to draw your particles, which should speed that up even further. Again, a short google search will let you look at some code about that.
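The gain from instancing is mostly that one unit quad is uploaded once and only a small per-particle record changes each frame. The sketch below shows only that CPU-side data layout; ParticleInstance, packInstances and instanceCorner are hypothetical names, and instanceCorner is a CPU reference for what a vertex shader would compute, not shader code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record: the only data that changes per particle.
struct ParticleInstance {
    float x, y, size;
};

// Unit quad shared by all instances, uploaded once.
const float kUnitQuad[4][2] = {{0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f}};

// Packs per-instance attributes into one flat array for a single buffer upload.
std::vector<float> packInstances(const std::vector<ParticleInstance>& instances) {
    std::vector<float> packed;
    packed.reserve(instances.size() * 3);
    for (const ParticleInstance& inst : instances) {
        packed.push_back(inst.x);
        packed.push_back(inst.y);
        packed.push_back(inst.size);
    }
    return packed;
}

// CPU reference of the per-vertex math: corner c of the unit quad, scaled and moved.
void instanceCorner(const ParticleInstance& inst, int c, float& outX, float& outY) {
    assert(c >= 0 && c < 4);
    outX = inst.x + kUnitQuad[c][0] * inst.size;
    outY = inst.y + kUnitQuad[c][1] * inst.size;
}

// Bytes streamed per frame: 3 floats per particle when instanced, versus
// 4 vertices * (x, y, u, v) = 16 floats per particle for pre-expanded quads.
std::size_t instancedBytes(std::size_t n) { return n * 3 * sizeof(float); }
std::size_t expandedQuadBytes(std::size_t n) { return n * 4 * 4 * sizeof(float); }
```

In an OpenGL 3.3 renderer the packed array would go into its own buffer, glVertexAttribDivisor(attribIndex, 1) (attribIndex being whatever attribute location you choose) would make it advance once per instance, and one glDrawArraysInstanced(GL_TRIANGLE_FAN, 0, 4, count) call would draw every particle.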
In General:
Using SSE:
In addition, some time might be lost while updating your vertex positions. If you want to speed that up, you can look at using SSE to update them. Done correctly, this can gain you a lot of performance (at least with a large number of particles).
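A minimal sketch of an SSE position update, assuming an x86/x86-64 target and that positions and velocities live in separate float arrays (one array per component), so that four consecutive particles fill one SSE register. integrateSSE is a hypothetical name; it would be called once for the x components and once for the y components. Note that modern compilers may auto-vectorize the equivalent plain loop at higher optimization levels, so it is worth measuring before hand-writing intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// pos[i] += vel[i] * dt for i in [0, n), four floats at a time.
void integrateSSE(float* pos, const float* vel, std::size_t n, float dt) {
    assert(pos != nullptr && vel != nullptr);
    const __m128 dtv = _mm_set1_ps(dt);  // dt broadcast into all four lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);        // unaligned load of 4 positions
        const __m128 v = _mm_loadu_ps(vel + i);  // and 4 velocities
        p = _mm_add_ps(p, _mm_mul_ps(v, dtv));   // p += v * dt in every lane
        _mm_storeu_ps(pos + i, p);
    }
    for (; i < n; ++i)  // scalar tail for the remaining n % 4 elements
        pos[i] += vel[i] * dt;
}
```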
Data Layout:
Finally, I recently found a link (divergentcoder.com/programming/aos-soa-explorations-part-1, thanks Ben) about structures of arrays (SoA) and arrays of structures (AoS). They were compared on how they affect the performance with an example of a particle system.
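To make the AoS/SoA distinction concrete, here is a hedged sketch of the same position update written against both layouts. The field set (x, y, vx, vy, life) is an assumption for illustration; the point is that the SoA loops read only the arrays they need, while the AoS loop walks whole structs, so the unused life field still occupies the cache lines being loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: each particle's fields are stored together.
struct ParticleAoS {
    float x, y, vx, vy, life;
};

// Structure of arrays: each field gets its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy, life;
    std::size_t size() const { return x.size(); }
};

void updateAoS(std::vector<ParticleAoS>& ps, float dt) {
    for (ParticleAoS& p : ps) {  // strides over whole 5-float structs
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

void updateSoA(ParticlesSoA& ps, float dt) {
    assert(ps.y.size() == ps.size() && ps.vx.size() == ps.size() && ps.vy.size() == ps.size());
    // Each loop streams through two tightly packed arrays; `life` is never touched.
    for (std::size_t i = 0; i < ps.size(); ++i) ps.x[i] += ps.vx[i] * dt;
    for (std::size_t i = 0; i < ps.size(); ++i) ps.y[i] += ps.vy[i] * dt;
}
```

The SoA loops are also the shape that SIMD code (or a compiler's auto-vectorizer) handles most easily, since consecutive elements of one field are adjacent in memory.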

Consider using vertex arrays instead of immediate mode (glBegin/End): http://www.songho.ca/opengl/gl_vertexarray.html
If you are willing to get into shaders, you could also search for "vertex shader" and consider using that approach for your project.

Related

OpenGL what do I have to do before drawing a triangle?

Most of the tutorials, guides and books related to OpenGL that I've found explain how to draw a triangle and initialize OpenGL. That's fine. But when they try to explain it, they just list a bunch of functions and parameters like:
glClear()
glClearColor()
glBegin()
glEnd()
...
Since I'm not very good at learning things by memory, I always need an answer to "why are we doing this?", so that I write that bunch of functions because I remember that I have to set certain things before doing something else, and not just because the tutorial told me so.
Could someone please explain what I have to set up in OpenGL (only pure OpenGL; I'm using SFML as the background library, but that really doesn't matter) before starting to draw something with glBegin() and glEnd()?
Sample answer:
You have to first tell OpenGL what color it needs to clear the screen with, because each frame needs to be cleared of the previous one before we start to draw the current one...
First you should know that OpenGL is a state machine. That means that, apart from creating the OpenGL context (which is done by SFML), there's no such thing as initialization!
Since I'm not very good at learning things by memory,
This is good…
I always need an answer to "why are we doing this?"
This is excellent!
Could someone please explain what I have to set up in OpenGL (only pure OpenGL; I'm using SFML as the background library, but that really doesn't matter) before starting to draw something with glBegin() and glEnd()?
As I already said: OpenGL is a state machine. That basically means that there are two kinds of calls you can make: setting state and executing operations.
For example, glClearColor sets a state variable, the clear color, whose value is used to clear the active framebuffer's color buffer when glClear is called with the GL_COLOR_BUFFER_BIT flag set. There is a similar function, glClearDepth, for the depth value (GL_DEPTH_BUFFER_BIT flag to glClear).
glBegin and glEnd belong to the immediate mode of OpenGL, which has been deprecated, so there's little reason to learn them. You should use Vertex Arrays instead, preferably through Vertex Buffer Objects.
But here it goes: glBegin puts OpenGL into a state in which it draws geometry of the primitive type passed as the parameter to glBegin. GL_TRIANGLES, for example, means that OpenGL will now interpret every 3 calls to glVertex as forming a triangle. glEnd tells OpenGL that you've finished that batch of triangles. Within a glBegin…glEnd block certain state changes are disallowed, among them everything that has to do with transforming the geometry and generating the picture: matrices, shaders, textures, and some others.
One common misconception is that OpenGL is initialized. This is due to badly written tutorials which have an initGL function or similar. It's good practice to set all state from scratch when beginning to render a scene. But since a single frame may contain several scenes (think of a HUD or split-screen gaming), this happens several times per frame.
Update:
So how do you draw a triangle? Well, it's simple enough. First you need the geometry data. For example this:
GLfloat triangle[] = {
-1, 0, 0,
+1, 0, 0,
0, 1, 0
};
In the render function we tell OpenGL that the next calls to glDrawArrays or glDrawElements shall fetch the data from there (for the sake of simplicity I'll use OpenGL-2 functions here):
glVertexPointer(3, /* there are three scalars per vertex element */
GL_FLOAT, /* element scalars are float */
0, /* elements are tightly packed (could as well be sizeof(GLfloat)*3) */
triangle /* and there you find the data */ );
/* Note that glVertexPointer does not make a copy of the data!
If using a VBO the data is copied when calling glBufferData. */
/* this switches OpenGL into a state that it will
actually access data at the place we pointed it
to with glVertexPointer */
glEnableClientState(GL_VERTEX_ARRAY);
/* glDrawArrays takes data from the supplied arrays and draws them
as if they were submitted sequentially in a for loop to immediate
mode functions. It has some valid applications, but index-based
drawing (glDrawElements) is better for models with a lot of shared vertices. */
glDrawArrays(GL_TRIANGLES, /* draw triangles */
0, /* start at index 0 */
3 /* process 3 elements (of 3 scalars each) */ );
What I didn't include yet is setting up the transformation and viewport mapping.
The viewport defines how the readily projected and normalized geometry is placed in the window. This state is set using glViewport(pos_left, pos_bottom, width, height).
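The mapping glViewport sets up is simple arithmetic: normalized device coordinates in [-1, 1] are scaled and offset into the pixel rectangle. A small CPU-side sketch of that formula (viewportTransform and WindowPos are hypothetical names, not OpenGL functions):

```cpp
#include <cassert>

// Window-space position in pixels, origin at the bottom-left of the window.
struct WindowPos { float x, y; };

// Maps normalized device coordinates (each in [-1, 1]) to window coordinates
// the way glViewport(left, bottom, width, height) specifies:
// x_w = left + (x_ndc + 1) * width / 2, y_w = bottom + (y_ndc + 1) * height / 2.
WindowPos viewportTransform(float ndcX, float ndcY, int left, int bottom, int width, int height) {
    assert(width > 0 && height > 0);
    return {left + (ndcX + 1.0f) * 0.5f * width,
            bottom + (ndcY + 1.0f) * 0.5f * height};
}
```

For example, with glViewport(0, 0, 800, 600) the NDC origin lands at pixel (400, 300), the centre of the window.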
Transformation today happens in a vertex shader. Essentially, a vertex shader is a small program written in a special language (GLSL) that takes the vertex attributes and calculates the clip space position of the resulting vertex. The usual approach for this is emulating the fixed function pipeline, which is a two-stage process: first transform the geometry into view space (some calculations, like illumination, are easier in this space), then project it into clip space, which is kind of the lens of the renderer. In the fixed function pipeline there are two transformation matrices for this: Modelview and Projection. You set them to whatever is required for the desired outcome. In the case of just a triangle, we leave the modelview as the identity and use an orthographic projection from -1 to 1 in each dimension.
glMatrixMode(GL_PROJECTION);
/* the following function multiplies onto what's already on the stack,
so reset it to identity */
glLoadIdentity();
/* our clip volume is defined by 6 orthogonal planes with normals X,Y,Z
and distance 1 from the origin in each direction */
glOrtho(-1, 1, -1, 1, -1, 1);
glMatrixMode(GL_MODELVIEW);
/* now an identity matrix is loaded onto the modelview */
glLoadIdentity();
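To see what glOrtho(-1, 1, -1, 1, -1, 1) actually does to a vertex, here is a sketch that builds the same matrix on the CPU from the formula in the glOrtho reference page and applies it. The Mat4/Vec4 aliases and the ortho/transform helpers are hypothetical; the matrix is stored row-major here for readability, whereas OpenGL itself stores matrices column-major.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major in this sketch

// The matrix glOrtho(l, r, b, t, n, f) multiplies onto the current matrix.
Mat4 ortho(float l, float r, float b, float t, float n, float f) {
    assert(l != r && b != t && n != f);
    return {{{2 / (r - l), 0, 0, -(r + l) / (r - l)},
             {0, 2 / (t - b), 0, -(t + b) / (t - b)},
             {0, 0, -2 / (f - n), -(f + n) / (f - n)},
             {0, 0, 0, 1}}};
}

// Matrix * column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += m[row][col] * v[col];
    return out;
}
```

With the arguments used above the matrix is diag(1, 1, -1, 1): x and y pass through unchanged and only z is flipped, which is why the triangle's coordinates can be given directly in [-1, 1]. A pixel-space setup such as ortho(0, 800, 600, 0, -1, 1) would instead map pixel (0, 0) to the top-left corner (-1, 1) of the clip volume.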
Having set up the transformation we can now draw the triangle as outlined above:
draw_triangle();
Finally we need to tell OpenGL we're done sending commands and that it should finish its rendering. Most of the time your window is double buffered, so you need to swap the buffers to make things visible; since swapping makes no sense without finishing, the swap implies a finish.
if(singlebuffered)
glFinish();
else
SwapBuffers(); /* window-system specific; with SFML this is sf::Window::display() */
You're using the API to set and change the OpenGL state machine.
You're not actually programming directly to the GPU, you're using a medium between your application and your GPU to do whatever you're trying to do.
The reason it works like this, rather than like programming a CPU and memory directly, is that OpenGL was intended to be OS- and hardware-independent, so that your code can run on any OS and any hardware, not just the one you're programming for.
Because of this, you need to learn the API it provides, which makes sure that whatever you're trying to do can run on all systems, operating systems and hardware within a reasonable range.
For example, if you were to create your application on Windows 8.1 with a certain graphics card (say, an AMD one), you would still want your application to be able to run on Android/iOS/Linux/other Windows systems and on other hardware (GPUs) such as Nvidia's.
That is why Khronos made the API as system- and hardware-independent as possible, so that it can run on everything and be a standard for everyone.
This is the price we have to pay: we have to learn their API instead of learning how to write directly to GPU memory and use the GPU directly to process data.
With the introduction of Vulkan (also from Khronos), things might be different once it is released; we will see how it works out.

OpenGL 2D performance tips

I'm currently developing a Touhou-esque bullet hell shooter game. The screen will be absolutely filled with bullets (so instancing is what I want here), but I want this to work on older hardware, so at the moment I'm doing something along the lines of the code below. There are no colors, textures, etc. yet until I figure this out.
glVertexPointer(3, GL_FLOAT, 0, SQUARE_VERTICES);
for (int i = 0; i < info.centers.size(); i += 3) {
glPushMatrix();
glTranslatef(info.centers.get(i), info.centers.get(i + 1), info.centers.get(i + 2));
glScalef(info.sizes.get(i), info.sizes.get(i + 1), info.sizes.get(i + 2));
glDrawElements(GL_QUADS, 4, GL_UNSIGNED_SHORT, SQUARE_INDICES);
glPopMatrix();
}
Because I want this to work on old hardware, I'm trying to avoid shaders and the like. The setup above starts to fail at about 80 polygons, and I'm looking to get at least a few hundred out of this. info is a struct which holds everything needed for rendering; there's not much to it besides a few vectors.
I'm pretty new to OpenGL, but I have at least heard of and tried out most of what can be done, though I'm not saying I'm good at it. This is a 2D game; I switched from SDL to OpenGL because it would make some fancier effects easier. Obviously SDL works differently, but I never had this problem with it.
It boils down to this, I'm clearly doing something wrong here, so how can I implement instancing for old hardware (OpenGL 1.x) correctly? Also, give me any tips for increasing performance.
Also, give me any tips for increasing performance.
If you're going to use sprites...
Load all sprites into a single huge texture. If they don't fit, use several textures, but keep the number of textures low to avoid texture switching.
Switch textures and change OpenGL state as infrequently as possible. Ideally, you should set a texture once and draw everything you can with it.
Use texture fonts for text. FTGL fonts might look nice, but they can hit performance very hard with complex fonts.
Avoid alpha-blending when possible and use alpha-testing instead.
When alpha-blending, always use alpha-testing as well to reduce the number of pixels you draw. When your texture has many pixels with alpha==0, cut them out with the alpha test.
Reduce the number of very big sprites. A huge screen-aligned/pixel-aligned sprite (1024*1024) will drop FPS even on very good hardware.
Don't use non-power-of-2 sized textures. They (used to) produce huge performance drop on certain ATI cards.
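Following the last tip usually means checking texture dimensions and padding an atlas or sprite sheet up to the next power of two. A small sketch of both checks using the standard bit tricks (isPowerOfTwo and nextPowerOfTwo are hypothetical helper names):

```cpp
#include <cassert>
#include <cstdint>

// True if n is a non-zero power of two, i.e. exactly one bit is set.
bool isPowerOfTwo(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two >= n, for 1 <= n <= 2^31: smear the highest set bit of
// (n - 1) into every lower bit, then add one.
std::uint32_t nextPowerOfTwo(std::uint32_t n) {
    assert(n >= 1 && n <= (1u << 31));
    --n;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1;
}
```

For example, a 600-pixel-tall sprite sheet would be padded to 1024.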
glTranslatef
For a 2D sprite-based (that's important) game you could avoid matrices completely (with the exception of camera/projection matrices, perhaps). I don't think matrices will benefit you very much in a 2D game.
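Avoiding the matrices could look like the sketch below: instead of glPushMatrix/glTranslatef/glScalef/glDrawElements per bullet, the CPU computes each bullet's four corners and everything goes out in one call. The unit square kSquare is a hypothetical stand-in for the question's SQUARE_VERTICES (whose values aren't shown), and centers/sizes follow the question's layout of 3 floats per bullet, with z ignored in this 2D sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unit square centred on the origin (stand-in for SQUARE_VERTICES).
const float kSquare[4][2] = {{-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}};

// Does on the CPU what glTranslatef + glScalef did per bullet:
// vertex = centre + corner * size, for x and y. Output is x, y per vertex.
std::vector<float> buildBulletQuads(const std::vector<float>& centers, const std::vector<float>& sizes) {
    assert(centers.size() == sizes.size() && centers.size() % 3 == 0);
    std::vector<float> out;
    out.reserve(centers.size() / 3 * 4 * 2);
    for (std::size_t i = 0; i < centers.size(); i += 3) {
        for (const auto& corner : kSquare) {
            out.push_back(centers[i] + corner[0] * sizes[i]);
            out.push_back(centers[i + 1] + corner[1] * sizes[i + 1]);
        }
    }
    return out;
}
```

The result could then be drawn with vertex arrays, which are available since OpenGL 1.1: glVertexPointer(2, GL_FLOAT, 0, out.data()) followed by a single glDrawArrays(GL_QUADS, 0, out.size() / 2).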
With a 2D game your main bottleneck will be GPU memory transfer speed: transferring data from textures to the screen. So "use as few draw calls as possible" and "put everything in a VA" won't help you much; you can kill performance with a single huge sprite.
However, if you're going to use vector graphics that don't use textures (see area2048 on YouTube, or Rez), then most of the advice above will not apply, and such a game won't be very different from a 3D game. In this case it'll be reasonable to use vertex arrays, vertex buffer objects or display lists (depending on what is available) and to use the matrix functions, because your bottleneck will be vertex processing. You'll still have to minimize the number of state switches.

OpenGL slows down when a lot of stuff is on the screen

So I just started switching over from SDL to OpenGL today and I'm having this problem which I didn't have when I was using SDL.
When there's a lot of stuff on the screen the whole thing goes into slow motion. And when I say a lot I mean 200+ objects, but it starts to be noticeable at maybe 50.
This is how things are rendered: I have a class Renderable with a virtual void render(), which is called by the RenderManager in a loop, void manage(), that calls render() for every Renderable on screen.
The main loop looks like this:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glLoadIdentity();
_renderManager.manage();
glFlush();
SDL_GL_SwapBuffers();
The objects I'm using are only squares, so render() is just:
glBegin(GL_QUADS);
// Draw square with colors
glEnd();
My CPU usage or memory usage don't seem to be high at all, it's just like... the game is slowing down.
I guess the problem is that you are using immediate mode, you should use Vertex Arrays if you have performance problems.
We don't know all of your code, so it's difficult to give a complete answer, but switching to vertex arrays is surely the first step to take if things run slowly.
Take a look here: http://www.songho.ca/opengl/gl_vertexarray.html
Basically the fact is that with glBegin...glEnd you end up doing many calls to the GPU while with vertex arrays you precompute your shapes, you save them in buffers and directly draw them reducing the number of calls by a significant amount.
Sorry, I'd never used OpenGL before and really had no idea what I was doing, but I've found a solution (and am working on it now!). Also, thank you Jack for your answer! I'm definitely using vertex arrays now; I've found that glBegin() and glEnd() are deprecated. I might even try vertex buffer objects.
The problem was that each individual Renderable was calling glBegin() and glEnd() when the RenderManager called render() for each Renderable each loop. That causes a lot of stress for the GPU.
At the time of writing this answer I have a sendVertices(GLfloat vertices[]); in my RenderManager, which adds all vertices to a std::vector<GLfloat>. During each RenderManager loop I set a vertex pointer
glVertexPointer(2, GL_FLOAT, 0, &vertices[0]);
then call
glDrawArrays(GL_QUADS, 0, vertices.size() / 2);
So instead of rendering each object separately, it renders everything at once from the collected vertices. Now I only start to see slowdowns at 800+ objects. That's not great, and there's still a lot of work to do: right now the vertices vector is recreated each loop rather than modified, and this doesn't account for colors yet. But I'm on the right track!
When I originally switched to vertex arrays it didn't make much of a difference, because everything was still rendered per object, so glDrawArrays() was being called for each of the 200+ Renderables.
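The two open points (recreating the vector every loop, and missing colors) could be handled along the lines of the sketch below. QuadBatch is a hypothetical class, not part of the poster's RenderManager: Renderables append quads to it, a second array carries one RGBA color per vertex, and clearing at the start of a frame keeps the vectors' capacity, so the storage is reused rather than reallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame batch that objects append to instead of drawing themselves.
class QuadBatch {
public:
    // Appends one quad: 4 corners as x, y pairs, all sharing one RGBA color.
    void addQuad(const float (&corners)[8], const float (&rgba)[4]) {
        positions_.insert(positions_.end(), corners, corners + 8);
        for (int v = 0; v < 4; ++v)
            colors_.insert(colors_.end(), rgba, rgba + 4);
        assert(colors_.size() == vertexCount() * 4);
    }
    // Start of a frame: clear() empties the vectors but leaves their capacity unchanged.
    void beginFrame() {
        positions_.clear();
        colors_.clear();
    }
    std::size_t vertexCount() const { return positions_.size() / 2; }
    const float* positionData() const { return positions_.data(); }
    const float* colorData() const { return colors_.data(); }
    std::size_t positionCapacity() const { return positions_.capacity(); }

private:
    std::vector<float> positions_;  // x, y per vertex
    std::vector<float> colors_;     // r, g, b, a per vertex
};
```

At draw time, with GL_VERTEX_ARRAY and GL_COLOR_ARRAY enabled, the batch could be submitted with glVertexPointer(2, GL_FLOAT, 0, batch.positionData()), glColorPointer(4, GL_FLOAT, 0, batch.colorData()) and one glDrawArrays(GL_QUADS, 0, batch.vertexCount()).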
EDIT:
Also sorry for not giving enough information, I guess I assumed the problem would be obvious. What confidence I have in myself, huh? Haha.

Dynamic tile display optimization in OpenGL

I am working on a tile-based, top-down 2D game with dynamically generated terrain, and started (re)writing the graphics engine in OpenGL. The game is written in Java using LWJGL, and I'd prefer it to stay relatively platform-independent, and playable on older computers too.
Currently I'm using immediate mode for drawing, but obviously this is too slow for anything but the simplest scenes.
There are two basic types of objects that are drawn: Tiles, which make up the world, and Sprites, which are pretty much everything else (entities, items, effects, etc.).
The tiles are 20*20 px, and are stored in chunks (40*40 tiles). Terrain generation is done in full chunks, like in Minecraft.
The method I use now is iterating over the 9 chunks near the player, and then iterating over each tile inside, drawing one quad for the tile texture, and optional extra quads for features depending on the material.
This ends up quite slow, but a simple out-of-view check gives a 5-10x FPS boost.
For optimizing this, I looked into using VBOs and quad strips, but I have a problem when terrain changes. This doesn't happen every frame, but not a very rare event either.
A simple method would be dropping and rebuilding a chunk's VBO every time it changes. This doesn't seem the best way though. I read that VBOs can be "dynamic" allowing their content to be changed. How can this be done, and what data can be changed inside them efficiently? Are there any other ways for efficiently drawing the world?
The other type, sprites, are currently drawn as a quad with a texture mapped from a sprite sheet. So by changing texture coordinates, I can even animate them later. Is this the correct way to do the animation, though?
Currently even a very high number of sprites won't slow the game down much, and by understanding VBOs, I'll be able to speed them up even more, but I haven't seen any solid and reliable tutorials for an efficient way of doing this. Does anyone know one perhaps?
Thanks for the help!
Currently I'm using immediate mode for drawing, but obviously this is too slow for anything but the simplest scenes.
I disagree. Unless you are drawing a lot of tiles (tens of thousands per frame), immediate mode should be just fine for you.
The key is something you will have to be doing to get good performance anyway: texture atlases. All of your tiles should be stored in a single texture. You use texture coordinate to pull different tiles out of that texture when rendering. So if this is what your render loop looks like now:
for(tile in tileList) //Pseudocode. Not actual C++
{
glBindTexture(GL_TEXTURE_2D, tile.texture);
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f);
glVertex2fv(tile.lowerLeft);
glTexCoord2f(0.0f, 1.0f);
glVertex2fv(tile.upperLeft);
glTexCoord2f(1.0f, 1.0f);
glVertex2fv(tile.upperRight);
glTexCoord2f(1.0f, 0.0f);
glVertex2fv(tile.lowerRight);
glEnd();
}
You can convert it into this:
glBindTexture(GL_TEXTURE_2D, allTilesTexture);
glBegin(GL_QUADS);
for(tile in tileList) //Still pseudocode.
{
glTexCoord2fv(tile.texCoord.lowerLeft); // the "v" variants take a pointer to 2 floats
glVertex2fv(tile.lowerLeft);
glTexCoord2fv(tile.texCoord.upperLeft);
glVertex2fv(tile.upperLeft);
glTexCoord2fv(tile.texCoord.upperRight);
glVertex2fv(tile.upperRight);
glTexCoord2fv(tile.texCoord.lowerRight);
glVertex2fv(tile.lowerRight);
}
glEnd();
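The tile.texCoord values in the pseudocode have to come from somewhere. Assuming an atlas laid out as a regular grid of equally sized tiles (a hypothetical layout; packed atlases store a rectangle per tile instead), they can be computed from a tile index. atlasTile and TexRect are hypothetical names.

```cpp
#include <cassert>

// Texture-space rectangle of one tile: (u0, v0) is the lower-left corner.
struct TexRect { float u0, v0, u1, v1; };

// Tile `tileIndex` in a grid atlas with tilesPerRow x tilesPerColumn tiles,
// numbered row by row starting at v = 0.
TexRect atlasTile(int tileIndex, int tilesPerRow, int tilesPerColumn) {
    assert(tilesPerRow > 0 && tilesPerColumn > 0);
    assert(tileIndex >= 0 && tileIndex < tilesPerRow * tilesPerColumn);
    const float w = 1.0f / tilesPerRow;     // width of one tile in texture space
    const float h = 1.0f / tilesPerColumn;  // height of one tile in texture space
    const int col = tileIndex % tilesPerRow;
    const int row = tileIndex / tilesPerRow;
    return {col * w, row * h, (col + 1) * w, (row + 1) * h};
}
```

Matching the pseudocode, lowerLeft would be (u0, v0), upperLeft (u0, v1), upperRight (u1, v1) and lowerRight (u1, v0). The same lookup works for animating sprites from a sheet: compute a frame index from the elapsed time and pass it as tileIndex. With GL_LINEAR filtering, neighbouring tiles can bleed in at the edges; insetting the rectangle by half a texel or using GL_NEAREST are common workarounds.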
If you are already using a texture atlas and still aren't getting acceptable performance, then you can move on to buffer objects and the like. But you won't get any better performance from buffer objects if you don't do this first.
If all of your tiles cannot fit into a single texture, then you will need to do one of two things: use multiple textures (rendering as many tiles with each texture in one glBegin/glEnd pair as possible), or use a texture array. Texture arrays are available in OpenGL 3.0-level hardware only. That means any Radeon HDxxxx or GeForce 8xxxx or better.
You mentioned that you sometimes render "features" on top of tiles. These features likely use blending and different glTexEnv modes from regular tiles. In this case, you need to find ways to group similar features into a single glBegin/glEnd pair.
As you may be gathering from this, the key to performance is minimizing the number of times you call glBindTexture and glBegin/glEnd. Do as much work as possible in each glBegin/glEnd.
If you wish to proceed with a buffer-based approach (and you should only bother if the texture atlas approach didn't get your performance up to par), it's fairly simple. Put all of your tile "chunks" into a single buffer object. Don't make a buffer for each one; there's no real reason to do so, and 40x40 tiles worth of vertex data is only 12,800 bytes. You can put 81 such chunks in a single 1MB buffer. This way, you only have to call glBindBuffer for your terrain. Which again, saves you performance.
I would need to know more about these "features" you sometimes use to suggest a way to optimize them. But as for dynamic buffers, I wouldn't worry. Just use glBufferSubData to update the part of the buffer in question. If this turns out to be slow, there are several options for making it faster that you can employ. But you shouldn't bother unless you know that it is necessary, since they're complex.
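Keeping all chunks in one buffer and updating them with glBufferSubData comes down to offset arithmetic. The sketch below makes the bytes-per-tile figure a parameter, because it depends on the vertex format: the 12,800-byte figure above works out to 8 bytes per tile (12,800 / 1,600), whereas a format with 4 vertices per tile, each holding 2 position and 2 texture-coordinate floats, would need 64 bytes per tile (102,400 bytes per chunk). ChunkLayout is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Layout of equally sized chunks stored back to back in one buffer object.
struct ChunkLayout {
    std::size_t tilesPerChunk;  // e.g. 40 * 40
    std::size_t bytesPerTile;   // depends on the chosen vertex format

    std::size_t bytesPerChunk() const { return tilesPerChunk * bytesPerTile; }

    // Byte offset of a chunk: the `offset` argument for glBufferSubData when
    // re-uploading that whole chunk.
    std::size_t chunkOffset(std::size_t chunkIndex) const { return chunkIndex * bytesPerChunk(); }

    // Byte offset of a single tile, for updating only the tile that changed.
    std::size_t tileOffset(std::size_t chunkIndex, std::size_t tileIndex) const {
        assert(tileIndex < tilesPerChunk);
        return chunkOffset(chunkIndex) + tileIndex * bytesPerTile;
    }
};
```

When one tile changes, a call like glBufferSubData(GL_ARRAY_BUFFER, layout.tileOffset(chunk, tile), layout.bytesPerTile, tileVertices) would replace just that tile's vertices, where tileVertices is a hypothetical pointer to the new data.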
Sprites are probably something that benefits the absolute least from a buffer object approach. There's really nothing to be gained by it over immediate mode. Even if you're rendering hundreds of them, each one will have its own transformation matrix. Which means that each one will have to be a separate draw call. So it may as well be glBegin/glEnd.

How to draw to a texture in OpenGL

Now that my OpenGL application is getting larger and more complex, I am noticing that it's also getting a little slow on very low-end systems such as netbooks. In Java, I am able to get around this by drawing to a BufferedImage, then drawing that to the screen and updating the cached render every once in a while. How would I go about doing this in OpenGL with C++?
I found a few guides, but they seem to only work on newer hardware or specific Nvidia cards. Since the cached rendering will only be updated every once in a while, I can sacrifice speed for compatibility.
glBegin(GL_QUADS);
setColor(DARK_BLUE);
glVertex2f(0, 0); //TL
glVertex2f(appWidth, 0); //TR
setColor(LIGHT_BLUE);
glVertex2f(appWidth, appHeight); //BR
glVertex2f(0, appHeight); //BL
glEnd();
This is something that I am especially concerned about. A gradient that takes up the entire screen is being re-drawn many times per second. How can I cache it to a texture then just draw that texture to increase performance?
Also, a trick I use in Java is to render it to a 1 x height texture and then scale that to width x height to improve performance and lower memory usage. Is there a similar trick in OpenGL?
If you don't want to use Framebuffer Objects for compatibility reasons (but they are pretty widely available), you don't want to use the legacy (and non portable) Pbuffers either. That leaves you with the simple possibility of reading the contents of the framebuffer with glReadPixels and creating a new texture with that data using glTexImage2D.
Let me add that I don't really think you are going to gain much in your case. Drawing a texture on screen requires at least one texel access per pixel; that's not really a huge saving if the alternative is just interpolating a color, as you are doing now!
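If you still want to try the 1 x height trick from the question, the texel data can be generated directly instead of rendered and read back. This is a sketch only: gradientColumn and RGB8 are hypothetical names, and the endpoint colors stand in for the question's DARK_BLUE and LIGHT_BLUE, whose values aren't shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB8 { std::uint8_t r, g, b; };

// Texel data (tightly packed RGB) for a 1 x height texture whose rows blend
// linearly from firstRow to lastRow.
std::vector<std::uint8_t> gradientColumn(int height, RGB8 firstRow, RGB8 lastRow) {
    assert(height >= 2);
    std::vector<std::uint8_t> texels;
    texels.reserve(static_cast<std::size_t>(height) * 3);
    for (int row = 0; row < height; ++row) {
        const float t = static_cast<float>(row) / (height - 1);  // 0 at first row, 1 at last
        auto lerp = [t](std::uint8_t a, std::uint8_t b) {
            return static_cast<std::uint8_t>(a + (b - a) * t + 0.5f);  // rounded to nearest
        };
        texels.push_back(lerp(firstRow.r, lastRow.r));
        texels.push_back(lerp(firstRow.g, lastRow.g));
        texels.push_back(lerp(firstRow.b, lastRow.b));
    }
    return texels;
}
```

Because each row of a 1-texel-wide RGB image is only 3 bytes, glPixelStorei(GL_UNPACK_ALIGNMENT, 1) is needed before glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 1, height, 0, GL_RGB, GL_UNSIGNED_BYTE, texels.data()); with GL_LINEAR filtering the texture can then be stretched across the screen. As noted above, this is not expected to be faster than the plain interpolated quad.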
I sincerely doubt drawing from a texture is less work than drawing a gradient.
In drawing a gradient:
Color is interpolated at every pixel
In drawing a texture:
Texture coordinate is interpolated at every pixel
Color is still interpolated at every pixel
Texture lookup for every pixel
Multiply lookup color with current color
Not that either of these are slow, but drawing untextured polygons is pretty much as fast as it gets.
Hey there, thought I'd give you some insight into this.
There are essentially two ways to do it:
Frame Buffer Objects (FBOs) for more modern hardware, and the back buffer for a fall back.
The article from one of the previous posters is a good one to follow, and there are plenty of tutorials on Google for FBOs.
In my 2D engine (Phoenix), we decided to go with just the back buffer method. Our class was fairly simple; you can view the header and source here:
http://code.google.com/p/phoenixgl/source/browse/branches/0.3/libPhoenixGL/PhRenderTexture.h
http://code.google.com/p/phoenixgl/source/browse/branches/0.3/libPhoenixGL/PhRenderTexture.cpp
Hope that helps!
Consider using a display list rather than a texture. Texture reads (especially for large ones) are a good deal slower than 8 or 9 function calls.
Before doing any optimization you should make sure you fully understand the bottlenecks. You'll probably be surprised at the result.
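A minimal way to start finding the bottleneck is to time the CPU-side phases of a frame separately (update, batching, draw submission). timeMs is a hypothetical helper built on std::chrono::steady_clock. Keep in mind that OpenGL calls usually return before the GPU has finished, so timing a draw call this way mostly measures submission cost unless the work is forced to complete first (for example with glFinish(), which should only be used while profiling).

```cpp
#include <cassert>
#include <chrono>

// Wall-clock duration of a callable in milliseconds, using a monotonic clock.
template <typename F>
double timeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    assert(end >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Usage could look like double updateMs = timeMs([&] { updateParticles(dt); });, where updateParticles is a placeholder for your own code; logging such numbers per frame shows whether the time goes into the simulation or into rendering.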
Look into FBOs - framebuffer objects. It's an extension that lets you render to arbitrary rendertargets, including textures. This extension should be available on most recent hardware. This is a fairly good primer on FBOs: OpenGL Frame Buffer Object 101