I'm new to OpenGL. I'm using JOGL.
I have a WorldEntity class that represents a thing that can be rendered. It has attributes like position and size. To render, I've been using this method:
/**
* Renders the object in the world.
*/
public void render() {
gl.glTranslatef(getPosition().x, getPosition().y, getPosition().z);
gl.glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
// gl.glScalef(size, size, size);
gl.glCallList(drawID);
// gl.glScalef(1/size, 1/size, 1/size);
gl.glRotatef(-getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
gl.glTranslatef(-getPosition().x, -getPosition().y, -getPosition().z);
}
The pattern I've been using is applying each attribute of the entity (like position or rotation), then undoing it to avoid corrupting the state for the next entity to get rendered.
Uncommenting out the scaling lines causes the app to be much more sluggish as it renders a modest scene on my modest computer. I'm guessing that the float division is too much to handle thousands of operations per second. (?)
What is the correct way to go about this? Can I find a less computationally intensive way to undo a scaling transformation? Do I need to sort objects by scale and draw them in order to reduce scaling transformations required?
Thanks.
This is where you use matrices (bear with me, I come from a OpenGL/C programming background):
glMatrixMode(GL_MODELVIEW); // set the matrix mode to manipulate models
glPushMatrix(); // push the matrix onto the matrix stack
// apply transformations
glTranslatef(getPosition().x, getPosition().y, getPosition().z);
glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
glScalef(size, size, size);
glCallList(drawID); // drawing here
glPopMatrix(); // get your original matrix back
... at least, that's what I think it is.
It's very unlikely the divisions will cause any perf issue. rfw gave you the usual way of implementing this, but my guess is that your "slugish" rendering is mostly due to the fact that your GPU is the bottleneck, and using the matrix stacks will not improve perf.
When you increase the size of your drawn objects, more pixels have to be processed, and the GPU has to work significantly harder. What your CPU does at this point (the divisions) is irrelevant.
To prove my point, try to keep the scaling code in, but with sizes around 1.
Related
This is a followup to my question here: Is it okay to have a SDL_Surface and SDL_Texture for each sprite?
I made an class called entity each having a SDL_Texture, which is set in the constructor and then a member function render() is called for every onscreen entity in a vector, which uses SDL_RenderCopy() to draw to the renderer.
This render() function includes generating rectangles for each sprite based on their position/cameradata
Is this okay? Is there a faster way?
I made a testlevel with 96 sprites that each take up 2% of the screen with tons of overdraw and ft is 15ms (~65fps)at a resolution of1600x900. Seems a little slow for just some sprites, and my computer breathes much heavier then when playing a full game such as spelunky or isaac.
Prefer frame time over FPS
You want to measure and judge your performance based on the frame time not FPS. Because the relation between the two is not linear. Going from 20 FPS to 30 FPS needs about 16.7 ms worth of optimization. That is the same amount of performance gain in optimization it takes to get from 30 FPS to 60 FPS. So if you judge performance based on FPS you would come to conclusion that a particular "optimization" that increased the FPS from 30 to 60 is better that the one that made a 20 FPS scene run 31 FPS. while the latter is actually a better optimization.
Batch your draws
If you pack all your textures into one and store each individual image's coordinates, you can use the same texture to draw many of your objects. This is limited by the size and number of your textures and also the maximum texture size supported in your environment. In my experiences 4096x4096 is safe but I prefer to use 2048x2048 "texture atlases". There are many utility programs to make such textures. You can easily find a suitable one by doing a Google search.
In this setup in addition to a SDL texture, each sprite also has the x, y, width and height of the region in the "big" texture containing the particular image needed. You can make a TextureRegion class. Each sprite then has a TextureRegion. This whole process is often referred to as batching. Look it up. The whole idea is to minimize state changes. I am not sure if it applies to software rendering or to all of SDL2 backends.
Cache your transformations
Batching your sprites will increase the performance in the GPU side. The CPU bound code is another optimization opportunity. Instead of calculating the parameters of SDL_RenderCopy in each frame, calculate them once and cache them. Then when the position/rotation of the camera or object changes, recalculate the cache. You can do this in "accessors" of your entity class (like setPosition, setRotaion, etc..). Note that instead of directly recalculating transform as soon as a position or rotation changes your want to flag the object as "dirty" and check for the dirty flag in the your render function. if this->isDirty Then recalculate and cache the transform. This prevents redundant calculations when you do this:
//if dirty flag is not used each of the following function calls
//would have resulted in a recalculation of transforms. However by
//using the dirty flag they will be calculated only once before
//the rendering of next frame in the render() function.
player->setPostion(start_x,start_y);
player->setRotation(0);
camera->reset();
So, I've done some more testing by examining the memory/cpu usage of this program at full screen with a "demanding" level and managed to make it similar to other games by enforcing a framerate cap with SDL_Wait()
float g_max_framerate = 60;
float g_max_frametime = 1/g_max_framerate * 1000;
...
while (!quit) {
lastticks = ticks;
ticks = SDL_GetTicks();
elapsed = ticks - lastticks;
...
SDL_RenderPresent(renderer);
//lock framerate
if(elapsed < g_max_frametime) {
SDL_Delay(g_max_frametime - elapsed);
}
}
With this limitation it is appropriatly lowspec.
i'm rendering single-pixel points into a uint32-texture with a compute shader. the texture is a 3d texture, x and y are viewport coordinates, z has depth information on coordinate 0 and additional attributes on 1. so two manually built rendertargets, if you will. code looks like this:
layout (r32ui, binding = 0) coherent volatile uniform uimage3D renderBuffer;
layout (rgba32f, binding = 1) restrict readonly uniform imageBuffer pointBuffer;
for(int j = 0; j < numPoints / gl_WorkGroupSize.x + 1; j++)
{
vec4 point = imageLoad(pointBuffer, ...)
// ... transform point ...
uint originalDepth = imageAtomicMin(renderBuffer, ivec3(imageCoords, 0), point.depth);
if (originalDepth >= point.depth)
{
// write happened, store the attributes
imageStore(renderBuffer, ivec3(imageCoords, 1), point.attributes);
}
}
while the depth values are correct, i have a few pixels where the attributes flicker between two values.
the order of points in the pointBuffer is random (but i've verified the set of all points is always the same), so my first thought was that two equal depth values might change the output, depending on which one comes first. so i made it that, if originalDepth == point.depth it uses imageAtomicMax to always have the same of the two alternative attributes written, but that changed nothing.
i scattered barrier() and memoryBarrier() all over the place, but that changed nothing. i also removed all diverging control flow for this, changed nothing.
reducing the local work size to 32 removes 90% of the flickering, but some still remains.
any ideas would be greatly appreciated.
edit: before you ask why i do this stuff manually instead of using normal rasterization and fragment shaders, the reason is performance. the rasterizer does not help since i'm rendering single-pixel-points, shared memory greatly speeded things up, and i render each point multiple times, which required me to use a geometry shader which was slow.
The problem is this: you have a race condition on writing to renderBuffer. If two different CS invocations map to the same pixel, and both of them decide to write the value, then there is a race on your imageStore call. One may overwrite the other, it may be a partial overwrite, or something else entirely. But in any case, it's not guaranteed to work.
This would be best solved by doing what rasterizers do: break the process down into two separate phases. The first phase does the ... transform point ... part, writing that data out to a buffer. The second phase then goes through the points and writes them to the final image.
In phase 2, each CS invocation performs all of the processing for a particular output pixel. That way, there are no race conditions. Of course, that requires that phase 1 produces data in a way that can be ordered per-pixel.
There are several ways to go about the latter. You could use a linked list, with a list per-pixel. Or your could use a list per-workgroup, where a workgroup represents some X/Y region of pixel space. In that case, you would use local shared memory as your local depth buffer, with all CS invocations reading from/writing to that region. After they all get done processing pixels, you write it out to real memory. Basically, you'd be implementing tile-based rendering manually.
Indeed, if you have a lot of these points, a tile-based solution would allow you to incorporate pipelining, so that you don't have to wait until all of phase 1 is done before starting on some of phase 2. You could break phase 1 down into chunks. You start a couple of phase 1 chunks, then a phase 2 chunk that reads from the first phase 1, then another phase 1, and so forth.
Vulkan with its event system, has better tools for building such an efficient dependency chain than OpenGL.
I want to draw a ring (circle with big border) with the shaperenderer.
I tried two different solutions:
Solution: draw n-circles, each with 1 pixel width and 1 pixel bigger than the one before. Problem with that: it produces a graphic glitch. (also with different Multisample Anti-Aliasing values)
Solution: draw one big filled circle and then draw a smaller one with the backgroundcolor. Problem: I can't realize overlapping ring shapes. Everything else works fine.
I can't use a ring texture, because I have to increase/decrease the ring radius dynamic. The border-width should always have the same value.
How can I draw smooth rings with the shaperenderer?
EDIT:
Increasing the line-width doesn't help:
MeshBuilder has the option to create a ring using the ellipse method. It allows you to specify the inner and outer size of the ring. Normally this would result in a Mesh, which you would need to render yourself. But because of a recent change it is also possible to use in conjunction with PolygonSpriteBatch (an implementation of Batch that allows more flexible shapes, while SpriteBatch only allows quads). You can use PolygonSpriteBatch instead of where you normally would use a SpriteBatch (e.g. for your Stage or Sprite class).
Here is an example how to use it: https://gist.github.com/xoppa/2978633678fa1c19cc47, but keep in mind that you do need the latest nightly (or at least release 1.6.4) for this.
Maybe you can try making a ring some other way, such as using triangles. I'm not familiar with LibGDX, so here's some
pseudocode.
// number of sectors in the ring, you may need
// to adapt this value based on the desired size of
// the ring
int sectors=32;
float outer=0.8; // distance to outer edge
float inner=1.2; // distance to inner edge
glBegin(GL_TRIANGLES)
glNormal3f(0,0,1)
for(int i=0;i<sectors;i++){
// define each section of the ring
float angle=(i/sectors)*Math.PI*2
float nextangle=((i+1)/sectors)*Math.PI*2
float s=Math.sin(angle)
float c=Math.cos(angle)
float sn=Math.sin(nextangle)
float cn=Math.cos(nextangle)
glVertex3f(inner*c,inner*s,0)
glVertex3f(outer*cn,outer*sn,0)
glVertex3f(outer*c,outer*s,0)
glVertex3f(inner*c,inner*s,0)
glVertex3f(inner*cn,inner*sn,0)
glVertex3f(outer*cn,outer*sn,0)
}
glEnd()
Alternatively, divide the ring into four polygons, each of which consists of one quarter of the whole ring. Then use ShapeRenderer to fill each of these polygons.
Here's an illustration of how you would divide the ring:
if I understand your question,
maybe, using glLineWidth(); help you.
example pseudo code:
size = 5;
Gdx.gl.glLineWidth(size);
mShapeRenderer.begin(....);
..//
mShapeRenderer.end();
I'm trying to get the hang of moving objects (in general) and line strips (in particular) most efficiently in opengl and therefore I'm writing an application where multiple line segments are traveling with a constant speed from right to left. At every time point the left most point will be removed, the entire line will be shifted to the left, and a new point will be added at the very right of the line (this new data point is streamed / received / calculated on the fly, every 10ms or so). To illustrate what I mean, see this image:
Because I want to work with many objects, I decided to use vertex buffer objects in order to minimize the amount of gl* calls. My current code looks something like this:
A) setup initial vertices:
# calculate my_func(x) in range [0, n]
# (could also be random data)
data = my_func(0, n)
# create & bind buffer
vbo_id = GLuint()
glGenBuffers(1, vbo_id);
glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
# allocate memory & transfer data to GPU
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW)
B) update vertices:
draw():
# get new data and update offset
data = my_func(n+dx, n+2*dx)
# update offset 'n' which is the current absolute value of x.
n = n + 2*dx
# upload data
glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
glBufferSubData(GL_ARRAY_BUFFER, n, sizeof(data), data)
# translate scene so it looks like line strip has moved to the left.
glTranslatef(-local_shift, 0.0, 0.0)
# draw all points from offset
glVertexPointer(2, GL_FLOAT, 0, n)
glDrawArrays(GL_LINE_STRIP, 0, points_per_vbo)
where my_func would do something like this:
my_func(start_x, end_x):
# generate the correct x locations.
x_values = range(start_x, end_x, STEP_SIZE)
# generate the y values. We could be getting these values from a sensor.
y_values = []
for j in x_values:
y_values.append(random())
data = []
for i, j in zip(x_values, y_values):
data.extend([i, j])
return data
This works just fine, however if I have let's say 20 of those line strips that span the entire screen, then things slow down considerably.
Therefore my questions:
1) should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance wise?
2) should I use a shader for moving objects (here line strip) instead of calling glTranslatef? If so, how would such a shader look like? (I suspect that a shader is the wrong way to go, since my line strip is NOT a period function but rather contains random data).
3) what happens if the window get's resized? how do I keep aspect ratio and scale vertices accordingly? glViewport() only helps scaling in y direction, not in x direction. If the window is rescaled in x-direction, then in my current implementation I would have to recalculate the position of the entire line strip (calling my_func to get the new x coordinates) and upload it to the GPU. I guess this could be done more elegantly? How would I do that?
4) I noticed that when I use glTranslatef with a non integral value, the screen starts to flicker if the line strip consists of thousands of points. This is most probably because the fine resolution that I use to calculate the line strip does not match the pixel resolution of the screen and therefore sometimes some points appear in front and sometimes behind other points (this is particularly annoying when you don't render a sine wave but some 'random' data). How can I prevent this from happening (besides the obvious solution of translating by a integer multiple of 1 pixel)? If a window get re-sized from let's say originally 800x800 pixels to 100x100 pixels and I still want to visualize a line strip of 20 seconds, then shifting in x direction must work flicker free somehow with sub pixel precision, right?
5) as you can see I always call glTranslatef(-local_shift, 0.0, 0.0) - without ever doing the opposite. Therefore I keep shifting the entire view to the right. And that's why I need to keep track of the absolute x position (in order to place new data at the correct location). This problem will eventually lead to an artifact, where the line is overlapping with the edges of the window. I guess there must be a better way for doing this, right? Like keeping the x values fixed and just moving & updating the y values?
EDIT I've removed the sine wave example and replaced it with a better example. My question is generally about how to move line strips in space most efficiently (while adding new values to them). Therefore any suggestions like "precompute the values for t -> infinity" don't help here (I could also just be drawing the current temperature measured in front of my house).
EDIT2
Consider this toy example where after each time step, the first point is removed and a new one is added to the end:
t = 0
*
* * *
* **** *
1234567890
t = 1
*
* * * *
**** *
2345678901
t = 2
* *
* * *
**** *
3456789012
I don't think I can use a shader here, can I?
EDIT 3: example with two line strips.
EDIT 4: based on Tim's answer I'm using now the following code, which works nicely, but breaks the line into two (since I have two calls of glDrawArrays), see also the following two screenshots.
# calculate the difference
diff_first = x[1] - x[0]
''' first part of the line '''
# push the matrix
glPushMatrix()
move_to = -(diff_first * c)
print 'going to %d ' % (move_to)
glTranslatef(move_to, 0, 0)
# format of glVertexPointer: nbr points per vertex, data type, stride, byte offset
# calculate the offset into the Vertex
offset_bytes = c * BYTES_PER_POINT
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)
# format of glDrawArrays: mode, Specifies the starting index in the enabled arrays, nbr of points
nbr_points_to_render = (nbr_points - c)
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
# pop the matrix
glPopMatrix()
''' second part of the line '''
# push the matrix
glPushMatrix()
move_to = (nbr_points - c) * diff_first
print 'moving to %d ' %(move_to)
glTranslatef(move_to, 0, 0)
# select the vertex
offset_bytes = 0
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)
# draw the line
nbr_points_to_render = c
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
# pop the matrix
glPopMatrix()
# update counter
c += 1
if c == nbr_points:
c = 0
EDIT5 the resulting solution must obviously render one line across the screen - and no two lines that are missing a connection. The circular buffer solution by Tim provides a solution on how to move the plot, but I end up with two lines, instead of one.
Here's my thoughts to the revised question:
1) should I use glMapBuffer to bind the buffer on the GPU and fill the
data directly (instead of using glBufferSubData)? Or will this make no
difference performance wise?
I'm not aware that there is any significant performance between the two, though I would probably prefer glBufferSubData.
What I might suggest in your case is to create a VBO with N floats, and then use it similar to a circular buffer. Keep an index locally to where the 'end' of the buffer is, then every update replace the value under 'end' with the new value, and increment the pointer. This way you only have to update a single float each cycle.
Having done that, you can draw this buffer using 2x translates and 2x glDrawArrays/Elements:
Imagine that you've got an array of 10 elements, and the buffer end pointer is at element 4. Your array will contain the following 10 values, where x is a constant value, and f(n-d) is the random sample from d cycles ago:
0: (0, f(n-4) )
1: (1, f(n-3) )
2: (2, f(n-2) )
3: (3, f(n-1) )
4: (4, f(n) ) <-- end of buffer
5: (5, f(n-9) ) <-- start of buffer
6: (6, f(n-8) )
7: (7, f(n-7) )
8: (8, f(n-6) )
9: (9, f(n-5) )
To draw this (pseudo-guess code, might not be exactly correct):
glTranslatef( -end, 0, 0);
glDrawArrays( LINE_STRIP, end+1, (10-end)); //draw elems 5-9 shifted left by 4
glPopMatrix();
glTranslatef( end+1, 0, 0);
glDrawArrays(LINE_STRIP, 0, end); // draw elems 0-4 shifted right by 5
Then in the next cycle, replace the oldest value with the new random value,and shift the circular buffer pointer forward.
2) should I use a shader for moving objects (here line strip) instead
of calling glTranslatef? If so, how would such a shader look like? (I
suspect that a shader is the wrong way to go, since my line strip is
NOT a period function but rather contains random data).
Probably optional, if you use the method that I've described in #1. There's not a particular advantage to using one here.
3) what happens if the window get's resized? how do I keep aspect
ratio and scale vertices accordingly? glViewport() only helps scaling
in y direction, not in x direction. If the window is rescaled in
x-direction, then in my current implementation I would have to
recalculate the position of the entire line strip (calling my_func to
get the new x coordinates) and upload it to the GPU. I guess this
could be done more elegantly? How would I do that?
You shouldn't have to recalculate any data. Just define all your data in some fixed coordinate system that makes sense to you, and then use projection matrix to map this range to the window. Without more specifics its hard to answer.
4) I noticed that when I use glTranslatef with a non integral value,
the screen starts to flicker if the line strip consists of thousands
of points. This is most probably because the fine resolution that I
use to calculate the line strip does not match the pixel resolution of
the screen and therefore sometimes some points appear in front and
sometimes behind other points (this is particularly annoying when you
don't render a sine wave but some 'random' data). How can I prevent
this from happening (besides the obvious solution of translating by a
integer multiple of 1 pixel)? If a window get re-sized from let's say
originally 800x800 pixels to 100x100 pixels and I still want to
visualize a line strip of 20 seconds, then shifting in x direction
must work flicker free somehow with sub pixel precision, right?
Your assumption seems correct. I think the thing to do here would either to enable some kind of antialiasing (you can read other posts for how to do that), or make the lines wider.
There are a number of things that could be at work here.
glBindBuffer is one of the slowest OpenGL operations (along with similar call for shaders, textures, etc.)
glTranslate adjusts the modelview matrix, which the vertex unit multiplies all points by. So, it simply changes what matrix you multiply by. If you were to instead use a vertex shader, then you'd have to translate it for each vertex individually. In short: glTranslate is faster. In practice, this shouldn't matter too much, though.
If you're recalculating the sine function on a lot of points every time you draw, you're going to have performance issues (especially since, by looking at your source, it looks like you might be using Python).
You're updating your VBO every time you draw it, so it's not any faster than a vertex array. Vertex arrays are faster than intermediate mode (glVertex, etc.) but nowhere near as fast as display lists or static VBOs.
There could be coding errors or redundant calls somewhere.
My verdict:
You're calculating a sine wave and an offset on the CPU. I strongly suspect that most of your overhead comes from calculating and uploading different data every time you draw it. This is coupled with unnecessary OpenGL calls and possibly unnecessary local calls.
My recommendation:
This is an opportunity for the GPU to shine. Calculating function values on parallel data is (literally) what the GPU does best.
I suggest you make a display list representing your function, but set all the y-coordinates to 0 (so it's a series of points all along the line y=0). Then, draw this exact same display list once for every sine wave you want to draw. Ordinarily, this would just produce a flat graph, but, you write a vertex shader that transforms the points vertically into your sine wave. The shader takes a uniform for the sine wave's offset ("sin(x-offset)"), and just changes each vertex's y.
I estimate this will make your code at least ten times faster. Furthermore, because the vertices' x coordinates are all at integral points (the shader does the "translation" in the function's space by computing "sin(x-offset)"), you won't experience jittering when offsetting with floating point values.
You've got a lot here, so I'll cover what I can. Hopefully this will give you some areas to research.
1) should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance wise?
I would expect glBufferSubData to have better performance. If the data is stored on the GPU then mapping it will either
Copy the data back into host memory so you can modify it, and the copy it back when you unmap it.
or, give you a pointer to the GPU's memory directly which the CPU will access over PCI-Express. This isn't anywhere near as slow as it used to be to access GPU memory when we were on AGP or PCI, but it's still slower and not as well cached, etc, as host memory.
glSubBufferData will send the update of the buffer to the GPU and it will modify the buffer. No copying the back and fore. All data transferred in one burst. It should be able to do it as an asynchronous update of the buffer as well.
Once you get into "is this faster than that?" type comparisons you need to start measuring how long things take. A simple frame timer is normally sufficient (but report time per frame, not frames per second - it makes numbers easier to compare). If you go finer-grained than that, just be aware that because of the asynchronous nature of OpenGL, you often see time being consumed away from the call that caused the work. This is because after you give the GPU a load of work, it's only when you have to wait for it to finish something that you notice how long it's taking. That normally only happens when you're waiting for front/back buffers to swap.
2) should I use a shader for moving objects (here line strip) instead of calling glTranslatef? If so, how would such a shader look like?
No difference. glTranslate modifies a matrix (normally the Model-View) which is then applied to all vertices. If you have a shader you'd apply a translation matrix to all your vertices. In fact the driver is probably building a small shader for you already.
Be aware that the older APIs like glTranslate() are depreciated from OpenGL 3.0 onwards, and in modern OpenGL everything is done with shaders.
3) what happens if the window get's resized? how do I keep aspect ratio and scale vertices accordingly? glViewport() only helps scaling in y direction, not in x direction.
glViewport() sets the size and shape of the screen area that is rendered to. Quite often it's called on window resizing to set the viewport to the size and shape of the window. Doing just this will cause any image rendered by OpenGL to change aspect ratio with the window. To keep things looking the same you also have to control the projection matrix to counteract the effect of changing the viewport.
Something along the lines of:
glViewport(0,0, width, height);
glMatrixMode(GL_PROJECTION_MATRIX);
glLoadIdentity();
glScale2f(1.0f, width / height); // Keeps X scale the same, but scales Y to compensate for aspect ratio
That's written from memory, and I might not have the maths right, but hopefully you get the idea.
4) I noticed that when I use glTranslatef with a non integral value, the screen starts to flicker if the line strip consists of thousands of points.
I think you're seeing a form of aliasing which is due to the lines moving under the sampling grid of the pixels. There are various anti-aliasing techniques you can use to reduce the problem. OpenGL has anti-aliased lines (glEnable(GL_SMOOTH_LINE)), but a lot of consumer cards didn't support it, or only did it in software. You can try it, but you may get no effect or run very slowly.
Alternatively you can look into Multi-sample anti-aliasing (MSAA), or other types that your card may support through extensions.
Another option is rendering to a high resolution texture (via Frame Buffer Objects - FBOs) and then filtering it down when you render it to the screen as a textured quad. This would also allow you to do a trick where you move the rendered texture slightly to the left each time, and rendered the new strip on the right each frame.
1 1
1 1 1 Frame 1
11
1
1 1 1 Frame 1 is copied left, and a new line segment is added to make frame 2
11 2
1
1 1 3 Frame 2 is copied left, and a new line segment is added to make frame 3
11 2
It's not a simple change, but it might help you out with your problem (5).
I'm new to C++ and DirectX, I come from XNA.
I have developed a game like Fly The Copter.
What i've done is created a class named Wall.
While the game is running I draw all the walls.
In XNA I stored the walls in a ArrayList and in C++ I've used vector.
In XNA the game just runs fast and in C++ really slow.
Here's the C++ code:
void GameScreen::Update()
{
//Update Walls
int len = walls.size();
for(int i = wallsPassed; i < len; i++)
{
walls.at(i).Update();
if (walls.at(i).pos.x <= -40)
wallsPassed += 2;
}
}
void GameScreen::Draw()
{
//Draw Walls
int len = walls.size();
for(int i = wallsPassed; i < len; i++)
{
if (walls.at(i).pos.x < 1280)
walls.at(i).Draw();
else
break;
}
}
In the Update method I decrease the X value by 4.
In the Draw method I call sprite->Draw (Direct3DXSprite).
That the only codes that runs in the game loop.
I know this is a bad code, if you have an idea to improve it please help.
Thanks and sorry about my english.
Try replacing all occurrences of at() with the [] operator. For example:
walls[i].Draw();
and then turn on all optimisations. Both [] and at() are function calls - to get the maximum performance you need to make sure that they are inlined, which is what upping the optimisation level will do.
You can also do some minimal caching of a wall object - for example:
for(int i = wallsPassed; i < len; i++)
{
Wall & w = walls[i];
w.Update();
if (w.pos.x <= -40)
wallsPassed += 2;
}
Try to narrow the cause of the performance problem (also termed profiling). I would try drawing only one object while continue updating all the objects. If its suddenly faster, then its a DirectX drawing problem.
Otherwise try drawing all the objects, but updating only one wall. If its faster then your update() function may be too expensive.
How fast is 'fast'?
How slow is'really slow'?
How many sprites are you drawing?
How big is each one as an image file, and in pixels drawn on-screen?
How does performance scale (in XNA/C++) as you change the number of sprites drawn?
What difference do you get if you draw without updating, or vice versa
Maybe you just have forgotten to turn on release mode :) I had some problems with it in the past - I thought my code was very slow because of debug mode. If it's not it, you can have a problem with rendering part, or with huge count of objects. The code you provided looks good...
Have you tried multiple buffers (a.k.a. Double Buffering) for the bitmaps?
The typical scenario is to draw in one buffer, then while the first buffer is copied to the screen, draw in a second buffer.
Another technique is to have a huge "logical" screen in memory. The portion draw in the physical display is a viewport or view into a small area in the logical screen. Moving the background (or screen) just requires a copy on the part of the graphics processor.
You can aid batching of sprite draw calls. Presumably Your draw call calls your only instance of ID3DXSprite::Draw with the relevant parameters.
You can get much improved performance by doing a call to ID3DXSprite::Begin (with the D3DXSPRITE_SORT_TEXTURE flag set) and then calling ID3DXSprite::End when you've done all your rendering. ID3DXSprite will then sort all your sprite calls by texture to decrease the number of texture switches and batch the relevant calls together. This will improve performance massively.
Its difficult to say more, however, without seeing the internals of your Update and Draw calls. The above is only a guess ...
To draw every single wall with a different draw call is a bad idea. Try to batch the data into a single vertex buffer/index buffer and send them into a single draw. That's a more sane idea.
Anyway for getting an idea of WHY it goes slowly try with some CPU and GPU (PerfHud, Intel GPA, etc...) to know first of all WHAT's the bottleneck (if the CPU or the GPU). And then you can fight to alleviate the problem.
The lookups into your list of walls are unlikely to be the source of your slowdown. The cost of drawing objects in 3D will typically be the limiting factor.
The important parts are your draw code, the flags you used to create the DirectX device, and the flags you use to create your textures. My stab in the dark... check that you initialize the device as HAL (hardware 3d) rather than REF (software 3d).
Also, how many sprites are you drawing? Each draw call has a fair amount of overhead. If you make more than couple-hundred per frame, that will be your limiting factor.