Slow C++ DirectX 2D Game

I'm new to C++ and DirectX; I come from XNA.
I have developed a game like Fly The Copter.
What I've done is create a class named Wall.
While the game is running I draw all the walls.
In XNA I stored the walls in an ArrayList, and in C++ I've used a vector.
In XNA the game just runs fast; in C++ it is really slow.
Here's the C++ code:
void GameScreen::Update()
{
    // Update walls
    int len = walls.size();
    for (int i = wallsPassed; i < len; i++)
    {
        walls.at(i).Update();
        if (walls.at(i).pos.x <= -40)
            wallsPassed += 2;
    }
}
void GameScreen::Draw()
{
    // Draw walls
    int len = walls.size();
    for (int i = wallsPassed; i < len; i++)
    {
        if (walls.at(i).pos.x < 1280)
            walls.at(i).Draw();
        else
            break;
    }
}
In the Update method I decrease the X value by 4.
In the Draw method I call sprite->Draw (ID3DXSprite).
That's the only code that runs in the game loop.
I know this is bad code; if you have an idea to improve it, please help.
Thanks, and sorry about my English.

Try replacing all occurrences of at() with the [] operator. For example:
walls[i].Draw();
and then turn on all optimisations. Both [] and at() are function calls - to get the maximum performance you need to make sure that they are inlined, which is what upping the optimisation level will do.
You can also do some minimal caching of a wall object - for example:
for (int i = wallsPassed; i < len; i++)
{
    Wall & w = walls[i];
    w.Update();
    if (w.pos.x <= -40)
        wallsPassed += 2;
}

Try to narrow down the cause of the performance problem (also termed profiling). I would try drawing only one object while continuing to update all the objects. If it's suddenly faster, then it's a DirectX drawing problem.
Otherwise, try drawing all the objects but updating only one wall. If that's faster, then your Update() function may be too expensive.

How fast is 'fast'?
How slow is 'really slow'?
How many sprites are you drawing?
How big is each one as an image file, and in pixels drawn on-screen?
How does performance scale (in XNA/C++) as you change the number of sprites drawn?
What difference do you get if you draw without updating, or vice versa?

Maybe you have just forgotten to turn on release mode :) I had some problems with that in the past - I thought my code was very slow when it was really just debug mode. If that's not it, you may have a problem in the rendering part, or with a huge number of objects. The code you provided looks fine...

Have you tried multiple buffers (a.k.a. double buffering) for the bitmaps?
The typical scenario is to draw in one buffer and then, while the first buffer is being copied to the screen, draw in a second buffer.
Another technique is to keep a huge "logical" screen in memory. The portion drawn on the physical display is a viewport into a small area of the logical screen, so moving the background (or screen) just requires a copy on the part of the graphics processor.
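If you are on Direct3D 9 (which ID3DXSprite implies), the swap chain already provides the double buffering; a minimal sketch, assuming the usual device-creation code, of requesting an extra back buffer through the standard D3DPRESENT_PARAMETERS fields:
D3DPRESENT_PARAMETERS pp = {};
pp.Windowed         = TRUE;
pp.SwapEffect       = D3DSWAPEFFECT_DISCARD; // let the driver flip instead of copy
pp.BackBufferCount  = 2;                     // draw into one buffer while the other is presented
pp.BackBufferFormat = D3DFMT_UNKNOWN;        // match the current display mode (windowed only)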

You can aid batching of sprite draw calls. Presumably your Draw call invokes your only instance of ID3DXSprite::Draw with the relevant parameters.
You can get much improved performance by making a call to ID3DXSprite::Begin (with the D3DXSPRITE_SORT_TEXTURE flag set) and then calling ID3DXSprite::End when you've done all your rendering. ID3DXSprite will then sort all your sprite calls by texture to decrease the number of texture switches and batch the relevant calls together. This can improve performance massively.
It's difficult to say more, however, without seeing the internals of your Update and Draw calls. The above is only a guess...
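As a hedged sketch of what that looks like wrapped around the existing loop (the flags are the standard D3DX ones; the assumption is that each wall's Draw ends up calling sprite->Draw internally):
sprite->Begin(D3DXSPRITE_ALPHABLEND | D3DXSPRITE_SORT_TEXTURE);
for (int i = wallsPassed; i < len; i++)
    walls[i].Draw();  // each wall's Draw calls sprite->Draw(...) internally
sprite->End();        // flushes the texture-sorted, batched calls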

Drawing every single wall with a separate draw call is a bad idea. Try to batch the data into a single vertex buffer/index buffer and submit it in a single draw call; that's a saner approach.
Anyway, to get an idea of WHY it runs slowly, use some CPU and GPU profilers (PerfHUD, Intel GPA, etc...) to find out first of all WHAT the bottleneck is (the CPU or the GPU). Then you can fight to alleviate the problem.

The lookups into your list of walls are unlikely to be the source of your slowdown. The cost of drawing objects in 3D will typically be the limiting factor.
The important parts are your draw code, the flags you used to create the DirectX device, and the flags you use to create your textures. My stab in the dark... check that you initialize the device as HAL (hardware 3D) rather than REF (software 3D).
Also, how many sprites are you drawing? Each draw call has a fair amount of overhead. If you make more than a couple hundred per frame, that will be your limiting factor.
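A sketch of the D3D9 device-creation call to check, assuming the d3d, hWnd, and present-parameters (pp) variables from your setup code:
d3d->CreateDevice(D3DADAPTER_DEFAULT,
                  D3DDEVTYPE_HAL, // hardware rasterization; D3DDEVTYPE_REF is a slow software reference
                  hWnd, D3DCREATE_HARDWARE_VERTEXPROCESSING, &pp, &device);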

Related

Is it okay to call SDL_RenderCopy() for each sprite?

This is a follow-up to my question here: Is it okay to have a SDL_Surface and SDL_Texture for each sprite?
I made a class called Entity, each instance having an SDL_Texture which is set in the constructor; then a member function render() is called for every on-screen entity in a vector, which uses SDL_RenderCopy() to draw to the renderer.
This render() function includes generating rectangles for each sprite based on its position/camera data.
Is this okay? Is there a faster way?
I made a test level with 96 sprites that each take up 2% of the screen, with tons of overdraw, and the frame time is 15 ms (~65 fps) at a resolution of 1600x900. That seems a little slow for just some sprites, and my computer breathes much heavier than when playing a full game such as Spelunky or Isaac.
Prefer frame time over FPS
You want to measure and judge your performance based on frame time, not FPS, because the relation between the two is not linear. Going from 20 FPS to 30 FPS needs about 16.7 ms worth of optimization; that is the same performance gain it takes to get from 30 FPS to 60 FPS. So if you judge performance based on FPS, you would conclude that a particular "optimization" that increased the FPS from 30 to 60 is better than one that made a 20 FPS scene run at 31 FPS, while the latter is actually the better optimization.
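The arithmetic behind that, as a tiny helper (frame time in milliseconds is just 1000/FPS):
double frameTimeMs(double fps) { return 1000.0 / fps; }
// frameTimeMs(20) - frameTimeMs(30) = 50.0 - 33.3 = 16.7 ms
// frameTimeMs(30) - frameTimeMs(60) = 33.3 - 16.7 = 16.7 ms
// frameTimeMs(20) - frameTimeMs(31) = 50.0 - 32.3 = 17.7 ms (the bigger win)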
Batch your draws
If you pack all your textures into one and store each individual image's coordinates, you can use the same texture to draw many of your objects. This is limited by the size and number of your textures and also the maximum texture size supported in your environment. In my experience 4096x4096 is safe, but I prefer to use 2048x2048 "texture atlases". There are many utility programs for making such textures; you can easily find a suitable one with a Google search.
In this setup, in addition to an SDL texture, each sprite also has the x, y, width and height of the region in the "big" texture containing the particular image needed. You can make a TextureRegion class; each sprite then has a TextureRegion. This whole process is often referred to as batching; look it up. The whole idea is to minimize state changes. I am not sure whether it applies to software rendering or to all SDL2 backends.
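A minimal sketch of that TextureRegion idea (the struct and helper are illustrative; only SDL_RenderCopy is real SDL2 API):
#include <SDL2/SDL.h>

struct TextureRegion {
    SDL_Texture* atlas; // the shared "big" texture
    SDL_Rect src;       // this sprite's sub-rectangle inside the atlas
};

void drawRegion(SDL_Renderer* r, const TextureRegion& reg, const SDL_Rect& dst) {
    // Many sprites share one texture, so the renderer rarely has to switch textures.
    SDL_RenderCopy(r, reg.atlas, &reg.src, &dst);
}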
Cache your transformations
Batching your sprites will increase performance on the GPU side. The CPU-bound code is another optimization opportunity: instead of calculating the parameters of SDL_RenderCopy in each frame, calculate them once and cache them. Then, when the position/rotation of the camera or object changes, recalculate the cache. You can do this in the "accessors" of your entity class (like setPosition, setRotation, etc.). Note that instead of directly recalculating the transform as soon as a position or rotation changes, you want to flag the object as "dirty" and check for the dirty flag in your render function: if the object is dirty, recalculate and cache the transform. This prevents redundant calculations when you do this:
//if dirty flag is not used each of the following function calls
//would have resulted in a recalculation of transforms. However by
//using the dirty flag they will be calculated only once before
//the rendering of next frame in the render() function.
player->setPosition(start_x, start_y);
player->setRotation(0);
camera->reset();
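A hedged sketch of those accessors and the dirty-flag check (member names are illustrative; SDL_RenderCopyEx is the real SDL2 call for rotated blits):
class Entity {
public:
    void setPosition(float x, float y) { m_x = x; m_y = y; m_dirty = true; }
    void setRotation(float degrees)    { m_angle = degrees; m_dirty = true; }

    void render(SDL_Renderer* r) {
        if (m_dirty) {                // recompute the cached rect at most once per frame
            m_dst = computeDstRect(); // the position/camera math from the question
            m_dirty = false;
        }
        SDL_RenderCopyEx(r, m_tex, nullptr, &m_dst, m_angle, nullptr, SDL_FLIP_NONE);
    }

private:
    SDL_Rect computeDstRect() const;  // the expensive transform, now done lazily
    SDL_Texture* m_tex = nullptr;
    SDL_Rect m_dst{};
    float m_x = 0, m_y = 0, m_angle = 0;
    bool m_dirty = true;
};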
So, I've done some more testing by examining the memory/CPU usage of this program at full screen with a "demanding" level, and I managed to make it similar to other games by enforcing a framerate cap with SDL_Delay():
float g_max_framerate = 60;
float g_max_frametime = 1 / g_max_framerate * 1000;
...
while (!quit) {
    lastticks = ticks;
    ticks = SDL_GetTicks();
    elapsed = ticks - lastticks;
    ...
    SDL_RenderPresent(renderer);
    // lock framerate
    if (elapsed < g_max_frametime) {
        SDL_Delay(g_max_frametime - elapsed);
    }
}
With this limitation it is appropriately low-spec.

In OpenGL, what is a good target for how many vertices in a VBO while maintaining a good frame rate

I am working on making a 2D game engine from scratch, mostly for fun. Recently I've been really concerned about the performance of the whole engine. I keep reading articles on a good target number of polygons to aim for, and I've seen talk in the millions, while I've only managed to get to 40,000 without horrible frame rate drops.
I've tried to use a mapped buffer from the graphics card instead of my own, but that actually gives me worse performance. I've read about techniques like triple-buffered rendering, and while I can see how they might theoretically speed things up, I can't imagine them speeding my code up into the millions I've read about.
The format I use is 28-byte vertices (three floats for position, two floats for texture coordinates, one for color, and one for which texture buffer to read from). I've thought about trimming this down, but once again it doesn't seem worth it.
Looking through my code, almost 98% of the time is spent allocating, filling up, and handing the VAO to the graphics card. So that's currently my only bottleneck.
All the sprites are just 4-sided polygons, and I'm just using GL_QUADS to render the whole object. 40,000 sprites just feels really low. I only have one draw call for them, so I was expecting at least 10 times that from what I've read. I've heard some 3D models have nearly 40k polygons in them alone!
Here is the relevant code for how I render it all:
// This is the main render loop; currently it's only called once per frame
for (int i = 0; i < l_Layers.size(); i++) {
    glUseProgram(l_Layers[i]->getShader().getShaderProgram());
    GLint loc = glGetUniformLocation(l_Layers[i]->getShader().getShaderProgram(), "MVT");
    glUniformMatrix4fv(loc, 1, GL_FALSE, mat.data);
    l_Layers[i]->getVertexBuffer().Bind();
    glDrawArrays(GL_QUADS, 0, l_Layers[i]->getVertexBuffer().getSize());
    l_Layers[i]->getVertexBuffer().Unbind();
}
// These lines of code take up by far the most compute time
void OP::VertexBuffer::startBuffer(int size)
{
    flush();
    Vertices = new Vertex[size * 4];
}

void OP::VertexBuffer::submit(Vertex vertex)
{
    Vertices[Index] = vertex;
    Index++;
}
void Layer::Render() {
    l_VertexBuffer.startBuffer(l_Sprites.size());
    for (size_t i = 0; i < l_Sprites.size(); i++) {
        Vertex* vert = l_Sprites[i]->getVertexArray();
        l_VertexBuffer.submit(vert[0]);
        l_VertexBuffer.submit(vert[1]);
        l_VertexBuffer.submit(vert[2]);
        l_VertexBuffer.submit(vert[3]);
    }
}
I don't know what I've been doing wrong, but I just don't understand how people are getting orders of magnitude more polygons on the screen, especially when they have far more complex models than I have with GL_QUADS.
98% of the time is spent allocating, filling up, and giving the VAO to the graphics card. So that's currently my only bottleneck.
Creating the VAO and filling it up should actually only happen once and therefore should not affect the frame rate; you should only need to bind the VAO before calling render.
Obviously I can't see all of your code, so I may have the wrong idea, but it looks like you're creating a new vertex array every time Render is called.
It doesn't surprise me that you're spending all of your time in here:
//These lines of code take up by far the most compute time
void OP::VertexBuffer::startBuffer(int size)
{
    flush();
    Vertices = new Vertex[size * 4];
}
Calling new for a large array on every render call is going to considerably impact your performance; you're also spending time assigning to that array every frame.
On top of that, you appear to be leaking memory.
Every time you call:
Vertices = new Vertex[size * 4];
You're failing to free the array that you allocated on the previous call to Render. What you're doing is similar to the example below:
foo = new Foo();
foo = new Foo();
Memory is allocated to foo in the first call; the first Foo created is never destructed nor deallocated, and there is now no way to do so, as foo has been reassigned - so the first Foo has leaked.
So I think you have a combination of issues going on here.
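One way to address both issues (a hedged sketch, reusing the Vertex type and names from the question): keep one persistent std::vector and reuse its storage every frame instead of new[]-ing a fresh array.
#include <vector>

class VertexBuffer {
public:
    void startBuffer(int spriteCount) {
        Vertices.clear();                  // keeps the old capacity; no reallocation
        Vertices.reserve(spriteCount * 4); // grows at most once, then gets reused
    }
    void submit(const Vertex& v) { Vertices.push_back(v); }
private:
    std::vector<Vertex> Vertices;          // freed exactly once, in the destructor
};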

Dynamically loading maps in a "tile engine"

I'm implementing a tile engine for games using C++. Currently the game is divided into maps; each map has a 2D grid of sprites where each sprite represents a tile.
I am coding a system where, if several maps are adjacent, you can walk from one to the other.
At startup of the game, all the maps are instantiated but "unloaded", i.e. the sprite objects are not in memory. When I'm close enough to an adjacent map, that map's sprites are "loaded" into memory by basically doing:
for (int i = 0; i < sizeX; i++) {
    for (int j = 0; j < sizeY; j++) {
        Tile *tile_ptr = new Tile(tileset, tilesId[i][j], i + offsetX, j + offsetY);
        tilesMap[i][j] = tile_ptr;
    }
}
They are unloaded by being destroyed the same way when I am too far away from the map.
For a 50x50 map of sprites of 32x32 pixels, it takes roughly 0.3 s to load or unload, which is done during one frame. My question is: what is a more efficient way to load/unload maps dynamically, even using a totally different mechanism? Thanks.
PS: I'm using SFML as a graphics library, but I'm not sure this changes anything.
A different possibility that improves latency but increases the overall number of operations needed:
Instead of waiting until you are 'too close' or 'too far' from a map, keep in memory the maps for a bigger square around the player [i.e. if the map is 50x50, store 150x150], but show only the 50x50. Now, every step, recalculate the 150x150 window; it will require 150 destroy ops and 150 build ops per step, as sketched below.
By doing so you will actually calculate and build/destroy elements more times in total! But latency will improve, since you never need to wait 0.3 s to build 2,500 elements - you always build only a small portion: 150*2 = 300 elements.
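A hedged sketch of that sliding window for one direction (WINDOW_W/WINDOW_H and the build/destroy helpers are placeholders for whatever the engine uses):
const int WINDOW_W = 150, WINDOW_H = 150;

void buildTile(int col, int row);   // wraps the "new Tile(...)" call from the question
void destroyTile(int col, int row); // deletes the Tile and clears the grid slot

// Called when the player crosses one tile boundary to the right:
void slideWindowRight(int newRightCol) {
    int oldLeftCol = newRightCol - WINDOW_W; // the column that just left the window
    for (int j = 0; j < WINDOW_H; j++) {
        destroyTile(oldLeftCol, j);          // 150 destroy ops...
        buildTile(newRightCol, j);           // ...and 150 build ops per step
    }
}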
I think it's a perfect occasion to learn multithreading and asynchronous calls.
It can seem complex if you're new to it, but it's a very useful skill to have.
It will still take 0.3sec to load (well, a bit more actually), but the game will not freeze.
That's what most games do. You can search SO for the various ways to do it in C++.
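A minimal sketch of the asynchronous version using standard C++11 facilities (Tile is the class from the question; note that with SFML you would typically build tile data on the worker thread and create any GPU resources on the main thread):
#include <chrono>
#include <future>
#include <vector>

struct Tile; // the tile class from the question

class Map {
public:
    void requestLoad() {
        // Kick off the expensive ~0.3 s tile construction on a worker thread.
        loading_ = std::async(std::launch::async, [this] { return buildTiles(); });
    }

    void update() {
        // Once per frame: adopt the result when (and only when) it is ready.
        if (loading_.valid() &&
            loading_.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            tiles_ = loading_.get();
        }
    }

private:
    std::vector<Tile*> buildTiles(); // the double loop from the question, unchanged
    std::future<std::vector<Tile*>> loading_;
    std::vector<Tile*> tiles_;
};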

OpenGL: How to undo scaling?

I'm new to OpenGL. I'm using JOGL.
I have a WorldEntity class that represents a thing that can be rendered. It has attributes like position and size. To render, I've been using this method:
/**
 * Renders the object in the world.
 */
public void render() {
    gl.glTranslatef(getPosition().x, getPosition().y, getPosition().z);
    gl.glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
    // gl.glScalef(size, size, size);
    gl.glCallList(drawID);
    // gl.glScalef(1/size, 1/size, 1/size);
    gl.glRotatef(-getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
    gl.glTranslatef(-getPosition().x, -getPosition().y, -getPosition().z);
}
The pattern I've been using is applying each attribute of the entity (like position or rotation), then undoing it to avoid corrupting the state for the next entity to be rendered.
Uncommenting the scaling lines makes the app much more sluggish as it renders a modest scene on my modest computer. I'm guessing that the float division is too much to handle thousands of times per second. (?)
What is the correct way to go about this? Can I find a less computationally intensive way to undo a scaling transformation? Do I need to sort objects by scale and draw them in that order, to reduce the scaling transformations required?
Thanks.
This is where you use matrices (bear with me, I come from an OpenGL/C programming background):
glMatrixMode(GL_MODELVIEW); // set the matrix mode to manipulate the modelview matrix
glPushMatrix();             // push a copy of the current matrix onto the matrix stack

// apply transformations
glTranslatef(getPosition().x, getPosition().y, getPosition().z);
glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
glScalef(size, size, size);

glCallList(drawID); // drawing here

glPopMatrix(); // restore the original matrix
... at least, that's what I think it is.
It's very unlikely the divisions cause any perf issue. rfw gave you the usual way of implementing this, but my guess is that your "sluggish" rendering is mostly due to the GPU being the bottleneck, and using the matrix stacks will not improve perf.
When you increase the size of your drawn objects, more pixels have to be processed, and the GPU has to work significantly harder. What your CPU does at this point (the divisions) is irrelevant.
To prove my point, try keeping the scaling code in, but with sizes around 1.

Rewriting a simple Pygame 2D drawing function in C++

I have a 2D list of vectors (say 20x20, i.e. 400 points) and I am drawing these points on a screen like so:
for row in grid:
    for point in row:
        pygame.draw.circle(window, white, (point.x, point.y), 2, 0)
pygame.display.flip()  # redraw the screen
This works perfectly; however, it's much slower than I expected.
I want to rewrite this in C++ and hopefully learn some stuff (I am doing a unit on C++ at the moment, so it'll help) along the way. What's the easiest way to approach this? I have looked at DirectX and have so far followed a bunch of tutorials and drawn some rudimentary triangles. However, I can't find a simple draw-point call.
DirectX doesn't have functions for drawing just one point; it operates on vertex and index buffers only. If you want a simpler way to plot a single point, you'll need to write a wrapper.
For drawing lists of points you'll need DrawPrimitive(D3DPT_POINTLIST, ...). However, there will be no easy way to just plot a point: you'll have to prepare a buffer, lock it, fill it with data, then draw the buffer. Or you could use dynamic vertex buffers to optimize performance. There is a DrawPrimitiveUP call that is supposed to be able to render primitives stored in system memory (instead of using buffers), but as far as I know it doesn't work (it may silently discard primitives) with pure devices, so you'd have to use software vertex processing.
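A hedged sketch of the DrawPrimitiveUP route on a D3D9 device (the vertex struct and FVF combination are one common choice for pre-transformed 2D points, not the only one):
struct PointVertex { float x, y, z, rhw; D3DCOLOR color; }; // pre-transformed vertex

device->SetFVF(D3DFVF_XYZRHW | D3DFVF_DIFFUSE);
// verts is a PointVertex array filled from the 20x20 grid:
device->DrawPrimitiveUP(D3DPT_POINTLIST, numPoints, verts, sizeof(PointVertex));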
In OpenGL you have glVertex2f and glVertex3f. Your call would look like this (there might be a typo or syntax error - I didn't compile/run it):
glBegin(GL_POINTS);
glColor3f(1.0, 1.0, 1.0); // white
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        glVertex2f(points[y][x].x, points[y][x].y); // plot point
glEnd();
OpenGL is MUCH easier for playing around and experimenting with than DirectX. I'd recommend taking a look at SDL and using it in conjunction with OpenGL. Or you could use GLUT instead of SDL.
Or you could try using Qt 4. It has very good 2D rendering routines.
When I first dabbled with game/graphics programming I became fond of Allegro. It's got a huge range of features and a pretty easy learning curve.