Is it okay to call SDL_RenderCopy() for each sprite? - c++

This is a followup to my question here: Is it okay to have a SDL_Surface and SDL_Texture for each sprite?
I made a class called entity, each instance having an SDL_Texture that is set in the constructor. A member function render() is then called for every on-screen entity in a vector, and it uses SDL_RenderCopy() to draw to the renderer.
This render() function includes generating rectangles for each sprite based on its position and the camera data.
Is this okay? Is there a faster way?
I made a test level with 96 sprites that each take up 2% of the screen, with tons of overdraw, and the frame time is 15 ms (~65 fps) at a resolution of 1600x900. That seems a little slow for just some sprites, and my computer breathes much heavier than when playing a full game such as Spelunky or Isaac.

Prefer frame time over FPS
You want to measure and judge your performance based on frame time, not FPS, because the relation between the two is not linear. Going from 20 FPS to 30 FPS needs about 16.7 ms worth of optimization (50 ms down to 33.3 ms per frame). That is the same gain it takes to get from 30 FPS to 60 FPS (33.3 ms down to 16.7 ms). So if you judge performance by FPS, you would conclude that an "optimization" that raised the FPS from 30 to 60 is better than one that made a 20 FPS scene run at 31 FPS, while the latter is actually the better optimization.
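To make that arithmetic concrete, here is a minimal sketch. The helper names frameTimeMs and savedMs are made up for illustration; they only convert FPS to frame time and compare two optimizations.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate (hypothetical helper).
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per frame when going from fpsBefore to fpsAfter.
double savedMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}
```

With these helpers, savedMs(30, 60) is about 16.7 ms while savedMs(20, 31) is about 17.7 ms, so the second change saves more time per frame even though the FPS number moves less.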
Batch your draws
If you pack all your textures into one and store each individual image's coordinates, you can use the same texture to draw many of your objects. This is limited by the size and number of your textures and also by the maximum texture size supported in your environment. In my experience 4096x4096 is safe, but I prefer to use 2048x2048 "texture atlases". There are many utility programs to make such textures. You can easily find a suitable one by doing a Google search.
In this setup, in addition to an SDL texture, each sprite also has the x, y, width and height of the region in the "big" texture containing the particular image needed. You can make a TextureRegion class. Each sprite then has a TextureRegion. This whole process is often referred to as batching. Look it up. The whole idea is to minimize state changes. I am not sure if it applies to software rendering or to all of SDL2's backends.
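As a rough sketch of what such a TextureRegion could look like, the code below assumes the atlas is a uniform grid of equally sized tiles. That is an assumption made for illustration; atlas tools usually emit arbitrary rectangles that you would load from a file instead. The function name regionForTile is hypothetical.

```cpp
#include <cassert>

// Pixel rectangle inside the atlas; same x/y/w/h layout as SDL_Rect.
struct TextureRegion {
    int x, y, w, h;
};

// Region of the tile with the given index, assuming a uniform grid atlas
// that is atlasW pixels wide and filled row by row.
TextureRegion regionForTile(int index, int tileW, int tileH, int atlasW) {
    assert(index >= 0 && tileW > 0 && tileH > 0 && atlasW >= tileW);
    const int columns = atlasW / tileW;
    return TextureRegion{ (index % columns) * tileW,
                          (index / columns) * tileH,
                          tileW, tileH };
}
```

The resulting rectangle is what you would pass as the srcrect argument of SDL_RenderCopy(), with every sprite sharing the same atlas texture.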
Cache your transformations
Batching your sprites will improve performance on the GPU side. The CPU-bound code is another optimization opportunity. Instead of calculating the parameters of SDL_RenderCopy every frame, calculate them once and cache them. Then, when the position/rotation of the camera or object changes, recalculate the cache. You can do this in the "accessors" of your entity class (like setPosition, setRotation, etc.). Note that instead of recalculating the transform as soon as a position or rotation changes, you want to flag the object as "dirty" and check the dirty flag in your render function: if this->isDirty, then recalculate and cache the transform. This prevents redundant calculations when you do this:
//if dirty flag is not used each of the following function calls
//would have resulted in a recalculation of transforms. However by
//using the dirty flag they will be calculated only once before
//the rendering of next frame in the render() function.
player->setPosition(start_x,start_y);
player->setRotation(0);
camera->reset();
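A minimal sketch of the dirty-flag idea follows. The Entity class here is hypothetical and deliberately SDL-free: it only tracks a cached screen position and counts how often that cache is rebuilt, so the effect of the flag is visible.

```cpp
#include <cassert>

// Hypothetical entity that caches its screen-space position.
class Entity {
public:
    // Setters only mark the cache as stale; they do no recalculation.
    void setPosition(float x, float y) { x_ = x; y_ = y; dirty_ = true; }
    void setRotation(float degrees) { rotation_ = degrees; dirty_ = true; }

    // Called once per frame. The cache is rebuilt only if the entity
    // or the camera changed since the last call.
    void render(float cameraX, float cameraY) {
        if (dirty_ || cameraX != cameraX_ || cameraY != cameraY_) {
            cameraX_ = cameraX;
            cameraY_ = cameraY;
            screenX_ = x_ - cameraX;
            screenY_ = y_ - cameraY;
            dirty_ = false;
            ++recalcCount_;
        }
        // A real render() would now pass the cached values to SDL_RenderCopy().
    }

    float screenX() const { return screenX_; }
    float screenY() const { return screenY_; }
    float rotation() const { return rotation_; }
    int recalcCount() const { return recalcCount_; }

private:
    float x_ = 0.0f, y_ = 0.0f, rotation_ = 0.0f;
    float cameraX_ = 0.0f, cameraY_ = 0.0f;
    float screenX_ = 0.0f, screenY_ = 0.0f;
    bool dirty_ = true;
    int recalcCount_ = 0;
};
```

Calling setPosition() and setRotation() back to back and then render() rebuilds the cache once, and a second render() with an unchanged camera rebuilds nothing.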

So, I've done some more testing by examining the memory/CPU usage of this program at full screen with a "demanding" level, and managed to make it similar to other games by enforcing a framerate cap with SDL_Delay():
float g_max_framerate = 60;
float g_max_frametime = 1/g_max_framerate * 1000;
...
while (!quit) {
lastticks = ticks;
ticks = SDL_GetTicks();
elapsed = ticks - lastticks;
...
SDL_RenderPresent(renderer);
//lock framerate
if(elapsed < g_max_frametime) {
SDL_Delay(g_max_frametime - elapsed);
}
}
With this limitation it is appropriately low-spec.
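One thing to watch in the loop above is that elapsed spans the whole previous iteration, including the previous SDL_Delay(). A possible variation, sketched below, measures only the work done since the start of the current frame. The helper name remainingDelayMs is hypothetical, and the tick values are assumed to come from SDL_GetTicks().

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds left to sleep so the frame lasts roughly targetMs in total.
// frameStartMs and nowMs are millisecond tick values (e.g. from SDL_GetTicks()).
std::uint32_t remainingDelayMs(std::uint32_t frameStartMs,
                               std::uint32_t nowMs,
                               std::uint32_t targetMs) {
    assert(targetMs > 0);
    // Unsigned subtraction stays correct even if the tick counter wraps.
    const std::uint32_t workMs = nowMs - frameStartMs;
    return workMs < targetMs ? targetMs - workMs : 0;
}
```

In the loop you would store frameStart = SDL_GetTicks() at the top and call SDL_Delay(remainingDelayMs(frameStart, SDL_GetTicks(), 16)) after SDL_RenderPresent().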

Related

Capping & Calculating FPS in SDL?

I have written this code to cap the SDL game to 60 fps, is this the correct way of implementing an fps count?
int fps=60;
int desiredDelta=1000/fps; //desired time b/w frames
while (gameRunning)
{
int starttick=SDL_GetTicks();
// Get our controls and events
while (SDL_PollEvent(&event))
{
if (event.type == SDL_QUIT)
gameRunning = false;
}
window.Clear();
for(Entity& e: entities)
{
window.Render(e);
}
window.Display();
int delta=SDL_GetTicks()-starttick; //actual time b/w frames
int avgFPS=1000/(desiredDelta-delta); //calculating FPS HERE
if(delta<desiredDelta)
{
SDL_Delay(desiredDelta-delta);
}
std::cout<<avgFPS<<std::endl;
}
In games there are usually two "framerates" to worry about: the framerate of the display and the framerate of your game physics. Ideally, you draw at the native framerate of the display. To do this, ensure you pass the SDL_RENDERER_PRESENTVSYNC flag to SDL_CreateRenderer().
For the physics of your game, you can either measure the time between two rendered frames, and use that as your physics time step, so that you basically match the display framerate. The advantage is that your rates are naturally synced, but the disadvantage is that the physics of the game might suffer from accuracy problems, especially during times when rendering takes so long that the display framerate drops to low values.
Alternatively, you use a fixed framerate for your physics. However, if you do the latter, you have to somehow deal with the two framerates not being in sync, and this is not solved by adding SDL_Delay()s. The usual approach is to have the rendering thread interpolate animations based on physics as necessary.
A more practical issue in your code is that SDL_GetTicks() gives you the time in milliseconds. That is not very accurate; if your display runs at 60 fps, then one frame takes 16.66666... milliseconds. Your calculation will have an error of up to 1 ms, which is about 6% of the length of one frame. Depending on the type of game, this can actually be noticeable!
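The effect of whole-millisecond arithmetic can be worked out directly. The sketch below uses two hypothetical helpers: one shows the cap you actually get from an integer budget such as int desiredDelta=1000/fps, the other the worst-case 1 ms error relative to one frame.

```cpp
#include <cassert>

// Frame-rate cap that results from an integer-millisecond frame budget.
double effectiveCapFps(int targetFps) {
    assert(targetFps > 0 && targetFps <= 1000);
    const int budgetMs = 1000 / targetFps;  // integer division truncates
    return 1000.0 / budgetMs;
}

// A 1 ms timing error expressed as a percentage of one frame.
double worstCaseErrorPercent(int targetFps) {
    assert(targetFps > 0);
    const double frameMs = 1000.0 / targetFps;
    return 100.0 * 1.0 / frameMs;
}
```

For a 60 fps target the integer budget is 16 ms, so the cap is really 62.5 fps, and a 1 ms error is 6% of a frame. If that matters for your game, a higher-resolution timer such as SDL_GetPerformanceCounter() together with SDL_GetPerformanceFrequency() is one option to look into.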

Unreal C++ Controller Input: Yaw Rotation

I'm setting my game character camera via c++, and I came across this, and even though it works, I don't understand why the code uses DeltaTime. What is the function of GetDeltaSeconds actually doing?
void AWizardCharater::LookX(float Value)
{
AddControllerYawInput(Sensitivity * Value * GetWorld()->GetDeltaSeconds());
}
Here is the api ref : https://docs.unrealengine.com/latest/INT/API/Runtime/Engine/GameFramework/APawn/AddControllerYawInput/index.html
Thanks
Using delta time, multiplied by some sensitivity value, is a standard method used throughout games to provide a consistent movement rate, independent of framerate.
Consider the following code, without using delta time:
AddControllerYawInput(1);
If you had a framerate of 10 FPS then you'd be doing 10 degrees per second. If the framerate increases to 100 FPS, you'd be doing 100 degrees per second.
Using delta time makes the movement consistent regardless of framerate: the time between frames shrinks as the framerate goes up, so each frame contributes a proportionally smaller step and the total per second stays the same.
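This can be checked with a small simulation that does not need Unreal at all. The function below is a hypothetical stand-in: rate plays the role of Sensitivity * Value, and each loop iteration stands for one frame within one second.

```cpp
#include <cassert>

// Total yaw accumulated over one simulated second at the given frame rate.
// With useDeltaTime == false, `rate` is added once per frame;
// with useDeltaTime == true, it is scaled by the frame duration first.
double yawAfterOneSecond(int fps, double rate, bool useDeltaTime) {
    assert(fps > 0);
    const double deltaSeconds = 1.0 / fps;
    double yaw = 0.0;
    for (int frame = 0; frame < fps; ++frame) {
        yaw += useDeltaTime ? rate * deltaSeconds : rate;
    }
    return yaw;
}
```

Without delta time, a rate of 1 gives 10 degrees at 10 FPS and 100 degrees at 100 FPS. With delta time, a rate of 90 gives roughly 90 degrees per second at either frame rate.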

How to define how much CPU to use in SFML game?

I've made a game but I don't know if the game will work the same way in other devices. For example if the CPU of a computer is high will the player and enemies move faster? If so, is there a way to define CPU usage available in SFML? The way the player and enemies move in my program is to :
1-Check if the key is pressed
2-If so : move(x,y); Or is there a way to get the CPU to do some operations in the move function.
Thank you!
It sounds like you are worried about the physics of your game being affected by the game's framerate. Your intuition is serving you well! This is a significant problem, and one you'll want to address if you want your game to feel professional.
According to Glenn Fiedler in his Gaffer on Games article 'Fix Your Timestep!'
[A game loop that handles time improperly can make] the behavior of your physics simulation [depend] on the delta time you pass in. The effect could be subtle as your game having a slightly different “feel” depending on framerate or it could be as extreme as your spring simulation exploding to infinity, fast moving objects tunneling through walls and the player falling through the floor!
Logic dictates that you must detach the dependencies of your update from the time it takes to draw a frame. A simple solution is to:
Pick an amount of time which can be safely processed (your timestep)
Add the time passed every frame into an accumulated pool of time
Process the time passed in safe chunks
In pseudocode:
time_pool = 0;
timestep = 0.01; //or whatever is safe for you!
old_time = get_current_time();
while (!closed) {
new_time = get_current_time();
time_pool += new_time - old_time;
old_time = new_time;
handle_input();
while (time_pool > timestep)
{
consume_time(timestep); //update your gamestate
time_pool -= timestep;
}
//note: leftover time is not lost, and will be left in time_pool
render();
}
It is worth noting that this method has its own problem: future frames have to consume the time produced by calls to consume_time. If a call to consume_time takes too long, the time produced might require two calls be made next frame - then four - then eight - and so on. If you use this method, you will have to make sure consume_time is very efficient, and even then it would be best to have a contingency plan.
For a more thorough treatment I encourage you to read the linked article.
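For reference, the pseudocode above could be turned into C++ along these lines. This is only a sketch: the FixedStepper class is a made-up name, and clamping the frame time is one possible contingency plan against the runaway described above, not the only one.

```cpp
#include <cassert>

// Accumulates frame time and reports how many fixed updates to run.
class FixedStepper {
public:
    FixedStepper(double timestep, double maxFrameTime)
        : timestep_(timestep), maxFrameTime_(maxFrameTime) {
        assert(timestep > 0.0 && maxFrameTime >= timestep);
    }

    // Adds the time passed this frame and returns the number of
    // fixed-size updates the caller should now perform.
    int advance(double frameTime) {
        // Contingency: never let a single slow frame add more than
        // maxFrameTime_, so the backlog cannot grow without bound.
        if (frameTime > maxFrameTime_) frameTime = maxFrameTime_;
        pool_ += frameTime;
        int steps = 0;
        while (pool_ >= timestep_) {
            pool_ -= timestep_;
            ++steps;
        }
        return steps;
    }

    // Leftover fraction of a timestep, usable for interpolation when rendering.
    double alpha() const { return pool_ / timestep_; }

private:
    double timestep_;
    double maxFrameTime_;
    double pool_ = 0.0;
};
```

Each frame you would call advance() with the measured frame time, run consume_time(timestep) that many times, and then render.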

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost, the biggest CPU hog with OpenGL is immediate mode, and you're using it (glBegin, glEnd). The problem with immediate mode is that every single vertex requires a couple of OpenGL calls, and because OpenGL uses thread-local state, each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
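One way to move away from per-vertex calls is to flatten the polygons into a single contiguous array once and hand that array to OpenGL in bulk, for example with vertex arrays or a vertex buffer drawn through glDrawArrays(GL_TRIANGLES, ...). The sketch below only shows the CPU-side packing and makes no OpenGL calls. Point and appendConvexPolygon are hypothetical names, and the polygons are assumed to be convex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
    float x, y;
};

// Appends a convex polygon to `out` as a flat list of triangles
// (x, y pairs), using a fan around the first vertex.
void appendConvexPolygon(const std::vector<Point>& polygon,
                         std::vector<float>& out) {
    assert(polygon.size() >= 3);
    for (std::size_t i = 1; i + 1 < polygon.size(); ++i) {
        const Point triangle[3] = { polygon[0], polygon[i], polygon[i + 1] };
        for (const Point& p : triangle) {
            out.push_back(p.x);
            out.push_back(p.y);
        }
    }
}
```

Since most of the shapes are static, the packed array would only need to be rebuilt when the geometry actually changes.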
The next issue is with how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be setting up the window for double buffering, enabling V-Sync, setting a swap interval of 1 and doing a buffer swap (glutSwapBuffers) once the frame is rendered. The exact timings of what blocks and where are implementation dependent (unfortunately), but you're more or less guaranteed to hit your screen refresh frequency exactly, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least, you may simply be fooled by the way Windows accounts CPU time: time spent in driver context (which includes blocking while waiting for V-Sync) is counted as consumed CPU time, even though it is in fact interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach for getting more reasonable accounting is to add a Sleep(1) before or after the buffer swap.
I found that putting the render thread to sleep helps reduce CPU usage, in my case from 26% to around 8%:
#include <chrono>
#include <iostream>
#include <thread>
void render_loop(){
...
auto const start_time = std::chrono::steady_clock::now();
auto const wait_time = std::chrono::milliseconds{ 17 };
auto next_time = start_time + wait_time;
while(true){
...
// execute once after thread wakes up every 17ms which is theoretically 60 frames per
// second
auto then = std::chrono::high_resolution_clock::now();
std::this_thread::sleep_until(next_time);
...rendering jobs
auto elapsed_time =
std::chrono::duration_cast<std::chrono::milliseconds> (std::chrono::high_resolution_clock::now() - then);
std::cout << "ms: " << elapsed_time.count() << '\n';
next_time += wait_time;
}
}
I thought about attempting to measure the frame rate while the thread is asleep but there isn't any reason for my use case to attempt that. The result was averaging around 16ms so I thought it was good enough
Inspired by this post

Slow C++ DirectX 2D Game

I'm new to C++ and DirectX, I come from XNA.
I have developed a game like Fly The Copter.
What I've done is create a class named Wall.
While the game is running I draw all the walls.
In XNA I stored the walls in an ArrayList and in C++ I've used a vector.
In XNA the game just runs fast and in C++ really slow.
Here's the C++ code:
void GameScreen::Update()
{
//Update Walls
int len = walls.size();
for(int i = wallsPassed; i < len; i++)
{
walls.at(i).Update();
if (walls.at(i).pos.x <= -40)
wallsPassed += 2;
}
}
void GameScreen::Draw()
{
//Draw Walls
int len = walls.size();
for(int i = wallsPassed; i < len; i++)
{
if (walls.at(i).pos.x < 1280)
walls.at(i).Draw();
else
break;
}
}
In the Update method I decrease the X value by 4.
In the Draw method I call sprite->Draw (Direct3DXSprite).
That's the only code that runs in the game loop.
I know this is bad code; if you have an idea to improve it, please help.
Thanks, and sorry about my English.
Try replacing all occurrences of at() with the [] operator. For example:
walls[i].Draw();
and then turn on all optimisations. Both [] and at() are function calls - to get the maximum performance you need to make sure that they are inlined, which is what upping the optimisation level will do.
You can also do some minimal caching of a wall object - for example:
for(int i = wallsPassed; i < len; i++)
{
Wall & w = walls[i];
w.Update();
if (w.pos.x <= -40)
wallsPassed += 2;
}
Try to narrow down the cause of the performance problem (also termed profiling). I would try drawing only one object while continuing to update all the objects. If it's suddenly faster, then it's a DirectX drawing problem.
Otherwise, try drawing all the objects but updating only one wall. If it's faster, then your Update() function may be too expensive.
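To put numbers on such experiments, you can time the update and draw phases separately. The helper below is a minimal sketch using std::chrono; timeMs is a hypothetical name, and the callable stands for whatever phase you want to measure.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` once and returns how long it took, in milliseconds.
double timeMs(const std::function<void()>& work) {
    assert(static_cast<bool>(work));
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For example, wrapping the body of the Update loop in one lambda and the Draw loop in another and logging both results each frame shows which side dominates. A single measurement is noisy, so averaging over many frames gives a more reliable picture.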
How fast is 'fast'?
How slow is 'really slow'?
How many sprites are you drawing?
How big is each one as an image file, and in pixels drawn on-screen?
How does performance scale (in XNA/C++) as you change the number of sprites drawn?
What difference do you get if you draw without updating, or vice versa?
Maybe you have just forgotten to turn on release mode :) I had some problems with that in the past: I thought my code was very slow, but it was because of debug mode. If that's not it, you may have a problem with the rendering part, or with a huge number of objects. The code you provided looks good...
Have you tried multiple buffers (a.k.a. Double Buffering) for the bitmaps?
The typical scenario is to draw in one buffer, then while the first buffer is copied to the screen, draw in a second buffer.
Another technique is to have a huge "logical" screen in memory. The portion drawn on the physical display is a viewport, or view, into a small area of the logical screen. Moving the background (or screen) just requires a copy on the part of the graphics processor.
You can aid batching of sprite draw calls. Presumably your draw call calls your only instance of ID3DXSprite::Draw with the relevant parameters.
You can get much improved performance by doing a call to ID3DXSprite::Begin (with the D3DXSPRITE_SORT_TEXTURE flag set) and then calling ID3DXSprite::End when you've done all your rendering. ID3DXSprite will then sort all your sprite calls by texture to decrease the number of texture switches and batch the relevant calls together. This will improve performance massively.
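To illustrate why sorting by texture helps, here is a small library-independent sketch. It is not how ID3DXSprite is implemented; it only counts how often the active texture would change for a given draw order, before and after grouping draws by a made-up textureId field. Note that reordering draws changes which sprite ends up on top, so this only applies where the draw order between textures does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SpriteDraw {
    int textureId;  // stand-in for a texture pointer
    float x, y;
};

// Number of times the active texture changes when drawing in this order.
int countTextureSwitches(const std::vector<SpriteDraw>& draws) {
    int switches = 0;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].textureId != draws[i - 1].textureId) ++switches;
    }
    return switches;
}

// Reorders draws so that sprites sharing a texture are adjacent.
// stable_sort keeps the original order within each texture group.
void sortByTexture(std::vector<SpriteDraw>& draws) {
    const auto byTexture = [](const SpriteDraw& a, const SpriteDraw& b) {
        return a.textureId < b.textureId;
    };
    std::stable_sort(draws.begin(), draws.end(), byTexture);
    assert(std::is_sorted(draws.begin(), draws.end(), byTexture));
}
```

For six sprites alternating between two textures, the unsorted order needs five texture changes and the sorted order needs one.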
It's difficult to say more, however, without seeing the internals of your Update and Draw calls. The above is only a guess ...
To draw every single wall with a different draw call is a bad idea. Try to batch the data into a single vertex buffer/index buffer and send them into a single draw. That's a more sane idea.
Anyway, to get an idea of WHY it runs slowly, try some CPU and GPU profilers (PerfHud, Intel GPA, etc.) to find out first of all WHAT the bottleneck is (the CPU or the GPU). Then you can work on alleviating the problem.
The lookups into your list of walls are unlikely to be the source of your slowdown. The cost of drawing objects in 3D will typically be the limiting factor.
The important parts are your draw code, the flags you used to create the DirectX device, and the flags you use to create your textures. My stab in the dark... check that you initialize the device as HAL (hardware 3d) rather than REF (software 3d).
Also, how many sprites are you drawing? Each draw call has a fair amount of overhead. If you make more than a couple hundred per frame, that will be your limiting factor.