How to attain minimum time delay in rendering a texture in SDL2?

I tested how many milliseconds it takes to render a 1280x720 picture into a texture with the code below. The results were between 47 and 75 ms.
This causes latency, because I am trying to display 30 frames per second, which means each frame must be rendered in at most 33.3 milliseconds.
1) Is this the right way of measuring the frame time?
2) Are there any quirks I need to be aware of?
Uint32 sTime = SDL_GetTicks();
SDL_UpdateYUVTexture(bmp, NULL, pFrame->data[0], pFrame->linesize[0],
                     pFrame->data[1], pFrame->linesize[1],
                     pFrame->data[2], pFrame->linesize[2]);
SDL_RenderClear(renderer);
SDL_RenderCopy(renderer, bmp, NULL, NULL);
SDL_RenderPresent(renderer);
Uint32 eTime = SDL_GetTicks();
printf("Time taken for rendering... %u ms\n", eTime - sTime);
PS NOTE:
I have timed each API call separately, and it looks like SDL_UpdateYUVTexture takes most of the milliseconds, whereas the other APIs take hardly 0 to 1 ms each. The rest of the time is occupied by SDL_UpdateTexture.
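For reference, SDL_GetTicks() only has millisecond resolution; a minimal sketch of timing each call separately with SDL's high-resolution counter (reusing bmp, renderer and pFrame from the code above; the other names are illustrative) might look like this:

// Sketch: per-call timing with SDL's high-resolution performance counter.
Uint64 freq = SDL_GetPerformanceFrequency();

Uint64 t0 = SDL_GetPerformanceCounter();
SDL_UpdateYUVTexture(bmp, NULL, pFrame->data[0], pFrame->linesize[0],
                     pFrame->data[1], pFrame->linesize[1],
                     pFrame->data[2], pFrame->linesize[2]);
Uint64 t1 = SDL_GetPerformanceCounter();

SDL_RenderClear(renderer);
SDL_RenderCopy(renderer, bmp, NULL, NULL);
SDL_RenderPresent(renderer);
Uint64 t2 = SDL_GetPerformanceCounter();

printf("UpdateYUVTexture: %.3f ms, Clear+Copy+Present: %.3f ms\n",
       (t1 - t0) * 1000.0 / (double)freq,
       (t2 - t1) * 1000.0 / (double)freq);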

Try using the OpenGL ES library for rendering the frames to the screen; the SDL_UpdateYUVTexture() API generally takes more time per update.

Related

Capping & Calculating FPS in SDL?

I have written this code to cap the SDL game at 60 FPS. Is this the correct way of implementing an FPS counter?
int fps = 60;
int desiredDelta = 1000 / fps; // desired time b/w frames
while (gameRunning)
{
    int starttick = SDL_GetTicks();
    // Get our controls and events
    while (SDL_PollEvent(&event))
    {
        if (event.type == SDL_QUIT)
            gameRunning = false;
    }
    window.Clear();
    for (Entity& e : entities)
    {
        window.Render(e);
    }
    window.Display();
    int delta = SDL_GetTicks() - starttick;      // actual time b/w frames
    int avgFPS = 1000 / (desiredDelta - delta);  // calculating FPS HERE
    if (delta < desiredDelta)
    {
        SDL_Delay(desiredDelta - delta);
    }
    std::cout << avgFPS << std::endl;
}
In games there are usually two "framerates" to worry about: the framerate of the display and the framerate of your game physics. Ideally, you draw at the native framerate of the display. To do this, ensure you pass the SDL_RENDERER_PRESENTVSYNC flag to SDL_CreateRenderer().
For the physics of your game, one option is to measure the time between two rendered frames and use that as your physics time step, so that you basically match the display framerate. The advantage is that your rates are naturally synced; the disadvantage is that the physics of the game might suffer from accuracy problems, especially when rendering takes so long that the display framerate drops to low values.
Alternatively, you can use a fixed time step for your physics. However, if you do that, you have to somehow deal with the two framerates not being in sync, and this is not solved by adding SDL_Delay()s. The usual approach is to have the rendering thread interpolate animations between physics steps as necessary.
A more practical issue in your code is that SDL_GetTicks() gives you the time in milliseconds. That is not very accurate; if your display runs at 60 fps, then one frame takes 16.66666... milliseconds. Your calculation will have an error of up to 1 ms, which is 6.25% of the length of one frame. Depending on the type of game, this can actually be noticeable!
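As a minimal sketch of these suggestions (sdlWindow and the exact flags are assumptions, not taken from the question's code):

// Sketch: vsynced renderer plus a sub-millisecond delta time for the physics.
SDL_Renderer* renderer = SDL_CreateRenderer(
    sdlWindow, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);

Uint64 freq = SDL_GetPerformanceFrequency();
Uint64 last = SDL_GetPerformanceCounter();

while (gameRunning)
{
    Uint64 now = SDL_GetPerformanceCounter();
    double dt = (double)(now - last) / (double)freq; // seconds, sub-ms precision
    last = now;

    // ... update physics with dt, render, then SDL_RenderPresent() ...
    // With PRESENTVSYNC the present call paces the loop to the display
    // refresh, so no SDL_Delay() is needed for capping.
}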

Is it okay to call SDL_RenderCopy() for each sprite?

This is a follow-up to my question here: Is it okay to have an SDL_Surface and an SDL_Texture for each sprite?
I made a class called Entity, each instance having an SDL_Texture, which is set in the constructor; then a member function render() is called for every on-screen entity in a vector, which uses SDL_RenderCopy() to draw to the renderer.
This render() function includes generating rectangles for each sprite based on their position/camera data.
Is this okay? Is there a faster way?
I made a test level with 96 sprites that each take up 2% of the screen with tons of overdraw, and the frame time is 15 ms (~65 FPS) at a resolution of 1600x900. That seems a little slow for just some sprites, and my computer breathes much heavier than when playing a full game such as Spelunky or Isaac.
Prefer frame time over FPS
You want to measure and judge your performance based on frame time, not FPS, because the relation between the two is not linear. Going from 20 FPS to 30 FPS needs about 16.7 ms worth of optimization (50 ms per frame down to 33.3 ms). That is the same amount of performance gain it takes to get from 30 FPS to 60 FPS (33.3 ms down to 16.7 ms). So if you judge performance based on FPS, you would conclude that a particular "optimization" that increased the FPS from 30 to 60 is better than one that made a 20 FPS scene run at 31 FPS, while the latter is actually the better optimization.
Batch your draws
If you pack all your textures into one and store each individual image's coordinates, you can use the same texture to draw many of your objects. This is limited by the size and number of your textures and also the maximum texture size supported in your environment. In my experience 4096x4096 is safe, but I prefer to use 2048x2048 "texture atlases". There are many utility programs for making such textures; you can easily find a suitable one with a Google search.
In this setup, in addition to an SDL texture, each sprite also has the x, y, width and height of the region in the "big" texture containing the particular image needed. You can make a TextureRegion class; each sprite then has a TextureRegion. This whole process is often referred to as batching. Look it up. The whole idea is to minimize state changes. I am not sure whether it applies to software rendering or to all SDL2 backends.
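As a minimal sketch of this idea (TextureRegion and renderSprite are illustrative names, not part of the SDL2 API):

// Sketch: drawing sprites from a shared texture atlas with SDL_RenderCopy.
struct TextureRegion {
    SDL_Texture* atlas;  // the shared "big" texture
    SDL_Rect     src;    // x, y, w, h of this image inside the atlas
};

void renderSprite(SDL_Renderer* renderer, const TextureRegion& region,
                  const SDL_Rect& dst)
{
    // Sprites sharing the same atlas avoid texture switches between draws.
    SDL_RenderCopy(renderer, region.atlas, &region.src, &dst);
}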
Cache your transformations
Batching your sprites will increase performance on the GPU side. The CPU-bound code is another optimization opportunity. Instead of calculating the parameters of SDL_RenderCopy in each frame, calculate them once and cache them. Then, when the position/rotation of the camera or object changes, recalculate the cache. You can do this in the "accessors" of your entity class (like setPosition, setRotation, etc.). Note that instead of directly recalculating the transform as soon as a position or rotation changes, you want to flag the object as "dirty" and check for the dirty flag in your render function: if this->isDirty, then recalculate and cache the transform. This prevents redundant calculations when you do this:
//if dirty flag is not used each of the following function calls
//would have resulted in a recalculation of transforms. However by
//using the dirty flag they will be calculated only once before
//the rendering of next frame in the render() function.
player->setPosition(start_x, start_y);
player->setRotation(0);
camera->reset();
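Here is a minimal sketch of that accessor/dirty-flag pattern (class, member and helper names such as Camera and computeDstRect are illustrative, not from the original code):

// Sketch: cache the SDL_RenderCopy destination rect, recompute only when dirty.
class Entity {
public:
    void setPosition(float x, float y) { m_x = x; m_y = y; m_dirty = true; }
    void setRotation(float degrees)    { m_angle = degrees; m_dirty = true; }

    void render(SDL_Renderer* renderer, const Camera& camera)
    {
        // A camera move would also need to set the dirty flag (not shown).
        if (m_dirty) {
            m_cachedDst = computeDstRect(camera);  // the expensive transform math
            m_dirty = false;
        }
        SDL_RenderCopyEx(renderer, m_texture, nullptr, &m_cachedDst,
                         m_angle, nullptr, SDL_FLIP_NONE);
    }

private:
    SDL_Rect computeDstRect(const Camera& camera) const; // hypothetical helper
    SDL_Texture* m_texture = nullptr;
    float m_x = 0, m_y = 0, m_angle = 0;
    SDL_Rect m_cachedDst{};
    bool m_dirty = true;
};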
So, I've done some more testing by examining the memory/CPU usage of this program at full screen with a "demanding" level, and managed to make it similar to other games by enforcing a framerate cap with SDL_Delay():
float g_max_framerate = 60;
float g_max_frametime = 1 / g_max_framerate * 1000;
...
while (!quit) {
    lastticks = ticks;
    ticks = SDL_GetTicks();
    elapsed = ticks - lastticks;
    ...
    SDL_RenderPresent(renderer);
    // lock framerate
    if (elapsed < g_max_frametime) {
        SDL_Delay(g_max_frametime - elapsed);
    }
}
With this limitation it is appropriately low-spec.

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
    glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost, the biggest CPU hog with OpenGL is immediate mode... and you're using it (glBegin, glEnd). The problem with immediate mode is that every single vertex requires a couple of OpenGL calls to be made; and because OpenGL uses thread-local state, each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
The next issue is how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be setting up the window for double buffering, enabling V-Sync, setting a swap interval of 1, and doing a buffer swap (glutSwapBuffers) once the frame is rendered. The exact timing of what blocks and where is implementation dependent (unfortunately), but you're more or less guaranteed to exactly hit your screen refresh frequency, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least, you may simply be misled by the way Windows accounts CPU time: time spent in driver context (which includes blocking while waiting for V-Sync) is accounted as consumed CPU time, while it is in fact interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach to get more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
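As a minimal sketch of the double-buffered setup (callback and window names are illustrative; enabling V-Sync itself is platform specific, e.g. wglSwapIntervalEXT on Windows or glXSwapIntervalEXT on X11, and is not shown):

// Sketch: double-buffered GLUT window; glFlush() is replaced by a buffer swap.
#include <GL/glut.h>

void display()
{
    glClear(GL_COLOR_BUFFER_BIT);
    // ... draw shapes and text here ...
    glutSwapBuffers();      // with V-Sync enabled this paces the loop to the refresh rate
    glutPostRedisplay();    // request the next frame; no manual sleep needed
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);  // double buffering
    glutCreateWindow("Micromouse simulation");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}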
I found that putting the render thread to sleep helps reduce CPU usage, in my case from 26% to around 8%:
#include <chrono>
#include <iostream>
#include <thread>

void render_loop() {
    ...
    auto const start_time = std::chrono::steady_clock::now();
    auto const wait_time = std::chrono::milliseconds{ 17 };
    auto next_time = start_time + wait_time;
    while (true) {
        ...
        // Execute once after the thread wakes up, every 17 ms, which is
        // theoretically 60 frames per second.
        auto then = std::chrono::high_resolution_clock::now();
        std::this_thread::sleep_until(next_time);
        ...rendering jobs
        auto elapsed_time = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::high_resolution_clock::now() - then);
        std::cout << "ms: " << elapsed_time.count() << '\n';
        next_time += wait_time;
    }
}
I thought about attempting to measure the frame rate while the thread is asleep, but there isn't any reason for my use case to attempt that. The result averaged around 16 ms, so I thought it was good enough.
Inspired by this post

Understanding SDL frame animation

I am working through the book SDL game development. In the first project, there is a bit of code meant to move the coords of the rendered frame of a sprite sheet:
void Game::update()
{
    m_sourceRectangle.x = 128 * int((SDL_GetTicks() / 100) % 6);
}
I am having trouble understanding this... I know that it moves m_sourceRectangle 128 pixels along the x axis every 100 ms... but how does it actually work? Can somebody break down each element of this code to help me understand?
I don't understand why SDL_GetTicks() needs to be called to do this...
I also know that %6 is there because there are 6 frames in the animation... but how does it actually do that?
The book says:
Here we have used SDL_GetTicks() to find out the amount of milliseconds since SDL was initialized. We then divide this by the amount of time (in ms) we want between frames and then use the modulo operator to keep it in range of the amount of frames we have in our animation. This code will (every 100 milliseconds) shift the x value of our source rectangle by 128 pixels (the width of a frame), multiplied by the current frame we want, giving us the correct position. Build the project and you should see the animation displayed.
But I am not sure I understand why getting the amount of milliseconds since SDL was initialized works.
The modulo operator takes the remainder of a division. So, for example, if GetTicks() is 2600, dividing by 100 gives 26, and 26 modulo 6 is 2. Therefore it's frame 2.
If GetTicks() is 3300, you divide by 100 and get 33; 33 modulo 6 is 3; frame 3.
Each frame will be displayed for 100 ms, so at T=0 ms it's Frame 0, at T=100 ms it's Frame 100/100, at T=200 ms it's Frame 200/100, and so on. So at T=SDL_GetTicks() ms, it's Frame SDL_GetTicks()/100. But then you only have 6 frames altogether, cycling, so at T=SDL_GetTicks() ms it is in fact Frame (SDL_GetTicks()/100) % 6.
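As a minimal sketch, the same expression broken into steps (variable names are illustrative; the numbers reuse the 2600 ms example above):

// Sketch: breaking apart 128 * int((SDL_GetTicks()/100) % 6).
Uint32 ticks      = SDL_GetTicks();     // ms since SDL was initialized, e.g. 2600
Uint32 frameIndex = (ticks / 100) % 6;  // 2600/100 = 26, 26 % 6 = 2 -> frame 2
m_sourceRectangle.x = 128 * (int)frameIndex;  // frame 2 starts at x = 256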
There is an assumption here that when the program starts, Frame 0 is displayed, which may not be true because there are lots of things to do at startup that take time. But for a simple demo illustrating the cycling of frames, it is good enough.
Hope this helps.

Linear movement stutter

I have created a simple, frame-independent, variable-time-step, linear movement in Direct3D9 using ID3DXSprite. Most users can't notice the stutter, but on some computers (including mine) it happens often and sometimes it stutters a lot.
Stuttering occurs with VSync enabled and disabled.
I figured out that the same happens with an OpenGL renderer.
It's not a floating point problem.
It seems the problem only exists in Aero transparent-glass windowed mode (it is fine, or at least much less noticeable, in fullscreen, in a borderless full-screen window, or with Aero disabled), and it is even worse when the window loses focus.
EDIT:
The frame delta time doesn't leave the bounds of 16..17 ms even when stuttering occurs.
It seems my frame-delta-time measurement/logging code was bugged; I have fixed it now.
Normally, with VSync enabled, a frame renders in 17 ms, but sometimes (probably when the stuttering happens) it jumps to 25-30 ms.
(I dump the log only once at application exit, not while running/rendering, so it does not affect performance.)
device->Clear(0, 0, D3DCLEAR_TARGET, D3DCOLOR_ARGB(255, 255, 255, 255), 0, 0);
device->BeginScene();
sprite->Begin(D3DXSPRITE_ALPHABLEND);
QueryPerformanceCounter(&counter);
float time = counter.QuadPart / (float) frequency.QuadPart;
float deltaTime = time - currentTime;
currentTime = time;
position.x += velocity * deltaTime;
if (position.x > 640)
    velocity = -250;
else if (position.x < 0)
    velocity = 250;
position.x = (int) position.x;
sprite->Draw(texture, 0, 0, &position, D3DCOLOR_ARGB(255, 255, 255, 255));
sprite->End();
device->EndScene();
device->Present(0, 0, 0, 0);
Fixed timer thanks to Eduard Wirch and Ben Voigt (although it doesn't fix the initial problem):
float time()
{
    static LARGE_INTEGER start = {0};
    static LARGE_INTEGER frequency;
    if (start.QuadPart == 0)
    {
        QueryPerformanceFrequency(&frequency);
        QueryPerformanceCounter(&start);
    }
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return (float) ((counter.QuadPart - start.QuadPart) / (double) frequency.QuadPart);
}
EDIT #2:
So far I have tried three update methods:
1) Variable time step
x += velocity * deltaTime;
2) Fixed time step
x += 4;
3) Fixed time step + Interpolation
accumulator += deltaTime;
float updateTime = 0.001f;
while (accumulator > updateTime)
{
    previousX = x;
    x += velocity * updateTime;
    accumulator -= updateTime;
}
float alpha = accumulator / updateTime;
float interpolatedX = x * alpha + previousX * (1 - alpha);
All methods work pretty much the same; fixed time step looks better, but depending on the frame rate is not really an option, and it doesn't solve the problem completely (it still jumps/stutters from time to time, though rarely).
So far, disabling Aero transparent glass or going full screen is the only significant positive change.
I am using the latest NVIDIA GeForce 332.21 driver and Windows 7 x64 Ultimate.
Part of the solution was a simple precision/data type problem. Replace the speed calculation with a constant and you'll see extremely smooth movement. Analysing the calculation showed that you're storing the result of QueryPerformanceCounter() inside a float. QueryPerformanceCounter() returns a number which looks like this on my computer: 724032629776. This number requires at least 5 bytes to be stored; however, a float uses 4 bytes (and only 24 bits for the actual mantissa) to store the value. So precision is lost when you convert the result of QueryPerformanceCounter() to float, and sometimes this leads to a deltaTime of zero, causing stuttering.
This partly explains why some users do not experience the problem: it all depends on whether the result of QueryPerformanceCounter() fits into a float.
The solution for this part of the problem is: use double (or, as Ben Voigt suggested, store the initial performance counter and subtract it from new values before converting to float. This gives you more headroom, but might eventually hit the float resolution limit again when the application runs for a long time, depending on the growth speed of the performance counter).
After fixing this, the stuttering was much less but did not disappear completely. Analyzing the runtime behaviour showed that a frame is skipped now and then. The application's GPU command buffer is flushed by Present, but the present command remains in the application context queue until the next vsync (even though Present was invoked long before vsync, 14 ms earlier). Further analysis showed that a background process (f.lux) told the system to set the gamma ramp once in a while. This command required the complete GPU queue to run dry before it was executed, probably to avoid side effects. This GPU flush was started just before the present command was moved to the GPU queue. The system blocked video scheduling until the GPU ran dry, which took until the next vsync. So the present packet was not moved to the GPU queue until the next frame. The visible effect of this: stutter.
It's unlikely that you're running f.lux on your computer too. But you're probably experiencing a similar background intervention. You'll need to look for the source of the problem on your system yourself. I've written a blog post about how to diagnose frame skips: Diagnose frame skips and stutter in DirectX applications. You'll also find the whole story of diagnosing f.lux as the culprit there.
But even if you find the source of your frame skip, I doubt that you'll achieve a stable 60 fps while DWM window composition is enabled. The reason is that you're not drawing to the screen directly; instead you draw to a shared surface owned by DWM. Since it's a shared resource, it can be locked by others for an arbitrary amount of time, making it impossible for you to keep the frame rate stable for your application. If you really need a stable frame rate, go full screen, or disable window composition (on Windows 7; Windows 8 does not allow disabling window composition):
#include <dwmapi.h>
...
HRESULT hr = DwmEnableComposition(DWM_EC_DISABLECOMPOSITION);
if (!SUCCEEDED(hr)) {
    // log message or react in a different way
}
I took a look at your source code and noticed that you only process one window message every frame. For me this caused stuttering in the past.
I would recommend looping on PeekMessage until it returns zero, indicating that the message queue is exhausted. After that, render a frame.
So change:
if (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
to
while (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
Edit:
I compiled and ran your code (with another texture) and it displayed the movement smoothly for me. I don't have Aero though (Windows 8).
One thing I noticed: you set D3DCREATE_SOFTWARE_VERTEXPROCESSING. Have you tried setting this to D3DCREATE_HARDWARE_VERTEXPROCESSING?
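A minimal sketch of that change (d3d, hwnd and pp are assumed to already exist in your device setup code; none of this is taken from the original source):

// Sketch: request hardware vertex processing, fall back to software if it fails.
IDirect3DDevice9* device = NULL;
HRESULT hr = d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
                               D3DCREATE_HARDWARE_VERTEXPROCESSING,
                               &pp, &device);
if (FAILED(hr)) {
    // Fall back to the original software vertex processing path.
    hr = d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
                           D3DCREATE_SOFTWARE_VERTEXPROCESSING,
                           &pp, &device);
}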