Fixed timestep stuttering with VSync on - c++

In a 2D OpenGL engine I implemented I have a fixed timestep as described in the famous fix your timestep article, along with blending.
I have a test object that moves vertically (y axis). There is stuttering in the movement (preprogrammed movement, not from user input). This means the object does not move smoothly across the screen.
Please see the uncompressed video I am linking: LINK
The game framerate stays at 60fps (Vsync turned on from Nvidia driver)
The game logic updates at a fixed 20 updates/ticks per second, set by me. This is normal. The object moves 50 pixels per update.
However the movement on the screen is severely stuttering.
EDIT: I noticed by stepping in the recorded video above frame by frame that the stuttering is caused by a frame being shown twice.
EDIT2: Setting the application priority to Realtime in the task manager completely eliminates the stutter! However this obviously isn't a solution.
Below is the object y movement delta at different times, with VSync turned off
First column is the elapsed time since last frame, in microseconds (ex 4403 )
Second column is movement on the y axis of an object since last frame.
Effectively, the object moves 1000 pixels per second, and the log below confirms it.
time since last frame: 4403 ypos delta since last frame: 4.403015
time since last frame: 3807 ypos delta since last frame: 3.806976
time since last frame: 3716 ypos delta since last frame: 3.716003
time since last frame: 3859 ypos delta since last frame: 3.859009
time since last frame: 4398 ypos delta since last frame: 4.398010
time since last frame: 8961 ypos delta since last frame: 8.960999
time since last frame: 7871 ypos delta since last frame: 7.871002
time since last frame: 3985 ypos delta since last frame: 3.984985
time since last frame: 3684 ypos delta since last frame: 3.684021
Now with VSync turned on
time since last frame: 17629 ypos delta since last frame: 17.628906
time since last frame: 15688 ypos delta since last frame: 15.687988
time since last frame: 16641 ypos delta since last frame: 16.641113
time since last frame: 16657 ypos delta since last frame: 16.656738
time since last frame: 16715 ypos delta since last frame: 16.715332
time since last frame: 16663 ypos delta since last frame: 16.663086
time since last frame: 16666 ypos delta since last frame: 16.665771
time since last frame: 16704 ypos delta since last frame: 16.704102
time since last frame: 16626 ypos delta since last frame: 16.625732
I would say they look ok.
This has been driving me bonkers for days, what am I missing?
Below is my Frame function which is called in a loop:
void Frame()
{
static sf::Time t;
static const double ticksPerSecond = 20;
static uint64_t stepSizeMicro = 1000000 / ticksPerSecond; // microseconds
static sf::Time accumulator = sf::seconds(0);
gElapsedTotal = gClock.getElapsedTime();
sf::Time elapsedSinceLastFrame = gElapsedTotal - gLastFrameTime;
gLastFrameTime = gElapsedTotal;
if (elapsedSinceLastFrame.asMicroseconds() > 250000 )
elapsedSinceLastFrame = sf::microseconds(250000);
accumulator += elapsedSinceLastFrame;
while (accumulator.asMicroseconds() >= stepSizeMicro)
{
Update(stepSizeMicro / 1000000.f);
gGameTime += sf::microseconds(stepSizeMicro);
accumulator -= sf::microseconds(stepSizeMicro);
}
uint64_t blendMicro = accumulator.asMicroseconds() / stepSizeMicro;
float blend = accumulator.asMicroseconds() / (float) stepSizeMicro;
if (rand() % 200 == 0) Trace("blend: %f", blend);
CWorld::GetInstance()->Draw(blend);
}
More info as requested in the comments:
stuttering occurs both while in fullscreen 1920x1080 and in window mode 1600x900
the setup is a simple SFML project. I'm not aware if it uses VBO/VAO internally when rendering textured rectangles
not doing anything else on my computer. Keep in mind this issue occurs on other computers as well, it's not just my rig
am running on primary display. The display doesn't really make a difference. The issue occurs both in fullscreen and window mode.

I have profiled my own code. The issue was there was an area of my code that occasionally had performance spikes due to cache misses. This caused my loop to take longer than 16.6666 milliseconds, the max time it should take to display smoothly at 60Hz. This was only one frame, once in a while. That frame caused the stuttering. The code logic itself was correct, this proved to be a performance issue.
For future reference in hopes that this will help other people, how I debugged this was I put an
if ( timeSinceLastFrame > 16000 ) // microseconds
{
Trace("Slow frame detected");
DisplayProfilingInformation();
}
in my frame code. When the if is triggered, it displays profiling stats for the functions in the last frame, to see which function took the longest in the previous frame. I was thus able to pinpoint the performance bug to a structure that was not suitable for its usage. A big, nasty map of maps that generated a lot of cache misses and occasionally spiked in performance.
I hope this helps future unfortunate souls.

It seems like you're not synchronizing your 60Hz frame loop with the GPU's 60Hz VSync. Yes, you have enabled Vsync in Nvidia but that only causes Nvidia to use a back-buffer which is swapped on the Vsync.
You need to set the swap interval to 1 and perform a glFinish() to wait for the Vsync.

A tricky one, but from the above it seems to me this is not a 'frame rate' problem, but rather somewhere in your 'animate' code. Another observation is the line "Update(stepSizeMicro / 1000000.f);". the divide by 1000000.f could mean you are losing resolution due to the limitations of floating point numbers bit resolution, so rounding could be your killer?

Related

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working a on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost the biggest CPU hog with OpenGL is immediate modeā€¦ and you're using it (glBegin, glEnd). The problem with IM is, that every single vertex requires a whole couple of OpenGL calls being made; and because OpenGL uses a thread local state this means that each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
The next issue is with how you're timing your display. If low latency between user input and display is not your ultimate goal the standard approach would setting up the window for double buffering, enabling V-Sync, set a swap interval of 1 and do a buffer swap (glutSwapBuffers) once the frame is rendered. The exact timings what and where things will block are implementation dependent (unfortunately), but you're more or less guaranteed to exactly hit your screen refresh frequency, as long as your renderer is able to keep up (i.e. rendering a frame takes less time that a screen refresh interval).
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least you may be simply mocked by the way Windows does account CPU time (time spent in driver context, which includes blocking, waiting for V-Sync) will be accouted to the consumed CPU time, while it's in fact interruptible sleep. However you wrote, that you already do a sleep in your code, which would rule that out, because the go-to approach to get a more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
I found that by putting render thread to sleep helps reducing cpu usage from (my case) 26% to around 8%
#include <chrono>
#include <thread>
void render_loop(){
...
auto const start_time = std::chrono::steady_clock::now();
auto const wait_time = std::chrono::milliseconds{ 17 };
auto next_time = start_time + wait_time;
while(true){
...
// execute once after thread wakes up every 17ms which is theoretically 60 frames per
// second
auto then = std::chrono::high_resolution_clock::now();
std::this_thread::sleep_until(next_time);
...rendering jobs
auto elasped_time =
std::chrono::duration_cast<std::chrono::milliseconds> (std::chrono::high_resolution_clock::now() - then);
std::cout << "ms: " << elasped_time.count() << '\n';
next_time += wait_time;
}
}
I thought about attempting to measure the frame rate while the thread is asleep but there isn't any reason for my use case to attempt that. The result was averaging around 16ms so I thought it was good enough
Inspired by this post

Understanding SDL frame animation

I am working through the book SDL game development. In the first project, there is a bit of code meant to move the coords of the rendered frame of a sprite sheet:
void Game::update()
{
m_sourceRectangle.x = 128 * int((SDL_GetTicks()/100)%6);
}
I am having trouble understanding this... I know that it moves m_sourceRectangle 128 pixels along the x axis every 100 ms... but how does it actually work? Can somebody breakdown each element of this code to help me understand?
I don't understand why SDL_GetTicks() needs to be called to do this...
I also know that %6 is there because there are 6 frames in the animation... but how does it actually do that?
The book says:
Here we have used SDL_GetTicks() to find out the amount of milliseconds since SDL was initialized. We then divide this by the amount of time (in ms) we want between frames and then use the modulo operator to keep
it in range of the amount of frames we have in our animation. This code will (every 100 milliseconds) shift the x value of our source rectangle by
128 pixels (the width of a frame), multiplied by the current frame we want, giving us the correct position. Build the project and you should see the animation displayed.
But I am not sure I understand why getting the amount of milliseconds since SDL was initialized works.
The modulo operator takes the rest of a division. So for example if GetTicks() is 2600, first dividing by 100 makes it 26 and modulo 6 of 26 is 2. Therefore it's frame 2.
if GetTicks() is 3300; you divide by 100 and get 33; modulo 6 of 33 is 3; frame 3.
Each frame will be displayed for 100ms, so at T=0ms it's Frame 0, t=100ms it's Frame 100/100, at T=200ms it's Frame 200/100 and so on. So at T=SDL_GetTicks() ms, it's Frame SDL_GetTicks()/100. But than you only have 6 frames all together and cycling, therefore at T=SDL_GetTicks() ms it's in face Frame (SDL_GetTicks()/100) % 6.
There is an assumption here is that when the program start, Frame 0 is displayed, which may not be true because there are lots of things to do at starting which take time. But for simple demo to illustrate cycling of frames, it is good enough.
Hope this helps.

Linear movement stutter

I have created simple, frame independent, variable time step, linear movement in Direct3D9 using ID3DXSprite. Most users cant notice it, but on some (including mine) computers it happens often and sometimes it stutters a lot.
Stuttering occurs with VSync enabled and disabled.
I figured out that same happens in OpenGL renderer.
Its not floating point problem.
Seems like problem only exist in AERO Transparent Glass windowed mode (fine or at least much less noticeable in fullscreen, borderless full screen window or with aero disabled), even worse when window lost focus.
EDIT:
Frame delta time doesnt leave bounds 16 .. 17 ms even when stuttering occurs.
Seems like my frame delta time measurement log code was bugged. I fixed it now.
Normally with VSync enabled frame renders 17ms, but sometimes (probably when sutttering happens) it jumps to 25-30ms.
(I dump log only once at application exit, not while running, rendering, so its does not affect performance)
device->Clear(0, 0, D3DCLEAR_TARGET, D3DCOLOR_ARGB(255, 255, 255, 255), 0, 0);
device->BeginScene();
sprite->Begin(D3DXSPRITE_ALPHABLEND);
QueryPerformanceCounter(&counter);
float time = counter.QuadPart / (float) frequency.QuadPart;
float deltaTime = time - currentTime;
currentTime = time;
position.x += velocity * deltaTime;
if (position.x > 640)
velocity = -250;
else if (position.x < 0)
velocity = 250;
position.x = (int) position.x;
sprite->Draw(texture, 0, 0, &position, D3DCOLOR_ARGB(255, 255, 255, 255));
sprite->End();
device->EndScene();
device->Present(0, 0, 0, 0);
Fixed timer thanks to Eduard Wirch and Ben Voigt (although it doesnt fix initial problem)
float time()
{
static LARGE_INTEGER start = {0};
static LARGE_INTEGER frequency;
if (start.QuadPart == 0)
{
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&start);
}
LARGE_INTEGER counter;
QueryPerformanceCounter(&counter);
return (float) ((counter.QuadPart - start.QuadPart) / (double) frequency.QuadPart);
}
EDIT #2:
So far I have tried three update methods:
1) Variable time step
x += velocity * deltaTime;
2) Fixed time step
x += 4;
3) Fixed time step + Interpolation
accumulator += deltaTime;
float updateTime = 0.001f;
while (accumulator > updateTime)
{
previousX = x;
x += velocity * updateTime;
accumulator -= updateTime;
}
float alpha = accumulator / updateTime;
float interpolatedX = x * alpha + previousX * (1 - alpha);
All methods work pretty much same, fixed time step looks better, but it's not quite an option to depend on frame rate and it doesn't solve problem completely (still jumps (stutters) from time to time rarely).
So far disabling AERO Transparent Glass or going full screen is only significant positive change.
I am using NVIDIA latest drivers GeForce 332.21 Driver and Windows 7 x64 Ultimate.
Part of the solution was a simple precision data type problem. Exchange the speed calculation by a constant, and you'll see a extremely smooth movement. Analysing the calculation showed that you're storing the result from QueryPerformanceCounter() inside a float. QueryPerformanceCounter() returns a number which looks like this on my computer: 724032629776. This number requires at least 5 bytes to be stored. How ever a float uses 4 bytes (and only 24 bits for actual number) to store the value. So precision is lost when you convert the result of QueryPerformanceCounter() to float. And sometimes this leads to a deltaTime of zero causing stuttering.
This explains partly why some users do not experience this problem. It all depends on if the result of QueryPerformanceCounter() does fit into a float.
The solution for this part of the problem is: use double (or as Ben Voigt suggested: store the initial performance counter, and subtract this from new values before converting to float. This would give you at least more head room, but might eventually hit the float resolution limit again, when the application runs for a long time (depends on the growth speed of the performance counter).)
After fixing this, the stuttering was much less but did not disappear completely. Analyzing the runtime behaviour showed that a frame is skipped now and then. The application GPU command buffer is flushed by Present but the present command remains in the application context queue until the next vsync (even though Present was invoked long before vsync (14ms)). Further analysis showed that a back ground process (f.lux) told the system to set the gamma ramp once in a while. This command required the complete GPU queue to run dry before it was executed. Probably to avoid side effects. This GPU flush was started just before the 'present' command was moved to the GPU queue. The system blocked the video scheduling until the GPU ran dry. This took until the next vsync. So the present packet was not moved to GPU queue until the next frame. The visible effect of this: stutter.
It's unlikely that you're running f.lux on your computer too. But you're probably experiencing a similar background intervention. You'll need to look for the source of the problem on your system yourself. I've written a blog post about how to diagnose frame skips: Diagnose frame skips and stutter in DirectX applications. You'll also find the whole story of diagnosing f.lux as the culprit there.
But even if you find the source of your frame skip, I doubt that you'll achieve stable 60fps while dwm window composition is enabled. The reason is, you're not drawing to the screen directly. But instead you draw to a shared surface of dwm. Since it's a shared resource it can be locked by others for an arbitrary amount of time making it impossible for you to keep the frame rate stable for your application. If you really need a stable frame rate, go full screen, or disable window composition (on Windows 7. Windows 8 does not allow disabling window composition):
#include <dwmapi.h>
...
HRESULT hr = DwmEnableComposition(DWM_EC_DISABLECOMPOSITION);
if (!SUCCEEDED(hr)) {
// log message or react in a different way
}
I took a look at your source code and noticed that you only process one window message every frame. For me this caused stuttering in the past.
I would recommend to loop on PeekMessage until it returns zero to indicate that the message queue is exhausted. After that render a frame.
So change:
if (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
to
while (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
Edit:
I compiled and ran you code (with another texture) and it displayed the movement smoothly for me. I don't have aero though (Windows 8).
One thing I noticed: You set D3DCREATE_SOFTWARE_VERTEXPROCESSING. Have you tried to set this to D3DCREATE_HARDWARE_VERTEXPROCESSING?

Replay system for cocos2d with FPS correction

I'm building a replay system for my cocos2d-x game, it uses box2d too.
The game is nondeterministic so I can't store the user actions and replay them, so what I'm doing is storing position and angle of a sprite for each time step into a stream.
Now I have replay files that are in the following format
x,y,angle\r\n
Each line represents the state of the sprite at a given time step.
Now when I replay it back on the same device, it's all perfect, but of course life isn't this easy, so the problem arises if i want to play it back on different frame rates, how can this happen ?
is there a smart way to handle this issue ?
I don't have a fixed time step too, I'll implement this soon so you can consider a fixed time step in your answer :)
Thanks !
Ok, you record the delta time along with each recorded frame. Best to start recording the first frame with a delta time of 0 and assume the same for the first frame of the playback.
Then in your update method accumulate the current delta time:
-(void) update:(ccTime)delta
{
if (firstFrame)
{
totalTime = 0;
totalRecordedTime = 0;
firstFrame = NO;
}
else
{
totalTime += delta;
}
totalRecordedTime += [self getRecordedDeltaForFrameIndex:recordedFrameIndex];
if (totalTime > totalRecordedTime)
{
[self advanceRecordedFrameIndex];
}
// alternative:
id recordedFrameData = [self getRecordedFrameForDeltaTime:totalTime];
// apply recorded data ...
}
Alternative: create a method that returns or "applies" the recorded frame based on the current time delta. Pick the frame whose recorded total delta time is equal or lower the current total delta time.
To make searching for the frame faster, remember the most recent recorded frame's index and start searching from there (compare each frame's delta time), always including the most recent frame as well.
Occassionally this will display the same recorded frame multiple times, or it may skip recorded frames - depending on the actual device framerate and the framerate during recording.

Cocos2D Moving sprite sometimes jerky

Here I want to move some cocos2D sprite object from top to bottom. Sprites r generated at random position in screen. Some time all sprite's movement s jerky..I can't use CCMove as I want to maintain equal distance between sprite.
[self schedule: #selector(updateObjects:)];
-(void)updateObjects:(ccTime) dt
{
//when I print dt, it gives different value..
//jerk comes when this value s larger than ideal value..
for(Obstacles *Obs in ObsArray)
{
CGPoint pos = Obs.position;
pos.y -= gameSpeed;
Obs.position = pos;
}
}
How can I solve this problem.
Cocos2d uses variable time step: dt is time in seconds since last call of this scheduled selector. If your gameSpeed is distance the object has to move by per second (in points), then you must change your object position by gameSpeed * dt.
Resolved problem by
1. Removed all printf and cocos2D logs
2. Added separate thread for image loading.
3. Used multiple 1024x1024 sprite sheet in place of 2048x2048 in HD mode.
I had the same issue i tried lots of thing, at the end i found that it will jerk more on android devices then ios devices.. the frame-rate they rendered should be updated on constant interval .. if you use update method then set the constant delta time
For example i have used update method
void MyClass::update(float dt) {
int positioniteration=8;
int valocityiteration=8;
world->Step(1/60.0,valocityiteration,positioniteration);//60 FPS }
and also i have added these lines in appdelegate
director->setDepthTest(false);
director->setProjection(Director::Projection::_2D);// MY GAME IS 2D SO USED 2D PROJECTION .. IT HELPED ME
Hope this helps.:)