Linear movement stutter - C++

I have created simple, frame-independent, variable-time-step linear movement in Direct3D9 using ID3DXSprite. Most users can't notice the stutter, but on some computers (including mine) it happens often and is sometimes severe.
Stuttering occurs with VSync enabled and disabled.
I figured out that the same happens with an OpenGL renderer.
It's not a floating-point problem.
The problem seems to exist only in Aero transparent-glass windowed mode (it is fine, or at least much less noticeable, in fullscreen, in a borderless fullscreen window, or with Aero disabled), and it gets even worse when the window loses focus.
EDIT:
The frame delta time does not leave the 16..17 ms range even when stuttering occurs.
It turns out my frame-delta-time logging code was bugged; I have fixed it now.
Normally, with VSync enabled, a frame takes 17 ms, but sometimes (probably when the stuttering happens) it jumps to 25-30 ms.
(I dump the log only once at application exit, not while running or rendering, so it does not affect performance.)
device->Clear(0, 0, D3DCLEAR_TARGET, D3DCOLOR_ARGB(255, 255, 255, 255), 0, 0);
device->BeginScene();
sprite->Begin(D3DXSPRITE_ALPHABLEND);

QueryPerformanceCounter(&counter);
float time = counter.QuadPart / (float) frequency.QuadPart;
float deltaTime = time - currentTime;
currentTime = time;

position.x += velocity * deltaTime;
if (position.x > 640)
    velocity = -250;
else if (position.x < 0)
    velocity = 250;
position.x = (int) position.x;

sprite->Draw(texture, 0, 0, &position, D3DCOLOR_ARGB(255, 255, 255, 255));
sprite->End();
device->EndScene();
device->Present(0, 0, 0, 0);
Fixed the timer thanks to Eduard Wirch and Ben Voigt (although it doesn't fix the initial problem):
float time()
{
    static LARGE_INTEGER start = {0};
    static LARGE_INTEGER frequency;
    if (start.QuadPart == 0)
    {
        QueryPerformanceFrequency(&frequency);
        QueryPerformanceCounter(&start);
    }
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return (float) ((counter.QuadPart - start.QuadPart) / (double) frequency.QuadPart);
}
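For reference, a minimal sketch of how this helper fits into the render loop above (lastFrameTime is an assumed variable, not part of the original code):
static float lastFrameTime = time(); // assumed helper variable
float now = time();
float deltaTime = now - lastFrameTime; // values stay small relative to the raw counter, so float precision is enough here
lastFrameTime = now;
position.x += velocity * deltaTime;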
EDIT #2:
So far I have tried three update methods:
1) Variable time step
x += velocity * deltaTime;
2) Fixed time step
x += 4;
3) Fixed time step + Interpolation
accumulator += deltaTime;
float updateTime = 0.001f;
while (accumulator > updateTime)
{
    previousX = x;
    x += velocity * updateTime;
    accumulator -= updateTime;
}
float alpha = accumulator / updateTime;
float interpolatedX = x * alpha + previousX * (1 - alpha);
All methods behave pretty much the same. The fixed time step looks better, but depending on the frame rate is not really an option, and it doesn't solve the problem completely (it still jumps/stutters occasionally).
So far, disabling Aero transparent glass or going fullscreen is the only change that makes a significant difference.
I am using the latest NVIDIA GeForce 332.21 drivers and Windows 7 x64 Ultimate.

Part of the solution was a simple precision / data type problem. Replace the speed calculation with a constant and you'll see extremely smooth movement. Analysing the calculation showed that you're storing the result from QueryPerformanceCounter() in a float. QueryPerformanceCounter() returns a number which looks like this on my computer: 724032629776. This number requires at least 5 bytes to be stored, whereas a float uses 4 bytes (and only 24 bits of mantissa for the actual value). So precision is lost when you convert the result of QueryPerformanceCounter() to float, and sometimes this leads to a deltaTime of zero, causing stuttering.
This partly explains why some users do not experience the problem: it all depends on whether the result of QueryPerformanceCounter() fits into a float.
The solution for this part of the problem is to use double (or, as Ben Voigt suggested, store the initial performance counter and subtract it from new values before converting to float. This at least gives you more headroom, but it might eventually hit the float resolution limit again if the application runs for a long time, depending on how fast the performance counter grows).
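To illustrate the difference, a minimal sketch (the counter value is the illustrative one quoted above; the exact resolution depends on your counter frequency and uptime):
LARGE_INTEGER counter, frequency;
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&counter);                                 // e.g. ~724032629776 on the machine above
float  secondsF = counter.QuadPart / (float)  frequency.QuadPart;  // 24-bit mantissa: resolution is only a few ms at this magnitude
double secondsD = counter.QuadPart / (double) frequency.QuadPart;  // 53-bit mantissa: sub-microsecond resolution remains
// subtracting two consecutive float timestamps like secondsF can therefore yield a deltaTime of exactly 0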
After fixing this, the stuttering was much reduced but did not disappear completely. Analysing the runtime behaviour showed that a frame is skipped now and then. The application's GPU command buffer is flushed by Present, but the present command remains in the application context queue until the next vsync (even though Present was invoked long before the vsync, about 14 ms earlier). Further analysis showed that a background process (f.lux) told the system to set the gamma ramp once in a while. This command required the complete GPU queue to run dry before it was executed, probably to avoid side effects. That GPU flush was started just before the present command was moved to the GPU queue, and the system blocked video scheduling until the GPU ran dry, which took until the next vsync. So the present packet was not moved to the GPU queue until the next frame. The visible effect of this: stutter.
It's unlikely that you're running f.lux on your computer too. But you're probably experiencing a similar background intervention. You'll need to look for the source of the problem on your system yourself. I've written a blog post about how to diagnose frame skips: Diagnose frame skips and stutter in DirectX applications. You'll also find the whole story of diagnosing f.lux as the culprit there.
But even if you find the source of your frame skip, I doubt that you'll achieve a stable 60 fps while DWM window composition is enabled. The reason is that you're not drawing to the screen directly; instead you draw to a shared surface owned by DWM. Since it's a shared resource, it can be locked by others for an arbitrary amount of time, making it impossible to keep your application's frame rate stable. If you really need a stable frame rate, go fullscreen or disable window composition (on Windows 7; Windows 8 does not allow disabling window composition):
#include <dwmapi.h>
...
HRESULT hr = DwmEnableComposition(DWM_EC_DISABLECOMPOSITION);
if (!SUCCEEDED(hr)) {
    // log message or react in a different way
}

I took a look at your source code and noticed that you only process one window message every frame. For me this caused stuttering in the past.
I would recommend looping on PeekMessage until it returns zero, indicating that the message queue is exhausted, and only then rendering a frame.
So change:
if (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
to
while (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
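A sketch of the resulting message pump (running and RenderFrame() are placeholders for your existing quit flag and drawing code):
MSG message;
while (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
{
    if (message.message == WM_QUIT)
        running = false;
    TranslateMessage(&message);
    DispatchMessageW(&message);
}
RenderFrame(); // render only after the queue has been drained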
Edit:
I compiled and ran your code (with another texture) and it displayed the movement smoothly for me. I don't have Aero though (Windows 8).
One thing I noticed: you set D3DCREATE_SOFTWARE_VERTEXPROCESSING. Have you tried D3DCREATE_HARDWARE_VERTEXPROCESSING instead?
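For example, the behaviour flag is the fourth argument to IDirect3D9::CreateDevice; a sketch (d3d, hWnd and presentParams are assumed to match your existing setup):
IDirect3DDevice9* device = nullptr;
HRESULT hr = d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                               D3DCREATE_HARDWARE_VERTEXPROCESSING, // instead of D3DCREATE_SOFTWARE_VERTEXPROCESSING
                               &presentParams, &device);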

Related

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
    glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost, the biggest CPU hog with OpenGL is immediate mode… and you're using it (glBegin, glEnd). The problem with IM is that every single vertex requires a whole couple of OpenGL calls to be made; and because OpenGL uses thread-local state, each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
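Even before moving to VBOs and shaders, plain client-side vertex arrays already replace the per-vertex calls with a single draw call per polygon; a rough sketch against the loop from the question (the vertices vector is an assumed flattening of the polygon data):
std::vector<float> vertices; // x,y pairs; ideally built once, not every frame
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    vertices.push_back(coordinates.first);
    vertices.push_back(coordinates.second);
}
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, vertices.data());
glDrawArrays(GL_POLYGON, 0, (GLsizei) (vertices.size() / 2));
glDisableClientState(GL_VERTEX_ARRAY);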
The next issue is how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be setting up the window for double buffering, enabling V-Sync (a swap interval of 1) and doing a buffer swap (glutSwapBuffers) once the frame is rendered. Exactly what blocks, and where, is implementation-dependent (unfortunately), but you're more or less guaranteed to exactly hit your screen refresh frequency, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least, you may simply be fooled by the way Windows accounts CPU time: time spent in driver context (which includes blocking while waiting for V-Sync) is counted as consumed CPU time, even though it is in fact interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach for getting more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
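A sketch of that double-buffered GLUT setup (drawScene() stands in for the existing drawing code; setting the swap interval itself is platform specific, e.g. wglSwapIntervalEXT on Windows/WGL or glXSwapIntervalEXT on X11, and is left out here):
void display() {
    drawScene();          // the existing shape/text drawing
    glutSwapBuffers();    // replaces glFlush(); with vsync on, this paces the loop to the refresh rate
    glutPostRedisplay();  // request the next frame
}

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA); // request a double-buffered framebuffer
    glutCreateWindow("micromouse simulation");
    glutDisplayFunc(display);
    glutMainLoop();
}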
I found that putting the render thread to sleep helps reduce CPU usage (in my case from 26% to around 8%):
#include <chrono>
#include <iostream>
#include <thread>

void render_loop(){
    ...
    auto const start_time = std::chrono::steady_clock::now();
    auto const wait_time = std::chrono::milliseconds{ 17 };
    auto next_time = start_time + wait_time;
    while(true){
        ...
        // executes once after the thread wakes up, every 17 ms, which is roughly 60 frames per second
        auto then = std::chrono::high_resolution_clock::now();
        std::this_thread::sleep_until(next_time);
        ...rendering jobs
        auto elapsed_time =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - then);
        std::cout << "ms: " << elapsed_time.count() << '\n';
        next_time += wait_time;
    }
}
I thought about attempting to measure the frame rate while the thread is asleep, but there isn't any reason for my use case to attempt that. The result averaged around 16 ms, so I thought it was good enough.
Inspired by this post

Fixed timestep stuttering with VSync on

In a 2D OpenGL engine I implemented, I have a fixed timestep as described in the famous "Fix your timestep" article, along with blending.
I have a test object that moves vertically (y axis). There is stuttering in the movement (preprogrammed movement, not from user input). This means the object does not move smoothly across the screen.
Please see the uncompressed video I am linking: LINK
The game framerate stays at 60fps (Vsync turned on from Nvidia driver)
The game logic updates at a fixed 20 updates/ticks per second, set by me. This is normal. The object moves 50 pixels per update.
However the movement on the screen is severely stuttering.
EDIT: By stepping through the recorded video above frame by frame, I noticed that the stuttering is caused by a frame being shown twice.
EDIT2: Setting the application priority to Realtime in the task manager completely eliminates the stutter! However, this obviously isn't a solution.
Below is the object y movement delta at different times, with VSync turned off
The first column is the elapsed time since the last frame, in microseconds (e.g. 4403).
The second column is the object's movement on the y axis since the last frame.
Effectively, the object moves 1000 pixels per second, and the log below confirms it.
time since last frame: 4403 ypos delta since last frame: 4.403015
time since last frame: 3807 ypos delta since last frame: 3.806976
time since last frame: 3716 ypos delta since last frame: 3.716003
time since last frame: 3859 ypos delta since last frame: 3.859009
time since last frame: 4398 ypos delta since last frame: 4.398010
time since last frame: 8961 ypos delta since last frame: 8.960999
time since last frame: 7871 ypos delta since last frame: 7.871002
time since last frame: 3985 ypos delta since last frame: 3.984985
time since last frame: 3684 ypos delta since last frame: 3.684021
Now with VSync turned on
time since last frame: 17629 ypos delta since last frame: 17.628906
time since last frame: 15688 ypos delta since last frame: 15.687988
time since last frame: 16641 ypos delta since last frame: 16.641113
time since last frame: 16657 ypos delta since last frame: 16.656738
time since last frame: 16715 ypos delta since last frame: 16.715332
time since last frame: 16663 ypos delta since last frame: 16.663086
time since last frame: 16666 ypos delta since last frame: 16.665771
time since last frame: 16704 ypos delta since last frame: 16.704102
time since last frame: 16626 ypos delta since last frame: 16.625732
I would say they look ok.
This has been driving me bonkers for days, what am I missing?
Below is my Frame function which is called in a loop:
void Frame()
{
    static sf::Time t;
    static const double ticksPerSecond = 20;
    static uint64_t stepSizeMicro = 1000000 / ticksPerSecond; // microseconds
    static sf::Time accumulator = sf::seconds(0);

    gElapsedTotal = gClock.getElapsedTime();
    sf::Time elapsedSinceLastFrame = gElapsedTotal - gLastFrameTime;
    gLastFrameTime = gElapsedTotal;
    if (elapsedSinceLastFrame.asMicroseconds() > 250000)
        elapsedSinceLastFrame = sf::microseconds(250000);
    accumulator += elapsedSinceLastFrame;

    while (accumulator.asMicroseconds() >= stepSizeMicro)
    {
        Update(stepSizeMicro / 1000000.f);
        gGameTime += sf::microseconds(stepSizeMicro);
        accumulator -= sf::microseconds(stepSizeMicro);
    }

    uint64_t blendMicro = accumulator.asMicroseconds() / stepSizeMicro;
    float blend = accumulator.asMicroseconds() / (float) stepSizeMicro;
    if (rand() % 200 == 0) Trace("blend: %f", blend);
    CWorld::GetInstance()->Draw(blend);
}
More info as requested in the comments:
Stuttering occurs both in fullscreen at 1920x1080 and in windowed mode at 1600x900.
The setup is a simple SFML project. I don't know whether it uses VBOs/VAOs internally when rendering textured rectangles.
I am not doing anything else on my computer. Keep in mind this issue occurs on other computers as well; it's not just my rig.
I am running on the primary display. The display doesn't really make a difference; the issue occurs both in fullscreen and windowed mode.
I have profiled my own code. The issue was that an area of my code occasionally had performance spikes due to cache misses. This caused my loop to take longer than 16.666 milliseconds, the maximum time a frame may take to display smoothly at 60 Hz. It happened only in one frame, once in a while, and that frame caused the stuttering. The code logic itself was correct; this proved to be a performance issue.
For future reference, in the hope that this will help other people: the way I debugged this was to put
if (timeSinceLastFrame > 16000) // microseconds
{
    Trace("Slow frame detected");
    DisplayProfilingInformation();
}
in my frame code. When the if triggers, it displays profiling stats for the functions in the last frame, to see which function took the longest in that frame. I was thus able to pinpoint the performance bug to a structure that was not suitable for its usage: a big, nasty map of maps that generated a lot of cache misses and occasionally spiked in performance.
I hope this helps future unfortunate souls.
It seems like you're not synchronizing your 60 Hz frame loop with the GPU's 60 Hz VSync. Yes, you have enabled VSync in the Nvidia driver, but that only causes the driver to use a back buffer which is swapped on the VSync.
You need to set the swap interval to 1 and perform a glFinish() to wait for the VSync.
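Since the project is SFML-based (per the details above), a sketch of what that could look like (window is assumed to be your sf::RenderWindow; whether glFinish really blocks until the swap has happened is driver dependent):
window.setVerticalSyncEnabled(true); // swap interval 1

// per frame, after the fixed-timestep update:
CWorld::GetInstance()->Draw(blend);
window.display();  // queue the buffer swap
glFinish();        // wait here until the GPU has actually caught up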
A tricky one, but from the above it seems to me this is not a 'frame rate' problem, but rather a problem somewhere in your 'animate' code. Another observation is the line "Update(stepSizeMicro / 1000000.f);": the divide by 1000000.f could mean you are losing resolution due to the limits of floating-point precision, so rounding could be your killer.

Lerping issue with timers

I have been having an issue related to timers when lerping objects in my game engine. The lerping is almost correct: when I apply it to an object moving or rotating, it is fine, except that every few seconds the object appears to quickly flash back to its previous position before continuing to move smoothly.
Running the engine in windowed mode gives me ~1500 fps, but if I run in fullscreen with VSync clamping to 60 fps, the glitch happens a lot more often.
I have been trying to find either a good resource or explanation on lerping and how I can improve what I have.
For working out the tick gap I use:
float World::GetTickGap()
{
    float gap = (float) (TimeMs() - m_lastTick) / m_tickDelay;
    return gap > 1.f ? 1.f : gap;
}
My update function:
m_currentTick = TimeMs();
if (m_currentTick > m_lastTick + m_tickDelay)
{
    m_lastTick = m_currentTick;
    //Update actors
}
Then, when rendering each actor, I pass in the tick gap for them to lerp between their positions.
My lerping function:
float math::Lerp(float a, float b, float t)
{
    return a + t * (b - a);
}
And an example of the lerping function being called:
renderPosition.x = (math::Lerp(m_LastPosition.x, m_Position.x, tickDelay));
I'm unsure where to start trying to fix this problem. As far as I'm aware it's a timing issue in these functions, but could anything else cause a small dip in performance at a constant rate?
Any help with this problem would be greatly appreciated. :)
I'm not really able to reconstruct your code from what you posted, but I remember that calling your time function more than once per frame is generally a bad idea.
You seem to do that. Try thinking about what effect that has.
For example, it might mean that the "update actors" loop is out of sync with the "tick gap" intervals, and actors are updated a second time with a gap of 0.
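In other words, sample the clock once at the top of the frame and reuse that value for both the tick update and the lerp; a hypothetical sketch against the functions shown above (Frame() and m_frameTime are assumed names, not from the original code):
void World::Frame()
{
    m_frameTime = TimeMs(); // sample the clock exactly once per frame
    if (m_frameTime > m_lastTick + m_tickDelay)
    {
        m_lastTick = m_frameTime;
        // update actors
    }
    float gap = (float) (m_frameTime - m_lastTick) / m_tickDelay; // same sample reused for the lerp factor
    // render actors, lerping with 'gap' clamped to [0, 1]
}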

Is it possible to wait for glRender() and glSwapBuffers() to finish?

I noticed that when a key is pressed and the redraw is initiated from the keyboard event function, the previous draw sometimes is not completely finished. The result is a sloppy "animation". I am basically scrolling the contents of the window. When I measure my draw() function, I can see it takes 5 ms, which should be more than enough for smooth scrolling. But my guess is that the actual drawing is done asynchronously by the OpenGL driver somewhere under the hood. So the question:
Can I get notified when the actual rendering and screen update is finished?
function draw(ev) {
    var gl = GLX.renderPipeline();
    gl.Viewport(0, 0, width, height);
    gl.MatrixMode(GL_PROJECTION);
    gl.LoadIdentity();
    gl.Ortho(0, width, height, 0, -1, 1);
    gl.MatrixMode(GL_MODELVIEW);
    gl.LoadIdentity();
    gl.ClearColor(0.3, 0.3, 0.8, 0.0);
    gl.Clear(0x00004000 | 0x00000100);
    gl.Color4f(0, 0, 0.9, 0.5);
    rect(gl, 100, 100, 400, 300)

    var s = 'ATARI 65 XE FOREVER ATARI 65 XE FOREVER ATARI 65 XE FOREVER ATARI 65 XE FOREVER!'
    s = s + s
    s = s.split('')
    for (var i = 0; i < s.length; i++) s[i] = s[i].charCodeAt(0) + charListBegin
    for (var y = 0; y < 60; y++) {
        gl.LoadIdentity();
        gl.Translatef(charsX, charsY + y * fontSize * 8, 0)
        gl.Color4f(colors[y][0], colors[y][1], colors[y][2], 0.5);
        gl.CallLists(s)
    }
    gl.Render(ctx);
    GLX.SwapBuffers(ctx, win);
}
In general, your commands are placed into a queue. The command queue is flushed at distinct points, for example when you call SwapBuffers or glFlush (but also on some other occasions, e.g. when the queue is full) and the commands are worked off asynchronously. Most commands simply post a command and return immediately, unless they have some lengthy work to do that cannot be postponed, like glBufferData performing a copy of a few hundred kilobytes into a buffer object (this is something that has to happen immediately too, because OpenGL cannot know if the data is still valid at a later time). The time it takes to post commands is what you measure, but it's not what you are interested in.
If your GL version is at least 3.2, you can be "kind of notified" by calling glFenceSync, which inserts a fence object into the command stream, and then blocking until the fence has been signalled using glClientWaitSync. When glClientWaitSync returns successfully, all commands up to the fence have completed.
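A sketch of the fence approach (the timeout is arbitrary and given in nanoseconds):
// after submitting the draw commands you care about:
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// later: block until everything up to the fence has executed, or 16 ms pass
GLenum result = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 16 * 1000 * 1000);
if (result == GL_TIMEOUT_EXPIRED) {
    // the GPU is still working on the submitted commands
}
glDeleteSync(fence);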
If you have at least version 3.3, you can measure the time your OpenGL commands take to render by inserting a query of type GL_TIME_ELAPSED. This works without blocking and is therefore by far the preferable approach. It gives you the actual time it takes to draw your stuff.
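And a sketch of the non-blocking GL_TIME_ELAPSED query (GL 3.3 / ARB_timer_query):
GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
// ... issue the draw commands to be measured ...
glEndQuery(GL_TIME_ELAPSED);

// fetch the result later (ideally a frame or two later, so this never stalls):
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
glDeleteQueries(1, &query);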
SwapBuffers, like most commands, does mostly nothing. It will call glFlush, insert the equivalent of a fence, and mark the framebuffer as "locked, and ready to be swapped".
Eventually, when all draw commands have finished and when the driver or window manager can be bothered (think of vertical sync and compositing window managers!), the driver will unlock and swap buffers. This is when your stuff actually gets visible.
If you perform any other command in the meantime that would alter the locked framebuffer, that command blocks. This is what gives SwapBuffers the illusion of blocking.
You don't have much control over that (other than modifying the swap interval, if the implementation lets you) and you can't make it any faster -- but by playing with things like glFlush or glFinish you can make it slower.
Usually you queue a redraw, instead of calling draw directly. I.e. when using Qt, you'd call QWidget::update().
As far as waiting until all commands in the pipeline have been processed, you can call glFinish(), see https://www.opengl.org/sdk/docs/man4/xhtml/glFinish.xml .

How to pause an animation with OpenGL / glut

To achieve an animation, I am just redrawing things in a loop.
However, I need to be able to pause when a key is pressed. I know the way I'm doing it now is wrong, because it eats all of my cycles while the loop is running.
What is a better way that will allow pausing and resuming with a key?
I tried using a bool flag, but obviously it didn't change until the loop was done.
You have the correct very basic architecture sorted, in that everything needs to be updated in a loop, but you need to make your loop a lot smarter for a game (or any other application requiring OpenGL animations).
However, I need to be able to pause when a key is pressed.
A basic way of doing this is to have a boolean value paused and to wrap the game into a loop.
while (!finished) {
    while (!paused) {
        update();
        render();
    }
}
Typically, however, you still want to do things such as look at your inventory, craft things, etc. while your game is paused. Many games still run their main loop while the game is paused; they just don't let the actors know any time has passed. For instance, it sounds like your animation frames simply have a number of game frames to be visible for. This is a bad idea, because if the frame rate is higher or lower on a different computer, the animation speed will look wrong there. You can consider my answer here, and the linked samples, to see how you can achieve framerate-independent animation by specifying animation frames in terms of millisecond durations and passing the frame time into the update loop. For instance, your main game loop then changes to look like this:
float previousTime = 0.0f;
float thisTime = 0.0f;
float framePeriod = 0.0f;
while (!finished) {
    thisTime = getTimeInMilliseconds();
    framePeriod = thisTime - previousTime; // time since the previous frame
    update(framePeriod);
    render();
    previousTime = thisTime;
}
Now, everything in the game that gets updated will know how much time has passed since the previous frame. This is helpful for all your physics calculations as all of our physical formulae are in terms of time + starting factors + decay factors (for instance, the SUVAT equations). The same information can be used for your animations to make them framerate independent as I have described with some links to examples here.
To answer the next part of the question:
it eats all of my cycles while the loop is running.
This is because you're using 100% of the CPU and never going to sleep. If we decide that we want, for instance, 30 fps on the target device (and we know that this is achievable), then we know the period of one frame is 1/30th of a second. We've just calculated the time it takes to update and render our game, so we can sleep for the spare time:
float previousTime = 0.0f;
float thisTime = 0.0f;
float framePeriod = 0.0f;
float availablePeriod = 1000.0f / 30.0f; // one frame's budget: 1/30th of a second, in milliseconds to match getTimeInMilliseconds()
while (!finished) {
    thisTime = getTimeInMilliseconds();
    framePeriod = thisTime - previousTime; // milliseconds spent on the previous frame
    update(framePeriod);
    render();
    previousTime = thisTime;
    if (framePeriod < availablePeriod)
        sleep(availablePeriod - framePeriod); // sleep off the unused part of the frame (milliseconds)
}
This technique is called framerate governance as you are manually controlling the rate at which you are rendering and updating.