Constantly lag in opengl application - c++

I'm getting some repeating lags in my opengl application.
I'm using the win32 api to create the window and I'm also creating a 2.2 context.
So the main loop of the program is very simple:
Clearing the color buffer
Drawing a triangle
Swapping the buffers.
The triangle is rotating, that's the way I can see the lag.
Also my frame time isn't smooth which may be the problem.
But I'm very very sure the delta time calculation is correct because I've tried plenty ways.
Do you think it could be a graphic driver problem?
Because a friend of mine run almost the exactly same program except I do less calculations + I'm using the standard opengl shader.
Also, His program use more CPU power than mine and the CPU % is smoother than mine.
I should also add:
On my laptop I get same lag every ~1 second, so I can see some kind of pattern.

There are many reasons for a jittery frame rate. Off the top of my head:
Not calling glFlush() at the end of each frame
other running software interfering
doing things in your code that certain graphics drivers don't like
bugs in graphics drivers
Using the standard windows time functions with their terrible resolution
Try these:
kill as many running programs as you can get away with. Use the process tab in the task manager (CTRL-SHIFT-ESC) for this.
bit by bit, reduce the amount of work your program is doing and see how that affects the frame rate and the smoothness of the display.
if you can, try enabling/disabling vertical sync (you may be able to do this in your graphic card's settings) to see if that helps
add some debug code to output the time taken to draw each frame, and see if there are anomalies in the numbers, e.g. every 20th frame taking an extra 20ms, or random frames taking 100ms.

Related

OSX pushing pixels to screen with minimum latency

I'm trying to develop some very low-latency graphics applications and am getting really frustrated by how long it takes to draw to screen through OpenGL. Every discussion I find about it online addresses optimizing the OpenGL pipeline, but doesn't get anywhere near the results that I need.
Check this out:
https://www.dropbox.com/s/dbz4bq67cxluhs7/MouseLatency.MOV?dl=0
You probably noticed this before: With a c++ OpenGL app, dragging the mouse around the screen, and drawing the mouse location in OpenGL, the OpenGL lags behind by 3 or 4 frames. Clearly OSX CAN draw [the cursor] to screen with very low latency, but OpenGL is much slower. So let's say I don't need to do any fancy OpenGL rendering. I just want to push pixels to screen somehow. Is there a way for me to bypass OpenGL completely and draw to screen faster? Or is this kind of functionality going to be locked inside the kernel somewhere that I can't reach it?
datenwolf's answer is excellent. I just wanted to add one thing to this discussion regarding triple buffering at the compositor level, since I am very familiar with the Microsoft Windows desktop compositor.
I know you are asking about OS X here, but the implementation details I am going to discuss are the most sensible way of implementing this stuff and I would expect to see other systems work this way too.
Triple buffering as you might enable at the application level adds a third buffer to the swap-chain that is synchronized to refresh. That way of doing triple buffering does add latency, because that third buffer has to be displayed and nothing is allowed to touch it until this happens (this is D3D's mandated behavior -- the behavior and feature itself are undefined in OpenGL); but the way the Desktop Window Manager (Windows) works is slightly different.
The behavior I have seen most drivers implement for desktop composition is frame dropping. Any situation where multiple frames are finished between refreshes, all but 1 of those frames are discarded. You actually get lower latency using a window rather than fullscreen + triple buffering, because it does not block buffer swaps when the third buffer (owned by the compositor) has a finished frame waiting to be displayed.
It creates a whole different set of visual issues if framerate is not reasonably consistent. Technically, pixels belonging to dropped frames have infinite latency, so the benefits from latency reduction done this way might be worthless if you needed every single frame drawn to appear on screen.
I believe you can get this behavior on OS X (if you want it) by disabling VSYNC and drawing in a window. VSYNC basically only serves as a form of frame pacing (trade latency for consistency) in this scenario and tearing is eliminated by the compositor itself regardless what rate you draw at.
Regarding mouse cursor latency:
The cursor in any modern window system will always track with minimum latency. There is literally a feature on graphics hardware called a "hardware cursor," where the driver stores the cursor position and then once per-refresh, has the hardware overlay the cursor on top of whatever is sitting in the framebuffer waiting to be scanned-out. So even if your application is drawing at 30 FPS on a 60 Hz display, the cursor is updated every 16 ms when the hardware cursor's used.
This bypasses all graphics APIs altogether, but is quite limited (e.g. it uses the OS-defined cursor).
TL;DR: Latency comes in many forms.
If your problem is input latency, then you can mitigate that by reducing the number of pre-rendered frames and avoiding triple buffering. I could not begin to tell you how to reduce the number of driver pre-rendered frames on OS X.
Minimize length of time before something shows up on screen
If your problem is the amount of time that passes between executions of your render loop, you would go the other way. Increase pre-rendered frames, draw in a window and disable VSYNC. You may run into a lot of frames that are drawn but never displayed in this scenario.
Minimize time spent blocking (increase FPS); some frames will never be displayed
Pre-rendered frames are a powerful little feature that you do not get control over at the OpenGL API level. It sets up how deeply the driver is allowed to pipeline everything and depending on the desired task you will trade different types of latency by fiddling with it. Many gamers swear by setting this value to 1 to minimize input latency at the cost of overall framerate "smoothness."
UPDATE:
Pre-rendered frames are one reason for your multi-frame delay. Fixing this in a cross-platform way is difficult (it's a driver setting), but if you have access to Fence Sync Objects you can produce the same behavior as forcing this to 1.
I can explain this in more detail if need be, the general idea is that you insert a fence sync after the buffer swap and then wait for it to be signaled before the first command in the next frame is allowed to begin. Performance may take a nose dive, but latency will be minimized since the CPU won't be rendering ahead of the GPU anymore.
There are a number of latencies at play here.
Input event → drawing state latency
In your typical interactive application you have a event loop that usually goes
collect user input
process user input
determine what's to be drawn
draw to the back buffer
swap back to front buffer
With the usual ways in which event–update–display loops are written there's almost no delay between step 5 of the previous and step 1 of the following iteration. which means that steps 2, 3, and 4 operate with data that lags about one frame period behind.
So this is the first source of latency.
Tripple buffering / composition latency
Many graphics pipelines enable triple buffering for smoother display update. Instead of keeping only a back and a front buffer around, there's also a third buffer inbetween. The average rate at which to these buffers is drawn is the display refresh period. The buffers themself are stepped at exactly the display refresh period. So this adds another frame period of latency.
If you're running on a system with a window compositor (which is the default by MacOS X) this adds effectively another buffer stage, so if you've got a double buffer mode it gives you triple buffer and if you had a triple buffer it'd give you a "quad" buffer (quotes here, because quad buffer is a term usually used with stereoscopic rendering).
What can you do about this:
Turn off composition
Windows through the DWM API and MacOS X allow to turn off composition or bypass the compositor.
Reducing input lag
Try to collect and integrate the user input as late as possible (use high resolution sleeps). If you've got only a very simple scene you can push the drawing quite close to the V-Sync deadline; in fact the NVidia OpenGL implementation has a vendor specific extension that allows to sleep until a specific amount of time before the next V-Sync.
If your scene is complex but is separable in parts that require low latency user input and stuff where it doesn't matter so much you can draw the higher latency stuff earlier and only at the very last moment integrate user input into it. Of course if the mouse is used to control the viewing direction, or even worse you're rendering for a VR head mounted display things are going to become difficult.

What is the accepted timing strategy when using Vertical Synchronisation?

Coming from a basic understanding of OpenGL programming, all required drawing operations are performed in a sequence, once per frame redraw. The performance of the hardware dictates essentially how fast this happens. As I understand, a game will attempt to draw as quickly as possible so redraw operations are essentially wrapped in a while loop. The graphics operations (graphics engine) will then be optimised to ensure the frame rate is acceptable for the application.
Graphics hardware supporting Vertical Synchronisation however locks frame rates to the display rate. A first question would be how should a graphics engine interact with the hardware synchronisation? Is this even possible or does the renderer work at maximum speed and the hardware selectively calls up the latest frame, discarding all unused previous frames..?
The motivation for this question is not that I am immediately intending to write a graphics engine, instead am debugging an issue with an existing system where the graphics of a moving scene appear to stutter onscreen. Symptomatically, the stutter is slight when VSync is turned off, when it is turned on either there is a significant and periodic stutter or alternatively the stutter is resolved entirely. I am somewhat clutching at straws as to what is happening or why, want to understand some more background information on graphics systems.
Summarily the question would be on how one is expected to interact with hardware redraw events and if that is even possible. However any additional information would be welcome.
A first question would be how should a graphics engine interact with the hardware synchronisation?
To avoid flicker modern rendering systems use double buffering i.e. there are two color plane buffers and after finishing drawing to one, the display readout pointer is set to the finished buffer plane. This buffer swap can happen synchronized or non-synchronized. With V-Sync enabled the buffer swap will be synchronized and the rendering thread blocks until the buffer swap happened.
Since with double buffering mandates buffer swaps this implicitly introduces a synchronization mechanism. This is how interactive rendering systems lock onto the display refresh.
Symptomatically, the stutter is slight when VSync is turned off, when it is turned on either there is a significant and periodic stutter or alternatively the stutter is resolved entirely.
This sounds like a badly written animation loop that assumes constant framerate locked onto the display refresh rate, based on the assumption that frames render faster than a display refresh interval and the buffer swap can be issued in time for the next retrace to happen.
The only robust way to deal with vertical synchronization is to actually measure the time between frame renderings and advance the rendering loop by that amount of time.
This is a guess, but:
The Problem Isn't Vertical Synchronization
I don't know what OS you're working with, but there are various ways to get information about the monitor and how fast the screen is refreshing (for the purposes of this answer, we'll assume your monitor is somewhat recent and redraws at a rate of 60 Hz, or 60 times every second, or once every 16.66666... milliseconds).
Renderers are usually paired up with an "Logic" side to the application: input, ui calculations, simulation running, etc. etc. It seems like the logic side of your application is running fast enough, but the Rendering side - i.e., the Draw Call as its commonly summed up into - is bounding the speed of your application.
Vertical Synchronization can exacerbate this in that if your Draw Call is made to happen every 16.66666 milliseconds - but it takes much longer than 16.666666 milliseconds - then you perceive a frame rate drop (i.e. frames will "stutter" because they're taking too long to produce a single frame). VSync - and the enabling or disabling thereof - is not something that bottlenecks your code: it just says "hey, since the Hardware is only going to take 1 frame from us every 16.666666 milliseconds, why make more draw calls than just one every 16.66666 milliseconds? As long as we do one draw call once for every passing of this time, our application will look as fluid as possible, and we don't have to waste time making more calls than that!"
The problem with that is that it assumes your code is going to run fast enough to make it in those 16.6666 milliseconds. If it does not, stuttering, lagging, visual artifacts, frozen frames, and other things manifest themselves on screen.
When you turn off VSync, you're telling your Render Call to be called as often as possible, as fast as possible. This may give it some extra wiggle room alongside the Logic call to get a frame rendered, so that when the Hardware Says "I'm gonna take a picture and put it on the screen now!" it's all prettied up, just in time, to get into posture and say cheese! (though by what you say, it barely makes it).
What To Do:
Start by profiling your code. Find out what functions are taking the most time. Judging by the stutter, something in your code is taking longer than is expected and is giving you undesirable performance. Make sure to profile first to find the critical sections of where you're burning away time, and figure out how to keep it correct and make it just as fast. You may want to figure out what's being called in the Render Call and profile the time it takes to complete one cycle of that specifically. Then time the Logic call(s) and see how long it takes to execute those as well. Then, chop away.
Good luck!

C++/opengl application running smoother with debugger attached

Have you experienced a situation, where C++ opengl application is running faster and smoother when executed from visual studio? When executed normally, without debugger, I get lower framerate, 50 instead of 80, and a strange lagging, where fps is diving to about 25 frames/sec every 20-30th frame. Is there a way to fix this?
Edit:
Also we are using quite many display lists (created with glNewList). And increasing the number of display lists seem to increase lagging.
Edit:
The problem seems to be caused by page faults. Adjusting process working set with SetProcessWorkingSetSizeEx() doesn't help.
Edit:
With some large models the problem is easy to spot with procexp-utility's GPU-memory usage. Memory usage is very unstable when there are many glCallList-calls per frame. No new geometry is added, no textures loaded, but gpu-memory-allocation fluctuates +-20 Mbytes. After a while it becomes even worse, and may allocate something like 150Mb in one go.
I believe that what you are seeing is the debugger locking some pages so they couldn't be swapped to be immediately accessible to the debugger. This brings some caveats for OS at the time of process switching and is, in general, not reccommended.
You will probably not like to hear me saying this, but there is no good way to fix this, even if you do.
Use VBOs, or at least vertex arrays, those can be expected to be optimized much better in the driver (let's face it - display lists are getting obsolete). Display lists can be easily wrapped to generate vertex buffers so only a little of the old code needs to be modified. Also, you can use "bindless graphics" which was designed to avoid page faults in the driver (GL_EXT_direct_state_access).
Do you have an nVidia graphics card by any chance? nVidia OpenGL appears to use a different implementation when attached to the debugger. For me, the non-debugger version is leaking memory at up to 1 MB/sec in certain situations where I draw to the front buffer and don't call glClear each frame. The debugger version is absolutely fine.
I have no idea why it needs to allocate and (sometimes) deallocate so much memory for a scene that's not changing.
And I'm not using display lists.
It's probably the thread or process priority. Visual Studio might launch your process with a slightly higher priority to make sure the debugger is responsive. Try using SetPriorityClass() in your app's code:
SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);
The 'above normal' class just nudges it ahead of everything else with the 'normal' class. As the documentation says, don't slap on a super high priority or you can screw up the system's scheduler.
In an app running at 60 fps you only get 16ms to draw a frame (less at 80 fps!) - if it takes longer you drop the frame which can cause a small dip in framerate. If your app has the same priority as other apps, it's relatively likely another app could temporarily steal the CPU for some task and you drop a few frames or at least miss your 16 ms window for the current frame. The idea is boosting the priority slightly means Windows comes back to your app more often so it doesn't drop as many frames.

setting max frames per second in openGL

Is there any way to calculate how much updates should be made to reach desired frame rate, NOT system specific? I found that for windows, but I would like to know if something like this exists in openGL itself. It should be some sort of timer.
Or how else can I prevent FPS to drop or raise dramatically? For this time I'm testing it on drawing big number of vertices in line, and using fraps I can see frame rate to go from 400 to 200 fps with evident slowing down of drawing it.
You have two different ways to solve this problem:
Suppose that you have a variable called maximum_fps, which contains for the maximum number of frames you want to display.
Then You measure the amount of time spent on the last frame (a timer will do)
Now suppose that you said that you wanted a maximum of 60FPS on your application. Then you want that the time measured be no lower than 1/60. If the time measured s lower, then you call sleep() to reach the amount of time left for a frame.
Or you can have a variable called tick, that contains the current "game time" of the application. With the same timer, you will incremented it at each main loop of your application. Then, on your drawing routines you calculate the positions based on the tick var, since it contains the current time of the application.
The big advantage of option 2 is that your application will be much easier to debug, since you can play around with the tick variable, go forward and back in time whenever you want. This is a big plus.
Rule #1. Do not make update() or loop() kind of functions rely on how often it gets called.
You can't really get your desired FPS. You could try to boost it by skipping some expensive operations or slow it down by calling sleep() kind of functions. However, even with those techniques, FPS will be almost always different from the exact FPS you want.
The common way to deal with this problem is using elapsed time from previous update. For example,
// Bad
void enemy::update()
{
position.x += 10; // this enemy moving speed is totally up to FPS and you can't control it.
}
// Good
void enemy::update(elapsedTime)
{
position.x += speedX * elapsedTime; // Now, you can control its speedX and it doesn't matter how often it gets called.
}
Is there any way to calculate how much updates should be made to reach desired frame rate, NOT system specific?
No.
There is no way to precisely calculate how many updates should be called to reach desired framerate.
However, you can measure how much time has passed since last frame, calculate current framerate according to it, compare it with desired framerate, then introduce a bit of Sleeping to reduce current framerate to the desired value. Not a precise solution, but it will work.
I found that for windows, but I would like to know if something like this exists in openGL itself. It should be some sort of timer.
OpenGL is concerned only about rendering stuff, and has nothing to do with timers. Also, using windows timers isn't a good idea. Use QueryPerformanceCounter, GetTickCount or SDL_GetTicks to measure how much time has passed, and sleep to reach desired framerate.
Or how else can I prevent FPS to drop or raise dramatically?
You prevent FPS from raising by sleeping.
As for preventing FPS from dropping...
It is insanely broad topic. Let's see. It goes something like this: use Vertex buffer objects or display lists, profile application, do not use insanely big textures, do not use too much alpha-blending, avoid "RAW" OpenGL (glVertex3f), do not render invisible objects (even if no polygons are being drawn, processing them takes time), consider learning about BSPs or OCTrees for rendering complex scenes, in parametric surfaces and curves, do not needlessly use too many primitives (if you'll render a circle using one million polygons, nobody will notice the difference), disable vsync. In short - reduce to absolute possible minimum number of rendering calls, number of rendered polygons, number of rendered pixels, number of texels read, read every available performance documentation from NVidia, and you should get a performance boost.
You're asking the wrong question. Your monitor will only ever display at 60 fps (50 fps in Europe, or possibly 75 fps if you're a pro-gamer).
Instead you should be seeking to lock your fps at 60 or 30. There are OpenGL extensions that allow you to do that. However the extensions are not cross platform (luckily they are not video card specific or it'd get really scary).
windows: wglSwapIntervalEXT
x11 (linux): glXSwapIntervalSGI
max os x: ?
These extensions are closely tied to your monitor's v-sync. Once enabled calls to swap the OpenGL back-buffer will block until the monitor is ready for it. This is like putting a sleep in your code to enforce 60 fps (or 30, or 15, or some other number if you're not using a monitor which displays at 60 Hz). The difference it the "sleep" is always perfectly timed instead of an educated guess based on how long the last frame took.
You absolutely do wan't to throttle your frame-rate it all depends on what you got
going on in that rendering loop and what your application does. Especially with it's
Physics/Network related. Or if your doing any type of graphics processing with an out side toolkit (Cairo, QPainter, Skia, AGG, ...) unless you want out of sync results or 100% cpu usage.
This code may do the job, roughly.
static int redisplay_interval;
void timer(int) {
glutPostRedisplay();
glutTimerFunc(redisplay_interval, timer, 0);
}
void setFPS(int fps)
{
redisplay_interval = 1000 / fps;
glutTimerFunc(redisplay_interval, timer, 0);
}
Here is a similar question, with my answer and worked example
I also like deft_code's answer, and will be looking into adding what he suggests to my solution.
The crucial part of my answer is:
If you're thinking about slowing down AND speeding up frames, you have to think carefully about whether you mean rendering or animation frames in each case. In this example, render throttling for simple animations is combined with animation acceleration, for any cases when frames might be dropped in a potentially slow animation.
The example is for animation code that renders at the same speed regardless of whether benchmarking mode, or fixed FPS mode, is active. An animation triggered before the change even keeps a constant speed after the change.

What can cause a reduction in frame rate when upgrading a graphics card?

We have a two-screen DirectX application that previously ran at a consistent 60 FPS (the monitors' sync rate) using a NVIDIA 8400GS (256MB). However, when we swapped out the card for one with 512 MB of RAM the frame rate struggles to get above 40 FPS. (It only gets this high because we're using triple-buffering.) The two cards are from the same manufacturer (PNY). All other things are equal, this is a Windows XP Embedded application and we started from a fresh image for each card. The driver version number is 169.21.
The application is all 2D. I.E. just a bunch of textured quads and a whole lot of pre-rendered graphics (hence the need to upgrade the card's memory). We also have compressed animations which the CPU decodes on the fly - this involves a texture lock. The locks take forever but I've also tried having a separate system memory texture for the CPU to update and then updating the rendered texture using the device's UpdateTexture method. No overall difference in performance.
Although I've read through every FAQ I can find on the internet about DirectX performance, this is still the first time I've worked on a DirectX project so any arcane bits of knowledge you have would be useful. :)
One other thing whilst I'm on the subject; when calling Present on the swap chains it seems DirectX waits for the present to complete regardless of the fact that I'm using D3DPRESENT_DONOTWAIT in both present parameters (PresentationInterval) and the flags of the call itself. Because this is a two-screen application this is a problem as the two monitors do not appear to be genlocked, I'm working around it by running the Present calls through a threadpool. What could the underlying cause of this be?
Are the cards exactly the same (both GeForce 8400GS), and only the memory size differ? Quite often with different memory sizes come slightly different clock rates (i.e. your card with more memory might use slower memory!).
So the first thing to check would be GPU core & memory clock rates, using something like GPU-Z.
It's an easy test to see if the surface lock is the problem, just comment out the texture update and see if the framerate returns to 60hz. Unfortunately, writing to a locked surface and updating the resource kills perfomance, always has. Are you using mipmaps with the textures? I know DX9 added automatic generation of mipmaps, could be taking up a lot of time to generate those. If your constantly locking the same resource each frame, you could also try creating a pool of textures, kinda like triple-buffering except with textures. You would let the render use one texture, and on the next update you pick the next available texture in the pool that's not being used in to render. Unless of course your memory constrained or your only making diffs to the animated texture.