C++/OpenGL application running smoother with debugger attached

Have you experienced a situation where a C++ OpenGL application runs faster and smoother when launched from Visual Studio? When executed normally, without the debugger, I get a lower framerate, 50 instead of 80, and strange lagging where the FPS dives to about 25 frames/sec every 20th-30th frame. Is there a way to fix this?
Edit:
We are also using quite a few display lists (created with glNewList), and increasing the number of display lists seems to increase the lagging.
Edit:
The problem seems to be caused by page faults. Adjusting the process working set with SetProcessWorkingSetSizeEx() doesn't help.
Edit:
With some large models the problem is easy to spot in the Process Explorer (procexp) utility's GPU memory graph. Memory usage is very unstable when there are many glCallList calls per frame. No new geometry is added and no textures are loaded, yet GPU memory allocation fluctuates by ±20 MB. After a while it becomes even worse, and may allocate something like 150 MB in one go.

I believe that what you are seeing is the debugger locking some pages so they cannot be swapped out, keeping them immediately accessible to the debugger. This has side effects for the OS at process-switch time and is, in general, not recommended.
You probably won't like to hear this, but there is no good way to fix it.
Use VBOs, or at least vertex arrays; those can be expected to be optimized much better in the driver (let's face it: display lists are becoming obsolete). Display lists can easily be wrapped to generate vertex buffers (see the sketch below), so only a little of the old code needs to be modified. You can also use "bindless graphics", which was designed to avoid page faults in the driver (GL_EXT_direct_state_access).
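As a rough illustration of wrapping display-list-style geometry into a VBO, here is a minimal sketch using the fixed-function client arrays; the Vertex layout and the buildBuffer/drawBuffer names are assumptions, not the poster's code:

#include <GL/glew.h>   // any loader that exposes the VBO entry points works
#include <cstddef>     // offsetof
#include <vector>

struct Vertex { float x, y, z, nx, ny, nz; };

GLuint buildBuffer(const std::vector<Vertex>& verts)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // Static geometry: upload once, draw many times (what the display list did).
    glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(Vertex),
                 verts.data(), GL_STATIC_DRAW);
    return vbo;
}

void drawBuffer(GLuint vbo, GLsizei vertexCount)   // replaces glCallList(listId)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (const void*)0);
    glNormalPointer(GL_FLOAT, sizeof(Vertex), (const void*)offsetof(Vertex, nx));
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}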

Do you have an NVIDIA graphics card by any chance? The NVIDIA OpenGL driver appears to use a different implementation when a debugger is attached. For me, the non-debugger version leaks memory at up to 1 MB/sec in certain situations where I draw to the front buffer and don't call glClear each frame. The debugger version is absolutely fine.
I have no idea why it needs to allocate and (sometimes) deallocate so much memory for a scene that's not changing.
And I'm not using display lists.

It's probably the thread or process priority. Visual Studio might launch your process with a slightly higher priority to make sure the debugger is responsive. Try using SetPriorityClass() in your app's code:
SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);
The 'above normal' class just nudges it ahead of everything else running in the 'normal' class. As the documentation says, don't slap on a super-high priority or you can screw up the system's scheduler.
In an app running at 60 fps you only get 16 ms to draw a frame (less at 80 fps!); if it takes longer, you drop the frame, which causes a small dip in framerate. If your app has the same priority as every other app, it's relatively likely that another app will temporarily steal the CPU for some task, and you drop a few frames or at least miss your 16 ms window for the current frame. Boosting the priority slightly means Windows comes back to your app more often, so it doesn't drop as many frames.


How do you know what you've displayed is completely drawn on screen?

Displaying images on a computer monitor involves using a graphics API, which dispatches a series of asynchronous calls... and at some given time puts the desired content on the screen.
But, what if you are interested in knowing the exact CPU time at the point where the required image is fully drawn (and visible to the user)?
I really need to grab a CPU timestamp when everything is displayed to relate this point in time to other measurements I take.
Even without taking the asynchronous behavior of the graphics stack into account, many things can cause the duration of the graphics calls to jitter:
multi-threading;
Sync to V-BLANK (unfortunately required to avoid some tearing);
what else have I forgotten? :P
I am targeting a solution on Linux, but I'm open to any other OS. I've already studied parts of the XVideo extension for the X.org server and the OpenGL API, but I haven't found an effective solution yet.
I only hope the solution doesn't involve hacking into video drivers / hardware!
Note: I won't be able to use the recent NVIDIA G-SYNC technology on the required hardware. Although it would get rid of some of the unpredictable jitter, I don't think it would completely solve this issue.
OpenGL Wiki suggests the following: "If GPU<->CPU synchronization is desired, you should use a high-precision/multimedia timer rather than glFinish after a buffer swap."
Does somebody know how to properly grab such a high-precision/multimedia timer value just after the SwapBuffers call has completed in the GPU queue?
Recent OpenGL provides sync/fence objects. You can place sync objects in the OpenGL command stream and later wait for them to get passed. See http://www.opengl.org/wiki/Sync_Object
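As a minimal sketch (assuming an OpenGL 3.2+/ARB_sync context and a loader such as GLEW; the function name is illustrative), a fence placed right after the swap lets you take a high-precision CPU timestamp once the queued commands have been processed:

#include <GL/glew.h>
#include <chrono>

// Call this right after the platform swap call (SwapBuffers on Windows,
// glXSwapBuffers on X11). Note this marks command completion on the GPU,
// not the instant the image becomes visible on the panel.
std::chrono::steady_clock::time_point timestampAfterSwap()
{
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    // Wait (with a 100 ms safety timeout, expressed in nanoseconds) until the
    // GPU has reached the fence.
    glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 100000000ull);
    glDeleteSync(fence);
    return std::chrono::steady_clock::now();   // CPU timestamp to correlate with
}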

What is the accepted timing strategy when using Vertical Synchronisation?

Coming from a basic understanding of OpenGL programming: all required drawing operations are performed in sequence, once per frame redraw, and the performance of the hardware essentially dictates how fast this happens. As I understand it, a game will attempt to draw as quickly as possible, so the redraw operations are essentially wrapped in a while loop, and the graphics operations (the graphics engine) are then optimised to ensure the frame rate is acceptable for the application.
Graphics hardware supporting Vertical Synchronisation, however, locks the frame rate to the display refresh rate. A first question would be: how should a graphics engine interact with the hardware synchronisation? Is this even possible, or does the renderer work at maximum speed while the hardware selectively picks up the latest frame, discarding all unused previous frames?
The motivation for this question is not that I am immediately intending to write a graphics engine; rather, I am debugging an issue with an existing system where the graphics of a moving scene appear to stutter onscreen. Symptomatically, the stutter is slight when VSync is turned off; when it is turned on, there is either a significant and periodic stutter or the stutter is resolved entirely. I am somewhat clutching at straws as to what is happening or why, and want to understand more of the background of graphics systems.
In summary, the question is how one is expected to interact with hardware redraw events, and whether that is even possible. However, any additional information would be welcome.
A first question would be: how should a graphics engine interact with the hardware synchronisation?
To avoid flicker, modern rendering systems use double buffering, i.e. there are two color-plane buffers, and after drawing to one has finished, the display readout pointer is switched to the finished buffer plane. This buffer swap can happen synchronized or unsynchronized. With V-Sync enabled, the buffer swap is synchronized and the rendering thread blocks until the buffer swap has happened.
Since double buffering mandates buffer swaps, this implicitly introduces a synchronization mechanism. This is how interactive rendering systems lock onto the display refresh.
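For reference, a minimal Windows/WGL sketch of turning on that synchronized swap (assuming the WGL_EXT_swap_control extension is available; the typedef is normally provided by <GL/wglext.h>):

#include <windows.h>
#include <GL/gl.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void enableVSync()
{
    // The extension function must be fetched at runtime, after a GL context is current.
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(1);   // swap at most once per vertical retrace
}

// With this set, SwapBuffers(hdc) typically blocks the rendering thread until
// the swap has been queued for the next retrace, which is the implicit
// synchronization mechanism described above.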
Symptomatically, the stutter is slight when VSync is turned off, when it is turned on either there is a significant and periodic stutter or alternatively the stutter is resolved entirely.
This sounds like a badly written animation loop that assumes a constant framerate locked to the display refresh rate, based on the assumption that frames render faster than one display refresh interval and that the buffer swap can be issued in time for the next retrace.
The only robust way to deal with vertical synchronization is to actually measure the time between frame renderings and advance the rendering loop by that amount of time.
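A minimal sketch of such a loop, measuring the real elapsed time with std::chrono (updateScene and renderScene are placeholder names, not from the question):

#include <chrono>

void updateScene(double dtSeconds);   // assumed application hooks
void renderScene();

void runLoop(bool& running)
{
    using clock = std::chrono::steady_clock;
    auto previous = clock::now();

    while (running)
    {
        auto now = clock::now();
        double dt = std::chrono::duration<double>(now - previous).count();
        previous = now;

        updateScene(dt);   // advance the animation by the *measured* elapsed time
        renderScene();
        // The buffer swap goes here; with V-Sync on it may block, and dt
        // naturally absorbs that wait on the next iteration.
    }
}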
This is a guess, but:
The Problem Isn't Vertical Synchronization
I don't know what OS you're working with, but there are various ways to get information about the monitor and how fast the screen refreshes (for the purposes of this answer, we'll assume your monitor is reasonably recent and redraws at 60 Hz, i.e. 60 times per second, or once every 16.7 ms or so).
Renderers are usually paired up with a "logic" side of the application: input, UI calculations, simulation, and so on. It seems like the logic side of your application is running fast enough, but the rendering side - i.e. the draw call, as it's commonly summed up - is bounding the speed of your application.
Vertical Synchronization can exacerbate this: if your draw call is made to happen every 16.7 ms but actually takes much longer than that, then you perceive a frame-rate drop (frames "stutter" because a single frame takes too long to produce). VSync, and the enabling or disabling thereof, is not something that bottlenecks your code: it just says "since the hardware is only going to take one frame from us every 16.7 ms, why make more draw calls than one per interval? As long as we do one draw call per refresh, our application will look as fluid as possible, and we don't waste time making more calls than that."
The problem is that this assumes your code runs fast enough to make it within those 16.7 ms. If it does not, stuttering, lagging, visual artifacts, frozen frames, and other problems manifest themselves on screen.
When you turn off VSync, you're telling your render call to be made as often as possible, as fast as possible. This may give it some extra wiggle room alongside the logic call to get a frame rendered, so that when the hardware says "I'm going to take a picture and put it on the screen now!" it's all prettied up, just in time (though by what you describe, it barely makes it).
What To Do:
Start by profiling your code and find out which functions are taking the most time. Judging by the stutter, something in your code is taking longer than expected and giving you undesirable performance. Profile first to find the critical sections where you're burning away time, then figure out how to keep them correct while making them just as fast. You may want to figure out what's being called in the render call and time one complete cycle of it specifically; then time the logic call(s) and see how long those take to execute as well. Then chop away. A rough timing helper is sketched below.
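Something along these lines (an illustrative sketch, not part of the original answer) can time the logic and render phases separately against the 16.7 ms budget; updateGame and drawFrame are placeholders:

#include <chrono>
#include <cstdio>

// Prints how long a named section took when it goes out of scope.
struct ScopedTimer
{
    explicit ScopedTimer(const char* n) : name(n) {}
    ~ScopedTimer()
    {
        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::steady_clock::now() - start).count();
        std::printf("%s: %.3f ms\n", name, ms);
    }
    const char* name;
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
};

// Usage inside the frame loop:
// { ScopedTimer t("logic");  updateGame(); }
// { ScopedTimer t("render"); drawFrame();  }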
Good luck!

How to do exactly one render per vertical sync (no repeating, no skipping)?

I'm trying to do vertically synced renders so that exactly one render is done per vertical sync, without skipping or repeating any frames. I need this to work under Windows 7 and (in the future) Windows 8.
It would basically consist of drawing a sequence of QUADs that fit the screen, so that a pixel from the original images matches 1:1 a pixel on the screen. The rendering part is not a problem, either with OpenGL or DirectX. The problem is getting the syncing right.
I previously tried using OpenGL, with the WGL_EXT_swap_control extension, by drawing and then calling
SwapBuffers(g_hDC);
glFinish();
I tried all combinations and permutations of these two instructions along with glFlush(), and it was not reliable.
I then tried with Direct3D 10, by drawing and then calling
g_pSwapChain->Present(1, 0);
pOutput->WaitForVBlank();
where g_pSwapChain is an IDXGISwapChain* and pOutput is the IDXGIOutput* associated with that swap chain.
Both versions, OpenGL and Direct3D, give the same result: the first sequence of, say, 60 frames doesn't last as long as it should (instead of about 1000 ms at 60 Hz, it lasts something like 1030 or 1050 ms); the following ones seem to work fine (about 1000.40 ms), but every now and then a frame seems to be skipped. I do the measuring with QueryPerformanceCounter.
On Direct3D, trying a loop of just the WaitForVBlank, the duration of 1000 iterations is consistently 1000.40 with little variation.
So the trouble here is not knowing exactly when each of the called functions returns, and whether the swap is done during the vertical sync (not earlier, to avoid tearing).
Ideally (if I'm not mistaken), to achieve what I want I would perform one render, wait until the sync starts, swap during the sync, and then wait until the sync is done. How can I do that with OpenGL or DirectX?
Edit:
A test loop of just WaitForVSync, 60 iterations, consistently takes between 1000.30 ms and 1000.50 ms.
The same loop with Present(1, 0) before WaitForVSync, with nothing else and no rendering, takes the same time, but sometimes it fails and takes 1017 ms, as if a frame had been repeated. There's no rendering, so something is wrong here.
I have the same problem in DX11. I want to guarantee that my frame rendering code takes an exact multiple of the monitor's refresh rate, to avoid multi-buffering latency.
Just calling pSwapChain->present(1,0) is not sufficient. That will prevent tearing in fullscreen mode, but it does not wait for the vblank to happen. The present call is asynchronous and returns right away if there are frame buffers remaining to be filled. So if your render code produces a new frame very quickly (say 10 ms to render everything) and the user has set the driver's "Maximum pre-rendered frames" to 4, then you will be rendering four frames ahead of what the user sees. This means 4*16.7 = 67 ms of latency between mouse action and screen response, which is unacceptable. Note that the driver's setting wins: even if your app asked for pOutput->setMaximumFrameLatency(1), you'll get 4 frames regardless. So the only way to guarantee no mouse lag, regardless of the driver setting, is for your render loop to voluntarily wait until the next vertical refresh interval, so that you never use those extra frame buffers.
IDXGIOutput::WaitForVBlank() is intended for this purpose. But it does not work! When I call the following
<render something in ~10ms>
pSwapChain->present(1,0);
pOutput->waitForVBlank();
and I measure the time it takes for the waitForVBlank() call to return, I see it alternating between roughly 6 ms and 22 ms.
How can that happen? How could waitForVBlank() ever take longer than 16.7ms to complete? In DX9 we solved this problem using getRasterState() to implement our own, much-more-accurate version of waitForVBlank. But that call was deprecated in DX11.
Is there any other way to guarantee that my frame is exactly aligned with the monitor's refresh rate? Is there another way to spy the current scanline like getRasterState used to do?
I previously tried using OpenGL, with the WGL_EXT_swap_control extension, by drawing and then calling
SwapBuffers(g_hDC);
glFinish();
That glFinish() or glFlush() is superfluous; SwapBuffers implies a glFinish.
Could it be that in your graphics driver settings you have set "force V-Blank / V-Sync off"?
We use DX9 currently, and want to switch to DX11. We currently use GetRasterState() to manually sync to the screen. That goes away in DX11, but I've found that making a DirectDraw7 device doesn't seem to disrupt DX11. So just add this to your code and you should be able to get the scanline position.
#include <ddraw.h>   // legacy DirectDraw; link against ddraw.lib and dxguid.lib

IDirectDraw7* ddraw = nullptr;
DirectDrawCreateEx(NULL, reinterpret_cast<LPVOID*>(&ddraw), IID_IDirectDraw7, NULL);

DWORD scanline = 0;
ddraw->GetScanLine(&scanline);   // scanline the display is currently drawing
On Windows 8.1 and Windows 10, you can make use of the DXGI 1.3 flag DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. See MSDN. The sample here is for Windows 8 Store apps, but it should be adaptable to classic Win32 window swap chains as well.
You may find this video useful as well.
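For the record, a rough sketch of how that waitable object is typically used to pace the render loop (assuming a DXGI 1.3 / IDXGISwapChain2 swap chain created with the flag above; the function and variable names are illustrative):

#include <windows.h>
#include <dxgi1_3.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// swapChain must have been created with
// DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT.
void frameLoop(const ComPtr<IDXGISwapChain2>& swapChain, const bool& running)
{
    swapChain->SetMaximumFrameLatency(1);
    HANDLE frameWait = swapChain->GetFrameLatencyWaitableObject();

    while (running)
    {
        // Block until the swap chain is ready to accept a new frame, so at
        // most one frame is ever queued ahead of the display.
        WaitForSingleObjectEx(frameWait, 1000, TRUE);

        // ... render the frame ...

        swapChain->Present(1, 0);   // sync interval 1: one present per vblank
    }

    CloseHandle(frameWait);
}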
When creating a Direct3D device, set PresentationInterval parameter of the D3DPRESENT_PARAMETERS structure to D3DPRESENT_INTERVAL_DEFAULT.
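An illustrative Direct3D 9 snippet for that suggestion (the helper name and the other fields are assumptions, not part of the answer):

#include <d3d9.h>

// Fill in the presentation parameters before IDirect3D9::CreateDevice.
D3DPRESENT_PARAMETERS makePresentParams(HWND hwnd)
{
    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed             = TRUE;
    pp.hDeviceWindow        = hwnd;
    pp.SwapEffect           = D3DSWAPEFFECT_DISCARD;
    pp.BackBufferFormat     = D3DFMT_UNKNOWN;
    pp.PresentationInterval = D3DPRESENT_INTERVAL_DEFAULT;   // sync to the display refresh
    return pp;
}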
If you run in kernel mode (ring 0), you can attempt to read bit 3 of the VGA input status register (port 03BAh or 03DAh). The information is quite old, and although it was hinted here that the bit might have changed location or may have been made obsolete in Windows 2000 and later, I actually doubt this. The second link has some very old source code that attempts to expose the vblank signal for old Windows versions. It no longer runs, but in theory rebuilding it with the latest Windows SDK should fix that.
The difficult part is building and registering a device driver that exposes this information reliably and then fetching it from your application.

Constantly lag in opengl application

I'm getting repeated lag spikes in my OpenGL application.
I'm using the Win32 API to create the window, and I'm also creating a 2.2 context.
So the main loop of the program is very simple:
Clearing the color buffer
Drawing a triangle
Swapping the buffers.
The triangle is rotating; that's how I can see the lag.
Also, my frame time isn't smooth, which may be the problem.
But I'm very, very sure the delta-time calculation is correct, because I've tried plenty of ways.
Do you think it could be a graphics driver problem?
I ask because a friend of mine runs almost exactly the same program, except that I do fewer calculations and I'm using the standard OpenGL shader.
Also, his program uses more CPU power than mine, and his CPU usage percentage is smoother than mine.
I should also add:
On my laptop I get the same lag every ~1 second, so I can see some kind of pattern.
There are many reasons for a jittery frame rate. Off the top of my head:
Not calling glFlush() at the end of each frame
other running software interfering
doing things in your code that certain graphics drivers don't like
bugs in graphics drivers
Using the standard Windows time functions, with their terrible resolution
Try these:
kill as many running programs as you can get away with. Use the process tab in the task manager (CTRL-SHIFT-ESC) for this.
bit by bit, reduce the amount of work your program is doing and see how that affects the frame rate and the smoothness of the display.
if you can, try enabling/disabling vertical sync (you may be able to do this in your graphics card's settings) to see if that helps
add some debug code to output the time taken to draw each frame, and see if there are anomalies in the numbers, e.g. every 20th frame taking an extra 20 ms, or random frames taking 100 ms; a minimal sketch of this follows the list.
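A minimal sketch of that last suggestion (the 20 ms threshold is an arbitrary assumption):

#include <chrono>
#include <cstdio>

// Call once per frame, e.g. right before the buffer swap.
void logFrameTime()
{
    using clock = std::chrono::steady_clock;
    static clock::time_point last = clock::now();

    auto now = clock::now();
    double ms = std::chrono::duration<double, std::milli>(now - last).count();
    last = now;

    // Flag anything noticeably slower than a 60 Hz frame budget.
    if (ms > 20.0)
        std::printf("slow frame: %.2f ms\n", ms);
}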

What can cause a reduction in frame rate when upgrading a graphics card?

We have a two-screen DirectX application that previously ran at a consistent 60 FPS (the monitors' sync rate) using an NVIDIA 8400GS (256 MB). However, when we swapped out the card for one with 512 MB of RAM, the frame rate struggles to get above 40 FPS. (It only gets this high because we're using triple buffering.) The two cards are from the same manufacturer (PNY). All other things are equal: this is a Windows XP Embedded application and we started from a fresh image for each card. The driver version number is 169.21.
The application is all 2D, i.e. just a bunch of textured quads and a whole lot of pre-rendered graphics (hence the need to upgrade the card's memory). We also have compressed animations which the CPU decodes on the fly; this involves a texture lock. The locks take forever, but I've also tried having a separate system-memory texture for the CPU to update and then updating the rendered texture with the device's UpdateTexture method. No overall difference in performance.
Although I've read through every FAQ I can find on the internet about DirectX performance, this is still the first time I've worked on a DirectX project so any arcane bits of knowledge you have would be useful. :)
One other thing whilst I'm on the subject: when calling Present on the swap chains, it seems DirectX waits for the present to complete, regardless of the fact that I'm using D3DPRESENT_DONOTWAIT both in the present parameters (PresentationInterval) and in the flags of the call itself. Because this is a two-screen application this is a problem, as the two monitors do not appear to be genlocked; I'm working around it by running the Present calls through a thread pool. What could the underlying cause of this be?
Are the cards exactly the same model (both GeForce 8400GS), with only the memory size differing? Quite often, different memory sizes come with slightly different clock rates (i.e. your card with more memory might use slower memory!).
So the first thing to check would be GPU core & memory clock rates, using something like GPU-Z.
It's an easy test to see whether the surface lock is the problem: just comment out the texture update and see if the framerate returns to 60 Hz. Unfortunately, writing to a locked surface and updating the resource kills performance; it always has. Are you using mipmaps with the textures? I know DX9 added automatic generation of mipmaps, and that could be taking a lot of time.
If you're constantly locking the same resource each frame, you could also try creating a pool of textures, kind of like triple buffering except with textures: you let the renderer use one texture, and on the next update you pick the next available texture in the pool that isn't being used to render. (Unless, of course, you're memory constrained or you're only writing diffs to the animated texture.) A rough sketch of such a pool follows.
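Something like this (a hedged Direct3D 9 sketch; the class name, pool size, and format are assumptions, not part of the original answer):

#include <d3d9.h>
#include <vector>

// Round-robin pool of dynamic textures: the CPU decoder writes into one
// texture while the renderer samples from the previously filled one.
class TexturePool
{
public:
    bool create(IDirect3DDevice9* dev, UINT w, UINT h, UINT count = 3)
    {
        textures.resize(count, nullptr);
        for (UINT i = 0; i < count; ++i)
            if (FAILED(dev->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC,
                                          D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT,
                                          &textures[i], NULL)))
                return false;
        return true;
    }

    // Next texture for the decoder to lock and fill this frame.
    IDirect3DTexture9* acquireForUpdate()
    {
        writeIndex = (writeIndex + 1) % textures.size();
        return textures[writeIndex];
    }

    // Texture that was completely filled on the previous update; safe to render from.
    IDirect3DTexture9* latestFilled() const
    {
        return textures[(writeIndex + textures.size() - 1) % textures.size()];
    }

private:
    std::vector<IDirect3DTexture9*> textures;
    size_t writeIndex = 0;
};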