How do you know what you've displayed is completely drawn on screen? - opengl

Displaying images on a computer monitor involves a graphics API, which dispatches a series of asynchronous calls... and at some later time puts the requested content on the screen.
But what if you need to know the exact CPU time at which the requested image is fully drawn (and visible to the user)?
I really need to grab a CPU timestamp when everything is displayed, to relate this point in time to other measurements I take.
Even leaving aside the asynchronous behavior of the graphics stack, many things can cause the duration of the graphics calls to jitter:
multi-threading;
Sync to V-BLANK (unfortunately required to avoid some tearing);
what else have I forgotten? :P
I target a solution on Linux, but I'm open to any other OS. I've already studied parts of the XVideo extension for the X.org server and the OpenGL API, but I haven't found an effective solution yet.
I only hope the solution doesn't involve hacking into video drivers / hardware!
Note: I won't be able to use the recent Nvidia G-SYNC technology on the required hardware. Although it would get rid of some of the unpredictable jitter, I don't think it would completely solve this issue.
OpenGL Wiki suggests the following: "If GPU<->CPU synchronization is desired, you should use a high-precision/multimedia timer rather than glFinish after a buffer swap."
Does somebody know how to properly grab such a high-precision/multimedia timer value just after the SwapBuffers call has completed in the GPU queue?

Recent OpenGL provides sync/fence objects. You can place sync objects in the OpenGL command stream and later wait for them to get passed. See http://www.opengl.org/wiki/Sync_Object
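A minimal sketch of that approach, assuming a C++ application where an extension loader (GLEW, glad, etc.) already exposes the sync API and hdc is your device context; the 16 ms timeout is an arbitrary choice. Note that the fence only tells you when the GPU has executed everything up to and including the swap command, not when scan-out actually puts the frame on the panel:

// Issue a fence right after SwapBuffers and grab a CPU timestamp once the GPU
// reports that all prior commands (including the swap) have been executed.
#include <chrono>

SwapBuffers(hdc);                                    // or glXSwapBuffers on X11
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                 GLuint64(16) * 1000 * 1000);   // 16 ms, in ns
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    auto t = std::chrono::steady_clock::now();       // CPU time when the GPU is done
    // relate t to your other measurements here
}
glDeleteSync(fence);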

Related

Making GL texture uploads async and figuring out when they're done

I have run into an issue where my application requires loading images dynamically at runtime. It's not possible to load them all up front, since I can't know which ones will be used; otherwise I would have to upload everything. The problem is that some people do not have good PCs and have been complaining that loading all the images to the GPU takes a long time for them due to weak hardware.
My workaround for that group was to upload textures only as they are needed, and this worked for the most part. The problem is that at certain points during the application a series of images needs to be uploaded, and the uploading causes a noticeable delay.
I was researching how to get around this, and I have an idea: users want a smooth experience and are okay if a texture is not immediately loaded but simply absent. That makes it easy, as I can upload in the background, draw nothing where the object should be, and bring it into existence once the upload is done. The uploads are usually pretty fast, but they're slow enough to dip under 60 fps for some people, which causes some stutter. On average it's anywhere from 1-3 frames of stutter, so the uploads do resolve quickly, in under 50 ms on average.
My solution was to attempt something using a PBO to get async-like uploading. The problem is that I cannot find out how to tell when the upload is done. Is there a way to do this?
I figure there are four options:
1. There's a way to do what I want with OpenGL 3.1 onwards, and that will be that.
2. It is not possible to do (1), but I could place a fence and then check whether the fence has completed; however, I've never had to do this before, so I'm not sure whether it would work in this case.
3. It's not possible, but I could assume that everything is uploaded in < 50 ms, use some kind of timestamp to tell whether it's drawable, and just hope that holds (and if it has been < 50 ms since issuing an upload, draw nothing).
4. It's not possible to do this for texture uploading and I'm stuck.
This leads me to my question: Can I tell when an asynchronous upload of pixels to a texture is done?
Fence sync objects tell when all previously issued commands have completed their execution. This includes asynchronous pixel transfer operations. So you can issue a fence after your transfers and use the sync object tools to check to see when it is done.
The annoying issue you'll have here is that it's very coarse-grained. Testing the fence also includes testing whether any non-transfer commands have completed, despite the fact that the two operations are probably handled by independent hardware. So if the transfer completes before the frame that was being rendered when you started the transfer has finished rendering, the fence still won't be signaled. However, if you fire off a lot of texture uploading all at once, then the transfer operations will dominate the results.
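A rough sketch of option (2) along those lines; tex, width, height, pixels, dataSize and textureReady are placeholders for your own state, and the fence is polled with a zero timeout so the render loop never blocks:

// Kick off the upload through a PBO, then drop a fence behind it.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, dataSize, pixels, GL_STREAM_DRAW);
glBindTexture(GL_TEXTURE_2D, tex);
// With a PBO bound, the last parameter is an offset into the PBO, not a pointer.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Once per frame: poll with a zero timeout so the loop never stalls.
GLenum r = glClientWaitSync(fence, 0, 0);
if (r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED) {
    glDeleteSync(fence);
    textureReady = true;   // safe to draw with 'tex' from now on
}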

Can I log graphic data from video game before it goes to gpu?

I am trying to understand how graphics are processed. Is it possible for a program to get graphic data before it is sent to the GPU, log it, send it to the GPU, get the processed results back from the GPU and send them to the video game requesting the processing? I am not sure if I explained any of this clearly, but I am relatively new to this topic, so I am sorry for that.
It is not clear what you mean by "graphic data" in this case, so I'll assume you're referring to the final rendered frame.
In that case, the answer is "no."
The frame buffer is generated (rendered) on the GPU. The only thing on the CPU, in any modern game, is a list of drawing commands that are sent to the video driver and translated into other commands that the GPU understands. There is no image on the CPU side unless you render to a texture and copy it back to the host.
Regarding the second part of your question:
The "game requesting the process" doesn't really get much in the way of feedback from the GPU. The display - monitor, HMD, or whatever - shows the contents of a buffer that resides on the GPU. The game never receives the rendered frame, unless it really wants it and goes through a lot of trouble to copy it back to the host. Rendering is very much an open-loop process. Aside from some error & status codes, the host application really can't verify what's happening on the GPU (again, without rendering to an off-screen buffer and copying that data back to the CPU).
I'd search for "OpenGL render pipeline" to get a better idea of how this works. (DirectX works the same way, btw.)
Now, if what you're looking for is a better understanding of how the rendering API calls in the game contribute to the final output on the screen, you could look at some tools like RenderDoc, which will let you see the effect on a frame buffer (and other render targets) of each API call. It's not trivial to parse what you'll see in these tools unless you have some familiarity with the API in use.
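If you do want the rendered frame on the CPU for logging, the "copy it back to the host" path mentioned above is a plain readback. A minimal sketch, assuming an RGBA8 window of size width x height and ignoring the performance cost of the stall:

// Read the back buffer into CPU memory right after rendering, before the swap.
#include <vector>

std::vector<unsigned char> frame(width * height * 4);
glReadBuffer(GL_BACK);
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, frame.data());
// 'frame' now holds the image (bottom-up); log or save it as needed.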

catch "NVIDIA OpenGL driver lost connection" error

I am developing an application in C++ (Visual Studio 2008) and I have a problem like the one described in this thread:
NVIDIA OpenGL driver lost connection
What I want to ask is not for a solution, or why this is happening (as in the posted thread); I want to ask whether I can "catch" this error and do something before the application crashes, like writing an output log with some relevant information about the application state.
The error occurs every now and then while running the application, with no clear cause, hence the question.
Modern Windows versions put a number of hard constraints on the responsiveness of applications. If a process spends too much time in a graphics driver call, or takes too long to fetch events from Windows, a watchdog triggers: Windows assumes that the process is stuck in an infinite loop or is violating the responsiveness demands, and may "do" something about it.
See what happens if you break a single glDraw… call down into a number of smaller batches. In general you want to minimize the number of glDraw… calls, but if a single glDraw… call takes more than 10 ms or so to complete, you're far beyond OpenGL overhead territory anyway.
Note that due to the asynchronous nature of OpenGL, the watchdog may bark in an OpenGL finishing call such as glFinish or SwapBuffers. In that case it may help to add glFlush commands between the draw batches. If that doesn't help, try glFinish (which will have a performance impact). If that doesn't help either, create an auxiliary OpenGL context in a separate thread that renders to a texture via an FBO, and have the main thread display only the contents of this intermediary texture. And if that doesn't help, create the auxiliary context on a PBuffer instead of on the same window as your main context.
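A sketch of the batching idea; vao, indexCount and batchSize are placeholders for your own data, and the indices are assumed to be GLuint:

// Split one huge draw into smaller batches, flushing in between so the driver
// submits work incrementally instead of queueing one enormous command.
#include <algorithm>

glBindVertexArray(vao);
for (GLsizei first = 0; first < indexCount; first += batchSize) {
    GLsizei count = std::min(batchSize, indexCount - first);
    glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT,
                   reinterpret_cast<const void*>(first * sizeof(GLuint)));
    glFlush();   // encourage submission between batches
}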

How to do exactly one render per vertical sync (no repeating, no skipping)?

I'm trying to do vertically synced renders so that exactly one render is done per vertical sync, without skipping or repeating any frames. I would need this to work under Windows 7 and (in the future) Windows 8.
It would basically consist of drawing a sequence of QUADS that fit the screen so that a pixel from the original images matches a pixel on the screen 1:1. The rendering part is not a problem, either with OpenGL or DirectX. The problem is the correct syncing.
I previously tried using OpenGL, with the WGL_EXT_swap_control extension, by drawing and then calling
SwapBuffers(g_hDC);
glFinish();
I tried all combinations and permutations of these two instructions along with glFlush(), and it was not reliable.
I then tried with Direct3D 10, by drawing and then calling
g_pSwapChain->Present(1, 0);
pOutput->WaitForVBlank();
where g_pSwapChain is a IDXGISwapChain* and pOutput is the IDXGIOutput* associated to that SwapChain.
Both versions, OpenGL and Direct3D, result in the same: the first sequence of, say, 60 frames doesn't last what it should (instead of about 1000 ms at 60 Hz, it lasts something like 1030 or 1050 ms), the following ones seem to work fine (about 1000.40 ms), but every now and then it seems to skip a frame. I do the measuring with QueryPerformanceCounter.
On Direct3D, trying a loop of just the WaitForVBlank, the duration of 1000 iterations is consistently 1000.40 with little variation.
So the trouble here is not knowing exactly when each of the called functions returns, and whether the swap is done during the vertical sync (not earlier, to avoid tearing).
Ideally (if I'm not mistaken), what I want would be to perform one render, wait until the sync starts, swap during the sync, then wait until the sync is done. How do I do that with OpenGL or DirectX?
Edit:
A test loop of just WaitForVBlank, 60 times, consistently takes from 1000.30 ms to 1000.50 ms.
The same loop with Present(1, 0) before WaitForVBlank, with nothing else, no rendering, takes the same time, but sometimes it fails and takes 1017 ms, as if a frame had been repeated. There's no rendering, so there's something wrong here.
I have the same problem in DX11. I want to guarantee that my frame rendering code takes an exact multiple of the monitor's refresh rate, to avoid multi-buffering latency.
Just calling pSwapChain->Present(1, 0) is not sufficient. That will prevent tearing in fullscreen mode, but it does not wait for the vblank to happen. The Present call is asynchronous and returns right away if there are frame buffers remaining to be filled. So if your render code produces a new frame very quickly (say 10 ms to render everything) and the user has set the driver's "Maximum pre-rendered frames" to 4, then you will be rendering four frames ahead of what the user sees. This means 4 * 16.7 ≈ 67 ms of latency between mouse action and screen response, which is unacceptable. Note that the driver's setting wins: even if your app asked for SetMaximumFrameLatency(1), you'll get 4 frames regardless. So the only way to guarantee no mouse lag regardless of driver settings is for your render loop to voluntarily wait until the next vertical refresh interval, so that you never use those extra frame buffers.
IDXGIOutput::WaitForVBlank() is intended for this purpose. But it does not work! When I call the following
<render something in ~10ms>
pSwapChain->Present(1, 0);
pOutput->WaitForVBlank();
and I measure the time it takes for the WaitForVBlank() call to return, I see it alternate between roughly 6 ms and 22 ms.
How can that happen? How could WaitForVBlank() ever take longer than 16.7 ms to complete? In DX9 we solved this problem using GetRasterStatus() to implement our own, much more accurate version of WaitForVBlank, but that call was deprecated in DX11.
Is there any other way to guarantee that my frame is exactly aligned with the monitor's refresh rate? Is there another way to spy on the current scanline like GetRasterStatus used to do?
I previously tried using OpenGL, with the WGL_EXT_swap_control extension, by drawing and then calling
SwapBuffers(g_hDC);
glFinish();
That glFinish() or glFlush() is superfluous. SwapBuffers implies a glFinish.
Could it be that in your graphics driver settings you have set "force V-Blank / V-Sync off"?
We currently use DX9 and want to switch to DX11. We use GetRasterStatus() to manually sync to the screen. That goes away in DX11, but I've found that creating a DirectDraw7 device doesn't seem to disrupt DX11. So just add this to your code and you should be able to get the scanline position.
#include <ddraw.h>   // link against ddraw.lib and dxguid.lib

// Create a legacy DirectDraw7 interface just to query the scanline position.
IDirectDraw7* ddraw = nullptr;
DirectDrawCreateEx(NULL, reinterpret_cast<LPVOID*>(&ddraw), IID_IDirectDraw7, NULL);
DWORD scanline = 0;
ddraw->GetScanLine(&scanline);   // current scanline the monitor is drawing
On Windows 8.1 and Windows 10, you can make use of the DXGI 1.3 flag DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. See MSDN. The sample here is for Windows 8 Store apps, but it should be adaptable to classic Win32 window swap chains as well.
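A rough sketch of that path (Win32, DXGI 1.3); error handling is omitted, and dxgiFactory2, d3dDevice and hwnd are assumed to already exist:

#include <dxgi1_3.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a flip-model swap chain with the frame-latency waitable object enabled.
DXGI_SWAP_CHAIN_DESC1 desc = {};
desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
desc.BufferCount = 2;                               // flip model needs >= 2
desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
desc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;

ComPtr<IDXGISwapChain1> swapChain1;
dxgiFactory2->CreateSwapChainForHwnd(d3dDevice, hwnd, &desc, nullptr, nullptr, &swapChain1);

ComPtr<IDXGISwapChain2> swapChain2;
swapChain1.As(&swapChain2);
swapChain2->SetMaximumFrameLatency(1);              // at most one frame queued
HANDLE frameWaitable = swapChain2->GetFrameLatencyWaitableObject();

// Per frame: block until the swap chain is ready to accept the next frame.
WaitForSingleObjectEx(frameWaitable, 1000, TRUE);
// ... render ...
swapChain2->Present(1, 0);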
You may find this video useful as well.
When creating a Direct3D device, set the PresentationInterval member of the D3DPRESENT_PARAMETERS structure to D3DPRESENT_INTERVAL_DEFAULT.
If you run in kernel mode or ring 0, you can attempt to read bit 3 of the VGA input status register (03BAh/03DAh). The information is quite old, and although it was hinted here that the bit might have changed location or may be obsolete in later versions of Windows 2000 and up, I actually doubt this. The second link has some very old source code that attempts to expose the vblank signal on old Windows versions. It no longer runs, but in theory rebuilding it with the latest Windows SDK should fix that.
The difficult part is building and registering a device driver that exposes this information reliably and then fetching it from your application.
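For reference, a sketch of the polling itself, assuming a WDK kernel-mode build on x86 with legacy VGA-compatible hardware (the register locations are the classic ones and may not exist on modern GPUs):

// Bit 3 of Input Status Register 1 (0x3DA in colour modes, 0x3BA in mono) is set
// while the vertical retrace is in progress.
#include <wdm.h>

static void WaitForVBlankStart(void)
{
    PUCHAR statusPort = (PUCHAR)(ULONG_PTR)0x3DA;
    // If a retrace is already in progress, wait for it to end first...
    while (READ_PORT_UCHAR(statusPort) & 0x08) { }
    // ...then wait for the next retrace to begin.
    while (!(READ_PORT_UCHAR(statusPort) & 0x08)) { }
}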

C++/opengl application running smoother with debugger attached

Have you experienced a situation where a C++ OpenGL application runs faster and smoother when executed from Visual Studio? When executed normally, without the debugger, I get a lower framerate, 50 instead of 80, and strange lagging where the fps dives to about 25 frames/sec every 20-30th frame. Is there a way to fix this?
Edit:
Also, we are using quite a lot of display lists (created with glNewList), and increasing the number of display lists seems to increase the lagging.
Edit:
The problem seems to be caused by page faults. Adjusting the process working set with SetProcessWorkingSetSizeEx() doesn't help.
Edit:
With some large models the problem is easy to spot in the Process Explorer (procexp) utility's GPU memory view. Memory usage is very unstable when there are many glCallList calls per frame. No new geometry is added and no textures are loaded, but the GPU memory allocation fluctuates by +-20 MB. After a while it becomes even worse and may allocate something like 150 MB in one go.
I believe that what you are seeing is the debugger locking some pages so they can't be swapped out, keeping them immediately accessible to the debugger. This brings some caveats for the OS at process-switching time and is, in general, not recommended.
You will probably not like to hear this, but there is no good way to fix that directly.
Use VBOs, or at least vertex arrays; those can be expected to be optimized much better in the driver (let's face it, display lists are getting obsolete). Display lists can be easily wrapped to generate vertex buffers, so only a little of the old code needs to be modified. Also, you can use "bindless graphics", which was designed to avoid page faults in the driver (GL_EXT_direct_state_access).
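To illustrate the wrapping idea, a minimal sketch that moves the geometry into a VBO once and then draws it with old-style vertex arrays; vertices and vertexCount stand in for the data the display list used to contain:

// One-time setup: copy the geometry that used to be compiled into a display list
// into a vertex buffer object.
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(GLfloat), vertices, GL_STATIC_DRAW);

// Per frame: draw from the VBO (fixed-function style, so the old code changes little).
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, nullptr);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);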
Do you have an nVidia graphics card by any chance? nVidia OpenGL appears to use a different implementation when attached to the debugger. For me, the non-debugger version is leaking memory at up to 1 MB/sec in certain situations where I draw to the front buffer and don't call glClear each frame. The debugger version is absolutely fine.
I have no idea why it needs to allocate and (sometimes) deallocate so much memory for a scene that's not changing.
And I'm not using display lists.
It's probably the thread or process priority. Visual Studio might launch your process with a slightly higher priority to make sure the debugger is responsive. Try using SetPriorityClass() in your app's code:
SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);
The 'above normal' class just nudges it ahead of everything else with the 'normal' class. As the documentation says, don't slap on a super high priority or you can screw up the system's scheduler.
In an app running at 60 fps you only get 16 ms to draw a frame (less at 80 fps!); if it takes longer, you drop the frame, which can cause a small dip in framerate. If your app has the same priority as other apps, it's relatively likely another app could temporarily steal the CPU for some task and you drop a few frames, or at least miss your 16 ms window for the current frame. The idea is that boosting the priority slightly means Windows comes back to your app more often, so it doesn't drop as many frames.