Is there a way to get an estimated (or exact) timestamp when the submitted frame will be presented on screen?
I'm interested in WSI windowed presentation as well as fullscreen on Windows and Linux.
UPD: One possible way on Windows is IDCompositionDevice::GetFrameStatistics (MSDN), which is used for DirectComposition and DirectManipulation, but I'm not sure whether it is applicable to Vulkan WSI presentation.
The VK_GOOGLE_display_timing extension exposes the timings of past presents and allows supplying a timing hint for a subsequent present. But the extension is supported only on some Android devices.
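For reference, reading the past present timings with that extension looks roughly like this (a minimal sketch; it assumes the extension was enabled at swapchain creation, the function pointer was loaded via vkGetDeviceProcAddr, and error handling is omitted):

uint32_t count = 0;
vkGetPastPresentationTimingGOOGLE(device, swapchain, &count, nullptr);
std::vector<VkPastPresentationTimingGOOGLE> timings(count);
vkGetPastPresentationTimingGOOGLE(device, swapchain, &count, timings.data());
for (const auto &t : timings) {
    // actualPresentTime is the time, in nanoseconds, at which the image
    // identified by presentID was actually displayed.
    printf("presentID %u shown at %llu ns\n",
           t.presentID, (unsigned long long)t.actualPresentTime);
}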
VK_EXT_display_control provides a vsync counter and a fence that is signaled when vblank starts. But it only works with a VkDisplayKHR-type swapchain, and it has only limited support on Linux.
A corresponding issue has been raised as Vulkan-Docs#370. Unfortunately, it is taking its time to be resolved.
I don't think you can get the exact presentation time (which would be tricky in any case, since monitors have some internal latency). I think you can get close, though: the docs for vkAcquireNextImageKHR say you can pass a fence that gets signaled when the driver is done with the image, which should be close to the time it gets sent off to the display. If you're using VK_PRESENT_MODE_FIFO_KHR, you can then use the refresh rate to work out when later images in the queue get presented.
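To illustrate that idea (a rough sketch; `device`, `swapchain` and `acquireFence` are assumed to exist already, and error handling is omitted):

uint32_t imageIndex = 0;
vkResetFences(device, 1, &acquireFence);
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                      VK_NULL_HANDLE /* no semaphore */, acquireFence, &imageIndex);
// The fence is signaled once the presentation engine is done with the image,
// which should be close to the moment it goes out to the display.
vkWaitForFences(device, 1, &acquireFence, VK_TRUE, UINT64_MAX);
auto t = std::chrono::steady_clock::now();
// With VK_PRESENT_MODE_FIFO_KHR, images queued behind this one should go out
// one refresh interval apart, so an estimate for the Nth queued image is
// t + N * refreshPeriod (refreshPeriod = 1 / monitor refresh rate).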
I've noticed that CUDA applications tend to have a rough maximum run-time of 5-15 seconds before they fail and exit. I realize it's ideal not to have a CUDA application run that long, but assuming CUDA is the correct choice and that the amount of sequential work per thread means it must run that long, is there any way to extend this amount of time or to get around it?
I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.
You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious.
To disable it, open Regedit, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display, create a REG_DWORD value named DisableBugCheck, and set it to 1.
You may also need to do something in the NVidia control panel. Look for some reference to "VPU Recovery" in the CUDA docs.
Ideally, you should be able to break your kernel operations up into multiple passes over your data, so that each pass runs within the time limit.
Alternatively, you can divide the problem domain up so that it's computing fewer output pixels per command. I.e., instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the gpu to compute 100,000 each.
The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?
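In CUDA terms that would presumably look something like this sketch (the kernel name, buffer and sizes are made up for illustration; each launch is a separate command, so no single one runs long enough to hit the watchdog):

const int totalPixels = 1000000;
const int chunkSize   = 100000;
for (int offset = 0; offset < totalPixels; offset += chunkSize) {
    // processChunk is a hypothetical kernel that writes chunkSize pixels
    // starting at offset into the device buffer d_output.
    processChunk<<<chunkSize / 256, 256>>>(d_output, offset, chunkSize);
    cudaDeviceSynchronize();   // let this launch finish before issuing the next
}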
You should not have to read all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory; you just have some command buffers complete occasionally, to prove to the OS that you're not stuck in an infinite loop.
Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong.
[EDIT Mar 2010 to update:] (outdated again, see the updates below for the most recent information) The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx
[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for Cuda programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click on "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to false. This will change the registry setting for you. Close and reboot. Any change to the TDR registry setting won't take effect until you reboot.
[EDIT August 2018 to update:]
Although the NVIDIA tools allow disabling the TDR now, the same question is relevant for AMD/OpenCL developers. For those: The current link that documents the TDR settings is at https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys
On Windows, the graphics driver has a watchdog timer that kills any shader programs that run for more than 5 seconds. Note that the Xorg/XFree86 drivers don't do this, so one possible workaround is to run the CUDA apps on Linux.
AFAIK it is not possible to disable the watchdog timer on Windows. The only way to get around this on Windows is to use a second card that has no displayed screens on it. It doesn't have to be a Tesla but it must have no active screens.
Resolve Timeout Detection and Recovery - WINDOWS 7 (32/64 bit)
Create a registry key in Windows to change the TDR settings to a higher amount, so that Windows will allow a longer delay before the TDR process starts.
Open Regedit from Run or a command prompt. In Windows 7, navigate to the correct registry key area to create the new key: HKEY_LOCAL_MACHINE > SYSTEM > CurrentControlSet > Control > GraphicsDrivers. There will probably already be one value in there called DxgKrnlVersion, as a DWORD.
Right-click and create a new REG_DWORD value named TdrDelay. The value assigned to it is the number of seconds before TDR kicks in; it is currently 2 by default in Windows (even though the registry value doesn't exist until you create it). Assign it a new value (I tried 4 seconds), which doubles the time before TDR. Then restart the PC. You need to restart the PC before the value will take effect.
Source: Win7 TDR (Driver Timeout Detection & Recovery)
I have also verified this and it works fine.
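If you prefer to make the change programmatically rather than in Regedit, a sketch like the following should do the same thing (run as administrator; a reboot is still required afterwards):

#include <windows.h>

int main()
{
    HKEY key;
    DWORD delaySeconds = 4;   // number of seconds before TDR kicks in
    if (RegCreateKeyExW(HKEY_LOCAL_MACHINE,
                        L"SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
                        0, nullptr, 0, KEY_SET_VALUE, nullptr, &key, nullptr)
        == ERROR_SUCCESS)
    {
        RegSetValueExW(key, L"TdrDelay", 0, REG_DWORD,
                       reinterpret_cast<const BYTE*>(&delaySeconds),
                       sizeof(delaySeconds));
        RegCloseKey(key);
    }
    return 0;
}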
The most basic solution is to pick a point in the calculation, some percentage of the way through, that I am sure the GPU I am working with can complete within the time limit, save all the state information and stop, and then start again from there.
Update:
For Linux: Exiting X will allow you to run CUDA applications as long as you want. No Tesla required (A 9600 was used in testing this)
One thing to note, however, is that if X is never entered, the drivers probably won't be loaded, and it won't work.
It also seems that for Linux, simply not having any X display up at the time will also work, so X does not need to be exited as long as you switch to a non-X full-screen terminal.
This isn't possible. The time-out is there to prevent bugs in calculations from taking up the GPU for long periods of time.
If you use a dedicated card for CUDA work, the time limit is lifted. I'm not sure if this requires a Tesla card, or if a GeForce with no monitor connected can be used.
The solution I use is:
1. Pass all information to device.
2. Run iterative versions of algorithms, where each iteration invokes the kernel on the memory already stored within the device.
3. Finally transfer memory to host only after all iterations have ended.
This enables control over the iterations from the CPU (including the option to abort), without the costly device<->host memory transfers between iterations.
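As a sketch (CUDA; kernel and variable names are only illustrative):

// 1. Upload the data once.
cudaMemcpy(d_data, h_data, numBytes, cudaMemcpyHostToDevice);

// 2. Iterate on device memory; the CPU keeps control between launches and
//    can abort, and no single launch runs long enough to trip the watchdog.
for (int i = 0; i < numIterations && !abortRequested; ++i) {
    iterateKernel<<<numBlocks, threadsPerBlock>>>(d_data);
    cudaDeviceSynchronize();
}

// 3. Download the result once at the end.
cudaMemcpy(h_data, d_data, numBytes, cudaMemcpyDeviceToHost);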
The watchdog timer only applies on GPUs with a display attached.
On Windows the timer is part of the WDDM, it is possible to modify the settings (timeout, behaviour on reaching timeout etc.) with some registry keys, see this Microsoft article for more information.
It is possible to disable this behavior in Linux. Although the "watchdog" has an obvious purpose, it may cause some very unexpected results when doing extensive computations using shaders / CUDA.
The option can be toggled in your X-configuration (likely /etc/X11/xorg.conf)
Adding Option "Interactive" "0" to the Device section for your GPU does the job (see the example below).
See CUDA Visual Profiler 'Interactive' X config option? for details on the config, and ftp://download.nvidia.com/XFree86/Linux-x86/270.41.06/README/xconfigoptions.html#Interactive for a description of the parameter.
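The relevant Device section in xorg.conf would then look roughly like this (identifier and driver name will differ per setup):

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "Interactive" "0"
EndSection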
I have a few questions regarding the Desktop Window Manager (aka DWM) in Windows 10:
Background: For an OpenGL application I wrote in C++ I need precise timing regarding the swap of the front and back buffers in OpenGL and the realization of these commands on the OS level. (I know Windows 10, or Windows in general, is a bad choice for this, but there are other limiting factors).
Question 1: My internet research showed that the DWM manages a third buffer (making visualization a triple buffered system) which I cannot control and therefore creates an unpredictable delay. The investigation also showed that this can be bypassed by opening an OpenGL context in fullscreen mode. Is this information correct?
Question 2: Is this delay caused by the fact that the OS randomly instructs the DWM to copy the buffer?
Question 3: How long is the actual delay? My investigation showed numbers between less than 1 ms and up to 50 ms, but there was no trustworthy source.
In fact, apart from the mere existence of the delay, I was unable to find a trustworthy source on the internet for any of the other assumptions above. I therefore kindly ask anyone who answers these questions to include, if possible, a reference for their statement.
I don't know if this is important, but I'm using OpenGL via GLFW and GLEW.
Although I was unable to find an answer to questions 2 and 3, contacting Nvidia support provided the answer to question 1.
Nvidia stated that an application rendered in a fullscreen context cannot access the DWM. Only applications rendered in windowed mode are handled by it.
Warning: They also said that this is by design. Considering that Microsoft is pushing users/programmers towards the DWM, there is no guarantee how long this design decision will remain unchanged.
Original mail from Nvidia:
[...]
After checking your request with our specialized department, please note that when a game or anything is in Full Screen you cannot access this Windows Feature [annot.: DWM]. This is by design. It needs to be windowed mode if you want to access this feature.
[...]
I am trying to fix an Audacity bug that revolves around portmixer. The output/input level is settable using the mac version of portmixer, but not always in windows. I am debugging portmixer's window code to try to make it work there.
Using IAudioEndpointVolume::SetMasterVolumeLevelScalar to set the master volume works fine for onboard sound, but with pro external USB or FireWire interfaces like the RME Fireface 400, the output volume won't change, although the change is reflected in Windows' sound control panel for that device, and also in the system mixer.
Also, outside of our program, changing the master slider for the system mixer (in the taskbar) has no effect: the soundcard outputs the same (full) level regardless of the level the system says it is at. The only way to change the output level is using the custom app that the hardware developers supply with the card.
The IAudioEndpointVolume::QueryHardwareSupport function gives back ENDPOINT_HARDWARE_SUPPORT_VOLUME so it should be able to do this.
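For reference, what we are doing boils down to roughly this (a simplified sketch; COM initialization and error handling omitted):

#include <windows.h>
#include <mmdeviceapi.h>
#include <endpointvolume.h>

void SetMasterVolume(float level)   // level in the range 0.0 - 1.0
{
    IMMDeviceEnumerator *enumerator = nullptr;
    IMMDevice *device = nullptr;
    IAudioEndpointVolume *endpointVolume = nullptr;

    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void **)&enumerator);
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);
    device->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, nullptr,
                     (void **)&endpointVolume);

    DWORD support = 0;
    endpointVolume->QueryHardwareSupport(&support);
    // support includes ENDPOINT_HARDWARE_SUPPORT_VOLUME for these devices,
    // yet the call below has no audible effect on them.
    endpointVolume->SetMasterVolumeLevelScalar(level, nullptr);

    endpointVolume->Release();
    device->Release();
    enumerator->Release();
}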
This behavior exists for both input and output on many devices.
Is this possibly a Windows bug?
It is possible to workaround this by emulating (scaling) the output, but this is not preferred as it is not functionally identical - better to let the audio interface do the scaling (esp. for input if it involves a preamp).
The cards you talk about, like the RME ones, simply do not support setting the master (or any other) level through software, and there is not much you can do about it. This is not a Windows bug. One could argue that reporting ENDPOINT_HARDWARE_SUPPORT_VOLUME is a bug, though, but that likely originates at the driver level, not in Windows itself.
The only solution I have found so far is hooking up a debugger (or adding a DLL hook) to the vendor-supplied software and looking at the DeviceIoControl calls it makes (those are the ones used to talk to the hardware) while setting the volume in the vendor software. It is pretty hard to do this for every single card, but probably worth doing for a couple of pro cards. Especially for Audacity: for open-source audio software it's actually not that bad, so I can imagine some people being really happy if it could set the volume on their card. (At the time we were exclusively using an RME Multiface; I spent quite some time figuring out the DeviceIoControl calls, but in the end it was definitely worth it, as I could set the volume in dB for any point in the matrix.)
I've recently bought myself a new cellphone, running Windows Mobile 6.1 Professional. And of course I am currently looking into doing some coding for it, on a hobby basis. My plan is to have a service running as a DLL, loaded by Services.exe. This needs to gather some data and do some processing at regular intervals (every 5-10 minutes).
Since I need to run this at regular intervals, it is a bit of a problem for me, that the system typically goes to sleep (suspend) after a short period of inactivity by the user.
I have been reading all the documentation I could find on MSDN, and MSDN blogs about this subject, and it seems to me, that there are three possible solutions to this problem:
Keep the system in an "Always On"-state, by calling SystemIdleTimerReset periodically. This seems a bit excessive, and is therefore out of the question.
Have the system periodically woken up with CeRunAppAtTime, and enter the unattended state, to do my processing.
Use the unattended state instead of going into a full suspend. This would be transparent to the user, but the system would never go into sleep.
The second approach seems to be preferred; however, it would require an executable to be called by the system on wake-up, with the only task of notifying my service that it should commence processing. This seems a bit unnecessary, and I would like to avoid this extra executable. I could of course move all my processing into this extra executable, but I would like to use some of the facilities provided when running as a service, and also not have a program pop up (even if it's in the background) whenever processing starts.
At first glance, the third approach seems to have the same basic problem as the first. However, I have read on some of the MSDN blogs that it might actually be possible to conserve battery with this approach, rather than going in and out of suspend mode often (the argument being that the WM platform is designed to consume very little power when the system is idle, and that going in and out of suspend requires quite a bit of processing).
So I guess my questions are as following:
Which approach would you recommend in my situation? With respect to keeping a minimum battery consumption, and a nice clean implementation.
In the case of approach number two, is it possible to eliminate the need for a notifying executable? Either through alternative API functions, or existing generic applications on the platform?
In the case of approach number three, do you know of any information/statistics relevant to the claim, that it is possible to extend the battery lifetime when using unattended mode over going into suspend. E.g. how often do you need to pull the system out of suspend, before unattended mode is to be preferred.
Implementation specific (bonus) question: Is it necessary to regularly call SystemIdleTimerReset to stay in unattended mode?
And finally, if you think I have prematurely eliminated approach number one, please tell me why.
Please include in your response whether you base your response on knowledge, or are merely guessing (the latter is also very welcome!).
Please leave a comment, if you think I need to clarify any parts of this question.
CeRunAppAtTime is a much-misunderstood API (largely because of the terrible name). It doesn't have to run an app. It can simply signal a named system event (see the description of the pwszAppName parameter in the MSDN docs). If you care to know when it has fired (to let your app put the device to sleep again when it's done processing), simply have a worker thread doing a WaitForSingleObject on that same named event.
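A sketch of that pattern (Windows CE; the event name and the processing functions are made up for illustration):

// Schedule the next wakeup: instead of an .exe path, pass the special
// named-event prefix so the system just signals the event at the given time.
SYSTEMTIME wakeTime;
GetLocalTime(&wakeTime);
wakeTime.wMinute += 5;   // naive "+5 minutes"; real code must handle overflow
CeRunAppAtTime(_T("\\\\.\\Notifications\\NamedEvents\\MyServiceWake"), &wakeTime);

// Worker thread inside the service waits on the same named event.
HANDLE hWake = CreateEvent(NULL, FALSE, FALSE, _T("MyServiceWake"));
while (WaitForSingleObject(hWake, INFINITE) == WAIT_OBJECT_0) {
    DoProcessing();        // hypothetical: gather and process the data
    ScheduleNextWakeup();  // hypothetical: call CeRunAppAtTime again
}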
Unattended state is often used for devices that need to keep an app running continuously (like an MP3 player) but conserve power by shutting down the backlight (probably the single most power consuming subsystem).
Obviously unattended mode uses significantly more power than suspend, because in suspend the only power draw is for RAM self-refresh. In unattended mode the processor is still powered and running (and several peripherals may be too; it depends on how the OEM defined their unattended mode).
SystemIdleTimerReset simply prevents the power manager from putting the device into low-power mode due to inactivity. This mode, whether suspend, unattended, flight or other, is defined by the OEM. Use it sparingly, because when you do, it impacts the power consumption of the device. Doing it in unattended mode is especially problematic from a user perspective, because they might think the device is off (it looks that way) while their battery life has gone south.
I had a whole long post detailing how you shouldn't expect to get acceptable battery life because WM is not designed to support what you're trying to do, but: you could signal your service on wakeup, do your processing, then use the methods in this post to put the device back to sleep immediately. You should be able to keep the ratio of on-time to sleep-time very low this way. But as you say, I'm only guessing.
See also:
Power-Efficient Apps (MSDN)
Power To The People (Developers 1, Developers 2, Devices)
Power-Efficient WM Apps (blog post)
I have a lengthy number-crunching process which takes advantage of quite a bit of OpenGL off-screen rendering. It all works well, but when I leave it to work on its own while I go make a sandwich, I usually find that it has crashed while I was away.
I was able to determine that the crash occurs very close to the moment the laptop I'm using decides to turn off the screen to conserve energy. The crash itself is well inside the NVIDIA DLLs, so there is no hope of knowing what's going on.
The obvious solution is to turn off the power management feature that turns the screen and video card off but I'm looking for something more user friendly.
Is there a way to do this programmatically?
I know there's a SETI@home implementation which takes advantage of GPU processing. How does it keep the video card from going to sleep?
I'm not sure what OS you're on, but Windows sends a message that it is about to enter a new power state. You can listen for that and then either start processing on the CPU or deny the request to enter a lower-power state.
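On Windows this arrives as the WM_POWERBROADCAST message. A minimal sketch of handling it in a window procedure (note that denying the request via PBT_APMQUERYSUSPEND only works on pre-Vista systems; on newer Windows, SetThreadExecutionState, mentioned in another answer, is the supported way to keep the machine awake):

#include <windows.h>

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_POWERBROADCAST) {
        if (wParam == PBT_APMQUERYSUSPEND)
            return BROADCAST_QUERY_DENY;   // refuse the suspend request (XP and earlier)
        if (wParam == PBT_APMSUSPEND) {
            // The system is suspending anyway: checkpoint GPU work here.
        }
        return TRUE;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}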
For the benefit of Linux users encountering a similar issue, I thought I'd add that, you can obtain similar notifications and inhibit power state changes using the DBUS API. An example script in Python, taken from the link, to inhibit power state change:
#!/usr/bin/python
import dbus
import time

# Connect to the session bus and get the PowerManagement object.
bus = dbus.Bus(dbus.Bus.TYPE_SESSION)
devobj = bus.get_object('org.freedesktop.PowerManagement',
                        '/org/freedesktop/PowerManagement')
dev = dbus.Interface(devobj, "org.freedesktop.PowerManagement.Inhibit")

# Inhibit power state changes; the returned cookie identifies this request.
cookie = dev.Inhibit('Nautilus', 'Copying files from /media/SANVOL')

time.sleep(10)

# Release the inhibition when done.
dev.UnInhibit(cookie)
According to MSDN, there is an API that allows an application to tell Windows that it is still working and that Windows should not go to sleep or turn off the display.
The function is called SetThreadExecutionState (MSDN). It works for me, using the flags ES_SYSTEM_REQUIRED and ES_CONTINUOUS.
Note, however, that using this function does not stop the screen saver from running, which might interfere with your OpenGL app if the screen saver also uses OpenGL (or Direct3D).
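A minimal sketch of that usage:

#include <windows.h>

int main()
{
    // ES_SYSTEM_REQUIRED keeps the system from sleeping; ES_CONTINUOUS makes
    // the requirement stick until it is explicitly cleared again.
    SetThreadExecutionState(ES_CONTINUOUS | ES_SYSTEM_REQUIRED);

    // ... long-running OpenGL / number-crunching work goes here ...

    // Clear the requirement so normal power management resumes.
    SetThreadExecutionState(ES_CONTINUOUS);
    return 0;
}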