I've noticed that CUDA applications tend to have a rough maximum run time of 5-15 seconds before they fail and exit. I realize it's ideal not to have a CUDA application run that long, but assuming that CUDA is the correct choice and that, due to the amount of sequential work per thread, it must run that long, is there any way to extend this time limit or to get around it?
I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.
You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious.
To disable it, open regedit, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display, create a REG_DWORD value named DisableBugCheck, and set it to 1.
You may also need to change something in the NVIDIA control panel. Look for a reference to "VPU Recovery" in the CUDA docs.
Ideally, you should be able to break your kernel operations up into multiple passes over your data to break it up into operations that run in the time limit.
Alternatively, you can divide the problem domain so that you compute fewer output pixels per command. For example, instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the GPU to compute 100,000 each.
The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?
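In CUDA, the watchdog applies to each kernel launch, so the practical equivalent of flushing the command queue is simply to issue several smaller launches and synchronize between them. Here is a rough sketch of the "fewer output pixels per command" idea; the kernel body, chunk size, and names are invented for illustration:

```cuda
#include <cuda_runtime.h>

// Placeholder for whatever per-pixel work the real application does.
__device__ float expensivePixel(int i)
{
    float x = (float)i;
    for (int k = 0; k < 1000; ++k)      // stand-in for heavy sequential work
        x = x * 0.999f + 0.001f;
    return x;
}

__global__ void computePixels(float *out, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        out[offset + i] = expensivePixel(offset + i);
}

// Compute totalPixels results in chunks small enough that each launch
// finishes well inside the watchdog window.
void computeAllPixels(float *d_out, int totalPixels)
{
    const int chunk   = 100000;   // pixels per launch (tune for your GPU)
    const int threads = 256;
    for (int offset = 0; offset < totalPixels; offset += chunk) {
        int count  = (totalPixels - offset < chunk) ? totalPixels - offset : chunk;
        int blocks = (count + threads - 1) / threads;
        computePixels<<<blocks, threads>>>(d_out, offset, count);
        cudaDeviceSynchronize();  // let each command buffer complete before the next
    }
}
```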
You should not have to read all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory. You just need to have some command buffers complete occasionally, to prove to the OS that you're not stuck in an infinite loop.
Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong.
[EDIT Mar 2010 to update:] (outdated again, see the updates below for the most recent information) The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx
[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for CUDA programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to False. This will change the registry setting for you. Close and reboot; any change to the TDR registry setting won't take effect until you reboot.
[EDIT August 2018 to update:]
Although the NVIDIA tools allow disabling the TDR now, the same question is relevant for AMD/OpenCL developers. For those: The current link that documents the TDR settings is at https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys
On Windows, the graphics driver has a watchdog timer that kills any shader programs that run for more than 5 seconds. Note that the Xorg/XFree86 drivers don't do this, so one possible workaround is to run the CUDA apps on Linux.
AFAIK it is not possible to disable the watchdog timer on Windows. The only way to get around this on Windows is to use a second card that has no displayed screens on it. It doesn't have to be a Tesla but it must have no active screens.
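If you do have more than one GPU in the machine, you can check from code which devices the watchdog applies to: the CUDA runtime exposes a kernelExecTimeoutEnabled flag in the device properties. Below is a small hedged sketch; the selection policy (take the first device with no time limit, else fall back to device 0) is just an example:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Pick the first device that reports no kernel run-time limit (typically a
// GPU with no display attached). Falls back to device 0 otherwise.
int chooseComputeDevice(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("device %d: %s, kernelExecTimeoutEnabled=%d\n",
               d, prop.name, prop.kernelExecTimeoutEnabled);
        if (!prop.kernelExecTimeoutEnabled) {
            cudaSetDevice(d);
            return d;
        }
    }
    cudaSetDevice(0);
    return 0;
}
```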
Resolve Timeout Detection and Recovery - WINDOWS 7 (32/64 bit)

Create a registry key in Windows to change the TDR settings to a higher value, so that Windows allows a longer delay before the TDR process starts.

Open Regedit from Run or a command prompt. In Windows 7, navigate to the registry area where the new key will be created: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. There will probably already be one DWORD there called DxgKrnlVersion.

Right-click and create a new REG_DWORD value named TdrDelay. Its value is the number of seconds before TDR kicks in; Windows currently uses 2 by default (even though the registry value doesn't exist until you create it). Assign it a new value (I tried 4 seconds), which doubles the time before TDR. Then restart the PC; the value won't take effect until you do.

Source: Win7 TDR (Driver Timeout Detection & Recovery).

I have also verified this and it works fine.
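If you want to check what delay is currently configured without opening Regedit, the value can also be read through the standard Win32 registry API. This is just an illustrative sketch; if TdrDelay has never been created, Windows uses its built-in default of 2 seconds:

```c
/* Hedged sketch: query the TdrDelay value described above via the Win32 API.
 * RegGetValueA lives in Advapi32; link it explicitly if your toolchain does
 * not do so by default. Requires Vista or later for RegGetValueA. */
#define _WIN32_WINNT 0x0600
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD delay = 0;
    DWORD size  = sizeof(delay);
    LSTATUS rc  = RegGetValueA(HKEY_LOCAL_MACHINE,
                               "SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
                               "TdrDelay",
                               RRF_RT_REG_DWORD,
                               NULL, &delay, &size);
    if (rc == ERROR_SUCCESS)
        printf("TdrDelay is set to %lu seconds\n", (unsigned long)delay);
    else
        printf("TdrDelay not set (default of 2 seconds applies)\n");
    return 0;
}
```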
The most basic solution is to pick a point in the calculation, some percentage of the way through, that I am sure the GPU I am working with can complete in time, save all the state information there and stop, then start again from that point.
Update:
For Linux: Exiting X will allow you to run CUDA applications as long as you want. No Tesla required (A 9600 was used in testing this)
One thing to note, however, is that if X is never entered, the drivers probably won't be loaded, and it won't work.
It also seems that, on Linux, simply having no X displays up at the time works as well, so X does not need to be exited as long as you switch to a non-X full-screen virtual terminal.
This isn't possible. The time-out is there to prevent bugs in calculations from taking up the GPU for long periods of time.
If you use a dedicated card for CUDA work, the time limit is lifted. I'm not sure if this requires a Tesla card, or if a GeForce with no monitor connected can be used.
The solution I use is:
1. Pass all information to device.
2. Run iterative versions of algorithms, where each iteration invokes the kernel on the memory already stored within the device.
3. Finally transfer memory to host only after all iterations have ended.
This enables control over the iterations from the CPU (including the option to abort), without the costly device<-->host memory transfers between iterations.
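A minimal sketch of that pattern might look like the following; the kernel body, problem size, and iteration count are placeholders, not part of the original answer:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder per-iteration work on data that stays resident on the device.
__global__ void stepKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 0.5f + 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    const int iterations = 100;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    float *d_data;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // 1. pass data to the device once

    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {                   // 2. iterate on device memory
        stepKernel<<<blocks, threads>>>(d_data, n);
        cudaDeviceSynchronize();  // each launch stays well under the watchdog limit
        // the host could check a condition here and break early (the abort option)
    }

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);  // 3. copy back once at the end
    printf("first element after %d iterations: %f\n", iterations, h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```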
The watchdog timer only applies on GPUs with a display attached.
On Windows the timer is part of the WDDM, it is possible to modify the settings (timeout, behaviour on reaching timeout etc.) with some registry keys, see this Microsoft article for more information.
It is possible to disable this behavior in Linux. Although the "watchdog" has an obvious purpose, it may cause some very unexpected results when doing extensive computations using shaders / CUDA.
The option can be toggled in your X configuration (likely /etc/X11/xorg.conf): adding Option "Interactive" "0" to the Device section of your GPU does the job.
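For reference, the relevant Device section might then look roughly like this; the Identifier and BusID values are placeholders for whatever your existing configuration already contains:

```
Section "Device"
    Identifier  "Device0"            # placeholder name
    Driver      "nvidia"
    BusID       "PCI:1:0:0"          # placeholder bus ID
    Option      "Interactive" "0"    # disable the watchdog for long-running kernels
EndSection
```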
See "CUDA Visual Profiler 'Interactive' X config option?" for details on the config, and ftp://download.nvidia.com/XFree86/Linux-x86/270.41.06/README/xconfigoptions.html#Interactive for a description of the parameter.
I am using Eclipse for C/C++ development. I am trying to compile and run a project. When I compile and run the project, after a while my CPU reaches 100% usage. I checked Task Manager and found that Eclipse isn't closing any of the previous builds; they keep running in the background, which uses my CPU heavily. How do I solve this problem? At 100% usage my PC becomes very, very slow.
If you don't want the build to use up all your CPU time (maybe because you want to do other stuff while building) then you could decrease the parallelism of the build to a point where it leaves one or more cores unused. For example, if you have 8 cores you could configure your build to only use 6 of them.
Your build will take longer, but your machine will be more responsive for other tasks while the build runs.
Adding more RAM seems to have solved my problem, and disk usage is also low now. Maybe since there wasn't enough RAM in my laptop, the system was paging to disk, which made the disk usage go up.
I own what I'd call a decent laptop with 6 GB of RAM and an i5 with 4 CPUs at ~2.5 GHz, and I created a virtual machine with half of my resources using the default options, without changing anything. After some time, say an hour, or if I do nothing in the virtual machine, the CPU starts to idle, and when I come back to the virtual machine it works very slowly; it looks like the VM freezes or something. Are there any options to improve performance in VMware? Why does it freeze? Also, VMware sometimes crashes. Any tips?
EDIT
I installed a new version of VMware; I will come back with an answer after a few hours to see how it goes...
SOLVED: It works great with the latest version of VMware. The version that caused me problems was from 2013.
With an i5 and 6 GB of RAM, VMware should run perfectly fine. Make sure that no other applications are running in your base operating system.
The probable causes are insufficient RAM or CPU overheating. You don't have to allot half of your resources unless you are actually going to use them in your virtual machine.
If possible, reinstall your VM with a more suitable resource allocation; it need not be half of your system.
I have developed an application in C. I am running this application on "Red Hat Enterprise Linux Server release 5.8 (Tikanga)" and everything looks good, but when we deploy this application on "CentOS release 6.5 (Final)" it starts causing problems. It occupies more cache memory, and after 30-45 minutes it produces a spike where every CPU shows 100% utilization for 1-2 seconds.
I googled this issue and found "CPU high usage of the usleep on CentOS 6.3".
Since one process in my application uses a 10-microsecond usleep, it takes less than 3% CPU on Red Hat; however, it is quite high on CentOS, around 90%. After reading the link, when I change the sleep from usleep(10) to usleep(1000) (i.e. 1 ms), it takes 40% CPU.
I need to know whether the kernel of CentOS 6.5 uses high-resolution timers, and whether I need to set any configuration option when compiling the kernel.
In the first place, you are comparing apples and oranges: CentOS 6 corresponds to RHEL 6. Very likely your code would behave the same on RHEL 6.5 as it does on CentOS 6.5, and the same on CentOS 5.8 as on RHEL 5.8. It is misleading to describe the issue as a difference between RHEL and CentOS.
In the second place, if your CPU utilization is that strongly affected by a few usleep() calls (executed, apparently, very many times), then your code is flawed and you should fix it. Building a custom kernel to mask the problem would be pretty backward. Nevertheless, if the objective is more to move over to CentOS than to move up to a (somewhat) more up-to-date environment, then switch to CentOS 5 instead of to CentOS 6.
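To illustrate the kind of flaw being described, here is a hypothetical comparison (the work_available/do_work functions are invented stubs). On a kernel with high-resolution timers the first loop really does wake up on the order of 100,000 times per second, and that scheduling overhead alone shows up as CPU time, whereas on older kernels the same usleep(10) was typically rounded up to a much coarser tick:

```c
#include <stdbool.h>
#include <unistd.h>

/* Placeholder stand-ins for the application's real logic. */
static bool work_available(void) { return false; }
static void do_work(void)        { /* ... */ }

/* Wakes up every 10 microseconds: with high-resolution timers this is
 * roughly 100,000 wake-ups per second, and the overhead alone can show
 * up as very high CPU usage. */
static void poll_busy(void)
{
    for (;;) {
        if (work_available())
            do_work();
        usleep(10);
    }
}

/* Wakes up every 10 milliseconds: negligible CPU cost, at the price of
 * up to 10 ms of extra latency before new work is noticed. */
static void poll_relaxed(void)
{
    for (;;) {
        if (work_available())
            do_work();
        usleep(10 * 1000);
    }
}

int main(void)
{
    (void)poll_busy;   /* shown only for comparison; not called */
    poll_relaxed();
    return 0;
}
```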
I have ported an app from the Visual Studio compiler to MinGW and faced a problem: performance degradation. CPU usage increased from 30% to 100%.
There is one interesting thing: if I run Windows Media Player before or while running my app, my app's performance becomes fine. CPU usage drops to 30% and it works faster (about 10 times faster).
I googled this and found that it relates to a service called the Multimedia Class Scheduler Service (MMCSS). The main problem is that this service only exists under Windows Vista and later, but I have tested and ported my app under Windows XP.
So, does anyone know how to use this feature under XP? And how does Windows Media Player increase the performance of my app?
Windows Media Player changes the resolution of the system multimedia timer. Basically, this occurs when your application really should be using something like the High Performance Timer but is using the multimedia timer instead, which simply doesn't have and isn't intended to have the necessary accuracy or resolution to be a high-performance timer. As a result, any timings in your program essentially don't work as they should, which is especially bad if you're trying to sleep or block for a fixed time.
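For completeness, the timer-resolution change that Windows Media Player triggers can be requested directly from your own process through the winmm API (timeBeginPeriod/timeEndPeriod), which also exists on XP. This is only a sketch of that workaround; it papers over the symptom rather than replacing the dependence on the multimedia timer:

```c
/* Hedged sketch: raise the multimedia timer resolution to 1 ms with the
 * winmm API, which is roughly what Windows Media Player does while playing.
 * Link against winmm (e.g. -lwinmm with MinGW, winmm.lib with MSVC). */
#include <windows.h>
#include <mmsystem.h>

int main(void)
{
    /* Request 1 ms timer resolution for the lifetime of this process. */
    if (timeBeginPeriod(1) != TIMERR_NOERROR)
        return 1;

    /* ... application work that relies on Sleep()/timer granularity ... */
    Sleep(2);   /* with 1 ms resolution this sleeps ~2 ms instead of ~15 ms */

    timeEndPeriod(1);   /* restore the previous resolution on exit */
    return 0;
}
```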