CUDA: Detect when JIT PTX caching will need to occur?

CUDA: Detect when JIT PTX caching will need to occur? - c++

I have an application that implements many CUDA kernels. When it is first run on a user's machine, JIT caching will occur. This process can take a while (several seconds)...and a sudden, several-second 'freeze' of my application can be worrying to users - I would like to be able to let users know with an on-screen prompt that this one-time-only caching process is normal and expected.
So...is there a way to determine if JIT caching needs to occur, prior to it occurring? I haven't found any 'cuNeedsToCache()'-esque functions in the CUDA SDK...am I'm missing something?

Related

How can i trigger a GPU Crash

Sometimes we can trigger a "CPU Crash" with code like:
*(int*)0=1234;
Correspondingly, any simple code can trigger a "GPU Crash" (may need some DirectX interface)?
Why I need this is because I want to learn about Unreal's(UE4/UE5) GPU crash processing flow.
Any code with UE is expected.
THX

It is very difficult to crash a modern GPU with decent drivers. Generally they will run until task completion. If the task is extremely large or takes a very long time, the OS will assume they have simply stopped responding. After this occurs, there are a few options:
Crash: Very rare
Reset the driver: This will reset the DirectX state and crash
whatever program is using it (unless it is checking for that, but
that seems like a waste)
Abort the operation: This is a feature on
more modern hardware and drivers, but they can simply stop the
program.
Run uninterrupted: This occurs only if the GPU is in use by NOTHING else.
Do some heavy calculation with no end (fractals) and see what happens.

You can trigger a 'Timeout Device Recovery' (TDR) which is very similar to a GPU crash.
Using a Visual Studio Developer Command Prompt running as admin:
dxcap -forcetdr
Keep in mind this triggers a TDR for all running applications.

Why does my program run faster on first launch than on next launches?

I have been working for 2.5 years on a personal flight sim project on my leisure time, written in C++ and using Opengl on a windows 7 PC.
I recently had to move to windows 10. Hardware is exactly the same. I reinstalled Code::blocks.
It turns out that on first launch of my project after the system start, performance is OK, similar to what I used to see with windows 7. But, the second, third, and all next launches give me lower performance, with significant less fluidity in frame rate compared to the first run, detectable by eye. This never happened with windows 7.
Any time I start my system, first run is fast, next ones are slower.
I had a look at the task manager while doing some runs. The first run is handled by one of the 4 cores of my CPU (iCore5-6500) at approximately 85%. For the next runs, the load is spread accross the 4 cores. During those slower runs on 4 cores, I tried to modify the affinity and direct my program to only one core without significant improvement in performance. The selected core was working at full load, though.
My C++ code doesn't explicitly use any thread function at this stage. From my modest programmer's point of view, there is only one main thread run in the main(). On the task manager, I can see that some 10 to 14 threads are alive when my program runs. I guess (wrongly?) that they are implicitly created by the use of joysticks, track ir or other communication task with GPU...
Could it come from memory not being correctly freed when my program stops? I thought windows would free it properly, even if I forgot some 'delete' after using 'new'.
Has anyone encountered a similar situation? Any explanation coming to your minds?
Any suggestion to better understand these facts? Obviously, my ultimate goal is to have a consistent performance level whatever the number of launches.
trying to upload screenshots of second run as viewed by task manager:
trying to upload screenshots of first run as viewed by task manager:

Well I got a problems when switching to win10 for clients at my work too here few I encountered all because Windows10 has changed up scheduling of processes creating a lot of issues like:
older windowses blockless thread synchronizations techniques not working anymore
well placed Sleep() helps sometimes. Btw. similar problems was encountered when switching from w2k to wxp.
huge slowdowns and frequent freezes for few seconds on older single threaded apps
usually setting affinity to single core solves this. You can do this also in task manager just to check and if helps then you can do this in code too. Here an example on how to do it with winapi:
Cache size estimation on your system?
messed up drivers timings causing zombies processes even total freeze and or BSOD
I deal with USB in my work and its a nightmare sometimes on win10. On top of all this Win10 tends to enforce wrong drivers on devices (like gfx cards, custom USB systems etc ...)
auto freeze close app if it does not respond the wndproc in time
In Windows10 the timeout is much much smaller than in older versions. If the case You can try running in compatibility mode (set in icon properties on desktop) for older windows (however does not work for #1,#2), or change the apps code to speed up response. For example in VCL you can call Process Messages from inside of blocking code to remedy this... Or you can use threads for the heavy lifting ... just be careful with rendering and using winapi as accessing some winapi (any window/visual related stuff) functions from outside main thread causes havoc ...
On top of all this old IDEs (especially for MCUs) don't work properly anymore and new ones are usually much worse (or unusable because of lack of functionality that was present in older versions) to work with so I stayed faith full to Windows7 for developer purposes.
If none of above helps then try to log the times some of your task did need ... it might show you which part of code is the problem. I usually do this using timing graph like this:
both x,y axises are time and each task has its own color and row in graph. the graph is scrolling in time (to the left side in my case) and has changeable time scale. The numbers are showing actual and max (or sliding avg) value ...
This way I can see if some task is not taking too much time or even overlaps its next execution, peaks are also nicely visible and all runs during runtime without any debug tools which might change the behavior of execution.

Redirect APPCRASH Dumps (Or turn them off)

I have an application (didn't write it) that is producing APPCRASH dumps in C:\Windows\SysWOW64. The application while dumping is crippled, but operating at bare minimum capacity to not lose data. The issue is that these dumps are so large that the system is spending most of it's time writing these and the application is falling far behind in processing and will start losing data soon.
The plan is to either entirely disable it, or mount it to a RAM drive and purge them as soon as they hit the RAM drive.
Now I've looked into using this key:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb787181%28v=vs.85%29.aspx
But all it does is generate a second dump now instead of redirect the original.
The dump is named:
dump-2013_03_31-15_23_55_772.dmp
This is generally the realm of developers on Windows (with stuff like C/C++) so I'd like to hit them up, don't think ServerFault could get me any answers on this.
Additionally: It's not cycling dump files (they'll fill the 20GBs left on the hard drive), so I'm not sure if this is Windows behavior or custom code in the app (if it is... ick!).

To write a DumpFile, an app has to call the function "MiniDumpWriteDump" so this is not a behavior of the system or something you can control, it is application driven. If it dumps on crashes, it uses "SetUnhandledExceptionFilter" to set its own handling routine, before(!) the OS takes over. Unfortunately I didn't found a way to overwrite this handler from an other process, so the only hope left is, that there is a register entry for the app switching the behavior or change the path (as my applications have it for exactly the reason you describe).

How do I debug lower level File access exceptions/crashes in C++ unmanaged code?

I'm currently working on trying to resolve a crash/exception on an unmanaged C++ application.
The application crashes with some predicatibility. The program basically
process a high volume of files combined with running a bunch of queries through
the access DB.
It's definitely occuring during a file access. The error message is:
"failed reading. Network name is no longer available."
It always seems to be crashing in the same lower level file access code.
It's doing a lower level library Seek(), then a Read(). The exception occurs
during the read.
To further complicate things, we can only get the errors to occur when
we're running an disk balancing utility. The utility essentially examines file
access history and moves more frequently/recently used files to faster storage retrieval
while files that are used less frequently are moved to a slower retrieval area. I don't fully
understand the architecture of the this particular storage device,
but essentially it's got an area for "fast" retrieval and one for "archived/slower."
The issues are more easily/predicably reproducible when the utility app is started and
stopped several times. According to the disk manufacturer, we should be able to run
the utility in the background without effecting the behaviour of the client's main application.
Any suggestions how to proceed here? There are theories floating around here that it's somehow related to latency on the storage device. Is there a way to prove/disprove that? We've written a small sample app that basically goes out accesses/reads a whole mess of files on the drive. We've (so far) been unable to reproduce the issue even running with SmartPools. My thought is to try push the latency theory is to have multiple apps basically reading volumes of files from disk while running the utility application.
The memory usage and CPU usage do not look out of line in the Task Manager.
Thoughts? This is turning into a bit of a hairball.
Thanks,
JohnB

Grab your debug binaries.
Setup Application Verifier and add your application to its list.
Hopefully wait for a crash dumb.
Put that through WinDBG.
Try command: !avrf
See what you get....

Logging/monitoring all function calls from an application

we have a problem with an application we're developing. Very seldom, like once in a hundred, the application crashes at start up. When the crash happens it brings down the whole system, the computer starts to beep and freezes up completely, the only way to recover is to turn off the power (we're using Windows XP). The rarity of the crash combined with the fact that we can't break into the debugger or even generate a stackdump when it occurs makes it extremely hard to debug.
I'm looking for something that logs all function calls to a file. Does such a tool exist? It shouldn't be impossible to implement, profilers like VTune does something very similar.
We're using visual studio 2008 (C++).
Thanks
A.B.

Logging function entries/exits is a low-level approach to your problem. I would suggest using automatic debugger instrumentation (using Debugger key under Image File Execution Options with regedit or using gflags from the package I provide a link to below) and trying to repro the problem until it crashes. Additionally, you can have the debugger log function call history of suspected module(s) using a script or have collect any other information.
But not knowing the details of your application it is very hard to suggest a solution. Is it a user app, service or a driver? What does "crashes at startup" mean - at windows startup or app's startup?
Use this debugger package to troubleshoot.

The only problem with the logging idea is that when the system crashes, the latest log entries might still be in the cache and have no chance to be written to disk...
If it was me I would try running the program on a different PC - it might be flaky hardware or drivers causing the problem. An application program "shouldn't" be able to bring down the system.

A few Ideas-
There is a good chance that just prior to your crash there is some sort of exception in the application. if you set you handler for all unhandled exceptions using SetUnhandledExceptionFilter() and write a stack trace to your log file, you might have a chance to catch the crash in action.
Just remember to flush the file after every write.
Another option is to use a tool such as strace which logs all of the system calls into the kernel (there are multiple flavors and implementations for that so pick your favorite). if you look at the log just before the crash you might find the culprit

Have you considered using a second machine as a remote debugger (via the network)? When the application (and system) crashes, the second machine should still show some useful information, if not the actual point of the problem. I believe VC++ has that ability, at least in some versions.

For Visual C++ _penter() and _pexit() can be used to instrument your code.
See also Method Call Interception in C++.

GCC (including the version MingGW for Windows development) has a code generation switch called -finstrument-functions that tells the compiler to emit special calls to functions called __cyg_profile_func_enter and __cyg_profile_func_exit around every function call. For Visual C++, there are similar options called /GH and /Gh. These cause the compiler to emit calls to __penter and __pexit around function calls.
These instrumentation modes can be used to implement a logging system, with you implementing the calls that the compiler generates to output to your local filesystem or to another computer on your network.
If possible, I'd also try running your system using valgrind or a similar checking tool. This might catch your problem before it gets out-of-hand.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js