Why is the final return value 10, and how does LLVM know it?

I have this code:
int main() {
    int i = 0;
    int &j = i;
    j = 10;
    return i;
}
After running the -mem2reg pass, I get the following IR:
define dso_local i32 @main() #0 !dbg !7 {
entry:
  call void @llvm.dbg.value(metadata i32 0, metadata !11, metadata !DIExpression()), !dbg !12
  call void @llvm.dbg.value(metadata i32* undef, metadata !13, metadata !DIExpression()), !dbg !12
  call void @llvm.dbg.value(metadata i32 10, metadata !11, metadata !DIExpression()), !dbg !12
  ret i32 10, !dbg !15
}
What confuses me is which analysis LLVM uses to determine that i and j are equivalent.
Here is the pass manager's execution log for this run:
[2021-12-02 11:53:05.295018000] 0x5626d58d2a20 Executing Pass 'Function Pass Manager' on Module '/usr/local/LLVM/test/e0.ll'...
[2021-12-02 11:53:05.295137300] 0x5626d58bf890 Executing Pass 'Dominator Tree Construction' on Function 'main'...
[2021-12-02 11:53:05.295169500] 0x5626d58bf890 Executing Pass 'Promote Memory to Register' on Function 'main'...
0x5626d58bf330 Required Analyses: Assumption Cache Tracker, Dominator Tree Construction
[2021-12-02 11:53:05.295364900] 0x5626d58bf890 Made Modification 'Promote Memory to Register' on Function 'main'...
0x5626d58bf330 Preserved Analyses: Natural Loop Information, Lazy Branch Probability Analysis, Lazy Block Frequency Analysis, Interval Partition Construction, Post-Dominator Tree Construction, Machine Dominance Frontier Construction, MachineDominator Tree Construction, WebAssembly Exception Information, Spill Code Placement Analysis, Bundle Machine CFG Edges, Machine Natural Loop Construction, Detect single entry single exit regions, Dominance Frontier Construction, View regions of function, Print regions of function to 'dot' file, View regions of function (with no function bodies), MachinePostDominator Tree Construction, Delinearization, Print regions of function to 'dot' file (with no function bodies), Detect single entry single exit regions, Dependence Analysis, Dominator Tree Construction, Dominator Info Printer, Print a call graph, Lazy Machine Block Frequency Analysis, Analysis if a function is memory bound, Strip gc.relocates inserted through RewriteStatepointsForGC, Machine Block Frequency Analysis, Block Frequency Analysis, Basic Alias Analysis (stateless AA impl)
-*- 'Promote Memory to Register' is the last user of following pass instances. Free these instances
[2021-12-02 11:53:05.295628600] 0x5626d58bf890 Freeing Pass 'Dominator Tree Construction' on Function 'main'...
[2021-12-02 11:53:05.295648400] 0x5626d58bf890 Freeing Pass 'Promote Memory to Register' on Function 'main'...
[2021-12-02 11:53:05.295663100] 0x5626d58bf890 Executing Pass 'Module Verifier' on Function 'main'...
-*- 'Module Verifier' is the last user of following pass instances. Free these instances
[2021-12-02 11:53:05.295725200] 0x5626d58bf890 Freeing Pass 'Module Verifier' on Function 'main'...
[2021-12-02 11:53:05.295740400] 0x5626d58d2a20 Made Modification 'Function Pass Manager' on Module '/usr/local/LLVM/test/e0.ll'...
-*- 'Function Pass Manager' is the last user of following pass instances. Free these instances
[2021-12-02 11:53:05.295828100] 0x5626d58d2a20 Freeing Pass 'Assumption Cache Tracker' on Module '/usr/local/LLVM/test/e0.ll'...
[2021-12-02 11:53:05.295850400] 0x5626d58d2a20 Executing Pass 'Print Module IR' on Module '/usr/local/LLVM/test/e0.ll'...
-*- 'Print Module IR' is the last user of following pass instances. Free these instances
[2021-12-02 11:53:05.299496500] 0x5626d58d2a20 Freeing Pass 'Print Module IR' on Module '/usr/local/LLVM/test/e0.ll'...
Can someone help me figure out which analysis or optimization lets LLVM know that i and j are equivalent? Thanks very much!

Generate unoptimized IR with clang:
clang -S -emit-llvm -Xclang -disable-O0-optnone xx.cpp
Optimize the resulting IR with opt:
opt --print-after-all -O3 xx.ll -S -o yy.ll
If you look at the IR as it gets optimized, you'll see that it gets simplified in the SimplifyCFGPass.
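At the source level, a C++ reference is simply another name for the object it is bound to, so the store through j is a store to i. A minimal sketch of that equivalence follows; the function names through_reference and direct are made up for illustration, and the bodies mirror the question's main():

```cpp
// Hypothetical helper names; through_reference() has the same body as the
// question's main(), direct() is the same code with the alias removed.
int through_reference() {
    int i = 0;
    int &j = i;  // j is an alias for i, not a separate object
    j = 10;      // this writes to i
    return i;
}

int direct() {
    int i = 0;
    i = 10;
    return i;
}
```

Both functions return 10, which is why an optimizer is free to reduce either one to ret i32 10.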

Related

If a program's main returns an i32, why is $? (as measured by the shell that called it) truncated to 8 bits?

Sorry for such a noob question, but why is the result not 516?
define i32 @main()
{
  %1 = add i32 6, 500
  %2 = add i32 5, 5
  %3 = add i32 %1, %2
  ret i32 %3
}
http://llvm.org/docs/LangRef.html#integer-type
i32 a 32-bit integer.
Usage:
./lli Program.ir; echo $?
4
Thanks in advance
The exit code of a process in Unix is only 8 bits. Any larger value gets truncated, regardless of whether LLVM is involved:
$ ( exit 516 ); echo $?
4
The exit code (I'm going to distinguish the exit value returned by your program from the exit code made available to the process that started your program) is actually, in UNIX-like operating systems, a conglomeration of several different items, one of which is the exit value. See, for example, this link, which contains (with my emphasis and [extra information]):
Don't confuse a program's exit status [value] with a process' termination status [code]. There are lots of ways a process can terminate besides having its program finish. In the event that the process termination is caused by program termination (i.e., exit), though, the program’s exit status [value] becomes part of the process' termination status [code].
The macro to get the actual exit status from the process (see here) states:
If WIFEXITED is true of status, this macro returns the low-order 8 bits of the exit status value from the child process.
That's also indicated by the actual source code of the Linux exit_group syscall, which is the one eventually called by exit:
SYSCALL_DEFINE1(exit_group, int, error_code)
{
    do_group_exit((error_code & 0xff) << 8);
    /* NOTREACHED */
    return 0;
}
You can see there that it only uses the lower eight bits of the exit value, and shifts them left so it can store those other items (control information) in the low byte, all zero in this case. Contrast that with the same call from the signal-handling code, which only sets the control information:
do_group_exit(ksig->info.si_signo)
In other words, it also has to put other things in the process exit code, such as which signal terminated it (if it was terminated by a signal), whether it dumped core, and so on. That's why the exit value is limited to a lesser range than you expect.
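The arithmetic can be sketched in a few lines. This is only an illustration of the encoding described above, not kernel or libc code; encode_exit and decode_exit are hypothetical names, with decode_exit doing what the exit-status macro does on Linux:

```cpp
// Illustration only: mimics how Linux packs an exit value into the status
// word seen by the parent. encode_exit/decode_exit are made-up names.

// What exit_group does with the value passed to exit(): keep the low 8 bits
// and shift them left, leaving the low byte free for signal/core-dump info.
constexpr int encode_exit(int exit_value) {
    return (exit_value & 0xff) << 8;
}

// What the parent recovers from the status word: the low-order 8 bits of
// the original exit value.
constexpr int decode_exit(int wait_status) {
    return (wait_status >> 8) & 0xff;
}
```

Since 516 = 2 * 256 + 4, decode_exit(encode_exit(516)) is 4, which matches the shell output above.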
The ISO standard (C11) also allows for this, in 7.22.4.4 The exit function /5 (since returning an integer value from main() is equivalent to calling exit() with that value):
Finally, control is returned to the host environment. If the value of status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If the value of status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.

Efficient variable watching in C/C++

I'm currently writing a multi-threaded, highly efficient and scalable algorithm. Because I have to guess a parameter for the code and I'm not sure how the calculation performs on a specific data set, I would like to watch a variable. The test only works with a huge, real-world data set. It is possible to analyze the collected data after profiling. Imagine the following simple code example (real code can contain multiple watch points):
// function gets called by loops of multiple threads
void payload(data_t* data, double threshold) {
    double value = calc(data);
    // here I want to watch the value
    if (value < threshold) {
        doSomething(data);
    } else {
        doSomethingElse(data);
    }
}
I thought about the following approaches:
1. Using cout or other system outputs
2. Use a binary output (file, network)
3. Set a breakpoint via gdb/lldb
4. Use variable watching + logging via gdb/lldb
I'm not happy with the results because: To use 1. and 2. I have to change the code, but this is a debugging/evaluating task. Furthermore 1. requires locking and 1.+2. requires I/O operations, which heavily slows down the entire code and makes testing with real data nearly impossible. 3. is also too slow. To use 4., I have to know the variable address because it's not a global variable, but because threads get created by a dynamic scheduler, this would require breaking + stepping for each thread.
So my conclusion is that I need a profiler/debugger that works at machine code level and dumps/logs/watches the variable without double->string conversion and is highly efficient. In other words: I would like to profile the internal state of my algorithm without heavy slow-down and without deep modification. Does anybody know a tool that is able to do this?
OK, this took some time but now I'm able to present a solution for my problem. It's called tracepoints. Instead of breaking the program every time, it's more lightweight and (ideally) doesn't change performance/timing too much. It does not require code changes. Here is an explanation how to use them using gdb:
Make sure you compiled your program with debugging symbols (using the -g flag). Now, start the gdb server and provide a network port (e.g. 10000) and the program arguments:
gdbserver :10000 ./program --parameters you --want --to use
Now, switch to a second console and start gdb (program parameters are not required here):
gdb ./program
All following commands are entered in the gdb command line interface. So let's connect to the server:
target remote :10000
After you got the connection confirmation, use trace or ftrace to set a tracepoint to a specific source location (try ftrace first, it should be faster but doesn't work on all platforms):
trace source.c:127
This should create tracepoint #1. Now you can set up an action for this tracepoint. Here I want to collect the data from myVariable:
action 1
collect myVariable
end
If you expect a lot of data or want to use the data later (after a restart), you can set a binary trace file:
tsave trace.bin
Now, start tracing and run the program:
tstart
continue
You can wait for program exit or interrupt your program using CTRL-C (still on gdb console, not on server side). Continue by telling gdb that you want to stop tracing:
tstop
Now we come to the tricky part, and I'm not really happy with the following code because it's really slow:
set pagination off
set logging file trace.txt
tfind start
while ($trace_frame != -1)
  set logging on
  printf "%f\n", myVariable
  set logging off
  tfind
end
This dumps all variable data to a text file. You can add some filtering or preparation here. Now you're done and you can exit gdb. This will also shut down the server:
quit
For detailed documentation especially for explanation of filtering and more advanced tracepoint positions, you can visit the following document: http://sourceware.org/gdb/onlinedocs/gdb/Tracepoints.html
To isolate trace file writing from your program execution, you can use cgroups or another network connected computer. When using another computer, you have to add the host to the port information (e.g. 192.168.1.37:10000). To load a binary trace file later, just start gdb as shown above (forget the server) and change the target:
gdb ./program
target tfile trace.bin
You can set a hardware watchpoint using the gdb debugger. For example, if you have a variable
bool b;
and you want to be notified every time its value has changed (by any thread),
you would declare a watchpoint like this:
(gdb) watch *(bool*)0x7fffffffe344
example:
root@comp:~# gdb prog
GNU gdb (GDB) 7.5-ubuntu
Copyright ...
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses...done.
(gdb) watch *(bool*)0x7fffffffe344
Hardware watchpoint 1: *(bool*)0x7fffffffe344
(gdb) start
Temporary breakpoint 2 at 0x40079f: file main.cpp, line 26.
Starting program: /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = true
New value = false
main () at main.cpp:50
50 if (strcmp(mask, "255.0.0.0") != 0) {
(gdb) c
Continuing.
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = false
New value = true
main () at main.cpp:41
41 if (ifa ->ifa_addr->sa_family == AF_INET) { // check it is IP4
(gdb) c
Continuing.
mask:255.255.255.0
eth0 IP Address 192.168.1.5
[Inferior 1 (process 18146) exited normally]
(gdb) q

How to use comments in LLVM IR in my pass?

Is it possible to use the comments in IR in my pass? Basically, I want to use IR annotated with basic block frequencies, which are written in comments as shown below, and I need the frequency value in my pass. I know this is a naive method, but it will suffice.
define internal void @MDFilter() #0 {
entry:
  ;;; Basic block executed 2 times. <-- I NEED THIS COMMENT AS A STRING IN MY PASS
  %mdContext = alloca %struct.MD5_CTX, align 8
  %bytes = alloca i32, align 4
  %data = alloca [16 x i8], align 16
  call void @MD5Init(%struct.MD5_CTX* %mdContext)
  br label %while.cond
  ;;; Out-edge counts: [2.000000e+00 -> while.cond]
Any other method to obtain this info is also welcome.
No, there is no way to use the comments' contents this way, not without significantly changing the IR parser. However, there's no need to re-invent the wheel; there's a mechanism in LLVM which is intended precisely for these sorts of things - transferring information from the front-end into an LLVM pass - and that is metadata.
So whatever or whoever is adding this information to the IR should add it with metadata instead - see these sources for more information on how to do that:
http://llvm.org/docs/LangRef.html#metadata
http://llvm.org/docs/LangRef.html#named-metadata
Adding Metadata to Instructions in LLVM IR
How to attach metadata to LLVM IR using the C++ API?
How to add a Metadata String to an LLVM module with the C++ API?
If you have no control over the generation of data, then you should add some pre-processing step in which you convert the comments to metadata.
In the end the IR should look something like:
define internal void @MDFilter() #0 {
entry:
  %mdContext = alloca %struct.MD5_CTX, align 8, !freq !1
  %bytes = alloca i32, align 4
  %data = alloca [16 x i8], align 16
  call void @MD5Init(%struct.MD5_CTX* %mdContext)
  br label %while.cond, !outedge !2
...
!1 = metadata !{i32 2}
!2 = metadata !{float 2.0}
And your pass needs to look for these !freq and !outedge nodes.
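If you do need the pre-processing step, extracting the count from a comment line is straightforward. Here is a minimal sketch, assuming the comments always have the exact form shown in the question (";;; Basic block executed N times."). The function name parse_block_count is hypothetical and is not part of LLVM:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not an LLVM API): returns N for a line containing
// ";;; Basic block executed N times." and std::nullopt for anything else.
// std::stoul may throw std::out_of_range if N does not fit in unsigned long.
std::optional<unsigned long> parse_block_count(const std::string &line) {
    static const std::string prefix = ";;; Basic block executed ";
    static const std::string suffix = " times.";

    const std::size_t start = line.find(prefix);
    if (start == std::string::npos) return std::nullopt;

    // Locate the run of digits that follows the prefix.
    const std::size_t digits_begin = start + prefix.size();
    const std::size_t digits_end =
        line.find_first_not_of("0123456789", digits_begin);
    if (digits_end == std::string::npos || digits_end == digits_begin)
        return std::nullopt;

    // The digits must be followed by the expected suffix.
    if (line.compare(digits_end, suffix.size(), suffix) != 0)
        return std::nullopt;

    return std::stoul(line.substr(digits_begin, digits_end - digits_begin));
}
```

A pre-processing tool could call this on each line of the .ll file and, for every match, attach the value as metadata (for example a !freq node) to the first instruction of the block that follows.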

llvm get global definition line number

I followed the How to get variable definition line number etc. using dbg metadata? in order to get the line number definition for local variables (allocas), which works fine. But I need the same for globals. So I tried to hack the findDbgGlobalDeclare() method from http://llvm.org/docs/doxygen/html/DbgInfoPrinter_8cpp_source.html#l00062 . However, I have no llvm.dbg.gv in my bytecode, so there is no dbg info to extract. I compile my target code using clang++ -O0 -g -emit-llvm Test.cpp -c -o Test.bc . Some samples from my bytecode:
@r = global i32 3, align 4
%4 = load i32* @r, align 4, !dbg !942
...
%a = alloca i32, align 4
%1 = load i32* %a, align 4, !dbg !939
However, I do have:
!924 = metadata !{i32 786484, i32 0, null, metadata !"r", metadata !"r", metadata !"", metadata !841, i32 19, metadata !56, i32 0, i32 1, i32* @r} ; [ DW_TAG_variable ] [r] [line 19] [def]
on which !0 indirectly depends, and there is !llvm.dbg.cu = !{!0} .
Thank you !
Yes, !llvm.dbg.cu is the right place now. Quoting from the source-level debugging document:
Compile unit descriptors provide the root context for objects declared
in a specific compilation unit. File descriptors are defined using
this context. These descriptors are collected by a named metadata
!llvm.dbg.cu. They keep track of subprograms, global variables and
type information.
Specifically, see "Global variable descriptors".
The code you found is to support the older metadata nodes which are still generated by dragonegg so the readers support them for backwards compatibility. New LLVM code generates !llvm.dbg.cu.
The steps are as follows:
1. NamedMDNode *NMD = M->getNamedMetadata("llvm.dbg.cu");
Then walk down the chain of metadata nodes until you reach the desired global declaration:
2. DIDescriptor DIG(cast<MDNode>(NMD->getOperand(i)));
3. DIDescriptor DIGG(cast<MDNode>(NMD->getOperand(NMD->getNumOperands()-1)));
4. DIDescriptor DIGF(cast<MDNode>(DIGG->getOperand(0)));
5. Value* VV = cast<Value>(DIGF->getOperand(i));
6. DIDescriptor DIGS(cast<MDNode>(VV));
At this point, do:
7. DIGS->getOperand(j)
and check http://llvm.org/docs/SourceLevelDebugging.html#c-c-front-end-specific-debug-information for all the fields you desire.

Debugging/bypassing BSOD without source code

Hello and good day to you.
I need a bit of assistance here:
Situation:
I have an obscure DirectX 9 application (name and application details are irrelevant to the question) that causes blue screen of death on all nvidia cards (GeForce 8400GS and up) since certain driver version. I believe that the problem is indirectly caused by DirectX 9 call or a flag that triggers driver bug.
Goal:
I'd like to track down offending flag/function call (for fun, this isn't my job/homework) and bypass error condition by writing proxy dll. I already have a finished proxy dll that provides wrappers for IDirect3D9, IDirect3DDevice9, IDirect3DVertexBuffer9 and IDirect3DIndexBuffer9 and provides basic logging/tracing of Direct3D calls. However, I can't pinpoint function which causes crash.
Problems:
No source code or technical support is available. There will be no assistance, and nobody else will fix the problem.
Memory dump produced by kernel wasn't helpful - apparently an access violation happens within nv4_disp.dll, but I can't use stacktrace to go to IDirect3DDevice9 method call, plus there's a chance that bug happens asynchronously.
(Main problem) Because of the large number of IDirect3DDevice9 method calls, I can't reliably log them into a file or over the network:
Logging into file causes significant slowdown even without flushing, and because of that all last contents of the log are lost when system BSODs.
Logging over network (using UDP and Winsock's sendto) also causes significant slowdown and must not be done asynchronously (asynchronous packets are lost on BSOD), plus packets (the ones around the crash) are sometimes lost even when sent synchronously.
When application is "slowed" down by logging routines, BSOD is less likely to happen, which makes tracking it down harder.
Question:
I normally don't write drivers and don't do this level of debugging, so I have the impression that I'm missing something important and that there's a more trivial way to track down the problem than writing an IDirect3DDevice9 proxy dll with a custom logging mechanism. What is it? What is the standard way of diagnosing/handling/fixing a problem like this (no source code, COM interface method triggers BSOD)?
Minidump analysis(WinDBG):
Loading User Symbols
Loading unloaded module list
...........
Unable to load image nv4_disp.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nv4_disp.dll
*** ERROR: Module load completed but symbols could not be loaded for nv4_disp.dll
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 1000008E, {c0000005, bd0a2fd0, b0562b40, 0}
Probably caused by : nv4_disp.dll ( nv4_disp+90fd0 )
Followup: MachineOwner
---------
0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bd0a2fd0, The address that the exception occurred at
Arg3: b0562b40, Trap Frame
Arg4: 00000000
Debugging Details:
------------------
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
FAULTING_IP:
nv4_disp+90fd0
bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi
TRAP_FRAME: b0562b40 -- (.trap 0xffffffffb0562b40)
ErrCode = 00000000
eax=00000808 ebx=e37f8200 ecx=e4ae1c68 edx=e37f8328 esi=e37f8400 edi=00000000
eip=bd0a2fd0 esp=b0562bb4 ebp=e37e09c0 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
nv4_disp+0x90fd0:
bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi ds:0023:00000900=????????
Resetting default scope
CUSTOMER_CRASH_COUNT: 3
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0x8E
LAST_CONTROL_TRANSFER: from bd0a2e33 to bd0a2fd0
STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
b0562bc4 bd0a2e33 e37f8200 e37f8200 e4ae1c68 nv4_disp+0x90fd0
b0562c3c bf8edd6b b0562cfc e2601714 e4ae1c58 nv4_disp+0x90e33
b0562c74 bd009530 b0562cfc bf8ede06 e2601714 win32k!WatchdogDdDestroySurface+0x38
b0562d30 bd00b3a4 e2601008 e4ae1c58 b0562d50 dxg!vDdDisableSurfaceObject+0x294
b0562d54 8054161c e2601008 00000001 0012c518 dxg!DxDdDestroySurface+0x42
b0562d54 7c90e4f4 e2601008 00000001 0012c518 nt!KiFastCallEntry+0xfc
0012c518 00000000 00000000 00000000 00000000 0x7c90e4f4
STACK_COMMAND: kb
FOLLOWUP_IP:
nv4_disp+90fd0
bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: nv4_disp+90fd0
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nv4_disp
IMAGE_NAME: nv4_disp.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 4e390d56
FAILURE_BUCKET_ID: 0x8E_nv4_disp+90fd0
BUCKET_ID: 0x8E_nv4_disp+90fd0
Followup: MachineOwner
nv4_disp+90fd0
bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi
This is the important part. Looking at this, it is most probable that eax is invalid, hence attempting to access an invalid memory address.
What you need to do is load nv4_disp.dll into IDA (you can get a free version), check the image base that IDA loads nv4_disp at and hit 'g' to goto address, try adding 90fd0 to the image base IDA is using, and it should take you directly to the offending instruction (depending on section structure).
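The address arithmetic is worth spelling out. In this dump the faulting instruction is at bd0a2fd0 and WinDBG reports it as nv4_disp+90fd0, so the driver was loaded at bd012000; the same offset added to whatever base IDA picks gives the address to jump to. A small sketch follows. The function names are hypothetical, and the IDA base in the note after the code is only an example value, not something taken from the dump:

```cpp
#include <cstdint>

// Offset of an address within a module, given where the module was loaded.
constexpr std::uint32_t module_offset(std::uint32_t address,
                                      std::uint32_t load_base) {
    return address - load_base;
}

// The same location inside a copy of the module mapped at a different base
// (for example, the image base a disassembler chose).
constexpr std::uint32_t rebase(std::uint32_t address, std::uint32_t load_base,
                               std::uint32_t new_base) {
    return new_base + module_offset(address, load_base);
}

// Effective address read by "cmp dword ptr [eax+0F8h],edi" for a given eax.
constexpr std::uint32_t cmp_operand_address(std::uint32_t eax) {
    return eax + 0xF8u;
}
```

Here module_offset(0xbd0a2fd0, 0xbd012000) is 0x90fd0. If IDA happened to load the image at 0x10000 (an assumed value), rebase(0xbd0a2fd0, 0xbd012000, 0x10000) would give 0xa0fd0. With eax=00000808 from the trap frame, cmp_operand_address(0x808) is 0x900, which matches the ds:0023:00000900 that the dump shows as unreadable.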
From here you can analyze the control flow, and how eax is set and used. If you have a good kernel level debugger you can set a breakpoint on this address and try and get it to hit.
Analysing the function, you should attempt to figure out what the function does, what eax is meant to be pointing to at that point, what it's actually pointing to, and why. This is the hard part and is a great part of the difficulty and skill of reverse engineering.
Found a solution.
Problem:
Logging is unreliable since messages (when dumped to file) disappear during bsod, packets are sometimes lost when logging over network, and there's slowdown due to logging.
Solution:
Instead of logging to file or over network, configure system to produce full physical memory dump on BSOD and log all messages into any memory buffer. It'll be faster. Once system crashed, it'll dump entire memory into file, and it'll be possible to either view contents of log-file buffer using WinDBG's dt (if you have debug symbols) command, or you'll be able to search and locate logfile stored in memory using "memory" view.
I used circular buffer of std::strings to store messages and separate array of const char* to make things easier to read in WinDBG, but you could simply create huge array of char and store all messages within it in plaintext.
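A rough sketch of such a buffer, assuming single-threaded logging and a fixed capacity. The class name CrashLog, the capacity and the member names are all made up for illustration; this is not the actual code used:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical in-memory log: keeps the last kCapacity messages so they can
// be located in a complete memory dump. Not thread-safe.
class CrashLog {
public:
    static constexpr std::size_t kCapacity = 1024;

    void log(const std::string &message) {
        const std::size_t slot = next_ % kCapacity;
        messages_[slot] = message;               // overwrites the oldest entry
        views_[slot] = messages_[slot].c_str();  // plain pointer, easy to read in WinDBG
        ++next_;
    }

    // Number of messages logged so far (including overwritten ones).
    std::size_t total() const { return next_; }

    // Most recent message, or nullptr if nothing has been logged yet.
    const char *latest() const {
        return next_ == 0 ? nullptr : views_[(next_ - 1) % kCapacity];
    }

    // Raw view of one slot in storage order; nullptr if never written.
    const char *slot(std::size_t index) const {
        return views_[index % kCapacity];
    }

private:
    std::array<std::string, kCapacity> messages_{};
    std::array<const char *, kCapacity> views_{};  // value-initialized to nullptr
    std::size_t next_ = 0;
};
```

A single global instance (for example, CrashLog g_log; in the proxy dll) is what you would then inspect with dt module!g_log after the crash. A real proxy called from several threads would need locking or per-thread buffers.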
Details:
Entire process on winxp:
Ensure that minimum page file size is equal to or larger than total amount of RAM + 1 megabyte. (Right click "My Computer"->Properties->Advanced->Performance->Advanced->Change)
Configure system to produce complete memory dump on BSOD (Right click "My Computer"->Properties->Advanced->Startup and Recovery->Settings->Write Debugging Information. Select "Complete memory dump" and specify the path you want).
Ensure that disk (where the file will be written) has required amount of free space (total amount of RAM on your system).
Build app/dll (the one that does logging) with debug symbols, and trigger BSOD.
Wait till memory dump is finished, reboot. Feel free to swear at driver developer while system writes memory dump and reboots.
Copy MEMORY.DMP system produced to a safe place, so you won't lose everything if system crashes again.
Launch windbg.
Open Memory Dump (File->Open Crash Dump).
If you want to see what happened, use !analyze -v command.
Access memory buffer that stores logged messages using one of those methods:
To see contents of global variable, use dt module!variable where "module" is name of your library (without *.dll), and "variable" is name of variable. You can use wildcards. You can use address without module!variable
To see contents of one field of the global variable (if global variable is a struct), use dt module!variable field where "field" is variable member.
To see more details about variable (content of arrays and substructures) use dt -b module!variable field or dt -b module!variable
If you don't have symbols, you'll need to search for your "logfile" using memory window.
At this point you'll be able to see contents of log that were stored in memory, plus you'll have snapshot of the entire system at the moment when it crashed.
Also...
To see info about process that crashed the system, use !process.
To see loaded modules use lm
For info about thread there's !thread id where id is hexadecimal id you saw in !process output.
It looks like the crash may either be caused by a bad pointer, or heap corruption. You can tell this because the crash occurs in a memory-freeing function (DxDdDestroySurface). Destroying surfaces is something that you absolutely need to do - you can't just stub this out, the surface will still get freed when the program exits, and if you disable it inside the kernel, you'll run out of on-card memory very quickly and crash that way, as well.
You can try to figure out what sequence of events leads up to this heap corruption, but there's no silver bullet here - as fileoffset suggested, you'll need to actually reverse engineer the driver to see why this happens (it may help to compare drivers before and after the offending driver version as well!)