Currently we call MiniDumpWriteDump with the MiniDumpNormal | MiniDumpWithIndirectlyReferencedMemory flags. That works just fine for internal builds in Debug configuration, but isn't giving as much information as we need in Release configuration.
In Release, the minidump data contains enough stack information for the debugger to work out where in the code the failure occurred, but no other data. I don't simply mean local variables are missing due to being optimised out, as you'd expect in a Release build - I mean, there is nothing useful except for the call stack and current code line. No registers, no locals, no globals, no objects pointed to by locals - nothing. We don't even get 'this' which would allow us to view the current object. That was the point of using MiniDumpWithIndirectlyReferencedMemory - it should have included memory referenced by locals and stack variables, but doesn't seem to.
What flags should we be using instead? We don't want to use MiniDumpWithFullMemory and start generating 600MB+ dumps, but would happily expand the dumps somewhat beyond the 90KB we currently get if it meant getting more useful data. Perhaps we should be using MiniDumpWithDataSegs (globals) or...?
WinDbg uses the following flags for a .dump /ma:
0:003> .dumpdebug
----- User Mini Dump Analysis
MINIDUMP_HEADER:
Version A793 (62F0)
NumberOfStreams 13
Flags 41826
0002 MiniDumpWithFullMemory
0004 MiniDumpWithHandleData
0020 MiniDumpWithUnloadedModules
0800 MiniDumpWithFullMemoryInfo
1000 MiniDumpWithThreadInfo
40000 MiniDumpWithTokenInformation
I suggest you replace MiniDumpWithFullMemory with MiniDumpWithIndirectlyReferencedMemory.
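For reference, a minimal sketch of what the call could look like with that substitution (hDumpFile and pExceptionParam are placeholders for whatever your crash handler already has; this is an illustration, not your existing code):

#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Sketch only: assumes an open file handle and the MINIDUMP_EXCEPTION_INFORMATION
// built from the exception pointers in your unhandled-exception filter.
BOOL WriteDump(HANDLE hDumpFile, MINIDUMP_EXCEPTION_INFORMATION* pExceptionParam)
{
    // WinDbg's .dump /ma flag set, with MiniDumpWithFullMemory swapped for
    // MiniDumpWithIndirectlyReferencedMemory to keep the dumps small, plus
    // MiniDumpWithDataSegs for globals.
    const MINIDUMP_TYPE dumpType = (MINIDUMP_TYPE)(
        MiniDumpWithIndirectlyReferencedMemory |
        MiniDumpWithDataSegs |
        MiniDumpWithHandleData |
        MiniDumpWithUnloadedModules |
        MiniDumpWithFullMemoryInfo |
        MiniDumpWithThreadInfo |
        MiniDumpWithTokenInformation);

    return MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                             hDumpFile, dumpType, pExceptionParam, NULL, NULL);
}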
BACKGROUND
We have testers for our embedded GUI product and when a tester declares "test failed", sometimes it's hard for us developers to reproduce the exact problem because we don't have the exact trace of what happened.
We do currently have a logging framework, but we developers have to manually insert those logging statements in the code, which is fine... except when a hard-to-reproduce bug occurs and we didn't have a log statement at the 'right' location; then, when we re-build and re-run the test with the same steps, we get a different result.
ISSUE
We would like a solution wherein the compiler produces extra instrumentation code that allows us to see the exact sequence of events including, at the minimum:
function enter/exit (already provided by -finstrument-functions)
control-flow statement entry, i.e. entering an if/else branch, or which case of a switch we jumped to
The log would look like this:
int main() entered
if-line 5 entered
else-line 10 entered
void EventLoop() entered
. . .
Some additional nice-to-haves are
Parameter values on function entry AND exit (for pass-by-reference types)
Function return value
QUESTION
Are there any gcc tools or even paid tools that can do this instrumentation automatically?
You can either use gdb for that, and you can automate it (I've got a tool for that in the works, you can find it here), or you can try to use gcov.
The gcov implementation works such that it loads the latest gcov data when you start the program. But you can manually dump and load the data: if you call __gcov_flush it will dump the current data and reset the current state. However, if you do that several times it will always merge the data, so you'd also need to rename the gcov data file each time.
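For the function enter/exit part that -finstrument-functions already covers (as mentioned in the question), the two hooks you provide look roughly like this; the printf logging is only an illustration, and mapping the raw pointers back to function names would need something like addr2line or dladdr afterwards. Note this covers only function entry/exit, not the if/else/case entry the question also asks for.

#include <cstdio>

// These two hooks are what -finstrument-functions calls on every function
// entry and exit. The attribute keeps the hooks themselves from being instrumented.
extern "C" {

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site)
{
    std::printf("enter %p (called from %p)\n", this_fn, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site)
{
    std::printf("exit  %p (called from %p)\n", this_fn, call_site);
}

} // extern "C"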
The go test command has support for the -c flag, described as follows:
-c Compile the test binary to pkg.test but do not run it.
(Where pkg is the last element of the package's import path.)
As far as I understand, generating a binary like this is the way to run it interactively using GDB. However, since the test binary is created by combining the source and test files temporarily in some /tmp/ directory, this is what happens when I run list in gdb:
Loading Go Runtime support.
(gdb) list
42 github.com/<username>/<project>/_test/_testmain.go: No such file or directory.
This means I cannot happily inspect the Go source code in GDB like I'm used to. I know it is possible to force the temporary directory to stay by passing the -work flag to the go test command, but then it is still a huge hassle since the binary is not created in that directory and such. I was wondering if anyone found a clean solution to this problem.
Go 1.5 has been released, and there is still no officially sanctioned Go debugger. I haven't had much success using GDB for effectively debugging Go programs or test binaries. However, I have had success using Delve, a non-official debugger that is still undergoing development: https://github.com/derekparker/delve
To run your test code in the debugger, simply install delve:
go get -u github.com/derekparker/delve/cmd/dlv
... and then start the tests in the debugger from within your workspace:
dlv test
From the debugger prompt, you can single-step, set breakpoints, etc.
Give it a whirl!
Unfortunately, this appears to be a known issue that's not going to be fixed. See this discussion:
https://groups.google.com/forum/#!topic/golang-nuts/nIA09gp3eNU
I've seen two solutions to this problem.
1) create a .gdbinit file with a set substitute-path command to
redirect gdb to the actual location of the source. This file could be
generated by the go tool but you'd risk overwriting someone's custom
.gdbinit file and would tie the go tool to gdb which seems like a bad
idea.
2) Replace the source file paths in the executable (which are pointing
to /tmp/...) with the location they reside on disk. This is
straightforward if the real path is shorter than the /tmp/... path.
This would likely require additional support from the compiler /
linker to make this solution more generic.
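For the first workaround, the .gdbinit entry would look roughly like this (both paths are hypothetical; the "from" prefix must match whatever path the binary actually recorded, which you can check with info source at a breakpoint):

# .gdbinit (example only; adjust both paths to your setup)
set substitute-path github.com/<username>/<project>/_test /home/you/go/src/github.com/<username>/<project>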
That discussion spawned this issue on the Go Google Code issue tracker, where the decision ended up being:
https://code.google.com/p/go/issues/detail?id=2881
This is annoying, but it is the least of many annoying possibilities.
As a rule, the go tool should not be scribbling in the source
directories, which might not even be writable, and it shouldn't be
leaving files elsewhere after it exits. There is next to nothing
interesting in _testmain.go. People testing with gdb can break on
testing.Main instead.
Russ
Status: Unfortunate
So, in short, it sucks, and while you can work around it and GDB a test executable, the development team is unlikely to make it as easy as it could be for you.
I'm still new to the golang game but for what it's worth basic debugging seems to work.
The list command you're trying to use works so long as you're already stopped at a breakpoint somewhere in your code. For example:
(gdb) b aws.go:54
Breakpoint 1 at 0x61841: file /Users/mat/gocode/src/github.com/stellar/deliverator/aws/aws.go, line 54.
(gdb) r
Starting program: /Users/mat/gocode/src/github.com/stellar/deliverator/aws/aws.test
[snip: some warnings about BinaryCache]
Breakpoint 1, github.com/stellar/deliverator/aws.imageIsNewer (latest=0xc2081fe2d0, ami=0xc2081fe3c0, ~r2=false)
at /Users/mat/gocode/src/github.com/stellar/deliverator/aws/aws.go:54
54 layout := "2006-01-02T15:04:05.000Z"
(gdb) list
49 func imageIsNewer(latest *ec2.Image, ami *ec2.Image) bool {
50 if latest == nil {
51 return true
52 }
53
54 layout := "2006-01-02T15:04:05.000Z"
55
56 amiCreationTime, amiErr := time.Parse(layout, *ami.CreationDate)
57 if amiErr != nil {
58 panic(amiErr)
This is just after running the following in the aws subdir of my project:
go test -c
gdb aws.test
As an additional caveat, it does seem very selective about where breakpoints can be placed. It seems like the breakpoint has to be on a line containing an expression, but that conclusion is only from experimentation.
If you're willing to use tools besides GDB, check out godebug. To use it, first install with:
go get github.com/mailgun/godebug
Next, insert a breakpoint somewhere by adding the following statement to your code:
_ = "breakpoint"
Now run your tests with the godebug test command.
godebug test
It supports many of the parameters from the go test command.
-test.bench string
regular expression per path component to select benchmarks to run
-test.benchmem
print memory allocations for benchmarks
-test.benchtime duration
approximate run time for each benchmark (default 1s)
-test.blockprofile string
write a goroutine blocking profile to the named file after execution
-test.blockprofilerate int
if >= 0, calls runtime.SetBlockProfileRate() (default 1)
-test.count n
run tests and benchmarks n times (default 1)
-test.coverprofile string
write a coverage profile to the named file after execution
-test.cpu string
comma-separated list of number of CPUs to use for each test
-test.cpuprofile string
write a cpu profile to the named file during execution
-test.memprofile string
write a memory profile to the named file after execution
-test.memprofilerate int
if >=0, sets runtime.MemProfileRate
-test.outputdir string
directory in which to write profiles
-test.parallel int
maximum test parallelism (default 4)
-test.run string
regular expression to select tests and examples to run
-test.short
run smaller test suite to save time
-test.timeout duration
if positive, sets an aggregate time limit for all tests
-test.trace string
write an execution trace to the named file after execution
-test.v
verbose: print additional output
Using Qt 5.1.1 for Windows 32-bit (with MinGW 4.8), GDB wants to drop into disassembly while debugging code after the first debugging session.
I make a "Plain C++" project, insert some simple code:
int x = 5;
cout << x << endl;
return 0;
Build, and debug it with a breakpoint on the first line. The first time through it debugs just fine, stepping through the code with "Step Over". In any debug session after that, it will drop into a disassembly view of ntdll when it hits cout (or anything else library-related).
Operate By Instruction is not checked and there is debug information for my code. It works as expected once, then refuses to.
I can delete the build folder and the .pro.user file and the project still exhibits the same behavior after a new build. Even tried wiping my QTProject settings folder. There seems to be no way to debug just my code more than once without it wanting to drop into assembly instead of stepping over statements. If I make a new project, I can debug it normally once, then it starts behaving the same way.
Looking for a fix or suggestions of things to try.
Had a chance to go back and diff the debugger log from the good initial run against subsequent runs. Everything looks similar until I get to this in the good run:
=thread-exited,id="2",group-id="i1"
sThread 2 in group i1 exited
~"[Switching to Thread 5588.0x239c]\n"
=thread-selected,id="1"
sThread 1 selected
Bad runs never have that. Later, this is unique to the bad run:
>1272^done,threads=[{id="2",target-id="Thread 7148.0x242c",frame=
{level="0",addr="0x7792fd91",func="ntdll!RtlFindSetBits",args=
[],from="C:\\Windows\\system32\\ntdll.dll"},state="stopped"},
//LINES BELOW COMMON TO GOOD+BAD
{id="1",target-id="Thread 7148.0x1bbc",frame=
{level="0",addr="0x00401606",func="main",args=
[],file="..\\untitled8\\main.cpp",fullname=
"C:\\Users\\Andrew\\Desktop\\untitled8\\main.cpp",line="7"},
state="stopped"}],current-thread-id="1"*
Then once it hits the breakpoint, good run shows this:
*stopped,reason="end-stepping-range",frame={addr="0x00401620",func="fu0__ZSt4cout",args[],
file="..\untitled8\main.cpp",
fullname="C:\Users\Andrew\Desktop\untitled8\main.cpp",line="9"},
thread-id="1",stopped-threads="all"
Bad run shows this:
>*stopped,reason="signal-received",signal-name="SIGTRAP",signal-meaning="Trace/breakpoint trap",
frame={addr="0x7792000d",func="ntdll!LdrFindResource_U",args=[],
from="C:\\Windows\\system32\\ntdll.dll"},thread-id="2",stopped-threads="all"
dNOTE: INFERIOR SPONTANEOUS STOP sStopped.
dState changed from InferiorRunOk(11) to InferiorStopOk(14) [master]
dSIGTRAP CONSIDERED HARMLESS. CONTINUING.
sStopped: "signal-received"
>=thread-selected,id="2"
sThread 2 selected
<1283-thread-info
>1283^done,threads=[{id="2",target-id="Thread 7148.0x242c",frame=
{level="0",addr="0x7792000d",func="ntdll!LdrFindResource_U",args=[],
from="C:\\Windows\\system32\\ntdll.dll"},state="stopped"},
{id="1",target-id="Thread 7148.0x1bbc",
frame={level="0",addr="0x756a133d",func="KERNEL32!GetPrivateProfileStructA",
args=[],from="C:\\Windows\\syswow64\\kernel32.dll"},state="stopped"}],current-thread-id="2"
<1284-stack-list-frames 0 20
>1284^done,stack=[frame={level="0",addr="0x7792000d",func="ntdll!LdrFindResource_U",
from="C:\\Windows\\system32\\ntdll.dll"},
frame={level="1",addr="0x779af926",
func="ntdll!RtlQueryTimeZoneInformation",
from="C:\\Windows\\system32\\ntdll.dll"},frame={level="2",addr="0x75f45dd1",func="??"},
frame={level="3",addr="0x00000000",func="??"}]
<1285-stack-select-frame 0
<1286disassemble 0x7791fff9,0x77920071
<1287bb options:fancy,autoderef,dyntype vars: expanded:return,local,watch,inspect typeformats: formats: watchers:
>1285^done
>&"disassemble 0x7791fff9,0x77920071\n"
>~"Dump of assembler code from 0x7791fff9 to 0x77920071:\n"
>~" 0x7791fff9 <ntdll!LdrFindResource_U+60953>:\t"
>&"Cannot access memory at address 0x7791fff9\n"
>1286^error,msg="Cannot access memory at address 0x7791fff9"
sDisassembler failed: Cannot access memory at address 0x7791fff9
Looks like for some reason that extra thread is not exiting when expected and qtcreator/gdb convince themselves there are breakpoints in ntdll that I want to stop at.
We're using Fogbugz for tracking issues and I am in the middle of writing a C++ wrapper around the XML API for Fogbugz.
The best practice seems to be to use the "scout" field so that similar/same crashes are just counted but not reported again. To do that we need a unique string for a particular cause of a crash.
In Win32 - after getting a .dmp file or other crash handler output - what is a good way to make a unique string for a crash? (we're going to create a .dmp file and send it to the FogBugz server)
In previous postings/articles/etc. Joel has made various suggestions, but many of those relied on a language like C# that uses reflection and has access to a lot of information that is either harder or impossible to get from C++.
Have any other people gotten things like stack traces or other things to make scout entries in fogbugz?
EDIT
To clarify - we don't want a unique id for every incident - there are likely crashes that have the same code path. We want to capture that. I was thinking that we would get the last few stack calls that are in our code (not ones from Win32 DLLs) - but not sure how to go about doing this.
Reporting every crash as unique is not right. Reporting all crashes under the same case is not right. Different users repeating a scenario that causes a crash should map to the same incident.
EDIT
What I think we want is a general "signature" of a crash - based on what is on the stack. Similar stacks should have the same signature. For example - take the top 5 methods that are in our app and then the first call (if any) we make into an MS DLL. This would probably be sufficient for a signature and would likely correlate the crashes that are "the same".
So how does one get the list of methods on the stack? And how can you tell if they are from your own app or in another DLL?
EDIT - NOTE
We want to create a "bucket id"/signature while in the exception handler so that we can create the minidump and send it to fogbugz as a scout description. Alternatively we can load up the dump on t he next start of the app and send it then with a signature we generate.
In my project I use the memory address of the crash as a "unique" ID.
IMO the best thing you can use is the bucket ID from dump analysis. With a properly configured Debugging Tools for Windows (windbg), you can run !analyze -v and classify your dumps into different buckets based on bucket ID. The bucket ID guarantees that if two dumps are the same, their bucket IDs will be the same. That solves part of the puzzle.
Many times two dumps rooted in the same problem will produce different bucket IDs (maybe due to a version difference - say your 1.0 and 1.1 both crash at the same point). You can use the faulting module and a stack signature to correlate bugs from the same point of fault.
There will be certain things that cause very random dumps (e.g. heap corruption, where the faulting module is typically the victim). Therefore dump analysis should be considered best-effort. When you can't, you can't.
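As a rough, untested sketch of the "top few frames that are in our own code" idea from the question (MakeSignature is a made-up name; in real code SymInitialize is better called once at startup, and running this inside a crashing process carries its own risks):

#include <windows.h>
#include <dbghelp.h>
#include <sstream>
#include <string>
#pragma comment(lib, "dbghelp.lib")

// Sketch: build a crash "signature" from the first few frames that live in
// our own module (pass e.g. GetModuleHandle(NULL) for the main EXE).
std::string MakeSignature(HMODULE ourModule, int maxOwnFrames = 5)
{
    void* frames[62] = {};
    USHORT count = CaptureStackBackTrace(0, 62, frames, NULL);
    SymInitialize(GetCurrentProcess(), NULL, TRUE);

    std::ostringstream sig;
    int taken = 0;
    for (USHORT i = 0; i < count && taken < maxOwnFrames; ++i)
    {
        // Which module does this return address belong to?
        HMODULE mod = NULL;
        GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                          GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                          (LPCTSTR)frames[i], &mod);
        if (mod != ourModule)
            continue;                       // skip frames in system/other DLLs

        // Resolve the address to a symbol name (needs the PDB to be available).
        char buf[sizeof(SYMBOL_INFO) + 256] = {};
        SYMBOL_INFO* sym = (SYMBOL_INFO*)buf;
        sym->SizeOfStruct = sizeof(SYMBOL_INFO);
        sym->MaxNameLen = 255;
        DWORD64 displacement = 0;
        if (SymFromAddr(GetCurrentProcess(), (DWORD64)(ULONG_PTR)frames[i], &displacement, sym))
            sig << sym->Name << ";";
        else
            sig << std::hex << (ULONG_PTR)frames[i] << ";";
        ++taken;
    }
    return sig.str();
}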
I used something like this to generate exceptions in my last app (MSVC), so every error would get logged with the source file and line it occurred on:
#include <string>

class Error {
    //...
public:
    Error(std::string file, int line, std::string error);
};

#define ERROR(err) Error(__FILE__, __LINE__, err)
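A hypothetical call site then captures the file and line automatically (LoadConfig is just a made-up example):

// somewhere in application code:
if (!LoadConfig(path))
    throw ERROR("failed to load config");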
It's probably a little bit late, but I will add my solution here, too, in case it can help other people.
You can do this using tools from "Debugging Tools for Windows", for example windbg.exe or, better, kd.exe.
Running the command kd.exe -z "path_to_dump.dmp" -c "kb;q" >> dumpstack.txt, you might get the following result:
Microsoft (R) Windows Debugger Version 10.0.15063.400 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [d:\work\bugs\14122\myexe.exe.2624.dmp]
User Mini Dump File with Full Memory: Only application data is available
************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 10 Version 15063 MP (4 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
15063.0.x86fre.rs2_release.170317-1834
Machine Name:
Debug session time: Fri Oct 13 00:09:01.000 2017 (UTC + 1:00)
System Uptime: 0 days 0:18:33.797
Process Uptime: 0 days 0:03:40.000
................................................................
.....................................................
Loading unloaded module list
..............................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a40.2580): Security check failure or stack buffer overrun - code c0000409 (first/second chance not available)
eax=00000001 ebx=00000000 ecx=00000007 edx=77cc4350 esi=00000000 edi=00000000
eip=62ae7666 esp=0b75e17c ebp=0b75e1a8 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
msvcr120!abort+0x28:
62ae7666 cd29 int 29h
0:068> kd: Reading initial command 'kb;q'
ChildEBP RetAddr Args to Child
0b75e178 62addc5f 935dda1f 00000000 00000000 msvcr120!abort+0x28
0b75e1a8 0b75e7d4 62a9b436 0b75e1dc 62a52aa5 msvcr120!terminate+0x33
WARNING: Frame IP not in any known module. Following frames may be wrong.
0b75e1ac 62a9b436 0b75e1dc 62a52aa5 00000000 0xb75e7d4
0b75e1b4 62a52aa5 00000000 62a59740 0b75e7d4 msvcr120!__FrameUnwindToState+0x89
0b75e1c8 62a52b33 00000000 00000000 00000000 msvcr120!_EH4_CallFilterFunc+0x12
0b75e1f4 62a5a0f3 62b1f7b8 62a4f7c6 0b75e324 msvcr120!_except_handler4_common+0x8e
0b75e214 77cd6152 0b75e324 0b75e7c4 0b75e344 msvcr120!_except_handler4+0x1e
0b75e238 77cd6124 0b75e324 0b75e7c4 0b75e344 ntdll!ExecuteHandler2+0x26
0b75e30c 77cc4266 0b75e324 0b75e344 0b75e324 ntdll!ExecuteHandler+0x24
0b75e30c 74cf28f2 0b75e324 0b75e344 0b75e324 ntdll!KiUserExceptionDispatcher+0x26
0b75e684 62a59339 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x62
0b75e6c4 6001821c 0b75e6e4 6004e1bc 946a8f2a msvcr120!_CxxThrowException+0x5b
0b75e6f8 60018042 0b75e720 946a8efa ffffffff mymodule!FunctionC+0x7c
0b75e730 60016544 946a8ece ffffffff 092889d8 mymodule!FunctionB+0x32
0b75e754 600166b8 00842338 6000588d 00000001 myothermodule!FunctionB+0x44
From this stack, you can create a unique bucket if you take for example only your methods from the stack and concatenate them in a string: "mymodule!FunctionC+0x7c;mymodule!FunctionB+0x32;myothermodule!FunctionB+0x44". In order for this to work, you need to have access to your personal symbol server, either using the environment variable _NT_SYMBOL_PATH or with the -y command line switch.
You can alternatively create a string from the return addresses only (second column): "62addc5f,0b75e7d4,62a9b436,62a52aa5,62a52b33,62a5a0f3,77cd6152,77cd6124,77cc4266,74cf28f2,62a59339,6001821c,60018042,60016544,600166b8"
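If you then want a short fixed-length key for the scout field, hashing either of those concatenated strings is enough; a trivial sketch (BucketId is a made-up name, and std::hash is not guaranteed stable across compilers or builds, so a fixed algorithm such as CRC32 or MD5 is safer for real use):

#include <functional>
#include <sstream>
#include <string>

// Sketch: reduce the concatenated frame string to a short hex bucket id.
std::string BucketId(const std::string& frames)
{
    std::ostringstream out;
    out << std::hex << std::hash<std::string>()(frames);
    return out.str();
}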
Just use an MD5 string generated from the dump file and you will likely get a unique string for every crash.
I would start by collecting data on how often every function in your code has shown up in a crash report stack trace. Every report would have to be added to some kind of database, and every function would have to be indexed so that you could later query which functions seem to crash more often than others. (And of course, functions like main() will be in every report, but that's understandable.)
Or, if you think that only your own functions are of interest, you could just remove all the external entries from the crash stack traces and then hash the rest (your functions). That way you could see if any particular call chain of your own functions causes a crash repeatedly, no matter what external functions have been called in between.
Then of course, some of the more complicated problems will not be captured this way anyway, as the stack trace will be completely different. To help that, you could record other data from your application along with the stack trace in every report, like sizes of buffers, counters, states of different parts of the application and so on... And then do some statistics on that.
How do I restore the stack trace function name instead of <UNKNOWN>?
Event Type: Error
Event Source: abcd
Event Category: None
Event ID: 16
Date: 1/3/2010
Time: 10:24:10 PM
User: N/A
Computer: CMS01
Description:
Server.exe caused a <UNKNOWN> in module <UNKNOWN> at 2CA001B:77E4BEF7
Build 6.0.0.334
WorkingSetSize: 1291071488 bytes
EAX=02CAF420 EBX=00402C88 ECX=00000000 EDX=7C82860C ESI=02CAF4A8
EDI=02CAFB68 EBP=02CAF470 ESP=02CAF41C EIP=77E4BEF7 FLG=00000206
CS=2CA001B DS=2CA0023 SS=7C820023 ES=2CA0023 FS=7C82003B GS=2CA0000
2CA001B:77E4BEF7 (0xE06D7363 0x00000001 0x00000003 0x02CAF49C)
2CA001B:006DFAC7 (0x02CAF4B8 0x00807F50 0x00760D50 0x007D951C)
2CA001B:006DFC87 (0x00003561 0x7F6A0F38 0x008E7290 0x00021A6F)
2CA001B:0067E4C3 (0x00003561 0x00000000 0x02CAFBB8 0x02CAFB74)
2CA001B:00674CB2 (0x00003561 0x006EBAC7 0x02CAFB68 0x02CAFA64)
2CA001B:00402CA4 (0x00003560 0x00000000 0x00000000 0x02CAFBB8)
2CA001B:00402B29 (0x00003560 0x00000001 0x02CAFBB8 0x00000000)
2CA001B:00683096 (0x00003560 0x563DDDB6 0x00000000 0x02CAFC18)
2CA001B:00688E32 (0x02CAFC58 0x7C7BE590 0x00000000 0x00000000)
2CA001B:00689F0C (0x02CAFC58 0x7C7BE590 0x00000000 0x00650930)
2CA001B:0042E8EA (0x7F677648 0x7F677648 0x00CAFD6C 0x02CAFD6C)
2CA001B:004100CA (0x563DDB3E 0x00000000 0x00000000 0x008E7290)
2CA001B:0063AC39 (0x7F677648 0x02CAFD94 0x02CAFD88 0x77B5B540)
2CA001B:0064CB51 (0x7F660288 0x563DD9FE 0x00000000 0x00000000)
2CA001B:0063A648 (0x00000000 0x02CAFFEC 0x77E6482F 0x008E7290)
2CA001B:0063A74D (0x008E7290 0x00000000 0x00000000 0x008E7290)
2CA001B:77E6482F (0x0063A740 0x008E7290 0x00000000 0x00000000)
Get the exact same build of the program, objdump it, and have a look at what function is at the address. If symbol names have been stripped from the executable, this might be a little bit difficult.
If the program code is in any way dynamic, you may have to run it into a debugger to find the addresses of functions.
If the program is deliberately obfuscated and nasty, and it randomises its function addresses in some way at runtime (evasive things like viruses, or copy-protection code sometimes do this) then all bets are off.
In general:
The easiest way to find out what caused the crash is to follow the steps necessary to reproduce it in an instance of the application running in a debugger. All other approaches are much more difficult. This is why developers will often not spend time trying to tackle bugs for which there is no known method of reproduction.
You need to have the debug info file (.pdb) next to the exe when the crash occurs.
Then hopefully, your crash dumping code can load it and use the information in it.
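A rough sketch of what that can look like with DbgHelp, assuming the PDB is found next to the EXE (error handling omitted; PrintSymbol is a made-up name):

#include <windows.h>
#include <dbghelp.h>
#include <cstdio>
#pragma comment(lib, "dbghelp.lib")

// Sketch: resolve one raw crash address (e.g. 0x006DFAC7) to "function+offset".
// SymInitialize with fInvadeProcess=TRUE loads symbols for all loaded modules;
// the PDB is located via the module path or _NT_SYMBOL_PATH.
void PrintSymbol(DWORD64 address)
{
    HANDLE process = GetCurrentProcess();
    SymInitialize(process, NULL, TRUE);

    char buf[sizeof(SYMBOL_INFO) + 256] = {};
    SYMBOL_INFO* sym = (SYMBOL_INFO*)buf;
    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen = 255;

    DWORD64 displacement = 0;
    if (SymFromAddr(process, address, &displacement, sym))
        std::printf("%s+0x%llx\n", sym->Name, (unsigned long long)displacement);
    else
        std::printf("<UNKNOWN> (0x%llx)\n", (unsigned long long)address);

    SymCleanup(process);
}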
Try running the same build from your IDE, pause it, and jump to the addresses in the assembly. Then you can switch to the source code and see the functions there.
At least that's how it works in Delphi. And I know of this possibility in Visual Studio as well.
If you don't have the build in your IDE, you have to use a debugger like OllyDbg. Run the exe that caused the errors and pause the application in OllyDbg. Go to the addresses from the stack trace.
Open a similar application project in your IDE, run it and pause it. Search for the same binary pattern you see in OllyDbg and then switch to source code.
The last possibility I know of is analysing the map file, if you generated one during your build.
I use StackWalker - just compile it into the source of your program, and if it crashes you can generate a stack trace at that time, including function names.
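Usage is roughly like this (going from memory of the StackWalker API, so check the project for exact signatures; LoggingStackWalker and LogCurrentStack are made-up names):

#include "StackWalker.h"
#include <cstdio>

// Sketch: route StackWalker's output into our own log instead of the debug output window.
class LoggingStackWalker : public StackWalker
{
protected:
    virtual void OnOutput(LPCSTR szText)
    {
        std::fprintf(stderr, "%s", szText);   // one line per stack frame
    }
};

// Call from your unhandled-exception filter / crash handler.
void LogCurrentStack()
{
    LoggingStackWalker sw;
    sw.ShowCallstack();
}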
The most common cause is that you in fact don't have a module at the specified address. This can happen e.g. when you dereference an uninitialized function pointer, or call a virtual function using an invalid this pointer.
This is probably not the case here: "77E4BEF7" is with very high probability a Windows DLL, and 006DFAC7 an address in one of your modules. You don't need PDB files for this: Windows should always know the module name (.EXE or .DLL) - in fact, it first needs that module name to even find the proper PDB file.
Now, the remaining question is why you don't have the module information. This I can't tell from the information above. You need information about the actual system. For instance, does it have DEP? Is DEP enabled for your process?