Get C++ function name from !DllRegisterServer+0x3ebfa notation to solve 'endless wait in critical section' puzzle

I'm new to debugging with symbols (when no access to the testing machine is possible).
I already provided the client with a Debug build and its .pdb file, but for some reason the dump file I get contains no entries specific to my .dll (although the customer insists the problem occurs there; in particular, the app hangs). The debug build was made with VC++ 2008 x86 (I also tried the older VC++ 6.0, with no difference).
The stack trace the customer provides looks like this:
ChildEBP RetAddr
01ece854 773f8e44 ntdll!NtWaitForSingleObject+0x15
01ece8b8 773f8d28 ntdll!RtlpWaitOnCriticalSection+0x13e
01ece8e0 02a92003 ntdll!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
01ece8f0 02a8b4fa MyDllName!DllRegisterServer+0xbd2dc
01ece920 02a8b49e MyDllName!DllRegisterServer+0xb67d3
01ece930 02a8746c MyDllName!DllRegisterServer+0xb6777
01ece93c 029dc5ca MyDllName!DllRegisterServer+0xb2745
01ece99c 02a819e4 MyDllName!DllRegisterServer+0x78a3
01ecea80 02a09776 MyDllName!DllRegisterServer+0xaccbd
01eceb00 02a32506 MyDllName!DllRegisterServer+0x34a4f
01eceb58 029f44bf MyDllName!DllRegisterServer+0x5d7df
01ececdc 029f5e20 MyDllName!DllRegisterServer+0x1f798
01eceda0 029f76da MyDllName!DllRegisterServer+0x210f9
01ecedf4 291fe0ce MyDllName!DllRegisterServer+0x229b3
01ecee98 29365243 ClientAppName!Class.Method2+0x262
01eceeb8 293378d9 ClientAppName!Class.Method1+0x37
But I'm not sure what all of this exactly means. Does DllRegisterServer+0x229b3 mean "the function located 0x229b3 bytes past the address of DllRegisterServer in the map file"?
In the map file, I have something like
0002:0006d720 _DllRegisterServer@0 10137720 f DllName.obj
But when I sum 229b3 and 6d720, I don't have any match in the map file for the resulting value.
And why does the stack trace show DllRegisterServer as the address base? It's not the first address in the map file. There are many functions before it; should they have negative offsets then (which seems meaningless)?
I guess I'm reading the debugging output wrong, but I can't figure out what exactly is wrong.
If I could find out the function names, this would let me move further.
Things get even more complicated because I don't think my DLL uses any critical sections, but the customer insists it's my DLL that enters a critical section and never gets out. For now, I don't know how to prove him wrong (or maybe I'll find out that my lib does do this indirectly; perhaps Windows sockets or DNS name resolution use critical sections somewhere behind the scenes).

This recent blog post by Raymond Chen is exactly the answer you're looking for: Restoring symbols to a stack trace originally generated without symbols.
For some reason, the debugger (or whatever is producing that stack trace) is failing to find the debug symbols for your module, so it's doing the best it can with only the DLL's export table. To paraphrase Raymond:
Ugh. A stack trace taken without working symbols. (There's no way
that DllRegisterServer is a deeply recursive 750 KB function. Just by
casual inspection, you know that the symbols are wrong.)
To see how to fix this, you just have to understand what the debugger
does when it has no symbols to work from: It uses the symbols from the
exported function table. For every address it wants to resolve, it
looks for the nearest exported function whose address is less than or
equal to the target value.
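The lookup rule described in the quote can be sketched in a few lines of C++. This is only an illustration of the rule, not the debugger's actual implementation; the `ExportTable` alias, the `resolve` helper, and the addresses in the usage note are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Export address -> export name, ordered by address.
using ExportTable = std::map<std::uint32_t, std::string>;

// Mimics symbol resolution with only an export table: pick the nearest
// export whose address is <= addr and report the distance from it.
std::string resolve(const ExportTable& exports, std::uint32_t addr) {
    auto it = exports.upper_bound(addr);   // first export strictly above addr
    if (it == exports.begin()) {
        return "<no export at or below address>";
    }
    --it;                                  // nearest export at or below addr
    char buf[32];
    std::snprintf(buf, sizeof buf, "+0x%x",
                  static_cast<unsigned>(addr - it->first));
    return it->second + buf;
}
```

With hypothetical exports `DllCanUnloadNow` at 0x1000 and `DllRegisterServer` at 0x2000, `resolve(exports, 0x2000 + 0x229b3)` yields `DllRegisterServer+0x229b3` even if that address belongs to an unrelated internal function, which is why every frame in the trace appears to be inside `DllRegisterServer`.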
Assuming you have the correct, matching symbols file (.pdb) for the version of the DLL that generated the stack trace, you can trick the debugger into loading the DLL as if it were a process dump, and then you can load the symbols for it:
C:> ntsd -z MyDllName.dll
NTSD is the Microsoft NT Symbolic Debugger. It shipped in the box with older Windows versions (up to XP and Server 2003) and is otherwise part of the Debugging Tools for Windows package. You can also use WinDbg, but I'm not sure if there's a way to use Visual Studio with this technique.
Once you've got the DLL loaded into the debugger with symbols, you can then let the debugger do the heavy lifting to decode the stack trace. See the blog post for more detailed examples of that.
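As a rough cross-check on the trace in the question: each frame's address is the return address printed on the line above it, so subtracting the displayed offset gives the address the debugger used for `DllRegisterServer`. Windows loads modules on 64 KB boundaries, so rebasing leaves the low 16 bits of every address unchanged, and those bits can be compared against the Rva+Base column of the map file. The sketch below uses only numbers from the question; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Address the debugger treated as the symbol: frame address minus the
// "+0x..." displacement shown in the trace.
constexpr std::uint32_t symbolAddress(std::uint32_t frameAddr,
                                      std::uint32_t displacement) {
    return frameAddr - displacement;
}

// Module bases are 64 KB aligned, so rebasing cannot change the low 16 bits.
constexpr bool sameAfterRebase(std::uint32_t loadedAddr,
                               std::uint32_t mapRvaPlusBase) {
    return (loadedAddr & 0xFFFFu) == (mapRvaPlusBase & 0xFFFFu);
}

// Two frames from the question:
// MyDllName!DllRegisterServer+0xbd2dc and MyDllName!DllRegisterServer+0xb67d3.
constexpr std::uint32_t fromFrame1 = symbolAddress(0x02a92003u, 0xbd2dcu);
constexpr std::uint32_t fromFrame2 = symbolAddress(0x02a8b4fau, 0xb67d3u);
static_assert(fromFrame1 == fromFrame2,
              "both frames agree on the symbol address");

// Map file entry from the question: _DllRegisterServer@0, Rva+Base 10137720.
constexpr bool mapEntryMatches = sameAfterRebase(fromFrame1, 0x10137720u);
```

Both frames give 0x029d4d27, whose low 16 bits (0x4d27) differ from the map entry's (0x7720), so `mapEntryMatches` is false. That suggests the address exported as `DllRegisterServer` in the dumped module is not the function body listed in this map file, for example because the map belongs to a different build or because the export points at an incremental-linking thunk. In either case, adding the trace offsets to the map address would not be expected to land on the right functions.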

I'd guess that your DLL is badly behaved and you're deadlocking on the loader lock.
See "Another reason not to do anything scary in your DllMain: Inadvertent deadlock" here

Related

Difficulties with using gdb and u-boot efi payload

I've been sinking days into getting this working with no results. I want to use gdb to debug u-boot on qemu. I am using the 64 bit efi payload of u-boot (in which u-boot is placed in the payload of an efi stub application), and an x86-64 qemu with ovmf firmware. I have two elf files for the debug symbols: u-boot and u-boot-payload. It seems that u-boot contains the symbols for u-boot itself, while u-boot-payload contains symbols for the stub application.
I have been following several guides on how to do this:
https://www.haiku-os.org/blog/kallisti5/2021-01-17_debugging_riscv-64_bootloader_in_qemu/
https://wiki.osdev.org/Debugging_UEFI_applications_with_GDB
My problem is that gdb is never able to hit the breakpoints. I believe the problem is that I can't find the offset I need to load the symbols to. All of these guides mention relocation, and I have tried using the same formula they suggest (<MAXMEM> - SYS_MONITOR_LEN), with no luck. I've tried the addresses that u-boot outputs:
SYS_TEXT = 01110000
UBOOT = 7e36f200
SIZE = 0005cdcf
But none of these worked. I've tried breaking in functions in the stub and u-boot itself, but nothing ever works. I've been going at this for so long it's honestly hard for me to recall everything I've tried accurately, but I've run out of ideas.
Is there some underlying assumption I've made that's tripping me up, or is it really just that I'm using the wrong offset? I've read that u-boot relocates, but what I don't understand is what address it relocates to and when. Does the stub relocate, or does relocation happen before the stub runs? These are all things I haven't found any clarification on.
Any feedback helps; it's hard for me to ask a more specific question because I'm not sure what I'm doing wrong.
Figured it out! The required argument for add-symbol-file comes from common/board_f.c: after the setup_reloc function runs, gd->relocaddr contains the value you need for add-symbol-file.
The other problem I was having is that you need to use hbreak instead of break in gdb. This is a "hardware assisted" breakpoint, and for some reason all of the guides online I saw fail to mention this. I hope someone else finds this useful.

How do I debug a core dump that aborted in a dlopen()'ed plugin?

I have a core dump from a user. The main program loads selected plugins via dlopen. The process aborted in the plugin module. The user provided a backtrace that includes the filename of the plugin, and the function it aborted in.
I need to look at data, such as arguments passed to the function. How do I tell gdb where the plugin was loaded, so it can figure out how to show the source and data?
How do I tell gdb where the plugin was loaded, so it can figure out how to show the source and data?
GDB should do that automatically (the load addresses are contained inside the core).
All you need to do is supply the binaries that match customer's environment exactly. See also this answer.
If the core file is good then it should contain the call stack for the crash. You indicated that the crash occurred in the plugin module and function. By going 'up' the stack, you should be able to see the crash point and the containing function. In general, you should be able to look at the local variables including arguments to the function/method.
In short, just debug it like any other core file. Once the call to dlopen completes successfully, the shared library looks (nearly) the same as others loaded at start up.
If you share the bt, I can give some more definitive pointers.
As Employed Russian noted, your local executable and shared libraries must be bitwise identical to your client's. If the local version is different, it will throw off the mapping that gdb does between the core and the executable. This usually results in rubbish, but sometimes it results in a stack that appears vaguely correct. As a result, the programmer spends time chasing false leads. This situation is really aggravating!

Using MAP file VS2010 MFC

I've developed a program for a customer who's experiencing a crash when he performs a certain operation. It doesn't always happen in the same place or with the same data and, moreover, it happens neither on my local development machine nor in my test virtual machine (which has no development tools installed).
Given these conditions, I've decided to compile with a MAP file (enabled in Configuration Properties -> Linker -> Debugging with option /MAP) to see which function is causing the crash.
If I've understood correctly, when the program crashes I have to note the offset of the error and then search my MAP file under the Rva+Base column:
Address         Publics by Value                                                       Rva+Base   Lib:Object
0001:00037af0   ?PersonalizzaPlancia@CDlgGestioneDatiProgetto@MosaicoDialogs@@IAEXXZ   00438af0 f DlgGestioneDatiProgetto.obj
0001:00038000   ?SalvaTemporanei@CDlgGestioneDatiProgetto@MosaicoDialogs@@IAEXXZ       00439000 f DlgGestioneDatiProgetto.obj
My crash actually happens at offset 00038C90, so I would conclude that it's somewhere in the method:
MosaicoDialogs::CDlgGestioneDatiProgetto::PersonalizzaPlancia
but this is absolutely not possible, so assuming that the computer can't be wrong, I'm the one doing something wrong.
Can someone explain to me how to read the MAP file correctly?
Don't bother. Instead, build the project with symbols enabled and strip them into a .pdb file.
Modify the program a little so that it writes a minidump when it crashes, using an unhandled exception handler.
Give the newly compiled program to the customer; when it crashes, the handler calls MiniDumpWriteDump to write a .dmp file.
Ask the customer to send this .dmp file to you, and you then simply load it up in Visual Studio (or WinDbg) and it will match up the symbols to the program, and will also match up the code. You should be able to see the exact line of code and some of the variables involved. (if using VS, when you load the .dmp file, top right corner will be an option to "start debugging" click that as it will 'start debugging' at the point of the crash)
Try it first locally: put a divide-by-zero error somewhere in your program and see if you can debug the dump after it's been run. Note that you must keep the exact symbol file for each build of your program, because they match exactly. You cannot expect a symbol file for one build to match another build, even if nothing changed.
There are tutorials for this kind of thing, such as this one from CodeProject that looks like it describes what you need.
Reading of MAP files to find out crash location is explained nicely in this code project article.
http://www.codeproject.com/Articles/3472/Finding-crash-information-using-the-MAP-file
Hope this helps.
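One thing worth checking when reading the map by hand is what the reported crash offset is relative to. The sketch below works through the numbers from the question under two different assumptions: that 00038C90 is an offset from the image base, or that it is an offset within section 0001. The section start 0x00401000 is inferred from the map lines quoted in the question (Rva+Base minus section offset), the image base 0x00400000 is an assumption consistent with that, and the names are made up for illustration.

```cpp
#include <cstdint>

// Rva+Base values taken from the two map lines in the question.
constexpr std::uint32_t kPersonalizzaPlancia = 0x00438af0u;
constexpr std::uint32_t kSalvaTemporanei     = 0x00439000u; // next listed symbol

// Inferred: Rva+Base minus the section offset 0001:00037af0 gives the
// address where section 0001 starts.
constexpr std::uint32_t kSection1Start = kPersonalizzaPlancia - 0x00037af0u;
// Assumption: the usual default EXE base, consistent with the value above.
constexpr std::uint32_t kImageBase = 0x00400000u;

constexpr std::uint32_t kCrashOffset = 0x00038c90u;

// Interpretation 1: the offset is relative to the image base (an RVA).
constexpr std::uint32_t kAsRva = kImageBase + kCrashOffset;
// Interpretation 2: the offset is relative to the start of section 0001.
constexpr std::uint32_t kAsSectionOffset = kSection1Start + kCrashOffset;

// True when addr falls in [PersonalizzaPlancia, SalvaTemporanei).
constexpr bool inPersonalizzaPlancia(std::uint32_t addr) {
    return addr >= kPersonalizzaPlancia && addr < kSalvaTemporanei;
}
```

Under the first interpretation the address is 0x00438c90 and `inPersonalizzaPlancia(kAsRva)` is true; under the second it is 0x00439c90, which is at or beyond `SalvaTemporanei`. The two readings therefore point at different functions. The check also only sees the two symbols quoted here; symbols listed elsewhere in the full map (for instance under static symbols) could fall inside the same range.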
For postmortem debugging, there's an alternative that would not require the use of a map file. Rather, it would require you to create a simple registry script to enable some WER (Windows Error Reporting) flags to trap the crash dump file. First, build your application with debug symbols. Then, follow the instructions for Collecting User-Mode Dumps. Basically, you create a sub key under the "LocalDumps" key. This sub key must be the name of your application, for example, "myapplication.exe". Then, create the "DumpCount", "DumpType", and "DumpFolder" keys/values. Have the user run the registry script. This will enable trapping the dump locally. Then, have the user force the crash to collect the dump file. The user can then send the dump file to you to debug using the symbols you created earlier. Lastly, you'll need to create a registry script that removes the keys/values you added to the registry.

Visual Studio - Call stack does not trace back to user function

Ran into some access violation in visual studio 2010 and here's the callstack:
Most of the call stack is assembly code in the DLL (almost illegible to me). I want to trace back to the line in my code which caused the violation, but it seems there's no user function in the call stack.
How can I find the line in my function causing the violation ? Do I need to adjust some settings ?
Getting a reliable stack trace out of optimized C or C++ code is difficult. The optimizer chooses speed over diagnosability. The debugger needs PDB files for such code to know how to interpret the stack frames correctly and find the return address to the calling method.
Clearly you don't have these PDBs, you are getting the raw addresses from the operating system DLLs instead of their function names. Getting those PDBs is pretty simple, Microsoft has a public server that does nothing but deliver those PDBs for any released version of Windows, including service packs and security updates.
Telling the debugger about that server is required, the feature is off by default. It is particularly easy for VS2010, the server name is preprogrammed in the dialog, you only have to turn it on. Tools + Options, Debugging, Symbols, tick the checkbox in front of "Microsoft Symbol Servers". Set the cache directory, any writable directory will do.
Start debugging again, it will take a while at first to cache the PDBs. When it is done, you'll see a greatly improved stack trace. Accurate and with function names for the Windows DLLs.

How to extract debugging information from a crash

If my C++ app crashes on Windows I want to send useful debugging information to our server.
On Linux I would use the GNU backtrace() function - is there an equivalent for Windows?
Is there a way to extract useful debugging information after a program has crashed? Or only from within the process?
(Advice along the lines of "test your app so it doesn't crash" is not helpful! - all non-trivial programs will have bugs)
The function StackWalk64 can be used to snap a stack trace on Windows.
If you intend to use this function, you should be sure to compile your code with FPO disabled - without symbols, StackWalk64 won't be able to properly walk FPO'd frames.
You can get some code running in process at the time of the crash via a top-level __try/__except block or by calling SetUnhandledExceptionFilter. This is a bit unreliable since it requires you to have code running inside a crashed process.
Alternatively, you can just use the built-in Windows Error Reporting to collect crash data. This is more reliable, since it doesn't require you to add code running inside the compromised, crashed process. The only cost is to get a code-signing certificate, since you must submit a signed binary to the service. https://sysdev.microsoft.com/en-US/Hardware/signup/ has more details.
You can use the Windows API call MiniDumpWriteDump if you wish to roll your own code. Both Windows XP and Vista automate this process and you can sign up at https://winqual.microsoft.com to gain access to the error reports.
Also check out http://kb.mozillazine.org/Breakpad and http://www.codeproject.com/KB/debug/crash_report.aspx for other solutions.
This website provides quite a detailed overview of stack retrieval on Win32 after a C++ exception:
http://www.eptacom.net/pubblicazioni/pub_eng/except.html
Of course, this will only work from within the process, so if the process gets terminated or crashes to the point where it terminates before that code is run, it won't work.
Generate a minidump file. You can then load it up in windbg or Visual Studio and inspect the entire stack where the crash occurred.
Here's a good place to start reading.
It's quite simple to dump the current stack frame addresses into a log file. All you have to do is get such a function called on program faults (i.e. an interrupt handler in Windows) or asserts. This can be done in release builds as well. The log file can then be matched with a map file, resulting in a call stack with function names.
I published an article about this some years ago.
See http://www.ddj.com/architect/185300443
Let me describe how I handle crashes in my C++/WTL application.
First, in the main function, I call _set_se_translator, and pass in a function that will throw a C++ exception instead of using structured windows exceptions. This function gets an error code, for which you can get a Windows error message via FormatMessage, and a PEXCEPTION_POINTERS argument, which you can use to write a minidump (code here). You can also check the exception code for certain "meltdown" errors that you should just bail from, like EXCEPTION_NONCONTINUABLE_EXCEPTION or EXCEPTION_STACK_OVERFLOW :) (If it's recoverable, I prompt the user to email me this minidump file.)
The minidump file itself can be opened in Visual Studio like a normal project, and providing you've created a .pdb file for your executable, you can run the project and it'll jump to the exact location of the crash, together with the call stack and registers, which can be examined from the debugger.
If you want to grab a callstack (plus other good info) for a runtime crash, on a release build even on site, then you need to set up Dr Watson (run DrWtsn32.exe). If you check the 'generate crash dumps' option, when an app crashes, it'll write a mini dump file to the path specified (called user.dmp).
You can take this, combine it with the symbols you created when you built your server (set this in your compiler/linker to generate pdb files - keep these safe at home, you use them to match the dump so they can work out the source where the crash occurred)
Get yourself windbg, open it and use the menu option to 'load crash dump'. Once it has loaded everything, you can type '~*kp' to get a callstack for every thread (or click the button at the top for the current thread).
There are good articles on how to do this all over the web. This one is my favourite, and you'll want to read this to get an understanding of how to help yourself manage the symbols really easily.
You will have to set up a dump generation framework in your application, here is how you may do it.
You may then upload the dump file to the server for further analysis using dump analyzers like windbg.
You may want to use adplus to capture the crash callstack.
You can download and install Debugging tools for Windows.
Usage of adplus is mentioned here:
Adplus usage
This creates the complete crash or hang dump. Once you have the dump, Windbg comes to the rescue. Map the correct pdbs and symbols and you are all set to analyze the dump. To start with use the command "!analyze -v"