How do I debug a core dump that aborted in a dlopen()'ed plugin? - gdb

I have a core dump from a user. The main program loads selected plugins via dlopen. The process aborted in the plugin module. The user provided a backtrace that includes the filename of the plugin, and the function it aborted in.
I need to look at data, such as arguments passed to the function. How do I tell gdb where the plugin was loaded, so it can figure out how to show the source and data?

How do I tell gdb where the plugin was loaded, so it can figure out how to show the source and data?
GDB should do that automatically (the load addresses are contained inside the core).
All you need to do is supply the binaries that match customer's environment exactly. See also this answer.

If the core file is good then it should contain the call stack for the crash. You indicated that the crash occurred in the plugin module and function. By going 'up' the stack, you should be able to see the crash point and the containing function. In general, you should be able to look at the local variables including arguments to the function/method.
In short, just debug it like any other core file. Once the call to dlopen completes successfully, the shared library looks (nearly) the same as others loaded at start up.
If you share the bt, I can give some more definitive pointers.
As Employed Russian noted, you local executable and shared libraries must be bitwise the same as your clients. If the local version is different, it will throw off the mapping that gdb does between the core and the executable. This usually results rubbish but sometimes results in a stack that appears vaguely correct. As a result the programmer spends time chasing false leads. This situation is really aggravating!

Related

trying to figure out the source of memory leak from dlls

One of our application (windows form application C++, MSVS 2010 )crashes after few minutes of usage. Task manager tells that the memory usage grows to 60% of total system RAM in just few seconds of the run.
I used Intel inspector to get any idea of memory leaks. I was expecting I will get a a list of functions that are creating problem. But what I got is only the dlls as can be seen in the following screenshot.
The application is using a couple of third party libraries such as those starting with Pv, OpenCv cdio, CAIO etc. As you can see the last one is an opencv library, and is occupying close to 400MB. (How is this possible ? )
Also the right panel shows different types of leaks which have occurred.
I want to pin point the memory leak code and correct it. What should be my strategy, what functions should I start looking into? Why the inspector is not giving me correct source code and just giving me dlls? I am sure dlls are perfect as these are used by millions of people.
Please advice,
Thanks
Update
I think I have done something wrong in various compiler setting while generating the exe. .
As can be seen above, no symbol information is loaded. That was the reason I was unable to get the source code where memory leaks were happening. Pressing F1 reaches me to the following instructions:
Troubleshooting No Symbolic Information Symptoms
In the Sources window, the Intel Inspector displays no source code for any code locations in the problem set.
Details Intel Inspector cannot display source code for viewing and editing unless the application has symbols available.
If the Intel Inspector detects no symbols for a location, it sets the call stack and code pane to the first location where it can find symbols.
If the Intel Inspector cannot find any location in the call stack with symbols, it provides the module name and relative virtual address (RVA) for the location.
Possible Correction Strategies
1- When you compile and link an application on Windows* systems:
a) Enable the debug information compiler option.
b) Set the linker option to generate debug information.
2- Configure the project to search non-standard directories.
3- Copy the symbol files, such as *.pdb files, to a location where the Intel Inspector can find them.
So now I am focusing on the above correction strategies. My latest question are:
1- how do I set the above three strategies in MSVS 2010.
2- Do I need to use debgug exe or a release exe while using Intel inspector ?
If this is your source code, and you are sure your code is causing the leaks, you can use Visual Leak Detector.
You just need very minimal changes in project - I would say just #include<vld.h> (which you can make conditional). It will report all memory leaks on Debug Output window. This differs from VC++ standard leaking staticitics - it shows where memory was allocated.
Probally it couldn't load symbols for some module / modules and thus the information is a bit incorrect. Is symbol file (like opencv_core240.pdb) opencv_core240.dll available? Check it!
Also I would suggest to try another memory leak detectors to compare their results to each other.
In general when using Inspector the recommendation is to use a debug build of the code. Release builds may optimize away some important pieces of code.
You can also enable just debugging symbols in a release build, which is important when using Amplifier and Advisor. You can do this by going to Project -> [project name] properties... -> Linker -> Debugging -> Generate Debug Info -> Yes and Project->[project name] properties...->C/C++->General->Debug Information Format -> Program Database. Even if you are in debug configuration make sure these settings are set correctly, because they may have been accidentally modified.
WRT what you are seeing in the report:
OpenCV (and others) isn't occupying 350MB, rather there has been a leak of that size (which means the last pointer to dynamically allocated memory was overwritten without releasing that memory). Is it possible that you're misusing the library APIs?
Also, you may find it useful to look at the call stack at the location of the leak. You will possibly find the API leading to the memory leak which can help you pinpoint the issue.

Using MAP file VS2010 MFC

I've developed a program by a customer who's experiencing when he do a certain operation. This isn't happening always on the same place and on the same data and, moreover, it is not happening nor in my local developing machine nor in my test Virtual Machine (which is free of all developing equipment).
Given these conditions, I've decided to compile with MAP (enabled in Configuring Properties-> Linker->Debugger with option /MAP) to see which function is causing crash.
If I've correctly understood, when the program crash I've to check down the offset error and then, search in my MAP under the column RVA+BASE:
Address Publics by Value Rva+Base Lib:Object
0001:00037af0 ?PersonalizzaPlancia#CDlgGestioneDatiProgetto#MosaicoDialogs##IAEXXZ 00438af0 f DlgGestioneDatiProgetto.obj
0001:00038000 ?SalvaTemporanei#CDlgGestioneDatiProgetto#MosaicoDialogs##IAEXXZ 00439000 f DlgGestioneDatiProgetto.obj
Actually, my crash happens at offset: 00038C90 So I should think that it's somewhere in the method:
MosaicoDialogs::CDlgGestioneDatiProgetto::PersonalizzaPlancia
but this is not absolutely possible, so assuming that the computer can't be wrong, I'm the one who's doing it bad.
Can someone explain me how to read MAP in correct way?
don't bother - instead, build the project with symbols enabled and strip them into a pdb file.
Modify the program a little, to write a minidump when it crashes using a unhandled exception handler
Give the newly compiled program to the customer, and when it crashes call MiniDumpWriteDump.
Ask the customer to send this .dmp file to you, and you then simply load it up in Visual Studio (or WinDbg) and it will match up the symbols to the program, and will also match up the code. You should be able to see the exact line of code and some of the variables involved. (if using VS, when you load the .dmp file, top right corner will be an option to "start debugging" click that as it will 'start debugging' at the point of the crash)
Try it first locally - put a div by zero error somewhere in your program and see if you can debug the dump after its been run. Note that you must keep the exact same symbol file for each build of your program - they match exactly. You cannot expect a symbol file for one build to match another build, even if nothing changed.
There are tutorials for this kind of thing, such as this one from CodeProject that looks like it describes what you need.
Reading of MAP files to find out crash location is explained nicely in this code project article.
http://www.codeproject.com/Articles/3472/Finding-crash-information-using-the-MAP-file
Hope helps.
For postmortem debugging, there's an alternative that would not required the use of a map file. Rather, it would require you to create a simple registry script to enable some WER (Windows Error Reporting) flags to trap the crash dump file. First, build your application with debug symbols. Then, follow the instructions for Collecting User-Mode Dumps. Basically, you create a sub key under the "LocalDumps" key. This sub key must be the name of your application, for example, "myapplication.exe". Then, create the "DumpCount", "DumpType", and "DumpFolder" keys/values. Have the user run the registry script. This will enable trapping the dump locally. Then, have the user force the crash to collect the dump file. The user can then send the dump file to you to debug using the symbols you created earlier. Lastly, you'll need to create a registry script that removes the keys/values you added to the registry.

get c++ function name from !DllRegisterServer+0x3ebfa notation to solve 'endless wait in critical section' puzzle

I'm new in debugging with symbols (when no access to the testing machine is possible).
I already provided the client with Debug build with .pdb file but for some reason the dump file I get contains no entries specific to my .dll (although the customer insists the problem occurs there, in particular, the app hangs). The debug build was made with VC++ 2008 x86 (I also tried older VC++ 6.0 with no difference).
The stack trace customer provides looks like
ChildEBP RetAddr
01ece854 773f8e44 ntdll!NtWaitForSingleObject+0x15
01ece8b8 773f8d28 ntdll!RtlpWaitOnCriticalSection+0x13e
01ece8e0 02a92003 ntdll!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
01ece8f0 02a8b4fa MyDllName!DllRegisterServer+0xbd2dc
01ece920 02a8b49e MyDllName!DllRegisterServer+0xb67d3
01ece930 02a8746c MyDllName!DllRegisterServer+0xb6777
01ece93c 029dc5ca MyDllName!DllRegisterServer+0xb2745
01ece99c 02a819e4 MyDllName!DllRegisterServer+0x78a3
01ecea80 02a09776 MyDllName!DllRegisterServer+0xaccbd
01eceb00 02a32506 MyDllName!DllRegisterServer+0x34a4f
01eceb58 029f44bf MyDllName!DllRegisterServer+0x5d7df
01ececdc 029f5e20 MyDllName!DllRegisterServer+0x1f798
01eceda0 029f76da MyDllName!DllRegisterServer+0x210f9
01ecedf4 291fe0ce MyDllName!DllRegisterServer+0x229b3
01ecee98 29365243 ClientAppName!Class.Method2+0x262
01eceeb8 293378d9 ClientAppName!Class.Method1+0x37
But I'm not sure what all of this exactly means. Does DllRegisterServer+0x229b3 mean "function which has address +229b3 to the address of DllRegisterServer in map file"?
In map file, I have something like
0002:0006d720 _DllRegisterServer#0 10137720 f DllName.obj
But when I sum 229b3 and 6d720, I don't have any match in the map file for the resulting value.
And why the stack trace shows DllRegisterServer as an address base? It's not the first address in the map file. There are many function before it, should they have negative offset then (seems meaningless)?
I guess I understand reading debugging things wrong, but can't figure out what exactly is wrong..
If I could find out the function names, this would let me move further.
Things get even more complicated as I don't think my .DLL has no critical sections but the customer insists it's my dll which causes entering a critical section and never getting out. For now, I don't yet know how to prove him wrong (or maybe find out that it's indeed my lib which somehow, indirectly, does this, maybe, Windows sockets or DNS resolve name to an IP address somewhere behind the scenes are using critical sections).
This recent blog post by Raymond Chen is exactly the answer you're looking for: Restoring symbols to a stack trace originally generated without symbols.
For some reason, the debugger (or whatever is producing that stack trace) is failing to find the debug symbols for your module, so it's doing the best it can with only the DLL's export table. To paraphrase Raymond:
Ugh. A stack trace taken without working symbols. (There's no way
that DllRegisterServer is a deeply recursive 750 KB function. Just by
casual inspection, you know that the symbols are wrong.)
To see how to fix this, you just have to understand what the debugger
does when it has no symbols to work from: It uses the symbols from the
exported function table. For every address it wants to resolve, it
looks for the nearest exported function whose address is less than or
equal to the target value.
Assuming you have the correct, matching symbols file (.pdb) for the version of the DLL that generated the stack trace, you can trick the debugger into loading the DLL as if it were a process dump, and then you can load the symbols for it:
C:> ntsd -z MyDllName.dll
NTSD is the Microsoft NT Symbolic Debugger, which is installed by default on all modern Windows versions. You can also use WinDbg, but I'm not sure if there's a way to use Visual Studio with this technique.
Once you've got the DLL loaded into the debugger with symbols, you can then let the debugger do the heavy lifting to decode the stack trace. See the blog post for more detailed examples of that.
I'd guess that your DLL is badly behaved and you're deadlocking on the loader lock.
See "Another reason not to do anything scary in your DllMain: Inadvertent deadlock" here

How to extract debugging information from a crash

If my C++ app crashes on Windows I want to send useful debugging information to our server.
On Linux I would use the GNU backtrace() function - is there an equivalent for Windows?
Is there a way to extract useful debugging information after a program has crashed? Or only from within the process?
(Advice along the lines of "test you app so it doesn't crash" is not helpful! - all non-trivial programs will have bugs)
The function Stackwalk64 can be used to snap a stack trace on Windows.
If you intend to use this function, you should be sure to compile your code with FPO disabled - without symbols, StackWalk64 won't be able to properly walk FPO'd frames.
You can get some code running in process at the time of the crash via a top-level __try/__except block by calling SetUnhandledExceptionFilter. This is a bit unreliable since it requires you to have code running inside a crashed process.
Alternatively, you can just the built-in Windows Error Reporting to collect crash data. This is more reliable, since it doesn't require you to add code running inside the compromised, crashed process. The only cost is to get a code-signing certificate, since you must submit a signed binary to the service. https://sysdev.microsoft.com/en-US/Hardware/signup/ has more details.
You can use the Windows API call MiniDumpWriteDump if you wish to roll your own code. Both Windows XP and Vist automate this process and you can sign up at https://winqual.microsoft.com to gain access to the error reports.
Also check out http://kb.mozillazine.org/Breakpad and http://www.codeproject.com/KB/debug/crash_report.aspx for other solutions.
This website provides quite a detailed overview of stack retrieval on Win32 after a C++ exception:
http://www.eptacom.net/pubblicazioni/pub_eng/except.html
Of course, this will only work from within the process, so if the process gets terminated or crashes to the point where it terminates before that code is run, it won't work.
Generate a minidump file. You can then load it up in windbg or Visual Studio and inspect the entire stack where the crash occurred.
Here's a good place to start reading.
Its quite simple to dump the current stackframe addresses into a log file. All you have to do is get such a function called on program faults (i.e. a interrupt handler in Windows) or asserts. This can be done at released versions as well. The log file then can be matched with a map file resulting in a call stack with function names.
I published a article about this some years ago.
See http://www.ddj.com/architect/185300443
Let me describe how I handle crashes in my C++/WTL application.
First, in the main function, I call _set_se_translator, and pass in a function that will throw a C++ exception instead of using structured windows exceptions. This function gets an error code, for which you can get a Windows error message via FormatMessage, and a PEXCEPTION_POINTERS argument, which you can use to write a minidump (code here). You can also check the exception code for certain "meltdown" errors that you should just bail from, like EXCEPTION_NONCONTINUABLE_EXCEPTION or EXCEPTION_STACK_OVERFLOW :) (If it's recoverable, I prompt the user to email me this minidump file.)
The minidump file itself can be opened in Visual Studio like a normal project, and providing you've created a .pdb file for your executable, you can run the project and it'll jump to the exact location of the crash, together with the call stack and registers, which can be examined from the debugger.
If you want to grab a callstack (plus other good info) for a runtime crash, on a release build even on site, then you need to set up Dr Watson (run DrWtsn32.exe). If you check the 'generate crash dumps' option, when an app crashes, it'll write a mini dump file to the path specified (called user.dmp).
You can take this, combine it with the symbols you created when you built your server (set this in your compiler/linker to generate pdb files - keep these safe at home, you use them to match the dump so they can work out the source where the crash occurred)
Get yourself windbg, open it and use the menu option to 'load crash dump'. Once it's loaded everything you can type '~#kp' to get a callstack for every thread (or click the button at the top for the current thread).
There's good articles to know how to do this all over the web, This one is my favourite, and you'll want to read this to get an understanding of how to helpyourself manage the symbols really easily.
You will have to set up a dump generation framework in your application, here is how you may do it.
You may then upload the dump file to the server for further analysis using dump analyzers like windbg.
You may want to use adplus to capture the crash callstack.
You can download and install Debugging tools for Windows.
Usage of adplus is mentioned here:
Adplus usage
This creates the complete crash or hang dump. Once you have the dump, Windbg comes to the rescue. Map the correct pdbs and symbols and you are all set to analyze the dump. To start with use the command "!analyze -v"

How to get a stack trace when C++ program crashes? (using msvc8/2005)

Sometimes my c++ program crashes in debug mode, and what I got is a message box saying that an assertion failed in some of the internal memory management routines (accessing unallocated memory etc.). But I don't know where that was called from, because I didn't get any stack trace. How do I get a stack trace or at least see where it fails in my code (instead of library/ built-in routines)?
If you have a crash, you can get information about where the crash happened whether you have a debug or a release build. And you can see the call stack even if you are on a computer that does not have the source code.
To do this you need to use the PDB file that was built with your EXE. Put the PDB file inside the same directory as the EXE that crashed. Note: Even if you have the same source code, building twice and using the first EXE and the second PDB won't work. You need to use the exact PDB that was built with your EXE.
Then attach a debugger to the process that crashed. Example: windbg or VS.
Then simply checkout your call stack, while also having your threads window open. You will have to select the thread that crashed and check on the callstack for that thread. Each thread has a different call stack.
If you already have your VS debugger attached, it will automatically go to the source code that is causing the crash for you.
If the crash is happening inside a library you are using that you don't have the PDB for. There is nothing you can do.
If you run the debug version on a machine with VS, it should offer to bring it up and let you see the stack trace.
The problem is that the real problem is not on the call stack any more. If you free a pointer twice, that can result in this problem somewhere else unrelated to the program (the next time anything accesses the heap datastructures)
I wrote this blog on some tips for getting the problem to show up in the call stack so you can figure out what is going on.
http://www.atalasoft.com/cs/blogs/loufranco/archive/2007/02/06/6-_2200_Pointers_2200_-on-Debugging-Unmanaged-Code.aspx
The best tip is to use the gflags utility to make pointer issues cause immediate problems.
You can trigger a mini-dump by setting a handler for uncaught exceptions. Here's an article that explains all about minidumps
Google actually implemented their own open source crash handler called BreakPad, which also mozilla use I think (that's if you want something more serious - a rich and robust crash handler).
If I remember correctly that message box should have a button which says 'retry'. This should then break the program (in the debugger) at the point where the assertion happened.
CrashFinder can help you locate the place of the exception given the DLL and the address of the exception reported.
You can take this code and integrate it into your application to have a stack trage automatically generated when there is an uncaught exception. This is generally performed using __try{} __except{} or with a call to SetUnhandledExceptionFilter which allows you to specify a callback to all unhandled exceptions.
You can also have a post-mortem debugger installed on the client system. This is a decent, general way to get information when you do not have dump creation built into your application (maybe for an older version for which you must still get information).
Dr. Watson on Windows can be installed by running: drwtsn32 -i Running drwtsn32 (without any options) will bring up the configuration dialog. This will allow the creation of crash dump files, which you can later analyze with WinDbg or something similar.
You can use Poppy for this. You just sprinkle some macros across your code and it will gather the stack trace, together with the actual parameter values, local variables, loop counters, etc. It is very lightweight so it can be left in the release build to gather this information from crashes on end-user machines